Coverage for pds_crawler/__init__.py: 92%
# -*- coding: utf-8 -*-
# pds-crawler - ETL to index PDS data to pdssp
# Copyright (C) 2023 - CNES (Jean-Christophe Malapert for Pôle Surfaces Planétaires)
# This file is part of pds-crawler <https://github.com/pdssp/pds_crawler>
# SPDX-License-Identifier: LGPL-3.0-or-later
"""
The objective of pds_crawler is to build a catalog of observations by retrieving the
metadata of all georeferenced observations, complemented by the collection metadata
taken from the PDS3 catalogs.

The system architecture consists of several components that work together to crawl
planetary data. The main components can be grouped into two layers:

* a persistence layer that includes three storage systems (STAC_Storage, PDS_Storage and HDF5_Storage), all of which store data in the file system;
* a business layer that contains the Extractor and Transformer components.

The Extractor component extracts data from the Planetary Data System (PDS), which
exposes both a web service and a website through the Orbital Data Explorer (ODE), and
sends the retrieved data to both the PDS and HDF5 storage. The Transformer component
then converts the data into the STAC format and stores it in STAC_Storage.
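
The chain from extraction to storage to transformation can be sketched with the
standard library only. This is a minimal illustration under stated assumptions, not
pds_crawler's implementation: ``JsonFileStorage``, ``Extractor``, ``Transformer`` and
``fake_pds_source`` are hypothetical names, the HDF5 storage is left out, and the STAC
conversion is a placeholder.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict, Iterable, List


class JsonFileStorage:
    # Persistence layer (hypothetical): one JSON file per record in a directory.
    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, key: str, record: Dict[str, Any]) -> None:
        (self.root / f"{key}.json").write_text(json.dumps(record))

    def load_all(self) -> List[Dict[str, Any]]:
        # Sorted so that the output order is deterministic.
        return [json.loads(p.read_text()) for p in sorted(self.root.glob("*.json"))]


class Extractor:
    # Business layer (hypothetical): pulls raw records from a source and persists them.
    def __init__(
        self,
        source: Callable[[], Iterable[Dict[str, Any]]],
        storage: JsonFileStorage,
    ) -> None:
        self.source = source
        self.storage = storage

    def run(self) -> int:
        count = 0
        for record in self.source():
            self.storage.save(record["id"], record)
            count += 1
        return count


class Transformer:
    # Business layer (hypothetical): reads persisted records and writes STAC-like items.
    def __init__(self, storage: JsonFileStorage, stac_storage: JsonFileStorage) -> None:
        self.storage = storage
        self.stac_storage = stac_storage

    def run(self) -> int:
        records = self.storage.load_all()
        for record in records:
            # Placeholder mapping; a real STAC Item also needs geometry, datetime, etc.
            item = {"type": "Feature", "id": record["id"], "properties": record}
            self.stac_storage.save(record["id"], item)
        return len(records)


def fake_pds_source() -> List[Dict[str, Any]]:
    # Stand-in for the PDS web service and website (made-up records).
    return [{"id": "obs-001"}, {"id": "obs-002"}]


with tempfile.TemporaryDirectory() as tmp:
    pds_storage = JsonFileStorage(Path(tmp) / "pds")
    stac_storage = JsonFileStorage(Path(tmp) / "stac")
    extracted = Extractor(fake_pds_source, pds_storage).run()
    transformed = Transformer(pds_storage, stac_storage).run()
    stac_ids = [item["id"] for item in stac_storage.load_all()]

print(extracted, transformed, stac_ids)  # 2 2 ['obs-001', 'obs-002']
```

Keeping the storage behind a small object lets the business components stay unaware of
where and how data is written, which mirrors the split between the two layers.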

The Models component, used by both layers, includes four groups of models:

* PDS3 objects - catalogs,
* ODE WS - collections,
* ODE WS - records,
* and STAC.
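
To make the STAC group concrete, the sketch below maps a simplified observation record
onto a STAC Item dictionary with the fields defined by the STAC Item specification
(``type``, ``stac_version``, ``id``, ``geometry``, ``bbox``, ``properties.datetime``,
``links``, ``assets``). ``OdeRecord``, ``to_stac_item`` and the sample values are
hypothetical; they do not reproduce pds_crawler's own models or the ODE WS schema.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class OdeRecord:
    # Hypothetical, simplified stand-in for an ODE WS record model.
    product_id: str
    collection_id: str
    start_time: str  # ISO 8601 UTC timestamp
    footprint: List[Tuple[float, float]]  # closed ring of (lon, lat) vertices


def to_stac_item(record: OdeRecord) -> Dict[str, Any]:
    # Map the record onto the fields of a STAC 1.0.0 Item.
    lons = [lon for lon, _ in record.footprint]
    lats = [lat for _, lat in record.footprint]
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": record.product_id,
        "collection": record.collection_id,
        "geometry": {
            "type": "Polygon",
            "coordinates": [[list(vertex) for vertex in record.footprint]],
        },
        # bbox order for 2D geometries: [west, south, east, north]
        "bbox": [min(lons), min(lats), max(lons), max(lats)],
        "properties": {"datetime": record.start_time},
        "links": [],
        "assets": {},
    }


record = OdeRecord(
    product_id="PRODUCT_0001",
    collection_id="example-collection",
    start_time="2010-01-01T00:00:00Z",
    footprint=[(10.0, -5.0), (11.0, -5.0), (11.0, -4.0), (10.0, -4.0), (10.0, -5.0)],
)
item = to_stac_item(record)
print(json.dumps(item["bbox"]))  # [10.0, -5.0, 11.0, -4.0]
```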

The diagram below shows the control flow between the components:

.. mermaid::

    graph TB
        STAC_Storage ==> FileSystem
        PDS_Storage ==> FileSystem
        HDF5_Storage ==> FileSystem
        Extractor ==> PDS_Storage
        Extractor ==> HDF5_Storage
        Transformer ==> STAC_Storage
        Extractor ==> Planetary_Data_System
        subgraph Planetary_Data_System
            sq01[Website]
            sq02[Web service]
        end
        subgraph PDS_crawler
            subgraph Persistence
                subgraph STAC_Storage
                    sq11[STAC storage]
                    sq12[Strategy]
                end
                subgraph PDS_Storage
                    sq21[PDS storage]
                    sq22[PDS objects]
                end
                subgraph HDF5_Storage
                    sq3[HDF5 storage]
                end
            end
            subgraph FileSystem
                sq10[File System]
            end

            subgraph Business

                subgraph Extractor
                    subgraph ODE_Archive
                        sq71[Website]
                    end
                    subgraph ODE_Web_Service
                        sq81[Collections]
                        sq82[Records]
                    end
                end

                subgraph Transformer
                    subgraph Stac_Transformation
                        sq91[Transformation]
                    end
                end
            end

            subgraph Models
                sq4[PDS3 objects - catalogs]
                sq5[ODE WS - collections]
                sq6[ODE WS - records]
                sq7[STAC]
            end
        end

The diagram below shows the data flow:

.. mermaid::

    graph TD
        A[PDS ODE Web Service - collection] --> |JSON| D(Extraction)
        B[PDS ODE Web Service - records] --> |JSON| E(Extraction)
        C[PDS ODE Web Site] --> |REFERENCE_CATALOG, MISSION_CATALOG,<br>PERSONNEL_CATALOG, INSTRUMENT_CATALOG,<br>INSTRUMENT_HOST_CATALOG,DATA_SET_CATALOG,<br>VOL_DESC, DATA_SET_MAP_PROJECTION_CATALOG| F(Extraction)
        E(Extraction) --> |Files| H[Storage File System]
        F(Extraction) --> |Files| M[Storage File System]
        D(Extraction) --> |JSON PdsRegistryModel| I[HDF5]
        I[HDF5] --> |PdsRegistryModel| N[Transform]
        M[Storage File System] --> |PdsRecordsModel, DataSetMapProjectionModel,<br>MissionModel, ReferencesModel,<br>PersonnelsModel, VolumeModel,<br>InstrumentModel, InstrumentHostModel,<br>DataSetModel| L[Transform]
        H[Storage File System] --> |PdsRecordModel| N[Transform]
        I[HDF5] --> |PdsRegistryModel| L[Transform]
        N[Transform] --> |STAC Item, STAC Collection, STAC Catalog| O[STAC repository]
        L[Transform] --> |STAC Collection, STAC Catalog| O[STAC repository]
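
The last step of the data flow, assembling items into STAC Collections and a root
Catalog, can be sketched as follows. ``make_collection`` and ``make_catalog`` are
hypothetical helpers, not pds_crawler functions; the sketch assumes each item already
carries a ``bbox`` and an ISO 8601 UTC ``datetime``, and it computes the collection
extent as the plain union of the item bounding boxes, ignoring antimeridian crossing.

```python
from typing import Any, Dict, List

STAC_VERSION = "1.0.0"


def make_collection(collection_id: str, items: List[Dict[str, Any]]) -> Dict[str, Any]:
    # Spatial extent: union of the item bounding boxes [west, south, east, north].
    bboxes = [item["bbox"] for item in items]
    spatial = [
        min(b[0] for b in bboxes),
        min(b[1] for b in bboxes),
        max(b[2] for b in bboxes),
        max(b[3] for b in bboxes),
    ]
    # Temporal extent: ISO 8601 strings in the same UTC format sort chronologically.
    times = sorted(item["properties"]["datetime"] for item in items)
    return {
        "type": "Collection",
        "stac_version": STAC_VERSION,
        "id": collection_id,
        "description": f"Observations of {collection_id}",
        "license": "various",  # placeholder; the real license depends on the data set
        "extent": {
            "spatial": {"bbox": [spatial]},
            "temporal": {"interval": [[times[0], times[-1]]]},
        },
        "links": [{"rel": "item", "href": f"./{item['id']}.json"} for item in items],
    }


def make_catalog(catalog_id: str, collections: List[Dict[str, Any]]) -> Dict[str, Any]:
    # Root catalog that points to each collection through a "child" link.
    return {
        "type": "Catalog",
        "stac_version": STAC_VERSION,
        "id": catalog_id,
        "description": "Root catalog",
        "links": [
            {"rel": "child", "href": f"./{c['id']}/collection.json"} for c in collections
        ],
    }


items = [
    {"id": "A", "bbox": [10.0, -5.0, 11.0, -4.0], "properties": {"datetime": "2010-01-02T00:00:00Z"}},
    {"id": "B", "bbox": [12.0, -6.0, 13.0, -3.0], "properties": {"datetime": "2010-01-01T00:00:00Z"}},
]
collection = make_collection("example-collection", items)
catalog = make_catalog("pds", [collection])
print(collection["extent"]["spatial"]["bbox"])  # [[10.0, -6.0, 13.0, -3.0]]
print(collection["extent"]["temporal"]["interval"])  # [['2010-01-01T00:00:00Z', '2010-01-02T00:00:00Z']]
```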

In summary, the architecture consists of three storage systems, an extractor that
retrieves data from several sources through its website and web service
sub-components (which provide access to the PDS catalogs), a transformer that
converts the data into the STAC format, and a models component whose four groups of
models represent the different types of data that the system manages.
"""
import logging.config
import os
from logging import debug
from logging import getLogger
from logging import NullHandler
from logging import setLogRecordFactory
from logging import warning

from ._version import __author__
from ._version import __author_email__
from ._version import __copyright__
from ._version import __description__
from ._version import __license__
from ._version import __name_soft__
from ._version import __title__
from ._version import __url__
from ._version import __version__
from .custom_logging import LogRecord

getLogger(__name__).addHandler(NullHandler())

try:
    PATH_TO_CONF = os.path.dirname(os.path.realpath(__file__))
    logging.config.fileConfig(
        os.path.join(PATH_TO_CONF, "logging.conf"),
        disable_existing_loggers=False,
    )
    debug(f"file {os.path.join(PATH_TO_CONF, 'logging.conf')} loaded")
except Exception as exception:  # pylint: disable=broad-except
    warning(f"cannot load logging.conf : {exception}")
setLogRecordFactory(LogRecord)  # pylint: disable=no-member