Coverage for pds_crawler/__init__.py: 92%


26 statements  

# -*- coding: utf-8 -*-
# pds-crawler - ETL to index PDS data to pdssp
# Copyright (C) 2023 - CNES (Jean-Christophe Malapert for Pôle Surfaces Planétaires)
# This file is part of pds-crawler <https://github.com/pdssp/pds_crawler>
# SPDX-License-Identifier: LGPL-3.0-or-later

"""
The objective of pds_crawler is to create a catalog of observations by retrieving all
the metadata of the georeferenced observations, completed by the collection metadata
corresponding to the PDS3 catalogs.

The system architecture consists of several components that work together to crawl
planetary data. The main components can be grouped into two layers:

* the persistence layer, which includes three storage systems (STAC_Storage, PDS_Storage, and HDF5_Storage) that store data in the file system;
* the business layer, which contains the Extractor and Transformer components.

The Extractor component extracts data from the Planetary Data System, which comprises
a web service and a website, and sends the retrieved data to both the PDS and HDF5 storage.
The Transformer component then transforms the data into the STAC format and stores it
in the STAC_Storage.
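The extract-then-transform flow described above can be illustrated with a minimal sketch. All names here (`extract`, `transform`, `run_pipeline`, the `pds` and `stac` directories, and the record fields `id` and `observation_time`) are hypothetical and are not taken from pds_crawler. The output is only a STAC-like dictionary, not a validated STAC Item.

```python
import json
import tempfile
from pathlib import Path


def extract(records, pds_dir):
    # Hypothetical Extractor: persist each raw record as one JSON file.
    pds_dir.mkdir(parents=True, exist_ok=True)
    for record in records:
        path = pds_dir / (record["id"] + ".json")
        path.write_text(json.dumps(record), encoding="utf-8")


def transform(pds_dir, stac_dir):
    # Hypothetical Transformer: read the stored records back and write
    # one STAC-like item per record. Returns the ids that were written.
    stac_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(pds_dir.glob("*.json")):
        record = json.loads(path.read_text(encoding="utf-8"))
        item = {
            "type": "Feature",
            "stac_version": "1.0.0",
            "id": record["id"],
            "geometry": None,
            "properties": {"datetime": record["observation_time"]},
            "links": [],
            "assets": {},
        }
        out = stac_dir / (record["id"] + ".json")
        out.write_text(json.dumps(item), encoding="utf-8")
        written.append(item["id"])
    return written


def run_pipeline(records):
    # Both storage areas live on the file system, as in the description above.
    with tempfile.TemporaryDirectory() as root:
        root_path = Path(root)
        extract(records, root_path / "pds")
        return transform(root_path / "pds", root_path / "stac")
```

Keeping the two steps separated by on-disk storage means the transformation can be re-run without contacting the remote source again, which is one plausible reason for the layered design.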

The Models component, used by both layers, includes four groups of models:

* PDS3 objects - catalogs,
* ODE WS - collections,
* ODE WS - records,
* and STAC.

The diagram below shows the control flow of the components:

.. mermaid::

    graph TB
    STAC_Storage ==> FileSystem
    PDS_Storage ==> FileSystem
    HDF5_Storage ==> FileSystem
    Extractor ==> PDS_Storage
    Extractor ==> HDF5_Storage
    Transformer ==> STAC_Storage
    Extractor ==> Planetary_Data_System
    subgraph Planetary_Data_System
        sq01[Website]
        sq02[Web service]
    end
    subgraph PDS_crawler
        subgraph Persistence
            subgraph STAC_Storage
                sq11[STAC storage]
                sq12[Strategy]
            end
            subgraph PDS_Storage
                sq21[PDS storage]
                sq22[PDS objects]
            end
            subgraph HDF5_Storage
                sq31[HDF5 storage]
            end
        end
        subgraph FileSystem
            sq3[File System]
        end

        subgraph Business

            subgraph Extractor
                subgraph ODE_Archive
                    sq71[Website]
                end
                subgraph ODE_Web_Service
                    sq81[Collections]
                    sq82[Records]
                end
            end

            subgraph Transformer
                subgraph Stac_Transformation
                    sq91[Transformation]
                end
            end
        end

        subgraph Models
            sq4[PDS3 objects - catalogs]
            sq5[ODE WS - collections]
            sq6[ODE WS - records]
            sq7[STAC]
        end
    end

The diagram below shows the data flow:

.. mermaid::

    graph TD
    A[PDS ODE Web Service - collection] --> |JSON| D(Extraction)
    B[PDS ODE Web Service - records] --> |JSON| E(Extraction)
    C[PDS ODE Web Site] --> |REFERENCE_CATALOG, MISSION_CATALOG,<br>PERSONNEL_CATALOG, INSTRUMENT_CATALOG,<br>INSTRUMENT_HOST_CATALOG, DATA_SET_CATALOG,<br>VOL_DESC, DATA_SET_MAP_PROJECTION_CATALOG| F(Extraction)
    E(Extraction) --> |Files| H[Storage File System]
    F(Extraction) --> |Files| M[Storage File System]
    D(Extraction) --> |JSON PdsRegistryModel| I[HDF5]
    I[HDF5] --> |PdsRegistryModel| N[Transform]
    M[Storage File System] --> |PdsRecordsModel, DataSetMapProjectionModel,<br>MissionModel, ReferencesModel,<br>PersonnelsModel, VolumeModel,<br>InstrumentModel, InstrumentHostModel,<br>DataSetModel| L[Transform]
    H[Storage File System] --> |PdsRecordModel| N[Transform]
    I[HDF5] --> |PdsRegistryModel| L[Transform]
    N[Transform] --> |STAC Item, STAC Collection, STAC Catalog| O[STAC repository]
    L[Transform] --> |STAC Collection, STAC Catalog| O[STAC repository]

In summary, the architecture consists of three storage systems, an extractor that
retrieves data from various sources, a transformer that converts the data into the
STAC format, and a web service and website component that provides access to catalogs.
Finally, the models component includes four groups of models that represent the different
types of data that the system manages.
"""

import logging.config
import os
from logging import debug
from logging import getLogger
from logging import NullHandler
from logging import setLogRecordFactory
from logging import warning

from ._version import __author__
from ._version import __author_email__
from ._version import __copyright__
from ._version import __description__
from ._version import __license__
from ._version import __name_soft__
from ._version import __title__
from ._version import __url__
from ._version import __version__
from .custom_logging import LogRecord

# Attach a NullHandler to the package logger.
getLogger(__name__).addHandler(NullHandler())

# Load the logging configuration shipped next to this module;
# emit a warning instead of failing if it cannot be read.
try:
    PATH_TO_CONF = os.path.dirname(os.path.realpath(__file__))
    logging.config.fileConfig(
        os.path.join(PATH_TO_CONF, "logging.conf"),
        disable_existing_loggers=False,
    )
    debug(f"file {os.path.join(PATH_TO_CONF, 'logging.conf')} loaded")
except Exception as exception:  # pylint: disable=broad-except
    warning(f"cannot load logging.conf : {exception}")
# Use the package's LogRecord class for all records created from now on.
setLogRecordFactory(LogRecord)  # pylint: disable=no-member
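The start-up logic above (try to load a configuration file, warn on failure, then install a custom record factory) can be reproduced in isolation with the sketch below. `ContextLogRecord`, its `origin` attribute, and `configure_logging` are hypothetical names introduced for illustration; the real `pds_crawler.custom_logging.LogRecord` is not shown on this page and may behave differently.

```python
import logging
import logging.config


class ContextLogRecord(logging.LogRecord):
    # Hypothetical stand-in for a custom LogRecord: it adds an "origin"
    # attribute built from fields that logging.LogRecord already provides.
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.origin = f"{self.module}:{self.lineno}"


def configure_logging(conf_path):
    # Try to load an INI-style logging configuration. On any failure,
    # log a warning and continue. Returns True when the file was loaded.
    try:
        logging.config.fileConfig(conf_path, disable_existing_loggers=False)
        loaded = True
    except Exception as exc:  # broad on purpose, mirroring the module above
        logging.getLogger(__name__).warning(
            "cannot load %s: %s", conf_path, exc
        )
        loaded = False
    # Install the factory whether or not the configuration was loaded.
    logging.setLogRecordFactory(ContextLogRecord)
    return loaded
```

Wrapping the logic in a function, rather than running it at import time, lets an application decide when the global record factory is replaced; this is a design choice of the sketch, not of the original module.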