Purpose of the Software

The purpose of this software is to crawl and extract planetary data from various sources, including a web service and a website, transform the data into the SpatioTemporal Asset Catalog (STAC) format, and store it in three different storage systems. The software also includes four models that represent the different types of data that the system manages.

The objective of the pds_crawler is to create a catalog of observations by retrieving all the metadata of the georeferenced observations completed by collection metadata corresponding to the PDS3 catalogs.

The system architecture consists of several components that work together to crawl planetary data. The main component can be grouped in two layers:

  • persistence layer that includes three storage systems: STAC_Storage, PDS_Storage, and HDF5_Storage, which store data in the File System.

  • busisness layer contains the Extractor and Transformer components.

The Extractor component extracts data from the Planetary Data System, which comprises a web service and a website and sends the retrieved data to both PDS and HDF5 storage. The Transformer component then transforms the data into the STAC format and stores it in the STAC_Storage.

The Models component, used by layers, includes four groups of models:

  • PDS3 objects - catalogs,

  • ODE WS - collections,

  • ODE WS - records,

  • and STAC.

The diagram below shows the control flow of the components

graph TB STAC_Storage ==> FileSystem PDS_Storage ==> FileSystem HDF5_Storage ==> FileSystem Extractor ==> PDS_Storage Extractor ==> HDF5_Storage Transformer ==> STAC_Storage Extractor ==> Planetary_Data_System subgraph Planetary_Data_System sq01[Website] sq02[Web service] end subgraph PDS_crawler subgraph Persistence subgraph STAC_Storage sq11[STAC storage] sq12[Strategy] end subgraph PDS_Storage sq21[PDS storage] sq22[PDS objects] end subgraph HDF5_Storage sq3[HDF5 storage] end end subgraph FileSystem sq3[File System] end subgraph Business subgraph Extractor subgraph ODE_Archive sq71[Website] end subgraph ODE_Web_Service sq81[Collections] sq82[Records] end end subgraph Transformer subgraph Stac_Transformation sq91[Transformation] end end end subgraph Models sq4[PDS3 objects - catalogs] sq5[ODE WS - collections] sq6[ODE WS - records] sq7[STAC] end end

The diagram below shows the data flow

graph TD A[PDS ODE Web Service - collection] --> |JSON| D(Extraction) B[PDS ODE Web Service - records] --> |JSON| E(Extraction) C[PDS ODE Web Site] --> |REFERENCE_CATALOG, MISSION_CATALOG,<br>PERSONNEL_CATALOG, INSTRUMENT_CATALOG,<br>INSTRUMENT_HOST_CATALOG,DATA_SET_CATALOG,<br>VOL_DESC, DATA_SET_MAP_PROJECTION_CATALOG| F(Extraction) E(Extraction) --> |Files| H[Storage File System] F(Extraction) --> |Files| M[Storage File System] D(Extraction) --> |JSON PdsRegistryModel| I[HDF5] I[HDF5] --> |PdsRegistryModel| N[Transform] M[Storage File System] --> |PdsRecordsModel, DataSetMapProjectionModel,<br>MissionModel, ReferencesModel,<br>PersonnelsModel, VolumeModel,<br>InstrumentModel, InstrumentHostModel,<br>DataSetModel| L[Transform] H[Storage File System] --> |PdsRecordModel| N[Transform] I[HDF5] --> |PdsRegistryModel| L[Transform] N[Transform] --> |STAC Item, STAC Collection, STAC Catalog| O[STAC repository] L[Transform] --> |STAC Collection, STAC Catalog| O[STAC repository]

In summary, the architecture consists of three storage systems, an extractor that retrieves data from various sources, a transformer that converts the data into the STAC format, and a web service and website component that provides access to catalogs. Finally, the models component includes four models that represent the different types of data that the system manages