Tutorial

Introduction

This section walks through the use of the Python API and the command line.

Python API

Data extraction

Package: extractor

Description: This package contains a collection of modules to extract information from the PDS web services and the PDS web site.

Exported components:

  • PdsRegistry: A module that provides classes to extract the list of PDS3 collections.

  • PdsRecordsWs: A module that provides classes to extract metadata of the observations and collections by querying ODE web services.

  • PDSCatalogDescription: A module that provides the PDS3 catalogs for a given collection.

  • PDSCatalogsDescription: A module that provides the PDS3 catalogs for several collections.

Usage: To use this package, you can import and use the exported components as follows:

from pds_crawler.extractor import PdsRegistry
from pds_crawler.models import PdsRegistryModel
from pds_crawler.load import Database
from typing import Tuple, Dict, List

# Create a database to store the results
database = Database('work/database')

# Create an instance of PdsRegistry to get the collections
pds_registry = PdsRegistry(database)

# Retrieve all the georeferenced collections list
results: Tuple[Dict[str,str], List[PdsRegistryModel]] = pds_registry.get_pds_collections()

Given a collection and one of its records, it is possible to retrieve additional metadata describing the collection as a whole. This metadata is richer than the metadata provided in the records of the collection.

Now, we need to download the records. To limit the waiting time, only the first page is downloaded:

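from pds_crawler.extractor import PdsRegistry, PdsRecordsWs

# Pick one collection from the list returned above (index 60 is an arbitrary example)
pds_collection = results[1][60]

# Download the records of that collection, limited to the first page
pds_records_ws = PdsRecordsWs(database)
pds_records_ws.download_pds_records_for_one_collection(pds_collection, 1)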
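
Selecting a collection by position, as in results[1][60], depends on the order of the returned list. The sketch below shows one possible way to select a collection by its dataset identifier instead. It rests on assumptions: the attribute name DataSetId is a guess at how PdsRegistryModel exposes the identifier, and FakeCollection is a hypothetical stand-in used only so that the example runs without network access.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def find_collection(
    collections: Iterable[T], dataset_id: str, attr: str = "DataSetId"
) -> Optional[T]:
    """Return the first collection whose `attr` equals `dataset_id`, or None.

    `attr` defaults to "DataSetId", which is an assumed attribute name.
    """
    for collection in collections:
        if getattr(collection, attr, None) == dataset_id:
            return collection
    return None


# Hypothetical stand-in for PdsRegistryModel, only to make the sketch runnable.
@dataclass
class FakeCollection:
    DataSetId: str


collections = [
    FakeCollection("DATASET-A"),
    FakeCollection("CH1-ORB-L-M3-4-L1B-RADIANCE-V1.0"),
]
selected = find_collection(collections, "CH1-ORB-L-M3-4-L1B-RADIANCE-V1.0")
print(selected)
```

With the real objects, the call would presumably look like find_collection(results[1], "CH1-ORB-L-M3-4-L1B-RADIANCE-V1.0"), provided the attribute name matches the model.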

Now, we can retrieve the catalogs that describe the metadata for the mission, platform, instrument and collection:

from pds_crawler.extractor import PDSCatalogsDescription, PDSCatalogDescription

# Download the catalogs into the storage
cats = PDSCatalogsDescription(database)
cats.download([pds_collection])

# Retrieve the catalogs from the storage
pds_objects_cat = PDSCatalogDescription(database)
pds_objects_cat.get_ode_catalogs(pds_collection)

Data transformation

Package: transformer

Description: This package contains a collection of modules to transform extracted information to STAC.

Exported components:

  • StacCatalogTransformer: A module that converts the extracted PDS catalogs to STAC.

  • StacRecordsTransformer: A module that converts the extracted responses from PdsRecordsWs to STAC.

Usage: To use this package, you can import and use the exported components as follows:

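from pds_crawler.extractor import PDSCatalogsDescription
from pds_crawler.transformer import StacCatalogTransformer

# `database` and `pds_collection` are the objects created in the data extraction example
cats = PDSCatalogsDescription(database)
cats.download([pds_collection])

# Convert the downloaded catalogs to STAC and save the result
transf = StacCatalogTransformer(database)
transf.to_stac(cats, [pds_collection])
transf.save()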
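
For readers unfamiliar with the target format, the following sketch shows the general shape of a STAC Item as a plain Python dictionary, together with a small check of the top-level fields that the STAC Item specification requires. It is an illustration of the format only, not the output of StacCatalogTransformer or StacRecordsTransformer: the identifier, geometry, date and the helper missing_item_fields are all hypothetical.

```python
import json
from typing import Any, Dict, List

# Top-level fields required by the STAC Item specification.
REQUIRED_ITEM_FIELDS = (
    "type", "stac_version", "id", "geometry", "properties", "links", "assets",
)


def missing_item_fields(item: Dict[str, Any]) -> List[str]:
    """List the required STAC Item fields that are absent from `item`."""
    missing = [name for name in REQUIRED_ITEM_FIELDS if name not in item]
    # The `datetime` key must be present inside `properties`.
    if "datetime" not in item.get("properties", {}):
        missing.append("properties.datetime")
    return missing


# Hypothetical item: every value below is made up for illustration.
example_item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-observation",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]],
    },
    "bbox": [0.0, 0.0, 1.0, 1.0],
    "properties": {"datetime": "2009-01-01T00:00:00Z"},
    "links": [],
    "assets": {},
}

print(json.dumps(example_item, indent=2))
print(missing_item_fields(example_item))
```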

ETL API

The etl package provides an ETL (extract, transform, load) API.

from pds_crawler.etl import PdsSourceEnum, PdsDataEnum, StacETL
etl = StacETL("work/database")
etl.extract(PdsSourceEnum.COLLECTIONS_INDEX)

Command line

List all georeferenced collections available in PDS

pds_crawler extract --type_extract ode_collections

Save a georeferenced collection in the cache

pds_crawler extract --type_extract ode_collections_save --dataset_id CH1-ORB-L-M3-4-L1B-RADIANCE-V1.0

Download ODE records for the cached collection

pds_crawler extract --type_extract ode_records

Download PDS objects for the cached collection

pds_crawler extract --type_extract pds_objects

Transform the downloaded ODE records to items and their parents

pds_crawler transform --type_stac ode_records

Update catalogs and collections from downloaded PDS objects

pds_crawler transform --type_stac pds_objects
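
The commands above can be chained from a script. The sketch below is one possible way to do so with Python's subprocess module, assuming that the commands are meant to run in the order listed in this section and that pds_crawler is available on the PATH. The functions build_pipeline and run_pipeline are hypothetical helpers, not part of pds_crawler. By default the script only prints the commands; pass dry_run=False to execute them.

```python
import subprocess
from typing import List


def build_pipeline(dataset_id: str) -> List[List[str]]:
    """Return the command lines from this section, in the order listed."""
    return [
        ["pds_crawler", "extract", "--type_extract", "ode_collections_save",
         "--dataset_id", dataset_id],
        ["pds_crawler", "extract", "--type_extract", "ode_records"],
        ["pds_crawler", "extract", "--type_extract", "pds_objects"],
        ["pds_crawler", "transform", "--type_stac", "ode_records"],
        ["pds_crawler", "transform", "--type_stac", "pds_objects"],
    ]


def run_pipeline(dataset_id: str, dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for command in build_pipeline(dataset_id):
        print(" ".join(command))
        if not dry_run:
            # check=True stops the pipeline at the first failing command.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_pipeline("CH1-ORB-L-M3-4-L1B-RADIANCE-V1.0")
```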