Tutorial
Introduction
This section provides tutorials on using the Python API and the command line.
Python API
Data extraction
Package: extractor
Description: This package contains a collection of modules to extract information from the PDS web services and the PDS website.
Exported components:
PdsRegistry: A module that provides classes to extract the list of PDS3 collections.
PdsRecordsWs: A module that provides classes to extract metadata of the observations and collections by querying ODE web services.
PDSCatalogDescription: A module that provides PDS3 catalogs for a given collection.
PDSCatalogsDescription: A module that provides PDS3 catalogs for several collections.
Usage: To use this package, you can import and use the exported components as follows:
from pds_crawler.extractor import PdsRegistry
from pds_crawler.models import PdsRegistryModel
from pds_crawler.load import Database
from typing import Tuple, Dict, List
# Create a database to store the results
database = Database('work/database')
# Create an instance of PdsRegistry to get the collections
pds_registry = PdsRegistry(database)
# Retrieve all the georeferenced collections list
results: Tuple[Dict[str,str], List[PdsRegistryModel]] = pds_registry.get_pds_collections()
Given a collection and one of its records, it is possible to retrieve additional metadata describing the collection as a whole. This metadata is richer than the metadata provided in the collection's records.
The records must be downloaded first. To limit the waiting time, only the first page is downloaded:
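from pds_crawler.extractor import PdsRecordsWs
# Pick one collection from the list returned above (here, the one at index 60)
pds_collection = results[1][60]
pds_records_ws = PdsRecordsWs(database)
# Download the records; the second argument (1) limits the download to the first page
pds_records_ws.download_pds_records_for_one_collection(pds_collection, 1)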
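Selecting a collection by position (index 60 above) depends on the order in which the registry returns its list. A minimal sketch of selecting by dataset ID instead is given below. The `Collection` class is a hypothetical stand-in for `PdsRegistryModel`, and its `dataset_id` attribute is an assumption made for illustration; check the actual model for the real attribute name.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Collection:
    """Hypothetical stand-in for PdsRegistryModel (the attribute name is assumed)."""
    dataset_id: str


def find_collection(collections: List[Collection], dataset_id: str) -> Optional[Collection]:
    """Return the first collection whose dataset ID matches, or None if there is none."""
    for collection in collections:
        if collection.dataset_id == dataset_id:
            return collection
    return None


# Illustrative data only; real code would search results[1] instead.
collections = [
    Collection("EXAMPLE-DATASET-A"),
    Collection("CH1-ORB-L-M3-4-L1B-RADIANCE-V1.0"),
]
selected = find_collection(collections, "CH1-ORB-L-M3-4-L1B-RADIANCE-V1.0")
```

Returning `None` for an unknown ID lets the caller decide whether a missing collection is an error, rather than failing with an `IndexError` as a wrong positional index would.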
Now we can retrieve the catalogs that describe the metadata for the mission, platform, instrument, and collection:
from pds_crawler.extractor import PDSCatalogsDescription, PDSCatalogDescription
# download the catalogs in the storage
cats = PDSCatalogsDescription(database)
cats.download([pds_collection])
# Retrieve the catalogs from the storage
pds_objects_cat = PDSCatalogDescription(database)
pds_objects_cat.get_ode_catalogs(pds_collection)
Data transformation
Package: transformer
Description: This package contains a collection of modules to transform extracted information to STAC.
Exported components:
StacCatalogTransformer: A module that converts the extracted PDS catalogs to STAC.
StacRecordsTransformer: A module that converts the extracted responses from PdsRecordsWs to STAC.
Usage: To use this package, you can import and use the exported components as follows:
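from pds_crawler.extractor import PDSCatalogsDescription
from pds_crawler.transformer import StacCatalogTransformer
# Download the catalogs (database and pds_collection come from the extraction example)
cats = PDSCatalogsDescription(database)
cats.download([pds_collection])
# Convert the downloaded catalogs to STAC and save the result
transf = StacCatalogTransformer(database)
transf.to_stac(cats, [pds_collection])
transf.save()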
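For readers unfamiliar with STAC, the sketch below builds a minimal STAC Catalog as a plain dictionary, using only the fields the STAC specification requires for a catalog. It is not the output of `StacCatalogTransformer`; the identifier, description, and link target are placeholders chosen for illustration.

```python
import json

# Minimal STAC Catalog; every value below is an illustrative placeholder.
catalog = {
    "type": "Catalog",
    "stac_version": "1.0.0",
    "id": "example-mission",
    "description": "Placeholder catalog for a mission.",
    "links": [
        # Each link object needs at least "rel" and "href".
        {"rel": "child", "href": "./example-collection/collection.json"},
    ],
}

# Check that the fields required for a STAC Catalog are all present.
required = {"type", "stac_version", "id", "description", "links"}
missing = required - set(catalog)

print(json.dumps(catalog, indent=2))
```

A real catalog would typically also carry `root` and `self` links and a `title`; these are left out to keep the sketch small.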
ETL API
Provides an ETL (extract, transform, load) API.
from pds_crawler.etl import PdsSourceEnum, PdsDataEnum, StacETL
etl = StacETL("work/database")
etl.extract(PdsSourceEnum.COLLECTIONS_INDEX)
Command line
List all georeferenced collections available in PDS
pds_crawler extract --type_extract ode_collections
Save a georeferenced collection in the cache
pds_crawler extract --type_extract ode_collections_save --dataset_id CH1-ORB-L-M3-4-L1B-RADIANCE-V1.0
Download ODE records for the cached collection
pds_crawler extract --type_extract ode_records
Download PDS objects for the cached collection
pds_crawler extract --type_extract pds_objects
Transform the downloaded ODE records into STAC items and their parents
pds_crawler transform --type_stac ode_records
Update catalogs and collections from downloaded PDS objects
pds_crawler transform --type_stac pds_objects
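The commands above appear to form a sequence, since the later steps operate on the cached collection. A minimal sketch of chaining them from Python with the standard `subprocess` module follows. The `run_pipeline` helper and its `runner` parameter are illustrative names, not part of pds_crawler; the command lines themselves are copied from the list above.

```python
import subprocess
from typing import Callable, List

DATASET_ID = "CH1-ORB-L-M3-4-L1B-RADIANCE-V1.0"

# The command lines listed above, in the same order (the listing step is omitted).
COMMANDS: List[List[str]] = [
    ["pds_crawler", "extract", "--type_extract", "ode_collections_save",
     "--dataset_id", DATASET_ID],
    ["pds_crawler", "extract", "--type_extract", "ode_records"],
    ["pds_crawler", "extract", "--type_extract", "pds_objects"],
    ["pds_crawler", "transform", "--type_stac", "ode_records"],
    ["pds_crawler", "transform", "--type_stac", "pds_objects"],
]


def run_pipeline(runner: Callable[..., object] = subprocess.run) -> int:
    """Run each command in order and return how many were run.

    `runner` is injectable (hypothetical parameter) so the sequence can be
    dry-run without pds_crawler being installed.
    """
    count = 0
    for command in COMMANDS:
        # check=True makes subprocess.run raise CalledProcessError on a
        # non-zero exit status, which stops the sequence at the first failure.
        runner(command, check=True)
        count += 1
    return count
```

Calling `run_pipeline()` with no argument executes the commands for real; passing a function such as `lambda command, check: print(" ".join(command))` prints them instead.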
