Coverage for pds_crawler/extractor/__init__.py: 100%
# -*- coding: utf-8 -*-
# pds-crawler - ETL to index PDS data to pdssp
# Copyright (C) 2023 - CNES (Jean-Christophe Malapert for Pôle Surfaces Planétaires)
# This file is part of pds-crawler <https://github.com/pdssp/pds_crawler>
# SPDX-License-Identifier: LGPL-3.0-or-later
"""
Package: extractor

Description:
This package contains a collection of modules to extract information from the PDS web services (WS) and the PDS web site.

Exported components:

* `PdsRegistry`: A class that extracts the list of PDS3 collections.
* `PdsRecordsWs`: A class that extracts the metadata of the observations and collections by querying ODE web services.
* `PDSCatalogDescription`: A class that provides the PDS3 catalogs for a given collection.
* `PDSCatalogsDescription`: A class that provides the PDS3 catalogs for several collections.

Usage:
To use this package, you can import and use the exported components as follows:

.. code-block:: python

    from pds_crawler.extractor import PdsRegistry
    from pds_crawler.models import PdsRegistryModel
    from pds_crawler.load import Database
    from typing import Tuple, Dict, List

    # Create a database to store the results
    database = Database('work/database')

    # Create an instance of PdsRegistry to get the collections
    pds_registry = PdsRegistry(database)

    # Retrieve the list of all georeferenced collections
    results: Tuple[Dict[str,str], List[PdsRegistryModel]] = pds_registry.get_pds_collections()


By knowing the collection and a record, it is possible to retrieve additional metadata
describing general collection information. This metadata is richer than the metadata
provided in the records of the collection.

Now, we need to download the records. To limit the waiting time, only the first
page is downloaded:

.. code-block:: python

    from pds_crawler.extractor import PdsRegistry, PdsRecordsWs
    # Pick one collection from the list returned above (index 60 here)
    pds_collection = results[1][60]
    pds_records_ws = PdsRecordsWs(database)
    pds_records_ws.download_pds_records_for_one_collection(pds_collection, 1)

Now, we can retrieve the catalogs that describe the metadata for the mission, platform,
instrument and collection:

.. code-block:: python

    from pds_crawler.extractor import PDSCatalogsDescription, PDSCatalogDescription

    # Download the catalogs to the storage
    cats = PDSCatalogsDescription(database)
    cats.download([pds_collection])

    # Retrieve the catalogs from the storage
    pds_objects_cat = PDSCatalogDescription(database)
    pds_objects_cat.get_ode_catalogs(pds_collection)

"""
from .pds_ode_website import PDSCatalogDescription
from .pds_ode_website import PDSCatalogsDescription
from .pds_ode_ws import PdsRecordsWs
from .pds_ode_ws import PdsRegistry

__all__ = [
    "PdsRegistry",
    "PdsRecordsWs",
    "PDSCatalogDescription",
    "PDSCatalogsDescription",
]
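
Note on the `__all__` list above: it determines which names a wildcard import (`from pds_crawler.extractor import *`) brings in. The following is a minimal, self-contained sketch of that behaviour. It does not import pds_crawler; the module name `demo_extractor`, the empty stand-in classes and the `_Helper` class are hypothetical and exist only for illustration.

```python
import sys
import types

# Build a throwaway in-memory module. "demo_extractor" is a hypothetical
# stand-in for a package __init__.py that re-exports a few classes.
demo = types.ModuleType("demo_extractor")
exec(
    "import os\n"
    "class PdsRegistry: pass\n"        # empty stand-in, not the real class
    "class PdsRecordsWs: pass\n"       # empty stand-in, not the real class
    "class _Helper: pass\n"            # hypothetical private helper
    "__all__ = ['PdsRegistry', 'PdsRecordsWs']\n",
    demo.__dict__,
)
sys.modules["demo_extractor"] = demo

# Run a wildcard import in a fresh namespace and see what arrives.
namespace = {}
exec("from demo_extractor import *", namespace)

# Ignore dunder entries such as __builtins__ that exec adds by itself.
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)
```

Only the names listed in `__all__` reach the importing namespace; the `os` module and `_Helper` stay behind. Explicit imports such as `from pds_crawler.extractor import PdsRegistry` are not affected by `__all__`.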