Coverage for pds_crawler/extractor/__init_

# -*- coding: utf-8 -*-
# pds-crawler - ETL to index PDS data to pdssp
# This file is part of pds-crawler <https://github.com/pdssp/pds_crawler>
# SPDX-License-Identifier: LGPL-3.0-or-later
"""
Package: extractor

Description:
This package contains a collection of modules to extract information from the PDS web services (WS) and the PDS web site.

Exported components:

* `PdsRegistry`: A module that provides classes to extract the list of PDS3 collections.
* `PdsRecordsWs`: A module that provides classes to extract metadata of the observations and collections by querying ODE web services.
* `PDSCatalogDescription`: A module that provides the PDS3 catalogs for a given collection.
* `PDSCatalogsDescription`: A module that provides the PDS3 catalogs for several collections.
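
These exported names are the ones the package lists in `__all__`, which in Python controls what a star import brings in. The following is a small, self-contained sketch of that general Python behaviour. It builds a throwaway in-memory module called `demo_pkg` (a hypothetical name, unrelated to pds-crawler) instead of importing the real package.

```python
import sys
import types

# Build a throwaway module in memory; "demo_pkg" is a hypothetical name.
demo = types.ModuleType("demo_pkg")
exec(
    "__all__ = ['public_name']\n"
    "public_name = 1\n"
    "other_name = 2\n",
    demo.__dict__,
)
sys.modules["demo_pkg"] = demo

# Star-import into a fresh namespace and collect the names that arrive.
namespace = {}
exec("from demo_pkg import *", namespace)
imported = sorted(name for name in namespace if not name.startswith("__"))
print(imported)
```

Only `public_name` reaches the importing namespace; `other_name` stays reachable as `demo.other_name` but is not part of the star import.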

Usage:
To use this package, you can import and use the exported components as follows:

.. code-block:: python

    from pds_crawler.extractor import PdsRegistry
    from pds_crawler.models import PdsRegistryModel
    from pds_crawler.load import Database
    from typing import Tuple, Dict, List

    # Create a database to store the results
    database = Database('work/database')

    # Create an instance of PdsRegistry to get the collections
    pds_registry = PdsRegistry(database)

    # Retrieve the list of all georeferenced collections
    results: Tuple[Dict[str, str], List[PdsRegistryModel]] = pds_registry.get_pds_collections()
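
The annotation on `results` says that `get_pds_collections()` returns a pair: a dictionary of strings and a list of `PdsRegistryModel` objects. Indexing that list with a fixed position raises a bare `IndexError` when the list is shorter than expected. As a minimal sketch, the helper below unpacks the pair and picks one collection with a clearer error message. `pick_collection` and `FakeRegistryModel` are hypothetical names, and the fields and data are made up; the real `PdsRegistryModel` attributes may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FakeRegistryModel:
    # Hypothetical stand-in for PdsRegistryModel; the field names are assumptions.
    ode_id: str
    title: str


def pick_collection(
    results: Tuple[Dict[str, str], List[FakeRegistryModel]],
    index: int,
) -> FakeRegistryModel:
    # Unpack the pair; only the list of collections is needed here.
    _, collections = results
    if not 0 <= index < len(collections):
        raise IndexError(
            f"collection index {index} out of range "
            f"({len(collections)} collections available)"
        )
    return collections[index]


# Made-up data with the same shape as the annotated return value.
fake_results = (
    {"count": "2"},
    [FakeRegistryModel("a", "First"), FakeRegistryModel("b", "Second")],
)
chosen = pick_collection(fake_results, 1)
```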

Given a collection and one of its records, it is possible to retrieve additional metadata
describing the collection as a whole. This metadata is richer than the metadata
provided in the records of the collection.

Now, we need to download the records. To limit the waiting time, only the first
page is downloaded:

.. code-block:: python

    from pds_crawler.extractor import PdsRegistry, PdsRecordsWs

    # results[1] is the list of collections; take one of them
    pds_collection = results[1][60]

    # Download the records of this collection (first page only)
    pds_records_ws = PdsRecordsWs(database)
    pds_records_ws.download_pds_records_for_one_collection(pds_collection, 1)
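
The idea of capping the download at the first page can be pictured with a generic pagination loop. This is only a sketch of capped pagination in general, not the pds-crawler implementation: `iter_pages`, `fetch_page` and `fake_fetch_page` are hypothetical names, and it assumes that the second argument of `download_pds_records_for_one_collection` acts as a page limit.

```python
from typing import Callable, Iterator, List


def iter_pages(
    fetch_page: Callable[[int], List[dict]],
    max_pages: int,
) -> Iterator[List[dict]]:
    # Request pages 1, 2, ... and stop at max_pages or at the first empty page.
    for page_number in range(1, max_pages + 1):
        records = fetch_page(page_number)
        if not records:
            break
        yield records


def fake_fetch_page(page_number: int) -> List[dict]:
    # Hypothetical stand-in for a web-service call: three pages of two records.
    if page_number > 3:
        return []
    return [{"page": page_number, "item": i} for i in range(2)]


first_page_only = list(iter_pages(fake_fetch_page, max_pages=1))
all_pages = list(iter_pages(fake_fetch_page, max_pages=10))
```

With `max_pages=1` the loop issues a single request, which is what keeps the waiting time short; a larger cap keeps going until the service returns an empty page.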

Now, we can retrieve the catalogs that describe the metadata for the mission, platform,
instrument and collection:

.. code-block:: python

    from pds_crawler.extractor import PDSCatalogsDescription, PDSCatalogDescription

    # download the catalogs in the storage
    cats = PDSCatalogsDescription(database)
    cats.download([pds_collection])

    # Retrieve the catalogs from the storage
    pds_objects_cat = PDSCatalogDescription(database)
    pds_objects_cat.get_ode_catalogs(pds_collection)

"""
from .pds_ode_website import PDSCatalogDescription
from .pds_ode_website import PDSCatalogsDescription
from .pds_ode_ws import PdsRecordsWs
from .pds_ode_ws import PdsRegistry

__all__ = [
    "PdsRegistry",
    "PdsRecordsWs",
    "PDSCatalogDescription",
    "PDSCatalogsDescription",
]

Coverage for pds_crawler/extractor/__init__.py: 100%

6 statements