Coverage for pds_crawler/extractor/__init__.py: 100%

6 statements  

# -*- coding: utf-8 -*-
# pds-crawler - ETL to index PDS data to pdssp
# Copyright (C) 2023 - CNES (Jean-Christophe Malapert for Pôle Surfaces Planétaires)
# This file is part of pds-crawler <https://github.com/pdssp/pds_crawler>
# SPDX-License-Identifier: LGPL-3.0-or-later
"""
Package: extractor

Description:
This package contains a collection of modules to extract information from the PDS web services (WS) and the PDS website.

Exported components:

* `PdsRegistry`: A class that extracts the list of PDS3 collections.
* `PdsRecordsWs`: A class that extracts the metadata of the observations and collections by querying the ODE web services.
* `PDSCatalogDescription`: A class that provides the PDS3 catalogs for a given collection.
* `PDSCatalogsDescription`: A class that provides the PDS3 catalogs for a list of collections.

Usage:
To use this package, import the exported components and use them as follows:

.. code-block:: python

    from pds_crawler.extractor import PdsRegistry
    from pds_crawler.models import PdsRegistryModel
    from pds_crawler.load import Database
    from typing import Tuple, Dict, List

    # Create a database to store the results
    database = Database('work/database')

    # Create an instance of PdsRegistry to get the collections
    pds_registry = PdsRegistry(database)

    # Retrieve the list of all georeferenced collections
    results: Tuple[Dict[str, str], List[PdsRegistryModel]] = pds_registry.get_pds_collections()

Given a collection and one of its records, it is possible to retrieve additional metadata
describing the collection as a whole. This metadata is richer than the metadata
provided in the records of the collection.

Next, download the records. To limit the waiting time, only the first
page is downloaded:

.. code-block:: python

    from pds_crawler.extractor import PdsRecordsWs

    # Pick one collection from the registry results (here, the 61st one)
    pds_collection = results[1][60]
    pds_records_ws = PdsRecordsWs(database)
    # Download the records of this collection, limited to the first page
    pds_records_ws.download_pds_records_for_one_collection(pds_collection, 1)

Finally, retrieve the catalogs that describe the metadata of the mission, platform,
instrument and collection:

.. code-block:: python

    from pds_crawler.extractor import PDSCatalogsDescription, PDSCatalogDescription

    # Download the catalogs into the storage
    cats = PDSCatalogsDescription(database)
    cats.download([pds_collection])

    # Retrieve the catalogs from the storage
    pds_objects_cat = PDSCatalogDescription(database)
    pds_objects_cat.get_ode_catalogs(pds_collection)

"""
from .pds_ode_website import PDSCatalogDescription
from .pds_ode_website import PDSCatalogsDescription
from .pds_ode_ws import PdsRecordsWs
from .pds_ode_ws import PdsRegistry

__all__ = [
    "PdsRegistry",
    "PdsRecordsWs",
    "PDSCatalogDescription",
    "PDSCatalogsDescription",
]
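This package defines its public interface by re-exporting the four classes from `pds_ode_ws` and `pds_ode_website` and listing them in `__all__`. The standalone sketch below illustrates the standard Python behaviour this relies on: `from package import *` binds only the names listed in `__all__`. It uses a synthetic in-memory module named `fake_extractor` and placeholder classes (both hypothetical, not part of pds-crawler) so that it runs without pds-crawler installed.

```python
import sys
import types

# Build a throwaway module in memory; "fake_extractor" is a hypothetical name
# standing in for pds_crawler.extractor so the sketch runs on its own.
fake = types.ModuleType("fake_extractor")


class PdsRegistry:  # placeholder class, not the real pds-crawler implementation
    pass


class PdsRecordsWs:  # placeholder class, not the real pds-crawler implementation
    pass


def helper():  # public name deliberately left out of __all__
    return "hidden"


fake.PdsRegistry = PdsRegistry
fake.PdsRecordsWs = PdsRecordsWs
fake.helper = helper
# Only the names listed here are bound by "from fake_extractor import *"
fake.__all__ = ["PdsRegistry", "PdsRecordsWs"]

# Register the module so the import statement finds it in sys.modules
sys.modules["fake_extractor"] = fake

# Run the star import in a fresh namespace and inspect what it bound
namespace: dict = {}
exec("from fake_extractor import *", namespace)

exported = sorted(name for name in namespace if name != "__builtins__")
print(exported)  # ['PdsRecordsWs', 'PdsRegistry']
print("helper" in namespace)  # False
```

Without `__all__`, the star import would instead bind every module attribute whose name does not start with an underscore, including `helper`.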