API
PLETHarvester
- class harvest_plet.plet.PLETHarvester
Bases:
object
Interface to interact with the DASSH PLET database for harvesting marine biological datasets.
Provides functionality to list datasets, query specific data using spatiotemporal filters (including OSPAR COMP areas), and save results to CSV.
The default API endpoint can be overridden per instance by assigning to base_url or BASE_URL after construction.
- get_dataset_names() List[str]
Retrieve all available dataset names from the DASSH website.
- Returns:
A list of dataset names available for download.
- Return type:
List[str]
- classmethod get_instances() Dict[str, Dict[str, str]]
Return all configured endpoint instances from endpoints.json.
- harvest_data(start_date: date, end_date: date, wkt: str, dataset_name: str, csv: bool = False, out_dir: str | None = None, name: str | None = None, retries: int = 3, backoff_factor: float = 60.0, timeout: float = 600.0) str | None
Harvest a dataset from the DASSH API for a given time range and spatial region. Optionally write the output to a CSV file.
- Parameters:
start_date – Start date of query.
end_date – End date of query.
wkt – WKT string representing the polygon region.
dataset_name – Dataset name to retrieve.
csv – If True, save as CSV file. If False, return as string.
out_dir – Directory to save the CSV file (required if csv=True).
name – Name of the CSV file (required if csv=True).
retries – Number of retry attempts on failure.
backoff_factor – Backoff multiplier for retry wait time.
timeout – Request timeout in seconds.
- Returns:
CSV string if csv=False, otherwise None.
- Raises:
ValueError – For invalid inputs.
RuntimeError – If request fails after all retries.
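As an illustration of the csv=False path, the sketch below calls harvest_data with the documented keyword arguments and parses the returned CSV string into rows. It is a minimal sketch, not the package's own code: harvest_rows and StubHarvester are hypothetical names, the stub replaces a real PLETHarvester so the example runs without network access, and the column names, dataset name, and polygon are invented.

```python
import csv
import io
from datetime import date


def harvest_rows(harvester, dataset_name, wkt, start_date, end_date):
    """Call harvest_data with csv=False and parse the returned CSV string."""
    text = harvester.harvest_data(
        start_date=start_date,
        end_date=end_date,
        wkt=wkt,
        dataset_name=dataset_name,
        csv=False,  # documented behaviour: return the CSV as a string
    )
    if not text:
        return []
    return list(csv.DictReader(io.StringIO(text)))


class StubHarvester:
    """Hypothetical stand-in for PLETHarvester so the sketch runs offline."""

    def harvest_data(self, start_date, end_date, wkt, dataset_name,
                     csv=False, **kwargs):
        # Column names and values are invented for illustration only.
        return "taxon,count\nA,3\nB,5\n"


rows = harvest_rows(
    StubHarvester(),
    dataset_name="example dataset",
    wkt="POLYGON ((0 50, 2 50, 2 52, 0 52, 0 50))",
    start_date=date(2020, 1, 1),
    end_date=date(2020, 12, 31),
)
print(rows[0]["taxon"], rows[1]["count"])
```

With a real PLETHarvester instance passed in place of the stub, the same helper should work unchanged, provided the returned string is comma-separated with a header row, which this sketch assumes.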
OSPARRegions
- class harvest_plet.ospar_comp.OSPARRegions
Bases:
object
A class for fetching and handling OSPAR WFS component data.
Description of data: https://odims.ospar.org/en/submissions/ospar_comp_au_2023_01/
JSON URL: https://odims.ospar.org/geoserver/odims/wfs?service=WFS&version=2.0.0&request=GetFeature&typeName=ospar_comp_au_2023_01_001&outputFormat=application/json
- get_all_ids() List[str]
Get a list of all feature IDs in the dataset.
- Returns:
A list of feature IDs.
- Return type:
List[str]
- get_wkt(id: str, simplify: bool = False) str | None
Retrieve the WKT (Well-Known Text) geometry string for a given feature ID.
Optionally simplifies the geometry to reduce its size while preserving topology. Tiny polygons are removed and coordinates are rounded to 0.01 precision.
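The two methods above can be combined to build a lookup from feature ID to WKT string. The following is a minimal sketch under stated assumptions: collect_wkts and StubRegions are hypothetical names, and the stub stands in for OSPARRegions with invented IDs and geometries so the example runs offline. Only the documented signatures of get_all_ids and get_wkt are relied on.

```python
def collect_wkts(regions, simplify=True):
    """Map each feature ID to its WKT string, skipping IDs with no geometry."""
    wkts = {}
    for feature_id in regions.get_all_ids():
        wkt = regions.get_wkt(feature_id, simplify=simplify)
        if wkt is not None:  # get_wkt is documented as returning str | None
            wkts[feature_id] = wkt
    return wkts


class StubRegions:
    """Hypothetical stand-in for OSPARRegions; IDs and shapes are invented."""

    _shapes = {"R1": "POLYGON ((0 0, 1 0, 1 1, 0 0))", "R2": None}

    def get_all_ids(self):
        return list(self._shapes)

    def get_wkt(self, id, simplify=False):
        return self._shapes.get(id)


wkts = collect_wkts(StubRegions())
print(sorted(wkts))
```

Each resulting WKT string could then be passed as the wkt argument of PLETHarvester.harvest_data.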
harvest_for_assessment
- harvest_plet.harvest_for_assessment.harvest_for_assessment(start_date: date, end_date: date, out_dir: str | None = None, overwrite: bool = False, logs_dir: str | None = None, plet_harvester: PLETHarvester | None = None) DataFrame
Harvest datasets for all OSPAR regions within a given date range, with caching and optional logging.
Data for each dataset and region combination is retrieved and stored in a local cache directory. If the file already exists and overwrite is False, that combination is skipped. Logs are written to a timestamped log file. The combined data is returned as a pandas DataFrame.
- Parameters:
start_date (date) – Start date of the data harvest (inclusive).
end_date (date) – End date of the data harvest (inclusive).
out_dir (Optional[str]) – Directory to store cached CSV files. Defaults to ‘.cache’ if None.
overwrite (bool) – Whether to overwrite existing cached files. Defaults to False.
logs_dir (Optional[str]) – Directory to store log files. Defaults to a ‘logs’ folder in the package if None.
plet_harvester (Optional[PLETHarvester]) – Optional preconfigured PLET harvester instance. If None, a new default PLETHarvester instance is created.
- Returns:
Merged dataframe with dataset_name and region_id as first columns.
- Return type:
pd.DataFrame
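The caching behaviour described for harvest_for_assessment (skip a dataset and region combination when its file exists and overwrite is False) might look roughly like the sketch below. This is an illustration only, not the package's implementation: harvest_with_cache and fake_fetch are hypothetical names, and the cache file naming scheme is an assumption.

```python
import os
import tempfile


def harvest_with_cache(pairs, fetch, out_dir, overwrite=False):
    """Fetch each (dataset_name, region_id) pair unless a cached CSV exists.

    Returns the list of pairs that were actually fetched.
    """
    os.makedirs(out_dir, exist_ok=True)
    fetched = []
    for dataset_name, region_id in pairs:
        # File naming is an assumption made for this sketch.
        path = os.path.join(out_dir, f"{dataset_name}_{region_id}.csv")
        if os.path.exists(path) and not overwrite:
            continue  # cached file present: skip this combination
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(fetch(dataset_name, region_id))
        fetched.append((dataset_name, region_id))
    return fetched


def fake_fetch(dataset_name, region_id):
    """Hypothetical fetcher standing in for a real harvest call."""
    return "value\n1\n"


pairs = [("ds", "R1"), ("ds", "R2")]
with tempfile.TemporaryDirectory() as cache_dir:
    first = harvest_with_cache(pairs, fake_fetch, cache_dir)
    second = harvest_with_cache(pairs, fake_fetch, cache_dir)
    forced = harvest_with_cache(pairs, fake_fetch, cache_dir, overwrite=True)
print(len(first), len(second), len(forced))
```

The first call fetches every pair, the second finds the cached files and fetches nothing, and the third fetches everything again because overwrite is True.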