API

PLETHarvester

class harvest_plet.plet.PLETHarvester

Bases: object

Interface to interact with the DASSH PLET database for harvesting marine biological datasets.

Provides functionality to list datasets, query specific data using spatiotemporal filters (including OSPAR COMP areas), and save results to CSV.

The default API endpoint can be overridden per instance by assigning to base_url or BASE_URL after construction.

BASE_URL: str = 'https://www.dassh.ac.uk/plet/cgi-bin/get_form.py'

DEFAULT_INSTANCE: str = 'PLET'

SITE_URL: str = 'https://www.dassh.ac.uk/lifeforms/'

property base_url: str

Return the currently configured harvest endpoint.

get_base_url() → str

Convenience method for reading the harvest endpoint.

get_dataset_names() → List[str]

Retrieve all available dataset names from the DASSH website.

Returns: A list of dataset names available for download.
Return type: List[str]

get_instance() → str | None

Return the currently active instance key, if selected.

classmethod get_instances() → Dict[str, Dict[str, str]]

Return all configured endpoint instances from endpoints.json.

get_site_url() → str

Convenience method for reading the website endpoint.

harvest_data(start_date: date, end_date: date, wkt: str, dataset_name: str, csv: bool = False, out_dir: str | None = None, name: str | None = None, retries: int = 3, backoff_factor: float = 60.0, timeout: float = 600.0) → str | None

Harvest a dataset from the DASSH API for a given time range and spatial region. Optionally write the output to a CSV file.

Parameters:

start_date – Start date of query.
end_date – End date of query.
wkt – WKT string representing the polygon region.
dataset_name – Dataset name to retrieve.
csv – If True, save as CSV file. If False, return as string.
out_dir – Directory to save the CSV file (required if csv=True).
name – Name of the CSV file (required if csv=True).
retries – Number of retry attempts on failure.
backoff_factor – Backoff multiplier for retry wait time.
timeout – Request timeout in seconds.

Returns:

CSV string if csv=False, otherwise None.

Raises:

ValueError – For invalid inputs.
RuntimeError – If request fails after all retries.
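
As a minimal sketch of how these arguments fit together, the helpers below build a WKT polygon from a bounding box and pass it to harvest_data with csv=False, so the CSV text is returned as a string. The names bbox_to_wkt and harvest_bbox are hypothetical and not part of harvest_plet. The harvester argument is assumed to be a PLETHarvester instance, the dataset name is assumed to come from get_dataset_names(), and the longitude-latitude coordinate order is an assumption about what the service expects.

```python
from datetime import date
from typing import Tuple


def bbox_to_wkt(min_lon: float, min_lat: float, max_lon: float, max_lat: float) -> str:
    """Return a closed WKT polygon (lon lat order) for a bounding box."""
    if min_lon >= max_lon or min_lat >= max_lat:
        raise ValueError("bounding box must have min < max on both axes")
    corners = [
        (min_lon, min_lat),
        (max_lon, min_lat),
        (max_lon, max_lat),
        (min_lon, max_lat),
        (min_lon, min_lat),  # repeat the first corner to close the ring
    ]
    ring = ", ".join(f"{lon} {lat}" for lon, lat in corners)
    return f"POLYGON (({ring}))"


def harvest_bbox(harvester, dataset_name: str, start: date, end: date,
                 bbox: Tuple[float, float, float, float]) -> str:
    """Call harvest_data for a bounding box and return the CSV text."""
    if start > end:
        raise ValueError("start must not be after end")
    return harvester.harvest_data(
        start_date=start,
        end_date=end,
        wkt=bbox_to_wkt(*bbox),
        dataset_name=dataset_name,
        csv=False,  # return the CSV as a string instead of writing a file
    )
```

With a real harvester, a call such as harvest_bbox(harvester, name, date(2015, 1, 1), date(2015, 12, 31), (-5.0, 49.5, -4.0, 50.5)) performs a network request and may raise the RuntimeError described above once the retries are exhausted.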

set_base_url(value: str) → None

Convenience method for updating the harvest endpoint.

set_instance(instance_name: str) → None

Switch both URLs to a named instance from endpoints.json.

set_site_url(value: str) → None

Convenience method for updating the website endpoint.

property site_url: str

Return the currently configured website endpoint.
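
The instance methods can be combined to switch endpoints defensively. The sketch below checks a name against get_instances() before calling set_instance() and then reads back base_url. The helper use_instance is hypothetical, and how set_instance() itself reacts to an unknown name is not documented here, so the explicit check is an assumption made for safety.

```python
def use_instance(harvester, instance_name: str) -> str:
    """Switch to a configured instance and return the resulting harvest endpoint."""
    available = harvester.get_instances()  # mapping of instance name -> settings
    if instance_name not in available:
        known = ", ".join(sorted(available)) or "none"
        raise ValueError(f"unknown instance {instance_name!r}; configured: {known}")
    harvester.set_instance(instance_name)
    return harvester.base_url
```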

OSPARRegions

class harvest_plet.ospar_comp.OSPARRegions

Bases: object

A class for fetching and handling OSPAR WFS component data.

Description of data: https://odims.ospar.org/en/submissions/ospar_comp_au_2023_01/

JSON URL: https://odims.ospar.org/geoserver/odims/wfs?service=WFS&version=2.0.0&request=GetFeature&typeName=ospar_comp_au_2023_01_001&outputFormat=application/json

get_all_ids() → List[str]

Get a list of all feature IDs in the dataset.

Returns: A list of feature IDs.
Return type: List[str]

get_wkt(id: str, simplify: bool = False) → str | None

Retrieve the WKT (Well-Known Text) geometry string for a given feature ID.

Optionally simplifies the geometry to reduce its size while preserving topology. Tiny polygons are removed and coordinates are rounded to 0.01 precision.

Parameters:

id (str) – The unique identifier of the feature.
simplify (bool) – Whether to simplify the geometry before returning.

Returns:

The WKT string of the geometry, or None if not found.

Return type:

Optional[str]
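
Because get_wkt() can return None, callers typically need to filter its results. The following sketch gathers a simplified WKT string for every feature ID and drops any ID without a geometry. The helper collect_wkts is hypothetical, and the regions argument is assumed to be an OSPARRegions instance.

```python
from typing import Dict


def collect_wkts(regions, simplify: bool = True) -> Dict[str, str]:
    """Map each feature ID to its WKT string, skipping IDs with no geometry."""
    wkts: Dict[str, str] = {}
    for feature_id in regions.get_all_ids():
        wkt = regions.get_wkt(feature_id, simplify=simplify)
        if wkt is not None:  # get_wkt returns None when the ID is not found
            wkts[feature_id] = wkt
    return wkts
```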

plot_map(id: str | None = None, show: bool = True, output_dir: str | None = None) → None

Plot the geometry of a specific feature ID or all features on a static map.

If an ID is provided, only that feature is plotted. Otherwise, all features in the dataset are plotted.

Parameters:

id (Optional[str]) – Feature ID to plot. If None, plots all features.
show (bool) – Whether to display the plot interactively.
output_dir (Optional[str]) – Directory to save the plot image. If None, plot is not saved.

Returns:

None

Return type:

None
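
For batch use, for example on a machine without a display, plot_map() can be called with show=False and an output directory. The sketch below does this for a list of feature IDs. The helper save_region_maps is hypothetical, and creating the directory up front is a precaution, since it is not stated whether plot_map() creates it.

```python
import os
from typing import Iterable


def save_region_maps(regions, feature_ids: Iterable[str], output_dir: str) -> None:
    """Save one map image per feature ID without opening a plot window."""
    os.makedirs(output_dir, exist_ok=True)  # assumption: plot_map may not create it
    for feature_id in feature_ids:
        regions.plot_map(id=feature_id, show=False, output_dir=output_dir)
```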

harvest_for_assessment

harvest_plet.harvest_for_assessment.harvest_for_assessment(start_date: date, end_date: date, out_dir: str | None = None, overwrite: bool = False, logs_dir: str | None = None, plet_harvester: PLETHarvester | None = None) → DataFrame

Harvest datasets for all OSPAR regions within a given date range, with caching and optional logging.

Data for each dataset and region combination is retrieved and stored in a local cache directory. If a cached file for a combination already exists and overwrite is False, that combination is not harvested again. Logs are written to a timestamped log file. The combined data is returned as a pandas DataFrame.

Parameters:

start_date (date) – Start date of the data harvest (inclusive).
end_date (date) – End date of the data harvest (inclusive).
out_dir (Optional[str]) – Directory to store cached CSV files. Defaults to ‘.cache’ if None.
overwrite (bool) – Whether to overwrite existing cached files. Defaults to False.
logs_dir (Optional[str]) – Directory to store log files. Defaults to a ‘logs’ folder in the package if None.
plet_harvester (Optional[PLETHarvester]) – Optional preconfigured PLET harvester instance. If None, a new default PLETHarvester instance is created.

Returns: Merged dataframe with dataset_name and region_id as first columns.
Return type: pd.DataFrame
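
The caching behaviour described for this function can be illustrated with a simplified loop over dataset and region combinations. This is a sketch under stated assumptions and not the package's implementation: the helper harvest_with_cache and the file naming scheme are hypothetical, logging and the final merge into a DataFrame are left out, and harvester and regions are assumed to be PLETHarvester and OSPARRegions instances.

```python
import re
from datetime import date
from pathlib import Path
from typing import List


def harvest_with_cache(harvester, regions, start: date, end: date,
                       out_dir: str = ".cache", overwrite: bool = False) -> List[Path]:
    """Fetch each dataset/region pair at most once; return the cached CSV paths."""
    cache = Path(out_dir)
    cache.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    for dataset_name in harvester.get_dataset_names():
        for region_id in regions.get_all_ids():
            # Hypothetical file naming; characters unsafe in file names become "_".
            stem = re.sub(r"[^A-Za-z0-9._-]+", "_", f"{dataset_name}__{region_id}")
            path = cache / f"{stem}.csv"
            if path.exists() and not overwrite:
                paths.append(path)  # cached: skip the request
                continue
            wkt = regions.get_wkt(region_id, simplify=True)
            if wkt is None:
                continue  # no geometry for this region, nothing to query
            text = harvester.harvest_data(
                start_date=start, end_date=end, wkt=wkt,
                dataset_name=dataset_name, csv=False,
            )
            path.write_text(text, encoding="utf-8")
            paths.append(path)
    return paths
```

Running the helper twice with the same out_dir and overwrite=False issues requests only on the first run; the second run returns the existing paths without calling harvest_data.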