API

PLETHarvester

class harvest_plet.plet.PLETHarvester

Bases: object

Interface to interact with the DASSH PLET database for harvesting marine biological datasets.

Provides functionality to list datasets, query specific data using spatiotemporal filters (including OSPAR COMP areas), and save results to CSV.

The default API endpoint can be overridden per instance by assigning to base_url or BASE_URL after construction.
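As a minimal sketch of the per-instance override described above, the helper below assigns to `base_url` on an already constructed harvester and returns the previous value so it can be restored later. The helper name `use_endpoint` and the example URL in the usage note are hypothetical; only the `base_url` attribute comes from this reference.

```python
def use_endpoint(harvester, url):
    """Point one harvester instance at a different harvest endpoint.

    `harvester` is assumed to be a PLETHarvester-like object whose
    `base_url` can be read and assigned, as stated in the class notes.
    Returns the endpoint that was configured before the change.
    """
    previous = harvester.base_url  # remember the old endpoint
    harvester.base_url = url       # per-instance override
    return previous
```

With the real package this might be called as `use_endpoint(PLETHarvester(), "https://example.org/plet/cgi-bin/get_form.py")`, where the URL is a placeholder rather than a known mirror.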

BASE_URL: str = 'https://www.dassh.ac.uk/plet/cgi-bin/get_form.py'
DEFAULT_INSTANCE: str = 'PLET'
SITE_URL: str = 'https://www.dassh.ac.uk/lifeforms/'
property base_url: str

Return the currently configured harvest endpoint.

get_base_url() str

Convenience method for reading the harvest endpoint.

get_dataset_names() List[str]

Retrieve all available dataset names from the DASSH website.

Returns:

A list of dataset names available for download.

Return type:

List[str]

get_instance() str | None

Return the currently active instance key, if selected.

classmethod get_instances() Dict[str, Dict[str, str]]

Return all configured endpoint instances from endpoints.json.

get_site_url() str

Convenience method for reading the website endpoint.

harvest_data(start_date: date, end_date: date, wkt: str, dataset_name: str, csv: bool = False, out_dir: str | None = None, name: str | None = None, retries: int = 3, backoff_factor: float = 60.0, timeout: float = 600.0) str | None

Harvest a dataset from the DASSH API for a given time range and spatial region. Optionally write the output to a CSV file.

Parameters:
  • start_date – Start date of query.

  • end_date – End date of query.

  • wkt – WKT string representing the polygon region.

  • dataset_name – Dataset name to retrieve.

  • csv – If True, save as CSV file. If False, return as string.

  • out_dir – Directory to save the CSV file (required if csv=True).

  • name – Name of the CSV file (required if csv=True).

  • retries – Number of retry attempts on failure.

  • backoff_factor – Backoff multiplier for retry wait time.

  • timeout – Request timeout in seconds.

Returns:

CSV string if csv=False, otherwise None.
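The parameters above can be combined as in the following sketch, which builds a rectangular WKT polygon and passes it to `harvest_data`. The helpers `bbox_to_wkt` and `harvest_bbox` are hypothetical names introduced for illustration. The sketch also assumes the endpoint accepts coordinates in longitude/latitude order, which this reference does not state.

```python
from datetime import date


def bbox_to_wkt(min_lon, min_lat, max_lon, max_lat):
    """Return a closed WKT polygon for a bounding box (x y = lon lat order, assumed)."""
    corners = [
        (min_lon, min_lat),
        (max_lon, min_lat),
        (max_lon, max_lat),
        (min_lon, max_lat),
        (min_lon, min_lat),  # repeat the first corner to close the ring
    ]
    ring = ", ".join(f"{lon} {lat}" for lon, lat in corners)
    return f"POLYGON (({ring}))"


def harvest_bbox(harvester, dataset_name, start, end, bbox, out_dir=None, name=None):
    """Call harvest_data for a bounding box.

    Writes a CSV only when both out_dir and name are given, since the
    reference marks both as required when csv=True; otherwise the CSV
    text is returned as a string.
    """
    save = out_dir is not None and name is not None
    return harvester.harvest_data(
        start_date=start,
        end_date=end,
        wkt=bbox_to_wkt(*bbox),
        dataset_name=dataset_name,
        csv=save,
        out_dir=out_dir,
        name=name,
    )
```

A call such as `harvest_bbox(PLETHarvester(), some_name, date(2020, 1, 1), date(2020, 12, 31), (-5, 50, -4, 51))` would then return the CSV text, with `some_name` taken from `get_dataset_names()`.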

Raises:
set_base_url(value: str) None

Convenience method for updating the harvest endpoint.

set_instance(instance_name: str) None

Switch both URLs to a named instance from endpoints.json.
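This reference does not say how `set_instance` behaves for an unknown name, so a cautious caller may want to check against `get_instances()` first. The sketch below does that; `switch_instance` is a hypothetical helper, and raising `KeyError` is this example's choice, not documented library behaviour.

```python
def switch_instance(harvester, instance_name):
    """Switch to a named instance only if it is listed in the configured instances.

    Uses get_instances(), set_instance() and get_instance() as documented
    above. Returns the instance key reported as active after the switch.
    """
    instances = harvester.get_instances()
    if instance_name not in instances:
        available = sorted(instances)
        raise KeyError(f"unknown instance {instance_name!r}; available: {available}")
    harvester.set_instance(instance_name)
    return harvester.get_instance()
```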

set_site_url(value: str) None

Convenience method for updating the website endpoint.

property site_url: str

Return the currently configured website endpoint.

OSPARRegions

class harvest_plet.ospar_comp.OSPARRegions

Bases: object

A class for fetching and handling OSPAR WFS component data.

Description of data: https://odims.ospar.org/en/submissions/ospar_comp_au_2023_01/

JSON URL: https://odims.ospar.org/geoserver/odims/wfs?service=WFS&version=2.0.0&request=GetFeature&typeName=ospar_comp_au_2023_01_001&outputFormat=application/json

get_all_ids() List[str]

Get a list of all feature IDs in the dataset.

Returns:

A list of feature IDs.

Return type:

List[str]

get_wkt(id: str, simplify: bool = False) str | None

Retrieve the WKT (Well-Known Text) geometry string for a given feature ID.

Optionally simplifies the geometry to reduce its size while preserving topology. Tiny polygons are removed and coordinates are rounded to 0.01 precision.

Parameters:
  • id (str) – The unique identifier of the feature.

  • simplify (bool) – Whether to simplify the geometry before returning.

Returns:

The WKT string of the geometry, or None if not found.

Return type:

Optional[str]
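Because `get_wkt` may return None, code that iterates over every feature usually needs to skip missing geometries. The following sketch collects the available WKT strings into a dictionary; `collect_wkts` is a hypothetical helper built only on `get_all_ids` and `get_wkt` as documented above.

```python
def collect_wkts(regions, simplify=True):
    """Map each feature ID to its WKT string, skipping IDs with no geometry.

    `regions` is assumed to be an OSPARRegions-like object.
    """
    wkts = {}
    for feature_id in regions.get_all_ids():
        wkt = regions.get_wkt(feature_id, simplify=simplify)
        if wkt is not None:  # get_wkt returns None when the ID is not found
            wkts[feature_id] = wkt
    return wkts
```

Each value in the resulting dictionary can be passed as the `wkt` argument of `PLETHarvester.harvest_data`.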

plot_map(id: str | None = None, show: bool = True, output_dir: str | None = None) None

Plot the geometry of a specific feature ID or all features on a static map.

If an ID is provided, only that feature is plotted. Otherwise, all features in the dataset are plotted.

Parameters:
  • id (Optional[str]) – Feature ID to plot. If None, plots all features.

  • show (bool) – Whether to display the plot interactively.

  • output_dir (Optional[str]) – Directory to save the plot image. If None, plot is not saved.

Returns:

None

Return type:

None
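For batch use, the `show` and `output_dir` parameters can be combined to save maps without opening a window. The sketch below is one way to do that; `save_region_maps` is a hypothetical helper, and creating the directory beforehand is a precaution, since this reference does not say whether `plot_map` creates it.

```python
import os


def save_region_maps(regions, output_dir, ids=None):
    """Save one static map per feature ID without displaying it.

    If `ids` is None, every ID from get_all_ids() is plotted.
    Returns the list of IDs that were requested.
    """
    os.makedirs(output_dir, exist_ok=True)  # assumption: plot_map may not create it
    targets = list(ids) if ids is not None else regions.get_all_ids()
    for feature_id in targets:
        regions.plot_map(id=feature_id, show=False, output_dir=output_dir)
    return targets
```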

harvest_for_assessment

harvest_plet.harvest_for_assessment.harvest_for_assessment(start_date: date, end_date: date, out_dir: str | None = None, overwrite: bool = False, logs_dir: str | None = None, plet_harvester: PLETHarvester | None = None) DataFrame

Harvest datasets for all OSPAR regions within a given date range, with caching and optional logging.

Data for each dataset and region combination is retrieved and stored in a local cache directory. If a cached file already exists and overwrite is False, that combination is not harvested again. Logs are written to a timestamped log file. The combined data is returned as a pandas DataFrame.

Parameters:
  • start_date (date) – Start date of the data harvest (inclusive).

  • end_date (date) – End date of the data harvest (inclusive).

  • out_dir (Optional[str]) – Directory to store cached CSV files. Defaults to ‘.cache’ if None.

  • overwrite (bool) – Whether to overwrite existing cached files. Defaults to False.

  • logs_dir (Optional[str]) – Directory to store log files. Defaults to a ‘logs’ folder in the package if None.

  • plet_harvester (Optional[PLETHarvester]) – Optional preconfigured PLET harvester instance. If None, a new default PLETHarvester instance is created.

Returns:

Merged dataframe with dataset_name and region_id as first columns.

Return type:

pd.DataFrame
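A typical call covers a whole calendar year and reuses the cache on later runs. The sketch below wraps that pattern; `harvest_year` is a hypothetical helper, and the harvesting function is passed in as `harvest_fn` so the wrapper can be exercised without network access. In real use `harvest_fn` would be `harvest_plet.harvest_for_assessment.harvest_for_assessment`.

```python
from datetime import date


def harvest_year(harvest_fn, year, out_dir=".cache", harvester=None):
    """Harvest all OSPAR regions for one calendar year, keeping cached files.

    Both date bounds are inclusive, per the parameter descriptions above.
    `harvester` may be a preconfigured PLETHarvester, or None for the default.
    """
    return harvest_fn(
        start_date=date(year, 1, 1),
        end_date=date(year, 12, 31),
        out_dir=out_dir,
        overwrite=False,  # keep existing cached files
        plet_harvester=harvester,
    )
```

According to the return description, the resulting DataFrame has dataset_name and region_id as its first columns, so a grouping such as `df.groupby(["dataset_name", "region_id"])` is a natural next step.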