ocean_data_gateway.gateway.Gateway

class ocean_data_gateway.gateway.Gateway(*args, **kwargs)

Bases: ocean_data_gateway.utils.Reader

Wraps together the individual readers in order to have a single way to search.

kwargs_all

Input keyword arguments that are not specific to one of the readers. These may include “approach”, “parallel”, “kw” containing the time and space region to search for, etc.

Type

dict

kwargs

Keyword arguments that contain specific arguments for the readers.

Type

dict

Attributes
data

Return the data, given metadata.

dataset_ids

Find dataset_ids for each source/reader.

meta

Find and return metadata for datasets.

sources

Set up data sources (readers).

Methods

clear()

data_by_dataset(dataset_id)

Return the data for a single dataset_id.

get(k[,d])

items()

keys()

Regular dict-like way to return keys.

pop(k[,d])

If key is not found, d is returned if given, otherwise KeyError is raised.

popitem()

Remove and return some (key, value) pair as a 2-tuple; raise KeyError if D is empty.

qc([dataset_ids, verbose, skip_units])

Light quality check on data.

setdefault(k[,d])

update([E, ]**F)

If E is present and has a .keys() method, does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, does: for (k, v) in E: D[k] = v. In either case, this is followed by: for k, v in F.items(): D[k] = v.

values()

Regular dict-like way to return values.

__init__(*args, **kwargs)
Parameters
  • kw (dict) – Contains space and time search constraints: min_lon, max_lon, min_lat, max_lat, min_time, max_time.

  • approach (string) – Search approach, either 'stations' or 'region'.

  • parallel (boolean, optional) – If True, run with simple parallelization using multiprocessing. If False, run serially. True by default. If input in this manner, the same value is used for all readers. If input by individual reader dictionary, the value can vary by reader.

  • readers (ocean_data_gateway Reader, list of readers, optional) – Use this to select a subset of the available readers. For example, readers=odg.erddap or, to include all readers explicitly by name, readers=[odg.ErddapReader, odg.axdsReader, odg.localReader].

  • erddap (dict, optional) – Dictionary of reader specifications. For example, erddap={'known_server': 'ioos'}. See odg.erddap.ErddapReader for more input options.

  • axds (dict, optional) – Dictionary of reader specifications. For example, axds={'axds_type': 'platform2'}. See odg.axds.AxdsReader for more input options.

  • local (dict, optional) – Dictionary of reader specifications. For example, local={'filenames': filenames} for a list of filenames. See odg.local.LocalReader for more input options.

  • criteria (dict, str, optional) – A dictionary describing how to recognize variables by their name and attributes, using regular expressions, for use with cf-xarray. It can be local or a URL pointing to a nonlocal gist. This is required for running QC in Gateway. For example:

    >>> my_custom_criteria = {"salt": {
    ...     "standard_name": "sea_water_salinity$|sea_water_practical_salinity$",
    ...     "name": "(?i)sal$|(?i)s.sea_water_practical_salinity$"}}

  • var_def (dict, optional) – A dictionary with the same keys as criteria (criteria can have more) that describes QC definitions and units. It should include the variable units, fail_span, and suspect_span. For example:

    >>> var_def = {"salt": {"units": "psu",
    ...     "fail_span": [-10, 60], "suspect_span": [-1, 45]}}
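As a rough illustration of the inputs described above, the following sketch assembles kw, criteria, and var_def as plain dictionaries and checks that every var_def key also appears in criteria. The coordinate and time values are made up, the helper check_var_def is hypothetical and not part of ocean_data_gateway, and the Gateway call is shown only as a comment because it is an assumed usage that needs the package installed and network access.

```python
import re

# Space and time search constraints (values are illustrative only).
kw = {
    "min_lon": -124.0,
    "max_lon": -123.0,
    "min_lat": 39.0,
    "max_lat": 40.0,
    "min_time": "2021-4-1",
    "max_time": "2021-4-2",
}

# Criteria: regular expressions used to recognize a variable nickname.
criteria = {
    "salt": {
        "standard_name": "sea_water_salinity$|sea_water_practical_salinity$",
    }
}

# QC definitions: keys must also be present in criteria.
var_def = {
    "salt": {"units": "psu", "fail_span": [-10, 60], "suspect_span": [-1, 45]}
}


def check_var_def(criteria, var_def):
    """Hypothetical helper: return var_def keys that are missing from criteria."""
    return sorted(set(var_def) - set(criteria))


missing = check_var_def(criteria, var_def)

# The standard_name pattern matches either salinity standard name.
pattern = criteria["salt"]["standard_name"]
matches = bool(re.search(pattern, "sea_water_practical_salinity"))

# Assumed usage, with ocean_data_gateway installed:
# import ocean_data_gateway as odg
# data = odg.Gateway(kw=kw, approach="region", criteria=criteria, var_def=var_def)
```

Checking the keys up front is a cheap way to catch a var_def entry that has no matching criteria before any search is run.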

Notes

To select search variables, input the variable names to each reader individually in the format erddap={'variables': [list of variables]}. Make sure that the variable names are correct for each individual reader. Check individual reader docs for more information.

Alternatively, supply criteria and then pass as variables the nicknames that criteria defines for the variable names. In that case, input variables as a general keyword argument, not to an individual reader.

Input keyword arguments that are not specific to one of the readers will be collected in local dictionary kwargs_all. These may include “approach”, “parallel”, “kw” containing the time and space region to search for, etc.

Input keyword arguments that are specific to readers will be collected in local dictionary kwargs.
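The split between kwargs_all and kwargs can be pictured with the sketch below. It is not the library's implementation: split_kwargs is a hypothetical helper, the reader keys are taken from the parameter list, and the variable name "salinity" is made up.

```python
# Reader-specific keyword names, taken from the parameter list.
READER_KEYS = ("erddap", "axds", "local")


def split_kwargs(**kwargs):
    """Hypothetical helper mirroring the documented split of keyword arguments.

    Returns (kwargs_all, kwargs_readers): general arguments and per-reader ones.
    """
    kwargs_readers = {k: v for k, v in kwargs.items() if k in READER_KEYS}
    kwargs_all = {k: v for k, v in kwargs.items() if k not in READER_KEYS}
    return kwargs_all, kwargs_readers


kwargs_all, kwargs_readers = split_kwargs(
    approach="region",
    parallel=False,
    kw={"min_lon": -124.0, "max_lon": -123.0},
    erddap={"known_server": "ioos", "variables": ["salinity"]},  # made-up name
    axds={"axds_type": "platform2"},
)
```

Here approach, parallel, and kw end up in the general dictionary, while the erddap and axds dictionaries stay grouped by reader.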

_abc_impl = <_abc_data object>
clear() → None. Remove all items from D.
property data

Return the data, given metadata.

THIS IS NOW OUTDATED.

Notes

This is either done in parallel with the multiprocessing library or in serial.

data_by_dataset(dataset_id)

Return the data for a single dataset_id.

All available sources are checked (in order) for the dataset. Once a dataset matching dataset_id is found, it is returned.

Returns

An xarray Dataset.

Notes

Data is read into memory.
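The in-order search over sources can be sketched as a first-match lookup. This is only an illustration under stated assumptions: plain dictionaries stand in for readers, strings stand in for xarray Datasets, find_dataset is a hypothetical name, and raising KeyError when nothing matches is a choice made here, not documented behavior.

```python
def find_dataset(sources, dataset_id):
    """Return the first dataset matching dataset_id, checking sources in order.

    `sources` is a list of dict-like objects mapping dataset_id to data; this
    stands in for the readers and is an assumption made for illustration.
    """
    for source in sources:
        if dataset_id in source:
            return source[dataset_id]
    # Assumption: signal a missing dataset with KeyError.
    raise KeyError(dataset_id)


# Plain dicts stand in for readers; strings stand in for xarray Datasets.
sources = [
    {"station_a": "data from first source"},
    {"station_a": "data from second source", "station_b": "other data"},
]

first_hit = find_dataset(sources, "station_a")
second_hit = find_dataset(sources, "station_b")
```

Because the loop stops at the first match, a dataset_id that exists in more than one source is taken from the earliest one.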

property dataset_ids

Find dataset_ids for each source/reader.

Returns

A list with one entry per source/reader, where each entry is itself a list of dataset_ids.
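Since the result is a list of lists, it is often handy to count or flatten it. The sketch below uses made-up ids and assumes only the nested shape described above.

```python
from itertools import chain

# One inner list per source/reader (the ids are made up).
dataset_ids = [
    ["erddap_id_1", "erddap_id_2"],
    ["axds_id_1"],
    [],
]

# Number of datasets found by each reader, in reader order.
counts = [len(ids) for ids in dataset_ids]

# A single flat list across all readers.
all_ids = list(chain.from_iterable(dataset_ids))
```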

get(k[, d]) → D[k] if k in D, else d. d defaults to None.
items() → a set-like object providing a view on D's items
keys()

Regular dict-like way to return keys.

property meta

Find and return metadata for datasets.

Returns

A list with an entry for each reader. Each entry in the list contains a pandas DataFrame of metadata for that reader.

Notes

This is done by querying each data source function for metadata and then using the metadata for quick returns.

This will not rerun if the metadata has already been found.

Different sources have different metadata, though certain attributes are always available.
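The "will not rerun" behavior resembles a property that caches its result. The sketch below shows that pattern in isolation; CachedMeta and fake_fetch are hypothetical stand-ins, and the real class may store its metadata differently.

```python
class CachedMeta:
    """Toy class showing a property that computes metadata only once."""

    def __init__(self, fetchers):
        # fetchers: one zero-argument callable per reader (hypothetical).
        self._fetchers = fetchers
        self._meta = None

    @property
    def meta(self):
        # Only query the sources if no metadata has been stored yet.
        if self._meta is None:
            self._meta = [fetch() for fetch in self._fetchers]
        return self._meta


calls = []


def fake_fetch():
    """Stand-in for a reader's metadata query; records each call."""
    calls.append(1)
    return {"station_a": {"lon": -123.5, "lat": 39.5}}


gateway_like = CachedMeta([fake_fetch, fake_fetch])
first = gateway_like.meta
second = gateway_like.meta
```

The second access returns the stored list without calling the fetchers again.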

pop(k[, d]) → v, remove specified key and return the corresponding value.

If key is not found, d is returned if given, otherwise KeyError is raised.

popitem() → (k, v), remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.

qc(dataset_ids=None, verbose=False, skip_units=False)

Light quality check on data.

This runs one IOOS QARTOD test on the data as a first-order quality check. Only data that has been quality checked is returned.

Requires pint for unit handling. Requires user-input criteria and var_def to run.

This is slow if your data spans large chunks of both time and space, so first narrow the search in both as much as possible.

Parameters
  • dataset_ids (str, list, optional) – Read in data for dataset_ids specifically. If none are provided, data will be read in for all self.keys().

  • verbose (boolean, optional) – If True, report summary statistics on QC flag distribution in datasets.

  • skip_units (boolean, optional) – If True, do not interpret or alter units and assume the data is in the units described in var_def already.

Returns

Dataset with an added variable for each variable in the dataset that was checked, named [variable] + '_qc'.

Notes

Code for handling data in DataFrames has been kept, but the approach is changing so that data will be in Datasets. This makes it possible to use cf-xarray functionality for custom variable names, and recognizable units for variables are easier to carry with netCDF than with CSV.
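The documentation does not say which QARTOD test is run. Because var_def carries fail_span and suspect_span, a gross-range style check is a plausible reading, and the sketch below illustrates that idea in plain Python. It is an assumption for illustration, not the method qc() actually calls; gross_range_flags is a hypothetical name and the salinity values are made up.

```python
GOOD, SUSPECT, FAIL = 1, 3, 4  # QARTOD flag values


def gross_range_flags(values, fail_span, suspect_span):
    """Flag each value by a simple gross-range check.

    Values outside fail_span are FAIL, values inside fail_span but outside
    suspect_span are SUSPECT, and everything else is GOOD.
    """
    flags = []
    for value in values:
        if not fail_span[0] <= value <= fail_span[1]:
            flags.append(FAIL)
        elif not suspect_span[0] <= value <= suspect_span[1]:
            flags.append(SUSPECT)
        else:
            flags.append(GOOD)
    return flags


var_def = {"salt": {"units": "psu", "fail_span": [-10, 60], "suspect_span": [-1, 45]}}

salt = [35.0, 50.0, -5.0, 70.0, -20.0]  # made-up salinity values
salt_qc = gross_range_flags(
    salt, var_def["salt"]["fail_span"], var_def["salt"]["suspect_span"]
)
# salt_qc == [1, 3, 3, 4, 4]
```

In the returned Dataset the flags would sit in a companion variable such as salt_qc, following the [variable] + '_qc' naming described above.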

setdefault(k[, d]) → D.get(k,d), also set D[k]=d if k not in D
property sources

Set up data sources (readers).

Notes

All readers are included by default (readers as listed in odg._SOURCES). See __init__ for options.

update([E, ]**F) → None. Update D from mapping/iterable E and F.

If E is present and has a .keys() method, does: for k in E: D[k] = E[k]
If E is present and lacks a .keys() method, does: for (k, v) in E: D[k] = v
In either case, this is followed by: for k, v in F.items(): D[k] = v
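The three cases can be seen with a built-in dict, which follows the same mapping semantics as the inherited docstring describes. Using a plain dict instead of a Gateway instance is an assumption made to keep the example self-contained.

```python
d = {"a": 1}

# E has a .keys() method (a mapping): copy each key from E.
d.update({"b": 2})

# E lacks .keys() (an iterable of pairs): assign each (k, v) pair.
d.update([("a", 10), ("c", 3)])

# Keyword arguments F are applied last and override E.
d.update({"x": 1}, x=99)
```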

values()

Regular dict-like way to return values.