Detailed usage

[1]:
from cf_xarray.units import units  # isort:skip
import pint_xarray  # isort:skip

pint_xarray.unit_registry = units  # isort:skip
import ocean_data_gateway as odg
import pandas as pd
import xarray as xr
import numpy as np
pd.set_option('display.max_rows', 5)

General Options

[2]:
kw = {
    "min_lon": -124.0,
    "max_lon": -123.0,
    "min_lat": 39.0,
    "max_lat": 40.0,
    "min_time": '2021-4-1',
    "max_time": '2021-4-2',
}
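
To make the meaning of the six kw keys concrete, here is a minimal sketch of a bounding-box and time-window test. The helper in_region and the name kw_example are hypothetical and not part of ocean_data_gateway; the sketch assumes the bounds are inclusive and only illustrates how the keys delimit a search region.

```python
from datetime import datetime

kw_example = {
    "min_lon": -124.0,
    "max_lon": -123.0,
    "min_lat": 39.0,
    "max_lat": 40.0,
    "min_time": "2021-4-1",
    "max_time": "2021-4-2",
}


def in_region(kw, lon, lat, when):
    """Return True if (lon, lat, when) lies inside the region described by kw.

    Hypothetical helper for illustration; bounds are assumed to be inclusive.
    """
    start = datetime.strptime(kw["min_time"], "%Y-%m-%d")
    end = datetime.strptime(kw["max_time"], "%Y-%m-%d")
    return (
        kw["min_lon"] <= lon <= kw["max_lon"]
        and kw["min_lat"] <= lat <= kw["max_lat"]
        and start <= when <= end
    )


# One point inside the box and one east of it, both at noon on the first day.
inside = in_region(kw_example, -123.5, 39.5, datetime(2021, 4, 1, 12))
outside = in_region(kw_example, -122.0, 39.5, datetime(2021, 4, 1, 12))
print(inside, outside)
```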

Parallel

You can control readers individually as needed. For example, the keyword parallel, which every reader accepts, can be input per reader (in case you want different values for different readers), or once for all readers by including it at the top level of kwargs. Parallel runs use joblib’s Parallel and delayed with multiple processes, so loops run on different cores.

[3]:
kwargs = {
          'kw': kw,
          'approach': 'region',
          'parallel': True,
          'erddap': {
                           'known_server': 'ioos',
#                            'parallel': False,
                           'variables': 'salinity',
          },
          'axds': {'catalog_name': None,
#                          'parallel': True,
                         'axds_type': 'platform2',
                         'variables': 'Salinity'},
          }
data = odg.Gateway(**kwargs)
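
The two places where parallel can appear can be pictured with the sketch below. It assumes that a reader-specific value overrides the top-level one; resolve_option and example_kwargs are hypothetical names used only for illustration, not ocean_data_gateway functions.

```python
def resolve_option(kwargs, reader, name, default=None):
    """Look up `name` for `reader`, falling back to the top-level value.

    Hypothetical helper; the precedence shown here is an assumption.
    """
    reader_options = kwargs.get(reader, {})
    if name in reader_options:
        return reader_options[name]
    return kwargs.get(name, default)


example_kwargs = {
    "approach": "region",
    "parallel": True,  # applies to every reader ...
    "erddap": {"known_server": "ioos", "parallel": False},  # ... unless overridden
    "axds": {"axds_type": "platform2"},
}

erddap_parallel = resolve_option(example_kwargs, "erddap", "parallel")
axds_parallel = resolve_option(example_kwargs, "axds", "parallel")
print(erddap_parallel, axds_parallel)
```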

Reader Choice

Readers can be selected as follows: odg.erddap connects to ERDDAP servers, odg.axds connects to Axiom databases, and odg.local enables easy local file read-in. If you don’t input any readers, all of them are used. Alternatively, you can input a subset.

[4]:
readers = [odg.erddap,
           odg.axds,
           odg.local]

Use only ERDDAP reader and Axiom reader:

[5]:
data = odg.Gateway(kw=kw, approach='region',
                   readers=[odg.erddap,
                            odg.axds])

Configure custom criteria and variable definitions

For full functionality of ocean_data_gateway, you’ll want to input both criteria and var_def. However, it is not strictly necessary to have these defined for all search scenarios. Both are required for running the QC function, and criteria are useful for being able to search for variables in readers with ease.

Custom variable criteria

A capability in a dependency, cf-xarray, lets the user call variables in xarray Datasets by user-defined names, using regular expressions to match variable definitions with these names. These dictionaries can be input as local dictionaries or can be brought in from, for example, a gist URL. To demonstrate, here is an available custom criteria dictionary:

[6]:
url = 'https://gist.githubusercontent.com/kthyng/c3cc27de6b4449e1776ce79215d5e732/raw/af448937e4896535e36ef6522df8460e8f928cd6/my_custom_criteria.py'
criteria = odg.return_response(url)
criteria
[6]:
{'ssh': {'standard_name': 'sea_surface_height$|sea_surface_elevation|sea_surface_height_above_sea_level$',
  'name': '(?i)sea_surface_elevation(?!.*?_qc)|(?i)sea_surface_height_above_sea_level_geoid_mllw$|(?i)zeta$|(?i)Sea Surface Height(?!.*?_qc)|(?i)Water Surface above Datum(?!.*?_qc)'},
 'temp': {'name': '(?i)temp$|(?i)temperature$|(?i)tem$|(?i)s.sea_water_temperature$|(?i)temperature(?!.*(skin|ground|air|_qc))'},
 'salt': {'standard_name': 'sea_water_salinity$|sea_water_practical_salinity$',
  'name': '(?i)salinity(?!.*(soil|_qc))|(?i)sea_water_salinity$|(?i)sea_water_practical_salinity$|(?i)salinity$|(?i)salt$|(?i)sal$|(?i)s.sea_water_practical_salinity$'},
 'u': {'standard_name': 'eastward_sea_water_velocity$|sea_water_x_velocity|surface_eastward_sea_water_velocity',
  'name': '(?i)eastward_sea_water_velocity(?!.*?_qc)|(?i)sea_water_x_velocity(?!.*?_qc)|(?i)uo(?!.*?_qc)'},
 'v': {'standard_name': 'northward_sea_water_velocity$|sea_water_y_velocity|surface_northward_sea_water_velocity',
  'name': '(?i)northward_sea_water_velocity(?!.*?_qc)|(?i)sea_water_y_velocity(?!.*?_qc)|(?i)vo(?!.*?_qc)'},
 'wind_speed': {'standard_name': 'wind_speed$'}}

The keys of the dictionary are the nicknames or aliases that can be used to refer to a variable, so they can be input as variables (see that section). Each nickname has a sub-dictionary whose keys are the names of attributes that may be in the variable metadata and whose values are strings of regular expressions. These expressions are used to search for matches in the readers, and in the QC function to find variables in the datasets.

Examples of keys:

* standard_name
* long_name
* Axis
* coordinates
* name (this will search the name of the variable itself and should always be used)
* units

Hints for defining the regular expressions:

* | is a logical “or” to indicate that any of the items in the string would count as a match if they match individually.
* $ at the end means it will only find exact matches for the end of the variable, so temperature$ would not match temperature_air.
* (?!.*?_qc) at the end means it will not match with the string if “_qc” is anywhere in the string.
* (?!.*(skin|ground)) at the end means to ignore the match if it contains “skin” or “ground”.
* (?i) at the beginning means to ignore case.
* .* at the beginning of a regular expression indicates there could be characters in front of the match, so that .*temperature would match with sea_water_temperature.
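
The hints above can be checked directly with Python's re module. This is a minimal sketch that assumes matching is anchored at the start of the name (re.match), which is what the .* hint suggests; the helper matches is hypothetical. Note that Python 3.11 and later only accept a global flag such as (?i) at the very start of a pattern, so the sketch places it there.

```python
import re


def matches(pattern, name):
    """True if `pattern` matches starting at the beginning of `name`."""
    return re.match(pattern, name) is not None


# `$` pins the match to the end of the name.
end_anchor = (
    matches("temperature$", "temperature"),
    matches("temperature$", "temperature_air"),
)

# A leading `.*` allows characters in front of the match.
prefix = (
    matches("temperature", "sea_water_temperature"),
    matches(".*temperature", "sea_water_temperature"),
)

# `(?i)` ignores case; the negative lookahead rejects names containing "_qc".
no_qc = (
    matches("(?i)salinity(?!.*?_qc)", "Salinity"),
    matches("(?i)salinity(?!.*?_qc)", "salinity_qc"),
)

# A lookahead with alternatives rejects names containing "skin" or "ground".
excluded = (
    matches("temperature(?!.*(skin|ground))", "temperature_water"),
    matches("temperature(?!.*(skin|ground))", "temperature_skin"),
)

# `|` accepts any one of the alternatives.
either = (matches("salt$|sal$", "sal"), matches("salt$|sal$", "salty"))

print(end_anchor, prefix, no_qc, excluded, either)
```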

Variable definitions

Variables to be used in the model-data comparison need to be chosen and have some basic information attached: units, and reasonable ranges for the variable in the units (fail_span and suspect_span). These will be used to align the data and models to be sure we are making appropriate comparisons. The ranges are used for basic QC. Like criteria, var_def can be defined locally or be brought in from a nonlocal gist. For example,

[7]:
url = 'https://gist.githubusercontent.com/kthyng/b8056748a811479460b6d5fc5cb5537b/raw/6b531cc5d3072ff6a4f5174f882d7d91d880cbf8/my_var_def.py'
var_def = odg.return_response(url)
var_def
[7]:
{'temp': {'units': 'degree_Celsius',
  'fail_span': [-100, 100],
  'suspect_span': [-10, 40]},
 'salt': {'units': 'psu', 'fail_span': [-10, 60], 'suspect_span': [-1, 45]},
 'u': {'units': 'm/s', 'fail_span': [-10, 10], 'suspect_span': [-5, 5]},
 'v': {'units': 'm/s', 'fail_span': [-10, 10], 'suspect_span': [-5, 5]},
 'ssh': {'units': 'm', 'fail_span': [-10, 10], 'suspect_span': [-3, 3]}}
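
To show how fail_span and suspect_span could drive a basic range test, here is a minimal sketch. The function range_flag is hypothetical, the sample values are made up, and the flag codes (1 = pass, 3 = suspect, 4 = fail) are an assumed QARTOD-style convention; the QC actually performed by ocean_data_gateway may differ in detail.

```python
def range_flag(value, fail_span, suspect_span):
    """Classify one value: 1 = pass, 3 = suspect, 4 = fail (assumed codes)."""
    fail_low, fail_high = fail_span
    suspect_low, suspect_high = suspect_span
    if not (fail_low <= value <= fail_high):
        return 4  # outside the widest plausible range
    if not (suspect_low <= value <= suspect_high):
        return 3  # plausible but unusual
    return 1


temp_def = {
    "units": "degree_Celsius",
    "fail_span": [-100, 100],
    "suspect_span": [-10, 40],
}

samples = [1.6, 45.0, 250.0]  # made-up temperatures in degrees Celsius
flags = [
    range_flag(value, temp_def["fail_span"], temp_def["suspect_span"])
    for value in samples
]
print(flags)  # [1, 3, 4]
```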

Region

Search by time/space region.

All variables

Don’t input anything with the variables keyword, or use 'variables': None:

[8]:
kwargs = {
          'kw': kw,
          'approach': 'region',
          'readers': [odg.erddap,
                      odg.axds],
          'variables': None
}
data = odg.Gateway(**kwargs)

By variable(s)

If you have input criteria as described above, you can request variables from the readers using your variable nicknames (demonstrated here). If not, but you still want to be able to search by variable, see “More ways to access with variables” for the more tedious process of figuring out what variable names to input and the format for doing so.

Here is an example of the variables retrieved from the readers for nonlocal criteria and var_def:

[9]:
criteria_url = 'https://gist.githubusercontent.com/kthyng/c3cc27de6b4449e1776ce79215d5e732/raw/af448937e4896535e36ef6522df8460e8f928cd6/my_custom_criteria.py'
var_def_url = 'https://gist.githubusercontent.com/kthyng/b8056748a811479460b6d5fc5cb5537b/raw/6b531cc5d3072ff6a4f5174f882d7d91d880cbf8/my_var_def.py'

kwargs = {
          'criteria': criteria_url,
          'var_def': var_def_url,
          'kw': kw,
          'approach': 'region',
          'variables': ['ssh', 'temp']
}
data = odg.Gateway(**kwargs)

The variables that correspond to the input nicknames are specific to each reader, so to look at them we have to dig down to the individual source:

[10]:
print(f'ERDDAP IOOS: {data.sources[0].variables}')
print(f'ERDDAP Coastwatch: {data.sources[1].variables}')
print(f'AXDS platforms: {data.sources[2].variables}')
ERDDAP IOOS: ['sea_surface_height_above_sea_level', 'sea_surface_height_above_sea_level_geoid_mllw', 'temperature']
ERDDAP Coastwatch: ['temperature0', 'temp', 'temperature', 'temperature1']
AXDS platforms: ['Sea Surface Height', 'Water Surface above Datum', 'Temperature: Water Temperature', 'Temperature: Sea Surface Temperature', 'Temperature: Surface Temperature']

The criteria and var_def can also be defined locally, and can be very simple:

[11]:
criteria = {'temp': {'name': 'temperature'}}
var_def = {'temp': {'units': 'degree_Celsius',
  'fail_span': [-100, 100],
  'suspect_span': [-10, 40]}}


kwargs = {
          'criteria': criteria,
          'var_def': var_def,
          'kw': kw,
          'approach': 'region',
          'variables': 'temp'
}
data = odg.Gateway(**kwargs)

print(f'ERDDAP IOOS: {data.sources[0].variables}')
print(f'ERDDAP Coastwatch: {data.sources[1].variables}')
print(f'AXDS platforms: {data.sources[2].variables}')
ERDDAP IOOS: ['temperature_qc', 'temperature']
ERDDAP Coastwatch: ['temperature0', 'temperature', 'temperature1']
AXDS platforms: []
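
The output above shows the cost of a very simple criterion: it picks up temperature_qc and misses the capitalized AXDS names. The sketch below reproduces that behavior on a plain list of names. It assumes matching is case-sensitive and anchored at the start of the name (re.match); select, simple_criteria and the candidate list are hypothetical and only illustrate the idea.

```python
import re

simple_criteria = {"temp": {"name": "temperature"}}


def select(names, criteria, nickname):
    """Return the names matched by the `name` pattern of `nickname`."""
    pattern = criteria[nickname]["name"]
    return [name for name in names if re.match(pattern, name)]


# Candidate names taken from the reader outputs shown in this section.
candidates = [
    "temperature0",
    "temp",
    "temperature",
    "temperature1",
    "temperature_qc",
    "Temperature: Water Temperature",
]

selected = select(candidates, simple_criteria, "temp")
print(selected)
```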

Modifying variables

The user may alter the variables found. This may be useful if the input criteria returned variables that the user doesn’t want. They could modify their criteria to fix this, or simply remove the variables they don’t want to include. In this case, the dataset_ids will be updated once the search is run again.

[12]:
criteria = {'salt': {'name': '(?i)salinity'}}
var_def = {'salt': {'units': 'psu', 'fail_span': [-10, 60], 'suspect_span': [-1, 45]}}

kwargs = {
          'criteria': criteria,
          'var_def': var_def,
          'kw': kw,
          'approach': 'region',
          'variables': 'salt'
}
data = odg.Gateway(**kwargs)

print(f'ERDDAP IOOS: {data.sources[0].variables}')
print(f'ERDDAP Coastwatch: {data.sources[1].variables}')
print(f'AXDS platforms: {data.sources[2].variables}')
ERDDAP IOOS: ['salinity_qc', 'salinity']
ERDDAP Coastwatch: ['salinity']
AXDS platforms: ['Salinity']

We don’t want salinity_qc variables to be specifically returned. We could change criteria so that variable names with _qc are not found as matches with

criteria = {'salt': {'name': '(?i)salinity(?!.*?_qc)'}}

or we can remove the variable after the search has been created as follows:

[13]:
data.sources[0].variables.remove('salinity_qc')
[14]:
print(f'ERDDAP IOOS: {data.sources[0].variables}')
print(f'ERDDAP Coastwatch: {data.sources[1].variables}')
print(f'AXDS platforms: {data.sources[2].variables}')
ERDDAP IOOS: ['salinity']
ERDDAP Coastwatch: ['salinity']
AXDS platforms: ['Salinity']
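
Removing entries by name rather than by position avoids depending on the order in which a reader returned them. A minimal sketch, with a plain list standing in for data.sources[0].variables (the stand-in name is hypothetical):

```python
# Stand-in for data.sources[0].variables.
found_variables = ["salinity_qc", "salinity"]

# Slice assignment edits the list in place, so any other reference
# to the same list object sees the filtered contents too.
found_variables[:] = [
    name for name in found_variables if not name.endswith("_qc")
]
print(found_variables)  # ['salinity']
```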

Stations

You can search by a general station name or by the specific database dataset_id if you know it (from performing a search previously, for example). The syntax is the same either way.

By station name

This demonstrates the case that you know names of stations, but they might not be the names in the particular databases.

In the following example, I use some station IDs I know off the top of my head. The module will check all of the readers for the station names.

[15]:
kwargs = {
          'approach': 'stations',
          'stations': ['8771972','SFBOFS','42020','TABS_B']
}
data = odg.Gateway(**kwargs)
[16]:
data.dataset_ids
[16]:
['noaa_nos_co_ops_8771972',
 'wmo_42020',
 'tabs_b',
 '03158b5d-f712-45f2-b05d-e4954372c1ce']

By Dataset ID

Once we know the database dataset_ids, we can use them directly for future searches. Note that they can generally be input to the search instead of associated individually with each reader, and are input as “stations” like in the previous example.

[17]:
kwargs = {
          'approach': 'stations',
          'stations': ['tabs_b', 'wmo_42020', 'noaa_nos_co_ops_8771972','03158b5d-f712-45f2-b05d-e4954372c1ce'],
          'erddap': {
                          'known_server': 'ioos',
          },
          'axds': {
                          'axds_type': 'layer_group',}
}
data = odg.Gateway(**kwargs)
data.dataset_ids
[17]:
['tabs_b',
 'wmo_42020',
 'noaa_nos_co_ops_8771972',
 '03158b5d-f712-45f2-b05d-e4954372c1ce']

For axds_type=='layer_group', you can input either the module UUID or the layer_group UUID. It generally doesn’t matter which, unless the layer_group (a subsidiary of the module) has a different opendap url than the module or other layer_groups associated with the module.

Here we show using either the module or layer_group UUID, and inputting it in either the base level of the kwargs dictionary or specified for a reader. They all have the same result.

[18]:
# Example with module UUID input as a station at the base level of kwargs
kwargs = {
          'approach': 'stations',
          'stations': '03158b5d-f712-45f2-b05d-e4954372c1ce',
          'axds': {
                          'axds_type': 'layer_group',}
}
data = odg.Gateway(**kwargs)
data.dataset_ids
[18]:
['03158b5d-f712-45f2-b05d-e4954372c1ce']
[19]:
# Example with module UUID input as a station inside the axds reader options
kwargs = {
          'approach': 'stations',
          'axds': {
                          'axds_type': 'layer_group',
                          'stations': '03158b5d-f712-45f2-b05d-e4954372c1ce'}
}
data = odg.Gateway(**kwargs)
data.dataset_ids
[19]:
['03158b5d-f712-45f2-b05d-e4954372c1ce']
[20]:
# Example with layer_group UUID input as a station at the base level of kwargs
kwargs = {
          'approach': 'stations',
          'stations': '04784baa-6be8-4aa7-b039-269f35e92e91',
          'axds': {
                          'axds_type': 'layer_group',}
}
data = odg.Gateway(**kwargs)
data.dataset_ids
[20]:
['03158b5d-f712-45f2-b05d-e4954372c1ce']
[21]:
# Example with layer_group UUID input as a station inside the axds reader options
kwargs = {
          'approach': 'stations',
          'axds': {
                          'axds_type': 'layer_group',
                          'stations': '04784baa-6be8-4aa7-b039-269f35e92e91'}
}
data = odg.Gateway(**kwargs)
data.dataset_ids
[21]:
['03158b5d-f712-45f2-b05d-e4954372c1ce']

Include Time Range

By default, the full available time range will be returned for each dataset unless the user specifies one to narrow the returned datasets in time.

The data object defined in the previous cell shows the long default time range for any of its sources. (You can tell there are 4 sources considered since the list in the previous code cell has 4 elements.)

[22]:
data.sources[0].kw
[22]:
{'min_time': '1900-01-01', 'max_time': '2100-12-31'}

A shorter time range is shown in the following since it is specified.

[23]:
kwargs = {
          'kw': {'min_time': '2017-1-1',
                 'max_time': '2017-1-2'},
          'approach': 'stations',
          'stations': ['8771972']
}
data = odg.Gateway(**kwargs)
data.sources[0].kw
[23]:
{'min_time': '2017-1-1', 'max_time': '2017-1-2'}
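
The relationship between the default and the user-specified time range can be sketched as a dictionary overlay. The default values are the ones printed above; effective_kw and DEFAULT_TIME_KW are hypothetical names, and the way ocean_data_gateway actually combines the values is an assumption here.

```python
from datetime import datetime

DEFAULT_TIME_KW = {"min_time": "1900-01-01", "max_time": "2100-12-31"}


def effective_kw(user_kw=None):
    """Overlay user-supplied keys on the defaults (hypothetical helper)."""
    merged = {**DEFAULT_TIME_KW, **(user_kw or {})}
    start = datetime.strptime(merged["min_time"], "%Y-%m-%d")
    end = datetime.strptime(merged["max_time"], "%Y-%m-%d")
    if start > end:
        raise ValueError("min_time must not be later than max_time")
    return merged


default_range = effective_kw()
short_range = effective_kw({"min_time": "2017-1-1", "max_time": "2017-1-2"})
print(default_range)
print(short_range)
```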

Reader Options

ERDDAP Reader

By default, the Gateway class will use erddap with two known servers: IOOS and Coastwatch.

[24]:
kwargs = {
          'kw': kw,
          'approach': 'region',
          'readers': [odg.erddap]
}
data = odg.Gateway(**kwargs)
data.sources[0].name, data.sources[1].name
[24]:
('erddap_ioos', 'erddap_coastwatch')

Choose one known server

The user can specify to use just one of these:

[25]:
kwargs = {
          'kw': kw,
          'approach': 'region',
          'readers': [odg.erddap],
          'erddap': {
                      'known_server': ['ioos'],  # or 'coastwatch'
          }
}
data = odg.Gateway(**kwargs)
data.sources[0].name
[25]:
'erddap_ioos'

New ERDDAP Server

You can give the necessary information to use a different ERDDAP server.

[26]:
kwargs = {
          'kw': kw,
          'approach': 'region',
          'readers': [odg.erddap],
            'erddap': {
                'known_server': 'ifremer',
                'protocol': 'tabledap',
                'server': 'http://www.ifremer.fr/erddap'
            }
}
data = odg.Gateway(**kwargs)
[27]:
data.dataset_ids
[27]:
['copernicus-fos',
 'ArgoFloats',
 'OceanGlidersGDACTrajectories',
 'ArgoFloats-synthetic-BGC']
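
The server and protocol entries correspond to the fixed parts of an ERDDAP request URL, which has the form server/protocol/datasetID.fileType?query. The sketch below only assembles such a URL as a string and sends no request. The helper tabledap_url is hypothetical, the variable names are assumed for illustration, and the library builds its own requests internally, possibly differently.

```python
def tabledap_url(server, dataset_id, variables, constraints=(), file_type="csv"):
    """Assemble a tabledap request URL (illustrative; no request is sent)."""
    query = ",".join(variables)
    for constraint in constraints:
        query += "&" + constraint
    return f"{server.rstrip('/')}/tabledap/{dataset_id}.{file_type}?{query}"


url = tabledap_url(
    "http://www.ifremer.fr/erddap",
    "ArgoFloats",
    ["time", "latitude", "longitude"],  # assumed variable names
    ["time>=2021-04-01T00:00:00Z", "time<=2021-04-02T00:00:00Z"],
)
print(url)
```

When such a URL is sent by a script rather than a browser, special characters in the constraints (such as > and <) generally need to be percent-encoded.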

AXDS Reader

By default, the Gateway class will use axds with two types of data: ‘platform2’ (like gliders) and ‘layer_group’ (model output, gridded products).

[28]:
kwargs = {
          'kw': kw,
          'approach': 'region',
          'readers': [odg.axds]
}
data = odg.Gateway(**kwargs)
data.sources[0].name, data.sources[1].name
[28]:
('axds_platform2', 'axds_layer_group')

Specify AXDS Type

The user can specify to use just one of these:

[29]:
kwargs = {
          'kw': kw,
          'approach': 'region',
          'readers': [odg.axds],
          'axds': {
                          'axds_type': 'platform2',  # or 'layer_group'
          }
}
data = odg.Gateway(**kwargs)
data.sources[0].name
[29]:
'axds_platform2'

Local Files

I can’t remember the process by which I got these files from a portal now, but they are just meant to be sample files anyway. Hopefully this will work reasonably well with other files too.

The region and stations approaches are less useful with local files, since a user who inputs filenames already knows which files they want to use. The approaches could be useful when the user has many files somewhere, or a catalog that already exists, and just wants to point to that and have the code filter it down. That code is not in place but could be added if that is a good use case.

So it currently doesn’t matter which approach is used for local files. There is a default kw and region if nothing is input and in this case that is fine since neither are used.

[30]:
filenames = ['/Users/kthyng/Downloads/ANIMIDA_III_BeaufortSea_2014-2015/kasper-netcdf/ANIMctd14.nc',
             '/Users/kthyng/Downloads/Harrison_Bay_CTD_MooringData_2014-2015/Harrison_Bay_data/SBE16plus_01604787_2015_08_09_final.csv']

data = odg.Gateway(readers=odg.local, local={'filenames': filenames})

You can look at the metadata or the data:

[31]:
data.meta
[31]:
| | lat_variable | lon_variable | geospatial_lat_min | geospatial_lon_min | geospatial_lat_max | catalog_dir | time_coverage_start | coords | time_variable | variables | geospatial_lon_max | download_url | time_coverage_end |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ANIMctd14.nc | lat | lon | 69.850874 | -152.581114 | 71.488255 | /Users/kthyng/.ocean_data_gateway/catalogs/ | 2014-07-31T15:33:33.999999314 | [time, lat, lon, pressure] | time | [station_name, sal, tem, fluoro, turbidity, PA... | -141.717438 | /Users/kthyng/Downloads/ANIMIDA_III_BeaufortSe... | 2014-08-07T21:35:54.000004381 |
| SBE16plus_01604787_2015_08_09_final.csv | NaN | NaN | 70.6349 | -150.237 | 70.6349 | /Users/kthyng/.ocean_data_gateway/catalogs/ | 2014-08-01T12:00:05Z | NaN | NaN | [time, latitude, longitude, water_depth, Condu... | -150.237 | /Users/kthyng/Downloads/Harrison_Bay_CTD_Moori... | 2015-08-09T06:00:05Z |
[32]:
data['ANIMctd14.nc']
[32]:
<xarray.Dataset>
Dimensions:              (nzmax: 1587, profile: 57)
Coordinates:
    time                 (profile) datetime64[ns] 2014-08-07T02:02:34.0000028...
    lat                  (profile) float64 71.27 71.23 71.18 ... 70.45 70.46
    lon                  (profile) float64 -152.2 -152.3 ... -145.8 -145.8
    pressure             (profile, nzmax) float64 2.187 2.399 ... -9.999e+03
Dimensions without coordinates: nzmax, profile
Data variables:
    station_name         (profile) |S12 b'1.01        ' ... b'T-XA        '
    sal                  (profile, nzmax) float64 24.85 24.85 ... -9.999e+03
    tem                  (profile, nzmax) float64 1.625 1.589 ... -9.999e+03
    fluoro               (profile, nzmax) float64 0.6842 0.7452 ... -9.999e+03
    turbidity            (profile, nzmax) float64 0.604 0.6895 ... -9.999e+03
    PAR                  (profile, nzmax) float64 9.596 9.097 ... -9.999e+03
    platform_variable    float64 9.969e+36
    instrument_variable  float64 9.969e+36
    crs                  float64 9.969e+36
Attributes: (12/35)
    Conventions:                CF-1.6
    Metadata_Conventions:       Unidata Dataset Discovery v1.0
    featureType:                profile
    cdm_data_type:              Station
    nodc_template_version:      NODC_NetCDF_Profile_Incomplete_Templete_v1.1
    standard_name_vocabulary:   NetCDF Climate and Forecast(CF) Metadata Conv...
    ...                         ...
    keywords:                   OCEAN TEMPERATURE,SALINITY,TURBIDITY,WATER PR...
    acknowledgement:            Kasper, J., CTD measurements collected from s...
    publisher_name:             Tim Whiteaker
    publisher_email:            whiteaker@utexas.edu
    publisher_url:              http://arcticstudies.org/animida_iii
    license:                    Creative Commons Attribution 3.0 United State...
[33]:
data['SBE16plus_01604787_2015_08_09_final.csv']
[33]:
| | time | latitude | longitude | water_depth | Conductivity_[S/m] | Pressure_[db] | Temperature_ITS90_[deg C] | Salinity_Practical_[PSU] | Voltage0_[volts] | Instrument_Time_[juliandays] | flag |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2014-08-01T12:00:05Z | 70.6349 | -150.237 | 13.0 | 2.495646 | 12.687 | -1.4619 | 31.0905 | 0.3091 | 213.500058 | 0.0 |
| 1 | 2014-08-01T13:00:05Z | 70.6349 | -150.237 | 13.0 | 2.495454 | 12.699 | -1.4595 | 31.0854 | 0.3265 | 213.541725 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8945 | 2015-08-09T05:00:05Z | 70.6349 | -150.237 | 13.0 | 2.591448 | 12.777 | 0.3619 | 30.5086 | 0.3873 | 586.208391 | 0.0 |
| 8946 | 2015-08-09T06:00:05Z | 70.6349 | -150.237 | 13.0 | 2.585462 | 12.754 | 0.2862 | 30.5062 | 0.2441 | 586.250058 | 0.0 |

8947 rows × 11 columns
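
The two outputs above differ in type: the NetCDF file comes back as an xarray Dataset and the CSV file as a pandas DataFrame. One way such a dispatch could be written is sketched below. The helpers file_kind and open_local are hypothetical, and choosing by file extension is an assumption about the behavior, not a description of the local reader's code.

```python
from pathlib import Path


def file_kind(path):
    """Classify a file by its extension (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix in {".nc", ".nc4"}:
        return "netcdf"
    if suffix == ".csv":
        return "csv"
    raise ValueError(f"unsupported file type: {suffix!r}")


def open_local(path):
    """Open a file with xarray or pandas depending on its kind."""
    if file_kind(path) == "netcdf":
        import xarray as xr  # imported lazily; only needed for NetCDF files

        return xr.open_dataset(path)
    import pandas as pd  # imported lazily; only needed for CSV files

    return pd.read_csv(path)


kinds = [
    file_kind(name)
    for name in ("ANIMctd14.nc", "SBE16plus_01604787_2015_08_09_final.csv")
]
print(kinds)  # ['netcdf', 'csv']
```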

Other Functionality

Data subselection

You can pull out the data for one, several, or all of the dataset_ids found in your search, as demonstrated here.

[34]:
kw = {'min_lon': -94,
 'max_lon': -92,
 'min_lat': 28,
 'max_lat': 30,
 'min_time': pd.Timestamp('2021-05-27'),
 'max_time': pd.Timestamp('2021-06-02')}

kwargs = {
          'kw': kw,
          'approach': 'region',
          'parallel': False,
          'readers': [odg.erddap],
          'erddap': {
                          'known_server': ['ioos'],
                           'variables': [
                                       ['sea_surface_height_above_sea_level_geoid_mllw']
                           ]
          },
}

data = odg.Gateway(**kwargs)
[35]:
# all dataset_ids found for this search
data.dataset_ids
[35]:
['noaa_nos_co_ops_8770475',
 'noaa_nos_co_ops_8770520',
 'noaa_nos_co_ops_8768094',
 'noaa_nos_co_ops_8770570',
 'noaa_nos_co_ops_8766072',
 'noaa_nos_co_ops_8770822']

Read in data for 1 dataset_id

Index the Gateway object with a dataset_id to read in the data for that dataset.

[36]:
data['noaa_nos_co_ops_8770822']
[36]:
<xarray.Dataset>
Dimensions:                                        (time: 1458, timeseries: 1)
Coordinates:
    latitude                                       (timeseries) float64 29.68
    longitude                                      (timeseries) float64 -93.84
  * time                                           (time) datetime64[ns] 2021...
Dimensions without coordinates: timeseries
Data variables:
    sea_surface_height_above_sea_level_geoid_mllw  (time, timeseries) float64 ...
Attributes: (12/53)
    cdm_data_type:                 TimeSeries
    cdm_timeseries_variables:      station,longitude,latitude
    contributor_email:             None,feedback@axiomdatascience.com
    contributor_name:              Gulf of Mexico Coastal Ocean Observing Sys...
    contributor_role:              funder,processor
    contributor_role_vocabulary:   NERC
    ...                            ...
    standard_name_vocabulary:      CF Standard Name Table v72
    summary:                       Timeseries data from 'Texas Point, Sabine ...
    time_coverage_end:             2021-09-08T13:18:00Z
    time_coverage_start:           2015-09-03T15:42:00Z
    title:                         Texas Point, Sabine Pass
    Westernmost_Easting:           -93.8369
[37]:
# See what dataset_ids have been read in
data.keys()
[37]:
dict_keys(['noaa_nos_co_ops_8770822'])

Read in data for 2 dataset_ids

[38]:
data['noaa_nos_co_ops_8770822']
data['noaa_nos_co_ops_8770475']
[38]:
<xarray.Dataset>
Dimensions:                                        (time: 1453, timeseries: 1)
Coordinates:
    latitude                                       (timeseries) float64 29.87
    longitude                                      (timeseries) float64 -93.93
  * time                                           (time) datetime64[ns] 2021...
Dimensions without coordinates: timeseries
Data variables:
    sea_surface_height_above_sea_level_geoid_mllw  (time, timeseries) float64 ...
Attributes: (12/53)
    cdm_data_type:                 TimeSeries
    cdm_timeseries_variables:      station,longitude,latitude
    contributor_email:             None,feedback@axiomdatascience.com
    contributor_name:              Gulf of Mexico Coastal Ocean Observing Sys...
    contributor_role:              funder,processor
    contributor_role_vocabulary:   NERC
    ...                            ...
    standard_name_vocabulary:      CF Standard Name Table v72
    summary:                       Timeseries data from 'Port Arthur, TX' (ur...
    time_coverage_end:             2021-09-08T13:06:00Z
    time_coverage_start:           2015-05-05T13:00:00Z
    title:                         Port Arthur, TX
    Westernmost_Easting:           -93.93
[39]:
# See what dataset_ids have been read in
data.keys()
[39]:
dict_keys(['noaa_nos_co_ops_8770822', 'noaa_nos_co_ops_8770475'])

Read in data for all dataset_ids

[40]:
for dataset_id in data.dataset_ids:
    data[dataset_id]
[41]:
# See what dataset_ids have been read in
data.keys()
[41]:
dict_keys(['noaa_nos_co_ops_8770822', 'noaa_nos_co_ops_8770475', 'noaa_nos_co_ops_8770520', 'noaa_nos_co_ops_8768094', 'noaa_nos_co_ops_8770570', 'noaa_nos_co_ops_8766072'])
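
The behavior shown in this section, where keys() lists only the datasets that have already been requested, is what a load-on-first-access cache looks like. The class below is an illustrative sketch of that pattern, not the Gateway implementation; LazyStore and fake_loader are hypothetical.

```python
class LazyStore:
    """Load each dataset on first access and keep it for later requests."""

    def __init__(self, dataset_ids, loader):
        self.dataset_ids = list(dataset_ids)
        self._loader = loader
        self._cache = {}

    def __getitem__(self, dataset_id):
        if dataset_id not in self.dataset_ids:
            raise KeyError(dataset_id)
        if dataset_id not in self._cache:
            # Only the first request for an id triggers the loader.
            self._cache[dataset_id] = self._loader(dataset_id)
        return self._cache[dataset_id]

    def keys(self):
        """Ids that have been read in so far."""
        return self._cache.keys()


calls = []


def fake_loader(dataset_id):
    """Stand-in for an expensive read; records each call."""
    calls.append(dataset_id)
    return {"id": dataset_id}


store = LazyStore(["a", "b", "c"], fake_loader)
before = list(store.keys())
store["b"]
store["b"]  # served from the cache, so the loader is not called again
after = list(store.keys())
print(before, after, calls)  # [] ['b'] ['b']
```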

Data QC

Some quality checking of the data is possible. This requires user-input criteria to select which variables to keep and be able to identify variables by a standardized name, and var_def to have the basic information to run the QC on the variable as well as make sure it is in the known units.

Basic quality control is done for range testing of data. Currently, the output is available as datasets of flags and as a summary report (verbose=True).

[42]:
criteria_url = 'https://gist.githubusercontent.com/kthyng/c3cc27de6b4449e1776ce79215d5e732/raw/af448937e4896535e36ef6522df8460e8f928cd6/my_custom_criteria.py'
var_def_url = 'https://gist.githubusercontent.com/kthyng/b8056748a811479460b6d5fc5cb5537b/raw/6b531cc5d3072ff6a4f5174f882d7d91d880cbf8/my_var_def.py'

filenames = ['/Users/kthyng/Downloads/ANIMIDA_III_BeaufortSea_2014-2015/kasper-netcdf/ANIMctd14.nc']
data = odg.Gateway(criteria=criteria_url, var_def=var_def_url, readers=odg.local, local={'filenames': filenames})

QC can be run on a specific dataset_id or ids too, where the dataset_ids have to be input as nested lists that match the hierarchy of the sources:

[43]:
data.qc(dataset_ids='ANIMctd14.nc')
[43]:
{'ANIMctd14.nc': <xarray.Dataset>
 Dimensions:   (nzmax: 1587, profile: 57)
 Coordinates:
     time      (profile) datetime64[ns] 2014-08-07T02:02:34.000002890 ... 2014...
     lat       (profile) float64 71.27 71.23 71.18 71.12 ... 70.38 70.45 70.46
     lon       (profile) float64 -152.2 -152.3 -152.4 ... -146.0 -145.8 -145.8
     pressure  (profile, nzmax) float64 2.187 2.399 ... -9.999e+03 -9.999e+03
 Dimensions without coordinates: nzmax, profile
 Data variables:
     tem       (profile, nzmax) float64 1.625 1.589 ... -9.999e+03 -9.999e+03
     sal       (profile, nzmax) float64 24.85 24.85 ... -9.999e+03 -9.999e+03
     tem_qc    (profile, nzmax) uint8 1 1 1 1 1 1 1 1 1 1 ... 4 4 4 4 4 4 4 4 4 4
     sal_qc    (profile, nzmax) uint8 1 1 1 1 1 1 1 1 4 1 ... 4 4 4 4 4 4 4 4 4 4
 Attributes: (12/35)
     Conventions:                CF-1.6
     Metadata_Conventions:       Unidata Dataset Discovery v1.0
     featureType:                profile
     cdm_data_type:              Station
     nodc_template_version:      NODC_NetCDF_Profile_Incomplete_Templete_v1.1
     standard_name_vocabulary:   NetCDF Climate and Forecast(CF) Metadata Conv...
     ...                         ...
     keywords:                   OCEAN TEMPERATURE,SALINITY,TURBIDITY,WATER PR...
     acknowledgement:            Kasper, J., CTD measurements collected from s...
     publisher_name:             Tim Whiteaker
     publisher_email:            whiteaker@utexas.edu
     publisher_url:              http://arcticstudies.org/animida_iii
     license:                    Creative Commons Attribution 3.0 United State...}

Closer examination of the data above indicates that missing values in the data are flagged as failing by the QC check, which is why so many values come through as FAIL.
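
One way to tell padding apart from genuine range failures is to label fill values before applying the range test. The sketch below assumes the fill value is -9999, inferred from the -9.999e+03 entries printed above; the helper label and the sample profile are made up for illustration.

```python
import math
from collections import Counter

FILL_VALUE = -9999.0  # assumed from the -9.999e+03 entries in the output


def label(value, fail_span, fill_value=FILL_VALUE):
    """Separate fill values from genuine out-of-range values."""
    if math.isclose(value, fill_value):
        return "missing"
    low, high = fail_span
    return "ok" if low <= value <= high else "fail"


# Made-up salinity profile: two good values, one bad value, two padded slots.
salinity = [24.85, 24.85, 75.0, -9999.0, -9999.0]
counts = Counter(label(value, [-10, 60]) for value in salinity)
print(dict(counts))
```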