Quick Start Demo of Ocean Data Gateway

Goal: make it easy to search for ocean datasets and read them in. The package we’ve written for this is called ocean_data_gateway; what follows is a short demo.

[1]:
import ocean_data_gateway as odg
import pandas as pd
pd.set_option('display.max_rows', 5)

Find Data in a Region

Here we will search for data in the Bering Sea region.

[2]:
kw = {
    "min_lon": -180,
    "max_lon": -158,
    "min_lat": 50,
    "max_lat": 66,
    "min_time": '2021-4-1',
    "max_time": '2021-4-2',
}
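A search region like the one above is easy to get wrong (swapped bounds, a start time after the end time), so it can help to check it before handing it to the gateway. The following is a minimal sketch of such a check. The helper region_kwargs is hypothetical and not part of ocean_data_gateway; it only reproduces the dictionary keys used in this demo and assumes times are given as year-month-day strings.

```python
from datetime import datetime


def region_kwargs(min_lon, max_lon, min_lat, max_lat, min_time, max_time):
    """Validate a bounding box and time range, then return the search dict.

    Hypothetical helper: it builds the same keys as the `kw` dict in this demo.
    """
    if not -180 <= min_lon < max_lon <= 180:
        raise ValueError("need -180 <= min_lon < max_lon <= 180")
    if not -90 <= min_lat < max_lat <= 90:
        raise ValueError("need -90 <= min_lat < max_lat <= 90")

    # Assumes 'YYYY-M-D' style strings, as used in this demo.
    start = datetime.strptime(min_time, "%Y-%m-%d")
    end = datetime.strptime(max_time, "%Y-%m-%d")
    if start >= end:
        raise ValueError("min_time must be earlier than max_time")

    return {
        "min_lon": min_lon,
        "max_lon": max_lon,
        "min_lat": min_lat,
        "max_lat": max_lat,
        "min_time": min_time,
        "max_time": max_time,
    }


# Same Bering Sea region and dates as the demo.
kw_checked = region_kwargs(-180, -158, 50, 66, "2021-4-1", "2021-4-2")
```

The returned dictionary has the same contents as kw, so it could be passed to the gateway in its place.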

All the servers

Set up the search object, data, then run an initial metadata search to find the dataset_ids of the relevant datasets. No variables are specified here, so the search covers all variables.

[3]:
%%time

# set up the search object
data = odg.Gateway(kw=kw, approach='region')

# find dataset_ids to make sure it works
data.dataset_ids[:5]

CPU times: user 6.52 s, sys: 580 ms, total: 7.1 s
Wall time: 2min 5s
[3]:
['yugayu-lake-bethel-ak',
 'noaa_nos_co_ops_9461710',
 'org_mxak_captains_bay',
 'noaa_nos_co_ops_9458917',
 'noaa_nos_co_ops_9461341']

The search checked each of 5 readers for dataset_ids. The total number of datasets found across them is:

[4]:
len(data.dataset_ids)
[4]:
441

This searches through 2 ERDDAP servers (but more can be added by the user), 2 Axiom databases, and any known local files.

Just one server

Since that search took about 2 min just for the dataset_ids, let’s narrow which databases are checked.

[5]:
%%time

# set up the search object, restricted to the IOOS ERDDAP server
data = odg.Gateway(kw=kw, approach='region', readers=odg.erddap, erddap={'known_server': 'ioos'})

# look up dataset_ids
print(data.dataset_ids[:5], len(data.dataset_ids))
['yugayu-lake-bethel-ak', 'noaa_nos_co_ops_9461710', 'org_mxak_captains_bay', 'noaa_nos_co_ops_9458917', 'noaa_nos_co_ops_9461341'] 224
CPU times: user 21.6 ms, sys: 9.78 ms, total: 31.4 ms
Wall time: 978 ms
[6]:
%%time
data.meta
CPU times: user 316 ms, sys: 124 ms, total: 441 ms
Wall time: 10.3 s
[6]:
database download_url info_url is_prediction geospatial_lat_min geospatial_lat_max geospatial_lon_min geospatial_lon_max time_coverage_start time_coverage_end defaultDataQuery subsetVariables keywords id infoUrl institution featureType source sourceUrl variable names
gov_usda_nrcs_sntl_973 http://erddap.sensors.ioos.us/erddap http://erddap.sensors.ioos.us/erddap/tabledap/... http://erddap.sensors.ioos.us/erddap/info/gov_... False 64.53482 64.53482 -163.42140 -163.42140 2000-10-25T21:00:00Z 2021-09-07T22:00:00Z wind_speed_qc_agg,relative_humidity_qc_agg,win... NA NA 104683 https://sensors.ioos.us/#metadata/104683/station SNOTEL TimeSeriesProfile NA https://wcc.sc.egov.usda.gov/nwcc/site?sitenum... None
noaa_nos_co_ops_9462554 http://erddap.sensors.ioos.us/erddap http://erddap.sensors.ioos.us/erddap/tabledap/... http://erddap.sensors.ioos.us/erddap/info/noaa... True 53.61000 53.61000 -167.04500 -167.04500 2015-05-05T13:23:00Z 2021-09-15T08:11:00Z sea_surface_height_amplitude_due_to_geocentric... NA NA 15495 https://sensors.ioos.us/#metadata/15495/station NOAA Center for Operational Oceanographic Prod... TimeSeries NA https://sensors.axds.co/api/ None
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
noaa_nos_co_ops_9461710 http://erddap.sensors.ioos.us/erddap http://erddap.sensors.ioos.us/erddap/tabledap/... http://erddap.sensors.ioos.us/erddap/info/noaa... True 52.23200 52.23200 -174.17260 -174.17260 2006-08-06T23:00:00Z 2021-09-15T12:00:00Z wind_speed_qc_agg,sea_surface_height_amplitude... NA NA 12012 https://sensors.ioos.us/#metadata/12012/station NOAA Center for Operational Oceanographic Prod... TimeSeries NA https://tidesandcurrents.noaa.gov/api/ None
yugayu-lake-bethel-ak http://erddap.sensors.ioos.us/erddap http://erddap.sensors.ioos.us/erddap/tabledap/... http://erddap.sensors.ioos.us/erddap/info/yuga... False 60.79995 60.79995 -161.76575 -161.76575 2020-10-24T18:15:00Z 2021-04-23T18:15:00Z air_temperature,sea_water_temperature,z,time&t... NA NA 105532 https://sensors.ioos.us/#metadata/105532/station Fresh Eyes on Ice TimeSeriesProfile NA https://app.beadedstream.com/projects/7604/sit... None

224 rows × 20 columns
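The metadata table has one row per dataset_id, so ordinary pandas filtering can narrow the list before any data is downloaded. The sketch below assumes data.meta behaves like a regular pandas DataFrame indexed by dataset_id. The stand-in table meta copies a few values from the output above instead of querying a server.

```python
import pandas as pd

# Stand-in for data.meta: a few columns and rows copied from the table above.
meta = pd.DataFrame(
    {
        "is_prediction": [False, True, True, False],
        "geospatial_lat_min": [64.53482, 53.61000, 52.23200, 60.79995],
        "featureType": [
            "TimeSeriesProfile",
            "TimeSeries",
            "TimeSeries",
            "TimeSeriesProfile",
        ],
    },
    index=[
        "gov_usda_nrcs_sntl_973",
        "noaa_nos_co_ops_9462554",
        "noaa_nos_co_ops_9461710",
        "yugayu-lake-bethel-ak",
    ],
)

# Keep only observational datasets (drop rows flagged as predictions).
observed = meta[~meta["is_prediction"]]

# The index holds the dataset_ids that could then be requested one by one.
observed_ids = list(observed.index)
print(observed_ids)
```

Each id in observed_ids could then be used as a key, in the same way as data['noaa_nos_co_ops_nmta2'] in the next cell.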

[7]:
%%time
data['noaa_nos_co_ops_nmta2']
CPU times: user 356 ms, sys: 97.5 ms, total: 454 ms
Wall time: 23.8 s
[7]:
<xarray.Dataset>
Dimensions:                (time: 463, timeseries: 1)
Coordinates:
    latitude               (timeseries) float64 64.5
    longitude              (timeseries) float64 -165.4
  * time                   (time) datetime64[ns] 2021-04-01 ... 2021-04-02T23...
Dimensions without coordinates: timeseries
Data variables:
    station                (timeseries) object 'NMTA2 - 9468756 - Nome, Norto...
    rowSize                (timeseries) int32 488615
    z                      (time, timeseries) float64 0.0 0.0 0.0 ... 0.0 0.0
    air_pressure           (time, timeseries) float64 1.018e+03 ... 998.7
    air_temperature        (time, timeseries) float64 -14.8 -14.8 ... -9.3 -9.3
    sea_water_temperature  (time, timeseries) float64 nan nan nan ... nan nan
    wind_speed_of_gust     (time, timeseries) float64 26.4 24.16 ... 24.16 25.28
    wind_speed             (time, timeseries) float64 9.8 8.2 6.7 ... 8.2 8.8
    wind_from_direction    (time, timeseries) float64 320.0 320.0 ... 60.0 70.0
Attributes: (12/53)
    cdm_data_type:                 TimeSeries
    cdm_timeseries_variables:      station,longitude,latitude
    contributor_email:             ,feedback@axiomdatascience.com
    contributor_name:              World Meteorological Organization (WMO),Ax...
    contributor_role:              contributor,processor
    contributor_role_vocabulary:   NERC
    ...                            ...
    standard_name_vocabulary:      CF Standard Name Table v72
    summary:                       Timeseries data from 'NMTA2 - 9468756 - No...
    time_coverage_end:             2021-09-08T13:30:00Z
    time_coverage_start:           2015-05-05T12:24:00Z
    title:                         NMTA2 - 9468756 - Nome, Norton Sound, AK
    Westernmost_Easting:           -165.43

One variable in one server

[8]:
%%time

# set up the search object for one server and one variable
data = odg.Gateway(kw=kw, approach='region', readers=odg.erddap,
                   erddap={'known_server': 'ioos', 'variables': 'sea_water_temperature'})

# look up dataset_ids
print(data.dataset_ids[:5], len(data.dataset_ids))
['yugayu-lake-bethel-ak', 'noaa_nos_co_ops_9461710', 'noaa_nos_co_ops_atka2', 'shageluk-lake-shageluk-ak', 'noaa_nos_co_ops_9468333'] 26
CPU times: user 84.6 ms, sys: 13.5 ms, total: 98.1 ms
Wall time: 804 ms
[9]:
%%time
data.meta
CPU times: user 37.6 ms, sys: 6.38 ms, total: 43.9 ms
Wall time: 962 ms
[9]:
database download_url info_url is_prediction geospatial_lat_min geospatial_lat_max geospatial_lon_min geospatial_lon_max time_coverage_start time_coverage_end defaultDataQuery subsetVariables keywords id infoUrl institution featureType source sourceUrl variable names
noaa_nos_co_ops_snda2 http://erddap.sensors.ioos.us/erddap http://erddap.sensors.ioos.us/erddap/tabledap/... http://erddap.sensors.ioos.us/erddap/info/noaa... False 55.337000 55.337000 -160.502000 -160.502000 2015-05-05T12:06:00Z 2021-09-08T10:30:00Z air_temperature,wind_speed_of_gust,sea_water_t... NA NA 13824 https://sensors.ioos.us/#metadata/13824/station NOAA Center for Operational Oceanographic Prod... TimeSeries NA https://sensors.axds.co/api/ [sea_water_temperature]
gov_usgs_waterdata_15304000 http://erddap.sensors.ioos.us/erddap http://erddap.sensors.ioos.us/erddap/tabledap/... http://erddap.sensors.ioos.us/erddap/info/gov_... False 61.868744 61.868744 -158.113785 -158.113785 2015-05-05T12:15:00Z 2021-09-08T10:45:00Z river_discharge,lwe_thickness_of_precipitation... NA NA 11626 https://sensors.ioos.us/#metadata/11626/station USGS National Water Information System (NWIS) TimeSeries NA https://sensors.axds.co/api/ [sea_water_temperature]
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
noaa_nos_co_ops_9461710 http://erddap.sensors.ioos.us/erddap http://erddap.sensors.ioos.us/erddap/tabledap/... http://erddap.sensors.ioos.us/erddap/info/noaa... True 52.232000 52.232000 -174.172600 -174.172600 2006-08-06T23:00:00Z 2021-09-15T12:00:00Z wind_speed_qc_agg,sea_surface_height_amplitude... NA NA 12012 https://sensors.ioos.us/#metadata/12012/station NOAA Center for Operational Oceanographic Prod... TimeSeries NA https://tidesandcurrents.noaa.gov/api/ [sea_water_temperature]
yugayu-lake-bethel-ak http://erddap.sensors.ioos.us/erddap http://erddap.sensors.ioos.us/erddap/tabledap/... http://erddap.sensors.ioos.us/erddap/info/yuga... False 60.799950 60.799950 -161.765750 -161.765750 2020-10-24T18:15:00Z 2021-04-23T18:15:00Z air_temperature,sea_water_temperature,z,time&t... NA NA 105532 https://sensors.ioos.us/#metadata/105532/station Fresh Eyes on Ice TimeSeriesProfile NA https://app.beadedstream.com/projects/7604/sit... [sea_water_temperature]

26 rows × 20 columns

[10]:
%%time
data['noaa_nos_co_ops_9459450']
CPU times: user 549 ms, sys: 187 ms, total: 736 ms
Wall time: 12.3 s
[10]:
<xarray.Dataset>
Dimensions:                (time: 486, timeseries: 1)
Coordinates:
    latitude               (timeseries) float64 55.33
    longitude              (timeseries) float64 -160.5
  * time                   (time) datetime64[ns] 2021-04-01 ... 2021-04-02T23...
Dimensions without coordinates: timeseries
Data variables:
    sea_water_temperature  (time, timeseries) float64 3.8 3.8 3.8 ... 3.7 3.7
Attributes: (12/54)
    cdm_data_type:                 TimeSeries
    cdm_timeseries_variables:      station,longitude,latitude
    contributor_email:             feedback@axiomdatascience.com
    contributor_name:              Axiom Data Science
    contributor_role:              processor
    contributor_role_vocabulary:   NERC
    ...                            ...
    station_id:                    12009
    summary:                       Timeseries data from 'Sand Point' (noaa_no...
    time_coverage_end:             2021-09-15T14:00:00Z
    time_coverage_start:           1972-09-10T10:00:00Z
    title:                         Sand Point
    Westernmost_Easting:           -160.5043

Use Local Files

Local files can be passed to the gateway, which uses the Python package intake under the hood to read them. It is set up to recognize csv and netcdf files automatically.

[11]:
filenames = ['/Users/kthyng/Downloads/ANIMIDA_III_BeaufortSea_2014-2015/kasper-netcdf/ANIMctd14.nc',
             '/Users/kthyng/Downloads/Harrison_Bay_CTD_MooringData_2014-2015/Harrison_Bay_data/SBE16plus_01604787_2015_08_09_final.csv']

data = odg.Gateway(readers=odg.local, local={'filenames': filenames})
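As the later cells show, each local file is addressed by its base name (for example data['ANIMctd14.nc']). The sketch below illustrates one plausible way to tell the two supported file types apart, by file suffix. This is an assumption for illustration only and not necessarily how ocean_data_gateway or intake decides; guess_file_kind and the shortened paths are hypothetical.

```python
from pathlib import Path


def guess_file_kind(filename):
    """Classify a file as 'csv' or 'netcdf' from its suffix (illustrative only)."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".csv":
        return "csv"
    if suffix in {".nc", ".nc4", ".netcdf"}:
        return "netcdf"
    raise ValueError(f"unrecognized file type: {filename}")


# Shortened, hypothetical paths that end in the same file names as the demo.
example_files = [
    "some/dir/ANIMctd14.nc",
    "some/dir/SBE16plus_01604787_2015_08_09_final.csv",
]

# Map each base name (the key used to index the gateway) to its guessed kind.
kinds = {Path(f).name: guess_file_kind(f) for f in example_files}
print(kinds)
```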
[12]:
data.meta
[12]:
coords geospatial_lat_min geospatial_lon_max lat_variable catalog_dir time_coverage_end time_variable download_url geospatial_lon_min geospatial_lat_max time_coverage_start lon_variable variables
ANIMctd14.nc [time, lat, lon, pressure] 69.850874 -141.717438 lat /Users/kthyng/.ocean_data_gateway/catalogs/ 2014-08-07T21:35:54.000004381 time /Users/kthyng/Downloads/ANIMIDA_III_BeaufortSe... -152.581114 71.488255 2014-07-31T15:33:33.999999314 lon [station_name, sal, tem, fluoro, turbidity, PA...
SBE16plus_01604787_2015_08_09_final.csv NaN 70.6349 -150.237 NaN /Users/kthyng/.ocean_data_gateway/catalogs/ 2015-08-09T06:00:05Z NaN /Users/kthyng/Downloads/Harrison_Bay_CTD_Moori... -150.237 70.6349 2014-08-01T12:00:05Z NaN [time, latitude, longitude, water_depth, Condu...
[13]:
data['ANIMctd14.nc']
[13]:
<xarray.Dataset>
Dimensions:              (nzmax: 1587, profile: 57)
Coordinates:
    time                 (profile) datetime64[ns] 2014-08-07T02:02:34.0000028...
    lat                  (profile) float64 71.27 71.23 71.18 ... 70.45 70.46
    lon                  (profile) float64 -152.2 -152.3 ... -145.8 -145.8
    pressure             (profile, nzmax) float64 2.187 2.399 ... -9.999e+03
Dimensions without coordinates: nzmax, profile
Data variables:
    station_name         (profile) |S12 b'1.01        ' ... b'T-XA        '
    sal                  (profile, nzmax) float64 24.85 24.85 ... -9.999e+03
    tem                  (profile, nzmax) float64 1.625 1.589 ... -9.999e+03
    fluoro               (profile, nzmax) float64 0.6842 0.7452 ... -9.999e+03
    turbidity            (profile, nzmax) float64 0.604 0.6895 ... -9.999e+03
    PAR                  (profile, nzmax) float64 9.596 9.097 ... -9.999e+03
    platform_variable    float64 9.969e+36
    instrument_variable  float64 9.969e+36
    crs                  float64 9.969e+36
Attributes: (12/35)
    Conventions:                CF-1.6
    Metadata_Conventions:       Unidata Dataset Discovery v1.0
    featureType:                profile
    cdm_data_type:              Station
    nodc_template_version:      NODC_NetCDF_Profile_Incomplete_Templete_v1.1
    standard_name_vocabulary:   NetCDF Climate and Forecast(CF) Metadata Conv...
    ...                         ...
    keywords:                   OCEAN TEMPERATURE,SALINITY,TURBIDITY,WATER PR...
    acknowledgement:            Kasper, J., CTD measurements collected from s...
    publisher_name:             Tim Whiteaker
    publisher_email:            whiteaker@utexas.edu
    publisher_url:              http://arcticstudies.org/animida_iii
    license:                    Creative Commons Attribution 3.0 United State...
[14]:
data['SBE16plus_01604787_2015_08_09_final.csv']
[14]:
      time                  latitude  longitude  water_depth  Conductivity_[S/m]  Pressure_[db]  Temperature_ITS90_[deg C]  Salinity_Practical_[PSU]  Voltage0_[volts]  Instrument_Time_[juliandays]  flag
0     2014-08-01T12:00:05Z  70.6349   -150.237   13.0         2.495646            12.687         -1.4619                    31.0905                   0.3091            213.500058                    0.0
1     2014-08-01T13:00:05Z  70.6349   -150.237   13.0         2.495454            12.699         -1.4595                    31.0854                   0.3265            213.541725                    0.0
...   ...                   ...       ...        ...          ...                 ...            ...                        ...                       ...               ...                           ...
8945  2015-08-09T05:00:05Z  70.6349   -150.237   13.0         2.591448            12.777         0.3619                     30.5086                   0.3873            586.208391                    0.0
8946  2015-08-09T06:00:05Z  70.6349   -150.237   13.0         2.585462            12.754         0.2862                     30.5062                   0.2441            586.250058                    0.0

8947 rows × 11 columns

Data QC

You can lightly QC the data by calling data.qc(), as demonstrated here. Pass verbose=True to print a summary of the results. For each variable that is checked, a companion variable holding the QC flags ('*_qc') is added to the dataset. To use the QC function, you must supply criteria and var_def when you set up the initial search. More information is under “Details”.

[15]:
criteria = {'temp': {'name': 'temperature', 'standard_name': '.*temperature'},
            'salt': {'name': 'salinity', 'standard_name': '.*salinity'}}
var_def = {'temp': {'units': 'degree_Celsius',
           'fail_span': [-100, 100],
           'suspect_span': [-10, 40]},
           'salt': {'units': 'psu', 'fail_span': [-10, 60], 'suspect_span': [-1, 45]}}

filenames = ['/Users/kthyng/Downloads/ANIMIDA_III_BeaufortSea_2014-2015/kasper-netcdf/ANIMctd14.nc']

data = odg.Gateway(criteria=criteria, var_def=var_def, readers=odg.local, local={'filenames': filenames})
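The criteria dictionary ties a short nickname such as 'temp' to regular expressions that are compared against a variable's name or its standard_name attribute. The sketch below shows how such matching could work. It is an assumption for illustration, not the package's actual implementation: match_criteria is a hypothetical helper, re.match is one possible matching rule, and the standard_name values in the stand-in variables are made up.

```python
import re

criteria = {
    "temp": {"name": "temperature", "standard_name": ".*temperature"},
    "salt": {"name": "salinity", "standard_name": ".*salinity"},
}


def match_criteria(variables, criteria):
    """Return {nickname: [variable names]} for variables matching any pattern."""
    matches = {key: [] for key in criteria}
    for var_name, attrs in variables.items():
        # The strings each pattern is tested against, keyed by criteria field.
        candidates = {
            "name": var_name,
            "standard_name": attrs.get("standard_name", ""),
        }
        for key, patterns in criteria.items():
            if any(
                re.match(pattern, candidates.get(field, ""))
                for field, pattern in patterns.items()
            ):
                matches[key].append(var_name)
    return matches


# Stand-in variables; the standard_name values here are hypothetical.
variables = {
    "tem": {"standard_name": "sea_water_temperature"},
    "sal": {"standard_name": "sea_water_practical_salinity"},
    "PAR": {"standard_name": "downwelling_photosynthetic_photon_flux_in_sea_water"},
}

matched = match_criteria(variables, criteria)
print(matched)
```

In this sketch the short names tem and sal do not match the name patterns, so they are picked up through standard_name alone, while PAR matches neither nickname.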
[16]:
data['ANIMctd14.nc']
[16]:
<xarray.Dataset>
Dimensions:              (nzmax: 1587, profile: 57)
Coordinates:
    time                 (profile) datetime64[ns] 2014-08-07T02:02:34.0000028...
    lat                  (profile) float64 71.27 71.23 71.18 ... 70.45 70.46
    lon                  (profile) float64 -152.2 -152.3 ... -145.8 -145.8
    pressure             (profile, nzmax) float64 2.187 2.399 ... -9.999e+03
Dimensions without coordinates: nzmax, profile
Data variables:
    station_name         (profile) |S12 b'1.01        ' ... b'T-XA        '
    sal                  (profile, nzmax) float64 24.85 24.85 ... -9.999e+03
    tem                  (profile, nzmax) float64 1.625 1.589 ... -9.999e+03
    fluoro               (profile, nzmax) float64 0.6842 0.7452 ... -9.999e+03
    turbidity            (profile, nzmax) float64 0.604 0.6895 ... -9.999e+03
    PAR                  (profile, nzmax) float64 9.596 9.097 ... -9.999e+03
    platform_variable    float64 9.969e+36
    instrument_variable  float64 9.969e+36
    crs                  float64 9.969e+36
Attributes: (12/35)
    Conventions:                CF-1.6
    Metadata_Conventions:       Unidata Dataset Discovery v1.0
    featureType:                profile
    cdm_data_type:              Station
    nodc_template_version:      NODC_NetCDF_Profile_Incomplete_Templete_v1.1
    standard_name_vocabulary:   NetCDF Climate and Forecast(CF) Metadata Conv...
    ...                         ...
    keywords:                   OCEAN TEMPERATURE,SALINITY,TURBIDITY,WATER PR...
    acknowledgement:            Kasper, J., CTD measurements collected from s...
    publisher_name:             Tim Whiteaker
    publisher_email:            whiteaker@utexas.edu
    publisher_url:              http://arcticstudies.org/animida_iii
    license:                    Creative Commons Attribution 3.0 United State...
[17]:
data_qc = data.qc(verbose=True)
ANIMctd14.nc
tem_qc
Flag == 4 (FAIL): 74825
Flag == 1 (GOOD): 15634
Flag == 9 (MISSING): 0
Flag == 3 (SUSPECT): 0
Flag == 2 (UNKNOWN): 0
sal_qc
Flag == 4 (FAIL): 75119
Flag == 1 (GOOD): 15340
Flag == 9 (MISSING): 0
Flag == 3 (SUSPECT): 0
Flag == 2 (UNKNOWN): 0
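The large FAIL counts are consistent with a range check: the dataset display shows padding values of -9.999e+03, which lie outside the fail_span of [-100, 100] given for temp. The sketch below is a minimal pure-Python range test under that assumption. range_flags is a hypothetical helper, not part of ocean_data_gateway, and the inclusive bounds are a guess; only the flag numbers and the spans come from this demo.

```python
import math

# Flag values as printed in the QC summary above.
GOOD, UNKNOWN, SUSPECT, FAIL, MISSING = 1, 2, 3, 4, 9


def range_flags(values, fail_span, suspect_span):
    """Flag each value by comparing it with the fail and suspect spans."""
    flags = []
    for value in values:
        if value is None or math.isnan(value):
            flags.append(MISSING)
        elif not fail_span[0] <= value <= fail_span[1]:
            flags.append(FAIL)       # outside the physically plausible range
        elif not suspect_span[0] <= value <= suspect_span[1]:
            flags.append(SUSPECT)    # plausible, but outside the expected range
        else:
            flags.append(GOOD)
    return flags


# 1.625 and 1.589 appear in the 'tem' display, -9999.0 is the padding value,
# and 45.0 and NaN are made-up values that exercise the other branches.
tem = [1.625, 1.589, 45.0, -9999.0, float("nan")]
tem_flags = range_flags(tem, fail_span=(-100, 100), suspect_span=(-10, 40))
print(tem_flags)
```

Under this reading, padded entries would be counted as FAIL rather than MISSING unless they are converted to NaN before the check.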
[18]:
data_qc['ANIMctd14.nc']
[18]:
<xarray.Dataset>
Dimensions:   (nzmax: 1587, profile: 57)
Coordinates:
    time      (profile) datetime64[ns] 2014-08-07T02:02:34.000002890 ... 2014...
    lat       (profile) float64 71.27 71.23 71.18 71.12 ... 70.38 70.45 70.46
    lon       (profile) float64 -152.2 -152.3 -152.4 ... -146.0 -145.8 -145.8
    pressure  (profile, nzmax) float64 2.187 2.399 ... -9.999e+03 -9.999e+03
Dimensions without coordinates: nzmax, profile
Data variables:
    tem       (profile, nzmax) float64 1.625 1.589 ... -9.999e+03 -9.999e+03
    sal       (profile, nzmax) float64 24.85 24.85 ... -9.999e+03 -9.999e+03
    tem_qc    (profile, nzmax) uint8 1 1 1 1 1 1 1 1 1 1 ... 4 4 4 4 4 4 4 4 4 4
    sal_qc    (profile, nzmax) uint8 1 1 1 1 1 1 1 1 4 1 ... 4 4 4 4 4 4 4 4 4 4
Attributes: (12/35)
    Conventions:                CF-1.6
    Metadata_Conventions:       Unidata Dataset Discovery v1.0
    featureType:                profile
    cdm_data_type:              Station
    nodc_template_version:      NODC_NetCDF_Profile_Incomplete_Templete_v1.1
    standard_name_vocabulary:   NetCDF Climate and Forecast(CF) Metadata Conv...
    ...                         ...
    keywords:                   OCEAN TEMPERATURE,SALINITY,TURBIDITY,WATER PR...
    acknowledgement:            Kasper, J., CTD measurements collected from s...
    publisher_name:             Tim Whiteaker
    publisher_email:            whiteaker@utexas.edu
    publisher_url:              http://arcticstudies.org/animida_iii
    license:                    Creative Commons Attribution 3.0 United State...