Detailed usage

[1]:
import ocean_data_gateway as odg
import pandas as pd
pd.set_option('display.max_rows', 5)

General Options

[2]:
kw = {
    "min_lon": -124.0,
    "max_lon": -123.0,
    "min_lat": 39.0,
    "max_lat": 40.0,
    "min_time": '2021-4-1',
    "max_time": '2021-4-2',
}

Parallel

You can control readers individually as needed. For example, the keyword parallel, which every reader accepts, can be set per reader (in case you want different values for different readers) or for all readers at once by including it at the top level of kwargs. Parallel runs use the joblib Parallel and delayed utilities with multiple processes, so loops run on different cores.

[3]:
kwargs = {
    'kw': kw,
    'approach': 'region',
    'parallel': True,
    'erddap': {
        'known_server': 'ioos',
        # 'parallel': False,
        'variables': 'salinity',
    },
    'axds': {
        'catalog_name': None,
        # 'parallel': True,
        'axds_type': 'platform2',
        'variables': 'Salinity',
    },
}
data = odg.Gateway(**kwargs)
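
The difference between a general option and a per-reader option can be sketched in plain Python. This is only an illustration, not the library's code: the helper resolve_reader_option is hypothetical, and the assumption that a reader-level value takes precedence over the general one when both are given is mine.

```python
def resolve_reader_option(kwargs, reader_name, option, default=None):
    """Hypothetical helper: reader-level value first, then the general one."""
    reader_kwargs = kwargs.get(reader_name, {})
    if option in reader_kwargs:
        return reader_kwargs[option]
    return kwargs.get(option, default)


kwargs = {
    "approach": "region",
    "parallel": True,  # general value, seen by every reader
    "erddap": {"known_server": "ioos", "parallel": False},  # reader-level override
    "axds": {"axds_type": "platform2"},  # no override, falls back to general value
}

erddap_parallel = resolve_reader_option(kwargs, "erddap", "parallel")
axds_parallel = resolve_reader_option(kwargs, "axds", "parallel")
print(erddap_parallel, axds_parallel)
```

Under these assumptions the ERDDAP reader would run serially while the Axiom reader keeps the general setting.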

Reader Choice

Select readers as follows: odg.erddap connects to ERDDAP servers, odg.axds connects to Axiom databases, and odg.local enables easy local file read-in. If you don’t input any reader, all of them are used. Alternatively, you can input a subset.

[4]:
readers = [odg.erddap,
           odg.axds,
           odg.local]

Use only ERDDAP reader and Axiom reader:

[5]:
data = odg.Gateway(kw=kw, approach='region',
                   readers=[odg.erddap,
                            odg.axds])

Region

Search by time/space region.

All variables

Don’t input anything with the variables keyword, or use 'variables': None:

[6]:
kwargs = {
          'kw': kw,
          'approach': 'region',
          'readers': [odg.erddap,
                      odg.axds],
          'variables': None
}
data = odg.Gateway(**kwargs)

By variable(s)

If no variables are specified for a given reader, datasets with any variables will be returned from a search. This is most relevant for a region search.

However, if you want to specify a variable or variables, keep in mind that different readers have different names for variables, which is why you can’t just input a variable name for all the readers.

This is currently only relevant for the ERDDAP and Axiom readers (the local reader retains all variables in local files). For the Axiom reader, type platform2 searches by variable using the known, available variable names, while type layer_group uses the query method for variable searching.

Let’s say you want to search for salinity. You can input the base of the word as variables (“sal” or “salinity”, but not “salt”, since the checker looks for names containing the whole input string and “salt” appears in few, if any, variable names), and the code will check whether the input exactly matches a known variable name. If it does not, it throws an error with suggestions. The match is not chosen automatically because, for example, a search for “salinity” also matches “soil_salinity”. You need to do this separately for each known_server of the erddap reader. For the axds reader, specific variables are only used to filter for axds_type='platform2'; any variable names can be input for axds_type='layer_group'.
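
The two behaviors described above (a substring search that suggests names, and a check that insists on exact matches) can be sketched without the library. The helpers search_names and check_names are hypothetical stand-ins, not the readers' methods, and the counts are copied from the IOOS search output shown later in this document. The sketch raises ValueError, whereas the library raises an AssertionError.

```python
# A few IOOS variable names and counts, copied from the search output in this document.
known = {
    "salinity": 954,
    "salinity_qc": 954,
    "sea_water_practical_salinity": 778,
    "soil_salinity": 622,
}


def search_names(known, text):
    """Hypothetical helper: substring search, most-used names first."""
    hits = [(name, count) for name, count in known.items() if text in name]
    return sorted(hits, key=lambda item: -item[1])


def check_names(known, variables):
    """Hypothetical helper: raise unless every requested name matches exactly."""
    missing = [v for v in variables if v not in known]
    if missing:
        suggestions = {v: [n for n, _ in search_names(known, v)] for v in missing}
        raise ValueError(f"Not exact matches: {missing}. Try: {suggestions}")


check_names(known, ["salinity"])  # exact match, so nothing happens

try:
    check_names(known, ["sal"])  # only a partial match, so this raises
except ValueError as err:
    message = str(err)
    print(message)
```

Note that "soil_salinity" shows up among the suggestions for "sal", which is why the final choice is left to the user.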

[7]:
kwargs = {
          'kw': kw,
          'approach': 'region',
          'stations': '8771972',
          'readers': [odg.erddap,
                      odg.axds],

          'erddap': {
                          'known_server': ['coastwatch','ioos'],
                           'variables': [['sal'],
                                         ['sal']]
          },
          'axds': {
                          'axds_type': ['platform2','layer_group'],
                         'variables': ['sal','salinity']},
}


data = odg.Gateway(**kwargs)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-7-2c87bad22b62> in <module>
     17
     18
---> 19 data = odg.Gateway(**kwargs)

~/projects/ocean_data_gateway/ocean_data_gateway/gateway.py in __init__(self, **kwargs)
    103
    104         self.kwargs = kwargs
--> 105         self.sources
    106
    107     @property

~/projects/ocean_data_gateway/ocean_data_gateway/gateway.py in sources(self)
    214
    215                     if self.kwargs_all["approach"] == "region":
--> 216                         reader = source.region(args_in)
    217                     elif self.kwargs_all["approach"] == "stations":
    218                         reader = source.stations(args_in)

~/projects/ocean_data_gateway/ocean_data_gateway/readers/erddap.py in __init__(self, kwargs)
    757         # make sure variables are on parameter list
    758         if variables is not None:
--> 759             self.check_variables(variables)
    760         self.variables = variables
    761

~/projects/ocean_data_gateway/ocean_data_gateway/readers/erddap.py in check_variables(self, variables, verbose)
    696                      \nor search parameter group values with `ErddapReader().search_variables({variables})`.\
    697                      \n\n Try some of the following variables:\n{str(self.search_variables(variables))}"  # \
--> 698         assert condition, assertion
    699
    700         if condition and verbose:

AssertionError: The input variables are not exact matches to ok variables for known_server ioos.
Check all parameter group values with `ErddapReader().all_variables()`
or search parameter group values with `ErddapReader().search_variables(['sal'])`.

 Try some of the following variables:
                                              count
variable
salinity                                        954
salinity_qc                                     954
...                                             ...
sea_water_practical_salinity_4161sc_a_qc_agg      1
sea_water_practical_salinity_10091sc_a            1

[1148 rows x 1 columns]

You can do this process iteratively, trying out variables for each of the ERDDAP and Axiom readers until you get what you want. Once you have selected variables that match, the code won’t complain anymore.

[8]:
kwargs = {
          'kw': kw,
          'approach': 'region',
          'readers': [odg.erddap,
                      odg.axds],

          'erddap': {
                          'known_server': ['coastwatch','ioos'],
                           'variables': [['salinity', 'sea_water_salinity'],
                                         ['salinity', 'sea_water_practical_salinity']]
          },
          'axds': {
                          'axds_type': ['platform2','layer_group'],
                         'variables': ['Salinity','Salinity']},
}

data = odg.Gateway(**kwargs)

Actions with variables

Alternatively, you can proactively search for variables for each reader. Currently the ways to call the individual libraries aren’t pretty, but they work. Note that the number of times a variable is used in the system is included under “count” so you can see which names are popular (many are not widely used).

All available variables

Return all variables for the two ERDDAP known_servers, then for the Axiom reader axds_type='platform2'.

[9]:
odg.erddap.ErddapReader(known_server='coastwatch').all_variables().head()
[9]:
              count
variable
abund_m3          2
ac_line           1
ac_sta            1
adg_412           8
adg_412_bias      8
[10]:
odg.erddap.ErddapReader(known_server='ioos').all_variables().head()
[10]:
                                   count
variable
air_pressure                        4028
air_pressure_10011met_a                2
air_pressure_10311ahlm_a               2
air_pressure_10311ahlm_a_qc_agg        1
air_pressure_10311ahlm_a_qc_tests      1

The Axiom reader variables are for axds_type='platform2', not axds_type='layer_group', since the latter are more specialized grid products that don’t conform well to a common set of variable names.

[11]:
odg.axds.AxdsReader(axds_type='platform2').all_variables().head()
[11]:
                                                 count
variable
Ammonium                                            23
Atmospheric Pressure: Air Pressure at Sea Level    362
Atmospheric Pressure: Barometric Pressure         4152
Backscatter Intensity                              286
Battery                                           2705

All available variables, sorted by count

[12]:
odg.erddap.ErddapReader(known_server='coastwatch').search_variables('').head()
[12]:
           count
variable
time        1637
longitude   1352
latitude    1352
altitude     725
sst          208
[13]:
odg.erddap.ErddapReader(known_server='ioos').search_variables('').head()
[13]:
           count
variable
time       38331
longitude  38331
latitude   38331
z          37377
station    37377
[14]:
odg.axds.AxdsReader(axds_type='platform2').search_variables('').head()
[14]:
                              count
variable
Stream Height                 19758
Water Surface above Datum     19489
Stream Flow                   15203
Temperature: Air Temperature   8369
Precipitation                  7364

Variables search, sorted by count

[15]:
odg.erddap.ErddapReader(known_server='coastwatch').search_variables('sal').head()
[15]:
                        count
variable
salinity                   73
salt                        4
sea_water_salinity          4
surface_salinity_trend      2
bucket_salinity             1
[16]:
odg.erddap.ErddapReader(known_server='ioos').search_variables('sal').head()
[16]:
                              count
variable
salinity                        954
salinity_qc                     954
sea_water_practical_salinity    778
soil_salinity_qc_agg            622
soil_salinity                   622
[17]:
odg.axds.AxdsReader(axds_type='platform2').search_variables('sal').head()
[17]:
               count
variable
Salinity        3204
Soil Salinity    622

Check variables

Finally, you can check that you have good variables. No news is good news here: the check returns nothing when the names are valid. As a reminder, there is no check for the axds reader with axds_type='layer_group', because those variables are searched for in the database by name as a query.

[18]:
odg.erddap.ErddapReader(known_server='coastwatch').check_variables(['salinity', 'sea_water_salinity'])
[19]:
odg.erddap.ErddapReader(known_server='ioos').check_variables(['salinity', 'sea_water_practical_salinity'])
[20]:
odg.axds.AxdsReader(axds_type='platform2').check_variables('Salinity')

Or, all together in one call

[21]:
kwargs = {
          'kw': kw,
          'approach': 'region',
          'readers': [odg.erddap,
                      odg.axds],

          'erddap': {
                          'known_server': ['coastwatch','ioos'],
                           'variables': [['salinity', 'sea_water_salinity'],
                                         ['salinity', 'sea_water_practical_salinity']]
          },
          'axds': {
                          'axds_type': ['platform2','layer_group'],
                         'variables': ['Salinity',
                                       'salinity'  # this one can be called anything that might make a match
                                      ]},
}

data = odg.Gateway(**kwargs)
[22]:
data.dataset_ids
[22]:
[[],
 [],
 [],
 ['5104d464-8a30-4720-aeb7-57e801844e6e',
  'd359748a-fe78-11e7-8128-0023aeec7b98',
  '99737f5d-c984-4bf0-82cd-18508fea413f',
  '3261285c-e3c9-45fd-b777-e6d681a3eaad']]

Stations

You can search either by a general station name or by the specific database dataset_id, if you know it (from a previous search, for example).

By station name

In the case that you know names of stations, but they might not be the names in the particular databases, you can use this approach.

In the following example, I use some station IDs I know off the top of my head. Note that the dataset_ids are returned as a list of lists, ordered by the readers being used (ERDDAP IOOS, ERDDAP Coastwatch, Axiom platform2, Axiom layer_group, local reader). The module will check all of the readers for the station names.

There are 2 listings for the station “SFBOFS” because there are two listings in the database: one for unstructured grid output and one for interpolated structured grid output. For axds_type='layer_group' searches, the “dataset_id” is the module uuid, not the layer_group uuid.

[23]:
kwargs = {
          'approach': 'stations',
          'stations': ['8771972','SFBOFS','42020','TABS_B']
}
data = odg.Gateway(**kwargs)
[24]:
data.dataset_ids
[24]:
[['tabs_b', 'wmo_42020', 'noaa_nos_co_ops_8771972'],
 [],
 [],
 ['03158b5d-f712-45f2-b05d-e4954372c1ce',
  '794f7bba-b3d2-4da8-8465-408c27ab433b'],
 []]
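
Because the result is a list of lists in reader order, it can help to pair each sublist with a reader label. A minimal sketch, using the output above as literal data: the first four labels follow the source names shown elsewhere in this document, and "local" is an assumed label for the local reader.

```python
# Output copied from the `data.dataset_ids` cell above.
dataset_ids = [
    ["tabs_b", "wmo_42020", "noaa_nos_co_ops_8771972"],
    [],
    [],
    ["03158b5d-f712-45f2-b05d-e4954372c1ce",
     "794f7bba-b3d2-4da8-8465-408c27ab433b"],
    [],
]

# Reader order as stated in the text; "local" is an assumed label.
reader_names = [
    "erddap_ioos",
    "erddap_coastwatch",
    "axds_platform2",
    "axds_layer_group",
    "local",
]

# Pair each reader with its ids and keep only readers that found something.
ids_by_reader = dict(zip(reader_names, dataset_ids))
found = {name: ids for name, ids in ids_by_reader.items() if ids}
for name, ids in found.items():
    print(name, len(ids))
```

With a live Gateway object, the labels could presumably be taken from the name attribute of each entry in data.sources instead of being typed by hand.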

By Dataset ID

Once we know the database dataset_ids, we can use them directly for future searches. Note that they need to be associated with the correct reader/database, as shown in the call below.

[25]:
kwargs = {
          'approach': 'stations',
          'erddap': {
                          'known_server': 'ioos',
                           'dataset_ids': [['tabs_b', 'wmo_42020', 'noaa_nos_co_ops_8771972']]
          },
          'axds': {
                          'axds_type': 'layer_group',
                         'dataset_ids': '03158b5d-f712-45f2-b05d-e4954372c1ce'},

}
data = odg.Gateway(**kwargs)
[26]:
data.dataset_ids
[26]:
[['tabs_b', 'wmo_42020', 'noaa_nos_co_ops_8771972'],
 ['03158b5d-f712-45f2-b05d-e4954372c1ce'],
 []]

For axds_type='layer_group', the Axiom module uuids should be used as dataset_ids (these are returned from the search above in “By station name”). If for some reason you have an Axiom ‘layer_group’ uuid specifically, you should input that as a “station”. In both cases, the module uuid is returned as the dataset_id because that is how ‘layer_group’ information is organized.

[27]:
# Example with module uuid input as dataset_id for 'layer_group'
kwargs = {
          'approach': 'stations',
            'axds': {
                'axds_type': 'layer_group',
                'dataset_ids': '03158b5d-f712-45f2-b05d-e4954372c1ce'}
}
data = odg.Gateway(**kwargs)
data.dataset_ids
[27]:
[[], [], ['03158b5d-f712-45f2-b05d-e4954372c1ce'], []]
[28]:
# Example with layer_group uuid input as station for 'layer_group'
kwargs = {
          'approach': 'stations',
            'axds': {
                'axds_type': 'layer_group',
              'stations': '04784baa-6be8-4aa7-b039-269f35e92e91'}
}
data = odg.Gateway(**kwargs)
data.dataset_ids
[28]:
[[], [], ['03158b5d-f712-45f2-b05d-e4954372c1ce'], []]

Include Time Range

By default, the full available time range will be returned for each dataset unless the user specifies one to narrow the returned datasets in time.

The data object defined in the previous cell uses a long time range for every source. You can tell that 4 sources were considered because the list returned in the previous code cell has 4 elements.

[29]:
data.sources[0].kw
[29]:
{'min_time': '1900-01-01', 'max_time': '2100-12-31'}

A shorter time range is shown in the following since it is specified.

[30]:
kwargs = {
          'kw': {'min_time': '2017-1-1',
                 'max_time': '2017-1-2'},
          'approach': 'stations',
          'stations': ['8771972']
}
data = odg.Gateway(**kwargs)
data.sources[0].kw
[30]:
{'min_time': '2017-1-1', 'max_time': '2017-1-2'}
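
The two outputs above are consistent with a wide default range that user input replaces. A minimal sketch of that idea, assuming a simple dictionary merge: the helper effective_kw is hypothetical, and whether the library fills in a default when only one bound is given is an assumption not confirmed here.

```python
# Default range copied from the `data.sources[0].kw` output shown earlier.
DEFAULT_KW = {"min_time": "1900-01-01", "max_time": "2100-12-31"}


def effective_kw(user_kw=None):
    """Hypothetical helper: start from the defaults, then apply user bounds."""
    merged = dict(DEFAULT_KW)
    merged.update(user_kw or {})
    return merged


default_range = effective_kw()
narrow_range = effective_kw({"min_time": "2017-1-1", "max_time": "2017-1-2"})
print(default_range)
print(narrow_range)
```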

Reader Options

ERDDAP Reader

By default, the Gateway class will use erddap with two known servers: IOOS and Coastwatch.

[31]:
kwargs = {
          'kw': kw,
          'approach': 'region',
          'readers': [odg.erddap]
}
data = odg.Gateway(**kwargs)
data.sources[0].name, data.sources[1].name
[31]:
('erddap_ioos', 'erddap_coastwatch')

Choose one known server

The user can specify to use just one of these:

[32]:
kwargs = {
          'kw': kw,
          'approach': 'region',
          'readers': [odg.erddap],
          'erddap': {
                      'known_server': ['ioos'],  # or 'coastwatch'
          }
}
data = odg.Gateway(**kwargs)
data.sources[0].name
[32]:
'erddap_ioos'

New ERDDAP Server

You can give the necessary information to use a different ERDDAP server.

[33]:
kwargs = {
          'kw': kw,
          'approach': 'region',
          'readers': [odg.erddap],
            'erddap': {
                'known_server': 'ifremer',
                'protocol': 'tabledap',
                'server': 'http://www.ifremer.fr/erddap'
            }
}
data = odg.Gateway(**kwargs)
[34]:
data.dataset_ids
[34]:
[['OceanGlidersGDACTrajectories',
  'ArgoFloats-synthetic-BGC',
  'ArgoFloats',
  'copernicus-fos']]

AXDS Reader

By default, the Gateway class will use axds with two types of data: ‘platform2’ (like gliders) and ‘layer_group’ (model output, gridded products).

[35]:
kwargs = {
          'kw': kw,
          'approach': 'region',
          'readers': [odg.axds]
}
data = odg.Gateway(**kwargs)
data.sources[0].name, data.sources[1].name
[35]:
('axds_platform2', 'axds_layer_group')

Specify AXDS Type

The user can specify to use just one of these:

[36]:
kwargs = {
          'kw': kw,
          'approach': 'region',
          'readers': [odg.axds],
          'axds': {
                          'axds_type': 'platform2',  # or 'layer_group'
          }
}
data = odg.Gateway(**kwargs)
data.sources[0].name
[36]:
'axds_platform2'

Local Files

I can’t remember the process by which I got these files from a portal now, but they are just meant to be sample files anyway. Hopefully this will work reasonably well with other files too.

The region and stations approaches are less useful for local files, since a user who already knows which files they want simply inputs the filenames. The approaches could be useful if the user has many files somewhere, or an existing catalog, and wants to point to that and have the code filter it down. That code is not in place but could be added if this is a good use case.

So it currently doesn’t matter which approach is used for local files. There is a default kw and region if nothing is input, and in this case that is fine since neither is used.

[37]:
filenames = ['/Users/kthyng/Downloads/ANIMIDA_III_BeaufortSea_2014-2015/kasper-netcdf/ANIMctd14.nc',
             '/Users/kthyng/Downloads/Harrison_Bay_CTD_MooringData_2014-2015/Harrison_Bay_data/SBE16plus_01604787_2015_08_09_final.csv']

data = odg.Gateway(readers=odg.local, local={'filenames': filenames})

You can look at the metadata or the data:

[38]:
data.meta[0]
[38]:
time_variable geospatial_lon_min geospatial_lat_max variables geospatial_lat_min catalog_dir lon_variable time_coverage_end download_url coords geospatial_lon_max time_coverage_start lat_variable
ANIMctd14.nc time -152.581114 71.488255 [station_name, sal, tem, fluoro, turbidity, PA... 69.850874 /Users/kthyng/.ocean_data_gateway/catalogs/ lon 2014-08-07T21:35:54.000004381 /Users/kthyng/Downloads/ANIMIDA_III_BeaufortSe... [time, lat, lon, pressure] -141.717438 2014-07-31T15:33:33.999999314 lat
SBE16plus_01604787_2015_08_09_final.csv NaN -150.237 70.6349 [time, latitude, longitude, water_depth, Condu... 70.6349 /Users/kthyng/.ocean_data_gateway/catalogs/ NaN 2015-08-09T06:00:05Z /Users/kthyng/Downloads/Harrison_Bay_CTD_Moori... NaN -150.237 2014-08-01T12:00:05Z NaN
[39]:
data.data[0]()['ANIMctd14.nc']
[39]:
<xarray.Dataset>
Dimensions:              (nzmax: 1587, profile: 57)
Coordinates:
    time                 (profile) datetime64[ns] 2014-08-07T02:02:34.0000028...
    lat                  (profile) float64 71.27 71.23 71.18 ... 70.45 70.46
    lon                  (profile) float64 -152.2 -152.3 ... -145.8 -145.8
    pressure             (profile, nzmax) float64 2.187 2.399 ... -9.999e+03
Dimensions without coordinates: nzmax, profile
Data variables:
    station_name         (profile) |S12 b'1.01        ' ... b'T-XA        '
    sal                  (profile, nzmax) float64 24.85 24.85 ... -9.999e+03
    tem                  (profile, nzmax) float64 1.625 1.589 ... -9.999e+03
    fluoro               (profile, nzmax) float64 0.6842 0.7452 ... -9.999e+03
    turbidity            (profile, nzmax) float64 0.604 0.6895 ... -9.999e+03
    PAR                  (profile, nzmax) float64 9.596 9.097 ... -9.999e+03
    platform_variable    float64 9.969e+36
    instrument_variable  float64 9.969e+36
    crs                  float64 9.969e+36
Attributes: (12/35)
    Conventions:                CF-1.6
    Metadata_Conventions:       Unidata Dataset Discovery v1.0
    featureType:                profile
    cdm_data_type:              Station
    nodc_template_version:      NODC_NetCDF_Profile_Incomplete_Templete_v1.1
    standard_name_vocabulary:   NetCDF Climate and Forecast(CF) Metadata Conv...
    ...                         ...
    keywords:                   OCEAN TEMPERATURE,SALINITY,TURBIDITY,WATER PR...
    acknowledgement:            Kasper, J., CTD measurements collected from s...
    publisher_name:             Tim Whiteaker
    publisher_email:            whiteaker@utexas.edu
    publisher_url:              http://arcticstudies.org/animida_iii
    license:                    Creative Commons Attribution 3.0 United State...
[40]:
data.data[0]()['SBE16plus_01604787_2015_08_09_final.csv']
[40]:
time latitude longitude water_depth Conductivity_[S/m] Pressure_[db] Temperature_ITS90_[deg C] Salinity_Practical_[PSU] Voltage0_[volts] Instrument_Time_[juliandays] flag
0 2014-08-01T12:00:05Z 70.6349 -150.237 13.0 2.495646 12.687 -1.4619 31.0905 0.3091 213.500058 0.0
1 2014-08-01T13:00:05Z 70.6349 -150.237 13.0 2.495454 12.699 -1.4595 31.0854 0.3265 213.541725 0.0
... ... ... ... ... ... ... ... ... ... ... ...
8945 2015-08-09T05:00:05Z 70.6349 -150.237 13.0 2.591448 12.777 0.3619 30.5086 0.3873 586.208391 0.0
8946 2015-08-09T06:00:05Z 70.6349 -150.237 13.0 2.585462 12.754 0.2862 30.5062 0.2441 586.250058 0.0

8947 rows × 11 columns