Basic use¶
efts-io
is primarily about saving ensemble forecast time series to (resp. loading from) files on disk in netCDF STF 2.0 compliant format, from Python.
While most similar implementations in e.g. R, Matlab so var have been closely related to netCDF file handling, in Python xarray
is a de facto standard for the high level manipulation of tensor-like, multidimensional data. There is a partial mismatch between the STF netCDF conventions devised ten years ago and limited by the capabilities of Fortran netCDF libraries at the time, and the best practices for xarray
in-memory representations. efts-io
is a package bridging the technical gap between these two representations, and reducing the risk of data handling bugs by users when trying to reconcile this technical gap.
Creating a new STF xarray dataset¶
There are several ways to create a dataset for EFTS with efts-io
. One helper function to create a data set is xr_efts
, particularly if you know upfront the geometry (dimensions) of your dataset:
import pandas as pd
import numpy as np
from efts_io import wrapper as w
issue_times = pd.date_range("2010-01-01", periods=31, freq="D")
station_ids = ["410088","410776"]
lead_times = np.arange(start=1, stop=4, step=1)
lead_time_tstep = "hours"
ensemble_size = 10
station_names= ["GOODRADIGBEE B/BELLA", "Licking Hole Ck"]# None
nc_attributes = None
latitudes = None
longitudes = None
areas = None
d = w.xr_efts(
issue_times,
station_ids,
lead_times,
lead_time_tstep,
ensemble_size,
station_names,
latitudes,
longitudes,
areas,
nc_attributes,
)
Let us have a look at the created Dataset:
d
<xarray.Dataset> Size: 608B Dimensions: (station_id: 2, time: 31, ens_member: 10, lead_time: 3) Coordinates: * time (time) datetime64[ns] 248B 2010-01-01 ... 2010-01-31 * station_id (station_id) <U6 48B '410088' '410776' * ens_member (ens_member) int64 80B 1 2 3 4 5 6 7 8 9 10 * lead_time (lead_time) int64 24B 1 2 3 Data variables: station_name (station_id) <U20 160B 'GOODRADIGBEE B/BELLA' 'Licking Hole... lat (station_id) float64 16B nan nan lon (station_id) float64 16B nan nan area (station_id) float64 16B nan nan Attributes: title: not provided institution: not provided catchment: not provided source: not provided comment: not provided STF_convention_version: 2.0 STF_nc_spec: https://github.com/csiro-hydroinformatics/efts/b...
We did not provide custom global attributes to the functions. Defaults were created.
Note that while the intent is that the on-disk netCDF file created later will comply with the STF specs, but this is debatable whether the "in memory" data set just created should advertise the STF related attributes like STF_convention_version
. If saved by the user with to_netcdf
rather than via efts-io
, it is a tad confusing.
d.attrs
{'title': 'not provided', 'institution': 'not provided', 'catchment': 'not provided', 'source': 'not provided', 'comment': 'not provided', 'STF_convention_version': '2.0', 'STF_nc_spec': 'https://github.com/csiro-hydroinformatics/efts/blob/107c553045a37e6ef36b2eababf6a299e7883d50/docs/netcdf_for_water_forecasting.md'}
d.time.attrs
{'standard_name': 'time', 'long_name': 'time', 'axis': 't'}
import cf_xarray as cfx
import xarray as xr
xr.set_options(display_expand_data=False)
xr.set_options(keep_attrs=True)
<xarray.core.options.set_options at 0x7f7d981caa20>
d.lon
<xarray.DataArray 'lon' (station_id: 2)> Size: 16B nan nan Coordinates: * station_id (station_id) <U6 48B '410088' '410776' Attributes: long_name: longitude units: degrees_east axis: x
d.cf
Coordinates: CF Axes: * T: ['time'] X, Y, Z: n/a CF Coordinates: * time: ['time'] longitude, latitude, vertical: n/a Cell Measures: area, volume: n/a Standard Names: * ens_member: ['ens_member'] * lead time: ['lead_time'] * time: ['time'] Bounds: n/a Grid Mappings: n/a Data Variables: Cell Measures: area, volume: n/a Standard Names: area: ['area'] Bounds: n/a Grid Mappings: n/a
d.cf["lon"]
<xarray.DataArray 'lon' (station_id: 2)> Size: 16B nan nan Coordinates: * station_id (station_id) <U6 48B '410088' '410776' Attributes: long_name: longitude units: degrees_east axis: x
Saving as STF 2.0 compliant¶
Placeholder section
This key feature is under (re)construction. The statemements to do so will be as concise as possible while leaving future options open:
efts_ds = EftsDataset(d)
efts_ds.to_stf(path="/home/user/data/streamflow_forecasts.nc", version="2.0")
Validating compliance¶
The package includes functions to validate netCDF STF 2.0 files before loading
These will also be available from command line interfaces.