Basic use¶
efts-io
is primarily about creating, handling, saving and loading ensemble forecast time series, from Python, as netCDF files on disk compliant with the STF 2.0 conventions.
While most similar implementations so far, e.g. in R or Matlab, have been closely tied to netCDF file handling, in Python xarray
is the de facto standard for the high-level manipulation of tensor-like, multidimensional data. There is a partial mismatch between the STF netCDF conventions, devised ten years ago and limited by the capabilities of Fortran netCDF libraries at the time, and best practices for xarray
in-memory representations. efts-io
bridges the technical gap between these two representations, reducing the risk of data handling bugs when users try to reconcile them.
Reading from file¶
The package includes small sample data files. We will start with a file storing a rainfall time series:
import efts_io.helpers as hlp
fn = hlp.derived_rainfall_tas()
# The path to the sample file will depend on your operating system, environment setup etc.
from pathlib import Path
homepath = str(Path.home())
print(fn.replace(homepath, '/your_home_path'))
/your_home_path/src/efts-io/src/efts_io/data/derived_rainfall_tas.nc
Path(fn).exists()
True
Validating compliance of a file before loading¶
The package includes facilities to check the structure of a file on disk and its level of compliance with the STF conventions:
from efts_io.conventions import check_stf_compliance, check_hydrologic_variables
compliance_report = check_stf_compliance(fn)
print(f"compliance_report is a dictionary with keys {list(compliance_report.keys())}")
compliance_report is a dictionary with keys ['INFO', 'WARNING', 'ERROR']
There are no errors or warnings for this sample file:
print("WARNING:", compliance_report["WARNING"])
print("ERROR:", compliance_report["ERROR"])
WARNING: [] ERROR: []
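Since the report is a plain dictionary, it is easy to build small checks on top of it. A minimal sketch (the `report_ok` helper below is not part of efts-io, just an illustration of working with the report structure):

```python
def report_ok(report: dict) -> bool:
    """Return True when a compliance report carries no warnings or errors."""
    return not report["WARNING"] and not report["ERROR"]

# Dictionaries with the same shape as those returned by check_stf_compliance:
clean = {"INFO": ["Dimension 'time' is present."], "WARNING": [], "ERROR": []}
dirty = {"INFO": [], "WARNING": ["odd attribute type"], "ERROR": []}
print(report_ok(clean))  # True
print(report_ok(dirty))  # False
```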
To see the details of what was checked in the file structure:
print("INFO:", compliance_report["INFO"])
INFO: ["Dimension 'time' is present.", "Dimension 'station' is present.", "Dimension 'lead_time' is present.", "Dimension 'ens_member' is present.", "Dimension 'strLen' is present.", "Global attribute 'title' is present.", "Global attribute 'institution' is present.", "Global attribute 'source' is present.", "Global attribute 'catchment' is present.", "Global attribute 'STF_convention_version' is present.", "Global attribute 'STF_nc_spec' is present.", "Global attribute 'comment' is present.", "Global attribute 'history' is present.", "Mandatory variable 'time' is present.", "Attribute 'standard_name' for variable 'time' is present.", "Attribute 'long_name' for variable 'time' is present.", "Attribute 'units' for variable 'time' is present.", "Attribute 'time_standard' for variable 'time' is present.", "Attribute 'axis' for variable 'time' is present.", "Mandatory variable 'station_id' is present.", "Attribute 'long_name' for variable 'station_id' is present.", "Mandatory variable 'station_name' is present.", "Attribute 'long_name' for variable 'station_name' is present.", "Mandatory variable 'ens_member' is present.", "Attribute 'standard_name' for variable 'ens_member' is present.", "Attribute 'long_name' for variable 'ens_member' is present.", "Attribute 'units' for variable 'ens_member' is present.", "Attribute 'axis' for variable 'ens_member' is present.", "Mandatory variable 'lead_time' is present.", "Attribute 'standard_name' for variable 'lead_time' is present.", "Attribute 'long_name' for variable 'lead_time' is present.", "Attribute 'units' for variable 'lead_time' is present.", "Attribute 'axis' for variable 'lead_time' is present.", "Mandatory variable 'lat' is present.", "Attribute 'long_name' for variable 'lat' is present.", "Attribute 'units' for variable 'lat' is present.", "Attribute 'axis' for variable 'lat' is present.", "Mandatory variable 'lon' is present.", "Attribute 'long_name' for variable 'lon' is present.", "Attribute 'units' for variable 'lon' is present.", "Attribute 'axis' for variable 'lon' is present."]
We can also use the venerable ncdump
command line tool, just to get a low-level overview of the file:
!ncdump -h {fn}
netcdf derived_rainfall_tas {
dimensions:
	station = 3 ;
	ens_member = 1 ;
	lead_time = 1 ;
	time = UNLIMITED ; // (7 currently)
	strLen = 30 ;
variables:
	float area(station) ;
		area:units = "sqkm" ;
		area:_FillValue = -1. ;
		area:standard_name = "area" ;
		area:long_name = "station area" ;
	int ens_member(ens_member) ;
		ens_member:standard_name = "ens_member" ;
		ens_member:long_name = "ensemble member" ;
		ens_member:units = "member id" ;
		ens_member:axis = "u" ;
	float lat(station) ;
		lat:long_name = "latitude" ;
		lat:units = "degrees_north" ;
		lat:axis = "y" ;
	int lead_time(lead_time) ;
		lead_time:standard_name = "lead time" ;
		lead_time:long_name = "forecast lead time" ;
		lead_time:axis = "v" ;
		lead_time:units = "days since time" ;
	float lon(station) ;
		lon:long_name = "longitude" ;
		lon:units = "degrees_east" ;
		lon:axis = "x" ;
	float rain_obs(time, ens_member, station, lead_time) ;
		rain_obs:standard_name = "rain_obs" ;
		rain_obs:long_name = "observed rainfall" ;
		rain_obs:units = "mm" ;
		rain_obs:_FillValue = -9999.f ;
		rain_obs:type = 2. ;
		rain_obs:type_description = "accumulated over the preceding interval" ;
		rain_obs:dat_type = "der" ;
		rain_obs:dat_type_description = "derived (from observations)" ;
		rain_obs:location_type = "area" ;
	int station(station) ;
	int station_id(station) ;
		station_id:long_name = "station or node identification code" ;
	char station_name(station, strLen) ;
		station_name:long_name = "station or node name" ;
	int time(time) ;
		time:standard_name = "time" ;
		time:long_name = "time" ;
		time:time_standard = "UTC" ;
		time:axis = "t" ;
		time:units = "days since 2000-11-14 23:00:00.0 +0000" ;

// global attributes:
		:title = "Precip from Hydro Tasmania\'s observation network areally averaged with inverse distance squared weighting" ;
		:institution = "CSIRO Land & Water" ;
		:source = "" ;
		:catchment = "Hydro Tas" ;
		:STF_convention_version = 2. ;
		:STF_nc_spec = "https://wiki.csiro.au/display/wirada/NetCDF+for+SWIFT" ;
		:comment = "" ;
		:history = "Thu Jul 17 16:29:16 2025: ncks -d time,8389, -d station,560, HT_swiftRain_daily_stfv2_2000111523+0000-2023111023+0000.nc -O test_output.nc\n", "2024-07-25 15:28:34 +10.0 - File created" ;
		:NCO = "netCDF Operators version 5.1.4 (Homepage = http://nco.sf.net, Code = http://github.com/nco/nco)" ;
}
Checking variables¶
check_hydrologic_variables
looks in more detail at the variables present in the netCDF file. The STF convention recommends naming schemes, and expects certain variable attributes to be present.
compliance_report = check_hydrologic_variables(fn)
compliance_report
{'INFO': ["Hydrologic variable 'rain_obs' follows the recommended naming convention."], 'WARNING': ["Attribute '_FillValue' for variable 'rain_obs' has an unexpected type 'float32'. Expected type: 'float'.", "Attribute 'type' for variable 'rain_obs' has an unexpected type 'float64'. Expected type: 'int'."], 'ERROR': []}
print("INFO:", compliance_report["INFO"])
print("WARNING:", compliance_report["WARNING"])
print("ERROR:", compliance_report["ERROR"])
INFO: ["Hydrologic variable 'rain_obs' follows the recommended naming convention."] WARNING: ["Attribute '_FillValue' for variable 'rain_obs' has an unexpected type 'float32'. Expected type: 'float'.", "Attribute 'type' for variable 'rain_obs' has an unexpected type 'float64'. Expected type: 'int'."] ERROR: []
Two of the variable attributes in the file do not strictly follow the STF conventions, but in this case this is not a blocking incompatibility.
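The warnings above come down to the numeric type of the attribute values rather than their content: a `numpy.float32` fill value compares equal to the Python `float` the convention expects, but its type name differs. A quick illustration with plain numpy (not efts-io internals):

```python
import numpy as np

fill_f32 = np.float32(-9999.0)  # what the file stores for _FillValue
fill_py = -9999.0               # the plain Python float a checker may expect

print(fill_f32 == fill_py)                  # True: same value...
print(type(fill_f32).__name__)              # float32: ...different type name
```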
Loading data¶
We recommend loading the files using EftsDataSet
, a thin wrapper around an xarray
object.
If you were to open the file directly with xarray
, you would encounter an error:
try:
    import xarray as xr

    xr.open_dataset(fn)
except ValueError as e:
    print(e)
Failed to decode variable 'lead_time': unable to decode time units 'days since time' with 'the default calendar'. Try opening your dataset with decode_times=False or installing cftime if it is not installed.
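This failure can be reproduced without the sample file. The sketch below (an illustration only, not an efts-io recipe) builds a synthetic in-memory dataset whose lead_time carries the same non-standard units string, and shows that skipping time decoding sidesteps the error, at the cost of getting raw integer values:

```python
import numpy as np
import xarray as xr

# 'days since time' looks like CF time units but has no valid reference
# date, so xarray's time decoder rejects it.
ds = xr.Dataset(
    {"lead_time": ("lead_time", np.array([0, 1], dtype="int32"),
                   {"units": "days since time"})}
)
try:
    xr.decode_cf(ds)
    failed = False
except ValueError:
    failed = True
print(failed)  # True: the non-standard units trip the decoder

# Opting out of time decoding leaves the values undecoded but readable:
raw = xr.decode_cf(ds, decode_times=False)
```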
EftsDataSet
takes care of the low-level acrobatics required to read STF files, a format designed before xarray
emerged and gained popularity.
from efts_io.wrapper import EftsDataSet
rain_stf = EftsDataSet(fn)
There are helper methods on EftsDataSet
objects, but so far most add little value over the inner xarray
object. Note that the in-memory xarray
structure has a string coordinate for station_id
, rather than the integer specified in the STF 2.0 convention for on-disk netCDF storage. This is a deliberate choice, as we envisage that future versions of the conventions will use strings for station identifiers. Limiting station identifiers to integers is in part a legacy of past use of the Fortran programming language.
Details of a conversation about the data design can be found in this thread.
print(rain_stf.data)
<xarray.Dataset> Size: 676B
Dimensions:       (realisation: 1, lead_time: 1, station_id: 3, time: 7)
Coordinates:
  * realisation   (realisation) int32 4B 1
  * lead_time     (lead_time) int32 4B 0
  * station_id    (station_id) <U11 132B '28286670' '28294676' '28294677'
  * time          (time) object 56B 2023-11-04T23:00:00+00:00 ... 2023-11-10T...
Data variables:
    area          (station_id) float32 12B 3.353 1.76 4.988
    lat           (station_id) float32 12B -41.85 -41.82 -41.85
    lon           (station_id) float32 12B 145.6 145.6 145.6
    rain_obs      (time, realisation, station_id, lead_time) float32 84B 0.09...
    station_name  (station_id) <U30 360B '28286670' '28294676' '28294677'
Attributes:
    title:                   Precip from Hydro Tasmania's observation network...
    institution:             CSIRO Land & Water
    source:                  
    catchment:               Hydro Tas
    STF_convention_version:  2.0
    STF_nc_spec:             https://wiki.csiro.au/display/wirada/NetCDF+for+...
    comment:                 
    history:                 Thu Jul 17 16:29:16 2025: ncks -d time,8389, -d ...
    NCO:                     netCDF Operators version 5.1.4 (Homepage = http:...
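Because the inner object is a regular xarray.Dataset, the usual selection machinery applies. A sketch on a synthetic dataset with the same dimension names (the station ids and zero values below are stand-ins, not data from the sample file):

```python
import numpy as np
import pandas as pd
import xarray as xr

# A stand-in dataset mirroring the dimensions loaded by EftsDataSet.
times = pd.date_range("2023-11-04", periods=7, freq="D")
stations = ["28286670", "28294676", "28294677"]
rain = xr.DataArray(
    np.zeros((7, 1, 3, 1), dtype="float32"),
    dims=("time", "realisation", "station_id", "lead_time"),
    coords={"time": times, "realisation": [1],
            "station_id": stations, "lead_time": [0]},
    name="rain_obs",
)
ds = rain.to_dataset()

# Label-based selection of one station's series; the scalar selection
# drops the station_id dimension.
series = ds["rain_obs"].sel(station_id="28286670")
print(series.dims)
```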
Saving an STF 2.0 file¶
Since the data was loaded from an STF 2.0 file, one would hope we can round-trip it back to disk. The method writeable_to_stf2
performs checks on the in-memory representation, to determine whether it holds the information needed to create a compliant netCDF file. In this case, unsurprisingly:
rain_stf.writeable_to_stf2()
True
import tempfile
out_fn = tempfile.NamedTemporaryFile().name
out_fn
'/tmp/tmpa78r0nyz'
Path(out_fn).exists()
False
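As an aside, `NamedTemporaryFile().name` hands out a path whose file has already been deleted, which works for a quick demo but can race with other processes. A sketch of an alternative using a temporary directory that cleans itself up (the output file name here is made up):

```python
import tempfile
from pathlib import Path

# The directory, and anything written into it, is removed when the
# context exits; no manual os.remove is needed.
with tempfile.TemporaryDirectory() as tmpdir:
    out_fn = str(Path(tmpdir) / "rain_obs_roundtrip.nc")
    # save_to_stf2(path=out_fn, ...) would write the file here.
    inside = Path(out_fn).parent.exists()
outside = Path(tmpdir).exists()
print(inside, outside)  # True False
```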
from efts_io.wrapper import StfVariable, StfDataType
rain_stf.save_to_stf2(
    path=out_fn,
    variable_name="rain_obs",
    var_type=StfVariable.RAINFALL,
    data_type=StfDataType.DERIVED_FROM_OBSERVATIONS,
    ens=True,
    timestep="days",
    data_qual=None,
)
compliance_report = check_stf_compliance(out_fn)
One would hope that what the package writes out passes the low-level checks:
print("WARNING:", compliance_report["WARNING"])
print("ERROR:", compliance_report["ERROR"])
WARNING: [] ERROR: []
print("INFO:", compliance_report["INFO"])
INFO: ["Dimension 'time' is present.", "Dimension 'station' is present.", "Dimension 'lead_time' is present.", "Dimension 'ens_member' is present.", "Dimension 'strLen' is present.", "Global attribute 'title' is present.", "Global attribute 'institution' is present.", "Global attribute 'source' is present.", "Global attribute 'catchment' is present.", "Global attribute 'STF_convention_version' is present.", "Global attribute 'STF_nc_spec' is present.", "Global attribute 'comment' is present.", "Global attribute 'history' is present.", "Mandatory variable 'time' is present.", "Attribute 'standard_name' for variable 'time' is present.", "Attribute 'long_name' for variable 'time' is present.", "Attribute 'units' for variable 'time' is present.", "Attribute 'time_standard' for variable 'time' is present.", "Attribute 'axis' for variable 'time' is present.", "Mandatory variable 'station_id' is present.", "Attribute 'long_name' for variable 'station_id' is present.", "Mandatory variable 'station_name' is present.", "Attribute 'long_name' for variable 'station_name' is present.", "Mandatory variable 'ens_member' is present.", "Attribute 'standard_name' for variable 'ens_member' is present.", "Attribute 'long_name' for variable 'ens_member' is present.", "Attribute 'units' for variable 'ens_member' is present.", "Attribute 'axis' for variable 'ens_member' is present.", "Mandatory variable 'lead_time' is present.", "Attribute 'standard_name' for variable 'lead_time' is present.", "Attribute 'long_name' for variable 'lead_time' is present.", "Attribute 'units' for variable 'lead_time' is present.", "Attribute 'axis' for variable 'lead_time' is present.", "Mandatory variable 'lat' is present.", "Attribute 'long_name' for variable 'lat' is present.", "Attribute 'units' for variable 'lat' is present.", "Attribute 'axis' for variable 'lat' is present.", "Mandatory variable 'lon' is present.", "Attribute 'long_name' for variable 'lon' is present.", "Attribute 'units' for variable 'lon' is present.", "Attribute 'axis' for variable 'lon' is present."]
Let's clean up the temporary file, in case the operating system does not do so later.
import os
import time

time.sleep(1)  # limit the risk of a file lock on the output file
if Path(out_fn).exists():
    os.remove(out_fn)
Creating a new STF xarray dataset¶
UNDER CONSTRUCTION. This will probably be revised.
There are several ways to create a dataset for ensemble forecast time series with efts-io
. One helper function is xr_efts
, particularly useful if you know upfront the geometry (dimensions) of your dataset:
import pandas as pd
import numpy as np
from efts_io import wrapper as w

issue_times = pd.date_range("2010-01-01", periods=31, freq="D")
station_ids = ["410088", "410776"]
lead_times = np.arange(start=1, stop=4, step=1)
lead_time_tstep = "hours"
ensemble_size = 10
station_names = ["GOODRADIGBEE B/BELLA", "Licking Hole Ck"]
nc_attributes = None
latitudes = None
longitudes = None
areas = None

d = w.xr_efts(
    issue_times,
    station_ids,
    lead_times,
    lead_time_tstep,
    ensemble_size,
    station_names,
    latitudes,
    longitudes,
    areas,
    nc_attributes,
)
Let us have a look at the created Dataset:
d
<xarray.Dataset> Size: 624B
Dimensions:       (station: 2, time: 31, ens_member: 10, lead_time: 3)
Coordinates:
  * time          (time) datetime64[ns] 248B 2010-01-01 ... 2010-01-31
  * station       (station) int64 16B 1 2
  * ens_member    (ens_member) int64 80B 1 2 3 4 5 6 7 8 9 10
  * lead_time     (lead_time) int64 24B 1 2 3
  * station_id    (station) <U6 48B '410088' '410776'
Data variables:
    station_name  (station) <U20 160B 'GOODRADIGBEE B/BELLA' 'Licking Hole Ck'
    lat           (station) float64 16B nan nan
    lon           (station) float64 16B nan nan
    area          (station) float64 16B nan nan
Attributes:
    title:                   not provided
    institution:             not provided
    catchment:               not provided
    source:                  not provided
    comment:                 not provided
    history:                 not provided
    STF_convention_version:  2.0
    STF_nc_spec:             https://github.com/csiro-hydroinformatics/efts-i...
We did not provide custom global attributes to the function. Defaults were created to keep the result compliant, but you should populate them with better values via the nc_attributes
dictionary.
Note that while the intent is for the on-disk netCDF file created later to comply with the STF specs, it is debatable whether the "in memory" dataset just created should advertise STF-related attributes such as STF_convention_version
. If the user saves it with to_netcdf
rather than via efts-io
, this is a tad confusing.
d.attrs
{'title': 'not provided', 'institution': 'not provided', 'catchment': 'not provided', 'source': 'not provided', 'comment': 'not provided', 'history': 'not provided', 'STF_convention_version': '2.0', 'STF_nc_spec': 'https://github.com/csiro-hydroinformatics/efts-io/blob/42ee35f0f019e9bad48b94914429476a7e8278dc/docs/netcdf_for_water_forecasting.md'}
d.time.attrs
{'standard_name': 'time', 'long_name': 'time', 'axis': 't'}
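The dataset returned by xr_efts holds coordinates and station metadata but no forecast variable yet. One plausible next step, sketched below on a stand-in dataset with the same geometry (the variable name `rain_fcast_ens` and the zero values are illustrative assumptions, not an efts-io API):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Stand-in for the geometry created by xr_efts above.
d = xr.Dataset(
    coords={
        "time": pd.date_range("2010-01-01", periods=31, freq="D"),
        "station": [1, 2],
        "ens_member": np.arange(1, 11),
        "lead_time": np.arange(1, 4),
    }
)

# Attach an ensemble forecast variable spanning the expected dimensions.
dims = ("time", "ens_member", "station", "lead_time")
shape = tuple(d.sizes[k] for k in dims)
d["rain_fcast_ens"] = (dims, np.zeros(shape, dtype="float32"))
print(d["rain_fcast_ens"].shape)  # (31, 10, 2, 3)
```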