efts_io

efts-io package.

Ensemble forecast time series

Modules:

  • attributes

    Management of netCDF attributes.

  • cli

    Module that contains the command line application.

  • conventions

    Naming conventions for the EFTS netCDF file format.

  • debug

    Debugging utilities.

  • dimensions

    Functions to create and manipulate dimensions for netCDF files.

  • helpers

    Helper functions for netcdf file.

  • variables

    Handling of EFTS netCDF variables definitions.

  • wrapper

    A thin wrapper around xarray for reading and writing Ensemble Forecast Time Series (EFTS) data sets.

Classes:

  • DataOriginType

    Type of data origin according to STF 2.0 conventions.

  • EftsDataSet

    Convenience class for access to an Ensemble Forecast Time Series in a netCDF file.

  • LocationType

    Type of measurement location according to STF 2.0 conventions.

  • StfVariable

    Hydrological variable type in the STF convention.

  • TimeSeriesType

    Type of time series aggregation according to STF 2.0 conventions.

Functions:

  • create_global_attributes

    Creates STF global attributes.

  • create_mandatory_global_attributes

    Create a dictionary of mandatory global attributes for an EFTS dataset.

  • create_quality_variable_attributes

    Create attributes for a quality code variable (e.g., rain_obs_qual).

  • create_state_variable_attributes

    Create attributes for a state variable (e.g., sv1, sv2).

  • create_var_attribute_definition

    Create variable attribute definition (legacy function).

  • create_variable_attributes

    Create variable attributes for STF 2.0 compliant netCDF files.

  • get_parser

    Return the CLI argument parser.

  • main

    Run the main program.

  • open_efts

    Open an EFTS NetCDF file.

  • template_variable_attributes

    Create a template dictionary for variable attributes.

  • validate_global_attributes

    Validate a dictionary of global attributes against STF 2.0 conventions.

  • validate_quality_variable_attributes

    Validate a dictionary of quality variable attributes against STF 2.0 conventions.

  • validate_state_variable_attributes

    Validate a dictionary of state variable attributes against STF 2.0 conventions.

  • validate_variable_attributes

    Validate a dictionary of data variable attributes against STF 2.0 conventions.

  • xr_efts

    Create an xarray Dataset for EFTS data.

DataOriginType

DataOriginType(code: str, description: str)

Bases: Enum

Type of data origin according to STF 2.0 conventions.

This enumeration defines how the data was obtained or generated, following the STF (Standard Time Format) 2.0 conventions.

Attributes:

  • OBSERVED

    Data observed directly from instruments (e.g., gauged rainfall)

  • DERIVED

    Data derived from observations through processing (e.g., AWAP rainfall)

  • SIMULATED

    Data simulated from historical observations (e.g., flow from GR4H with obs forcing)

  • FORECAST

    Data forecast/simulated from predictions (e.g., flow from GR4H with NWP forcing)

Examples:

>>> from efts_io.attributes import DataOriginType
>>> origin = DataOriginType.OBSERVED
>>> origin.code
'obs'
>>> origin.description
'observed directly'

Parameters:

  • code

    (str) –

    String code defined by STF 2.0 conventions

  • description

    (str) –

    Human-readable description of the data origin

Source code in src/efts_io/conventions.py
def __init__(self, code: str, description: str) -> None:
    """Initialize a DataOriginType with its string code and text description.

    Args:
        code: String code defined by STF 2.0 conventions
        description: Human-readable description of the data origin
    """
    self.code = code
    self.description = description

EftsDataSet

EftsDataSet(data: Union[str, Dataset])

Convenience class for access to an Ensemble Forecast Time Series in a netCDF file.

Methods:

  • append_history

    Append a new entry to the history attribute with a timestamp.

  • create_data_variables

    Create data variables in the data set.

  • get_all_series

    Return a multivariate time series, where each column is the series for one of the identifiers.

  • get_dim_names

    Gets the names of all dimensions in the data set.

  • get_ensemble_for_stations

    Not yet implemented.

  • get_ensemble_forecasts

    Not yet implemented. Gets an ensemble forecast for a variable.

  • get_ensemble_size

    Return the length of the ensemble size dimension.

  • get_lead_time_count

    Length of the lead time dimension.

  • get_lead_time_values

    Return the values of the lead time dimension.

  • get_single_series

    Return a single point time series for a station identifier.

  • get_station_count

    Return the number of stations in the data set.

  • get_stations_varname

    Return the name of the variable that has the station identifiers.

  • get_time_dim

    Return the time dimension variable as a vector of date-time stamps.

  • new_variable

    Create a new variable in the data set.

  • put_lead_time_values

    Set the values of the lead time dimension.

  • save_to_stf2

    Save to file.

  • set_mandatory_global_attributes

    Sets mandatory global attributes for an EFTS dataset.

  • to_netcdf

    Write the data set to a netCDF file.

  • writeable_to_stf2

    Check if the dataset can be written to a netCDF file compliant with STF 2.0 specification.

Attributes:

  • catchment (str) –

    Get or set the catchment attribute of the dataset.

  • comment (str) –

    Get or set the comment attribute of the dataset.

  • history (str) –

    Gets/sets the history attribute of the dataset.

  • institution (str) –

    Get or set the institution attribute of the dataset.

  • source (str) –

    Get or set the source attribute of the dataset.

  • stf2_int_datatype (str) –

    The integer type used when saving in the STF 2.x netCDF convention: 'i4' or 'i8'.

  • stf_convention_version (float) –

    Get or set the STF_convention_version attribute of the dataset.

  • stf_nc_spec (str) –

    Get or set the STF_nc_spec attribute of the dataset.

  • title (str) –

    Get or set the title attribute of the dataset.

Source code in src/efts_io/wrapper.py
def __init__(self, data: Union[str, xr.Dataset]) -> None:
    """Create a new EftsDataSet object."""
    self.time_dim = None
    self.time_zone = "UTC"
    self.time_zone_timestamps = True  # Not sure about https://github.com/csiro-hydroinformatics/efts-io/issues/3
    self.STATION_DIMNAME = STATION_DIMNAME
    self.stations_varname = STATION_ID_VARNAME
    self.LEAD_TIME_DIMNAME = LEAD_TIME_DIMNAME
    self.ENS_MEMBER_DIMNAME = ENS_MEMBER_DIMNAME
    # self.identifiers_dimensions: list = []
    self.data: xr.Dataset
    from pathlib import Path

    if data is None:
        raise ValueError("input cannot be None")
    if isinstance(data, Path):
        data = str(data)
    if isinstance(data, str):
        new_dataset = load_from_stf2_file(data, self.time_zone_timestamps)
        self.data = new_dataset
    elif isinstance(data, xr.Dataset):
        self.data = data
    else:
        raise TypeError(f"Unsupported type {type(data)}")

    self.stf2_int_datatype = "i4"  # default integer type for STF2 saving
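
A minimal usage sketch (not from the package docs): the file name forecast.nc is hypothetical, and any STF 2.0 netCDF file or pre-built xarray Dataset would do.

from efts_io.wrapper import EftsDataSet

# Open from a file path; str and pathlib.Path both work, and an
# existing xarray.Dataset can be wrapped directly instead.
ds = EftsDataSet("forecast.nc")  # hypothetical file path
print(ds.get_dim_names())
print(ds.get_station_count())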

catchment property writable

catchment: str

Get or set the catchment attribute of the dataset.

comment property writable

comment: str

Get or set the comment attribute of the dataset.

history property writable

history: str

Gets/sets the history attribute of the dataset.

institution property writable

institution: str

Get or set the institution attribute of the dataset.

source property writable

source: str

Get or set the source attribute of the dataset.

stf2_int_datatype property writable

stf2_int_datatype: str

The integer type used when saving in the STF 2.x netCDF convention: 'i4' or 'i8'.

stf_convention_version property writable

stf_convention_version: float

Get or set the STF_convention_version attribute of the dataset.

stf_nc_spec property writable

stf_nc_spec: str

Get or set the STF_nc_spec attribute of the dataset.

title property writable

title: str

Get or set the title attribute of the dataset.

append_history

append_history(
    message: str, timestamp: Optional[datetime] = None
) -> None

Append a new entry to the history attribute with a timestamp.

message: The message to append.
timestamp: If not provided, the current UTC time is used.

Source code in src/efts_io/wrapper.py
def append_history(self, message: str, timestamp: Optional[datetime] = None) -> None:
    """Append a new entry to the `history` attribute with a timestamp.

    message: The message to append.
    timestamp: If not provided, the current UTC time is used.
    """
    from datetime import UTC

    if timestamp is None:
        timestamp = datetime.now(UTC).isoformat()

    current_history = self.data.attrs.get(HISTORY_ATTR_KEY, "")
    if current_history:
        self.data.attrs[HISTORY_ATTR_KEY] = f"{current_history}\n{timestamp} - {message}"
    else:
        self.data.attrs[HISTORY_ATTR_KEY] = f"{timestamp} - {message}"
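
A short sketch, assuming ds is an EftsDataSet as constructed above:

from datetime import datetime, timezone

# Append with the current UTC time...
ds.append_history("bias-corrected rainfall")
# ...or pin an explicit timestamp for reproducibility.
ds.append_history(
    "regridded to stations",
    timestamp=datetime(2024, 1, 1, tzinfo=timezone.utc),
)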

create_data_variables

create_data_variables(
    data_var_def: Dict[str, Dict[str, Any]],
) -> None

Create data variables in the data set.

Each value in data_var_def is expected to provide the keys 'name', 'longname', 'units', 'dim_type', 'missval', 'precision' and 'attributes'.

Source code in src/efts_io/wrapper.py
def create_data_variables(self, data_var_def: Dict[str, Dict[str, Any]]) -> None:
    """Create data variables in the data set.

    var_defs_dict["variable_1"].keys()
    dict_keys(['name', 'longname', 'units', 'dim_type', 'missval', 'precision', 'attributes'])
    """
    ens_fcast_data_var_def = [x for x in data_var_def.values() if x["dim_type"] == "4"]
    ens_data_var_def = [x for x in data_var_def.values() if x["dim_type"] == "3"]
    point_data_var_def = [x for x in data_var_def.values() if x["dim_type"] == "2"]

    four_dims_names = (LEAD_TIME_DIMNAME, STATION_ID_DIMNAME, REALISATION_DIMNAME, TIME_DIMNAME)
    three_dims_names = (STATION_ID_DIMNAME, REALISATION_DIMNAME, TIME_DIMNAME)
    two_dims_names = (STATION_ID_DIMNAME, TIME_DIMNAME)

    four_dims_shape = tuple(self.data.sizes[dimname] for dimname in four_dims_names)
    three_dims_shape = tuple(self.data.sizes[dimname] for dimname in three_dims_names)
    two_dims_shape = tuple(self.data.sizes[dimname] for dimname in two_dims_names)
    for vardefs, dims_shape, dims_names in [
        (ens_fcast_data_var_def, four_dims_shape, four_dims_names),
        (ens_data_var_def, three_dims_shape, three_dims_names),
        (point_data_var_def, two_dims_shape, two_dims_names),
    ]:
        for x in vardefs:
            varname = x["name"]
            # TODO:
            # _check_mandatory_keys(x)
            self._new_variable_from_legacy_specs(dims_shape, dims_names, x, varname)
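
A hedged sketch of the legacy definition dictionary this method consumes; the variable name rain_obs and the attribute values are illustrative only:

var_defs = {
    "rain_obs": {
        "name": "rain_obs",
        "longname": "observed rainfall",
        "units": "mm",
        "dim_type": "2",  # "2": [station, time]; "3" adds realisation; "4" adds lead time
        "missval": -9999.0,
        "precision": "double",
        "attributes": {},
    },
}
ds.create_data_variables(var_defs)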

get_all_series

get_all_series(
    variable_name: str = "rain_obs",
    dimension_id: Optional[str] = None,
) -> DataArray

Return a multivariate time series, where each column is the series for one of the identifiers.

Source code in src/efts_io/wrapper.py
def get_all_series(
    self,
    variable_name: str = "rain_obs",
    dimension_id: Optional[str] = None,  # noqa: ARG002
) -> xr.DataArray:
    """Return a multivariate time series, where each column is the series for one of the identifiers."""
    # Return a multivariate time series, where each column is the series for one of the identifiers (self, e.g. rainfall station identifiers):
    return self.data[variable_name]

get_dim_names

get_dim_names() -> List[str]

Gets the names of all dimensions in the data set.

Source code in src/efts_io/wrapper.py
def get_dim_names(self) -> List[str]:
    """Gets the name of all dimensions in the data set."""
    return [x for x in self.data.sizes.keys()]  # noqa: C416, SIM118

get_ensemble_for_stations

get_ensemble_for_stations(
    variable_name: str = "rain_sim",
    identifier: Optional[str] = None,
    dimension_id: str = ENS_MEMBER_DIMNAME,
    start_time: Optional[Timestamp] = None,
    lead_time_count: Optional[int] = None,
) -> DataArray

Not yet implemented.

Source code in src/efts_io/wrapper.py
def get_ensemble_for_stations(
    self,
    variable_name: str = "rain_sim",
    identifier: Optional[str] = None,
    dimension_id: str = ENS_MEMBER_DIMNAME,
    start_time: Optional[pd.Timestamp] = None,
    lead_time_count: Optional[int] = None,
) -> xr.DataArray:
    """Not yet implemented."""
    # Return a time series, representing a single ensemble member forecast for all stations over the lead time
    raise NotImplementedError

get_ensemble_forecasts

get_ensemble_forecasts(
    variable_name: str = "rain_sim",
    identifier: Optional[str] = None,
    dimension_id: Optional[str] = None,
    start_time: Optional[Timestamp] = None,
    lead_time_count: Optional[int] = None,
) -> DataArray

Not yet implemented. Gets an ensemble forecast for a variable.

Source code in src/efts_io/wrapper.py
def get_ensemble_forecasts(
    self,
    variable_name: str = "rain_sim",
    identifier: Optional[str] = None,
    dimension_id: Optional[str] = None,
    start_time: Optional[pd.Timestamp] = None,
    lead_time_count: Optional[int] = None,
) -> xr.DataArray:
    """Not yet implemented. Gets an ensemble forecast for a variable."""
    # Return a time series, ensemble of forecasts over the lead time
    raise NotImplementedError(
        "get_ensemble_forecasts: not yet implemented",
    )

get_ensemble_size

get_ensemble_size() -> int

Return the length of the ensemble size dimension.

Source code in src/efts_io/wrapper.py
def get_ensemble_size(self) -> int:
    """Return the length of the ensemble size dimension."""
    return self._dim_size(REALISATION_DIMNAME)

get_lead_time_count

get_lead_time_count() -> int

Length of the lead time dimension.

Source code in src/efts_io/wrapper.py
def get_lead_time_count(self) -> int:
    """Length of the lead time dimension."""
    return self._dim_size(self.LEAD_TIME_DIMNAME)

get_lead_time_values

get_lead_time_values() -> ndarray

Return the values of the lead time dimension.

Source code in src/efts_io/wrapper.py
def get_lead_time_values(self) -> np.ndarray:
    """Return the values of the lead time dimension."""
    return self.data[self.LEAD_TIME_DIMNAME].values

get_single_series

get_single_series(
    variable_name: str = "rain_obs",
    identifier: Optional[str] = None,
    dimension_id: Optional[str] = None,
) -> DataArray

Return a single point time series for a station identifier.

Source code in src/efts_io/wrapper.py
def get_single_series(
    self,
    variable_name: str = "rain_obs",
    identifier: Optional[str] = None,
    dimension_id: Optional[str] = None,
) -> xr.DataArray:
    """Return a single point time series for a station identifier."""
    # Return a single point time series for a station identifier. Falls back on def get_all_series if the argument "identifier" is missing
    if dimension_id is None:
        dimension_id = self.get_stations_varname()
    return self.data[variable_name].sel({dimension_id: identifier})
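
A sketch, assuming the dataset holds a rain_obs variable and a station identifier "123456" (both hypothetical):

# Select the observed rainfall series for a single station.
series = ds.get_single_series(variable_name="rain_obs", identifier="123456")
print(series.dims, series.shape)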

get_station_count

get_station_count() -> int

Return the number of stations in the data set.

Source code in src/efts_io/wrapper.py
def get_station_count(self) -> int:
    """Return the number of stations in the data set."""
    return self._dim_size(STATION_ID_DIMNAME)

get_stations_varname

get_stations_varname() -> str

Return the name of the variable that has the station identifiers.

Source code in src/efts_io/wrapper.py
def get_stations_varname(self) -> str:
    """Return the name of the variable that has the station identifiers."""
    # Gets the name of the variable that has the station identifiers
    # TODO: station is integer normally in STF (Euargh)
    return STATION_ID_VARNAME

get_time_dim

get_time_dim() -> ndarray

Return the time dimension variable as a vector of date-time stamps.

Source code in src/efts_io/wrapper.py
def get_time_dim(self) -> np.ndarray:
    """Return the time dimension variable as a vector of date-time stamps."""
    # Gets the time dimension variable as a vector of date-time stamps
    return self.data.time.values  # but losing attributes.

new_variable

new_variable(
    varname: str,
    dim_names: Iterable[str],
    var_attributes: dict[str, Any],
    data: Optional[ndarray] = None,
) -> DataArray

Create a new variable in the data set.

Parameters:

  • varname

    (str) –

    Name of the new variable.

  • dim_names

    (Iterable[str]) –

    Names of the dimensions for the new variable.

  • var_attributes

    (dict[str, Any]) –

    Attributes for the new variable. Must include 'units' key. See template_variable_attributes

  • data

    (Optional[ndarray], default: None ) –

    Data for the new variable. If None, the variable is initialized with NaNs. Defaults to None.

Returns:

  • DataArray

    xr.DataArray: The newly created variable as an xarray DataArray.

Source code in src/efts_io/wrapper.py
def new_variable(
    self,
    varname: str,
    dim_names: Iterable[str],
    var_attributes: dict[str, Any],
    data: Optional[np.ndarray] = None,
) -> xr.DataArray:
    """Create a new variable in the data set.

    Args:
        varname (str): Name of the new variable.
        dim_names (Iterable[str]): Names of the dimensions for the new variable.
        var_attributes (dict[str, Any]): Attributes for the new variable. Must include 'units' key. See `template_variable_attributes`
        data (Optional[np.ndarray], optional): Data for the new variable. If None, the variable is initialized with NaNs. Defaults to None.

    Returns:
        xr.DataArray: The newly created variable as an xarray DataArray.
    """
    if varname in self.data.variables:
        raise ValueError(f"Variable '{varname}' already exists in the dataset.")
    if UNITS_ATTR_KEY not in var_attributes:
        raise ValueError(f"Variable attributes must include '{UNITS_ATTR_KEY}' key.")
    known_dimnames = self.get_dim_names()
    unknown_dims = [x for x in dim_names if x not in set(known_dimnames)]
    if unknown_dims:
        raise ValueError(f"Unknown dimension names: {unknown_dims}; must be one of {known_dimnames}.")
    dims_shape = tuple(self.data.sizes[dimname] for dimname in dim_names)
    if data is not None:
        if data.shape != dims_shape:
            raise ValueError(
                f"Data shape {data.shape} does not match expected shape {dims_shape} for dimensions {dim_names}.",
            )
        data_array = data
    else:
        data_array = nan_full(dims_shape)
    data_coords = {dim: self.data.coords[dim] for dim in dim_names}
    new_array = xr.DataArray(
        name=varname,
        data=data_array,
        coords=data_coords,
        dims=dim_names,
        attrs=var_attributes.copy(),
    )
    self.data[varname] = new_array
    return new_array
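
A sketch combining new_variable with the attribute helpers documented further below; the dimension names are assumptions and must match names returned by get_dim_names():

from efts_io.attributes import (
    DataOriginType,
    TimeSeriesType,
    create_variable_attributes,
)

attrs = create_variable_attributes(
    long_name="observed rainfall",
    units="mm",
    time_series_type=TimeSeriesType.ACCUMULATED,
    data_origin=DataOriginType.OBSERVED,
    data_description="gauge measurements",
)
# With data=None the variable is initialized with NaNs.
rain = ds.new_variable(
    "rain_obs",
    dim_names=("station", "time"),  # assumed dimension names; adjust to your dataset
    var_attributes=attrs,
)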

put_lead_time_values

put_lead_time_values(values: Iterable[float]) -> None

Set the values of the lead time dimension.

Source code in src/efts_io/wrapper.py
def put_lead_time_values(self, values: Iterable[float]) -> None:
    """Set the values of the lead time dimension."""
    self.data[self.LEAD_TIME_DIMNAME].values = np.array(values)

save_to_stf2

save_to_stf2(
    path: str,
    variable_name: Optional[str] = None,
    var_type: StfVariable = STREAMFLOW,
    data_type: DataOriginType = OBSERVED,
    ens: bool = False,
    timestep: str = "days",
    data_qual: Optional[DataArray] = None,
) -> None

Save the data set to an STF 2.x netCDF file.

Source code in src/efts_io/wrapper.py
def save_to_stf2(
    self,
    path: str,
    variable_name: Optional[str] = None,
    var_type: StfVariable = StfVariable.STREAMFLOW,
    data_type: DataOriginType = DataOriginType.OBSERVED,
    ens: bool = False,  # noqa: FBT001, FBT002
    timestep: str = "days",
    data_qual: Optional[xr.DataArray] = None,
) -> None:
    """Save to file."""
    from efts_io._ncdf_stf2 import write_nc_stf2

    if isinstance(self.data, xr.Dataset):
        if variable_name is None:
            raise ValueError("Inner data is a DataSet, so an explicit variable name must be explicitely specified.")
        d = self.data[variable_name]
    # elif isinstance(self.data, xr.DataArray):
    #    d = self.data
    else:
        raise TypeError(f"Unsupported data type {type(self.data)}")

    if UNITS_ATTR_KEY not in d.attrs:
        raise ValueError(f"DataArray variable '{d.name}' must have '{UNITS_ATTR_KEY}' attribute defined.")

    write_nc_stf2(
        out_nc_file=path,  # : str,
        dataset=self.data,
        data=d,  # : xr.DataArray,
        var_type=var_type,  # : int = 1,
        data_type=data_type,  # : int = 3,
        stf_nc_vers=2,  # : int = 2,
        ens=ens,  # : bool = False,
        timestep=timestep,  # :str="days",
        data_qual=data_qual,  # : Optional[xr.DataArray] = None,
        overwrite=True,  # :bool=True,
        # loc_info=loc_info, # : Optional[Dict[str, Any]] = None,
        intdata_type=self.stf2_int_datatype,
    )
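
A sketch of writing one variable out; output.nc is a hypothetical path and q_obs an assumed variable name that carries a units attribute:

from efts_io import DataOriginType, StfVariable

ds.save_to_stf2(
    "output.nc",  # hypothetical output path
    variable_name="q_obs",  # required because the inner data is a Dataset
    var_type=StfVariable.STREAMFLOW,
    data_type=DataOriginType.OBSERVED,
    timestep="days",
)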

set_mandatory_global_attributes

set_mandatory_global_attributes(
    title: str = "not provided",
    institution: str = "not provided",
    catchment: str = "not provided",
    source: str = "not provided",
    comment: str = "not provided",
    history: str = "not provided",
    append_history: bool = False,
) -> None

Sets mandatory global attributes for an EFTS dataset.

Source code in src/efts_io/wrapper.py
def set_mandatory_global_attributes(
    self,
    title: str = "not provided",
    institution: str = "not provided",
    catchment: str = "not provided",
    source: str = "not provided",
    comment: str = "not provided",
    history: str = "not provided",
    append_history: bool = False,  # noqa: FBT001, FBT002
) -> None:
    """Sets mandatory global attributes for an EFTS dataset."""
    self.title = title
    self.institution = institution
    self.catchment = catchment
    self.source = source
    self.comment = comment
    if append_history:
        self.append_history(history)
    else:
        self.history = history
    self.stf_convention_version = "2.0"
    self.stf_nc_spec = STF_2_0_URL
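
A sketch populating the mandatory STF 2.0 attributes before writing; all string values are placeholders:

ds.set_mandatory_global_attributes(
    title="Ensemble rainfall forecasts",
    institution="Example Institution",
    catchment="Example Catchment",
    source="example model run",
    comment="illustrative values only",
    history="created for demonstration",
)
# writeable_to_stf2 reports whether the dataset now meets the STF 2.0 spec.
print(ds.writeable_to_stf2())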

to_netcdf

to_netcdf(
    path: str, version: Optional[str] = "2.0"
) -> None

Write the data set to a netCDF file.

Source code in src/efts_io/wrapper.py
def to_netcdf(self, path: str, version: Optional[str] = "2.0") -> None:
    """Write the data set to a netCDF file."""
    if version is None:
        self.data.to_netcdf(path)
    elif version == "2.0":
        self.save_to_stf2(path)
    else:
        raise ValueError("Only version 2.0 is supported for now")

writeable_to_stf2

writeable_to_stf2() -> bool

Check if the dataset can be written to a netCDF file compliant with STF 2.0 specification.

This method checks whether the underlying xarray Dataset or DataArray has the required dimensions and global attributes as specified by the STF 2.0 convention.

Returns:

  • bool ( bool ) –

    True if the dataset can be written to a STF 2.0 compliant netCDF file, False otherwise.

Source code in src/efts_io/wrapper.py
def writeable_to_stf2(self) -> bool:
    """Check if the dataset can be written to a netCDF file compliant with STF 2.0 specification.

    This method checks if the underlying xarray dataset or dataarray has the required dimensions and global attributes as specified by the STF 2.0 convention.

    Returns:
        bool: True if the dataset can be written to a STF 2.0 compliant netCDF file, False otherwise.
    """
    from efts_io.conventions import exportable_to_stf2

    return exportable_to_stf2(self.data)

LocationType

Bases: Enum

Type of measurement location according to STF 2.0 conventions.

This enumeration defines whether the measurement represents a point or an area-averaged value.

Attributes:

  • POINT

    Point measurement (e.g., rain gauge, stream gauge)

  • AREA

    Area-averaged measurement (e.g., subcatchment area)

Examples:

>>> from efts_io.attributes import LocationType
>>> loc = LocationType.POINT
>>> loc.value
'Point'

StfVariable

Bases: Enum

Hydrological variable type in the STF convention.

TimeSeriesType

TimeSeriesType(code: int, description: str)

Bases: Enum

Type of time series aggregation according to STF 2.0 conventions.

This enumeration defines how time series data is aggregated or sampled, following the STF (Standard Time Format) 2.0 conventions for water forecasting netCDF files.

Attributes:

  • INSTANTANEOUS

    Data recorded at a specific instant (e.g., stage height)

  • ACCUMULATED

    Data accumulated over the preceding time interval (e.g., rainfall)

  • AVERAGED

    Data averaged over the preceding time interval (e.g., flow, average temp)

  • ACCUMULATED_FORECAST

    Data accumulated since start of forecast (e.g., cumulative flow)

  • POINT_IN_INTERVAL

    Point value recorded in the preceding interval (e.g., max/min temperature)

  • CLIMATOLOGY_INSTANTANEOUS

    Climatology of instantaneous data

  • CLIMATOLOGY_ACCUMULATED

    Climatology of accumulated data

  • CLIMATOLOGY_AVERAGED

    Climatology of averaged data

  • CLIMATOLOGY_ACCUMULATED_FORECAST

    Climatology of forecast-accumulated data

  • CLIMATOLOGY_POINT

    Climatology of point-in-interval data

Examples:

>>> from efts_io.attributes import TimeSeriesType
>>> ts_type = TimeSeriesType.ACCUMULATED
>>> ts_type.code
2
>>> ts_type.description
'accumulated over the preceding interval'

Parameters:

  • code

    (int) –

    Numeric code defined by STF 2.0 conventions

  • description

    (str) –

    Human-readable description of the aggregation type

Source code in src/efts_io/conventions.py
def __init__(self, code: int, description: str) -> None:
    """Initialize a TimeSeriesType with its numeric code and text description.

    Args:
        code: Numeric code defined by STF 2.0 conventions
        description: Human-readable description of the aggregation type
    """
    self.code = code
    self.description = description

create_global_attributes

create_global_attributes(
    title: str,
    institution: str,
    source: str,
    catchment: str,
    comment: str,
    stf_convention_version: float = 2.0,
    stf_nc_spec: str = STF_2_0_URL,
    history: str = "",
) -> dict[str, Any]

Creates STF global attributes.

Parameters:

  • title

    (str) –

    title

  • institution

    (str) –

    institution

  • source

    (str) –

    source

  • catchment

    (str) –

    catchment

  • comment

    (str) –

    comment

  • stf_convention_version

    (float, default: 2.0 ) –

    STF convention version (default: 2.0)

  • stf_nc_spec

    (str, default: STF_2_0_URL ) –

    URL to the STF specification document (default: STF 2.0 URL)

  • history

    (str, default: '' ) –

    audit trail for modifications to the original data (default: "")

Raises:

  • ValueError

    Unexpected or insufficient information

Returns:

  • dict[str, Any]

    dict[str, Any]: dictionary of global attributes

Source code in src/efts_io/attributes.py
def create_global_attributes(
    title: str,
    institution: str,
    source: str,
    catchment: str,
    comment: str,
    stf_convention_version: float = 2.0,
    stf_nc_spec: str = STF_2_0_URL,
    history: str = "",
) -> dict[str, Any]:
    """Creates STF global attributes.

    Args:
        title (str): title
        institution (str): institution
        source (str): source
        catchment (str): catchment
        comment (str): comment
        stf_convention_version (float): STF convention version (default: 2.0)
        stf_nc_spec (str): URL to the STF specification document (default: STF 2.0 URL)
        history (str): audit trail for modifications to the original data (default: "")

    Raises:
        ValueError: Unexpected or insufficient information

    Returns:
        dict[str, Any]: dictionary of global attributes
    """
    # catchment info should not have white spaces (and why was that???)
    # catchment = 'Upper  Murray River '
    # catchment = stringr::str_replace_all(catchment, pattern='\\s+', '_')

    if title == "":
        raise ValueError("Empty title is not accepted as a valid attribute")

    return {
        TITLE_ATTR_KEY: title,
        INSTITUTION_ATTR_KEY: institution,
        SOURCE_ATTR_KEY: source,
        CATCHMENT_ATTR_KEY: catchment,
        STF_CONVENTION_VERSION_ATTR_KEY: stf_convention_version,
        STF_NC_SPEC_ATTR_KEY: stf_nc_spec,
        COMMENT_ATTR_KEY: comment,
        HISTORY_ATTR_KEY: history,
    }

create_mandatory_global_attributes

create_mandatory_global_attributes(
    title: str,
    institution: str,
    catchment: str,
    source: str,
    comment: str,
    history: Optional[str] = None,
) -> Dict[str, str]

Create a dictionary of mandatory global attributes for an EFTS dataset.

Parameters:

  • title

    (str) –

    Title of the dataset.

  • institution

    (str) –

    Institution responsible for the dataset.

  • catchment

    (str) –

    Catchment area description.

  • source

    (str) –

    Source of the data.

  • comment

    (str) –

    Additional comments about the dataset.

  • history

    (Optional[str], default: None ) –

    History of the dataset. If None, a default history message is created. Defaults to None.

Returns:

  • Dict[str, str]

    Dict[str, str]: A dictionary containing the mandatory global attributes.

Source code in src/efts_io/wrapper.py
def create_mandatory_global_attributes(
    title: str,
    institution: str,
    catchment: str,
    source: str,
    comment: str,
    history: Optional[str] = None,
) -> Dict[str, str]:
    """Create a dictionary of mandatory global attributes for an EFTS dataset.

    Args:
        title (str): Title of the dataset.
        institution (str): Institution responsible for the dataset.
        catchment (str): Catchment area description.
        source (str): Source of the data.
        comment (str): Additional comments about the dataset.
        history (Optional[str], optional): History of the dataset. If None, a default history message is created. Defaults to None.

    Returns:
        Dict[str, str]: A dictionary containing the mandatory global attributes.
    """
    d = _stf2_mandatory_global_attributes(
        title=title,
        institution=institution,
        catchment=catchment,
        source=source,
        comment=comment,
        history=history or __default_history_attval(),
    )
    return d  # noqa: RET504
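
A sketch with placeholder values; omitting history lets the function supply a default entry, and dataset stands for an assumed pre-existing xarray.Dataset:

from efts_io.wrapper import create_mandatory_global_attributes

attrs = create_mandatory_global_attributes(
    title="Ensemble rainfall forecasts",
    institution="Example Institution",
    catchment="Example Catchment",
    source="example model run",
    comment="illustrative values only",
)
dataset.attrs.update(attrs)  # `dataset` is an assumed xarray.Dataset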

create_quality_variable_attributes

create_quality_variable_attributes(
    long_name: str,
    quality_code_standard: str,
    fill_value: int = -1,
) -> dict[str, Any]

Create attributes for a quality code variable (e.g., rain_obs_qual).

Quality code variables have a distinct set of attributes from data variables. Per the STF 2.0 conventions, they require long_name, units (the quality code standard), and _FillValue (an integer, default -1).

Parameters:

  • long_name

    (str) –

    Human-readable name (e.g., "Quality of observed rainfall")

  • quality_code_standard

    (str) –

    Quality code standard used (e.g., "ABC Quality coding")

  • fill_value

    (int, default: -1 ) –

    Integer fill value for missing data (default: -1)

Returns:

  • dict[str, Any]

    Dictionary of attributes ready to use with xarray DataArray or EftsDataSet.new_variable()

Examples:

>>> attrs = create_quality_variable_attributes(
...     long_name="Quality of observed rainfall",
...     quality_code_standard="ABC Quality coding",
... )
>>> attrs['_FillValue']
-1
Source code in src/efts_io/attributes.py
def create_quality_variable_attributes(
    long_name: str,
    quality_code_standard: str,
    fill_value: int = -1,
) -> dict[str, Any]:
    """Create attributes for a quality code variable (e.g., rain_obs_qual).

    Quality code variables have a distinct set of attributes from data variables.
    Per the STF 2.0 conventions, they require ``long_name``, ``units`` (the quality
    code standard), and ``_FillValue`` (an integer, default -1).

    Args:
        long_name: Human-readable name (e.g., "Quality of observed rainfall")
        quality_code_standard: Quality code standard used (e.g., "ABC Quality coding")
        fill_value: Integer fill value for missing data (default: -1)

    Returns:
        Dictionary of attributes ready to use with xarray DataArray or EftsDataSet.new_variable()

    Examples:
        >>> attrs = create_quality_variable_attributes(
        ...     long_name="Quality of observed rainfall",
        ...     quality_code_standard="ABC Quality coding",
        ... )
        >>> attrs['_FillValue']
        -1
    """
    return {
        LONG_NAME_ATTR_KEY: long_name,
        UNITS_ATTR_KEY: quality_code_standard,
        FILLVALUE_ATTR_KEY: fill_value,
    }

create_state_variable_attributes

create_state_variable_attributes(
    long_name: str,
    model_name: str,
    sv_name: str,
    sv_description: str,
    fill_value: float = -9999.0,
) -> dict[str, Any]

Create attributes for a state variable (e.g., sv1, sv2).

State variables store internal model states. Per the STF 2.0 conventions, they require long_name, model_name, sv_name, sv_description, and _FillValue.

Parameters:

  • long_name

    (str) –

    Human-readable name (e.g., "state var 1")

  • model_name

    (str) –

    Name of the model (e.g., "GR4H_RR")

  • sv_name

    (str) –

    Name of the state variable in the model (e.g., "UH_Inflow")

  • sv_description

    (str) –

    Description of the state variable (e.g., "Total inflow to Unit Hydrographs in GR4H")

  • fill_value

    (float, default: -9999.0 ) –

    Fill value for missing data (default: -9999.0)

Returns:

  • dict[str, Any]

    Dictionary of attributes ready to use with xarray DataArray or EftsDataSet.new_variable()

Examples:

>>> attrs = create_state_variable_attributes(
...     long_name="state var 1",
...     model_name="GR4H_RR",
...     sv_name="UH_Inflow",
...     sv_description="Total inflow to Unit Hydrographs in GR4H",
... )
>>> attrs['model_name']
'GR4H_RR'
Source code in src/efts_io/attributes.py
def create_state_variable_attributes(
    long_name: str,
    model_name: str,
    sv_name: str,
    sv_description: str,
    fill_value: float = -9999.0,
) -> dict[str, Any]:
    """Create attributes for a state variable (e.g., sv1, sv2).

    State variables store internal model states. Per the STF 2.0 conventions,
    they require ``long_name``, ``model_name``, ``sv_name``, ``sv_description``,
    and ``_FillValue``.

    Args:
        long_name: Human-readable name (e.g., "state var 1")
        model_name: Name of the model (e.g., "GR4H_RR")
        sv_name: Name of the state variable in the model (e.g., "UH_Inflow")
        sv_description: Description of the state variable (e.g., "Total inflow to Unit Hydrographs in GR4H")
        fill_value: Fill value for missing data (default: -9999.0)

    Returns:
        Dictionary of attributes ready to use with xarray DataArray or EftsDataSet.new_variable()

    Examples:
        >>> attrs = create_state_variable_attributes(
        ...     long_name="state var 1",
        ...     model_name="GR4H_RR",
        ...     sv_name="UH_Inflow",
        ...     sv_description="Total inflow to Unit Hydrographs in GR4H",
        ... )
        >>> attrs['model_name']
        'GR4H_RR'
    """
    return {
        LONG_NAME_ATTR_KEY: long_name,
        MODEL_NAME_ATTR_KEY: model_name,
        SV_NAME_ATTR_KEY: sv_name,
        SV_DESCRIPTION_ATTR_KEY: sv_description,
        FILLVALUE_ATTR_KEY: fill_value,
    }

create_var_attribute_definition

create_var_attribute_definition(
    data_type_code: int = 2,
    type_description: str = "accumulated over the preceding interval",
    dat_type: str = "der",
    dat_type_description: str = "AWAP data interpolated from observations",
    location_type: str = "Point",
) -> dict[str, str]

Create variable attribute definition (legacy function).

Deprecated: this function is maintained for backward compatibility. For new code, use create_variable_attributes with the type-safe enumerations (TimeSeriesType, DataOriginType, LocationType) instead.

Parameters:

  • data_type_code

    (int, default: 2 ) –

    Numeric code for time series type (1-5, 11-15)

  • type_description

    (str, default: 'accumulated over the preceding interval' ) –

    Description of the aggregation type

  • dat_type

    (str, default: 'der' ) –

    String code for data origin ("obs", "der", "sim", "fct")

  • dat_type_description

    (str, default: 'AWAP data interpolated from observations' ) –

    Description of the data

  • location_type

    (str, default: 'Point' ) –

    "Point" or "Area"

Returns:

  • dict[str, str]

    Dictionary of type-related attributes

Examples:

>>> # Old way (still works but not recommended)
>>> attrs = create_var_attribute_definition(
...     data_type_code=2,
...     type_description='accumulated over the preceding interval',
...     dat_type='obs'
... )
>>>
>>> # New recommended way
>>> from efts_io.attributes import create_variable_attributes, TimeSeriesType, DataOriginType
>>> attrs = create_variable_attributes(
...     long_name="observed rainfall",
...     units="mm",
...     time_series_type=TimeSeriesType.ACCUMULATED,
...     data_origin=DataOriginType.OBSERVED,
...     data_description="gauge measurements"
... )
Source code in src/efts_io/attributes.py
def create_var_attribute_definition(
    data_type_code: int = 2,
    type_description: str = "accumulated over the preceding interval",
    dat_type: str = "der",
    dat_type_description: str = "AWAP data interpolated from observations",
    location_type: str = "Point",
) -> dict[str, str]:
    """Create variable attribute definition (legacy function).

    .. deprecated::
        This function is maintained for backward compatibility.
        For new code, use :func:`create_variable_attributes` with the type-safe enumerations
        (:class:`TimeSeriesType`, :class:`DataOriginType`, :class:`LocationType`) instead.

    Args:
        data_type_code: Numeric code for time series type (1-5, 11-15)
        type_description: Description of the aggregation type
        dat_type: String code for data origin ("obs", "der", "sim", "fct")
        dat_type_description: Description of the data
        location_type: "Point" or "Area"

    Returns:
        Dictionary of type-related attributes

    Examples:
        >>> # Old way (still works but not recommended)
        >>> attrs = create_var_attribute_definition(
        ...     data_type_code=2,
        ...     type_description='accumulated over the preceding interval',
        ...     dat_type='obs'
        ... )
        >>>
        >>> # New recommended way
        >>> from efts_io.attributes import create_variable_attributes, TimeSeriesType, DataOriginType
        >>> attrs = create_variable_attributes(
        ...     long_name="observed rainfall",
        ...     units="mm",
        ...     time_series_type=TimeSeriesType.ACCUMULATED,
        ...     data_origin=DataOriginType.OBSERVED,
        ...     data_description="gauge measurements"
        ... )
    """
    return {
        TYPE_ATTR_KEY: str(data_type_code),
        TYPE_DESCRIPTION_ATTR_KEY: type_description,
        DAT_TYPE_ATTR_KEY: dat_type,
        DAT_TYPE_DESCRIPTION_ATTR_KEY: dat_type_description,
        LOCATION_TYPE_ATTR_KEY: location_type,
    }

create_variable_attributes

create_variable_attributes(
    long_name: str,
    units: str,
    time_series_type: TimeSeriesType,
    data_origin: DataOriginType,
    data_description: str,
    location_type: LocationType = POINT,
    fill_value: float = -9999.0,
) -> dict[str, Any]

Create variable attributes for STF 2.0 compliant netCDF files.

This is the recommended function for creating metadata attributes for data variables. It uses type-safe enumerations to ensure attributes conform to STF 2.0 conventions without requiring users to remember numeric codes or string identifiers.

Parameters:

  • long_name

    (str) –

    Human-readable name for the variable (e.g., "observed rainfall")

  • units

    (str) –

    Units of measurement (e.g., "mm", "m3/s", "°C")

  • time_series_type

    (TimeSeriesType) –

    How the data is aggregated/sampled (use TimeSeriesType enum)

  • data_origin

    (DataOriginType) –

    How the data was obtained (use DataOriginType enum)

  • data_description

    (str) –

    Detailed description of the data (e.g., "AWAP data interpolated from observations")

  • location_type

    (LocationType, default: POINT ) –

    Whether data is point or area measurement (default: POINT)

  • fill_value

    (float, default: -9999.0 ) –

    Value used for missing data (default: -9999.0)

Returns:

  • dict[str, Any]

    Dictionary of attributes ready to use with xarray DataArray or EftsDataSet.new_variable()

Examples:

>>> from efts_io.attributes import (
...     create_variable_attributes,
...     TimeSeriesType,
...     DataOriginType,
...     LocationType
... )
>>> attrs = create_variable_attributes(
...     long_name="observed rainfall",
...     units="mm",
...     time_series_type=TimeSeriesType.ACCUMULATED,
...     data_origin=DataOriginType.OBSERVED,
...     data_description="gauge measurements from station network",
...     location_type=LocationType.POINT
... )
>>> attrs['type']
2
>>> attrs['type_description']
'accumulated over the preceding interval'
>>> attrs['dat_type']
'obs'
See Also
  • TimeSeriesType: Enumeration of valid time series aggregation types
  • DataOriginType: Enumeration of valid data origin types
  • LocationType: Enumeration of valid location types
  • template_variable_attributes: For getting an empty template dictionary
Source code in src/efts_io/attributes.py
def create_variable_attributes(
    long_name: str,
    units: str,
    time_series_type: TimeSeriesType,
    data_origin: DataOriginType,
    data_description: str,
    location_type: LocationType = LocationType.POINT,
    fill_value: float = -9999.0,
) -> dict[str, Any]:
    """Create variable attributes for STF 2.0 compliant netCDF files.

    This is the recommended function for creating metadata attributes for data variables.
    It uses type-safe enumerations to ensure attributes conform to STF 2.0 conventions
    without requiring users to remember numeric codes or string identifiers.

    Args:
        long_name: Human-readable name for the variable (e.g., "observed rainfall")
        units: Units of measurement (e.g., "mm", "m3/s", "°C")
        time_series_type: How the data is aggregated/sampled (use TimeSeriesType enum)
        data_origin: How the data was obtained (use DataOriginType enum)
        data_description: Detailed description of the data (e.g., "AWAP data interpolated from observations")
        location_type: Whether data is point or area measurement (default: POINT)
        fill_value: Value used for missing data (default: -9999.0)

    Returns:
        Dictionary of attributes ready to use with xarray DataArray or EftsDataSet.new_variable()

    Examples:
        >>> from efts_io.attributes import (
        ...     create_variable_attributes,
        ...     TimeSeriesType,
        ...     DataOriginType,
        ...     LocationType
        ... )
        >>> attrs = create_variable_attributes(
        ...     long_name="observed rainfall",
        ...     units="mm",
        ...     time_series_type=TimeSeriesType.ACCUMULATED,
        ...     data_origin=DataOriginType.OBSERVED,
        ...     data_description="gauge measurements from station network",
        ...     location_type=LocationType.POINT
        ... )
        >>> attrs['type']
        2
        >>> attrs['type_description']
        'accumulated over the preceding interval'
        >>> attrs['dat_type']
        'obs'

    See Also:
        - TimeSeriesType: Enumeration of valid time series aggregation types
        - DataOriginType: Enumeration of valid data origin types
        - LocationType: Enumeration of valid location types
        - template_variable_attributes: For getting an empty template dictionary
    """
    return {
        LONG_NAME_ATTR_KEY: long_name,
        UNITS_ATTR_KEY: units,
        FILLVALUE_ATTR_KEY: fill_value,
        TYPE_ATTR_KEY: time_series_type.code,
        TYPE_DESCRIPTION_ATTR_KEY: time_series_type.description,
        DAT_TYPE_ATTR_KEY: data_origin.code,
        DAT_TYPE_DESCRIPTION_ATTR_KEY: data_description,
        LOCATION_TYPE_ATTR_KEY: location_type.value,
    }

get_parser

get_parser() -> ArgumentParser

Return the CLI argument parser.

Returns:

  • ArgumentParser –

    An argparse parser.

Source code in src/efts_io/_internal/cli.py
def get_parser() -> argparse.ArgumentParser:
    """Return the CLI argument parser.

    Returns:
        An argparse parser.
    """
    parser = argparse.ArgumentParser(prog="efts")
    parser.add_argument("-V", "--version", action="version", version=f"%(prog)s {debug._get_version()}")
    parser.add_argument("--debug-info", action=_DebugInfo, help="Print debug information.")
    return parser

main

main(args: list[str] | None = None) -> int

Run the main program.

This function is executed when you type efts or python -m efts_io.

Parameters:

  • args

    (list[str] | None, default: None ) –

    Arguments passed from the command line.

Returns:

  • int

    An exit code.

Source code in src/efts_io/_internal/cli.py
def main(args: list[str] | None = None) -> int:
    """Run the main program.

    This function is executed when you type `efts` or `python -m efts_io`.

    Parameters:
        args: Arguments passed from the command line.

    Returns:
        An exit code.
    """
    parser = get_parser()
    opts = parser.parse_args(args=args)
    print(opts)
    return 0
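
A sketch of invoking the entry point programmatically (assuming main is importable from the cli module listed above); an empty argument list parses to an empty namespace and the function returns 0:

from efts_io.cli import main

exit_code = main([])  # prints the parsed (empty) options
assert exit_code == 0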

open_efts

open_efts(ncfile: Any) -> EftsDataSet

Open an EFTS NetCDF file.

Source code in src/efts_io/wrapper.py
def open_efts(ncfile: Any) -> EftsDataSet:
    """Open an EFTS NetCDF file."""
    # raise NotImplemented("open_efts")
    # if isinstance(ncfile, str):
    #     nc = ncdf4::nc_open(ncfile, readunlim = FALSE, write = writein)
    # } else if (methods::is(ncfile, "ncdf4")) {
    #     nc = ncfile
    # }
    return EftsDataSet(ncfile)
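
A one-line convenience sketch, equivalent to constructing EftsDataSet directly; forecast.nc is a hypothetical file:

from efts_io.wrapper import open_efts

ds = open_efts("forecast.nc")  # hypothetical file path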

template_variable_attributes

template_variable_attributes(
    time_series_type: Optional[TimeSeriesType] = None,
    data_origin: Optional[DataOriginType] = None,
    location_type: Optional[LocationType] = None,
    fill_value: float = -9999.0,
) -> dict[str, Any]

Create a template dictionary for variable attributes.

This function provides a starting point for creating variable attributes that comply with STF 2.0 conventions. For the recommended type-safe approach, use the enumerations from efts_io.attributes.

Parameters:

  • time_series_type

    (Optional[TimeSeriesType], default: None ) –

    TimeSeriesType enum or None (pre-fills type info if provided)

  • data_origin

    (Optional[DataOriginType], default: None ) –

    DataOriginType enum or None (pre-fills data origin if provided)

  • location_type

    (Optional[LocationType], default: None ) –

    LocationType enum or None (defaults to POINT)

  • fill_value

    (float, default: -9999.0 ) –

    Value for missing data (default: -9999.0)

Returns:

  • dict[str, Any]

    Dictionary with all required attribute keys

Examples:

>>> from efts_io import EftsDataSet
>>> from efts_io.attributes import TimeSeriesType, DataOriginType
>>>
>>> # Using type-safe enums (recommended)
>>> attrs = template_variable_attributes(
...     time_series_type=TimeSeriesType.ACCUMULATED,
...     data_origin=DataOriginType.OBSERVED
... )
>>> attrs['long_name'] = "observed rainfall"
>>> attrs['units'] = "mm"
>>>
>>> # Or get a blank template
>>> attrs = template_variable_attributes()
Note

For complete attribute creation in one call, use: from efts_io.attributes import create_variable_attributes

See Also
  • efts_io.attributes.create_variable_attributes: Type-safe attribute creation
  • efts_io.attributes.TimeSeriesType: Valid time series aggregation types
  • efts_io.attributes.DataOriginType: Valid data origin types
  • efts_io.attributes.LocationType: Valid location types
Source code in src/efts_io/attributes.py
def template_variable_attributes(
    time_series_type: Optional["TimeSeriesType"] = None,
    data_origin: Optional["DataOriginType"] = None,
    location_type: Optional["LocationType"] = None,
    fill_value: float = -9999.0,
) -> dict[str, Any]:
    """Create a template dictionary for variable attributes.

    This function provides a starting point for creating variable attributes
    that comply with STF 2.0 conventions. For the recommended type-safe approach,
    use the enumerations from efts_io.attributes.

    Args:
        time_series_type: TimeSeriesType enum or None (pre-fills type info if provided)
        data_origin: DataOriginType enum or None (pre-fills data origin if provided)
        location_type: LocationType enum or None (defaults to POINT)
        fill_value: Value for missing data (default: -9999.0)

    Returns:
        Dictionary with all required attribute keys

    Examples:
        >>> from efts_io import EftsDataSet
        >>> from efts_io.attributes import TimeSeriesType, DataOriginType
        >>>
        >>> # Using type-safe enums (recommended)
        >>> attrs = template_variable_attributes(
        ...     time_series_type=TimeSeriesType.ACCUMULATED,
        ...     data_origin=DataOriginType.OBSERVED
        ... )
        >>> attrs['long_name'] = "observed rainfall"
        >>> attrs['units'] = "mm"
        >>>
        >>> # Or get a blank template
        >>> attrs = template_variable_attributes()

    Note:
        For complete attribute creation in one call, use:
        `from efts_io.attributes import create_variable_attributes`

    See Also:
        - efts_io.attributes.create_variable_attributes: Type-safe attribute creation
        - efts_io.attributes.TimeSeriesType: Valid time series aggregation types
        - efts_io.attributes.DataOriginType: Valid data origin types
        - efts_io.attributes.LocationType: Valid location types
    """
    from efts_io.attributes import LocationType

    if location_type is None:
        location_type = LocationType.POINT

    return _create_template_variable_attributes(
        time_series_type=time_series_type,
        data_origin=data_origin,
        location_type=location_type,
        fill_value=fill_value,
    )

validate_global_attributes

validate_global_attributes(
    attrs: dict[str, Any],
) -> list[str]

Validate a dictionary of global attributes against STF 2.0 conventions.

Parameters:

  • attrs

    (dict[str, Any]) –

    Dictionary of attributes to validate

Returns:

  • list[str]

    List of error message strings. Empty list means valid.

Examples:

>>> from efts_io.attributes import create_global_attributes
>>> attrs = create_global_attributes("Title", "Inst", "Src", "Catch", "Comment")
>>> validate_global_attributes(attrs)
[]
Source code in src/efts_io/attributes.py
def validate_global_attributes(attrs: dict[str, Any]) -> list[str]:
    """Validate a dictionary of global attributes against STF 2.0 conventions.

    Args:
        attrs: Dictionary of attributes to validate

    Returns:
        List of error message strings. Empty list means valid.

    Examples:
        >>> from efts_io.attributes import create_global_attributes
        >>> attrs = create_global_attributes("Title", "Inst", "Src", "Catch", "Comment")
        >>> validate_global_attributes(attrs)
        []
    """
    errors: list[str] = []
    required_keys = {
        TITLE_ATTR_KEY: str,
        INSTITUTION_ATTR_KEY: str,
        SOURCE_ATTR_KEY: str,
        CATCHMENT_ATTR_KEY: str,
        STF_CONVENTION_VERSION_ATTR_KEY: (int, float),
        STF_NC_SPEC_ATTR_KEY: str,
        COMMENT_ATTR_KEY: str,
        HISTORY_ATTR_KEY: str,
    }

    for key, expected_type in required_keys.items():
        if key not in attrs:
            errors.append(f"Missing required attribute '{key}'")
        elif not isinstance(attrs[key], expected_type):
            errors.append(
                f"Attribute '{key}' has type '{type(attrs[key]).__name__}',"
                f" expected '{expected_type.__name__ if isinstance(expected_type, type) else ' or '.join(t.__name__ for t in expected_type)}'",
            )

    if TITLE_ATTR_KEY in attrs and isinstance(attrs[TITLE_ATTR_KEY], str) and attrs[TITLE_ATTR_KEY] == "":
        errors.append(f"Attribute '{TITLE_ATTR_KEY}' must not be empty")

    return errors

validate_quality_variable_attributes

validate_quality_variable_attributes(
    attrs: dict[str, Any],
) -> list[str]

Validate a dictionary of quality variable attributes against STF 2.0 conventions.

Parameters:

  • attrs

    (dict[str, Any]) –

    Dictionary of attributes to validate

Returns:

  • list[str]

    List of error message strings. Empty list means valid.

Examples:

>>> from efts_io.attributes import create_quality_variable_attributes
>>> attrs = create_quality_variable_attributes("Quality of observed rainfall", "ABC Quality coding")
>>> validate_quality_variable_attributes(attrs)
[]
Source code in src/efts_io/attributes.py
def validate_quality_variable_attributes(attrs: dict[str, Any]) -> list[str]:
    """Validate a dictionary of quality variable attributes against STF 2.0 conventions.

    Args:
        attrs: Dictionary of attributes to validate

    Returns:
        List of error message strings. Empty list means valid.

    Examples:
        >>> from efts_io.attributes import create_quality_variable_attributes
        >>> attrs = create_quality_variable_attributes("Quality of observed rainfall", "ABC Quality coding")
        >>> validate_quality_variable_attributes(attrs)
        []
    """
    errors: list[str] = []
    required_keys = {
        LONG_NAME_ATTR_KEY: str,
        UNITS_ATTR_KEY: str,
        FILLVALUE_ATTR_KEY: int,
    }

    for key, expected_type in required_keys.items():
        if key not in attrs:
            errors.append(f"Missing required attribute '{key}'")
        elif not isinstance(attrs[key], expected_type):
            errors.append(
                f"Attribute '{key}' has type '{type(attrs[key]).__name__}', expected '{expected_type.__name__}'",
            )

    return errors

validate_state_variable_attributes

validate_state_variable_attributes(
    attrs: dict[str, Any],
) -> list[str]

Validate a dictionary of state variable attributes against STF 2.0 conventions.

Parameters:

  • attrs

    (dict[str, Any]) –

    Dictionary of attributes to validate

Returns:

  • list[str]

    List of error message strings. Empty list means valid.

Examples:

>>> from efts_io.attributes import create_state_variable_attributes
>>> attrs = create_state_variable_attributes("sv1", "GR4H_RR", "UH_Inflow", "desc")
>>> validate_state_variable_attributes(attrs)
[]
Source code in src/efts_io/attributes.py
def validate_state_variable_attributes(attrs: dict[str, Any]) -> list[str]:
    """Validate a dictionary of state variable attributes against STF 2.0 conventions.

    Args:
        attrs: Dictionary of attributes to validate

    Returns:
        List of error message strings. Empty list means valid.

    Examples:
        >>> from efts_io.attributes import create_state_variable_attributes
        >>> attrs = create_state_variable_attributes("sv1", "GR4H_RR", "UH_Inflow", "desc")
        >>> validate_state_variable_attributes(attrs)
        []
    """
    errors: list[str] = []
    required_keys = {
        LONG_NAME_ATTR_KEY: str,
        MODEL_NAME_ATTR_KEY: str,
        SV_NAME_ATTR_KEY: str,
        SV_DESCRIPTION_ATTR_KEY: str,
        FILLVALUE_ATTR_KEY: (int, float),
    }

    for key, expected_type in required_keys.items():
        if key not in attrs:
            errors.append(f"Missing required attribute '{key}'")
        elif not isinstance(attrs[key], expected_type):
            errors.append(
                f"Attribute '{key}' has type '{type(attrs[key]).__name__}',"
                f" expected '{expected_type.__name__ if isinstance(expected_type, type) else ' or '.join(t.__name__ for t in expected_type)}'",
            )

    return errors

validate_variable_attributes

validate_variable_attributes(
    attrs: dict[str, Any],
) -> list[str]

Validate a dictionary of data variable attributes against STF 2.0 conventions.

Checks that all required keys are present and that coded values are valid.

Parameters:

  • attrs

    (dict[str, Any]) –

    Dictionary of attributes to validate

Returns:

  • list[str]

    List of error message strings. Empty list means valid.

Examples:

>>> errors = validate_variable_attributes({})
>>> len(errors) > 0
True
Source code in src/efts_io/attributes.py
def validate_variable_attributes(attrs: dict[str, Any]) -> list[str]:
    """Validate a dictionary of data variable attributes against STF 2.0 conventions.

    Checks that all required keys are present and that coded values are valid.

    Args:
        attrs: Dictionary of attributes to validate

    Returns:
        List of error message strings. Empty list means valid.

    Examples:
        >>> from efts_io.attributes import validate_variable_attributes
        >>> errors = validate_variable_attributes({})
        >>> len(errors) > 0
        True
    """
    errors: list[str] = []
    required_keys = {
        LONG_NAME_ATTR_KEY: str,
        UNITS_ATTR_KEY: str,
        FILLVALUE_ATTR_KEY: (int, float),
        TYPE_ATTR_KEY: int,
        TYPE_DESCRIPTION_ATTR_KEY: str,
        DAT_TYPE_ATTR_KEY: str,
        DAT_TYPE_DESCRIPTION_ATTR_KEY: str,
        LOCATION_TYPE_ATTR_KEY: str,
    }

    for key, expected_type in required_keys.items():
        if key not in attrs:
            errors.append(f"Missing required attribute '{key}'")
        elif not isinstance(attrs[key], expected_type):
            # expected_type is either a single type or a tuple of accepted types
            expected_name = (
                expected_type.__name__
                if isinstance(expected_type, type)
                else " or ".join(t.__name__ for t in expected_type)
            )
            errors.append(
                f"Attribute '{key}' has type '{type(attrs[key]).__name__}', expected '{expected_name}'",
            )

    type_code = attrs.get(TYPE_ATTR_KEY)
    if isinstance(type_code, int) and type_code not in _VALID_TYPE_CODES:
        errors.append(
            f"Attribute '{TYPE_ATTR_KEY}' has value {type_code}, expected one of {sorted(_VALID_TYPE_CODES)}",
        )

    dat_type = attrs.get(DAT_TYPE_ATTR_KEY)
    if isinstance(dat_type, str) and dat_type not in _VALID_DAT_TYPE_CODES:
        errors.append(
            f"Attribute '{DAT_TYPE_ATTR_KEY}' has value '{dat_type}', expected one of {sorted(_VALID_DAT_TYPE_CODES)}",
        )

    location_type = attrs.get(LOCATION_TYPE_ATTR_KEY)
    if isinstance(location_type, str) and location_type not in _VALID_LOCATION_TYPES:
        errors.append(
            f"Attribute '{LOCATION_TYPE_ATTR_KEY}' has value '{location_type}', expected one of {sorted(_VALID_LOCATION_TYPES)}",
        )

    return errors
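
A sketch of the coded-value checks: an unrecognised code is reported alongside the sorted list of valid values, on top of any missing-key errors. LOCATION_TYPE_ATTR_KEY is assumed importable from efts_io.attributes, where it is referenced unqualified in the source above:

from efts_io.attributes import (
    LOCATION_TYPE_ATTR_KEY,  # assumption: re-exported here; it is used unqualified in the source above
    validate_variable_attributes,
)

attrs = {LOCATION_TYPE_ATTR_KEY: "not-a-location-type"}
errors = validate_variable_attributes(attrs)
# Expect one "Missing required attribute ..." message per absent required key,
# plus a coded-value error listing the valid location types.
print("\n".join(errors))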

xr_efts

xr_efts(
    issue_times: Iterable[ConvertibleToTimestamp],
    station_ids: Iterable[str],
    lead_times: Optional[Iterable[int]] = None,
    lead_time_tstep: str = "hours",
    ensemble_size: int = 1,
    station_names: Optional[Iterable[str]] = None,
    latitudes: Optional[Iterable[float]] = None,
    longitudes: Optional[Iterable[float]] = None,
    areas: Optional[Iterable[float]] = None,
    nc_attributes: Optional[Dict[str, str]] = None,
) -> Dataset

Create an xarray Dataset for EFTS data.

Parameters:

  • issue_times

    (Iterable[ConvertibleToTimestamp]) –

    Forecast issue times. A pandas DatetimeIndex is accepted and converted to a list of timestamps.

  • station_ids

    (Iterable[str]) –

    Station identifiers; must be unique, otherwise a ValueError is raised.

  • lead_times

    (Optional[Iterable[int]]) –

    Forecast lead time steps. Defaults to [0] when omitted.

  • lead_time_tstep

    (str) –

    Unit of the lead time step (e.g. "hours"), recorded as "<lead_time_tstep> since time" in the lead_time units attribute.

  • ensemble_size

    (int) –

    Number of ensemble members; the realisation coordinate runs from 1 to ensemble_size.

  • station_names

    (Optional[Iterable[str]]) –

    Station names. Default to the station identifiers.

  • latitudes

    (Optional[Iterable[float]]) –

    Station latitudes in degrees north. Default to NaN.

  • longitudes

    (Optional[Iterable[float]]) –

    Station longitudes in degrees east. Default to NaN.

  • areas

    (Optional[Iterable[float]]) –

    Station areas in km^2. Default to NaN.

  • nc_attributes

    (Optional[Dict[str, str]]) –

    Global netCDF attributes. Default to the mandatory STF 2.0 global attributes.

Returns:

  • Dataset

    An xarray Dataset with time, station_id, realisation and lead_time coordinates, plus per-station station_name, lat, lon and area variables.

Source code in src/efts_io/wrapper.py
def xr_efts(
    issue_times: Iterable[ConvertibleToTimestamp],
    station_ids: Iterable[str],
    lead_times: Optional[Iterable[int]] = None,
    lead_time_tstep: str = "hours",
    ensemble_size: int = 1,
    # variables
    station_names: Optional[Iterable[str]] = None,
    latitudes: Optional[Iterable[float]] = None,
    longitudes: Optional[Iterable[float]] = None,
    areas: Optional[Iterable[float]] = None,
    nc_attributes: Optional[Dict[str, str]] = None,
) -> xr.Dataset:
    """Create an xarray Dataset for EFTS data."""
    # Check that station ids are unique:
    if len(set(station_ids)) != len(station_ids):
        raise ValueError("Station names must be unique.")
    # I learned today that xarray 2025.7.1 can now accept pandas datetimeindex as coordinates
    # for backward compatibility with older xarray versions, we convert to list here.
    # See https://github.com/csiro-hydroinformatics/efts-io/issues/13, in the future may change design.
    if isinstance(issue_times, pd.DatetimeIndex):
        # This will convert each item to a tstamp such as
        # Timestamp('2023-01-01 00:00:00+1000', tz='UTC+10:00')
        issue_times = list(issue_times)  # issue_times is iterable,and iterated over indeed.
    if lead_times is None:
        lead_times = [0]
    coords = {
        TIME_DIMNAME: issue_times,
        # STATION_DIMNAME: np.arange(start=1, stop=len(station_ids) + 1, step=1),
        STATION_ID_DIMNAME: station_ids,
        REALISATION_DIMNAME: np.arange(start=1, stop=ensemble_size + 1, step=1),
        LEAD_TIME_DIMNAME: lead_times,
        # Initially I explored attaching a coordinate to the existing STATION_DIMNAME dimension, using
        # https://docs.xarray.dev/en/latest/generated/xarray.DataArray.assign_coords.html#xarray.DataArray.assign_coords
        # and https://github.com/pydata/xarray/issues/2028#issuecomment-1265252754 to be able to
        # index by station IDs. In July 2025 I decided not to have a STATION_DIMNAME dimension, which was
        # an artefact of legacy conventions (Fortran 1-based indexing and related limitations).
        # Keeping a number-based STATION_DIMNAME only makes things more difficult and data subsetting more bug-prone.
        # STATION_ID_VARNAME: (STATION_DIMNAME, station_ids),
    }
    n_stations = len(station_ids)
    latitudes = latitudes if latitudes is not None else nan_full(n_stations)
    longitudes = longitudes if longitudes is not None else nan_full(n_stations)
    areas = areas if areas is not None else nan_full(n_stations)
    station_names = station_names if station_names is not None else [f"{i}" for i in station_ids]
    data_vars = {
        STATION_NAME_VARNAME: (STATION_ID_DIMNAME, station_names),
        LAT_VARNAME: (STATION_ID_DIMNAME, latitudes),
        LON_VARNAME: (STATION_ID_DIMNAME, longitudes),
        AREA_VARNAME: (STATION_ID_DIMNAME, areas),
    }
    nc_attributes = nc_attributes or _stf2_mandatory_global_attributes()
    d = xr.Dataset(
        data_vars=data_vars,
        coords=coords,
        attrs=nc_attributes,
    )
    # Credits to the work reported in https://github.com/pydata/xarray/issues/2028#issuecomment-1265252754
    # d = d.set_xindex(STATION_ID_VARNAME)
    d.time.attrs = {
        STANDARD_NAME_ATTR_KEY: TIME_DIMNAME,
        LONG_NAME_ATTR_KEY: TIME_DIMNAME,
        # TIME_STANDARD_KEY: "UTC",
        AXIS_ATTR_KEY: "t",
        # UNITS_ATTR_KEY: "days since 2000-11-14 23:00:00.0 +0000",
    }
    d.lead_time.attrs = {
        STANDARD_NAME_ATTR_KEY: "lead time",
        LONG_NAME_ATTR_KEY: "forecast lead time",
        AXIS_ATTR_KEY: "v",
        UNITS_ATTR_KEY: f"{lead_time_tstep} since time",
    }
    d.realisation.attrs = {
        STANDARD_NAME_ATTR_KEY: ENS_MEMBER_DIMNAME,  # TODO: should we keep the STF 2.0 ens_member as a standard name?
        LONG_NAME_ATTR_KEY: "ensemble member",
        UNITS_ATTR_KEY: "member id",
        AXIS_ATTR_KEY: "u",
    }
    d.station_id.attrs = {LONG_NAME_ATTR_KEY: "station or node identification code"}
    d.station_name.attrs = {LONG_NAME_ATTR_KEY: "station or node name"}
    d.lat.attrs = {LONG_NAME_ATTR_KEY: "latitude", UNITS_ATTR_KEY: "degrees_north", AXIS_ATTR_KEY: "y"}
    d.lon.attrs = {LONG_NAME_ATTR_KEY: "longitude", UNITS_ATTR_KEY: "degrees_east", AXIS_ATTR_KEY: "x"}
    d.area.attrs = {
        LONG_NAME_ATTR_KEY: "station area",
        UNITS_ATTR_KEY: "km^2",
        STANDARD_NAME_ATTR_KEY: AREA_VARNAME,
    }
    return d
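
A minimal usage sketch, assuming xr_efts is importable from efts_io.wrapper (the source file shown above); the station identifiers and dimension sizes are made up for illustration:

import pandas as pd

from efts_io.wrapper import xr_efts  # import path assumed from the source file above

# Hypothetical skeleton: 3 daily issue times, 2 stations, 24 hourly lead times, 10 members.
ds = xr_efts(
    issue_times=pd.date_range("2024-01-01", periods=3, freq="D"),
    station_ids=["410001", "410002"],  # made-up identifiers
    lead_times=list(range(1, 25)),
    lead_time_tstep="hours",
    ensemble_size=10,
    station_names=["Station A", "Station B"],
)
print(ds.sizes)  # expect time=3, station_id=2, realisation=10, lead_time=24
# Because station_id is an indexed coordinate, subsetting by identifier is direct:
print(ds.sel(station_id="410001"))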