NetCDF for Water Forecasting Conventions¶
Version¶
This document specifies conventions at version 2.0.
Foreword¶
As of July 2025 the latest version of these conventions should be available at https://csiro-hydroinformatics.github.io/efts-io/netcdf_for_water_forecasting/.
The initial point of truth, as of March 2018, was at this location. Credit for the specifications goes to James Bennett (CSIRO).
Purpose¶
Plain text files are not well suited to storing the large volumes of data generated for and by ensemble streamflow forecasts with numerical weather prediction models. netCDF is a binary file format developed primarily for climate, ocean and meteorological data. Detailed, formalised descriptions of the data (metadata) can be included inside the netCDF file, and netCDF can store highly compressed data, making the format suitable for the Short-Term Forecasting (STF) project. However, netCDF has traditionally been used to store time slices of gridded data, rather than complete time series of point data. This document describes the conventions we have developed for storing complete time series data used in ensemble streamflow forecasting in netCDF.
NetCDF Introduction and Terms¶
NetCDF is a binary format, so its files cannot be read in a text editor. In return, the format offers a significant decrease in data size, an unambiguous layout, platform independence and implementation independence. The implementation independence arises from the use of standard libraries for reading and writing the NetCDF format. All tools in this project that use the netCDF format, including SWIFT, use these libraries.
The netCDF format uses dimensions and variables to store data. Data are stored in variables; each variable can be considered an array and is independent of every other variable. The data space of these variables is defined by the dimensions. For example, for a gridded rainfall data set, the dimensions may be latitude and longitude, and the variable may be rainfall in millimetres per day.
Metadata is stored in netCDF format as attributes. Attributes can be defined as global, applying to the whole data set, or defined as specific to a variable. For instance, the origin of a variable (e.g. Rain gauge) may be stored specifically for that variable, whereas the agency responsible for the data set may be stored as a global attribute.
Schematic¶
The netCDF specification has been inspired by the Deltares NETCDF-CF_TIMESERIES structure for compatibility purposes.
Dimensions¶
The netCDF files have five required dimensions:
- "time" (NF90_FLOAT) - a floating point of unlimited length
- "station" (NF90_INT) - an integer of fixed length
- "lead_time" (NF90_FLOAT) - a floating point of fixed length. "lead_time" is implicitly linked to the "time" dimension. lead_time zero is the same date and time as the time dimension and therefore not expected as a legitimate value in the variable "lead_time" for data types described in this specification (see Description of time types).
- "ens_member" (NF90_INT) - an integer of fixed length
- "strLen" (NF90_INT) - an integer of fixed length = 30. (Strings are implemented as character arrays of length 30.)
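As an illustration, the check below sketches how a tool might verify these five dimensions before reading a file. It is a minimal sketch and not part of the conventions: the function name `check_dimensions` and the representation of dimensions as a plain dictionary (with `None` standing for the unlimited length) are assumptions made for this example.

```python
# Hypothetical helper: the dictionary layout and the use of None for
# "unlimited" are assumptions of this sketch, not part of the conventions.
REQUIRED_DIMENSIONS = ("time", "station", "lead_time", "ens_member", "strLen")
STR_LEN = 30  # strings are character arrays of length 30


def check_dimensions(dims):
    """Return a list of problems found in a {name: size} dimension mapping."""
    problems = []
    for name in REQUIRED_DIMENSIONS:
        if name not in dims:
            problems.append(f"missing dimension: {name}")
    # "time" is the only dimension of unlimited length.
    if "time" in dims and dims["time"] is not None:
        problems.append("time must be unlimited")
    if "strLen" in dims and dims["strLen"] != STR_LEN:
        problems.append(f"strLen must be {STR_LEN}, found {dims['strLen']}")
    return problems


dims = {"time": None, "station": 3, "lead_time": 24, "ens_member": 10, "strLen": 30}
print(check_dimensions(dims))
```

An empty list means that all five dimensions are present with the expected lengths.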
Global Attributes¶
STF NetCDF files have the following global attributes:
- "title" - (string) A succinct description of what is in the dataset (no format or text is specified for this description). E.g. Title = 'Rainfall forecasts generated by ACCESS'
- "institution" - (string) Specifies where the original data was produced. e.g. 'CSIRO Land & Water'
- "source" - (string) Published or web-based references that describe the data or methods used to produce it.
- "catchment" - (string) specifies the catchment for which the data is created. If any of the data in the file are area-averaged over subareas delineated for a particular catchment model, this attribute must be included. The "Catchment" should be reflected in any folder structures used to run the catchment model. The "Catchment" attribute may not contain spaces. Underscores are permitted. E.g. Catchment = 'South_Esk'
- "STF_convention_version" - (float) gives the version of the STF convention to which the data conform. The current version of the STF convention is 2.0. This version is required to ensure project tools can correctly read the data file.
- "STF_nc_spec" - (string) gives the location of the convention file. At present, this is this website; i.e.: STF_nc_spec = 'https://wiki.csiro.au/display/wirada/NetCDF+for+Short-Term+Forecasting/'
- "comment" - (string) miscellaneous information about the data or methods used to produce it.
- "history" - (string) Provides an audit trail for modifications to the original data. Well-behaved generic netCDF filters will automatically append their name and the parameters with which they were invoked to the global history attribute of an input netCDF file. Each line begins with a timestamp (e.g. 1970-01-01 00:00:00) indicating the date and time of day that the program was executed.
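Two of the rules above lend themselves to a short sketch: the "catchment" attribute may not contain spaces, and each "history" line begins with a timestamp. The helper names `check_catchment` and `append_history` are hypothetical, and the use of UTC for the timestamp is an assumption of this example, since the conventions do not state a time zone for the history attribute.

```python
from datetime import datetime, timezone


def check_catchment(name):
    """Return True if the catchment name is non-empty and contains no spaces."""
    return bool(name) and " " not in name


def append_history(history, command, when=None):
    """Append one 'timestamp command' line to an existing history string.

    Assumption of this sketch: timestamps are taken in UTC.
    """
    if when is None:
        when = datetime.now(timezone.utc)
    line = when.strftime("%Y-%m-%d %H:%M:%S") + " " + command
    return history + "\n" + line if history else line


history = append_history("", "create_file --catchment South_Esk", datetime(1970, 1, 1))
print(history)
print(check_catchment("South_Esk"), check_catchment("South Esk"))
```

The command string used here is purely illustrative; a real tool would record its own name and the parameters it was invoked with.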
List of Variables¶
The following abbreviations are used to construct variable names:
- q - streamflow
- pet - potential evapotranspiration
- rain - rainfall
- sv - state variable
- qul - data quality
- obs - observed
- sim - simulated
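Variable names in this document, such as rain_obs or q_sim_qul, join these abbreviations with underscores. The sketch below shows one way to assemble such names; the function `variable_name` is hypothetical, and state variables (sv[#]) are deliberately left out because they follow a numbered pattern instead.

```python
# Abbreviations copied from the list above; the helper itself is hypothetical.
QUANTITIES = {"q": "streamflow", "pet": "potential evapotranspiration", "rain": "rainfall"}
KINDS = {"obs": "observed", "sim": "simulated"}


def variable_name(quantity, kind, quality=False):
    """Build names such as 'rain_obs' or 'q_sim_qul' from the abbreviations."""
    if quantity not in QUANTITIES:
        raise ValueError(f"unknown quantity abbreviation: {quantity}")
    if kind not in KINDS:
        raise ValueError(f"unknown kind abbreviation: {kind}")
    name = f"{quantity}_{kind}"
    # Quality codes reuse the data variable name with a '_qul' suffix.
    return f"{name}_qul" if quality else name


print(variable_name("rain", "obs"))
print(variable_name("q", "sim", quality=True))
```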
Mandatory Variables¶
The data set requires the following variables (dimensions are in brackets):
- float time(time)
- int station_id(station)
- char station_name(strLen, station)
- int ens_member(ens_member)
- float lead_time(lead_time)
- float lat (station)
- float lon (station)
Optional Variables¶
Optional variables (dimensions given in brackets):
Geolocation:
- float y (station)
- float x (station)
- float area (station)
- float elevation (station)
Observations and simulations:
- float rain_obs (lead_time, station, ens_member, time)
- float q_obs (lead_time, station, ens_member, time)
- float pet_obs (lead_time, station, ens_member, time)
- float q_sim (lead_time, station, ens_member, time)
- float rain_sim (lead_time, station, ens_member, time)
- float pet_sim (lead_time, station, ens_member, time)
- float sv[#] (lead_time, station, ens_member, time)
Quality codes:
- float rain_obs_qul (lead_time, station, ens_member, time)
- float q_obs_qul (lead_time, station, ens_member, time)
- float pet_obs_qul (lead_time, station, ens_member, time)
- float rain_sim_qul (lead_time, station, ens_member, time)
- float q_sim_qul (lead_time, station, ens_member, time)
- float pet_sim_qul (lead_time, station, ens_member, time)
Description of Variables¶
Dimensions and attributes of mandatory and optional variables are described below.
time¶
Description: Time vector (int32)
Dimensions:¶
- time
Attributes¶
Description | Name | Type | Example |
---|---|---|---|
The short name for the variable | standard_name | String | time |
The long name for the variable | long_name | String | time |
Time units | units | String | hours since 1970-01-01 00:00:00.0 +0000 |
Time standard from which times are offset | time_standard | String | UTC |
Axis label | axis | String | t |
The units can also be days or months since 1970-01-01. They are in UTC by default.
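To make the units string concrete, the following sketch converts values of the time variable into time stamps for units of hours or days. It is an illustration only: the function name `decode_times` is hypothetical, and the parsing assumes the units string follows exactly the pattern of the example in the table above. Units of months are rejected here because they require special treatment.

```python
from datetime import datetime, timedelta, timezone


def decode_times(values, units):
    """Convert offsets to aware datetimes for 'hours since ...' or 'days since ...'."""
    step, _, origin = units.partition(" since ")
    if step not in ("hours", "days"):
        raise ValueError(f"unsupported time step in units: {step!r}")
    # Assumes the origin looks like '1970-01-01 00:00:00.0 +0000'.
    epoch = datetime.strptime(origin, "%Y-%m-%d %H:%M:%S.%f %z")
    return [epoch + timedelta(**{step: value}) for value in values]


units = "hours since 1970-01-01 00:00:00.0 +0000"
for stamp in decode_times([0, 24, 36], units):
    print(stamp.isoformat())
```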
NB#1: Using units of months requires special treatment. When adding months to a given time, the addition method depends on the day of the month of the time units, as follows:
- If the day of the month specified in the time units is less than 24, simply add months. E.g. time units are 'months since 1970-02-15 00:00:00.0 +0000'. Adding one month yields a time stamp of 1970-03-15 00:00:00.0 +0000
- If the day of month is greater than or equal to 24, the time stamp is calculated by counting back from the end of a given month. E.g. time units are 'months since 1970-02-26 00:00:00.0 +0000'. Adding one month yields a time stamp of 1970-03-29 00:00:00.0 +0000
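The two month-addition rules can be sketched as follows. This is one reading of the rules, not a reference implementation: the function name `add_months` is hypothetical, and "counting back from the end of a given month" is interpreted as keeping the same number of days before the end of the month as in the time units.

```python
import calendar
from datetime import datetime


def add_months(origin, months):
    """Add whole months to the origin given in the time units."""
    total = origin.year * 12 + (origin.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    if origin.day < 24:
        # Rule 1: keep the day of the month unchanged.
        day = origin.day
    else:
        # Rule 2: keep the same distance from the end of the month.
        days_from_end = calendar.monthrange(origin.year, origin.month)[1] - origin.day
        day = calendar.monthrange(year, month)[1] - days_from_end
    return origin.replace(year=year, month=month, day=day)


print(add_months(datetime(1970, 2, 15), 1))
print(add_months(datetime(1970, 2, 26), 1))
```

The two calls reproduce the examples in the list: 15 February moves to 15 March, while 26 February (two days before the end of the month) moves to 29 March (two days before the end of March).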
NB#2: When data are not forecasts, the first value should indicate over which period the variables are aggregated - i.e., do use values of zero (see Description of time types).
station_id¶
Description: Station identification number (int32)
Dimensions:¶
- station
Attributes¶
Description | Name | Type | Example |
---|---|---|---|
The long name for the variable | long_name | String | station or node identification code |
station_name¶
Description: Station name (string)
Dimensions:¶
- strLen
- station
Attributes¶
Description | Name | Type | Example |
---|---|---|---|
The long name for the variable | long_name | String | station or node name |
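Because strings are implemented as character arrays of length 30, station names have to be padded on writing and trimmed on reading. The sketch below illustrates this; the helper names are hypothetical, and padding with spaces is an assumption, since the conventions do not name a padding character.

```python
STR_LEN = 30  # length of the strLen dimension


def encode_station_name(name, pad=" "):
    """Return the name as a list of exactly STR_LEN single characters.

    Assumption of this sketch: names are padded on the right with spaces.
    """
    if len(name) > STR_LEN:
        raise ValueError(f"station name longer than {STR_LEN} characters: {name!r}")
    return list(name.ljust(STR_LEN, pad))


def decode_station_name(chars):
    """Join a character array and strip trailing spaces or null characters."""
    return "".join(chars).rstrip(" \x00")


chars = encode_station_name("South_Esk_outlet")
print(len(chars), decode_station_name(chars))
```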
ens_member¶
Description: Vector of ensemble member indices running from 1 to the number of ensemble members. The vector has a minimum length of 1. (int32)
Dimensions:¶
- ens_member
Attributes¶
Description | Name | Type | Example |
---|---|---|---|
The short name for the variable | standard_name | String | ens_member |
The long name for the variable | long_name | String | ensemble member |
Units | units | String | member id |
Axis label | axis | String | u |
lead_time¶
Description: Vector giving time since a forecast was issued. If the variable is not a forecast, this vector can have a length of zero (int32)
Dimensions:¶
- lead_time
Attributes¶
Description | Name | Type | Example |
---|---|---|---|
The short name for the variable | standard_name | String | lead time |
The long name for the variable | long_name | String | forecast lead time |
Units | units | String | hours since time |
Axis label | axis | String | u |
The units can also be days or months since time of forecast.
- NB#1: The unit of lead_time does not have to equal the unit of time (e.g. time might be "hours since..." and lead_time can be "months since time")
- NB#2: Using units of months requires special treatment. See time variable, above
- NB#3: As the time is relative no time zone is required.
- NB#4: A value of zero is not expected in this variable for all data types described in this specification (see Description of time types).
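The relationship between time and lead_time can be sketched as below for units of "hours since time". The function name `forecast_valid_times` is hypothetical, and raising an error on a zero lead time is a choice made for this example; the conventions only state that zero is not expected.

```python
from datetime import datetime, timedelta, timezone


def forecast_valid_times(issue_time, lead_times_hours):
    """Return the time stamp each lead time refers to, for 'hours since time'."""
    if any(lead == 0 for lead in lead_times_hours):
        # Lead time zero would coincide with the issue time itself.
        raise ValueError("a lead_time of zero is not expected")
    return [issue_time + timedelta(hours=lead) for lead in lead_times_hours]


issued = datetime(2018, 3, 1, 9, tzinfo=timezone.utc)
for stamp in forecast_valid_times(issued, [1, 2, 3]):
    print(stamp.isoformat())
```

Since the lead time is an offset, the result inherits the time zone of the issue time, which is why no time zone is needed on lead_time itself.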
lat¶
Description: Vector of latitudes of stations in decimal degrees (single)
Dimensions:¶
- station
Attributes¶
Description | Name | Type | Example |
---|---|---|---|
The long name for the variable | long_name | String | latitude |
Units | units | String | degrees_north |
Axis label | axis | String | y |
lon¶
Description: Vector of longitudes of stations in decimal degrees (single)
Dimensions:¶
- station
Attributes¶
Description | Name | Type | Example |
---|---|---|---|
The long name for the variable | long_name | String | longitude |
Units | units | String | degrees_east |
Axis label | axis | String | x |
y¶
Description: Position vector in projected coordinates (single)
Dimensions:¶
- station
Attributes¶
Description | Name | Type | Example |
---|---|---|---|
The short name for the variable | standard_name | String | northing_GDA94_zone55 |
The long name for the variable | long_name | String | northing from the GDA94 datum in MGA Zone 55 |
Axis label | axis | String | y |
x¶
Description: Position vector in projected coordinates (single)
Dimensions:¶
- station
Attributes¶
Description | Name | Type | Example |
---|---|---|---|
The short name for the variable | standard_name | String | easting_GDA94_zone55 |
The long name for the variable | long_name | String | easting from the GDA94 datum in MGA Zone 55 |
Axis label | axis | String | x |
area¶
Description: Area over which non-point data apply (e.g. subcatchment area) (single)
Dimensions:¶
- station
Attributes¶
Description | Name | Type | Example |
---|---|---|---|
The short name for the variable | standard_name | String | area |
The long name for the variable | long_name | String | station area |
Units | units | String | sqm |
elevation¶
Description: Elevation of station (single)
Dimensions:¶
- station
Attributes¶
Description | Name | Type | Example |
---|---|---|---|
The short name for the variable | standard_name | String | elevation |
The long name for the variable | long_name | String | station elevation above sea level |
Units | units | String | m |
pet_obs/rain_obs/q_obs/swe_obs/tmin_obs/tmax_obs/tave_obs¶
Description: Observed data (double)
pet = potential evapotranspiration; rain = precipitation; q = streamflow; swe = snow water equivalent; tmin = minimum surface air temperature; tmax = maximum surface air temperature; tave = average surface air temperature
Example: q_obs
Dimensions:¶
- lead_time
- station
- ens_member
- time
Attributes¶
Description | Name | Type | Example |
---|---|---|---|
The long name for the variable | long_name | String | observed rainfall |
Units | units | String | mm |
Missing data value | _FillValue | float | -9999f |
Type of aggregation | type | int | 2 |
Description of type of aggregation. | type_description | String | accumulated over the preceding interval |
Type of data. Code as follows: "obs" - observed directly; "der" - derived from observations | dat_type | string | der |
Description of type of data | dat_type_description | string | AWAP data interpolated from observations |
Location type of data. Takes value of "Point" (e.g. for a rain gauge) or "Area" (e.g. for a subarea). Default value is "Point". | location_type | String | Point |
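As a sketch of how these attributes might be used, the example below collects the values from the table into a dictionary and masks missing data using the `_FillValue`. The dictionary layout and the function `mask_missing` are assumptions of this example; a real reader would take the attributes from the file itself.

```python
# Attribute values copied from the example column of the table above.
rain_obs_attributes = {
    "long_name": "observed rainfall",
    "units": "mm",
    "_FillValue": -9999.0,
    "type": 2,
    "type_description": "accumulated over the preceding interval",
    "dat_type": "der",
    "dat_type_description": "AWAP data interpolated from observations",
    "location_type": "Point",
}


def mask_missing(values, fill_value):
    """Replace values equal to the fill value with None."""
    return [None if value == fill_value else value for value in values]


raw = [0.0, 1.2, -9999.0, 3.4]
print(mask_missing(raw, rain_obs_attributes["_FillValue"]))
```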
pet_sim/rain_sim/q_sim/swe_sim/tmin_sim/tmax_sim/tave_sim¶
Description: Simulated data (double)
pet = potential evapotranspiration; rain = precipitation; q = streamflow; swe = snow water equivalent; tmin = minimum surface air temperature; tmax = maximum surface air temperature; tave = average surface air temperature
Dimensions:¶
- lead_time
- station
- ens_member
- time
Attributes¶
Description | Name | Type | Example |
---|---|---|---|
The long name for the variable | long_name | String | simulated rainfall |
Units | units | String | m3/s |
Missing data value | _FillValue | float | -9999f |
Type of aggregation | type | int | 3 |
Description of type of aggregation. | type_description | String | averaged over the preceding interval |
Type of data. Code as follows: "sim" - simulated from historical forcings; "fct" - forecast | dat_type | string | fct |
Description of type of data | dat_type_description | string | forecast data |
Location type of data. Takes value of "Point" (e.g. for a rain gauge) or "Area" (e.g. for a subarea). Default value is "Point". | location_type | String | Point |
[variable]_obs_qul/[variable]_sim_qul¶
Description: Data quality
Dimensions:¶
- lead_time
- station
- ens_member
- time
Attributes¶
Description | Name | Type | Example |
---|---|---|---|
The long name for the variable | long_name | String | Quality of observed rainfall |
Quality code standard | units | String | ABC Quality coding |
Missing data value | _FillValue | int | -1 |
sv1/sv2/sv[#]¶
Description: State variables (double)
Dimensions:¶
- lead_time
- station
- ens_member
- time
Attributes¶
Description | Name | Type | Example |
---|---|---|---|
The long name for the variable | long_name | String | state var 1 |
Name of model | model_name | String | GR4H_RR |
Name of state variable in model | sv_name | String | UH_Inflow |
Description of state variable | sv_description | String | Total inflow to Unit Hydrographs in GR4H |
Missing data value | _FillValue | float | -9999f |
Description of time types¶
Type ID | Description | Example variable |
---|---|---|
1 | instantaneous data | stage height |
2 | accumulated over the preceding interval | rainfall |
3 | averaged over the preceding interval | flow, average temp |
4 | accumulated since start of forecast | flow |
5 | point value recorded in the preceding interval | max/min temperature |
11* | climatology data - instantaneous data | climatology stage height |
12* | climatology data - accumulated over the preceding interval | climatology rainfall |
13* | climatology data - averaged over the preceding interval | climatology flow |
14* | climatology data - accumulated since start of forecast | climatology flow |
15* | climatology data - point value recorded in the preceding interval | climatology max temp |
*NB - please specify the period over which climatology data is calculated and how it is calculated in the global "comment" attribute, as well as any applicable references in the "source" global attribute.
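The table follows a regular pattern: each climatology type ID is the corresponding base type ID plus 10. The lookup below is a sketch built on that observation; the function name `describe_time_type` is hypothetical.

```python
# Base type IDs and descriptions copied from the table above.
TIME_TYPES = {
    1: "instantaneous data",
    2: "accumulated over the preceding interval",
    3: "averaged over the preceding interval",
    4: "accumulated since start of forecast",
    5: "point value recorded in the preceding interval",
}
CLIMATOLOGY_OFFSET = 10  # IDs 11 to 15 are the climatology variants of 1 to 5


def describe_time_type(type_id):
    """Return the description for a time type ID, including climatology IDs."""
    is_climatology = type_id > CLIMATOLOGY_OFFSET
    base_id = type_id - CLIMATOLOGY_OFFSET if is_climatology else type_id
    if base_id not in TIME_TYPES:
        raise ValueError(f"unknown time type ID: {type_id}")
    description = TIME_TYPES[base_id]
    return f"climatology data - {description}" if is_climatology else description


print(describe_time_type(2))
print(describe_time_type(13))
```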
Description of data types¶
Type ID | Description | Example variable |
---|---|---|
obs | observed directly | gauged rainfall |
der | derived from observations | awap rainfall |
sim | simulated from observations | flow simulated by GR4H forced by observations |
fct | simulated from forecasts | flow forecast by GR4H forced by NWP forecasts |
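The data type codes pair naturally with the variable suffixes: observed variables carry "obs" or "der", while simulated variables carry "sim" or "fct". The check below is a minimal sketch of that pairing; the function name `check_dat_type` is hypothetical, and it assumes variable names end in "_obs" or "_sim".

```python
# Codes and descriptions copied from the table above.
DATA_TYPES = {
    "obs": "observed directly",
    "der": "derived from observations",
    "sim": "simulated from observations",
    "fct": "simulated from forecasts",
}
# Codes permitted for each variable suffix, per the dat_type attribute tables.
ALLOWED_CODES = {"obs": {"obs", "der"}, "sim": {"sim", "fct"}}


def check_dat_type(variable_name, dat_type):
    """Return True if dat_type is a valid code for the given variable name."""
    suffix = variable_name.rsplit("_", 1)[-1]
    if suffix not in ALLOWED_CODES:
        raise ValueError(f"variable name must end in _obs or _sim: {variable_name}")
    return dat_type in ALLOWED_CODES[suffix]


print(check_dat_type("rain_obs", "der"))
print(check_dat_type("q_sim", "obs"))
```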