
NetCDF for Water Forecasting Conventions

Version

This document specifies conventions at version 2.0.

Foreword

As of July 2025 the latest version of these conventions should be available at https://csiro-hydroinformatics.github.io/efts-io/netcdf_for_water_forecasting/.

The initial point of truth, as of March 2018, was at this location. Credit for the specification goes to James Bennett (CSIRO).

Purpose

Plain text files are not well suited to storing the large volumes of data generated for and by ensemble streamflow forecasts with numerical weather prediction models. netCDF is a binary file format developed primarily for climate, ocean and meteorological data. Detailed, formalised descriptions of the data (metadata) can be included inside the netCDF file, and netCDF can store highly compressed data, making the format suitable for the STF project. However, netCDF has traditionally been used to store time slices of gridded data, rather than complete time series of point data. This document describes the conventions we have developed for storing complete time series data used in ensemble streamflow forecasting in netCDF.

NetCDF Introduction and Terms

NetCDF is a binary format, so its contents cannot be read in a text editor. The binary encoding also brings a significant decrease in data size, an unambiguous format, platform independence and implementation independence. The implementation independence arises from the use of standard libraries for reading and writing the NetCDF format. All tools in this project that use the netCDF format, including SWIFT, use these libraries.

The netCDF format uses dimensions and variables to store data. Data is stored in variables; each variable can be considered as an array and is independent of every other variable. The data space of these variables is defined by the dimensions. For example, for a gridded rainfall data set, the dimensions may be latitude and longitude, and the variable may be rainfall in millimetres per day.

Metadata is stored in netCDF format as attributes. Attributes can be defined as global, applying to the whole data set, or defined as specific to a variable. For instance, the origin of a variable (e.g. Rain gauge) may be stored specifically for that variable, whereas the agency responsible for the data set may be stored as a global attribute.

Schematic

The netCDF specification is modelled on the Deltares NETCDF-CF_TIMESERIES structure for compatibility purposes.

Dimensions

The netCDF files have five required dimensions:

  • "time" (NF90_FLOAT) - a floating point of unlimited length
  • "station" (NF90_INT) - an integer of fixed length
  • "lead_time" (NF90_FLOAT) - a floating point of fixed length. "lead_time" is implicitly linked to the "time" dimension. lead_time zero is the same date and time as the time dimension and therefore not expected as a legitimate value in the variable "lead_time" for data types described in this specification (see Description of time types).
  • "ens_member" (NF90_INT) - an integer of fixed length
  • "strLen" (NF90_INT) - an integer of fixed length = 30. (Strings are implemented as character arrays of length 30.)
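The fixed string length can be illustrated with a minimal sketch in plain Python. The helper names `to_char_array` and `from_char_array` are hypothetical, and padding with spaces is an assumption: the convention fixes only the length of 30, not the padding character.

```python
STR_LEN = 30  # value of the "strLen" dimension in this convention


def to_char_array(name: str, length: int = STR_LEN, pad: str = " ") -> list[str]:
    """Encode a name as a fixed-length list of single characters.

    Padding with spaces is an assumption; the convention only fixes the length.
    """
    if len(name) > length:
        raise ValueError(f"{name!r} is longer than {length} characters")
    return list(name.ljust(length, pad))


def from_char_array(chars: list[str], pad: str = " ") -> str:
    """Decode a character array back to a string, dropping trailing padding."""
    return "".join(chars).rstrip(pad)


encoded = to_char_array("South_Esk")
decoded = from_char_array(encoded)
```

Raising an error for names longer than 30 characters, rather than truncating silently, is a design choice of this sketch and not something the convention prescribes.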

Global Attributes

STF NetCDF files have the following global attributes:

  • "title" - (string) A succinct description of what is in the dataset (no format or text is specified for this description). E.g. Title = 'Rainfall forecasts generated by ACCESS'
  • "institution" - (string) Specifies where the original data was produced. e.g. 'CSIRO Land & Water'
  • "source" - (string) Published or web-based references that describe the data or methods used to produce it.
  • "catchment" - (string) Specifies the catchment for which the data is created. If any of the data in the file are area-averaged over subareas delineated for a particular catchment model, this attribute must be included. The "catchment" value should be reflected in any folder structures used to run the catchment model. The "catchment" attribute must not contain spaces. Underscores are permitted. E.g. Catchment = 'South_Esk'
  • "STF_convention_version" - (float) Gives the version of the STF convention that the data conforms to. The current version of the STF convention is 2.0. This version is required to ensure project tools can correctly read the data file.
  • "STF_nc_spec" - (string) gives the location of the convention file. At present, this is this website; i.e.: STF_nc_spec = 'https://wiki.csiro.au/display/wirada/NetCDF+for+Short-Term+Forecasting/'
  • "comment" - (string) miscellaneous information about the data or methods used to produce it.
  • "history" - (string) Provides an audit trail for modifications to the original data. Well-behaved generic netCDF filters will automatically append their name and the parameters with which they were invoked to the global history attribute of an input netCDF file. Each line begins with a timestamp (e.g. 1970-01-01 00:00:00) indicating the date and time of day that the program was executed.
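As a rough illustration of the list above, the following sketch checks a plain dictionary of global attributes. The function name `check_global_attributes` is hypothetical, and treating every listed attribute as required is an assumption of this sketch (the text states that "catchment" is mandatory only when data are area-averaged). It does not read a netCDF file; it only inspects a dictionary.

```python
EXPECTED_GLOBAL_ATTRIBUTES = (
    "title",
    "institution",
    "source",
    "catchment",
    "STF_convention_version",
    "STF_nc_spec",
    "comment",
    "history",
)


def check_global_attributes(attrs: dict) -> list[str]:
    """Return a list of problems found; an empty list means no problem was detected."""
    problems = []
    for name in EXPECTED_GLOBAL_ATTRIBUTES:
        if name not in attrs:
            problems.append(f"missing global attribute: {name}")
    catchment = attrs.get("catchment")
    if isinstance(catchment, str) and " " in catchment:
        problems.append("catchment must not contain spaces")
    version = attrs.get("STF_convention_version")
    if version is not None and not isinstance(version, float):
        problems.append("STF_convention_version must be a float")
    return problems


example_attrs = {
    "title": "Rainfall forecasts generated by ACCESS",
    "institution": "CSIRO Land & Water",
    "source": "",
    "catchment": "South_Esk",
    "STF_convention_version": 2.0,
    "STF_nc_spec": "https://wiki.csiro.au/display/wirada/NetCDF+for+Short-Term+Forecasting/",
    "comment": "",
    "history": "1970-01-01 00:00:00 file created",
}
problems = check_global_attributes(example_attrs)
```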

List of Variables

The following abbreviations are used to construct variable names:

  • q - streamflow
  • pet - potential evapotranspiration
  • rain - rainfall
  • sv - state variable
  • qul - data quality
  • obs - observed
  • sim - simulated

Mandatory Variables

The data set requires the following variables (dimensions are in brackets):

  • float time(time)
  • int station_id(station)
  • char station_name(strLen, station)
  • int ens_member(ens_member)
  • float lead_time(lead_time)
  • float lat (station)
  • float lon (station)

Optional Variables

Optional variables (dimensions given in brackets):

Geolocation:

  • float y (station)
  • float x (station)
  • float area (station)
  • float elevation (station)

Observations and simulations:

  • float rain_obs (lead_time, station, ens_member, time)
  • float q_obs (lead_time, station, ens_member, time)
  • float pet_obs (lead_time, station, ens_member, time)
  • float q_sim (lead_time, station, ens_member, time)
  • float rain_sim (lead_time, station, ens_member, time)
  • float pet_sim (lead_time, station, ens_member, time)
  • float sv[#] (lead_time, station, ens_member, time)

Quality codes:

  • float rain_obs_qul (lead_time, station, ens_member, time)
  • float q_obs_qul (lead_time, station, ens_member, time)
  • float pet_obs_qul (lead_time, station, ens_member, time)
  • float rain_sim_qul (lead_time, station, ens_member, time)
  • float q_sim_qul (lead_time, station, ens_member, time)
  • float pet_sim_qul (lead_time, station, ens_member, time)
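The naming scheme can be sketched as a small parser that turns a variable name back into a description. The function `describe_variable` is hypothetical, and the sketch covers only the abbreviations listed under "List of Variables"; other prefixes would need to be added to the lookup tables.

```python
import re

QUANTITIES = {
    "q": "streamflow",
    "pet": "potential evapotranspiration",
    "rain": "rainfall",
}
SOURCES = {"obs": "observed", "sim": "simulated"}

# Dimension order shared by all data and quality-code variables.
DATA_DIMENSIONS = ("lead_time", "station", "ens_member", "time")


def describe_variable(name: str) -> str:
    """Describe a variable name such as 'rain_obs', 'q_sim_qul' or 'sv1'."""
    state = re.fullmatch(r"sv(\d+)", name)
    if state:
        return f"state variable {state.group(1)}"
    parts = name.split("_")
    valid = (
        len(parts) in (2, 3)
        and parts[0] in QUANTITIES
        and parts[1] in SOURCES
        and (len(parts) == 2 or parts[2] == "qul")
    )
    if not valid:
        raise ValueError(f"unrecognised variable name: {name!r}")
    text = f"{SOURCES[parts[1]]} {QUANTITIES[parts[0]]}"
    if len(parts) == 3:
        text = f"data quality of {text}"
    return text


descriptions = [describe_variable(n) for n in ("rain_obs", "q_sim_qul", "sv1")]
```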

Description of Variables

Dimensions and attributes of mandatory and optional variables are described below.

time

Description: Time vector (int32)

Dimensions:

  1. time

Attributes

| Description | Name | Type | Example |
|---|---|---|---|
| The short name for the variable | standard_name | String | time |
| The long name for the variable | long_name | String | time |
| Time units | units | String | hours since 1970-01-01 00:00:00.0 +0000 |
| Time standard from which times are offset | time_standard | String | UTC |
| Axis label | axis | String | t |

The units can also be days or months since 1970-01-01. They are in UTC by default.
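A minimal sketch of decoding time values from a units string of this form, using only the Python standard library. The function names `parse_time_units` and `decode_times` are hypothetical, and the sketch assumes the origin is written exactly in the format of the example above. Units of months are rejected here because they need the special handling described in NB#1.

```python
from datetime import datetime, timedelta, timezone


def parse_time_units(units: str) -> tuple[str, datetime]:
    """Split '<step> since <origin>' into the step name and the origin."""
    step, since, origin = units.split(" ", 2)
    if since != "since":
        raise ValueError(f"unexpected units string: {units!r}")
    return step, datetime.strptime(origin, "%Y-%m-%d %H:%M:%S.%f %z")


def decode_times(values, units: str) -> list[datetime]:
    """Convert numeric offsets to timezone-aware datetimes (hours or days only)."""
    step, origin = parse_time_units(units)
    if step not in ("hours", "days"):
        raise ValueError("units of months need the special handling in NB#1")
    return [origin + timedelta(**{step: float(v)}) for v in values]


stamps = decode_times([0, 24, 36], "hours since 1970-01-01 00:00:00.0 +0000")
```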

NB#1: Using units of months requires special treatment. When adding months to a given time, the addition method depends on the day of the month of the time units, as follows:

  1. If the day of the month specified in the time units is less than 24, simply add months. E.g. time units are 'months since 1970-02-15 00:00:00.0 +0000'. Adding one month yields a time stamp of 1970-03-15 00:00:00.0 +0000
  2. If the day of month is greater than or equal to 24, the time stamp is calculated by counting back from the end of a given month. E.g. time units are 'months since 1970-02-26 00:00:00.0 +0000'. Adding one month yields a time stamp of 1970-03-29 00:00:00.0 +0000
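The two rules above can be sketched as follows. The function name `add_months` is hypothetical, and the sketch assumes whole numbers of months; it reproduces both worked examples from the list.

```python
import calendar
from datetime import datetime


def add_months(origin: datetime, months: int) -> datetime:
    """Add whole months to the origin given in the time units."""
    total = origin.year * 12 + (origin.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    days_in_target = calendar.monthrange(year, month)[1]
    if origin.day < 24:
        # Rule 1: keep the same day of the month.
        day = origin.day
    else:
        # Rule 2: keep the same distance from the end of the month.
        days_in_origin = calendar.monthrange(origin.year, origin.month)[1]
        day = days_in_target - (days_in_origin - origin.day)
    return origin.replace(year=year, month=month, day=day)


mid_month = add_months(datetime(1970, 2, 15), 1)   # 1970-03-15
late_month = add_months(datetime(1970, 2, 26), 1)  # 1970-03-29
```

Counting back from the month end means that an origin on the last day of a month always maps to the last day of the target month.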

NB#2: When data are not forecasts, the first value should indicate over which period the variables are aggregated - i.e., do use values of zero (see Description of time types).

station_id

Description: Station identification number (int32)

Dimensions:

  1. station

Attributes

| Description | Name | Type | Example |
|---|---|---|---|
| The long name for the variable | long_name | String | station or node identification code |

station_name

Description: Station name (string)

Dimensions:

  1. strLen
  2. station

Attributes

| Description | Name | Type | Example |
|---|---|---|---|
| The long name for the variable | long_name | String | station or node name |

ens_member

Description: Vector of ensemble member indices, running from 1 to the number of ensemble members. Vector has a minimum length of 1. (int32)

Dimensions:

  1. ens_member

Attributes

| Description | Name | Type | Example |
|---|---|---|---|
| The short name for the variable | standard_name | String | ens_member |
| The long name for the variable | long_name | String | ensemble member |
| Units | units | String | member id |
| Axis label | axis | String | u |

lead_time

Description: Vector giving time since a forecast was issued. If the variable is not a forecast, this vector can have length of zero (int32)

Dimensions:

  1. lead_time

Attributes

| Description | Name | Type | Example |
|---|---|---|---|
| The short name for the variable | standard_name | String | lead time |
| The long name for the variable | long_name | String | forecast lead time |
| Units | units | String | hours since time |
| Axis label | axis | String | u |

The units can also be days or months since time of forecast.

  • NB#1: The unit of lead_time does not have to equal the unit of time (e.g. time might be "hours since..." and lead_time can be "months since time").
  • NB#2: Using units of months requires special treatment. See the time variable, above.
  • NB#3: As the time is relative, no time zone is required.
  • NB#4: A value of zero is not expected in this variable for any of the data types described in this specification (see Description of time types).
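As an illustration of how lead_time relates to time, the sketch below computes the time stamps a forecast applies to from its issue time and its lead times. The function name `valid_times` is hypothetical. Rejecting a lead time of zero is a strict reading of NB#4 chosen for this sketch, and units of months are left out because they need the special treatment noted in NB#2.

```python
from datetime import datetime, timedelta, timezone


def valid_times(issue_time: datetime, lead_times, units: str = "hours since time") -> list[datetime]:
    """Return issue_time + lead_time for each lead time (hours or days only)."""
    step = units.split(" ", 1)[0]
    if step not in ("hours", "days"):
        raise ValueError("units of months need special treatment")
    result = []
    for lead in lead_times:
        if lead == 0:
            # Lead time zero coincides with the issue time itself.
            raise ValueError("a lead time of zero is not expected")
        result.append(issue_time + timedelta(**{step: float(lead)}))
    return result


issued = datetime(2010, 8, 1, 9, tzinfo=timezone.utc)
stamps = valid_times(issued, [1, 2, 3])
```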

lat

Description: Vector of latitudes of stations in decimal degrees (single)

Dimensions:

  1. station

Attributes

| Description | Name | Type | Example |
|---|---|---|---|
| The long name for the variable | long_name | String | latitude |
| Units | units | String | degrees_north |
| Axis label | axis | String | y |

lon

Description: Vector of longitudes of stations in decimal degrees (single)

Dimensions:

  1. station

Attributes

| Description | Name | Type | Example |
|---|---|---|---|
| The long name for the variable | long_name | String | longitude |
| Units | units | String | degrees_east |
| Axis label | axis | String | x |

y

Description: Position vector in projected coordinates (single)

Dimensions:

  1. station

Attributes

| Description | Name | Type | Example |
|---|---|---|---|
| The short name for the variable | standard_name | String | northing_GDA94_zone55 |
| The long name for the variable | long_name | String | northing from the GDA94 datum in MGA Zone 55 |
| Axis label | axis | String | y |

x

Description: Position vector in projected coordinates (single)

Dimensions:

  1. station

Attributes

| Description | Name | Type | Example |
|---|---|---|---|
| The short name for the variable | standard_name | String | easting_GDA94_zone55 |
| The long name for the variable | long_name | String | easting from the GDA94 datum in MGA Zone 55 |
| Axis label | axis | String | x |

area

Description: Area over which non-point data apply (e.g. subcatchment area) (single)

Dimensions:

  1. station

Attributes

| Description | Name | Type | Example |
|---|---|---|---|
| The short name for the variable | standard_name | String | area |
| The long name for the variable | long_name | String | station area |
| Units | units | String | sqm |

elevation

Description: Elevation of station (single)

Dimensions:

  1. station

Attributes

| Description | Name | Type | Example |
|---|---|---|---|
| The short name for the variable | standard_name | String | elevation |
| The long name for the variable | long_name | String | station elevation above sea level |
| Units | units | String | m |

pet_obs/rain_obs/q_obs/swe_obs/tmin_obs/tmax_obs/tave_obs

Description: Observed data (double)

pet = potential evapotranspiration; rain = precipitation; q = streamflow; swe = snow water equivalent; tmin = minimum surface air temperature; tmax = maximum surface air temperature; tave = average surface air temperature

Example: q_obs

Dimensions:

  1. lead_time
  2. station
  3. ens_member
  4. time

Attributes

| Description | Name | Type | Example |
|---|---|---|---|
| The long name for the variable | long_name | String | observed rainfall |
| Units | units | String | mm |
| Missing data value | _FillValue | float | -9999f |
| Type of aggregation | type | int | 2 |
| Description of type of aggregation | type_description | String | accumulated over the preceding interval |
| Type of data. Code as follows: "obs" - observed directly; "der" - derived from observations | dat_type | String | der |
| Description of type of data | dat_type_description | String | AWAP data interpolated from observations |
| Location type of data. Takes value of "Point" (e.g. for a rain gauge) or "Area" (e.g. for a subarea). Default value is "Point". | location_type | String | Point |
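A reader of these variables typically needs to treat the _FillValue as missing data before computing statistics. The sketch below shows one way to do this on a plain list of values; the names `mask_fill` and `mean_ignoring_missing` are hypothetical, and the fill value of -9999 is taken from the example in the table above.

```python
FILL_VALUE = -9999.0  # example _FillValue from the attribute table


def mask_fill(values, fill_value: float = FILL_VALUE) -> list:
    """Replace fill values with None so they are not mistaken for data."""
    return [None if v == fill_value else v for v in values]


def mean_ignoring_missing(values, fill_value: float = FILL_VALUE):
    """Average the non-missing values; return None if every value is missing."""
    present = [v for v in mask_fill(values, fill_value) if v is not None]
    if not present:
        return None
    return sum(present) / len(present)


masked = mask_fill([1.5, -9999.0, 0.0, 2.5])
average = mean_ignoring_missing([1.0, -9999.0, 3.0])
```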

pet_sim/rain_sim/q_sim/swe_sim/tmin_sim/tmax_sim/tave_sim

Description: Simulated data (double)

pet = potential evapotranspiration; rain = precipitation; q = streamflow; swe = snow water equivalent; tmin = minimum surface air temperature; tmax = maximum surface air temperature; tave = average surface air temperature

Dimensions:

  1. lead_time
  2. station
  3. ens_member
  4. time

Attributes

| Description | Name | Type | Example |
|---|---|---|---|
| The long name for the variable | long_name | String | simulated rainfall |
| Units | units | String | m3/s |
| Missing data value | _FillValue | float | -9999f |
| Type of aggregation | type | int | 3 |
| Description of type of aggregation | type_description | String | averaged over the preceding interval |
| Type of data. Code as follows: "sim" - simulated from historical forcings; "fct" - forecast | dat_type | String | fct |
| Description of type of data | dat_type_description | String | forecast data |
| Location type of data. Takes value of "Point" (e.g. for a rain gauge) or "Area" (e.g. for a subarea). Default value is "Point". | location_type | String | Point |

[variable]_obs_qul/[variable]_sim_qul

Description: Data quality

Dimensions:

  1. lead_time
  2. station
  3. ens_member
  4. time

Attributes

| Description | Name | Type | Example |
|---|---|---|---|
| The long name for the variable | long_name | String | Quality of observed rainfall |
| Quality code standard | units | String | ABC Quality coding |
| Missing data value | _FillValue | int | -1 |

sv1/sv2/sv[#]

Description: State variables (double)

Dimensions:

  1. lead_time
  2. station
  3. ens_member
  4. time

Attributes

| Description | Name | Type | Example |
|---|---|---|---|
| The long name for the variable | long_name | String | state var 1 |
| Name of model | model_name | String | GR4H_RR |
| Name of state variable in model | sv_name | String | UH_Inflow |
| Description of state variable | sv_description | String | Total inflow to Unit Hydrographs in GR4H |
| Missing data value | _FillValue | float | -9999f |

Description of time types

| Type ID | Description | Example variable |
|---|---|---|
| 1 | instantaneous data | stage height |
| 2 | accumulated over the preceding interval | rainfall |
| 3 | averaged over the preceding interval | flow, average temp |
| 4 | accumulated since start of forecast | flow |
| 5 | point value recorded in the preceding interval | max/min temperature |
| 11* | climatology data - instantaneous data | climatology stage height |
| 12* | climatology data - accumulated over the preceding interval | climatology rainfall |
| 13* | climatology data - averaged over the preceding interval | climatology flow |
| 14* | climatology data - accumulated since start of forecast | climatology flow |
| 15* | climatology data - point value recorded in the preceding interval | climatology max temp |

*NB - please specify the period over which climatology data is calculated and how it is calculated in the global "comment" attribute, as well as any applicable references in the "source" global attribute.
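The type IDs follow a simple pattern: each climatology ID is the corresponding base ID plus 10. A minimal lookup sketch, with the hypothetical function name `describe_time_type`, could use that pattern to produce the value of the "type_description" attribute from the "type" attribute:

```python
AGGREGATION_TYPES = {
    1: "instantaneous data",
    2: "accumulated over the preceding interval",
    3: "averaged over the preceding interval",
    4: "accumulated since start of forecast",
    5: "point value recorded in the preceding interval",
}


def describe_time_type(type_id: int) -> str:
    """Return the description for a time type ID (1-5, or 11-15 for climatology)."""
    is_climatology = type_id > 10
    base = type_id - 10 if is_climatology else type_id
    if base not in AGGREGATION_TYPES:
        raise ValueError(f"unknown time type ID: {type_id}")
    text = AGGREGATION_TYPES[base]
    return f"climatology data - {text}" if is_climatology else text


rainfall_type = describe_time_type(2)
climatology_flow_type = describe_time_type(13)
```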

Description of data types

| Type ID | Description | Example variable |
|---|---|---|
| obs | observed directly | gauged rainfall |
| der | derived from observations | AWAP rainfall |
| sim | simulated from observations | flow simulated by GR4H forced by observations |
| fct | simulated from forecasts | flow forecast by GR4H forced by NWP forecasts |