obsfilter
Filters data from a BUFR file and produces a Bufr, Geopoints or Table object as output. A tutorial about the use of obsfilter() can be found here.
Filtering Efficiency
You may filter observations on a wide variety of parameters or combinations thereof: date and time, location (meteorological station, WMO block, user-defined area, proximity to a user-defined line) and range of values. Regarding the structure of the input BUFR file, note that some of the filtering parameters, such as observation type, subtype, date and time, are located in the header part of the BUFR message, whilst others are located in the data part of the message. This means that filtering on header parameters does not require decoding the rest of the message and is therefore considerably (about 10 times) faster. Internally, filtering is always done first on the header parameters (if specified).
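As a rough illustration of the header/data distinction described above, the sketch below classifies a set of keyword arguments as "header only" or not. The membership of HEADER_KEYS and NON_FILTER_KEYS, the helper names is_header_only() and run(), and the keyname airTemperatureAt2M used in the data-section condition are assumptions made for this example; they are not taken from the Metview implementation.

```python
# Keyword arguments assumed (for this sketch) to be answered from the BUFR
# header alone: observation type, subtype, date and time.
HEADER_KEYS = {"observation_types", "observation_subtypes",
               "date", "time", "resolution_in_mins"}

# Keys that describe input/output handling rather than a filter condition.
NON_FILTER_KEYS = {"data", "output", "fail_on_error",
                   "fail_on_empty_output", "date_and_time_from"}


def is_header_only(request):
    """Return True if every filter condition can be checked on the header."""
    # With date_and_time_from="data", date and time come from the data section.
    if request.get("date_and_time_from", "metadata") == "data" and (
        "date" in request or "time" in request
    ):
        return False
    filter_keys = set(request) - NON_FILTER_KEYS
    return filter_keys <= HEADER_KEYS


# Header-only request: reports within 30 minutes of 12:00.
header_request = {"output": "bufr", "date": 20240101, "time": 1200,
                  "resolution_in_mins": 30}

# Adding a value-range condition requires the data section to be decoded.
data_request = dict(header_request,
                    custom_filter="range",
                    custom_parameter="airTemperatureAt2M",
                    custom_values=[270, 280])


def run(path, request):
    """Apply a request to a BUFR file (requires Metview to be installed)."""
    import metview as mv  # imported lazily so the checks work without Metview
    return mv.obsfilter(data=mv.read(path), **request)


print(is_header_only(header_request))
print(is_header_only(data_request))
```

A request for which is_header_only() returns True should, by the reasoning above, be the cheap case; the factor of about 10 is the figure quoted in the text, not something this sketch measures.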
Note
This function performs the same task as the Observation Filter icon in Metview’s user interface. It accepts its parameters as keyword arguments, described below.
- obsfilter(**kwargs)
Filters and extracts BUFR data.
- Parameters
data (Bufr) – The input data.
output ({"bufr", "geopoints", "polar_vector", "xy_vector", "ncols", "csv"}, default: "bufr") –
Specifies the output format. The possible options are as follows:
“bufr”: the output is a new Bufr. Messages containing subsets are split, with each subset forming an individual message in the output. Not all the filtering options are available in this mode.
“geopoints”: the output is a standard 6-column Geopoints.
“polar_vector”: the output is a Geopoints with polar vector data.
“xy_vector”: the output is a Geopoints with xy vector data.
“ncols”: the output is a Geopoints with an arbitrary number of data columns.
“csv”: the output is a Table containing only the parameters specified in parameter (no location, date, time or level information is extracted).
parameter (str or list[str], default: "012004") – Specifies the parameters to be extracted by their ecCodes BUFR keynames. For compatibility, BUFR descriptors can still be given here, but their usage is discouraged. Available when output is not “bufr”.
missing_data ({"ignore", "include"}, default: "ignore") – If set to “ignore”, missing data is not included in the output file. If set to “include”, missing data is written to the output file with the value specified by missing_data_value. Note that when output is one of the two geopoints vector formats, the test for missing data is only performed on the first parameter. Available when output is not “bufr”.
missing_data_value (number, default: 3.0e+38) – Any missing observations will be written as this value. It is wise, therefore, to ensure that this value is outside the range of possible values for the requested parameter(s). Note that when output is one of the two geopoints vector formats, the test for missing data is only performed on the first parameter. Available when output is not “bufr” and missing_data is “include”.
level (str, default: "surface") –
The possible values are as follows:
“surface”: use this for surface observations (e.g. SYNOP).
“single”: defines a level filter by a single pressure value (e.g. TEMP).
“thickness”: defines a pressure layer filter (e.g. TEMP).
“occurrence”: defines a filter based on the occurrence of parameter within a BUFR message/subset.
“descriptor_value”: defines a filter based on the value of the level_descriptor parameter.
“descriptor_range”: defines a filter based on the value range of the level_descriptor parameter.
Available when output is not “bufr”.
level_descriptor (str, default: "07004") – Specifies the parameter defining the level when level is “descriptor_value” or “descriptor_range”.
first_level (str, default: "30") – Specifies the first value for the level filter. If level is “single” or “thickness” this must be a pressure value given in hPa. If level is “thickness” this defines the bottom of the layer (towards the surface). If level is “descriptor_range” it sets the minimum of the range.
second_level (str, default: "10") – Specifies the second value for the level filter. If level is “thickness” this must be a pressure value given in hPa at the top of the layer. If level is “descriptor_range” it sets the maximum of the range.
occurrence_index (number, default: 1) – Specifies the numerical index of a parameter that has several values within one observation (e.g. cloud amount on different levels or water temperature at different depths). Available if level is set to “occurrence”.
observation_types (number or str or list of these, default: "-1") – Specifies the numerical code or text string for the desired observation type.
observation_subtypes (number or str or list of these, default: "-1") – Specifies the numerical code or text string for the desired observation subtype. Note that institutions are free to define their own subtypes, so these are not an international standard.
date_and_time_from ({"metadata", "data"}, default: "metadata") – Specifies if date and time should be taken from the BUFR header section (“metadata”) or from the data section (“data”).
date (number) – Specifies the observation date(s) in YYYYMMDD format. Relative dates are allowed, e.g. -1 (yesterday). Specifying a value for date requires setting a value for time.
time (number) – Specifies the time of the observation(s). The required format is HHMM.
resolution_in_mins (number, default: 0) – Specifies a time window in minutes around the value chosen for time.
wmo_blocks (number or str or list of these, default: "any") – Specifies a WMO block number, which identifies a geographical region, e.g. ‘02’ for Sweden and Finland, 16 for Italy and Greece.
wmo_stations (number or str or list of these, default: "any") – Specifies a list of WMO stations, using the five-digit station identifier (the first two digits of which are the WMO block number).
location_filter ({"none", "area", "line"}, default: "none") – Specifies a location filter.
area (list, default: [60, -12, 50, 3]) – Specifies the coordinates of the area of interest in the form [North, West, South, East]. Enabled when location_filter is “area”.
line (list, default: [40, -5, 60, 25]) – Specifies the coordinates of a transect line in [lat1, lon1, lat2, lon2] format. This selects all the observations close enough to the line; how close is defined by delta_in_km. Enabled when location_filter is “line”.
delta_in_km (number, default: 50) – Specifies the width in km of the cross section line defined in line.
custom_filter ({"none", "value", "range", "exclude"}, default: "none") – Filters observations by the value of custom_parameter. You can select observations equal to a value (option “value”) or within/outside a given range of values (options “range” or “exclude”).
custom_parameter (str, default: "01007") – Specifies the parameter for custom_filter. Use an ecCodes BUFR keyname here. For compatibility, a BUFR descriptor can still be given here, but its usage is discouraged.
custom_values (number or list[number], default: 200) – Specifies the value condition for custom_filter. You may specify a list of values here. If custom_filter is “range” or “exclude” you need to specify a list with two elements here.
fail_on_error ({"yes", "no"}, default: "yes") – Controls the behaviour when an error happens during the data filtering/extraction. If it is “yes”, the Python script running obsfilter() will abort. If it is set to “no”, obsfilter() will return a partial or empty result; in the latter case fail_on_empty_output applies.
fail_on_empty_output ({"yes", "no"}, default: "no") – Controls the behaviour when the resulting output is empty. If the value is “yes”, the Python script running obsfilter() will abort. If it is set to “no”, obsfilter() will return an empty result, which is a properly formatted empty Geopoints or CSV file, or a zero-sized file when output is “bufr”.
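The parameters above might be combined as in the following minimal sketch, which extracts 2 m temperature from surface reports inside an area. The file name synop.bufr, the keyname airTemperatureAt2M, the date used, and the helpers check_request() and extract() are assumptions made for illustration; the checks only mirror a few constraints stated in the parameter descriptions and are not part of Metview.

```python
def check_request(request):
    """Return a list of problems found in an obsfilter() keyword dict."""
    problems = []
    # The date description says a date must be accompanied by a time.
    if "date" in request and "time" not in request:
        problems.append("date given without time")
    # "range" and "exclude" need a list with exactly two values.
    if request.get("custom_filter") in ("range", "exclude"):
        values = request.get("custom_values")
        if not isinstance(values, (list, tuple)) or len(values) != 2:
            problems.append("custom_values must have two elements")
    # parameter is documented as available only when output is not "bufr".
    if request.get("output", "bufr") == "bufr" and "parameter" in request:
        problems.append("parameter is not available for bufr output")
    return problems


# Surface 2 m temperature within 30 minutes of 12:00, inside an area.
t2m_request = {
    "output": "geopoints",
    "parameter": "airTemperatureAt2M",   # assumed ecCodes BUFR keyname
    "level": "surface",
    "date": 20240101,
    "time": 1200,
    "resolution_in_mins": 30,
    "location_filter": "area",
    "area": [60, -12, 50, 3],            # [North, West, South, East]
    "fail_on_empty_output": "no",
}

# Deliberately inconsistent request, used to exercise the checks.
bad_request = {"output": "geopoints", "date": 20240101,
               "custom_filter": "range", "custom_values": 200}


def extract(path, request):
    """Validate the request, then run obsfilter() on the BUFR file at path."""
    problems = check_request(request)
    if problems:
        raise ValueError("; ".join(problems))
    import metview as mv  # imported lazily so the checks work without Metview
    return mv.obsfilter(data=mv.read(path), **request)


print(check_request(t2m_request))
print(check_request(bad_request))
```

With Metview installed, extract("synop.bufr", t2m_request) would return a Geopoints object; because fail_on_empty_output is “no”, an empty result should be returned rather than aborting the script when nothing matches.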
- Return type