obsfilter
Filters data from a BUFR file and produces either a Bufr, Geopoints or Table object. A tutorial about the use of obsfilter() can be found here.
Filtering Efficiency
You may filter observations according to a wide variety of parameters or combinations thereof: on date and time, on location (meteorological station, WMO block, user-defined area, proximity to a user-defined line) and on a range of values. Regarding the structure of the input BUFR file, note that some of the filtering parameters, such as observation type, subtype, date and time, are located in the header part of the BUFR message, whilst others are located in the data part of the message. Filtering on header parameters therefore does not require decoding the remaining information and is considerably (about 10 times) faster. Internally, filtering is always done first on the header parameters (if specified).
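The header/data distinction can be checked in a script before running a filter on a large file. The sketch below is a hypothetical helper, not part of Metview. It assumes that only the observation type, subtype, date and time keywords are resolved from the header, that date and time count as header filters only when date_and_time_from is “metadata”, and that every other filter keyword needs the data section. The observation type code and the date are placeholder values.

```python
# Hypothetical helper (not part of Metview): reports whether a set of
# obsfilter() keyword arguments relies on header parameters only.

# Assumption: these keywords are evaluated from the BUFR header section.
HEADER_KEYS = {
    "observation_types", "observation_subtypes",
    "date", "time", "resolution_in_mins",
}

# Keywords that describe input, output or error handling, not a filter condition.
NON_FILTER_KEYS = {
    "data", "output", "parameter", "missing_data", "missing_data_value",
    "date_and_time_from", "fail_on_error", "fail_on_empty_output",
}


def uses_header_only(request):
    """Return True if every filter condition in `request` is a header filter."""
    from_data = request.get("date_and_time_from", "metadata") == "data"
    if from_data and ("date" in request or "time" in request):
        return False  # date/time would be read from the data section
    # Filters switched off with "none" do not trigger any decoding.
    filters = {
        key for key, value in request.items()
        if key not in NON_FILTER_KEYS and value != "none"
    }
    return filters <= HEADER_KEYS


# Header filters only: type, date and time (placeholder values).
fast = {"output": "bufr", "observation_types": 0,
        "date": 20230115, "time": 1200, "resolution_in_mins": 30}

# Adding an area filter requires decoding the data section as well.
slow = {**fast, "location_filter": "area", "area": [60, -12, 50, 3]}

print(uses_header_only(fast), uses_header_only(slow))
```

A request for which the helper returns True is a candidate for the faster header-only path described above; the helper itself does not measure or guarantee any speed-up.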
Note
This function performs the same task as the Observation Filter icon in Metview’s user interface. It accepts its parameters as keyword arguments, described below.
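As an illustration of the keyword-argument style, here is a minimal sketch that assumes the Metview Python package is installed and imported as metview. The file name, date and time are placeholders, and "airTemperatureAt2M" is assumed to be the ecCodes keyname corresponding to the default descriptor 012004.

```python
# Keyword arguments for obsfilter(); the names are taken from the parameter
# list of this page, the values are placeholders for illustration.
T2M_REQUEST = {
    "output": "geopoints",
    "parameter": "airTemperatureAt2M",  # assumed keyname for descriptor 012004
    "level": "surface",
    "date": 20230115,  # YYYYMMDD
    "time": 1200,      # HHMM
    "resolution_in_mins": 30,
}


def filter_bufr(bufr_path, request):
    """Read a BUFR file and apply obsfilter() with the given keyword arguments."""
    # Imported lazily so that this module can be loaded without Metview.
    import metview as mv

    obs = mv.read(bufr_path)  # reading a BUFR file yields a Bufr object
    return mv.obsfilter(data=obs, **request)
```

With Metview available, calling filter_bufr("synop.bufr", T2M_REQUEST) on a file of SYNOP observations (hypothetical file name) would be expected to return a Geopoints object, since output is “geopoints”.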
- obsfilter(**kwargs)
Filters and extracts BUFR data.
- Parameters
data (Bufr) – The input data.
output ({"bufr", "geopoints", "polar_vector", "xy_vector", "ncols", "csv"}, default: "bufr") – Specifies the output format. The possible options are as follows:
“bufr”: the output is a new Bufr. Messages containing subsets are split, with each subset forming an individual message in the output. Not all the filtering options are available in this mode.
“geopoints”: the output is a standard 6-column Geopoints.
“polar_vector”: the output is a Geopoints with polar vector data.
“xy_vector”: the output is a Geopoints with xy vector data.
“ncols”: the output is a Geopoints with an arbitrary number of data columns.
“csv”: the output is a Table containing only the parameters specified in parameter (no location, date, time or level information is extracted).
parameter (str or list[str], default: "012004") – Specifies the parameters to be extracted by their ecCodes BUFR keynames. For compatibility the BUFR descriptors can still be given here but their usage is discouraged. Available when output is not “bufr”.
missing_data ({"ignore", "include"}, default: "ignore") – If set to “ignore”, missing data is not included in the output file. If set to “include”, missing data will be written to the output file, its value being set to that specified by missing_data_value. Note that when output is one of the two geopoints vector formats, the test for missing data is only performed on the first parameter. Available when output is not “bufr”.
missing_data_value (number, default: 3.0e+38) – Any missing observations will be written as this value. It is wise, therefore, to ensure that this value is outside the range of possible values for the requested parameter(s). Note that when output is one of the two geopoints vector formats, the test for missing data is only performed on the first parameter. Available when output is not “bufr” and missing_data is “include”.
level (str, default: "surface") –
The possible values are as follows:
“surface”: use this for surface observations (e.g. SYNOP)
“single”: defines a level filter by a single pressure value (e.g. TEMP)
“thickness”: defines a pressure layer filter (e.g. TEMP)
“occurrence”: defines a filter based on the occurrence of parameter within a BUFR message/subset
“descriptor_value”: defines a filter based on the value of the level_descriptor parameter
“descriptor_range”: defines a filter based on the value range of the level_descriptor parameter
Available when output is not “bufr”.
level_descriptor (str, default: "07004") – Specifies the parameter defining the level when level is “descriptor_value” or “descriptor_range”.
first_level (str, default: "30") – Specifies the first value for the level filter. If level is “single” or “thickness” this must be a pressure value given in hPa. If level is “thickness” this defines the bottom of the layer (towards the surface). If level is “descriptor_range” it sets the minimum of the range.
second_level (str, default: "10") – Specifies the second value for the level filter. If level is “thickness” this must be a pressure value given in hPa at the top of the layer. If level is “descriptor_range” it sets the maximum of the range.
occurrence_index (number, default: 1) – Specifies the numerical index of a parameter that has several values within one observation (e.g. cloud amount on different levels or water temperature at different depths). Available if level is set to “occurrence”.
observation_types (number or str or list of these, default: "-1") – Specifies the numerical code or text string for the desired observation type.
observation_subtypes (number or str or list of these, default: "-1") – Specifies the numerical code or text string for the desired observation subtype. Note that institutions are free to define their own subtypes, so these are not an international standard.
date_and_time_from ({"metadata", "data"}, default: "metadata") – Specifies if date and time should be taken from the BUFR header section (“metadata”) or from the data section (“data”).
date (number) – Specifies the date of the observation(s) in YYYYMMDD format. Relative dates are allowed, e.g. -1 (yesterday). Specifying a value for date requires setting a value for time.
time (number) – Specifies the time of the observation(s). The required format is HHMM.
resolution_in_mins (number, default: 0) – Specifies a time window in minutes around the value chosen for time.
wmo_blocks (number or str or list of these, default: "any") – Specifies one or more WMO block numbers, each of which identifies a geographical region, e.g. “02” for Sweden and Finland, 16 for Italy and Greece.
wmo_stations (number or str or list of these, default: "any") – Specifies a list of WMO stations, using the five-digit station identifier (the first two digits of which are the WMO block number).
location_filter ({"none", "area", "line"}, default: "none") – Specifies a location filter.
area (list, default: [60, -12, 50, 3]) – Specifies the coordinates of the area of interest in the form [North, West, South, East]. Enabled when location_filter is “area”.
line (list, default: [40, -5, 60, 25]) – Specifies the coordinates of a transect line in [lat1, lon1, lat2, lon2] format. This selects all the observations close enough to the line; how close is defined by delta_in_km. Enabled when location_filter is “line”.
delta_in_km (number, default: 50) – Specifies the width, in km, of the cross section line defined in line.
custom_filter ({"none", "value", "range", "exclude"}, default: "none") – Filters observations by the value of custom_parameter. You can select observations equal to a value (option “value”) or within/outside a given range of values (options “range” or “exclude”).
custom_parameter (str, default: "01007") – Specifies the parameter for custom_filter. Use an ecCodes BUFR keyname here. For compatibility a BUFR descriptor can still be given here but its usage is discouraged.
custom_values (number or list[number], default: 200) – Specifies the value condition for custom_filter. You may specify a list of values here. If custom_filter is “range” or “exclude” you need to specify a list with two elements here.
fail_on_error ({"yes", "no"}, default: "yes") – Controls the behaviour when an error happens during the data filtering/extraction. If it is “yes”, the Python script running obsfilter() will abort. If it is set to “no”, obsfilter() will return a partial or empty result; in the latter case fail_on_empty_output applies.
fail_on_empty_output ({"yes", "no"}, default: "no") – Controls the behaviour when the resulting output is empty. If the value is “yes”, the Python script running obsfilter() will abort. If it is set to “no”, obsfilter() will return an empty result, which is a properly formatted empty Geopoints or CSV file, or a zero-sized file when the output is “bufr”.
- Return type
Bufr, Geopoints or Table
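Several of the parameters above depend on each other: area and line are only enabled for the matching location_filter, and custom_values needs exactly two elements for “range” and “exclude”. The sketch below shows one way to assemble consistent keyword arguments before calling obsfilter(). Both helper functions are hypothetical and not part of Metview; the check that North is not smaller than South is an added assumption, and "airTemperatureAt2M" is assumed to be the ecCodes keyname for descriptor 012004.

```python
# Hypothetical helpers (not part of Metview) that assemble mutually
# consistent keyword arguments for obsfilter().

def location_kwargs(area=None, line=None, delta_in_km=50):
    """Build the location filter arguments from an area or a transect line."""
    if area is not None and line is not None:
        raise ValueError("choose either an area or a line, not both")
    if area is not None:
        north, west, south, east = area
        if north < south:  # assumption: a valid area has North >= South
            raise ValueError("area must be given as [North, West, South, East]")
        return {"location_filter": "area", "area": [north, west, south, east]}
    if line is not None:
        lat1, lon1, lat2, lon2 = line
        return {"location_filter": "line",
                "line": [lat1, lon1, lat2, lon2],
                "delta_in_km": delta_in_km}
    return {"location_filter": "none"}


def custom_kwargs(mode, parameter, values):
    """Build the custom filter arguments, checking the number of values."""
    values = list(values) if isinstance(values, (list, tuple)) else [values]
    if mode in ("range", "exclude") and len(values) != 2:
        raise ValueError(f"custom_filter '{mode}' needs exactly two values")
    return {"custom_filter": mode,
            "custom_parameter": parameter,
            "custom_values": values}


# Temperatures between 263.15 K and 273.15 K inside the default area.
request = {
    "output": "geopoints",
    "parameter": "airTemperatureAt2M",  # assumed ecCodes keyname
    **location_kwargs(area=[60, -12, 50, 3]),
    **custom_kwargs("range", "airTemperatureAt2M", [263.15, 273.15]),
}
```

The resulting dictionary can be passed on as obsfilter(data=..., **request), where data is a Bufr object.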