obsfilter

Filters data from a BUFR file and produces either a Bufr, Geopoints or Table object as output. A tutorial on the use of obsfilter() is also available.

Filtering Efficiency

You may filter observations according to a wide variety of parameters or combinations thereof: on date and time, on location (meteorological station, WMO block, user-defined area, proximity to a user-defined line) and on a range of values. Regarding the structure of the input BUFR file, note that some of the filtering parameters, such as observation type, subtype, date and time, are located in the header part of the BUFR message, whilst others are located in the data part of the message. Filtering on header parameters therefore does not require decoding the rest of the message and is considerably (about 10 times) faster. Internally, filtering is always done first on the header parameters (if specified).
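The header-first ordering described above can be pictured with a small, self-contained sketch. The `Message` class and the `two_stage_filter` function are hypothetical stand-ins invented for illustration; they are not Metview types and make no claim about how obsfilter() is implemented internally.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for one BUFR message (not a Metview type)."""
    obs_type: int          # header field, readable without decoding
    payload: dict          # data section, assumed expensive to decode
    decoded: bool = False  # records whether decode() was ever called

    def decode(self) -> dict:
        self.decoded = True
        return self.payload


def two_stage_filter(messages, obs_type, key, lo, hi):
    """Keep messages of one type whose decoded `key` lies in [lo, hi]."""
    kept = []
    for msg in messages:
        # Stage 1: cheap header test; rejected messages are never decoded.
        if msg.obs_type != obs_type:
            continue
        # Stage 2: decode only the survivors and test the data value.
        value = msg.decode().get(key)
        if value is not None and lo <= value <= hi:
            kept.append(msg)
    return kept


messages = [
    Message(0, {"airTemperature": 280.0}),
    Message(2, {"airTemperature": 250.0}),
    Message(0, {"airTemperature": 300.0}),
]
selected = two_stage_filter(messages, obs_type=0, key="airTemperature",
                            lo=270.0, hi=290.0)
decoded_count = sum(m.decoded for m in messages)
```

In this toy data set only the two messages that pass the header test are decoded, which is the source of the speed-up when a header filter is specified.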

Note

This function performs the same task as the Observation Filter icon in Metview’s user interface. It accepts its parameters as keyword arguments, described below.

obsfilter(**kwargs)

Filters and extracts BUFR data.
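A minimal usage sketch follows. It assumes a working Metview installation, a hypothetical BUFR file path supplied by the caller, and that `airTemperatureAt2M` is the ecCodes keyname matching the default descriptor 012004. The helper names are illustrative only; Metview is imported lazily so that the keyword-building part can be inspected without it.

```python
def basic_filter_kwargs(parameter="airTemperatureAt2M"):
    """Keyword arguments for extracting one parameter as geopoints."""
    return {
        "output": "geopoints",     # standard 6-column Geopoints
        "parameter": parameter,    # ecCodes BUFR keyname (assumed valid)
        "missing_data": "ignore",  # drop observations with missing values
    }


def run_basic_filter(path):
    """Read a BUFR file and filter it (needs a Metview installation)."""
    import metview as mv  # lazy import: only needed when actually filtering

    bufr = mv.read(path)  # `path` is a hypothetical BUFR file name
    return mv.obsfilter(data=bufr, **basic_filter_kwargs())


kwargs = basic_filter_kwargs()
```

Calling `run_basic_filter("obs.bufr")` with a real file would return a Geopoints object, since `output` is set to "geopoints".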

Parameters
  • data (Bufr) – The input data.

  • output ({"bufr", "geopoints", "polar_vector", "xy_vector", "ncols", "csv"}, default: "bufr") –

    Specifies the output format. The possible options are as follows:

    • “bufr”: the output is a new Bufr. Messages containing subsets are split with each subset forming an individual message in the output. Not all the filtering options are available in this mode.

    • “geopoints”: the output is a standard 6-column Geopoints.

    • “polar_vector”: the output is a Geopoints with polar vector data.

    • “xy_vector”: the output is a Geopoints with xy vector data.

    • “ncols”: the output is a Geopoints with an arbitrary number of data columns.

    • “csv”: the output is a Table containing only the parameters specified in parameter (no location, date, time or level information is extracted).

  • parameter (str or list[str], default: "012004") – Specifies the parameters to be extracted by their ecCodes BUFR keynames. For compatibility the BUFR descriptors can still be given here but their usage is discouraged. Available when output is not “bufr”.

  • missing_data ({"ignore", "include"}, default: "ignore") – If set to “ignore”, missing data is not included in the output file. If set to “include”, missing data will be written to the output file, its value being set to that specified by missing_data_value. Note that when output is one of the two geopoints vector formats, the test for missing data is only performed on the first parameter. Available when output is not “bufr”.

  • missing_data_value (number, default: 3.0e+38) – Any missing observations will be written as this value. It is wise, therefore, to ensure that this value is outside the range of possible values for the requested parameter(s). Note that when output is one of the two geopoints vector formats, the test for missing data is only performed on the first parameter. Available when output is not “bufr” and missing_data is “include”.

  • level (str, default: "surface") –

    The possible values are as follows:

    • “surface”: use this for surface observations (e.g. SYNOP)

    • “single”: defines level filter by a single pressure value (e.g. TEMP)

    • “thickness”: defines a pressure layer filter (e.g. TEMP)

    • “occurrence”: defines a filter based on the occurrence of parameter within a BUFR message/subset

    • “descriptor_value”: defines a filter based on the value of the level_descriptor parameter.

    • “descriptor_range”: defines a filter based on the value range of the level_descriptor parameter.

    Available when output is not “bufr”.

  • level_descriptor (str, default: "07004") – Specifies the parameter defining the level when level is “descriptor_value” or “descriptor_range”.

  • first_level (str, default: "30") – Specifies the first value for the level filter. If level is “single” or “thickness” this must be a pressure value given in hPa. If level is “thickness” this defines the bottom of the layer (towards the surface). If level is “descriptor_range” it sets the minimum of the range.

  • second_level (str, default: "10") – Specifies the second value for the level filter. If level is “thickness” this must be a pressure value given in hPa at the top of the layer. If level is “descriptor_range” it sets the maximum of the range.

  • occurrence_index (number, default: 1) – Specifies the numerical index of a parameter that has several values within one observation (e.g. cloud amount on different levels or water temperature at different depths). Available if level is set to “occurrence”.

  • observation_types (number or str or list of these, default: "-1") – Specifies the numerical code or text string for the desired observation type.

  • observation_subtypes (number or str or list of these, default: "-1") – Specifies the numerical code or text string for the desired observation subtype. Note that institutions are free to define their own subtypes, so these are not an international standard.

  • date_and_time_from ({"metadata", "data"}, default: "metadata") – Specifies if date and time should be taken from the BUFR header section (“metadata”) or from the data section (“data”).

  • date (number) – Specifies the observation(s) date in YYYYMMDD format. Relative dates are allowed: e.g. -1 (yesterday). Specifying a value for date requires setting a value for time.

  • time (number) – Specifies the time of the observation(s). The required format is HHMM.

  • resolution_in_mins (number, default: 0) – Specifies a time window in minutes around the value chosen for time.

  • wmo_blocks (number or str or list of these, default: "any") – Specifies one or more WMO block numbers, each of which identifies a geographical region, e.g. “02” for Sweden and Finland, “16” for Italy and Greece.

  • wmo_stations (number or str or list of these, default: "any") – Specifies a list of WMO stations, using the five-digit station identifier (the first two digits of which are the WMO block number).

  • location_filter ({"none", "area", "line"}, default: "none") – Specifies a location filter.

  • area (list, default: [60, -12, 50, 3]) – Specifies the coordinates of the area of interest in the form of [North, West, South, East]. Enabled when location_filter is “area”.

  • line (list, default: [40, -5, 60, 25]) – Specifies the coordinates of a transect line in [lat1, lon1, lat2, lon2] format. This selects all the observations close enough to the line; how close is defined by delta_in_km. Enabled when location_filter is “line”.

  • delta_in_km (number, default: 50) – Specifies the width, in km, of the band around the transect defined in line.

  • custom_filter ({"none", "value", "range", "exclude"}, default: "none") – Filters observations by the value of custom_parameter. You can select observations equal to a value (option “value”) or within/outside a given range of values (options “range” or “exclude”).

  • custom_parameter (str, default: "01007") – Specifies the parameter for custom_filter. Use an ecCodes BUFR keyname here. For compatibility a BUFR descriptor can still be given here but its usage is discouraged.

  • custom_values (number or list[number], default: 200) – Specifies the value condition for custom_filter. You may specify a list of values here. If custom_filter is “range” or “exclude” you need to specify a list with two elements here.

  • fail_on_error ({"yes", "no"}, default: "yes") – Controls the behaviour when an error happens during the data filtering/extraction. If it is “yes” the Python script running obsfilter() will abort. If it is set to “no” obsfilter() will return a partial or empty result; in the latter case fail_on_empty_output applies.

  • fail_on_empty_output ({"yes", "no"}, default: "no") – Controls the behaviour when the resulting output is empty. If the value is “yes” the Python script running obsfilter() will abort. If it is set to “no” obsfilter() will return an empty result, which is a properly formatted empty Geopoints or CSV file, or a zero-sized file when the output is “bufr”.
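Several of the parameters above depend on one another. The sketch below collects the dependencies stated in the descriptions into a hypothetical checker, `check_obsfilter_kwargs`. It is not part of Metview, covers only a subset of the documented rules, and says nothing about which checks obsfilter() performs itself. The sample keyword values (pressure levels, date, parameter name) are illustrative assumptions.

```python
def check_obsfilter_kwargs(kwargs):
    """Return a list of problems found in a dict of obsfilter() keywords.

    Hypothetical helper: it only encodes rules stated in the parameter
    descriptions and is not part of Metview.
    """
    problems = []
    output = kwargs.get("output", "bufr")
    if output == "bufr":
        # These options are documented as available only for non-BUFR output.
        for name in ("parameter", "missing_data", "level"):
            if name in kwargs:
                problems.append(f"{name} is only available when output is not 'bufr'")
    if "date" in kwargs and "time" not in kwargs:
        problems.append("date requires time")
    if kwargs.get("custom_filter") in ("range", "exclude"):
        values = kwargs.get("custom_values")
        if not isinstance(values, (list, tuple)) or len(values) != 2:
            problems.append("custom_values must have two elements for range/exclude")
    location = kwargs.get("location_filter", "none")
    for name in ("area", "line"):
        if name in kwargs and location != name:
            problems.append(f"{name} is only used when location_filter is '{name}'")
    return problems


# A pressure layer from 850 hPa (bottom) to 500 hPa (top), within 30 minutes
# of 12:00 on an example date, restricted to an area.
layer_request = {
    "output": "geopoints",
    "parameter": "airTemperature",
    "level": "thickness",
    "first_level": "850",
    "second_level": "500",
    "date": 20240101,
    "time": 1200,
    "resolution_in_mins": 30,
    "location_filter": "area",
    "area": [60, -12, 50, 3],
}

# A request that breaks four of the documented rules.
broken_request = {
    "output": "bufr",
    "parameter": "airTemperature",
    "date": -1,
    "custom_filter": "range",
    "custom_values": 200,
    "area": [60, -12, 50, 3],
}

layer_problems = check_obsfilter_kwargs(layer_request)
broken_problems = check_obsfilter_kwargs(broken_request)
```

A dictionary that passes the check can be handed to the function as `mv.obsfilter(data=bufr, **layer_request)`, where `bufr` is a Bufr object read beforehand.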

Return type

Bufr, Geopoints or Table
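Which of the three types is returned follows from the output parameter. The mapping below restates the option descriptions as a lookup table; `RETURN_TYPE_BY_OUTPUT` and `expected_return_type` are hypothetical names used for illustration, and the values are type names as strings rather than Metview classes.

```python
# Return type implied by each `output` option, as described above.
RETURN_TYPE_BY_OUTPUT = {
    "bufr": "Bufr",
    "geopoints": "Geopoints",
    "polar_vector": "Geopoints",
    "xy_vector": "Geopoints",
    "ncols": "Geopoints",
    "csv": "Table",
}


def expected_return_type(output="bufr"):
    """Name of the type obsfilter() is documented to return for `output`."""
    return RETURN_TYPE_BY_OUTPUT[output]


csv_type = expected_return_type("csv")
default_type = expected_return_type()
```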