Basic Filtering¶
The Filter Object¶
The core object in GCMFilters is the gcm_filters.Filter
object. Its full documentation below enumerates all possible options.
 gcm_filters.Filter(filter_scale: float, dx_min: float, filter_shape: gcm_filters.filter.FilterShape = FilterShape.GAUSSIAN, transition_width: float = 3.141592653589793, ndim: int = 2, n_steps: int = 0, grid_type: gcm_filters.kernels.GridType = GridType.REGULAR, grid_vars: dict = <factory>) None [source]¶
A class for applying diffusionbased smoothing filters to gridded data.
 filter_scalefloat
The filter scale, which has different meaning depending on filter shape
 dx_minfloat
The smallest grid spacing. Should have same units as
filter_scale
 n_stepsint, optional
Number of total steps in the filter (A biharmonic step counts as two steps)
n_steps == 0
means the number of steps is chosen automatically filter_shapeFilterShape
FilterShape.GAUSSIAN
: The target filter has shape \(e^{(k filter_scale)^2/24}\)FilterShape.TAPER
: The target filter has target grid scale Lf. Smaller scales are zeroed out. Scales larger thanpi * filter_scale / 2
are left asis. In between is a smooth transition.
 transition_widthfloat, optional
Width of the transition region in the “Taper” filter. Theoretical minimum is 1; not recommended.
 ndimint, optional
Laplacian is applied on a grid of dimension ndim
 grid_typeGridType
what sort of grid we are dealing with
 grid_varsdict
dictionary of extra parameters used to initialize the grid Laplacian
 gcm_filters.filter_spec¶
 Type
FilterSpec
Details related to filter_scale
, filter_shape
, transition_width
, and n_steps
can be found in the Filter Theory.
The following sections explain the options for grid_type
and grid_vars
in more detail.
Grid types¶
To define a filter, we need to pick a grid and associated Laplacian that matches our data. The currently implemented grid types are:
In [1]: import gcm_filters
In [2]: list(gcm_filters.GridType)
Out[2]:
[<GridType.REGULAR: 1>,
<GridType.REGULAR_AREA_WEIGHTED: 2>,
<GridType.REGULAR_WITH_LAND: 3>,
<GridType.REGULAR_WITH_LAND_AREA_WEIGHTED: 4>,
<GridType.IRREGULAR_WITH_LAND: 5>,
<GridType.MOM5U: 6>,
<GridType.MOM5T: 7>,
<GridType.TRIPOLAR_REGULAR_WITH_LAND_AREA_WEIGHTED: 8>,
<GridType.TRIPOLAR_POP_WITH_LAND: 9>,
<GridType.VECTOR_C_GRID: 10>]
This list will grow as we implement more Laplacians.
The following table provides an overview of these different grid type options: what grid they are suitable for, whether they handle land (i.e., continental boundaries), what boundary condition the Laplacian operators use, and whether they come with a scalar or vector Laplacian. You can also find links to example usages.

Grid 
Handles land 
Boundary condition 
Laplacian type 
Example 


Cartesian grid 
no 
periodic 
Scalar Laplacian 


Cartesian grid 
yes 
periodic 
Scalar Laplacian 
see below 

locally orthogonal grid 
yes 
periodic 
Scalar Laplacian 
Example: Different filter types; Example: Filtering on a tripole grid 

Velocitypoint on Arakawa BGrid 
yes 
periodic 
Scalar Laplacian 


Tracerpoint on Arakawa BGrid 
yes 
periodic 
Scalar Laplacian 


locally orthogonal grid 
yes 
tripole 
Scalar Laplacian 


yes 
periodic 
Vector Laplacian 
Grid types with scalar Laplacians can be used for filtering scalar fields (such as temperature), and grid types with vector Laplacians can be used for filtering vector fields (such as velocity).
Grid types for simple fixed factor filtering¶
The remaining grid types are for a special type of filtering: simple fixed factor filtering to achieve a fixed coarsening factor (see also the Filter Theory). If you specify one of the following grid types for your data, gcm_filters
will internally transform your original (locally orthogonal) grid to a uniform Cartesian grid with dx = dy = 1, and perform fixed factor filtering on the uniform grid. After this is done, gcm_filters
transforms the filtered field back to your original grid.
In practice, this coordinate transformation is achieved by area weighting and deweighting (see Filter Theory). This is why the following grid types have the suffix AREA_WEIGHTED
.

Grid 
Handles land 
Boundary condition 
Laplacian type 
Example 


locally orthogonal grid 
no 
periodic 
Scalar Laplacian 


locally orthogonal grid 
yes 
periodic 
Scalar Laplacian 
Example: Different filter types; Example: Filtering on a tripole grid 

locally orthogonal grid 
yes 
tripole 
Scalar Laplacian 
Grid variables¶
Each grid type from the above two tables has different grid variables that must be provided as Xarray DataArrays. For example, let’s assume we are on a Cartesian grid (with uniform grid spacing equal to 1), and we want to use the grid type REGULAR_WITH_LAND
. To find out what the required grid variables for this grid type are, we can use this utility function.
In [3]: gcm_filters.required_grid_vars(gcm_filters.GridType.REGULAR_WITH_LAND)
Out[3]: ['wet_mask']
wet_mask
is a binary array representing the topography on our grid. Here the convention is that the array is 1 in the ocean (“wet points”) and 0 on land (“dry points”).
In [4]: import numpy as np
In [5]: import xarray as xr
In [6]: ny, nx = (128, 256)
In [7]: mask_data = np.ones((ny, nx))
In [8]: mask_data[(ny // 4):(3 * ny // 4), (nx // 4):(3 * nx // 4)] = 0
In [9]: wet_mask = xr.DataArray(mask_data, dims=['y', 'x'])
In [10]: wet_mask.plot()
Out[10]: <matplotlib.collections.QuadMesh at 0x7f310741aa60>
We have made a big island.
Creating the Filter Object¶
We create a filter object as follows.
In [11]: filter = gcm_filters.Filter(
....: filter_scale=4,
....: dx_min=1,
....: filter_shape=gcm_filters.FilterShape.TAPER,
....: grid_type=gcm_filters.GridType.REGULAR_WITH_LAND,
....: grid_vars={'wet_mask': wet_mask}
....: )
....:
In [12]: filter
Out[12]: Filter(filter_scale=4, dx_min=1, filter_shape=<FilterShape.TAPER: 2>, transition_width=3.141592653589793, ndim=2, n_steps=16, grid_type=<GridType.REGULAR_WITH_LAND: 3>)
The string representation for the filter object in the last line includes some of the parameters it was initiliazed with, to help us keep track of what we are doing. We have created a Taper filter that will filter our data by a fixed factor of 4.
Applying the Filter¶
We can now apply the filter object that we created above to some data. Let’s create a random 3D cube of data that matches our grid.
In [13]: nt = 10
In [14]: data = np.random.rand(nt, ny, nx)
In [15]: da = xr.DataArray(data, dims=['time', 'y', 'x'])
In [16]: da
Out[16]:
<xarray.DataArray (time: 10, y: 128, x: 256)>
array([[[0.21482559, 0.34827533, 0.99115514, ..., 0.31080071,
0.53049763, 0.46167167],
[0.91414664, 0.1466529 , 0.19001729, ..., 0.82167823,
0.34162895, 0.16819143],
[0.63748149, 0.32902255, 0.89419113, ..., 0.23931612,
0.70149501, 0.41709187],
...,
[0.00740154, 0.00569117, 0.28220987, ..., 0.08536584,
0.49428569, 0.62542103],
[0.78411381, 0.31720441, 0.25315187, ..., 0.01088536,
0.10367744, 0.79067807],
[0.32577845, 0.48168196, 0.649204 , ..., 0.01677366,
0.75490046, 0.00380429]],
[[0.05588465, 0.89490286, 0.90186422, ..., 0.76552882,
0.66164577, 0.00242357],
[0.00602197, 0.54340005, 0.47926164, ..., 0.90342315,
0.65416368, 0.13364844],
[0.45265473, 0.62476603, 0.21727683, ..., 0.8269115 ,
0.07841149, 0.41554473],
...
[0.74973159, 0.94444632, 0.55066954, ..., 0.15045334,
0.62795389, 0.37040103],
[0.05698078, 0.87069418, 0.93716051, ..., 0.52677276,
0.78530815, 0.09943461],
[0.39743043, 0.74337617, 0.83241167, ..., 0.22350924,
0.58301802, 0.11856071]],
[[0.12447479, 0.00923474, 0.31615442, ..., 0.06531484,
0.817872 , 0.87985659],
[0.82445245, 0.92848131, 0.35863431, ..., 0.96500016,
0.27812668, 0.03211635],
[0.91513602, 0.80247855, 0.92863826, ..., 0.20248401,
0.41530577, 0.58374913],
...,
[0.89297321, 0.48417899, 0.09877201, ..., 0.66867598,
0.59636351, 0.73775463],
[0.5414683 , 0.84745727, 0.2142708 , ..., 0.92382817,
0.79031301, 0.52012836],
[0.69686792, 0.80823503, 0.78360249, ..., 0.23605224,
0.31355182, 0.34213993]]])
Dimensions without coordinates: time, y, x
We now mask our data with the wet_mask
.
In [17]: da_masked = da.where(wet_mask)
In [18]: da_masked.isel(time=0).plot()
Out[18]: <matplotlib.collections.QuadMesh at 0x7f30ff2f9a30>
Now that we have some data, we can apply our filter. We need to specify which dimension names to apply the filter over. In this case, it is y
, x
.
In [19]: %time da_filtered = filter.apply(da_masked, dims=['y', 'x'])
CPU times: user 79.9 ms, sys: 12 ms, total: 91.9 ms
Wall time: 92.7 ms
In [20]: da_filtered
Out[20]:
<xarray.DataArray (time: 10, y: 128, x: 256)>
array([[[0.4172859 , 0.44066133, 0.48267258, ..., 0.37930248,
0.38034291, 0.39962993],
[0.45279895, 0.46566277, 0.49802588, ..., 0.44491946,
0.44058854, 0.44753946],
[0.46351011, 0.47080046, 0.49532031, ..., 0.47527721,
0.46779336, 0.46532188],
...,
[0.49755664, 0.50126811, 0.48573478, ..., 0.36518244,
0.40599994, 0.46310028],
[0.42198917, 0.45294133, 0.47865345, ..., 0.30512787,
0.32905049, 0.37907262],
[0.39487306, 0.42940222, 0.4731927 , ..., 0.3150185 ,
0.32545848, 0.3609767 ]],
[[0.41804694, 0.47260544, 0.51928121, ..., 0.52784809,
0.46597333, 0.41423249],
[0.38259106, 0.42496724, 0.47981002, ..., 0.54789976,
0.47459218, 0.40225109],
[0.37309529, 0.40501947, 0.45941727, ..., 0.53404006,
0.46976511, 0.40096884],
...
[0.54732379, 0.60141455, 0.61294758, ..., 0.360898 ,
0.41605412, 0.47793768],
[0.52149051, 0.585214 , 0.61275567, ..., 0.34610425,
0.40057793, 0.45533347],
[0.47420115, 0.52649768, 0.55295606, ..., 0.34194682,
0.39066573, 0.42836296]],
[[0.54414573, 0.54565976, 0.54754221, ..., 0.46802521,
0.49523066, 0.52605625],
[0.55324482, 0.54521661, 0.53547611, ..., 0.51993292,
0.53472374, 0.54927423],
[0.56152551, 0.53095052, 0.50372881, ..., 0.57322753,
0.58494179, 0.58158461],
...,
[0.60063748, 0.53753151, 0.47556577, ..., 0.54638271,
0.60189586, 0.62545642],
[0.58706702, 0.5459631 , 0.50855335, ..., 0.49455808,
0.55523304, 0.59257783],
[0.55845533, 0.54621159, 0.53721998, ..., 0.45808458,
0.50514634, 0.54534977]]])
Dimensions without coordinates: time, y, x
Let’s visualize what the filter did.
In [21]: da_filtered.isel(time=0).plot()
Out[21]: <matplotlib.collections.QuadMesh at 0x7f30ff219d90>
Using Dask¶
Up to now, we have filtered eagerly; when we called .apply
, the results were computed immediately and stored in memory.
GCMFilters
is also designed to work seamlessly with Dask array inputs. With dask, we can filter lazily, deferring the filter computations and possibly executing them in parallel.
We can do this with our synthetic data by converting them to dask.
In [22]: da_dask = da_masked.chunk({'time': 2})
In [23]: da_dask
Out[23]:
<xarray.DataArray (time: 10, y: 128, x: 256)>
dask.array<xarray<thisarray>, shape=(10, 128, 256), dtype=float64, chunksize=(2, 128, 256), chunktype=numpy.ndarray>
Dimensions without coordinates: time, y, x
We now filter our data lazily.
In [24]: da_filtered_lazy = filter.apply(da_dask, dims=['y', 'x'])
In [25]: da_filtered_lazy
Out[25]:
<xarray.DataArray (time: 10, y: 128, x: 256)>
dask.array<transpose, shape=(10, 128, 256), dtype=float64, chunksize=(2, 128, 256), chunktype=numpy.ndarray>
Dimensions without coordinates: time, y, x
Nothing has actually been computed yet. We can trigger computation as follows:
In [26]: %time da_filtered_computed = da_filtered_lazy.compute()
CPU times: user 174 ms, sys: 11.8 ms, total: 186 ms
Wall time: 107 ms
Here we got only a very modest speedup because our example data are too small. For bigger data, the performance benefit will be more evident.