Tutorial 4. Introduction to the Particle Tracking Module Tools¶
This notebook provides an overview of how to use the particle tracking post-processing tools provided in the tfv package.
These were recently updated (May 2025) to follow a workflow familiar from the hydrodynamic post-processing tools, using the .tfv accessor.
IMPORTANT: This module is under active development and has been released on the basis of obtaining early feedback. Please contact TUFLOW Support with issues / requests / feedback when using this new module.
from pathlib import Path
import xarray as xr
import matplotlib.pyplot as plt
import tfv.xarray
Loading data¶
Similar to a normal TUFLOW FV result containing hydrodynamics, we simply load the netCDF file using Xarray and then apply the .tfv accessor.
For demonstrative purposes, we'll also load the associated hydrodynamic result, which contains an equivalent passive tracer.
Note: In a future release, this accessor may change to become more explicit, e.g., ds.ptm rather than ds.tfv.
Note 2: For this tutorial we only have a very small netCDF file containing 1200 particles, whereas in practice you could have on the order of 100,000s to millions. The approach to post-processing is the same, although with real results and a large number of particles it becomes especially important to think about data chunking and to use gridding for plotting instead of direct scatter plots - more on this throughout the tutorial.
A note on using Xarray and Dask chunking¶
TUFLOW FV results, and many other datasets stored in netCDF format, are often tens of gigabytes in size. If you need to do some form of analysis across the entire dataset (e.g., spatially AND temporally), you may run into the issue of running out of memory. There are many valid approaches to avoid this, and one of the most straightforward is to use Dask, which provides a simple way to "chunk" your requests into manageable pieces. The particle tracking module of tfv is designed to work with Dask, and particularly with small chunks in the time dimension because of the way these netCDFs are written.
For more information, please see the documentation for Dask and Xarray.
When using TUFLOW FV results, we are most often looking at single timesteps; e.g., plotting a timestep (even interactively) only reads one timestep at a time. This extends to processing: for example, if we have 3D results, or particle results like this tutorial, we often apply some function for every timestep (such as a depth-averaging function or a gridding function). Note that these functions are often handled by the .tfv library, so you may not realise they are being run by the tools. For many common operations we therefore only need one timestep's worth of data at a time, so we generally recommend that you chunk by time = 1. When applying functions, Xarray/Dask will then only read ONE timestep at a time, saving memory and also speeding up certain things like plotting.
In short: we suggest using .chunk(time=1) for PTM results.
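As a minimal sketch (using the same file layout as the next cell, and noting that the PTM file's time dimension is lower-case time whereas the hydrodynamic file uses Time), an equivalent alternative is to request chunking when opening the dataset:
from pathlib import Path
import xarray as xr

folder = Path('../data')
# Chunk lazily at open time; equivalent to opening the file and then calling .chunk(time=1)
ds_chunked = xr.open_dataset(folder / 'PTM_005_ptm.nc', decode_timedelta=True, chunks={'time': 1})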
folder = Path('../data')
# Load up our PTM Result
ds = xr.open_dataset(folder / 'PTM_005_ptm.nc', decode_timedelta=True).chunk(time=1)
ptm = ds.tfv
# Load up our TFV domain result
dsf = xr.open_dataset(folder / 'PTM_005.nc').chunk(Time=1)
fv = dsf.tfv
This is what our PTM result looks like. You may have different variables available depending on what was output.
Available Methods¶
We will refer to our PTM accessor by its variable name ptm from here on.
This class provides a number of extra methods on top of Xarray to help us look at and filter particles. The key methods available are:
- get_particles: Filtering method to select particles by a bounding box or polygon, elevation range, particle age, status or group id.
- get_grid: Count particles onto a regular grid and aggregate particle variables (e.g., mass, age).
- plot: General instantaneous particle plot. Defaults to a scatter but can also plot as a histogram on the fly.
- plot_interactive: Interactive version of plot to help you check results.
- prep_interactive_slider: Builds the widget that can be passed to plot_interactive to create linked interactive figures.
Plotting¶
The plot methods provided by ptm are intended to allow a user to quickly look at their particles, see where they are dispersing, and do sanity checks. We'll start with some simple examples.
By default, when you call plot, the LAST timestep will be shown; this is because often there are no particles at the first timestep of the model!
ptm.plot()
<tfv.particles.ParticleScatter at 0x25aae581350>

It can be difficult to see where your particles are located spatially without a boundary or other geographic context shown on the figure. Let’s also use the hydrodynamic result plotting functions to add in a boundary. In practice, you could easily do this instead with a shapefile (e.g., using GeoPandas).
fig, ax = plt.subplots()
ax.set_aspect('equal')
fv.plot(alpha=0.0, colorbar=False, boundary=True, ax=ax)
ptm.plot(time=-1, ax=ax)
ax.set_title('')
plt.show()
There are plenty of kwargs available to the plot function, which allow you to choose whether to plot the particles as a scatter (default) or gridded (2D histogram). To change between the two, use the plot_type argument.
When you choose plot_type='hist', the default coloring will be normalised density. This is great for seeing where particles are, because often we have so many that they overlap significantly. You can also color the scatter or hist plot types by other variables such as age. With the scatter the points may overlap too much for this to be useful, but with the histogram you can choose a statistic to be plotted that will be more meaningful (statistic = one of {'mean', 'max', 'min', 'median'}).
plt.close('all')
fig, axes = plt.subplots(ncols=3, figsize=(10, 4), constrained_layout=True, sharex=True, sharey=True)
ax1, ax2, ax3 = axes
ptm.plot(ax=ax1, color_by='age', colorbar=False)
ptm.plot(ax=ax2, plot_type='hist', color_by='age', colorbar=False, dx=150, dy=150)
ptm.plot(ax=ax3, plot_type='hist', colorbar=False, nx=100, ny=100)
ax1.set_title('Particle Age')
ax2.set_title('Mean Particle Age')
ax3.set_title('Particle Density')
for ax in axes:
    ax.set_xticklabels([])
    ax.set_yticklabels([])
plt.show()
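The statistic keyword described above can be combined with the histogram plot type. A short sketch, reusing only arguments already shown in this tutorial plus the documented statistic option:
# Color a gridded (histogram) plot by the maximum particle age in each cell
fig, ax = plt.subplots()
ax.set_aspect('equal')
ptm.plot(ax=ax, plot_type='hist', color_by='age', statistic='max', dx=150, dy=150)
ax.set_title('Max Particle Age per Cell')
plt.show()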
You can animate the plot figure using the plot_interactive tool. This provides a simple time-selection toolbar so you can scroll through your PTM results. To use it, simply call plot_interactive; you can pass through any arguments that you're giving plot to this interactive tool.
For results with a very large number of particles (e.g., on the order of tens of thousands to millions), it can be very slow to plot the individual scatter points directly.
This is where the stride keyword argument comes in handy: it allows you to plot only every Nth particle to speed things up. E.g., you could use stride=100 and still get a great idea of what your particles are doing.
The other option, and generally the preferred one, is to use the histogram gridded style plot (plot_type='hist'), which is much faster than trying to display 100,000s of particles.
We’ve included an example video below based on a much larger model result to demonstrate what this might look like for many particles. This result had approximately 120,000 particles and the interactive viewer was able to be used with good performance on a (close by) network mount to the data.
%matplotlib widget
ptm.plot_interactive(plot_type='hist')
plt.close('all')
%matplotlib inline
Analysing Results¶
Preparing “processed” results for particle tracking is difficult because particles are not based on any fixed grid (which is also their advantage over Eulerian schemes for certain types of modelling).
Nonetheless, many interpretable results still boil down to statistical representations, and that usually requires that we bring the particles onto some common grid.
The tfv accessor includes two functions to assist a user with processing particle results:
- .get_particles - an all-in-one selection function to help query particles by time, group, location (x, y, z) and status (a hedged usage sketch follows this list).
- .get_grid - an all-in-one gridding tool to help put the particles onto a regular grid. This function includes get_particles before it, so you can jump straight to this function.
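The exact keyword names accepted by get_particles are not shown in this tutorial, so the following is only a hedged sketch; it assumes get_particles takes the same leading time argument and the same group_ids / stats keywords documented for get_grid below (the status code value is purely hypothetical):
# Hedged sketch: select all timesteps for one particle group and one (hypothetical) status code
sub = ptm.get_particles(
    slice(None),     # time, assumed to be required as it is for get_grid
    group_ids=(1,),  # assumed: include only the first particle group
    stats=(2,),      # hypothetical status code for illustration only
)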
Example 1 - 2D simple gridding¶
This is an example of simple gridding of particles in 2D (i.e., ignoring the z dimension). The arguments for ptm.get_grid allow you to set the dx/dy/dz grid step, or alternatively nx/ny/nz, which specify a number of grid steps regardless of the bounds.
Other useful args are (a sketch using several of them follows the list):
- bounds - xlim/ylim bounds to filter particles to just an area
- zlims - tuple of z limits (lower, upper)
- group_ids - tuple of particle groups to include
- stats - tuple of status codes to include (e.g., if you want to filter stranded particles, or floating only, etc.)
- density - bool to choose whether to also return normalised particle density.
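As a sketch of how these arguments fit together, reusing the documented keywords (the bounding-box coordinates are illustrative values only, and the exact bounds tuple format is an assumption):
# Hypothetical area of interest (illustration values only)
xlim = (159.05, 159.15)
ylim = (-31.60, -31.50)

grd_sub = ptm.get_grid(
    slice(None),          # time, a required argument; all timesteps here
    nx=100, ny=100,       # number of grid steps instead of dx/dy
    bounds=(xlim, ylim),  # assumed format: (xlim, ylim) tuples
    zlims=(-5.0, 0.0),    # only particles between 5 m depth and the surface (lower, upper)
    group_ids=(1,),       # only the first particle group
    density=True,         # also return normalised particle density
    compute=False,        # keep the result lazy (Dask)
)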
dxy = 5e-4
dz = 1  # 1m spacing in vertical
grd_2d = ptm.get_grid(
    slice(None),  # time, a required argument. This chooses all timesteps.
    dx=dxy,
    dy=dxy,
    density=True,
    compute=False)
plt.close('all')
grd_2d['density'][-1].plot(vmin=0.0, vmax=0.01, cmap='turbo')
<matplotlib.collections.QuadMesh at 0x25aaf2e6e90>

Example 2 - 3D gridding¶
For some assessments it's important to assess concentrations in 3D as well. The example below grids the particles in 3D with a specified horizontal and vertical cell size (5e-4 degrees and 1 m respectively - note that if this were a projected model, dx/dy would be in meters).
For real-world PTM results the file size is likely to be very large, so it's important we stick to chunking. It's therefore good practice to pass in compute=False and then check a few single timesteps before outputting a reduced file (shown below).
dxy = 5e-4
dz = 1  # 1m spacing in vertical
grd_3d = ptm.get_grid(
    slice(None),  # time, a required argument. This chooses all timesteps.
    dx=dxy,
    dy=dxy,
    dz=dz,
    compute=False)
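As suggested above, it's worth checking a single timestep of the lazy result before triggering the full computation. A minimal check, using the same nparts variable plotted in the comparison below:
# Pull just the final timestep into memory and confirm the counts look sensible
check = grd_3d['nparts'][-1].compute()
print(check.max().item(), 'particles in the busiest 3D cell at the final timestep')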
Now let’s compare this 3D result to our original 2D result. The comparison below shows that the particle counts differ significantly when you look at the number throughout the whole water column versus in smaller vertical bins. This has important implications when calculating derivative outputs such as concentrations.
fig, axes = plt.subplots(ncols=2, figsize=(8, 4), constrained_layout=True, sharex=True, sharey=True)
ax1, ax2 = axes
# Note, the [-1] is saying use the final timestep.
grd_2d['nparts'][-1].plot(ax=ax1, cmap='turbo', vmin=0, vmax=10, add_colorbar=False)
grd_3d['nparts'][-1].mean(dim='z').plot(ax=ax2, cmap='turbo', vmin=0, vmax=10)
ax1.set_title('2D Grid Counts (Full Water Column)')
ax2.set_title('3D Grid Mean Counts with 1m Vertical Bin')
for ax in axes:
    ax.set_xticklabels([])
    ax.set_yticklabels([])
plt.show()

Note on choosing grid sizes when post-processing:
There is no one-size-fits-all approach to gridding particle results for interpretation. Important decisions must be made regarding the x, y and z grid spacing. Decreasing the grid cell size will increase peak concentration values, but this reaches a limit where results become choppy and discontinuous as the cell size approaches the order of the natural spacing between particles.
Grid size selection should be considered an investigation step in itself, and must be appropriate to the physical problem being studied. Consider what physical scales are relevant to your problem - are you interested in small-scale mixing processes or broader transport patterns? How many particles do you have in your model? The grid resolution should capture meaningful variations without introducing spurious artifacts. This often requires testing multiple grid configurations to achieve a good balance between spatial detail and result coherence; a sketch of such a sensitivity test follows.
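A minimal sketch of such a sensitivity test, reusing only get_grid arguments already shown above (the candidate cell sizes are illustrative):
# Compare the final-timestep density field across a few candidate cell sizes
fig, axes = plt.subplots(ncols=3, figsize=(10, 4), constrained_layout=True, sharex=True, sharey=True)
for ax, dxy in zip(axes, (2.5e-4, 5e-4, 1e-3)):
    grd = ptm.get_grid(slice(None), dx=dxy, dy=dxy, density=True, compute=False)
    grd['density'][-1].plot(ax=ax, vmin=0.0, vmax=0.01, cmap='turbo', add_colorbar=False)
    ax.set_title(f'dx = dy = {dxy:g}')
plt.show()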
Exporting gridded results
Often in practice we'd run a particle tracking model and then post-process the results onto a grid. We would save the grid as a “derived” output because we'll then use it going forward. Here's an example of how to do that, using a delayed write so that Dask can help you manage memory.
from dask.diagnostics import ProgressBar
# Save the NetCDF to disk
output_folder = Path('./outputs')
output_folder.mkdir(exist_ok=True) # Make the folder, if it doesn't exist
wj = grd_3d.to_netcdf(output_folder / 'my_gridded_results.nc', compute=False)
with ProgressBar():
    wj.compute()
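Having written the derived grid, you can re-open it later like any other netCDF result; a sketch (assuming the gridded output keeps the time dimension name used here):
# Re-open the derived gridded output with chunking, as recommended earlier
grd = xr.open_dataset(output_folder / 'my_gridded_results.nc', chunks={'time': 1})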