7.3.2. Moving average, Savitzky-Golay and differentiation filters

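This section presents the following functions:

- ktk.filters.smooth: smoothing using a moving average;
- ktk.filters.savgol: smoothing and differentiation using a Savitzky-Golay filter;
- ktk.filters.deriv: numerical differentiation.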

7.3.2.1. Smoothing using a moving average

The moving average is well suited to removing noise that follows a fixed periodic pattern: when the window length matches the period of the noise, each window contains exactly one full cycle of the noise, which then contributes only its mean value over a cycle. A classic example is the daily monitoring of a process that is affected by weekends, such as the number of workers who enter a building each day. A 7-day moving average reveals the general trend of this signal while removing the within-week fluctuations. Let’s first load some noisy data:

import kineticstoolkit.lab as ktk
import matplotlib.pyplot as plt

ts = ktk.load(ktk.doc.download("filters_types_of_noise.ktk.zip"))

ts.plot(["clean", "periodic_noise"], '.-')
[Figure: clean and periodic_noise signals]

This signal contains periodic noise with a period of five samples. Using a 5-sample moving average filter:

filtered = ktk.filters.smooth(ts, window_length=5)

gives the blue curve below:

[Figure: signal filtered with a 5-sample moving average (blue curve)]

As expected, the noise with a 5-sample period was completely removed. However, the signal itself was also averaged, so some of its faster variations were attenuated.
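This cancellation can be reproduced with a short NumPy sketch that does not use Kinetics Toolkit. It assumes a plain moving average computed with `np.convolve`, used here as a simplified stand-in rather than the actual implementation of `ktk.filters.smooth`; the trend and noise are synthetic. A sinusoid with a period of exactly five samples sums to zero over any five consecutive samples, so only the underlying trend survives.

```python
import numpy as np

# Synthetic signal: a slow linear trend plus noise with a period of exactly 5 samples
n = np.arange(100)
trend = 0.05 * n
periodic_noise = np.sin(2 * np.pi * n / 5)
signal = trend + periodic_noise

# 5-sample moving average; mode="valid" keeps only windows that fit entirely in the signal
window_length = 5
kernel = np.ones(window_length) / window_length
smoothed = np.convolve(signal, kernel, mode="valid")

# Each output sample corresponds to the center of its window
half = window_length // 2
center_trend = trend[half : len(trend) - half]
print(np.max(np.abs(smoothed - center_trend)))  # near zero: only the trend remains
```

With a window of 4 or 6 samples, each window would no longer contain a whole number of noise cycles and some noise would remain, which is why the window length should match the period of the noise.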

7.3.2.2. Smoothing using a Savitzky-Golay filter

The Savitzky-Golay filter is a generalization of the moving average. Instead of taking the mean of the n points of a moving window, the Savitzky-Golay filter fits a polynomial of a given order over each window (in the least-squares sense) and keeps the value of this polynomial at the center of the window. A moving average is therefore a particular case of the Savitzky-Golay filter with a polynomial of order 0.
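The principle can be illustrated with a minimal NumPy sketch. The function `savgol_sketch` is hypothetical, written only for this illustration, and is not how `ktk.filters.savgol` is implemented; edge samples are simply left as NaN. It fits a polynomial over each centered window with `np.polyfit` and keeps the fitted value at the center. With `poly_order=0`, the result matches a moving average, and with `poly_order=2`, any quadratic signal passes through unchanged.

```python
import numpy as np

def savgol_sketch(y, window_length, poly_order):
    """Naive Savitzky-Golay smoothing (hypothetical helper, for illustration).

    Fits a polynomial of order poly_order over each centered window by least
    squares and keeps the fitted value at the window center. The first and
    last window_length // 2 samples are left as NaN.
    """
    if window_length % 2 == 0 or window_length <= poly_order:
        raise ValueError("window_length must be odd and greater than poly_order")
    half = window_length // 2
    x = np.arange(-half, half + 1)  # sample positions relative to the center
    out = np.full(len(y), np.nan)
    for i in range(half, len(y) - half):
        coefs = np.polyfit(x, y[i - half : i + half + 1], poly_order)
        out[i] = np.polyval(coefs, 0.0)  # fitted value at the window center
    return out

rng = np.random.default_rng(0)
y = rng.normal(size=50)

# Order 0: the fitted polynomial is a constant, i.e. the mean of the window
order0 = savgol_sketch(y, window_length=7, poly_order=0)
moving_avg = np.convolve(y, np.ones(7) / 7, mode="valid")
print(np.allclose(order0[3:-3], moving_avg))  # True

# Order 2: any quadratic signal passes through the filter unchanged
t = np.arange(50, dtype=float)
quad = 0.1 * t**2 - 2.0 * t + 3.0
print(np.allclose(savgol_sketch(quad, 7, 2)[3:-3], quad[3:-3]))  # True
```

The explicit loop is only for readability; practical implementations such as `scipy.signal.savgol_filter` compute the equivalent convolution coefficients once instead of refitting every window.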

It is a powerful filter for heavily quantized data, particularly when these data must be differentiated afterwards. Let’s plot some heavily quantized data:

ts.plot(["clean", "quantized"], ".-")
[Figure: clean and quantized signals]

To smooth this signal using a second-order Savitzky-Golay filter with a window length of 7 samples:

filtered = ktk.filters.savgol(ts, poly_order=2, window_length=7)

which gives the blue curve below:

[Figure: signal filtered with a second-order Savitzky-Golay filter (blue curve)]

7.3.2.3. Differentiating TimeSeries

Heavily quantized signals are often difficult to differentiate because they contain many plateaus: once differentiated, each plateau becomes a series of zeros and each step between plateaus becomes a spike. For instance, let’s see what happens when we differentiate a quantized signal without filtering, using ktk.filters.deriv:

derived = ktk.filters.deriv(ts)

derived.plot(["clean", "quantized"], ".-")
[Figure: derivatives of the clean and quantized signals]
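The origin of these spikes can be reproduced with a NumPy sketch. It uses a simple forward finite difference (`np.diff` divided by the sampling period) as a stand-in for numerical differentiation; this is an assumption for illustration, not a description of how `ktk.filters.deriv` works. The ramp, the sampling period and the quantization step are made up.

```python
import numpy as np

dt = 0.01                    # sampling period in seconds (assumed)
n = np.arange(50)            # sample indices
clean = n / 5.0              # ramp rising by 0.2 unit per sample, i.e. 20 units/s
quantized = np.floor(clean)  # quantized to whole units: plateaus of 5 samples

# Simple forward finite difference: (y[i+1] - y[i]) / dt
d_clean = np.diff(clean) / dt
d_quantized = np.diff(quantized) / dt

print(d_clean[:10])      # constant, about 20 units/s
print(d_quantized[:10])  # zeros, with a spike of 100 units/s every 5 samples
```

No single sample of the quantized derivative returns the true slope of 20 units/s: the plateaus give 0 and each step gives a spike five times larger than the true value.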

We can also differentiate a signal using a Savitzky-Golay filter, which consists of differentiating the polynomial fitted over each moving window and evaluating this derivative at the center of the window. Using a 2nd-order Savitzky-Golay filter with a window length of 7:

derived_savgol = ktk.filters.savgol(
    ts, poly_order=2, window_length=7, deriv=1
)

which gives the blue curve below:

[Figure: Savitzky-Golay derivative of the quantized signal (blue curve)]

As observed, the Savitzky-Golay derivative of the heavily quantized signal closely follows the derivative of the clean signal.
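The same idea can be sketched with NumPy alone. The helper `savgol_deriv_sketch` is hypothetical, written only for illustration, and is not the implementation of `ktk.filters.savgol`; the ramp and the sampling period are also made up. On a ramp quantized into 2-sample plateaus, a forward finite difference alternates between 0 and twice the true slope, whereas differentiating a 2nd-order polynomial fitted over 7 samples recovers the true slope at every interior sample.

```python
import numpy as np

def savgol_deriv_sketch(y, dt, window_length, poly_order):
    """Naive Savitzky-Golay first derivative (hypothetical helper).

    Fits a polynomial over each centered window, differentiates it and
    evaluates the derivative at the window center. The first and last
    window_length // 2 samples are left as NaN.
    """
    if window_length % 2 == 0 or window_length <= poly_order:
        raise ValueError("window_length must be odd and greater than poly_order")
    half = window_length // 2
    x = np.arange(-half, half + 1) * dt  # window times relative to its center
    out = np.full(len(y), np.nan)
    for i in range(half, len(y) - half):
        coefs = np.polyfit(x, y[i - half : i + half + 1], poly_order)
        out[i] = np.polyval(np.polyder(coefs), 0.0)  # slope at the center
    return out

dt = 0.01                    # sampling period in seconds (assumed)
n = np.arange(50)
clean = n / 2.0              # 0.5 unit per sample, i.e. 50 units/s
quantized = np.floor(clean)  # whole-unit steps: plateaus of 2 samples

naive = np.diff(quantized) / dt  # alternates between 0 and 100 units/s
smooth = savgol_deriv_sketch(quantized, dt, window_length=7, poly_order=2)

interior = smooth[3:-3]
print(interior.min(), interior.max())  # both approximately 50 units/s
```

With a symmetric window, the slope at the center of the fitted 2nd-order polynomial equals the slope of the least-squares line, so `poly_order=1` would give the same first derivative here; higher orders mainly matter for preserving curvature when smoothing.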