7.3.2. Moving average, Savitzky-Golay and deriving filters
This section presents the following functions: ktk.filters.smooth, ktk.filters.savgol and ktk.filters.deriv.
7.3.2.1. Smoothing using a moving average
The moving average is an excellent filter to remove noise that repeats with a known period. A classic example is the day-to-day evaluation of a process that is sensitive to weekends (for example, the number of workers who enter a building). A moving average with a window length of 7 days is ideal to evaluate the general trend of this signal without considering intra-week fluctuations. Let’s first load some noisy data:
import kineticstoolkit.lab as ktk
import matplotlib.pyplot as plt
ts = ktk.load(ktk.doc.download("filters_types_of_noise.ktk.zip"))
ts.plot(["clean", "periodic_noise"], ".-")

This signal contains periodic noise with a period of five seconds. Using a moving average over a window of 5 samples, which spans five seconds for this signal:
filtered = ktk.filters.smooth(ts, window_length=5)
gives the blue curve below:

As expected, the noise with a 5-sample period was completely removed. However, the signal itself was also averaged, so some of its dynamics were lost.
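To see why a window equal to the noise period cancels that noise, here is a minimal pure-Python sketch. It is not how ktk.filters.smooth is implemented: the helper `moving_average`, the synthetic signals and the choice to leave the edges as NaN are all assumptions made for illustration.

```python
import math


def moving_average(values, window_length):
    """Centered moving average over an odd window (hypothetical helper).

    Samples too close to the edges for a full window are left as NaN.
    """
    half = window_length // 2
    out = [math.nan] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half : i + half + 1]
        out[i] = sum(window) / window_length
    return out


# Synthetic data: a linear trend plus zero-mean noise repeating every 5 samples.
pattern = [0.0, 2.0, -1.0, 1.0, -2.0]
clean = [0.1 * i for i in range(30)]
noisy = [c + pattern[i % 5] for i, c in enumerate(clean)]

# Each 5-sample window contains every phase of the noise exactly once,
# so the noise averages out to zero and only the trend remains.
smoothed = moving_average(noisy, window_length=5)

# The cost: a one-sample spike is spread over the window and attenuated.
spike = [0.0] * 10 + [1.0] + [0.0] * 10
smoothed_spike = moving_average(spike, window_length=5)
```

In this sketch, `smoothed` matches `clean` away from the edges, while the spike of height 1 is reduced to 0.2 at its centre, which is the loss of dynamics mentioned above.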
7.3.2.2. Smoothing using a Savitzky-Golay filter
The Savitzky-Golay filter is a generalization of the moving average. Instead of taking the mean of the n points of a moving window, the Savitzky-Golay filter fits a polynomial of a given order over each window. A moving average is therefore a particular case of the Savitzky-Golay filter with a polynomial of order 0.
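The relationship between both filters can be illustrated with a small sketch. For equally spaced samples, the least-squares polynomial fit over a window reduces to a fixed set of weights applied to the samples. The weights below are the classic tabulated values for a 7-sample window; the names `MEAN_WEIGHTS`, `QUADRATIC_WEIGHTS` and `central_estimate` are hypothetical, and this is not necessarily how ktk.filters.savgol computes its result.

```python
# Weights applied to the 7 samples of a window to estimate its central value.
MEAN_WEIGHTS = [1 / 7] * 7  # polynomial of order 0: plain average
QUADRATIC_WEIGHTS = [w / 21 for w in (-2, 3, 6, 7, 6, 3, -2)]  # order 2


def central_estimate(window, weights):
    """Weighted sum of the window samples (hypothetical helper)."""
    return sum(w * v for w, v in zip(weights, window))


# A purely curved window: y = x**2 sampled at x = -3..3.
# The true value at the centre (x = 0) is 0.
parabola = [x**2 for x in range(-3, 4)]

mean_estimate = central_estimate(parabola, MEAN_WEIGHTS)
quadratic_estimate = central_estimate(parabola, QUADRATIC_WEIGHTS)
```

Here the order-0 fit returns the window mean (about 4), which overestimates the centre of the parabola, whereas the order-2 fit recovers 0 up to rounding. This is why a higher polynomial order preserves more of the signal's curvature for the same window length.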
It is a powerful filter for heavily quantized data, particularly when these data must then be derived. Let’s plot some heavily quantized data:
ts.plot(["clean", "quantized"], ".-")

To smooth this signal using a second-order Savitzky-Golay filter with a window length of 7:
filtered = ktk.filters.savgol(ts, poly_order=2, window_length=7)
which gives the blue curve below:

7.3.2.3. Deriving TimeSeries
Heavily quantized signals are often difficult to derive because they contain lots of plateaus that, once derived, are transformed into series of spikes. For instance, let’s see how deriving a quantized signal works without filtering, using ktk.filters.deriv:
derived = ktk.filters.deriv(ts)
derived.plot(["clean", "quantized"], ".-")

We can instead derive the signal using a Savitzky-Golay filter, which consists of deriving the polynomial that is fitted over each moving window. Using a second-order Savitzky-Golay filter with a window length of 7:
derived_savgol = ktk.filters.savgol(
    ts, poly_order=2, window_length=7, deriv=1
)
which gives the blue curve below:

As observed, the derivative of the heavily quantized signal, once obtained with the Savitzky-Golay filter, is similar to the derivative of the clean signal.
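The contrast between both approaches can be reproduced without the toolkit. The sketch below is only an illustration under stated assumptions: the synthetic sine signal, the quantization step and the helper names are hypothetical, and the derivative weights are the classic tabulated values for a second-order fit over 7 equally spaced samples, not necessarily what ktk.filters.savgol uses internally.

```python
import math


def quantize(values, step):
    """Round each value to the nearest multiple of `step`."""
    return [round(v / step) * step for v in values]


def finite_difference(values, dt):
    """Naive derivative: difference between consecutive samples."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]


# Slope of the quadratic fitted over 7 samples, evaluated at the window centre.
SAVGOL_DERIV_WEIGHTS = [w / 28 for w in (-3, -2, -1, 0, 1, 2, 3)]


def savgol_derivative(values, dt):
    """Savitzky-Golay first derivative for interior samples only."""
    out = []
    for i in range(3, len(values) - 3):
        window = values[i - 3 : i + 4]
        out.append(sum(w * v for w, v in zip(SAVGOL_DERIV_WEIGHTS, window)) / dt)
    return out


# Synthetic data: a sine wave quantized to steps of 0.1.
dt = 0.1
t = [i * dt for i in range(100)]
quantized = quantize([math.sin(ti) for ti in t], step=0.1)

# Naive derivative: plateaus give 0 and steps give spikes of about +1 or -1.
naive = finite_difference(quantized, dt)
naive_error = max(
    abs(n - math.cos(ti + dt / 2)) for n, ti in zip(naive, t)
)

# Savitzky-Golay derivative, compared with the true slope cos(t).
savgol_slope = savgol_derivative(quantized, dt)
true_slope = [math.cos(ti) for ti in t[3:-3]]
savgol_error = max(abs(s - c) for s, c in zip(savgol_slope, true_slope))
```

With these settings, the naive derivative only takes values close to -1, 0 or 1, so its worst-case error is large, while the worst-case error of the Savitzky-Golay derivative stays several times smaller.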