5.11. Filtering unidimensional arrays
Important
This section is about manipulating NumPy arrays, and more precisely about selecting specific indexes in an array using a mask of booleans or integers. It is not about filtering a time series with a moving average or a Butterworth filter, which is covered later in section Filtering TimeSeries.
We learned how to access a single element using indexing, and multiple regularly spaced elements using slicing. To read multiple elements that are not regularly spaced, we use filtering. We call it filtering because we use a mask to selectively filter out some of the data.
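To make the distinction concrete, here is a minimal sketch on the same values as the data array used below. It contrasts the three ways of reading elements. The index, the slice bounds and the list of indexes are arbitrary choices for illustration.

```python
import numpy as np

# Same values as the data array used in this section
data = np.array([0.0, 0.58, 0.95, 0.95, 0.58, 0.0, -0.59, -0.96, -0.96, -0.59])

one_value = data[3]           # indexing: a single element
every_other = data[0:10:2]    # slicing: regularly spaced elements (indexes 0, 2, 4, 6, 8)
picked = data[[0, 3, 8, 9]]   # filtering: arbitrary, non-regularly-spaced elements

print(one_value)    # 0.95
print(every_other)  # values 0.0, 0.95, 0.58, -0.59, -0.96
print(picked)       # values 0.0, 0.95, -0.96, -0.59
```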
Let’s use these NumPy arrays:
import numpy as np
import matplotlib.pyplot as plt
data = np.array([0.0, 0.58, 0.95, 0.95, 0.58, 0.0, -0.59, -0.96, -0.96, -0.59])
time = np.arange(10) / 10
Let’s say we want to keep only the elements at indexes 0, 3, 8 and 9:
[Figure: the data array, with a checkmark under each index to keep]
We can create a mask that directly specifies which items to keep and which to reject.
This mask can be either a boolean mask, with the same shape as data, where each element to keep is True and each element to discard is False:
bool_mask = [True, False, False, True, False, False, False, False, True, True]
or an integer mask, where we explicitly list the indexes to keep:
int_mask = [0, 3, 8, 9]
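Both masks describe the same selection, and each can be converted into the other. The sketch below reuses the two masks defined above. np.flatnonzero and the np.zeros construction are one possible way to do the conversion. The rest of this section does not depend on them.

```python
import numpy as np

bool_mask = [True, False, False, True, False, False, False, False, True, True]
int_mask = [0, 3, 8, 9]

# Boolean mask -> integer mask: the positions of the True values
from_bool = np.flatnonzero(bool_mask)
print(from_bool)  # [0 3 8 9]

# Integer mask -> boolean mask: start with all False, then set the listed indexes to True
from_int = np.zeros(len(bool_mask), dtype=bool)
from_int[int_mask] = True
print(from_int.tolist() == bool_mask)  # True
```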
In either case, we then put the mask between square brackets, just as we would to index or slice the array:
# Top: the original data
plt.subplot(3, 1, 1)
plt.plot(time, data, "s-")
plt.title("Original data")
# Middle: only the samples where bool_mask is True
plt.subplot(3, 1, 2)
plt.plot(time[bool_mask], data[bool_mask], "s-")
plt.title("Filtered data, using a mask of bool")
# Bottom: only the samples whose indexes are listed in int_mask
plt.subplot(3, 1, 3)
plt.plot(time[int_mask], data[int_mask], "s-")
plt.title("Filtered data, using a mask of int")
plt.tight_layout();
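As a quick check, the sketch below redefines the same arrays and masks so that it runs on its own. The boolean mask is stored as a NumPy array here. The code verifies that both masks select the same samples. It also shows that NumPy rejects a boolean mask whose length differs from the array's with an IndexError. The short_mask name is hypothetical and used only for illustration.

```python
import numpy as np

data = np.array([0.0, 0.58, 0.95, 0.95, 0.58, 0.0, -0.59, -0.96, -0.96, -0.59])
time = np.arange(10) / 10
bool_mask = np.array([True, False, False, True, False, False, False, False, True, True])
int_mask = [0, 3, 8, 9]

# Both masks keep the same four samples, in the same order
same = np.array_equal(data[bool_mask], data[int_mask])
print(same)  # True

# A boolean mask needs one entry per element of the filtered array
short_mask = np.array([True, False, True])  # hypothetical, deliberately too short
try:
    data[short_mask]
    rejected = False
except IndexError as err:
    rejected = True
    print("IndexError:", err)
```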
Tip
Since indexes can be negative, integer masks can also contain negative values, counted from the end of the array.
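For instance, with the 10-element data array, indexes 8 and 9 can also be written -2 and -1. Here is a minimal sketch:

```python
import numpy as np

data = np.array([0.0, 0.58, 0.95, 0.95, 0.58, 0.0, -0.59, -0.96, -0.96, -0.59])

# With 10 elements, index -2 refers to index 8 and -1 refers to index 9
with_positive_idx = data[[0, 3, 8, 9]]
with_negative_idx = data[[0, 3, -2, -1]]
print(np.array_equal(with_positive_idx, with_negative_idx))  # True
```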
In section Arithmetics, we learned how to generate arrays of bool by comparing an array to a number with comparison operators such as ==, <, >=, etc. These comparisons are a powerful way to filter an array. For example, to keep every non-negative value (zero included) and reject the rest:
to_keep = data >= 0 # Create a boolean mask of the values to keep
print("to_keep =", to_keep)
plt.plot(time, data, "s-", label="Original data")
plt.plot(time[to_keep], data[to_keep], "o-", label="Filtered data")
plt.legend();
to_keep = [ True True True True True True False False False False]
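A comparison mask can also be written directly inside the brackets, combined with other masks element by element, or counted. The sketch below is illustrative: the thresholds 0.25 and 0.9 are arbitrary, and it assumes the same data and time arrays as above. Python's and/or keywords do not work element by element on arrays, but & and | do. Each comparison needs its own parentheses, because & and | bind more tightly than comparison operators.

```python
import numpy as np

data = np.array([0.0, 0.58, 0.95, 0.95, 0.58, 0.0, -0.59, -0.96, -0.96, -0.59])
time = np.arange(10) / 10

# A comparison can be used directly between the brackets
positive = data[data > 0]  # strictly positive values: 0.58, 0.95, 0.95, 0.58

# & means "and", element by element; the parentheses are required
early_positive = data[(data > 0) & (time < 0.25)]  # 0.58, 0.95

# | means "or", element by element
large = data[(data > 0.9) | (data < -0.9)]  # 0.95, 0.95, -0.96, -0.96

# np.count_nonzero counts the True values, i.e. how many samples the mask keeps
n_kept = np.count_nonzero(data >= 0)
print(n_kept)  # 6
```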
Or, as another example, to replace every negative value with 0:
is_negative = data < 0
print("is_negative =", is_negative)
# Work on a copy so that the original data stays unchanged
new_data = data.copy()
new_data[is_negative] = 0
plt.plot(time, data, "s-", label="Original data")
plt.plot(time, new_data, "o-", label="New data")
plt.legend();
is_negative = [False False False False False False True True True True]
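The sketch below uses the same data values to show two related points. First, np.where (not used in the original example) gives the same result in a single expression and leaves data untouched. Second, assigning through a mask changes the array in place, which is why the example above starts from data.copy(). The alias name is hypothetical and only illustrates the in-place effect.

```python
import numpy as np

data = np.array([0.0, 0.58, 0.95, 0.95, 0.58, 0.0, -0.59, -0.96, -0.96, -0.59])

# Where data < 0 take 0, elsewhere keep the original value; data is not modified
new_data = np.where(data < 0, 0, data)
print(data.min())  # -0.96, data is unchanged

# Masked assignment without a copy: alias is the same array as data, not a copy
alias = data
alias[alias < 0] = 0
print(data.min())  # 0.0, data itself was modified
```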