5.7. Statistical functions

NumPy provides common statistical functions such as np.sum, np.min, np.max, np.mean, np.median, and np.std, which all take an array as an argument. For multidimensional arrays, these functions accept an additional axis argument that selects the axis along which the operation is performed. For example, for a matrix, an operation on the first axis (axis=0) runs down the rows and returns one value per column:

import numpy as np

a = np.array(
    [
        [0.0, 0.1, 0.2, 0.3],
        [0.4, 0.5, 0.6, 0.7],
        [0.8, 0.9, 1.0, 1.2],
    ]
)

print(f"Sum on first axis = {np.sum(a, axis=0)}")
print(f"Max on first axis = {np.max(a, axis=0)}")
Sum on first axis = [1.2 1.5 1.8 2.2]
Max on first axis = [0.8 0.9 1.  1.2]

Whereas an operation on the second axis (axis=1) runs across the columns and returns one value per row:

print(f"Sum on second axis = {np.sum(a, axis=1)}")
print(f"Max on second axis = {np.max(a, axis=1)}")
Sum on second axis = [0.6 2.2 3.9]
Max on second axis = [0.3 0.7 1.2]
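
The effect of the axis argument is easiest to see from the shape of the result. The following is a minimal sketch using the same matrix; the variable names (total, col_means, row_means) are illustrative only. It also shows that omitting the axis argument reduces over every element of the array.

```python
import numpy as np

a = np.array(
    [
        [0.0, 0.1, 0.2, 0.3],
        [0.4, 0.5, 0.6, 0.7],
        [0.8, 0.9, 1.0, 1.2],
    ]
)

# Without an axis argument, the reduction runs over every element
# and returns a single scalar.
total = np.sum(a)

# axis=0 collapses the rows: one value per column, shape (4,).
col_means = np.mean(a, axis=0)

# axis=1 collapses the columns: one value per row, shape (3,).
row_means = np.mean(a, axis=1)

print(f"Input shape         = {a.shape}")
print(f"Sum of all entries  = {total:.1f}")
print(f"Mean on first axis  = {col_means}, shape {col_means.shape}")
print(f"Mean on second axis = {row_means}, shape {row_means.shape}")
```

In each case the axis named in the call is the one that disappears from the result: a (3, 4) input gives a (4,) result for axis=0 and a (3,) result for axis=1.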

If the array contains nan values, these functions can give unusable results because nan propagates: any arithmetic operation that involves nan returns nan, and np.max and np.min return nan as well:

a = np.array(
    [
        [0.0, 0.1, 0.2, 0.3],
        [0.4, 0.5, np.nan, 0.7],
        [0.8, 0.9, 1.0, 1.2],
    ]
)

print(f"Sum on first axis = {np.sum(a, axis=0)}")
print(f"Max on first axis = {np.max(a, axis=0)}")
Sum on first axis = [1.2 1.5 nan 2.2]
Max on first axis = [0.8 0.9 nan 1.2]
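
The propagation seen above can be checked on a single value. This short sketch (the names x, values, total and largest are illustrative) shows that arithmetic with nan yields nan, that nan does not compare equal to itself, and that one nan is enough to spoil a reduction.

```python
import numpy as np

x = np.nan

# Arithmetic with nan always yields nan.
print(f"nan + 1.0 = {x + 1.0}")
print(f"nan * 0.0 = {x * 0.0}")

# nan is not equal to anything, including itself,
# so test for it with np.isnan rather than ==.
print(f"nan == nan    -> {x == x}")
print(f"np.isnan(nan) -> {np.isnan(x)}")

# Reductions inherit this behaviour: a single nan turns the result into nan.
values = np.array([0.2, np.nan, 1.0])
total = np.sum(values)
largest = np.max(values)
print(f"Sum = {total}, Max = {largest}")
```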

For these cases, NumPy provides alternative functions such as np.nansum, np.nanmin, np.nanmax, np.nanmean, np.nanmedian, and np.nanstd, which ignore nan values during the calculation:

print(f"Sum on first axis = {np.nansum(a, axis=0)}")
print(f"Max on first axis = {np.nanmax(a, axis=0)}")
Sum on first axis = [1.2 1.5 1.2 2.2]
Max on first axis = [0.8 0.9 1.  1.2]
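
The same approach works for averages. As a hedged sketch with illustrative variable names, the example below counts the nan entries per column with np.isnan, computes column means with np.nanmean, and checks the third column by hand by filtering out the nan entry first.

```python
import numpy as np

a = np.array(
    [
        [0.0, 0.1, 0.2, 0.3],
        [0.4, 0.5, np.nan, 0.7],
        [0.8, 0.9, 1.0, 1.2],
    ]
)

# np.isnan returns a boolean mask; summing it counts the nan entries per column.
nan_count = np.sum(np.isnan(a), axis=0)

# Mean of each column, skipping nan entries.
col_means = np.nanmean(a, axis=0)

# Same computation done by hand for the third column:
# keep only the valid entries, then take the ordinary mean.
third = a[:, 2]
manual_mean = np.mean(third[~np.isnan(third)])

print(f"nan count on first axis = {nan_count}")
print(f"Mean on first axis      = {col_means}")
print(f"Third column by hand    = {manual_mean}")
```

Note that np.nanmean divides by the number of valid entries, so the third column is averaged over two values while the other columns are averaged over three.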