7.2.6. Missing data

This section shows how to use the ktk.TimeSeries.isnan and ktk.TimeSeries.fill_missing_samples methods to find and fill missing samples.

It happens regularly that we record suboptimal data, which may include missing samples. For instance, in these kinematic data of a tennis serve, the marker on the right shoulder disappears regularly:

import kineticstoolkit.lab as ktk
import numpy as np


filename = ktk.doc.download("kinematics_tennis_serve_nan.c3d")
markers = ktk.read_c3d(filename)["Points"]
markers.plot("Derrick:RSHO")
[Figure: plot of the Derrick:RSHO marker, with gaps where samples are missing]

7.2.6.1. Finding missing samples

Similarly to NumPy’s np.isnan function, TimeSeries provides the ktk.TimeSeries.isnan method, which returns a boolean array indicating which samples are missing:

is_missing = markers.isnan("Derrick:RSHO")

Using np.nonzero, we obtain the indexes of the missing samples:

np.nonzero(is_missing)
(array([134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146,
        147, 192, 193, 194, 195, 331, 332, 333, 334, 335, 336, 337, 338,
        373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385,
        498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510,
        511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 549, 550, 551,
        552, 553, 554, 555, 660, 661, 662, 663, 664, 665, 666, 667, 668,
        669, 670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681,
        682, 683, 684, 685, 686, 687, 688, 689, 690, 691, 692, 693, 694,
        695, 696, 697, 698, 699, 700, 701, 735, 736, 737, 738, 739, 740,
        741, 742, 743, 744, 745, 746, 862, 863, 864, 865, 866, 867, 868,
        869, 870, 871, 872, 917, 918, 919, 920, 921, 922, 923, 924, 925,
        926, 927]),)
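
A long list of indexes is hard to read when the real question is how many gaps there are and how long each one is. The following NumPy-only sketch groups a boolean mask such as is_missing into runs of consecutive missing samples. The helper name find_gaps and the toy mask are hypothetical and are not part of Kinetics Toolkit:

```python
import numpy as np


def find_gaps(is_missing):
    """Return (start, stop, length) for each run of consecutive True values.

    `stop` is exclusive, as in Python slicing.
    """
    mask = np.asarray(is_missing, dtype=bool)
    # Pad with False on both sides so that runs touching the edges are detected.
    padded = np.concatenate(([False], mask, [False]))
    # +1 marks the first index of a run, -1 marks the index just past its end.
    edges = np.diff(padded.astype(int))
    starts = np.nonzero(edges == 1)[0]
    stops = np.nonzero(edges == -1)[0]
    return [(int(a), int(b), int(b - a)) for a, b in zip(starts, stops)]


# Toy mask with two gaps, of 2 and 3 samples.
toy_mask = [False, True, True, False, False, True, True, True, False]
gaps = find_gaps(toy_mask)
print(gaps)
```

Applied to the indexes listed above, such a grouping shows that two gaps exceed 20 samples: indexes 498 to 520 (23 samples) and 660 to 701 (42 samples).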

7.2.6.2. Filling missing samples

If the gaps are not too wide, we can fill them using ktk.TimeSeries.fill_missing_samples:

filled_markers = markers.fill_missing_samples(
    max_missing_samples=20
)

filled_markers.plot("Derrick:RSHO")
[Figure: plot of the Derrick:RSHO marker after linear filling; gaps longer than 20 samples remain]

We see in the figure above that any gap larger than 20 samples has been left untouched. It is always a good idea to set such a maximal gap according to the sampling rate and the movement being recorded, so that we do not interpolate over too long a period and create invalid data.
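
To make the role of the maximal gap more concrete, here is a minimal NumPy-only sketch of gap-limited linear interpolation. It is an illustration of the idea, not Kinetics Toolkit's implementation. The function fill_short_gaps, the 200 Hz sampling rate, and the 0.01 s maximal gap duration are all assumptions chosen for the example:

```python
import numpy as np


def fill_short_gaps(time, values, max_missing_samples):
    """Linearly fill runs of NaN that are at most max_missing_samples long."""
    time = np.asarray(time, dtype=float)
    values = np.asarray(values, dtype=float)
    missing = np.isnan(values)
    filled = values.copy()
    if missing.all() or not missing.any():
        return filled

    # First interpolate every missing sample from the valid ones.
    filled[missing] = np.interp(time[missing], time[~missing], values[~missing])

    # Then locate each run of missing samples (stop is exclusive).
    padded = np.concatenate(([False], missing, [False]))
    edges = np.diff(padded.astype(int))
    starts = np.nonzero(edges == 1)[0]
    stops = np.nonzero(edges == -1)[0]

    # Put NaN back where the gap is too long, or where it touches an edge
    # of the signal and therefore cannot be interpolated from both sides.
    for start, stop in zip(starts, stops):
        too_long = (stop - start) > max_missing_samples
        at_edge = start == 0 or stop == len(values)
        if too_long or at_edge:
            filled[start:stop] = np.nan
    return filled


# Hypothetical acquisition settings, used only to convert a duration to samples.
sample_rate = 200.0      # Hz (assumed)
max_gap_seconds = 0.01   # longest gap we accept to interpolate (assumed)
max_missing_samples = int(round(max_gap_seconds * sample_rate))

time = np.arange(10) / sample_rate
values = np.array([0.0, 1.0, np.nan, 3.0, 4.0, np.nan, np.nan, np.nan, 8.0, 9.0])
filled = fill_short_gaps(time, values, max_missing_samples)
print(filled)
```

In this toy signal, the one-sample gap is filled while the three-sample gap is left as NaN, because it exceeds the two-sample limit derived from the assumed sampling rate. Expressing the limit as a duration first and converting it to samples keeps the choice meaningful when the sampling rate changes.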

Now, look at the blue curve. The default linear interpolation method was not optimal for these data, as it introduces obvious slope discontinuities (kinks) at the edges of the filled gaps. We may opt for another method, such as a cubic spline:

filled_markers = markers.fill_missing_samples(
    max_missing_samples=20,
    method="cubic",
)

filled_markers.plot("Derrick:RSHO")
[Figure: plot of the Derrick:RSHO marker after cubic filling]