
TinyML Exploration: IMU Based Gesture Recognition


Background

I built an embedded gesture-recognition “wand” that classifies six IMU gestures in real time on an STM32F411RE. I then used it as a testbed to compare opaque AutoML (here, NanoEdgeAI) against a fully transparent Scikit-learn → emlearn → C pipeline. AutoML tools can be fast, but when model internals aren’t visible, debugging and long-term maintenance can be painful. The core question was: can an ML newbie use an open pipeline to build models that match (or beat) NanoEdgeAI in on-device accuracy, latency, and memory use, while keeping the work reproducible and inspectable?

Hardware + Dataset

  • MCU: NUCLEO-F411RE (Cortex-M4F, 512 KB flash / 128 KB SRAM)
  • IMU: MPU9250 (accel + gyro used; mag ignored) over I2C
  • Sampling: 200 Hz, accel ±16 g, gyro ±2000 dps, 16-bit
  • Gestures: circle, lightning, swipe up/down/left/right
  • After cleaning + standardizing duration, the final dataset used for training was 1200 labeled samples (200 per gesture). Each sample was resampled to 100 time steps and flattened into a 600-element feature vector (6 channels × 100).
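The preprocessing described above can be sketched roughly as follows. This is a minimal sketch, not the project's actual code: it assumes linear interpolation for resampling and channel-major flattening (all 100 steps of channel 1, then channel 2, and so on). The write-up does not state either choice. The sensitivity constants are the MPU9250 datasheet values for the ±16 g and ±2000 dps ranges. All function names are hypothetical.

```python
from typing import List, Sequence

# MPU9250 sensitivities for the full-scale ranges used here (datasheet values).
ACCEL_LSB_PER_G = 2048.0   # ±16 g
GYRO_LSB_PER_DPS = 16.4    # ±2000 dps

N_STEPS = 100     # time steps per sample after resampling
N_CHANNELS = 6    # ax, ay, az, gx, gy, gz


def counts_to_units(raw: Sequence[int], lsb_per_unit: float) -> List[float]:
    """Convert signed 16-bit sensor counts to physical units (g or deg/s)."""
    return [v / lsb_per_unit for v in raw]


def resample_linear(signal: Sequence[float], n_out: int = N_STEPS) -> List[float]:
    """Linearly interpolate a 1-D signal of any length onto n_out evenly spaced points."""
    n_in = len(signal)
    if n_in == 1:
        return [float(signal[0])] * n_out
    out: List[float] = []
    for i in range(n_out):
        # Map output index i onto the input index range [0, n_in - 1].
        pos = i * (n_in - 1) / (n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        out.append(signal[lo] * (1.0 - frac) + signal[hi] * frac)
    return out


def to_feature_vector(channels: Sequence[Sequence[float]]) -> List[float]:
    """Resample each channel to N_STEPS, then concatenate channel after channel (hypothetical order)."""
    if len(channels) != N_CHANNELS:
        raise ValueError(f"expected {N_CHANNELS} channels, got {len(channels)}")
    features: List[float] = []
    for channel in channels:
        features.extend(resample_linear(channel))
    return features  # length N_CHANNELS * N_STEPS = 600
```

Because each recording is resampled independently, a slow gesture and a fast one both map onto the same 100-step grid. This is what allows a fixed-size 600-element input for the MLP and RF.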

Visualization of gestures to be classified.

Acceleration readings per axis by gesture (95% confidence interval).

Two model pipelines

1) NanoEdgeAI (opaque / generated)

  • NanoEdgeAI generated and ranked many candidate models. I exported the top SVM, RF, and MLP by NanoEdgeAI’s “quality index” and deployed the generated C code to the MCU.

2) Transparent pipeline (reproducible)

  • Trained Scikit-learn MLP + RF variants, selected configs using a weighted score prioritizing balanced accuracy (with small penalties for estimated compute + memory), then converted to C using emlearn for on-device inference.
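A weighted selection score of this kind might look like the sketch below. The penalty weights and normalization budgets are hypothetical; the write-up only says the penalties were small. The cost estimate also assumes a dense MLP with 4-byte float parameters. Balanced accuracy here is the mean per-class recall, which is what Scikit-learn's `sklearn.metrics.balanced_accuracy_score` computes by default. It is re-implemented so the sketch has no dependencies.

```python
from typing import List, Sequence, Tuple

# Hypothetical weights and normalization budgets: the write-up does not give the real values.
W_COMPUTE = 0.05
W_MEMORY = 0.05
MAC_BUDGET = 100_000               # multiply-accumulates per inference
MEMORY_BUDGET_BYTES = 128 * 1024   # bytes of model parameters


def balanced_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean per-class recall over the classes present in y_true."""
    recalls: List[float] = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def mlp_cost(layer_sizes: Sequence[int], bytes_per_param: int = 4) -> Tuple[int, int]:
    """Estimate MACs per inference and parameter bytes for a dense MLP (weights + biases)."""
    pairs = list(zip(layer_sizes, layer_sizes[1:]))
    macs = sum(a * b for a, b in pairs)
    params = sum((a + 1) * b for a, b in pairs)
    return macs, params * bytes_per_param


def selection_score(bal_acc: float, macs: int, param_bytes: int) -> float:
    """Balanced accuracy minus small, capped penalties for estimated compute and memory."""
    compute_penalty = min(macs / MAC_BUDGET, 1.0)
    memory_penalty = min(param_bytes / MEMORY_BUDGET_BYTES, 1.0)
    return bal_acc - W_COMPUTE * compute_penalty - W_MEMORY * memory_penalty
```

For example, a hypothetical 600-32-6 MLP costs `mlp_cost([600, 32, 6])`, which returns 19,392 MACs and 77,720 bytes. Candidates can then be ranked with `max(..., key=...)` over their scores. Keeping the penalties small means accuracy dominates, and cost only breaks near-ties.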

Final evaluation was on-device using a new participant (not in the training set), 50 trials per gesture, recording predicted class + inference time.
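Given a log of on-device trials (true gesture, predicted class, inference time), the headline numbers can be tallied as in this sketch. The gesture label strings and the log layout are assumptions; the write-up does not describe the logging format.

```python
from collections import Counter
from statistics import mean
from typing import Dict, Sequence, Tuple

# Hypothetical label names for the six gestures.
GESTURES = ("circle", "lightning", "swipe_up", "swipe_down", "swipe_left", "swipe_right")

# One logged trial: (true gesture, predicted gesture, inference time in µs).
Trial = Tuple[str, str, float]


def summarize(trials: Sequence[Trial]) -> Tuple[Dict[str, Counter], float, float]:
    """Return a confusion table (true -> Counter of predictions), overall accuracy, mean latency."""
    confusion: Dict[str, Counter] = {g: Counter() for g in GESTURES}
    for true, pred, _ in trials:
        confusion[true][pred] += 1
    correct = sum(confusion[g][g] for g in GESTURES)
    accuracy = correct / len(trials)
    mean_latency_us = mean(us for _, _, us in trials)
    return confusion, accuracy, mean_latency_us
```

Each row of `confusion` corresponds to one row of the confusion matrices shown below.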

Key results (on-device)

| Model | On-device accuracy | On-device inference time | Compiled flash |
|---|---|---|---|
| Custom MLP (Scikit→emlearn) | 91.78% | 3178 µs | 115.836 KB |
| NanoEdgeAI MLP | 91.00% | 824 µs | 35.880 KB |
| NanoEdgeAI SVM | 81.67% | 454 µs | 29.004 KB |
| Custom RF (Scikit→emlearn) | 76.32% | 15.27 µs | 30.292 KB |
| NanoEdgeAI RF | 51.00% | 412 µs | 37.484 KB |
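The trade-off between the two top rows can be made explicit with a quick calculation on the table's own values; no new measurements are involved. The custom MLP's 0.78-point accuracy edge costs roughly 3.9× the inference time and 3.2× the flash of the NanoEdgeAI MLP.

```python
from typing import Dict, Tuple

# (accuracy %, inference time µs, compiled flash KB), copied from the results table.
RESULTS: Dict[str, Tuple[float, float, float]] = {
    "Custom MLP": (91.78, 3178.0, 115.836),
    "NanoEdgeAI MLP": (91.00, 824.0, 35.880),
}


def compare(a: str, b: str) -> Tuple[float, float, float]:
    """Accuracy difference (percentage points), latency ratio and flash ratio of a vs. b."""
    acc_a, t_a, f_a = RESULTS[a]
    acc_b, t_b, f_b = RESULTS[b]
    return acc_a - acc_b, t_a / t_b, f_a / f_b


d_acc, t_ratio, f_ratio = compare("Custom MLP", "NanoEdgeAI MLP")
print(f"accuracy: {d_acc:+.2f} pp, latency: {t_ratio:.2f}x, flash: {f_ratio:.2f}x")
# prints: accuracy: +0.78 pp, latency: 3.86x, flash: 3.23x
```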

Transparent pipeline (Scikit-learn + emlearn) MLP model: confusion matrix of embedded inference performance.
Transparent pipeline (Scikit-learn + emlearn) RF model: confusion matrix of embedded inference performance.

Two practical observations mattered more than the headline numbers:

  • “Blind spots” happened. NanoEdgeAI’s SVM and RF failed completely on the circle gesture during live testing (0% correct), and NanoEdgeAI RF also showed a blind spot for lightning.
  • Estimated performance ≠ deployed performance. Across models, real on-device behavior diverged from the tool-reported estimates, especially for NanoEdgeAI.
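A single accuracy figure can hide a blind spot. With six equally sampled gestures, a model that is perfect on five and never recognizes the sixth still scores 5/6 ≈ 83.3% overall. Checking per-gesture recall surfaces this directly. The sketch below is a hypothetical helper, not part of either pipeline. It takes (correct, total) counts per gesture, and its `threshold` parameter is an assumption: 0.0 flags only gestures that were never recognized.

```python
from typing import Dict, List, Tuple


def per_class_recall(counts: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """counts maps gesture -> (correct predictions, total trials)."""
    return {g: correct / total for g, (correct, total) in counts.items()}


def find_blind_spots(counts: Dict[str, Tuple[int, int]], threshold: float = 0.0) -> List[str]:
    """Gestures whose recall is at or below threshold (0.0 = never recognized)."""
    recall = per_class_recall(counts)
    return sorted(g for g, r in recall.items() if r <= threshold)
```

Raising `threshold` (for example to 0.25) also catches gestures that are recognized only occasionally, not just complete failures.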

What I’d do differently next

  • Quantize the custom models (e.g., to fixed-point) to shrink flash; the custom MLP used float weights, which inflated its size.
  • Spend more time on the custom models’ implementation; given the short timeline, this was out of scope.
  • Collect a bigger, more variable dataset (inter-user motion inconsistency was a real limiter).
  • Explore Burn 🦀
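For the quantization item, one common option is symmetric per-tensor int8 quantization, sketched below. This is a generic technique and was not part of the project. The sketch makes no claim about what emlearn itself supports. It assumes the float weights are 32-bit, so each weight shrinks from 4 bytes to 1, plus a single stored scale. Running int8 inference on the MCU would also need integer kernels. Dequantizing at load time would save flash but not RAM.

```python
from typing import List, Sequence, Tuple


def quantize_int8(weights: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: w ≈ q * scale, with q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]


def storage_bytes(n_weights: int) -> Tuple[int, int]:
    """Weight storage as float32 vs. int8 (plus one float32 scale)."""
    return 4 * n_weights, n_weights + 4
```

The per-weight rounding error is at most half a quantization step (`scale / 2`). Whether that costs accuracy on this dataset would have to be measured on-device.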

Full write-up

If you want the full methodology, plots, and confusion matrices, the complete paper is here: Cracking the Code: Achieving High-Performance Embedded Gesture Recognition with Transparent Pipelines