Performance Evaluation — NavHAL | Vayu Technical Report

The performance of NavHAL was evaluated using a set of micro-benchmarks on an STM32F401RE microcontroller. All experiments were conducted on March 20, 2026. Measurements were obtained using the Cortex-M4 Data Watchpoint and Trace (DWT) cycle counter, so results are reported in core clock cycles rather than wall-clock time. The reported figures therefore reflect intrinsic software overhead rather than variations in processor frequency.
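The measurement approach can be sketched as follows. This is a minimal illustration, not the report's actual benchmark harness: the helper names `dwt_enable`, `cycles_elapsed`, and `cycles_per_iteration` are hypothetical, and the registers are passed in as pointers so that the same code can also be exercised on a host machine. On a Cortex-M4 target the three pointers would refer to DEMCR (0xE000EDFC), DWT_CTRL (0xE0001000), and DWT_CYCCNT (0xE0001004).

```c
#include <assert.h>
#include <stdint.h>

#define DEMCR_TRCENA        (1u << 24) /* enables the DWT unit */
#define DWT_CTRL_CYCCNTENA  (1u << 0)  /* starts the cycle counter */

/* Enable and zero the cycle counter (hypothetical helper). */
static inline void dwt_enable(volatile uint32_t *demcr,
                              volatile uint32_t *dwt_ctrl,
                              volatile uint32_t *dwt_cyccnt)
{
    *demcr |= DEMCR_TRCENA;
    *dwt_cyccnt = 0u;
    *dwt_ctrl |= DWT_CTRL_CYCCNTENA;
}

/* Unsigned subtraction stays correct across one 32-bit wraparound. */
static inline uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Average cost per iteration after removing fixed measurement overhead. */
static inline uint32_t cycles_per_iteration(uint32_t total,
                                            uint32_t overhead,
                                            uint32_t iterations)
{
    assert(iterations > 0u && total >= overhead);
    return (total - overhead) / iterations;
}
```

A benchmark would read the counter before and after the loop, pass both values to `cycles_elapsed`, and divide by the iteration count. How fixed overhead such as the counter reads themselves is subtracted is an assumption here, since the report does not describe it.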

GPIO Toggle Performance

A GPIO pin was toggled for 100,000 iterations across multiple abstraction layers.

The GPIO toggle benchmark, shown in Fig. 5.4, captures the fundamental cost of interacting with hardware. A single pulse consists of one SET and one RESET operation. Direct register access costs 5 cycles per pulse, which is treated as the hardware-imposed lower bound for this operation and serves as the baseline for comparison. NavHAL matches this cost at 5 cycles per pulse, indicating that its abstraction introduces no measurable overhead. This equivalence suggests that all abstraction layers in NavHAL are resolved at compile time, so the generated code maps directly to hardware operations without intermediate penalties. In contrast, the STM32 HAL implementation requires 26 cycles per pulse, approximately five times the baseline (26/5 = 5.2). This additional cost can be attributed to the layered design of the HAL, including function call overhead, parameter validation, and generalized handling of peripheral configurations. Arduino exhibits the highest cost at 100 cycles per pulse, twenty times the baseline. This overhead arises from its highly abstracted, generic programming model, which relies on runtime handling and does not map directly to hardware registers.
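How a wrapper can cost the same as a raw register write is illustrated by the sketch below. It is not NavHAL's API: `gpio_regs_t`, `gpio_set`, `gpio_reset`, and `gpio_pulse` are hypothetical names. The struct follows the STM32F4 GPIO register layout, in which the lower half-word of BSRR sets pins and the upper half-word resets them.

```c
#include <assert.h>
#include <stdint.h>

/* Register layout of an STM32F4 GPIO port (offsets 0x00 to 0x24). */
typedef struct {
    volatile uint32_t MODER, OTYPER, OSPEEDR, PUPDR,
                      IDR, ODR, BSRR, LCKR, AFR[2];
} gpio_regs_t;

/* Hypothetical wrappers. When inlined with a constant pin, the range
   check folds away and each call can reduce to one store to BSRR. */
static inline void gpio_set(gpio_regs_t *port, unsigned pin)
{
    assert(pin < 16u);
    port->BSRR = 1u << pin;          /* lower half-word sets the pin */
}

static inline void gpio_reset(gpio_regs_t *port, unsigned pin)
{
    assert(pin < 16u);
    port->BSRR = 1u << (pin + 16u);  /* upper half-word resets the pin */
}

/* One pulse as defined in the benchmark: one SET, then one RESET. */
static inline void gpio_pulse(gpio_regs_t *port, unsigned pin)
{
    gpio_set(port, pin);
    gpio_reset(port, pin);
}
```

On the target, `port` would point at a peripheral base address taken from the device header. On a host, a zero-initialized `gpio_regs_t` variable is enough to check the bit patterns.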

Function call overhead, shown in Fig. 5.5, was measured separately. Inlined operations execute in only 2 cycles, which effectively represents the lower bound for function execution when the compiler eliminates call overhead through inlining. Non-inlined function calls add latency due to stack operations, parameter passing, and control transfer. The direct register access (LL) implementation costs 10 cycles per call, reflecting a standard function invocation with minimal abstraction. NavHAL, when not inlined, requires 18 cycles per call, 8 cycles more than LL, because of its abstraction layers. This overhead is primarily attributable to controlled function wrapping and interface design rather than excessive layering. NavHAL mitigates the cost by aggressively inlining performance-critical paths, so that frequently executed operations approach the 2-cycle lower bound observed for inline execution. These results show that while function calls inherently introduce overhead, design choices such as limiting call depth and relying on compile-time inlining allow NavHAL to execute efficiently without sacrificing abstraction.
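The difference between the inlined and non-inlined cases can be reproduced with compiler attributes. The sketch below assumes GCC or Clang; `write_inline`, `write_call`, `call_overhead`, and `fake_reg` are hypothetical names, and `fake_reg` stands in for a peripheral register.

```c
#include <assert.h>
#include <stdint.h>

static volatile uint32_t fake_reg; /* stand-in for a peripheral register */

/* Inlined variant: the body is placed in the caller, so no call is made. */
static inline __attribute__((always_inline)) void write_inline(uint32_t v)
{
    fake_reg = v;
}

/* Non-inlined variant: the compiler must emit a real call and return. */
static __attribute__((noinline)) void write_call(uint32_t v)
{
    fake_reg = v;
}

/* Extra cycles a real call costs relative to the inlined baseline. */
static inline uint32_t call_overhead(uint32_t call_cycles,
                                     uint32_t inline_cycles)
{
    assert(call_cycles >= inline_cycles);
    return call_cycles - inline_cycles;
}
```

Both variants have identical behavior and differ only in the code the compiler generates around them. Applied to the figures reported above, `call_overhead(10u, 2u)` gives 8 cycles for LL and `call_overhead(18u, 2u)` gives 16 cycles for non-inlined NavHAL.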

Determinism and Jitter

Deterministic execution is critical for real-time systems, where consistent timing is required for reliable operation. As shown in Fig. 5.6, both direct register access (LL) and NavHAL without interrupts exhibit a constant execution time of 5 cycles, indicating fully deterministic behavior in the absence of external interference. When interrupts are enabled, NavHAL's execution time rises to at most 41 cycles, that is, up to 36 cycles of interrupt-induced jitter above the 5-cycle baseline. In contrast, Arduino exhibits significantly higher variability, with worst-case latency reaching 187 cycles, reflecting both interrupt effects and additional overhead from its abstraction model. The results indicate that the jitter observed in NavHAL arises from interrupt preemption rather than intrinsic software overhead: execution time is constant without interrupts and bounded with them.
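Jitter in this sense is the spread between the fastest and slowest observed execution. A minimal sketch of how it could be computed from a series of per-iteration cycle counts follows. The names `timing_stats_t` and `timing_stats` are hypothetical, and the report does not state how its samples were collected or reduced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t min;    /* best-case cycles */
    uint32_t max;    /* worst-case cycles */
    uint32_t jitter; /* max - min */
} timing_stats_t;

/* Reduce a non-empty series of cycle counts to min, max, and jitter. */
static timing_stats_t timing_stats(const uint32_t *samples, size_t n)
{
    assert(samples != NULL && n > 0u);
    timing_stats_t s = { samples[0], samples[0], 0u };
    for (size_t i = 1; i < n; ++i) {
        if (samples[i] < s.min) s.min = samples[i];
        if (samples[i] > s.max) s.max = samples[i];
    }
    s.jitter = s.max - s.min;
    return s;
}
```

A series that is constant at 5 cycles yields a jitter of 0, while any series ranging from 5 to 41 cycles yields a jitter of 36.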

Interrupt Overhead

This benchmark measures interrupt latency and dispatch overhead.

Interrupt latency directly affects responsiveness in real-time control systems, as it determines how quickly the system can react to external events. As shown in Fig. 5.7, NavHAL exhibits a latency of 33 cycles with an additional dispatch overhead of approximately 16 cycles, resulting in a low overall interrupt handling cost. STM32 HAL shows higher latency at 46 cycles, 13 cycles more than NavHAL, and a comparable overhead of 18 cycles, reflecting the impact of its layered abstraction and generalized interrupt handling mechanisms. In contrast, Arduino shows a latency of 187 cycles and an overhead of 159 cycles, indicating considerable delay introduced by its abstraction model and runtime handling. These results indicate that NavHAL keeps both latency and dispatch overhead below those of STM32 HAL and Arduino. Such low and predictable interrupt costs are essential for maintaining responsiveness and stability in time-critical applications, whereas the much higher overhead observed in Arduino can adversely impact control loop performance and system determinism.

Design Principles

The performance characteristics of NavHAL are achieved through the following design principles:

Compile-time specialization: Hardware configuration is resolved at compile time, eliminating runtime branching.
Direct register access: Peripheral interaction is performed through memory-mapped structures without intermediate abstraction layers.
Inline critical paths: Frequently executed operations are implemented as inline functions.
Minimal call depth: API functions are designed to avoid deep call hierarchies.
No dynamic dispatch: The absence of virtual functions or runtime lookup ensures predictable execution.
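
The contrast between runtime dispatch and compile-time specialization named in the principles above can be sketched as follows. This is an illustration under stated assumptions rather than NavHAL source code: every identifier (`fake_bsrr`, `pin_write_runtime`, `led_on`, `led_off`, `LED_PORT`, `LED_PIN`) is hypothetical, and `fake_bsrr` stands in for the BSRR registers of two GPIO ports.

```c
#include <assert.h>
#include <stdint.h>

static volatile uint32_t fake_bsrr[2]; /* stand-ins for two ports' BSRR */

/* Runtime-dispatched style: the operation is selected through a table
   on every call, which costs an indirect branch. */
typedef void (*pin_op_t)(unsigned port, unsigned pin);

static void op_set(unsigned port, unsigned pin)
{
    fake_bsrr[port] = 1u << pin;
}

static void op_reset(unsigned port, unsigned pin)
{
    fake_bsrr[port] = 1u << (pin + 16u);
}

static const pin_op_t op_table[2] = { op_set, op_reset };

static void pin_write_runtime(unsigned port, unsigned pin, unsigned level)
{
    assert(port < 2u && pin < 16u);
    op_table[level ? 0 : 1](port, pin); /* indirect call through a table */
}

/* Compile-time-specialized style: port and pin are constants, so the
   compiler can fold each function into a single store. */
#define LED_PORT 0u
#define LED_PIN  5u

static inline void led_on(void)
{
    fake_bsrr[LED_PORT] = 1u << LED_PIN;
}

static inline void led_off(void)
{
    fake_bsrr[LED_PORT] = 1u << (LED_PIN + 16u);
}
```

Both styles write the same bit patterns. The specialized version leaves no lookup, branch, or function pointer for the processor to evaluate at runtime, which is the property the principles above rely on.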