VAIOS

Performance Analysis

The performance of VAIOS was evaluated using a comprehensive benchmark suite executed on 5 March 2026. The benchmarks were designed to assess the behavior of key subsystems including task scheduling, inter-task communication, memory management, floating-point computation, and DMA-based data transfer. All tests were conducted on an STM32F401RE (Cortex-M4) platform with a 1 ms system tick.

The benchmark suite covered a total of 14 test cases spanning multiple subsystems, all of which completed successfully without errors, timeouts, or failures. The results demonstrate stable operation under both isolated and concurrent workloads.


Performance benchmarks for VAIOS subsystems on STM32F401RE. The results demonstrate efficient kernel execution and high computational throughput using the hardware FPU.

Experimental Platform

The benchmarks were conducted on a Nucleo-F401RE development board. The configuration details are summarized in Table 6.1.

Experimental platform configuration for VAIOS performance characterization.

| Field | Value |
|---|---|
| Board | STM32F401RE (Nucleo-64) |
| CPU | Cortex-M4 @ 84 MHz |
| FPU | Enabled (hard-float ABI) |
| DMA | Enabled (UART logging) |
| SysTick | 1 ms period |
| Build | Release (-O2) |

Benchmark Summary

The suite, executed on 5 March 2026 on the STM32F401RE at 84 MHz, covers five sub-domains: FPU computation, DMA transfer, task scheduling, inter-task communication (IPC), and memory management. A final stress test exercises all subsystems concurrently.

VAIOS Benchmark Results Summary. All 14 tests passed, demonstrating robust subsystem integration and performance.

| Benchmark | Status | Duration | Throughput / Result |
|---|---|---|---|
| FPU: sinf throughput | PASS | 4 ms | 250,000 ops/s |
| FPU: sqrtf throughput | PASS | 3 ms | 333,333 ops/s |
| FPU: multi-task context-save | PASS | 4 ms | 250,000 ops/s |
| DMA: Memory-to-Memory (Looped) | PASS | 20 ms | ~2,000 KB/s |
| DMA: Concurrent Streams | PASS | 21 ms | ~1,904 KB/s |
| TASK: Context switch rate | PASS | 212 ms | 3,773 sw/s |
| TASK: Priority preemption | PASS | 65 ms | Verified |
| TASK: Delay accuracy | PASS | 101 ms | 1% error |
| IPC: Semaphore ping-pong | PASS | 31 ms | 32,258 trips/s |
| IPC: Mutex shared counter | PASS | 69 ms | Verified |
| MEM: Alloc/free throughput | PASS | 97 ms | 6,185 ops/s |
| MEM: Fragmentation resilience | PASS | | Coalesced |
| STRESS: All subsystems concurrent | PASS | 8,004 ms | Verified |

Aggregate memory allocation latency for varying workload sizes (Small: 1000 × 32 B, Medium: 500 × 128 B, Large: 100 × 1024 B). The linear search time is proportional to the total heap traversal depth.

Task Scheduling: The scheduler achieved a context switch rate of approximately 3,773 switches per second in the release build at 84 MHz. This corresponds to roughly 22,000 clock cycles elapsed per switch (84,000,000 / 3,773 ≈ 22,263), a figure that covers everything executed between consecutive switches in the benchmark, including PendSV overhead and scheduler logic. The delay-accuracy test measured 101 ms for a requested 100 ms delay. This 1% error equals one tick of quantization, which is the resolution limit of a 1 ms system tick.
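
The two figures in this paragraph can be re-derived with integer arithmetic. The sketch below is illustrative only: the helper names `cycles_per_switch` and `delay_error_permille` are hypothetical and not part of VAIOS, and the second helper assumes that a delay can overshoot by at most one tick.

```c
#include <stdint.h>

/* Hypothetical helper: average CPU cycles elapsed per context switch,
 * rounded down. Returns 0 if the switch rate is 0. */
uint32_t cycles_per_switch(uint32_t cpu_hz, uint32_t switches_per_s)
{
    if (switches_per_s == 0u) {
        return 0u;
    }
    return cpu_hz / switches_per_s;
}

/* Hypothetical helper: relative delay error in per-mille (1/1000),
 * assuming the delay overshoots by exactly one tick of tick_ms. */
uint32_t delay_error_permille(uint32_t tick_ms, uint32_t requested_ms)
{
    if (requested_ms == 0u) {
        return 0u;
    }
    return (tick_ms * 1000u) / requested_ms;
}
```

With the values reported here, `cycles_per_switch(84000000u, 3773u)` yields 22263, and `delay_error_permille(1u, 100u)` yields 10, i.e. 1%.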

Computational Performance: The hardware FPU is used efficiently. sqrtf achieved approximately 33% higher throughput than sinf (333,333 vs. 250,000 ops/s), consistent with the Cortex-M4 FPU providing a single-precision square-root instruction (VSQRT.F32) but no dedicated sine instruction. The multi-task test verified the FPU context preservation logic, with lazy stacking ensuring that only tasks that actively use the FPU incur the register save/restore overhead.
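
As a rough check on the FPU figures, throughput can be recomputed from an operation count and an elapsed time. The sketch below makes an assumption: the iteration count of 1,000 is not stated in the results and is only back-computed from the table (250,000 ops/s × 4 ms). The helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: operations per second from an operation count
 * and an elapsed time in milliseconds (truncated). Returns 0 when the
 * elapsed time is below the 1 ms measurement resolution. */
uint32_t ops_per_second(uint32_t ops, uint32_t elapsed_ms)
{
    if (elapsed_ms == 0u) {
        return 0u;
    }
    return (uint32_t)(((uint64_t)ops * 1000u) / elapsed_ms);
}

/* Hypothetical helper: throughput gain of a over b in whole percent
 * (truncated). Returns 0 if b is 0 or a is not larger than b. */
uint32_t gain_percent(uint32_t a, uint32_t b)
{
    if (b == 0u || a <= b) {
        return 0u;
    }
    return (uint32_t)((((uint64_t)a - (uint64_t)b) * 100u) / b);
}
```

Under the assumed count, `ops_per_second(1000u, 4u)` gives 250000 and `ops_per_second(1000u, 3u)` gives 333333, so `gain_percent(333333u, 250000u)` gives 33. If the durations were captured in whole ticks, a 3 to 4 ms window carries up to one tick of uncertainty, so the 33% figure is best read as approximate.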

Subsystem Reliability: A 3,000 ms high-contention stress test executed 200 DMA transfers, 6,902 semaphore trips, and numerous memory operations concurrently. No allocation failures and no data corruption were observed, which supports kernel stability under the high-load conditions expected in autonomous flight. Two reliability fixes were made during benchmarking: the logging subsystem's spin-wait was given an upper bound, and the DMA flag clearing sequence was reworked to prevent spurious interrupts.
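
A bounded spin-wait of the kind mentioned above can be sketched as follows. This is a minimal illustration, not the VAIOS logging code: the function name `spin_wait_bounded`, its parameters, and the use of a plain iteration budget are all assumptions. On the target, `flag` would point at a peripheral status register such as a DMA or UART busy bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch: poll *flag until the bits in mask are clear,
 * giving up after max_spins polls that still read busy. Returns true
 * if the bits cleared, false if the budget ran out. */
bool spin_wait_bounded(const volatile uint32_t *flag,
                       uint32_t mask,
                       uint32_t max_spins)
{
    while ((*flag & mask) != 0u) {
        if (max_spins == 0u) {
            return false; /* caller decides how to recover */
        }
        max_spins--;
    }
    return true;
}
```

The bound turns a potential hang, for example a peripheral that never clears its busy bit, into an error return that the caller can handle by dropping the log message or resetting the stream.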