
Pi 4 SIMD issues

Posted: Tue Jan 02, 2024 4:48 pm
by rhyde
The Cortex-A72 CPU used in the Raspberry Pi 4 seems to have some issues accessing the single-precision registers (S0-S31). An instruction such as

vmov s0, r0
runs very slowly. Storing R0 to an 8-byte memory location (with the high-order four bytes set to zero) and then loading D0 from that location is much faster, though it does wipe out S1, since D0 overlays the S0/S1 pair. For example, a numeric-to-hexadecimal string conversion function I wrote ran almost three times faster once it stopped using the single-precision registers. Note that this issue does not seem to happen on the Raspberry Pi 3's Cortex-A53 CPU.
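
For anyone who wants to try the memory round trip, here is a minimal sketch of the idea. It is not the actual conversion routine; it assumes GNU as syntax in 32-bit ARM (A32) mode, little-endian data, and that R1 is free to use as a scratch register. The stack slot is only one possible choice for the 8-byte location.

```asm
    @ Direct core-to-VFP transfer (the form reported as slow above):
    @     vmov    s0, r0

    @ Alternative: go through an 8-byte memory location instead.
    mov     r1, #0            @ assumption: r1 is a free scratch register
    sub     sp, sp, #8        @ reserve an 8-byte slot on the stack
    str     r0, [sp]          @ low word  = value from r0 (lands in s0)
    str     r1, [sp, #4]      @ high word = 0 (lands in s1)
    vldr    d0, [sp]          @ d0 = {s1:s0} = {0 : r0}
    add     sp, sp, #8        @ release the slot
```

After this sequence S0 holds the same bits that vmov s0, r0 would have produced, but S1 is now zero, so anything live in S1 has to be saved first. Whether it is actually faster on a given board is worth timing rather than assuming.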

Cheers,
Randy Hyde