Pi 4 SIMD issues
Posted: Tue Jan 02, 2024 4:48 pm
The Cortex-A72 CPU used in the Raspberry Pi 4 seems to have some issues accessing the single-precision registers (S0-S31). An instruction such as
    vmov s0, r0
runs very slowly. Storing R0 to an 8-byte memory location (with the high-order bytes zeroed) and loading D0 from that memory location is much faster (though it does wipe out S1). For example, a numeric-to-hexadecimal string conversion function I wrote ran almost three times faster when it avoided the single-precision registers. Note that this issue does not seem to happen on the Raspberry Pi 3's Cortex-A53 CPU.
Cheers,
Randy Hyde
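For readers who want to try the store-and-reload approach described in the post, here is a minimal sketch in GNU as syntax for 32-bit ARM. It is not the author's code: the routine name r0_to_d0, the use of the stack as the 8-byte scratch location, and the choice of R1 as a temporary are all assumptions made for illustration. It also assumes little-endian mode, so the word at the lower address ends up in S0.

```asm
        .syntax unified
        .arch   armv7-a
        .fpu    vfpv3
        .text
        .global r0_to_d0
        .type   r0_to_d0, %function

@ Hypothetical helper (name and scratch location are assumptions):
@ copies R0 into the low word of D0 by going through memory
@ instead of using "vmov s0, r0".
@ On return: S0 = R0, S1 = 0. Clobbers R1.
r0_to_d0:
        sub     sp, sp, #8          @ reserve an 8-byte scratch slot
        mov     r1, #0
        str     r0, [sp]            @ low word  = value from R0
        str     r1, [sp, #4]        @ high word = 0 (this is what wipes out S1)
        vldr    d0, [sp]            @ load all 64 bits into D0
        add     sp, sp, #8          @ release the scratch slot
        bx      lr
```

Whether this beats the direct vmov on a particular board is something to measure in your own code; the sketch only shows the data path, not the timing.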