Some interesting ARM 32 results
Posted: Tue Aug 22, 2023 9:16 pm
I've recently been working on some numeric conversion code for AARCH32 (32-bit ARM code). A few interesting results from that work:
1. Lookup tables (256 element) work great for converting integers to strings of hexadecimal digits (assuming you don't mind a 512-byte lookup table). Much faster than the traditional approach (at least, on a Pi 400, YMMV on other CPUs/systems).
2. Tables are a spectacular failure when going the other direction (strings to numeric values). I used a jump table and the Thumb TBH instruction to implement a switch statement to classify each input character and process it accordingly. The traditional "shift 4 and add" algorithm was quite a bit faster.
3. Neon sucks. I did a Neon version of the numeric to string function. The code was very short. Alas, it ran much slower than the lookup table approach (see item 1). I'm sure Neon on A32 is good for something, but converting 32-bit values to hex strings is not one of those things. (Note: I tried two different Neon algorithms, one using TBX and the other using the traditional "shift and add" approach; neither worked well.)
4. Surprise, surprise: though the ARM supports 64-bit floating-point arithmetic in hardware (at least on the Cortex-A-class CPUs I'm working on), there are no instructions to convert a double-precision float to a 64-bit integer or vice versa. I had to do that in software (64-bit integer to double wasn't so bad; the other direction was a bit hairy).
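For anyone who wants a concrete picture of items 1, 2, and 4, here is a minimal C sketch of the ideas. It is not the assembly code from this project, just an illustration under some assumptions: every name in it (hex_pairs, hex_table_init, u32_to_hex, hex_to_u32, u64_to_double) is made up for this example, the hex digits are uppercase, and the integer-to-double routine handles unsigned values only, using nothing wider than 32-bit integer-to-double conversions.

```c
#include <stdint.h>

/* Hypothetical 256-entry table: 2 ASCII characters per byte value = 512 bytes. */
static char hex_pairs[256][2];

/* Fill the table once before calling u32_to_hex(). */
void hex_table_init(void)
{
    static const char digits[] = "0123456789ABCDEF";
    for (unsigned i = 0; i < 256; i++) {
        hex_pairs[i][0] = digits[i >> 4];   /* high nibble */
        hex_pairs[i][1] = digits[i & 0xF];  /* low nibble  */
    }
}

/* Item 1: one table lookup per byte, most significant byte first.
   Writes 8 hex digits plus a terminating NUL, so out needs 9 bytes. */
void u32_to_hex(uint32_t value, char out[9])
{
    for (int i = 0; i < 4; i++) {
        unsigned byte = (value >> (24 - 8 * i)) & 0xFF;
        out[2 * i]     = hex_pairs[byte][0];
        out[2 * i + 1] = hex_pairs[byte][1];
    }
    out[8] = '\0';
}

/* Item 2: the traditional "shift 4 and add" loop.
   Returns 0 on success; -1 for an empty string, a non-hex character,
   or more than 8 digits (which would not fit in 32 bits). */
int hex_to_u32(const char *s, uint32_t *result)
{
    uint32_t acc = 0;
    int count = 0;

    if (*s == '\0')
        return -1;
    for (; *s != '\0'; s++) {
        unsigned c = (unsigned char)*s;
        unsigned d;
        if (c >= '0' && c <= '9')      d = c - '0';
        else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
        else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
        else return -1;
        if (++count > 8)
            return -1;
        acc = (acc << 4) + d;          /* shift 4 and add */
    }
    *result = acc;
    return 0;
}

/* Item 4 (the easy direction, unsigned only): split the value into two
   32-bit halves, convert each half exactly, and combine them.
   hi * 2^32 is exact in a double, so only the final add rounds. */
double u64_to_double(uint64_t x)
{
    uint32_t hi = (uint32_t)(x >> 32);
    uint32_t lo = (uint32_t)x;
    return (double)hi * 4294967296.0 + (double)lo;
}
```

Call hex_table_init() once at startup; after that, u32_to_hex(0xDEADBEEF, buf) fills buf with "DEADBEEF". Whether the table version actually beats a nibble-at-a-time loop on a given CPU is something to measure rather than assume.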
I still have a lot of cleanup and optimization to do on this code before putting it in "The Art of ARM Assembly, Volume 2." But I did post the code in the "Generic Assembly" topic here, if you're interested in looking at it.
Cheers,
Randy Hyde