I've recently been working on some numeric conversion code for AARCH32 (32-bit ARM code). A few interesting results from that work:
1. Lookup tables (256-element) work great for converting integers to strings of hexadecimal digits, assuming you don't mind a 512-byte lookup table. They were much faster than the traditional approach, at least on a Pi 400 (YMMV on other CPUs/systems).
2. Tables are a spectacular failure when going the other direction (strings to numeric values). I used a jump table and the Thumb TBH instruction to implement a switch statement that classifies each input character and processes it accordingly. The traditional "shift 4 and add" algorithm was quite a bit faster.
3. Neon sucks. I did a Neon version of the numeric-to-string function. The code was very short. Alas, it ran much slower than the lookup table approach (see item 1). I'm sure Neon on A32 is good for something, but converting 32-bit values to hex strings is not one of those things. (Note: I tried two different Neon algorithms, one using TBX and the other using the traditional "shift and add" approach; neither worked well.)
4. Surprise, surprise: though the ARM supports 64-bit floating-point arithmetic in hardware (at least on the Cortex-A-class CPUs I'm working on), there are no instructions to convert a double-precision float to a 64-bit integer or vice versa. I had to do that in software (64-bit integer to double wasn't so bad; the other direction was a bit hairy).
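For anyone who wants the gist of items 1 and 2 without reading assembly, here is a rough C sketch of the two techniques. It is only an illustration and not the AArch32 code I posted: the names (hex_pairs, init_hex_pairs, u32_to_hex, hex_to_u32) are made up for this example, the table is built at run time rather than assembled as constant data, and the parser does no overflow checking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 256 entries x 2 characters = the 512-byte table from item 1.
   (Hypothetical name; filled at run time to keep the sketch short.) */
char hex_pairs[256][2];

/* Build the table once before calling u32_to_hex. */
void init_hex_pairs(void)
{
    static const char digits[] = "0123456789ABCDEF";
    for (int i = 0; i < 256; i++) {
        hex_pairs[i][0] = digits[i >> 4];    /* high nibble */
        hex_pairs[i][1] = digits[i & 0x0F];  /* low nibble  */
    }
}

/* Item 1: one table lookup per byte, most significant byte first.
   buf must hold 8 digits plus the terminating NUL. */
void u32_to_hex(uint32_t value, char buf[9])
{
    for (int i = 0; i < 4; i++) {
        unsigned byte = (value >> (24 - 8 * i)) & 0xFFu;
        buf[2 * i]     = hex_pairs[byte][0];
        buf[2 * i + 1] = hex_pairs[byte][1];
    }
    buf[8] = '\0';
}

/* Item 2: "shift 4 and add". Stops at the first non-hex character and
   returns the number of digits consumed. No overflow check (assumption:
   the caller passes at most 8 digits). */
size_t hex_to_u32(const char *s, uint32_t *out)
{
    assert(s != NULL && out != NULL);
    uint32_t acc = 0;
    size_t n = 0;
    for (;; n++) {
        unsigned c = (unsigned char)s[n];
        unsigned digit;
        if (c >= '0' && c <= '9')      digit = c - '0';
        else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
        else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
        else break;                    /* includes the NUL terminator */
        acc = (acc << 4) + digit;      /* shift 4 and add */
    }
    *out = acc;
    return n;
}
```

Call init_hex_pairs() once, then u32_to_hex(value, buf) with a 9-byte buffer. Whether the table version beats a nibble-at-a-time loop on your CPU is something to measure; the Pi 400 result above is the only data point I have.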
I still have a lot of cleanup and optimization to do on this code before putting it in "The Art of ARM Assembly, Volume 2." But I did post the code in the "Generic Assembly" topic here, if you're interested in looking at it.
Cheers,
Randy Hyde