Sometimes bigger is not always better...

Post Reply
rhyde
Site Admin
Posts: 47
Joined: Sun Dec 04, 2022 5:36 pm

Sometimes bigger is not always better...

Post by rhyde »

I have recently (2/29/2024) been working on code for a book I'm working on (Art of ARM Assembly, Volume 2). One of the chapters deals with numeric-string-to-integer conversion. I'd banged out the usual "fetch a char, test it to see if it's a digit, multiply an accumulator by ten, and add in the digit" algorithm. It ran in about 56 seconds with my test data (100,000,000 iterations). Then I worked on an improved version that used a lookup table to classify characters as digits, delimiters, and illegal characters. This sped up the code, running in about 40 seconds for the same input data. Not a bad improvement (though the lookup table was 256 bytes long).

While working on integer-to-string conversions, I'd used the usual (high-speed) approach of processing two digits at a time (using a 512-byte lookup table) and even four digits at a time (using a 40KB lookup table). Perhaps there was a way to speed up string-to-numeric conversions using the same technique? Processing two characters at a time, using a lookup table (each entry is four bytes) requires a 65,536-element table (for a total of 256KB). Very few people could justify that kind of memory usage for this type of routine, but I was still interested in seeing how it performed. The solution I came up with wasn't much better than the 256-byte version--it ran in about 38 seconds, a whole two seconds faster. For 99.999% of the applications out there, the small improvement does not justify the space (and the hits on the cache, affecting the performance of the rest of an application). Nevertheless, it was a good failure, I learned what cannot be done. For those who are interested, I've attached the code (ARM assembly language).

While looking for algorithms to implement a fast string-to-numeric conversion (interestingly, almost everyone argues for the traditional algorithm), I came across many comments along the lines of "premature optimization is a waste of time" and "why do you care how fast this function is?" If the original poster was asking how to do this for a single application, some argument could be made about not rushing to optimize the function until you've profiled your code and determined where the real performance bottlenecks lie. However, this argument does not apply to library code (such as a string-to-integer conversion). Library code is going to be used it a wide variety of applications; some where the performance doesn't matter, some where it matters a lot. Anyone who produces a lot of XML or JSON output can testify that converting integers to human-readable strings, and vice-versa, can be real bottlenecks. Library code should always be optimized, independent of some individual application that uses it, because the library's author(s) will never know how important that code might be.
Cheers,
Randy Hyde
Attachments
Listing9-6.zip
Standard algorithm (56 secs on a Pi 400)
(13.84 KiB) Downloaded 20 times
Listing9-7.zip
Character set via lookup table (40 secs on a Pi 400)
(15.69 KiB) Downloaded 21 times
Listing9-7a.zip
Two-character processing (38 secs on a Pi 400)
(26.98 KiB) Downloaded 25 times
Post Reply