Sunday, August 13, 2017

Faster Normalization

I tested a new floating point normalization routine last night.
The loop that performs the normalization is definitely faster, but the adjustment to the exponent is now calculated outside the loop, which adds a fixed cost to every call. That makes the routine slightly slower if no byte-oriented normalization is needed, only slightly faster if one pass is needed, and definitely faster if 2-5 passes are needed.
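To make the trade-off concrete, here is a rough sketch of the idea in Python rather than assembly. It is not the ROM code: the 32-bit mantissa width, the function name normalize, and the way zero is handled are all assumptions for illustration, and exponent underflow is ignored. The point is only that the byte loop does nothing but move bytes and count passes, and the exponent is corrected once afterwards.

```python
MANTISSA_BITS = 32                      # assumed width, for illustration only
TOP_BYTE_SHIFT = MANTISSA_BITS - 8
MASK = (1 << MANTISSA_BITS) - 1


def normalize(mantissa, exponent):
    """Shift the mantissa left until its top bit is set, adjusting the exponent.

    Hypothetical helper: underflow of the exponent is not checked.
    """
    if mantissa == 0:
        return 0, 0                     # treat zero as "exponent 0" in this sketch

    # Byte-oriented phase: the loop only shifts and counts passes.
    passes = 0
    while (mantissa >> TOP_BYTE_SHIFT) == 0:
        mantissa = (mantissa << 8) & MASK
        passes += 1

    # The exponent adjustment happens once, outside the loop.
    # This step runs even when passes == 0, which is the fixed cost.
    exponent -= 8 * passes

    # Bit-oriented phase: finish with single-bit shifts.
    while ((mantissa >> (MANTISSA_BITS - 1)) & 1) == 0:
        mantissa = (mantissa << 1) & MASK
        exponent -= 1

    return mantissa, exponent
```

With an already normalized value the byte loop never runs but the adjustment step still executes, which matches the "slightly slower" case. With several empty leading bytes the lighter loop body pays for itself.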

The problem with an optimization like this is that it's difficult to tell which version is faster in real-world use. You can't just count clock cycles, because the result depends on how often each case comes up. The only sure test is a benchmark. I ran the Life program I've been using side by side with the previous ROM version. After running overnight... the new version is definitely faster. But the Life status bar is only about 2 blocks different after over 200 generations. More testing and a size comparison will be needed to see if it stays. If it's always faster and within a few bytes of the previous version, I'll keep it in. It's definitely faster and smaller than the Microsoft version, which only shifted the mantissa a byte at a time with the A register. It's very obvious the original 6800 code was used there.
