Friday, August 11, 2017

Fitting a SQR peg in a Microsoft hole

One of the biggest challenges of optimizing the MC-10's BASIC has been in improving the performance of the floating point library.  Here I'll specifically discuss the replacement of the SQR() (square root) function.

There are two known fast algorithms I've looked at using.  Both depend on the IEEE single precision floating point format which is as follows:
| 1 sign bit | 8 bit exponent | 23 bit mantissa |
But Microcolor BASIC uses a larger mantissa, and uses two slightly different formats internally.  One is packed to save memory, and the other unpacked to simplify calculations.
Here is the packed format:
| 8 bit exponent | 1 bit sign | 31 bit mantissa |
And here is the unpacked format which is used during floating point calculations:
| 1 byte exponent | 32 bit mantissa | 1 byte sign | 
The larger mantissa is simply an extra byte which provides additional accuracy without a huge loss in speed required for double precision, and the mantissa appears to be largely treated as 31 bit.

The two formulas I've looked at for performing a fast SQR are presented below in C source code, and they were taken from this page.

/* Assumes that float is in the IEEE 754 single precision floating point format
 * and that int is 32 bits. */
float sqrt_approx(float z)
{
    int val_int = *(int*)&z; /* Same bits, but as an int */
    /*
     * To justify the following code, prove that
     *
     * ((((val_int / 2^m) - b) / 2) + b) * 2^m = ((val_int - 2^m) / 2) + ((b + 1) / 2) * 2^m)
     *
     * where
     *
     * b = exponent bias
     * m = number of mantissa bits
     *
     * .
     */

    val_int -= 1 << 23; /* Subtract 2^m. */
    val_int >>= 1; /* Divide by 2. */
    val_int += 1 << 29; /* Add ((b + 1) / 2) * 2^m. */

    return *(float*)&val_int; /* Interpret again as float */
}

val_int -= 1 << 23 is easy enough.  In our case the bit is shifted 31 times due to the larger mantissa, and an additional time for the sign bit... so 1 << 31.  But a simpler way to but it is to subtract 1 from the exponent.

val_int >>= 1 is a bit of a problem since on IEEE format, the bit shifts into the mantissa, and in Microcolor BASIC format, it shifts into the sign bit.  Loosing the sign bit is not a problem due to the fact that we should have already generated an FC error (illegal function) if we are trying to use SQR on a negative number.  But it's not actually in the proper bit in the mantissa.

The solution is actually quite simple.  here is the fix using register FPA5 in packed format:
ldd FPA5          ; Get exponent and first mantissa byte
suba  #1               ; subtract 1 from the exponent
rolb                      ; get rid of the space for the sign bit
rora                      ; shift mantissa to match IEEE ...
rorb                      ; ... grab the carry and finish the IEEE match

Now we just >>= 1
rora
rorb
std  FPA5            ; save the exponent & 1st mantissa byte
ror  FPA5+2
ror  FPA5+3
ror  FPA5+4
Now we just need to add 1 << 29.  So in our case, 1 << 37.  That is in the leftmost byte holding the exponent.   We could just add it, but we need to return our number to Microcolor BASIC's format, so will will combine the two for speed, and we end up with adding 1 << 38.  But that only requires addition with a single byte, so we add $10 before shifting or $20 after.
ldd      FPA5
rolb
rola
adaa    #$20 ; this should also clear the carry for the next instruction
rorb
std      FPA5

Other than the initial checks for SQR(-num), normalization, etc... that's about it.  It's small, fast, and simple.  The problem with this approach is that it's just an approximation and the error adds up.

Here is the other approach I've looked at which was written by Greg Walsh.
float invSqrt(float x)
{
    float xhalf = 0.5f*x;
    union
    {
        float x;
        int i;
    } u;
    u.x = x;
    u.i = 0x5f375a86 - (u.i >> 1);
    /* The next line can be repeated any number of times to increase accuracy */
    u.x = u.x * (1.5f - xhalf * u.x * u.x);
    return u.x;
}
The first thing you should notice is that there are 4 multiplies.
Calling the ROM floating point multiply 4 times adds up to a lot of clock cycles.  There are more clock cycles used by the multiply instructions alone than the entire first approach, and that is a small fraction of the total.  It's still much faster than Microsoft's algorithm.  If speed is more important than accuracy, it's not the way to go, but since BASIC programs may depend on accuracy, it is the better choice here.  Since the multiply has already been altered to use the hardware multiply instruction, there will be less penalty than if we were using a CPU like the 6502 or Z80.

This algorithm also calculates an approximation similar to before, but it uses a mathematically derived "magic" constant which appears to generate a more accurate result than the first approach.  You could uses it without the multiplication if an approximation were accurate enough.  (You can read about the "magic" constant on this page.)  The multiplication is to improve the accuracy of the estimate through one iteration of Newton's method.  The constant should probably be extended a byte to match Microcolor BASIC's larger mantissa.  This might improve the accuracy of the estimate which is already within about 4% of actual.

I'm not going to post the code for this approach, it starts out manipulating the number into IEEE format in the same manner as before, it subtracts the appropriate bytes from the "magic" constant, and then it restores Microcolor numeric format for a series of calls that load floating point registers, and perform multiplication, subtraction, etc...  It's a pretty straightforward use of the ROM's floating point library.  The code would also be almost identical for the 6809 or 68hc11 depending on the floating point format you are using.


There are multiple algorithms for performing a faster square root than the original Microsoft approach.  If the error resulting from approximation were acceptable, they would seem much faster than the Walsh approach I'm using.  But for a general purpose interpreter, it's better to maintain accuracy.

2 comments:


  1. Very interesting, I for the color computer seemed more important that I had integer, since it only has floating point

    ReplyDelete
  2. If only it has supported integers! The overhead of doing just about everything in floating point is mind boggling. Other data types could be added, but not in the original 8K. I had to switch a lot of the math library to 16 bit code to squeeze in the ELSE statement. Given the fact that the math library is just under 2K now... you can imaging what adding support for int in every command and math functions would do.

    ReplyDelete