Friday, August 4, 2017

The fast multiply seems to be working now.  It still needs extensive testing and I still have to fix a couple related issues, but that should be easy compared to what I've already had to fix.

So, how fast is it?  Well, we need to do some benchmarking to figure that out.

Ahl's benchmark was a popular BASIC benchmark back in the day.  It was one of the few speed comparisons for personal computers in the late 70s and early 80s.  It's not really a great benchmark by today's standards, but it did a decent job of measuring the speed and accuracy of the BASIC math library.  It does zero testing of things like string handling, array manipulation, etc.  Math yeah, no to just about everything else.
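
For anyone who hasn't seen it, here is a rough Python sketch of what the benchmark does. It is my port of the commonly published BASIC listing as I remember it, not the exact program I ran, and the function name and `seed` parameter are made up for the example. The idea: take ten square roots of N, square the result ten times, and see how far the sum drifts from the exact answer.

```python
import math
import random


def ahl_benchmark(seed=0):
    """Illustrative port of Ahl's benchmark (names and seed are hypothetical)."""
    rng = random.Random(seed)
    s = 0.0  # running sum of the round-tripped values
    r = 0.0  # running sum of random numbers
    for n in range(1, 101):
        a = float(n)
        for _ in range(10):
            a = math.sqrt(a)   # ten square roots...
            r += rng.random()
        for _ in range(10):
            a = a ** 2         # ...then ten squarings should give n back
            r += rng.random()
        s += a
    # With exact math s would be 5050, so s / 5 would be exactly 1010.
    accuracy = abs(1010 - s / 5)
    # 2000 random numbers in [0, 1) should sum to roughly 1000.
    randomness = abs(1000 - r)
    return accuracy, randomness


if __name__ == "__main__":
    acc, rnd = ahl_benchmark()
    print("ACCURACY", acc)
    print("RANDOM", rnd)
```

The closer the accuracy figure is to zero, the less rounding error the SQR and ^ routines introduce. Python's double precision floats will score far better than any 8-bit BASIC, so this only shows what is being measured.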

Ahl's Benchmark currently lists the accuracy of the new math library at 8.66413117E-4.  That's a higher error than the original ROM, but it's still better than the standard 6502 versions. (FWIW, the accuracy might improve before the ROM image is released.)  The benchmark takes 67 seconds, which is 52 seconds faster than the factory ROM.  That cuts the run time by almost 44%.  In the Creative Computing article that published results for 140 different computers from fastest to slowest, the MC-10 was originally listed about 2/3 of the way down.  With this and the other optimizations I've made, it jumps past more than 30 machines on the list and into the top half.  Radio Shack was closing these out for as little as $10 when it was discontinued, even though it is faster than machines that cost thousands at the time.
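
To spell out the arithmetic behind those timing numbers, here is a quick check in Python. The variable names are just for this example; the only inputs are the 67 seconds and the 52 second saving quoted above.

```python
new_time = 67   # seconds with the new ROM
saved = 52      # seconds saved versus the factory ROM

factory_time = new_time + saved          # implied factory ROM time
time_reduction = saved / factory_time    # fraction of run time removed
speed_ratio = factory_time / new_time    # how many times as fast

print(f"factory ROM: {factory_time} s")            # factory ROM: 119 s
print(f"run time cut by {time_reduction:.1%}")     # run time cut by 43.7%
print(f"speed ratio: {speed_ratio:.2f}x")          # speed ratio: 1.78x
```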

I also ran a sieve test.  With the first test, the new version is only slightly faster because sieve uses division rather than multiplication.  Its speedup depends entirely on the other optimizations I've made, which make a much smaller difference.  This brings up the possibility of another optimization.  It is normally faster to calculate the reciprocal of the divisor and multiply rather than to use division.  Thanks to the hardware multiply, the difference should be even greater.  Example: if we want C = A/B, it will be faster to compute R = 1/B once and then do C = A*R.  But isn't 1/B division?  It is.  This mostly pays off when pre-calculating constants you divide by, or where the reciprocal can be calculated outside a loop that uses the value repeatedly.
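
Here is a small sketch of the reciprocal trick, written in Python rather than BASIC just to show the shape of it. The function names are made up for the example. The second version does one division up front and only multiplies inside the loop.

```python
def scale_with_division(values, b):
    """Divide every value by b: one division per element."""
    return [a / b for a in values]


def scale_with_reciprocal(values, b):
    """Same job with a single division hoisted out of the loop."""
    r = 1.0 / b                      # the only division
    return [a * r for a in values]   # multiplies only inside the loop


if __name__ == "__main__":
    data = [3.0, 7.5, 10.0, 42.0]
    print(scale_with_division(data, 3.0))
    print(scale_with_reciprocal(data, 3.0))
```

One caveat: multiplying by a rounded reciprocal can differ from a true division in the last digit or so, so the two versions are not guaranteed to print identical values.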

That brings up another issue.  As of now, there are only 14 bytes of free space remaining in the ROM, and that's without printing "MICROCOLOR BASIC", so the 8K version is pretty much done unless I find some additional code size optimizations.  That's too bad, because the fast SQR function would probably put the Ahl results under 60 seconds.  Even a faster ^ (power) function might do that.  If I could get it under 50 seconds, that's normally the realm of systems clocked twice as fast.  That's around what the Model 4 benchmarked at, and it runs a 4 MHz Z80.

I have to admit, it felt good to get this working!  This was one of my original personal goals for the project and it was a bit disappointing that I didn't have time to put it in the contest release.  Also, things weren't looking good this morning when Ahl's Benchmark listed accuracy at over 4000... that's with no exponent.  The error was that bad!  I tried looking up 6803, 6809, or 68hc11 floating point routines that use the hardware integer multiply and found nothing.  The CPU and its siblings have been out almost 40 years and nobody has even tried this?  I couldn't even find something for other processors.  I'm not saying it doesn't exist, but I eventually gave up looking for it.  I deleted some code and wrote it from scratch... problem solved.

The code is pretty straightforward, but people might wonder why I did a couple things.   If you get to look at the code, just be aware there is one semi-clever thing in the multiply I did to save a few instructions, and you'll be wondering what the heck I was thinking until you figure it out.
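
For readers who haven't seen the general idea, here is a Python sketch of how a wide mantissa multiply can be built from an 8-bit by 8-bit multiply that returns a 16-bit result, which is what the CPU's MUL instruction provides. This is NOT the ROM code and it does not show the trick mentioned above. The names `mul8` and `mul_mantissa`, and the 4-byte width in the demo, are assumptions for illustration only.

```python
def mul8(a, b):
    """Stand-in for an unsigned 8x8 -> 16 bit hardware multiply."""
    assert 0 <= a <= 0xFF and 0 <= b <= 0xFF
    return a * b  # always fits in 16 bits


def mul_mantissa(x, y, nbytes):
    """Multiply two nbytes-wide unsigned integers one byte pair at a time."""
    result = 0
    for i in range(nbytes):
        xi = (x >> (8 * i)) & 0xFF          # byte i of x
        for j in range(nbytes):
            yj = (y >> (8 * j)) & 0xFF      # byte j of y
            # Each partial product is shifted to the column i + j.
            result += mul8(xi, yj) << (8 * (i + j))
    return result


if __name__ == "__main__":
    a, b = 0x12345678, 0x9ABCDEF0
    print(hex(mul_mantissa(a, b, 4)))
    print(mul_mantissa(a, b, 4) == a * b)
```

A bit-at-a-time shift-and-add loop, the typical approach when there is no multiply instruction, needs one pass per mantissa bit. The byte-wise version needs one hardware multiply per pair of bytes plus the additions to sum the partial products, and that is where the speedup comes from.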

Once a "final" release version is out, maybe I'll take some time to make this blog look a little neater.
I'm just not that into the whole blogging thing.


*edit*

I finally found a floating point library that uses the multiply instruction.  It's in the 68hc12 floating point library posted in the GNU 68HC11/HC12 group on Yahoo.  The code makes similar optimizations to the multiply as my code and uses the additional divide instructions from the 68hc12.
