When I wrote the post on using invert and multiply to speed up BASIC programs, I realized I hadn't searched the ROM for divides that could use the same optimization. A quick search turned up a couple of them.
The LOG function contains a divide that involves a constant. At first glance it seemed ideal, but on closer examination the constant is the dividend, not the divisor, so invert and multiply doesn't apply and we're stuck with that one. That's too bad, because Ahl's benchmark depends on the SQR (square root) function, which in turn uses LOG. The change would have dropped the MC-10's time on that benchmark below one minute. It would also have let me remove one of the constants from the ROM, since it would no longer be needed, saving the 5 bytes that floating point number occupies.
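Why the position of the constant matters can be sketched in a few lines. This is a hypothetical illustration using Python floats (64-bit doubles) rather than the ROM's 5-byte format; `TENTH`, `div_by_constant`, and `constant_over` are made-up names. A constant divisor lets the reciprocal be computed once ahead of time, while a constant dividend leaves nothing to precompute.

```python
# Hypothetical names; Python floats (64-bit doubles) stand in for the
# ROM's 5-byte floating point format.

TEN = 10.0
TENTH = 1.0 / TEN  # reciprocal of the constant divisor, computed once up front


def div_by_constant(x):
    """x / 10 rewritten as x * (1/10): the divide becomes a multiply."""
    return x * TENTH


def constant_over(x, c=TEN):
    """c / x: the constant is the dividend, so there is no fixed
    reciprocal to precompute and every call still needs a real divide."""
    return c / x
```

The trade has a small cost: 1/10 has no exact binary representation, so `x * (1/10)` can differ from `x / 10` in the last bit. In Python, `3 / 10` gives `0.3` while `3 * 0.1` gives `0.30000000000000004`.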
The ASCII-to-binary and binary-to-ASCII conversion code shares a common divide by the constant 10. The change was easy: switch the stored constant from the floating point representation of 10 to that of 1/10, and change the code so it calls multiply instead of dropping into the divide function. I did have to duplicate a little code, because SIN calls the same routine partway through, and that path cannot use the multiply since its divisor is not a constant. I'll revisit that later to see if it can be rewritten, but the important thing is that the code works, and it runs quite often while a BASIC program executes: numeric values are stored as ASCII text even after a program has been tokenized, so they have to be converted every time they are used. There is an even faster technique, but it would need a complete rewrite of the code to avoid floating point through most of the calculation, and I'm not committing to that... at least not yet.
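To show where a divide by 10 tends to appear in ASCII-to-binary conversion, here is a rough Python sketch. It assumes a parser that accumulates all digits into one mantissa and then scales down once per digit after the decimal point, which is how Microsoft-style BASIC parsers are commonly described; it is not a transcription of the MC-10 ROM. `parse_decimal` and `TENTH` are hypothetical names, and signs and exponents are not handled.

```python
TENTH = 1.0 / 10.0  # assumed precomputed constant, standing in for 1/10 in ROM


def parse_decimal(text):
    """Hypothetical sketch: accumulate every digit as a whole-number
    mantissa, then scale down once per digit after the decimal point."""
    mantissa = 0.0
    places = 0           # digits seen after the decimal point
    after_point = False
    for ch in text:
        if ch == "." and not after_point:
            after_point = True
        elif "0" <= ch <= "9":
            mantissa = mantissa * 10 + (ord(ch) - ord("0"))  # multiply-by-10 step
            if after_point:
                places += 1
        else:
            raise ValueError(f"unexpected character {ch!r}")
    for _ in range(places):
        mantissa *= TENTH  # multiply by 1/10 instead of dividing by 10
    return mantissa
```

For `"1.25"` the loop builds 125 and then multiplies by 1/10 twice, so a string with many decimal places pays for one scaling operation per place, which is why making that step a multiply matters.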
Running the Solitaire solver program overnight on the new ROM, side by side with the original ROM, suggests the overall speedup is now about 8%, and it looks like 10% might be possible with a few more optimizations. That would make the MC-10 noticeably faster than 1 MHz 6502 machines on pretty much anything. One benchmark that outputs a lot of numbers was a solid 20% faster. Ahl's benchmark, which does not output anything until the end, was a couple of seconds faster, but that was run over a VNC connection to my development machine, so it will have to be timed directly on the machine to be sure.
That sounds great, but there is one annoying little detail: I'm out of ROM space and still have to fix a couple of bugs. It looks like I'll have to drop something to fit this in, and the next release will probably be the last one to fit in 8K unless I find a way to save a significant number of bytes.
To save time converting the inverted constants to floating point, I used the following program on the Color Computer, whose Extended Color BASIC has a HEX$ function for converting numbers directly to hexadecimal. The program prints the five bytes that hold the variable's value.
Since the Color Computer's math library is almost identical to the MC-10's, the bytes it produces should match what the MC-10 calculates, which I couldn't be sure of with a PC-based program.
5 REM SHOW THE 5 BYTES THAT HOLD A, IN HEX, 2 DIGITS EACH
10 A=1/10
20 Z=VARPTR(A)
30 FOR I=Z TO Z+4
40 PRINT RIGHT$("0"+HEX$(PEEK(I)),2);" ";
50 NEXT I
60 PRINT
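For cross-checking the bytes the Color Computer prints, here is a hypothetical Python packer (`to_color_basic_bytes` is a made-up name). It assumes the standard Microsoft 5-byte layout: an exponent byte in excess-0x80 form for a mantissa normalized to 0.5 <= m < 1, followed by a 32-bit mantissa whose always-set top bit is replaced by the sign bit. It rounds to nearest from an exact fraction, so the last byte may not match what the ROM's own divide produces; that difference is exactly what running the program on the real math library settles.

```python
from fractions import Fraction


def to_color_basic_bytes(value):
    """Pack a number into the assumed 5-byte Microsoft float layout.
    No overflow/underflow checks are done on the exponent."""
    x = Fraction(value)  # exact, so 1/10 is not pre-rounded by a float
    if x == 0:
        return bytes(5)  # an exponent byte of 0 means zero
    sign = 1 if x < 0 else 0
    x = abs(x)
    # Normalize so that 0.5 <= x < 1, tracking the binary exponent.
    exp = 0
    while x >= 1:
        x /= 2
        exp += 1
    while x < Fraction(1, 2):
        x *= 2
        exp -= 1
    # Scale to a 32-bit integer mantissa, rounding to nearest (ties up).
    scaled = x * 2**32
    mant = int(scaled)
    if scaled - mant >= Fraction(1, 2):
        mant += 1
    if mant == 2**32:  # rounding carried out of the top bit; renormalize
        mant = 2**31
        exp += 1
    # The top mantissa bit is always 1, so it is replaced by the sign bit.
    mant = (mant & 0x7FFFFFFF) | (sign << 31)
    return bytes([exp + 0x80]) + mant.to_bytes(4, "big")
```

With this sketch, `to_color_basic_bytes(10).hex(" ")` gives `84 20 00 00 00` and `to_color_basic_bytes(Fraction(1, 10)).hex(" ")` gives `7d 4c cc cc cd`.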