Saturday, August 25, 2018

The 65C02 and the case of the missing multiply

In the early 80s, Western Design Center released an update to the 6502 called the 65C02.  The 65C02 is a low power CMOS version with some additional instructions, new addressing modes, and a couple of bug fixes.

The additions and fixes are nice; they seem to have saved about 5% of the size of my existing code.  But I'm more interested in what is missing, and that is a hardware multiply instruction.  Even though Motorola had already produced processors and microcontrollers with a multiply instruction, that feature is absent from the 65C02 and even the later 65816.  A hardware multiply opcode certainly isn't mandatory.  Multiplying by small numbers can often be done with just a few adds or subtracts, and there is a lot of code out there that doesn't use a multiply at all.  However, once you start indexing large arrays, performing floating point math, and so on, a hardware multiply can make a noticeable difference in code size and speed.  The number of loops and bit shifts required by the floating point math library in Microsoft BASIC is where things really get crazy.  Sixteen 8 bit hardware multiplies (one for each pair of bytes in the two 4-byte mantissas) and some additions replace over 100 separate loads, stores, adds, shifts, tests, and loops.  And that's a conservative estimate.
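Here is a quick way to see where the "sixteen multiplies" figure comes from.  The sketch below assumes a 32-bit (4-byte) mantissa, as in the 5-byte floating point format used by the 6502 versions of Microsoft BASIC.  It builds the full product from every pair of mantissa bytes.  The mul8 helper is a hypothetical stand-in for an 8x8 hardware multiply, not a real routine from any BASIC.

```python
def mul8(a, b):
    """Hypothetical stand-in for an 8x8 -> 16 bit hardware multiply."""
    assert 0 <= a <= 0xFF and 0 <= b <= 0xFF
    return a * b


def mantissa_mul32(x, y):
    """Multiply two 32-bit mantissas using only 8x8 partial products.

    Returns (64-bit product, number of 8x8 multiplies performed).
    """
    # Split each mantissa into 4 bytes, least significant first.
    xb = [(x >> (8 * i)) & 0xFF for i in range(4)]
    yb = [(y >> (8 * j)) & 0xFF for j in range(4)]

    product = 0
    count = 0
    for i in range(4):
        for j in range(4):
            # Each partial product is shifted into place by the combined
            # byte positions of its two factors, then added to the total.
            product += mul8(xb[i], yb[j]) << (8 * (i + j))
            count += 1
    return product, count


if __name__ == "__main__":
    p, n = mantissa_mul32(0xC90FDAA2, 0xB504F334)
    print(hex(p), n)
```

This computes the full 64-bit product.  A real patch would probably keep only the upper bytes plus a rounding byte, so some of the lowest partial products could be skipped, but the adds and shifts are still a tiny fraction of the work done by the bit-at-a-time loop.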

While looking through a PDF of Leventhal's 6502 Assembly Language Programming book, I ran across the chapter on the 65C02 that isn't in my older print edition.  There, in the description of the new opcodes, is an unsigned 8 bit hardware multiply instruction on the 65C00.  The MUL instruction multiplies A by Y, with the 16 bit result in A and Y, and it requires 10 clock cycles, the same as the multiply on the 6803.  After looking into it, the opcode appears to be available only on Rockwell 65C00 microcontrollers, but I didn't look at every data sheet to be sure.  The opcode is $02.  The 65816 uses this opcode for COP (coprocessor enable).  I'm sure leaving out the multiply kept the die size smaller, but I'm not aware of any coprocessor for the 65816, so COP is of questionable use.
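To show what MUL replaces, the sketch below models the usual software approach: a shift-and-add loop that processes one multiplier bit per pass, just as a typical 6502 routine does with shifts, rotates, and conditional adds.  It is paired with a one-step model of MUL.  The split of the 16 bit result is an assumption on my part (high byte in A, low byte in Y), so check the Rockwell data sheet before relying on it.  Both function names are hypothetical.

```python
def soft_mul8(a, y):
    """Shift-and-add 8x8 multiply, one multiplier bit per loop pass.

    Returns (16-bit product, loop passes).  The loop always runs 8 times,
    and each pass is several instructions on a real 6502.
    """
    product = 0
    multiplicand = a
    passes = 0
    for _ in range(8):
        if y & 1:                 # low bit of multiplier set: add
            product += multiplicand
        multiplicand <<= 1        # next bit is worth twice as much
        y >>= 1                   # move to the next multiplier bit
        passes += 1
    return product & 0xFFFF, passes


def mul_opcode(a, y):
    """Model of the Rockwell MUL ($02): A * Y as a 16-bit result.

    ASSUMPTION: high byte goes to A, low byte goes to Y.
    Returns the new (A, Y).
    """
    p = (a & 0xFF) * (y & 0xFF)
    return (p >> 8) & 0xFF, p & 0xFF


if __name__ == "__main__":
    print(soft_mul8(200, 123))
    print(mul_opcode(200, 123))
```

The software loop needs eight passes of shifting, testing, and adding for every byte pair, while MUL does the same job in a single 10 cycle instruction.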

So what good is this tidbit?  Thanks to modern technology, the MUL instruction could be added to an FPGA core of the 65C02.  Combined with a few patches, Microsoft BASICs such as Applesoft II could get a speedup similar to my patches for the MC-10 and CoCo BASICs.  It wouldn't be quite as fast due to the lack of 16 bit support, but it would offer a significant speed boost to multiplication-intensive code.  Ahl's Benchmark would easily drop from 1:53 on the Apple II and C64 to under 1:20.  That's certainly a bigger speed increase than any of the other opcodes added to the 65C02 provide.

The OpenCores website has several cycle-accurate 65C02 implementations.  After a quick look at some cores for Motorola replacements that include a multiply, the changes required don't look too bad.  It's even possible to complete a multiplication in a single clock cycle.  Very interesting.
