Saturday, October 13, 2018

Article 'A Great Old-Timey Game-Programming Hack', and a response

Here's a nice little story related to the 6809.  It shows off one of the more interesting optimizations the CPU makes possible.  It's also neat to see that people came up with similar solutions completely isolated from each other.

Link



The screen scroll in my MC-10 (6803) 64 column graphics text code also uses the stack pointer as the destination pointer for similar reasons, but there are differences from the 6809.

Each register PUSHed or PULLed requires a separate instruction, whereas the 6809 can PUSH or PULL multiple registers with a single instruction.  As a result, the 6803 code looks more like their earlier code.

With only one stack pointer, you have to use the index register for the other source or destination pointer.  The indexed offset is only 1 byte, so you can only go up to 254 with LDD n,X before you have to change X.  The code looks like this, unrolled for a 256 byte section of the screen:

   LDD 254,X       ; 2 bytes, 5 clock cycles
   PSHB            ; 1 byte, 3 clock cycles
   PSHA            ; 1 byte, 3 clock cycles
   LDD 252,X       ; 2 bytes, 5 clock cycles
   PSHB            ; 1 byte, 3 clock cycles
   PSHA            ; 1 byte, 3 clock cycles
   etc...
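
To see why B is pushed before A, here is a rough Python model of one unrolled 256 byte block.  It is only a sketch, not the actual scroll code: the function name and the addresses in the usage note are made up for illustration, and it assumes the usual 6800-family stack behaviour, where a push stores at the address in S and then decrements S.

```python
def copy_block_via_stack(mem, x, sp, length=256):
    """Model LDD n,X / PSHB / PSHA for one block (hypothetical helper).

    mem    : bytearray standing in for RAM
    x      : index register value, base of the source block
    sp     : stack pointer, set to the LAST byte of the destination block
    Returns the stack pointer after the block has been copied.
    """
    for offset in range(length - 2, -2, -2):   # 254, 252, ..., 0
        a = mem[x + offset]        # LDD offset,X : A gets the lower address
        b = mem[x + offset + 1]    #                B gets the higher address
        mem[sp] = b                # PSHB : store at S, then decrement S
        sp -= 1
        mem[sp] = a                # PSHA
        sp -= 1
    return sp
```

Because the stack grows downward, the byte from the higher address (B) has to go out first so the pair lands in the same order it was read.  With the source block at 512 and the stack pointer starting at 511, the block ends up at 256 through 511 and the stack pointer finishes at 255.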

You could PUSH or PULL two bytes at a time by going through the index register with PSHX/PULX.  But then you lose the index register as a source pointer, so you have to hard code the address for each pair of bytes.

   LDX ROWADDRESS+254  ; 3 bytes, 5 clock cycles
   PSHX                ; 1 byte, 4 clock cycles
   LDX ROWADDRESS+252
   PSHX
   LDX ROWADDRESS+250
   PSHX
   etc...

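Each LDD/PSHB/PSHA group takes 5 + 3 + 3 = 11 clock cycles to move a pair of bytes, while LDX/PSHX takes 5 + 4 = 9.  Using PSHX therefore saves 11 - 9 = 2 clock cycles per pair of bytes moved, or 2 * ((32/2)*(192-8)) = 5,888 clock cycles per scroll.  The code size is the same either way, at 4 bytes per pair of bytes moved.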

So why didn't I do that?

While this would be faster, you can't just change the index register for each 256 byte block; you have to hard code the addresses for the entire screen.
That may not be a big deal if you have a large RAM expansion, but it's not practical for most MC-10s.  However, if you wanted to implement 4 rows of text at the bottom of the screen, similar to the Apple II and several other 8 bit machines, then it's not so bad.
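
To put rough numbers on that, the size of the fully unrolled LDX/PSHX code can be tallied from the byte counts in the listing above.  This is a back-of-the-envelope sketch in Python; the names are invented for illustration, and it assumes text rows are 8 scan lines tall (which is what the 192-8 in the scroll calculation suggests).

```python
BYTES_PER_LINE = 32        # one scan line of the graphics screen
LINES_PER_TEXT_ROW = 8     # assumption: 8 scan lines per row of text
CODE_BYTES_PER_PAIR = 4    # LDX extended (3 bytes) + PSHX (1 byte)

def unrolled_code_size(scanlines_moved):
    """Bytes of LDX/PSHX code needed to move this many scan lines."""
    pairs = (BYTES_PER_LINE // 2) * scanlines_moved
    return pairs * CODE_BYTES_PER_PAIR

# Scrolling the whole screen moves every scan line except the top text row.
full_screen = unrolled_code_size(192 - LINES_PER_TEXT_ROW)

# Scrolling a 4 row text window moves only 3 rows of scan lines.
four_rows = unrolled_code_size(4 * LINES_PER_TEXT_ROW - LINES_PER_TEXT_ROW)
```

Under those assumptions the full screen version needs 11,776 bytes of code, while the 4 row window needs 1,536.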

The latest code generates the scroll code on the fly at startup, so I could generate either version of the code depending on the hardware you have.  We'll see.
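
As an illustration of what generating either version involves, here is a Python sketch that emits the machine code bytes for one block of each style.  The real generator runs on the 6803 itself at startup and is not shown here; the function names and the src_addr parameter are hypothetical, and setting up S and X and the final return are left out.

```python
# 6801/6803 opcodes used by the two listings
LDD_IDX = 0xEC   # LDD n,X  (indexed)
LDX_EXT = 0xFE   # LDX addr (extended)
PSHA = 0x36
PSHB = 0x37
PSHX = 0x3C

def gen_ldd_block(length=256):
    """Emit LDD n,X / PSHB / PSHA for offsets length-2 down to 0."""
    code = bytearray()
    for offset in range(length - 2, -2, -2):
        code += bytes([LDD_IDX, offset, PSHB, PSHA])
    return bytes(code)

def gen_pshx_block(src_addr, length):
    """Emit LDX addr / PSHX with a hard coded address for every pair."""
    code = bytearray()
    for offset in range(length - 2, -2, -2):
        addr = src_addr + offset
        # Extended addresses are stored high byte first.
        code += bytes([LDX_EXT, addr >> 8, addr & 0xFF, PSHX])
    return bytes(code)
```

The LDD version is position independent with respect to the screen, so one generated block can be reused after changing X.  The PSHX version bakes a source address into every pair, so it has to be generated separately for every part of the screen it covers.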