Thursday, February 1, 2018

More 6803 optimization

One of the little-used speed optimizations when writing 6803 code seems to be replacing multiple INX instructions with LDAB #xx followed by ABX.
Part of this is due to needing to preserve the contents of B, but it is also because the INX instructions may be separated by other instructions such as LDAA ,X.

There are several places in Microcolor BASIC where INX INX, or an even longer run of INX instructions, is used. The INX instruction takes 3 clock cycles and is 1 byte. LDAB #2 requires 2 clock cycles and is 2 bytes. ABX is 3 clock cycles and 1 byte. So replacing INX INX with LDAB #2 ABX saves 1 clock cycle but takes 1 additional byte. If the INX INX pair sits in a little-used function, or one that does not affect the normal speed of execution of a program, this speed optimization makes little sense. But if the INX INX pair is inside a loop, it can save a lot of clock cycles.
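The same arithmetic can be extended to longer runs of INX. The Python sketch below is only a tally, using the cycle and byte counts quoted above; the function names are made up for illustration, and it assumes B is free to be overwritten.

```python
# Cycle and byte costs quoted in the post for the 6803.
INX = {"cycles": 3, "bytes": 1}
LDAB_IMM = {"cycles": 2, "bytes": 2}
ABX = {"cycles": 3, "bytes": 1}


def inx_chain(n):
    """Cost (cycles, bytes) of n consecutive INX instructions."""
    return n * INX["cycles"], n * INX["bytes"]


def ldab_abx():
    """Cost (cycles, bytes) of LDAB #n followed by ABX, for any n up to 255."""
    return (LDAB_IMM["cycles"] + ABX["cycles"],
            LDAB_IMM["bytes"] + ABX["bytes"])


def savings(n):
    """(cycles saved, bytes saved) when n INX become LDAB #n / ABX."""
    c_inx, b_inx = inx_chain(n)
    c_abx, b_abx = ldab_abx()
    return c_inx - c_abx, b_inx - b_abx


for n in range(1, 5):
    print(n, savings(n))
```

With two INX the swap trades 1 byte for 1 cycle. With three it is 4 cycles faster at the same size, and from four onward it is both faster and smaller.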

Scrolling the text screen is one example.  This is not the actual interpreter's code, but it's close.

; X points to the destination address... $20 is 32, or the length of one line
LOOP:
  LDD $20,X
  STD ,X
  INX
  INX

  CPX  #ENDOFSCREEN-32
  BLT LOOP



The screen contains 32 characters / line * 16 lines, and the loop copies 2 bytes at a time. Because it stops at ENDOFSCREEN-32, it moves 15 lines, which is 15 * 16 = 240 passes.
Replacing INX INX with LDAB #2 ABX therefore saves about 240 clock cycles over the entire screen.
However, if you unroll the loop just once, it saves an additional 6 clock cycles for every 2 bytes copied, because the CPX, BLT, and pointer-increment overhead is paid half as often! So savings go from about 240 to well over 1000 clock cycles at the cost of 5 bytes over the original code. That's enough clock cycles to execute at least 250 more instructions somewhere else.

LOOP:
  LDD $20,X
  STD ,X
  LDD $22,X
  STD 2,X
  LDAB #4
  ABX
  CPX #ENDOFSCREEN-32
  BLT LOOP
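To sanity-check the numbers, here is a rough Python model of the three loop variants. It is a sketch, not an emulator. The screen base address is a hypothetical value, and the cycle counts for STD indexed, CPX immediate and BLT are assumed from the 6801/6803 data sheet rather than taken from this post. It confirms that all three variants leave the same bytes on screen and adds up the cycles for each.

```python
# Cycle counts per instruction. INX, LDAB #, ABX and indexed LDD come from
# the post; STD indexed, CPX immediate and BLT are assumed values.
CYCLES = {"LDD_IDX": 5, "STD_IDX": 5, "INX": 3, "LDAB_IMM": 2,
          "ABX": 3, "CPX_IMM": 4, "BLT": 3}

SCREEN = 0x4000                      # hypothetical screen base address
LINE = 32                            # bytes per text line
END_OF_SCREEN = SCREEN + LINE * 16   # one past the last screen byte


def make_memory():
    """Return 64K of memory with a recognizable pattern on the screen."""
    mem = bytearray(0x10000)
    for i in range(LINE * 16):
        mem[SCREEN + i] = (i * 7 + 3) & 0xFF
    return mem


def scroll(mem, words_per_pass, increment):
    """Model the copy loop and return (passes, cycles)."""
    x = SCREEN
    passes = cycles = 0
    while True:
        for k in range(words_per_pass):
            off = 2 * k
            # LDD $20+off,X followed by STD off,X
            mem[x + off:x + off + 2] = mem[x + LINE + off:x + LINE + off + 2]
            cycles += CYCLES["LDD_IDX"] + CYCLES["STD_IDX"]
        x += 2 * words_per_pass      # INX INX, or LDAB #n / ABX
        cycles += sum(CYCLES[name] for name in increment)
        cycles += CYCLES["CPX_IMM"] + CYCLES["BLT"]
        passes += 1
        if x >= END_OF_SCREEN - LINE:    # BLT not taken, loop ends
            break
    return passes, cycles


variants = {
    "INX INX": (1, ["INX", "INX"]),
    "LDAB #2 / ABX": (1, ["LDAB_IMM", "ABX"]),
    "unrolled, LDAB #4 / ABX": (2, ["LDAB_IMM", "ABX"]),
}
results = {}
for label, (words, inc) in variants.items():
    mem = make_memory()
    results[label] = scroll(mem, words, inc) + (bytes(mem),)
    print(label, results[label][:2])
```

Under those assumed counts the totals are 5520, 5280 and 3840 cycles, so the unrolled loop comes out 1680 cycles ahead of the original for one scroll.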



That is an obvious case. It is less obvious when the INX instructions are spread over many lines of code. This is from Microcolor BASIC.

;* End of command or program line
LE52A
          inx                           ; advance past the end-of-line terminator 3
          ldaa      ,X                  ; get MSB of 'next line' link 4
          inx                           ; advance to LSB 3
          oraa      ,X                  ; OR in the LSB of the 'next line' link 4
          staa      ENDFLG              ; clear ENDFLG if end of program 3
          beq       LE589               ; goto END if no more program lines 3
;* Start next program line
          inx                           ; point X to new line number 3
          ldd       ,X                  ; get new line number..
          std       CURLIN              ; ..and store in CURLIN
          inx                           ; advance to LSB of line number 3
          stx       CHRPTR              ; set parser position to start of line -1

This can be replaced with a shorter and faster version. I had to verify that the contents of X and B were not required in LE589 or in the code this falls through to, but 11 instructions have been replaced with 9, and only 7 are executed most of the time. Four INX instructions were replaced here. The savings aren't as significant within the scope of this code, but it gets executed at the end of every line of BASIC code, so it adds up over time. Note that ENDFLG is now written only on the end-of-program path, whereas the original also stored a nonzero value there on every other line, so this assumes nothing depends on that store.

;* End of command or program line
LE52A
          ldd       1,X                 ; get 'next line' link 5
          bne       LE52B               ; nonzero link = more program lines (LDD sets the flags tested here) 3
          staa      ENDFLG              ; clear ENDFLG, we are at the end of the program 3
          bra       LE589               ; goto END, no more program lines 3
LE52B
;* Start next program line
          ldd       3,X                 ; get new line number..
          std       CURLIN              ; ..and store in CURLIN
          ldab      #4                  ; size of line terminator + next line link + 1 2
          abx                           ; advance X to LSB of line number 3
          stx       CHRPTR              ; set parser position to start of line -1
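For a rough idea of what this buys per BASIC line, the two versions can be tallied path by path. This Python sketch uses the cycle counts from the listing comments where they are given. The counts for STD and STX are assumed, on the further assumption that CURLIN and CHRPTR are direct-page variables. The dictionary keys and list names are made up for the illustration.

```python
# Cycle counts: the first three rows appear in the listing comments above;
# std_dir and stx_dir are assumed (direct-page addressing).
C = {
    "inx": 3, "ldaa_idx": 4, "oraa_idx": 4, "staa_dir": 3,
    "beq": 3, "bne": 3, "bra": 3, "ldd_idx": 5,
    "ldab_imm": 2, "abx": 3,
    "std_dir": 4, "stx_dir": 4,
}

# Instructions executed when another program line follows.
old_next = ["inx", "ldaa_idx", "inx", "oraa_idx", "staa_dir", "beq",
            "inx", "ldd_idx", "std_dir", "inx", "stx_dir"]
new_next = ["ldd_idx", "bne", "ldd_idx", "std_dir",
            "ldab_imm", "abx", "stx_dir"]

# Instructions executed when the program ends (branch to LE589).
old_end = ["inx", "ldaa_idx", "inx", "oraa_idx", "staa_dir", "beq"]
new_end = ["ldd_idx", "bne", "staa_dir", "bra"]


def total(path):
    """Sum the cycles for a list of instruction names."""
    return sum(C[name] for name in path)


print("next line:", total(old_next), "->", total(new_next))
print("end of program:", total(old_end), "->", total(new_end))
# The four INX on the old path match the #4 loaded into B on the new one.
print("X advance:", old_next.count("inx"))
```

With those counts the common path drops from 39 to 26 cycles, a saving of 13 cycles for every BASIC line executed.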


