Wednesday, April 5, 2017

Another thing I've noticed about the way the interpreter was written is that it doesn't take advantage of Motorola's indexed addressing to optimize some code.
It is common for the interpreter to do something like this:

* End of command or program line
 LE52A 
           inx                           ; advance past the end-of-line terminator
           ldaa      ,X                  ; get MSB of 'next line' link
           inx                           ; advance to LSB
           oraa      ,X                  ; OR in the LSB of the 'next line' link
           staa      ENDFLG              ; clear ENDFLG if end of program
           beq       LE589               ; goto END if no more program lines
 * Start next program line
           inx                           ; advance to LSB of line number
           inx                           ; point X to new line number
           ldd       ,X                  ; get new line number..
           std       CURLIN              ; ..and store in CURLIN
           stx       CHRPTR              ; set parser position to start of line -1

The author does not seem to realize that
    LDAA   ,X
is the same as
    LDAA   0,X
and that any other offset from 0 to 255 costs no more cycles or bytes than an offset of 0.

When several INX instructions are used together like this, the following code is both faster and smaller.  Note that the code at LE589 doesn't care that X has not been updated.
;* End of command or program line
LE52A
          ldaa      1,X                 ; get MSB of 'next line' link
          oraa      2,X                 ; OR in the LSB of the 'next line' link
          staa      ENDFLG              ; clear ENDFLG if end of program
          beq       LE589               ; goto END if no more program lines
;* Start next program line
          ldd       3,X                 ; get new line number..
          std       CURLIN              ; ..and store in CURLIN
          ldab      #4                  ; offset to LSB of line number
          abx                           ; X = X + B
          stx       CHRPTR              ; set parser position to start of line -1
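
For anyone who finds it easier to follow outside of assembly, here is a rough Python model of what the indexed version reads and where it leaves X. It is only a sketch: it assumes X points at the end-of-line terminator on entry, the function name next_line and the sample bytes are made up for illustration, and it models the rewritten listing only.

```python
def next_line(mem, x):
    """Model of the indexed listing; x is assumed to point at the terminator."""
    endflg = mem[x + 1] | mem[x + 2]          # ldaa 1,X / oraa 2,X
    if endflg == 0:
        return endflg, None, x                # beq LE589: X is left untouched
    curlin = (mem[x + 3] << 8) | mem[x + 4]   # ldd 3,X (MSB first)
    chrptr = (x + 4) & 0xFFFF                 # ldab #4 / abx
    return endflg, curlin, chrptr


# Hypothetical bytes: terminator, link $4E20, line number 20, first text byte
mem = bytes([0x00, 0x4E, 0x20, 0x00, 0x14, 0x87])
print(next_line(mem, 0))                # (110, 20, 4)
print(next_line(bytes([0, 0, 0]), 0))   # (0, None, 0)
```

X itself never changes until the ABX, which is why the early exit to LE589 sees the old value.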

The four INX instructions in the original code take 12 clock cycles (3 each).  The LDAB #4 / ABX pair takes 5 (2 + 3).
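
The cycle counts above can be checked with a quick tally. This is just a sketch in Python: the per-instruction figures are the usual 6801/6803 timings and sizes as I understand them, and the names COST and tally are only for illustration.

```python
# Per-instruction cost on the 6801/6803 as (cycles, bytes).
# Assumption: standard 6801/6803 timings, consistent with the totals quoted above.
COST = {
    "INX": (3, 1),      # inherent
    "LDAB #": (2, 2),   # immediate
    "ABX": (3, 1),      # inherent
}


def tally(instructions):
    """Return (total_cycles, total_bytes) for a list of mnemonics."""
    cycles = sum(COST[i][0] for i in instructions)
    size = sum(COST[i][1] for i in instructions)
    return cycles, size


original = ["INX"] * 4            # four single-step increments
replacement = ["LDAB #", "ABX"]   # load the offset, then add it to X

print(tally(original))      # (12, 4)
print(tally(replacement))   # (5, 3)
```

The pair wins on both counts here, and the cycle gap widens with every extra INX it replaces.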

The final code also saves the pointer to the next line, since it takes only 5 more clock cycles to load the extra byte and save the pointer.  The end result is that the new code takes fewer clock cycles than the old code even though it saves the pointer to the next line, and it requires only one additional byte.
