A Bitbanger's Blog

Wednesday, April 5, 2017

Another thing I've noticed about the way the interpreter was written is that it doesn't take advantage of Motorola's indexed addressing to optimize some code.
The interpreter commonly does something like this:

* End of command or program line

LE52A

inx ; advance past the end-of-line terminator

ldaa ,X ; get MSB of 'next line' link

inx ; advance to LSB

oraa ,X ; OR in the LSB of the 'next line' link

staa ENDFLG ; clear ENDFLG if end of program

beq LE589 ; goto END if no more program lines

* Start next program line

inx ; point X to new line number

ldd ,X ; get new line number..

inx ; advance to LSB of line number

std CURLIN ; ..and store in CURLIN

stx CHRPTR ; set parser position to start of line -1

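The author does not seem to realize that
LDAA ,X
is the same as
LDAA 0,X
Indexed addressing always carries an offset byte, so a nonzero offset such as 1,X costs no extra bytes or cycles.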
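
To make the equivalence concrete, here is a minimal Python sketch of how an indexed-mode instruction is encoded, assuming the 6800-family layout of one opcode byte followed by one unsigned 8-bit offset and $A6 as the indexed LDAA opcode. The helper `encode_indexed` is a hypothetical name used for illustration, not part of any real assembler.

```python
# Hypothetical helper for illustration; not taken from any real assembler.
LDAA_INDEXED = 0xA6  # assumed opcode for LDAA in indexed mode (6800/6801/6803)

def encode_indexed(opcode: int, offset: int = 0) -> bytes:
    """Encode an indexed-mode instruction: opcode byte, then unsigned 8-bit offset."""
    if not 0 <= offset <= 0xFF:
        raise ValueError("indexed offset must fit in one unsigned byte")
    return bytes([opcode, offset])

ldaa_bare = encode_indexed(LDAA_INDEXED)     # LDAA ,X  (offset omitted, so 0)
ldaa_zero = encode_indexed(LDAA_INDEXED, 0)  # LDAA 0,X
ldaa_one = encode_indexed(LDAA_INDEXED, 1)   # LDAA 1,X

# All three encodings are two bytes long; only the offset byte differs.
print(ldaa_bare.hex(), ldaa_zero.hex(), ldaa_one.hex())
```

Since the offset byte is always present, replacing an INX / LDAA ,X pair with LDAA 1,X drops the INX without making the load itself any larger.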

If several INX instructions are used together like this, then the following code is faster and smaller. Note that the code at LE589 doesn't care that X has not been updated.

;* End of command or program line

LE52A

ldaa 1,X ; get MSB of 'next line' link

oraa 2,X ; OR in the LSB of the 'next line' link

staa ENDFLG ; clear ENDFLG if end of program

beq LE589 ; goto END if no more program lines

;* Start next program line

ldd 3,X ; get new line number..

std CURLIN ; ..and store in CURLIN

ldab #4 ; offset from X to LSB of line number

abx ; X = X + 4, advance to LSB of line number

stx CHRPTR ; set parser position to start of line -1

The 4 INX instructions in the original code require 12 clock cycles (3 cycles each). LDAB #4 followed by ABX requires 5 clock cycles (2 + 3).
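
The arithmetic can be checked with a small tally. This is only a sketch: the cycle and byte counts in `COST` are assumed 6801/6803 figures, chosen because they match the totals quoted above, and only the instructions that differ between the two versions are counted. The loads and stores are assumed to cost the same in both versions, since only their offsets change.

```python
# (cycles, bytes) per instruction; assumed 6801/6803 figures, see lead-in.
COST = {
    "INX": (3, 1),        # inherent addressing
    "LDAB #imm": (2, 2),  # immediate addressing: opcode + operand byte
    "ABX": (3, 1),        # inherent addressing
}

def total(sequence):
    """Return (total cycles, total bytes) for a list of instruction names."""
    cycles = sum(COST[op][0] for op in sequence)
    size = sum(COST[op][1] for op in sequence)
    return cycles, size

old_cycles, old_bytes = total(["INX"] * 4)           # four separate increments
new_cycles, new_bytes = total(["LDAB #imm", "ABX"])  # one add of 4 to X

print("old:", old_cycles, "cycles,", old_bytes, "bytes")
print("new:", new_cycles, "cycles,", new_bytes, "bytes")
print("saved:", old_cycles - new_cycles, "cycles,", old_bytes - new_bytes, "byte(s)")
```

Under these assumptions the rewrite saves 7 cycles and 1 byte in this fragment.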

The final code also saves the pointer to the next line, since loading the extra byte and saving the pointer costs only 5 more clock cycles. The end result is that the new code takes fewer clock cycles than the old code even though it also saves the pointer to the next line, and it requires only one additional byte.
