Monday, April 10, 2017

And now I messed up the FOR NEXT stack frame when I moved some stuff related to the next line pointer around.  This should be fun to track down.  <sigh>


Another speed optimization I'm working on involves CHRGET, the subroutine that every character gets parsed through.  CHRGET is a piece of self-modifying code that is copied from ROM to the direct page at startup; the parse pointer, CHRPTR, is the operand of its own LDAA instruction.  The function looks like this:

;*
;* Byte parser subroutine utilizing self-modifying code.
;* This routine is copied to RAM at CHRGET ($00EB) during cold start.
;*
          fcb       INIDAT-PARSER       ; number of bytes to copy
PARSER    inc       CHRPTR+1            ; increment LSB of parse location
          bne       LF7D8               ; branch if no carry
          inc       CHRPTR              ; increment MSB of parse location
LF7D8
;         ldaa      $0000               ; load byte from parse location into ACCA
          fcb       $B6,$00,$00         ; the LDAA above as raw bytes ($B6 = LDAA extended)
          jmp       BPARSE              ; jump to back-end of parser routine in ROM

If we can place CHRPTR in X, we can move this to ROM right in front of BPARSE and reduce it from 22(?) clock cycles to 7(?) clock cycles.  If we don't have to update CHRPTR until we exit BPARSE, we save even more clock cycles, since BPARSE loops back to CHRGET.
This also resulted in only a 1 byte increase in code size, thanks to the removal of a couple of STX CHRPTR instructions elsewhere.  Sadly, only 2 out of 20+ calls can use this optimization without major changes, and neither will speed up program execution.


CHRGET2
          inx                           ; advance the parse pointer held in X
          stx       CHRPTR              ; write it back to CHRPTR
          ldaa      ,X                  ; get the next character
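
For reference, here is a rough cycle tally of the two versions.  It assumes the standard MC6801/6803 instruction timings (recalled from the data sheet, so worth verifying) and ignores the JSR that enters the routine:

;                                          cycles
;  inc  CHRPTR+1   (extended)               6
;  bne  LF7D8                               3
;  inc  CHRPTR     (extended, wrap only)    6
;  ldaa $0000      (extended)               4
;  jmp  BPARSE                              3
;                                   total  16, or 22 when the LSB wraps
;
;  inx                                      3
;  stx  CHRPTR     (direct)                 4
;  ldaa ,X                                  4
;                                   total  11, or 7 if the STX can be deferred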

If someone builds an MC-10 clone using a 68HC11-based microcontroller (as has been suggested in the MC-10 Yahoo group), the code could easily be optimized by placing CHRPTR in the Y register.  Then every update to CHRPTR just involves loading Y or incrementing Y, and CHRGET becomes INY LDAA ,Y.
This should speed up the interpreter significantly.
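
A minimal sketch of what that might look like, assuming Y is reserved for the parse pointer throughout the interpreter.  The CHRGET11 label is hypothetical, and the routine is assumed to sit directly in front of BPARSE so it can fall through.  One caveat: Y-indexed instructions on the 68HC11 carry a prebyte, so each costs one byte and one cycle more than its X-indexed equivalent.

CHRGET11
          iny                           ; advance the parse pointer held in Y
          ldaa      ,Y                  ; get the next character
                                        ; falls through to BPARSE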
