A Bitbanger's Blog

Monday, April 10, 2017

And now I messed up the FOR NEXT stack frame when I moved some stuff related to the next line pointer around. This should be fun to track down. <sigh>

Another speed optimization I'm working on involves the CHRGET subroutine, which every character gets parsed through. CHRGET is a piece of self-modifying code that is copied from ROM to the direct page at startup. The routine looks like this:

;*
;* Byte parser subroutine utilizing self-modifying code.
;* This routine is copied to RAM at CHRGET ($00EB) during cold start.
;*
        fcb  INIDAT-PARSER  ; number of bytes to copy
PARSER  inc  CHRPTR+1       ; increment LSB of parse location
        bne  LF7D8          ; branch if no carry
        inc  CHRPTR         ; increment MSB of parse location
LF7D8
;       ldaa $0000          ; load byte from parse location into ACCA
        fcb  $B6,$00,$00
        jmp  BPARSE         ; call back-end of parser routine in ROM
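
The self-modifying part is easy to miss: as far as I can tell, CHRPTR is not a separate variable but the two operand bytes of that LDAA, so the INC instructions patch the load address in place. Counting bytes (INC extended is 3 bytes, BNE is 2), the $B6 opcode lands 8 bytes into the routine. Assuming the copy starts at the PARSER label, the equates would work out to something like this (a sketch from the listing above, not taken from the actual source file):

```asm
CHRGET  equ  $00EB          ; RAM copy of PARSER (address from the header comment)
CHRPTR  equ  CHRGET+9       ; assumed: 3 (inc) + 2 (bne) + 3 (inc) + 1 (opcode $B6) = 9
                            ; so CHRPTR would be $00F4 (MSB) and $00F5 (LSB)
```

That is also why the routine has to live in RAM: the pointer it increments is part of its own instruction stream.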

If we can keep CHRPTR in X, we can move this routine into ROM right in front of BPARSE and cut it from 22(?) clock cycles to 7(?) clock cycles. If we don't have to update CHRPTR until we exit BPARSE, we save even more clock cycles, since BPARSE loops back to CHRGET.
This also resulted in only a 1-byte increase in code size, thanks to the removal of a couple of STX CHRPTR instructions elsewhere. Sadly, only 2 of the 20+ calls can use this optimization without major changes, and neither will speed up program execution.

CHRGET2
        inx
        stx  CHRPTR
        ldaa ,X             ; get the next character

If someone builds an MC-10 clone using a 68HC11-based microcontroller (as has been suggested in the MC-10 Yahoo group), the code could easily be optimized by placing CHRPTR in the Y register. Then every update to CHRPTR just involves loading or incrementing Y, and CHRGET becomes INY followed by LDAA ,Y.
This should speed up the interpreter significantly.
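
As a rough, untested sketch of that 68HC11 variant: the label CHRGETY is made up for illustration, and it assumes Y is reserved for the parse pointer everywhere in the interpreter and that the routine sits directly in front of BPARSE so it can fall through.

```asm
; Hypothetical 68HC11 version: Y holds the parse pointer at all times,
; so there is no CHRPTR in memory to increment or store.
CHRGETY iny                 ; advance the parse pointer
        ldaa 0,y            ; get the next character
                            ; falls through into BPARSE (assumed to follow)
```

Any routine that currently loads or stores CHRPTR would need to load or transfer Y instead, so the real cost is in auditing every place that uses Y for something else.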
