The Plus/4 ROM copies a group of functions to RAM on startup. Each of them gives access to all of RAM, including the RAM under ROM, through a different pointer stored on page zero (the direct page in Motorola terms).
Each of these functions disables interrupts, pages out the ROM, loads A via a pointer on page zero, pages in ROM, enables interrupts, and returns to the caller. It's a lot of clock cycles to access a single byte.
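The ROM listing itself is not reproduced here. As a rough sketch, assuming each routine is built from exactly the steps just listed and uses writes to the TED locations $FF3F and $FF3E to select RAM and ROM, one of them would look something like this. PTR is a placeholder name for whichever page zero pointer that routine uses.

```asm
; Sketch of one RAM-access routine with 6502 cycle counts.
; PTR is a placeholder, not a name taken from the ROM.
                        ; JSR to get here: 6 cycles, paid at the call site
        SEI             ; 2  disable interrupts
        STA $FF3F       ; 4  page out the ROM (any write selects RAM)
        LDA (PTR),Y     ; 5  read the byte (+1 if a page boundary is crossed)
        STA $FF3E       ; 4  page the ROM back in (any write selects ROM)
        CLI             ; 2  enable interrupts
        RTS             ; 6  return to the caller
                        ; 6 + 2 + 4 + 5 + 4 + 2 + 6 = 29 cycles per byte
```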
For programs and data small enough to fit in memory without using the RAM under ROM, we can patch the ROM so that it skips this costly sequence of instructions and accesses memory directly.
The patch must copy the ROM to RAM, set the top of memory available to BASIC so that it lies below the start address of the ROM, and then install the code listed below.
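The copy step can be sketched as a simple loop. This is only an illustration under stated assumptions: the ROM is taken to start at $8000, the I/O area is taken to start at $FD00, and PTR at $D8 is a hypothetical free page zero pair. It relies on reads coming from ROM while writes to the same address land in the RAM underneath.

```asm
PTR     = $D8           ; hypothetical free page zero pointer (2 bytes)

        LDA #$00
        STA PTR         ; pointer low byte
        LDA #$80
        STA PTR+1       ; pointer high byte: start at $8000 (assumed ROM start)
        LDY #$00
COPY:   LDA (PTR),Y     ; read a byte from ROM
        STA (PTR),Y     ; the write goes to the RAM under it
        INY
        BNE COPY        ; finish the current 256-byte page
        INC PTR+1       ; move to the next page
        LDA PTR+1
        CMP #$FD        ; stop before the assumed I/O area at $FD00
        BNE COPY
```

The loop deliberately stops short of the I/O area, because writing there would hit hardware registers rather than RAM. Any ROM above the I/O area would need a second, similar loop, and setting the top of BASIC memory is not shown.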
Each of these functions is replaced with a stub that loads A with the page zero address the function uses and then jumps to a common patch routine. That routine overwrites the calling instruction in the ROM code we copied to RAM, so that from then on it loads A directly without the intermediate call. Each JSR in ROM occupies 3 bytes: the JSR opcode and 2 bytes for the address to call. The LDA (),Y instruction takes only 2 bytes, so we must overwrite the 3rd byte with a NOP. With this approach, every call to the load routines is patched the first time it is executed.
The resulting patched code requires only 7 clock cycles (5 for the LDA (),Y and 2 for the NOP, plus 1 if the indexed access crosses a page boundary) instead of the 29 clock cycles the regular ROM requires. The beauty of this approach is that it patches every call to these functions as it is reached, without us having to find them all.
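To make the rewrite concrete, here is what a single call site looks like before and after it has been reached once. STUB and PTR are hypothetical names standing in for one replaced function and the page zero pointer it uses; the two forms occupy the same 3 bytes, so they are shown one after the other only for comparison.

```asm
; Before: call site in the RAM copy of the ROM
        JSR STUB        ; bytes: $20 lo hi

; After the first call: the same 3 bytes, rewritten in place
        LDA (PTR),Y     ; bytes: $B1 zp   5 cycles (+1 on a page crossing)
        NOP             ; byte:  $EA      2 cycles
```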
Warning: This assumes that the ROM never uses the optimization of reaching one of these functions with a JMP instead of a JSR followed by an RTS. The patcher finds the call site from the return address on the stack, and a JMP does not push one. If we discover that technique was used somewhere, we must identify each such site ourselves and place an RTS after the LDA (),Y instead of a NOP.
; stub routine replacing each piece of code ROM calls
LDA #address ; load A with address normally used by LDA (address),Y
JMP PATCH ; call the patcher
; code to patch the ROM where it calls functions to access RAM under ROM
; TEMP, YSAVE and RETURNADDRESS are scratch locations on page zero
; (YSAVE is an extra byte added here to preserve the caller's Y)
PATCH:
STA TEMP ; save page zero address to use
STY YSAVE ; save caller's Y, the patched LDA (),Y still needs it
; point to code we want to patch
; JSR pushed the address of its own 3rd byte, so JSR address = return address - 2
; We subtract 3 instead: that is the value RTS needs later (RTS adds 1)
PLA ; get LSB of return address
SEC ; set carry so SBC starts without a borrow
SBC #3 ; LSB of (JSR address - 1)
STA RETURNADDRESS ; store it in our own page 0 pointer
; Get MSB of return address from stack and adjust it if a borrow occurred
PLA ; get MSB of return address (PLA leaves carry untouched)
SBC #0 ; subtracts 1 only if the LSB subtraction borrowed
; Push MSB back onto the stack for the final RTS
PHA
STA RETURNADDRESS+1 ; save it in our pointer
; patch 1st byte with LDA (),Y opcode
LDY #1 ; pointer is JSR address - 1, so offset 1 is the JSR opcode
LDA #$B1 ; load the opcode we want to patch with
STA (RETURNADDRESS),Y ; patch BASIC
; patch 2nd byte with address passed from stub routine
INY ; next address
LDA TEMP ; get the address that was passed to us
STA (RETURNADDRESS),Y ; patch BASIC
; patch 3rd byte with NOP
INY ; next address
LDA #$EA ; NOP to finish the patch
STA (RETURNADDRESS),Y ; patch it
; Push LSB of (JSR address - 1) to stack and run the patched code with RTS
LDA RETURNADDRESS ; get LSB
PHA ; push it to the stack
LDY YSAVE ; restore caller's Y
RTS ; RTS adds 1 and lands on the patched LDA (),Y