Here's a little patch to speed up the Commodore Plus/4. It modifies the RAM-based CHRGOT function that BASIC uses to scan through program code. For every byte of a program it reads, the standard code disables interrupts, pages out the ROM, reads the byte, pages the ROM back in, and re-enables interrupts. It does this so that it can provide up to 60K for BASIC. That is certainly a nice feature if you need that much RAM, but if you don't, it slows programs down significantly for no reason.
This simple piece of code speeds up one benchmark by about 4%. That is a pretty significant gain for a few hours' work, and it requires no changes to the ROM. Getting this much extra speed out of the MC-10 was a lot harder and required a new ROM. Actual performance increases will vary by program. Still, I had hoped for better results.
Only programs that fit in memory below the start address of the ROM will work with this patch, because it eliminates the code that pages the ROM in and out. It makes no attempt to modify system variables to restrict code to that area, and it does not restrict the use of upper RAM for data. Additional patches that restrict data to the same area of RAM would provide additional speed.
Here is the original code used by the Plus/4 BASIC interpreter from a ROM disassembly. We are most interested in the code starting at $8129 in the ROM. This is copied to RAM on startup:
; CHRGET/CHRGOT - This chunk of code is copied to RAM
; and run from there. It is used to get data UNDER the
; system ROM's for basic.
;
; CHRGET ($0473)
L8123 INC LastBasicLineNo ; $3b (goes to $0473 ) CHRGET
BNE L8129
INC LastBasicLineNo+1 ; $3c
;
; CHRGOT ($0479)
;
L8129 SEI
STA RAM_ON
LDY #$00
LDA (LastBasicLineNo),y ; $3b
STA ROM_ON
CLI
CMP #$3A ; ":" (colon)
BCS L8143 ; if colon, exit
CMP #$20 ; " " (space)
BEQ L8123 ; if space, get NEXT byte from basic
SEC
SBC #$30
SEC
SBC #$D0
L8143 RTS
This is the new CHRGOT function. Its code is embedded in the .BYTE section of the patch that follows this listing. Note that the code is designed to exit without taking any branch for the most commonly found type of byte (anything at or above $3A, which covers letters and keyword tokens). This saves a clock cycle for every such byte, because a taken branch requires one more clock cycle than a branch that is not taken.
00000r 1 .ORG $0473
000473 1
000473 1 ;
000473 1 ; CHRGET/CHRGOT - This chunk of code is copied to RAM
000473 1 ; and run from there. It is used to get data UNDER the
000473 1 ; system ROM's for basic.
000473 1 ;
000473 1 ; CHRGET ($0473)
000473 1 L8123:
000473 1 E6 3B INC LastBasicLineNo ; $3b (goes to $0473 ) CHRGET
000475 1 D0 02 BNE L8129
000477 1 E6 3C INC LastBasicLineNo+1 ; $3c
000479 1 ;
000479 1 ; CHRGOT ($0479)
000479 1 ;
000479 1 L8129:
000479 1 A0 00 LDY #$00
00047B 1 B1 3B LDA (LastBasicLineNo),y ; $3b
00047D 1 C9 3A CMP #$3A ; $3A (colon) or larger?
00047F 1 90 01 BCC NEXT ; if not, skip to NEXT
000481 1 60 RTS ; return if so
000482 1 NEXT:
000482 1 E9 2F SBC #$2F ; carry is clear here, so A=A-$30
000484 1 C9 F0 CMP #$F0 ; Is it a " "? (space)
000486 1 F0 EB BEQ L8123 ; if space, get NEXT byte from basic
000488 1 38 SEC ; A=A-$D0
000489 1 E9 D0 SBC #$D0 ; clear carry if digit, set otherwise
00048B 1 60 RTS
00048C 1
00048C 1 .end
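As a sanity check on the rewrite, here is a rough Python model (host-side, not Plus/4 code) of what the two routines do after the LDA. It only tracks the accumulator, carry and zero flags, and the helper names `cmp_imm`, `sbc_imm`, `original_chrgot` and `patched_chrgot` are made up for this sketch. The cycle table uses the standard 6502 timings and ignores page-crossing penalties and anything the TED does to the CPU clock, so treat the totals as a ballpark only.

```python
def cmp_imm(a, m):
    """6502 CMP #m: return (carry, zero) for accumulator a."""
    return (1 if a >= m else 0), (1 if a == m else 0)


def sbc_imm(a, m, carry):
    """6502 SBC #m in binary mode: return (A, carry, zero)."""
    raw = a - m - (1 - carry)
    result = raw & 0xFF
    return result, (1 if raw >= 0 else 0), (1 if result == 0 else 0)


def original_chrgot(byte):
    """ROM version. Returns (A, carry, zero), or None when it loops on a space."""
    a = byte
    c, z = cmp_imm(a, 0x3A)
    if c:                              # BCS L8143
        return a, c, z
    c, z = cmp_imm(a, 0x20)
    if z:                              # BEQ L8123
        return None
    a, c, z = sbc_imm(a, 0x30, 1)      # SEC / SBC #$30
    return sbc_imm(a, 0xD0, 1)         # SEC / SBC #$D0


def patched_chrgot(byte):
    """Patched version. Same return convention as original_chrgot."""
    a = byte
    c, z = cmp_imm(a, 0x3A)
    if c:                              # BCC not taken, then RTS
        return a, c, z
    a, c, z = sbc_imm(a, 0x2F, 0)      # carry is clear, so this is A-$30
    c, z = cmp_imm(a, 0xF0)            # $20-$30 wraps to $F0
    if z:                              # BEQ L8123
        return None
    return sbc_imm(a, 0xD0, 1)         # SEC / SBC #$D0


mismatches = [b for b in range(256) if original_chrgot(b) != patched_chrgot(b)]
print("mismatches:", mismatches)

# Standard 6502 cycle counts for the path taken by a byte >= $3A.
CYCLES = {"SEI": 2, "CLI": 2, "STA abs": 4, "LDY #": 2, "LDA (zp),Y": 5,
          "CMP #": 2, "branch taken": 3, "branch not taken": 2, "RTS": 6}
ORIGINAL_PATH = ["SEI", "STA abs", "LDY #", "LDA (zp),Y", "STA abs", "CLI",
                 "CMP #", "branch taken", "RTS"]
PATCHED_PATH = ["LDY #", "LDA (zp),Y", "CMP #", "branch not taken", "RTS"]
original_cycles = sum(CYCLES[op] for op in ORIGINAL_PATH)
patched_cycles = sum(CYCLES[op] for op in PATCHED_PATH)
print("cycles:", original_cycles, "->", patched_cycles)
```

If the model is right, both routines hand back the same A, carry and zero for every possible byte, and the common path drops from 30 to 17 cycles per CHRGOT call before the JSR overhead is counted.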
This is the source code for the program that patches the CHRGOT function. It is designed to be embedded in a REM statement in the first line of a BASIC program. Note that the second byte of the CHRGOT code is stored as $01 instead of $00 and is patched back to $00 once the code has been copied to its final destination. Once a Microsoft BASIC has started to parse a line, it does not advance to the next line by using the pointer stored at the start of the line. Instead, it scans for the end-of-line marker, which is $00, and assumes that anything that follows is a line of BASIC code. Storing the byte as something other than $00 is required so that BASIC can skip to the next line every time the program runs.
000000r 1
000000r 1 .org $1006 ; The address of the REM text in our BASIC program
001006 1
001006 1 A0 12 LDY #18 ; Starts at CHRGOT+18 and works down...
001008 1 NEXT:
001008 1 B9 17 10 LDA CHRGOT,Y ; Get byte of new CHRGOT routine
00100B 1 99 79 04 STA $0479,Y ; Save it over the old routine
00100E 1 88 DEY ; decrement our loop counter/index register
00100F 1 10 F7 BPL NEXT
001011 1 C8 INY
001012 1 98 TYA
001013 1 99 7A 04 STA $047A,Y
001016 1 60 RTS
001017 1 CHRGOT:
001017 1 A0 01 B1 3B .BYTE $A0,$01,$B1,$3B,$C9,$3A,$90,$01,$60,$E9,$2F,$C9,$F0,$F0,$EB,$38,$E9,$D0,$60
00101B 1 C9 3A 90 01
00101F 1 60 E9 2F C9
00102A 1 .end
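The copy loop and the final fix-up can be walked through on a PC with a few lines of Python. This is only a sketch that mimics the effect of the instructions on a 64K byte array, not a 6502 emulator, and `apply_patch` is a name invented here. The byte values are taken from the listing above.

```python
# Patcher code as assembled at $1006, followed by the 19 bytes of new CHRGOT.
PATCHER = bytes([0xA0, 0x12, 0xB9, 0x17, 0x10, 0x99, 0x79, 0x04, 0x88,
                 0x10, 0xF7, 0xC8, 0x98, 0x99, 0x7A, 0x04, 0x60])
NEW_CHRGOT = bytes([0xA0, 0x01, 0xB1, 0x3B, 0xC9, 0x3A, 0x90, 0x01, 0x60, 0xE9,
                    0x2F, 0xC9, 0xF0, 0xF0, 0xEB, 0x38, 0xE9, 0xD0, 0x60])


def apply_patch(memory):
    """Mimic the effect of the patcher on a 64K memory image."""
    y = 18                              # LDY #18
    while y >= 0:                       # loop until DEY takes Y below zero
        memory[0x0479 + y] = memory[0x1017 + y]   # LDA CHRGOT,Y / STA $0479,Y
        y -= 1                          # DEY / BPL NEXT
    y = (y + 1) & 0xFF                  # INY: Y wraps from $FF back to $00
    a = y                               # TYA
    memory[0x047A + y] = a              # STA $047A,Y restores the $00 operand


memory = bytearray(65536)
image = PATCHER + NEW_CHRGOT
memory[0x1006:0x1006 + len(image)] = image
apply_patch(memory)

installed = bytes(memory[0x0479:0x048C])
print(installed.hex(" "))
print("zero bytes inside the REM:", image.count(0))
```

The installed bytes should match the assembled CHRGOT listing, with `A0 00` at the start, while the copy that lives in the REM statement contains no $00 at all.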
This is the final BASIC code containing the patch. It can be added to smaller programs to speed them up. After the program has been run once, line 1 and the lines containing the DATA statements can be deleted. The resulting program can be saved with the patch permanently embedded in the REM statement.
0 REM012345678901234567890123456789012345
1 FORI=0 TO 35:READ T:POKE 4102+I,T :NEXT I
2 SYS 4102
10000 DATA 160,18,185,23,16,153,121,4,136,16,247,200,152,153,122,4,96
10010 DATA 160,01,177,59,201,58,144,1,96,233,47,201
10020 DATA 240,240,235,56,233,208,96
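The loader only works if the numbers line up: the REM text has to be long enough to hold the code, and 4102 has to be the address of the first character after the REM token. A quick Python check of that arithmetic follows. It assumes the default Plus/4 start of BASIC at $1001 and the usual line layout of a 2-byte link, a 2-byte line number and a 1-byte REM token; the constant names are made up for this sketch.

```python
DATA_LINES = [
    "160,18,185,23,16,153,121,4,136,16,247,200,152,153,122,4,96",
    "160,01,177,59,201,58,144,1,96,233,47,201",
    "240,240,235,56,233,208,96",
]
REM_TEXT = "012345678901234567890123456789012345"  # placeholder after REM in line 0

BASIC_START = 0x1001   # assumed default start of BASIC text on the Plus/4
LINE_HEADER = 2 + 2 + 1  # link pointer + line number + REM token

values = [int(v) for line in DATA_LINES for v in line.split(",")]
target = BASIC_START + LINE_HEADER

print("bytes to POKE:", len(values))
print("room in REM:  ", len(REM_TEXT))
print("first address:", target, hex(target))
print("end address:  ", hex(target + len(values)))
```

Under those assumptions the 36 DATA values fill the 36 placeholder characters exactly, which matches the `FOR I=0 TO 35` loop and the `.org $1006` in the assembler source.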
Here is the benchmark that prompted me to write this.
10 K=0:I=0:T=0:P=0
30 SCNCLR
100 PRINT "Prime Number Generator"
110 INPUT "Upper Limit";N
120 eTime=TIME
130 T=(N-3)/2
140 DIMA(T+1)
160 FORI=0TOT:A(I)=0:NEXT
200 FORI=0TOT:IFA(I)THENPRINT"..";:NEXT:GOTO330
210P=I+I+3:PRINTP;".";:K=I+P:IFK<=TTHENFORK=KTOTSTEPP:A(K)=1:NEXT:NEXT:GOTO330
260 NEXT
330 eTime=(TIME-eTime)/60
340 PRINT
350 PRINT "Total: ";eTime
360 END
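Adding the following two lines will speed up the benchmark by over 30%. Line 115 disables the screen refresh while the benchmark is running by clearing bit 4 of location 65286 ($FF06), and line 335 turns it back on afterwards. The screen refresh normally steals that many clock cycles away from the CPU.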
115 POKE65286,PEEK(65286)AND239
335 POKE65286,PEEK(65286)OR16
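The two POKEs are just a bit mask on one register. Here is the same arithmetic in Python, assuming that bit 4 of $FF06 is the TED's display-enable bit and that the register normally holds $1B in text mode (both are my understanding of the TED, not something taken from the listings above). The function names are invented for the example.

```python
DISPLAY_ENABLE = 0x10          # bit 4, assumed to be the TED display-enable bit


def screen_off(value):
    """Equivalent of PEEK(65286) AND 239: clear bit 4, keep the rest."""
    return value & 239


def screen_on(value):
    """Equivalent of PEEK(65286) OR 16: set bit 4, keep the rest."""
    return value | 16


default = 0x1B                 # assumed power-on value of $FF06
blanked = screen_off(default)
restored = screen_on(blanked)
print(hex(default), hex(blanked), hex(restored))
```

Because both operations touch only bit 4, the scroll and mode bits in the same register come back unchanged when the screen is switched on again.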