Thursday, December 7, 2017

Kicking a dead horse... Commodore Plus/4 style

Here's a little patch to speed up the Commodore Plus/4. It modifies the RAM-based CHRGOT function that is used to scan through the BASIC code. For every byte of a program it reads, the standard code disables interrupts, pages out the ROM, reads the byte, pages the ROM back in, and re-enables interrupts. It does this so that it can provide up to 60K for BASIC. This is certainly a nice feature if you need that much RAM, but if you don't, it slows down programs significantly for no reason.

This simple piece of code speeds up one benchmark by about 4%. That is a pretty significant gain for a few hours' work, and it requires no changes to the ROM. Getting this much extra speed out of the MC-10 was a lot harder and required a new ROM. Actual performance increases will vary by program. Still, I had hoped for better results.

This patch only works with programs that fit in memory below the start address of the ROM, because it eliminates the code that pages the ROM in and out. It makes no attempt to modify system variables to restrict code to that area, and it does not restrict the use of upper RAM for data. Additional patches that restrict data to the same area of RAM would provide additional speed.


Here is the original code used by the Plus/4 BASIC interpreter, taken from a ROM disassembly. We are most interested in the code starting at $8129 in the ROM. This is copied to RAM on startup:

        ; CHRGET/CHRGOT - This chunk of code is copied to RAM 
        ; and run from there. It is used to get data UNDER the 
        ; system ROM's for basic.
        ;
; CHRGET ($0473)
L8123   INC   LastBasicLineNo         ; $3b (goes to $0473 ) CHRGET
        BNE   L8129
        INC   LastBasicLineNo+1       ; $3c
;
; CHRGOT ($0479)
;
L8129   SEI    
        STA   RAM_ON
        LDY   #$00
        LDA   (LastBasicLineNo),y     ; $3b
        STA   ROM_ON
        CLI    
        CMP   #$3A   ; ":" (colon)
        BCS   L8143   ; if colon or above, exit
        CMP   #$20   ; " " (space)
        BEQ   L8123   ; if space, get NEXT byte from basic
        SEC    
        SBC   #$30   ; A=A-$30
        SEC    
        SBC   #$D0   ; A=A-$D0 restores A; carry clear if digit, set otherwise
L8143   RTS    
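To put a number on the paging overhead in the listing above, here is a minimal Python sketch that adds up the documented 6502 cycle counts along the path taken for a byte of $3A or above (letters and BASIC tokens). It assumes that RAM_ON and ROM_ON are written with absolute addressing (4 cycles each) and that no branch crosses a page boundary. The names `CYCLES`, `ORIGINAL_TOKEN_PATH` and `PAGING` are only illustrative.

```python
# Documented 6502 cycle counts for the instructions used in the listing.
# Assumption: RAM_ON and ROM_ON are written with absolute addressing.
CYCLES = {
    "SEI": 2,
    "CLI": 2,
    "STA abs": 4,
    "LDY #": 2,
    "LDA (zp),Y": 5,        # no page-crossing penalty because Y is 0
    "CMP #": 2,
    "branch taken": 3,      # assumes the branch target is in the same page
    "RTS": 6,
}

# Path through the original CHRGOT for a byte >= $3A: BCS is taken to the RTS.
ORIGINAL_TOKEN_PATH = [
    "SEI", "STA abs", "LDY #", "LDA (zp),Y", "STA abs", "CLI",
    "CMP #", "branch taken", "RTS",
]

# The instructions that exist only to page the ROM out and back in.
PAGING = ["SEI", "STA abs", "STA abs", "CLI"]


def total(path):
    """Sum the cycle counts of a list of instruction names."""
    return sum(CYCLES[name] for name in path)


original_cycles = total(ORIGINAL_TOKEN_PATH)
paging_cycles = total(PAGING)

print("original CHRGOT, byte >= $3A:", original_cycles, "cycles")
print("spent on paging and interrupt masking:", paging_cycles, "cycles")
```

Under those assumptions, 12 of the 30 cycles on this path are spent on paging and interrupt masking. The count does not include the JSR used to call the routine or any cycles taken by the video hardware.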



The following listing contains the new CHRGOT function. Its code is embedded in the .BYTE section of the patch that follows this listing. Note that the code is designed to exit without taking any branches for the most commonly found type of byte. This saves a clock cycle for every such byte, as a branch that is taken requires one more clock cycle than one that is not.

00000r 1                .ORG $0473
000473  1               
000473  1                ;
000473  1                ; CHRGET/CHRGOT - This chunk of code is copied to RAM
000473  1                ; and run from there. It is used to get data UNDER the
000473  1                ; system ROM's for basic.
000473  1                ;
000473  1                ; CHRGET ($0473)
000473  1               L8123:
000473  1  E6 3B         INC LastBasicLineNo ; $3b (goes to $0473 ) CHRGET
000475  1  D0 02         BNE L8129
000477  1  E6 3C         INC LastBasicLineNo+1 ; $3c
000479  1                ;
000479  1                ; CHRGOT ($0479)
000479  1                ;
000479  1               L8129:
000479  1  A0 00         LDY #$00
00047B  1  B1 3B         LDA (LastBasicLineNo),y ; $3b
00047D  1  C9 3A         CMP #$3A ; $3A or larger?
00047F  1  90 01         BCC NEXT ; if not, skip to NEXT
000481  1  60            RTS ; return if so
000482  1               NEXT:
000482  1  E9 2F         SBC #$2F ; A=A-$30
000484  1  C9 F0         CMP #$F0 ; Is it a " "? (space)
000486  1  F0 EB         BEQ L8123 ; if space, get NEXT byte from basic
000488  1  38            SEC ; A=A-$D0
000489  1  E9 D0         SBC #$D0 ; clear carry if digit, set otherwise
00048B  1  60            RTS
00048C  1               
00048C  1                .end
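One way to gain confidence that the rewritten routine hands back the same results as the ROM version is to model both in a few lines of Python and compare them for every possible byte. The sketch below only tracks the accumulator, the carry flag and the zero flag, and assumes binary (non-decimal) mode. The function names are made up for this illustration, and "skip" stands for the branch back to CHRGET on a space.

```python
def cmp_imm(a, operand):
    """6502 CMP #operand: returns (carry, zero); A is left unchanged."""
    return (1 if a >= operand else 0), (1 if a == operand else 0)


def sbc_imm(a, operand, carry):
    """6502 SBC #operand in binary mode: A - operand - (1 - carry)."""
    diff = a - operand - (1 - carry)
    result = diff & 0xFF
    return result, (1 if diff >= 0 else 0), (1 if result == 0 else 0)


def original_chrgot(byte):
    """A/carry/zero logic of the ROM routine (paging left out)."""
    a = byte
    carry, zero = cmp_imm(a, 0x3A)
    if carry:                                  # BCS L8143
        return a, carry, zero
    _, zero = cmp_imm(a, 0x20)
    if zero:                                   # BEQ L8123
        return "skip"
    a, carry, zero = sbc_imm(a, 0x30, 1)       # SEC / SBC #$30
    a, carry, zero = sbc_imm(a, 0xD0, 1)       # SEC / SBC #$D0
    return a, carry, zero


def patched_chrgot(byte):
    """A/carry/zero logic of the replacement routine."""
    a = byte
    carry, zero = cmp_imm(a, 0x3A)
    if carry:                                  # BCC not taken, RTS
        return a, carry, zero
    a, carry, zero = sbc_imm(a, 0x2F, carry)   # carry is clear, so A = A - $30
    _, zero = cmp_imm(a, 0xF0)
    if zero:                                   # BEQ L8123
        return "skip"
    a, carry, zero = sbc_imm(a, 0xD0, 1)       # SEC / SBC #$D0
    return a, carry, zero


mismatches = [b for b in range(256) if original_chrgot(b) != patched_chrgot(b)]
print("bytes where the two routines differ:", mismatches)
```

In this model the list of mismatches comes out empty. A digit such as "5" returns with carry clear, and the end-of-line byte $00 returns with both carry and zero set, in both versions.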



The next listing is the source code for the program that patches the CHRGOT function. It is designed to be embedded in a REM statement in the first line of a BASIC program. Note that the 2nd byte of the actual CHRGOT code has been changed from $00 to $01; it is patched back to $00 once the code has been copied to its final destination. Microsoft BASICs don't advance to the next line by using the pointer stored at the start of the line once they have started to parse a line. Instead, they scan for the end-of-line marker, which is $00, and assume that anything that follows is a line of BASIC code. Storing the byte as something other than $00 is required so BASIC can skip to the next line every time the program runs.

000000r 1               
000000r 1                .org $1006 ; The address of ML$ in our BASIC program
001006  1               
001006  1  A0 12         LDY #18 ; Starts at CHRGOT+18 and works down...
001008  1               NEXT:
001008  1  B9 17 10      LDA CHRGOT,Y ; Get byte of new CHRGOT routine
00100B  1  99 79 04      STA $0479,Y ; Save it over the old routine
00100E  1  88            DEY ; decrement our loop counter/index register
00100F  1  10 F7         BPL NEXT
001011  1  C8            INY
001012  1  98            TYA
001013  1  99 7A 04      STA $047A,Y
001016  1  60            RTS
001017  1               CHRGOT:
001017  1  A0 01 B1 3B   .BYTE $A0,$01,$B1,$3B,$C9,$3A,$90,$01,$60,$E9,$2F,$C9,$F0,$F0,$EB,$38,$E9,$D0,$60
00101B  1  C9 3A 90 01  
00101F  1  60 E9 2F C9  
00102A  1                .end
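The loader above is short enough to trace by hand, but a small Python model makes the two tricks easier to see: the copy runs from index 18 down to 0, and the $01 placeholder is overwritten with $00 only after it has left the REM line. This is a rough sketch, not an emulator. `install` and `rem_length` are hypothetical helper names, and the RAM is just a 64K bytearray.

```python
# The routine as stored in the REM line: second byte is $01 instead of $00.
STORED = bytes([0xA0, 0x01, 0xB1, 0x3B, 0xC9, 0x3A, 0x90, 0x01, 0x60, 0xE9,
                0x2F, 0xC9, 0xF0, 0xF0, 0xEB, 0x38, 0xE9, 0xD0, 0x60])


def install(ram, stored, dest=0x0479):
    """Mimic the loader: copy downwards, then patch the LDY operand to 0."""
    y = 18                               # LDY #18
    while True:
        ram[dest + y] = stored[y]        # LDA CHRGOT,Y / STA $0479,Y
        y = (y - 1) & 0xFF               # DEY
        if y & 0x80:                     # BPL falls through once Y wraps to $FF
            break
    y = (y + 1) & 0xFF                   # INY, Y is now 0
    a = y                                # TYA, A is now 0
    ram[dest + 1 + y] = a                # STA $047A,Y
    return ram


def rem_length(body):
    """Number of bytes a scan for the $00 end-of-line marker would skip."""
    return body.index(0)


ram = bytearray(0x10000)
install(ram, STORED)
installed = bytes(ram[0x0479:0x048C])

print("installed routine:", installed.hex(" "))
print("REM scan with $01 placeholder:", rem_length(STORED + b"\x00"), "bytes")
print("REM scan with a real $00:", rem_length(installed + b"\x00"), "bytes")
```

With the placeholder, the scan passes all 19 bytes before it reaches the terminator. With a real $00 in second position it would stop after a single byte, and the interpreter would treat the rest of the routine as the start of another BASIC line.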


Below is the final BASIC code containing the patch. It can be added to smaller programs to speed them up. After the program has been run once, line 1 and the lines containing the DATA statements can be deleted. The resulting program can be saved with the patch permanently embedded in the REM statement.

0 REM012345678901234567890123456789012345
1 FORI=0 TO 35:READ T:POKE 4102+I,T :NEXT I
2 SYS 4102

10000 DATA 160,18,185,23,16,153,121,4,136,16,247,200,152,153,122,4,96
10010 DATA 160,01,177,59,201,58,144,1,96,233,47,201
10020 DATA 240,240,235,56,233,208,96
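The numbers in the BASIC listing have to line up exactly: one DATA value per character in the REM line, no zero bytes, and a POKE address that points at the first character after the REM token. Here is a small Python sanity check of that arithmetic. It assumes the default Plus/4 BASIC start address of $1001 and the usual Commodore line layout (2-byte link, 2-byte line number, 1-byte REM token); the variable names are only for illustration.

```python
DATA = [
    160, 18, 185, 23, 16, 153, 121, 4, 136, 16, 247, 200, 152, 153, 122, 4, 96,
    160, 1, 177, 59, 201, 58, 144, 1, 96, 233, 47, 201,
    240, 240, 235, 56, 233, 208, 96,
]
REM_TEXT = "012345678901234567890123456789012345"   # placeholder text of line 0

BASIC_START = 0x1001                 # assumption: default start of BASIC
LINK, LINE_NUMBER, REM_TOKEN = 2, 2, 1
rem_text_address = BASIC_START + LINK + LINE_NUMBER + REM_TOKEN

loader = bytes(DATA[:17])            # the copy loop, ends with RTS ($60)
routine = bytes(DATA[17:])           # the new CHRGOT, with the $01 placeholder

print("first POKE address:", rem_text_address, hex(rem_text_address))
print("loader :", loader.hex(" "))
print("routine:", routine.hex(" "))
print("values:", len(DATA), "placeholder characters:", len(REM_TEXT))
```

The 36 values match the 36 placeholder characters and the `FOR I=0 TO 35` loop, and the computed address agrees with the 4102 used by the POKE and SYS statements.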



Here is the benchmark that prompted me to write this.

10 K=0:I=0:T=0:P=0
30 SCNCLR
100  PRINT "Prime Number Generator"
110  INPUT "Upper Limit";N

120  eTime=TIME
130  T=(N-3)/2
140  DIMA(T+1)

160 FORI=0TOT:A(I)=0:NEXT
200 FORI=0TOT:IFA(I)THENPRINT"..";:NEXT:GOTO330
210P=I+I+3:PRINTP;".";:K=I+P:IFK<=TTHENFORK=KTOTSTEPP:A(K)=1:NEXT:NEXT:GOTO330
260 NEXT

330  eTime=(TIME-eTime)/60
340  PRINT
350  PRINT "Total: ";eTime
360 END
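For readers who don't want to untangle the packed BASIC lines, here is a rough Python port of the same odd-numbers-only sieve. Index I stands for the odd number 2*I+3, so the number 2 is never reported. The function name `odd_sieve` is made up for this sketch, and it collects the primes in a list instead of printing them.

```python
import math


def odd_sieve(n):
    """Return the odd primes up to n, using the benchmark's indexing."""
    t = (n - 3) / 2                  # line 130: T=(N-3)/2
    top = math.floor(t)              # FOR I=0 TO T stops at the integer part
    flags = [0] * (top + 2)          # line 140: DIM A(T+1), all zero
    primes = []
    for i in range(top + 1):         # line 200
        if flags[i]:                 # already marked as a multiple
            continue
        p = i + i + 3                # line 210: P=I+I+3
        primes.append(p)
        k = i + p                    # index of the first multiple to mark
        while k <= top:              # FOR K=K TO T STEP P
            flags[k] = 1
            k += p
    return primes


print(odd_sieve(30))
```

Most of the time goes into the two loops at lines 200 and 210, which is where the interpreter calls CHRGET and CHRGOT over and over.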


Disabling the screen refresh while the benchmark is running speeds it up by over 30%, because the screen refresh normally steals that many clock cycles away from the CPU. Add these two lines to the benchmark:

115 POKE65286,PEEK(65286)AND239
335 POKE65286,PEEK(65286)OR16
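The two POKEs are plain bit masking on one register. As a quick illustration in Python: 65286 is $FF06, AND 239 clears bit 4 and leaves the other bits alone, and OR 16 sets it again. The starting value $1B below is just a sample value chosen for the demonstration, not something read from a real machine.

```python
REGISTER = 65286          # address used by both POKEs
BIT_4 = 16                # the bit that the two POKEs toggle


def refresh_off(value):
    """Equivalent of PEEK(65286) AND 239: clear bit 4, keep the rest."""
    return value & 239


def refresh_on(value):
    """Equivalent of PEEK(65286) OR 16: set bit 4, keep the rest."""
    return value | 16


sample = 0x1B             # hypothetical register contents with bit 4 set
off = refresh_off(sample)
on = refresh_on(off)

print(hex(REGISTER), format(sample, "08b"), format(off, "08b"), format(on, "08b"))
```

Because only one bit changes, line 335 restores the register to exactly what it held before line 115 ran.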
