Friday, April 27, 2018

Patching the CoCo ROM, where to go from here.

There are several more patches I'm going to release.

The CHRGET/CHRGOT patch and the change to the divide-by-10 code used for ASCII conversion are about ready to go.  If the results on the MC-10 are any indication, they should speed up all programs by somewhere between 2% and 4%.  They could cause a few programs to act differently, or even fail, depending on what those programs do.  That's unlikely, but possible, so the patch will be designed so it can be assembled with or without that code.
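Here's a quick illustration of why swapping a divide by 10 for a multiply by 1/10 can change what a program sees. It's a minimal Python sketch that uses IEEE doubles as a stand-in for the CoCo's 5-byte floating point format (an assumption: the specific values that differ on the CoCo won't be the same ones), and the function names are made up for the example.

```python
# IEEE double precision stands in for the CoCo's 5-byte format here.
# 1/10 has no exact binary representation, so x*0.1 can round differently
# from x/10 even though the two are mathematically equal.

def divide_by_ten(x):
    return x / 10        # a real divide, like the original ROM code

def times_one_tenth(x):
    return x * 0.1       # multiply by a stored 1/10, like the patched code

# Integers whose two results differ in the last bit
mismatches = [n for n in range(1, 100) if divide_by_ten(n) != times_one_tenth(n)]

print(divide_by_ten(3), times_one_tenth(3))   # prints 0.3 0.30000000000000004
print(len(mismatches), "of 99 values differ")
```

Programs that compare floating point results for exact equality are the ones most likely to notice a difference like this.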

There is a 16x16 multiply that can be replaced.  If I remember right, this speeds up array indexing.  The code should port easily from the MC-10.
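For reference, here's one way a 16x16 multiply can be built from 8x8 MUL-sized pieces when only a 16-bit result is wanted, as with array indexing. This is a Python sketch, not the ROM routine or the MC-10 code: mul16 is a made-up name, and it assumes an overflow should be reported (returned as None) rather than wrapped.

```python
def mul16(a, b):
    """Hypothetical 16x16 -> 16-bit unsigned multiply from 8x8 products.
    Returns None on overflow instead of wrapping (assumed behaviour)."""
    ah, al = a >> 8, a & 0xFF
    bh, bl = b >> 8, b & 0xFF
    if ah and bh:               # ah*bh lands at bit 16 or above: always overflow
        return None
    cross = ah * bl + al * bh   # one of these two terms is always zero here
    if cross > 0xFF:            # cross << 8 would not fit in 16 bits
        return None
    total = (cross << 8) + al * bl
    return total if total <= 0xFFFF else None

print(mul16(300, 200), mul16(256, 256))
```

Since the product has to fit in 16 bits, at most two of the four partial products can be non-zero on the success path, so a 6809 version would need no more than two MULs plus the overflow tests.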

There is a series of 6309 patches that can be easily implemented.  Screen scrolling and memory moves are the easiest, and the new code should fit over the old code in memory.
A fast divide and a faster multiply can also be implemented.  The 6309's divide and larger multiply instructions are signed, so that will require a workaround, but it's doable.  Those will probably offer the greatest improvement for the least work.  There are little patches that can go here and there, but those will require a lot of testing to be sure they don't break anything.  Many of these will even work on a CoCo 1 & 2, since they will fit in the original ROM space.  The fast multiply might even fit.
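One possible way around the signed multiply, sketched in Python: do the signed 16x16 multiply, then for each operand whose top bit was set, add the other operand into the upper 16 bits of the result. This assumes the 6309's MULD acts as a signed 16x16 to 32-bit multiply; muld_model is only a stand-in for that instruction, and this is one workaround among several, not necessarily the one the patch will end up using.

```python
def to_signed16(x):
    return x - 0x10000 if x & 0x8000 else x

def muld_model(a, b):
    """Stand-in for a signed 16x16 -> 32-bit multiply (assumed MULD behaviour),
    returning the 32-bit two's complement result as an unsigned int."""
    return (to_signed16(a) * to_signed16(b)) & 0xFFFFFFFF

def unsigned_mul16(a, b):
    """Unsigned 16x16 -> 32-bit product built on the signed multiply."""
    q = muld_model(a, b)
    if a & 0x8000:      # a was read as a - 65536, so add b * 65536 back
        q += b << 16
    if b & 0x8000:      # likewise for b
        q += a << 16
    return q & 0xFFFFFFFF

print(hex(unsigned_mul16(0xFFFF, 0xFFFF)))
```

On a 6309 that would be the multiply, a test of each operand's sign bit, and a 16-bit add into the high word, which should still be much cheaper than a bit-by-bit loop.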

New square root (SQR) code.  There wasn't enough space for this in the MC-10 ROM, but on the CoCo 3 it will sit in RAM above the ROM, so space won't be an issue.  Most of the code has already been worked out on the MC-10, but there was one lingering issue: the new code depends on a "magic" constant.  The existing one only works for double precision numbers, so the appropriate number for the CoCo/MC-10 floating point format will have to be generated.  It may also impact precision.  *IF* I get this working, it should run about 20% faster.  That should bring Ahl's benchmark numbers down close to an IBM PC's, and SQR is also used in the fractal generator code I've been tinkering with.
This mod may take some time to work out.
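The usual bit trick behind a "magic" square root constant treats a float's bit pattern as an integer: halving that integer roughly halves the exponent, and the constant puts the bias back. Here's a Python sketch of how such a constant follows from a format's exponent bias and mantissa width, under a few assumptions: IEEE single precision is used only because Python can get at its bits, the function names are made up, and sigma = 0.0450466 is a commonly quoted tuning value, not something worked out for the CoCo. The CoCo/MC-10 5-byte format would need its own bias and mantissa width, and its different layout would have to be dealt with, before this could be applied there.

```python
import math
import struct

def sqrt_magic(bias, mant_bits, sigma=0.0450466):
    # If I is the integer view of x, I / 2**mant_bits ~ log2(x) + bias - sigma.
    # Halving the log for sqrt works out to
    #   J = I/2 + (bias - sigma) * 2**(mant_bits - 1)
    return round((bias - sigma) * 2 ** (mant_bits - 1))

def float_to_bits(x):
    return struct.unpack('<I', struct.pack('<f', x))[0]

def bits_to_float(i):
    return struct.unpack('<f', struct.pack('<I', i & 0xFFFFFFFF))[0]

def sqrt_guess(x, magic):
    # Positive, normal x only: halve the bit pattern, add the constant back
    return bits_to_float((float_to_bits(x) >> 1) + magic)

def fast_sqrt(x, magic, steps=2):
    y = sqrt_guess(x, magic)
    for _ in range(steps):
        y = 0.5 * (y + x / y)   # Newton-Raphson step for y*y = x
    return y

MAGIC = sqrt_magic(127, 23)     # IEEE single: bias 127, 23 mantissa bits
print(hex(MAGIC), fast_sqrt(2.0, MAGIC), math.sqrt(2.0))
```

For IEEE single precision this lands on 0x1fbd1df5, the constant usually quoted for that format. The initial guess is only good to a few percent, which is why the Newton steps (and their cost) matter for both speed and precision.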

Monday, April 23, 2018

Here is the code for the higher resolution 3D Fedora Plot I posted a snapshot of on Facebook.  This is the original Atari code, followed by the lines that change to run it on the CoCo 3.
100 SX=144:SY=56:SZ=64:CX=320:CY=192
110 C1=2.2*SY:C2=1.6*SY
120 DIM RR(CX)
130 FOR I=0 TO CX:RR(I)=CY:NEXT I
140 GRAPHICS 8+16:SETCOLOR 2,0,0:COLOR 1
150 CX=CX*0.5:CY=CY*0.46875:FX=SX/64:FZ=SZ/64
160 XF=4.71238905/SX
170 FOR ZI=64 TO -64 STEP -1
180   ZT=ZI*FX:ZS=ZT*ZT
190   XL=INT(SQR(SX*SX-ZS)+0.5)
200   ZX=ZI*FZ+CX:ZY=CY+ZI*FZ
210   FOR XI=0 TO XL
220     A=SIN(SQR(XI*XI+ZS)*XF)
230     Y1=ZY-A*(C1-C2*A*A)
240     X1=XI+ZX
250     IF RR(X1)>Y1 THEN RR(X1)=Y1:PLOT X1,Y1
260     X1=ZX-XI
270     IF RR(X1)>Y1 THEN RR(X1)=Y1:PLOT X1,Y1
280   NEXT XI
290 NEXT ZI

80 TIMER=0
90 POKE 65497,0
140 HSCREEN 2
250 IF RR(X1)>Y1 THEN RR(X1)=Y1:HSET(X1,Y1)
270 IF RR(X1)>Y1 THEN RR(X1)=Y1:HSET(X1,Y1)
300 PRINT"TIME",TIMER/60

Timing on the computer shows a 25.6% speed increase over the factory ROM.
Just remember that the gain is this large only because the program does a lot of multiplication.

0 TIMER=0:POKE65497,0
10 PMODE4,1:PCLS0:SCREEN1,1
20 DR=3.1415/180:DS=SIN(15*DR):DC=COS(15*DR)
30 FORR=5TO330STEP10
40 FORT=0TO360STEP10
50 X=R*COS(T*DR)*.25
60 Y=R*SIN(R*DR)*.25
70 Z=R*SIN(T*DR)*.25
80 XP=X+(DC*Z)
90 YP=Y+(DS*Z)
92 XP=128+XP
95 YP=80-YP
100 PSET(XP,YP,1)
110 NEXT:NEXT
115 T=TIMER/60
120 A$=INKEY$:IFA$=""THEN120
130 PRINT"TIME:",T

Sunday, April 22, 2018

Patching the CoCo ROM, patch #1

In honor of CoCofest, I give you patch #1.  It only patches the multiply, and it only uses 6809 code.
Don't bother disassembling it; the source code will be posted once I patch a few more things.

Yeah, I can get away with patching the CoCo 3 ROM from BASIC on this one.  Other patches will have to be executables.  ONLY FOR THE COCO 3!

1 POKE 65497,0
2 AD=VAL("&HFA0C")
10 FORI=0 TO 64:READ B$:A=VAL("&H"+B$)
11 POKE AD+I,A:NEXT
15 REM $BB02 JMP $FA0C
20 POKE VAL("&HBB02"),VAL("&H7E"):POKE VAL("&HBB03"),VAL("&HFA"):POKE VAL("&HBB04"),VAL("&H0C")
30 DATA 32,79,E7,60,96,60,3D,ED,63,E6,60,96,5E,3D,ED,61,E6,60,96,5D
40 DATA 3D,ED,65,E6,60,96,5F,3D,E3,62,ED,62,EC,65,E9,61,89,00,ED,60
50 DATA EC,63,D3,15,97,16,D7,63,EC,61,D9,14,99,13,DD,14,A6,60,89,00
60 DATA 97,13,32,67,39



Monday, April 16, 2018

Patching the CoCo ROM Part 8

The multiply code has been done since the 2nd day, but there's something odd going on.  I'm going to have to step through the ROM in a debugger to see what's happening.  I'm not sure whether it's the wrong address for the floating point registers or something else.  Once that's fixed, I can enable the 2nd optimization (replacing /10 with * 1/10) and finish looking through the ROMs to see whether patching the CHRGET & CHRGOT calls can be automated, or whether a list of addresses or exclusions will be needed.
When completed, you should be able to LOADM and EXEC the patch.  That's it.  A version that works with the DOS command might be in the future, but it's not really necessary.

On another note... someone asked me if changing code that divides by a constant to multiplication by 1/constant causes it to run at a different speed on the original ROMs.  Once the initial divide calculation for 1/constant is made, the code should run the same speed as before with standard ROMs, but it will automatically be faster on a machine with an optimized ROM or with the patch I'm working on.  There's no reason not to make the change.
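In BASIC terms, the change is just computing R=1/10 once, then using X*R wherever X/10 appeared. Here's the same idea as a Python sketch (the function names are made up for the example): the loop body only multiplies, so the single remaining divide is paid once, and everything after it goes through whatever multiply routine the ROM has.

```python
def scale_with_divides(values, c):
    # One divide per element, so every iteration pays for the divide routine
    return [v / c for v in values]

def scale_with_reciprocal(values, c):
    r = 1.0 / c                      # the only divide, done once up front
    return [v * r for v in values]   # only multiplies from here on

data = [float(n) for n in range(1, 1001)]
by_divide = scale_with_divides(data, 10.0)
by_reciprocal = scale_with_reciprocal(data, 10.0)
```

The two lists agree to within rounding of the last bit, so for practical purposes the results are the same, though not always bit-for-bit.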

Wednesday, April 11, 2018

First CoCo fast multiply results

After much fiddling with the looped version of the multiply, it still wasn't working and wasn't much smaller than the non-looped code, so I dumped it and finished the first version for the CoCo 3.
It still has a bug impacting accuracy, but it's close enough to give us an idea of how it will perform.  The CoCo 3 turned in a time of 45 seconds on Ahl's benchmark in high speed mode. 

Patching the CoCo ROM Part 7

Spent a little time working on a looping version of the multiply code in order to fit it into the same number of bytes as the original code.  Instead of making a patch program I just stuck the bytes directly into the ROM file with a hex editor.

The first version fit once part of it replaced the MICROSOFT text at the end of the sine coefficient table, but the results were wrong.  The fix pushed it over the available space by a few bytes.  Two or three opcodes need to be eliminated in order to fit.  I'll sleep on it and see if I can cut the size a little more.


Monday, April 9, 2018

Patching the CoCo ROM Part 6

The multiply code that will be patched is located at $BB03 in the Color BASIC ROM.
It multiplies the FPA1 mantissa by accumulator B, and adds the result to FPA2.

The original code (below) is 43 bytes in length.  I'm working on a looping version using the hardware multiply.  While it will be much faster than the current ROM, it won't be quite as fast as the code from the MC-10.  It just has to fit into those 43 bytes if it's going to work.

;* MULTIPLY FPA1 MANTISSA BY ACCB AND
;* ADD PRODUCT TO FPA2 MANTISSA
LBB03   LDA FPA2        ; GET FPA2 MS BYTE
        RORB            ; ROTATE CARRY FLAG INTO SHIFT COUNTER;
;*                        DATA BIT INTO CARRY
        BEQ LBB2E       ; BRANCH WHEN 8 SHIFTS DONE
        BCC LBB20       ; DO NOT ADD FPA1 IF DATA BIT = 0
        LDA FPA2+3      ; * ADD MANTISSA LS BYTE
        ADDA FPA1+3     ; *
        STA FPA2+3      ; *
        LDA FPA2+2      ; = ADD MANTISSA NUMBER 3 BYTE
        ADCA FPA1+2     ; =
        STA FPA2+2      ; =
        LDA FPA2+1      ; * ADD MANTISSA NUMBER 2 BYTE
        ADCA FPA1+1     ; *
        STA FPA2+1      ; *
        LDA FPA2        ; = ADD MANTISSA MS BYTE
        ADCA FPA1       ; =
LBB20   RORA            ; * ROTATE CARRY INTO MS BYTE
        STA FPA2        ; *
        ROR FPA2+1      ; = ROTATE FPA2 ONE BIT TO THE RIGHT
        ROR FPA2+2      ; =
        ROR FPA2+3      ; =
        ROR FPSBYT      ; =
        CLRA            ; CLEAR CARRY FLAG
        BRA LBB03       ; KEEP LOOPING
LBB2E   RTS
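To check that a hardware multiply replacement computes the same thing as the loop above, it helps to model both. This is a Python sketch with made-up names (shift_add_lbb03, mul_based) and my reading of the routine: each call leaves FPA2:FPSBYT holding FPA2 + B x FPA1 as a single 40-bit value, with FPA1 and FPA2 treated as 32-bit mantissas. The MUL version does four 8x8 products, one per mantissa byte. That is the general shape of the hardware multiply approach, not the exact code in the patch.

```python
def shift_add_lbb03(fpa2, fpa1, b):
    """Bit-serial model of LBB03: returns (new FPA2, FPSBYT)."""
    acc, sbyt = fpa2, 0          # the old FPSBYT bits all get shifted out anyway
    for i in range(8):           # RORB feeds the multiplier bits, LS bit first
        total = acc + fpa1 if (b >> i) & 1 else acc   # up to 33 bits (carry)
        sbyt = ((total & 1) << 7) | (sbyt >> 1)        # ROR FPA2+3 / ROR FPSBYT
        acc = total >> 1                               # RORA / ROR FPA2+1..+3
    return acc, sbyt

def mul_based(fpa2, fpa1, b):
    """Same result from four 8x8 products, one per FPA1 mantissa byte."""
    total = fpa2
    for shift in (0, 8, 16, 24):                        # LS byte first
        total += (b * ((fpa1 >> shift) & 0xFF)) << shift   # one MUL each
    return total >> 8, total & 0xFF

print(shift_add_lbb03(0x12345678, 0xDEADBEEF, 0x55) == mul_based(0x12345678, 0xDEADBEEF, 0x55))
```

The bit-serial version goes around its loop eight times per multiplier byte, while the MUL version needs only four multiplies and some adds, which is where the speedup comes from.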

Patching the CoCo ROM Part 5

To keep the patch for the new parser address small, I'm going to attempt to patch everything in the ROM area that matches a call to CHRGET or CHRGOT.
Fraught with peril may be the best way to describe this, since it may patch code or data that just happens to contain the same byte sequence.  However, a list of exclusions not to patch may take less code than a list of addresses to patch.  CHRGET has 32 hits, and CHRGOT has 29, in the Color BASIC disassembly alone.

The ROM would be patched by simply looking for the opcode of a JSR or JMP to the direct page addresses $9F (CHRGET) and $A5 (CHRGOT).  The opcode for a JSR using direct addressing is $9D, and the opcode for a direct JMP is $0E.  So it's just a matter of searching through the ROM in a loop for those values followed by $9F or $A5.  To make sure it won't patch something it shouldn't, I'll search for those byte sequences in the ROM binaries with a hex editor, then look at what is at each address in the ROM disassembly.  Anything that shouldn't be patched will be added to a list of addresses to exclude.
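Here's that search-and-patch loop as a Python sketch, handy for checking hits against the disassembly before writing the 6809 version. find_calls and patch_calls are hypothetical names, and the new targets are assumptions: $F3 for CHRGET and $F7 for CHRGOT, which is my count of the instruction sizes in the revised Part 4 layout.

```python
JSR_DIRECT = 0x9D   # JSR <addr
JMP_DIRECT = 0x0E   # JMP <addr
# Old direct page entry point -> hypothetical new entry point.
# $F3/$F7 are my reading of the revised Part 4 layout (assumption).
RETARGET = {0x9F: 0xF3,   # CHRGET
            0xA5: 0xF7}   # CHRGOT

def find_calls(rom, base):
    """Return ROM addresses where a JSR/JMP <$9F or <$A5 byte pair appears."""
    return [base + i for i in range(len(rom) - 1)
            if rom[i] in (JSR_DIRECT, JMP_DIRECT) and rom[i + 1] in RETARGET]

def patch_calls(rom, base, exclude=frozenset()):
    """Retarget every match not on the exclusion list; return (new_rom, patched)."""
    new_rom = bytearray(rom)
    patched = []
    for addr in find_calls(rom, base):
        if addr in exclude:          # data or an operand that only looks like a call
            continue
        i = addr - base
        new_rom[i + 1] = RETARGET[rom[i + 1]]
        patched.append(addr)
    return bytes(new_rom), patched
```

Running find_calls over a ROM dump and comparing the count with the calls listed in the disassembly would be a quick sanity check before building the exclusion list.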

Thursday, April 5, 2018

Patching the CoCo ROM Part 4

*REVISED*


While this requires patching every call to CHRGET and CHRGOT with the new addresses, it will allow the CoCo parser patch to be as fast as the MC-10 version, except when the low byte of the pointer wraps past a 256-byte page boundary.  255 out of 256 cases is good enough for me.

I looked at calling a function to automatically patch calls from the ROM, but there isn't enough room for the jumps to the rest of the code, even if there are no JMPs to CHRGET in the code.

        org $9F
OLDCHRGET
        bra CHRGET      ; catch any unpatched calls to CHRGET
INCHIGHBYTE             ; code to increment the high byte of the current address being parsed
        inc CHRGOT+1
        bra CHRGOT
OLDCHRGOT
        bra CHRGOT      ; catch any unpatched calls to CHRGOT

        org $F3
CHRGET                  ; increment BASIC code pointer, then fall through to CHRGOT
        inc CHRGOT+2
        beq INCHIGHBYTE
CHRGOT                  ; read the current byte of BASIC code we are pointing at
        lda >0000       ; self modifying code, the actual read takes place here

*LINE OF CODE IS MISSING HERE*

        bcs PARSER2
        rts
PARSER2
        jmp >BROMHK+4


*edit*
Always double check your code after cutting and pasting!  This is one of the most common causes of bugs in a program.  Doing so saved me a lot of headaches with this code.

Please note that after adding the missing line of code, this patch is too large for the available space at $F3 without changes.  That prompted this revision.  It no longer catches calls to the old CHRGET and CHRGOT addresses, but there is room to catch one of them if I decide to do so.


        org $9F
INCHIGHBYTE             ; code to increment the high byte of the current address being parsed
        inc CHRGOT+1
        bra CHRGOT
PARSER2
        jmp >BROMHK+4   ; test for other characters

        org $F3
CHRGET                  ; increment BASIC code pointer, then fall through to CHRGOT
        inc CHRGOT+2
        beq INCHIGHBYTE
CHRGOT                  ; read the current byte of BASIC code we are pointing at
        lda >0000       ; self modifying code, the actual read takes place here
        cmpa #':'       ; carry clear if ':' or larger; Z set only for ':'
        bcs PARSER2     ; below ':' (digits, spaces, etc.): finish in ROM
        rts

Patching the CoCo ROM Part 3

Super Extended BASIC seems to have 1012 unused bytes starting at $FA0C according to the disassembly.  The hardware multiply, and code to multiply by 1/10 should fit there.  There should be room for a few more patches.

Patching the CoCo ROM Part 2

After some research, it appears that the CHRGET and CHRGOT functions are followed immediately by system variables.  This means the default location cannot be used.
There are 13 unused bytes at $F3, but that isn't enough space for the full patch.  The shorter version could be moved there, though, and the JMP #### could be replaced with a BRA ##.  It would be faster, but not as fast as the MC-10 patch, due to the extra BRA.  It still saves clock cycles, but I'm guessing the gain will be less than 1%, where the MC-10 patch saved 1.2% or more.

        org $A8
        bra $F3

        org $F3
        cmpa #'9+1      ; IS THIS CHARACTER >=(ASCII 9)+1?
        bcs PARSER2     ; (BHS is the same as BCC, reverse is BCS)
        rts             ; return if character >= (ASCII 9) +1
PARSER2
        jmp >BROMHK+4
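A rough way to compare the common path (the pointer's low byte doesn't wrap, and the character is ':' or above, so the test falls through to RTS) is to add up cycle counts. This Python sketch uses the standard 6809 cycle counts as I understand them (INC direct 6, short branches 3, LDA extended 5, JMP extended 4, CMPA immediate 2, RTS 5), so check them against a datasheet before trusting the totals. The path names are mine, and the revised layout from Part 4 is included for comparison.

```python
# Assumed 6809 cycle counts for the instructions on these paths
CYCLES = {
    "INC <dp": 6,    # INC direct
    "Bcc": 3,        # any short branch (BNE, BEQ, BHS, BCS, BRA), taken or not
    "LDA >ext": 5,   # LDA extended
    "JMP >ext": 4,   # JMP extended
    "CMPA #imm": 2,  # CMPA immediate
    "RTS": 5,
}

# Common path: no page wrap, character is ':' or above
PATHS = {
    "original ROM":     ["INC <dp", "Bcc", "LDA >ext", "JMP >ext", "CMPA #imm", "Bcc", "RTS"],
    "Part 2 (BRA $F3)": ["INC <dp", "Bcc", "LDA >ext", "Bcc", "CMPA #imm", "Bcc", "RTS"],
    "Part 4 revised":   ["INC <dp", "Bcc", "LDA >ext", "CMPA #imm", "Bcc", "RTS"],
}

def path_cycles(path):
    return sum(CYCLES[op] for op in path)

for name, path in PATHS.items():
    print(f"{name:18} {path_cycles(path):2} cycles")
```

If these counts are right, the BRA version saves 1 cycle per call out of 28, which fits the guess of a small overall gain. Characters below ':' take an extra jump in this layout, so they get a few cycles slower.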


A rewrite of the ROM would let us move CHRGET to right before $F3 and the entire function could fit.  But compatibility would be sacrificed.  Why Microsoft didn't do this in the first place we'll never know, but it seems odd.

Sunday, April 1, 2018

Patching the CoCo ROM Part 1

The TRS-80 Color Computer's Color BASIC ROM is quite similar to the MC-10's Microcolor BASIC.  The major flaws found in the latter also seem to be in the former.

(this has been revised since the first posting)
The parser uses the same CHRGET/CHRGOT functions to step through programs.  The CHRGET code is stored in the ROM at $A10D, right after a series of other bytes that are copied to direct page RAM at the same time.  The code is copied to address $9F, and CHRGOT simply skips incrementing the address so the interpreter can reload the current byte being examined by the parser.  After reading a character, it jumps to the ROM code at $AA19 to test the character and decide how to process it.

CHRGET
; increment the BASIC parser's pointer
        INC CHARAD+1
        BNE LA123
        INC CHARAD
CHRGOT
; load the current character
LA123   LDA >0000
        JMP >BROMHK



ROM portion at $AA19

BROMHK
        CMPA #'9+1      ; IS THIS CHARACTER >=(ASCII 9)+1?
        BHS LAA28       ; BRANCH IF > 9; Z SET IF = COLON
        CMPA #SPACE     ; SPACE?
        BNE LAA24       ; NO - SET CARRY IF NUMERIC
        JMP GETNCH      ; IF SPACE, GET NEXT CHAR (IGNORE SPACES)
LAA24   SUBA #'0        ; * SET CARRY IF
        SUBA #-'0       ; * CHARACTER > ASCII 0
LAA28   RTS
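The flag results are easy to lose track of, so here's a small Python model of what CHRGET hands back after BROMHK: carry set for a digit, Z set for ':' or for the 0 byte that ends a line, and spaces skipped. The names are mine, and the model ignores everything except A and the C and Z flags.

```python
COLON, SPACE, ZERO = ord(':'), ord(' '), ord('0')

def bromhk_flags(a):
    """Return (carry, zero) as BROMHK leaves them for a non-space byte a."""
    if a >= COLON:                        # CMPA #'9+1 / BHS: carry clear
        return False, a == COLON          # Z set only when a == ':'
    # SUBA #'0 then SUBA #-'0 puts A back where it was; the second
    # subtract borrows (sets carry) exactly when a >= '0', i.e. a digit here
    carry = ((a - ZERO) & 0xFF) < ((-ZERO) & 0xFF)
    return carry, a == 0                  # Z set for the 0 that ends a line

def chrget(buf, pos):
    """Step past pos, skipping spaces; return (pos, char, carry, zero)."""
    pos += 1                              # INC CHARAD+1 (and CHARAD on wrap)
    while buf[pos] == SPACE:              # JMP GETNCH loops back on spaces
        pos += 1
    carry, zero = bromhk_flags(buf[pos])
    return pos, buf[pos], carry, zero

line = b"A 1:\x00"
print(chrget(line, 0))
```

Anything at or above ':' (letters and tokens) never needs the digit test, which is why the patched versions can return early on that case.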



Patching the code to reduce or eliminate jumps back and forth between CHRGET and the ROM simply requires overwriting the JMP opcode with part or all of the ROM code, adapted for the short branch back to CHRGET.  This also overwrites whatever is on the direct page up to the end of the patch.  Anything that uses those addresses, including BASIC, Extended BASIC, and Super Extended BASIC, is likely to crash the computer.  There is no guarantee either of these patches will work.

First approach, the entire function is in RAM:

        org $9F
CHRGET
        org $a8

        CMPA #'9+1      ; IS THIS CHARACTER >=(ASCII 9)+1?
        BHS EXIT        ; BRANCH IF > 9; Z SET IF = COLON (BCC)
        CMPA #SPACE     ; SPACE?
        BNE ASCII       ; NO - SET CARRY IF NUMERIC
        BRA CHRGET      ; IGNORE SPACES
ASCII   SUBA #'0        ; * SET CARRY IF
        SUBA #-'0       ; * CHARACTER > ASCII 0
EXIT    RTS



Second approach, returns on most likely condition, only uses 3 additional bytes:

        org $9F
CHRGET
        org $a8

        cmpa #'9+1      ; IS THIS CHARACTER >=(ASCII 9)+1?
        bcs PARSER2     ; (BHS is the same as BCC, reverse is BCS)
        rts             ; return if character >= (ASCII 9) +1
PARSER2
        jmp >BROMHK+4   ;a9-ac