Wednesday, December 13, 2017

Another patch to speed up the Plus/4 ROM.

There is a group of functions that the Plus/4 ROM copies to RAM on startup.  Each of these allows access to all of RAM, including the RAM under ROM, via a different pointer stored on page 0 (the direct page in Motorola terms).

Each of these functions disables interrupts, pages out the ROM, loads A via a pointer on page zero, pages in ROM, enables interrupts, and returns to the caller.  It's a lot of clock cycles to access a single byte.

For programs and data that are small enough to fit in memory without using the RAM under ROM, we can patch the ROM to skip this costly sequence of instructions and access memory directly.

The patch must copy ROM to RAM, set the highest address available to BASIC so that it is below the start address of the ROM, and then install the code listed below.

Each of these functions is replaced with a short stub that loads A with the page zero address that function uses and then jumps to a common patch routine.  The patch routine overwrites the calling code in our RAM copy of the ROM so that it loads A directly, without the intermediate call.  Each JSR in ROM occupies 3 bytes: the JSR opcode plus the 2-byte address to call.  The LDA (),Y instruction takes only 2 bytes, so we must overwrite the 3rd byte with a NOP.  With this approach, every call to the load routines is patched the first time it is executed.
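A minimal Python model of the patch step, useful for checking the address arithmetic.  The 64K memory array, the call site at $9000, the JSR target $0500 and the page zero pointer $3B are made-up values; only the opcodes ($20 JSR, $B1 LDA (),Y, $EA NOP) and the 6502 rule that JSR pushes the address of its own last byte are fixed facts.  This is a sketch of the idea, not the ROM patch itself.

```python
# Model of the self-patching call site (hypothetical addresses, see above).
JSR, LDA_IND_Y, NOP = 0x20, 0xB1, 0xEA

def patch_call_site(mem: bytearray, pushed: int, zp: int) -> int:
    """Rewrite the JSR whose pushed return address is `pushed`.

    A 6502 JSR pushes the address of its own last byte (JSR + 2).
    Returns the value to push back so that RTS, which adds 1,
    resumes at the first byte of the patched code.
    """
    site = (pushed - 2) & 0xFFFF           # address of the JSR opcode
    if mem[site] != JSR:
        raise ValueError("return address does not follow a JSR")
    mem[site] = LDA_IND_Y                  # 1st byte: LDA (zp),Y opcode
    mem[site + 1] = zp                     # 2nd byte: page zero pointer
    mem[site + 2] = NOP                    # 3rd byte: pad with NOP
    return (site - 1) & 0xFFFF             # RTS adds 1 -> lands on site

mem = bytearray(0x10000)                   # 64K of zeroed memory
mem[0x9000:0x9003] = bytes([JSR, 0x00, 0x05])  # hypothetical JSR $0500
resume = patch_call_site(mem, 0x9002, 0x3B)
print(mem[0x9000:0x9003].hex(), hex(resume + 1))  # b13bea 0x9000
```

Because the call site is rewritten in place, the next pass through the same code executes LDA ($3B),Y directly and never reaches the stub again.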

The resulting patched code requires only 7 clock cycles (5 for the LDA (),Y, plus 1 if the read crosses a page boundary, and 2 for the NOP) instead of the 29 clock cycles the regular ROM requires.  The beauty of this approach is that it patches every call to these functions without us having to find them all.
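A quick tally with standard NMOS 6502 timings can serve as a sanity check.  It assumes each RAM routine consists of SEI, an STA to the ROM-off register, LDA (),Y, an STA to the ROM-on register, CLI and RTS (an assumption about the exact instructions; the real routines may differ slightly).  Under that assumption it reproduces the 29-cycle figure, while the patched LDA (),Y plus NOP comes to 7 cycles, or 8 when the indexed read crosses a page boundary.

```python
# Rough cycle tally using standard NMOS 6502 timings.
# Assumption: each RAM routine is SEI, STA abs, LDA (zp),Y, STA abs, CLI, RTS.
ORIGINAL = [("JSR abs", 6), ("SEI", 2), ("STA abs (ROM off)", 4),
            ("LDA (zp),Y", 5), ("STA abs (ROM on)", 4), ("CLI", 2),
            ("RTS", 6)]
PATCHED = [("LDA (zp),Y", 5), ("NOP", 2)]  # +1 cycle on a page crossing

def cycles(seq):
    """Sum the cycle counts of a list of (instruction, cycles) pairs."""
    return sum(c for _, c in seq)

print(cycles(ORIGINAL), cycles(PATCHED))   # 29 7
```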

Warning: This assumes that the ROM never reaches these functions with a JMP in place of a JSR followed by an RTS (a tail call, which saves an RTS).  If we discover that technique was used somewhere, we must identify that call site by hand and place an RTS after the LDA (),Y instead of a NOP, since a JMP pushes no return address and the patcher would rewrite the wrong instruction.

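; stub routine replacing each RAM routine the ROM calls
LDA #address ; load A with the page zero address normally used by LDA (address),Y
JMP PATCH ; jump to the patcher (the ROM's JSR return address stays on the stack)

; code to patch the ROM where it calls functions to access RAM under ROM
; TEMP, SAVEY and RETURNADDRESS are free page zero bytes (SAVEY is a hypothetical name)
PATCH:
STA TEMP ; save page zero address to use
STY SAVEY ; preserve the caller's Y, the patched LDA (),Y needs it

; point to code we want to patch
;  JSR pushes (address of JSR)+2, so subtracting 3 gives (address of JSR)-1,
;  which is also the value RTS needs to resume at the JSR address
PLA ; get LSB of return address
SEC ; no borrow in
SBC #3 ; subtract 3 (point to the byte before the JSR)
STA RETURNADDRESS ; store it in our own page 0 pointer

;  Get MSB of return address from stack and adjust it for the borrow
PLA ; get MSB of return address
SBC #0 ; subtract the borrow (carry clear) from the LSB subtraction
STA RETURNADDRESS+1 ; save it in our pointer

; Push the adjusted address back onto the stack for RTS, MSB first
PHA ; push MSB
LDA RETURNADDRESS ; get LSB
PHA ; push LSB

; patch 1st byte with LDA (),Y opcode
LDY #1 ; the JSR opcode is at RETURNADDRESS+1
LDA #$B1 ; load the opcode we want to patch with
STA (RETURNADDRESS),Y ; overwrite the JSR opcode

; patch 2nd byte with address passed from stub routine
INY ; next address
LDA TEMP ; get the address that was passed to us
STA (RETURNADDRESS),Y ; overwrite LSB of the JSR target

; patch 3rd byte with NOP
INY ; next address
LDA #$EA ; NOP to finish the patch
STA (RETURNADDRESS),Y ; overwrite MSB of the JSR target

LDY SAVEY ; restore the caller's Y
RTS ; "return" into the freshly patched code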

Thursday, December 7, 2017

Status update of a few projects

The USB serial adapter finally showed up so I can start working on the 68HC11 port of BASIC.  The IDE port is going to be on hold until I get that up and running.

The comments for the VZ disassembly are ready to go; I just need to extract the ones that match the VZ ROM and put them in the sed file.  Then it's just a matter of commenting a few pieces of code VTEC added to the ROM.  One of the things I ran across when benchmarking these old machines is how horribly slow VZ BASIC is.  Even though the VZ is clocked faster than the TRS-80 Model III and they share a lot of code, the VZ takes almost twice as long on benchmarks.  The patches VTEC made clearly didn't help it.  Once I convert the disassembly back to a source file, I should be able to run a code profiler on it to find the biggest bottleneck and fix it.  There's plenty of unused space for fixes.

Kicking a dead horse... Commodore Plus/4 style

Here's a little patch to speed up the Commodore Plus/4.  This modifies the RAM-based CHRGOT function that is used to scan through the BASIC code.  The standard code disables interrupts, pages out the ROM, reads a byte, pages in the ROM, and enables interrupts for every byte of a program it reads.  It does this so that it can provide up to 60K for BASIC.  This is certainly a nice feature if you need that much RAM, but if you don't, it slows down programs significantly for no benefit.

This simple piece of code speeds up one benchmark by about 4%.  That is a pretty significant gain for a few hours' work, and it requires no changes to the ROM.  Getting this much extra speed out of the MC-10 was a lot harder and required a new ROM.  Actual performance increases will vary by program.  Still, I had hoped for better results.

Only programs that fit in memory below the start address of ROM will work with this as it eliminates the code that pages ROM in and out.  It makes no attempt to modify system variables to restrict code to that area, and it does not restrict the use of upper RAM for data.  Additional patches that restrict data to the same area of RAM would provide additional speed.

Here is the original code used by the Plus/4 BASIC interpreter from a ROM disassembly.  We are most interested in the code starting at $8129 in the ROM.  This is copied to RAM on startup:

        ; CHRGET/CHRGOT - This chunk of code is copied to RAM 
        ; and run from there. It is used to get data UNDER the 
        ; system ROM's for basic.
; CHRGET ($0473)
L8123   INC   LastBasicLineNo         ; $3b (goes to $0473 ) CHRGET
        BNE   L8129
        INC   LastBasicLineNo+1       ; $3c
; CHRGOT ($0479)
L8129   SEI    
        STA   RAM_ON
        LDY   #$00
        LDA   (LastBasicLineNo),y     ; $3b
        STA   ROM_ON
        CLI
        CMP   #$3A   ; ":" (colon)
        BCS   L8143   ; if colon or higher, exit
        CMP   #$20   ; " " (space)
        BEQ   L8123   ; if space, get NEXT byte from basic
        SEC
        SBC   #$30
        SEC
        SBC   #$D0
L8143   RTS    

This is the new CHRGOT function.  Its code is embedded in the .BYTE section of the patch that follows this listing.  Note that the code is designed to exit without taking a branch for the most commonly found type of byte (values of $3A and up, such as keyword tokens and letters).  This saves a clock cycle for every such byte, because a taken branch requires one more clock cycle than one that is not taken.

00000r 1                .ORG $0473
000473  1               
000473  1                ;
000473  1                ; CHRGET/CHRGOT - This chunk of code is copied to RAM
000473  1                ; and run from there. It is used to get data UNDER the
000473  1                ; system ROM's for basic.
000473  1                ;
000473  1                ; CHRGET ($0473)
000473  1               L8123:
000473  1  E6 3B         INC LastBasicLineNo ; $3b (goes to $0473 ) CHRGET
000475  1  D0 02         BNE L8129
000477  1  E6 3C         INC LastBasicLineNo+1 ; $3c
000479  1                ;
000479  1                ; CHRGOT ($0479)
000479  1                ;
000479  1               L8129:
000479  1  A0 00         LDY #$00
00047B  1  B1 3B         LDA (LastBasicLineNo),y ; $3b
00047D  1  C9 3A         CMP #$3A ; Larger than $3A?
00047F  1  90 01         BCC NEXT ; if not, skip to NEXT
000481  1  60            RTS ; return if so
000482  1               NEXT:
000482  1  E9 2F         SBC #$2F ; A=A-$30
000484  1  C9 F0         CMP #$F0 ; Is it a " "? (space)
000486  1  F0 EB         BEQ L8123 ; if space, get NEXT byte from basic
000488  1  38            SEC ; A=A-$D0
000489  1  E9 D0         SBC #$D0 ; clear carry if digit, set otherwise
00048B  1  60            RTS
00048C  1               
00048C  1                .end
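One way to convince yourself the new routine returns the same result as the stock one is to model both in Python and compare them for every byte value.  This sketch models only A, carry and zero (the ROM paging is ignored), and it assumes the stock routine has an SEC before each SBC, as the C64 version does.  The "space" marker stands for the branch back to CHRGET.

```python
# Compare stock and patched CHRGOT exit logic for all 256 byte values.
SPACE = "space"  # marker: routine loops back to CHRGET to skip a space

def sbc(a, m, carry):
    """6502 binary-mode SBC: returns (result, carry_out)."""
    r = a - m - (0 if carry else 1)
    return r & 0xFF, r >= 0

def chrgot_stock(a):
    if a >= 0x3A:                      # CMP #$3A / BCS -> exit
        return a, True, a == 0x3A
    if a == 0x20:                      # CMP #$20 / BEQ -> skip space
        return SPACE
    a, c = sbc(a, 0x30, True)          # SEC / SBC #$30
    a, c = sbc(a, 0xD0, True)          # SEC / SBC #$D0
    return a, c, a == 0

def chrgot_patched(a):
    if a >= 0x3A:                      # CMP #$3A / BCC not taken -> RTS
        return a, True, a == 0x3A
    a, c = sbc(a, 0x2F, False)         # carry clear: SBC #$2F subtracts $30
    if a == 0xF0:                      # CMP #$F0 / BEQ -> it was a space
        return SPACE
    a, c = sbc(a, 0xD0, True)          # SEC / SBC #$D0
    return a, c, a == 0

mismatches = [b for b in range(256) if chrgot_stock(b) != chrgot_patched(b)]
print(mismatches)               # []
print(chrgot_patched(0x35))     # (53, False, False): a digit clears carry
```

The only byte below $3A that lands on $F0 after subtracting $30 is $20, which is why the single CMP #$F0 is enough to catch spaces.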

This is the source code for the program that patches the CHRGOT function.  It is designed to be embedded in a REM statement in the first line of a BASIC program.  Note that the 2nd byte of the new CHRGOT code (the operand of LDY) has been changed from $00 to $01, and it is patched back to $00 once the routine has been copied to its final destination.  Once Microsoft BASIC starts to parse a line, it does not use the pointer stored at the start of the line to advance to the next one.  It scans for the end-of-line marker, which is $00, and assumes that anything following it is the next line of BASIC code.  Storing that byte as a nonzero value is therefore required so BASIC can skip past the REM to the next line every time the program runs.

000000r 1               
000000r 1                .org $1006 ; The address of the REM text in line 0 of our BASIC program
001006  1               
001006  1  A0 12         LDY #18 ; Starts at CHRGOT+18 and works down...
001008  1               NEXT:
001008  1  B9 17 10      LDA CHRGOT,Y ; Get byte of new CHRGOT routine
00100B  1  99 79 04      STA $0479,Y ; Save it over the old routine
00100E  1  88            DEY ; decrement our loop counter/index register
00100F  1  10 F7         BPL NEXT
001011  1  C8            INY ; Y wraps from $FF back to 0
001012  1  98            TYA ; A = 0
001013  1  99 7A 04      STA $047A,Y ; patch LDY #$01 back to LDY #$00
001016  1  60            RTS ; return to BASIC
001017  1               CHRGOT:
001017  1  A0 01 B1 3B   .BYTE $A0,$01,$B1,$3B,$C9,$3A,$90,$01,$60,$E9,$2F,$C9,$F0,$F0,$EB,$38,$E9,$D0,$60
00101B  1  C9 3A 90 01  
00101F  1  60 E9 2F C9  
00102A  1                .end

This is the final BASIC code containing the patch.  It can be added to smaller programs to speed them up.  After the program has been run for the first time, the lines containing the DATA statements and line 1 can be deleted.  The resulting program can be saved with the patch permanently embedded in the REM statement.

0 REM012345678901234567890123456789012345
2 SYS 4102

10000 DATA 160,18,185,23,16,153,121,4,136,16,247,200,152,153,122,4,96
10010 DATA 160,01,177,59,201,58,144,1,96,233,47,201
10020 DATA 240,240,235,56,233,208,96
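The DATA statements can be cross-checked against the assembler listing with a few lines of Python.  This sketch assumes BASIC text starts at $1001 (the Plus/4 default without a graphics screen), so line 0's REM text begins after the 2-byte link, the 2-byte line number and the 1-byte REM token.  It also confirms that none of the bytes is $00, which would end the line early.

```python
# Cross-check the DATA statements against the listing and the REM length.
PATCHER = [0xA0, 0x12, 0xB9, 0x17, 0x10, 0x99, 0x79, 0x04, 0x88,
           0x10, 0xF7, 0xC8, 0x98, 0x99, 0x7A, 0x04, 0x60]
CHRGOT = [0xA0, 0x01, 0xB1, 0x3B, 0xC9, 0x3A, 0x90, 0x01, 0x60, 0xE9,
          0x2F, 0xC9, 0xF0, 0xF0, 0xEB, 0x38, 0xE9, 0xD0, 0x60]
DATA = [160, 18, 185, 23, 16, 153, 121, 4, 136, 16, 247, 200, 152, 153, 122, 4, 96,
        160, 1, 177, 59, 201, 58, 144, 1, 96, 233, 47, 201,
        240, 240, 235, 56, 233, 208, 96]
REM_TEXT = "012345678901234567890123456789012345"

# Assumed BASIC start $1001: link (2) + line number (2) + REM token (1)
rem_start = 0x1001 + 2 + 2 + 1
print(rem_start)                            # 4102, the SYS address
print(DATA == PATCHER + CHRGOT)             # True
print(len(DATA) == len(REM_TEXT))           # True: 36 bytes each
print(0 in DATA)                            # False: no early end of line
print(rem_start + len(PATCHER) == 0x1017)   # True: the CHRGOT label
```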

Here is the benchmark that prompted me to write this.

10 K=0:I=0:T=0:P=0
100  PRINT "Prime Number Generator"
110  INPUT "Upper Limit";N

120  eTime=TIME
130  T=(N-3)/2
140  DIMA(T+1)

260 NEXT

330  eTime=(TIME-eTime)/60
340  PRINT
350  PRINT "Total: ";eTime
360 END

This will speed up the benchmark by over 30%.  It disables the screen refresh while the benchmark is running, since the TED's screen refresh normally steals about that many clock cycles from the CPU.  Add these two lines to the benchmark:

115 POKE65286,PEEK(65286)AND239
335 POKE65286,PEEK(65286)OR16
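The two POKEs are plain bit masking on TED register $FF06 (65286): AND 239 clears bit 4 (value 16), the display-enable bit, and OR 16 sets it again.  The sketch below just shows that arithmetic; the starting value $1B is an arbitrary sample, not necessarily what the register holds on your machine.

```python
# Bit arithmetic behind lines 115 and 335 (sample register value assumed).
TED_CTRL1 = 0xFF06            # 65286
DISPLAY_ENABLE = 1 << 4       # bit 4, value 16

def blank(value):
    """115 POKE65286,PEEK(65286)AND239: clear the display-enable bit."""
    return value & 239

def unblank(value):
    """335 POKE65286,PEEK(65286)OR16: set the display-enable bit again."""
    return value | 16

print(TED_CTRL1 == 65286, 239 == (0xFF & ~DISPLAY_ENABLE))  # True True
print(hex(blank(0x1B)), hex(unblank(blank(0x1B))))          # 0xb 0x1b
```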

Thursday, November 2, 2017

Just posting a link to a simple music player I wrote for the Apple II Mockingboard a few years ago.
It's written for the 6502 AS65 assembler that comes with the CC65 C compiler.
The Mockingboard has two 6522 VIA chips and two General Instrument AY sound chips.  The VIAs are used to interface with the sound chips and provide additional features such as programmable interrupts.

The player uses a repeating timer interrupt from the VIA chip to play music with minimal impact on the main program that uses it.  It's a port of a music player I wrote for the Oric a few years earlier.  The Oric has similar hardware out of the box, but only one VIA and one AY chip.  That version used a one-shot timer, though, because I didn't have VIA docs at the time.

The interrupt handler is pretty low impact.  It decrements the timer and exits quickly if no sound chip register needs to be set.  It just dumps raw data to the sound chip when required.  The data format is documented in the source code, and the code is pretty well commented.
It's not very practical for large pieces of music due to the lack of repeats and resulting size of the data, but for simple background music that repeats endlessly, or one shot sound effects it works really well.  The code still needs some work in order to play sound samples and music at the same time, but that's just a matter of adding additional counters for each sound.
The code includes sample data that plays chords, and background music for several screens from Donkey Kong.  The emulator disk image should run a demo that lets you select which song to play.

To build the code you will need the AS65 assembler.  I used a simple DOS batch file to build it instead of a makefile.  If you use the batch file, it requires a2tools to automatically transfer the file to a disk image.  a2tools can be found on Asimov, but it's only available as a 32-bit executable.  Let me know if you need a 64-bit version and I'll post my custom build.  Or you can delete that line from the batch file and use CiderPress to transfer the file to a disk image.

FWIW, this is a good though simple example of how you can drive a piece of hardware in "real time".  You could output and/or sample data at specific intervals.  Just adjust the timer for the rate you need and remember to calculate the delay from the trigger of the interrupt until you actually read or write data so you don't drop any data or set a piece of hardware too late.
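Picking the timer value is simple arithmetic, sketched here in Python.  It assumes a nominal 1.023 MHz Apple II CPU clock and that 6522 Timer 1 in free-running mode interrupts every latch + 2 cycles; check the datasheet for your VIA before relying on exact timing.  The interrupt latency mentioned above is not modeled.

```python
# Pick a 6522 Timer 1 latch value for a periodic interrupt (assumptions above).
CPU_HZ = 1_023_000            # nominal Apple II CPU clock (assumed)

def t1_latch(rate_hz, cpu_hz=CPU_HZ):
    """Return (latch, actual_rate) for the requested interrupt rate."""
    latch = round(cpu_hz / rate_hz) - 2   # free-run period assumed latch + 2
    if not 0 <= latch <= 0xFFFF:
        raise ValueError("rate not reachable with a 16-bit timer")
    return latch, cpu_hz / (latch + 2)

latch, actual = t1_latch(60)
print(latch, hex(latch), round(actual, 3))   # 17048 0x4298 60.0
```

The latch is written low byte first, then high byte, to the VIA's Timer 1 registers; the rounding error in the actual rate is what you would compensate for over long runs.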

Here's the link. There is a newer version of the source code in my project directory, but this is a version I shared previously so it should be working.  If I get a chance to test the newer code and build a new archive I'll post that.

Wednesday, October 4, 2017

When all is sed and done...

Yeah it's a corny title for the post but nobody's been reading my blog anyway so who cares?

To generate the raw disassembly I simply had to define all the ROM entry points from the command line.  There are a lot of them, but it's pretty straightforward.  The disassembler follows jumps and calls (except for RST calls), and the rest is just slogging through the code to sort out what's real code and what's garbage.  Using the --xref option on the disassembler helps because we can see if it finds anything calling a given piece of code.  Because so many addresses had to be defined as entry points, a lot of that data is useless, but it still helps.

Adding labels and comments is a bit different.  Since labels (variables or functions) can occur in multiple places, we need to use pattern matching to save having to repeat the same operation over and over again.  Comments are only added to a single line each, but you need to find the address they are associated with and then append the comment to the end of the line.  So we have to generate the disassembly with the address at the start of each line, which is done using the --lst option.  One side effect of this is that it generates a huge call graph at the end of the disassembly, which slows down our pattern searching, and it really isn't needed due to the known design of the BASIC interpreter.  We can get rid of it manually, but why not just have a program do it?  We could write a custom program to do this, but there is no need, as the Unix sed command can already do all of this!  I haven't released the sed file I'm using yet, but I'll give a quick explanation of the commands I'm using to perform these actions.  This is no substitute for the sed manual or other references on the web!

sed can take commands from the command line, but since we are performing so many operations, they have all been placed in a single file.  sed reads that file when it is given the -f option followed by the name of the file containing our list of sed operations.

Since the raw disassembly needs the call graph removed from the end of the file to speed up the rest of the pattern matching (the call graph is huge), that's the first thing the sed script does.  Thankfully, the disassembler places an identifier at the head of the call graph that says "Call Graph:".
We simply tell sed to search for that string and delete everything from there to the end of the file with the d command.  The search pattern is placed between forward slash (divide) symbols, and the $ inside it anchors the match to the end of a line, so it matches a line ending in "Call Graph:".  The ,$ that follows extends the address range from that line to the last line of the file, and d deletes every line in that range.
/Call Graph:$/,$d
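For readers more comfortable with Python than sed, here is a rough equivalent of that one command, working on a list of lines.  It is only an illustration of what the sed range does, not part of the actual script, and the sample lines are made up.

```python
import re

def delete_from_call_graph(lines):
    """Rough Python equivalent of the sed command  /Call Graph:$/,$d

    Keeps the lines before the first line ending in 'Call Graph:';
    that line and everything after it are dropped.
    """
    marker = re.compile(r"Call Graph:$")
    for i, line in enumerate(lines):
        if marker.search(line):
            return lines[:i]
    return lines            # no marker found: sed deletes nothing either

sample = ["0C70: 21 2D 79  DBL_SUB: LD HL,792Dh", "; Call Graph:", "; 0C70 -> 0C77"]
print(delete_from_call_graph(sample))  # ['0C70: 21 2D 79  DBL_SUB: LD HL,792Dh']
```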

Now that the call graph is gone, sed needs to relabel functions and system variables with a global search and replace.  If we want to replace a function label, we search for L followed by the address the function resides at.  The s command tells sed to search for the string between the 1st and 2nd slashes and replace it with the string between the 2nd and 3rd slashes.  The g flag says to perform this globally, replacing every match on a line rather than just the first.  So everywhere a function is referenced, it will be given a meaningful label instead.

The same works for system variables, but those are just raw addresses and not function calls, so the disassembler labels them as a 4-digit hex number followed by h, signifying that it is a hexadecimal number.  Some functions reside in ROM but are copied to RAM, where the program actually calls them, so those must be labeled in both places: the ROM code is matched with the L prefix and the RAM routines with the h suffix.  Variables have the h suffix as well.
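The relabeling step can be pictured with a tiny Python sketch.  The label table is hypothetical; its two entries are read off the DBL_ADD listing further down (the routine at 0C77 and the WRA2 exponent at 792Eh, which the bytes 21 2E 79 load), and plain string replacement stands in for sed's s/old/new/g on strings that contain no special characters.

```python
# Hypothetical label table, one entry per  s/old/new/g  sed command.
LABELS = {
    "L0C77": "DBL_ADD",   # ROM routine: L prefix + address
    "792Eh": "EXP2",      # RAM variable: address + h suffix
}

def relabel(text, labels=LABELS):
    """Apply each literal search/replace to every match in the text."""
    for old, new in labels.items():
        text = text.replace(old, new)
    return text

line = "0C77: 21 2E 79   L0C77: LD HL,792Eh"
print(relabel(line))   # 0C77: 21 2E 79   DBL_ADD: LD HL,EXP2
```

The address column (0C77: without the L) and the raw bytes are left alone because they never match the L-prefixed or h-suffixed forms.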

The last (so far) function we need to perform is to append comments to the end of lines based on address.  Most of the work here was extracting the comments from the PDF of Level II BASIC Decoded and Other Mysteries.  Once the code that is borrowed from Level II BASIC is identified, we can just append the comments to the proper lines using the address.  Again, we need to perform a pattern search, but this time we append the comment to the end of the line.  The pattern between the first pair of slashes selects the line by its hexadecimal address, then s/$/.../ substitutes the comment for the empty string at the end of the line ($), which appends it.  The --lst option gives us the hex addresses followed by the colon for each line, and this provides a unique pattern we can match with.  Again, strings are contained between / slash symbols.  In case you haven't already noticed, this also means our strings cannot contain an unescaped slash symbol.  There are other special characters as well, but I won't go into them.  Just watch the use of special characters that sed looks for.

/0013 /s/$/        ;--- Save BC - Keyboard routine/
/0014 /s/$/        ;--- B = Entry code/
/0016 /s/$/        ;--- Go to driver entry routine (3C2)/
/0018 /s/$/        ;--- RST 18 (JP 1C90H) Compare DE:HL/
/001B /s/$/        ;--- Save BC - Display routine, printer routine/
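Here is a Python sketch of what these append commands do.  It mirrors the patterns exactly as written above (the address followed by a space, matched anywhere on the line) and pads with the same eight spaces; the two sample listing lines are illustrative, not taken from the actual disassembly output.

```python
# Address -> comment table, one entry per  /ADDR /s/$/        ;--- .../  command.
COMMENTS = {
    "0013": ";--- Save BC - Keyboard routine",
    "0014": ";--- B = Entry code",
}

def append_comments(lines, comments=COMMENTS, pad=" " * 8):
    """On every line containing 'ADDR ', append the padded comment."""
    out = []
    for line in lines:
        for addr, note in comments.items():
            if addr + " " in line:
                line = line + pad + note
        out.append(line)
    return out

listing = ["0013 C5            PUSH    BC", "0014 06 01         LD      B,01h"]
for line in append_comments(listing):
    print(line)
```

Because the sed patterns are not anchored, the same digits could also match elsewhere on a line (in an operand, for example); starting the pattern with ^ would restrict the match to the address column.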

The end result looks something like this.  The formatting is messed up partially due to Blogger, but the spacing for the comments will need to be aligned.  I think the Entry Point comments will also be deleted since they aren't very useful. *Update* Entry Point labels deleted
                                        ; --- START PROC DBL_SUB ---
0C70: 21 2D 79                   DBL_SUB: LD      HL,792Dh        ;--- Double precision subtraction routine. ** cont--> *
0C73: 7E                                LD      A,(HL)        ;--- Load MSB of saved value
0C74: EE 80                             XOR     80h        ;--- Invert sign
0C76: 77                                LD      (HL),A        ;--- And restore
                                        ; Referenced from 12B4, 0F84, 0E5C
                                        ; --- START PROC DBL_ADD ---
0C77: 21 2E 79                   DBL_ADD: LD      HL,EXP2        ;--- HL=addr of exponent in WRA2 ************ cont--> *
0C7A: 7E                                LD      A,(HL)        ;--- Load exponent from WRA2
0C7B: B7                                OR      A        ;--- Set status flags for exponent
0C7C: C8                                RET     Z        ;--- Exit if WRA2 value zero

No comment

I'm still waiting on the USB to serial adapter for testing the 68HC11 BASIC.  I should have ordered from Newegg instead of eBay.  But that leaves more time to work on the VZ ROM disassembly.

The goal is a mostly, if not fully, commented VZ ROM disassembly, but up until now I have only thrown in a few comments for things like RST calls.

Below is a little sample of what the disassembly looks like once comments are added.
All function names and comments were added by sed.  The comments were extracted from the OCR text of a book for the TRS-80.  The formatting could use some work, but it's significant progress.  This will result in a fully commented disassembly, but with actual labels for system functions and variables that never appeared in any book.  VZ-specific code will still need comments, but this will take care of a significant portion of the ROM.

This has involved a lot of data entry, and editing, but the speed at which changes can be made and applied to the entire file is well worth the effort.  I should also be able to generate disassemblies of other Z80 machines using Microsoft BASIC much faster if they share system variables and code.  The TRS-80 itself should be trivial other than fixing the ROM entry points for the disassembler and moving system variables.

07C3: 21 25 79                   L07C3: LD      HL,SIGN        ;--- Reset sign flag so that ************ see note--> *
07C6: 7E                                LD      A,(HL)        ;--- mantissa will have a negative sign
07C7: 2F                                CPL        ;--- Invert the sign flag
07C8: 77                                LD      (HL),A        ;--- Store sign flag
07C9: AF                                XOR     A        ;--- Zero A
07CA: 6F                                LD      L,A        ;--- then save it
07CB: 90                                SUB     B        ;--- Complement B (0 - B)
07CC: 47                                LD      B,A        ;--- Save new value of B
07CD: 7D                                LD      A,L        ;--- Reload zero into A
07CE: 9B                                SBC     A,E        ;--- Complement E (0 - E)
07CF: 5F                                LD      E,A        ;--- Save new value for E
07D0: 7D                                LD      A,L        ;--- Reload A with zero
07D1: 9A                                SBC     A,D        ;--- Complement D (0 - D)
07D2: 57                                LD      D,A        ;--- Save new D value
07D3: 7D                                LD      A,L        ;--- Reload A with zero
07D4: 99                                SBC     A,C        ;--- Complement C (0 - C)
07D5: 4F                                LD      C,A        ;--- Save new C value
07D6: C9                                RET        ;---Rtn to caller *********** Unpack a SP number ******

Tuesday, October 3, 2017

First Phase of VZ ROM Disassembly Mostly Completed

The VZ ROM disassembly is mostly complete from just a raw code standpoint.
It still needs comments, more labels, etc., but it's to the point where you can follow a lot of the code.

Quite a few things are labeled with the sed script now including:
Math constant tables
Key math library functions (add, subtract, multiply, divide)
Commands in the token table
Code & data copied to RAM on startup
RST calls are tagged so you can see what is being called
Interrupt handler
Many system variables
Many text strings for prompts or errors

There are still a couple of holes that haven't been disassembled.  I have to determine whether they are used and, if they are, where the entry points are.  There is some dead space in the ROM that was filled with garbage, and that isn't tagged in the data yet.

A header needs to be created so it can be reassembled.  At that point I can start removing dead code and filler that the VZ doesn't need.  Then the empty space can be used for new commands or to speed up the math library by unrolling some loops.  That's still a ways off though.