Wednesday, October 4, 2017

When all is sed and done...

Yeah it's a corny title for the post but nobody's been reading my blog anyway so who cares?

To generate the raw disassembly I simply had to define all the ROM entry points from the command line.  There are a lot of them but it's pretty straightforward.  The disassembler follows links (except for RST calls) and it's just sort of slogging through the code finding what's real and what's garbage.  Using the --xref option on the disassembler helps because we can see if it finds anything calling that piece of code.  Due to having to define so many things as entry points, a lot of the cross-reference data is useless, but it still helps.

Adding labels and comments is a bit different.  Since labels (variables or functions) can occur in multiple places, we need to use pattern matching to save having to repeat the same operation over and over again.  Comments are only added to a single line each, but you need to find the address they are associated with and then append the comment to the end of the line.  So we have to generate the disassembly with the address at the start of each line, which is done using the --lst option.  One side effect of this is that it generates a huge call graph at the end of the disassembly, which slows down our pattern searching, and it really isn't needed due to the known design of the BASIC interpreter.  We can get rid of it manually, but why not just have a program do it?  We could write a custom program for all of this, but there is no need as the Unix sed command can already do it!  I haven't released the sed file I'm using yet, but I'll give a quick explanation of what commands I'm using to perform these actions.  This is no substitute for the sed manual or other references on the web!

sed can take commands from the command line but since we are performing so many operations, they have all been placed in a single file that is read by using the -f directive on the command line followed by the name of the file containing our list of sed operations.
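Here is a minimal sketch of what running sed with a command file looks like.  The file names labels.sed and raw.lst are made up for the example (the real script isn't released yet), and the one-line listing is just a stand-in for the real disassembly.

```shell
# Hypothetical file names; the real sed file and listing are not published.
workdir=$(mktemp -d)

# A tiny stand-in for the sed command file (one substitution from this post).
cat > "$workdir/labels.sed" <<'EOF'
s/L01C9/CLS/g
EOF

# A one-line stand-in for the raw disassembly.
printf 'CALL L01C9\n' > "$workdir/raw.lst"

# -f tells sed to read its commands from the file instead of the command line.
result=$(sed -f "$workdir/labels.sed" "$workdir/raw.lst")
echo "$result"

rm -r "$workdir"
```

The output goes to standard output, so in practice it would be redirected to a new file rather than captured in a variable.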

Since the raw disassembly needs the call graph removed from the end of the file to speed up the rest of the pattern matching (the call graph is huge), that's the first thing the sed script does.  Thankfully, the disassembler places an identifier at the head of the call graph that says "Call Graph:".
We simply tell sed to find the line that ends with that string and delete everything from there to the end of the file with the d command.  Notice that the string is placed inside forward slash (divide) symbols, and the $ inside the slashes anchors the match to the end of the line.  The ,$ that follows extends the range to the last line of the file, and the d deletes every line in that range.
/Call Graph:$/,$d
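To see the range delete in action, here is a small sketch run against a made-up listing.  The two code lines and the single call graph entry are invented for the example; only the "Call Graph:" marker and the sed command come from the real script.

```shell
# Made-up stand-in for the tail end of a listing; the call graph entry
# format is an assumption, not the disassembler's real output.
listing='0000: F3    DI
0001: AF    XOR A
Call Graph:
L0000 -> L0138'

# Delete from the line ending in "Call Graph:" through the last line ($).
trimmed=$(printf '%s\n' "$listing" | sed '/Call Graph:$/,$d')
echo "$trimmed"
```

Only the two code lines should survive; the marker line and everything after it are dropped.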

Now that the Call Graph is gone, sed needs to relabel functions and system variables by global pattern search and replace.  If we want to replace a function label, we search for L followed by the address the function resides at.  The s tells sed to search for the string between the 1st and 2nd slashes, and to replace it with the string between the 2nd and 3rd slashes.  The g says to replace every match on a line, not just the first.  So everywhere a function is called, it will be given a meaningful label instead.

s/L1DAE/END/g
s/L1CA1/FOR/g
s/L0138/RESET/g
s/L0135/SET/g
s/L01C9/CLS/g
The same works for system variables, but those are just raw addresses and not function calls, so the disassembler labels them as a 4 digit hex number followed by h, signifying that it is a hexadecimal number.  In the case below, we must identify functions that reside in ROM and that are copied to RAM where the program actually calls them.  Notice the L prefix on the ROM code and the h postfix on the RAM routines.  Variables would have h as well.
s/L06D2/COMPAREROM/g
s/7800h/COMPARE/g
s/L06D5/CHARGETROM/g
s/7803h/CHARGET/g
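As a quick illustration, here is a sketch that applies two of the substitutions above to a couple of sample lines.  The two instruction lines are invented to show the L prefix and h postfix forms; they are not copied from the real listing.

```shell
# Invented sample lines: one ROM label (L prefix), one RAM address (h postfix).
sample='        LD      HL,L06D2
        CALL    7800h'

# -e lets several commands be given on the command line for a quick test.
relabeled=$(printf '%s\n' "$sample" | sed \
  -e 's/L06D2/COMPAREROM/g' \
  -e 's/7800h/COMPARE/g')
echo "$relabeled"
```

Keep in mind that s replaces the text wherever it appears on a line, so a pattern like 7800h is relabeled in every place that exact text shows up.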



The last (so far) function we need to perform is to append comments to the end of lines based on address.  Most of the work here was extracting the comments from the pdf of Level II BASIC Decoded and Other Mysteries.  Once the code that is borrowed from Level II BASIC is identified, we can just append the comments to the proper lines using the address.  Again, we need to perform a pattern search, but then we must append the comment to the end of the line.  The pattern between the first pair of slashes selects the line by its hexadecimal address, and the s command that follows only runs on that line.  It matches $, the end of the line, and substitutes the comment string there, which appends the comment.  The --lst option gives us the hex addresses followed by the colon for each line and this provides a unique pattern we can match with.  Again, strings are contained between / slash symbols.  In case you haven't already noticed, this also means our strings cannot contain the slash symbol unless it is escaped with a backslash.  There are other special characters but I won't go into that.  Just watch the use of special characters that sed looks for.

/0013 /s/$/        ;--- Save BC - Keyboard routine/
/0014 /s/$/        ;--- B = Entry code/
/0016 /s/$/        ;--- Go to driver entry routine (3C2)/
/0018 /s/$/        ;--- RST 18 (JP 1C90H) Compare DE:HL/
/001B /s/$/        ;--- Save BC - Display routine, printer routine/


The end result looks something like this.  The formatting is messed up partially due to Blogger, but the spacing for the comments will need to be aligned.  I think the Entry Point comments will also be deleted since they aren't very useful. *Update* Entry Point labels deleted
                                        ; --- START PROC DBL_SUB ---
0C70: 21 2D 79                   DBL_SUB: LD      HL,792Dh        ;--- Double precision subtraction routine. ** cont--> *
0C73: 7E                                LD      A,(HL)        ;--- Load MSB of saved value
0C74: EE 80                             XOR     80h        ;--- Invert sign
0C76: 77                                LD      (HL),A        ;--- And restore
                                        ; Referenced from 12B4, 0F84, 0E5C
                                        ; --- START PROC DBL_ADD ---
0C77: 21 2E 79                   DBL_ADD: LD      HL,EXP2        ;--- HL=addr of exponent in WRA2 ************ cont--> *
0C7A: 7E                                LD      A,(HL)        ;--- Load exponent from WRA2
0C7B: B7                                OR      A        ;--- Set status flags for exponent
0C7C: C8                                RET     Z        ;--- Exit if WRA2 value zero

No comment

I'm still waiting on the USB to serial adapter for testing the 68hc11 BASIC.  I should have ordered from Newegg instead of ebay.  But that leaves more time to work on the VZ ROM disassembly.

The goal is a mostly to fully commented VZ ROM disassembly, but up until now I have only thrown in a few comments for things like RST calls.

Below is a little sample of what the disassembly looks like once comments are added.
All function names and comments were added by sed.  The comments were extracted from the OCR text of a book for the TRS-80.   The formatting could use some work, but it's significant progress.  This will result in a fully commented disassembly but with actual labels for system functions and variables that were never in any books.  VZ specific code will still need comments, but this will take care of a significant portion of the ROM. 

This has involved a lot of data entry, and editing, but the speed at which changes can be made and applied to the entire file is well worth the effort.  I should also be able to generate disassemblies of other Z80 machines using Microsoft BASIC much faster if they share system variables and code.  The TRS-80 itself should be trivial other than fixing the ROM entry points for the disassembler and moving system variables.
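One way the data entry could be cut down is to keep the addresses and comments in a plain table and let a script turn each row into a sed command.  This is only a sketch of that idea, not how the current sed file was built; the tab-separated table format is an assumption, and it does not handle comments containing a slash.

```shell
# Assumed input format: address, a tab, then the comment text.
table=$(printf '0013\tSave BC - Keyboard routine\n0014\tB = Entry code\n')

# Turn each row into an "append comment at this address" sed command.
script=$(printf '%s\n' "$table" | awk -F '\t' \
  '{ printf "/%s /s/$/        ;--- %s/\n", $1, $2 }')
echo "$script"
```

The generated lines have the same shape as the comment commands in the sed file, so they could be pasted in or written to a file and passed to sed with -f.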


07C3: 21 25 79                   L07C3: LD      HL,SIGN        ;--- Reset sign flag so that ************ see note--> *
07C6: 7E                                LD      A,(HL)        ;--- mantissa will have a negative sign
07C7: 2F                                CPL        ;--- Invert the sign flag
07C8: 77                                LD      (HL),A        ;--- Store sign flag
07C9: AF                                XOR     A        ;--- Zero A
07CA: 6F                                LD      L,A        ;--- then save it
07CB: 90                                SUB     B        ;--- Complement B (0 - B)
07CC: 47                                LD      B,A        ;--- Save new value of B
07CD: 7D                                LD      A,L        ;--- Reload zero into A
07CE: 9B                                SBC     A,E        ;--- Complement E (0 - E)
07CF: 5F                                LD      E,A        ;--- Save new value for E
07D0: 7D                                LD      A,L        ;--- Reload A with zero
07D1: 9A                                SBC     A,D        ;--- Complement D (0 - D)
07D2: 57                                LD      D,A        ;--- Save new D value
07D3: 7D                                LD      A,L        ;--- Reload A with zero
07D4: 99                                SBC     A,C        ;--- Complement C (0 - C)
07D5: 4F                                LD      C,A        ;--- Save new C value
07D6: C9                                RET        ;---Rtn to caller *********** Unpack a SP number ******

Tuesday, October 3, 2017

First Phase of VZ ROM Disassembly Mostly Completed

The VZ ROM disassembly is mostly complete from just a raw code standpoint.
It needs comments, more labels, etc... but it's to the point where you can follow a lot of the code.

Quite a few things are labeled with the sed script now including:
Math constant tables
Key math library functions (add, subtract, multiply, divide)
Commands in the token table
Code & data copied to RAM on startup
RST calls are tagged so you can see what is being called
Interrupt handler
Many system variables
Many text strings for prompts or errors
etc...

There are still a couple of holes that haven't been disassembled.  I have to determine if they are used or not and, if they are, where the entry points are.  There is some dead space in the ROM that was filled with garbage, and that isn't tagged in the data yet.

A header needs to be created so it can be reassembled.  At that point I can start removing dead code and filler that the VZ doesn't need.  Then the empty space can be used for new commands or to speed up the math library by unrolling some loops.  That's still a ways off though.