Wednesday, October 4, 2017

When all is sed and done...

Yeah, it's a corny title for the post, but nobody's been reading my blog anyway, so who cares?

To generate the raw disassembly, I simply had to define all the ROM entry points on the command line.  There are a lot of them, but it's pretty straightforward.  The disassembler follows links (except for RST calls), and it's just a matter of slogging through the code to separate the real code from the garbage.  Using the --xref option on the disassembler helps, because it shows whether anything calls a given piece of code.  Because so many addresses had to be defined as entry points, much of the cross-reference data is useless, but it still helps.

Adding labels and comments is a bit different.  Since a label (for a variable or a function) can be referenced in many places, we use pattern matching so we don't have to repeat the same edit over and over.  Each comment is added to a single line, but we need to find the address it's associated with and append the comment to the end of that line.  So we have to generate the disassembly with the address at the start of each line, which is done with the --lst option.  One side effect is that it generates a huge call graph at the end of the disassembly, which slows down our pattern searching, and it really isn't needed given the known design of the BASIC interpreter.  We could delete it by hand, but why not have a program do it?  We could write a custom program, but there's no need, since the Unix sed command can already do all of this!  I haven't released the sed file I'm using yet, but I'll give a quick explanation of the commands I'm using to perform these actions.  This is no substitute for the sed manual or other references on the web!

sed can take commands from the command line, but since we are performing so many operations, they have all been placed in a single file.  sed reads that file when you pass the -f option on the command line, followed by the name of the file containing our list of sed operations.

Since the raw disassembly needs the call graph removed from the end of the file to speed up the rest of the pattern matching (the call graph is huge), that's the first thing the sed script does.  Thankfully, the disassembler places a header line reading "Call Graph:" at the start of the call graph.
We simply tell sed to find the line matching that string and delete it, along with everything after it, using the d command.  The pattern goes between forward slash (divide) symbols, and the $ inside the slashes anchors it to the end of the line, so only a line ending in "Call Graph:" matches.  The ,$ after the closing slash turns this into a range that runs from the matching line to the last line of the file (outside the slashes, $ means the last line), and d deletes every line in that range.
/Call Graph:$/,$d
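For anyone who wants to check what that line does without running sed, here is a minimal Python sketch that mimics /Call Graph:$/,$d on a list of lines. The function name and sample lines are made up for illustration, and it assumes the header line ends with "Call Graph:" as described above.

```python
import re

# Equivalent of the sed command /Call Graph:$/,$d : the regex is anchored to
# the end of the line, and everything from the first matching line to the
# last line of the file is dropped.
CALL_GRAPH = re.compile(r"Call Graph:$")

def strip_call_graph(lines):
    """Return only the lines that come before the 'Call Graph:' header."""
    for i, line in enumerate(lines):
        if CALL_GRAPH.search(line):
            return lines[:i]      # delete the range <match>..$
    return lines                  # no header found: nothing is deleted

listing = [
    "0C7B: B7   OR  A",
    "0C7C: C8   RET Z",
    "; Call Graph:",
    "; DBL_ADD -> ...",
]
print(strip_call_graph(listing))
```

Like sed, it leaves the file untouched if the header never appears.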

Now that the call graph is gone, sed needs to relabel functions and system variables with a global search and replace.  To replace a function label, we search for L followed by the address where the function resides.  The s command tells sed to search for the string between the first and second slashes and replace it with the string between the second and third slashes.  The trailing g makes the replacement global, so every occurrence on a line is changed instead of just the first one (sed already applies the command to every line).  So everywhere a function is called, it gets a meaningful label instead.
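A rough Python equivalent of a block of s/L<addr>/<name>/g lines, applied one listing line at a time. The two labels are taken from the sample output later in this post (DBL_SUB at 0C70, DBL_ADD at 0C77); the table itself is hypothetical, since the real sed file hasn't been released.

```python
# Hypothetical label table: ROM address -> meaningful function name.
FUNCTION_LABELS = {
    "0C70": "DBL_SUB",
    "0C77": "DBL_ADD",
}

def relabel_functions(line, labels=FUNCTION_LABELS):
    """Apply the equivalent of s/L<addr>/<name>/g for every table entry."""
    for addr, name in labels.items():
        # str.replace changes every occurrence on the line, like sed's g flag
        line = line.replace("L" + addr, name)
    return line

print(relabel_functions("0C70: 21 2D 79   L0C70: LD HL,792Dh"))
print(relabel_functions("        JP      L0C77"))
```

The address column at the start of the line has no L prefix, so it is left alone.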

The same works for system variables, but those are just raw addresses rather than code labels, so the disassembler writes them as a 4-digit hex number followed by h to signify that it is hexadecimal.  Some routines need both forms: they reside in ROM but are copied to RAM, where the program actually calls them.  Notice the L prefix on the ROM code and the h suffix on the RAM routines.  Variables use the h suffix as well.
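Since every symbol needs either the L prefix or the h suffix, the sed lines could be generated from a table instead of typed by hand. This is only a sketch of that idea in Python. The table layout and the "rom"/"ram" tags are assumptions; SIGN (7925h) and EXP2 (792Eh) match the LD HL instructions in the sample listings in this post.

```python
# Hypothetical symbol table: (hex address, name, kind).
# "rom" -> the disassembler writes the address as L<addr>
# "ram" -> the disassembler writes the address as <addr>h
SYMBOLS = [
    ("0C70", "DBL_SUB", "rom"),
    ("7925", "SIGN", "ram"),
    ("792E", "EXP2", "ram"),
]

def sed_lines(symbols):
    """Emit one s///g command per symbol, choosing the pattern by kind."""
    out = []
    for addr, name, kind in symbols:
        pattern = "L" + addr if kind == "rom" else addr + "h"
        out.append(f"s/{pattern}/{name}/g")
    return out

print("\n".join(sed_lines(SYMBOLS)))
```

A routine that lives in ROM and is copied to RAM would simply get two entries, one of each kind, pointing at its two addresses.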

The last function we need to perform (so far) is appending comments to the end of lines based on address.  Most of the work here was extracting the comments from the PDF of Level II BASIC Decoded and Other Mysteries.  Once the code borrowed from Level II BASIC is identified, we can append the comments to the proper lines using the address.  Again, we perform a pattern search, but this time we append the comment to the end of the matching line.  The hexadecimal address between the first pair of slashes selects the line, and s/$/.../ then replaces the end of that line ($) with the comment string, which simply appends it.  The --lst option gives us the hex address followed by a colon at the start of each line, and this provides a unique pattern we can match with.  Again, strings are contained between / slash symbols.  In case you haven't already noticed, this means a slash inside a string has to be escaped as \/ (for the s command you can also pick a different delimiter character).  There are other special characters as well, but I won't go into them here; just watch out for any characters that sed treats specially.

/0013 /s/$/        ;--- Save BC - Keyboard routine/
/0014 /s/$/        ;--- B = Entry code/
/0016 /s/$/        ;--- Go to driver entry routine (3C2)/
/0018 /s/$/        ;--- RST 18 (JP 1C90H) Compare DE:HL/
/001B /s/$/        ;--- Save BC - Display routine, printer routine/
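The same append-by-address step, sketched in Python with a hypothetical comment table built from two of the lines above. One deliberate difference from the sed lines: the match is anchored to the start of the line and uses the colon from the --lst output, which is stricter than an unanchored /0013 / and keeps an address that appears elsewhere on a line (say, in a "Referenced from" comment) from matching.

```python
import re

# Hypothetical comment table keyed by address (text copied from the sed lines).
COMMENTS = {
    "0013": ";--- Save BC - Keyboard routine",
    "0014": ";--- B = Entry code",
}

# Address field as written by --lst: four hex digits and a colon.
ADDRESS = re.compile(r"([0-9A-F]{4}):")

def append_comments(lines, comments=COMMENTS, pad="        "):
    """Append the matching comment to the end of each line whose address
    is in the table, like /ADDR /s/$/comment/ does in sed."""
    out = []
    for line in lines:
        m = ADDRESS.match(line)          # match() only looks at the line start
        if m and m.group(1) in comments:
            line = line + pad + comments[m.group(1)]
        out.append(line)
    return out

sample = [
    "0013: C5    PUSH    BC",
    "        ; Referenced from 0013",
]
for line in append_comments(sample):
    print(line)
```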

The end result looks something like this.  The formatting is partly messed up by Blogger, but the comments will still need to be aligned into a column.  I think the Entry Point comments will also be deleted, since they aren't very useful.  *Update:* Entry Point labels deleted.
                                        ; --- START PROC DBL_SUB ---
0C70: 21 2D 79                   DBL_SUB: LD      HL,792Dh        ;--- Double precision subtraction routine. ** cont--> *
0C73: 7E                                LD      A,(HL)        ;--- Load MSB of saved value
0C74: EE 80                             XOR     80h        ;--- Invert sign
0C76: 77                                LD      (HL),A        ;--- And restore
                                        ; Referenced from 12B4, 0F84, 0E5C
                                        ; --- START PROC DBL_ADD ---
0C77: 21 2E 79                   DBL_ADD: LD      HL,EXP2        ;--- HL=addr of exponent in WRA2 ************ cont--> *
0C7A: 7E                                LD      A,(HL)        ;--- Load exponent from WRA2
0C7B: B7                                OR      A        ;--- Set status flags for exponent
0C7C: C8                                RET     Z        ;--- Exit if WRA2 value zero
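One possible way to line the comments up, sketched in Python: pad the code part of every line so each ";---" comment starts at the same column. The marker string and the default column are assumptions based on the sample above; lines without a ";---" comment (the PROC headers and "Referenced from" lines use "; " instead) are left unchanged.

```python
MARKER = ";---"   # prefix used by the sed-added comments in the sample

def align_comments(lines, column=60):
    """Pad the code part of each commented line so the comment starts at
    `column` (0-based). Code longer than that keeps a single space."""
    out = []
    for line in lines:
        idx = line.find(MARKER)
        if idx < 0:
            out.append(line)              # no ";---" comment: unchanged
            continue
        code = line[:idx].rstrip()
        out.append(f"{code:<{column - 1}} {line[idx:]}")
    return out

sample = [
    "0C73: 7E    LD      A,(HL)        ;--- Load MSB of saved value",
    "0C74: EE 80 XOR     80h   ;--- Invert sign",
    "        ; --- START PROC DBL_ADD ---",
]
for line in align_comments(sample, column=40):
    print(line)
```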

No comment

I'm still waiting on the USB to serial adapter for testing the 68hc11 BASIC.  I should have ordered from Newegg instead of eBay.  But that leaves more time to work on the VZ ROM disassembly.

The goal is a mostly to fully commented VZ ROM disassembly, but until now I have only added a few comments for things like RST calls.

Below is a little sample of what the disassembly looks like once comments are added.
All function names and comments were added by sed.  The comments were extracted from the OCR text of a book for the TRS-80.  The formatting could use some work, but it's significant progress.  This will result in a commented disassembly of the shared ROM code, with actual labels for system functions and variables that were never in any books.  VZ-specific code will still need comments, but this will take care of a significant portion of the ROM.

This has involved a lot of data entry, and editing, but the speed at which changes can be made and applied to the entire file is well worth the effort.  I should also be able to generate disassemblies of other Z80 machines using Microsoft BASIC much faster if they share system variables and code.  The TRS-80 itself should be trivial other than fixing the ROM entry points for the disassembler and moving system variables.

07C3: 21 25 79                   L07C3: LD      HL,SIGN        ;--- Reset sign flag so that ************ see note--> *
07C6: 7E                                LD      A,(HL)        ;--- mantissa will have a negative sign
07C7: 2F                                CPL        ;--- Invert the sign flag
07C8: 77                                LD      (HL),A        ;--- Store sign flag
07C9: AF                                XOR     A        ;--- Zero A
07CA: 6F                                LD      L,A        ;--- then save it
07CB: 90                                SUB     B        ;--- Complement B (0 - B)
07CC: 47                                LD      B,A        ;--- Save new value of B
07CD: 7D                                LD      A,L        ;--- Reload zero into A
07CE: 9B                                SBC     A,E        ;--- Complement E (0 - E)
07CF: 5F                                LD      E,A        ;--- Save new value for E
07D0: 7D                                LD      A,L        ;--- Reload A with zero
07D1: 9A                                SBC     A,D        ;--- Complement D (0 - D)
07D2: 57                                LD      D,A        ;--- Save new D value
07D3: 7D                                LD      A,L        ;--- Reload A with zero
07D4: 99                                SBC     A,C        ;--- Complement C (0 - C)
07D5: 4F                                LD      C,A        ;--- Save new C value
07D6: C9                                RET        ;---Rtn to caller *********** Unpack a SP number ******
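The tail of that routine (07C9 to 07D5) is a multi-byte two's complement negation: XOR A clears A and the carry flag, SUB B computes 0 - B, and each SBC subtracts the next register plus the borrow left by the previous step. A small Python model makes the arithmetic easy to check. The byte order (B lowest, then E, D, and C highest) is inferred from the order of the SUB/SBC instructions, and the function names are made up.

```python
def negate_bytes(b, e, d, c):
    """Model of SUB B / SBC A,E / SBC A,D / SBC A,C starting from A = 0:
    returns the negated bytes as (B, E, D, C)."""
    borrow = 0                            # XOR A leaves carry cleared
    result = []
    for reg in (b, e, d, c):              # least significant byte first
        diff = 0 - reg - borrow
        borrow = 1 if diff < 0 else 0     # carry is set when a borrow occurs
        result.append(diff & 0xFF)
    return tuple(result)

def as_int(b, e, d, c):
    """Combine the four bytes into one 32-bit number (C is the top byte)."""
    return b | (e << 8) | (d << 16) | (c << 24)

value = (0x34, 0x12, 0x80, 0x7F)
negated = negate_bytes(*value)
print([f"{x:02X}" for x in negated])
print(as_int(*negated) == (-as_int(*value)) % 2**32)
```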

Tuesday, October 3, 2017

First Phase of VZ ROM Disassembly Mostly Completed

The VZ ROM disassembly is mostly complete from a raw code standpoint.
It still needs comments, more labels, etc., but it's to the point where you can follow a lot of the code.

Quite a few things are labeled with the sed script now including:
Math constant tables
Key math library functions (add, subtract, multiply, divide)
Commands in the token table
Code & data copied to RAM on startup
RST calls are tagged so you can see what is being called
Interrupt handler
Many system variables
Many text strings for prompts or errors

There are still a couple of holes that haven't been disassembled.  I have to determine whether they are used and, if they are, where the entry points are.  There is some dead space in the ROM that was filled with garbage, and it isn't tagged as data yet.

A header needs to be created so it can be reassembled.  At that point I can start removing dead code and filler that the VZ doesn't need.  Then the empty space can be used for new commands or to speed up the math library by unrolling some loops.  That's still a ways off though.
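Most of that header would be symbol definitions for the system variables. Here is a minimal Python sketch that turns a label table into EQU lines. The NAME EQU value form is common Z80 assembler syntax, but the exact format depends on the assembler used, and the table only holds the two variables visible in the sample listings.

```python
# Hypothetical table of system variables; both addresses come from the
# LD HL instructions in the sample listings (21 25 79 and 21 2E 79).
SYSTEM_VARIABLES = {
    "EXP2": 0x792E,
    "SIGN": 0x7925,
}

def hex_constant(value):
    """Format as Z80-style hex (e.g. 7925h). A leading 0 is added when the
    first digit is A-F so the assembler doesn't read it as a label."""
    text = f"{value:04X}h"
    return "0" + text if text[0] in "ABCDEF" else text

def equ_header(symbols):
    """Return one 'NAME EQU value' line per symbol, sorted by address."""
    return [
        f"{name:<12}EQU     {hex_constant(addr)}"
        for name, addr in sorted(symbols.items(), key=lambda item: item[1])
    ]

print("\n".join(equ_header(SYSTEM_VARIABLES)))
```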

Thursday, September 21, 2017

VZ Disassembly continued...

I'm waiting on a USB to serial adapter to come from China, so the 68hc11 BASIC is pretty much on hold, and I haven't even looked at the IDE interface since I last mentioned it.  The IDE port doesn't really need to be finished until I can use the 68hc11 board, and that was going to be used by BASIC... so it's not an immediate priority.  I'm working on the VZ ROM disassembly in the meantime.

As for the disassembly... the sed script now labels enough system variables that it's possible to identify what many parts of the ROM are doing.

One thing I discovered is that YAZD may not properly handle the RST instruction.  It disassembles it, but it may not treat it like a CALL and follow it to its target.  This left some commonly called functions as data.  Microsoft used RST to conserve ROM space, since an RST is one byte where a CALL is three.  While it certainly works, it also slows down the interpreter, because the RST vector then has to jump to the actual routine (RST 18h, for example, holds a JP 1C90H).  But regardless of why, it left me having to manually define all the RST calls as entry points.  I have also blindly labeled many other entry points to see what the code looks like.
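Treating RST as a call is simple in principle, because the Z80 encodes RST p as the byte 11ppp111, so the target address is just the opcode ANDed with 38h. This Python sketch shows that decoding and lists the eight possible vectors; it is not taken from YAZD's source, and the function name is hypothetical.

```python
def rst_target(opcode):
    """Return the RST vector address for an RST opcode, else None.
    RST p is encoded as 11ppp111, so the target is opcode & 0x38."""
    if opcode & 0xC7 == 0xC7:
        return opcode & 0x38
    return None

# The eight RST opcodes are C7h, CFh, ... FFh, one every 8 values.
RST_VECTORS = [rst_target(op) for op in range(0xC7, 0x100, 8)]

print([f"{addr:04X}h" for addr in RST_VECTORS])
print(f"{rst_target(0xDF):02X}h")   # DFh is the RST 18h opcode
```

A disassembler could queue each decoded target as an entry point and continue after the RST like it does after a CALL.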

Anyway... here is another excerpt from the disassembly to give you an idea of how things are shaping up.  There are still system variables left to be defined, some of the entry points I manually created will need to be removed, and comments still need to be added.  But since the process is automated, it only takes seconds to regenerate the disassembly and add or change labels.  Once I create a block of define statements for system variables and label the addresses containing string data, I should be able to reassemble the code.  I'm debating whether to fix the disassembler so it can do a lot of this, or to just script it in sed and be done with it.

        ; Referenced from 3747
L3752:  INC     HL
        DEC     BC
        LD      A,C
        OR      B
        JR      NZ,L3743
        LD      HL,7839h
        RES     3,(HL)
        LD      HL,VERIFY_MSG
        LD      HL,OK_MSG
        JP      L36CF

VERIFY_MSG:  DB      0Dh
        DB      56h             ; 'V'
        DB      45h             ; 'E'
        DB      52h             ; 'R'
        DB      49h             ; 'I'
        DB      46h             ; 'F'
        DB      59h             ; 'Y'
        DB      20h             ; ' '
        DB      00h
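Byte-per-line DB output like VERIFY_MSG is hard to read and easy to miss when hunting for strings. Here is a small Python sketch that collapses such a run into one DB line with a quoted string. Quoted strings inside DB are accepted by many Z80 assemblers, but that is an assumption to check against the one used for reassembly.

```python
# Bytes from the VERIFY_MSG block above.
VERIFY_MSG = [0x0D, 0x56, 0x45, 0x52, 0x49, 0x46, 0x59, 0x20, 0x00]

def db_line(data):
    """Render bytes as one DB line, grouping printable ASCII into a
    quoted string and writing everything else as hex constants."""
    items, text = [], ""
    for byte in data:
        if 0x20 <= byte < 0x7F and byte != 0x27:   # printable, not a quote
            text += chr(byte)
            continue
        if text:                                   # close any open string
            items.append(f"'{text}'")
            text = ""
        # leading 0 when the first hex digit is A-F (e.g. 0FFh)
        items.append(f"{byte:02X}h" if byte < 0xA0 else f"0{byte:02X}h")
    if text:
        items.append(f"'{text}'")
    return "DB      " + ",".join(items)

print(db_line(VERIFY_MSG))
```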

Tuesday, September 19, 2017

VZ ROM Disassembly

Where the combination of YAZD and sed works, you get code that looks like this.  With a pass through sed adding comments, this could look really good without major code manipulation on my part.

        ; Entry Point
        ; --- START PROC RUN ---
RUN:  JP      Z,L1B5D
        CALL    79C7h
        CALL    L1B61
        LD      BC,1D1Eh
        JR      L1EC1
        ; Entry Point
        ; --- START PROC GOSUB ---
GOSUB:  LD      C,03h
        CALL    L1963
        POP     BC
        PUSH    HL
        PUSH    HL
        LD      HL,(78A2h)
        EX      (SP),HL
        LD      A,91h
        PUSH    AF
        INC     SP
        ; Referenced from 1EAF
        ; --- START PROC L1EC1 ---
L1EC1:  PUSH    BC
        ; Entry Point
        ; --- START PROC GOTO ---
        ; Referenced from 1FC7
        ; --- START PROC L1EC5 ---
        PUSH    HL
        LD      HL,(78A2h)
        RST     0x18

However, YAZD could use some improvements.  It needs to take its configuration input from a file; you should be able to manually identify blocks of data, including size, type, and length (byte, 16-bit word, string); entry and data points should have configurable labels instead of labels I have to generate manually; and data labels should optionally be generated automatically.  Using sed works, but it would be nice to complete this in a single step, especially for people who aren't familiar with Unix/Linux.
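As a sketch of the kind of data-block file described above, here is a parser for a one-line-per-block format (start address in hex, length in bytes, type, label). The format is invented for illustration and is not YAZD's, and the EXAMPLE_ labels and addresses are made up.

```python
from dataclasses import dataclass

# Hypothetical config format, one block per line:
#   <start address in hex> <length in bytes> <byte|word|string> <label>
VALID_TYPES = {"byte", "word", "string"}

@dataclass
class DataBlock:
    start: int
    length: int
    kind: str
    label: str

def parse_blocks(text):
    """Parse block definitions, skipping blank lines and '#' comments."""
    blocks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        start, length, kind, label = line.split()
        if kind not in VALID_TYPES:
            raise ValueError(f"unknown block type: {kind}")
        if kind == "word" and int(length) % 2:
            raise ValueError(f"word block {label} has an odd length")
        blocks.append(DataBlock(int(start, 16), int(length), kind, label))
    return blocks

config = """
# made-up example entries
1234 9 string EXAMPLE_MSG
1240 8 word   EXAMPLE_TABLE
"""
for block in parse_blocks(config):
    print(block)
```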

One problem with sed is that it can be a bit indiscriminate in how it performs its search and replace.  Relabeling a definition formatted like "L1EC1:" works well, but adding a custom label to the actual references to that label (CALL, JP, etc.) can potentially change unrelated values that happen to match, whereas YAZD would know the difference.
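To illustrate the kind of context a disassembler has and a plain s///g doesn't, here is a Python sketch that only rewrites an operand when the instruction is a branch or call. The mnemonic set and line layout are assumptions, and NEW_NAME is a placeholder. The trade-off is that an address loaded with LD (for example, pushed as a return address) is skipped too, which a disassembler tracking data references could still catch.

```python
import re

BRANCHES = {"CALL", "JP", "JR", "DJNZ"}
# optional "LABEL:" prefix, then mnemonic, whitespace, operands
INSTR = re.compile(r"^(\s*(?:\w+:\s*)?)([A-Z]+)(\s+)(.*)$")

def relabel_branch_targets(line, old, new):
    """Replace `old` with `new` only in the operands of branch/call lines."""
    m = INSTR.match(line)
    if not m or m.group(2) not in BRANCHES:
        return line                       # not a branch: leave it alone
    prefix, mnemonic, gap, operands = m.groups()
    # \b keeps the old name from matching inside a longer token
    operands = re.sub(rf"\b{re.escape(old)}\b", new, operands)
    return prefix + mnemonic + gap + operands

print(relabel_branch_targets("        CALL    L1B61", "L1B61", "NEW_NAME"))
print(relabel_branch_targets("        LD      BC,1D1Eh", "1D1Eh", "NEW_NAME"))
```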

YAZD is open source, but it's poorly commented.  Figuring out the code takes time (though most of it doesn't look bad), and that's before I make any changes.  It just adds more time to an already time-consuming process, and I still have to create a file of comments for the ROM based on the existing TRS-80 comments.  At least I don't have to create everything from the disassembly alone.

Friday, September 15, 2017

Fedora Plot MC-10

Another comparison of the factory MC-10 ROM vs the modified version.
This is a 3D plot of the "Fedora" hat.
I still have to fix a speed issue with the divide, but it seems stable and all but a handful of programs seem to work.  The ones that don't have some address dependencies that would need to be patched.
Once I fix the divide it should be ready for a final release for 8K.  Many of the changes I have planned for the 68hc11 version will directly work with the 6803.  I plan on saving some of the 68hc11 specific code for last so both versions can be developed in parallel.

Wednesday, September 13, 2017

I ran across a reverse-engineered copy of Extended Color BASIC (mostly just Color BASIC, actually) that had been ported to other 6809 systems.  It had been sitting on my hard drive for some time, but I had forgotten about it.  At first glance, the math library appears to be very similar to Microcolor BASIC.  I could probably squeeze in several changes from my MC-10 code, but I'll have to locate a more intact version of the original ROMs, if one exists.  There are also a couple of things I could borrow for the 68hc11 and 6803 versions.