To generate the raw disassembly I simply had to define all the ROM entry points from the command line. There are a lot of them, but it's pretty straightforward. The disassembler follows links (except for RST calls), and it's just sort of slogging through the code finding what's real and what's garbage. Using the --xref option on the disassembler helps because we can see if it finds anything calling that piece of code. Because so many things have to be defined as entry points, a lot of the data is useless, but it still helps.
Adding labels and comments is a bit different. Since labels (variables or functions) can occur in multiple places, we need to use pattern matching to avoid repeating the same operation over and over again. Comments are only added to a single line each, but you need to find the address they are associated with and then append the comment to the end of that line. So we have to generate the disassembly with the address at the start of each line, which is done using the --lst option. One side effect of this is that it generates a huge call graph at the end of the disassembly, which slows down our pattern searching and really isn't needed because the design of the BASIC interpreter is already known. We can get rid of it manually, but why not just have a program do it? We could write a custom program to do this, but there is no need, as the Unix sed command can already do all of this! I haven't released the sed file I'm using yet, but I'll give a quick explanation of the commands I'm using to perform these actions. This is no substitute for the sed manual or other references on the web!
sed can take commands from the command line but since we are performing so many operations, they have all been placed in a single file that is read by using the -f directive on the command line followed by the name of the file containing our list of sed operations.
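As a rough sketch of what running sed with a command file looks like: the file names annotate.sed, raw.lst and annotated.lst are made up for illustration, and the one-line listing is a stand-in, not real disassembler output. Only the -f option itself comes from the description above.

```shell
# Hypothetical file names and listing content; only -f is from the text.
workdir=$(mktemp -d)

# A tiny stand-in for the file holding the list of sed operations.
cat > "$workdir/annotate.sed" <<'EOF'
s/L1DAE/END/g
EOF

# A one-line stand-in for the raw disassembly.
printf '%s\n' '0200: CD AE 1D CALL L1DAE' > "$workdir/raw.lst"

# -f tells sed to read its commands from the named file.
sed -f "$workdir/annotate.sed" "$workdir/raw.lst" > "$workdir/annotated.lst"

result=$(cat "$workdir/annotated.lst")
printf '%s\n' "$result"
rm -rf "$workdir"
```

Keeping the operations in a file also means the same script can be rerun unchanged every time the disassembly is regenerated.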
Since the raw disassembly needs the call graph removed from the end of the file to speed up the rest of the pattern matching (the call graph is huge), that's the first thing the sed script does. Thankfully, the disassembler places an identifier at the head of the call graph that says "Call Graph:"
/Call Graph:/,$d
We simply tell sed to search for the first line containing that string and delete everything from that line through the end of the file with the d command. Notice that the string is placed inside forward slash (divide) symbols. The comma makes this a range of lines, and the $ after it stands for the last line of the input, so the range runs from the matching line to the end of the file. Then the d tells sed what to do with the lines it selected: delete them.
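Here is a small sketch of that deletion on a made-up listing. The "Call Graph:" marker comes from the text, but the lines under it are invented, since the disassembler's real call graph format isn't shown here.

```shell
# Made-up listing: two code lines followed by a fake call graph section.
listing='0C7A: 7E LD A,(HL)
0C7B: B7 OR A
Call Graph:
L0C70 - L0C77
L0C77 - L0D00'

# From the first line matching "Call Graph:" through the last line ($),
# delete (d) every line.
trimmed=$(printf '%s\n' "$listing" | sed '/Call Graph:/,$d')
printf '%s\n' "$trimmed"
```

Only the two code lines should survive; the marker line and everything after it are dropped.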
Now that the Call Graph is gone, sed needs to relabel functions and system variables by global pattern search and replace. If we want to replace a function label, we search for L followed by the address the function resides at. The s tells sed to search for the string between the 1st and 2nd slashes, and to replace it with the string between the 2nd and 3rd slashes. The g says to perform this globally, meaning on every match in a line rather than just the first. So everywhere a function is called, it will be given a meaningful label instead.
s/L1DAE/END/g
The same works for system variables, but those are just raw addresses and not function calls, so the disassembler labels them as a 4 digit hex number followed by h, signifying that it is a hexadecimal number. In the case below, we must identify functions that reside in ROM and that are copied to RAM, where the program actually calls them. Notice the L prefix on the ROM code and the h postfix on the RAM routines. Variables would have h as well.
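A quick sketch of both kinds of relabeling on two sample lines. The s/L1DAE/END/g rule is the one shown above. The s/792Eh/EXP2/g rule is my assumption, inferred from the EXP2 label in the annotated listing further down, and the CALL line is invented for illustration.

```shell
# Sample lines in the disassembler's style. The CALL line is hypothetical.
listing='0200: CD AE 1D CALL L1DAE
0C77: 21 2E 79 LD HL,792Eh'

# First rule: function label (L prefix). Second rule: raw address (h suffix).
# The second rule is an assumed example, not taken from the author's sed file.
labeled=$(printf '%s\n' "$listing" | sed -e 's/L1DAE/END/g' -e 's/792Eh/EXP2/g')
printf '%s\n' "$labeled"
```

Because the h suffix is part of the pattern, the raw byte columns (2E 79) are left alone and only the operand is renamed.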
The last (so far) function we need to perform is to append comments to the end of lines based on address. Most of the work here was extracting the comments from the pdf of Level II BASIC Decoded and Other Mysteries. Once the code that is borrowed from Level II BASIC is identified, we can just append the comments to the proper lines using the address. Again, we need to perform a pattern search, but this time we append the comment to the end of the line instead of replacing a label. The pattern between the first pair of slashes selects the line by its hexadecimal address followed by the :. The s (search and replace) command that follows then matches $, which stands for the end of the line, and replaces it with the comment string, which has the effect of appending the comment. The --lst option gives us the hex address followed by the colon for each line, and this provides a unique pattern we can match. Again, strings are contained between / slash symbols. In case you haven't already noticed, this also means our strings cannot contain a bare slash symbol; it has to be escaped with a backslash. There are other special characters, but I won't go into that. Just watch the use of special characters that sed looks for.
/0013:/s/$/ ;--- Save BC - Keyboard routine/
/0014:/s/$/ ;--- B = Entry code/
/0016:/s/$/ ;--- Go to driver entry routine (3C2)/
/0018:/s/$/ ;--- RST 18 (JP 1C90H) Compare DE:HL/
/001B:/s/$/ ;--- Save BC - Display routine, printer routine/
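To see the append step in isolation, here is a sketch that assumes the --lst lines begin with the address and a colon, as in the listing further down. The two sample instruction lines are illustrative stand-ins, and anchoring the pattern with ^ is my own addition to keep it from matching anywhere else in a line.

```shell
# Illustrative --lst style lines (address, colon, bytes, instruction).
listing='0013: C5 PUSH BC
0014: 06 01 LD B,01h'

# The /^ADDR:/ pattern picks the line; s/$/.../ replaces the end of the
# line with the comment text, which appends it.
commented=$(printf '%s\n' "$listing" | sed \
  -e '/^0013:/s/$/ ;--- Save BC - Keyboard routine/' \
  -e '/^0014:/s/$/ ;--- B = Entry code/')
printf '%s\n' "$commented"
```

Each rule touches at most one line, so a long list of them is slow only because every rule is tested against every line, which is why trimming the call graph first pays off.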
; --- START PROC DBL_SUB ---
0C70: 21 2D 79 DBL_SUB: LD HL,792Dh ;--- Double precision subtraction routine. ** cont--> *
0C73: 7E LD A,(HL) ;--- Load MSB of saved value
0C74: EE 80 XOR 80h ;--- Invert sign
0C76: 77 LD (HL),A ;--- And restore
; Referenced from 12B4, 0F84, 0E5C
; --- START PROC DBL_ADD ---
0C77: 21 2E 79 DBL_ADD: LD HL,EXP2 ;--- HL=addr of exponent in WRA2 ************ cont--> *
0C7A: 7E LD A,(HL) ;--- Load exponent from WRA2
0C7B: B7 OR A ;--- Set status flags for exponent
0C7C: C8 RET Z ;--- Exit if WRA2 value zero