Friday, March 30, 2018

Further testing of the ASCII <-> float code using invert and multiply in place of division has given mixed results. 

Ahl's benchmark now turns in a time of 1:05 and accuracy is identical to the original.
Most code tested seems to work fine, and this optimization typically offers around a 3% speedup, though that varies from program to program.

The Fedora plot is a little faster, but the graph is a little different.  There are a couple of bugs elsewhere in the ROM.  Once those have been squashed, we'll see if there is a problem with this code or not.


The faster parser and faster ASCII to floating point code make a huge difference over the video I recorded seven months ago.

Run them at the same time to compare to the previous release.

New


Old

Thursday, March 29, 2018


This is the difference an 8% speedup makes.

ASCII # to binary code optimization

When I wrote the post on using invert and multiply to speed up BASIC programs, I realized I hadn't searched through the ROM to find divides that could be altered to use this optimization.  A quick search revealed a couple divides in the code.

The LOG function contains a divide that involves a constant.  At first glance this seemed ideal, but on closer examination, the constant isn't the divisor, so we're stuck with that one.  Too bad, because Ahl's benchmark depends on the SQR (square root) function, which in turn uses the LOG function.  This would have dropped the MC-10's time on that benchmark below one minute.  It also would have let me drop one of the constants from the ROM because it would no longer have been needed, saving the 5 bytes used by the floating point number.

The ASCII to binary, and binary to ASCII conversion code uses a common divide by the constant 10.  The change was easy, just switch the constant from the floating point representation of 10 to 1/10, and change the code so it calls multiply instead of dropping into the divide function.  On closer examination I had to duplicate a little code because SIN calls the function part way through and it cannot use the multiply due to the divisor not being a constant.  I'll revisit that later to see if it can be rewritten, but the important thing is that the code works, and this is used quite often during the execution of a BASIC program.  Numeric values are stored as ASCII text even after a BASIC program has been tokenized.  There is an even faster technique, but that code would need a complete rewrite to ditch floating point through most of the calculation and I'm not committing to that... at least not yet.
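
As a rough illustration of the divide-by-10 versus multiply-by-1/10 swap, here is a small Python sketch.  It is a simplified model and not the ROM routine: the function name `parse_decimal` and its structure are assumptions made for the example, and it uses Python's double precision floats rather than the 5 byte format in the ROM.

```python
def parse_decimal(text, use_reciprocal):
    """Convert an unsigned decimal string such as "0.3" to a float.

    Simplified model only: digits are accumulated as a whole number, then
    scaled down once per decimal place, either by dividing by 10 or by
    multiplying by a precomputed 1/10.
    """
    tenth = 1 / 10            # the inverted constant, computed once
    value = 0.0
    places = 0                # digits seen after the decimal point
    seen_point = False
    for ch in text:
        if ch == ".":
            seen_point = True
            continue
        value = value * 10 + (ord(ch) - ord("0"))
        if seen_point:
            places += 1
    for _ in range(places):
        value = value * tenth if use_reciprocal else value / 10
    return value


print(parse_decimal("0.3", use_reciprocal=False))   # 0.3
print(parse_decimal("0.3", use_reciprocal=True))    # 0.30000000000000004
```

Because 1/10 has no exact binary representation, the two paths can differ in the last bit, so output from the modified code is worth comparing against the original.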

Running the Solitaire solver program on the new ROM with this update side by side against the original ROM overnight seems to indicate the speedup is now 8%, and it's looking like 10% might be possible with a few more optimizations.  This would make the MC-10 noticeably faster than 1 MHz 6502 machines when running pretty much anything.  One benchmark that outputs a lot of numbers was a solid 20% faster.  Ahl's benchmark, which does not output anything until the end, was a couple of seconds faster.  However, that was run over a VNC connection to my developer box, so it will have to be timed directly from the machine to be sure.

That sounds great, but there is one annoying little detail.  I'm out of ROM space and I still have to fix a couple bugs.  It looks like I'll have to drop something to fit this in, and the next release will probably be the last one to use 8K unless I find a way to save a significant number of bytes.

To save time converting the inverted constants to floating point, I used the following program on the Color Computer, which supports converting numbers directly to hexadecimal.  The program prints the values stored in the variable.
Since the Color Computer calculates numbers with an almost identical math library, I don't have to worry about the kind of differences from the MC-10 that a PC based program might produce.

10 A=1/10
20 Z=VARPTR(A)
30 FOR I=Z TO Z+4
40 PRINT HEX$(PEEK(I))
50 NEXT I
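
For cross-checking the bytes the BASIC program prints, the following Python sketch packs a value into the same 5 byte shape.  The layout is an assumption stated in the code (exponent byte biased by 128, then a 4 byte mantissa with the sign in its top bit), and the name `to_ms_float5` is made up for this example.  BASIC's own divide may round the last mantissa bit differently, which is a good reason to take the final bytes from the real math library as described above.

```python
import math


def to_ms_float5(value):
    """Pack a number into 5 bytes: exponent, then 4 mantissa bytes.

    Assumed layout: exponent stored with a bias of 128, mantissa normalized
    to 0.5 <= m < 1 with its always-set top bit replaced by the sign bit.
    """
    if value == 0:
        return bytes(5)                           # exponent byte 0 means zero
    mantissa, exponent = math.frexp(abs(value))   # 0.5 <= mantissa < 1
    bits = round(mantissa * (1 << 32))            # 32-bit mantissa, rounded
    if bits == 1 << 32:                           # rounding carried out of the top
        bits >>= 1
        exponent += 1
    bits &= 0x7FFFFFFF                            # drop the implied leading 1
    if value < 0:
        bits |= 0x80000000                        # sign lives in that top bit
    return bytes([exponent + 128]) + bits.to_bytes(4, "big")


print(to_ms_float5(1 / 10).hex(" ").upper())   # 7D 4C CC CC CD
print(to_ms_float5(10).hex(" ").upper())       # 84 20 00 00 00
```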

Wednesday, March 28, 2018

Is the MC-10 a doorstop?

One of the most common comments about the MC-10 within the Radio Shack computer community is that it's a doorstop. 

Out of the box, the keyboard alone makes it impossible to use as a word processor or for business, and it makes it difficult to program the machine.  The limited RAM and lack of hi-res graphics also limit the usefulness of the machine.  Unless you were willing to spend the time and money to hack in a replacement keyboard, more RAM, etc... as a general purpose computer it wasn't very useful.  So it's pretty much a doorstop for people that have those needs.
But if you wanted to introduce a child to programming at minimal cost (before you knew if they were interested or if it would just go into a closet) it would do that well.

But it didn't have to be a doorstop.  Once you put on a better keyboard and expand the RAM, it's more capable than a Model I without an expansion interface.  The cassette interface is faster, it has a built in serial port, it has color, sound, could have hi-res graphics, it only lacks a few BASIC keywords vs Level II BASIC, and it's faster.
Even with standard Microcolor BASIC, Ahl's benchmark yields the same result on the MC-10 as the Model III, and it's faster than the Oric, Spectrum, TI-99, Atari, Model 100, and the ZX-80 derivatives among others. With the new ROM it leaps past 32 more machines Creative Computing benchmarked, and uses greater math accuracy than most of them. 

I liken it to the V8 Vega a friend of mine had as a teenager.  Yeah, it was a Vega, but think how fun it was to see people's faces when it stomped on their sports car in a drag race.  I believe the term is sleeper.

Tuesday, March 27, 2018

Taking advantage of the new MC-10 ROM Part 2

You really don't have to do anything else software wise to speed up your code with the new ROM.  It speeds up everything around 5%.  However, there are some things you can do to make programs faster, most of which work with all MC-10 BASICs. 

Using ELSE will make some code a little faster, especially if you make the default condition the most often executed condition.

Example:
IF A=1 most.executed ELSE everything.else

You may have to reverse the logic:
IF A <= 1 least.executed ELSE most.executed

Becomes:
IF A > 1 most.executed ELSE least.executed

The reason for this is that the interpreter has to spend less time dealing with new lines, searching for lines, and less time searching for the end of a line.  The difference is small, but it adds up.
In this case, the > requires slightly less parsing than <= so that also makes the code slightly faster.


Things that will speed up all versions of MC-10 BASIC.

Declare your variables at the top of the program in order from most used to least used.  Just assign them to zero or whatever.  This places them in the variable table in the order they are created, which makes the most used variables quicker to find when BASIC has to search through the table.  This is something common to all Microsoft BASICs I've looked at.  It can speed up some programs by several percent.
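
To see why creation order matters, here is a toy Python model of a table that is searched from the front.  The variable names and usage counts are invented for the example, and "cost" is simply the number of entries examined, not a cycle count.

```python
def lookup_cost(table, name):
    """Entries examined before a front-to-back search finds the variable."""
    return table.index(name) + 1


def total_cost(table, uses):
    """Total entries examined for a given number of lookups per variable."""
    return sum(lookup_cost(table, name) * count for name, count in uses.items())


# Hypothetical usage profile: I is a loop counter, T is set once in a while.
uses = {"I": 1000, "X": 200, "T": 10}

print(total_cost(["T", "X", "I"], uses))   # 3410 (least used created first)
print(total_cost(["I", "X", "T"], uses))   # 1430 (most used created first)
```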

Remove all spaces and do not execute lines with REM statements.  This reduces the number of characters BASIC has to parse.  It is especially true of Microcolor BASIC.  Microsoft BASIC has a subroutine in RAM that searches for keyword tokens or other characters that it returns to the code that called it, and it skips spaces.  The MC-10 didn't have enough room in low RAM (the direct page) for all of the code, so it executes part of the code in RAM, then jumps to ROM to test what character was read.  If it's a token or other character it's looking for, it returns that.  If it's a space, it jumps back to the code in RAM to parse the next character.  So each space results in two jumps and multiple tests of the character that aren't necessary if you simply delete the spaces.
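
The cost of spaces can be sketched with a simple jump count in Python.  This is a model of the description above, not a trace of the ROM: `count_jumps` is a made-up name, the token value 0x80 is a placeholder, and the `patched` flag stands for testing characters at or above ':' in RAM before going to ROM.

```python
COLON = ord(":")
SPACE = ord(" ")


def count_jumps(line, patched):
    """Count RAM<->ROM jumps needed to step through a tokenized line."""
    jumps = 0
    for byte in line:
        if patched and byte >= COLON:
            continue          # answered in RAM, no trip to ROM
        jumps += 1            # RAM stub jumps to the ROM tests
        if byte == SPACE:
            jumps += 1        # ROM jumps back to RAM for the next character
    return jumps


with_spaces = bytes([0x80]) + b" A = 10"   # 0x80 stands in for a keyword token
no_spaces = bytes([0x80]) + b"A=10"

print(count_jumps(with_spaces, patched=False))   # 11
print(count_jumps(no_spaces, patched=False))     # 5
print(count_jumps(with_spaces, patched=True))    # 8
print(count_jumps(no_spaces, patched=True))      # 2
```

In this model the spaces remain the expensive part even when the most common test is moved into RAM.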

The patch I posted that can be embedded in your program places the test for the most common item the parser finds (tokens) into RAM so it doesn't have to jump to the ROM as much, but spaces are still costly.  The patch does not work with MCX-BASIC, as that uses the addresses required for the patch.


Hardware changes.

The other thing you can do to take advantage of the new ROM is add a RAM expansion that places RAM immediately following the direct page.  This is hex address $100 on.  This allows the ROM to install the entire parse subroutine into RAM.  If no RAM exists at that address, it only installs the code to check for tokens like the patch.  The more of the subroutine that resides in RAM, the faster the interpreter runs.  Sadly, the Radio Shack RAM expansion does not support this.

To Err is Human, to Invert and Multiply is Divide... or how to take advantage of the optimized MC-10 ROM Part 1.

Supporting the 6803's hardware multiply in the floating point math library provided the single greatest speed improvement of any of the optimizations in the new MC-10 BASIC ROM if a program uses lots of multiplication.  That speeds up a lot of calculations, but math isn't just multiplication.  You also need to divide numbers a lot, and there is no hardware divide instruction on the 6803.

There is a partial solution, and it comes from a shortcut normally used when dividing fractions.
Invert and multiply.  But we aren't dealing with fractions, so how do you invert a number?  Microcolor BASIC represents all numbers with floating point numbers, so fractions are represented as floating point, and to invert a number, you divide 1 by the number.

Example:

Suppose you want to divide 94 by 144. 
Normally, the BASIC code would look something like this:
10 PRINT 94 / 144

But to invert and multiply, 94 can be represented as 94/1 and 144 can be represented as 144/1.  When you invert 144/1 you get 1/144.  When you multiply 94/1 by 1/144, you get 94/144.  Since 94/1 is still 94 that part of the code is the same as the original.
The code now looks like this:
10 C = 1 / 144
40 PRINT 94 * C

This lets us take advantage of the fast 6803 hardware multiply used in the optimized floating point library, but as I said, this is a partial solution.  When you invert 144/1, in order to create the floating point equivalent of 1/144, you are performing a divide.  So now you are performing a divide and a multiply to get the result instead of just a divide.  This is obviously slower, so it will not be faster in every case, including the one above, but that shows how it works.

Repeatedly dividing by a number is where we can take advantage of the faster multiply.  How much faster the code is will depend on how many divides there are.  This approach may be faster with as few as 2 divides given how slow the floating point divide is compared to the multiply, but you need to benchmark the code to be sure if there are a small number of divides.  If the divide is inside of a loop, the savings could be significant.
Example:

Original code:
10 FOR I = 1 TO 1000
20 PRINT I / 7
30 NEXT I

Becomes:
0  C = 1 / 7
10 FOR I = 1 TO 1000
20 PRINT I * C
30 NEXT I
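
The break-even point can be worked out with a small cost model, sketched here in Python.  The cost figures are hypothetical relative units, not measured 6803 timings, and the function names are invented for the example.  Dividing n times costs n*D, while inverting once and multiplying n times costs D + n*M, so the second approach wins once n*D > D + n*M.

```python
def divide_cost(n, div_cost):
    """Cost of performing n divides."""
    return n * div_cost


def reciprocal_cost(n, div_cost, mul_cost):
    """Cost of one divide to build 1/x, followed by n multiplies."""
    return div_cost + n * mul_cost


def break_even(div_cost, mul_cost):
    """Smallest n for which invert and multiply is strictly cheaper."""
    if mul_cost >= div_cost:
        raise ValueError("multiply must be cheaper than divide")
    n = 1
    while reciprocal_cost(n, div_cost, mul_cost) >= divide_cost(n, div_cost):
        n += 1
    return n


# Hypothetical relative costs, chosen only to show the shape of the trade-off.
print(break_even(div_cost=3, mul_cost=1))   # 2
print(break_even(div_cost=3, mul_cost=2))   # 4
```

With n = 2 the condition reduces to D > 2*M, which is why two divides can be enough when the divide is more than twice as slow as the multiply.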

Monday, March 26, 2018

What good is a new MC-10 ROM? (reply to CoCo Crew Podcasts)

My replacement for Microcolor BASIC has had some mentions on the CoCo Crew Podcast in recent episodes.  This may answer a couple of questions they had.

Why did I start this?  The project arose for a couple of reasons, and no, it's not nostalgia.  I didn't own an MC-10 until a few years ago.  I wanted to enter the retro challenge (why I started this blog), and I had claimed on a forum that the MC-10 is more powerful than people give it credit for CPU wise... or something like that.  Someone mocked me for that.  Challenge accepted!  Plus this sort of thing is fun for me.  I created a commented disassembly of the Amiga exec back in the 90s.

Why would you want it?  If you want to run BASIC games on the MC-10, or you want to program the MC-10 in BASIC, programs will run faster.  It also adds the ELSE statement which makes it easier to port programs from other machines including from CoCo Color BASIC. Since a lot of programs never appeared on the MC-10, you can be the first to port them.

Okay, so there's a new ROM, how can we use it?
1)  The most obvious is that you can use it in an emulator.  This is how I test the ROM.  Just tell the emulator to load the ROM image instead of the original.  This is also the best way to write BASIC programs for the MC-10.  If you don't have to type them in on that little keyboard things become a lot easier.
2) The MC-10 hardware was designed so that external hardware can disable the internal CPU address decoding, including the ROM.  You can put the new ROM onto an expansion board that includes a socket for one, or replace internal ROM with RAM where supported to load the ROM image into RAM.
3) You can replace the system ROM inside the machine.  Most MC-10s have the ROM soldered to the motherboard, but there are a few (like mine) that have the ROM socketed.  You can desolder the ROM and install a socket if it doesn't already have one, then plug in a new ROM. 
You could have the ROM desoldered for you at CoCo Fest. 
At the same time, you could also have the CPU socketed for a future 6303 upgrade.  The 6303 upgrade requires a new ROM.  Getting the shielding removed to access the ROM and CPU also makes it easier to upgrade the internal RAM to take advantage of the highest resolution 6847 graphics modes.

FWIW, the optimizations I've made are mostly machine or Motorola specific.  You can't speed up the 6502 or Z80 versions of Microsoft BASIC the same way.  Microcolor BASIC on the MC-10, and Color BASIC on the CoCo weren't optimized much over the 6800 version they seem to be derived from.

Why the MC-10 ROM and not the CoCo ROM?   There's less software to break, and the MC-10 needs some love.

Thursday, March 15, 2018

MAME MC-10 HD6303 support is on hold

It seems the MAME crew is rewriting a lot of stuff again and it makes no sense to fix any code that is just going to be broken by the rewrite.  The design of the system is still a bit C oriented anyway.  The HD6303 object should be able to inherit from the MC6801/3 object which should be able to inherit from the 6800 object... and then just overload things that have changed.  The 6800/6801/6303 stuff hasn't quite made that transition yet.  The built in I/O and RAM aren't even in the 6801/6303 objects, they're part of the code for each machine.

I'll hack together a version of MAME using the timing table of the 6303 in place of the 6803 so I can perform some benchmarks and make a video or two, but until I see what changes are being made to MAME, there isn't much point in doing anything more.  
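
The inheritance chain described above might look something like the following Python sketch.  It is not MAME code (MAME is C++) and the class and method names are invented; each class only lists a tiny subset of instructions to show where a derived CPU would overload the parent.

```python
class M6800:
    """Base CPU: owns the common instruction set."""

    def instructions(self):
        return {"LDAA", "STAA", "JMP"}      # tiny illustrative subset

    def has_internal_ram(self):
        return False                        # plain 6800 has no on-chip RAM


class M6801(M6800):
    """6801/6803: everything the 6800 does plus its additions."""

    def instructions(self):
        return super().instructions() | {"MUL", "PSHX", "PULX"}

    def has_internal_ram(self):
        return True                         # on-chip RAM belongs in the CPU object


class HD6303(M6801):
    """6303: inherits the 6801 behavior and overloads only what changed."""

    def instructions(self):
        return super().instructions() | {"XGDX", "SLP"}


cpu = HD6303()
print("MUL" in cpu.instructions())    # True
print(cpu.has_internal_ram())         # True
```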

Thursday, March 8, 2018

MAME MC-10 support update part 2

The ROM performs the memory test, copies setup code to RAM (that isn't there), returns and jumps into never never land because the stack pointer was never set due to loading it from page zero.
Turns out the HD6301/3 support in MAME is incomplete, it doesn't include the built in hardware and RAM. 
There were several oversights in my code prior to last night's work, so it would have had problems anyway, but seriously?  Someone couldn't take the time to inherit this from the 6801 emulation?
It's already there.  I'll look at fixing this in the next day or two.

One of the problems with MAME is that they use a lot of macros to "simplify" supporting new systems, but it has the side effect of keeping developers that aren't familiar with them from seeing what is going on, or more appropriately, what is going wrong.  Once I got around that hurdle, it was pretty easy to finish the HD6303 support in the MC-10 code.  I still have to fix an issue with the Alice support this caused, but I'll cross that bridge when I get to it.  Perhaps the most difficult part will be getting the MAME devs to accept these changes.

Saturday, March 3, 2018

MAME MC-10 support update

I just spent several hours setting up a build environment for the MAME/MESS emulator.
It took a couple times to get it working properly.  Not sure what went wrong with the first install. 
Building it took 3 hours on my Intel i7 quad core laptop.  One thing MAME isn't, is small and I need a faster hard drive.

Adding HD6303 support for the MC-10 took about 2 hours.  Half of that was finding examples to make the changes, and the rest of the time was needed to figure out what was barfing in the build.  I cut and pasted from the 6309 CoCo3h definition and didn't update the tag on the line that replaces the cpu.  Du-Oh!

A real MC-10 with a 6303 will require a different ROM to fix timing differences, to add the extra 6303 interrupt vector, etc... but that shouldn't be a big deal.  Some testing will be needed before I can submit the changes to the MAME project.  I won't be able to start that until tomorrow night.

MAME may not properly support 6803/6303 internal direct page RAM... but I haven't looked through all of the code yet.

Friday, March 2, 2018

Testing 1... 2... 3...

Testing some code formatting so it doesn't look ugly anymore.




Using http://hilite.me to format the html.
; Simple speed up for MC-10 Microcolor BASIC
; (C) 2018  James Diffendaffer
; May be freely redistributed
; Date:  2/28/2018
; code to patch the CHRGET function on the direct page 
; chrget is used to parse through the BASIC code and gets called a lot.
; by checking the most frequent case in RAM, it saves a 3 clock cycle jmp
; to the remainder of the function in ROM.  

; definitions for the TASM cross assembler needed for 6803 syntax
.MSFIRST           ; Most Significant byte first

#define EQU     .EQU
#define ORG     .ORG
#define RMB     .BLOCK
#define FCB     .BYTE
#define FCC     .TEXT
#define FDB     .WORD
#define END  .END
#define FCS  .TEXT

#define equ     .EQU
#define org     .ORG
#define rmb     .BLOCK
#define fcb     .BYTE
#define fcc     .TEXT
#define fdb     .WORD
#define end  .END
#define fcs  .TEXT

;start of code
 org  $434B   ;1st address after a REM on the first line of code of the program.
       ;this is constant on startup in Microcolor BASIC

 pshx     ;preserve register contents
 psha
 pshb
 ldaa $F6
 cmpa #$7E    ; is it a jmp instruction?
 bne  exit   ; if not we exit
 
 ldd  $F7    ; grab the current address JMP calls
 addd #$0105   ; hopefully this will skip the compare & branch now on the direct page
 subd #$0101   ; - to avoid putting a zero in the code. 
 std  $FC    ; save it at the end of the new code

; ldx  #PATCH   ; get the address of our patch code minus 1 (to avoid using LDD 0,X)
 ;pc relative version of ldx to make code relocatable
 bsr  pcRel+1   ; push PC (avoiding zero byte)
pcRel:
 nop      ; origin for PC-relative indexing
 pulx     ; X = pcRel
 
 ldd  PATCH-pcRel,X    ; get the first two bytes
 std  $F6    ; save them
 ldd  PATCH-pcRel+2,x    ; get the next two
 std  $F8    ; etc...
 ldd   PATCH-pcRel+4,x
 std  $FA
exit:
 pulb     ; restore registers
 pula
 pulx

 rts      ; return to BASIC

; contains the patch
PATCH:
; FCB $F0     ; dummy byte so LDD doesn't have to use LDD 0,X
; org $00F6    ; the address the patch is meant to run at.  Not really needed due to relative branch.
 cmpa      #':'    ; set Z flag if statement separator
 bcs       AA    ; perform more tests if not
 rts       ; return if >= ':' 
AA jmp $E1CC     ; jump to the parser back end.  The address can be dropped 
        ; - because we copy the current one plus 4 now

 end
 

Thursday, March 1, 2018

Final BASIC code for CHRGET patch.

The relocatable version using a string variable.
RUN it.  It will stop after POKEing the code into the string.
Then delete lines 1, 2, 3, and 5.
Save the program and then every time you load it, the patch will already be in the string.
You won't have to wait for it to be READ from the DATA and POKED into memory, it will just find the address of the string and EXECute the code.

0 Z$="01234567890123456789012345678901234567890123456"
1 DATA 60,54,55,150,246,129,126,38,26,220,247,195,1,5
2 DATA 131,1,1,221,252,141,1,1,56,236,18,221,246,236,20,221
3 DATA 248,236,22,221,250,51,50,56,57,129,58,37,1,57,126
4 Z=PEEK(VARPTR(Z$)+2)*256+PEEK(VARPTR(Z$)+3)
5 FORI=0TO44:READ A:POKE Z+I,A:NEXT:STOP
6 EXEC Z
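
The assembly comments mention avoiding zero bytes, and a zero byte marks the end of a line in Microsoft BASICs, so code stored inside a line has to stay free of them.  The Python sketch below is just a sanity check of the DATA values listed above: it confirms the byte count matches the FOR loop, that the code fits in the placeholder string, and that no zero byte is present.  The names `DATA` and `PLACEHOLDER` exist only in this example.

```python
# DATA values copied from lines 1-3 of the BASIC loader.
DATA = [
    60, 54, 55, 150, 246, 129, 126, 38, 26, 220, 247, 195, 1, 5,
    131, 1, 1, 221, 252, 141, 1, 1, 56, 236, 18, 221, 246, 236, 20, 221,
    248, 236, 22, 221, 250, 51, 50, 56, 57, 129, 58, 37, 1, 57, 126,
]
# The string from line 0 that the code is POKEd into.
PLACEHOLDER = "01234567890123456789012345678901234567890123456"

code = bytes(DATA)
print(len(code), len(PLACEHOLDER))   # 45 47
print(0 in code)                     # False
print(code[:3].hex())                # 3c3637 (pshx, psha, pshb)
```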

Position independent code version of MC-10 patch.

Darren from the Yahoo MC-10 group suggested this change.
The patch code is relocatable now in case the normal start address of a BASIC program is different on some machine. 

The BASIC code has the new DATA, but the POKE and EXEC are still position dependent.


0 REM01234567890123456789012345678901234567890123456
1 DATA 60,54,55,150,246,129,126,38,26,220,247,195,1,5
2 DATA 131,1,1,221,252,141,1,1,56,236,18,221,246,236,20,221
3 DATA 248,236,22,221,250,51,50,56,57,129,58,37,1,57,126
4 FORI=0TO44:READ A:POKE 17227+I,A:NEXT:STOP
5 EXEC17227



0001   0000             ; Simple speed up for MC-10 Microcolor BASIC
0002   0000             ; (C) 2018  James Diffendaffer
0003   0000             ; May be freely redistributed
0004   0000             ; Date:  2/28/2018
0005   0000             ; code to patch the CHRGET function on the direct page
0006   0000             ; chrget is used to parse through the BASIC code and gets called a lot.
0007   0000             ; by checking the most frequent case in RAM, it saves a 3 clock cycle jmp
0008   0000             ; to the remainder of the function in ROM.  
0009   0000             
0010   0000             ; definitions for the TASM cross assembler needed for 6803 syntax
0011   0000             .MSFIRST        ; Most Significant byte first
0012   0000             
0013   0000             #define EQU     .EQU
0014   0000             #define ORG     .ORG
0015   0000             #define RMB     .BLOCK
0016   0000             #define FCB     .BYTE
0017   0000             #define FCC     .TEXT
0018   0000             #define FDB     .WORD
0019   0000             #define END .END
0020   0000             #define FCS .TEXT
0021   0000             
0022   0000             #define equ     .EQU
0023   0000             #define org     .ORG
0024   0000             #define rmb     .BLOCK
0025   0000             #define fcb     .BYTE
0026   0000             #define fcc     .TEXT
0027   0000             #define fdb     .WORD
0028   0000             #define end .END
0029   0000             #define fcs .TEXT
0030   0000             
0031   0000             ;start of code
0032   434B              org $434B ;1st address after a REM on the first line of code of the program.
0033   434B              ;this is constant on startup in Microcolor BASIC
0034   434B             
0035   434B 3C          pshx ;preserve register contents
0036   434C 36          psha
0037   434D 37          pshb
0038   434E 96 F6        ldaa $F6
0039   4350 81 7E        cmpa #$7E ; is it a jmp instruction?
0040   4352 26 1A        bne exit ; if not we exit
0041   4354             
0042   4354 DC F7        ldd $F7 ; grab the current address JMP calls
0043   4356 C3 01 05    addd #$0105 ; hopefully this will skip the compare & branch now on the direct page
0044   4359 83 01 01    subd #$0101 ; - to avoid putting a zero in the code. 
0045   435C DD FC        std $FC ; save it at the end of the new code
0046   435E             
0047   435E             ; ldx #PATCH ; get the address of our patch code minus 1 (to avoid using LDD 0,X)
0048   435E              ;pc relative version of ldx to make code relocatable
0049   435E 8D 01        bsr pcRel+1 ; push PC (avoiding zero byte)
0050   4360             pcRel:
0051   4360 01          nop ; origin for PC-relative indexing
0052   4361 38          pulx ; X = pcRel
0053   4362             
0054   4362 EC 12        ldd PATCH-pcRel,X ; get the first two bytes
0055   4364 DD F6        std $F6 ; save them
0056   4366 EC 14        ldd PATCH-pcRel+2,x ; get the next two
0057   4368 DD F8        std $F8 ; etc...
0058   436A EC 16        ldd PATCH-pcRel+4,x
0059   436C DD FA        std $FA
0060   436E             exit:
0061   436E 33          pulb ; restore registers
0062   436F 32          pula
0063   4370 38          pulx
0064   4371             
0065   4371 39          rts ; return to BASIC
0066   4372             
0067   4372             ; contains the patch
0068   4372             PATCH:
0069   4372             ; FCB $F0 ; dummy byte so LDD doesn't have to use LDD 0,X
0070   4372             ; org $00F6 ; the address the patch is meant to run at.  Not really needed due to relative branch.
0071   4372 81 3A        cmpa      #':' ; set Z flag if statement separator
0072   4374 25 01        bcs       AA ; perform more tests if not
0073   4376 39          rts ; return if >= ':' 
0074   4377 7E E1 CC    AA jmp $E1CC ; jump to the parser back end.  The address can be dropped 
0075   437A              ; - because we copy the current one plus 4 now
0076   437A             
0077   437A              end
0078   437A              tasm: Number of errors = 0