My on again off again blog about whatever computer related hobby projects I happen to be working on at the moment.
Wednesday, December 12, 2018
Apple II vs MC-10
The images look very different, but the only difference is the scale and center of the image. They are performing the same number of calculations and setting the same number of pixels. Once I add hi-res graphics support this will be more obvious.
New CoCo 3 ROM
For those of you that want a new CoCo 3 ROM, I've turned my current code over to William Astle.
Pretty sure he'll be able to crank out the 6809 changes faster than I would since I haven't been working with the 6809 much. This will also leave me to work on a few other optimizations I had in mind.
Monday, December 10, 2018
Finding Prime Numbers
Another speed comparison, finding prime numbers up to 10,000.
Thursday, December 6, 2018
CoCo 3 vs MC-10 with new ROM
Up to this point I've shown comparisons of the new vs old MC-10 ROM. This video demonstrates the difference in speed between the CoCo 3 running at double speed vs the MC-10 with the new ROM. This is the difference the hardware multiply makes, along with a better implementation of the interpreter. The CoCo 3 will show a greater difference on non-math oriented programs, but this provides a rather interesting comparison. If I were to replace the MC-10's Motorola 6803 with a Hitachi 6303, and if it provided even a 10% speed increase, the MC-10 would win in this comparison. The 6309 provides at least a 20% speedup over the 6809 so that is a very real possibility.
Wednesday, December 5, 2018
Some last minute notes on the new MC-10 ROM
Just some notes on last minute changes to the new MC-10 ROM.
The 16 bit string compare may be a few clock cycles slower when comparing strings of 1-3 bytes in length(?), but anything longer will be faster. The slower performance on short strings will be more than made up for by the faster interpreter, but the difference on longer strings is significant enough to make this change well worthwhile. The new interpreter will always be faster, but should be noticeably so when comparing a lot of long strings like with my test sort code (which I'll release shortly).
The way the main loop calls the functions associated with tokens has been changed.
The size of the "Command Dispatch Table" has been enlarged to account for all potential tokens. This allowed the removal of the range check before calling the address in the Command Dispatch Table, saving a few clock cycles. Tokens that would result in a syntax error now jump directly to the error, and undefined tokens jump to the RAM hook "RVEC10", which is set to jump to syntax error by default. This lets the main loop skip the jump to the RAM hook for every ROM based token routine, saving multiple instructions for every token executed, but you can still intercept unused tokens to extend the ROM.
While existing programs that override tokens will not work on the new ROM (I can only think of one), it is still possible to extend BASIC, and alternative versions of existing commands could replace the current ones. I'm working on a way to embed new tokens and parameters directly into the code.
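To make the dispatch change easier to picture, here is a rough Python model of it. This is only a sketch, not the ROM code: the class, the token numbers, and the assumption that tokens occupy $80-$FF are all mine for illustration. The point is that every possible token has a table slot, so no range check is needed, and only undefined tokens take the extra jump through the RVEC10 hook.

```python
def syntax_error():
    """Stand-in for the ROM's syntax error routine."""
    return "?SN ERROR"


class Dispatcher:
    FIRST_TOKEN = 0x80  # assumption: tokens occupy $80-$FF

    def __init__(self):
        # RAM hook: points at the syntax error by default, can be repointed.
        self.rvec10 = syntax_error
        # One slot per possible token, so dispatch needs no range check.
        self.table = [self._undefined] * 128

    def _undefined(self):
        # Only undefined tokens pay for the extra jump through the hook.
        return self.rvec10()

    def define(self, token, handler):
        self.table[token - self.FIRST_TOKEN] = handler

    def dispatch(self, token):
        return self.table[token - self.FIRST_TOKEN]()


# Hypothetical token numbers, chosen only for this example.
TOKEN_PRINT = 0x86   # a normal ROM command
TOKEN_TO = 0xA0      # a token that is never valid as a statement
TOKEN_UNUSED = 0xE0  # a token the ROM does not define

d = Dispatcher()
d.define(TOKEN_PRINT, lambda: "PRINT")
d.define(TOKEN_TO, syntax_error)  # jumps straight to the error, no hook
```

Repointing `d.rvec10` at a new routine is the model's equivalent of extending BASIC through the RAM hook: only the undefined tokens are affected.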
*edit*
Please note that tokens $F0-$FF cannot be used at this time. The token number is multiplied by 2 to calculate the offset in the dispatch table. This is done with a bit shift, but the result is only one byte and the carry is lost. It's a trade-off for speed. It still leaves room for 38 new tokens, and I'd like to reserve $EF for future expansion. (Two byte tokens could be a possibility in the future.)
Monday, November 26, 2018
MC-10 ROM fixes before a release
The bug in the code that converts ASCII to floating point by multiplying by 1/10 instead of dividing by 10 is now fixed. This change offers a significant speedup, so I'm happy that code will make it into the release. That leaves one bug to fix before an official release. That bug is likely to be in the math library, and I'm pretty sure I know what it is. Once I get time it shouldn't take too long to fix. The first release will definitely be a 16K ROM, but an 8K ROM with most of the code may follow if it will fit. I am not going to fight with it to get that working though.
You can see the bug in my YouTube video with the sort comparison. The gap values are different between the original and new ROM. That is no longer the case. FWIW, the constant for 1/10 was wrong and it was the first thing I checked, so... easy fix.
Thursday, November 22, 2018
Breaking the chains... of the 8K barrier
This does not include the code that uses multiplication to convert ASCII numbers to binary instead of division, and the code that stores the current line pointer instead of the current line number.
Wednesday, November 21, 2018
Latest MC-10 ROM snapshot
The video is still uploading, but here is a snapshot of the difference in speed between the original MC-10 ROM and new one on the "Fedora" 3D Plot. The SIN(), COS(), TAN(), and string compare are faster.
I'd make a video of a large sort to show off the string compare, but it's not as dramatic and would take several minutes to watch.
Friday, November 16, 2018
String variable test/sort code (BASIC)
Here is a little test program I threw together to benchmark the 16 bit string compare code I'm preparing for a future ROM version. A little bonus here is a sort implementation you may not have seen before.
Back in the 90s I was working on a project I inherited from someone else. It involved nightly data dumps from a production system to a local server that made the data accessible to customers via modem, using custom software. The process was rather labor intensive. The operator had to run the data extraction and ftp the data to a PC, where it was imported by a custom VB application that required multiple starts, executing other programs, then continuing, etc... It was impossible to automate the way it was.
Upon investigation, the original programmer did not know how to implement a sort, so he was using an external utility. (he also didn't seem to know how to take a shower... but that's another topic)
After a quick search on Yahoo, no VB sorts turned up (I think the search engine was undergoing maintenance), so I threw in a bubble sort and went home. Some 12 hours later, the sort was still going so I killed it, wrote a more optimal version, and on the first try it took about 18 minutes to sort the 80,000+ records, and it didn't require human intervention so it could be launched automatically.
Anyway, a more polished version of that sort is below. This has some optimizations that weren't in the original as it's aimed at the Microsoft BASIC interpreter.
The sort is, at the very least, a really improved version of the bubble sort. It functions a bit like a comb or shell sort, so I'm not sure what you'd call it. It's easy to implement, pretty fast, and it's fairly small. Taking a hint from the comb sort, dividing the gap by two may not be the most efficient, but it works pretty well for something I came up with off the top of my head in my 20s.
The data comes from a list of the 1000 most common surnames, and it is truncated here for space. The data includes a few other numbers with it that I didn't waste time removing. With all the additional data, only 894 names fit in memory, but that provides a really good workout for the interpreter and sort. Sorting takes over 10 minutes on a regular MC-10, but I'm hoping the new string comparison cuts that by at least 20%, which is about what you can expect from unrolling a loop once.
0 I=0:N=1000:C=0:D=0:E=0:NW=0
'initialize the index array, and read in the names. Skip the other data.
'scans through the data to determine the array sizes before dimensioning the arrays.
'The most heavily used variables are initialized on the first line to ensure they are
' at the start of the variable table
'
10 I=0:J=0:G=0:N=0:Q=0
15 PRINT"SCANNING FOR END OF DATA"
20 READ K$,C,D,E:IF K$<>"" THEN N=N+1:GOTO 20
30 RESTORE
40 DIM A$(N),B(N)
45 PRINT"INITIALIZING ARRAYS"
50 FORI=1TON:READ A$(I),C,D,E:B(I)=I:NEXT
60 PRINT:PRINT N" MOST POPULAR SIR NAMES FROM MOST TO LEAST POPULAR"
70 GOSUB 2000
80 PRINT:PRINT "SORTING"N"NAMES":GOSUB 10000
90 PRINT:PRINT N" MOST POPULAR SIR NAME IN ALPHABETICAL ORDER"
95 GOSUB 2000
96 END
'print out the names based on the array index
2000 FOR I=1 TO N:PRINT A$(B(I))" ";:NEXT:RETURN
'optimized bubble(?) sort.
'Moves data as far as possible with each Middle loop
'Skips data that doesn't need sorted with each new pass on inner loop.
'P = what we divide the array size by to get the gap
'G = the gap
'using nested for loops avoids having to perform a line search that would
'happen with GOTO. The address is simply pulled off of the stack.
'Using STEP 0 keeps FOR NEXT from changing the variable and the endless loop
'exits once the exit condition is achieved
10000 P=1:FORG=2TO1STEP0:P=P*2:G=INT(N/P):NW=N-G:PRINTG;
10010 FORQ=1TO0STEP0:Q=0:FORI=1TONW
10020 IFA$(B(I))>A$(B(I+G))THENT=B(I):B(I)=B(I+G):B(I+G)=T:Q=I
10030 NEXT:IFQ>0THENNW=Q
10040 NEXT:NEXT:RETURN
'1000 MOST POPULAR SIR NAMES FROM MOST TO LEAST POPULAR
100 DATA SMITH,2501922,1.006,1,JOHNSON,2014470,.81,2,WILLIAMS,1738413,.699,3
101 DATA JONES,1544427,.621,4,BROWN,1544427,.621,5,DAVIS,1193760,.48,6
...
1000 DATA "",0,0,0 :REM END OF DATA MARKER
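For anyone who wants to play with the sort off the MC-10, here is a rough Python model of lines 10000-10040. It is a sketch of my reading of the BASIC, not a replacement for it: the function name is made up, indexes are 0-based instead of 1-based, and the two STEP 0 loops are written as plain while loops. Like the BASIC, it sorts an index array rather than moving the strings. It also returns early for fewer than two items, which the BASIC does not guard against.

```python
def gap_sort_order(keys):
    """Return the indexes of keys in ascending key order (keys is not modified)."""
    n = len(keys)
    order = list(range(n))          # B(I)=I in the BASIC version
    if n < 2:
        return order                # nothing to sort
    p = 1
    while True:                     # FOR G=2 TO 1 STEP 0
        p *= 2
        gap = n // p                # G=INT(N/P)
        limit = n - gap             # NW=N-G
        while True:                 # FOR Q=1 TO 0 STEP 0
            last_swap = 0           # Q=0
            for i in range(limit):
                if keys[order[i]] > keys[order[i + gap]]:
                    order[i], order[i + gap] = order[i + gap], order[i]
                    last_swap = i + 1   # Q=I (1-based position of the swap)
            if last_swap == 0:
                break               # a pass with no swaps ends this gap
            limit = last_swap       # skip the tail that did not need a swap
        if gap == 1:
            break                   # the gap of 1 is the final pass
    return order


names = ["SMITH", "JOHNSON", "WILLIAMS", "JONES", "BROWN", "DAVIS"]
order = gap_sort_order(names)
sorted_names = [names[i] for i in order]
```

The final gap of 1 is an ordinary bubble sort that remembers the last swap position, so the result is always fully sorted. The earlier, larger gaps only exist to move items most of the way first.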
Warning about using TASM for 6803 development
TASM (the Telemark Assembler, not the other TASM assembler) does not always warn you if the range of an 8 bit relative branch is exceeded. You can only detect that the wrong code was generated by looking at the .LST file and checking the actual offsets by hand, or by running the program in the VMC10 emulator together with the list file. The emulator will then halt on the bad code and tell you in the LST window that the range has been exceeded. This happened to me on a BRA instruction this morning and made me very unhappy.
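If you end up checking offsets by hand, the rule is simple enough to script. This is a small sketch with made-up function names, not part of any assembler. It relies on the fact that a 6803 relative branch is two bytes long and its signed 8 bit offset is measured from the address of the instruction that follows it.

```python
BRANCH_LENGTH = 2  # opcode byte plus one offset byte


def branch_offset(branch_addr, target_addr):
    """Signed distance from the end of the branch instruction to the target."""
    return target_addr - (branch_addr + BRANCH_LENGTH)


def branch_in_range(branch_addr, target_addr):
    """True if the target can be reached with a signed 8 bit offset."""
    return -128 <= branch_offset(branch_addr, target_addr) <= 127


# Example addresses are arbitrary.
ok = branch_in_range(0x4000, 0x4081)       # offset +127, just fits
too_far = branch_in_range(0x4000, 0x4082)  # offset +128, out of range
```

Any branch where `branch_in_range` comes back False needs to be turned into a JMP, or into an inverted branch around a JMP.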
Tuesday, November 13, 2018
SIN() in please! (MC-10 ROM changes)
The new MC-10 ROM has been sitting untouched and incomplete for far too long, so I worked on it a few hours.
- Miscellaneous fixes such as only adding a colon in front of the ELSE statement during tokenization if the character before it isn't already a colon. It would add one no matter what before. The colon is required for the ELSE to work.
- A redundant floating point register load was eliminated.
- The variable that saves the current line number was changed to the current line pointer. This was the easiest way to implement the feature, and it only took about an hour. All that needed changed was the code that sets or reads the variable, and code that needs the current line number. This will only be faster once I store both, but it was the quickest way to get it working since it didn't require significant stack code changes. The code that prints the line number on BREAK now prints the address of the line, but the fix shouldn't be difficult once I have time to work on it. A side benefit of this change was the elimination of some unnecessary code, and it's close to working in 8K again.
- The most significant change is that the SIN() function now multiplies by the reciprocal instead of dividing by 2*Pi (invert and multiply), so it multiplies by 1/(2*Pi). The code works well enough to perform some initial tests, which show this offers a significant speedup. This also speeds up COS(), since that is calculated with SIN(n+(Pi/2)), and TAN(), which is calculated with SIN(n) / COS(n). The code does not work for all cases though, so there's some work to be done yet, but it works well enough that I could make the video below.
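Here is the SIN() idea as a short Python sketch. It only illustrates the invert and multiply trick and the way COS() and TAN() are built on SIN(). The function names are mine, and math.sin stands in for whatever series the ROM evaluates after the argument has been reduced.

```python
import math

# Computed once, so every call multiplies instead of dividing by 2*Pi.
INV_TWO_PI = 1.0 / (2.0 * math.pi)


def reduce_to_turn(x):
    """Return the fraction of a full circle (0 <= result < 1) that x represents."""
    turns = x * INV_TWO_PI          # multiply by the reciprocal, no divide
    return turns - math.floor(turns)


def sin_fast(x):
    # math.sin is a placeholder for the ROM's own approximation.
    return math.sin(reduce_to_turn(x) * 2.0 * math.pi)


def cos_fast(x):
    return sin_fast(x + math.pi / 2.0)   # COS(n) = SIN(n+(Pi/2))


def tan_fast(x):
    return sin_fast(x) / cos_fast(x)     # TAN(n) = SIN(n) / COS(n)
```

Because COS() and TAN() both call SIN(), one faster reduction step speeds up all three functions.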
This is the previous video showing the speed of the new ROM vs the factory ROM:
This is the ROM with the SIN() code change:
Saturday, October 13, 2018
Article 'A Great Old-Timey Game-Programming Hack', and a response
Here's a nice little story related to the 6809. It shows off one of the more interesting optimizations you can use with the 6809. It's also neat to see that people came up with similar solutions completely isolated from each other.
Link
My MC-10 (6803) 64 column graphics text code's screen scroll also uses the stack register as the destination pointer for similar reasons, but there are differences vs the 6809.
Each register PUSHed or PULLed requires a separate instruction, where the 6809 can PUSH or PULL multiple registers with a single instruction. As a result, the 6803 code looks more like their earlier code.
With only one stack pointer, you have to use the index register for the other source or destination pointer, and the offset is only 1 byte, so you can only go up to 254 with LDD #,X before you have to change X. The code looks like this, and it's unrolled for a 256 byte section of the screen:
LDD 254,x ; 2 bytes, 5 clock cycles
PSHB ; 1 byte, 3 clock cycles
PSHA ; 1 byte, 3 clock cycles
LDD 252,x ; 2 bytes, 5 clock cycles
PSHB ; 1 byte, 3 clock cycles
PSHA ; 1 byte, 3 clock cycles
etc...
LDX ROWADDRESS+254 ; 3 bytes, 5 clock cycles
PSHX ; 1 byte, 4 clock cycles
LDX ROWADDRESS+252
PSHX
LDX ROWADDRESS+250
PSHX
etc...
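Each LDD/PSHB/PSHA group takes 5 + 3 + 3 = 11 clock cycles per pair of bytes moved and each LDX/PSHX group takes 5 + 4 = 9, so using PSHX saves 11 - 9 = 2 clock cycles per pair of bytes, or 2 * ((32/2)*(192-8)) = 5,888 clock cycles per scroll. Both versions need 4 bytes of code per pair of bytes moved.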
So why didn't I do that?
That may not be a big deal if you have a large RAM expansion, but it's not practical for most MC-10s. However, if you wanted to implement 4 rows of text at the bottom of the screen similar to the Apple II and several other 8 bit machines, then it's not so bad.
The latest code generates the scroll code on the fly at startup, so I could generate either version of the code depending on the hardware you have. We'll see.
Thursday, September 20, 2018
'David Patterson Says It’s Time for New Computer Architectures and Software Languages'
Just a little commentary by David Patterson on computing in a post Moore's Law world.
https://spectrum.ieee.org/view-from-the-valley/computing/hardware/david-patterson-says-its-time-for-new-computer-architectures-and-software-languages
Tuesday, September 11, 2018
The search for a 6800 based computer with bitmapped graphics.
My curiosity as to the speed of the MC6800 processor has been piqued, so I did some searching for 6800 or compatible machines with bitmapped graphics. It seems the 6800 didn't have much of an impact on the more personal side of personal computers.
- Most of the machines based on the CPU don't seem to have any graphics at all. They are just text based FLEX systems.
- There is a French FLEX system that offers a graphics board option, but I couldn't find programming info on the graphics board. Even if I could, it would probably be in French.
- The Dream 6800 doesn't offer high enough resolution graphics to be of use.
- The Panasonic JR-200 uses a Japanese version of the 6802, which is 6800 compatible, but its graphics are made with user definable characters.
- The APF Imagination Machine has a 6847 VDP like other machines I support, but it's not set up as a true bitmap. The hardware is set up to display hi-res objects, and there is limited RAM to hold those objects.
- The 6803 will run 6800 code, but some instructions have different timing and it would require adding NOP instructions, or changing some instructions to try to match the 6800 speed.
- An emulator could be written just for this, but it seems like a lot of work unless there's some actual hardware the code could run on.
I'm open to suggestions if anyone knows a system that could be used. 256x192 resolution graphics would be the best option.
Thursday, August 30, 2018
and the processor wars discussion leads to...
The 'Processor Wars' discussion led to a few changes in the regular bitmap code for the 6803 and Z80.
The 6803 bitmap code isn't as efficient as its Plus/4 screen layout version, but it cut the number of X register swaps in half. This saves 16 clock cycles in the print two character code, and 12,288 clock cycles per screen of characters (16 clock cycles x 32 bytes per row x 24 rows of characters). More importantly, the code is still faster after switching from a 7 byte font to an 8 byte font to support full height graphics characters. Writing single characters does not benefit, but that should only happen at the start or end of a string where two characters are not aligned on a byte. This also puts the .89 MHz 6803 closer to the speed of the 1.77 MHz 65816 Atari, and 3.5 MHz Z80 VZ.
The Z80 code had not been updated to increment the font pointers using byte instead of 16 bit opcodes in spite of the fact that I came up with the optimization years ago. Not sure what happened there, but for some reason it wasn't in the code. This requires aligning the font data so that individual characters do not cross a 256 byte boundary, but it appears to cut 2 clock cycles per register increment, and there are 14 of those per pair of bytes, or 7 when printing single bytes. That's up to 21,504 clock cycles per screen of characters (2 clock cycles * 14 increments * 32 bytes per row * 24 rows of characters).
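The per screen numbers for both processors come from the same multiplication, so here it is as a few lines of Python for anyone who wants to plug in their own cycle counts. The constant and function names are just for this example, and the inputs are the figures quoted above.

```python
BYTES_PER_ROW = 32   # each screen byte holds two 4 pixel wide characters
CHARACTER_ROWS = 24


def cycles_saved_per_screen(cycles_saved_per_byte):
    """Cycles saved when a full screen of character pairs is printed."""
    return cycles_saved_per_byte * BYTES_PER_ROW * CHARACTER_ROWS


saved_6803 = cycles_saved_per_screen(16)      # 16 cycles per pair of characters
saved_z80 = cycles_saved_per_screen(2 * 14)   # 2 cycles x 14 increments per pair
print(saved_6803, saved_z80)
```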
The 6502 code won't benefit from either of these optimizations. It only has a single accumulator, and it doesn't have any 16 bit registers. The 65816 might be able to speed up address calculation, and the larger index registers may help, but it's going to take some time to see if that's possible with the 8/16 bit mode switching. If it is, it will probably mean splitting the 6502 and 65816 versions up to make it easier to maintain the code. It's a bit complex already due to squeezing Atom, Atari, and Commodore versions into the same code base.
*edit*
The 65816 could save two opcodes during the screen address calculation, but it would require at least as many new opcodes for the 8->16->8 bit mode switches.
Wednesday, August 29, 2018
8 bit vs 8 bit (Processor wars) Update
The 6800 code has been updated using a similar optimization as with the 6803 version. Even though the 6800 doesn't support the 16 bit accumulator, the change still cuts the number of index register swaps in half, and the code that writes to the screen is now faster than the 6502 version by 28 clock cycles. If the 6800 address calculation code can be optimized a little, the 6800 may actually beat the 6502 at this. Some self modifying code and a couple tables may be required but it may be possible. The scroll may be slightly faster as well.
To be fair, this screen layout works well for the 680X processors and a standard bitmap may be a little slower. If I were to create a custom graphics chip for the 680X series, I'd include at least one screen layout like the Plus/4's simply because it works so well here.
Tuesday, August 28, 2018
8 bit vs 8 bit (Processor wars) Update + 6309 code
The graphics text code for the 6803 has been updated. It has been optimized a bit by loading two bytes from the font at a time with a single LDD instead of just one. It also cuts the number of left and right font pointer changes for the index register in half.
It now takes a total of 92 clock cycles for the 6803 to write a pair of characters to the screen. The 6502 takes 152 clock cycles to do the same thing.
For one full screen of characters, the 6803 now takes 46080 fewer clock cycles to write the characters to the screen vs the 6502. I haven't looked up the cycles for the font address or screen address calculation yet, but the 6803 appears to be faster overall even though the 6502 uses tables. The screen scroll is even worse for the 6502.
Counting white space, the 6803 code is now 63 lines long and the 6502 code is 80 lines long not counting the font and all the extra address tables for the 6502. Total size wise it isn't even close.
The 6303 code will benefit from the same optimization but it should be noted that the push instructions take 1 clock cycle more than the 6803. The difference in clock cycles there is more than made up for with faster code in the address calculation section.
The 6809 code also benefits from this optimization, and it only requires 16 instructions to write the pair of characters to the screen vs 24 for the 6803, and 40 for the 6502.
The 6309 code supports the 16 bit instruction EORD, which drops 4 opcodes vs the 6809. There doesn't appear to be an EOR for the new registers, so we can't use 32 bit loads & pushes like I had hoped, but we are only looking at around 60 clock cycles.
As I stated before, this is based on writing to a graphics memory map like on the Plus/4, or C64.
6309 code to write the 2 characters to the screen:
; print characters to screen
    ldd   6,x     ; get 2 bytes of left character   5
    eord  6,y     ; add the right character         5
    pshu  a,b     ; write to the screen             4
    ldd   4,x     ; get 2 bytes of left character   5
    eord  4,y     ; add the right character         5
    pshu  a,b     ; write to the screen             4
    ldd   2,x     ; get 2 bytes of left character   5
    eord  2,y     ; add the right character         5
    pshu  a,b     ; write to the screen             4
    ldd   ,x      ; get 2 bytes of left character   5
    eord  ,y      ; add the right character         5
    pshu  a,b     ; write to the screen             4
    rts
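As a sanity check on the "around 60 clock cycles" estimate, the cycle counts noted in the listing's comments can simply be added up. A small Python sketch; it only uses the numbers shown in the comments, and the rts is left out because the listing gives no count for it.

```python
# Cycle annotations copied from the listing's comments, four ldd/eord/pshu groups
LISTING_CYCLES = [("ldd", 5), ("eord", 5), ("pshu", 4)] * 4

def annotated_total(listing):
    """Sum the per-instruction cycle counts given in the listing."""
    return sum(cycles for _, cycles in listing)

total = annotated_total(LISTING_CYCLES)  # rts excluded: no count in the listing
print(total)
```

The annotated instructions come to 56 cycles, so the total lands near 60 once the rts is included.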
Sunday, August 26, 2018
64 column text on a 256x192 graphics screen -> 6502 Part 2
;* Copyright (c) 2015, 2016, 2018 James Diffendaffer
;**************************************************
; NAME: print_64
;**************************************************
;* Description:
;* 64 Column text display driver
;* Routine does not print at pixel X,Y but
;* prints at a character position.
;**************************************************
print_64:
    ; register a contains character
    sec
    sbc #' '            ; printable character set data starts at space, ASCII 32
    tax                 ; save as character table offset

.if BytesPerLine = 32
    ; point screen to base screen address + row
    lda #>ScreenAdr     ; lda #$80
    clc
    adc row             ; adding row to MSB = 256 * row
    sta fscreen+1
    ldy #0              ; top line is always black (1st byte of the font)
                        ; start at zero offset from screen address
    ; put the lowest bit in the carry so we know if it's left or right nibble
    lda col             ; 2 columns / byte
    lsr
    bcs rightnibble
.endif

.if BytesPerLine = 40
.if Plus4
    ; col addition requires multiply by 8 due to the byte order of the screen and
    ; divide by 2 for two characters per byte, so col * 4
    ; any col over 64 requires 2 bytes added to screen address anyway, so make another table
    ; calculate screen address based on row and column
    clc
    ldy row             ; put row offset in y
    lda scrtableLSB,y   ; get screen row address LSB from table
    ldy col             ; put col index in y
    adc coltableLSB,y   ; get LSB from column table
    sta fscreen         ; store LSB of screen address
    ldy row             ; put row offset in y
    lda scrtableMSB,y   ; get screen row address MSB from table
    ldy col             ; put col index in y
    adc coltableMSB,y   ; get MSB from column table
    sta fscreen+1       ; store MSB of screen address
    tya                 ; put column in a
    ldy #0              ; top line is always black (1st byte of the font)
                        ; start at zero offset from screen address
    lsr                 ; even or odd column?
    bcs rightnibble
.else
    ldy row             ; put row offset in y
;   clc                 ; carry should be clear from asl
    lda col
    lsr
    adc scrtableLSB,y   ; get LSB offset from table
    sta fscreen         ; store LSB
    lda #0
    adc scrtableMSB,y   ; get MSB offset from table and add carry
    sta fscreen+1       ; store MSB
    ldy #0              ; top line is always black (1st byte of the font)
                        ; start at zero offset from screen address
    lda col
    lsr
    bcs rightnibble
.endif
.endif

;**************************************************
;* left nibble
;**************************************************
leftnibble:
    lda BGColor
    sta (fscreen),y     ; write to the screen
.if Plus4 = 1
    iny                 ; point to next screen byte
.else
    ldy #BytesPerLine*1 ; point to next screen byte
.endif
    lda (fscreen),y
;   eor FCol1,X         ; EOR with the next byte of the font
;   and #%00001111
    eor FCol1,X         ; EOR with the next byte of the font
    sta (fscreen),y
.if Plus4 = 1
    iny                 ; point to next screen byte
.else
    ldy #BytesPerLine*2 ; point to next screen byte
.endif
    lda (fscreen),y
;   eor FCol2,X         ; EOR with the next byte of the font
;   and #%00001111
    eor FCol2,X         ; EOR with the next byte of the font
    sta (fscreen),y
.if Plus4 = 1
    iny                 ; point to next screen byte
.else
    ldy #BytesPerLine*3 ; point to next screen byte
.endif
    lda (fscreen),y
;   eor FCol3,X         ; EOR with the next byte of the font
;   and #%00001111
    eor FCol3,X         ; EOR with the next byte of the font
    sta (fscreen),y
.if Plus4 = 1
    iny                 ; point to next screen byte
.else
    ldy #BytesPerLine*4 ; point to next screen byte
.endif
    lda (fscreen),y
;   eor FCol4,X         ; EOR with the next byte of the font
;   and #%00001111
    eor FCol4,X         ; EOR with the next byte of the font
    sta (fscreen),y
.if Plus4 = 1
    iny                 ; point to next screen byte
.else
    ldy #BytesPerLine*5 ; point to next screen byte
.endif
    lda (fscreen),y
;   eor FCol5,X         ; EOR with the next byte of the font
;   and #%00001111
    eor FCol5,X         ; EOR with the next byte of the font
    sta (fscreen),y
.if Plus4 = 1
    iny                 ; point to next screen byte
.else
    ldy #BytesPerLine*6 ; point to next screen byte
.endif
    lda (fscreen),y
;   eor FCol6,X         ; EOR with the next byte of the font
;   and #%00001111
    eor FCol6,X         ; EOR with the next byte of the font
    sta (fscreen),y
.if Plus4 = 1
    iny                 ; point to next screen byte
.else
.if BytesPerLine = 32
    ldy #BytesPerLine*7 ; point to next screen byte
.endif
.if BytesPerLine = 40
;   clc
    lda fscreen         ; point to next screen byte
    adc #BytesPerLine
    sta fscreen         ; LSB
    lda #0
    adc fscreen+1       ; MSB
    sta fscreen+1
.endif
.endif
    lda (fscreen),y
;   eor FCol7,X         ; EOR with the next byte of the font
;   and #%00001111
    eor FCol7,X         ; EOR with the next byte of the font
    sta (fscreen),y
    rts

;**************************************************
; right nibble
;**************************************************
rightnibble:
    lda BGColor
    sta (fscreen),y     ; write to the screen
.if Plus4 = 1
    iny                 ; point to next screen byte
.else
    ldy #BytesPerLine*1 ; point to next screen byte
.endif
    lda (fscreen),y
;   eor FCol21,X        ; EOR with the next byte of the font
;   and #%11110000
    eor FCol21,X        ; EOR with the next byte of the font
    sta (fscreen),y
.if Plus4 = 1
    iny                 ; point to next screen byte
.else
    ldy #BytesPerLine*2 ; point to next screen byte
.endif
    lda (fscreen),y
;   eor FCol22,X        ; EOR with the next byte of the font
;   and #%11110000
    eor FCol22,X        ; EOR with the next byte of the font
    sta (fscreen),y
.if Plus4 = 1
    iny                 ; point to next screen byte
.else
    ldy #BytesPerLine*3 ; point to next screen byte
.endif
    lda (fscreen),y
;   eor FCol23,X        ; EOR with the next byte of the font
;   and #%11110000
    eor FCol23,X        ; EOR with the next byte of the font
    sta (fscreen),y
.if Plus4 = 1
    iny                 ; point to next screen byte
.else
    ldy #BytesPerLine*4 ; point to next screen byte
.endif
    lda (fscreen),y
;   eor FCol24,X        ; EOR with the next byte of the font
;   and #%11110000
    eor FCol24,X        ; EOR with the next byte of the font
    sta (fscreen),y
.if Plus4 = 1
    iny                 ; point to next screen byte
.else
    ldy #BytesPerLine*5 ; point to next screen byte
.endif
    lda (fscreen),y
;   eor FCol25,X        ; EOR with the next byte of the font
;   and #%11110000
    eor FCol25,X        ; EOR with the next byte of the font
    sta (fscreen),y
.if Plus4 = 1
    iny                 ; point to next screen byte
.else
    ldy #BytesPerLine*6 ; point to next screen byte
.endif
    lda (fscreen),y
;   eor FCol26,X        ; EOR with the next byte of the font
;   and #%11110000
    eor FCol26,X        ; EOR with the next byte of the font
    sta (fscreen),y
.if Plus4 = 1
    iny                 ; point to next screen byte
.else
.if BytesPerLine = 32
    ldy #BytesPerLine*7 ; point to next screen byte
.endif
.if BytesPerLine = 40
;   clc
    lda fscreen         ; point to next screen byte
    adc #BytesPerLine
    sta fscreen         ; LSB
    lda #0
    adc fscreen+1       ; MSB
    sta fscreen+1
.endif
.endif
    lda (fscreen),y
;   eor FCol27,X        ; EOR with the next byte of the font
;   and #%11110000
    eor FCol27,X        ; EOR with the next byte of the font
    sta (fscreen),y
    rts

;**************************************************
; write two characters at once
;**************************************************
print_642:
    ; register a contains character
;   lda (string),y
    sec
    sbc #' '            ; printable character set data starts at space, ASCII 32
    sta firstchar       ; save as character table offset
    iny
    lda (string),y
    sec
    sbc #' '
    sta secondchar

.if BytesPerLine = 32
    ; point screen to $8000 + row (base screen address + row)
    lda #>ScreenAdr     ; lda #$80
    clc
    adc row             ; adding row to MSB = 256 * row
    sta fscreen+1
    ldy #0              ; top line is always black (1st byte of the font)
                        ; start at zero offset from screen address
    ; add the column
    lda col             ; 2 columns / byte
    lsr
    sta fscreen         ; save it
.endif
.if BytesPerLine = 40
    lda row
;   asl                 ; * 2 for word sized table (row max of 25 * 2)
    tax                 ; put offset in x
;   clc                 ; carry should be clear from asl
    lda col
    lsr
    adc scrtableLSB,X   ; get LSB offset from table
    sta fscreen         ; store LSB
    lda scrtableMSB,X   ; get MSB offset from table
    adc #0              ; add carry
    sta fscreen+1       ; store MSB
.endif

twochar:
    lda BGColor
    sta (fscreen),y     ; write to the screen
.if Plus4 = 1
    iny                 ; point to next screen byte
.else
    ldy #BytesPerLine*1 ; point to next screen byte
.endif
    ldx firstchar       ; offset to 1st character
    lda FCol1,X         ; load the next byte of the 1st character
    ldx secondchar      ; offset to 2nd character
    eor FCol21,X        ; add the next byte of the 2nd character
    sta (fscreen),y     ; write it to the screen
.if Plus4 = 1
    iny                 ; point to next screen byte
.else
    ldy #BytesPerLine*2 ; point to next screen byte
.endif
    lda FCol22,X        ; add the next byte of the 2nd character
    ldx firstchar       ; offset to 1st character
    eor FCol2,X         ; load the next byte of the 1st character
    sta (fscreen),y     ; write it to the screen
.if Plus4 = 1
    iny                 ; point to next screen byte
.else
    ldy #BytesPerLine*3 ; point to next screen byte
.endif
    lda FCol3,X         ; load the next byte of the 1st character
    ldx secondchar      ; offset to 2nd character
    eor FCol23,X        ; add the next byte of the 2nd character
    sta (fscreen),y     ; write it to the screen
.if Plus4 = 1
    iny                 ; point to next screen byte
.else
    ldy #BytesPerLine*4 ; point to next screen byte
.endif
    lda FCol24,X        ; add the next byte of the 2nd character
    ldx firstchar       ; offset to 1st character
    eor FCol4,X         ; load the next byte of the 1st character
    sta (fscreen),y     ; write it to the screen
.if Plus4 = 1
    iny                 ; point to next screen byte
.else
    ldy #BytesPerLine*5 ; point to next screen byte
.endif
    lda FCol5,X         ; load the next byte of the 1st character
    ldx secondchar      ; offset to 2nd character
    eor FCol25,X        ; add the next byte of the 2nd character
    sta (fscreen),y     ; write it to the screen
.if Plus4 = 1
    iny                 ; point to next screen byte
.else
    ldy #BytesPerLine*6 ; point to next screen byte
.endif
    lda FCol26,X        ; add the next byte of the 2nd character
    ldx firstchar       ; offset to 1st character
    eor FCol6,X         ; load the next byte of the 1st character
    sta (fscreen),y     ; write it to the screen
.if Plus4 = 1
    iny                 ; point to next screen byte
.else
.if BytesPerLine = 32
    ldy #BytesPerLine*7 ; point to next screen byte
.endif
.if BytesPerLine = 40
;   clc
    lda fscreen         ; point to next screen byte
    adc #BytesPerLine
    sta fscreen         ; LSB
    lda #0
    adc fscreen+1       ; MSB
    sta fscreen+1
.endif
.endif
    lda FCol7,X         ; load the next byte of the 1st character
    ldx secondchar      ; offset to 2nd character
    eor FCol27,X        ; add the next byte of the 2nd character
    sta (fscreen),y     ; write it to the screen
    rts

.if BytesPerLine = 40
;**************************************************
; 80 column address lookup table
;**************************************************
.define scrval ScreenAdr+BytesPerLine*8
.define coltable scrval*0, scrval*1, scrval*2, scrval*3, scrval*4, scrval*5, scrval*6, scrval*7, scrval*8, scrval*9, scrval*10, scrval*11, scrval*12, scrval*13, scrval*14, scrval*15, scrval*16, scrval*17, scrval*18, scrval*19, scrval*20, scrval*21, scrval*22, scrval*23, scrval*24
scrtableLSB: .lobytes coltable
scrtableMSB: .hibytes coltable
.endif

;**************************************************
; HALF WIDTH 4x8 FONT
; Top row is always zero and not stored (336 bytes)
; characters are 4 bits wide and 7 bits high
; (the top row is always blank)
; There are two characters stored in each group of
; 7 bytes. Each byte has bits for one character in
; the high nibble and bits for another in the low nibble
; Font borrowed from Sinclair Spectrum code
;**************************************************
;.align 256
font:
;FCol0:
; .byte $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00
; .byte $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00
; .byte $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00
; .byte $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00
; .byte $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00
; .byte $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00
;FCol20:
; .byte $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00
; .byte $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00
; .byte $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00
; .byte $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00
; .byte $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00
; .byte $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00
FCol1:
    .byte $00, $20, $50, $20, $20, $50, $20, $20, $10, $40, $20, $00, $00, $00, $00, $10
    .byte $20, $20, $20, $70, $50, $70, $10, $70, $20, $20, $00, $00, $00, $00, $00, $20
    .byte $20, $30, $60, $30, $60, $70, $70, $30, $50, $70, $30, $50, $40, $50, $60, $20
    .byte $60, $20, $60, $30, $70, $50, $50, $50, $50, $50, $70, $30, $40, $60, $20, $00
    .byte $20, $00, $40, $00, $10, $00, $10, $00, $40, $20, $10, $40, $60, $00, $00, $00
    .byte $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $10, $20, $40, $50, $60
FCol21:
    .byte $00, $02, $05, $02, $02, $05, $02, $02, $01, $04, $02, $00, $00, $00, $00, $01
    .byte $02, $02, $02, $07, $05, $07, $01, $07, $02, $02, $00, $00, $00, $00, $00, $02
    .byte $02, $03, $06, $03, $06, $07, $07, $03, $05, $07, $03, $05, $04, $05, $06, $02
    .byte $06, $02, $06, $03, $07, $05, $05, $05, $05, $05, $07, $03, $04, $06, $02, $00
    .byte $02, $00, $04, $00, $01, $00, $01, $00, $04, $02, $01, $04, $06, $00, $00, $00
    .byte $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $01, $02, $04, $05, $06
FCol2:
    .byte $00, $20, $50, $70, $70, $10, $40, $20, $20, $20, $70, $00, $00, $00, $00, $10
    .byte $50, $60, $50, $10, $50, $40, $20, $10, $50, $50, $00, $00, $10, $00, $40, $50
    .byte $50, $50, $50, $40, $50, $40, $40, $40, $50, $20, $10, $50, $40, $70, $50, $50
    .byte $50, $50, $50, $40, $20, $50, $50, $50, $50, $50, $10, $20, $40, $20, $50, $00
    .byte $10, $00, $40, $00, $10, $00, $20, $00, $40, $00, $00, $40, $20, $00, $00, $00
    .byte $00, $00, $00, $00, $20, $00, $00, $00, $00, $00, $00, $20, $20, $20, $a0, $90
FCol22:
    .byte $00, $02, $05, $07, $07, $01, $04, $02, $02, $02, $07, $00, $00, $00, $00, $01
    .byte $05, $06, $05, $01, $05, $04, $02, $01, $05, $05, $00, $00, $01, $00, $04, $05
    .byte $05, $05, $05, $04, $05, $04, $04, $04, $05, $02, $01, $05, $04, $07, $05, $05
    .byte $05, $05, $05, $04, $02, $05, $05, $05, $05, $05, $01, $02, $04, $02, $05, $00
    .byte $01, $00, $04, $00, $01, $00, $02, $00, $04, $00, $00, $04, $02, $00, $00, $00
    .byte $00, $00, $00, $00, $02, $00, $00, $00, $00, $00, $00, $02, $02, $02, $0a, $09
FCol3:
    .byte $00, $20, $00, $20, $60, $20, $30, $00, $40, $10, $20, $20, $00, $00, $00, $20
    .byte $50, $20, $10, $20, $50, $60, $60, $10, $20, $50, $20, $20, $20, $70, $20, $10
    .byte $70, $50, $60, $40, $50, $60, $60, $40, $70, $20, $10, $60, $40, $50, $50, $50
    .byte $50, $50, $50, $20, $20, $50, $50, $50, $20, $50, $20, $20, $20, $20, $00, $00
    .byte $00, $30, $60, $30, $30, $20, $70, $30, $60, $60, $30, $50, $20, $50, $60, $20
    .byte $60, $30, $50, $30, $70, $50, $50, $50, $50, $50, $70, $20, $20, $20, $00, $60
FCol23:
    .byte $00, $02, $00, $02, $06, $02, $03, $00, $04, $01, $02, $02, $00, $00, $00, $02
    .byte $05, $02, $01, $02, $05, $06, $06, $01, $02, $05, $02, $02, $02, $07, $02, $01
    .byte $07, $05, $06, $04, $05, $06, $06, $04, $07, $02, $01, $06, $04, $05, $05, $05
    .byte $05, $05, $05, $02, $02, $05, $05, $05, $02, $05, $02, $02, $02, $02, $00, $00
    .byte $00, $03, $06, $03, $03, $02, $07, $03, $06, $06, $03, $05, $02, $05, $06, $02
    .byte $06, $03, $05, $03, $07, $05, $05, $05, $05, $05, $07, $02, $02, $02, $00, $06
FCol4:
    .byte $00, $20, $00, $20, $30, $20, $50, $00, $40, $10, $50, $70, $00, $70, $00, $20
    .byte $50, $20, $20, $10, $70, $10, $50, $20, $50, $30, $00, $00, $40, $00, $10, $20
    .byte $70, $70, $50, $40, $50, $40, $40, $50, $50, $20, $50, $50, $40, $50, $50, $50
    .byte $60, $50, $60, $10, $20, $50, $50, $50, $20, $20, $20, $20, $20, $20, $00, $00
    .byte $00, $50, $50, $40, $50, $50, $20, $50, $50, $20, $10, $60, $20, $70, $50, $50
    .byte $50, $50, $60, $60, $20, $50, $50, $50, $20, $50, $30, $40, $20, $10, $00, $40
FCol24:
    .byte $00, $02, $00, $02, $03, $02, $05, $00, $04, $01, $05, $07, $00, $07, $00, $02
    .byte $05, $02, $02, $01, $07, $01, $05, $02, $05, $03, $00, $00, $04, $00, $01, $02
    .byte $07, $07, $05, $04, $05, $04, $04, $05, $05, $02, $05, $05, $04, $05, $05, $05
    .byte $06, $05, $06, $01, $02, $05, $05, $05, $02, $02, $02, $02, $02, $02, $00, $00
    .byte $00, $05, $05, $04, $05, $05, $02, $05, $05, $02, $01, $06, $02, $07, $05, $05
    .byte $05, $05, $06, $06, $02, $05, $05, $05, $02, $05, $03, $04, $02, $01, $00, $04
FCol5:
    .byte $00, $00, $00, $70, $70, $40, $50, $00, $40, $10, $00, $20, $00, $00, $00, $40
    .byte $50, $20, $40, $50, $10, $50, $50, $20, $50, $20, $00, $00, $20, $70, $20, $00
    .byte $40, $50, $50, $40, $50, $40, $40, $50, $50, $20, $50, $50, $40, $50, $50, $50
    .byte $40, $50, $50, $50, $20, $50, $20, $70, $50, $20, $40, $20, $10, $20, $00, $00
    .byte $00, $50, $50, $40, $50, $60, $20, $50, $50, $20, $10, $50, $20, $50, $50, $50
    .byte $50, $50, $40, $30, $20, $50, $20, $70, $20, $50, $60, $20, $20, $20, $00, $60
FCol25:
    .byte $00, $00, $00, $07, $07, $04, $05, $00, $04, $01, $00, $02, $00, $00, $00, $04
    .byte $05, $02, $04, $05, $01, $05, $05, $02, $05, $02, $00, $00, $02, $07, $02, $00
    .byte $04, $05, $05, $04, $05, $04, $04, $05, $05, $02, $05, $05, $04, $05, $05, $05
    .byte $04, $05, $05, $05, $02, $05, $02, $07, $05, $02, $04, $02, $01, $02, $00, $00
    .byte $00, $05, $05, $04, $05, $06, $02, $05, $05, $02, $01, $05, $02, $05, $05, $05
    .byte $05, $05, $04, $03, $02, $05, $02, $07, $02, $05, $06, $02, $02, $02, $00, $06
FCol6:
    .byte $00, $20, $00, $20, $20, $50, $30, $00, $20, $20, $00, $00, $20, $00, $10, $40
    .byte $20, $70, $70, $20, $10, $20, $20, $20, $20, $40, $20, $20, $10, $00, $40, $20
    .byte $30, $50, $60, $30, $60, $70, $40, $30, $50, $70, $20, $50, $70, $50, $50, $20
    .byte $40, $30, $50, $20, $20, $20, $20, $50, $50, $20, $70, $20, $10, $20, $00, $00
    .byte $00, $30, $60, $30, $30, $30, $40, $30, $50, $70, $50, $50, $70, $50, $50, $20
    .byte $60, $30, $40, $60, $10, $20, $20, $50, $50, $30, $70, $20, $20, $20, $00, $90
FCol26:
    .byte $00, $02, $00, $02, $02, $05, $03, $00, $02, $02, $00, $00, $02, $00, $01, $04
    .byte $02, $07, $07, $02, $01, $02, $02, $02, $02, $04, $02, $02, $01, $00, $04, $02
    .byte $03, $05, $06, $03, $06, $07, $04, $03, $05, $07, $02, $05, $07, $05, $05, $02
    .byte $04, $03, $05, $02, $02, $02, $02, $05, $05, $02, $07, $02, $01, $02, $00, $00
    .byte $00, $03, $06, $03, $03, $03, $04, $03, $05, $07, $05, $05, $07, $05, $05, $02
    .byte $06, $03, $04, $06, $01, $02, $02, $05, $05, $03, $07, $02, $02, $02, $00, $09
FCol7:
    .byte $00, $00, $00, $00, $00, $00, $00, $00, $10, $40, $00, $00, $20, $00, $00, $00
    .byte $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $20, $00, $00, $00, $00
    .byte $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00
    .byte $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $30, $00, $60, $00, $F0
    .byte $00, $00, $00, $00, $00, $00, $00, $60, $00, $00, $20, $00, $00, $00, $00, $00
    .byte $40, $10, $00, $00, $00, $00, $00, $00, $00, $60, $00, $10, $00, $40, $00, $60
FCol27:
    .byte $00, $00, $00, $00, $00, $00, $00, $00, $01, $04, $00, $00, $02, $00, $00, $00
    .byte $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $02, $00, $00, $00, $00
    .byte $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00
    .byte $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $03, $00, $06, $00, $0F
    .byte $00, $00, $00, $00, $00, $00, $00, $06, $00, $00, $02, $00, $00, $00, $00, $00
    .byte $04, $01, $00, $00, $00, $00, $00, $00, $00, $06, $00, $01, $00, $04, $00, $06

; .bss
String: .res 256

.if Plus4
; Plus4 column offset. I use Font Height but this must be fixed at 8 for the Plus4 screen layout
; format column left nibble offset, column right nibble offset (both are the same), next...
; doubling info lets us replace lda col, lsr, tay with ldy
coltableLSB:
    .byte <FontHeight*0,<FontHeight*0, <FontHeight*1,<FontHeight*1, <FontHeight*2,<FontHeight*2, <FontHeight*3,<FontHeight*3
    .byte <FontHeight*4,<FontHeight*4, <FontHeight*5,<FontHeight*5, <FontHeight*6,<FontHeight*6, <FontHeight*7,<FontHeight*7
    .byte <FontHeight*8,<FontHeight*8, <FontHeight*9,<FontHeight*9, <FontHeight*10,<FontHeight*10, <FontHeight*11,<FontHeight*11
    .byte <FontHeight*12,<FontHeight*12, <FontHeight*13,<FontHeight*13, <FontHeight*14,<FontHeight*14, <FontHeight*15,<FontHeight*15
    .byte <FontHeight*16,<FontHeight*16, <FontHeight*17,<FontHeight*17, <FontHeight*18,<FontHeight*18, <FontHeight*19,<FontHeight*19
    .byte <FontHeight*20,<FontHeight*20, <FontHeight*21,<FontHeight*21, <FontHeight*22,<FontHeight*22, <FontHeight*23,<FontHeight*23
    .byte <FontHeight*24,<FontHeight*24, <FontHeight*25,<FontHeight*25, <FontHeight*26,<FontHeight*26, <FontHeight*27,<FontHeight*27
    .byte <FontHeight*28,<FontHeight*28, <FontHeight*29,<FontHeight*29, <FontHeight*30,<FontHeight*30, <FontHeight*31,<FontHeight*31
    .byte <FontHeight*32,<FontHeight*32, <FontHeight*33,<FontHeight*33, <FontHeight*34,<FontHeight*34, <FontHeight*35,<FontHeight*35
    .byte <FontHeight*36,<FontHeight*36, <FontHeight*37,<FontHeight*37, <FontHeight*38,<FontHeight*38, <FontHeight*39,<FontHeight*39
coltableMSB:
    .byte >FontHeight*0,>FontHeight*0, >FontHeight*1,>FontHeight*1, >FontHeight*2,>FontHeight*2, >FontHeight*3,>FontHeight*3
    .byte >FontHeight*4,>FontHeight*4, >FontHeight*5,>FontHeight*5, >FontHeight*6,>FontHeight*6, >FontHeight*7,>FontHeight*7
    .byte >FontHeight*8,>FontHeight*8, >FontHeight*9,>FontHeight*9, >FontHeight*10,>FontHeight*10, >FontHeight*11,>FontHeight*11
    .byte >FontHeight*12,>FontHeight*12, >FontHeight*13,>FontHeight*13, >FontHeight*14,>FontHeight*14, >FontHeight*15,>FontHeight*15
    .byte >FontHeight*16,>FontHeight*16, >FontHeight*17,>FontHeight*17, >FontHeight*18,>FontHeight*18, >FontHeight*19,>FontHeight*19
    .byte >FontHeight*20,>FontHeight*20, >FontHeight*21,>FontHeight*21, >FontHeight*22,>FontHeight*22, >FontHeight*23,>FontHeight*23
    .byte >FontHeight*24,>FontHeight*24, >FontHeight*25,>FontHeight*25, >FontHeight*26,>FontHeight*26, >FontHeight*27,>FontHeight*27
    .byte >FontHeight*28,>FontHeight*28, >FontHeight*29,>FontHeight*29, >FontHeight*30,>FontHeight*30, >FontHeight*31,>FontHeight*31
    .byte >FontHeight*32,>FontHeight*32, >FontHeight*33,>FontHeight*33, >FontHeight*34,>FontHeight*34, >FontHeight*35,>FontHeight*35
    .byte >FontHeight*36,>FontHeight*36, >FontHeight*37,>FontHeight*37, >FontHeight*38,>FontHeight*38, >FontHeight*39,>FontHeight*39
.endif

.CODE

.if AcornAtom = 1
.macro SETVDG value
    lda #value          ; 6847 control - GM2 GM1 GM0 A/G 0 0 0 0
    sta $B000
.endmacro

InitScreen:
;   lda #$FF
    lda #$00
    sta BGColor         ;set the background color
    ;clear the screen before we show it
    jsr cls
    ;Acorn Atom
    SETVDG(%11110000)   ; 6847 control - GM2 GM1 GM0 A/G 0 0 0 0 RG6 = 11110000
    rts
.endif

.if Plus4 = 1
InitScreen:
    ;set hi-res graphics mode
    lda #VMSet          ; Load the video mode setting
    sta TEDVMR          ; set it
    ;set video RAM address
    lda #VASet          ; Load the video address setting
    sta VBASEREG        ; set the address of our hi-res screen 3072/255
    ; set up color RAM
    lda #$33            ;%00110011
    ldx #$00            ;clear X
@cloop:
    sta $ColorRAM,x
    sta $ColorRAM+$100,x
    sta $ColorRAM+$200,x
    dex
    bne @cloop          ; loop until x hits zero again (256 times)
    ldx #$
@cloop2:
    sta $ColorRAM+$300,x
    dex
    bne @cloop2
    rts
.endif

.if Atari = 1
; ******************************
; Atari 8 bit code
; ******************************
; ******************************
; CIO equates
; ******************************
ICHID = $0340
ICDNO = $0341
ICCOM = $0342
ICSTA = $0343
ICBAL = $0344
ICBAH = $0345
ICPTL = $0346
ICPTH = $0347
ICBLL = $0348
ICBLH = $0349
ICAX1 = $034A
ICAX2 = $034B
CIOV = $E456
; ******************************
; Other equates needed
; ******************************
;COLOR0 = $02C4
;COLCRS = $55
;ROWCRS = $54
;ATACHR = $02FB
;STORE1 = $CC
;STOCOL = $CD
COLOR0 = $02C4 ; OS COLOR REGISTERS
COLOR1 = $02C5
COLOR2 = $02C6
COLOR3 = $02C7
COLOR4 = $02C8
; ******************************
; Non Maskable Interrupt Enable register
; clear bits of the NMIEN register at $D40E to disable, set to enable
; DLI - D7
; VBI - D6
; RESET - D5
; ******************************
NMIEN = $D40E ; non maskable interrupt enable
;
SDMCTL=$022F
;
SDLSTL=$0230
SDLSTH=$0231
DMACTL=$D400

;Mode 8 requires 40 * 192 bytes, or 7680 bytes.
;So a mode 8 screen will cross a 4K boundary, and my display list is a little more complex.
;To have contiguous screen RAM, I have to align the end of a 40 byte line with the end of the first 4K block.
;4K = 4096 bytes. 4096 / 40 = 102.4. So 102 lines can fit in the 2nd 4K page, and 90 can fit in the first (192 lines - 102 = 90).
;So find the 4K boundary I want to be top of RAM, subtract 4K, subtract 90 * 40 and that is the screen start address.
;Add 102 * 40 (4080) and that will give me the screen end address + 1. 4096 - 4080 = 16 unused bytes at the end of the screen.
InitScreen:
    lda #$00
    sta BGColor         ;set the background color
    ;set up our display
    LDA #$0D            ; Set COLOR Light Grey
    STA COLOR2          ; Set background color
    LDA #00             ; clear A
    STA COLOR1          ; Set COLOR Black
    STA SDMCTL          ; TURN ANTIC OFF FOR A MOMENT ...
    LDA #<HLIST         ; WHILE WE STORE OUR NEW LIST'S ADDRESS
    STA SDLSTL          ; IN THE OS DISPLAY POINTER.
    LDA #>HLIST         ; NOW FOR THE HIGH BYTE. /256
    STA SDLSTH          ; NOW ANTIC WILL KNOW OUR NEW ADDRESS
.if BytesPerLine = 32
    LDA #$21            ; NARROW PLAYFIELD 256
.elseif BytesPerLine = 40
    LDA #$22            ; NORMAL PLAYFIELD 320
.elseif BytesPerLine = 48
    LDA #$23            ; WIDE PLAYFIELD 384
.endif
    STA SDMCTL          ; ... SO WE'LL TURN ANTIC BACK ON NOW
    RTS

.data
; .align 256,0
; .res 194
    .res 76
;
;Atari Antic Display List
; Screen Mode 8, 256/320/384 pixels wide x 192 high, 1 bit per pixel
;
HLIST:
    .BYTE $70,$70,$70   ; 3 BLANK LINES
.if BytesPerLine = 32
    ;first 4K screen block. Only using 2K
    .BYTE $0F+64        ; Mode 8 + LSM, 256/320/384 pixels wide, 192 high, 1 bit per pixel, 7891 bytes
    .WORD SCREEN        ; Screen RAM address
    ; 63 mode 8 instructions + above instruction = 64
    .BYTE $0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F ; Mode 8
    .BYTE $0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F ; Mode 8
    .BYTE $0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F ; Mode 8
    .BYTE $0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F ; Mode 8
    ;
    ;second 4K screen block, completely filled
    .BYTE $0F+64        ; Mode 8 + LSM, 256/320/384 pixels wide, 192 high, 1 bit per pixel, 7891 bytes
    .WORD SCREEN+2048   ; Screen RAM address
    ; 127 mode 8 instructions + above = 128
    .BYTE $0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F ; Mode 8
    .BYTE $0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F ; Mode 8
    .BYTE $0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F ; Mode 8
    .BYTE $0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F ; Mode 8
    .BYTE $0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F ; Mode 8
    .BYTE $0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F ; Mode 8
    .BYTE $0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F ; Mode 8
    .BYTE $0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F ; Mode 8
.endif
.if BytesPerLine = 40
    ;first 4K screen block
    .BYTE $0F+64        ; Mode 8 + LSM, 256/320/384 pixels wide, 192 high, 1 bit per pixel, 7891 bytes
    .WORD SCREEN        ; Screen RAM address
    ; 89 mode 8 instructions + above
    .BYTE $0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F ; Mode 8
    .BYTE $0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F ; Mode 8
    .BYTE $0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F ; Mode 8
    .BYTE $0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F ; Mode 8
    .BYTE $0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F ; Mode 8
    .BYTE $0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F ; Mode 8
    ;
    ;second 4K screen block
    .BYTE $0F+64        ; Mode 8 + LSM, 320 pixels wide, 192 high, 1 bit per pixel, 7891 bytes
    .WORD SCREEN+40*90  ; Screen RAM address
    ; 101 mode 8 instructions + above
    .BYTE $0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F ; Mode 8
    .BYTE $0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F ; Mode 8
    .BYTE $0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F ; Mode 8
    .BYTE $0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F ; Mode 8
    .BYTE $0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F ; Mode 8
    .BYTE $0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F ; Mode 8
    .BYTE $0F,$0F,$0F,$0F,$0F ; Mode 8
.endif
;   .BYTE $42,$60,$9F,$02,$02 ;copied from basic DL 8 dump
    .BYTE $41           ; JVB INSTRUCTION
    .WORD HLIST         ; TO JUMP BACK TO START OF LIST
.endif
EOF:
.END
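To make the two-character write in print_642 a little easier to follow, here is a rough Python model of one scan line. It only uses the first three entries of FCol1 and FCol21 from the font above (space, '!' and '"'); the function names are mine and are not part of the driver.

```python
# First three entries of FCol1 (glyph bits in the high nibble) and
# FCol21 (same glyph bits in the low nibble): space, '!', '"'
FCOL1 = [0x00, 0x20, 0x50]
FCOL21 = [0x00, 0x02, 0x05]

def font_offset(ch):
    """Mirror of 'sec / sbc #' '': the font data starts at ASCII 32."""
    return ord(ch) - ord(' ')

def combine_row(left, right):
    """Build one screen byte: left character in the high nibble,
    right character in the low nibble, merged with EOR."""
    return FCOL1[font_offset(left)] ^ FCOL21[font_offset(right)]

print(hex(combine_row('!', '"')))
```

Since the left table only ever has bits in the high nibble and the right table only in the low nibble, the EOR behaves like an OR here, and no masking or read of the existing screen byte is needed. That is where the two-character routine saves its time over leftnibble and rightnibble.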