Wednesday, December 12, 2018

Apple II vs MC-10

Apple II vs the MC-10 with the new ROM.  It would be worse if I hadn't optimized the original code by moving some calculations outside of the loops.

The images look very different, but the only difference is the scale and center of the image.  They are performing the same number of calculations and setting the same number of pixels.  Once I add hi-res graphics support this will be more obvious.

New CoCo 3 ROM

For those of you that want a new CoCo 3 ROM, I've turned my current code over to William Astle.
Pretty sure he'll be able to crank out the 6809 changes faster than I would since I haven't been working with the 6809 much.  This will also leave me to work on a few other optimizations I had in mind.

Monday, December 10, 2018

Finding Prime Numbers

Another speed comparison, finding prime numbers up to 10,000.



Thursday, December 6, 2018

CoCo 3 vs MC-10 with new ROM

Up to this point I've shown comparisons of the new vs old MC-10 ROM.  This video demonstrates the difference in speed between the CoCo 3 running at double speed and the MC-10 with the new ROM.  This is the difference the hardware multiply makes, along with a better implementation of the interpreter.  The CoCo 3 will show a greater difference on programs that aren't math oriented, but this provides a rather interesting comparison.  If I were to replace the MC-10's Motorola 6803 with a Hitachi 6303, and if it provided even a 10% speed increase, the MC-10 would win in this comparison.  The 6309 provides at least a 20% speedup over the 6809, so that is a very real possibility.

Wednesday, December 5, 2018

Some last minute notes on the new MC-10 ROM

Just some notes on last minute changes to the new MC-10 ROM.

The 16 bit string compare may be a few clock cycles slower when comparing strings of 1-3 bytes in length(?), but anything longer will be faster.  The slower performance on short strings will be more than made up for by the faster interpreter, but the difference on longer strings is significant enough to make this change well worthwhile.  The new interpreter will always be faster, but should be noticeably so when comparing a lot of long strings like with my test sort code (which I'll release shortly).

The way the main loop calls the functions associated with tokens has been changed.
The size of the "Command Dispatch Table" has been enlarged to account for all potential tokens.  This allowed the removal of the range check before calling the address in the Command Dispatch Table, saving a few clock cycles.  Tokens that would result in a syntax error now jump directly to the error, and undefined tokens jump to the RAM hook "RVEC10", which is set to jump to syntax error by default.  This lets the main loop skip the jump to the RAM hook for every ROM based token routine, saving multiple instructions for every token executed, but you can still intercept unused tokens to extend the ROM.
While existing programs that override tokens will not work on the new ROM (I can only think of one), it is still possible to extend BASIC, and alternative versions of existing commands could replace the current ones.  I'm working on a way to embed new tokens and parameters directly into the code.

*edit*
Please note that tokens $F0-$FF cannot be used at this time.  The token number is multiplied by 2 to calculate the offset in the dispatch table.  This is done with a bit shift, but the result is only one byte and the carry is lost.  It's a trade-off for speed.  It still leaves room for 38 new tokens, and I'd like to reserve $EF for future expansion.  (Two byte tokens could be a possibility in the future.)
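
To make the one-byte shift concrete, here is a minimal Python sketch of the offset calculation described above.  The function name dispatch_offset is made up for illustration, and it assumes tokens occupy $80-$FF and that each table entry is a two byte address.  It is not taken from the ROM source, and it does not try to show why $F0-$FF are unavailable, which depends on how the ROM lays out its table.

```python
def dispatch_offset(token):
    """Offset into a table of 2-byte addresses, computed in a single byte."""
    if not 0x80 <= token <= 0xFF:
        raise ValueError("not a token")
    # Shift left once (multiply by 2) and keep only the low 8 bits.
    # The bit shifted out is the carry, which is simply dropped.
    return (token << 1) & 0xFF


for token in (0x80, 0x81, 0xEF):
    print(f"${token:02X} -> offset ${dispatch_offset(token):02X}")
```

Because the top bit of every token is set, dropping the carry is what turns token $80 into offset $00 without a separate subtract.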

Monday, November 26, 2018

MC-10 ROM fixes before a release

The bug in the code that converts ASCII to floating point, which appeared when I changed the divide by 10 into a multiply by 1/10, is now fixed.  The change offered a significant speedup, so I'm happy that code will make it into the release.  That leaves one bug to fix before an official release.  That bug is likely to be in the math library, and I'm pretty sure I know what it is.  Once I get time it shouldn't take too long to fix.  The first release will definitely be a 16K ROM, but an 8K ROM with most of the code may follow if it will fit.  I am not going to fight with it to get that working though.

You can see the bug in my YouTube video with the sort comparison.  The gap values are different between the original and new ROM.  That is no longer the case.  FWIW, the constant for 1/10 was wrong and it was the first thing I checked, so... easy fix.


Thursday, November 22, 2018

Breaking the chains... of the 8K barrier

Only one day after deciding to finally ditch the MC-10 8K ROM limitation, a couple of small changes have pushed the new ROM to almost 10% faster than the original, and the actual executable code is smaller.  I simply dumped the valid token check in the main loop and extended the command dispatch table.  Anything invalid jumps straight to SN ERROR, and valid tokens are executed quicker.  The table wastes some space, but there will be less wasted space as I add Extended BASIC commands to the ROM.

This does not include the code that uses multiplication to convert ASCII numbers to binary instead of division, and the code that stores the current line pointer instead of the current line number.

Solitaire Solver, original ROM vs new ROM 

Again... with feeling!

For your viewing pleasure

Wednesday, November 21, 2018

Latest MC-10 ROM snapshot

The video is still uploading, but here is a snapshot of the difference in speed between the original MC-10 ROM and new one on the "Fedora" 3D Plot.  The SIN(), COS(), TAN(), and string compare are faster.
I'd make a video of a large sort to show off the string compare, but it's not as dramatic and would take several minutes to watch.

Friday, November 16, 2018

String variable test/sort code (BASIC)

Here is a little test program I threw together to benchmark the 16 bit string compare code I'm preparing for a future ROM version.  A little bonus here is a sort implementation you may not have seen before.

Back in the 90s I was working on a project I inherited from someone else.  It involved nightly data dumps from a production system to a local server that made the data accessible to customers via modem, using custom software.  The process was rather labor intensive.  The operator had to run the data extraction and FTP the data to a PC, where it was imported by a custom VB application that required multiple starts, executing other programs, then continuing, etc...  It was impossible to automate the way it was.
Upon investigation, I found the original programmer did not know how to implement a sort, so he was using an external utility.   (he also didn't seem to know how to take a shower... but that's another topic)
After a quick search on Yahoo, no VB sorts turned up (I think the search engine was undergoing maintenance), so I threw in a bubble sort and went home.  Some 12 hours later, the sort was still going so I killed it, wrote a more optimal version, and on the first try it took about 18 minutes to sort the 80,000+ records, and it didn't require human intervention so it could be launched automatically. 
Anyway, a more polished version of that sort is below.  This has some optimizations that weren't in the original as it's aimed at the Microsoft BASIC interpreter.

The sort is, at the very least, a really improved version of the bubble sort.  It functions a bit like a comb or shell sort, so I'm not sure what you'd call it.  It's easy to implement, pretty fast, and it's fairly small.  Taking a hint from the comb sort, dividing the gap by two may not be the most efficient, but it works pretty well for something I came up with off the top of my head in my 20s.

The data comes from a list of the 1000 most common surnames, and it is truncated here for space.  The data includes a few other numbers with it that I didn't waste time removing.  With all the additional data, only 894 names fit in memory, but that provides a really good workout for the interpreter and sort.  Sorting takes over 10 minutes on a regular MC-10, but I'm hoping the new string comparison cuts that by at least 20%, which is about what you can expect from unrolling a loop once.



0 I=0:N=1000:C=0:D=0:E=0:NW=0

'initialize the index array, and read in the names.  Skip the other data.
'scans through the data to determine the array sizes before dimensioning the arrays.
'The most heavily used variables are initialized on the first line to ensure they are
' at the start of the variable table
'
10 I=0:J=0:G=0:N=0:Q=0
15 PRINT"SCANNING FOR END OF DATA"
20 READ K$,C,D,E:IF K$<>"" THEN N=N+1:GOTO 20
30 RESTORE
40 DIM A$(N),B(N)
45 PRINT"INITIALIZING ARRAYS"
50 FORI=1TON:READ A$(I),C,D,E:B(I)=I:NEXT
60 PRINT:PRINT N" MOST POPULAR SIR NAMES FROM MOST TO LEAST POPULAR"
70 GOSUB 2000
80 PRINT:PRINT "SORTING"N"NAMES":GOSUB 10000
90 PRINT:PRINT N" MOST POPULAR SIR NAME IN ALPHABETICAL ORDER"
95 GOSUB 2000
96 END


'print out the names based on the array index
2000 FOR I=1 TO N:PRINT A$(B(I))" ";:NEXT:RETURN



'optimized bubble(?) sort.
'Moves data as far as possible with each Middle loop
'Skips data that doesn't need sorting with each new pass on inner loop.
'P = what we divide the array size by to get the gap
'G = the gap
'using nested for loops avoids having to perform a line search that would
'happen with GOTO.  The address is simply pulled off of the stack.
'Using STEP 0 keeps FOR NEXT from changing the variable and the endless loop
'exits once the exit condition is achieved

10000 P=1:FORG=2TO1STEP0:P=P*2:G=INT(N/P):NW=N-G:PRINTG;
10010 FORQ=1TO0STEP0:Q=0:FORI=1TONW
10020 IFA$(B(I))>A$(B(I+G))THENT=B(I):B(I)=B(I+G):B(I+G)=T:Q=I
10030 NEXT:IFQ>0THENNW=Q
10040 NEXT:NEXT:RETURN



'1000 MOST POPULAR SIR NAMES FROM MOST TO LEAST POPULAR

100 DATA SMITH,2501922,1.006,1,JOHNSON,2014470,.81,2,WILLIAMS,1738413,.699,3
101 DATA JONES,1544427,.621,4,BROWN,1544427,.621,5,DAVIS,1193760,.48,6
...


1000 DATA "",0,0,0 :REM END OF DATA MARKER
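
For anyone who doesn't read Microsoft BASIC fluently, the sort at line 10000 can be sketched in Python roughly as follows.  This is a reading of the listing, not a separate algorithm: names stands in for A$(), index for B(), and the function name gap_sort_index is invented here.  The two STEP 0 loops become a loop that runs until the gap has been 1, and a loop that runs until a pass makes no swaps.  The early return for fewer than two names is an addition of the sketch, so that the gap loop always has a gap of 1 to reach.

```python
def gap_sort_index(names):
    """Return a list of positions that visits `names` in ascending order."""
    n = len(names)
    index = list(range(n))          # B(I)=I
    if n < 2:
        return index                # guard added by this sketch
    p = 1
    gap = 0
    while gap != 1:                 # FOR G=2 TO 1 STEP 0 ends once G is 1
        p *= 2
        gap = n // p                # G=INT(N/P)
        limit = n - gap             # NW=N-G, comparisons in the first pass
        while True:                 # FOR Q=1 TO 0 STEP 0 ends once Q is 0
            last_swap = 0           # Q=0
            for i in range(limit):
                a, b = index[i], index[i + gap]
                if names[a] > names[b]:
                    index[i], index[i + gap] = b, a
                    last_swap = i + 1   # Q=I (1-based, as in the listing)
            if last_swap == 0:
                break
            limit = last_swap       # NW=Q: skip the part already in order
    return index


names = ["SMITH", "JOHNSON", "WILLIAMS", "JONES", "BROWN", "DAVIS"]
order = gap_sort_index(names)
print([names[i] for i in order])
```

Only the index list is rearranged, so the strings themselves never move, which is the same reason the BASIC version swaps B() rather than A$().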


Warning about using TASM for 6803 development

TASM (the Telemark Assembler, not the other TASM assembler) does not always warn you if the range of an 8 bit relative branch is exceeded.  You can only tell that the wrong code was generated by looking at the .LST file and checking the actual offsets by hand, or by running the program in the VMC10 emulator with the list file loaded.  The emulator will then halt on the bad code and tell you in the LST window that the range has been exceeded.  This happened to me on a BRA instruction this morning and made me very unhappy.
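
Checking offsets by hand amounts to the calculation below.  This is only a sketch of the check, written in Python, with a made-up function name.  It assumes a two byte branch instruction such as BRA, where the signed 8 bit offset is counted from the address of the instruction that follows the branch.

```python
def relative_branch_offset(branch_addr, target_addr):
    """Return the offset byte for a 2-byte relative branch, or raise if out of range."""
    # The offset is relative to the address right after the branch instruction.
    offset = target_addr - (branch_addr + 2)
    if not -128 <= offset <= 127:
        raise ValueError(f"branch out of range: {offset} bytes")
    return offset & 0xFF  # two's complement byte as it appears in the listing


print(hex(relative_branch_offset(0x4000, 0x4000)))  # branch to itself
print(hex(relative_branch_offset(0x4000, 0x4081)))  # furthest forward target
```

Comparing the byte this gives with the one in the .LST file shows whether the assembler silently wrapped the offset.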

Tuesday, November 13, 2018

SIN() in please! (MC-10 ROM changes)

The new MC-10 ROM has been sitting untouched and incomplete for far too long, so I worked on it for a few hours.

A small list of some of the changes that were made.
  1. Miscellaneous fixes such as only adding a colon in front of the ELSE statement during tokenization if the character before it isn't already a colon.  It would add one no matter what before.  The colon is required for the ELSE to work.
  2. A redundant floating point register load was eliminated.
  3. The variable that saves the current line number was changed to hold the current line pointer.  This was the easiest way to implement the feature, and it only took about an hour.  All that needed changing was the code that sets or reads the variable, and the code that needs the current line number.  This will only be faster once I store both, but it was the quickest way to get it working since it didn't require significant stack code changes.  The code that prints the line number on BREAK now prints the address of the line, but the fix shouldn't be difficult once I have time to work on it.  A side benefit of this change was the elimination of some unnecessary code, and it's close to working in 8K again.
  4. The most significant change is that the SIN() function now performs the division by 2*Pi as a multiply by the reciprocal (invert and multiply).  So it multiplies by 1/(2*Pi).  The code works well enough to perform some initial tests, which show this offers a significant speedup.  This also speeds up COS(), since that is calculated with SIN(n+(Pi/2)), and TAN(), which is calculated with SIN(n) / COS(n).  The code does not work for all cases though, so there's some work to be done yet, but it works well enough that I could make the video below.
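
The idea in item 4 can be sketched in a few lines of Python.  This is an illustration of the technique only, not the ROM's floating point code: the names are invented, Python's math.sin stands in for the ROM's polynomial, and the range reduction shown (keeping the fractional number of turns) is an assumption about what the divide by 2*Pi is used for.

```python
import math

INV_TWO_PI = 1.0 / (2.0 * math.pi)   # precomputed constant, stored once


def sin_reduced(x):
    """SIN with the divide by 2*Pi replaced by a multiply by 1/(2*Pi)."""
    turns = x * INV_TWO_PI            # was: x / (2*Pi)
    frac = turns - math.floor(turns)  # keep only the fraction of a full turn
    return math.sin(frac * 2.0 * math.pi)


def cos_via_sin(x):
    """COS(n) calculated as SIN(n + Pi/2)."""
    return sin_reduced(x + math.pi / 2.0)


def tan_via_sin(x):
    """TAN(n) calculated as SIN(n) / COS(n)."""
    return sin_reduced(x) / cos_via_sin(x)


for x in (0.5, 1.0, 100.0):
    print(x, sin_reduced(x), cos_via_sin(x), tan_via_sin(x))
```

Since COS() and TAN() are built on SIN(), one faster multiply in SIN() pays off in all three, which matches the speedup seen in the 3D plot.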

This is the previous video showing the speed of the new ROM vs the factory ROM:


This is the ROM with the SIN() code change:

Saturday, October 13, 2018

Article 'A Great Old-Timey Game-Programming Hack', and a response

Here's a nice little story related to the 6809.  It shows off one of the more interesting optimizations you can use with the 6809.  It's also neat to see that people came up with similar solutions completely isolated from each other.

Link



My MC-10 (6803) 64 column graphics text code's screen scroll also uses the stack register as the destination pointer for similar reasons, but there are differences vs the 6809.

Each register PUSHed or PULLed requires a separate instruction, where the 6809 can PUSH or PULL multiple registers with a single instruction.  As a result, the 6803 code looks more like their earlier code.

With only one stack pointer, you have to use the index register for the other source or destination pointer, and the offset is only 1 byte, so you can only go up to 254 with LDD #,X before you have to change X.  The code looks like this, and it's unrolled for a 256 byte section of the screen:

   LDD #255,x      ; 2 bytes, 5 clock cycles
   PSHB                ; 1 byte, 3 clock cycles
   PSHA                ; 1 byte, 3 clock cycles
   LDD #254,x      ; 2 bytes, 5 clock cycles
   PSHB                ; 1 byte, 3 clock cycles
   PSHA                ; 1 byte, 3 clock cycles
   etc...

You could PUSH/PULL two bytes at a time if you are storing/loading the index register.  You would lose the index register as a source pointer, so you have to hard code the address for each pair of bytes.

   LDX ROWADDRESS+254  ; 3 bytes, 5 clock cycles
   PSHX                                    ; 1 byte, 4 clock cycles
   LDX ROWADDRESS+252
   PSHX
   LDX ROWADDRESS+250
   PSHX
   etc...

Using PSHX saves 22 - 9 = 13 clock cycles per pair of bytes moved, or 13 * ((32/2)*(192-8)) = 38,272 clock cycles per scroll!  The code is also half the number of bytes per pair of bytes moved.

So why didn't I do that?

While this would be noticeably faster, you can't just change the index register for each 256 byte block, you have to hard code the addresses for the entire screen.
That may not be a big deal if you have a large RAM expansion, but it's not practical for most MC-10s.  However, if you wanted to implement 4 rows of text at the bottom of the screen similar to the Apple II and several other 8 bit machines, then it's not so bad.

The latest code generates the scroll code on the fly at startup, so I could generate either version of the code depending on the hardware you have.  We'll see.
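
As a rough illustration of generating the PSHX version rather than typing it in, here is a Python sketch that emits the machine code bytes for the LDX/PSHX pairs shown earlier.  It is not the actual startup generator, which would be written in 6803 assembly, and the function name is made up.  The opcode values ($FE for LDX extended, $3C for PSHX) are the 6801/6803 encodings as best I know them and should be checked against the data sheet before being relied on.

```python
LDX_EXTENDED = 0xFE   # assumed opcode: LDX with a 16-bit absolute address
PSHX = 0x3C           # assumed opcode: push the index register


def generate_pshx_copy(source_addr, length):
    """Emit LDX addr / PSHX pairs that read `length` bytes, highest address first."""
    if length % 2:
        raise ValueError("length must be an even number of bytes")
    code = bytearray()
    for offset in range(length - 2, -1, -2):
        addr = (source_addr + offset) & 0xFFFF
        code += bytes((LDX_EXTENDED, addr >> 8, addr & 0xFF, PSHX))
    return bytes(code)


block = generate_pshx_copy(0x4000, 256)
print(len(block), "bytes of code to move 256 bytes")
```

Each pair of bytes moved costs four bytes of code, so code that covers the whole screen is twice the size of the screen itself, which is why this only looks practical with a RAM expansion or for a few rows.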

Tuesday, September 11, 2018

The search for a 6800 based computer with bitmapped graphics.

My curiosity as to the speed of the MC6800 processor has been piqued, so I did some searching for 6800 or compatible machines with bitmapped graphics.  It seems the 6800 didn't have much of an impact on the more personal side of personal computers.

  • Most of the machines based on the CPU don't seem to have any graphics at all.  They are just text based FLEX systems.
  • There is a French FLEX system that offers a graphics board option, but I couldn't find programming info on the graphics board.  Even if I could, it would probably be in French.
  • The Dream 6800 doesn't offer high enough resolution graphics to be of use.
  • The Panasonic JR-200 uses a Japanese version of the 6802, which is 6800 compatible, but its graphics are made with user definable characters.
  • The APF Imagination Machine has a 6847 VDG like other machines I support, but it's not set up as a true bitmap.  The hardware is set up to display hi-res objects, and there is limited RAM to hold those objects.
  • The 6803 will run 6800 code, but some instructions have different timing and it would require adding NOP instructions, or changing some instructions to try to match the 6800 speed.
  • An emulator could be written just for this, but it seems like a lot of work unless there's some actual hardware the code could run on.
I'm open to suggestions if anyone knows a system that could be used.  256x192 resolution graphics would be the best option.


Thursday, August 30, 2018

and the processor wars discussion leads to...

The 'Processor Wars' discussion led to a few changes in the regular bitmap code for the 6803 and Z80.

The 6803 bitmap code isn't as efficient as its Plus/4 screen layout version, but the change cut the number of X register swaps in half.  This saves 16 clock cycles in the print two character code, and 12,288 clock cycles per screen of characters (16 clock cycles x 32 bytes per row x 24 rows of characters).  More importantly, the code is still faster after switching from a 7 byte font to an 8 byte font to support full height graphics characters.  Writing single characters does not benefit, but that should only happen at the start or end of a string where two characters are not aligned on a byte.  This also puts the .89 MHz 6803 closer to the speed of the 1.77 MHz 65816 Atari, and 3.5 MHz Z80 VZ.

The Z80 code had not been updated to increment the font pointers using byte instead of 16 bit opcodes in spite of the fact that I came up with the optimization years ago.  Not sure what happened there, but for some reason it wasn't in the code.  This requires aligning the font data so that individual characters do not cross a 256 byte boundary, but it appears to cut 2 clock cycles per register increment, and there are 14 of those per pair of bytes, or 7 when printing single bytes.  That's up to 21,504 clock cycles per screen of characters (2 clock cycles * 14 increments * 32 bytes per row * 24 rows of characters).
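
The per-screen figures above are simple products, and a few lines of Python make it easy to redo them if the per-pair numbers change.  The constant and function names are invented for this sketch, and the cycle counts are taken from the text as given rather than re-measured.

```python
BYTES_PER_ROW = 32     # one byte holds a pair of 4 pixel wide characters
CHARACTER_ROWS = 24


def cycles_per_screen(cycles_saved_per_pair):
    """Cycles saved over a full screen, given the saving for one pair of characters."""
    return cycles_saved_per_pair * BYTES_PER_ROW * CHARACTER_ROWS


m6803_saving = cycles_per_screen(16)       # 16 cycles per pair on the 6803
z80_saving = cycles_per_screen(2 * 14)     # 2 cycles x 14 increments on the Z80

print("6803:", m6803_saving)
print("Z80: ", z80_saving)
```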

The 6502 code won't benefit from either of these optimizations.  It only has a single accumulator, and it doesn't have any 16 bit registers.  The 65816 might be able to speed up address calculation, and the larger index registers may help, but it's going to take some time to see if that's possible with the 8/16 bit mode switching.  If it is, it will probably mean splitting the 6502 and 65816 versions up to make it easier to maintain the code.  It's a bit complex already due to squeezing Atom, Atari, and Commodore versions in the same code base.

*edit*
The 65816 could save two opcodes during the screen address calculation, but it would require at least as many new opcodes for the 8->16->8 bit mode switches.

Wednesday, August 29, 2018

8 bit vs 8 bit (Processor wars) Update

The 6800 code has been updated using a similar optimization as with the 6803 version.  Even though the 6800 doesn't support the 16 bit accumulator, the change still cuts the number of index register swaps in half, and the code that writes to the screen is now faster than the 6502 version by 28 clock cycles.  If the 6800 address calculation code can be optimized a little, the 6800 may actually beat the 6502 at this.  Some self modifying code and a couple tables may be required but it may be possible.  The scroll may be slightly faster as well.

To be fair, this screen layout works well for the 680X processors and a standard bitmap may be a little slower.  If I were to create a custom graphics chip for the 680X series, I'd include at least one screen layout like the Plus/4's simply because it works so well here.

Tuesday, August 28, 2018

8 bit vs 8 bit (Processor wars) Update + 6309 code

The graphics text code for the 6803 has been updated.  It has been optimized a bit by loading two bytes from the font at a time with a single LDD instead of just one.  It also cuts the number of left and right font pointer changes for the index register in half.

It now takes a total of 92 clock cycles for the 6803 to write a pair of characters to the screen.  The 6502 takes 152 clock cycles to do the same thing.

For one full screen of characters, the 6803 now takes 46080 fewer clock cycles to write the characters to the screen vs the 6502.   I haven't looked up the cycles for the font address or screen address calculation yet, but the 6803 appears to be faster overall even though the 6502 uses tables.  The screen scroll is even worse for the 6502.

Counting white space, the 6803 code is now 63 lines long and the 6502 code is 80 lines long not counting the font and all the extra address tables for the 6502.  Total size wise it isn't even close.

The 6303 code will benefit from the same optimization but it should be noted that the push instructions take 1 clock cycle more than the 6803.  The difference in clock cycles there is more than made up for with faster code in the address calculation section.

The 6809 code also benefits from this optimization, and it only requires 16 instructions to write the pair of characters to the screen vs 24 for the 6803, and 40 for the 6502.

The 6309 code supports the 16 bit instruction EORD, which drops 4 opcodes vs the 6809.  There doesn't appear to be an EOR for the new registers, so we can't use 32 bit loads & pushes like I had hoped, but we are only looking at around 60 clock cycles.

As I stated before, this is based on writing to a graphics memory map like on the Plus/4, or C64.

6309 code to write the 2 characters to the screen:


 ; print characters to screen
 ldd  6,x     ; get 2 bytes of left character 5
 eord 6,y     ; add the right character 5
 pshu a,b     ; write to the screen 4

 ldd  4,x     ; get 2 bytes of left character 5
 eord 4,y     ; add the right character 5
 pshu a,b     ; write to the screen 4

 ldd  2,x     ; get 2 bytes of left character 5
 eord 2,y     ; add the right character 5
 pshu a,b     ; write to the screen 4

 ldd  ,x     ; get 2 bytes of left character 5
 eord ,y     ; add the right character 5
 pshu a,b     ; write to the screen 4

 rts
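
For readers who don't have a 6309 handy, the routine above can be modelled in Python as follows.  This is a sketch under a few assumptions that are not spelled out in the listing: U has been set to one byte past the 8 byte character cell, X and Y point at 8 byte glyphs that are already positioned in the left and right nibbles, and the function and variable names are made up.

```python
def print_pair(screen, cell_addr, left_glyph, right_glyph):
    """Model of the ldd / eord / pshu a,b sequence for one 8-byte character cell."""
    u = cell_addr + 8                  # assumed starting value of U
    for offset in (6, 4, 2, 0):
        # ldd offset,x loads A from the lower address and B from the next one;
        # eord offset,y merges in the right-hand character.
        a = left_glyph[offset] ^ right_glyph[offset]
        b = left_glyph[offset + 1] ^ right_glyph[offset + 1]
        # pshu a,b stores B first, then A, moving U downward,
        # so A ends up at the lower address again.
        u -= 1
        screen[u] = b
        u -= 1
        screen[u] = a
    return u


screen = bytearray(16)
left = bytes([0x00, 0x20, 0x50, 0x50, 0x70, 0x50, 0x50, 0x00])    # sample data only
right = bytes([0x00, 0x06, 0x05, 0x06, 0x05, 0x05, 0x06, 0x00])   # sample data only
print_pair(screen, 8, left, right)
print(screen[8:16].hex())
```

Because the two glyphs occupy different nibbles, the exclusive OR behaves like a merge, and pushing from the top of the cell downward leaves the bytes in the same order as the font.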

Sunday, August 26, 2018

64 column text on a 256x192 graphics screen -> 6502 Part 2

;* Copyright (c) 2015, 2016, 2018 James Diffendaffer

;**************************************************
; NAME: print_64
;**************************************************
;* Description:
;*  64 Column text display driver
;*  Routine does not print at pixel X,Y but
;*  prints at a character position.
;**************************************************
print_64:
	; register a contains character
	sec
	sbc		#' '				; printable character set data starts at space, ASCII 32
	tax							; save as character table offset

.if BytesPerLine = 32
	; point screen to base screen address + row
	lda		#>ScreenAdr
;	lda		#$80
	clc
	adc		row					; adding row to MSB = 256 * row
	sta		fscreen+1

	ldy		#0					; top line is always black (1st byte of the font)
								; start at zero offset from screen address
	
	; put the lowest bit in the carry so we know if it's left or right nibble
	lda		col					; 2 columns / byte
	lsr

	bcs		rightnibble
.endif

.if BytesPerLine = 40
.if Plus4
; col addition requires multiply by 8 due to the byte order of the screen and
; divide by 2 for two characters per byte, so col * 4
; any col over 64 requires 2 bytes added to screen address anyway, so make another table

	; calculate screen address based on row and column
	clc
	ldy		row					; put row offset in y
	lda		scrtableLSB,y		; get screen row address LSB from table
	ldy		col					; put col index in y
	adc		coltableLSB,y		; get LSB from column table
	sta		fscreen				; store LSB of screen address
	ldy		row					; put row offset in y
	lda		scrtableMSB,y		; get screen row address MSB from table
	ldy		col					; put col index in y
	adc		coltableMSB,y		; get MSB from column table
	sta		fscreen+1			; store MSB of screen address

	tya							; put column in a
	
	ldy		#0					; top line is always black (1st byte of the font)
								; start at zero offset from screen address

	lsr							; even or odd column?
	bcs		rightnibble
.else
	ldy		row					; put row offset in y

	lda		col
	lsr							; col / 2 = byte offset within the row
	clc							; lsr left the column's low bit in carry; clear it before adding
	adc		scrtableLSB,y		; get LSB offset from table
	sta		fscreen				; store LSB
	lda		#0
	adc		scrtableMSB,y		; get MSB offset from table and add carry
	sta		fscreen+1			; store MSB

	ldy		#0					; top line is always black (1st byte of the font)
								; start at zero offset from screen address
	lda		col
	lsr
	bcs		rightnibble
.endif
.endif

	
;**************************************************
;* left nibble 
;**************************************************
leftnibble:
	lda		BGColor
	sta		(fscreen),y			; write to the screen

.if Plus4 = 1
	iny							; point to next screen byte
.else
	ldy		#BytesPerLine*1		; point to next screen byte
.endif
	lda		(fscreen),y
;	eor		FCol1,X				; EOR with the next byte of the font
;	and		#%00001111
	eor		FCol1,X				; EOR with the next byte of the font	
	sta		(fscreen),y

.if Plus4 = 1
	iny							; point to next screen byte
.else
	ldy		#BytesPerLine*2		; point to next screen byte
.endif
	lda		(fscreen),y
;	eor		FCol2,X				; EOR with the next byte of the font
;	and		#%00001111
	eor		FCol2,X				; EOR with the next byte of the font	
	sta		(fscreen),y
	
.if Plus4 = 1
	iny							; point to next screen byte
.else
	ldy		#BytesPerLine*3		; point to next screen byte
.endif
	lda		(fscreen),y
;	eor		FCol3,X				; EOR with the next byte of the font
;	and		#%00001111
	eor		FCol3,X				; EOR with the next byte of the font	
	sta		(fscreen),y
	
.if Plus4 = 1
	iny							; point to next screen byte
.else
	ldy		#BytesPerLine*4		; point to next screen byte
.endif
	lda		(fscreen),y
;	eor		FCol4,X				; EOR with the next byte of the font
;	and		#%00001111
	eor		FCol4,X				; EOR with the next byte of the font	
	sta		(fscreen),y

.if Plus4 = 1
	iny							; point to next screen byte
.else
	ldy		#BytesPerLine*5		; point to next screen byte
.endif
	lda		(fscreen),y
;	eor		FCol5,X				; EOR with the next byte of the font
;	and		#%00001111
	eor		FCol5,X				; EOR with the next byte of the font	
	sta		(fscreen),y
	
.if Plus4 = 1
	iny							; point to next screen byte
.else
	ldy		#BytesPerLine*6		; point to next screen byte
.endif
	lda		(fscreen),y
;	eor		FCol6,X				; EOR with the next byte of the font
;	and		#%00001111
	eor		FCol6,X				; EOR with the next byte of the font	
	sta		(fscreen),y
	
.if Plus4 = 1
	iny							; point to next screen byte
.else
.if BytesPerLine = 32
	ldy		#BytesPerLine*7		; point to next screen byte
.endif
.if BytesPerLine = 40
	;clc
	lda		fscreen				; point to next screen byte
	adc		#BytesPerLine
	sta		fscreen				; LSB
	lda		#0
	adc		fscreen+1			; MSB
	sta		fscreen+1
.endif
.endif
	lda		(fscreen),y
;	eor		FCol7,X				; EOR with the next byte of the font
;	and		#%00001111
	eor		FCol7,X				; EOR with the next byte of the font	
	sta		(fscreen),y
	
	rts
	
	
;**************************************************
; right nibble
;**************************************************
rightnibble:
	lda		BGColor
	sta		(fscreen),y			; write to the screen

.if Plus4 = 1
	iny							; point to next screen byte
.else
	ldy		#BytesPerLine*1		; point to next screen byte
.endif
	lda		(fscreen),y
;	eor		FCol21,X			; EOR with the next byte of the font
;	and		#%11110000
	eor		FCol21,X			; EOR with the next byte of the font	
	sta		(fscreen),y

.if Plus4 = 1
	iny							; point to next screen byte
.else
	ldy		#BytesPerLine*2		; point to next screen byte
.endif
	lda		(fscreen),y
;	eor		FCol22,X			; EOR with the next byte of the font
;	and		#%11110000
	eor		FCol22,X			; EOR with the next byte of the font	
	sta		(fscreen),y
	
.if Plus4 = 1
	iny							; point to next screen byte
.else
	ldy		#BytesPerLine*3		; point to next screen byte
.endif
	lda		(fscreen),y
;	eor		FCol23,X			; EOR with the next byte of the font
;	and		#%11110000
	eor		FCol23,X			; EOR with the next byte of the font	
	sta		(fscreen),y
	
.if Plus4 = 1
	iny							; point to next screen byte
.else
	ldy		#BytesPerLine*4		; point to next screen byte
.endif
	lda		(fscreen),y
;	eor		FCol24,X			; EOR with the next byte of the font
;	and		#%11110000
	eor		FCol24,X			; EOR with the next byte of the font	
	sta		(fscreen),y

.if Plus4 = 1
	iny							; point to next screen byte
.else
	ldy		#BytesPerLine*5		; point to next screen byte
.endif
	lda		(fscreen),y
;	eor		FCol25,X			; EOR with the next byte of the font
;	and		#%11110000
	eor		FCol25,X			; EOR with the next byte of the font	
	sta		(fscreen),y
	
.if Plus4 = 1
	iny							; point to next screen byte
.else
	ldy		#BytesPerLine*6		; point to next screen byte
.endif
	lda		(fscreen),y
;	eor		FCol26,X			; EOR with the next byte of the font
;	and		#%11110000
	eor		FCol26,X			; EOR with the next byte of the font	
	sta		(fscreen),y
	

.if Plus4 = 1
	iny							; point to next screen byte
.else
.if BytesPerLine = 32
	ldy		#BytesPerLine*7		; point to next screen byte
.endif
.if BytesPerLine = 40
	clc							; carry is still set from the lsr/bcs that selected rightnibble
	lda		fscreen				; point to next screen byte
	adc		#BytesPerLine
	sta		fscreen				; LSB
	lda		#0
	adc		fscreen+1			; MSB
	sta		fscreen+1
.endif
.endif
	lda		(fscreen),y
;	eor		FCol27,X			; EOR with the next byte of the font
;	and		#%11110000
	eor		FCol27,X			; EOR with the next byte of the font	
	sta		(fscreen),y
	
	rts

;**************************************************
; write two characters at once
;**************************************************
print_642:
	; register a contains character
;	lda		(string),y
	sec
	sbc		#' '				; printable character set data starts at space, ASCII 32
	sta		firstchar			; save as character table offset

	iny
	lda		(string),y
	sec
	sbc		#' '
	sta		secondchar
	
.if BytesPerLine = 32
	; point screen to $8000 + row (base screen address + row)
	lda		#>ScreenAdr
;	lda		#$80
	clc
	adc		row					; adding row to MSB = 256 * row
	sta		fscreen+1

	ldy		#0					; top line is always black (1st byte of the font)
								; start at zero offset from screen address
	
	; add the column
	lda		col					; 2 columns / byte
	lsr
	sta		fscreen				; save it
.endif

.if BytesPerLine = 40
	lda		row
;	asl							; * 2 for word sized table (row max of 25 * 2)
	tax							; put offset in x

;	clc							; carry should be clear from asl
	lda		col
	lsr
	adc		scrtableLSB,X			; get LSB offset from table
	sta		fscreen				; store LSB
	lda		scrtableMSB,X		; get MSB offset from table
	adc		#0					; add carry
	sta		fscreen+1			; store MSB
.endif

twochar:
	lda		BGColor
	sta		(fscreen),y			; write to the screen

.if Plus4 = 1
	iny							; point to next screen byte
.else
	ldy		#BytesPerLine*1		; point to next screen byte
.endif
	ldx		firstchar			; offset to 1st character
	lda		FCol1,X				; load the next byte of the 1st character
	ldx		secondchar			; offset to 2nd character
	eor		FCol21,X			; add the next byte of the 2nd character
	sta		(fscreen),y			; write it to the screen

.if Plus4 = 1
	iny							; point to next screen byte
.else
	ldy		#BytesPerLine*2		; point to next screen byte
.endif
	lda		FCol22,X			; load the next byte of the 2nd character
	ldx		firstchar			; offset to 1st character
	eor		FCol2,X				; add the next byte of the 1st character
	sta		(fscreen),y			; write it to the screen
	
.if Plus4 = 1
	iny							; point to next screen byte
.else
	ldy		#BytesPerLine*3		; point to next screen byte
.endif
	lda		FCol3,X				; load the next byte of the 1st character
	ldx		secondchar			; offset to 2nd character
	eor		FCol23,X			; add the next byte of the 2nd character
	sta		(fscreen),y			; write it to the screen
	
.if Plus4 = 1
	iny							; point to next screen byte
.else
	ldy		#BytesPerLine*4		; point to next screen byte
.endif
	lda		FCol24,X			; load the next byte of the 2nd character
	ldx		firstchar			; offset to 1st character
	eor		FCol4,X				; add the next byte of the 1st character
	sta		(fscreen),y			; write it to the screen

.if Plus4 = 1
	iny							; point to next screen byte
.else
	ldy		#BytesPerLine*5		; point to next screen byte
.endif
	lda		FCol5,X				; load the next byte of the 1st character
	ldx		secondchar			; offset to 2nd character
	eor		FCol25,X			; add the next byte of the 2nd character
	sta		(fscreen),y			; write it to the screen
	
.if Plus4 = 1
	iny							; point to next screen byte
.else
	ldy		#BytesPerLine*6		; point to next screen byte
.endif
	lda		FCol26,X			; load the next byte of the 2nd character
	ldx		firstchar			; offset to 1st character
	eor		FCol6,X				; add the next byte of the 1st character
	sta		(fscreen),y			; write it to the screen
	
.if Plus4 = 1
	iny							; point to next screen byte
.else
.if BytesPerLine = 32
	ldy		#BytesPerLine*7		; point to next screen byte
.endif
.if BytesPerLine = 40
	;clc
	lda		fscreen				; point to next screen byte
	adc		#BytesPerLine
	sta		fscreen				; LSB
	lda		#0
	adc		fscreen+1			; MSB
	sta		fscreen+1
.endif
.endif
	lda		FCol7,X				; load the next byte of the 1st character
	ldx		secondchar			; offset to 2nd character
	eor		FCol27,X			; add the next byte of the 2nd character
	sta		(fscreen),y			; write it to the screen
	
	rts

.if BytesPerLine = 40	
;**************************************************
; 80 column address lookup table
;**************************************************
.define scrval		ScreenAdr+BytesPerLine*8
.define coltable 	scrval*0, scrval*1, scrval*2, scrval*3, scrval*4, scrval*5, scrval*6, scrval*7, scrval*8, scrval*9, scrval*10, scrval*11, scrval*12, scrval*13, scrval*14, scrval*15, scrval*16, scrval*17, scrval*18, scrval*19, scrval*20, scrval*21, scrval*22, scrval*23, scrval*24	
scrtableLSB:
	.lobytes		coltable
scrtableMSB:
	.hibytes		coltable
.endif
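
; Note on the table above (a worked example, not extra code): .define is a
; textual substitution, so scrval*2 expands to ScreenAdr+BytesPerLine*8*2.
; Because * binds tighter than +, that is ScreenAdr + (40*8*2) = ScreenAdr + 640,
; the first byte of text row 2.  Entry N is ScreenAdr + 320*N for rows 0 to 24.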

;**************************************************
; HALF WIDTH 4x8 FONT
; Characters are 4 pixels wide and 7 rows high.
; The top row is always blank and is not stored
; (see the commented-out FCol0/FCol20 tables below).
; Each stored row has two 96 byte tables indexed by
; character offset:
;   FColN  has the row's bits in the high nibble (1st character)
;   FCol2N has the same bits in the low nibble (2nd character)
; 7 rows * 2 tables * 96 bytes = 1344 bytes
; (336 bytes would be the size packed two characters per byte)
; Font borrowed from Sinclair Spectrum code
;**************************************************
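; Worked example of combining two glyph rows, assuming the character offsets
; are ASCII code - 32 (an assumption; the offset calculation is not shown here):
;   'A' = offset 33, FCol1+33  = $30 (bits in the high nibble)
;   'B' = offset 34, FCol21+34 = $06 (bits in the low nibble)
;   $30 eor $06 = $36 = %00110110, one screen byte holding row 1 of "AB"
; The two tables are meant to occupy separate nibbles, so eor acts like ora.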
;.align 256
font:
;FCol0:
;	.byte	$00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00
;	.byte	$00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00
;	.byte	$00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00
;	.byte	$00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00
;	.byte	$00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00
;	.byte	$00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00
;FCol20:
;	.byte	$00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00
;	.byte	$00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00
;	.byte	$00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00
;	.byte	$00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00
;	.byte	$00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00
;	.byte	$00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00
FCol1:
	.byte	$00, $20, $50, $20, $20, $50, $20, $20, $10, $40, $20, $00, $00, $00, $00, $10
	.byte	$20, $20, $20, $70, $50, $70, $10, $70, $20, $20, $00, $00, $00, $00, $00, $20
	.byte	$20, $30, $60, $30, $60, $70, $70, $30, $50, $70, $30, $50, $40, $50, $60, $20
	.byte	$60, $20, $60, $30, $70, $50, $50, $50, $50, $50, $70, $30, $40, $60, $20, $00
	.byte	$20, $00, $40, $00, $10, $00, $10, $00, $40, $20, $10, $40, $60, $00, $00, $00
	.byte	$00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $10, $20, $40, $50, $60
FCol21:
	.byte	$00, $02, $05, $02, $02, $05, $02, $02, $01, $04, $02, $00, $00, $00, $00, $01
	.byte	$02, $02, $02, $07, $05, $07, $01, $07, $02, $02, $00, $00, $00, $00, $00, $02
	.byte	$02, $03, $06, $03, $06, $07, $07, $03, $05, $07, $03, $05, $04, $05, $06, $02
	.byte	$06, $02, $06, $03, $07, $05, $05, $05, $05, $05, $07, $03, $04, $06, $02, $00
	.byte	$02, $00, $04, $00, $01, $00, $01, $00, $04, $02, $01, $04, $06, $00, $00, $00
	.byte	$00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $01, $02, $04, $05, $06
FCol2:
	.byte	$00, $20, $50, $70, $70, $10, $40, $20, $20, $20, $70, $00, $00, $00, $00, $10
	.byte	$50, $60, $50, $10, $50, $40, $20, $10, $50, $50, $00, $00, $10, $00, $40, $50
	.byte	$50, $50, $50, $40, $50, $40, $40, $40, $50, $20, $10, $50, $40, $70, $50, $50
	.byte	$50, $50, $50, $40, $20, $50, $50, $50, $50, $50, $10, $20, $40, $20, $50, $00
	.byte	$10, $00, $40, $00, $10, $00, $20, $00, $40, $00, $00, $40, $20, $00, $00, $00
	.byte	$00, $00, $00, $00, $20, $00, $00, $00, $00, $00, $00, $20, $20, $20, $a0, $90
FCol22:
	.byte	$00, $02, $05, $07, $07, $01, $04, $02, $02, $02, $07, $00, $00, $00, $00, $01
	.byte	$05, $06, $05, $01, $05, $04, $02, $01, $05, $05, $00, $00, $01, $00, $04, $05
	.byte	$05, $05, $05, $04, $05, $04, $04, $04, $05, $02, $01, $05, $04, $07, $05, $05
	.byte	$05, $05, $05, $04, $02, $05, $05, $05, $05, $05, $01, $02, $04, $02, $05, $00
	.byte	$01, $00, $04, $00, $01, $00, $02, $00, $04, $00, $00, $04, $02, $00, $00, $00
	.byte	$00, $00, $00, $00, $02, $00, $00, $00, $00, $00, $00, $02, $02, $02, $0a, $09
FCol3:
	.byte	$00, $20, $00, $20, $60, $20, $30, $00, $40, $10, $20, $20, $00, $00, $00, $20
	.byte	$50, $20, $10, $20, $50, $60, $60, $10, $20, $50, $20, $20, $20, $70, $20, $10
	.byte	$70, $50, $60, $40, $50, $60, $60, $40, $70, $20, $10, $60, $40, $50, $50, $50
	.byte	$50, $50, $50, $20, $20, $50, $50, $50, $20, $50, $20, $20, $20, $20, $00, $00
	.byte	$00, $30, $60, $30, $30, $20, $70, $30, $60, $60, $30, $50, $20, $50, $60, $20
	.byte	$60, $30, $50, $30, $70, $50, $50, $50, $50, $50, $70, $20, $20, $20, $00, $60
FCol23:
	.byte	$00, $02, $00, $02, $06, $02, $03, $00, $04, $01, $02, $02, $00, $00, $00, $02
	.byte	$05, $02, $01, $02, $05, $06, $06, $01, $02, $05, $02, $02, $02, $07, $02, $01
	.byte	$07, $05, $06, $04, $05, $06, $06, $04, $07, $02, $01, $06, $04, $05, $05, $05
	.byte	$05, $05, $05, $02, $02, $05, $05, $05, $02, $05, $02, $02, $02, $02, $00, $00
	.byte	$00, $03, $06, $03, $03, $02, $07, $03, $06, $06, $03, $05, $02, $05, $06, $02
	.byte	$06, $03, $05, $03, $07, $05, $05, $05, $05, $05, $07, $02, $02, $02, $00, $06
FCol4:
	.byte	$00, $20, $00, $20, $30, $20, $50, $00, $40, $10, $50, $70, $00, $70, $00, $20
	.byte	$50, $20, $20, $10, $70, $10, $50, $20, $50, $30, $00, $00, $40, $00, $10, $20
	.byte	$70, $70, $50, $40, $50, $40, $40, $50, $50, $20, $50, $50, $40, $50, $50, $50
	.byte	$60, $50, $60, $10, $20, $50, $50, $50, $20, $20, $20, $20, $20, $20, $00, $00
	.byte	$00, $50, $50, $40, $50, $50, $20, $50, $50, $20, $10, $60, $20, $70, $50, $50
	.byte	$50, $50, $60, $60, $20, $50, $50, $50, $20, $50, $30, $40, $20, $10, $00, $40
FCol24:
	.byte	$00, $02, $00, $02, $03, $02, $05, $00, $04, $01, $05, $07, $00, $07, $00, $02
	.byte	$05, $02, $02, $01, $07, $01, $05, $02, $05, $03, $00, $00, $04, $00, $01, $02
	.byte	$07, $07, $05, $04, $05, $04, $04, $05, $05, $02, $05, $05, $04, $05, $05, $05
	.byte	$06, $05, $06, $01, $02, $05, $05, $05, $02, $02, $02, $02, $02, $02, $00, $00
	.byte	$00, $05, $05, $04, $05, $05, $02, $05, $05, $02, $01, $06, $02, $07, $05, $05
	.byte	$05, $05, $06, $06, $02, $05, $05, $05, $02, $05, $03, $04, $02, $01, $00, $04
FCol5:
	.byte	$00, $00, $00, $70, $70, $40, $50, $00, $40, $10, $00, $20, $00, $00, $00, $40
	.byte	$50, $20, $40, $50, $10, $50, $50, $20, $50, $20, $00, $00, $20, $70, $20, $00
	.byte	$40, $50, $50, $40, $50, $40, $40, $50, $50, $20, $50, $50, $40, $50, $50, $50
	.byte	$40, $50, $50, $50, $20, $50, $20, $70, $50, $20, $40, $20, $10, $20, $00, $00
	.byte	$00, $50, $50, $40, $50, $60, $20, $50, $50, $20, $10, $50, $20, $50, $50, $50
	.byte	$50, $50, $40, $30, $20, $50, $20, $70, $20, $50, $60, $20, $20, $20, $00, $60
FCol25:
	.byte	$00, $00, $00, $07, $07, $04, $05, $00, $04, $01, $00, $02, $00, $00, $00, $04
	.byte	$05, $02, $04, $05, $01, $05, $05, $02, $05, $02, $00, $00, $02, $07, $02, $00
	.byte	$04, $05, $05, $04, $05, $04, $04, $05, $05, $02, $05, $05, $04, $05, $05, $05
	.byte	$04, $05, $05, $05, $02, $05, $02, $07, $05, $02, $04, $02, $01, $02, $00, $00
	.byte	$00, $05, $05, $04, $05, $06, $02, $05, $05, $02, $01, $05, $02, $05, $05, $05
	.byte	$05, $05, $04, $03, $02, $05, $02, $07, $02, $05, $06, $02, $02, $02, $00, $06
FCol6:
	.byte	$00, $20, $00, $20, $20, $50, $30, $00, $20, $20, $00, $00, $20, $00, $10, $40
	.byte	$20, $70, $70, $20, $10, $20, $20, $20, $20, $40, $20, $20, $10, $00, $40, $20
	.byte	$30, $50, $60, $30, $60, $70, $40, $30, $50, $70, $20, $50, $70, $50, $50, $20
	.byte	$40, $30, $50, $20, $20, $20, $20, $50, $50, $20, $70, $20, $10, $20, $00, $00
	.byte	$00, $30, $60, $30, $30, $30, $40, $30, $50, $70, $50, $50, $70, $50, $50, $20
	.byte	$60, $30, $40, $60, $10, $20, $20, $50, $50, $30, $70, $20, $20, $20, $00, $90
FCol26:
	.byte	$00, $02, $00, $02, $02, $05, $03, $00, $02, $02, $00, $00, $02, $00, $01, $04
	.byte	$02, $07, $07, $02, $01, $02, $02, $02, $02, $04, $02, $02, $01, $00, $04, $02
	.byte	$03, $05, $06, $03, $06, $07, $04, $03, $05, $07, $02, $05, $07, $05, $05, $02
	.byte	$04, $03, $05, $02, $02, $02, $02, $05, $05, $02, $07, $02, $01, $02, $00, $00
	.byte	$00, $03, $06, $03, $03, $03, $04, $03, $05, $07, $05, $05, $07, $05, $05, $02
	.byte	$06, $03, $04, $06, $01, $02, $02, $05, $05, $03, $07, $02, $02, $02, $00, $09
FCol7:
	.byte	$00, $00, $00, $00, $00, $00, $00, $00, $10, $40, $00, $00, $20, $00, $00, $00
	.byte	$00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $20, $00, $00, $00, $00
	.byte	$00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00
	.byte	$00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $30, $00, $60, $00, $F0
	.byte	$00, $00, $00, $00, $00, $00, $00, $60, $00, $00, $20, $00, $00, $00, $00, $00
	.byte	$40, $10, $00, $00, $00, $00, $00, $00, $00, $60, $00, $10, $00, $40, $00, $60
FCol27:
	.byte	$00, $00, $00, $00, $00, $00, $00, $00, $01, $04, $00, $00, $02, $00, $00, $00
	.byte	$00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $02, $00, $00, $00, $00
	.byte	$00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00
	.byte	$00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $03, $00, $06, $00, $0F
	.byte	$00, $00, $00, $00, $00, $00, $00, $06, $00, $00, $02, $00, $00, $00, $00, $00
	.byte	$04, $01, $00, $00, $00, $00, $00, $00, $00, $06, $00, $01, $00, $04, $00, $06



;	.bss	
String:	.res	256



.if Plus4
; Plus4 column offset.  I use FontHeight but this must be fixed at 8 for the Plus4 screen layout
; format:  column left nibble offset, column right nibble offset (both are the same), next...
; doubling each entry lets us index by col directly (ldy col) instead of lda col, lsr, tay

; < and > are unary operators and bind tighter than *, so the product must be parenthesized
coltableLSB:
	.byte	<(FontHeight*0),<(FontHeight*0), <(FontHeight*1),<(FontHeight*1), <(FontHeight*2),<(FontHeight*2), <(FontHeight*3),<(FontHeight*3)
	.byte	<(FontHeight*4),<(FontHeight*4), <(FontHeight*5),<(FontHeight*5), <(FontHeight*6),<(FontHeight*6), <(FontHeight*7),<(FontHeight*7)
	.byte	<(FontHeight*8),<(FontHeight*8), <(FontHeight*9),<(FontHeight*9), <(FontHeight*10),<(FontHeight*10), <(FontHeight*11),<(FontHeight*11)
	.byte	<(FontHeight*12),<(FontHeight*12), <(FontHeight*13),<(FontHeight*13), <(FontHeight*14),<(FontHeight*14), <(FontHeight*15),<(FontHeight*15)
	.byte	<(FontHeight*16),<(FontHeight*16), <(FontHeight*17),<(FontHeight*17), <(FontHeight*18),<(FontHeight*18), <(FontHeight*19),<(FontHeight*19)
	.byte	<(FontHeight*20),<(FontHeight*20), <(FontHeight*21),<(FontHeight*21), <(FontHeight*22),<(FontHeight*22), <(FontHeight*23),<(FontHeight*23)
	.byte	<(FontHeight*24),<(FontHeight*24), <(FontHeight*25),<(FontHeight*25), <(FontHeight*26),<(FontHeight*26), <(FontHeight*27),<(FontHeight*27)
	.byte	<(FontHeight*28),<(FontHeight*28), <(FontHeight*29),<(FontHeight*29), <(FontHeight*30),<(FontHeight*30), <(FontHeight*31),<(FontHeight*31)
	.byte	<(FontHeight*32),<(FontHeight*32), <(FontHeight*33),<(FontHeight*33), <(FontHeight*34),<(FontHeight*34), <(FontHeight*35),<(FontHeight*35)
	.byte	<(FontHeight*36),<(FontHeight*36), <(FontHeight*37),<(FontHeight*37), <(FontHeight*38),<(FontHeight*38), <(FontHeight*39),<(FontHeight*39)
coltableMSB:
	.byte	>(FontHeight*0),>(FontHeight*0), >(FontHeight*1),>(FontHeight*1), >(FontHeight*2),>(FontHeight*2), >(FontHeight*3),>(FontHeight*3)
	.byte	>(FontHeight*4),>(FontHeight*4), >(FontHeight*5),>(FontHeight*5), >(FontHeight*6),>(FontHeight*6), >(FontHeight*7),>(FontHeight*7)
	.byte	>(FontHeight*8),>(FontHeight*8), >(FontHeight*9),>(FontHeight*9), >(FontHeight*10),>(FontHeight*10), >(FontHeight*11),>(FontHeight*11)
	.byte	>(FontHeight*12),>(FontHeight*12), >(FontHeight*13),>(FontHeight*13), >(FontHeight*14),>(FontHeight*14), >(FontHeight*15),>(FontHeight*15)
	.byte	>(FontHeight*16),>(FontHeight*16), >(FontHeight*17),>(FontHeight*17), >(FontHeight*18),>(FontHeight*18), >(FontHeight*19),>(FontHeight*19)
	.byte	>(FontHeight*20),>(FontHeight*20), >(FontHeight*21),>(FontHeight*21), >(FontHeight*22),>(FontHeight*22), >(FontHeight*23),>(FontHeight*23)
	.byte	>(FontHeight*24),>(FontHeight*24), >(FontHeight*25),>(FontHeight*25), >(FontHeight*26),>(FontHeight*26), >(FontHeight*27),>(FontHeight*27)
	.byte	>(FontHeight*28),>(FontHeight*28), >(FontHeight*29),>(FontHeight*29), >(FontHeight*30),>(FontHeight*30), >(FontHeight*31),>(FontHeight*31)
	.byte	>(FontHeight*32),>(FontHeight*32), >(FontHeight*33),>(FontHeight*33), >(FontHeight*34),>(FontHeight*34), >(FontHeight*35),>(FontHeight*35)
	.byte	>(FontHeight*36),>(FontHeight*36), >(FontHeight*37),>(FontHeight*37), >(FontHeight*38),>(FontHeight*38), >(FontHeight*39),>(FontHeight*39)
.endif	




.CODE

.if AcornAtom = 1
.macro SETVDG	value
	lda		#value			; 6847 control - GM2 GM1 GM0 A/G 0 0 0 0
	sta		$B000
.endmacro
InitScreen:
;	lda		#$FF
	lda		#$00
	sta		BGColor			;set the background color
	;clear the screen before we show it
	jsr		cls
	;Acorn Atom
	SETVDG(%11110000)		; 6847 control - GM2 GM1 GM0 A/G 0 0 0 0   RG6 = 11110000
	
	rts
.endif


.if Plus4 = 1
InitScreen:
	;set hi-res graphics mode
	lda		#VMSet				; Load the video mode setting
	sta		TEDVMR				; set it 
	;set video RAM address
	lda		#VASet				; Load the video address setting
	sta		VBASEREG			; set the address of our hi-res screen

; 3072/255
; set up color RAM
	lda	#$33		;%00110011
	ldx	#$00		;clear X
@cloop:
	sta	ColorRAM,x
	sta	ColorRAM+$100,x
	sta	ColorRAM+$200,x
	dex
	bne	@cloop		; loop until x hits zero again (256 times)

	ldx	#$
@cloop2:
	sta	ColorRAM+$300,x
	dex
	bne	@cloop2

	rts
.endif
	




.if Atari = 1
; ******************************
; Atari 8 bit code
; ******************************

; ******************************
; CIO equates
; ******************************
ICHID =    $0340
ICDNO =    $0341
ICCOM =    $0342
ICSTA =    $0343
ICBAL =    $0344
ICBAH =    $0345
ICPTL =    $0346
ICPTH =    $0347
ICBLL =    $0348
ICBLH =    $0349
ICAX1 =    $034A
ICAX2 =    $034B
CIOV  =    $E456
; ******************************
; Other equates needed
; ******************************
;COLOR0 =   $02C4
;COLCRS =   $55
;ROWCRS =   $54
;ATACHR =   $02FB
;STORE1 =   $CC
;STOCOL =   $CD

COLOR0	=	$02C4			; OS COLOR REGISTERS
COLOR1	=	$02C5
COLOR2	=	$02C6
COLOR3	=	$02C7
COLOR4	=	$02C8

; ******************************
; Non Maskable Interrupt Enable register
; clear bits of the NMIEN register at $D40E to disable, set to enable
; DLI	- D7
; VBI	- D6
; RESET	- D5
; ******************************
NMIEN = $D40E	; non maskable interrupt enable

;
SDMCTL=$022F
;
SDLSTL=$0230
SDLSTH=$0231

DMACTL=$D400

;Mode 8 requires 40 * 192 bytes, or 7680 bytes.
;So a mode 8 screen will cross a 4K boundary, and my display list needs a second LMS instruction at the crossing.
;To have contiguous screen RAM, I have to align the end of a 40 byte line with the end of the first 4K block.
;4K = 4096 bytes.  4096 / 40 = 102.4, so 102 whole lines fit in the second 4K block and the remaining 90 lines (192 - 102 = 90) go at the end of the first.
;So find the 4K boundary I want to be top of RAM, subtract 4K to get the boundary in the middle of the screen, then subtract 90 * 40 (3600) and that is the screen start address.
;Add 102 * 40 (4080) to the middle boundary and that gives the screen end address + 1.  4096 - 4080 = 16 unused bytes at the end of the second block.
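;
;Worked example of the layout above, using a hypothetical top-of-RAM boundary of $A000
;(the address is an assumption for illustration only, not the value this build uses):
;  middle boundary = $A000 - $1000        = $9000
;  SCREEN          = $9000 - 90*40 ($0E10) = $81F0
;  screen end + 1  = $9000 + 102*40 ($0FF0) = $9FF0   ($9FF0 - $81F0 = $1E00 = 7680 bytes)
;A possible assemble/link time check, sketched on the assumption that SCREEN is the
;symbol used by the display list below; it only warns if line 90 does not start on a 4K boundary.
.if BytesPerLine = 40
	.assert ((SCREEN + 40*90) & $0FFF) = 0, warning, "line 90 of the mode 8 screen should start on a 4K boundary"
.endif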
	
InitScreen:	
	lda	#$00
	sta	BGColor				;set the background color

	;set up our display
	LDA	#$0D			; Set COLOR Light Grey
	STA	COLOR2			; Set background color
	LDA	#00				; clear A
	STA	COLOR1			; Set COLOR Black

	STA	SDMCTL			; TURN ANTIC OFF FOR A MOMENT ...
	LDA	#<HLIST			; WHILE WE STORE OUR NEW LIST'S ADDRESS
	STA	SDLSTL			; IN THE OS DISPLAY POINTER.
	LDA	#>HLIST			; NOW FOR THE HIGH BYTE.  /256
	STA	SDLSTH			; NOW ANTIC WILL KNOW OUR NEW ADDRESS
.if BytesPerLine = 32
	LDA	#$21			; NARROW PLAYFIELD 256
.elseif BytesPerLine = 40
	LDA	#$22			; NORMAL PLAYFIELD 320
.elseif BytesPerLine = 48
	LDA	#$23			; WIDE PLAYFIELD 384
.endif
	STA	SDMCTL			; ... SO WE'LL TURN ANTIC BACK ON NOW
	RTS

.data
;	.align  256,0

;	.res	194
	.res	76
;
;Atari Antic Display List
; Screen Mode 8, 256/320/384 pixels wide x 192 high, 1 bit per pixel
;
HLIST:
	.BYTE	$70,$70,$70		; 3 x 8 BLANK SCAN LINES

.if BytesPerLine = 32
	;first 4K screen block.  Only using 2K
	.BYTE	$0F+64			; Mode 8 + LMS, 256 pixels wide, 192 high, 1 bit per pixel, 6144 bytes
	.WORD	SCREEN			; Screen RAM address

	; 63 mode 8 instructions + above instruction = 64
	.BYTE	$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F				; Mode 8
	.BYTE	$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F				; Mode 8
	.BYTE	$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F				; Mode 8
	.BYTE	$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F					; Mode 8
	
;	;second 4K screen block, completely filled
	.BYTE	$0F+64			; Mode 8 + LMS, 256 pixels wide, 192 high, 1 bit per pixel, 6144 bytes
	.WORD	SCREEN+2048		; Screen RAM address

	; 127 mode 8 instructions + above = 128
	.BYTE	$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F				; Mode 8
	.BYTE	$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F				; Mode 8
	.BYTE	$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F				; Mode 8
	.BYTE	$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F				; Mode 8
	.BYTE	$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F				; Mode 8
	.BYTE	$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F				; Mode 8
	.BYTE	$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F				; Mode 8
	.BYTE	$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F					; Mode 8
.endif

.if BytesPerLine = 40
	;first 4K screen block
	.BYTE	$0F+64			; Mode 8 + LMS, 320 pixels wide, 192 high, 1 bit per pixel, 7680 bytes
	.WORD	SCREEN			; Screen RAM address
	; 89 mode 8 instructions + above
	.BYTE	$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F				; Mode 8
	.BYTE	$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F				; Mode 8
	.BYTE	$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F				; Mode 8
	.BYTE	$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F				; Mode 8
	.BYTE	$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F				; Mode 8
	.BYTE	$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F											; Mode 8

;	;second 4K screen block
	.BYTE	$0F+64			; Mode 8 + LMS, 320 pixels wide, 192 high, 1 bit per pixel, 7680 bytes
	.WORD	SCREEN+40*90	; Screen RAM address
; 101 mode 8 instructions + above
	.BYTE	$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F				; Mode 8
	.BYTE	$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F				; Mode 8
	.BYTE	$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F				; Mode 8
	.BYTE	$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F				; Mode 8
	.BYTE	$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F				; Mode 8
	.BYTE	$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F,$0F				; Mode 8
	.BYTE	$0F,$0F,$0F,$0F,$0F														; Mode 8
.endif
	
;	.BYTE	$42,$60,$9F,$02,$02	;copied from basic DL 8 dump
	
	.BYTE $41; JVB INSTRUCTION
	.WORD HLIST; TO JUMP BACK TO START OF LIST
.endif


EOF:
	 
.END