Well then, I missed an obvious change to the sprite code that speeds it up, so I updated it and threw in a second speedup I already knew about while I was at it. The self-modified branch now doubles as the loop test, which eliminates one branch per line. A few extra instructions execute after the last line is drawn, but that is more than offset by the savings inside the loop.
The case where no shift is needed now has its own loop, which eliminates a lot of unneeded instructions. I originally used the same code for both cases to keep the sprite drawing rate consistent, but it's easy enough to go back to the old way if that turns out to be a problem.
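The branch trade-off described above can be checked with a small counting model. This is a hypothetical Python sketch, not 6809 code.

- `model_new_loop` follows the updated routine. Y starts at SPR_HGT + 1, and the self-modified BNE is also the loop test.
- `model_old_loop` follows the earlier version from the August 13 post, which uses a BRA into the shifts plus a separate BNE per line.

Both functions only tally row fetches, rows drawn and branches executed. Cycle counts are not modeled.

```python
# Hypothetical counting model of the two sprite loops (not 6809 code).
# It tallies row fetches (ldd ,u++), rows drawn, and branch instructions executed.

def model_new_loop(spr_hgt):
    """Updated routine: Y = SPR_HGT + 1, and the self-modified BNE is the loop test."""
    y = spr_hgt + 1
    loads, draws, branches = 0, 0, 1   # 'bra sprit1' runs once on entry
    while True:
        loads += 1                     # ldd ,u++ at sprit1
        y -= 1                         # counter decrement sets Z
        branches += 1                  # bne at ShiftJump
        if y == 0:
            break                      # falls through to rts; this last fetch is unused
        draws += 1                     # shifts, merge, write, abx
    return loads, draws, branches


def model_old_loop(spr_hgt):
    """Earlier routine: a self-modified BRA into the shifts plus a BNE per line."""
    y = spr_hgt
    loads, draws, branches = 0, 0, 0
    while True:
        loads += 1                     # ldd ,u++ at sprit1
        branches += 1                  # self-modified bra at ShiftJump
        draws += 1                     # shifts, merge, write, abx
        y -= 1                         # counter decrement
        branches += 1                  # bne sprit1
        if y == 0:
            break
    return loads, draws, branches


if __name__ == "__main__":
    for name, model in (("new", model_new_loop), ("old", model_old_loop)):
        loads, draws, branches = model(16)
        print(f"{name}: {loads} loads, {draws} rows drawn, {branches} branches")
```

For a 16-line sprite this reports 18 branches for the new loop versus 32 for the old one. The cost is one extra, unused row fetch at the end.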
;----------------------------------------------------
; Draw sprite
;
; AGD sprites are normally 16 pixels wide by 16 pixels high, but the height can be changed with the loop counter
; spr must be on the direct page
; spr+1 must be loaded with the bytes per line (32) on startup
; For 8 bit wide sprites, the ROR <spr instructions could be dropped along with the last screen write.
; LEAX 32,X would then be the fastest way to advance to the next screen line instead of using LDB/ABX
;----------------------------------------------------
Sprite:
leau ,x ; move graphic address to regU
jsr scadd ; get screen address in X
ldy #SPR_HGT+1 ; load loop counter with number of lines per sprite + 1 for first loop.
ldb <dispx ; x position.
andb #7 ; position straddling cells.
beq sprit3 ; branch if no shifts are needed
aslb ; B = shifts * 2
aslb ; B = shifts * 4 (4 bytes per LSRA/RORB/ROR <spr group), 2 clocks each
negb ; negate, since the BNE branches backward from ShiftLoc1
addb #EndShift-ShiftLoc1 ; add the (negative) offset from the BNE to EndShift
stb ShiftJump+1 ; set the BNE offset to perform the right number of shifts, 5 clocks
bra sprit1 ; jump to the branch changed by the code above
StartShift:
lsra
rorb
ror <spr
lsra
rorb
ror <spr
lsra
rorb
ror <spr
lsra
rorb
ror <spr
lsra
rorb
ror <spr
lsra
rorb
ror <spr
lsra
rorb
ror <spr
EndShift:
eora 0,x ; merge spr with screen image
eorb 1,x ; merge spr+1 with screen image
std 0,x ; write to screen.
ldd <spr ; get sprite data in A, and screen line width in B
eora 2,x ; merge with screen image
sta 2,x ; write to screen
; B holds the number of bytes per line (32) after the LDD; spr+1 is initialized at the very start of the game.
; For different resolutions, this could be changed elsewhere on startup.
abx ; move to next screen line
sprit1:
ldd ,u++ ; load data from sprite into D register.
clr <spr ; clear the byte that collects bits shifted out of B
leay -1,y ; decrement the # of lines counter; sets Z when it reaches zero
ShiftJump:
bne StartShift ; go again if not done, address offset is modified above
ShiftLoc1:
rts
; no shifts
sprit3:
leay -1,y ; remove the extra count the shifted path needs for its first pass
sprit2:
ldd ,u++ ; load data from sprite into D register.
eora 0,x ; merge spr with screen image
eorb 1,x ; merge spr+1 with screen image
std 0,x ; write to screen.
leax 32,x ; move to next screen line
leay -1,y ; decrement the # of lines counter
bne sprit2 ; go again if not done
rts
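In the updated routine, the offset arithmetic before `bra sprit1` stores a negative offset at `ShiftJump+1`. That offset points the BNE back to a spot 4 bytes before `EndShift` for each shift needed. Each shift needs LSRA (1 byte), RORB (1 byte) and ROR <spr (2 bytes).

The Python model below is a sketch of that calculation. `gap` stands for the number of bytes from `EndShift` forward to `ShiftLoc1`. It depends on how the assembler encodes each instruction. The value 21 used below is only an assumed example, so read the real distance from the assembler listing.

```python
# Hypothetical model of the self-modified BNE offset in the updated sprite routine.
# gap (bytes from EndShift to ShiftLoc1) is an assumption; take the real value
# from your assembler listing.

BYTES_PER_SHIFT = 4   # lsra (1 byte) + rorb (1 byte) + ror <spr (2 bytes, direct page)
MAX_SHIFTS = 7        # andb #7 leaves 0..7; 0 takes the separate no-shift path


def bne_offset(shifts, gap):
    """Signed 8-bit offset to store at ShiftJump+1 for 1..7 shifts."""
    if not 1 <= shifts <= MAX_SHIFTS:
        raise ValueError("the shifted path handles 1..7 shifts")
    # Branch target is EndShift - 4*shifts; offsets are relative to ShiftLoc1.
    offset = -gap - BYTES_PER_SHIFT * shifts
    if not -128 <= offset <= 127:
        raise ValueError("target is out of range for a short branch")
    return offset


def as_byte(offset):
    """Two's-complement byte that actually lands in the instruction."""
    return offset & 0xFF


if __name__ == "__main__":
    gap = 21  # assumed distance; check the listing
    for n in range(1, MAX_SHIFTS + 1):
        off = bne_offset(n, gap)
        print(f"{n} shifts: offset {off:4d} (${as_byte(off):02X})")
```

Even at seven shifts the offset stays well above -128, so a short BNE is enough.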
Sunday, August 14, 2022
You ever wonder how you missed something obvious?
Saturday, August 13, 2022
MPAGD status update
Here is the current state of the MC-10 port of Multi-Platform Arcade Game Designer:
To integrate with MPAGD, the port requires an external assembler and an emulator that accepts command-line parameters. Neither has been dealt with yet. The assembler must support macros that can generate alternate code when a branch is out of range. The emulator issue will have to be dealt with by someone else.
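The out-of-range case comes up because the 6803 only has 8-bit relative branches. It has no long conditional branches like the 6809's LBEQ.

Below is a minimal sketch of the decision such a macro has to make, written as Python. It assumes Motorola-style `*` for the current location. `expand_branch`, `fits_short` and the `INVERSE` table are hypothetical and are not part of MPAGD or of any particular assembler.

- If the target is within -128..127 bytes, the short branch is kept.
- Otherwise BRA/BSR become JMP/JSR.
- A conditional branch becomes the opposite condition hopping over a 3-byte extended JMP.

```python
# Hypothetical sketch of branch relaxation for 6803 code (not a real assembler's macro syntax).

INVERSE = {
    "beq": "bne", "bne": "beq",
    "bcc": "bcs", "bcs": "bcc",
    "bpl": "bmi", "bmi": "bpl",
    "bvc": "bvs", "bvs": "bvc",
    "bge": "blt", "blt": "bge",
    "bgt": "ble", "ble": "bgt",
    "bhi": "bls", "bls": "bhi",
}


def fits_short(pc, target):
    """A relative branch is 2 bytes; its offset is measured from pc + 2."""
    return -128 <= target - (pc + 2) <= 127


def expand_branch(op, pc, target, label):
    """Return the instruction lines to emit for `op label` located at address pc."""
    if fits_short(pc, target):
        return [f"{op} {label}"]
    if op == "bra":
        return [f"jmp {label}"]
    if op == "bsr":
        return [f"jsr {label}"]
    # Opposite condition skips the JMP: 2-byte branch + 3-byte extended JMP = 5 bytes.
    return [f"{INVERSE[op]} *+5", f"jmp {label}"]


if __name__ == "__main__":
    print(expand_branch("beq", 0x4000, 0x4010, "loop"))
    print(expand_branch("beq", 0x4000, 0x4200, "far"))
```

Expanding a branch makes the code longer, which moves every address after it. An assembler doing this therefore typically repeats its sizing pass until no more branches change.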
The compiler appears to be done, unless some bugs turn up.
I adapted the z88dk peephole optimizer for use with MPAGD and created 50 peephole optimizations for the 6803 code. It needs one more addition so it can run on the final assembly file instead of just the game code, because the final file also includes the game engine, which should not be optimized.
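One way to keep the engine out of the optimizer is to apply rules only between two marker comments. Below is a minimal Python sketch of that idea. The markers `; OPT-ON` and `; OPT-OFF`, the single rule, and the `peephole` function are all hypothetical, and this is not z88dk's rule format.

The rule drops an `ldaa X` that immediately follows `staa X`. That is safe on the 6803 because both instructions set N and Z from the same value, clear V, and leave C alone. It only holds when X is ordinary RAM rather than an I/O register, and when nothing branches to the load.

```python
# Hypothetical peephole pass limited to marked regions (not z88dk's rule syntax).
import re

# Matches an unlabeled "staa <operand>" line with no trailing comment.
STORE_A = re.compile(r"\s*staa\s+(\S+)\s*$", re.IGNORECASE)


def peephole(lines, start_marker="; OPT-ON", stop_marker="; OPT-OFF"):
    """Drop 'ldaa X' right after 'staa X', but only inside marked regions."""
    out = []
    enabled = False
    i = 0
    while i < len(lines):
        line = lines[i]
        stripped = line.strip()
        if stripped == start_marker:
            enabled = True
        elif stripped == stop_marker:
            enabled = False
        elif enabled and i + 1 < len(lines):
            m = STORE_A.match(line)
            nxt = lines[i + 1].split()
            # A labeled load splits into 3+ tokens, so possible branch targets are left alone.
            if m and len(nxt) == 2 and nxt[0].lower() == "ldaa" and nxt[1] == m.group(1):
                out.append(line)   # keep the store
                i += 2             # skip the redundant load
                continue
        out.append(line)
        i += 1
    return out


if __name__ == "__main__":
    src = ["; OPT-ON", " staa $80", " ldaa $80", "; OPT-OFF", " staa $80", " ldaa $80"]
    print("\n".join(peephole(src)))
```

The engine code placed after `; OPT-OFF` passes through untouched, while the same pair inside the marked region loses its load.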
A lot of time has been spent optimizing code, and the resulting sprite drawing code is pretty efficient. One thing in the sprite code still needs to be verified, but that's minor. An offshoot of this work is a more efficient sprite routine for the 6809 version.
6809 Sprite code I forwarded to another programmer:
;----------------------------------------------------
; Draw sprite
;
; AGD sprites are normally 16 pixels wide by 16 pixels high, but the height can be changed with the loop counter
; spr must be on the direct page
; spr+1 must be loaded with the bytes per line (32) on startup
; For 8 bit wide sprites, the ROR <spr instructions could be dropped along with the last screen write.
; LEAX 32,X would then be the fastest way to advance to the next screen line instead of using LDB/ABX
;----------------------------------------------------
Sprite:
leau ,x ; move graphic address to regU
jsr scadd ; get screen address in X
ldy #16 ; load loop counter with number of lines per sprite.
lda #(EndShift-StartShift) ; offset to skip shifts for BRA
sta ShiftJump+1 ; Set the BRA offset to skip bit shifts as default for when none are needed
ldb <dispx ; x position.
andb #7 ; position straddling cells.
beq sprit1 ; branch if no shifts are needed
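aslb ; B = shifts * 2
aslb ; B = shifts * 4 (4 bytes per LSRA/RORB/ROR <spr group), 2 clocks each
negb ; each shift performed moves the branch target 4 bytes back
addb #(EndShift-StartShift) ; offset = (EndShift-StartShift) - 4 * shifts
stb ShiftJump+1 ; set the BRA offset to perform the right number of shifts, 5 clocks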
sprit1:
ldd ,u++ ; load data from sprite into D register.
clr <spr ; clear byte for shifting
ShiftJump:
bra EndShift ; self modifying code changes this branch offset 3 clocks
StartShift:
lsra
rorb
ror <spr
lsra
rorb
ror <spr
lsra
rorb
ror <spr
lsra
rorb
ror <spr
lsra
rorb
ror <spr
lsra
rorb
ror <spr
lsra
rorb
ror <spr
EndShift:
eora 0,x ; merge spr with screen image
eorb 1,x ; merge spr+1 with screen image
std 0,x ; write to screen.
ldd <spr ; get sprite data in A, and screen line width in B
eora 2,x ; merge with screen image
sta 2,x ; write to screen
; B holds the number of bytes per line (32) after the LDD; spr+1 is initialized at the very start of the game.
; For different resolutions, this could be changed elsewhere on startup.
abx ; move to next screen line
leay -1,y ; decrement the # of lines counter; sets Z when it reaches zero
bne sprit1 ; go again if not done
rts
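The LSRA/RORB/ROR <spr chain in both listings shifts the 16-pixel row held in A:B right into a third byte. Each of the three bytes is then EORed into the screen. The Python model below is a hypothetical sketch of one row, not 6809 code.

With zero shifts, `spr` stays 0, so the third EOR leaves the screen byte unchanged. That is why the no-shift path in the updated version can skip it. Because the merge is an exclusive OR, drawing the same row twice at the same position restores the original screen bytes.

```python
# Hypothetical model of one sprite row as the 6809 code draws it (not 6809 code).
# A:B hold the 16-pixel row, spr receives the bits shifted out of B, and the
# three resulting bytes are EORed into the screen.

def shift_row(a, b, shifts):
    """Apply `shifts` rounds of LSRA / RORB / ROR <spr to the row in A:B."""
    spr = 0                                          # clr <spr
    for _ in range(shifts):
        carry = a & 1                                # lsra: bit 0 of A goes to carry
        a >>= 1                                      #       and a 0 enters bit 7
        carry, b = b & 1, (b >> 1) | (carry << 7)    # rorb
        spr = (spr >> 1) | (carry << 7)              # ror <spr (its carry-out is unused)
    return a, b, spr


def draw_row(screen, pos, a, b, spr):
    """EOR the three bytes into screen[pos..pos+2], as eora/eorb/std and eora/sta do."""
    screen[pos] ^= a
    screen[pos + 1] ^= b
    screen[pos + 2] ^= spr


if __name__ == "__main__":
    screen = bytearray(32)                           # one 32-byte screen line
    a, b, spr = shift_row(0xFF, 0xFF, 3)
    draw_row(screen, 0, a, b, spr)
    print(screen[:3].hex())                          # row now straddles three bytes
    draw_row(screen, 0, a, b, spr)                   # drawing again erases it
    print(screen[:3].hex())
```

For up to seven shifts, no bits fall off the end of `spr`. The three bytes are therefore exactly the 24-bit value `(row << 8) >> shifts`.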