Sunday, August 14, 2022

You ever wonder how you missed something obvious?

Well, I missed an obvious change to the sprite code that speeds it up, so I updated it and threw in a second speedup I already knew about while I was at it.  The loop's closing BNE is now the self-modified branch, so the separate BRA from the old version is gone.  A few extra instructions are executed at the end, but that is more than offset by eliminating one branch per loop.

The case where no shift is needed is now handled separately, which eliminates a lot of unneeded instructions.  I originally used the same code for both cases to keep the sprite drawing rate consistent, and it's easy enough to go back to the old way if that turns out to be a problem.


;----------------------------------------------------
; Draw sprite
;
;  AGD sprites are normally 16 pixels wide by 16 pixels high, but the height can be changed with the loop counter
;  spr must be on the direct page
;  spr+1 must be loaded with the bytes per line (32) on startup
;  For 8 bit wide sprites, the ROR <spr instructions could be dropped along with the last screen write.  
;  LEAX 32,X would then be the fastest way to advance to the next screen line instead of using LDB ABX
;----------------------------------------------------
Sprite:
    leau    ,x                  ; move graphic address to regU
    jsr     scadd              ; get screen address in X

    ldy     #SPR_HGT+1           ; load loop counter with number of lines per sprite + 1 for first loop.

    ldb     <dispx             ; x position.
    andb    #7                  ; position straddling cells.
    beq     sprit3              ; branch if no shifts are needed
    
    lda     #EndShift-ShiftLoc1     ; negative offset from the end of the BNE back to EndShift (no shifts)
    aslb                        ; B = shifts * 2; each LSRA/RORB/ROR <spr group is 4 bytes.   2 clocks each
    sba                         ; subtracting B twice takes off 4 bytes per shift,
    sba                         ; which makes the negative offset larger
    sta     ShiftJump+1         ; set the BNE offset to perform the right number of shifts   4 clocks

    bra     sprit1           ; jump to the branch changed by the code above

StartShift:                     ; placeholder target for the BNE at ShiftJump, which is always patched first
    lsra
    rorb
    ror     <spr
    lsra
    rorb
    ror     <spr
    lsra
    rorb
    ror     <spr
    lsra
    rorb
    ror     <spr
    lsra
    rorb
    ror     <spr
    lsra
    rorb
    ror     <spr
    lsra
    rorb
    ror     <spr
EndShift:

    eora    0,x         ; merge spr with screen image
    eorb    1,x         ; merge spr+1 with screen image
    std     0,x         ; write to screen.

    ldd     spr         ; get sprite data in A, and screen line width in B
    eora    2,x         ; merge with screen image
    sta     2,x         ; write to screen

    ; b contains number of bytes per line (32) after the LDD, spr+1 is initialized at the very start of the game
    ; for different resolutions, this could be changed elsewhere on startup
    abx                 ; move to next screen line

sprit1:
    ldd     ,u++        ; load data from sprite into D register.
    clr     <spr        ; clear byte for shifting; CLR also sets the Z bit, so it has to come before the DEY
    dey                 ; decrement the # of lines counter

ShiftJump:
    bne     StartShift      ; go again if not done, address offset is modified above
ShiftLoc1:

    rts

; no shifts
sprit3:
    dey                 ; undo the +1 added to the loop counter for the shifted path
sprit2:
    ldd     ,u++        ; load data from sprite into D register.
    eora    0,x         ; merge spr with screen image
    eorb    1,x         ; merge spr+1 with screen image
    std     0,x         ; write to screen.

    leax    32,x        ; move to next screen line
    dey                 ; decrement the # of lines counter
    bne     sprit2      ; go again if not done

    rts
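
The offset arithmetic in the shifted path can be checked with a small Python model.  This is only a sketch: the addresses and the 24 byte distance between EndShift and ShiftLoc1 are made-up values for illustration, not sizes taken from an assembled build.  It mimics the LDA / ASLB / SBA / SBA sequence on 8-bit registers and then follows the patched BNE to count how many shift groups would run.

```python
SHIFT_BYTES = 4   # LSRA (1 byte) + RORB (1 byte) + ROR <spr (2 bytes)


def to_signed8(value):
    """Interpret the low 8 bits of value as a two's-complement byte."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value


def patched_offset(end_shift, shift_loc1, shifts):
    """Mimic LDA #EndShift-ShiftLoc1 / ASLB / SBA / SBA on 8-bit registers."""
    a = (end_shift - shift_loc1) & 0xFF   # LDA: negative offset held as a byte
    b = (shifts << 1) & 0xFF              # ASLB: B = shifts * 2
    a = (a - b) & 0xFF                    # SBA
    a = (a - b) & 0xFF                    # SBA: 4 bytes per shift removed in total
    return a


def shifts_executed(end_shift, shift_loc1, shifts):
    """Follow the patched BNE and count the shift groups run before EndShift."""
    offset = to_signed8(patched_offset(end_shift, shift_loc1, shifts))
    target = shift_loc1 + offset          # branch is relative to the byte after BNE
    distance = end_shift - target
    if distance < 0 or distance % SHIFT_BYTES:
        raise ValueError("branch does not land on a shift group boundary")
    return distance // SHIFT_BYTES


if __name__ == "__main__":
    END_SHIFT = 0x4020            # hypothetical address of EndShift
    SHIFT_LOC1 = END_SHIFT + 24   # hypothetical: 24 bytes of loop body plus the BNE
    for n in range(1, 8):
        offset = patched_offset(END_SHIFT, SHIFT_LOC1, n)
        print(n, hex(offset), shifts_executed(END_SHIFT, SHIFT_LOC1, n))
```

One thing the model makes visible is that the whole span from StartShift to ShiftLoc1 has to fit in 128 bytes.  If the loop body grows past that, the 8-bit subtraction wraps around and the branch goes forward instead of back.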

Saturday, August 13, 2022

MPAGD status update

Here is the current state of the MC-10 port of Multi-Platform Arcade Game Designer (MPAGD).

To integrate with MPAGD, the port requires an external assembler and an emulator that accepts command-line parameters.  This hasn't been dealt with yet.  The assembler needs to handle macros that can generate alternate code if a branch is out of range.  The emulator issue is going to have to be dealt with by someone else.
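
As a rough illustration of what "alternate code if a branch is out of range" means, here is a Python sketch of the decision such a macro has to make.  It assumes the usual 6800-family short branch (2 bytes, signed 8-bit offset measured from the byte after the branch).  The function names, the label syntax and the `skip` label are made up for the example and are not taken from any particular assembler.

```python
# Each conditional branch paired with the branch that tests the opposite condition.
INVERSE = {
    "beq": "bne", "bne": "beq",
    "bcc": "bcs", "bcs": "bcc",
    "bpl": "bmi", "bmi": "bpl",
    "bge": "blt", "blt": "bge",
    "bgt": "ble", "ble": "bgt",
    "bhi": "bls", "bls": "bhi",
    "bvc": "bvs", "bvs": "bvc",
}


def in_short_range(branch_addr, target_addr):
    """True if a 2-byte relative branch at branch_addr can reach target_addr."""
    offset = target_addr - (branch_addr + 2)
    return -128 <= offset <= 127


def expand_branch(mnemonic, branch_addr, target_addr, target_label, skip_label="skip"):
    """Return the instruction lines to emit for one branch (hypothetical syntax)."""
    if in_short_range(branch_addr, target_addr):
        return [f"{mnemonic} {target_label}"]
    if mnemonic == "bra":
        return [f"jmp {target_label}"]
    # Out of range: branch around a JMP using the opposite condition.
    return [
        f"{INVERSE[mnemonic]} {skip_label}",
        f"jmp {target_label}",
        f"{skip_label}:",
    ]


if __name__ == "__main__":
    print(expand_branch("bne", 0x1000, 0x1010, "loop"))
    print(expand_branch("bne", 0x1000, 0x1100, "loop"))
```

A real assembler has the extra problem that expanding one branch moves everything after it, which can push other branches out of range, so the check has to be repeated until nothing changes.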

The compiler appears to be done unless some bugs are found.

I adapted the z88dk peephole optimizer for use with MPAGD, and created 50 peephole optimizations for the 6803 code.  This needs one more addition so it can run on the final assembly file instead of just the game code.  The final file includes the game engine, which should not be optimized.
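
For anyone who hasn't seen a peephole optimizer, the idea can be sketched in a few lines of Python.  This is not the z88dk rule file format and these are not the actual MPAGD rules.  The two rules here (JSR followed by RTS becomes JMP, and PSHB followed directly by PULB is dropped) are just common examples picked for illustration, and the first one assumes the called routine does not depend on its return address.

```python
def normalize(line):
    """Split an assembly line into (mnemonic, operand), ignoring comments and case."""
    code = line.split(";", 1)[0].strip()
    if not code:
        return ("", "")
    parts = code.split(None, 1)
    mnemonic = parts[0].lower()
    operand = parts[1].strip() if len(parts) > 1 else ""
    return (mnemonic, operand)


def peephole(lines):
    """Run one pass over the listing with a two-instruction window."""
    out = []
    i = 0
    while i < len(lines):
        first = normalize(lines[i])
        second = normalize(lines[i + 1]) if i + 1 < len(lines) else ("", "")
        if first[0] == "jsr" and second == ("rts", ""):
            # Example rule 1: a call followed by a return becomes a jump.
            out.append(f"    jmp     {first[1]}")
            i += 2
        elif first == ("pshb", "") and second == ("pulb", ""):
            # Example rule 2: a push undone by the very next instruction is removed.
            i += 2
        else:
            out.append(lines[i])
            i += 1
    return out


if __name__ == "__main__":
    game_code = [
        "    pshb",
        "    pulb",
        "    jsr     scadd   ; get screen address",
        "    rts",
    ]
    for line in peephole(game_code):
        print(line)
```

A label or comment line between the two instructions stops a rule from matching, which keeps the sketch conservative: if something can branch to the RTS, it is left alone.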

A lot of time has been spent optimizing code, and the resulting sprite drawing code is pretty efficient.  One thing still needs to be verified in the sprite code, but that's minor.  An offshoot of this is that I have created a more efficient sprite routine for the 6809 version.


6809 Sprite code I forwarded to another programmer:

;----------------------------------------------------
; Draw sprite
;
;  AGD sprites are normally 16 pixels wide by 16 pixels high, but the height can be changed with the loop counter
;  spr must be on the direct page
;  spr+1 must be loaded with the bytes per line (32) on startup
;  For 8 bit wide sprites, the ROR <spr instructions could be dropped along with the last screen write.  
;  LEAX 32,X would then be the fastest way to advance to the next screen line instead of using LDB ABX
;----------------------------------------------------
Sprite:
    leau    ,x                  ; move graphic address to regU
    jsr     scadd               ; get screen address in X

    ldy     #16                 ; load loop counter with number of lines per sprite.

    lda     #(EndShift-StartShift)     ; offset to skip shifts for BRA
    sta     ShiftJump+1         ; Set the BRA offset to skip bit shifts as default for when none are needed

    ldb     <dispx              ; x position.
    andb    #7                  ; position straddling cells.
    beq     sprit1              ; branch if no shifts are needed
    
    aslb                        ; B = shifts * 2; each LSRA/RORB/ROR <spr group is 4 bytes.   2 clocks each
    sba                         ; subtracting B twice takes off 4 bytes per shift,
    sba                         ; so the BRA skips fewer of the shift groups
    sta     ShiftJump+1         ; set the BRA offset to perform the right number of shifts   4 clocks

sprit1:
    ldd     ,u++                ; load data from sprite into D register.
    clr     <spr                ; clear byte for shifting

ShiftJump:
    bra     EndShift            ; self modifying code changes this branch offset    3 clocks

StartShift:
    lsra
    rorb
    ror     <spr
    lsra
    rorb
    ror     <spr
    lsra
    rorb
    ror     <spr
    lsra
    rorb
    ror     <spr
    lsra
    rorb
    ror     <spr
    lsra
    rorb
    ror     <spr
    lsra
    rorb
    ror     <spr
EndShift:

    eora    0,x         ; merge spr with screen image
    eorb    1,x         ; merge spr+1 with screen image
    std     0,x         ; write to screen.

    ldd     spr         ; get sprite data in A, and screen line width in B
    eora    2,x         ; merge with screen image
    sta     2,x         ; write to screen

    ; b contains number of bytes per line (32) after the LDD, spr+1 is initialized at the very start of the game
    ; for different resolutions, this could be changed elsewhere on startup
    abx                 ; move to next screen line
    dey                 ; decrement the # of lines counter
    bne     sprit1      ; go again if not done
    rts