Sunday, August 14, 2022

You ever wonder how you missed something obvious?

Well, I missed an obvious change to the sprite code that speeds it up, so I updated it and threw in a second speedup I knew about while I was at it. A few extra instructions are executed at the end, but that's more than offset by eliminating one branch per loop.

The case where no shift is needed is handled separately to eliminate a lot of unneeded instructions. I originally used the same code for both cases to keep the sprite drawing rate consistent, and it's easy enough to go back to the old way if that turns out to be a problem.


;----------------------------------------------------
; Draw sprite
;
;  AGD sprites are normally 16 pixels wide by 16 pixels high, but the height can be changed with the loop counter
;  spr must be on the direct page
;  spr+1 must be loaded with the bytes per line (32) on startup
;  For 8 bit wide sprites, the ROR <spr instructions could be dropped along with the last screen write.  
;  LEAX 32,X would then be the fastest way to advance to the next screen line instead of using LDB ABX
;----------------------------------------------------
Sprite:
    leau    ,x                  ; move graphic address to regU
    jsr     scadd              ; get screen address in X

    ldy     #SPR_HGT+1           ; load loop counter with number of lines per sprite + 1 for first loop.

    ldb     <dispx             ; x position.
    andb    #7                  ; position straddling cells.
    beq     sprit3              ; branch if no shifts are needed
    
    lda     #EndShift-ShiftLoc1     ; offset to skip shifts for BNE as a negative offset
    aslb                        ; subtract 4 (number of bytes for each LSRD ROR spr) times number of shifts.   2 clocks each
    sba                         ; subtract increases negative offset
    sba
    sta     ShiftJump+1         ; set the BNE offset to perform the right number of shifts   4 clocks

    bra     sprit1           ; jump to the branch changed by the code above

StartShift:
    lsra
    rorb
    ror     <spr
    lsra
    rorb
    ror     <spr
    lsra
    rorb
    ror     <spr
    lsra
    rorb
    ror     <spr
    lsra
    rorb
    ror     <spr
    lsra
    rorb
    ror     <spr
    lsra
    rorb
    ror     <spr
EndShift:

    eora    0,x         ; merge spr with screen image
    eorb    1,x         ; merge spr+1 with screen image
    std     0,x         ; write to screen.

    ldd     spr         ; get sprite data in A, and screen line width in B
    eora    2,x         ; merge spr with screen image
    sta     2,x         ; write to screen

    ; b contains number of bytes per line (32) after the LDD, spr+1 is initialized at the very start of the game
    ; for different resolutions, this could be changed elsewhere on startup
    abx                 ; move to next screen line

sprit1:
    ldd     ,u++        ; load data from sprite into D register.
    clr     <spr        ; clear byte for shifting (CLR sets Z, but the dey below decides the bne)
    dey                 ; decrement the # of lines counter

ShiftJump:
    bne     StartShift      ; go again if not done, address offset is modified above
ShiftLoc1:

    rts

; no shifts
sprit3:
    dey
sprit2:
    ldd     ,u++        ; load data from sprite into D register.
    eora    0,x         ; merge spr with screen image
    eorb    1,x         ; merge spr+1 with screen image
    std     0,x         ; write to screen.

    leax    32,x        ; move to next screen line
    dey                 ; decrement the # of lines counter
    bne     sprit2      ; go again if not done

    rts
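
For what it's worth, here's roughly how the 8 pixel wide variant mentioned in the header comment could look. This is an untested sketch, not code from the game. The labels Sprite8, Shift8Start, Shift8End, Shift8Jump, Shift8Loc and spr8in are made up for illustration, it assumes the graphic data is stored as one byte per line, and it uses the same dey and sba mnemonics as the listing above. Each shift step is only 2 bytes here (lsra, rorb), so the branch offset backs up 2 bytes per shift instead of 4. A shift count of zero simply lands on Shift8End, so this version doesn't need a separate no-shift path.

```
;----------------------------------------------------
; Draw 8 pixel wide sprite (untested sketch)
;
;  Assumes one byte of graphic data per line
;  spr is not used, the shifted bits end up in B
;----------------------------------------------------
Sprite8:
    leau    ,x                  ; move graphic address to regU
    jsr     scadd               ; get screen address in X

    ldy     #SPR_HGT+1          ; number of lines per sprite + 1 for first loop

    ldb     <dispx              ; x position.
    andb    #7                  ; number of shifts, 0 to 7
    lda     #Shift8End-Shift8Loc    ; negative offset that skips every shift
    aslb                        ; 2 bytes for each LSRA RORB pair
    sba                         ; back up 2 bytes per shift
    sta     Shift8Jump+1        ; patch the BNE offset

    bra     spr8in              ; enter the loop at the data load

Shift8Start:
    lsra
    rorb
    lsra
    rorb
    lsra
    rorb
    lsra
    rorb
    lsra
    rorb
    lsra
    rorb
    lsra
    rorb
Shift8End:

    eora    0,x         ; merge left byte with screen image
    eorb    1,x         ; merge right byte with screen image
    std     0,x         ; write to screen.

    leax    32,x        ; move to next screen line

spr8in:
    lda     ,u+         ; load one byte of sprite data into A
    clrb                ; B receives the bits shifted out of A
    dey                 ; decrement the # of lines counter

Shift8Jump:
    bne     Shift8Start     ; go again if not done, offset is patched above
Shift8Loc:

    rts
```

Like the 16 pixel version, this reads one extra byte of graphic data after the last line is drawn, which is the price of having a single branch do both the loop test and the jump into the shifts.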
