Here is a little test program I threw together to benchmark the 16 bit string compare code I'm preparing for a future ROM version. A little bonus here is a sort implementation you may not have seen before.
Back in the 90s I was working on a project I inherited from someone else. It involved nightly data dumps from a production system to a local server that made the data accessible to customers via modem, using custom software. The process was rather labor intensive. The operator had to run the data extraction, ftp the data to a PC, which was then imported by a custom VB application that required multiple starts, executing other program, then continuing, etc... It was impossible to automate the way it was.
Upon investigation, the original programmer did not know how to implement a sort, so he was using an external utility. (he also didn't seem to know how to take a shower... but that's another topic)
After a quick search on Yahoo, no VB sorts turned up (I think the search engine was undergoing maintenance), so I threw in a bubble sort and went home. Some 12 hours later, the sort was still going so I killed it, wrote a more optimal version, and on the first try it took about 18 minutes to sort the 80,000+ records, and it didn't require human intervention so it could be launched automatically.
Anyway, a more polished version of that sort is below. This has some optimizations that weren't in the original as it's aimed at the Microsoft BASIC interpreter.
The sort is, at the very least, a really improved version of the bubble sort. It functions a bit like a comb or shell sort, so I'm not sure what you'd call it. It's easy to implement, pretty fast, and it's fairly small. Taking a hint from the comb sort, dividing the gap by two may not be the most efficient, but it works pretty well for something I came up with off the top of my head in my 20s.
The data comes from a list of the 1000 most common surnames, and it is truncated here for space. The data includes a few other numbers with it that I didn't waste time removing. With all the additional data, only 894 names fit in memory, but that provides a really good workout for the interpreter and sort. Sorting takes over 10 minutes on a regular MC-10, but I'm hoping the new string comparison cuts that by at least 20%, which is about what you can expect from unrolling a loop once.
0 I=0:N=1000:C=0:D=0:E=0:NW=0
'initialize the index array, and read in the names. Skip the other data.
'scans through the data to determine the array sizes before dimensioning the arrays.
'The most heavily use variables are initialized on the first like to insure they are
' at the start of the variable table
'
10 I=0:J=0:G=0:N=0:Q=0
15 PRINT"SCANNING FOR END OF DATA"
20 READ K$,C,D,E:IF K$<>"" THEN N=N+1:GOTO 20
30 RESTORE
40 DIM A$(N),B(N)
45 PRINT"INITIALIZING ARRAYS"
50 FORI=1TON:READ A$(I),C,D,E:B(I)=I:NEXT
60 PRINT:PRINT N" MOST POPULAR SIR NAMES FROM MOST TO LEAST POPULAR"
70 GOSUB 2000
80 PRINT:PRINT "SORTING"N"NAMES":GOSUB 10000
90 PRINT:PRINT N" MOST POPULAR SIR NAME IN ALPHABETICAL ORDER"
95 GOSUB 2000
96 END
'print out the names based on the array index
2000 FOR I=1 TO N:PRINT A$(B(I))" ";:NEXT:RETURN
'optimized bubble(?) sort.
'Moves data as far as possible with each Middle loop
'Skips data that doesn't need sorted with each new pass on inner loop.
'P = what we divide the array size by to get the gap
'G = the gap
'using nested for loops avoids having to perform a line search that would
'happen with GOTO. The address is simply pulled off of the stack.
'Using STEP 0 keeps FOR NEXT from changing the variable and the endless loop
'exits once the exit condition is achieved
10000 P=1:FORG=2TO1STEP0:P=P*2:G=INT(N/P):NW=N-G:PRINTG;
10010 FORQ=1TO0STEP0:Q=0:FORI=1TONW
10020 IFA$(B(I))>A$(B(I+G))THENT=B(I):B(I)=B(I+G):B(I+G)=T:Q=I
10030 NEXT:IFQ>0THENNW=Q
10040 NEXT:NEXT:RETURN
'1000 MOST POPULAR SIR NAMES FROM MOST TO LEAST POPULAR
100 DATA SMITH,2501922,1.006,1,JOHNSON,2014470,.81,2,WILLIAMS,1738413,.699,3
101 DATA JONES,1544427,.621,4,BROWN,1544427,.621,5,DAVIS,1193760,.48,6
...
1000 DATA "",0,0,0 :REM END OF DATA MARKER