Sunday, July 22, 2018

Why is Microsoft BASIC so slow? (text related to my guest appearance on the CoCoTalk video podcast)

The following text is a slightly revised version of what I presented on the CoCoTalk video podcast.
Some of the text was simplified for a re-recording of the show and I may have added one new section.  The rest of the discussion was off script.



Why is Microsoft BASIC so slow?




A brief introduction to the inner workings of
Microsoft BASIC
on the CoCo and MC-10



Index

  1. What is an interpreter?
  2. The Tokenizer
  3. 8 bit vs 16 bit and the 6800 legacy
  4. Floating Point Math
  5. Variable Storage
  6. Runtime math formula evaluation
  7. Garbage Collection
  8. Running a program step by step

What is an interpreter?
From wikipedia:
“In computer science, an interpreter is a computer program that directly executes, i.e. performs, instructions written in a programming or scripting language, without requiring them previously to have been compiled into a machine language program. An interpreter generally uses one of the following strategies for program execution:
parse the source code and perform its behavior directly
...”
The Microsoft BASIC Interpreter...
  1. Makes no attempt to convert the program to machine code
  2. The code is still human readable or can be quickly translated back to human readable code
  3. Makes minimal changes to the source code to speed up interpretation
  4. All syntax checking is left for runtime execution


The Tokenizer
  1. Changes the line number at the beginning of a line to a 16 bit integer
  2. Converts BASIC keywords to tokens
  3. Terminates the line with a zero
  4. Inserts the line of code into the program linked list 
  5. Does not perform any syntax checking
  6. Does not convert other numbers from ASCII to computer readable form, therefore they must be converted every time they are needed, and conversion uses one of the slowest math functions, division


8 bit vs 16 bit and the 6800 legacy
  1. The CoCo and MC-10 interpreters are mostly based on Microsoft's 6800 version and have minimal optimizations that take advantage of the 6809's or 6803's additional 16 bit features, new registers, or hardware multiply instruction.
  2. The math library is largely unchanged from the 6800 version.  It is still oriented around the use of 8 bit registers, making it slower and larger than it needs to be.  It is so complex, it occupies ¼ of the CoCo Color BASIC and MC-10 ROMs
  3. Functions such as memory moves, screen scrolls, clearing screens, etc... all use 8 bit code even though 16 bit code would be significantly faster 

Floating Point Math
  1. All numeric variables are stored as floating point
  2. All math is performed using floating point
  3. Floating point math is performed similar to integer math, but it uses larger numbers and requires extra steps such as normalization, converting numbers to the same exponent, and dealing with the sign
  4. Many fast floating point math functions such as calculating a square root, did not exist yet
  5. All math is performed using software floating point registers reserved in memory.  This requires constantly coping numbers back and forth between the software registers and memory
  6. Floating point numbers are stored in variables using a packed form, but must be unpacked to load key floating point registers, then repacked to move them back to variables
  7. Some numbers do not have exact floating point equivalents making pre-conversion of constants impractical


Runtime Math Evaluation
  1. Math formulas must be parsed in order to determine proper order of operations.
  2. A list of operations is built and then is processed by the interpreter.
  3. This must be done every time you use a math function, even something as simple as A=A+1


Variable Storage
  1. Variables are added to system memory in the order they are first used, and then by variable type.  Every time a new variable is created, space must be allocated for it in the proper area of the variable list.  This can include copying a large block for RAM to make room
  2. The interpreter must search through the variable list every time a variable is used in order to use it
  3. The interpreter allocates space for the stack and variables at startup
  4. It must perform checks every time a variable is created, a string is modified, or the stack is used to insure that the program and variables are not overwritten


Garbage Collection
  1. As string variables change in size, additional space must be allocated or freed to keep from running out of variable storage space.  This can involve moving large blocks of memory
  2. Every time the system does this, it requires copying the blocks of memory where a string is stored and any strings after it up or down in memory as needed
  3. All memory copies in the CoCo and MC-10 ROMs use 8 rather than 16 bits 


Running a program step by step
  1. Every time the program looks for a new command, it starts by checking for the BREAK key, which is slow
  2. All characters from the program are read through a call to an inefficient subroutine
  3. Command tokens must be range checked before commands are executed
  4. The token must be used to look up the address of the command and then the command itself is called
  5. Commands must perform additional parsing and syntax checking, on completion, they return to the main loop
  6. Every GOTO and GOSUB requires searching through the linked list of lines
  7. Whenever a line such as an IF THEN skips the rest of a line, the interpreter must search for end of line byte by byte
  8. This sequence is repeated for every command in a line of code

No comments:

Post a Comment