A Toy C Compiler: Code Patterns and Code Generation

The toy C compiler has been built using lexing and parsing tools Flex and Bison. Code patterns for a simple (abstract) stack machine have been devised for each grammatical construction using a specification notation based on abstract recursive functions. Each pattern has been hooked manually as a semantic action into a Bison production. The compiler generates code to a file as soon as all the information needed about each language construct is available (i.e., after parsing and static checking). There is no need to construct an abstract syntax tree or to accumulate code in non-terminal attributes. This is because of the subset of the C language chosen.

The following is an example of the code pattern associated with C's expression pattern E1 && E2.

The code pattern specifies how to generate the code that, once executed, will evaluate in short-circuit the application of the logical operator && to the two subexpressions E1 and E2:

gen(E1 && E2) =
   gen(E1) ; Generate the code that will evaluate E1 leaving the result on top of the stack.
   POP ; Pop to accumulator register.
   JZ X ; Jump to label if false.
   gen(E2) ; Generate the code that will evaluate E2 leaving the result on top of the stack.
   POP ; Pop to accumulator register.
   X: PUSH ; Leave the result on top of the stack.
where X = NewLabel()

The abstract specification function gen() is recursive on the structure of the grammar.

We can almost automatically create the following Bison production for it. Non-terminals replace recursive calls and semantic actions replace the code between recursive calls.

expr   :   expr "&&"
{
  char *X = newlabel();
  emit("POP","");
  emit("JZ",X);
  $<label>$=X;
}
  expr
{
  emit("POP","");
  emit($<label>3, "PUSH");
} ;

The source code [c-compiler.zip] has been written in C and has been developed in a GNU/Linux platform. It also contains test example programs in C, thorough documentation in Spanish [memoria.ps.gz], and some GNU/Linux ix86 executables, including the stack machine's symbolic assembler, called ass [sic].

Type make parser after decompressing.

Type ./parser -h for options.

Run ./ass on your compiled file to execute it.

The ass program is extremely primitive. It worked on Linux. User input doesn't work on Solaris/SunOS. The VT-100 terminal emulator seems the culprit, and it's a dirty and daunting code.

Note: If you read the documentation, you'll see that the size of the parameter segment is "passed" to the target procedure in the TMP register. A cleaner solution is to pass it as an extra parameter pushed first on the stack so that it is the first value read by the target procedure, which can now set its PS pointer accordingly to access the remaining arguments.