CSCI 439 lab 4

CSCI 439: Lab 4 (Spring 2025)

Lab 4 can be obtained and submitted following our usual git processes (with the repo named "lab4" this time).

The repo contains the following files:

tokenizer.h and tokenizer.cpp: for the tokenization code you'll be writing
parsing.h and parsing.cpp: for the parsing (rewriting) code you'll be writing
lab4.cpp: just a main routine/driver to call your tokenizer and parser and display the results
a makefile to compile the .cpp's into .o's and create executable lab4x

Correction pushed to the repo code (10am Feb. 25th): tokenizer.h was missing the main token
Other corrections: the tokenizing code is missing the true/false literals

Vurbossity revisions:

One more tweak on top of the lab3 version: we'll add keywords "true" and "false" as the boolean literals.

Lab 4 overview:

For lab 4 you'll be implementing a hand-crafted parser for vurbossity, written in C++ and following the specifications below.

The main routine, in lab4.cpp, calls the tokenize routine from lab3 to fill an array of tokens from vurbossity source code provided through standard input, and then passes that array (and its size) to the parse routine.

Assuming you're starting with valid tokenizing code (either your own from lab3 or from the posted sample solution for lab3), your sole objective for lab4 is to implement the parse function (and any helper functions it requires).

The parse function is prototyped in parsing.h and is to be implemented in parsing.cpp:

// parse the token sequence and rewrite as C,
//    writing the results to standard output,
// with any error messages directed to standard error
void parse(token tokens[], int size);

As suggested by the comments above, the parser's task is actually to translate valid vurbossity code into valid C++ code, plus produce helpful error messages if the vurbossity code itself is invalid.

You are free to use either a recursive-descent or bottom-up approach to the actual parsing, though I tend to recommend the former in this case, as the design of vurbossity is well suited to simple and efficient recursive-descent parsing.

You do not have to perform context-sensitive checking for lab4. (Once you have the parsing/translation working, adding the necessary symbol table/procedure table checks would be very similar to what you carried out in lab2, so we won't repeat that exercise.)

Specs and requirements for the parsing/translation to C++

The objective for the parser is to read the vurbossity token sequence, parse it for structural validity, and write equivalent C++ code that is reasonably formatted for reading, e.g.

Original vurbossity:

gdef x integer COM some global?

pdef printStuff left real r right
begin
   write "I was given"
   write r
end

main
begin
   vdef y real
   set x 33
   set y left add x 1.2 right
   call printStuff left y right
end

Resulting C++:

#include <iostream>
#include <string>
using namespace std;

long x;

void printStuff(double r)
{
   cout << "I was given" << endl;
   cout << r << endl;
}

int main()
{
   double y;
   x = 33;
   y = (x + 1.2);
   printStuff(y);
}

Fortunately, the features of vurbossity match very closely to a simple subset of C++, and once your parser recognizes the structure of a vurbossity feature it is generally very mechanical to write a C++ equivalent, e.g.

Your recursive descent function for parsing main sees that the next statement begins with the 'set' token, so it calls the parse_set function and tells it what position we've reached in the sequence of tokens.
The parse_set function checks that the next token (after the set) is an identifier, gets the name from the token contents (e.g. y) and outputs the start of the assignment statement (e.g. "y =").
It then calls the parse_expression function to read/print the expression that must come next, passing the up-to-date position in the token sequence (since now we've "used" the set and identifier tokens).
Once parse_expression returns we can assume the expression has been printed out, e.g. maybe it printed something like "(x + (a * b))", and the parse_set just needs to print the final ";" before returning to the parse_main (perhaps returning the position we got to in the token sequence).

Type and literal translation

The following type translation rules are to be used:

integer in vurbossity is long in C++
real in vurbossity is double in C++
text in vurbossity is string in C++
boolean in vurbossity is bool in C++
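
The four rules above could be captured in one small helper. The function name translate_type is an assumption made for illustration, as is the choice to return an empty string for an unrecognized type name.

```cpp
#include <string>

// Map a vurbossity type keyword to its C++ equivalent.
// Returns an empty string for an unrecognized name so the caller can report an error.
std::string translate_type(const std::string& vtype) {
    if (vtype == "integer") return "long";
    if (vtype == "real")    return "double";
    if (vtype == "text")    return "string";
    if (vtype == "boolean") return "bool";
    return "";
}
```

A declaration handler could then print translate_type of the type token, followed by the variable name and a semicolon.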

The vurbossity literals should exactly match those in C++ with the exception that C++ has size limits on longs and doubles. For the moment we'll simply assume the vurbossity code only uses literals within the acceptable C++ ranges (i.e. won't test with integer or real literals that are too big).

Stripping of comments

It is assumed comments are being stripped by the tokenizer and thus never reach the parser and do not need to appear in the C++ output.

Formatting of the C++ output

Because we want the output C++ to be human-friendly, we want a good layout to the output:

Each full statement (set, write, read, call, etc) is on its own line.
Statements are indented appropriately to their level of nesting in begin/end blocks. (This can be tracked by incrementing the current indentation level on entry to a block and decrementing on exit from the block.)
Opening and closing brackets for blocks should be on their own lines (at the appropriate level of indentation).
There should be at least one blank line separating function definitions from one another and from the main.

C++ preamble

To support the input/output and use of text literals, your parser can begin by always printing the first three expected lines of C++:

#include <iostream>
#include <string>
using namespace std;

Translation of the if loop

The 'if' portion of the if loop can be directly replaced by a while loop to give the desired functionality.

Note the 'else' portion in vurbossity is always executed once following the completion of the loop portion, but does need to be in a block of its own to ensure the scoping works out the same in the translation as the original, i.e.

while (condition from the if)
{
   ... the statements from the if body ...
}
{
   ... the statements from the else body ...
}

Function return values

Since vurbossity has no return types/statements the return type in the C++ functions will always be "void".

Input/output

Assume "cin >> SomeVarName;" and "cout << ExpressionOutput << endl;" are the desired I/O equivalents of read and write.

Only valid C++ statements should be written to stdout

If your program is generating any error messages, debugging messages, or informational messages they should all be going to stderr (cerr) not stdout (cout).

This allows the user to separate any valid generated code from all other output if they so desire.

Recommended approach

The recursive descent approach works through the token list, looking at the next token in the sequence to predict what kind of statement/feature we're about to parse and calling a function that specializes in that kind of statement/feature.

Each function knows what specific tokens and what kind of features to expect in what order, calls other parsing functions to process more complex features, and produces the appropriate output (C++ code in our case) for the specific tokens it handles itself.

I would suggest having one parsing function for each core type of statement (global variable declaration, procedure declaration, local variable declaration, set statement, output statement, input statement, if loop, procedure call, main routine declaration) and one to handle expressions.

Each parsing function would need to be passed the tokens array and its size, plus the current position we've reached within the tokens array (so the function knows where its tokens begin). The parsing function would carry out its portion of the translation and either update or return the new position (so the caller knows that the function 'used up' the tokens from positions k through p for instance).

This is precisely the approach discussed in the 'set statement' example near the start of the specifications section above.