CSCI 439: Lab 4 (Fall 2022): building parse tree representation

The lab 4 repository contains a working (hopefully not buggy) tokenizer and recursive descent parser for the same grammar as was used in lab 3.

The scanner/parser executable, named lab4x, can be constructed by running the makefile found in the repo. The scanner/parser can then be used to check the structural validity of a source code file by running

As the parser runs it displays which language component it is currently attempting to parse, and whether it accepts the component or discovers an error. A sample source code file (with valid code) is included in the repository.

The tokenizing code is provided in tokens.h and tokens.cpp, the parsing code in parser.h and parser.cpp, and a main routine to coordinate them is in lab4.cpp.

Objectives:

The goal of the lab is to have the recursive descent parser construct an actual tree representation of the program as it runs, then print the constructed tree at the end of the program.

A "node" struct has been defined in parser.h, and each of the parsing function takes a (by reference) pointer to such a node as a parameter. Feel free to alter the contents of the node struct and/or replace it with a class.

Each of the recursive descent parsing functions should, if they parse successfully, build the subtree they represent and (through the reference node parameter) pass their constructed subtree along to the "parent" routine that called them.

Your task is to implement this construction, as well as a print routine that shows the tree content once constructed.

Your program does not need to perform any context sensitive checking (it does not need to check types match, variables are declared before use, etc) but it does need to ensure the tree nodes contain sufficient information to enable that, and to enable the eventual generation of code in some target language. Thus VAR nodes would need to include the variable name, INUM and RNUM nodes would need to include the actual literal values in use, and a data type field would be needed for asgn nodes, and recommended for VAR, INUM, and RNUM nodes.

Please try to make the tree-printing output as clear/easy to follow as possible!

Below is a run on the sample program included in the repo.
In the sample solution I added node ids and other information to the node struct, your output does not have to look like this

Successfully read file sample, 147 total chars

Tokenizing succeeded: token list is:
   "int"(INT) "foo"(VAR) ";"(TERM) "int"(INT) "x"(VAR) 
   ";"(TERM) "int"(INT) "i"(VAR) ";"(TERM) "real"(REAL) 
   "r"(VAR) ";"(TERM) "real"(REAL) "rubberducky"(VAR) ";"(TERM) 
   "i"(VAR) "="(EQ) "10"(INUM) ";"(TERM) "r"(VAR) 
   "="(EQ) "3.456"(RNUM) ";"(TERM) "print"(PRINT) "r"(VAR) 
   ";"(TERM) "rubberducky"(VAR) "="(EQ) "r"(VAR) ";"(TERM) 
   "print"(PRINT) "rubberducky"(VAR) ";"(TERM) "foo"(VAR) "="(EQ) 
   "x"(VAR) "="(EQ) "0"(INUM) ";"(TERM) "print"(PRINT) 
   "i"(VAR) ";"(TERM) 

PARSE: parsing prog from position 0
PARSE: parsing deflist from position 0
PARSE: parsing def from position 0
ACCEPT: def of var foo
PARSE: parsing deflist from position 3
PARSE: parsing def from position 3
ACCEPT: def of var x
PARSE: parsing deflist from position 6
PARSE: parsing def from position 6
ACCEPT: def of var i
PARSE: parsing deflist from position 9
PARSE: parsing def from position 9
ACCEPT: def of var r
PARSE: parsing deflist from position 12
PARSE: parsing def from position 12
ACCEPT: def of var rubberducky
PARSE: parsing deflist from position 15
ACCEPT: deflist -> def deflist
ACCEPT: deflist -> def deflist
ACCEPT: deflist -> def deflist
ACCEPT: deflist -> def deflist
ACCEPT: deflist -> def deflist
PARSE: parsing stmtlist from position 15
PARSE: parsing stmt from position 15
PARSE: parsing asgn from position 17
ACCEPT: assigning literal 10
ACCEPT: VAR EQ asgn for var i
PARSE: parsing stmtlist from position 19
PARSE: parsing stmt from position 19
PARSE: parsing asgn from position 21
ACCEPT: assigning literal 3.456
ACCEPT: VAR EQ asgn for var r
PARSE: parsing stmtlist from position 23
PARSE: parsing stmt from position 23
ACCEPT: print for var r
PARSE: parsing stmtlist from position 26
PARSE: parsing stmt from position 26
PARSE: parsing asgn from position 28
ACCEPT: assigning var r
ACCEPT: VAR EQ asgn for var rubberducky
PARSE: parsing stmtlist from position 30
PARSE: parsing stmt from position 30
ACCEPT: print for var rubberducky
PARSE: parsing stmtlist from position 33
PARSE: parsing stmt from position 33
PARSE: parsing asgn from position 35
PARSE: parsing asgn from position 37
ACCEPT: assigning literal 0
ACCEPT: VAR EQ asgn for var x
ACCEPT: VAR EQ asgn for var foo
PARSE: parsing stmtlist from position 39
PARSE: parsing stmt from position 39
ACCEPT: print for var i
PARSE: parsing stmtlist from position 42
ACCEPT: stmtlist -> stmt stmtlist
ACCEPT: stmtlist -> stmt stmtlist
ACCEPT: stmtlist -> stmt stmtlist
ACCEPT: stmtlist -> stmt stmtlist
ACCEPT: stmtlist -> stmt stmtlist
ACCEPT: stmtlist -> stmt stmtlist
ACCEPT: stmtlist -> stmt stmtlist
ACCEPT: prog -> deflist stmtlist

Parsing succeeded, parse tree as follows:
NODE 72, type: prog, has 2 children: (24) (71)
   NODE 24, type: deflist, has 2 children: (0) (23)
      NODE 0, type: def, has 3 children: (1) (2) (3)
         NODE 1, type: INT, has 0 children:
         NODE 2, type: VAR(foo), has 0 children:
         NODE 3, type: TERM, has 0 children:
      NODE 23, type: deflist, has 2 children: (4) (22)
         NODE 4, type: def, has 3 children: (5) (6) (7)
            NODE 5, type: INT, has 0 children:
            NODE 6, type: VAR(x), has 0 children:
            NODE 7, type: TERM, has 0 children:
         NODE 22, type: deflist, has 2 children: (8) (21)
            NODE 8, type: def, has 3 children: (9) (10) (11)
               NODE 9, type: INT, has 0 children:
               NODE 10, type: VAR(i), has 0 children:
               NODE 11, type: TERM, has 0 children:
            NODE 21, type: deflist, has 2 children: (12) (20)
               NODE 12, type: def, has 3 children: (13) (14) (15)
                  NODE 13, type: REAL, has 0 children:
                  NODE 14, type: VAR(r), has 0 children:
                  NODE 15, type: TERM, has 0 children:
               NODE 20, type: deflist, has 2 children: (16) (nil)
                  NODE 16, type: def, has 3 children: (17) (18) (19)
                     NODE 17, type: REAL, has 0 children:
                     NODE 18, type: VAR(rubberducky), has 0 children:
                     NODE 19, type: TERM, has 0 children:
   NODE 71, type: stmtlist, has 2 children: (27) (70)
      NODE 27, type: stmt, has 4 children: (28) (29) (25) (30)
         NODE 28, type: VAR(i), has 0 children:
         NODE 29, type: EQ, has 0 children:
         NODE 25, type: asgn, has 1 children: (26)
            NODE 26, type: INUM(10), has 0 children:
         NODE 30, type: TERM, has 0 children:
      NODE 70, type: stmtlist, has 2 children: (33) (69)
         NODE 33, type: stmt, has 4 children: (34) (35) (31) (36)
            NODE 34, type: VAR(r), has 0 children:
            NODE 35, type: EQ, has 0 children:
            NODE 31, type: asgn, has 1 children: (32)
               NODE 32, type: RNUM(3.456), has 0 children:
            NODE 36, type: TERM, has 0 children:
         NODE 69, type: stmtlist, has 2 children: (37) (68)
            NODE 37, type: stmt, has 3 children: (38) (39) (40)
               NODE 38, type: PRINT, has 0 children:
               NODE 39, type: VAR(r), has 0 children:
               NODE 40, type: TERM, has 0 children:
            NODE 68, type: stmtlist, has 2 children: (43) (67)
               NODE 43, type: stmt, has 4 children: (44) (45) (41) (46)
                  NODE 44, type: VAR(rubberducky), has 0 children:
                  NODE 45, type: EQ, has 0 children:
                  NODE 41, type: asgn, has 1 children: (42)
                     NODE 42, type: VAR(r), has 0 children:
                  NODE 46, type: TERM, has 0 children:
               NODE 67, type: stmtlist, has 2 children: (47) (66)
                  NODE 47, type: stmt, has 3 children: (48) (49) (50)
                     NODE 48, type: PRINT, has 0 children:
                     NODE 49, type: VAR(rubberducky), has 0 children:
                     NODE 50, type: TERM, has 0 children:
                  NODE 66, type: stmtlist, has 2 children: (57) (65)
                     NODE 57, type: stmt, has 4 children: (58) (59) (53) (60)
                        NODE 58, type: VAR(foo), has 0 children:
                        NODE 59, type: EQ, has 0 children:
                        NODE 53, type: asgn, has 4 children: (54) (55) (51) (56)
                           NODE 54, type: VAR(x), has 0 children:
                           NODE 55, type: EQ, has 0 children:
                           NODE 51, type: asgn, has 1 children: (52)
                              NODE 52, type: INUM(0), has 0 children:
                           NODE 56, type: TERM, has 0 children:
                        NODE 60, type: TERM, has 0 children:
                     NODE 65, type: stmtlist, has 2 children: (61) (nil)
                        NODE 61, type: stmt, has 3 children: (62) (63) (64)
                           NODE 62, type: PRINT, has 0 children:
                           NODE 63, type: VAR(i), has 0 children:
                           NODE 64, type: TERM, has 0 children: