Question | Mark |
1. Tokenizing IttyBitty in Ruby [10 marks] | |
2. Auto-generating scripts (cmake) [10 marks] | |
3. Parse trees and symbol tables [10 marks] | |
4. Grammars and code translation [10 marks] | |
5. C macros and C++ templates [10 marks] | |
Exam Total (Best 4) [40 marks] |
Question 1: Tokenizing IttyBitty in Ruby [10]
Given the IttyBitty specifications and Ruby tokenizer studied this term (copies are appended to the end of the exam), suppose we wanted to replace the 'integer' token type with a set of token types representing positive integers of different bases:
(i) Show the modifications necessaray for the IttyBitty grammar rules to support this change (i.e. what happens to the rule Value --> integer, and what new rules need to be added?).
(ii) Show an exact set of changes/additions that would be needed in order for the Ruby tokenizer to support the revised language specifications. (Note: for parts (i) and (ii) you only need to show changes/additions, you do not need to copy/rewrite the entire grammar or tokenizer.)
Sample solution: (i) We need to replace the rule for integer with a set of rules for the different types of integer, possibly making Integer a new non-terminal, e.g. the old rule was Value --> integer the new rules could be Value --> Integer Integer --> binary Integer --> decimal Integer --> octal Integer --> hexadecimal Similarly the integer regex in the original grammar would be removed, and the regexes for the four new integer categories would be inserted instead, i.e. binary b[0-1]+ octal o[0-7]+ decimal d[0-9]+ hexadecimal x[0-9a-f]+ Note that the variable regex would be impacted by this, since now there is ambiguity with tokens whose pattern is x[a-f]+ ... each of these could be a valid hex pattern or a valid identifier under the current rules. One solution would be to specify that variable identifiers could not begin with the letter x, e.g. variable [a-wyzA-WYZ]+ (ii) The changes to the ruby tokenizer would mimic those listed above, the when case for integers would need to be replaced with four when cases, one for each of the regexes identified in (i), and either (a) the when case for variables would need to have its regex adjusted to remove ambiguity, or (b) we would need to define which interpretation was correct for the x[a-f]+ case, and ensure that interpretation's when case came first in the code block. |
Question 2: Auto-generating scripts (cmake) [10]
The cmake program generates a custom makefile for a project
by
(i) examining source code for dependencies (e.g. detecting which files #include which others),
(ii)
obtaining user responses to a set of project configuration questions provided by the project
developer,
(iii) applying the data
from parts (i) and (ii) to a template, also provided by the project developer.
1. Does what the project developer does constitute metaprogramming (the developer providing the template and the project configuration questions)? Why or why not? Clearly justify your answer.
2. Does writing the cmake software itself constitute metaprogramming? Why or why not? Clearly justify your answer.
Sample solution: For both of these cases the important part is the justification you provide, not your actual choice of yes/no. |
Question 3: Parse trees and symbol tables [10]
Many high level languages support both integer and floating point numeric data, using the single '+' operator to represent both forms of addition.
However, most assembly languages use different instructions for integer addition than for floating point addition, e.g. ADD vs ADDFLT.
During the process of compilation, the compiler must then identify which is the correct type of instruction to use in the resulting computation.
Describe, in terms of parse trees and symbol tables, how a compiler is able to do this. Use the expression 3.2 * y + 4 - x as an example.
Sample solution: To determine which type of operation should be used in any given computation, the compiler must be able to identify the data types of the operands. If the operands were simple numeric values it would be relatively easy for the compiler to simply consider the token type, and use a fixed set of rules for operation selection, e.g. float op float: use float int op int: use int float op int: use ? float int op float: use ? int However, if either of the operands is a variable, then the data type of that variable must be determined, which is where the symbol table comes in. In a statically typed language, the type of the variable would generally be determined in its declaration statement, and recorded in the symbol table to be looked up later, e.g. if we saw an instruction like int x,y; we would add x and y to the table along with their now known type. Furthermore, if the operands are expressions, then those expressions must first be evaluated to determine their resulting types before identifying the appropriate data type for the current op. This is where the parse tree comes in, e.g. given the expression 3.2*y+4-x, apply the grammar rules to produce a parse tree identifying the "meaning" of the expression suppose the resulting parse tree is as shown on the left below, then the tree can be evaluated bottom-up to identify the types of the resulting operands for each operator - (iv) once the + type is known from (iii), and the type of x is known from the / \ symbol table, we can determine the type of - op to use + x (iii) once the type of * op is know, and knowing the type of 4, / \ we can determine the + type * 4 (ii) from the types determined in (i), we know the float*int rule should / \ be used to determine the right * op to use 3.2 y (i) 3.2 is a float, look up y in the symbol table, say it's an int |
Question 4: Grammars and code translation [10]
Suppose we are working on a translator to take source code written in one language and translate it to another language.
For this question, we consider specifically the problem of translating sequences of if-then-else cases from one language to another.
The table below shows the grammar rules for the two languages, with non-terminals shown in uppercase beginning with an underscore.
Given a parse tree for an if-else sequence in language I, outline (pseudocode + discussions) what the translator would need to do to produce the logically-comparable parse tree for language II.
LANGUAGE I _IFELSESEQ --> _IF _IFELSESEQ --> _IF _ELSESEQ _ELSESEQ --> _ELSE _ELSESEQ --> _ELSIF _ELSESEQ --> _ELSIF _ELSESEQ _IF --> if _CONDITION _BLOCK _ELSIF --> elsif _CONDITION _BLOCK _ELSE --> else _BLOCK | LANGUAGE II _IFELSESEQ --> (cond _CONDLIST ) _CONDLIST --> _CONDCASE _CONDLIST --> _CONDCASE _CONDLIST _CONDCASE --> ( _CONDITION _BLOCK ) |
Sample solution: If we look at the trees for a typical structure, we can see the translation patterns we need to apply, e.g. (shortening _CONDITION and _BLOCK below) The first version for a statement with an if, elsif, elsif, else: _IFELSESEQ / \ _IF _ELSESEQ / | \ / \ if _CONDT _BLK _ELSIF _ELSESEQ / | \ / \ elsif _CONDT _BLK _ELSIF _ELSESEQ / | \ | elsif _CONDT _BLK _ELSE / \ else BLOCK The corresponding cond version: _IFELSESEQ / | \ (cond _CONDLIST ) / \ _CONDCASE _CONDLIST / | / \ (_CONDT _BLK) _CONDCASE _CONDLIST / | / \ (_CONDT _BLK) _CONDCASE _CONDLIST / | | (_CONDT _BLK) _CONDCASE / | (_CONDT _BLK) To transform from the first to the second: we introduce a (cond subtree on the left at the top level, a ) subtree on the right at the top level, then treat that first _CONDT _BLOCK as an _ELSESEQ for each _ELSESEQ, we replace the _ELSIF with a _CONDLIST - the _CONDT and _BLOCK subtrees are identifcal, only the surrounding syntax changes, from the elsif to the (cond .... ) for the final else case we simply replace with a _CONDCASE, substituting (t for the else |
Question 5: C macros and C++ templates [10]
Suppose in our C source code we want to be able to use round brackets ( ) for block delimiters in our if/else if/else statements, instead of the traditional curly brackets { }
We decide to try the following set of macros to take our customized C variant and transform it into valid C code just prior to compilation:
#define if(...) if(__VA_ARGS__) Block #define else else Block #define Block(...) { __VA_ARGS__ }(i) Would our macros correctly translate the following into valid C?
if ( x < y ) ( a = 1 ; b = 2 ; ) else ( b = 1 ; a = 2 ; )Clearly show the sequence of macro expansions to justify your answer.
(ii) Would the macros work for single-statement C blocks, e.g.?
if ( x < y ) a = 0 ; else if ( x > y ) a = 1 ; else a = 2 ;Again, show the macro expansions needed to justify your answer.
(iii) Using C++ templates, write a type-safe variadic min function using the < operator for comparisons, e.g. the following would each be valid calls (assuming the parameters passed were compatible with < and each other).
min(x,y,z) min(x) min(a,b,c,d,e)
Sample solution: (i) Yes, the expansion would produce the following: if( x < y ) { a = 1 ; b = 2 ; } else { b = 1 ; a = 2 ; } Expansion process: if (x < y) ( a = 1 ; b = 2 ; ) after the first rule becomes if (x < y) Block ( a = 1 ; b = 2 ; ) after the third rule becomes if (x < y) { a = 1 ; b = 2 ; } Similarly, else ( b = 1 ; a = 2 ; ) after the second rule becomes else Block ( b = 1 ; a = 2 ; ) after the third rule becomes else { b = 1 ; a = 2; } (ii) No, the resulting expansion would produce the following (leftover Blocks): if( x < y ) Block a = 0 ; else Block if( x > y ) Block a = 1 ; else Block a = 2 ; Expansion process: if (x < y ) a = 0; after the first rule becomes if (x < y) Block a = 0; but the third rule doesn't apply now, because there are no ( ) around the a = 0, so the Block sticks in our expanded code. (Similar problems arise in the expansions of the else's) (iii) template <typename T> T min(T x) { return x; } template <typename T, typename... Args> T min(T x, Args... args) { T tmp = min(args...); return ((x<tmp)?x:tmp); } |
# The IttyBitty Language # ====================== # # A program consists of one or more statements, # with the permissible token types and list of statement # grammar rules described below # # Note that tokens must all be whitespace delimited, # token names begin with a lowercase letter, # non-terminal language component names begin with an uppercase letter # # Comments in IttyBitty begin with a # and go to eoln # and are removed by the tokenizer # Tokens: # ======= # variable [a-zA-Z]+ # integer [-]?[0-9]+ # quote " # text between quotes, this is any block of non-whitespace # characters that does not contain a quote # mathop +, -, *, /, % # assignop = # compop ==, <, !=, <=, >=, > # keyword if, while, print # bracket ( ) # delimiter { } # unknown any otherwise unrecognized string of non-whitespace characters # Grammar rules # ============= # Program --> StatementList # StatementList --> Statement # StatementList --> Statement StatementList # Statement --> VarAssign # Statement --> Loop # Statement --> Conditional # Statement --> Print # VarAssign --> variable assignop Expression # Print --> print Value # Print --> print Text # Text --> quote WordList quote # WordList --> text # WordList --> text WordList # Conditional --> if Condition Block # Loop --> while Condition Block # Condition --> bracket-( Value compop Value bracket-) # Value --> variable # Value --> integer # Block --> delimiter-{ StatementList delimiter-} # Expression --> Value # Expression --> Value mathop Expression # Sample program # -------------- # print " Welcome! " # mid = 3 # x = -4 # y = 10 # while ( x < y ) { # if ( x == mid ) { # print " mid point reached " # } # print " x is " # print x # x = x + 1 # } |
# a tokenizer for the IttyBitty language # # the tokenize method takes one or more lines of source code, # and produces an array of tokens, each of the form [ type, value ] # where the value is the actual text string for the token # and the token type is one of the following strings: # quote, text, keyword, compop, assignop, bracket, delimiter, integer, mathop, variable, unknown # the tokenizer strips out any comments in a line before tokenizing the line def tokenize(source) # first split/join the lines of text into a single array of words rawTokens = Array.new source.each_line do | line | # strip out any comments in the line before adding its tokens to the list cleanline = "" if line.include?("#") then pos = line.index("#") if pos > 0 then cleanline = line[0..(pos-1)] end else cleanline = line end rawTokens += cleanline.split end # now translate each token into a token-type, token-value pair # and put into the tokenlist tokenlist = Array.new numTokens = 0 # while matching tokens, keep track of whether or not we are # currently inside a text literal inText = false rawTokens.each do | token | # see if we are entering/leaving a text literal if token == "\"" then inText = !inText # flip state in/out of text mode ttype = 'quote' elsif inText then ttype = 'text' # everything inside " " is text else # match each token against the regular expression for the # corresponding token type case token when "if", "while", "print" ttype = 'keyword' when "==", "<", "!=", ">", ">=", "<=" ttype = 'compop' when "=" ttype = 'assignop' when "(", ")" ttype = 'bracket' when "{", "}" ttype = 'delimiter' when /^[-]?[0-9]+$/ ttype = 'integer' when "-","+", "*", "/", "%" ttype = 'mathop' when /^[a-zA-Z]+$/ ttype = 'variable' else puts "Error: invalid token encountered: #{token}" ttype = 'unknown' end end # the token value is simply the text for that token tokenpair = [ ttype, token ] tokenlist[numTokens] = tokenpair numTokens += 1 end # return the final list of token type/value pairs return tokenlist end |
# parser for the IttyBitty language # - produces an array-based representation of the program structure, # annotated with component identification # # A summary of what the parser produces for each component type # is provided below. # # Notes: # StatementList returns a non-empty array of statements, which is # in Program and Block, and referred to below as a StatementsArray # Integer values are stored as their text representation # # Program --> ['Program',StatementsArray] # # Statement --> ['Assign', varname, opname, Expression ] # --> ['Print', Value ] # --> ['Print', ['Text', ArrayOfWords] ] # --> ['If', ['Cond', Value, opname, Value ], # ['Block', StatementsArray]] # --> ['While', ['Cond', Value, opname, Value ], # ['Block', StatementsArray]] # # Value --> ['Variable', varname] # --> ['Integer', value] # # Expression --> Value # --> [ 'Expression', Value opname Expression ] # parsing debug flag (true if parser debugging turned on, false otherwise) PRSDEBUG = false # $ymtable: parsing symbol table (global) # ======== # hash table of variable names and current initialization state, # true if initialized, valse otherwise # the hash table is cleared whenever the parser is called on a # Program component $ymtable = {} # Track the current active scope id, $SID $SID = -1 # (parse tokenlist) builds a parse tree from the token list for a program # ================= # (parse tokenlist component position scopes) - builds a parse (sub)tree for the # specified component, assuming it begins at the specified positon in the token list # scopes is a list of the current active scopes, newest scope at the end # scope id's are unique, based on the position of their first token in the tokenlist, # which is 0 for global scope, the position of the { token for blocks # # a parser for the IttyBitty language # checks if a source code is syntactically correct for the language, # and builds a parse tree # expects three parameters: # the overall token list for the source code # the type of language component we are checking for next in the token list # (defaults to an entire valid program) # the token list position where parsing for the component should begin # (defaults to the start of the token list) # returns two values: # if parsing succeeds it returns the parse tree # and the position of the token immediately after the component # otherwise # it returns false and the position originally passed to it def parse(tokenlist, component="Program", position=0, scopes=[]) if PRSDEBUG then puts "\n---parsing #{component} from position #{position} (#{tokenlist[position]}) ---" end # make sure we're not off the end of the token list if tokenlist.length <= position then return false,tokenlist.length end # default parse tree is empty array parsetree = Array.new # based on the component type, try to parse the tokenlist appropriately case component when "Program" # initialize $ymtable.clear # make sure we start with a clean table $SID = 0 # set the current scope id to 0 (global) scopes << $SID # add the global scope to the list of active scopes # Program --> StatementList parsetree,pos = parse(tokenlist, "StatementList", position, scopes) if parsetree == false then return false, position elsif tokenlist.length > pos then puts "---parsing successful(?) but unused tokens from position #{pos} ---" return false, position end return ['Program',parsetree], pos when "StatementList" parsetree,pos = parse(tokenlist, "Statement", position, scopes) # StatementList --> Statement if parsetree == false then return false,position # StatementList --> Statement StatementList else backuppos = pos backuptree = parsetree parsetree,pos = parse(tokenlist, "StatementList", pos, scopes) if parsetree == false then return [backuptree],backuppos else return [backuptree] + parsetree,pos end end when "Statement" # Statement --> Print parsetree,pos = parse(tokenlist, "Print",position, scopes) if parsetree != false then return parsetree,pos end # Statement --> VarAssign parsetree,pos = parse(tokenlist, "VarAssign",position, scopes) if parsetree != false then return parsetree,pos end # Statement --> Loop parsetree,pos = parse(tokenlist, "Loop",position, scopes) if parsetree != false then return parsetree,pos end # Statement --> Conditional parsetree,pos = parse(tokenlist, "Conditional",position, scopes) if parsetree != false then return parsetree,pos end return false,position when "VarAssign" # VarAssign --> variable = Expression if tokenlist.length < position+3 then return false,position end var = tokenlist[position] op = tokenlist[position+1] if var[0] == 'variable' and op[0] == 'assignop' then parsetree,pos = parse(tokenlist, "Expression",position+2, scopes) if parsetree != false then # if the variable is not in the symbol table, # add it with value [ $SID ] (the active scope id) if !$ymtable.has_key?(var[1]) then $ymtable[var[1]] = [ $SID ] varscopes = $ymtable[var[1]] else # otherwise, get the variable's entry from the symbol table # (its list of scopes in which it has been initialized) # if $SID is not in that list, add it and update # variable's entry in the scopes list varscopes = $ymtable[var[1]] if !varscopes.include?($SID) then varscopes << $SID $ymtable[var[1]] = varscopes end end return ['Assign', var[1], op[1], parsetree],pos end end return false,position when "Print" if tokenlist.length < position+2 then return false,position end prt = tokenlist[position] if prt[0] != 'keyword' or prt[1] != 'print' then return false,position end parsetree,pos = parse(tokenlist, "Value", position+1, scopes) if parsetree != false then return ['Print', parsetree],pos end # Print --> print text parsetree,pos = parse(tokenlist, "Text", position+1, scopes) if parsetree != false then return ['Print', parsetree],pos end return false,position when "Value" if tokenlist.length < position+1 then return false,position end val = tokenlist[position] # Value --> variable if val[0] == 'variable' then # if the variable is not yet in the symbol table add it now as uninitialized if !$ymtable.has_key?(val[1]) then $ymtable[val[1]] = [] end # get the variable's list of scopes from the symbol table, # and compare it to the list of active scopes # if there is no overlap between the two then the # variable has not been initialized, # so generate a warning message varscopes = $ymtable[val[1]] overlap = varscopes & scopes if overlap.length < 1 then puts "WARNING: use of uninitialized variable #{val[1]}" end return ['Variable', val[1]],position+1 # Value --> integer elsif val[0] == 'integer' then return ['Integer',val[1]],position+1 end return false,position when "Conditional" if tokenlist.length <= position then return false,position end iftok = tokenlist[position] # Conditional --> if Condition Block if iftok[0] == 'keyword' and iftok[1] == 'if' then condtree,newpos = parse(tokenlist, "Condition",position+1, scopes) if condtree != false then blocktree,lastpos = parse(tokenlist, "Block",newpos, scopes) if blocktree != false then return ['If',condtree,blocktree],lastpos end end end return false,position when "Loop" if tokenlist.length <= position then return false,position end whiletok = tokenlist[position] # Loop --> while Condition Block if whiletok[0] == 'keyword' and whiletok[1] == 'while' then condtree,newpos = parse(tokenlist, "Condition",position+1, scopes) if condtree != false then blocktree,lastpos = parse(tokenlist, "Block",newpos, scopes) if blocktree != false then return ['While',condtree,blocktree],lastpos end end end return false,position when "Condition" # Condition --> ( Value comparisonop Value ) if tokenlist.length < position+5 then return false,position end br1 = tokenlist[position] if br1[0] == 'bracket' and br1[1] == "(" then val1tree,pos = parse(tokenlist, "Value",position+1, scopes) if val1tree != false then op = tokenlist[position+2] if op[0] == 'compop' then val2tree,lastpos = parse(tokenlist, "Value",position+3, scopes) if val2tree != false then br2 = tokenlist[position+4] if br2[0] == 'bracket' and br2[1] == ')' then return ['Cond',val1tree,op[1],val2tree],position+5 end end end end end return false,position when "Block" # Block --> { StatementList } if tokenlist.length < position+1 then return false,position end br1 = tokenlist[position] if br1[0] == 'delimiter' and br1[1] == '{' then oldSID = $SID # remember the current scope id $SID = position # capture the new scope id scopes << $SID # add the id for the current scope to the active list parsetree,pos = parse(tokenlist, "StatementList",position+1, scopes) scopes.pop # remove the scope id, back to the previous scope list $SID = oldSID # restore the correct active scope id if parsetree != false then if tokenlist.length < pos+1 then return false,position end br2 = tokenlist[pos] if br2[0] == 'delimiter' and br2[1] == '}' then return ['Block',parsetree],pos+1 end end end return false,position when "Expression" # Expression --> Value parsetree,pos = parse(tokenlist, "Value",position, scopes) if parsetree != false then if tokenlist.length <= pos return parsetree,pos end backuppos = pos # Expression --> Value mathop Expression op = tokenlist[pos] if op[0] == 'mathop' then exprtree,pos = parse(tokenlist, "Expression",pos+1, scopes) if exprtree != false then return ['Expression', parsetree, op[1], exprtree],pos end end return parsetree,backuppos end return false,position when "Text" # Text --> " WordList " if tokenlist.length <= position then return false,position end q1 = tokenlist[position] if q1[0] == 'quote' then parsetree,pos = parse(tokenlist, "WordList",position+1, scopes) if parsetree != false then if tokenlist.length <= pos then return false,position end q2 = tokenlist[pos] if q2[0] == 'quote' then return ['Text', parsetree],pos+1 end end end return false,position when "WordList" # WordList --> text if tokenlist.length <= position then return false,position end word = tokenlist[position] if word[0] == 'text' then # WordList --> text WordList wordstree,pos = parse(tokenlist, "WordList",position+1, scopes) if wordstree != false then return [word[1]]+wordstree,pos end return [word[1]],position+1 end return false,position else # this else is for the overall component case puts "Error, invalid component type specified: #{component}" return false,position end # return the results of parsing as the suggested component type # should not be reachable? puts "---???Reached default end of parse???---" return parsetree,position end |