Sample solution: Identifiers: [A-Z][a-zA-Z0-9]* Integers: [-]?[0-9]+ Reals: [.][0-9]+[x][+-][0-9]+ |
Sample solution: keywords and identifiers can't clash since keywords are all lowercase whereas identifiers must start with an uppercase, so no issues there reals (and their embedded x,+,-) can't conflict with anything else because of the leading . we do need to check integers before the - operator, however, since we'd want -12 to be treated as the integer -12 not the - operator followed by the integer 12 |
Token types: IDENTIFIER: [a-z]+ INTEGER: [0-9]+ ADD: [+] MULT: [*] Grammar rules: expression --> multexpr ADD expression expression --> multexpr multexpr --> value multexpr --> multexpr MULT multexpr value --> IDENTIFIER value --> INTEGER
Sample solution: The ambiguity stems from the multexpr --> multexpr MULT multexpr. If we have a token sequence like x * y * z either of the two * operations could be parsed first, producing either of the following parse trees: multexpr / \ / \ value multexpr | / \ IDENTIFIER multexpr multexpr equating to (x) | | (x * (y * z)) value value | | IDENTIFIER IDENTIFIER (y) (z) multexpr / \ / \ multexpr multexpr equating to / \ \ ((x * y) * z) multexpr multexpr value | | \ value value IDENTIFIER | | (z) IDENTIFIER IDENTIFIER (x) (y) |
Token types: IDENTIFIER: [a-z]+ INTEGER: [0-9]+ ADD: [+] ASSIGN: [=] MULT: [*] SEMI: [;] Grammar rules: assignment --> IDENTIFIER ASSIGN expr SEMI expr --> subexpr expr --> subexpr op expr subexpr --> val subexpr --> subexpr ASSIGN val val --> IDENTIFIER | INTEGER op --> ADD | MULT assignment is the 'starting' non-terminal
Sample solution: (i) precedence and associativity The leftmost assign (from the assignment rule) is a special case in that it will always get parsed out first (evaluated last) and there can only be one. The remainder of the discussion deals with precedence within the subexpr's. The only way ADD/MULT can be processed is through expr rules, and all the expr rules must be applied before moving on to subexprs, so all the ADD/MULTs must be parsed out before/above any of the ASSIGNs. This means they'll be higher in the parse tree and evaluated later than ASSIGNs (since the tree is constructed top-down but evaluated bottom-up). Thus ASSIGN has higher precedence thand ADD/MULT. ADD and MULT are treated interchangably within expr, so have the same associativity and precedence. In terms of associativity, if we consider x = y = z we can see in the subexpr rule that the rightmost = must be parsed first, since the thing to its right can only be a value (identifier or integer) Thus it parses like = / \ = z / \ x y which effectively means ((x = y) = z) so assignments evaluate left-to-right. The rules for +/* are the opposite: the expr --> subexpr op expr means the portion with any other */+ must appear in the right subtree, so x * y * z gives * / \ x * which effectively means (x * (y * z)) / \ and implies */+ evaluate right to left y z (ii) is this sensible? From a conventional viewpoint probably not: this is the opposite associativity to "normal" math, and the opposite associativity to the way most languages handle assignment, and the leftmost = operator has a different precedence level than all the other ones do. Consider a = b + c = d = e + f is permitted and gives something like a = (b + (((c = d) = e) + f))? (That doesn't mean it's unusable, and there may be a solid justification for it in some special-purpose language, but it would be counterintutitive for most programmers and probably not recommended in a general-purpose language.) |
Token types: Grammar rules IDENTIFIER: [a-z]+ program --> statements INTLIT: [0-9]+ statements --> stmt STRLIT: ["][^\n]*["] statements --> stmt statements ASSIGN: [=] stmt --> assignment SEMI: [;] stmt --> declaration INT: "integer" assignment --> IDENTIFIER ASSIGN value SEMI STR: "string" declaration --> IDENTIFIER type SEMI type --> STR type --> INT value --> INTLIT value --> STRLITBriefly outline the type/declaration checking that would need to be performed in each of the relevant rules.
Sample solution: We'd need some way to record/look up what variables have been declared and what type they were declared with, I'll assume here we're using some form of symbol table. We'd also need to evaluate the data type associated with the right hand value in the assignment statement, so here I'll assume we can associate types with the nonterminals as we evaluate rules. Given those assumptions, a workable checking approach would be: value --> INTLIT action: associate datatype integer with this 'value' nonterminal value --> STRLIT action: associate datatype string with this 'value' nonterminal type --> STR action: associate datatype string with this 'type' nonterminal type --> INT action: associate datatype integer with this 'type' nonterminal declaration --> IDENTIFIER type SEMI action: lookup the identifier in the symbol table, error if previously declared insert the identifier into the table with whatever datatype is associated with the 'type' nonterminal assignment --> IDENTIFIER ASSIGN value SEMI action: lookup the identifier in the symbol table, error if not declared yet check the datatype for that identifier (from the lookup) against the datatype associated with the 'value' nonterminal, error if the two datatypes don't match |