Question: In this assignment, we are going to start working on the parser. The lexer's job is to make words, the parser's job is to make

In this assignment, we are going to start working on the parser. The lexer's job is to make "words", the parser's job is to make sure that those tokens are in an order that makes sense and make a data structure in memory that represents the program. The Tokens are in a 1-dimensional structure (array or list). The parser's output will be a 2-dimensional tree.

The parser works using recursive descent. We are encoding the rules into the structure of the program. There are a few simple rules that we will follow in writing our parser:

Each function represents some "phrase" in our language, like "if statement" or "assignment statement".

Each function must either succeed or fail:

On success, remove Tokens from the list and output a tree node

On failure, leave the list unchanged and return null.

When there are alternatives, the function must call each alternative's function until it finds one that is not null.

Details

Nodes

Let's start by building some tree nodes. Each tree node is just a Java class. We will start by defining Node as an abstract class. It should have an abstract ToString() method. Every tree node will extend Node and will end with the word "Node" - IfNode, WhileNode, etc. You should make any constructors for the Nodes that make sense as you go. The ToString() method will be your primary debugging/testing tool, so don't overlook that. The tree will be read-only, so mutators need not be created.

Make two new nodes - IntegerNode and RealNode - extending Node. IntegerNode needs to have an int member. RealNode needs to have a float member. Members should be private, with an accessor. Constructors should take the members as parameters.

Make sure to create a "MathOpNode" (also extending Node) - this represents a math operation. This node should have an enum for the math operations (add, subtract, etc.). It should have two sub-nodes for the left and right sides of the operation. You might thing that these should be IntegerNode or RealNode. But, of course, we have lots of other possibilities. For example, consider that we will need to represent (eventually) x+4. So, our left and right sub-nodes should be of type Node so that we can represent whatever we need.

Parser - setup

Make sure to create a Parser class (does not derive from anything). It must have a constructor that accepts your collection of Tokens. We will be treating the collection of tokens as a queue - taking off the front. It isn't necessary to use a Java Queue, but you may.

We will add three helper functions to parser. These should be private:

matchAndRemove - accepts a token type. Looks at the next token in the collection:

If the passed in token type matches the next token's type, remove that token and return it.

If the passed in token type DOES NOT match the next token's type (or there are no more tokens) return null.

expectEndsOfLine - uses matchAndRemove to match and discard one or more ENDOFLINE tokens. Throw a SyntaxErrorException if no ENDOFLINE was found.

peek - accepts an integer and looks ahead that many tokens and returns that token. Returns null if there aren't enough tokens to fulfill the request.

Parser - parse

Make sure to create a public parse method. There are no parameters, and it returns Node. This will be called from main once lexing is complete. For now, we are going to only parse a subset of Shank V2 - mathematical expressions. Parse should call expression() then expectEndOfLine() in a loop until either returns null. Don't worry about storing the return node but you should print it out (using ToString()) for testing.

The tricky part of mathematical expressions is order of operations. Fortunately, this is fairly easily solved in recursive descent by using a few functions. The intuition is this: accept tokens from the highest priority first. When you can't find any that fit the pattern, fall back to lower priority. The thing that makes this "weird" is that we will call the lowest level first. But it calls the next highest first thing. Let's take a look:

EXPRESSION = TERM { (plus or minus) TERM}

TERM = FACTOR { (times or divide or mod) FACTOR}

FACTOR = {-} number or lparen EXPRESSION rparen

Notice the curly braces - these indicate 0 or more.

expression(), term() and factor() will all be methods of our recursive descent parser. All of them must use matchOrRemove(). The formulae above are very literal. Consider the first one:

EXPRESSION = TERM { (plus or minus) TERM}

expression() calls term(). Then it (in a loop) looks for a + or -. If it finds one, it calls term again and constructs a MathOpNode using the two term() returns as the left and right. That MathOpNode becomes the left if there are more +/- sign. So 1+2+3 becomes:

MathOpNode(+,IntegerNode(1),(MathOpNode(+,IntegerNode(2),IntegerNode(3)))

Let's look at a complete example: 3*-4+5: our tokens will look something like this:

NUMBER TIMES MINUS NUMBER PLUS NUMBER ENDOFLINE- your token types might differ a little

We will call expression(). The first thing it does is look for a term by calling term(). term() calls factor() first thing. factor() will matchAndRemove MINUS and not find it. It then tries NUMBER succeeds.