Question: Use a command line argument for the input file for processing tokens Recognize the new tokens > void float Be sure to change token file

Use a command line argument for the input file for processing tokens

Recognize the new tokens: > void float. Be sure to change the tokens file appropriately and then run TokenSetup as required.

Include line number information within tokens for subsequent error reporting, etc.

Print each token with line number

READLINE: program { int i int j
program left: 0 right: 6 line: 1
{ left: 8 right: 8 line: 1
int left: 10 right: 12 line: 1
i left: 14 right: 14 line: 1
int left: 16 right: 18 line: 1
j left: 20 right: 20 line: 1
.
.
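One way to get the per-token format shown above is to store the line number in the token itself. The following is only a minimal sketch: `LineToken` and its `describe()` method are hypothetical stand-ins for the lab's `Token` class (which holds a `Symbol` rather than a raw string), and it assumes line numbers start at 1.

```java
// Hypothetical stand-in for the lab's Token class, extended with a line number.
final class LineToken {
    private final String text; // characters of the token
    private final int left;    // column where the token begins
    private final int right;   // column where the token ends
    private final int line;    // source line of the token (assumed 1-based)

    LineToken(String text, int left, int right, int line) {
        this.text = text;
        this.left = left;
        this.right = right;
        this.line = line;
    }

    int getLineNumber() {
        return line;
    }

    // Formats the token in the same layout as the sample output.
    String describe() {
        return text + " left: " + left + " right: " + right + " line: " + line;
    }

    public static void main(String[] args) {
        System.out.println(new LineToken("program", 0, 6, 1).describe());
        System.out.println(new LineToken("{", 8, 8, 1).describe());
    }
}
```

In the real lab the lexer would presumably pass the current line number from the source reader when it constructs each token, so later phases can report errors by line.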

Output the source program with line numbers (since SourceReader reads the source lines it should save the program in memory for printout after the tokens are scanned):

e.g.

1. program { int i int j

2. i = 2

3. j = 3

4. i = write(j+4)

5. }
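The numbered listing above could be produced by remembering every line as it is read. This is a hedged sketch only: `SourceListing`, `addLine` and `numbered` are hypothetical names, and in the lab this bookkeeping would more likely live inside `SourceReader`, whose code is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that keeps the source program in memory for later printout.
final class SourceListing {
    private final List<String> lines = new ArrayList<>();

    // Called once for each source line as it is read.
    void addLine(String line) {
        lines.add(line);
    }

    // Returns the saved program with "N. " prefixes, one line per row.
    String numbered() {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.size(); i++) {
            out.append(i + 1).append(". ").append(lines.get(i)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        SourceListing listing = new SourceListing();
        listing.addLine("program { int i int j");
        listing.addLine("i = 2");
        listing.addLine("j = 3");
        System.out.print(listing.numbered());
    }
}
```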

If you encounter an error, e.g. you find a "%" on line 7 of the source file that contains 20 lines, then you should:

report the error

stop processing tokens at that point

echo the lines of the source file with line numbers up to and including the error line - e.g., echo lines 1 through 7 inclusive in the case with the "%" on line 7

exit
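The error steps above can be sketched as a helper that echoes only the lines up to and including the error line. Everything here is an assumption for illustration: `ErrorEcho`, `echoThrough`, and the exact wording of the error message are hypothetical, and the source lines are passed in as a plain list rather than read by `SourceReader`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: echo the source up to the line where scanning stopped.
final class ErrorEcho {
    // Returns lines 1..errorLine inclusive with line numbers;
    // errorLine is clamped to the number of lines available.
    static String echoThrough(List<String> lines, int errorLine) {
        int last = Math.min(errorLine, lines.size());
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < last; i++) {
            out.append(i + 1).append(". ").append(lines.get(i)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("program { int i", "i = 2 % 3", "}");
        int errorLine = 2; // line on which the illegal character was found
        System.out.println("******** illegal character: % on line " + errorLine);
        System.out.print(echoThrough(lines, errorLine));
        // After echoing, the lexer would stop producing tokens and exit.
    }
}
```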

Comments on this lab:

The main method in Lexer.java uses a specific file (e.g. simple.x) for processing tokens. You are required to modify the code to use a command line argument for the file for processing tokens.

TokenSetup[1] should NOT be changed

The tokens file should be changed appropriately

Use the main method in Lexer.java for testing - DO NOT use other packages in the compiler (e.g. Compiler.java) since there will always be complaints due to not recognizing the new constructs
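For the command line requirement, a minimal sketch of the argument handling might look like this. `LexerMainSketch` and `sourceFileFrom` are hypothetical names, and the usage text is an assumption; in the lab the returned name would be passed to `new Lexer(...)` in place of the hard-coded "simple.x".

```java
// Hypothetical sketch of reading the source file name from the command line.
final class LexerMainSketch {
    // Returns the single expected argument or rejects the invocation.
    static String sourceFileFrom(String[] args) {
        if (args.length != 1) {
            throw new IllegalArgumentException("usage: java lexer.Lexer <source-file>");
        }
        return args[0];
    }

    public static void main(String[] args) {
        try {
            String sourceFile = sourceFileFrom(args);
            // In the lab: Lexer lex = new Lexer(sourceFile);
            System.out.println("would scan: " + sourceFile);
        } catch (IllegalArgumentException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Checking the argument count first avoids an `ArrayIndexOutOfBoundsException` when the program is started without a file name.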

[1] You should avoid changing the file path in TokenSetup by doing the following:

1. Right click on the project in NetBeans and select "Properties" in the drop-down menu

2. Select the "run" option

3. Set the "Working Directory" to the location of the src directory on your system

Now the relative paths are correct and there is no need to change TokenSetup.java
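As for changing the tokens file itself: TokenSetup reads two whitespace-separated strings per line, the token type name followed by its print string. The sketch below parses three entries in that layout. The type names `Greater`, `Void` and `Float` are assumptions chosen for illustration, not names taken from the course's tokens file, and `TokenLineSketch` is a hypothetical class.

```java
import java.util.StringTokenizer;

// Hypothetical sketch of the two-strings-per-line layout of the tokens file.
final class TokenLineSketch {
    // Splits one line into {type, printString}; extra fields are ignored,
    // fewer than two fields is rejected.
    static String[] parse(String line) {
        StringTokenizer st = new StringTokenizer(line);
        if (st.countTokens() < 2) {
            throw new IllegalArgumentException("tokens file needs 2 strings per line: " + line);
        }
        return new String[] { st.nextToken(), st.nextToken() };
    }

    public static void main(String[] args) {
        // Assumed entries for the new tokens; the real type names may differ.
        String[] entries = { "Greater >", "Void void", "Float float" };
        for (String entry : entries) {
            String[] pair = parse(entry);
            System.out.println("type=" + pair[0] + " printstring=" + pair[1]);
        }
    }
}
```

After adding such lines to the tokens file, running TokenSetup regenerates TokenType.java and Tokens.java so the lexer can recognize the new symbols.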

Sample expected output for the case with no errors:

READLINE: program { int i int j
program left: 0 right: 6 line: 1
{ left: 8 right: 8 line: 1
int left: 10 right: 12 line: 1
i left: 14 right: 14 line: 1
int left: 16 right: 18 line: 1
j left: 20 right: 20 line: 1
.
.

1. program { int i int j

2. i = 2

3. j = 3

4. i = write(j+4)

5. }

Provided Code to be modified:

Lexer.java:

package lexer;

/**
 * The Lexer class is responsible for scanning the source file
 * which is a stream of characters and returning a stream of
 * tokens; each token object will contain the string (or access
 * to the string) that describes the token along with an
 * indication of its location in the source program to be used
 * for error reporting; we are tracking line numbers; white spaces
 * are space, tab, newlines
 */
public class Lexer {
    private boolean atEOF = false;
    private char ch; // next character to process
    private SourceReader source;
    // positions in line of current token
    private int startPosition, endPosition;

    public Lexer(String sourceFile) throws Exception {
        new TokenType(); // init token table
        source = new SourceReader(sourceFile);
        ch = source.read();
    }

    /*
    public static void main(String args[]) {
        Token tok;
        try {
            Lexer lex = new Lexer("simple.x");
            while (true) {
                tok = lex.nextToken();
                String p = "L: " + tok.getLeftPosition() +
                    " R: " + tok.getRightPosition() + " " +
                    TokenType.tokens.get(tok.getKind()) + " ";
                if ((tok.getKind() == Tokens.Identifier) ||
                    (tok.getKind() == Tokens.INTeger))
                    p += tok.toString();
                System.out.println(p + ": "+lex.source.getLineno());
            }
        } catch (Exception e) {}
    }
    */

    /**
     * newIdTokens are either ids or reserved words; new id's will be inserted
     * in the symbol table with an indication that they are id's
     * @param id is the String just scanned - it's either an id or reserved word
     * @param startPosition is the column in the source file where the token begins
     * @param endPosition is the column in the source file where the token ends
     * @return the Token; either an id or one for the reserved words
     */
    public Token newIdToken(String id,int startPosition,int endPosition) {
        return new Token(startPosition,endPosition,Symbol.symbol(id,Tokens.Identifier));
    }

    /**
     * number tokens are inserted in the symbol table; we don't convert the
     * numeric strings to numbers until we load the bytecodes for interpreting;
     * this ensures that any machine numeric dependencies are deferred
     * until we actually run the program; i.e. the numeric constraints of the
     * hardware used to compile the source program are not used
     * @param number is the int String just scanned
     * @param startPosition is the column in the source file where the int begins
     * @param endPosition is the column in the source file where the int ends
     * @return the int Token
     */
    public Token newNumberToken(String number,int startPosition,int endPosition) {
        return new Token(startPosition,endPosition,
            Symbol.symbol(number,Tokens.INTeger));
    }

    /**
     * build the token for operators (+ -) or separators (parens, braces)
     * filter out comments which begin with two slashes
     * @param s is the String representing the token
     * @param startPosition is the column in the source file where the token begins
     * @param endPosition is the column in the source file where the token ends
     * @return the Token just found
     */
    public Token makeToken(String s,int startPosition,int endPosition) {
        if (s.equals("//")) { // filter comment
            try {
                int oldLine = source.getLineno();
                do {
                    ch = source.read();
                } while (oldLine == source.getLineno());
            } catch (Exception e) {
                atEOF = true;
            }
            return nextToken();
        }
        Symbol sym = Symbol.symbol(s,Tokens.BogusToken); // be sure it's a valid token
        if (sym == null) {
            System.out.println("******** illegal character: " + s);
            atEOF = true;
            return nextToken();
        }
        return new Token(startPosition,endPosition,sym);
    }

    /**
     * @return the next Token found in the source file
     */
    public Token nextToken() { // ch is always the next char to process
        if (atEOF) {
            if (source != null) {
                source.close();
                source = null;
            }
            return null;
        }
        try {
            while (Character.isWhitespace(ch)) { // scan past whitespace
                ch = source.read();
            }
        } catch (Exception e) {
            atEOF = true;
            return nextToken();
        }
        startPosition = source.getPosition();
        endPosition = startPosition - 1;

        if (Character.isJavaIdentifierStart(ch)) {
            // return tokens for ids and reserved words
            String id = "";
            try {
                do {
                    endPosition++;
                    id += ch;
                    ch = source.read();
                } while (Character.isJavaIdentifierPart(ch));
            } catch (Exception e) {
                atEOF = true;
            }
            return newIdToken(id,startPosition,endPosition);
        }
        if (Character.isDigit(ch)) {
            // return number tokens
            String number = "";
            try {
                do {
                    endPosition++;
                    number += ch;
                    ch = source.read();
                } while (Character.isDigit(ch));
            } catch (Exception e) {
                atEOF = true;
            }
            return newNumberToken(number,startPosition,endPosition);
        }
        // At this point the only tokens to check for are one or two
        // characters; we must also check for comments that begin with
        // 2 slashes
        String charOld = "" + ch;
        String op = charOld;
        Symbol sym;
        try {
            endPosition++;
            ch = source.read();
            op += ch;
            // check if valid 2 char operator; if it's not in the symbol
            // table then don't insert it since we really have a one char
            // token
            sym = Symbol.symbol(op, Tokens.BogusToken);
            if (sym == null) { // it must be a one char token
                return makeToken(charOld,startPosition,endPosition);
            }
            endPosition++;
            ch = source.read();
            return makeToken(op,startPosition,endPosition);
        } catch (Exception e) {}
        atEOF = true;
        if (startPosition == endPosition) {
            op = charOld;
        }
        return makeToken(op,startPosition,endPosition);
    }
}

Symbol.java:

package lexer;

/**
 * The Symbol class is used to store all user strings along with
 * an indication of the kind of strings they are; e.g. the id "abc" will
 * store the "abc" in name and Tokens.Identifier in kind
 */
public class Symbol {
    private String name;
    private Tokens kind; // token kind of symbol

    private Symbol(String n, Tokens kind) {
        name=n;
        this.kind = kind;
    }

    // symbols contains all strings in the source program
    private static java.util.HashMap<String,Symbol> symbols =
        new java.util.HashMap<String,Symbol>();

    public String toString() {
        return name;
    }

    public Tokens getKind() {
        return kind;
    }

    /**
     * Return the unique symbol associated with a string.
     * Repeated calls to symbol("abc") will return the same Symbol.
     */
    public static Symbol symbol(String newTokenString, Tokens kind) {
        Symbol s = symbols.get(newTokenString);
        if (s == null) {
            if (kind == Tokens.BogusToken) { // bogus string so don't enter into symbols
                return null;
            }
            //System.out.println("new symbol: "+u+" Kind: "+kind);
            s = new Symbol(newTokenString,kind);
            symbols.put(newTokenString,s);
        }
        return s;
    }
}

TokenSetup.java: *DO NOT CHANGE. ONLY FOR REFERENCE*

package lexer.setup;

import java.util.*;
import java.io.*;

/**
 * TokenSetup class is used to read the tokens from file tokens
 * and automatically build the 2 classes/files TokenType.java
 * and Sym.java
 * Therefore, if there is any change to the tokens then we only need to
 * modify the file tokens and run this program again before using the
 * compiler
 */
public class TokenSetup {
    private String type, value; // token type/value for new token
    private int tokenCount = 0;
    private BufferedReader in;
    private PrintWriter table, symbols; // files used for new classes

    public static void main(String args[]) {
        new TokenSetup().initTokenClasses();
    }

    TokenSetup() {
        try {
            System.out.println("User's current working directory: " + System.getProperty("user.dir"));
            String sep = System.getProperty("file.separator");
            in = new BufferedReader( new FileReader("lexer" + sep + "setup" + sep + "tokens"));
            table = new PrintWriter(new FileOutputStream("lexer" + sep + "TokenType.java"));
            symbols = new PrintWriter(new FileOutputStream("lexer" + sep + "Tokens.java"));
        } catch (Exception e) {
            System.out.println(e);
        }
    }

    /**
     * read next line which contains token information;
     * each line will contain the token type used in lexical analysis and
     * the printstring of the token: e.g.
     *
     * Program program
     * Int int
     * BOOLean boolean
     */
    public void getNextToken() throws IOException {
        try {
            StringTokenizer st = new StringTokenizer(in.readLine());
            type = st.nextToken();
            value = st.nextToken();
        } catch (NoSuchElementException e) {
            System.out.println("***tokens file does not have 2 strings per line***");
            System.exit(1);
        } catch (NullPointerException ne) {
            // attempt to build new StringTokenizer when at end of file
            throw new IOException("***End of File***");
        }
        tokenCount++;
    }

    /**
     * initTokenClasses will create the 2 files
     */
    public void initTokenClasses() {
        table.println("package lexer;");
        table.println(" ");
        table.println("/**");
        table.println(" * This file is automatically generated ");
        table.println(" * it contains the table of mappings from token");
        table.println(" * constants to their Symbols");
        table.println("*/");
        table.println("public class TokenType {");
        table.println(" public static java.util.HashMap tokens = new java.util.HashMap();");
        table.println(" public TokenType() {");
        symbols.println("package lexer;");
        symbols.println(" ");
        symbols.println("/**");
        symbols.println(" * This file is automatically generated ");
        symbols.println(" * - it contains the enumberation of all of the tokens");
        symbols.println("*/");
        symbols.println("public enum Tokens {");
        symbols.print(" BogusToken");
        while (true) {
            try {
                getNextToken();
            } catch (IOException e) {break;}
            String symType = "Tokens." + type;
            table.println(" tokens.put(" + symType +
                ", Symbol.symbol(\"" + value + "\"," + symType + "));");
            if (tokenCount % 5 == 0) {
                symbols.print(", "+ type);
            } else {
                symbols.print("," + type);
            }
        }
        table.println(" }");
        table.println("}");
        table.close();
        symbols.println(" }");
        symbols.close();
        try {
            in.close();
        } catch (Exception e) {}
    }
}

Token.java:

package lexer;

/**
 * The Token class records the information for a token:
 * 1. The Symbol that describes the characters in the token
 * 2. The starting column in the source file of the token and
 * 3. The ending column in the source file of the token
 */
public class Token {
    private int leftPosition,rightPosition;
    private Symbol symbol;

    /**
     * Create a new Token based on the given Symbol
     * @param leftPosition is the source file column where the Token begins
     * @param rightPosition is the source file column where the Token ends
     */
    public Token(int leftPosition, int rightPosition, Symbol sym) {
        this.leftPosition = leftPosition;
        this.rightPosition = rightPosition;
        this.symbol = sym;
    }

    public Symbol getSymbol() {
        return symbol;
    }

    public void print() {
        System.out.println(" " + symbol.toString() + " left: " + leftPosition +
            " right: " + rightPosition);
        return;
    }

    public String toString() {
        return symbol.toString();
    }

    public int getLeftPosition() {
        return leftPosition;
    }

    public int getRightPosition() {
        return rightPosition;
    }

    /**
     * @return the integer that represents the kind of symbol we have which
     * is actually the type of token associated with the symbol
     */
    public Tokens getKind() {
        return symbol.getKind();
    }
}

*More can be posted if needed*
