Introduction
In this project, we will begin our lexer. Our lexer will start by reading the strings of the BASIC file that the user wants to run. It will break the BASIC code up into words or tokens and build a collection of these tokens. We can consider the lexer complete when it can take any BASIC file and output a list of the tokens being generated.
We will not be using the Scanner class that you may be familiar with for reading from a file; we are, instead, using Files.readAllBytes. This is a much simpler way of dealing with files.
Example of readAllBytes:
Path myPath = Paths.get("someFile.basic");
String content = new String(Files.readAllBytes(myPath));
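For reference, a complete, runnable version of the readAllBytes approach is sketched below. The class name ReadFileExample and the helper readWholeFile are hypothetical, and decoding the bytes as UTF-8 is an assumption (the one-line version above uses the platform's default charset). Files.readAllBytes throws IOException, so callers must either catch it or declare it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ReadFileExample {
    // Reads the entire file into one String in a single call.
    // UTF-8 decoding is an assumption; new String(bytes) would use the platform default charset.
    static String readWholeFile(String filename) throws IOException {
        Path myPath = Paths.get(filename);
        byte[] bytes = Files.readAllBytes(myPath);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Write a small temporary file, read it back, then clean up.
        Path tmp = Files.createTempFile("demo", ".bas");
        Files.write(tmp, "10 PRINT \"HI\"\n".getBytes(StandardCharsets.UTF_8));
        String content = readWholeFile(tmp.toString());
        System.out.print(content);
        Files.delete(tmp);
    }
}
```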
A second concept you may not be familiar with is enum. Enum, short for enumeration, is a Java language construct that lets us create a variable that may be any one of a list of things. We will use this to define the types of tokens (a tokenType).
enum colorType { RED, GREEN, BLUE }
colorType myFavorite = colorType.BLUE;
System.out.println(myFavorite); // prints BLUE
A good way to think about a lexer is to remember how you learned to read. Remember following the line of letters with your finger? We will do the same thing: we will create a class (I called mine CodeHandler). This class will manage the incoming stream of letters, allowing the lexer to peek ahead in the stream, get a character (move the finger one forward), and tell us whether we are at the end of the document.
With the CodeHandler written, we will move on to the lexer itself. For now, we will only deal with a few types of tokens: words, numbers, and new lines. Of course, just reading through the document doesn't do anything by itself; we will make tokens, objects that hold the type (using an enum) and a string of the value. For example:
WORD(hello)
Details
This assignment must have four different source code files.
Basic.java
Basic.java must contain main. Your main must ensure that there is one and only one argument (args). If there are none or more than one, it must print an appropriate error message and exit. That one argument will be considered a filename. You will pass the filename to an instance of Lexer (which we will build below). You will then call lex on your lexer; it will return a linked list of Token. Print each token out. This is just debugging output; the format isn't super important so long as it makes sense to a programmer.
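A minimal sketch of just the argument check in main follows. BasicArgsExample and getFilename are hypothetical names, the exit status of 1 is an assumption, and the Lexer call is left as a comment because Lexer is built later in the assignment.

```java
class BasicArgsExample {
    // Returns the single filename argument, or null after printing an error
    // message when the argument count is anything other than one.
    static String getFilename(String[] args) {
        if (args.length != 1) {
            System.err.println("Usage: java Basic <filename> (expected exactly one argument, got "
                    + args.length + ")");
            return null;
        }
        return args[0];
    }

    public static void main(String[] args) {
        String filename = getFilename(args);
        if (filename == null) {
            System.exit(1); // nonzero status signals the error to the shell
        }
        // In the assignment, this is where Basic would create a Lexer for
        // filename, call lex(), and print each Token in the returned list.
        System.out.println("Would lex: " + filename);
    }
}
```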
CodeHandler
The CodeHandler class should have a private string to hold the document (the BASIC file) and a private integer index (the finger position). It should have these methods:
char Peek(i): looks i characters ahead and returns that character; doesn't move the index
String PeekString(i): returns a string of the next i characters but doesn't move the index
char GetChar(): returns the next character and moves the index
void Swallow(i): moves the index ahead i positions
boolean IsDone(): returns true if we are at the end of the document
String Remainder(): returns the rest of the document as a string
Some of these will be used in the next assignment. Note that there is no accessor for the document string; you must use the methods.
The constructor will take a filename. Basic will call Lexer, which will call CodeHandler. Use Files.readAllBytes to read from the file and populate the private internal string.
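One possible CodeHandler is sketched below under a few assumptions: Peek(0) returns the next character (the one GetChar would return), Peek returns '\0' when asked to look past the end of the document, PeekString and Swallow stop at the end of the document instead of throwing, and the file is decoded as UTF-8. The capitalized method names follow the assignment rather than the usual Java camelCase convention.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class CodeHandler {
    private final String document; // the whole BASIC file
    private int index = 0;         // the "finger": position of the next unread character

    // Reads the entire file named by filename into the private document string.
    CodeHandler(String filename) throws IOException {
        document = new String(Files.readAllBytes(Paths.get(filename)), StandardCharsets.UTF_8);
    }

    // Returns the character i positions ahead of the finger without moving it;
    // Peek(0) is the next character. Returns '\0' past the end (assumption).
    char Peek(int i) {
        int target = index + i;
        return target < document.length() ? document.charAt(target) : '\0';
    }

    // Returns up to i upcoming characters without moving the finger.
    String PeekString(int i) {
        return document.substring(index, Math.min(index + i, document.length()));
    }

    // Returns the next character and moves the finger forward by one.
    // Callers should check IsDone() first.
    char GetChar() {
        return document.charAt(index++);
    }

    // Moves the finger forward i positions, stopping at the end of the document.
    void Swallow(int i) {
        index = Math.min(index + i, document.length());
    }

    // True once every character has been consumed.
    boolean IsDone() {
        return index >= document.length();
    }

    // Everything from the finger to the end of the document.
    String Remainder() {
        return document.substring(index);
    }
}
```

Keeping document and index private and exposing only these methods matches the note above that there is no direct accessor for the document.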
Token
Create the Token class. It needs an enum of TokenType values (WORD, NUMBER, ENDOFLINE) and a string to hold the value of the token (for example, "hello" and "goodbye" are both WORD, but with different values). The token will also hold the line number and character position of the start of the token. Create two constructors: one for TokenType, line number, and position, and one that also has a value; some tokens don't have a value because it doesn't matter (for example, new line). Make sure to add a ToString method. Your exact format isn't critical; I output the token type and the value in parentheses if it is set.
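A sketch of the Token class follows. Nesting TokenType inside Token, the getter names, and placing the value last in the second constructor are assumptions. In Java, the ToString method is an override of toString, which is what println and string concatenation call.

```java
class Token {
    enum TokenType { WORD, NUMBER, ENDOFLINE }

    private final TokenType type;
    private final String value;   // null when the token has no value (e.g. ENDOFLINE)
    private final int lineNumber;
    private final int position;   // character position of the token's start within its line

    // Constructor for tokens without a value, such as ENDOFLINE.
    Token(TokenType type, int lineNumber, int position) {
        this(type, lineNumber, position, null);
    }

    // Constructor for tokens that carry a value, such as WORD or NUMBER.
    Token(TokenType type, int lineNumber, int position, String value) {
        this.type = type;
        this.lineNumber = lineNumber;
        this.position = position;
        this.value = value;
    }

    TokenType getType() { return type; }
    String getValue() { return value; }
    int getLineNumber() { return lineNumber; }
    int getPosition() { return position; }

    // Prints the type, plus the value in parentheses when one is set,
    // e.g. WORD(hello) or ENDOFLINE.
    @Override
    public String toString() {
        return value == null ? type.toString() : type + "(" + value + ")";
    }
}
```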
Lexer
The final file must be called Lexer.java. The Lexer class must contain a lex method that accepts a single string (the filename) and returns a linked list of Tokens. Your lexer must keep track of the line number and character position within the line. This will help you generate good error messages.
Lex is the method that will break the data from CodeHandler into a linked list of tokens. While there is still data in CodeHandler, we want to peek at the next character to decide what to do with it:
If the character is a space or tab, we will just move past it (increment position).
If the character is a linefeed (\n), we will create a new EndOfLine token with no value and add it to the token list. We should also increment the line number and reset the line position to the start of the line.
If the character is a carriage return (\r), we will ignore it.
If the character is a letter, we need to call ProcessWord (see below) and add the result to our list of tokens.
If the character is a digit, we need to call Proces
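The rules listed above can be sketched as the loop below. To keep the example runnable on its own, it lexes a String directly instead of going through CodeHandler and uses a compact stand-in for Token; LexerSketch, processWord, and processNumber are hypothetical names. The source's description of the digit rule is cut off, so processNumber (integers only) is an assumption, as are the word rule (a letter followed by letters or digits), counting lines from 1 and positions from 0, and throwing an exception on any other character.

```java
import java.util.LinkedList;

class LexerSketch {
    enum TokenType { WORD, NUMBER, ENDOFLINE }

    // Compact stand-in for the Token class described above.
    static class Token {
        final TokenType type; final String value; final int line; final int pos;
        Token(TokenType type, String value, int line, int pos) {
            this.type = type; this.value = value; this.line = line; this.pos = pos;
        }
        @Override public String toString() {
            return value == null ? type.toString() : type + "(" + value + ")";
        }
    }

    private final String text;  // stands in for CodeHandler's document
    private int index = 0;      // the "finger"
    private int lineNumber = 1; // assumption: lines counted from 1
    private int position = 0;   // assumption: positions counted from 0

    LexerSketch(String text) { this.text = text; }

    LinkedList<Token> lex() {
        LinkedList<Token> tokens = new LinkedList<>();
        while (index < text.length()) {
            char c = text.charAt(index); // peek without consuming
            if (c == ' ' || c == '\t') {
                index++; position++;
            } else if (c == '\n') {
                tokens.add(new Token(TokenType.ENDOFLINE, null, lineNumber, position));
                index++; lineNumber++; position = 0;
            } else if (c == '\r') {
                index++; // carriage returns are ignored
            } else if (Character.isLetter(c)) {
                tokens.add(processWord());
            } else if (Character.isDigit(c)) {
                tokens.add(processNumber());
            } else {
                throw new IllegalStateException("Unexpected character '" + c
                        + "' at line " + lineNumber + ", position " + position);
            }
        }
        return tokens;
    }

    // Assumption: a word is a letter followed by any letters or digits.
    private Token processWord() {
        int startPos = position;
        StringBuilder sb = new StringBuilder();
        while (index < text.length() && Character.isLetterOrDigit(text.charAt(index))) {
            sb.append(text.charAt(index));
            index++; position++;
        }
        return new Token(TokenType.WORD, sb.toString(), lineNumber, startPos);
    }

    // Hypothetical: reads a run of digits (integers only).
    private Token processNumber() {
        int startPos = position;
        StringBuilder sb = new StringBuilder();
        while (index < text.length() && Character.isDigit(text.charAt(index))) {
            sb.append(text.charAt(index));
            index++; position++;
        }
        return new Token(TokenType.NUMBER, sb.toString(), lineNumber, startPos);
    }
}
```

For the input "PRINT x1 42\r\nEND\n", lex() returns [WORD(PRINT), WORD(x1), NUMBER(42), ENDOFLINE, WORD(END), ENDOFLINE]. Each token records the position where it starts, which is what makes error messages like the one in the final branch possible.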
