Question:

The first project involves writing the lexical analyzer with lexical error checking, and the compilation listing generator for the compiler. The specification for the lexical structure of the language is the following:

1. Comments begin with -- and end with the end of the line.

2. White space between tokens is permitted but not required.

3. Identifiers must begin with a letter, followed by letters, digits or underscores. Consecutive underscores and trailing underscores are not permitted.

4. Integer literals consist of a sequence of digits preceded by an optional sign.

5. Real literals consist of a sequence of digits containing a decimal point. At least one digit must be before the decimal point.

6. Boolean literals are true and false.

7. The logical operators are not, and and or. Each logical operator should be a separate token.

8. The relational operators are =, /=, >, >=, <, and <=. All six lexemes should be represented by a single token.

9. The adding operators are the binary + and -. Both lexemes should be represented by a single token.

10. The multiplying operators are *, / and rem. The first two lexemes should be represented by a single token.

11. The exponentiation operator is **.

12. The following punctuation symbols should be accepted: commas, colons, semicolons, and parentheses. In addition, the two-character arrow symbol => should be a punctuation token.

13. The following are reserved words: begin, boolean, case, else, end, endcase, endif, function, if, is, integer, real, returns, then, when

The lexical analyzer should be created using flex. The compiler should produce a listing of the program with lexical error messages included after the line in which they occur. Any character that cannot start any token should be considered a lexical error. An example of compilation listing output is shown below:

       1  -- Program with two lexical errors
       2
       3  function main a: integer returns integer;
       4      b: integer is a * 2;
       5  begin
       6      if a <= 0 then
       7          b + b;
       8      else
       9          b ^ b $
    Lexical Error, Invalid Character ^
    Lexical Error, Invalid Character $
      10      endif;
      11  end;

    Lexical Errors 2

It should also generate a file containing the lexeme-token pairs as a means to verify that the lexical analyzer is working correctly. Only token numbers are required, not token names. The token numbers for the punctuation symbols should be the ASCII value of the character. The remaining tokens should be numbered sequentially beginning at 256.

You are to submit two files.

1. The first is a .zip file that contains all the source code for the project. The .zip file should contain the flex input file, which should be a .l file, all .cc and .h files and a makefile that builds the project.

2. The second is a Word document (PDF or RTF is also acceptable) that contains the documentation for the project, which should include the following:

a. A discussion of how you approached the project

b. A test plan that includes test cases that you have created indicating what aspects of the program each one is testing

c. A discussion of lessons learned from the project and any improvements that could be made
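Rule 3 of the specification is the one that is easiest to get wrong, because a simple "letter followed by letters, digits or underscores" pattern also accepts consecutive and trailing underscores. A minimal sketch of one way to express it is below. The function name isValidIdentifier and the use of std::regex are assumptions made for illustration; the assignment itself requires the pattern to live in the flex input file.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, not part of the assignment skeleton.
// A letter, then zero or more groups of "optional single underscore
// followed by a letter or digit". Because every underscore must be
// followed by a letter or digit, "__" and a trailing "_" cannot match.
bool isValidIdentifier(const std::string& lexeme) {
    static const std::regex pattern("[A-Za-z](_?[A-Za-z0-9])*");
    return std::regex_match(lexeme, pattern);
}
```

With this pattern, a_b and total1_x2 are accepted, while a__b, a_ and 1a are rejected. The same regular expression should carry over almost unchanged to a flex definition, although that has not been verified against the full scanner here.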

HINTS

Programming project 1 involves writing the lexical analyzer for the compiler. It requires that you provide an input file for flex that generates the lexical analyzer. The main method can be included in that file. I recommend that you provide a separate C++ file for the code that generates the compilation listing. I suggest also providing a file named tokens.h, which contains the token definitions as an enumerated type. In project 2, this file will be automatically generated by bison. You must also include a makefile that builds your program.

To help you with this project, provided below is a skeleton of the flex input, a skeleton tokens.h file and a possible makefile. None of these are requirements, just some help getting you started if you need it.

Here is the skeleton of the flex input file. I recommend naming it scanner.l. Explanatory comments follow each "<--" marker and are not part of the file.

    %{
    #include <cstdio>
            <-- Needed because in main you need to output a file; you can include iostream if you prefer to use C++ file I/O
    #include "tokens.h"
            <-- You need to write this header file for project 1; in project 2, bison will generate it
            <-- Any other include files needed go here
    %}

    %option noyywrap
            <-- Without this option, a function called yywrap will be called

            <-- In this section you may want to define some of the more complicated tokens; here's one example
    ws      [ \t]+

    %%

    {ws}    { ECHO; }
            <-- The ECHO macro echoes the lexeme; it should be done for all tokens
    {line}  { ECHO; Listing::nextLine(); }
            <-- Calling the nextLine method allows any error messages on that line to be displayed and numbers the line
    "<"     { ECHO; return(RELOP); }
            <-- The token should be returned
            <-- All other operators need to be included
    begin   { ECHO; return(BEGIN_); }
            <-- BEGIN is a macro so the underscore is needed to name this token
            <-- All other keywords need to be included
            <-- Identifier, literals and punctuation tokens need to be included here
    {punc}  { ECHO; return yytext[0]; }
            <-- The token numbers of punctuation characters should be their ASCII values
    .       { ECHO; Listing::appendError(LEXICAL, yytext); }
            <-- All other characters are lexical errors

    %%

    int main()
    {
            <-- The main method should contain a loop repeatedly calling yylex until end of file. The token, lexeme pairs should be written to an output file
    }

Here is a skeleton of the tokens.h file that should be included by the flex input file:

    enum Tokens {RELOP = 256, ADDOP, ...
            <-- all the remaining tokens
            <-- By setting the first token to 256, 0-255 are reserved for single-character tokens

In project 2, bison will automatically generate the token file. It will begin numbering tokens at 256. The numbers 0-255 are reserved for single-character tokens, whose token numbers will be their ASCII values. Although punctuation characters can be given token names, it is simplest to just let their token numbers be their ASCII values. What that means is that the action for punctuation characters should be to return yytext[0].

Here is a sample makefile that assumes that you have also defined a class Listing that contains the code to generate the compilation listing (recipe lines must begin with a tab character):

    compile: scanner.o listing.o
            g++ -o compile scanner.o listing.o

    scanner.o: scanner.c listing.h tokens.h
            g++ -c scanner.c

    scanner.c: scanner.l
            flex scanner.l
            mv lex.yy.c scanner.c

    listing.o: listing.cc listing.h
            g++ -c listing.cc
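The scanner skeleton calls Listing::nextLine and Listing::appendError(LEXICAL, yytext), and the makefile assumes a Listing class, but neither is shown. The following is a minimal sketch of what such a class might look like, not the course's reference implementation. The helpers firstLine and lastLine, the std::ostream parameter, the four-character line-number width, and the private members are all assumptions made for illustration; the error wording is copied from the sample listing.

```cpp
#include <iomanip>
#include <iostream>
#include <string>
#include <vector>

// Only one error category is needed for project 1.
enum ErrorCategory { LEXICAL };

class Listing {
public:
    // Hypothetical helper: reset state and print the first line number.
    static void firstLine(std::ostream& out = std::cout);
    // Print errors queued for the line just echoed, then the next line number.
    static void nextLine(std::ostream& out = std::cout);
    // Queue a message; it is printed once the current line is complete.
    static void appendError(ErrorCategory category, const std::string& text);
    // Hypothetical helper: print leftover errors and the summary; returns the count.
    static int lastLine(std::ostream& out = std::cout);

private:
    static void printNumber(std::ostream& out);
    static void flushErrors(std::ostream& out);
    static int lineNumber;
    static int totalErrors;
    static std::vector<std::string> pending;
};

int Listing::lineNumber = 0;
int Listing::totalErrors = 0;
std::vector<std::string> Listing::pending;

void Listing::printNumber(std::ostream& out) {
    out << std::setw(4) << lineNumber << "  ";
}

void Listing::flushErrors(std::ostream& out) {
    for (const std::string& message : pending)
        out << message << '\n';
    pending.clear();
}

void Listing::firstLine(std::ostream& out) {
    lineNumber = 1;
    totalErrors = 0;
    pending.clear();
    printNumber(out);
}

void Listing::nextLine(std::ostream& out) {
    flushErrors(out);   // the newline itself was already echoed by the scanner
    ++lineNumber;
    printNumber(out);
}

void Listing::appendError(ErrorCategory category, const std::string& text) {
    if (category == LEXICAL) {
        pending.push_back("Lexical Error, Invalid Character " + text);
        ++totalErrors;
    }
}

int Listing::lastLine(std::ostream& out) {
    flushErrors(out);
    out << "\nLexical Errors " << totalErrors << '\n';
    return totalErrors;
}
```

Under these assumptions, main would call Listing::firstLine() before the yylex loop and Listing::lastLine() after it. Errors are queued rather than printed immediately so that they appear after the complete source line, as in the sample listing. Note that ECHO writes through C stdio while this sketch writes through std::cout; the two interleave correctly only while the default stdio synchronization is left enabled.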
