Lab: Lexical Analysis - Tokenizer (use C++ or Python)

Objective
In this project we build a lexical analyzer that parses Jack programs according to the Jack grammar, producing an XML file that renders the program's structure using marked-up text. Write a lexical analyzer for the Jack language. Use it to parse all the .jack class files supplied below. For each input .jack file, your analyzer should generate an .xml output file. The generated files should be identical to the supplied compare files, up to white space.

Resources
Tools: the programming language with which you will implement your lexical analyzer, and any text-comparer utility. You may also want to inspect the generated and supplied output files visually, using some XML viewer. To do so, simply load these files into a web browser or text editor. Some of these tools, e.g. Chrome, are designed to display XML text nicely - give it a try.

Proposed Implementation - Tokenizer
Tokenizing, a basic service of any syntax analyzer, is the act of breaking a given textual input into a stream of tokens. While it is at it, the tokenizer can also classify the tokens into lexical categories. With that in mind, your first task is to implement, and test, the JackTokenizer module specified in chapter 10. Specifically, you have to develop (i) a Tokenizer implementation, and (ii) a test program that goes through a given input file (.jack file) and produces a stream of tokens using your Tokenizer implementation. Each token should be printed on a separate line, along with its classification: symbol, keyword, identifier, integer constant, or string constant. Note that in the case of string constants, the tokenizer throws away the double-quote characters. This behavior is intended, and is part of our tokenizer specification. Also note that four of the symbols used in the Jack language (<, >, ", and &) are also used for XML markup, and thus they cannot appear verbatim as XML data.
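One way to sketch such a tokenizer in Python is with a single regular expression that has one alternative per lexical category. This is only an illustrative design, not the supplied JackTokenizer API; the names KEYWORDS, TOKEN_RE, and tokenize are assumptions of this sketch:

```python
import re

# The 21 keywords of the Jack grammar (chapter 10).
KEYWORDS = {
    "class", "constructor", "function", "method", "field", "static",
    "var", "int", "char", "boolean", "void", "true", "false", "null",
    "this", "let", "do", "if", "else", "while", "return",
}

# One alternative per category; comments and string constants come
# first so they are consumed before their characters match as symbols.
TOKEN_RE = re.compile(r"""
      (?P<comment>//[^\n]*|/\*.*?\*/)      # line and block comments
    | (?P<string>"[^"\n]*")                # string constant, quotes included
    | (?P<integer>\d+)                     # integer constant
    | (?P<word>[A-Za-z_][A-Za-z0-9_]*)     # keyword or identifier
    | (?P<symbol>[{}()\[\].,;+\-*/&|<>=~]) # the Jack symbols
""", re.VERBOSE | re.DOTALL)

def tokenize(source):
    """Yield (category, value) pairs for a Jack source string."""
    for match in TOKEN_RE.finditer(source):
        kind = match.lastgroup
        text = match.group()
        if kind == "comment":
            continue  # comments produce no tokens
        if kind == "string":
            # Per the spec above, the double quotes are thrown away.
            yield "stringConstant", text[1:-1]
        elif kind == "integer":
            yield "integerConstant", text
        elif kind == "word":
            yield ("keyword" if text in KEYWORDS else "identifier"), text
        else:
            yield "symbol", text
```

For instance, `list(tokenize('let s = "Hello";'))` classifies `let` as a keyword, `s` as an identifier, `=` and `;` as symbols, and `Hello` (without its quotes) as a string constant.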
To solve the problem, and following convention, we require the tokenizer to output these tokens as &lt;, &gt;, &quot;, and &amp;, respectively. For example, in order for the symbol "less than" to be displayed properly in a web browser, it should be generated as "&lt;".

Tokenizer Testing
Test your tokenizer on the Square Dance and the TestArray programs. For each Xxx.jack source file, have your tokenizer test program give the output file the name XxxT.xml. Apply your tokenizer test to each class file in the test programs, then use the supplied TextComparer utility to compare the generated output to the supplied .xml compare files. Since the output files generated by your tokenizer test will have the same names and extensions as those of the supplied compare files, we suggest putting them in separate directories.
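The entity conversion above can be handled by the standard library rather than by hand. The following sketch (the helper names xml_token_line and write_token_file are hypothetical, not part of the supplied tools) renders one token per line and writes the XxxT.xml file, assuming the (category, value) token pairs described earlier:

```python
from xml.sax.saxutils import escape

def xml_token_line(category, value):
    """Render one token as an XML element, e.g. <symbol> &lt; </symbol>.

    escape() converts &, <, and > by default; the extra table maps the
    double quote to &quot; as the convention above requires.
    """
    return "<{0}> {1} </{0}>".format(category, escape(value, {'"': "&quot;"}))

def write_token_file(tokens, out_path):
    """Write a stream of (category, value) pairs as an XxxT.xml file."""
    with open(out_path, "w") as out:
        out.write("<tokens>\n")
        for category, value in tokens:
            out.write(xml_token_line(category, value) + "\n")
        out.write("</tokens>\n")
```

Generating the output this way means a browser will display `<` correctly, and a text comparison against the supplied compare files only has to contend with white-space differences.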
