Question: We want to build a tokenizer for simple expressions such as xpr = res = 3 + x_sum*11. Such expressions comprise only three tokens, as

We want to build a tokenizer for simple expressions such as xpr = "res = 3 + x_sum*11". Such expressions comprise only three tokens, as follows: (1) Integer literals:one or more digits e.g.,3 11; (2) Identifiers:strings starting with a letter or an underscore and followed by more letters, digits, or underscores e.g.,res x_sum; (3) Operators:= + * . Leading or trailing whitespace characters should be skipped.

(a) Write a regular expression patternfor the above grammar and use it with re.findallto split the expression into a list of lexemes. If usingxprabove, the list returned should be:

['res', '=', '3', '+', 'x_sum', '*', '11']

(b) The problem with the above is thatre.findall returns a list of all lexemes butnottheir respective tokens (as defined by the grammar). Modify your regex pattern so that each matched lexeme is in its own (regex) group. The list returned should now be as follows:

[('', 'res', ''), ('', '', '='), ('3', '', ''), ('', '', '+'),

('', 'x_sum', ''), ('', '', '*'), ('11', '', '')]

(c) To find which token each lexeme is associated with, we only need to find the first non-empty item in each tuple. Write a tokenize generator(usingre.findallandmap) that returns all pairs (tuples) of lexemes and tokens. The output oflist(tokenize(xpr))should thus be:

[('res', 'id'), ('=', 'op'), ('3', 'int'), ('+', 'op'),

('x_sum', 'id'), ('*', 'op'), ('11', 'int')]

(d) The above solution works fine if the number of tokens is small, but it will break down when the number increases. Better is to use a feature of the regular expression engine, which is that whenever it completes a matching group, it assigns the group number to an attribute of the match object, calledlastindex. Rewritetokenize to make use of this feature (usingre.matchand repeatedly scanning the match object) and still produce the same output as before.

(e) We can improve the approach further, using another feature which is the scanner()method of regular expressions: It creates a scanner object and attaches it to a string, keeps track of the current position, and moves forward after each successful match. Rewrite the tokenize generator to make use of this feature (usingscanner()and calling match()repeatedly, yielding lexeme and token pairs) and again produce the same output as before.

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer

Step: 1 Unlock blur-text-image

Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock

Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!

Using an Excel spreadsheet, show the calculation of your gain or loss for each security for the week. Make sure you show a dollar return calculation and a percentage return calculation. Be especially...

Python programming; pleasee help!! We want to build a tokenizer for simple expressions such as xpr = "res = 3 + x_sum*11". Such expressions comprise only three tokens, as follows: (1) Integer...

We want to build a tokenizer for simple expressions such as xpr = "res = 3 + x_sum*11". Such expressions comprise only three tokens, as follows: (1) Integer literals: one or more digits e.g., 3 11;...

Let A, B be sets. Define: (a) the Cartesian product (A B) (b) the set of relations R between A and B (c) the identity relation A on the set A [3 marks] Suppose S, T are relations between A and B, and...

A very high level introduction on Meredith, because I know she is going to go into a lot more detail. So, Meredith Best and I, we actually met during our MBA program. We were randomly assigned in a...

Digital twins are a combination of data analysis, CAD modelling, Finite Element Analysis, and IoT (Internet of Things) technologies. This technologies are all elements of the 4th Industrial...

doesn't matter how it's solved. To enter or not to enter trades and hope t5 buld on that knctitdee and ins buiness degree. You want to go into the Here are the facts and fiamne (rhinh pod are to...

Coding Language c++ Huffman Encoding Using the Huffman encoding algorithm as explained in class, encode and decode the Speech.txt file using frequency tree and priority queue. Implement Huffman style...

Playgrounds and Performance: Results Management at KaBOOM! (A) We do this work because we want to make a difference in the world; how can we go further faster? - Darell Hammond, CEO and co-founder,...

Good communication is just as stimulating as black coffee and just as hard to sleep after. - Anne Morrow Lindbergh In May 2021, David Black, CEO of Blackbox, ended his Zoom call with a sense of...

What is required from the management team? What are the risks of the venture and what can you do to reduce them? How to Start a Car Rental Business The car rental business can be lucrative, but you...

The negative income tax has been proposed as a means of increasing both the efficiency and the equity of Canada's tax system (see Applying Economic Concepts 18-1 on page 465). The most basic NIT can...

In 1999, American Golf Corporation (AGC) hired Heye for a job in the pro shop at the Paradise Hills Golf Course, a club that it managed. AGC gave Heye a number of documents, including the Co-Worker...

the cash - to - cash cycle can be calculated from a firm's financial statements. What is the interpretation of the number of days of the resulting calculation

Seved Help 14 Wisconsin Snowmobile Corp. is considering a switch to level production Cost efficiencies would occur under level production, and aftertax costs would decline by $31,500, but inventory...

c. How will members be allowed to rotate out of the team if they so desire?

Do you currently have a team agreement?

How will the members be held accountable?