Question:

Algorithm Tokenize(string s):
    list(string) tokens := [ ]
    while not isEmpty(s):
        If s begins with a token, remove the longest possible token from the beginning of s and push that token onto the back of tokens
        If head(s) is a whitespace character, pop(s)
    return tokens

For example, below is a trace of the call tokenize("hello, world 3+1")

# it | tokens                            | s
-----+-----------------------------------+--------------------
  0  | []                                | "hello, world 3+1"
  1  | ["hello"]                         | ", world 3+1"
  2  | ["hello",","]                     | " world 3+1"
  3  | ["hello",","]                     | "world 3+1"
  4  | ["hello",",","world"]             | " 3+1"
  5  | ["hello",",","world"]             | "3+1"
  6  | ["hello",",","world","3"]         | "+1"
  7  | ["hello",",","world","3","+"]     | "1"
  8  | ["hello",",","world","3","+","1"] | ""

Homework:

Implement Tokenize in Python. That is, write a Python function called tokenize such that if s is a string for which Tokenize(s) is defined, then Tokenize(s) == tokenize(s).

Test cases:

Note the following:

tokenize("hello world")      => ["hello", "world"]
tokenize("hello'world'foo")  => ["hello", "'world'", "foo"]
                                (equivalently printed as ["hello", "\'world\'", "foo"]
                                 or ['hello', '\'world\'', 'foo'])
tokenize("'hello\\'world'")  => ["'hello\\'world'"]
tokenize("'hello world'")    => ["'hello world'"]
tokenize("3.33(33..")        -- not in test suite
tokenize("\\")               => ["\\"]
tokenize("\t")               => [ ]
tokenize("'a'e'c'")          => ["'a'", "e", "'c'"]
tokenize("3.3+1")            => ["3.3","+","1"]
tokenize("''")               => ["''"]

For example, you could enter into repl.it or IDLE,

>>> tokenize("'hello\\'world'") == ["'hello\\'world'"]
True
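
If you want to run all of the listed cases at once rather than typing them into the REPL, a small driver like the sketch below works; the module name tokenize_hw is only a placeholder for whatever .py file you submit, and the expected lists are copied from the test cases above (the "3.33(33.." case is omitted because it is not in the test suite).

# A minimal sketch of a test driver for the cases listed above.
# tokenize_hw is a placeholder module name; use the name of the file you submit.
from tokenize_hw import tokenize

CASES = [
    ("hello world",       ["hello", "world"]),
    ("hello'world'foo",   ["hello", "'world'", "foo"]),
    ("'hello\\'world'",   ["'hello\\'world'"]),
    ("'hello world'",     ["'hello world'"]),
    ("\\",                ["\\"]),
    ("\t",                []),
    ("'a'e'c'",           ["'a'", "e", "'c'"]),
    ("3.3+1",             ["3.3", "+", "1"]),
    ("''",                ["''"]),
]

for s, expected in CASES:
    got = tokenize(s)
    status = "ok" if got == expected else "FAIL"
    print(f"{status}: tokenize({s!r}) = {got!r}, expected {expected!r}")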

Send your tokenizer as an attached .py file

EXAMPLE CODE

The Python program below implements Tokenize. In this exercise, we assume the input is correct (that is, that it consists of tokens, whitespace, and comments). This is an oversimplification of the problem of tokenizing real-world text, but it is still not easy.

# whiteChar(c) iff c is a whitespace character.
def whiteChar(c): return (c in " \t\v\n\r")

# tokenize(s) = Tokenize(s), whenever Tokenize(s) is defined
def tokenize(s):
    i = 0
    tokens = [ ]
    # invariants: 0 <= i <= len(s), tokens = Tokenize(s[:i])
    while i < len(s):
        if whiteChar(s[i]):
            i = i+1
        elif i < len(s)-1 and s[i:i+2] == "//":
            # skip the comment until the next return or newline character
            i = i+2
            while i < len(s) and s[i] not in "\n\r":
                i = i+1
        else:
            # process the longest possible token
            tok = munch(s,i)
            tokens.append(tok)
            i = i + len(tok)
    return tokens

# If 0 <= i < len(s) and s[i:] begins with a token, then munch(s,i)
# is the longest token that is a prefix of s[i:]
def munch(s,i):
    A,j = 'em',i
    # invariants: i <= j <= len(s), A is the state that describes s[i:j]
    while True:
        if j == len(s): break  # end of string
        A = newState(A,s[j])
        # A is now the state that *would* result if we process
        # one more character.
        if A == 'err': break
        # A is not 'err', so good with one more character
        j = j+1
    return s[i:j]
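
As a quick illustration of maximal munch (the expected results here are worked out from the newState table defined just below, not taken from the assignment):

# Illustration only; run after newState below has been defined.
print(munch("hello, world", 0))   # prints hello   (',' drives 'id' to 'err')
print(munch("hello, world", 7))   # prints world
print(munch("3abc", 0))           # prints 3       ('num' does not accept letters)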

'''
A *state* is a string. States *describe* strings as given below:

1. 'em' describes str iff str is empty
2. 'id' describes str iff str is an identifier
3. 'num' describes str iff str is a numeral
4. 'err' describes str iff str is not a prefix of any token
'''

# If state A describes a string str and c is a character, then newState(A,c)
# describes the string str+c.
def newState(A,c):
    if A=='em':
        if c.isalpha(): return 'id'
        elif c.isdigit(): return 'num'
    elif A=='id':
        if (c.isalpha() or c.isdigit()): return 'id'
    elif A == 'num':
        if c.isdigit(): return 'num'
    return 'err'

# END PROGRAM

Below is a transcript of a test session in repl.it with the program above:

Python 3.5.2 (default, Dec 2015, 13:05:11) [GCC 4.8.2] on linux

tokenize("hello world") => ['hello', 'world']

tokenize("hello //comment world//") => ['hello', 'world']

tokenize("hello you little 8// com") => ['hello', 'you', 'little', '8']

tokenize("duh 8 // comment hi3") => ['duh', '8', 'hi3']
