The Applied Sciences team is seeking to create a sub-word tokeniser to preprocess free-text fields...
Question:
The Applied Sciences team is seeking to create a sub-word tokeniser to preprocess free-text fields for downstream analysis. You've been asked to write a function tokenise that does the following:

1. Accepts two arguments, raw_text and window_size, and returns a list of tuples, where the tuple contains the sub-word tokens for each word in raw_text
2. Splits raw_text into individual words on spaces and punctuation (except apostrophes)
3. Appends angle brackets to either end of a word to delimit the start and end of the word (e.g. <word>)
4. Creates sub-word tokens of window_size length. If a word (including the angle brackets) is shorter than window_size, then no sub-word tokens other than the special token (step 5) should be generated.
5. Appends a single, special token at the end of the sub-word token list containing the entire word with angle brackets

NB. The implementation and partial solutions will be assessed too. Please don't panic if you can't pass all of the test cases!

Example

    ### Running your function
    >>> tokenise(raw_text="hello, world", window_size=3)

    ### Returns
    ### NOTE: the output below has been formatted for readability.
    ### Your function just needs to output the list of tuples
    [
        ('<he', 'hel', 'ell', 'llo', 'lo>', '<hello>'),
        ('<wo', 'wor', 'orl', 'rld', 'ld>', '<world>')
    ]

    # Complete the 'tokenise' function below.
    #
    # The type signatures have been completed for you
    # You may use helper functions to modularise your code

    def tokenise(raw_text: str, window_size: int) -> List[Tuple[str]]:
        # Write your code here

    if __name__ == '__main__':
Expert Answer:
To create the tokenise function based on the requirements you've pro...
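Since the answer text is cut off, here is a minimal sketch of one way to meet the five requirements in the question. It is not the original answer. The helper names split_words and subword_tokens are made up for illustration, and several choices are assumptions: only ASCII punctuation from string.punctuation counts as a separator, case is preserved, a non-positive window_size raises ValueError, and the return annotation uses Tuple[str, ...] rather than the stub's Tuple[str] because the tuples vary in length.

```python
import re
import string
from typing import List, Tuple

# Assumption: "punctuation" means ASCII punctuation; the apostrophe is
# removed from the separator set so it stays inside words.
_SEPARATORS = string.punctuation.replace("'", "")
_SPLIT_PATTERN = re.compile("[" + re.escape(_SEPARATORS) + r"\s]+")


def split_words(raw_text: str) -> List[str]:
    """Step 2: split on whitespace and punctuation (except apostrophes)."""
    # Splitting can leave empty strings at the edges, so filter them out.
    return [word for word in _SPLIT_PATTERN.split(raw_text) if word]


def subword_tokens(word: str, window_size: int) -> Tuple[str, ...]:
    """Steps 3 to 5: wrap the word, slide a window over it, add the special token."""
    wrapped = f"<{word}>"
    # If wrapped is shorter than window_size the range is empty, so only
    # the special token is produced, as step 4 requires.
    windows = [
        wrapped[start:start + window_size]
        for start in range(len(wrapped) - window_size + 1)
    ]
    return tuple(windows) + (wrapped,)


def tokenise(raw_text: str, window_size: int) -> List[Tuple[str, ...]]:
    """Step 1: return one tuple of sub-word tokens per word in raw_text."""
    if window_size < 1:
        # Assumption: the question does not say how to treat this case.
        raise ValueError("window_size must be a positive integer")
    return [subword_tokens(word, window_size) for word in split_words(raw_text)]


if __name__ == "__main__":
    print(tokenise(raw_text="hello, world", window_size=3))
```

One edge case to be aware of: when the bracketed word is exactly window_size characters long, the single window equals the special token, so that token appears twice. The question only excludes words that are shorter than window_size, so the sketch follows the wording literally.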