Question: Define build_ds(file) in python that reads the file and builds the datastructure which is needed for further processing. The datastructure returned should be a 2D

Define build_ds(file) in python that reads the file and builds the datastructure which is needed for further processing. The datastructure returned should be a 2D matrix (list of lists), where each sentence of the text is a row of the matrix, and each word of the line - an element of the row. Any text that is not composed of letters-only (e.g. numbers, punctuations marks) should be excluded (use the function with re module which takes a string and returns a boolean - whether it consists of letters only (both lower and upper case).). The only sentence border is the newline.

sample.txt:

human language is filled with ambiguities .

usage exceptions , variations in sentence structurethese just a few of the irregularities of human language that take humans years to learn .

but that programmers must teach natural language-driven applications to recognize and understand accurately from the start, if those applications are going to be useful.

The output should like this:

ds = build_ds (" sample .txt ")

ds [:2]

Out [1]: [[human , language , is , filled , with , ambiguities ], [ usage , exceptions , variations , in , sentence , 'just ', 'a ', ... , to ' , learn ]]

#Task

def build_ds(file): """ Read a file. Build a datastructure - 2D matrix (list of lists). Rows should be sentences, cells should contain the words. :param file: a txt file where the text is all lowercase and sentence border punctuation marks have spaces around them :rtype: 2D list """

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!