Question:

Write a function score_document(document, lang_counts=default_lang_counts) that takes a document name as a string and lang_counts, a dictionary of dictionaries containing normalised language counts. It should return a dictionary of scores, one for each language in lang_counts, obtained by taking a 'dot product' of the document's trigram counts with the normalised language counts. That is, for each language, multiply each trigram's count in the document by that trigram's count in lang_counts and sum the products. If a trigram from the document is not in the dictionary for a given language, treat its count for that language as zero.
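To make the 'dot product' concrete, here is a tiny worked example for a single language. The trigram counts and weights are invented for illustration only, and the helper name dot is hypothetical; it is not part of the provided stub.

```python
# Toy illustration of the sparse 'dot product' between two count dictionaries.
# All numbers are made up for the example; they are not from the real data.
doc_counts = {"the": 3, "he ": 2, "xyz": 1}   # trigram counts from a document
english = {"the": 0.5, "he ": 0.25}           # normalised counts for one language


def dot(doc_counts, lang_model):
    # A trigram missing from lang_model contributes zero via .get(..., 0).
    return sum(count * lang_model.get(trigram, 0)
               for trigram, count in doc_counts.items())


score = dot(doc_counts, english)
print(score)  # 3*0.5 + 2*0.25 + 1*0 = 2.0
```

The trigram "xyz" appears in the document but not in the language dictionary, so it adds nothing to the score.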

We have provided a stub of code which trains the classifier for you. We have also provided train_classifier(training_set) in a hidden library.

There are also two files included, visible in the tabs at top right. These are en_163083.txt, written in English, and de_1231811.txt, written in German, and can be loaded and used to test your function, which should behave as follows:

>>> test1 = 'en_163083.txt' 
>>> d = score_document(test1) 
>>> d['Vietnamese'] 
9.427325768357315 
>>> max([(v, n) for (n, v) in d.items()]) 
(21.428216914833023, 'English') 
>>> test2 = 'de_1231811.txt' 
>>> d = score_document(test2) 
>>> d['Polish'] 
7.710346556417009 
>>> max([(v,n) for (n, v) in d.items()]) 
(53.12937809633241, 'German') 

How can I code this in Python?
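One possible approach is sketched below. It makes several assumptions: that trigrams are overlapping three-character windows over the raw file text, that the file is UTF-8, and that default_lang_counts is produced by the provided stub (here it is replaced by a small invented placeholder so the sketch runs on its own). The helper count_trigrams is hypothetical; if the course supplies its own trigram counter, use that instead, since the exact scores shown in the question depend on it and on the hidden training data.

```python
from collections import Counter

# Placeholder so the sketch is self-contained. In the exercise this value is
# assumed to come from the provided stub that calls train_classifier(training_set).
default_lang_counts = {
    "English": {"the": 0.5, "he ": 0.3, "and": 0.2},
    "German": {"der": 0.4, "ein": 0.3, "sch": 0.3},
}


def count_trigrams(text):
    """Count overlapping character trigrams in text.

    Hypothetical helper: the course's own trigram counting may differ
    (for example in case folding or whitespace handling).
    """
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def score_document(document, lang_counts=default_lang_counts):
    """Return a dict mapping each language to the dot product of the
    document's trigram counts with that language's normalised counts."""
    with open(document, encoding="utf-8") as f:
        doc_counts = count_trigrams(f.read())

    scores = {}
    for lang, model in lang_counts.items():
        # Trigrams absent from this language's dictionary count as zero.
        scores[lang] = sum(n * model.get(tri, 0)
                           for tri, n in doc_counts.items())
    return scores
```

With the real default_lang_counts in place, the function would be called as in the question, for example d = score_document('en_163083.txt'), and the best-scoring language can be read off with max((v, n) for n, v in d.items()).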
