Question: In Python with comments please, thanks!

Our authorship attribution system will be able to perform authorship attribution based on a selection of sample documents from a range of authors, and a document of unknown origin.

You will be given a selection of sample documents from a range of authors (from which we will learn our word frequency dictionaries), and a document of unknown origin. Given these, you need to return a list of authors in ascending order of out-of-place distance between the document of unknown origin and the combined set of documents from each of the authors. You should do this according to the following steps:

- compute a single dictionary of word frequencies for each author based on the combined set of documents from that author (provided in the form of a list of strings)

- compute a dictionary of word frequencies for the document of unknown origin

- compare the document of unknown origin with the combined works of each author, based on the out-of-place distance metric

- calculate and return a ranking of authors, from most similar (smallest distance) to least similar (greatest distance), resolving any ties in the ranking based on an alphabetic sort
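The first two steps can be sketched as follows. The question does not say how a document is split into words, so the `tokenize` helper below is a hypothetical choice: it is case-sensitive, keeps apostrophes inside a word, and counts each punctuation mark as its own token. `combined_word_freq` is likewise an illustrative name, not one given in the task.

```python
import re
from collections import Counter


def tokenize(text):
    """Hypothetical tokenizer (an assumption, not specified in the question).

    Words keep internal apostrophes ("Day's"); every other punctuation
    character becomes a separate token; case is preserved.
    """
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", text)


def combined_word_freq(documents):
    """Return a single {token: count} dictionary for a list of documents."""
    counts = Counter()
    for document in documents:
        counts.update(tokenize(document))
    return dict(counts)


# Step 1: one dictionary per author, built from all of that author's documents.
tim_freq = combined_word_freq([
    'One One was a racehorse; Two Two was one too',
    'How much wood could a woodchuck chuck',
])

# Step 2: the unknown document is treated as a one-element list.
unknown_freq = combined_word_freq(['She sells sea shells on the seashore'])

print(tim_freq['One'], tim_freq['one'], tim_freq['a'])  # 2 1 2
```

Because the counts are case-sensitive here, 'One' and 'one' are separate entries; a different tokenization rule would change the resulting distances.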

Write a function authattr_authorpred(authordict, unknown, maxrank) that takes three arguments:

authordict: a dictionary of authors (each of which is a str), associated with a non-empty list of documents (each of which is a str)

unknown: a str containing the document of unknown origin

maxrank: the positive int value to set maxrank to in the call to authattr_oop

and returns a list of (author, oop) tuples, where author is the name of an author from authordict, and oop is the out-of-place distance between unknown and the combined works of author, in the form of a float.
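A minimal sketch of the whole function is shown here. The question refers to `authattr_oop` but does not define it (nor a word-counting helper), so the versions of `authattr_worddict`, `_ranks` and `authattr_oop` below are stand-ins built on assumptions: tokens are case-sensitive with punctuation marks counted separately, tied tokens share the mean of the rank positions they occupy, only ranks up to `maxrank` are kept, and a token missing from one dictionary is given rank `maxrank` there. If your own helpers follow different rules, substitute them and keep only `authattr_authorpred`.

```python
import re
from collections import Counter


def authattr_worddict(doclist):
    """Stand-in helper (assumed): token counts over every document in doclist."""
    counts = Counter()
    for doc in doclist:
        # Assumed tokenization: case-sensitive words with internal apostrophes,
        # and each punctuation mark as its own token.
        counts.update(re.findall(r"\w+(?:'\w+)*|[^\w\s]", doc))
    return dict(counts)


def _ranks(dictfreq, maxrank):
    """Stand-in helper (assumed): map tokens to ranks, most frequent first.

    Tokens with equal frequency share the mean of their 1-based positions;
    tokens whose rank exceeds maxrank are left out.
    """
    ordered = sorted(dictfreq, key=lambda w: (-dictfreq[w], w))
    ranks = {}
    start = 0
    while start < len(ordered):
        end = start
        # Extend the group while the next token has the same frequency.
        while (end + 1 < len(ordered)
               and dictfreq[ordered[end + 1]] == dictfreq[ordered[start]]):
            end += 1
        mean_rank = ((start + 1) + (end + 1)) / 2
        if mean_rank <= maxrank:
            for word in ordered[start:end + 1]:
                ranks[word] = mean_rank
        start = end + 1
    return ranks


def authattr_oop(dictfreq1, dictfreq2, maxrank):
    """Stand-in helper (assumed): sum of absolute rank differences."""
    ranks1 = _ranks(dictfreq1, maxrank)
    ranks2 = _ranks(dictfreq2, maxrank)
    total = 0.0
    for word in set(ranks1) | set(ranks2):
        # A token absent from one ranking is treated as having rank maxrank.
        total += abs(ranks1.get(word, maxrank) - ranks2.get(word, maxrank))
    return total


def authattr_authorpred(authordict, unknown, maxrank):
    """Return (author, oop) tuples, closest author first, ties by author name."""
    # Frequencies for the unknown document are computed once and reused.
    unknown_freq = authattr_worddict([unknown])
    scored = []
    for author, docs in authordict.items():
        # One combined dictionary per author.
        author_freq = authattr_worddict(docs)
        distance = float(authattr_oop(unknown_freq, author_freq, maxrank))
        scored.append((author, distance))
    # Sort by distance first, then alphabetically by author to break ties.
    return sorted(scored, key=lambda pair: (pair[1], pair[0]))


print(authattr_authorpred(
    {'Beatles': ['Hey Jude', 'The Fool on the Hill', "A Hard Day's Night", "Yesterday"],
     'Rolling Stones': ["(I Can't Get No) Satisfation", 'Ruby Tuesday', 'Paint it Black']},
    'Eleanor Rigby', 15))
# [('Beatles', 129.0), ('Rolling Stones', 129.0)]
```

Sorting on the key `(distance, author)` handles both requirements at once: the ranking is ascending by distance, and authors with equal distance fall back to alphabetical order.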

For example:

>>> authattr_authorpred({'tim': ['One One was a racehorse; Two Two was one too', 'How much wood could a woodchuck chuck'], 'einstein': ['Unthinking respect for authority is the greatest enemy of truth.', 'Not everything that can be counted counts, and not everything that counts can be counted.']}, 'She sells sea shells on the seashore', 20)
[('tim', 287.0), ('einstein', 290.0)]
>>> authattr_authorpred({'Beatles': ['Hey Jude', 'The Fool on the Hill', "A Hard Day's Night", "Yesterday"], 'Rolling Stones': ["(I Can't Get No) Satisfation", 'Ruby Tuesday', 'Paint it Black']}, 'Eleanor Rigby', 15)
[('Beatles', 129.0), ('Rolling Stones', 129.0)]
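The tie in the second example can be reproduced by hand under one reading of the metric. This is an assumption, since the question does not define out-of-place distance: tied tokens share the mean of their rank positions, a token absent from the other dictionary is compared against rank `maxrank`, and punctuation marks such as `(` and `)` count as separate tokens. Under that reading both authors have 12 tokens that each occur once, and neither shares a token with 'Eleanor Rigby'.

```python
# Worked arithmetic for the second example under the assumptions stated above.
maxrank = 15
author_tokens = 12   # Beatles: Hey, Jude, The, Fool, on, the, Hill, A, Hard, Day's, Night, Yesterday
                     # Rolling Stones: (, I, Can't, Get, No, ), Satisfation, Ruby, Tuesday, Paint, it, Black
unknown_tokens = 2   # Eleanor, Rigby

# Every token occurs once, so all tokens in a dictionary tie and share
# the mean of positions 1..n.
author_rank = (1 + author_tokens) / 2     # 6.5
unknown_rank = (1 + unknown_tokens) / 2   # 1.5

# No token appears in both dictionaries, so each is compared against maxrank.
distance = (author_tokens * abs(author_rank - maxrank)
            + unknown_tokens * abs(unknown_rank - maxrank))
print(distance)  # 129.0
```

With equal distances of 129.0, the alphabetic tie-break places 'Beatles' before 'Rolling Stones'.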
