Question: In Python with comments please, thanks!
Our authorship attribution system will be able to perform authorship attribution based on a selection of sample documents from a range of authors, and a document of unknown origin.
You will be given a selection of sample documents from a range of authors (from which we will learn our word frequency dictionaries), and a document of unknown origin. Given these, you need to return a list of authors in ascending order of out-of-place distance between the document of unknown origin and the combined set of documents from each of the authors. You should do this according to the following steps:
- compute a single dictionary of word frequencies for each author based on the combined set of documents from that author (provided in the form of a list of strings)
- compute a dictionary of word frequencies for the document of unknown origin
- compare the document of unknown origin with the combined works of each author, based on the out-of-place distance metric
- calculate and return a ranking of authors, from most similar (smallest distance) to least similar (greatest distance), resolving any ties in the ranking based on an alphabetic sort
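The first two steps above might be sketched as follows. The tokenisation rule is not given in this question, so splitting on whitespace and keeping case is an assumption, and `word_freq` and `combined_word_freq` are hypothetical helper names rather than part of the required interface.

```python
from collections import Counter


def word_freq(doc):
    """Return a dict mapping each word in doc to its number of occurrences.

    Assumption: words are whitespace-separated and case-sensitive.
    """
    return dict(Counter(doc.split()))


def combined_word_freq(docs):
    """Return a single word-frequency dict for a list of documents."""
    total = Counter()
    for doc in docs:
        # Tokenise each document on its own so that the last word of one
        # document never merges with the first word of the next.
        total.update(doc.split())
    return dict(total)
```

If the assignment defines its own word-splitting rules (for example lowercasing or stripping punctuation), only the `doc.split()` calls would need to change.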
Write a function authattr_authorpred(authordict, unknown, maxrank) that takes three arguments:
- authordict: a dictionary of authors (each of which is a str), each associated with a non-empty list of documents (each of which is a str)
- unknown: a str containing the document of unknown origin
- maxrank: the positive int value to set maxrank to in the call to authattr_oop
and returns a list of (author, oop) tuples, where author is the name of an author from authordict, and oop is the out-of-place distance between unknown and the combined works of author, in the form of a float.
For example:
>>> authattr_authorpred({'tim': ['One One was a racehorse; Two Two was one too', 'How much wood could a woodchuck chuck'], 'einstein': ['Unthinking respect for authority is the greatest enemy of truth.', 'Not everything that can be counted counts, and not everything that counts can be counted.']}, 'She sells sea shells on the seashore', 20)
[('tim', 287.0), ('einstein', 290.0)]
>>> authattr_authorpred({'Beatles': ['Hey Jude', 'The Fool on the Hill', "A Hard Day's Night", "Yesterday"], 'Rolling Stones': ["(I Can't Get No) Satisfation", 'Ruby Tuesday', 'Paint it Black']}, 'Eleanor Rigby', 15)
[('Beatles', 129.0), ('Rolling Stones', 129.0)]
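A minimal sketch of one way to structure authattr_authorpred is shown below. The definition of authattr_oop is not included in this question, so the version here is only a stand-in: it assumes words are ranked by descending frequency with alphabetical tie-breaking, that a word ranked in both profiles contributes the absolute difference of its ranks, and that a word ranked in only one contributes maxrank. The helpers `word_freq` and `ranks` are hypothetical, as is the whitespace tokenisation. Because of these assumptions, the distances it produces may differ from the values in the examples above; the part that follows directly from the question is the final sort by distance and then by author name.

```python
from collections import Counter


def word_freq(docs):
    """Word counts over a list of documents (whitespace split is an assumption)."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.split())
    return dict(counts)


def ranks(freqs, maxrank):
    """Map the maxrank most frequent words to ranks 1..maxrank.

    Assumption: ties in frequency are broken alphabetically.
    """
    ordered = sorted(freqs, key=lambda w: (-freqs[w], w))
    return {w: r for r, w in enumerate(ordered[:maxrank], start=1)}


def authattr_oop(freqs1, freqs2, maxrank):
    """Stand-in out-of-place distance, NOT the assignment's own definition."""
    r1, r2 = ranks(freqs1, maxrank), ranks(freqs2, maxrank)
    total = 0
    for word in set(r1) | set(r2):
        if word in r1 and word in r2:
            # Word is ranked in both profiles: add the rank difference.
            total += abs(r1[word] - r2[word])
        else:
            # Word is ranked in only one profile: add the maximum penalty.
            total += maxrank
    return float(total)


def authattr_authorpred(authordict, unknown, maxrank):
    """Return (author, oop) tuples, most similar author first."""
    unknown_freqs = word_freq([unknown])
    results = []
    for author, docs in authordict.items():
        # One combined frequency dict per author, compared with the unknown.
        oop = authattr_oop(word_freq(docs), unknown_freqs, maxrank)
        results.append((author, oop))
    # Sort by distance first, then by author name to resolve ties.
    return sorted(results, key=lambda pair: (pair[1], pair[0]))
```

Sorting on the tuple `(distance, author)` handles the tie-breaking rule in a single pass. Note that Python compares strings by code point, so uppercase names sort before lowercase ones.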
