Question: In python with comments please, thanks! Our authorship attribution system will be able to perform authorship attribution based on a selection of sample documents from

In python with comments please, thanks!

Our authorship attribution system will be able to perform authorship attribution based on a selection of sample documents from a range of authors, and a document of unknown origin.

You will be given a selection of sample documents from a range of authors (from which we will learn our word frequency dictionaries), and a document of unknown origin. Given these, you need to return a list of authors in ascending order of out-of-place distance between the document of unknown origin and the combined set of documents from each of the authors. You should do this according to the following steps:

- compute a single dictionary of word frequencies for each author based on the combined set of documents from that author (provided in the form of a list of strings)

- compute a dictionary of word frequencies for the document of unknown origin

- compare the document of unknown origin with the combined works of each author, based on the out-of-place distance metric

- calculate and return a ranking of authors, from most similar (smallest distance) to least similar (greatest distance), resolving any ties in the ranking based on an alphabetic sort

Write a function authattr_authorpred(authordict, unknown, maxrank) that takes three arguments:

authordict: a dictionary of authors (each of which is a str), associated with a non-empty list of documents (each of which is a str)

unknown: a str contained the document of unknown origin

maxrank: the positive int value to set maxrank to in the call to authattr_oop

and returns a list of (author, oop) tuples, where author is the name of an author from authordict, and oop is the out-of-place distance between unknown and the combined works of author, in the form of a float.

For example:

>>> authattr_authorpred({'tim': ['One One was a racehorse; Two Two was one too', 'How much wood could a woodchuck chuck'], 'einstein': ['Unthinking respect for authority is the greatest enemy of truth.', 'Not everything that can be counted counts, and not everything that counts can be counted.']}, 'She sells sea shells on the seashore', 20) [('tim', 287.0), ('einstein', 290.0)] >>> authattr_authorpred({'Beatles': ['Hey Jude', 'The Fool on the Hill', "A Hard Day's Night", "Yesterday"], 'Rolling Stones': ["(I Can't Get No) Satisfation", 'Ruby Tuesday', 'Paint it Black']}, 'Eleanor Rigby', 15) [('Beatles', 129.0), ('Rolling Stones', 129.0)]

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer

Step: 1 Unlock blur-text-image

Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock

Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!

Case Study - Boutique Build Australia Boutique Build Australia Pty Ltd is a boutique building company based in Sydney that specialises in the design and build of high quality designer homes for the...

Benchmarking: An International Journal Impact of e-procurement on procurement practices and performance Gioconda Quesada Marvin E. Gonzlez James Mueller Rene Mueller Downloaded by UNIVERSITY OF...

Business memorelating to Audit. All questions and materials needed are attached. Please, ensure you are familiar with auditing before accepting this project. Thank you. THE ACCOUNTING REVIEW Vol. 90,...

Question: How does the unit demonstrate/guide ways in which bilingual students and teachers approach the translanguage develop language, make room for bilingual ways of knowing, and foster more...

Python and most Python libraries are free to download or use, though many users use Python through a paid service. Paid services help IT organizations manage the risks associated with the use of...

Question: What action steps are taken to ensure that translingualism unit designs and assessments promote content to build on their bilingualism, promote stronger social-emotional identity, and work...

Asia Pacific Educ. Rev. (2012) 13:219-226 DOI 10.1007/s12564-011-9190-9 Performance and duration differences between online and paper-pencil tests Alper Bayazit Petek As kar Received: 10 September...

Assignment USE THE HARVARD REFERENCING SYSTEM THROUGHOUT THE REPORT Problem 1 (25 pts) The cost of 14 models of digitals cameras at a camera specialty store during 2014 was as follows: 340 450 450...

Please help!! Use "You Can Lead a Horse to Water but You Can't Make Him Edit: Varied Effects of Feedback on Grammar Across Upper-Division Business Students" (Located Below the Page) Found directly...

Some businesses try to overcome the agency problem referred to in the chapter by using an incentive pay plan that is based on the growth of profits over a period. What are the drawbacks of this type...

What causes Multiple Sclerosis?

Which of the following is not one of the pitfalls of a low - cort provider strategh? Failing to emphaske averues of coot advantage that can be kept propietary or that are uery coully and / or time -...

Patel Service Co. does make a few sales on account but is mostly a cash business. Consequently, it uses the direct write-off method to account for uncollectible accounts. During Year 1, Patel Service...

13. What are the major forms of household income? Contrast the wage and salary share to the profit share in terms of relative size. Distinguish between a durable consumer good and a nondurable...

14. What are the three major legal forms of business enterprises? Which form is the most prevalent in terms of numbers? Which form is dominant in terms of total sales revenues? LO5

2 What supply is and what affects it.