One interesting application of statistics is in trying to identify who wrote important historical works that were published under pseudonyms. A classic paper on this topic is “On Sentence-Length as a Statistical Characteristic of Style in Prose: With Application to Two Cases of Disputed Authorship” by G. Udny Yule, published in the journal Biometrika (Vol. 30, No. 3/4, Jan. 1939, pp. 363–390). In that paper, Yule identified the length of sentences as a feature that tends to remain consistent across written works by the same author. For this project, you are going to figure out how to estimate the distribution of sentence lengths for a book of your choosing. Find a book that is mainly text and relatively uncluttered with pictures, etc. Choose a random sample of 20 sentences from throughout the book and count the number of words in each sentence. Do this twice, using two different sampling methods chosen from simple random sampling, stratified sampling, cluster sampling, or systematic sampling. For each sample, make a table showing how many sentences there were of each length (two words, three words, etc.).
a. Explain exactly how you chose your samples.
b. Explain which of your two methods was easier to use.
c. Of the methods you did not use, explain which of them would have been the most difficult to use.
d. Do you think either of your methods produced biased results? Explain.
e. Report your results for each of the two methods. Are they generally in agreement with each other?
f. Do you think that average sentence length would be as good an indicator of authorship as listing all of the different sentence lengths and the proportion of sentences having each length? Explain.
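
If the book is available as electronic text, the sampling and tabulation steps can be sketched in code. The following is only a minimal illustration, assuming the sentences have already been split into a Python list in book order. The function names and the placeholder data are hypothetical, and only two of the four sampling methods (simple random and systematic) are shown.

```python
import random
from collections import Counter


def simple_random_sample(sentences, n, rng):
    """Draw n sentences so that every sentence is equally likely to be chosen."""
    return rng.sample(sentences, n)


def systematic_sample(sentences, n, rng):
    """Pick a random starting point, then take every k-th sentence after it."""
    k = len(sentences) // n      # sampling interval (assumes len(sentences) >= n)
    start = rng.randrange(k)     # random start within the first interval
    return [sentences[start + i * k] for i in range(n)]


def length_table(sample):
    """Map each sentence length (in words) to the number of sampled sentences with that length."""
    counts = Counter(len(sentence.split()) for sentence in sample)
    return dict(sorted(counts.items()))


# Placeholder data standing in for a real book: 600 made-up "sentences"
# whose lengths cycle from 2 to 31 words. Replace with real sentences.
sentences = [" ".join(["word"] * (2 + i % 30)) for i in range(600)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
for name, sampler in [("simple random", simple_random_sample),
                      ("systematic", systematic_sample)]:
    table = length_table(sampler(sentences, 20, rng))
    print(name, table)
```

Counting words by splitting on whitespace is a simplification. Hyphenated words, numbers, and abbreviations may need a rule of your own, and whatever rule you choose should be applied the same way to both samples.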
