Question: 1. Summarizing categorical data - Frequency distributions

1. Summarizing categorical data - Frequency distributions

A corpus is a technical term for a collection of texts used to analyze a language and verify its linguistic properties. The first modern, computer-readable corpus was the Brown Corpus of Standard American English, compiled by Henry Kucera and W. Nelson Francis of Brown University. The Brown Corpus draws from American English texts printed in 1961 and was for many years a widely cited resource in computational linguistics.

The five most frequently occurring words in the Brown Corpus are "the", "of", "and", "to", and "a". Consider a data set consisting of all occurrences of these words in the Corpus. The values of the variable named Word are "the", "of", "and", "to", and "a", so Word is a nominal variable with five classes.

Frequency and relative frequency distributions are constructed to summarize the data. They are shown in the table that follows, but the table is incomplete. Complete the missing entries in the table.

Table 1

Word     Frequency (Thousands of occurrences)     Relative Frequency
the      70.0                                     0.3794
of       36.4                                     ____
and      ____                                     0.1566
to       26.1                                     0.1415
a        23.1                                     0.1252
Total    184.5                                    ____

The Brown Corpus contains about 1 million words. The frequency of the word "the" in the entire corpus is about ____ occurrences. The relative frequency of the word "the" in the entire corpus is about ____.
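The blanks in Table 1 all follow from one relationship: relative frequency = frequency / total. The Python sketch below is one way to check the arithmetic, not an official solution. The names `complete_table`, `TOTAL`, and `CORPUS_WORDS` are illustrative, and the row with values 23.1 and 0.1252 is assumed to belong to the word "a", since it is the only one of the five words not otherwise listed.

```python
# Table 1 values; frequencies are in thousands of occurrences.
# None marks an entry the table leaves blank.
# Assumption: the row (23.1, 0.1252) belongs to the word "a".
TOTAL = 184.5
freq = {"the": 70.0, "of": 36.4, "and": None, "to": 26.1, "a": 23.1}
rel_freq = {"the": 0.3794, "of": None, "and": 0.1566, "to": 0.1415, "a": 0.1252}


def complete_table(freq, rel_freq, total):
    """Fill each blank using relative frequency = frequency / total."""
    full_freq, full_rel = {}, {}
    for word in freq:
        f, r = freq[word], rel_freq[word]
        if f is None:
            # frequency = relative frequency * total
            f = round(r * total, 1)
        if r is None:
            # relative frequency = frequency / total
            r = round(f / total, 4)
        full_freq[word], full_rel[word] = f, r
    return full_freq, full_rel


full_freq, full_rel = complete_table(freq, rel_freq, TOTAL)
for word in full_freq:
    print(f"{word:<5}{full_freq[word]:>7.1f}{full_rel[word]:>9.4f}")
print(f"{'Total':<5}{sum(full_freq.values()):>7.1f}{sum(full_rel.values()):>9.4f}")

# Whole corpus: about 1 million words, so the denominator changes
# from the five-word total to the size of the corpus.
CORPUS_WORDS = 1_000_000
the_count = full_freq["the"] * 1000  # thousands -> occurrences
the_rel_in_corpus = the_count / CORPUS_WORDS
print(the_count, the_rel_in_corpus)
```

Under these assumptions, the relative frequency of "of" comes out to about 0.1973 and the frequency of "and" to about 28.9 thousand, which brings the frequency column to the stated total of 184.5 and the relative frequencies to 1. For the whole corpus, "the" occurs about 70,000 times, a relative frequency of about 0.07. That is much smaller than 0.3794 because the denominator is now every word in the corpus rather than only the five most frequent ones.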
