Question: Using Linux commands, I am trying to build a histogram of all three-word sequences (trigrams) in a given database. The histogram must be sorted in decreasing order of occurrence, be case-insensitive, and ignore punctuation. It should have a column counting the number of times each trigram occurred, a column giving the percentage of all trigram occurrences that each one accounts for, and a column keeping a running sum of the percentages in the percentage column.
The output of your command line should be:
| Trigram | Frequency | | |
| | No. | Percentage | Cumulative |
| see jane run | 3 | 37.5000% | 37.5000% |
| jane run see | 2 | 25.0000% | 62.5000% |
| run see john | 1 | 12.5000% | 75.0000% |
| see john run | 1 | 12.5000% | 87.5000% |
| run see jane | 1 | 12.5000% | 100.0000% |
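One possible approach, offered as a minimal sketch rather than the expected answer: normalize the text with `tr`, emit every three-word window with `awk`, count with `sort | uniq -c`, and compute the percentage columns in a second `awk` pass. The function name `trigram_hist` is hypothetical, and the sketch makes three assumptions the question does not state: the text arrives on standard input, punctuation is deleted outright (so `don't` becomes `dont`), and trigrams do not span line boundaries. The order of rows with equal counts is not specified in the question and may differ from the table above.

```shell
#!/bin/sh
# trigram_hist (hypothetical name): read text on stdin, print one row per trigram
# as "| trigram | count | percentage | cumulative percentage |".
trigram_hist() {
    tr '[:upper:]' '[:lower:]' |    # fold case so matching is case-insensitive
    tr -d '[:punct:]' |             # assumption: punctuation is deleted, not turned into spaces
    awk '
        # Emit every window of three consecutive words on the line
        # (assumption: trigrams do not cross line boundaries).
        { for (i = 1; i + 2 <= NF; i++) print $i, $(i + 1), $(i + 2) }
    ' |
    sort | uniq -c |                # group identical trigrams and count them
    sort -k1,1nr |                  # highest count first
    awk '
        # uniq -c prints "count word1 word2 word3"; buffer the rows because the
        # grand total is needed before any percentage can be printed.
        { count[NR] = $1; tri[NR] = $2 " " $3 " " $4; total += $1 }
        END {
            for (i = 1; i <= NR; i++) {
                pct = 100 * count[i] / total
                cum += pct
                printf "| %s | %d | %.4f%% | %.4f%% |\n", tri[i], count[i], pct, cum
            }
        }
    '
}
```

A typical call would be `trigram_hist < corpus.txt`, where `corpus.txt` stands in for the database file. If trigrams should run across line breaks, one option is to insert `tr '\n' ' '` ahead of the first `awk` so the whole input is treated as a single line.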
