Question: Please use Python code. This looks long, but it is simple. I have a four-part project and already have the code for the first two parts. I need help with Part 3 only.
Part 1 Bhattacharyya distance
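import math

def bhatt_dist(D1, D2, n):
    # Sum sqrt(D1[i] * D2[i]) over the first n entries (inputs are used as-is, not normalized)
    BCSum = 0
    for i in range(n):
        BCSum += math.sqrt(D1[i] * D2[i])
    # The distance is the negative natural log of that sum
    DBValue = -math.log(BCSum)
    return DBValue

D1 = [1, 2, 1, 2]
D2 = [59, 100, 41, 100]
print(bhatt_dist(D1, D2, 2))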
Output: -3.08297735276
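The output above is negative because the function is fed raw values and only the first two entries (n = 2). The Bhattacharyya distance is normally defined on probability distributions, where the coefficient (the sum of sqrt(p_i * q_i)) is at most 1, so the distance cannot be negative. A minimal sketch of a normalized variant follows; the name bhatt_dist_normalized is hypothetical, and scaling each list to sum to 1 is an assumption about what the assignment intends:

```python
import math


def bhatt_dist_normalized(D1, D2):
    """Bhattacharyya distance after scaling each list so it sums to 1."""
    total1 = float(sum(D1))
    total2 = float(sum(D2))
    # Bhattacharyya coefficient: sum of sqrt(p_i * q_i) over all entries
    bc = sum(math.sqrt((a / total1) * (b / total2)) for a, b in zip(D1, D2))
    return -math.log(bc)


D1 = [1, 2, 1, 2]
D2 = [59, 100, 41, 100]
print(bhatt_dist_normalized(D1, D2))  # a small positive value, roughly 0.001
```

With the same two lists, this uses all four entries, and the result is close to 0 because the two lists have nearly the same proportions.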
Part 2 Takes a filename and creates a single-letter frequency distribution
import string

fp = open('sample.txt', 'r')
# fp = open('file1.txt', 'r') also used
# fp = open('file2.txt', 'r') also used
file_name = fp.readlines()
fp.close()
frequency = {}
for l in file_name:
    # keep only the letters a-z, ignoring case
    l = [c for c in l.lower() if c in string.ascii_lowercase]
    for char in l:
        if char in frequency:
            frequency[char] += 1
        else:
            frequency[char] = 1
print(frequency)
Output:
file1 = {'a': 14819, 'c': 14947, 'b': 14899, 'e': 15106, 'd': 14908, 'g': 14828, 'f': 14854, 'i': 14984, 'h': 14949, 'k': 14725, 'j': 14917, 'm': 15174, 'l': 14821, 'o': 15033, 'n': 14906, 'q': 14849, 'p': 15093, 's': 14771, 'r': 14749, 'u': 14899, 't': 14946, 'w': 14880, 'v': 14709, 'y': 14938, 'x': 14899, 'z': 14808}
file2 = {'a': 8899, 'c': 27723, 'b': 9912, 'e': 16075, 'd': 26786, 'g': 51184, 'f': 9394, 'i': 9639, 'h': 36746, 'k': 6884, 'j': 24261, 'm': 5714, 'l': 25631, 'o': 7135, 'n': 466, 'q': 29605, 'p': 154, 's': 2370, 'r': 16460, 'u': 6589, 't': 24447, 'w': 26625, 'v': 340, 'y': 289, 'x': 10385, 'z': 3698}
sample = {'a': 39604, 'c': 10644, 'b': 7685, 'e': 57726, 'd': 22713, 'g': 9695, 'f': 11819, 'i': 33433, 'h': 31290, 'k': 3428, 'j': 369, 'm': 13803, 'l': 17229, 'o': 38778, 'n': 32060, 'q': 314, 'p': 7482, 's': 27463, 'r': 25718, 'u': 13159, 't': 46758, 'w': 13395, 'v': 5140, 'y': 10020, 'x': 592, 'z': 223}
Part 3 Statistical Analysis of Files
Write a program that takes two filenames as inputs. The output of the program should be the Bhattacharyya distance between the single-letter frequency distributions resulting from each of the files, respectively. Note that to implement this, you will need to use the two functions defined in the first two challenges.
You have three files in your folder: sample.txt, file1.txt, and file2.txt. The file sample.txt is a large sample of writing in English which you will use to build a statistical profile for letter distribution in the English language. Of the other two files, one is an encryption of a document written in English and one is a random collection of letters.
To see what all this is good for, run the Bhattacharyya distance function on the three text files provided to your group. Find the distance between sample.txt and file1.txt, and the distance between sample.txt and file2.txt. In your written report, write down the results of these two comparisons and analyze what these results mean. Exactly one of these files is an encryption of a document written in English. In your report, explain which file is the file written in English and justify your reasoning.
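One way Part 3 could be put together is sketched below. It is not the assignment's reference solution: the function names letter_frequencies, bhatt_dist_from_counts, and compare_files are hypothetical, and the sketch assumes the letter counts should be converted to relative frequencies over the 26 letters a-z before the distance is taken (the Part 1 function as written does not normalize).

```python
import math
import string

ALPHABET = string.ascii_lowercase  # the 26 letters a-z


def letter_frequencies(filename):
    """Count each letter a-z in the file, ignoring case and non-letters."""
    counts = {letter: 0 for letter in ALPHABET}
    with open(filename, 'r') as fp:
        for line in fp:
            for char in line.lower():
                if char in counts:
                    counts[char] += 1
    return counts


def bhatt_dist_from_counts(counts1, counts2):
    """Bhattacharyya distance between two letter-count dictionaries.

    Assumption: counts are scaled to relative frequencies first.
    """
    total1 = float(sum(counts1.values()))
    total2 = float(sum(counts2.values()))
    if total1 == 0 or total2 == 0:
        raise ValueError("each input must contain at least one letter")
    bc = 0.0
    for letter in ALPHABET:
        p = counts1.get(letter, 0) / total1
        q = counts2.get(letter, 0) / total2
        bc += math.sqrt(p * q)
    if bc == 0:
        # The two files share no letters at all, so the distance is unbounded
        return float('inf')
    return -math.log(bc)


def compare_files(filename1, filename2):
    """Distance between the letter distributions of two files."""
    return bhatt_dist_from_counts(letter_frequencies(filename1),
                                  letter_frequencies(filename2))


# Example calls for the report (require the three files to be present):
# print(compare_files('sample.txt', 'file1.txt'))
# print(compare_files('sample.txt', 'file2.txt'))
```

A value near 0 means the two letter distributions are nearly identical, and larger values mean they differ more. The two commented calls correspond to the two comparisons the report asks for.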
