Note: in Python, please.

Task: we are going to test various hash functions to see how good they are, in terms of how many collisions they have. Your input will be strings. The dataset is here. This file contains just under 100,000 English words, which we're going to use to test the uniformity of various hash functions.

Your hash functions will hash strings into 16-bit (not 32-bit) ints. This is important, because we're going to keep a table of the number of collisions for each hash value.

For each of the possible hash functions, your program should:

1. Create a hash table of size 65,536.
2. Process the list of words and, for each word, compute its hash h.
3. Increment the entry in the table for that hash.
4. When finished, use Pearson's test to determine the probability that the resulting distribution is uniformly distributed. (See below for the deets.)

Hash functions to test

The hash (non)functions you should test are:

- String length (modulo 2^16).
- First character.
- Additive checksum (add all characters together), modulo 2^16.
- Remainder (use a modulus of 65413, a prime smaller than the table size). Remember that you cannot just add up all the characters and then take the mod of the result; you have to thread the modulo through the loop that computes the sum.
- Multiplicative (using the scheme described in class/in the lecture notes). Again, remember that you can't just use the final sum; you have to incorporate the multiplicative calculation into the hashing loop.

Print out nice-looking histograms of each hash function's distribution.
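One possible way to approach the task is sketched below. It is a minimal sketch, not the course's reference solution, and several pieces are assumptions: the base of 256 in the remainder hash, the golden-ratio constant and the exact update rule in the multiplicative hash (the scheme from class is not reproduced in the question, so this one is hypothetical), the Wilson-Hilferty normal approximation used in place of an exact chi-square tail probability, and all function names.

```python
import math

TABLE_SIZE = 1 << 16   # 65,536 slots, one per 16-bit hash value
PRIME = 65413          # modulus given in the task


def hash_length(s):
    return len(s) % TABLE_SIZE


def hash_first_char(s):
    # Empty strings hash to 0 (an arbitrary choice for this sketch).
    return ord(s[0]) % TABLE_SIZE if s else 0


def hash_additive(s):
    return sum(ord(c) for c in s) % TABLE_SIZE


def hash_remainder(s, base=256):
    # base=256 is an assumption: the string is read as a base-256 number.
    # The modulo is applied inside the loop, as the task requires.
    h = 0
    for c in s:
        h = (h * base + ord(c)) % PRIME
    return h


def hash_multiplicative(s, a=(math.sqrt(5) - 1) / 2):
    # Hypothetical scheme: keep only the fractional part after each
    # multiplication by a, then scale the final fraction to the table.
    h = 0.0
    for c in s:
        h = ((h + ord(c)) * a) % 1.0
    return int(h * TABLE_SIZE)


def tally(words, hash_fn):
    # Count how many words land on each hash value.
    table = [0] * TABLE_SIZE
    for w in words:
        table[hash_fn(w)] += 1
    return table


def chi_square(table):
    # Pearson's statistic against a uniform expectation (needs >= 1 word).
    expected = sum(table) / len(table)
    return sum((obs - expected) ** 2 for obs in table) / expected


def uniform_p_value(table):
    # Upper-tail probability via the Wilson-Hilferty normal approximation,
    # which is an approximation, not the exact chi-square distribution.
    k = len(table) - 1  # degrees of freedom
    x = chi_square(table)
    z = ((x / k) ** (1 / 3) - (1 - 2 / (9 * k))) / math.sqrt(2 / (9 * k))
    return 0.5 * math.erfc(z / math.sqrt(2))


def print_histogram(table, bins=64, width=50):
    # Group the slots into coarse bins so the plot fits on a screen.
    step = len(table) // bins
    totals = [sum(table[i:i + step]) for i in range(0, len(table), step)]
    peak = max(totals) or 1
    for i, t in enumerate(totals):
        bar = "#" * round(width * t / peak)
        print(f"{i * step:5d}-{(i + 1) * step - 1:5d} | {bar} {t}")


HASHES = {
    "length": hash_length,
    "first char": hash_first_char,
    "additive": hash_additive,
    "remainder": hash_remainder,
    "multiplicative": hash_multiplicative,
}


def report(words):
    # Print the test result and a histogram for every hash function.
    for name, fn in HASHES.items():
        table = tally(words, fn)
        print(f"{name}: chi^2 = {chi_square(table):.1f}, "
              f"p = {uniform_p_value(table):.4f}")
        print_histogram(table)
```

To run it, read the word list into a list of strings (one word per line, from whatever filename the dataset was saved under) and pass that list to `report`. If SciPy is available, `scipy.stats.chi2.sf(x, df)` could replace the approximation in `uniform_p_value` with the exact tail probability.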
