Question:

4. A good hash function $h(x)$ behaves in practice very close to the simple uniform hashing assumption analyzed in class, but is a deterministic function. Designing good hash functions is hard, and a bad hash function can cause a hash table to quickly exit the sparse loading regime by overloading some buckets and under-loading others. Good hash functions often rely on beautiful and complicated insights from number theory, and have deep connections to pseudorandom number generators and cryptographic functions. In practice, most hash functions are moderate to poor approximations of uniform hashing.

Consider the following two hash functions. Let $U$ be the universe of strings composed of the characters from the alphabet $\Sigma = \{A, \ldots, Z\}$, and let the function $f(x_i)$ return the index of a letter $x_i \in \Sigma$, e.g., $f(A) = 1$ and $f(Z) = 26$. Let $x$ be a string of length $m$.

(1) The first hash function we consider is $h_1(x) = \left(\sum_{i=1}^{m} f(x_i)\right) \bmod \ell$, where $\ell$ is the number of buckets in the hash table.

(2) For the second hash function, first (globally, external to the hash function) choose uniformly random integers $a_i$ (one for each $x_i \in \Sigma$) from $0, \ldots, 10{,}000$, and then define $h_2(x) = \left(\sum_{i=1}^{m} a_i f(x_i)\right) \bmod \ell$.

List your values of $a_i$ here (and please use consistent values of $a_i$ throughout this question).

(a) There is a txt file on Canvas that contains US Census derived last names. Using these names as input strings, first choose a uniformly random 50% of these name strings. Let $\ell = 5851$ be the number of buckets. For each of the two hash functions (separately), produce a histogram showing the distribution of hash locations for the names you chose. Label the axes of your figures. Give a brief description of what the figure shows about $h_1(x)$ and $h_2(x)$; justify your results in terms of the behavior of these hash functions. Hint: the raw file includes information other than the name strings, which will need to be removed; and, think about how you can count hash locations without building or using a real hash table.

(b) State at least 4 reasons why $h_1(x)$ is a bad hash function relative to the ideal behavior of uniform hashing.

(c) Produce two plots, one for each hash function $h_1$, $h_2$, showing the length of the longest chain (were we to use chaining for resolving collisions) as a function of the number $n$ of these strings that we hash into a table with $\ell = 5851$ buckets. That is, you may use the 50% of names from part (a), and as you hash them one by one, show how the length of the longest chain is growing.

(d) Produce another pair of plots, one for each of $h_1$, $h_2$, showing the number of collisions as a function of $\ell$. Comment on how collisions decrease as $\ell$ increases. Aside from size, do you notice any particular kinds of values for $\ell$ that seem better than others (e.g., odd/even, prime, etc.)? Discuss briefly.
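As a starting point, the two hash functions and the counting steps in parts (a), (c), and (d) could be sketched in Python roughly as follows. This is only a sketch under several assumptions that the question does not settle: names contain only the uppercase letters A to Z, the coefficients are read as one value per letter of the alphabet (so the coefficient used for a character depends on which letter it is), and a "collision" is counted as an insertion into a bucket that is already occupied. The helper names and the seed are hypothetical, and plotting and parsing of the Canvas file are left out.

```python
import random
import string
from collections import Counter


def f(ch):
    """Index of an uppercase letter: f('A') = 1, ..., f('Z') = 26."""
    return ord(ch) - ord("A") + 1


def h1(name, num_buckets):
    """First hash: sum of letter indices, mod the number of buckets."""
    return sum(f(ch) for ch in name) % num_buckets


def make_h2(coeffs):
    """Build the second hash from a fixed dict {letter: coefficient}.

    Assumption: one coefficient per letter of the alphabet, chosen once
    and reused for every name.
    """
    def h2(name, num_buckets):
        return sum(coeffs[ch] * f(ch) for ch in name) % num_buckets
    return h2


def random_coefficients(seed=0):
    """Uniform random integers in 0..10000, one per letter (seed is arbitrary)."""
    rng = random.Random(seed)
    return {ch: rng.randint(0, 10000) for ch in string.ascii_uppercase}


def sample_half(names, seed=0):
    """Uniformly random 50% of the names, without replacement."""
    rng = random.Random(seed)
    return rng.sample(names, len(names) // 2)


def bucket_counts(names, hash_fn, num_buckets):
    """Names per bucket, counted without building a real hash table."""
    return Counter(hash_fn(name, num_buckets) for name in names)


def longest_chain_growth(names, hash_fn, num_buckets):
    """Longest chain length after each insertion, in insertion order."""
    counts = Counter()
    longest = 0
    growth = []
    for name in names:
        bucket = hash_fn(name, num_buckets)
        counts[bucket] += 1
        longest = max(longest, counts[bucket])
        growth.append(longest)
    return growth


def count_collisions(names, hash_fn, num_buckets):
    """Insertions that land in an already occupied bucket."""
    counts = bucket_counts(names, hash_fn, num_buckets)
    return sum(c - 1 for c in counts.values())
```

With these pieces, `bucket_counts(sample, h1, 5851)` gives the data for a histogram in part (a), `longest_chain_growth` gives the curve for part (c), and calling `count_collisions` over a range of bucket counts gives the data for part (d). The same calls with `make_h2(random_coefficients())` in place of `h1` cover the second hash function.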

