This problem explores a variant of the external merge sort algorithm. Below in the subproblems, you...
Fantastic news! We've Found the answer you've been seeking!
Question:
Transcribed Image Text:
This problem explores a variant of the external merge sort algorithm. Below in the subproblems, you will calculate the cost of performing the modified external merge sort. Recall that sequential IO (i.e. involving reading from / writing to consecutive pages) is generally much faster that random access 10 (any reading / writing that is not sequential). Additionally, on newer memory technologies like SSD reading data can be faster than writing data (if you want to read more about SSD access patterns look here). For example, if we read 8 consecutive pages from file A, this should be much faster than reading 2 pages from A, then 4 pages from file B, then 2 pages from A. In this problem, we will begin to model this, by assuming that 8 sequential READS are "free", i.e. the total cost of 8 sequential reads is 1 10. We will also assume that the writes are always twice as expensive as a read. Sequential writes are never free, therefore the cost of N writes is always 2N. Please note the following: • NO REPACKING: Consider the external merge sort algorithm using the basic optimizations we present in class, but do not use the repacking optimization covered in class. • ONE BUFFER PAGE RESERVED FOR OUTPUT: Assume we use one page for output in a merge, e.g. a B-way merge would require B+1 buffer pages. • REMEMBER TO ROUND: Take ceilings (i.e. rounding up to nearest integer values) into account in this problem. Note that we have sometimes omitted these (for simplicity) in lecture. • Consider worst case cost: In other words, if 2 reads could happen to be sequential, but in general might not be, consider these random 10. Question 2.1 Consider a modification of the external merge sort algorithm where reads are always read in 8-page chunks (i.e. 8 pages sequentially at a time) so as to take advantage of sequential reads. Calculate the cost of performing the external merge sort for a setup having B+1=40 buffer pages and an unsorted input file with 320 pages. Show the steps of your work and make sure to explain your reasoning if necessary. Question 2.1.1 What is the exact 10 cost of splitting and sorting the files? As is standard we want runs of size B+1. Question 2.1.2 After the file is split and sorted, we can merge n runs into 1 using the merge process. What is the largest n we could have, given reads are always read in 8-page chunks? Note: this is known as the arity of the merge. Question 2.1.3 How many passes of merging are required? Question 2.1.4 What is the total 10 cost of running this external merge sort algorithm? Do not forget to add in the remaining passes (if any) of merging. Question 2.2 Now, we'll generalize the reasoning above by writing formula that compute the approximate cost of performing this version of external merge sort for a setup having B+1 buffer pages, a file with N pages, and where we now read in P-page chunks (replacing our fixed 8 page chunks in the previous section. *Note: our approximation will be a small one- for simplicity, we'll assume that each pass of the merge phase has the same 10 cost. We'll calculate the 10 cost for each merge phase and compute the total cost as the product of the cost of reading in and writing out all the data (which we do each pass), and the number of passes we'll have to do. Even though this is an approximation, make sure to take care of floor / ceiling operations- i.e. rounding down / up to integer values properly! Importantly, to simplify your calculations, you can assume: • (B+1) %P == 0 (i.e. the buffer size is divisible by the chunk size) N% (B + 1) == 0 (i.e. the file size is divisible by the buffer size) Question 2.2.1 First, write the formula that computes the exact total 10 cost to create the initial runs in terms of B, N, and P. Question 2.2.2 Next, write the formula that computes the approximate total 10 cost to read in and then write out all the data during one pass of the merge in terms of B, N, and P. Question 2.2.3 Next, write the formula that computes the exact total number of passes we'll need to do in terms of B, N, and P. Question 2.2.4 Finally, write the formula that computes the total cost in terms of B, N, and P. This problem explores a variant of the external merge sort algorithm. Below in the subproblems, you will calculate the cost of performing the modified external merge sort. Recall that sequential IO (i.e. involving reading from / writing to consecutive pages) is generally much faster that random access 10 (any reading / writing that is not sequential). Additionally, on newer memory technologies like SSD reading data can be faster than writing data (if you want to read more about SSD access patterns look here). For example, if we read 8 consecutive pages from file A, this should be much faster than reading 2 pages from A, then 4 pages from file B, then 2 pages from A. In this problem, we will begin to model this, by assuming that 8 sequential READS are "free", i.e. the total cost of 8 sequential reads is 1 10. We will also assume that the writes are always twice as expensive as a read. Sequential writes are never free, therefore the cost of N writes is always 2N. Please note the following: • NO REPACKING: Consider the external merge sort algorithm using the basic optimizations we present in class, but do not use the repacking optimization covered in class. • ONE BUFFER PAGE RESERVED FOR OUTPUT: Assume we use one page for output in a merge, e.g. a B-way merge would require B+1 buffer pages. • REMEMBER TO ROUND: Take ceilings (i.e. rounding up to nearest integer values) into account in this problem. Note that we have sometimes omitted these (for simplicity) in lecture. • Consider worst case cost: In other words, if 2 reads could happen to be sequential, but in general might not be, consider these random 10. Question 2.1 Consider a modification of the external merge sort algorithm where reads are always read in 8-page chunks (i.e. 8 pages sequentially at a time) so as to take advantage of sequential reads. Calculate the cost of performing the external merge sort for a setup having B+1=40 buffer pages and an unsorted input file with 320 pages. Show the steps of your work and make sure to explain your reasoning if necessary. Question 2.1.1 What is the exact 10 cost of splitting and sorting the files? As is standard we want runs of size B+1. Question 2.1.2 After the file is split and sorted, we can merge n runs into 1 using the merge process. What is the largest n we could have, given reads are always read in 8-page chunks? Note: this is known as the arity of the merge. Question 2.1.3 How many passes of merging are required? Question 2.1.4 What is the total 10 cost of running this external merge sort algorithm? Do not forget to add in the remaining passes (if any) of merging. Question 2.2 Now, we'll generalize the reasoning above by writing formula that compute the approximate cost of performing this version of external merge sort for a setup having B+1 buffer pages, a file with N pages, and where we now read in P-page chunks (replacing our fixed 8 page chunks in the previous section. *Note: our approximation will be a small one- for simplicity, we'll assume that each pass of the merge phase has the same 10 cost. We'll calculate the 10 cost for each merge phase and compute the total cost as the product of the cost of reading in and writing out all the data (which we do each pass), and the number of passes we'll have to do. Even though this is an approximation, make sure to take care of floor / ceiling operations- i.e. rounding down / up to integer values properly! Importantly, to simplify your calculations, you can assume: • (B+1) %P == 0 (i.e. the buffer size is divisible by the chunk size) N% (B + 1) == 0 (i.e. the file size is divisible by the buffer size) Question 2.2.1 First, write the formula that computes the exact total 10 cost to create the initial runs in terms of B, N, and P. Question 2.2.2 Next, write the formula that computes the approximate total 10 cost to read in and then write out all the data during one pass of the merge in terms of B, N, and P. Question 2.2.3 Next, write the formula that computes the exact total number of passes we'll need to do in terms of B, N, and P. Question 2.2.4 Finally, write the formula that computes the total cost in terms of B, N, and P.
Expert Answer:
Answer rating: 100% (QA)
21 Here are the steps to calculate the cost of performing the modified external merge sort for a setup having B140 buffer pages and an unsorted input file with 320 pages Step 1 Determine the number of ... View the full answer
Related Book For
Building Java Programs A Back To Basics Approach
ISBN: 9780135471944
5th Edition
Authors: Stuart Reges, Marty Stepp
Posted Date:
Students also viewed these databases questions
-
Last year Power Electric Industries earned $3.80 earnings per share. Common stock sold for $65.00, the last approved dividend was $2.50 per share, and a 9% float cost would require selling new common...
-
A friend loans you $1000 with the agreement that yo payback the principal plus interest at the end of year 2. the loan has a simple interest rate of 6%, at the end of year 2 how much interest will...
-
Planning is one of the most important management functions in any business. A front office managers first step in planning should involve determine the departments goals. Planning also includes...
-
Write code in MATLAB (Radionuclide) (half-life) U-238 4.468 x 10 years U-235 703.8 x 10 years Mo-99 67 hours Tc-99m 6.04 hours Given the formula, where is the decay constant used in , calculate the...
-
Replace the loading by an equivalent resultant force and specify its location on the beam, measured from point B. Units Used: kip = 103 lb Given: w1 = 800lb/ft w2 = 500lb/ft a = 12 ft b = 9 ft W2 Be
-
Why do some firms tend to pay lower cash dividends, but also use stock repurchases in addition to distributing cash?
-
A model for chromatographic separation: a case-study problem. An important application of Taylor dispersion is in chromatography. Here pulses of a mixture of solutes are introduced into one end of a...
-
Data Set 9 in Appendix B lists measured cotinine levels from a sample of subjects who smoke, another sample of subjects who do not smoke but are exposed to environmental tobacco smoke, and a third...
-
Determine the amount of work (in J) done on an ideal gas as it is heated in an enclosed, thermally isolated cylinder topped with a freely moving piston. The cylinder contains 0.220 mol of the gas and...
-
Ted is an audit manager with a national public accounting firm and one of his clients is Easy Clean, Co. Easy Clean provides industrial and domestic carpet steam-cleaning services. This is the first...
-
Melissa owns the following portfolio of stocks. What is the return on her portfolio? A) 8.0% B) 9.0% C) 9.8% D) 10.9%
-
Assume the same facts as in Problem 38, except that the executor sold the house on May 2, 2019, for $138,000. What value is used by the executor in the estate tax return? Problem 38 Kermit Ames died...
-
Disbarment of Lawyers Egil Krogh, Jr., was admitted [to practice] law in the state of Washington on September 20, 1968. On February 4, 1974, he was suspended as a result of his having been convicted...
-
Zipcar is a car sharing service that Cambridge, Massachusetts, residents Robin Chase and Antje Danielson launched in 2000. Scott Griffith, a former Boeing engineer, now leads the firm. Although...
-
Why did Mendels work refute the blending hypothesis of inheritance?
-
What is the difference between cross-fertilization and self-fertilization?
-
Perth Ltd has been operating for a number of years. The company provides goods and services to customers and uses a perpetual inventory system. The following information relates to the year ending 30...
-
Prove the formula for (d/dx)(cos-1x) by the same method as for (d/dx)(sin-1x).
-
Why does the binary search algorithm require the input to be sorted?
-
Write a method called stretch that takes an integer as a parameter and that increases a list of integers by a factor of by replacing each integer in the original list with copies of that integer. For...
-
Why does a method to swap two array elements work correctly when a method to swap two integer values does not?
-
Figure P4.2 shows the velocity of a block of wood as a function of time. The block is sliding over a horizontal surface. Describe the physical processes that led to this graph. Data from Figure P4.2...
-
The velocity-versus-time graph in Figure P4.3 shows the motion of two different objects sliding across a horizontal surface. Could the change in the \(x\) component of velocity with time be...
-
Consider the two velocity-versus-time graphs shown in Figure P4.4. Are the motions represented by these curves best described as similar or as different? Is the effect of friction on the motion...
Study smarter with the SolutionInn App