Consider the following relations R(A, B), S(C, D), T (E, F ) with S = 1 10
Question:
Consider the following relations R(A, B), S(C, D), T (E, F ) with S = 1 10
(10 tuples fit on each page). The sizes
V (R, B) = 50
V (S, D) = 25, 000 V (T , F ) = 3
and value distributions are:
T (R) = 10, 000 T (S) = 50, 000 T (T ) = 30
V (R, A) = 10, 000 V (S, C) = 100
V (T , E ) = 10
Question 8.1 Greedy Join Enumeration (10 Points)
Use the greedy join enumeration algorithm to find the cheapest plan for the join R ▷◁B=C S ▷◁D=E T. Assume
that nested-loop (not the block based version) is the only available join implementation with the left input
being the "outer" (for each tuple from the outer we have to scan the whole inner relation). Furthermore, there
are no indicies defined on any of the relations (that is you have to use sequential scan for each of the relations).
As a cost model consider the total number of I/O operations. For example, if you join two relations with
5,000 and 10,000 tuples with S = 1 , where the 5,000 tuple relation is the outer, then the cost would be 10
5,000,000 (scan the inner 5000 times) + 500 to scan the other once. The total cost is then 5,000,500 I/Os. Assume that the system supports pipelining for the outer input of a join. That is if you join the result of a join with a relation where the join result is the outer, then there is no I/O cost for scaning the outer. Also under these assumptions you never have to store join results to disk. Hint: You will have to estimate the size of intermediate results. Use the estimation based on the number of values and not the one based on the size of the domain. Use the assumption that the number of values in a join attribute of a join result is the minimum of the number of values in the join attribute of each input.
Write down the state (the current plans) after each iteration of the algorithm in file q8_joinenum.txt. In this file comments start with --. The state for an iteration starts with the keyword ITERATION. A plan consists of three parts: an algebra tree (separated by : from the next part), the expected number of result tuples (separated by ; from the next part) and the expected total cost (I/Os) for the plan. An algebra tree (expression) uses the key word JOIN to indicate joins and uses parentheses to indicate the order of joins, e.g., R JOIN (S JOIN T) represents the plan which joins R with the result of joining S with T. An example result is shown below.
----------------------------------------------------- -- FIRST ITERATION
ITERATION
R: 300; 30 S: 100; 10 T: 100; 10 ----------------------------------------------------- -- SECOND ITERATION
ITERATION
(R JOIN S): 3000; 300 T: 100; 10 ----------------------------------------------------- -- THIRD ITERATION ITERATION ((R JOIN S) JOIN T): 30; 100543
Income Tax Fundamentals 2013
ISBN: 9781285586618
31st Edition
Authors: Gerald E. Whittenburg, Martha Altus Buller, Steven L Gill