Question:

With CUDA we can use coarse-grain parallelism at the block level to compute the conditional likelihood of multiple nodes in parallel. Assume that we want to compute the conditional likelihood from the bottom of the tree up. Assume seq_length == 500 for all nodes and that the group of tables for each of the 12 leaf nodes is stored in consecutive memory locations in the order of node number (e.g., the mth element of clP on node n is at clP[n*4*seq_length+m*4]). Assume that we want to compute the conditional likelihood for nodes 12–17, as shown in Figure 4.35. Change the method by which you compute the array indices in your answer from Exercise 4.5 to include the block number.

Figure 4.35 Sample tree.

Exercise 4.5

Now assume we want to implement the MrBayes kernel on a GPU using a single thread block. Rewrite the C code of the kernel using CUDA.

Assume that pointers to the conditional likelihood and transition probability tables are specified as parameters to the kernel. Invoke one thread for each iteration of the loop. Load any reused values into shared memory before performing operations on them.
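The single-block structure described in Exercise 4.5 can be sketched on the host in C++. This is only an illustration under stated assumptions, not the expert answer: the original C kernel is not reproduced on this page, so the sketch assumes each parent entry is the product of a left and a right sum over a row-major 4x4 transition table. The names `compute_site` and `run_single_block` are hypothetical. In a real CUDA kernel, `k` would come from `threadIdx.x`, the staged tables would be `__shared__` arrays, and a `__syncthreads()` would follow the staging step.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kStates = 4;  // A, C, G, T entries per site

// Work done by one thread: site k of the parent table.
// ASSUMPTION: clP[i] = (sum_j tiL[i][j]*clL[j]) * (sum_j tiR[i][j]*clR[j]),
// with the 4x4 transition tables stored row-major.
void compute_site(std::size_t k, const float* clL, const float* clR,
                  const float* tiL, const float* tiR, float* clP) {
    const float* l = clL + k * kStates;
    const float* r = clR + k * kStates;
    for (std::size_t i = 0; i < kStates; ++i) {
        float left = 0.0f;
        float right = 0.0f;
        for (std::size_t j = 0; j < kStates; ++j) {
            left += tiL[i * kStates + j] * l[j];
            right += tiR[i * kStates + j] * r[j];
        }
        clP[k * kStates + i] = left * right;
    }
}

// Host-side stand-in for one thread block with one thread per site.
void run_single_block(const std::vector<float>& clL,
                      const std::vector<float>& clR,
                      const std::vector<float>& tiPL,
                      const std::vector<float>& tiPR,
                      std::vector<float>& clP) {
    assert(tiPL.size() == 16 && tiPR.size() == 16);
    assert(clL.size() == clR.size() && clL.size() % kStates == 0);
    const std::size_t seq_length = clL.size() / kStates;
    clP.assign(clL.size(), 0.0f);

    // Stage the reused transition tables once. In CUDA these would be
    // __shared__ arrays, filled by the first 16 threads before a barrier.
    float tiL[16];
    float tiR[16];
    for (std::size_t i = 0; i < 16; ++i) {
        tiL[i] = tiPL[i];
        tiR[i] = tiPR[i];
    }

    // Each loop iteration corresponds to one thread of the block.
    for (std::size_t k = 0; k < seq_length; ++k) {
        compute_site(k, clL.data(), clR.data(), tiL, tiR, clP.data());
    }
}
```

The transition tables are the only values every thread reads, which is why they are the ones staged; each site's `clL` and `clR` entries are touched by a single thread and gain nothing from shared memory.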

Step by Step Solution

The question asks for a CUDA kernel that computes the conditional likelihoods of nodes in a tree structure. Additionally, the question specifies the memory layout for the conditional likelihood arrays c...
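The index change the question asks for can be sketched in host-side C++ as follows. This is an illustration under assumptions rather than the expert answer. The layout `n*4*seq_length + m*4` and the node range 12–17 come from the question. The child mapping (block b computes node 12 + b from leaves 2b and 2b + 1) and storing parent tables in the same array as the leaves are assumptions, since the figure is not legible here. The names `table_offset` and `site_offsets` are hypothetical; in a kernel, `block` would be `blockIdx.x` and `thread` would be `threadIdx.x`.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kSeqLength = 500;     // seq_length from the question
constexpr std::size_t kStates = 4;          // A, C, G, T entries per site
constexpr std::size_t kFirstInternal = 12;  // first node computed (12-17)
constexpr std::size_t kNumBlocks = 6;       // one block per node 12..17

// Offset of site m of node n, following the layout given in the question:
// clP[n*4*seq_length + m*4].
constexpr std::size_t table_offset(std::size_t node, std::size_t site) {
    return node * kStates * kSeqLength + site * kStates;
}

struct SiteOffsets {
    std::size_t left;    // start of the left child's entries for this site
    std::size_t right;   // start of the right child's entries for this site
    std::size_t parent;  // start of the parent's entries for this site
};

// block stands in for blockIdx.x, thread for threadIdx.x.
// ASSUMPTION: block b computes node 12 + b from leaf nodes 2b and 2b + 1,
// and parent tables continue the same array layout as the leaf tables.
SiteOffsets site_offsets(std::size_t block, std::size_t thread) {
    assert(block < kNumBlocks && thread < kSeqLength);
    const std::size_t left_node = 2 * block;
    const std::size_t right_node = 2 * block + 1;
    const std::size_t parent_node = kFirstInternal + block;
    return {table_offset(left_node, thread),
            table_offset(right_node, thread),
            table_offset(parent_node, thread)};
}
```

For example, block 0 and thread 0 read at offsets 0 and 2000 and write at offset 24000, because each node occupies 4 * 500 = 2000 entries. Under these assumptions the launch would use 6 blocks of 500 threads, and only the base offsets differ from the single-block version.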
