Question 3 - Deep learning accelerator

Design and implement a Verilog module called perceptron. We will use a fixed-point format in our simple single-layer perceptron. It should follow the specification below.

Top-level ports:

| Port name | Direction | Width | Description |
|-----------|-----------|-------|-------------|
| rst_n | Input | 0:0 | Active-low reset |
| clk | Input | 0:0 | Clock |
| x1 | Input | 7:0 | Signed fixed point |
| x2 | Input | 7:0 | Signed fixed point |
| valid_in | Input | 0:0 | Inputs valid |
| y | Output | 0:0 | Unsigned (0 or 1) |
| y_valid | Output | 0:0 | y valid, driven by the design |

The perceptron should also contain the following internal read-only memory (ROM):

| Name | Type | Width | Description |
|------|------|-------|-------------|
| b | reg | 7:0 | Bias, signed fixed point |
| w1 | reg | 7:0 | Weight, signed fixed point |
| w2 | reg | 7:0 | Weight, signed fixed point |

Please use the following values:

signed [7:0] w1 = 8'sb00000010;
signed [7:0] w2 = 8'sb11111110;
signed [7:0] b  = 8'sb11111101;

Fixed point: x1, x2, b, w1 and w2 all have an implicit decimal point in the middle of the 4 least significant bits, i.e. between bit 2 and bit 1, so bits [1:0] hold the fractional part. Hence the value of every signed fixed-point number is (its value as a signed integer) / 4, for example:

- 8'sb00000101 = 1.25 (= 5/4)
- 8'sb11111000 = -2 (= -8/4)
- 8'sb11111010 = -1.5 (= -6/4)

You may use the same adder and multiplier that work with signed integers, but you will need to account for the decimal point. If we add two numbers of this type, the decimal point is still 2 places from the least significant bit (LSB). But if we multiply two such numbers, the implicit decimal point is 4 places from the LSB, so to get the correct fixed-point product we need to shift it right by 2.

Working: the perceptron must be pipelined (each stage takes 1 clock cycle, and computation on new inputs begins every clock cycle). In the first stage, it uses 2 signed multipliers to calculate the products p1 = w1*x1 and p2 = w2*x2. In the second stage, it adds p1, p2 and b to get an intermediate signed output s.
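To make the fixed-point rules concrete, here is a small Python sketch (a reference model for checking numbers by hand, not the required Verilog). It decodes 8-bit patterns as signed integer / 4 and forms a product the way the text describes: multiply as signed integers, then shift right by 2. The helper names `to_signed`, `fixed_to_real` and `fixed_mul` are illustrative assumptions, not part of the assignment.

```python
FRAC_BITS = 2  # binary point between bit 2 and bit 1 -> bits [1:0] are fractional


def to_signed(bits: int, width: int = 8) -> int:
    """Interpret the low `width` bits of `bits` as a two's-complement integer."""
    bits &= (1 << width) - 1
    return bits - (1 << width) if bits & (1 << (width - 1)) else bits


def fixed_to_real(bits: int) -> float:
    """Real value of an 8-bit signed fixed-point pattern: signed integer / 4."""
    return to_signed(bits) / (1 << FRAC_BITS)


def fixed_mul(a_bits: int, b_bits: int) -> int:
    """Fixed-point product, returned as a raw integer with 2 fractional bits.

    The signed 8x8 product has 4 fractional bits (and needs up to 16 bits).
    Shifting right by FRAC_BITS restores 2 fractional bits. Python's >> on a
    negative int rounds toward minus infinity, as Verilog's >>> does on a
    signed operand, so the dropped low bits are truncated the same way.
    """
    raw = to_signed(a_bits) * to_signed(b_bits)
    return raw >> FRAC_BITS


if __name__ == "__main__":
    # The three example patterns from the specification.
    for pattern in (0b00000101, 0b11111000, 0b11111010):
        print(f"{pattern:08b} -> {fixed_to_real(pattern)}")
    # w1 * x1 with w1 = 0.5 and x1 = 1.5: raw product 12 (0.75 * 16) -> 3 (0.75 * 4)
    print(fixed_mul(0b00000010, 0b00000110) / 4)
```

One Verilog pitfall to check when porting this: `>>>` only sign-fills when the expression is signed (for example a `reg signed [15:0]` product register), and a multiplication is evaluated as unsigned if either operand is unsigned. Declaring x1, x2, the ROM values and the product registers as `signed` keeps the hardware consistent with this model.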
For a deep learning accelerator we will over-provision; hence we will use multiple multipliers and adders so that all multiplications finish in one stage (clock cycle) and all additions finish in one stage (clock cycle). For now, we just want the design to be functionally correct and pipelined; later we may refine it into our accelerator. You may use the * and + operators now, but at some point we may want to use the IP components and pipeline the multiplier for best performance. You may save yourself future effort by using the library components right away (be aware that this will change the number of clock cycles; there are no extra marks for it).

After the addition (after the 2nd stage), the design asserts or de-asserts y based on s: y = 1 if s >= 0, and y = 0 otherwise. Hence, latency = 2 clock cycles. You must use internal registers of appropriate size to hold the complete intermediate results.
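The pipeline timing can be checked against a cycle-level Python model such as the sketch below. It is a hypothetical reference model, not the Verilog solution: the class `PerceptronModel`, its `clock`/`reset` methods and the `run` helper are illustrative names, and it assumes both stages load on every rising edge, with a valid bit travelling alongside the data to drive y_valid. Stage 2 is updated before stage 1 so that it reads the previous stage-1 values, which is what nonblocking (`<=`) assignments give you in hardware.

```python
from dataclasses import dataclass
from typing import List, Tuple

# ROM contents from the specification, as raw 8-bit two's-complement patterns.
W1 = 0b00000010  # +0.5
W2 = 0b11111110  # -0.5
B = 0b11111101   # -0.75
FRAC_BITS = 2


def to_signed(bits: int, width: int = 8) -> int:
    """Interpret the low `width` bits of `bits` as a two's-complement integer."""
    bits &= (1 << width) - 1
    return bits - (1 << width) if bits & (1 << (width - 1)) else bits


@dataclass
class PerceptronModel:
    """Cycle-level model of the two-stage pipeline (not synthesizable HDL)."""
    p1: int = 0       # stage-1 register: w1*x1, 4 fractional bits (16 bits in HW)
    p2: int = 0       # stage-1 register: w2*x2, 4 fractional bits (16 bits in HW)
    v1: bool = False  # valid bit travelling with p1/p2
    s: int = 0        # stage-2 register: sum, 2 fractional bits (>= 15 bits for arbitrary 8-bit weights)
    v2: bool = False  # valid bit travelling with s; drives y_valid

    def reset(self) -> None:
        """Models rst_n = 0: clear every pipeline register."""
        self.p1 = self.p2 = self.s = 0
        self.v1 = self.v2 = False

    def clock(self, x1: int, x2: int, valid_in: bool) -> Tuple[int, bool]:
        """One rising edge of clk; returns (y, y_valid) as seen after the edge."""
        # Stage 2: uses the OLD stage-1 values, so update it before stage 1.
        self.s = (self.p1 >> FRAC_BITS) + (self.p2 >> FRAC_BITS) + to_signed(B)
        self.v2 = self.v1
        # Stage 1: both multiplications happen in the same cycle.
        self.p1 = to_signed(W1) * to_signed(x1)
        self.p2 = to_signed(W2) * to_signed(x2)
        self.v1 = valid_in
        # Output: y = 1 if s >= 0, else 0 (combinational from the s register).
        return (1 if self.s >= 0 else 0), self.v2


def run(samples: List[Tuple[int, int, bool]]) -> List[Tuple[int, bool]]:
    """Feed one (x1, x2, valid_in) tuple per clock and collect (y, y_valid)."""
    dut = PerceptronModel()
    dut.reset()
    return [dut.clock(x1, x2, v) for x1, x2, v in samples]


if __name__ == "__main__":
    # x1 = 1.5, x2 = -2.0 -> s = 0.75 + 1.0 - 0.75 = 1.0 -> y = 1
    print(run([(0b00000110, 0b11111000, True), (0, 0, True), (0, 0, False)]))
    # -> [(0, False), (1, True), (0, True)]
```

The first sample is captured into p1/p2 by the first edge, moves into s on the second edge, and is reported with y_valid = 1 after that second edge, which matches latency = 2. Because the specification truncates each product (shift right by 2) before the addition, s can land slightly below the exact value near zero: for x1 = 0.25 and x2 = -1.25 the exact sum is 0 (y = 1), but this model gives s = -0.25 (y = 0). If exact sign decisions matter, one alternative is to add the untruncated products with b shifted left by 2 and test the sign of that wider sum.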
