Question: Using the example axpy computation problem (provided below), write a C console program of the matrix multiplication algorithm ([A] * [B] = [C]) for DOUBLE precision data types. All matrices [A], [B], and [C] are to be square, i.e. the same number of rows and columns. Execution should be scalable and able to handle matrix dimensions N x N of 4 x 4, 16 x 16, 32 x 32, 64 x 64, 128 x 128, 512 x 512, 1024 x 1024, and 2048 x 2048. Set the matrix dimension N, the number of accuracy improvement loops, and the system clock speed using #define statements. You'll need to loop many more times for small array sizes, then reduce the loop iterations as the array size increases (1 accuracy loop for matrices 1024 x 1024 and larger). Use a random number generator to fill the matrices with random data. Compiler optimizations should be on for full optimization.
Your program should print out, using formatted printf:
1. Vector length
2. Number of accuracy loops
3. Total computation time
4. Computation time for the complete N x N matrix multiplication
5. Computation time per arithmetic operation
6. Number of machine cycles per arithmetic operation
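The core of the requested program can be sketched as below. This is a minimal sketch, not a complete solution: the function names `fill_random`, `matmul`, and `time_matmul` are hypothetical, the matrices are assumed to be stored row-major in flat arrays, and `clock()` is used in place of the example's `time(0)` because it has finer resolution. Counting 2 * N^3 arithmetic operations per multiplication (one multiply and one add per inner-loop step) is also an assumption about how the operations should be counted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Fill an n x n row-major matrix with pseudo-random values from 1 to 100. */
static void fill_random(double *m, size_t n)
{
    for (size_t i = 0; i < n * n; i++)
        m[i] = 1.0 + (double)(rand() % 100);
}

/* C = A * B for n x n row-major matrices, using the naive triple loop.
   Each inner-loop step performs one multiply and one add. */
static void matmul(const double *a, const double *b, double *c, size_t n)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Repeat the multiplication `loops` times (the accuracy improvement loops)
   and return the elapsed CPU time in seconds. */
static double time_matmul(const double *a, const double *b, double *c,
                          size_t n, long loops)
{
    clock_t start = clock();
    for (long r = 0; r < loops; r++)
        matmul(a, b, c, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

In a full program, `main` would allocate the three matrices with `malloc`, call `fill_random` on [A] and [B], and divide the value returned by `time_matmul` by the loop count to get the time per N x N multiplication. With full optimization enabled, the compiler may simplify repeated identical multiplications, so it is worth checking that the measured time scales with the loop count.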
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~EXAMPLE~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define SIZE 4 // vector dimension
#define LOOP 1E9 // number of accuracy improvement loops
#define CPU_CLK 3e9
int main()
{
//declare vectors and variables
long i;
long long j;
long * z;
long * x;
long * y;
long a;
double NumOfOps; //variable declaration for total number of arithmetic Ops in computation
// LOOP * SIZE * # of arithmetic ops required per element
double OPS;
double ElapsedTime;
double ElapsedTimePerVector;
double ElapsedTimePerVectorElement;
long long OPS_PER_INSTR; // number of arithmetic ops required per element
time_t start_time;
time_t end_time;
a = 1;
// declare the axpy coefficient and variables. Allocate memory/stack space for them
z = (long*)malloc(SIZE * sizeof(long));
x = (long*)malloc(SIZE * sizeof(long));
y = (long*)malloc(SIZE * sizeof(long));
//calculate LOOP value for defined accuracy value
//LOOP = MAX_ITERATIONS * ACCURACY;
OPS_PER_INSTR = 2; // arithmetic Ops per vector element from algorithm expressed in C code
//fill vectors with random values 1 to 100
for (i = 0; i < SIZE; i++)
{
x[i] = (long)1 + rand() % 100;
y[i] = (long)1 + rand() % 100;
z[i] = 0;
}
printf("Number of elements per vector is: %d\n", SIZE);
printf("Number of accuracy loops is: %e\n", (double)LOOP);
printf("Processor clock frequency is: %0.2e cycles per second\n", CPU_CLK);
NumOfOps = (double)(OPS_PER_INSTR * SIZE*(double)LOOP); //evaluate total number of multiply adds
printf("# of floating point multiply adds is: %0.3e\n", NumOfOps);
printf("Ops per instruction = %lld\n", OPS_PER_INSTR);
///begin timed portion of benchmark
start_time = time(0);
for (j = 0; j < LOOP; j++)
{
for (i = 0; i < SIZE; i++)
{
z[i] = (a*x[i]) + y[i]; //single line of code to implement axpy
}
}
end_time = time(0);
///end timed portion of benchmark
ElapsedTime = difftime(end_time, start_time); //elapsed time in seconds, in double precision format
printf("Measured elapsed time was: %0.4e seconds\n", ElapsedTime);
ElapsedTimePerVector = (double)ElapsedTime / (double)LOOP;
printf("Execution time per vector is: %0.4e seconds\n", ElapsedTimePerVector);
ElapsedTimePerVectorElement = ElapsedTimePerVector / ((double)SIZE);
printf("Execution time per vector element is: %0.4e seconds\n", ElapsedTimePerVectorElement);
printf("Execution time per arithmetic Op is: %0.4e seconds\n", ElapsedTimePerVectorElement / OPS_PER_INSTR); // need to divide by 2 for complete multiply add functionality
printf("Estimated OPs per second is: %0.3e OPs per second\n", OPS = (OPS_PER_INSTR / ElapsedTimePerVectorElement)); // need 2 in numerator for case of multiply add
printf("Estimated number of clock cycles per OP is %0.2f CPU Clock Cycles per OP\n", (CPU_CLK) / (OPS));
getchar(); //keeps the command console open after execution; comment out if the IDE does not need it
free(x);
free(y);
free(z);
return 0;
}
A C console program of the vector multiply-add "axpy" algorithm for integer data types. Instrument, monitor, and measure execution time for the vector multiply-add.
a) The C program should be single-threaded and sequential. Execution should be scalable and able to handle the number of vector elements, N, from 1 to 1,000,000. Set the vector dimension N, the number of accuracy improvement loops, and the system clock speed using #define statements. Use a random number generator to fill the vector elements with random data. Recompile and execute for each vector length (and number of accuracy loops). Compiler optimizations should be off/defaulted. The console window should execute and remain open until manually closed. The code should be portable and not dependent on the development environment or properties settings.
b) Your console program should print out (using formatted printf commands):
1) Vector length (number of vector elements)
2) The number of accuracy improvement loops you run the axpy computation to improve accuracy
3) Total computation time
4) Computation time per axpy vector
5) Computation time per vector element
6) The number of machine cycles per arithmetic operation
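The derived quantities in the printout list all follow from the measured elapsed time by successive division, as the example code does inline. A minimal sketch of that arithmetic as a separate helper is shown below; the struct `axpy_metrics` and the function `derive_metrics` are hypothetical names introduced for illustration and do not appear in the example.

```c
#include <assert.h>

/* Derived timing figures; all times are in seconds. */
typedef struct {
    double per_vector;    /* time for one pass over the vector        */
    double per_element;   /* time for one vector element              */
    double per_op;        /* time for one arithmetic operation        */
    double cycles_per_op; /* machine cycles per arithmetic operation  */
} axpy_metrics;

/* Derive the per-vector, per-element, and per-operation figures from the
   total elapsed time, mirroring the divisions in the example program. */
static axpy_metrics derive_metrics(double elapsed_s, double loops, double size,
                                   double ops_per_element, double cpu_clk_hz)
{
    assert(loops > 0.0 && size > 0.0 && ops_per_element > 0.0);
    axpy_metrics m;
    m.per_vector = elapsed_s / loops;
    m.per_element = m.per_vector / size;
    m.per_op = m.per_element / ops_per_element;
    m.cycles_per_op = cpu_clk_hz * m.per_op; /* same as CPU_CLK / (ops per second) */
    return m;
}
```

For instance, 8 seconds elapsed over 1e9 loops of a 4-element vector with 2 operations per element works out to about 1e-9 seconds per operation, or about 3 cycles per operation at a 3e9 Hz clock.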
