Question: Suppose you have a problem in which the feature matrix, X, has 100 million rows and 200 columns. If each element of X is stored

Suppose you have a problem in which the feature matrix, X,

has 100 million rows and 200 columns. If each element of X

Suppose you have a problem in which the feature matrix, X, has 100 million rows and 200 columns. If each element of X is stored as a 64-bit double precision floating point number, how much memory is required to store ?? (Please enter the value with a precision of at least two significant figures, your answer will be graded with a 10% tolerance.) 2.3 GB (GigaBytes) Big O notation and algorithmic complexity In computer science, big o notation (0) is used to communicate the asymptotic, or, in other words, dominant behavior of an algorithm. Usually, big o notation is used in the context of time complexity and space complexity. Time complexity describes the running time behavior of the algorithm, while space complexity describes the memory requirement behavior of the algorithm. For example, the computational cost of bubble sort grows with the square of the number of elements in the list to sort - here denoted as n. Therefore, the time complexity (the computational cost) is said to be of order () (n), as the square dominates any linear or sub-linear costs in the algorithm. If we had a list with n = 30 elements, we could then say that the time complexity of sorting this list is of order (30%) = (900). Note that we don't specify units, this is because we are describing the behavior of the algorithm, and how its costs scale with n. The actual number of CPU instructions would depend on the programming environment, processor architecture, etc. The same can also be used for memory storage requirements. For example, to calculate the Hessian of a function with k parameters, one must calculate all combinations of second order partial derivatives. Thus, the space complexity of this algorithm grows with the square of the number of parameters, and we say that the space complexity is 0 (k). As the actual amount of memory required may vary with time throughout the running of the algorithm, space complexity refers peak memory requirement. For a function with four parameters, we would have (42) = ((16). Again, we don't specify units as we are describing the behavior of the algorithm. The algorithm could be applied to problems using 32-bit floating point numbers or 128-bit complex numbers, and the actual amount of space used in RAM may be architecture dependent (for example, due to alignment issues). In practice, the actual time or space complexity of most algorithms are fairly close to their big o notation. In the previous example, if the function for which we compute the Hessian is fairly simple, and we compute using 64-bit floating point numbers, then we shouldn't expect the actual amount of memory required to be more than few times 16 x 8 bytes. As the time and space complexities are an important metric through which to compare algorithms, they are intensively studied and well documented for common algorithms. Hide The algorithmic space complexity for inverting the specific x x mentioned above is of order... (Please enter the value with a precision of at least two significant figures, your answer will be graded with a 10% tolerance.) ( 1.22 Which of these is the bottleneck? Storage of x Inverting xx Part (b) 2 points possible (graded) Suppose we instead use gradient descent to solve this problem, what is the memory required to run gradient descent? (Ignore the memory required for y and B. Please enter the value with a precision of at least two significant figures, your answer will be graded with a 10% tolerance.) 2.3 GB (GigaBytes) Now, what is memory required for stochastic gradient descent? (Ignore the memory required for y and B. Please enter the value with a precision of at least two significant figures, your answer will be graded with a 10% tolerance.) 10 kB (kiloBytes) Submit You have used 0 of 3 attempts Save Part (c) 1 point possible (graded) Now, suppose we are in a setting in which the number of data points, n, is much smaller than the number of variables, p, i.e. X has many more columns than rows. This situation occurs often in biological applications, for example, in which the features may represent the expression levels of various genes. This is often referred to as the "high-dimensional" regime. (Assume x is small enough that it can fit in memory.) Can we run gradient descent to compute the regression coefficients? Yes, and we will get a unique solution. No, the gradient can't be computed if n

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer

Step: 1 Unlock blur-text-image

Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock

Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!

%% Lab 2 - Your Name - MAT 275 Lab %% Example code % Example 1 % NOTE: Delete examples before submission. A = [1 0; 0 -1] A = [1, 0; 0, -1] % NOTE: The two matrices above are the same. We can...

Jupiter Notebook We have covered some of the limitations of single layer neural networks in class, but they are still powerful learning systems that provide a good way to begin learning about how to...

) Explain the term overloading in the context of Java constructors and methods. [2 marks] (b) Without describing the details of either, outline the relationship between the Java methods...

The best definition for nonparametric statistics I can think of is when data does not fit inside the normal parameters of a normal distribution, it calls for the use of this statistical method....

The objective of this assignment is to write a simple program library that computes the product of two dense matrices, C = A B. 1. Implement the function with the prototype double *malloc matrix(int...

ENGI 398: Engineering Mathematics Project 3: Application of Singular Value Decomposition to Image Compression1 1 Background The singular value decomposition (SVD) is a fundamental concept in linear...

Using the example axpy computation problem(Provided under the problem) Write a C console progrm of the matrix multiplication algorithm ([A] * [B] = [C]) for DOUBLE precision data types. All matrices...

Scenario You and your group work for a multi-national consulting firm. Your team is a group of capital budget consultants and are seeking new engagements. You intend on responding to a Request for...

For continuous random variables X and Y , taking on continuous values x and y respectively with probability densities p(x) and p(y) and with joint probability distribution p(x, y) and conditional...

#ifndef MATRIX_TYPE #define MATRIX_TYPE #include using std::string; #include using std::vector; /* very convenient. If you want to change the type stored in the matrix, you only have to change the...

What ethnicity dominates in Vietnam?

Part 2 Management Assertions (16 points)

A project costing US$ 3 0 0 , 0 0 0 has an IRR of 1 0 % . Which of the following NPVs ( in US$ ) would this project return using a discount rate of 1 0 % ? Question 2 Answer a . ( $ 2 5 , 0 0 0 ) - a...

Seved Help 14 Wisconsin Snowmobile Corp. is considering a switch to level production Cost efficiencies would occur under level production, and aftertax costs would decline by $31,500, but inventory...

5. Determine the training costs (direct costs indirect costs development costs overhead costs compensation for trainees).

1. What can be done to motivate companies to evaluate training programs?

6. Calculate the total savings by subtracting the training costs from benefits (operational results).