Description
Dear Participants,
Please find below the Predictive Modelling project instructions:
You have to submit 2 files (or 1 file covering both problems):
Answer Report: In this file, submit the answers to all the questions in sequential order. Each answer should include detailed explanations and inferences. The report should not be filled with code. You will be evaluated on the business report.
Note: The business report should give a proper interpretation of every task performed, along with actionable insights. Interpreting the models alone is not sufficient for full marks in each rubric criterion, and marks will be deducted wherever inferences are not clearly stated. THE REPORT MUST BE SUBMITTED STRICTLY IN PDF/DOC FORMAT; ANY OTHER FORMAT WILL NOT BE CONSIDERED FOR GRADING. 6 marks are allotted for the "Quality of Business Report".
Jupyter Notebook file: This is mandatory and will be used for reference during evaluation. Any assignment found to be copied or plagiarized from another person will not be graded and will be marked zero. Please submit on time, as assignments submitted after the deadline will not be accepted.
Problem 1: Linear Regression
The comp-activ database is a collection of computer system activity measures. The data was collected from a Sun SPARCstation 20/712 with 128 MB of memory running in a multi-user university department. Users would typically be doing a wide variety of tasks, ranging from accessing the internet and editing files to running very CPU-bound programs.
As a budding data scientist, you want to find a linear equation that predicts 'usr' (the portion of time, in %, that the CPUs run in user mode) from a list of system attributes, and to find out how each attribute affects the time the system spends in 'usr' mode.
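The linear equation described above can be written as follows, as a minimal sketch assuming an ordinary least squares (OLS) fit over $p$ system attributes. The symbols $x_{ij}$ (value of attribute $j$ for observation $i$), $\beta_j$, $\varepsilon_i$, the design matrix $X$ (with a leading column of ones) and the vector $\mathbf{y}$ of observed `usr` values are introduced here for illustration and are not part of the original brief:

```latex
\[
\mathit{usr}_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i,
\qquad i = 1, \dots, n
\]
\[
\hat{\boldsymbol{\beta}}
= \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^{n} \bigl(\mathit{usr}_i - \mathbf{x}_i^{\top}\boldsymbol{\beta}\bigr)^2
= (X^{\top}X)^{-1} X^{\top}\mathbf{y},
\qquad \mathbf{x}_i = (1, x_{i1}, \dots, x_{ip})^{\top}
\]
```

The closed form assumes $X^{\top}X$ is invertible, i.e. no attribute is an exact linear combination of the others. Each estimated $\hat{\beta}_j$ is the expected change in `usr` (in percentage points) for a one-unit increase in attribute $j$ with all other attributes held fixed, which is how "how each attribute affects the system to be in 'usr' mode" would be read off the fitted model.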
Dataset for Problem 1: compactiv.xlsx
DATA DICTIONARY:
-----------------------
System measures used:
lread - Reads (transfers per second) between system memory and user memory
lwrite - Writes (transfers per second) between system memory and user memory
scall - Number of system calls of all types per second
sread - Number of system read calls per second
swrite - Number of system write calls per second
fork - Number of system fork calls per second
exec - Number of system exec calls per second
rchar - Number of characters transferred per second by system read calls
wchar - Number of characters transferred per second by system write calls
pgout - Number of page-out requests per second
ppgout - Number of pages paged out per second
pgfree - Number of pages per second placed on the free list
pgscan - Number of pages checked per second to see whether they can be freed
atch - Number of page attaches (satisfying a page fault by reclaiming a page in memory) per second
pgin - Number of page-in requests per second
ppgin - Number of pages paged in per second
pflt - Number of page faults caused by protection errors (copy-on-writes)
vflt - Number of page faults caused by address translation
runqsz - Process run queue size: the number of kernel threads in memory that are waiting for a CPU to run. Typically this value should be less than 2; consistently higher values mean that the system might be CPU-bound.
freemem - Number of memory pages available to user processes
freeswap - Number of disk blocks available for page swapping
------------------------
Target variable:
usr - Portion of time (%) that CPUs run in user mode
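The regression task in Problem 1 can be sketched in plain Python as below, assuming each observation is a dict keyed by the column names from the data dictionary. The functions `fit_ols`, `predict` and `r_squared` are hypothetical helpers written only for illustration: they solve the OLS normal equations directly so the mechanics are visible. In the actual notebook one would more likely load `compactiv.xlsx` with `pandas.read_excel` and fit the model with statsmodels or scikit-learn's `LinearRegression`.

```python
from typing import Dict, Sequence

Row = Dict[str, float]


def fit_ols(rows: Sequence[Row], features: Sequence[str], target: str = "usr") -> Dict[str, float]:
    """Fit target = b0 + sum_j b_j * x_j by ordinary least squares.

    Builds the normal equations (X^T X) b = X^T y and solves them with
    Gauss-Jordan elimination (partial pivoting). Returns a dict mapping
    "intercept" and each feature name to its estimated coefficient.
    """
    names = ["intercept", *features]
    k = len(names)
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for row in rows:
        x = [1.0] + [float(row[f]) for f in features]  # leading 1 for the intercept
        y = float(row[target])
        for a in range(k):
            xty[a] += x[a] * y
            for b in range(k):
                xtx[a][b] += x[a] * x[b]

    aug = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X^T X | X^T y]
    for col in range(k):
        # Pick the row with the largest entry in this column for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; drop perfectly collinear features")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every other row (Gauss-Jordan).
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, k + 1):
                    aug[r][c] -= factor * aug[col][c]
    return {name: aug[i][k] / aug[i][i] for i, name in enumerate(names)}


def predict(coefs: Dict[str, float], row: Row) -> float:
    """Predicted usr (%) for one observation."""
    return coefs["intercept"] + sum(
        b * float(row[name]) for name, b in coefs.items() if name != "intercept"
    )


def r_squared(coefs: Dict[str, float], rows: Sequence[Row], target: str = "usr") -> float:
    """Share of the variance in the target explained by the fitted equation."""
    ys = [float(r[target]) for r in rows]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(coefs, r)) ** 2 for y, r in zip(ys, rows))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy rows built so that usr = 90 - 0.5*lread + 0.01*freemem exactly;
    # these are illustrative numbers, NOT values from compactiv.xlsx.
    toy = [
        {"lread": 0, "freemem": 100, "usr": 91.0},
        {"lread": 10, "freemem": 400, "usr": 89.0},
        {"lread": 20, "freemem": 200, "usr": 82.0},
        {"lread": 30, "freemem": 800, "usr": 83.0},
        {"lread": 5, "freemem": 1000, "usr": 97.5},
    ]
    coefs = fit_ols(toy, ["lread", "freemem"])
    print(coefs, r_squared(coefs, toy))
```

Solving the normal equations directly is numerically fragile when attributes are strongly correlated or on very different scales (for example `rchar` versus `fork`), which is one reason the library routines, which rely on more stable decompositions, are preferable on the real data.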
