Question:
1. For this assignment, we will write code that implements a basic data processing pipeline with the following steps:

- Taking data from a file and cleaning it.
- Modifying this data.
- Creating a predictor of future values based on this data.
- Plotting the data, the predictor and its prediction.

2. The dataset you will be using is created from the USD to TL exchange rate for the last 5 years starting from a given date, namely 01.01.2015, with the following modifications to make things more interesting:

- Some points are corrupted and replaced with random values, which are called noise.
- All given points are deformed beyond recognition with some transformations.

In other words, the data provided to you is just a very noisy 2D point cloud with X and Y values (columns). You are required to clean the noise and reverse all the deformations on the data before using it.

Note that your submission will be tested with another dataset. Therefore, in your code, you should not make any assumptions based on the current dataset. You must implement your code for the general case as outlined below.

3. Important Notes

- For this assignment you will implement the functions with empty bodies (read_and_filter, fix_deformation, fit_and_predict and plot). The test function is just to show you an example run of the application. You can change it as you like.
- Do not change the signatures (function names and parameters) of the functions, and be careful with their return types.
- You should implement the functions in the given order.

4. Explanations of the Functions

In this section, we give a detailed description of every function (arguments, return type and its task) that appears in the problem.

4.1 read_and_filter(filename, filter_limit)

It takes a str argument called filename (a CSV file) and a float argument called filter_limit. It returns a Pandas DataFrame. It should read the CSV file with the given filename. The retrieved data should have these two columns in this order: X and Y.
Furthermore, this function should remove any row from the dataset whose Y value is bigger than the given filter_limit. Lastly, it should return the filtered dataset.

4.2 fix_deformation(dataframe)

This function takes the dataframe (a Pandas DataFrame) that is the output of the read_and_filter function. It should return a NumPy array with two columns. The first column will store the X values and the second column will store the Y values. The following deformations were performed on the original data in this order:

- Update X values by -1500.
- Scale X values by 0.01 (shrink 100 times).
- Rotate points 45 degrees clockwise around the origin.
- Update Y values by 5.
- Scale Y values by 0.5.

In fix_deformation(), you should reverse all these deformations by applying the inverses of these transformations in the reverse order. The inversion should proceed like this: invert the scaling of Y by 0.5; invert the update of Y by 5; invert the rotation of points by 45 degrees clockwise around the origin; and so on. You will find the explanations of these transformations below.

4.2.1 Updating X or Y values

Here, you only shift the values by the given amount on the given axis. For example, updating the X value of point (1,0) by 100 will yield point (101,0). The inverse of this transformation is updating the point in the opposite direction, so here the inverse operation is updating the X value by -100.

4.2.2 Scaling X or Y values

For this transformation, you need to scale the values by the given factor on the given axis. For example, scaling the point (0.5,0) on the X axis by 7 will result in (3.5,0). Its inverse is scaling on the X axis by 1/7.

4.2.3 Rotation

Rotation is a little more complex than the others since the resulting values depend on both X and Y values.
We can rotate a point $(X, Y)$ counter-clockwise around the origin by an angle $\theta$ to obtain the new point $(X', Y')$ as follows:

$X' = X \cos\theta - Y \sin\theta$, $Y' = X \sin\theta + Y \cos\theta$.

To invert a rotation, rotate in the reverse direction. For example, the inverse of rotating 27 degrees counter-clockwise is rotating 27 degrees clockwise, that is, -27 degrees counter-clockwise. A crucial point is that the arguments of trigonometric functions in NumPy are in radians, not degrees. So, you should convert degrees ($d$) into radians ($r$) as follows: $r = d \cdot \pi / 180$.

4.2.4 Imprecisions due to floating-point precision loss

At this point, after fixing every deformation, we are very close to the original data. X is the number of days passed since 01.01.2015 and Y is the exchange rate on that day. There is just one little thing we should fix before returning the result. If you inspect the days (X values), you'll notice that they have very small decimal parts (maybe even less than $10^{-6}$). This is due to the limitations of floating-point numbers and the imprecisions that arise from them. Since days with a decimal part do not make any sense, you need to round the days to the closest integer value before returning. Note that the points in the dataset you are given are not equally spaced, so you can see sequences like (0,1,4) in X after you have fixed the deformations.

4.3 fit_and_predict(dataset, day)

This function takes a NumPy array (dataset; see the Solution Template for how this is linked to the other functions) and an int (the day of the wanted prediction). It returns a tuple with 3 elements, all of which are float. The aim of this function is to find a line which best describes the given dataset, so that we can make some predictions about future values. In other words, we are interested in finding the following function: $y = f(x) = \alpha + \beta x$. With this function, we can predict the y value given an x value.
Finding this linear function can be formulated as finding the $\alpha$ and $\beta$ values that minimize the following error: $\sum_{i=1}^{n} [y_i - (\alpha + \beta x_i)]^2$. The minimizing values of $\alpha$ and $\beta$ are as follows:

$\alpha = \bar{y} - \beta \bar{x}$, $\beta = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$,

where $\bar{x}$ and $\bar{y}$ are respectively the averages (means) of the x and y values. With these estimated $\alpha$ and $\beta$ values, our model (predictor) becomes: $y = \alpha + \beta x$. The return value should be the tuple ($\alpha$, $\beta$, predicted value $y$).

4.4 plot(dataset, alpha, beta, prediction, day)

This function takes the dataset output of fix_deformation(); the $\alpha$, $\beta$ and prediction outputs of the fit_and_predict() function; and the day of the prediction. It does not return anything. Instead, it draws the dataset points, the line of the model using $\alpha$ and $\beta$, and the prediction point. When plotting:

- The label of the X-axis should be Day.
- The label of the Y-axis should be USD.
- The title of the plot should be Exchange Rate.
- The line defined by the alpha and beta arguments should start at x = 0 and extend to the prediction point, including both end-points.

You can find the example output under the test() section below.

Please complete this code with Python. The task will require the pandas, numpy and matplotlib modules.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

def read_and_filter(filename, filter_limit):
    #TODO Implement
    pass # pass is a placeholder statement that does nothing, you can remove after your implementation is complete.

def fix_deformation(dataframe):
    #TODO Step 1: Invert Scaling Y by 0.5
    #TODO Step 2: Invert Updating Y by 5
    #TODO Step 3: Invert Rotating points 45 degrees clockwise
    #TODO Step 4: Invert Scaling X by 0.01
    #TODO Step 5: Invert Updating X by -1500
    #TODO Step 6: Round Days
    pass # pass is a placeholder statement that does nothing, you can remove after your implementation is complete.

def fit_and_predict(dataset, day):
    #TODO Implement
    pass # pass is a placeholder statement that does nothing, you can remove after your implementation is complete.
def plot(dataset, alpha, beta, prediction, day):
    #TODO Implement
    pass # pass is a placeholder statement that does nothing, you can remove after your implementation is complete.

def test():
    df = read_and_filter("data.csv", 11)
    ds = fix_deformation(df)
    a, b, p = fit_and_predict(ds, 2500)
    plot(ds, a, b, p, 2500)

# Call the test function
test()
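For read_and_filter, one possible implementation is sketched below. It is only a sketch under two assumptions that the assignment text does not confirm: the CSV file has a header row naming the columns X and Y, and "bigger than" means strictly greater, so rows whose Y equals filter_limit are kept.

```python
import pandas as pd


def read_and_filter(filename, filter_limit):
    """Read a CSV with X and Y columns and drop rows whose Y exceeds filter_limit."""
    # Assumption: the file has a header row with columns named X and Y.
    df = pd.read_csv(filename)
    # Keep only the two required columns, in the required order.
    df = df[["X", "Y"]]
    # Assumption: "bigger than" is strict, so Y == filter_limit is kept.
    # Rows with a missing Y are dropped too, because NaN <= limit is False.
    return df[df["Y"] <= filter_limit].reset_index(drop=True)
```

Resetting the index is a convenience choice, not something the assignment asks for; it keeps the row labels contiguous after filtering.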
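The six TODO steps of fix_deformation can be sketched as follows. This is a minimal illustration rather than a reference solution: each listed deformation is undone in reverse order, the clockwise rotation is inverted with the counter-clockwise formula from the Rotation section, and the days are rounded at the end. Reading the columns with np.asarray is a choice made here so that the function also accepts other column-indexable inputs.

```python
import numpy as np


def fix_deformation(dataframe):
    """Undo the five deformations in reverse order and return an (n, 2) array."""
    x = np.asarray(dataframe["X"], dtype=float)
    y = np.asarray(dataframe["Y"], dtype=float)

    y = y / 0.5   # Step 1: invert scaling Y by 0.5
    y = y - 5     # Step 2: invert updating Y by 5

    # Step 3: invert the 45-degree clockwise rotation by rotating
    # 45 degrees counter-clockwise (NumPy expects radians).
    theta = np.deg2rad(45)
    x, y = (x * np.cos(theta) - y * np.sin(theta),
            x * np.sin(theta) + y * np.cos(theta))

    x = x / 0.01     # Step 4: invert scaling X by 0.01
    x = x + 1500     # Step 5: invert updating X by -1500
    x = np.round(x)  # Step 6: days must be whole numbers

    # First column holds X (days), second column holds Y (exchange rate).
    return np.column_stack((x, y))
```

The tuple assignment in step 3 matters: both new coordinates must be computed from the old x and y, so updating x first and then reusing it for y would give a wrong rotation.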
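The least-squares formulas for fit_and_predict translate almost directly into NumPy. The sketch below assumes the dataset is the two-column array produced by fix_deformation (X in column 0, Y in column 1) and converts the results to plain Python floats to match the stated return type.

```python
import numpy as np


def fit_and_predict(dataset, day):
    """Fit y = alpha + beta * x by least squares and predict y at x = day."""
    x = dataset[:, 0]
    y = dataset[:, 1]
    x_mean = x.mean()
    y_mean = y.mean()
    # beta = sum((x_i - x_mean) * (y_i - y_mean)) / sum((x_i - x_mean)^2)
    beta = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # alpha = y_mean - beta * x_mean
    alpha = y_mean - beta * x_mean
    prediction = alpha + beta * day
    return float(alpha), float(beta), float(prediction)
```

As a quick check, points lying exactly on y = 1 + 2x should give back alpha = 1 and beta = 2, so a prediction for day 10 would be 21.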
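A possible plot function is sketched below. The axis labels, title and line end-points follow the requirements listed for plot; the colors, marker size and legend are illustrative choices made here and are not specified by the assignment.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot(dataset, alpha, beta, prediction, day):
    """Draw the data points, the fitted line from x = 0 to day, and the prediction."""
    # Data points (marker size and label are illustrative choices).
    plt.scatter(dataset[:, 0], dataset[:, 1], s=5, label="Data")
    # The model line runs from x = 0 to the prediction day, both end-points included.
    line_x = np.array([0, day])
    plt.plot(line_x, alpha + beta * line_x, color="red", label="Model")
    # The predicted value, drawn on top of the line.
    plt.scatter([day], [prediction], color="green", zorder=3, label="Prediction")
    plt.xlabel("Day")
    plt.ylabel("USD")
    plt.title("Exchange Rate")
    plt.legend()
    plt.show()
```

Two x values are enough to draw the line because the model is linear; matplotlib connects the end-points with a straight segment.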
