Question:

Now implement sqsplit, which takes as input a data set of size n × d with labels and computes the best feature and the threshold (cut) of the optimal split based on the squared-loss impurity. The function outputs a feature dimension 0 <= feature < d, a cut threshold cut, and the impurity loss bestloss of this best split.
Recall in the CART algorithm that, to find the split with the minimum impurity, you iterate over all features and cut values along each feature. We enforce that the cut value be the average of the two consecutive data points' feature values.
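You should calculate the impurity of a node of data $S$ with two branches $S_L$ and $S_R$ as:
$$I(S) = \frac{|S_L|}{|S|} I(S_L) + \frac{|S_R|}{|S|} I(S_R) = \frac{1}{|S|} \sum_{(x,y) \in S_L} (y - \bar{y}_{S_L})^2 + \frac{1}{|S|} \sum_{(x,y) \in S_R} (y - \bar{y}_{S_R})^2 \propto \sum_{(x,y) \in S_L} (y - \bar{y}_{S_L})^2 + \sum_{(x,y) \in S_R} (y - \bar{y}_{S_R})^2$$
Here $\bar{y}_{S_L}$ and $\bar{y}_{S_R}$ denote the average label in the left and right branch, respectively.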
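As a rough illustration of the impurity formula, the sketch below computes the sum of squared deviations for each branch and adds them. The helper names sq_impurity and split_impurity and the toy label lists are hypothetical, not part of the assignment; dividing the total by |S| gives the weighted form, which ranks splits of the same node identically.

```python
def sq_impurity(labels):
    """Unnormalized squared-loss impurity: sum of (y - mean)^2 over the labels."""
    if not labels:
        return 0.0
    mean = sum(labels) / len(labels)
    return sum((y - mean) ** 2 for y in labels)


def split_impurity(left_labels, right_labels):
    """Impurity of a split as the plain sum of left and right impurities."""
    return sq_impurity(left_labels) + sq_impurity(right_labels)


# Hypothetical labels for the two branches of one candidate split.
left, right = [1.0, 2.0, 3.0], [10.0, 12.0]
total = split_impurity(left, right)           # unnormalized sum
weighted = total / (len(left) + len(right))   # the 1/|S|-weighted form
print(total, weighted)
```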
Implementation Notes:
For calculating the impurity of a node, you should just return the sum of left and right impurities instead of the average.
Returned feature must be 0-indexed as is consistent with programming in Python.
If along a feature f, two data points $x_i$ and $x_j$ have the same value, avoid splitting between them; move on to the next pair of data points.
For example, with the following xTr of size 4 × 3 and yTr for 4 points:
xTr = [[1, 0, 2], [2, 0, 1], [0, 0, 1], [2, 1, 2]], yTr = [1, 1, 1, 1]
among the possible features [0, 1, 2], the best split would be at feature = 1 and cut = (0 + 1)/2 = 0.5.
If you're stuck, we recommend that you start with the naive algorithm for finding the best split, which involves a double loop over all features 0 <= f < d and all cut values xTr[0, f] < (xTr[i, f] + xTr[i+1, f])/2 < xTr[n-1, f] (with xTr sorted along feature f). This algorithm thus calculates impurities for d(n-1) splits.
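The naive double loop described above might look like the following minimal sketch. It is written in plain Python so that it runs without NumPy, and it indexes rows as xTr[i][f], which also works on a 2-D NumPy array. The helper sq_impurity is a hypothetical name, and the labels [1, 1, 1, -1] used in the usage lines are an assumption chosen so that one split is strictly best; this is one possible approach, not the expert's hidden solution.

```python
def sq_impurity(labels):
    """Unnormalized squared-loss impurity: sum of (y - mean)^2 over the labels."""
    if not labels:
        return 0.0
    mean = sum(labels) / len(labels)
    return sum((y - mean) ** 2 for y in labels)


def sqsplit(xTr, yTr):
    """Naive search for the best squared-loss split.

    Returns (feature, cut, bestloss); feature and cut stay None when no
    feature has two distinct values to split between.
    """
    n, d = len(xTr), len(xTr[0])
    feature, cut, bestloss = None, None, float("inf")
    for f in range(d):
        # Sort the points along feature f, carrying the labels with them.
        order = sorted(range(n), key=lambda i: xTr[i][f])
        xs = [xTr[i][f] for i in order]
        ys = [yTr[i] for i in order]
        for i in range(n - 1):
            # Never split between two points with the same feature value.
            if xs[i] == xs[i + 1]:
                continue
            # Sum of left and right impurities (not the weighted average).
            loss = sq_impurity(ys[:i + 1]) + sq_impurity(ys[i + 1:])
            if loss < bestloss:
                feature, cut, bestloss = f, (xs[i] + xs[i + 1]) / 2, loss
    return feature, cut, bestloss


# Usage with the 4 x 3 example matrix and assumed labels.
xTr = [[1, 0, 2], [2, 0, 1], [0, 0, 1], [2, 1, 2]]
yTr = [1, 1, 1, -1]
print(sqsplit(xTr, yTr))  # (1, 0.5, 0.0)
```

Recomputing both impurities from scratch costs O(n) per candidate split, so this sketch does O(d n^2) work after sorting. Keeping running sums of y and y^2 while sweeping the sorted points would bring each update down to constant time.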
