Recall that in lectures we showed that logistic regression for binary classification boils down to solving the following optimization problem (minimizing the training error) over n training samples:

$$f(w) = \sum_{i=1}^{n} \log\left(1 + e^{-y_i w^T x_i}\right)$$

a) Compute the gradient of f(w).
b) Write pseudocode for using gradient descent (GD) to optimize f(w).
c) Argue that
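For parts a) and b), a minimal sketch may help make the objects concrete. Differentiating term by term gives

$$\nabla f(w) = -\sum_{i=1}^{n} \frac{y_i x_i}{1 + e^{y_i w^T x_i}}$$

The Python below evaluates f(w) and this gradient and runs plain gradient descent with a fixed step size. It assumes labels y_i in {-1, +1}, which the form of the loss suggests but the problem does not state. The function names, the step size, and the iteration count are illustrative choices, not values from the lectures.

```python
import math


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def sigmoid(z):
    """Logistic function 1 / (1 + e^{-z}), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def loss(w, X, y):
    """f(w) = sum_i log(1 + exp(-y_i * w^T x_i)), labels assumed in {-1, +1}."""
    total = 0.0
    for xi, yi in zip(X, y):
        m = yi * dot(w, xi)
        # Stable form of log(1 + e^{-m}).
        total += max(-m, 0.0) + math.log1p(math.exp(-abs(m)))
    return total


def gradient(w, X, y):
    """grad f(w) = -sum_i y_i * x_i / (1 + exp(y_i * w^T x_i))."""
    g = [0.0] * len(w)
    for xi, yi in zip(X, y):
        m = yi * dot(w, xi)
        coeff = -yi * sigmoid(-m)  # sigmoid(-m) = 1 / (1 + e^{m})
        for j, xj in enumerate(xi):
            g[j] += coeff * xj
    return g


def gradient_descent(X, y, step=0.1, iters=200):
    """Plain GD from w = 0; step and iters are assumed, not prescribed."""
    w = [0.0] * len(X[0])
    for _ in range(iters):
        g = gradient(w, X, y)
        w = [wj - step * gj for wj, gj in zip(w, g)]
    return w


# Tiny made-up dataset, for illustration only.
X = [[1.0, 2.0], [2.0, -1.0], [-1.0, -1.5], [0.5, 0.5]]
y = [1, -1, -1, 1]
w_hat = gradient_descent(X, y)
```

A fixed step size is the simplest choice. In practice it would need to be small enough for the iterates to decrease f(w), or be replaced by a line search.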

