Question: need help in artificial intelligence assignment 1- Suppose that we want to classify patients to healthy or not healthy based on physicians' notes. For example,

need help in artificial intelligence assignment
1- Suppose that we want to classify patients to "healthy" or "not healthy" based on physicians' notes. For example, we have the following set of mini-notes, each labeled healthy (-1) or not healthy (+1). 1. (-1) negative results 2. (-1) good response 3. (+1) not good 4. (+1) negative reaction Each note x is mapped onto a feature vector (x), which maps each word to the number of occurrences of that word in the note. For example, the first note maps to the (sparse) feature vector (x)-(negative: 1, test: 1, results: 1]. Recall the definition of the hinge loss (Losshinge(x, y, w) = max(0,1-w-(x)y, where y is the correct label.) a) Suppose we run stochastic gradient descent, updating the weights according to w w-7P,,Losshinge (x, y,w), once for each of the four examples in order. After the classifier is trained on the given four data points, what are the weights of the six words ("'negative", "good", "results", "not", "response" "reaction") that appear in the above notes? Use 1 as the step size and initialize w = [0, ,0]. Assume that wLossh inge (x,y,w)-0 when the margin is exactly 1 b) Create a small labeled dataset of four mini-notes about patients test results using the words "not"', "negative", and "positive", where the labels make intuitive sense (Note that if the test results are negative, or not positive, it means that the patient is healthy). Each note should contain one or two words, and no repeated words. Prove that no linear classifier using word features can get zero error on your dataset. Propose a single additional feature that we could augment the feature vector with that would fix this problem. 1- Suppose that we want to classify patients to "healthy" or "not healthy" based on physicians' notes. For example, we have the following set of mini-notes, each labeled healthy (-1) or not healthy (+1). 1. (-1) negative results 2. (-1) good response 3. (+1) not good 4. (+1) negative reaction Each note x is mapped onto a feature vector (x), which maps each word to the number of occurrences of that word in the note. For example, the first note maps to the (sparse) feature vector (x)-(negative: 1, test: 1, results: 1]. Recall the definition of the hinge loss (Losshinge(x, y, w) = max(0,1-w-(x)y, where y is the correct label.) a) Suppose we run stochastic gradient descent, updating the weights according to w w-7P,,Losshinge (x, y,w), once for each of the four examples in order. After the classifier is trained on the given four data points, what are the weights of the six words ("'negative", "good", "results", "not", "response" "reaction") that appear in the above notes? Use 1 as the step size and initialize w = [0, ,0]. Assume that wLossh inge (x,y,w)-0 when the margin is exactly 1 b) Create a small labeled dataset of four mini-notes about patients test results using the words "not"', "negative", and "positive", where the labels make intuitive sense (Note that if the test results are negative, or not positive, it means that the patient is healthy). Each note should contain one or two words, and no repeated words. Prove that no linear classifier using word features can get zero error on your dataset. Propose a single additional feature that we could augment the feature vector with that would fix this
Step by Step Solution
There are 3 Steps involved in it
Get step-by-step solutions from verified subject matter experts
