Question: Kernel on discrete sequence Thus far, we have dealt with kernel feature mappings that map real-valued inputs to some feature space. As we will see
Kernel on discrete sequence


Thus far, we have dealt with kernel feature mappings that map real-valued inputs to some feature space. As we will see in this question, it is also possible to define feature mappings on discrete objects such as sequences of tokens. This is especially useful in applications such as bioinformatics and natural language processing. Let S denote the set of strings (sequences) formed by concatenating elements of some alphabet V. For example, V might be the set {a, b, ..., x, [space}} in the case of the English alphabet, in which case S contains all strings obtained by concatenating these characters (e.g. "the", "cat", "the cat, etc.). For any 81, 82 ES, we define , if s1 = $2 (i.e. the two strings are the same) (S1, S2) = {0, otherwise. (a) (4 points) For any sequence s E S, let gn(s) denote the set of all length n sub- sequences of s. For example, 92(abcd) = {ab, bc, cd}. gnos We define the kernel function: K,(81, 82) = 2 2 8(, 2). - n Tegn (81) zgn (82) Express the kernel function using an appropriate feature mapping o(s). That is, define the feature mapping O(S) ER such that K, (S1, S2) = O($1)ols2). What is the dimension of the feature mapping space d? Hint: The description of O(s) should be brief. SVM
Step by Step Solution
There are 3 Steps involved in it
Get step-by-step solutions from verified subject matter experts
