Consider a dataset with data points each having 3 features, e.g., x 1 = { Atlanta ,

Fantastic news! We've Found the answer you've been seeking!

Question:

Consider a dataset with data points each having 3 features, e.g., x 1 = {"Atlanta", "house", 500k},

and x 2 = {"Houston", "house", 300k}. Define a proper similarity function d(x i , xj ) for this kind of data,

and argue why it is a reasonable choice. (Hint: The feature vector consists of categorial and real-valued features; for categorical variables, it is better to convert them into one-hot-keying binary vectors and use Hamming distance, and for real-valued features, you may use Euclidean distance, for instance. And then you can combine the similarity measure in some way.)