Question: Consider a dataset with data points each having 3 features, e . g . , x 1 = { Atlanta , house, 5

Consider a dataset with data points each having 3 features, e.g., x1={"Atlanta", "house", 500k}, and x2={"Houston","house",300k}. Define a proper similarity function d(xi, xj) for this kind of data, and argue why it is a reasonable choice. (Hint: The feature vector consists of categorial and real-valued features; for categorical variables, it is better to convert them into one-hot-keying binary vectors and
use Hamming distance, and for real-valued features, you may use Euclidean distance, for instance. And then you can combine the similarity measure in some way.)

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!