Question:

(a) Please identify the appropriate data transformation methods for the following situations.
Briefly explain your answers:
[4] Consider a dataset containing information about student performance in two
subjects: Math and English. The Math scores range from 35 to 200 (mean = 88,
standard deviation = 18), while the English scores range from 84 to 112 (mean =
98.6, standard deviation = 0.95).
For each feature, apply normalization (so that the transformed data satisfies x' ∈ [0, 1]) and
calculate the new mean and new standard deviation of the normalized feature.
Compare their means and standard deviations. Then,
for each feature, apply standardization, show the range of the transformed
data, and compare the ranges.
[4] During the design of an artificial neural network, we sometimes need to transform
a variable x that has a range of (-∞, ∞) to an open interval z ∈ (-1, 1). Note that
z increases monotonically as x increases in this transformation. Please specify a
suitable function for this transformation.
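Because min-max normalization and standardization are both linear maps, the quantities asked for in part (a) can be derived from the stated summary statistics alone. The following is a minimal sketch under that assumption; the helper names `minmax_stats`, `zscore_range`, and `squash` are hypothetical, and `tanh` is shown as only one candidate for a monotonic map from the real line into (-1, 1), not necessarily the intended answer.

```python
import math


def minmax_stats(lo, hi, mean, std):
    """Mean and std of x' = (x - lo) / (hi - lo), given the stats of x."""
    span = hi - lo
    # A linear map shifts and scales the mean, and only scales the std.
    return (mean - lo) / span, std / span


def zscore_range(lo, hi, mean, std):
    """Range of z = (x - mean) / std, given the range of x."""
    return (lo - mean) / std, (hi - mean) / std


def squash(x):
    """One candidate map from (-inf, inf) to (-1, 1): tanh is strictly increasing."""
    return math.tanh(x)


# Summary statistics exactly as stated in the question.
features = {
    "Math": dict(lo=35, hi=200, mean=88, std=18),
    "English": dict(lo=84, hi=112, mean=98.6, std=0.95),
}

for name, stats in features.items():
    new_mean, new_std = minmax_stats(**stats)
    z_lo, z_hi = zscore_range(**stats)
    print(f"{name}: normalized mean={new_mean:.4f}, std={new_std:.4f}; "
          f"standardized range=[{z_lo:.2f}, {z_hi:.2f}]")
```

After normalization both features share the range [0, 1] but keep different means and standard deviations; after standardization both have mean 0 and standard deviation 1 but different ranges. Note that in floating-point arithmetic `math.tanh` rounds to exactly ±1.0 for large |x|, even though the mathematical function never reaches those values.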
(b) In natural language processing (NLP), there are diverse ways to represent words, such
as one-hot encoding, bag of words, TF*IDF, and distributed word representations. In
one-hot encoding, a bit vector whose length is the size of the vocabulary is
created, where only the bit associated with the word is on (i.e., 1) while all other bits are off (i.e.,
0). Here is a toy example: suppose there is a 5-dimensional feature vector to represent
a vocabulary of five words: [king, queen, man, woman, power]. In this case, 'king' is
encoded as [1, 0, 0, 0, 0], 'queen' is encoded as [0, 1, 0, 0, 0], etc. Due to the nature of this
representation, the feature vector encodes the vocabulary of a sentence where all words
are equally distant. On the other hand, in distributed word vectors, a real-valued
vector whose length is defined by some common properties of words is created, and
each word can then be represented as a linear combination of the defined properties. Using
the toy example above, given a 3-dimensional feature vector of [man, woman, power] as
the common properties, words such as 'king', 'queen', 'man', and 'woman' could be
encoded as [0.98, 0.1, 0.8], [0, 0.99, 0.85], [0.9, 0, 0.5], and [0, 0.97, 0.5], respectively.
In this case, if you subtract the vector of 'man' from the vector of 'king' and add the vector
of 'woman', you get a vector close to the vector of 'queen'.
[4] What is a major advantage/disadvantage of one-hot encoding as compared to
distributed word vectors? Briefly justify your answer.
[4] What is a major advantage/disadvantage of distributed word vectors as compared
to one-hot encoding? Briefly justify your answer.
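The two claims in the description of part (b), that one-hot vectors are all equally distant and that 'king' minus 'man' plus 'woman' lands near 'queen', can be checked numerically with the toy vectors given in the question. This is a minimal sketch; the choice of Euclidean distance and the names `distance`, `one_hot`, `dense`, and `nearest` are assumptions made for illustration.

```python
import math


def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# One-hot encoding over the five-word vocabulary from the question.
vocab = ["king", "queen", "man", "woman", "power"]
one_hot = {
    word: [1 if i == j else 0 for j in range(len(vocab))]
    for i, word in enumerate(vocab)
}

# Every pair of distinct one-hot vectors is the same distance apart.
one_hot_distances = {
    distance(one_hot[a], one_hot[b]) for a in vocab for b in vocab if a != b
}
print("distinct one-hot distances:", one_hot_distances)

# Distributed vectors over the properties [man, woman, power], as given.
dense = {
    "king": [0.98, 0.1, 0.8],
    "queen": [0, 0.99, 0.85],
    "man": [0.9, 0, 0.5],
    "woman": [0, 0.97, 0.5],
}

# king - man + woman, computed component by component.
analogy = [k - m + w for k, m, w in zip(dense["king"], dense["man"], dense["woman"])]
nearest = min(dense, key=lambda word: distance(dense[word], analogy))
print("king - man + woman =", [round(c, 2) for c in analogy])
print("nearest word:", nearest)
```

With one-hot vectors the set of pairwise distances collapses to a single value, so no similarity between words is captured, while the distributed vectors place the analogy result closest to 'queen'.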