Q3. Add dialogue context data and features
Adjust create_character_document_from_dataframe and the other functions appropriately so the data incorporates the context of the line spoken by the characters in terms of the lines spoken by other characters in the same scene (immediately before and after). You can also use scene information from the other columns (but NOT the gender and character names directly).
The code to be changed is:
from sklearn.feature_extraction import DictVectorizer

corpusVectorizer = DictVectorizer()  # corpusVectorizer which will just produce sparse vectors from feature dicts
# Any matrix transformers (e.g. tf-idf transformers) should be initialized here

def create_document_matrix_from_corpus(corpus, fitting=False):
    """Method which fits different vectorizers
    on data and returns a matrix.

    Currently just does simple conversion to matrix by vectorizing the dictionary. Improve this for Q3.
    ::corpus:: a list of (class_label, document) pairs.
    ::fitting:: a boolean indicating whether to fit/train the vectorizers (should be true on training data)
    """
    # uses the global variable of the corpus Vectorizer to improve things
    if fitting:
        corpusVectorizer.fit([to_feature_vector_dictionary(doc) for name, doc in corpus])
    doc_feature_matrix = corpusVectorizer.transform([to_feature_vector_dictionary(doc) for name, doc in corpus])

    #training_feature_matrix[0].toarray()
    return doc_feature_matrix

training_feature_matrix = create_document_matrix_from_corpus(training_corpus, fitting=True)
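One possible way to approach Q3 is sketched below. It is not the assignment's reference solution. It assumes the dataframe has columns named `Episode`, `Scene`, `Character_name` and `Line` (the question does not list the column names, so adjust these to your data), and it uses hypothetical helper names (`tag_tokens`, `create_character_document_with_context`, `simple_feature_dict`, `create_tfidf_matrix_from_corpus`) rather than the originals. The idea is to append the line immediately before and after each line, when spoken by a different character in the same scene, with `PREV_` / `NEXT_` prefixes so context words stay separate from the character's own words. Character names are only used to decide who spoke the neighbouring line and never become features.

```python
from collections import Counter

import pandas as pd
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import TfidfTransformer


def tag_tokens(line, prefix):
    """Prefix every whitespace token so context words stay distinct from own words."""
    return " ".join(prefix + tok for tok in str(line).lower().split())


def create_character_document_with_context(df, max_line_count=None):
    """Return {character: document}; each line is followed by its tagged context."""
    docs, counts = {}, {}
    # Group by episode and scene so context never crosses a scene boundary.
    # Column names here are assumptions about the dataframe layout.
    for _, scene in df.groupby(["Episode", "Scene"], sort=False):
        rows = scene.reset_index(drop=True)
        for i in range(len(rows)):
            name = rows.loc[i, "Character_name"]
            if max_line_count is not None and counts.get(name, 0) >= max_line_count:
                continue
            counts[name] = counts.get(name, 0) + 1
            parts = [str(rows.loc[i, "Line"]).lower()]
            # Line immediately before, only if spoken by a different character.
            if i > 0 and rows.loc[i - 1, "Character_name"] != name:
                parts.append(tag_tokens(rows.loc[i - 1, "Line"], "PREV_"))
            # Line immediately after, only if spoken by a different character.
            if i + 1 < len(rows) and rows.loc[i + 1, "Character_name"] != name:
                parts.append(tag_tokens(rows.loc[i + 1, "Line"], "NEXT_"))
            docs.setdefault(name, []).append(" ".join(parts))
    return {name: " _EOL_ ".join(lines) for name, lines in docs.items()}


def simple_feature_dict(doc):
    """Stand-in for to_feature_vector_dictionary: plain token counts."""
    return dict(Counter(doc.split()))


corpus_vectorizer = DictVectorizer()
tfidf_transformer = TfidfTransformer()  # matrix transformer initialised alongside the vectorizer


def create_tfidf_matrix_from_corpus(corpus, fitting=False):
    """Vectorize (label, document) pairs, then re-weight the counts with tf-idf."""
    dicts = [simple_feature_dict(doc) for _, doc in corpus]
    if fitting:
        counts_matrix = corpus_vectorizer.fit_transform(dicts)
        return tfidf_transformer.fit_transform(counts_matrix)
    return tfidf_transformer.transform(corpus_vectorizer.transform(dicts))


# Tiny made-up dataframe, purely for illustration.
df = pd.DataFrame({
    "Episode": [1, 1, 1, 1],
    "Scene": [1, 1, 1, 2],
    "Character_name": ["A", "B", "A", "B"],
    "Line": ["Hello there", "Go away", "Fine then", "Alone now"],
})
docs = create_character_document_with_context(df)
corpus = list(docs.items())
matrix = create_tfidf_matrix_from_corpus(corpus, fitting=True)
```

On held-out data you would call the same function with `fitting=False`, so that the vocabulary and idf weights learned on the training corpus are reused. Whether prefixed context tokens actually help should be checked on the validation set; scene information from other columns could be added to each document in the same tagged way.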
