Question:

Note:

- Use Python to solve the question.

- Also prove the simulation you did in Question 3 with code.

==== PLEASE ANSWER Q3 ONLY ====

-Data and Text mining-

Doc1: When someone is infected with the Coronavirus, the symptoms that appear can range from fever to loss of the ability to smell and taste. However, there are also those who do not experience symptoms of COVID-19 at all which is called asymptomatic.

Doc2: The loss of the ability to smell or anosmia has been a symptom of COVID-19 for a long time. Recently, a new symptom appeared in the form of parosmia, which is a condition in which patients detect bad odors through their sense of smell.

Doc3: The new variant of Corona found in the UK is said to be more contagious and is spreading rapidly to various countries. To date, 22 countries have detected a new variant of Corona in their region.

Doc4: Indonesia is one of the largest archipelagic countries in the world. Consisting of more than 17,000 islands stretching from Sabang to Merauke, hold priceless assets of wealth. Thousands of islands are lined up to form an elongated coastline with a very attractive stretch of clean white sand. Rolling waves ranging from small waves to large waves suitable for surfing sports lovers are all available in Indonesia.

Doc5: Developing the potential for cultural and historical tourism can indeed be done by renovating buildings or historical sites and supporting facilities and infrastructure for these attractions. However, this effort cannot run optimally and sustainably if the community especially the local community does not participate and care.

Based on the data above:

Q1) Calculate the TF-IDF of each of the following terms in each document:

  • Symptoms

  • Corona

  • Virus

  • Covid-19

  • Country

  • Public / society
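For reference, one common way to define the quantities Q1 asks for is written out here. This is a sketch of a single convention, not necessarily the one the course expects: the symbols f, N and n_t are introduced for illustration, and other variants (base-10 logarithm, smoothed IDF, raw counts for TF) give different numbers.

```latex
% f_{t,d}: number of times term t occurs in document d
% N: number of documents (here N = 5)
% n_t: number of documents that contain term t
\mathrm{tf}(t,d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}}, \qquad
\mathrm{idf}(t) = \ln\frac{N}{n_t}, \qquad
\mathrm{tfidf}(t,d) = \mathrm{tf}(t,d)\cdot\mathrm{idf}(t)
```

Under this convention a term that occurs in all five documents has idf = ln(5/5) = 0, so its TF-IDF is 0 everywhere. A term that occurs in no document has n_t = 0, which leaves idf undefined; since its tf is also 0, its TF-IDF is usually taken to be 0.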

Q2) If someone runs a query with the keywords "Symptoms of Covid-19", only Doc1 and Doc2 are relevant. If the query is "Corona Variant", only Doc1 and Doc3 are relevant. Prove this using TF-IDF and cosine similarity.
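The cosine similarity step in Q2 can be sketched as a small helper that compares two TF-IDF vectors stored as dictionaries. The function name and the weights in the example are hypothetical and only illustrate the calculation; they are not computed from Doc1 to Doc5.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0 or norm_v == 0:
        # An all-zero vector has no direction; treat it as not similar.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical TF-IDF weights, for illustration only.
query = {"symptoms": 0.8, "covid-19": 0.6}
doc_a = {"symptoms": 0.4, "covid-19": 0.3, "fever": 0.5}
doc_b = {"islands": 0.7, "waves": 0.7}

print("query vs doc_a:", cosine_similarity(query, doc_a))
print("query vs doc_b:", cosine_similarity(query, doc_b))
```

A document that shares no term with the query scores 0, and a document is judged relevant when its score is clearly above 0. On the real documents, very common words such as "of" would need to be removed first, otherwise they give every document a small non-zero score.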

Q3)

a) Apart from being a text representation method, TF-IDF can also be used to extract topics from individual documents. What are the five keywords that represent each of the documents above?
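One way to approach Q3(a) is to score every term in a document by TF-IDF and keep the five highest. The sketch below uses only the standard library and makes several assumptions that are not in the question: lowercasing, a regex tokenizer that keeps "covid-19" as one token, relative term frequency, an unsmoothed natural-log IDF, no stemming, and alphabetical order to break ties. The helper names are hypothetical.

```python
import math
import re
from collections import Counter

# The five documents from the question, copied verbatim.
DOCS = {
    "Doc1": "When someone is infected with the Coronavirus, the symptoms that appear can range from fever to loss of the ability to smell and taste. However, there are also those who do not experience symptoms of COVID-19 at all which is called asymptomatic.",
    "Doc2": "The loss of the ability to smell or anosmia has been a symptom of COVID-19 for a long time. Recently, a new symptom appeared in the form of parosmia, which is a condition in which patients detect bad odors through their sense of smell.",
    "Doc3": "The new variant of Corona found in the UK is said to be more contagious and is spreading rapidly to various countries. To date, 22 countries have detected a new variant of Corona in their region.",
    "Doc4": "Indonesia is one of the largest archipelagic countries in the world. Consisting of more than 17,000 islands stretching from Sabang to Merauke, hold priceless assets of wealth. Thousands of islands are lined up to form an elongated coastline with a very attractive stretch of clean white sand. Rolling waves ranging from small waves to large waves suitable for surfing sports lovers are all available in Indonesia.",
    "Doc5": "Developing the potential for cultural and historical tourism can indeed be done by renovating buildings or historical sites and supporting facilities and infrastructure for these attractions. However, this effort cannot run optimally and sustainably if the community especially the local community does not participate and care.",
}


def tokenize(text):
    """Lowercase and split into word tokens; hyphenated tokens such as 'covid-19' stay whole."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def tfidf_per_doc(docs):
    """Return {doc_name: {term: tf-idf}} using tf = count / doc length and idf = ln(N / df)."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokens)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokens.values() for term in set(toks))
    scores = {}
    for name, toks in tokens.items():
        counts = Counter(toks)
        scores[name] = {
            term: (count / len(toks)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return scores


def top_keywords(scores, k=5):
    """Pick the k highest-scoring terms per document; ties are broken alphabetically."""
    return {
        name: [term for term, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for name, s in scores.items()
    }


KEYWORDS = top_keywords(tfidf_per_doc(DOCS))
for name, words in KEYWORDS.items():
    print(name, words)
```

Because the documents are so short, many terms occur once in a single document and therefore tie on score, so the lower-ranked keywords depend on the arbitrary tie-break. Function words that repeat inside one document can also rank highly. Removing stopwords or merging word forms (for example "symptom" and "symptoms") before scoring would likely give cleaner keywords.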

b) Explain how TF-IDF can be used to filter stopwords, and show screenshots of your simulation as evidence.
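A possible simulation for Q3(b): stopwords occur in most documents, so their IDF (and therefore their TF-IDF) is close to zero even though their raw frequency is high. The sketch below flags low-IDF terms as stopword candidates. The tokenizer, the unsmoothed natural-log IDF, and the threshold of 0.6 are assumptions chosen for this five-document corpus, not values given in the question.

```python
import math
import re
from collections import Counter

# The five documents from the question, copied verbatim (Doc1 to Doc5 in order).
DOCS = [
    "When someone is infected with the Coronavirus, the symptoms that appear can range from fever to loss of the ability to smell and taste. However, there are also those who do not experience symptoms of COVID-19 at all which is called asymptomatic.",
    "The loss of the ability to smell or anosmia has been a symptom of COVID-19 for a long time. Recently, a new symptom appeared in the form of parosmia, which is a condition in which patients detect bad odors through their sense of smell.",
    "The new variant of Corona found in the UK is said to be more contagious and is spreading rapidly to various countries. To date, 22 countries have detected a new variant of Corona in their region.",
    "Indonesia is one of the largest archipelagic countries in the world. Consisting of more than 17,000 islands stretching from Sabang to Merauke, hold priceless assets of wealth. Thousands of islands are lined up to form an elongated coastline with a very attractive stretch of clean white sand. Rolling waves ranging from small waves to large waves suitable for surfing sports lovers are all available in Indonesia.",
    "Developing the potential for cultural and historical tourism can indeed be done by renovating buildings or historical sites and supporting facilities and infrastructure for these attractions. However, this effort cannot run optimally and sustainably if the community especially the local community does not participate and care.",
]


def tokenize(text):
    """Lowercase and split into word tokens; hyphenated tokens such as 'covid-19' stay whole."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


tokenized = [tokenize(doc) for doc in DOCS]
n_docs = len(tokenized)

# Document frequency and unsmoothed IDF for every term in the corpus.
df = Counter(term for toks in tokenized for term in set(toks))
idf = {term: math.log(n_docs / count) for term, count in df.items()}

# Raw corpus-wide counts, to contrast with the IDF-based view.
corpus_tf = Counter(term for toks in tokenized for term in toks)

# Assumed cut-off: ln(5/3) ~ 0.51 < 0.6 < ln(5/2) ~ 0.92, so a term is flagged
# when it appears in at least three of the five documents.
IDF_THRESHOLD = 0.6
STOPWORD_CANDIDATES = sorted(term for term, value in idf.items() if value < IDF_THRESHOLD)


def filter_stopwords(tokens):
    """Drop tokens whose IDF falls below the threshold; unseen tokens are kept."""
    return [t for t in tokens if idf.get(t, float("inf")) >= IDF_THRESHOLD]


print("Most frequent raw terms:", corpus_tf.most_common(8))
print("Low-IDF stopword candidates:", STOPWORD_CANDIDATES)
print("Doc1 after filtering:", filter_stopwords(tokenized[0]))
```

With this tokenizer and threshold, the flagged terms are "a", "and", "for", "in", "is", "of", "the" and "to", while content words such as "symptoms" or "indonesia" are kept. The printed output of this script can serve as the simulation evidence. On a corpus this small the threshold is fragile, so it should be re-tuned for any other collection.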
