---
title: 'Twitter Text Mining'
author:
- FirstName LastName
date: "`r format(Sys.time(), '%d %B %Y')`"
output:
  html_document: default
  pdf_document: default
---

# Problem

The aim of this assignment is to develop, through text mining, a better understanding of how information about LLBean and its products is disseminated on Twitter. The results are expected to provide valuable input for decision making in social media marketing.
# Data

## Step 1: Data Collection and Retrieval

```{r message=FALSE, warning=FALSE, results='hide'}
# install and load all the needed R packages
# install.packages(c("tidyverse", "tidytext", "wordcloud", "igraph", "ggraph", "tidygraph"), repos='http://cran.us.r-project.org')
library(tidyverse)
library(tidytext)
library(wordcloud)
library(igraph)
library(ggraph)
library(tidygraph)
```
```{r message=FALSE, warning=FALSE}
# import the Twitter data and keep only English-language tweets
LLBean_complete <- read_csv("https://filedn.com/lJpzjOtA91quQEpwdrgCvcy/Business%20Data%20Mining%20and%20Knowledge%20Discovery/Datasets/LLBean.csv")
LLBean_complete <- LLBean_complete %>%
  filter(language == "en") %>%
  select(-language)
```
Below, please write all your answers in **bold** font.
*Problem 1: In the following chunk, use a simple R command to count the total number of tweet messages contained in the LLBean dataset.*

```{r message=FALSE, warning=FALSE}
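# one possible answer (a sketch, not the official solution): each row of
# LLBean_complete holds one tweet, so counting rows gives the total number
# of tweet messages
nrow(LLBean_complete)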
```
**Because this Twitter data is too big to be processed on RStudio Cloud, we randomly select a small subset and continue our assignment with the subset**
```{r message=FALSE, warning=FALSE}
LLBean <- LLBean_complete %>% sample_n(10000)
```
## Step 2: Cleaning and Parsing

### Text Cleaning

```{r message=FALSE, warning=FALSE}
LLBean$tweet_clean <- iconv(LLBean$tweet, from="UTF-8", to="ASCII", sub="")      # drop non-ASCII characters (emoji etc.)
LLBean$tweet_clean <- gsub("https\\S*", "", LLBean$tweet_clean)                  # remove URLs
LLBean$tweet_clean <- gsub("?\\$\\w+ ?", "", LLBean$tweet_clean)                 # remove cashtags ($XYZ)
LLBean$tweet_clean <- gsub("?\\#\\w+ ?", "", LLBean$tweet_clean)                 # remove hashtags
LLBean$tweet_clean <- gsub("?\\@\\w+ ?", "", LLBean$tweet_clean)                 # remove @mentions
LLBean$tweet_clean <- gsub("amp", "", LLBean$tweet_clean)                        # strip "amp" left over from HTML "&amp;"
LLBean$tweet_clean <- gsub("[\r\n]", "", LLBean$tweet_clean)                     # remove line breaks
LLBean$tweet_clean <- gsub("[[:punct:]]", "", LLBean$tweet_clean)                # remove punctuation
LLBean$tweet_clean <- gsub('[[:digit:]]+', "", LLBean$tweet_clean)               # remove digits
LLBean$tweet_clean <- gsub("(RT|via)((?:\\b\\w*@\\w+)+)", "", LLBean$tweet_clean) # remove retweet markers
LLBean$tweet_clean <- trimws(gsub("\\s+", " ", LLBean$tweet_clean))              # collapse and trim whitespace
```
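To see what these substitutions do, here is a minimal sketch on a made-up tweet (the example string is hypothetical, not from the dataset, and the hashtag/mention patterns are slightly simplified versions of the ones above):

```{r message=FALSE, warning=FALSE}
# hypothetical example tweet to illustrate the cleaning steps
example <- "RT @user: Check out #LLBean boots! https://t.co/abc123 &amp; save 20%"
example <- gsub("https\\S*", "", example)     # drop the URL
example <- gsub("#\\w+ ?", "", example)       # drop the hashtag (simplified pattern)
example <- gsub("@\\w+ ?", "", example)       # drop the @mention (simplified pattern)
example <- gsub("amp", "", example)           # drop "amp" left from &amp;
example <- gsub("[[:punct:]]", "", example)   # drop punctuation
example <- gsub("[[:digit:]]+", "", example)  # drop digits
trimws(gsub("\\s+", " ", example))            # collapse whitespace: "RT Check out boots save"
```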
*Problem 2: Open the "LLBean" dataset in the "Environment" panel (click the data name in the panel), compare the column "tweet" with the column "tweet_clean", and then summarize how the R code above cleans the tweet messages.*
**Your answer:( )**
### Text Parsing (creating the tidy data)

```{r message=FALSE, warning=FALSE}
# put the cleaned tweets into a one-row-per-tweet tibble
LLBean_tibble <- tibble(line = 1:nrow(LLBean), text = LLBean$tweet_clean)
LLBean_tibble

# split each tweet into one token (word) per row
LLBean_tidy <- LLBean_tibble %>% unnest_tokens(word, text)
LLBean_tidy
```
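`unnest_tokens()` lowercases the text, strips remaining punctuation, and keeps the other columns alongside each token. A toy sketch (the two sentences are made up for illustration):

```{r message=FALSE, warning=FALSE}
# hypothetical two-row tibble showing what unnest_tokens() produces
toy <- tibble(line = 1:2, text = c("Bean boots are great", "love my new backpack"))
toy %>% unnest_tokens(word, text)
# result: one row per word, lowercased, with the originating `line` retained
```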
# Analysis

## Step 3: Data Analysis and Information Learning

### Word Clouds

```{r message=FALSE, warning=FALSE}
# count word frequencies, removing standard stop words plus a few
# dataset-specific terms that are highly frequent but convey little
# valuable information in the word cloud
words_count <- LLBean_tidy %>%
  count(word, sort = TRUE) %>%
  filter(!word %in% stop_words$word) %>%
  filter(!word %in% c("llbean", "bean", "im", "ll", "check", "size", "items"))
```
```{r message=FALSE, warning=FALSE}
# find the 99th percentile of word frequencies and draw the word cloud
# using only the top 1% most frequent terms
q99 <- as.integer(quantile(words_count$n, 0.99))
words_count %>%
  with(wordcloud(word, n, min.freq = q99, random.order = FALSE,
                 colors = rev(colorRampPalette(brewer.pal(9, "Set1"))(32)[seq(8, 32, 6)])))
```
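Note that `wordcloud()` places words randomly, so the layout can change on every knit. An optional tweak (not part of the original code) is to seed the RNG at the top of the chunk:

```{r eval=FALSE}
# optional: fix the RNG seed so repeated knits draw an identical cloud
set.seed(2023)
```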
*Problem 3: What valuable information can you learn from this word cloud?*
**Your answer: ( )**
### Bi-grams Analysis

```{r message=FALSE, warning=FALSE}
# tokenize into bigrams (pairs of consecutive words)
LLBean_bigrams <- LLBean_tibble %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2)

# separate each pair of two terms into two columns, term1 and term2
LLBean_separated <- LLBean_bigrams %>%
  separate(bigram, c("term1", "term2"), sep = " ")

# remove stop words and one additional word from the term1 and term2 columns
LLBean_filtered <- LLBean_separated %>%
  filter(!term1 %in% stop_words$word) %>%
  filter(!term2 %in% stop_words$word) %>%
  filter(!term1 %in% c("size")) %>%
  filter(!term2 %in% c("size"))
```
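A tiny sketch of what `separate()` does here, using two made-up bigrams:

```{r message=FALSE, warning=FALSE}
# hypothetical mini-example: "bean boots" splits into term1 = "bean", term2 = "boots"
tibble(bigram = c("bean boots", "boots are")) %>%
  separate(bigram, c("term1", "term2"), sep = " ")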
### Term Association

```{r message=FALSE, warning=FALSE}
# find the second terms that most frequently follow "llbean"
LLBean_filtered %>%
  filter(term1 == "llbean") %>%
  count(term2, sort = TRUE)
```
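As an optional companion to the table (a sketch, not part of the assignment), a quick bar chart with ggplot2 — already loaded via the tidyverse — can make the ranking easier to read:

```{r message=FALSE, warning=FALSE}
# optional sketch: plot the ten terms that most often follow "llbean"
LLBean_filtered %>%
  filter(term1 == "llbean") %>%
  count(term2, sort = TRUE) %>%
  slice_head(n = 10) %>%
  ggplot(aes(x = n, y = reorder(term2, n))) +
  geom_col() +
  labs(x = "frequency", y = "term following llbean")
```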
*Problem 4: What valuable information can you learn from the term associations summarized above?*
**Your answer: ( )**
```{r message=FALSE, warning=FALSE}
# find the first terms that most frequently precede "llbean"
LLBean_filtered %>%
  filter(term2 == "llbean") %>%
  count(term1, sort = TRUE)
```
*Problem 5: What valuable information can you learn from the term associations summarized above?*
**Your answer: ( )**
### Semantic Network Analysis

```{r message=FALSE, warning=FALSE}
# further filter out "llbean" itself, since it is well expected to be a
# major hub but does not convey very valuable information
LLBean_filtered_SNA <- LLBean_filtered %>%
  filter(!term1 %in% c("llbean", "ll", "bean")) %>%
  filter(!term2 %in% c("llbean", "ll", "bean"))

# count how often each remaining bigram occurs
LLBean_filtered_SNA_count <- LLBean_filtered_SNA %>%
  count(term1, term2, sort = TRUE)
```
#### Automated Network Creation

```{r message=FALSE, warning=FALSE}
# find the 99th percentile of bigram frequencies and create the network
# visualization only for the top 1% most frequent bigrams
q99 <- as.integer(quantile(LLBean_filtered_SNA_count$n, 0.99))
LLBean_filtered_SNA_count_top <- LLBean_filtered_SNA_count %>% filter(n > q99)
nrow(LLBean_filtered_SNA_count_top)
```
```{r message=FALSE, warning=FALSE, fig.width = 14, fig.height = 14}
set.seed(2023)

# this function normalizes any scale into a scale of 0 to 1 (or a target range)
normalize <- function(x, from = range(x), to = c(0, 1)) {
  x <- (x - from[1]) / (from[2] - from[1])
  if (!identical(to, c(0, 1))) {
    x <- x * (to[2] - to[1]) + to[1]
  }
  x
}
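# quick sanity check with illustrative values (not from the dataset):
# normalize(c(0.1, 0.4, 0.9), to = c(1, 8)) returns 1.000, 3.625, 8.000 --
# the smallest input maps to 1 and the largest to 8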
# create the graph for the top bigrams and add a popularity attribute
bigram_graph <- as_tbl_graph(LLBean_filtered_SNA_count_top) %>%
  # calculate the PageRank centrality score as the popularity
  mutate(Popularity = centrality_pagerank())
# normalize the popularity values to the range 1 to 8 for a better visualization effect
V(bigram_graph)$Popularity <- normalize(V(bigram_graph)$Popularity, to = c(1, 8))
# create the semantic network visualization with a force-directed ("fr") layout
ggraph(bigram_graph, "fr") +
  geom_edge_link(
    aes(end_cap = circle(node2.Popularity + 2, "pt")),
    edge_colour = "gray",
    arrow = arrow(angle = 10, length = unit(0.1, "inches"),
                  ends = "last", type = "closed")
  ) +
  geom_node_point(
    aes(size = I(Popularity), alpha = Popularity),
    col = "red", show.legend = FALSE
  ) +
  geom_node_text(aes(label = name)) +
  theme_graph()
```
#### Semantic Analysis

*Problem 6: What valuable information can you learn from the semantic network created above?* (**After knitting the file to an HTML file, set the browser zoom level to 200% or higher for better visibility.**)
**Hint:** *Which nodes are the top hubs based on their PageRank scores? How are these top hubs connected with one another through interesting paths passing through several other nodes? Are there other interesting paths through some nodes in this network?*
**Your answer:( )**
# Discussion

*Reflect on how the text mining results could contribute to an enhanced social media marketing strategy for the company. Although this analysis is not detailed here, you will have the opportunity to examine this aspect further with your project team and devise comprehensive marketing approaches.*
