---
output:
  html_document: default
  pdf_document: default
---

---
title: 'Twitter Text Mining for LLBean Social Media Content Marketing'
subtitle:
author:
date: "`r format(Sys.time(), '%d %B %Y')`"
output: pdf_document
---

# Business Interests and Project Objectives

**Instruction:** *I have completed this section for you. So, you don't need to work on this section.*

## Business Interests

The COVID-19 pandemic has had a significant impact on the retail industry, causing a sharp decline in traditional retail channels and a substantial increase in online shopping. Consequently, retailers are increasingly turning to online marketing to attract customers. In the digital environment, traditional advertising methods are becoming less effective, while a revolutionary strategy known as content marketing is gaining traction.

*Content Marketing* is a strategic marketing approach focused on creating and distributing valuable, relevant, and consistent content on popular online platforms to attract and retain large audiences, ultimately driving profitable customer actions. Twitter is one such platform that retailers prioritize due to its unique ability to promote open communication and massive multicasting through its extensive user network, accessible to everyone. According to a study by Twitter, 69% of people make a purchase because, at least in part, they learned about it from a tweet. As a result, retailers engaging in content marketing on Twitter aim to craft product-related tweets that encourage users to take purchasing actions or participate in broadcasting the tweets to a wider audience (a process known as information diffusion, based on Everett Rogers' *Theory of Diffusion of Innovations*).

Twitter offers users three methods (retweet, mention, and reply) to engage in group actions that promote information diffusion. Among these methods, retweeting allows users to share a tweet directly with the Twitter public, signaling positive endorsement and content credibility. Research literature has shown that retweeting is the most promising method for generating effective and long-lasting information diffusion on Twitter. Consequently, retailers are eager to learn how to create product-related tweets that inspire Twitter users to retweet messages to as many users as possible, ultimately fostering large-scale information virality.

## Project Objectives

In this project, the primary objective is to offer insightful recommendations for enhancing marketing strategies related to crafting LLBean product tweets, ultimately leading to more effective information dissemination within the Twitter retweet network. By analyzing existing tweet content and user engagement patterns, you will identify key factors that contribute to increased retweets and information virality, providing valuable insights for the improvement of LLBean's content marketing on Twitter.

# Project Data Source

**Instruction:** *I have completed this section for you. So, you don't need to work on this section.*

A total of 276,843 tweets were retrieved from Twitter, containing information such as date, time, message, hashtags used, photos, and more. Each tweet message includes the phrase llbean. The original data collection has been processed to retain only English-language tweets and subsequently divided into three separate datasets.

The first dataset comprises the hashtags used in all the tweets, listed individually under separate columns. The table below displays six representative rows from this dataset:

```{r, echo=FALSE, error=FALSE, message=FALSE, warning=FALSE}
# import data
install.packages("readr")
library(readr)
LLBean_hashtags <- read_csv("https://filedn.com/lJpzjOtA91quQEpwdrgCvcy/Business%20Data%20Mining%20and%20Knowledge%20Discovery/Datasets/LLBean_hashtags.csv")

# wrangle data: keep English tweets, strip list brackets and quotes,
# then split the hashtag string into one column per hashtag
install.packages(c("dplyr", "splitstackshape"))
library(dplyr)
library(splitstackshape)
LLBean <- LLBean_hashtags %>%
  filter(language == "en") %>%
  mutate(hashtags = gsub("\\[|\\]|'", "", hashtags)) %>%
  cSplit("hashtags", ",")
LLBean_assoc <- LLBean %>%
  filter(hashtags_01 != " ") %>%
  select(-c(language))
head(LLBean_assoc)
```
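The bracket-stripping and splitting step can be illustrated on a few toy strings. This is a minimal base R sketch that assumes the raw `hashtags` column stores Python-style list strings such as `['llbean', 'hiking']`; the sample values and the name `raw_hashtags` are made up for illustration and are not taken from the dataset.

```r
# Toy input (hypothetical): the raw column is assumed to hold list-like strings.
raw_hashtags <- c("['llbean', 'hiking']", "[]", "['beanboots']")

# Strip square brackets and single quotes, as the gsub() call in the chunk does.
stripped <- gsub("\\[|\\]|'", "", raw_hashtags)

# Split on commas and trim surrounding whitespace.
# An empty string yields a zero-length vector, i.e. a tweet with no hashtags.
baskets <- lapply(strsplit(stripped, ","), trimws)

print(baskets)
```

Each list element corresponds to one tweet's "basket" of hashtags, which is the shape the later association-rule step works with.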

The second dataset contains numerical measurements about the collected tweets, including:

* retweet_ind: 1 = message is retweeted; 0 = message is not retweeted

* tweet_length: the length of message

* url_ind: 1 = message includes a hyperlink; 0 = message does not include a hyperlink

* hashtags_count: the number of hashtags used in message

* video: 1 = message includes a video; 0 = message does not include a video

The table below showcases six representative rows from this dataset:

```{r, echo=FALSE, error=FALSE, message=FALSE, warning=FALSE}
# import data
install.packages("readr")
LLBean_retweet <- read_csv("https://filedn.com/lJpzjOtA91quQEpwdrgCvcy/Business%20Data%20Mining%20and%20Knowledge%20Discovery/Datasets/LLBean_retweet.csv")

# wrangle data: keep English tweets and strip list brackets
install.packages(c("stringr", "stringi"))
LLBean1 <- LLBean_retweet %>%
  filter(language == "en") %>%
  mutate(hashtags = gsub("\\[|\\]", "", hashtags),
         urls = gsub("\\[|\\]", "", urls))

# derive the numerical measurements listed above
library(stringr)
library(stringi)
LLBean2 <- LLBean1 %>%
  mutate(
    tweet_length = str_length(tweet),
    url_ind = ifelse(str_length(urls) == 0, 0, 1),
    hashtags_count = ifelse(str_length(hashtags) == 0, 0, stri_count_fixed(hashtags, ",") + 1),
    retweet_ind = as.numeric(retweets_count > 0)
  ) %>%
  select(-language)

LLBean_regression <- LLBean2 %>%
  select(retweet_ind, tweet_length, url_ind, hashtags_count, video)
head(LLBean_regression)
```
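To make the derived measurements concrete, the sketch below recomputes them on two made-up tweets using base R equivalents of the `stringr` and `stringi` calls. The data frame `toy` and its values are hypothetical; only the column names follow the chunk above.

```r
# Two hypothetical tweets, already stripped of list brackets.
toy <- data.frame(
  tweet = c("Love my new boots", "Sale today https://example.com #llbean #sale"),
  urls = c("", "'https://example.com'"),
  hashtags = c("", "'llbean', 'sale'"),
  retweets_count = c(0, 3),
  stringsAsFactors = FALSE
)

# Message length in characters.
toy$tweet_length <- nchar(toy$tweet)

# 1 if the urls field is non-empty, 0 otherwise.
toy$url_ind <- ifelse(nchar(toy$urls) == 0, 0, 1)

# Number of comma-separated hashtags (0 when the field is empty).
toy$hashtags_count <- lengths(strsplit(toy$hashtags, ",", fixed = TRUE))

# 1 if the tweet was retweeted at least once, 0 otherwise.
toy$retweet_ind <- as.numeric(toy$retweets_count > 0)

print(toy[, c("retweet_ind", "tweet_length", "url_ind", "hashtags_count")])
```

Counting the pieces returned by `strsplit()` gives the same result as counting commas and adding one, and it returns 0 for an empty field without a separate `ifelse()`.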

The third dataset contains all the English-language tweet messages (with special characters and icons removed) and their corresponding number of retweets. The table below presents six representative rows from this dataset:

```{r, echo=FALSE, error=FALSE, message=FALSE, warning=FALSE}
# import the Twitter data
LLBean_text <- read_csv("https://filedn.com/lJpzjOtA91quQEpwdrgCvcy/Business%20Data%20Mining%20and%20Knowledge%20Discovery/Datasets/LLBean_project.csv")

LLBean_text <- LLBean_text %>%
  filter(language == "en") %>%
  select(tweet, retweets_count)

# convert text encoding from "UTF-8" into "ASCII" to remove icons
LLBean_text$tweet_clean <- iconv(LLBean_text$tweet, from = "UTF-8", to = "ASCII", sub = "")
# remove hyperlinks
LLBean_text$tweet_clean <- gsub("https\\S*", "", LLBean_text$tweet_clean)
# remove cashtags, hashtags, and mentions; each is replaced by a single
# space so that the neighbouring words do not fuse together
LLBean_text$tweet_clean <- gsub(" ?\\$\\w+ ?", " ", LLBean_text$tweet_clean)
LLBean_text$tweet_clean <- gsub(" ?\\#\\w+ ?", " ", LLBean_text$tweet_clean)
LLBean_text$tweet_clean <- gsub(" ?\\@\\w+ ?", " ", LLBean_text$tweet_clean)
# remove the HTML-escaped ampersand (&amp;), carriage returns (\r), and line breaks (\n)
LLBean_text$tweet_clean <- gsub("&amp;", "", LLBean_text$tweet_clean)
LLBean_text$tweet_clean <- gsub("[\r\n]", "", LLBean_text$tweet_clean)
# remove punctuation
LLBean_text$tweet_clean <- gsub("[[:punct:]]", "", LLBean_text$tweet_clean)
# remove numbers
LLBean_text$tweet_clean <- gsub("[[:digit:]]+", "", LLBean_text$tweet_clean)
# remove leftover retweet markers (RT/via followed by a handle)
LLBean_text$tweet_clean <- gsub("(RT|via)((?:\\b\\w*@\\w+)+)", "", LLBean_text$tweet_clean)
# collapse extra spaces between words and trim the ends
LLBean_text$tweet_clean <- trimws(gsub("\\s+", " ", LLBean_text$tweet_clean))

LLBean_text <- LLBean_text %>%
  select(tweet_clean, retweets_count)
head(LLBean_text)
```
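The effect of the cleaning steps is easier to see on a single message. The sketch below wraps a condensed variant of the steps in a hypothetical helper, `clean_tweet()`, and applies it to a made-up tweet. It is an illustration under stated assumptions rather than the exact pipeline: it merges the cashtag, hashtag, and mention patterns into one expression and replaces each removed token with a space.

```r
# Hypothetical helper condensing the cleaning steps for illustration.
clean_tweet <- function(x) {
  x <- iconv(x, from = "UTF-8", to = "ASCII", sub = "")  # drop non-ASCII symbols
  x <- gsub("https\\S*", " ", x)                          # hyperlinks
  x <- gsub("[$#@]\\w+", " ", x)                          # cashtags, hashtags, mentions
  x <- gsub("&amp;", " ", x, fixed = TRUE)                # HTML-escaped ampersand
  x <- gsub("[\r\n]", " ", x)                             # carriage returns, line breaks
  x <- gsub("[[:punct:]]", "", x)                         # punctuation
  x <- gsub("[[:digit:]]+", "", x)                        # numbers
  trimws(gsub("\\s+", " ", x))                            # collapse whitespace
}

# A made-up example message.
sample_tweet <- "Great #hiking day with @LLBean! 50% off &amp; free shipping: https://t.co/abc123"
cleaned <- clean_tweet(sample_tweet)
print(cleaned)
```

Replacing removed tokens with a space (instead of an empty string) keeps the words on either side of a hashtag or mention separate; the final whitespace collapse then tidies up any doubled spaces.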

# Data Analysis

## Hashtags Basket Analysis

**Instruction:** *Execute the next two chunks of code. In your report, outline the analysis process and describe the methods used. After discussion within your group, select the three most meaningful association rules that all group members agree upon. Explain the rationale behind choosing these three rules and their significance. Ensure that your description and explanation are seamlessly integrated into the overall report, maintaining a professional tone rather than merely responding to a homework prompt.*

```{r, echo=FALSE, message=FALSE, warning=FALSE, results='hide'}
write.csv(LLBean_assoc, "LLBean_assoc.csv")
install.packages("arules")
library(arules)
# read the dataset "LLBean_assoc" of baskets database format
LLBean_tr <- read.transactions("LLBean_assoc.csv", header = TRUE, format = 'basket',
                               sep = ',', rm.duplicates = TRUE)

# complete the following R command to generate association rules with minimum support of 0.1
# and minimum confidence of 0.5. Also, all the rules must have the hashtag "llbean" appearing
# on the right-hand side.
llbean.rhs.rules <- apriori(LLBean_tr,
                            parameter = list(support = 0.1, confidence = 0.5),
                            appearance = list(default = "lhs", rhs = "llbean"))
```

```{r, echo=FALSE, message=FALSE, warning=FALSE}
install.packages("arulesViz")
library(arulesViz)
# create an interactive network diagram showing all the generated rules
set.seed(2020)
plot(llbean.rhs.rules, method = "graph", engine = "htmlwidget")
```
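When describing the rules in the report, it can help to recall how the three standard rule measures are defined. The base R sketch below computes support, confidence, and lift by hand for one rule, {hiking} => {llbean}, on five made-up hashtag baskets. The baskets and the helper names `contains()` and `support()` are hypothetical and unrelated to the real data.

```r
# Five hypothetical hashtag baskets (one per tweet).
baskets <- list(
  c("llbean", "hiking"),
  c("llbean", "hiking", "outdoors"),
  c("llbean", "beanboots"),
  c("hiking", "outdoors"),
  c("llbean")
)

# TRUE for every basket that contains all of the given items.
contains <- function(items) {
  vapply(baskets, function(b) all(items %in% b), logical(1))
}

# Support: share of baskets containing the item set.
support <- function(items) mean(contains(items))

lhs <- "hiking"
rhs <- "llbean"

rule_support    <- support(c(lhs, rhs))           # share with both lhs and rhs
rule_confidence <- rule_support / support(lhs)    # share of lhs baskets that also hold rhs
rule_lift       <- rule_confidence / support(rhs) # confidence relative to rhs base rate

print(c(support = rule_support, confidence = rule_confidence, lift = rule_lift))
```

A lift above 1 indicates that the left-hand-side hashtags co-occur with the right-hand side more often than the right-hand side's overall frequency would suggest; a lift below 1, as in this toy case, indicates the opposite.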

## Retweetability Key Factor Analysis

**Instruction:** *Complete and execute the following code block to build a logistic regression model. In your report, discuss whether it exhibits multicollinearity, a critical concern for regression models. Additionally, present the regression model equation in the format illustrated in the lesson section of "Logistic Linear Regression Model: Business Case," representing the risk scoring model. Identify the TOP TWO most crucial predictors in the regression model and explain the generalized meanings of their regression coefficients. Finally, provide commentary on the model's performance. Importantly, your explanation, discussion, and comments are expected to form an integrated part of a professional report, not merely an answer to a homework question.*
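For reference, a logistic regression on the four predictors of the second dataset has the general form below, where $p$ denotes the probability that `retweet_ind` equals 1 and the $\beta$ values are placeholders for the estimated coefficients. The exact layout required by the lesson's risk scoring model is not reproduced here, so this should be read as the generic textbook form only.

$$
\ln\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1\,\text{tweet\_length} + \beta_2\,\text{url\_ind} + \beta_3\,\text{hashtags\_count} + \beta_4\,\text{video}
$$

Under this form, a one-unit increase in a predictor multiplies the odds $p/(1-p)$ by $e^{\beta_j}$, holding the other predictors fixed.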

```{r echo=FALSE, message=FALSE, warning=FALSE}
# fit the logistic regression model
LRformula <- retweet_ind ~ tweet_length + url_ind + hashtags_count + video
LLBean_LR <- glm(LRformula, data = LLBean_regression, family = binomial(link = logit))

# check multicollinearity and print the fitted model
install.packages("car")
library(car)
vif(LLBean_LR)
LLBean_LR

# concordance statistic and variable importance
install.packages("DescTools")
library(DescTools)
Cstat(LLBean_LR)
install.packages("caret")
library(caret)
varImp(LLBean_LR, scale = TRUE)

# rank predictors by importance
LR_imp <- as.data.frame(varImp(LLBean_LR, scale = TRUE))
LR_imp <- data.frame(names = rownames(LR_imp), Overall = LR_imp$Overall)
LR_imp[order(LR_imp$Overall, decreasing = TRUE), ]

# confidence intervals for the two chosen predictors (blanks to be completed)
confint(LLBean_LR, ______, level=0.95)
confint(LLBean_LR, ______, level=0.95)

Cstat(LLBean_LR)

```
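The quantities requested in this section (odds ratios, variance inflation factors, and the c-statistic) can be illustrated with base R alone. The sketch below uses simulated data with made-up coefficients; nothing in it comes from the LLBean dataset. The hand-computed VIF is the textbook definition, $1/(1-R^2)$ from regressing each predictor on the others, so `car::vif()` applied to a `glm` may report slightly different values.

```r
set.seed(1)
n <- 500

# Simulated predictors (hypothetical distributions).
sim <- data.frame(
  tweet_length   = round(runif(n, 20, 280)),
  url_ind        = rbinom(n, 1, 0.5),
  hashtags_count = rpois(n, 1.5),
  video          = rbinom(n, 1, 0.1)
)

# Simulated outcome from made-up coefficients on the logit scale.
eta <- -1 + 0.004 * sim$tweet_length + 0.8 * sim$url_ind + 0.1 * sim$hashtags_count
sim$retweet_ind <- rbinom(n, 1, plogis(eta))

fit <- glm(retweet_ind ~ tweet_length + url_ind + hashtags_count + video,
           data = sim, family = binomial(link = "logit"))

# Odds ratios: multiplicative change in the odds per one-unit increase.
odds_ratios <- exp(coef(fit))

# Textbook VIF: regress each predictor on the remaining ones.
predictors <- c("tweet_length", "url_ind", "hashtags_count", "video")
vif_manual <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  r2 <- summary(lm(reformulate(others, response = p), data = sim))$r.squared
  1 / (1 - r2)
})

# c-statistic (area under the ROC curve) via the rank-sum formula.
p_hat <- fitted(fit)
y <- sim$retweet_ind
n1 <- sum(y == 1)
n0 <- sum(y == 0)
c_stat <- (sum(rank(p_hat)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

print(round(odds_ratios, 3))
print(round(vif_manual, 3))
print(round(c_stat, 3))
```

VIF values close to 1 indicate little multicollinearity. A c-statistic of 0.5 corresponds to random ranking of retweeted versus non-retweeted messages, and values closer to 1 indicate better discrimination.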
