Use R language code and English interpretation to answer all questions: Clustering Stock Returns
When building portfolios of stocks, investors seek to obtain good returns while limiting the variability in those returns over time. This can be achieved by selecting stocks that show different patterns of returns. In this question, we will use clustering to identify clusters of stocks that have similar returns over time; an investor would select a diverse portfolio by selecting stocks from different clusters.
For this question, we will use the dataset NasdaqReturns.csv, which contains monthly stock returns from the NASDAQ stock exchange during 2000-2009. The companies selected in this dataset are limited to those that were listed on the stock exchange for this entire time period and whose stock price never fell below $1. The NASDAQ is the second-largest stock exchange in the world, and it lists many technology companies. The variables in the dataset are described in Table 2.
Table 2: Variables in the dataset NasdaqReturns.csv
Variables: StockSymbol, Industry, SubIndustry, Ret2000.01 - Ret2009.12
1. Let us start by exploring the dataset.
(a) How many companies are there in this dataset? (2 points) How many companies are there in each of the industries? (2 points)
(b) In the aftermath of the dot-com bubble bursting in the early 2000s, the NASDAQ was quite tumultuous. In December 2000, how many stocks in this dataset saw their value increase by 10% or more (including exactly 10%)? (2 points) How many saw their value decrease by 10% or more (including exactly -10%)? (2 points)
(c) Entering the Great Recession, most stocks lost significant value, but some sectors were hit harder than others. In October 2008, which 3 industries had the worst average return? (3 points)
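A minimal R sketch of the exploration steps in part 1. The dataset itself is not reproduced here, so the code builds a hypothetical stand-in data frame with the same column layout and made-up values; the industry names, the number of rows, and the assumption that returns are stored as proportions (0.10 meaning 10%) are illustrative and not taken from the question.

```r
# Hypothetical stand-in for NasdaqReturns.csv (made-up values, same column layout).
# With the real file, use instead: stocks <- read.csv("NasdaqReturns.csv")
set.seed(1)
month_cols <- sprintf("Ret%d.%02d",
                      rep(2000:2009, each = 12),
                      rep(1:12, times = 10))
n <- 30
returns <- matrix(rnorm(n * length(month_cols), mean = 0, sd = 0.1),
                  nrow = n, dimnames = list(NULL, month_cols))
stocks <- data.frame(
  StockSymbol = sprintf("SYM%02d", 1:n),
  Industry    = rep(c("Tech", "Health", "Finance", "Retail"), length.out = n),
  SubIndustry = rep(c("SubA", "SubB", "SubC", "SubD", "SubE"), length.out = n),
  returns,
  stringsAsFactors = FALSE
)

# 1(a): number of companies, and number of companies per industry
n_companies     <- nrow(stocks)
industry_counts <- table(stocks$Industry)

# 1(b): December 2000, assuming returns are proportions (0.10 = 10%)
n_up   <- sum(stocks$Ret2000.12 >= 0.10)   # rose by 10% or more
n_down <- sum(stocks$Ret2000.12 <= -0.10)  # fell by 10% or more

# 1(c): October 2008, the three industries with the lowest mean return
oct08_means <- tapply(stocks$Ret2008.10, stocks$Industry, mean)
worst3      <- head(sort(oct08_means), 3)

print(n_companies)
print(industry_counts)
print(c(up = n_up, down = n_down))
print(worst3)
```

Because sort() orders the industry means from lowest to highest, head(..., 3) returns the three worst-performing industries for that month.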
2. Let us now cluster the stocks according to their monthly returns. For the remainder of this question, make sure that you are clustering the observations based only on the variables Ret2000.01 - Ret2009.12 (i.e., StockSymbol, Industry, and SubIndustry should not be used to cluster the observations). (2 points)
(Hint: You can do this by creating a new data frame without irrelevant variables using the function within() we learned in the lecture Model selection.)
(a) In this analysis, we will not normalize our data prior to clustering. Why is this a valid approach for this question and dataset? (3 points)
(b) Cluster the data using Hierarchical clustering. (2 points) Clearly indicate which distance metrics you used for point distances and cluster distances. (2 points) Plot the resulting dendrogram. (2 points) What do you think are reasonable choices for the number of clusters to select, based on the dendrogram? (3 points) A further consideration for the stock selection problem is that we should include enough stocks to create our well-diversified portfolio. Based on the dendrogram and this specific concern, select a number of clusters to use for the rest of the question, and justify your choice. (3 points)
(c) Extract cluster assignments from your hierarchical clustering model, using the number of clusters you selected in (b). (2 points) Describe each cluster, using the number of observations in the cluster (3 points), the most common industry of the companies in the cluster (3 points), and the most common subindustry of the companies in the cluster (3 points).
(Hint: Since we never changed the order of the observations, you can create a data frame including the number of observations in each industry/subindustry that is counted by the function table() (recall what you learned in the 3rd tutorial). You can then use the order() function to sort this data frame in the order of frequency.)
(d) For some months, we expect there to be significant differences between the returns of stocks in different clusters. For February 2000, do some clusters have negative average returns while other clusters have positive average returns? (2 points) How about for March 2000? (2 points)
(e) Now run the K-means clustering algorithm on this data (when clustering, only use the variables Ret2000.01 - Ret2009.12). You should select the same number of clusters that you used for Hierarchical clustering. (3 points) Extract cluster assignments from your K-means clustering model, and compare them to the Hierarchical cluster assignments by common industries. (3 points) Open-ended question: Are there any similar clusters? (1 point)
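One possible way to set up the clustering workflow in part 2, sketched in R on a hypothetical stand-in data frame (made-up values, same column layout as the dataset). Several choices here are assumptions rather than anything the question prescribes: Euclidean point distance with Ward linkage ("ward.D2"), k = 5 clusters, the seeds, and selecting the return columns with grepl() instead of the within() approach mentioned in the hint.

```r
# Hypothetical stand-in for NasdaqReturns.csv (made-up values, same column layout).
# With the real file, use instead: stocks <- read.csv("NasdaqReturns.csv")
set.seed(2)
month_cols <- sprintf("Ret%d.%02d",
                      rep(2000:2009, each = 12),
                      rep(1:12, times = 10))
n <- 40
stocks <- data.frame(
  StockSymbol = sprintf("SYM%02d", 1:n),
  Industry    = rep(c("Tech", "Health", "Finance", "Retail"), length.out = n),
  SubIndustry = rep(c("SubA", "SubB", "SubC", "SubD", "SubE"), length.out = n),
  matrix(rnorm(n * 120, sd = 0.1), nrow = n, dimnames = list(NULL, month_cols)),
  stringsAsFactors = FALSE
)

# Keep only the monthly return columns for clustering
ret <- stocks[, grepl("^Ret", names(stocks))]

# (b) Hierarchical clustering: Euclidean point distance, Ward linkage (assumed)
d  <- dist(ret, method = "euclidean")
hc <- hclust(d, method = "ward.D2")
plot(hc, labels = FALSE, main = "Dendrogram of monthly returns")

# (c) Cluster assignments for an assumed k, then sizes and most common labels
k          <- 5
hc_cluster <- cutree(hc, k = k)
cluster_sizes <- table(hc_cluster)
top_level <- function(x) names(sort(table(x), decreasing = TRUE))[1]
top_industry    <- tapply(stocks$Industry,    hc_cluster, top_level)
top_subindustry <- tapply(stocks$SubIndustry, hc_cluster, top_level)

# (d) Average return per cluster in February and March 2000
feb00 <- tapply(ret$Ret2000.02, hc_cluster, mean)
mar00 <- tapply(ret$Ret2000.03, hc_cluster, mean)

# (e) K-means with the same k; the seed makes the random start reproducible
set.seed(144)
km         <- kmeans(ret, centers = k, nstart = 20)
km_cluster <- km$cluster
km_top_industry <- tapply(stocks$Industry, km_cluster, top_level)

# Cross-tabulate the two assignments to look for matching clusters
cross_tab <- table(hierarchical = hc_cluster, kmeans = km_cluster)
print(cluster_sizes)
print(cross_tab)
```

Cluster numbers are arbitrary labels in both methods, so similar clusters show up in the cross-tabulation as cells holding most of a row and most of a column, not as matching indices. The signs of feb00 and mar00 answer part (d) directly.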
