For this homework assignment, we will build and train a simple neural network using the famous `iris` dataset. We will take the four variables `Sepal.Length`, `Sepal.Width`, `Petal.Length`, and `Petal.Width` and use them to predict the species.

We will train the network using gradient descent.

## Task 0:

Split the `iris` data into training and testing datasets. Scale the data so that the numeric variables all lie between 0 and 1.

```{r}
# split between training and testing data
set.seed(1)
n <- dim(iris)[1]
rows <- sample(1:n, 0.8 * n)
train <- iris[rows, ]
test  <- iris[-rows, ]

# Task 0: scale each numeric variable to [0, 1] by dividing by its maximum
train$Sepal.Length <- train$Sepal.Length / max(train$Sepal.Length)
train$Sepal.Width  <- train$Sepal.Width  / max(train$Sepal.Width)
train$Petal.Length <- train$Petal.Length / max(train$Petal.Length)
train$Petal.Width  <- train$Petal.Width  / max(train$Petal.Width)

test$Sepal.Length <- test$Sepal.Length / max(test$Sepal.Length)
test$Sepal.Width  <- test$Sepal.Width  / max(test$Sepal.Width)
test$Petal.Length <- test$Petal.Length / max(test$Petal.Length)
test$Petal.Width  <- test$Petal.Width  / max(test$Petal.Width)
```
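
As a quick sanity check (not part of the required task), `summary()` should now show a maximum of exactly 1 for each scaled column, with no values outside $[0, 1]$:

```{r}
# optional check: every scaled numeric column should have max = 1 and no values below 0
summary(train[, 1:4])
summary(test[, 1:4])
```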

## Setting up our network

Our neural network will have four neurons in the input layer, one for each numeric variable in the dataset. The output layer will have three nodes, one for each species: a `Setosa` node, a `Versicolor` node, and a `Virginica` node. When the network is given the four input values for an observation, it should produce an output where the node for that observation's species has a value of (approximately) 1 and the other two nodes have values of (approximately) 0. This is similar to the classification strategy we used for classifying handwritten digits.
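
As an illustration of that output encoding (a sketch, not part of the assignment code), one way to build the 0/1 target matrix from the training data is with `model.matrix()`; the name `Y` is just a placeholder:

```{r}
# build a 3-column indicator (one-hot) matrix: one column per species,
# with a 1 in the column matching each row's species and 0s elsewhere
Y <- model.matrix(~ Species - 1, data = train)
colnames(Y) <- levels(train$Species)
head(Y)
```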

I have arbitrarily chosen to have three nodes in our hidden layer.

We will add bias values before applying the activation function at each of our nodes in the hidden and output layers.

We will define each matrix of values as follows:

$W^{(1)}$ is the matrix of weights applied to the input layer.

$B^{(1)}$ is the matrix of bias values added before activation in the hidden layer.

$W^{(2)}$ is the matrix of weights applied to the values coming from the hidden layer.

$B^{(2)}$ is the matrix of bias values added before the activation function in the output layer.

$J$ is a matrix of 1s so that the bias values in $B^{(1)}$ and $B^{(2)}$ can be added to all rows.
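
With these definitions, one consistent way to write the forward pass is the following (here $X$ is the matrix of scaled inputs with one row per observation, $Z^{(2)}$, $A^{(2)}$, and $Z^{(3)}$ are illustrative names for the intermediate values, $\hat{Y}$ is the matrix of predicted outputs, and $\sigma$ is the activation function described below):

$$Z^{(2)} = X W^{(1)} + J B^{(1)}, \qquad A^{(2)} = \sigma\left(Z^{(2)}\right),$$

$$Z^{(3)} = A^{(2)} W^{(2)} + J B^{(2)}, \qquad \hat{Y} = \sigma\left(Z^{(3)}\right).$$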

### Sigmoid activation function

We will use the sigmoid function as our activation function.
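
For reference, the sigmoid function is

$$\sigma(x) = \frac{1}{1 + e^{-x}},$$

which squashes any real input into the interval $(0, 1)$, matching the 0-to-1 target values for the output nodes.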

Express the forward propagation as R code using the training data. For now, use random uniform values as temporary starting values for the weights and biases.
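
Here is a rough sketch of what that forward pass could look like in R, assuming the scaled `train` data from Task 0 and the 4-3-3 architecture described above; the object names (`X`, `W1`, `B1`, `W2`, `B2`, `J`, `Y_hat`) and the choice of `J` as an $n \times 1$ column of 1s are illustrative, not prescribed:

```{r}
sigmoid <- function(z) 1 / (1 + exp(-z))

X <- as.matrix(train[, 1:4])   # n x 4 matrix of scaled inputs
n_train <- nrow(X)

# temporary random-uniform starting values for the weights and biases
W1 <- matrix(runif(4 * 3), nrow = 4, ncol = 3)   # input layer  -> hidden layer
B1 <- matrix(runif(3),     nrow = 1, ncol = 3)   # hidden-layer biases
W2 <- matrix(runif(3 * 3), nrow = 3, ncol = 3)   # hidden layer -> output layer
B2 <- matrix(runif(3),     nrow = 1, ncol = 3)   # output-layer biases
J  <- matrix(1, nrow = n_train, ncol = 1)        # column of 1s to add the biases to every row

# forward propagation
Z2 <- X %*% W1 + J %*% B1    # pre-activation values at the hidden layer
A2 <- sigmoid(Z2)            # hidden-layer activations
Z3 <- A2 %*% W2 + J %*% B2   # pre-activation values at the output layer
Y_hat <- sigmoid(Z3)         # one predicted value per species, per row

head(Y_hat)
```

With random starting values the predictions are meaningless; gradient descent will later adjust `W1`, `B1`, `W2`, and `B2` so that `Y_hat` approaches the 0/1 species encoding.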
