Question

Suppose you make two CNN architectures, one with 15 layers (A) and another with 35 layers (B), and train them both (with identical infrastructure, training scheme etc.) on a 10-class classification task. The performance of the models is as follows:
Model-A: Training accuracy = 85%, validation accuracy = 82%
Model-B: Training accuracy = 78%, validation accuracy = 73%
Which of the following could be a possible explanation for these results?
Model-B, being huge, is overfitting and is thus not performing well
Model-B, being huge, is difficult to train (because of issues such as vanishing/exploding gradient)
Model-B is underfitting, perhaps because larger networks sometimes have low 'learning capacity'
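
One way to weigh the options is to compare each model's train/validation gap and the two training accuracies directly. The snippet below is only a rough sketch of that arithmetic using the figures reported in the question; the names `results` and `generalization_gap` are illustrative and not part of the original problem.

```python
# Accuracies (in percent) exactly as reported in the question.
results = {
    "A": {"layers": 15, "train": 85, "val": 82},
    "B": {"layers": 35, "train": 78, "val": 73},
}


def generalization_gap(model):
    """Training accuracy minus validation accuracy, in percentage points."""
    return model["train"] - model["val"]


gap_a = generalization_gap(results["A"])
gap_b = generalization_gap(results["B"])

# How far the deeper model falls behind on the training set itself.
train_deficit_b = results["A"]["train"] - results["B"]["train"]

print(f"Model-A gap: {gap_a} points")
print(f"Model-B gap: {gap_b} points")
print(f"Model-B training accuracy trails Model-A by {train_deficit_b} points")
```

Both gaps are small and similar in size, while the deeper model is clearly worse on the training data itself. Overfitting typically shows up as high training accuracy paired with much lower validation accuracy, so these numbers fit more naturally with a model that is hard to optimize than with one that is overfitting.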
