Question

Suppose you make two CNN architectures, one with 15 layers (A) and another with 35 layers (B), and train them both (with identical infrastructure, training scheme, etc.) on a 10-class classification task. The performance of the models is as follows:

Model A: training accuracy 85%, validation accuracy 82%
Model B: training accuracy 78%, validation accuracy 73%

Which of the following could be a possible explanation for these results?

Model B, being huge, is overfitting and is thus not performing well.
Model B, being huge, is difficult to train (because of issues such as vanishing/exploding gradients).
Model B is underfitting, perhaps because larger networks sometimes have low 'learning capacity'.

SolutionInn · Accepted Answer

Step by Step Solution
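
The reported numbers can be checked against the three options with a little arithmetic. The following is a minimal sketch, not the site's accepted answer: the `results` dictionary and both helper functions are hypothetical names introduced here for illustration, and the accuracies are read as percentages.

```python
# Accuracies (in percent) as reported in the question.
# The dictionary layout and the helper names below are illustrative assumptions.
results = {
    "A": {"layers": 15, "train": 85, "val": 82},
    "B": {"layers": 35, "train": 78, "val": 73},
}


def generalization_gap(model):
    """Training accuracy minus validation accuracy, in percentage points."""
    return model["train"] - model["val"]


def fits_training_data_worse(deeper, shallower):
    """True if the deeper model has lower training accuracy than the shallower one."""
    return deeper["train"] < shallower["train"]


for name, model in results.items():
    print(name, "gap:", generalization_gap(model), "points")

print("B fits the training set worse than A:",
      fits_training_data_worse(results["B"], results["A"]))
```

Overfitting usually shows up as high training accuracy combined with a large gap to validation accuracy. Here Model B's gap is only slightly larger than Model A's, and its training accuracy is lower, so overfitting is a weak explanation. A deeper network is not expected to have less capacity than a shallower one, which leaves training difficulty (for example vanishing or exploding gradients) as the explanation most consistent with these numbers.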