Image captioning using Deep Learning
General Instructions:
You are recommended to use Google Colab or Jupyter notebook.
You do not need to upload the raw data; using pre-processed data in the form of Python pickle or JSON files is allowed.
The URL of the data source has to be clearly mentioned in the notebook. Use TensorFlow and Keras only for model building.
You are expected to provide the Python notebook file (.ipynb) and a PDF export of the notebook clearly showing the outputs.
Task Response and Task Completion
All the models should be logically sound and have decent accuracy.
There are a lot of subparts, so answer each completely and correctly, as no partial marks will be awarded for partially correct subparts.
The model layers, parameters, hyperparameters, evaluation metrics, etc. should be properly implemented.
Please organize your code with correct line spacing and indentation, and add comments to make your code more readable.
Problem Statement:
Topic: Generate Image Captions using CNN+LSTM (you can use pre-trained models).
Dataset Type: Common Objects in Context (COCO)
You can use any of the data sources given below to obtain the dataset.
Data Source 1: https://cocodataset.org/#download
Data Source 2: https://www.tensorflow.org/datasets/catalog/coco
Data Source 3: https://www.kaggle.com/datasets/awsaf49/coco-2017-dataset
Definition: Image captioning is the process of generating a textual description of an image. It combines Natural Language Processing and Computer Vision to generate the captions.
The encoder is a CNN: the input image is fed to the CNN to extract features, and the last hidden state of the CNN is connected to the decoder. The decoder is an LSTM that performs word-level language modeling; at the first time step it receives the encoded output from the encoder along with the START vector.
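To make the encoder concrete, here is a minimal sketch assuming InceptionV3 as the pre-trained CNN (any ImageNet model works the same way); the 299x299 input size is InceptionV3-specific and the image path is a placeholder:

import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input

# Drop the classification head; pooling="avg" yields one 2048-d feature
# vector per image (the encoder output handed to the decoder).
encoder = InceptionV3(weights="imagenet", include_top=False, pooling="avg")

def extract_features(image_path):
    img = tf.keras.utils.load_img(image_path, target_size=(299, 299))
    x = tf.keras.utils.img_to_array(img)
    x = preprocess_input(x[np.newaxis, ...])  # add batch dim, scale to [-1, 1]
    return encoder.predict(x, verbose=0)      # shape: (1, 2048)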
Steps to Perform in the Jupyter Notebook
1. Import Libraries/Dataset
Import the required libraries.
Check that a GPU is available (recommended: use the free GPU provided by Google Colab).
ii. Data Processing
Convert the data into the correct format that can be used by the DL model.
Plot at least two samples and their captions (use matplotlib/seaborn/any other library).
Load the data into train and test sets in the required format. A sketch of this step is given below.
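A minimal sketch of this step, assuming the COCO 2017 caption annotations are available locally (the file and folder paths below are examples and depend on which data source you pick):

import json
import matplotlib.pyplot as plt
import tensorflow as tf

# Check that a GPU is visible (Colab: Runtime -> Change runtime type -> GPU).
print("GPUs:", tf.config.list_physical_devices("GPU"))

with open("annotations/captions_train2017.json") as f:  # example path
    coco = json.load(f)

# Map image_id -> list of captions.
captions = {}
for ann in coco["annotations"]:
    captions.setdefault(ann["image_id"], []).append(ann["caption"])

# Plot two sample images with one of their captions each.
id_to_file = {img["id"]: img["file_name"] for img in coco["images"]}
for i, (img_id, caps) in enumerate(list(captions.items())[:2]):
    plt.subplot(1, 2, i + 1)
    plt.imshow(plt.imread("train2017/" + id_to_file[img_id]))  # example path
    plt.title(caps[0], fontsize=8)
    plt.axis("off")
plt.show()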
2. Model Building
Use any pre-trained model trained on the ImageNet dataset (publicly available on Google) for image feature extraction.
Create a 2-layer LSTM model and other relevant layers for image caption generation.
Add one layer of dropout at the appropriate position and give reasons.
Choose the appropriate activation function for all the layers.
Print the model summary.
Justify the choice of the number of layers, activation functions, and any other hyperparameters used (a model sketch is given below).
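A minimal sketch of such a decoder, assuming 2048-d InceptionV3 features and placeholder values for the vocabulary size, maximum caption length, and layer widths (all of these are assumptions to adjust to your data):

from tensorflow.keras.layers import Input, Dense, Embedding, LSTM, Dropout, add
from tensorflow.keras.models import Model

VOCAB_SIZE, MAX_LEN, EMBED_DIM, UNITS = 8000, 35, 256, 256  # example values

# Image branch: dropout right after the high-dimensional CNN features
# regularizes the model where overfitting is most likely, then a Dense
# layer projects the features into the decoder's hidden size.
img_in = Input(shape=(2048,))
img_feat = Dropout(0.5)(img_in)
img_feat = Dense(UNITS, activation="relu")(img_feat)

# Text branch: word embedding followed by a 2-layer LSTM.
txt_in = Input(shape=(MAX_LEN,))
seq = Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(txt_in)
seq = LSTM(UNITS, return_sequences=True)(seq)
seq = LSTM(UNITS)(seq)

# Merge both branches and predict the next word (softmax over the vocabulary).
merged = add([img_feat, seq])
out = Dense(UNITS, activation="relu")(merged)
out = Dense(VOCAB_SIZE, activation="softmax")(out)

model = Model(inputs=[img_in, txt_in], outputs=out)
model.summary()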
3. Model Compilation
Compile the model with the appropriate loss function.
Use an appropriate optimizer.
Justify the choice of the learning rate, optimizer, loss function, and any other hyperparameters used (a compilation sketch is given below).
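A compilation sketch under the same assumptions: the softmax output over the vocabulary pairs naturally with cross-entropy (sparse if the targets are integer word ids, categorical if they are one-hot), and Adam with its default 1e-3 learning rate is a common starting point:

from tensorflow.keras.optimizers import Adam

model.compile(
    loss="sparse_categorical_crossentropy",  # assumes integer word-id targets
    optimizer=Adam(learning_rate=1e-3),
    metrics=["accuracy"],
)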
4. Model Training
Train the model for an appropriate number of epochs.
Print the train and validation loss for each epoch.
Use an appropriate batch size.
Plot the loss and accuracy history graphs for both the train and validation sets.
Print the total time taken for training. A sketch of this step is given below.
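A training sketch, assuming arrays X_img_train, X_txt_train, y_train (and validation counterparts) were prepared in the data-processing step; the variable names, epoch count, and batch size are placeholders:

import time
import matplotlib.pyplot as plt

start = time.time()
history = model.fit(
    [X_img_train, X_txt_train], y_train,
    validation_data=([X_img_val, X_txt_val], y_val),
    epochs=20, batch_size=64, verbose=1,  # example hyperparameters
)
print(f"Total training time: {time.time() - start:.1f} s")

# Loss and accuracy curves for train vs. validation.
for i, key in enumerate(["loss", "accuracy"]):
    plt.subplot(1, 2, i + 1)
    plt.plot(history.history[key], label="train")
    plt.plot(history.history["val_" + key], label="val")
    plt.title(key)
    plt.legend()
plt.show()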
5. Model Evaluation
Take 5 random images from Google and generate a caption for each of them.
Print the confusion matrix and classification report for the test data. A caption-generation sketch is given below.
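A greedy-decoding sketch for captioning new images, assuming a fitted Keras tokenizer whose training captions were wrapped in "startseq"/"endseq" markers, plus the extract_features helper and MAX_LEN constant from the sketches above (all assumptions):

from tensorflow.keras.preprocessing.sequence import pad_sequences

def generate_caption(image_path):
    feat = extract_features(image_path)  # encoder sketch above
    text = "startseq"
    for _ in range(MAX_LEN):
        seq = tokenizer.texts_to_sequences([text])[0]
        seq = pad_sequences([seq], maxlen=MAX_LEN)
        probs = model.predict([feat, seq], verbose=0)[0]
        word = tokenizer.index_word.get(int(probs.argmax()))
        if word is None or word == "endseq":  # stop at the end marker
            break
        text += " " + word
    return text.replace("startseq", "").strip()

print(generate_caption("downloaded_image.jpg"))  # repeat for all 5 images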