Question: I need help cleaning a dataset. Please provide the code.

I need help cleaning a dataset; please provide the code.

It can be downloaded from here: https://www.kaggle.com/tmdb/tmdb-movie-metadata/data

What I have done so far is below. Don't mind the imports, because I will use the rest once I have a clean set.

from datetime import timedelta, date
import datetime
import numpy as np
import pandas as pd
import string
import re
import csv
import requests

# data from https://www.kaggle.com/tmdb/tmdb-movie-metadata/data
df_movies = pd.read_csv('tmdb_5000_movies.csv', delimiter=',', header=0, skipinitialspace=True)

# Drop the unneeded columns in one call. 'id' is listed only once:
# dropping it a second time raises a KeyError because the column is already gone.
df_movies.drop(columns=['homepage', 'popularity', 'overview', 'status',
                        'tagline', 'vote_average', 'vote_count', 'id'],
               inplace=True)

df_movies.head()

I want the 'genres' column to contain only the genre names, such as Action, Adventure, and so on. The same goes for 'production_company', 'production_country', and 'spoken_language'.

Then I need you to remove all rows where 'spoken_language' is not English or en, create a separate column titled 'release_year' with just the year of the movie's release, and order the data by 'release_year' and then 'revenue'.

Thanks!

Step by Step Solution
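
One possible way to reduce the nested columns to plain names is sketched below. It assumes each cell is a JSON string holding a list of objects with a "name" key, for example '[{"id": 28, "name": "Action"}]', and that the columns in the Kaggle file are spelled 'production_companies', 'production_countries' and 'spoken_languages' (plural). The helper extract_names and the toy DataFrame sample are made up for illustration; swap in df_movies once the shape has been confirmed against the real file.

```python
import json

import pandas as pd


def extract_names(cell, key="name"):
    """Return a comma-separated string of `key` values from a JSON list of dicts.

    Hypothetical helper: anything that is not a parseable JSON list yields "".
    """
    if not isinstance(cell, str) or not cell.strip():
        return ""  # covers NaN and empty cells
    try:
        items = json.loads(cell)
    except json.JSONDecodeError:
        return ""
    if not isinstance(items, list):
        return ""
    return ", ".join(
        str(item[key]) for item in items if isinstance(item, dict) and key in item
    )


# Toy rows that mimic the assumed shape of the raw columns.
sample = pd.DataFrame({
    "genres": [
        '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]',
        "[]",
    ],
    "spoken_languages": [
        '[{"iso_639_1": "en", "name": "English"}]',
        '[{"iso_639_1": "fr", "name": "Français"}]',
    ],
})

# Assumed column names; columns missing from the frame are skipped.
json_columns = ["genres", "production_companies",
                "production_countries", "spoken_languages"]
for col in json_columns:
    if col in sample.columns:
        sample[col] = sample[col].apply(extract_names)

print(sample)
```

Passing key="iso_639_1" instead of the default would keep language codes such as en rather than names, if that key is present in the data.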

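
The filtering, year extraction and ordering could look roughly like the sketch below. It assumes 'spoken_languages' has already been reduced to a comma-separated string of names or codes (for example "English, Español"), that 'release_date' holds ISO-style date strings, and it reads the request as "keep every row whose languages include English". The helper speaks_english and the toy DataFrame movies are illustrative only.

```python
import pandas as pd


def speaks_english(cell):
    """True if the comma-separated language string contains 'English' or 'en'."""
    if not isinstance(cell, str):
        return False
    langs = {part.strip().lower() for part in cell.split(",")}
    return bool(langs & {"english", "en"})


# Toy data standing in for the cleaned movie table.
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "spoken_languages": ["English", "Français", "English, Español", "en"],
    "release_date": ["2009-12-10", "2001-04-25", "2009-05-13", None],
    "revenue": [300, 50, 900, 10],
})

# Keep only the rows whose language list includes English.
english = movies[movies["spoken_languages"].apply(speaks_english)].copy()

# Unparseable or missing dates become NaT, so their year is <NA>.
english["release_year"] = (
    pd.to_datetime(english["release_date"], errors="coerce")
    .dt.year
    .astype("Int64")  # nullable integer, avoids years like 2009.0
)

# Order by year first, then revenue; rows without a year go last.
english = (
    english.sort_values(["release_year", "revenue"], na_position="last")
    .reset_index(drop=True)
)

print(english)
```

If only films spoken exclusively in English should remain, the set test in speaks_english can be tightened to require that langs contains nothing besides "english" or "en". Adding ascending=[True, False] to sort_values would list the highest revenue first within each year.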