Question: USE PYTHON and can ONLY IMPORT PANDAS Hi I am having trouble cleaning a specific column in my dataframe that is not allowing me to

USE PYTHON and can ONLY IMPORT PANDAS

Hi, I am having trouble cleaning a specific column in my dataframe, and it is keeping me from finishing my project.

I have a column that states the hours of different schools. The data isn't clean, so the hours are written in several different formats. Below is a sample of the data in this column.

School_Hours
08:00 AM-03:00 PM
08:15 AM-03:15 PM
08:30 AM-03:30 PM
7:45 AM - 2:45 PM
7:30 AM-2:30 PM
7:45 AM - 2:45 PM
8:30 AM - 2:55 PM
08:30 AM-03:30 PM
8:30 AM-3:30 PM
08:00 AM-03:00 PM
8:00 am-3:30 pm
9:00 AM - 4:15 PM
7:00 am-3:00 pm
08:30 AM-03:30 PM
08:00 AM-03:00 PM
08:45 AM-03:45 PM
8:00 AM-3:30 PM
8:00 AM-3:30 PM
7:45 AM-2:45 PM
07:45 AM-02:45 PM
8:00 am-3:30 pm
9:00 AM - 4:08 PM
7:50 am-3:30 pm
8:00 AM - 3:30 PM
08:30 AM-03:30 PM
7:15a.m.-2:45p.m.
8:00 am-3:30 pm
9:00 AM - 4:00 Pm
08:00 AM-03:00 PM
8:00 AM - 3:13 PM
08:15 AM-03:15 PM
M, T, W, Th: 7:45 AM-3:05 PM F: 7:45 AM-2:07 PM
08:45 AM-03:45 PM
7:30 AM-3:00 PM
7:45 AM - 2:45 PM
08:00 AM-03:00 PM
7:45 AM - 3:00 PM
08:45 AM-03:45 PM
07:45 AM-02:45 PM
8:00 am-3:00 pm

I want to take this column from another dataframe called df and add it to my dataframe school_df as a new column. The added column should hold each school's start time rounded down to the hour. For example, 8:45am would be 8, and 7:30am would be 7. All blanks/nulls should be replaced with the mean of the column.

------------------------------------------------------------------------------------------------------------------------------------

This is my current script for the column:

school_df['Starting Hour'] = df['School_Hours'].str.extract(r"(^\d*)")
school_df['Starting Hour'] = school_df['Starting Hour'].str.replace('0', '')

These are the unique results that I get for the column:

['8', '7', '9', '', nan]
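
For reference, here is a small sketch of where the '' and nan values seem to come from. The Series below is a hypothetical stand-in for df['School_Hours']; the "10:00 AM" row and the None row are not in the sample above and are only there to show the effect. The pattern ^\d* allows zero digits, so a row that starts with a letter still matches and returns an empty string, while a missing row returns NaN.

```python
import pandas as pd

# Hypothetical stand-in for df['School_Hours'] (assumption: not the real data).
hours = pd.Series([
    "08:00 AM-03:00 PM",
    "7:45 AM - 2:45 PM",
    "M, T, W, Th: 7:45 AM-3:05 PM F: 7:45 AM-2:07 PM",  # starts with a letter
    "10:00 AM-4:00 PM",  # hypothetical row, not in the sample
    None,                # missing value
])

# ^\d* can match zero digits, so the "M, T, W, Th" row yields '' rather than NaN.
extracted = hours.str.extract(r"(^\d*)", expand=False)

# Removing every '0' fixes "08" but would also turn "10" into "1".
stripped = extracted.str.replace("0", "", regex=False)

print(extracted.tolist())
print(stripped.tolist())
```

So the empty string looks like it comes from rows such as the "M, T, W, Th: ..." one, and stripping all zeros would be unsafe if any school started at 10.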

---------------------------------------------------------------------------------------------------------------------------------------------------

I would prefer not to have to replace the 0. A script that grabs the start hour without its leading zero would be best, but I couldn't get that to work. The expected results should be 8, 7, and 9. The nan and empty-string values should be set to the mean of the column. The column should also have an int dtype.

Thanks
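
One possible approach, sketched with pandas only. The df and school_df below are small hypothetical stand-ins that share the same index, and the regular expression is just one option: it looks for the first H:MM or HH:MM time anywhere in the string and lets an optional leading zero fall outside the captured group. Since the mean of whole hours is usually not a whole number, the sketch truncates it before filling, which is an assumption in keeping with "rounded down" and not something the question specifies.

```python
import pandas as pd

# Hypothetical stand-ins for the real df and school_df (same index assumed).
df = pd.DataFrame({"School_Hours": [
    "08:00 AM-03:00 PM",
    "7:45 AM - 2:45 PM",
    "9:00 AM - 4:15 PM",
    "7:15a.m.-2:45p.m.",
    "M, T, W, Th: 7:45 AM-3:05 PM F: 7:45 AM-2:07 PM",
    "",
    None,
]})
school_df = pd.DataFrame(index=df.index)

# 0? consumes an optional leading zero; (\d{1,2}) captures the hour;
# :\d{2} requires minutes, so only real clock times match.
# Rows with no time at all (blank or null) come back as NaN.
start = df["School_Hours"].str.extract(r"0?(\d{1,2}):\d{2}", expand=False)

# Convert the captured text to numbers; anything unparseable becomes NaN.
start = pd.to_numeric(start, errors="coerce")

# Assumption: truncate the mean so the column can be stored as integers.
# This would fail if every row were missing, because the mean would be NaN.
fill_value = int(start.mean())

school_df["Starting Hour"] = start.fillna(fill_value).astype("int64")

print(school_df["Starting Hour"].tolist())
print(school_df["Starting Hour"].dtype)
```

Because the pattern is not anchored to the start of the string, the "M, T, W, Th: ..." row gives 7 instead of an empty string, and a start time such as 10:00 would stay 10.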
