USE PYTHON; I can ONLY IMPORT PANDAS.
Hi, I am having trouble cleaning a specific column in my DataFrame, and it is keeping me from finishing my project.
I have a column that lists the hours of different schools. The data isn't clean, so the hours are written in several different formats. Below is a sample of the data in this column.
| School_Hours |
| --- |
| 08:00 AM-03:00 PM |
| 08:15 AM-03:15 PM |
| 08:30 AM-03:30 PM |
| 7:45 AM - 2:45 PM |
| 7:30 AM-2:30 PM |
| 7:45 AM - 2:45 PM |
| 8:30 AM - 2:55 PM |
| 08:30 AM-03:30 PM |
| 8:30 AM-3:30 PM |
| 08:00 AM-03:00 PM |
| 8:00 am-3:30 pm |
| 9:00 AM - 4:15 PM |
| 7:00 am-3:00 pm |
| 08:30 AM-03:30 PM |
| 08:00 AM-03:00 PM |
| 08:45 AM-03:45 PM |
| 8:00 AM-3:30 PM |
| 8:00 AM-3:30 PM |
| 7:45 AM-2:45 PM |
| 07:45 AM-02:45 PM |
| 8:00 am-3:30 pm |
| 9:00 AM - 4:08 PM |
| 7:50 am-3:30 pm |
| 8:00 AM - 3:30 PM |
| 08:30 AM-03:30 PM |
| 7:15a.m.-2:45p.m. |
| 8:00 am-3:30 pm |
| 9:00 AM - 4:00 Pm |
| 08:00 AM-03:00 PM |
| 8:00 AM - 3:13 PM |
| 08:15 AM-03:15 PM |
| M, T, W, Th: 7:45 AM-3:05 PM F: 7:45 AM-2:07 PM |
| 08:45 AM-03:45 PM |
| 7:30 AM-3:00 PM |
| 7:45 AM - 2:45 PM |
| 08:00 AM-03:00 PM |
| 7:45 AM - 3:00 PM |
| 08:45 AM-03:45 PM |
| 07:45 AM-02:45 PM |
| 8:00 am-3:00 pm |
I want to take this column from another DataFrame called `df` and add it to my DataFrame `school_df` as a new column. The new column should hold each school's start time rounded down to the hour: for example, 8:45 AM would be 8 and 7:30 AM would be 7. All blanks/nulls should be filled with the mean of the column.
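A minimal sketch of the parsing step, assuming only pandas is available: capture the hour of the first `H:MM` time in each string. Keeping only the hour drops the minutes, which already rounds the start time down, and `pd.to_numeric` reads "08" as 8, so no zero stripping is needed. The small frame below is a hypothetical stand-in built from values in the sample table plus one missing entry.

```python
import pandas as pd

# Hypothetical stand-in for df, using values from the sample table plus a missing entry
df = pd.DataFrame({"School_Hours": [
    "08:00 AM-03:00 PM",
    "7:45 AM - 2:45 PM",
    "9:00 AM - 4:15 PM",
    "7:15a.m.-2:45p.m.",
    "M, T, W, Th: 7:45 AM-3:05 PM F: 7:45 AM-2:07 PM",
    None,
]})

# Capture the hour of the first "H:MM" / "HH:MM" time in each string;
# extract() only looks at the first match, so the end time is ignored
start_hour = df["School_Hours"].str.extract(r"(\d{1,2}):\d{2}", expand=False)

# to_numeric parses "08" as 8; missing rows stay NaN, so the result is float64
start_hour = pd.to_numeric(start_hour)
print(start_hour.tolist())  # [8.0, 7.0, 9.0, 7.0, 7.0, nan]
```

Because the pattern searches for the first time rather than anchoring at the start of the string, the multi-day row ("M, T, W, Th: 7:45 AM...") also yields 7 instead of an empty string.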
------------------------------------------------------------------------------------------------------------------------------------
This is my current script for the column:
```python
school_df['Starting Hour'] = df['School_Hours'].str.extract(r"(^\d*)")
school_df['Starting Hour'] = school_df['Starting Hour'].str.replace('0', '')
```
These are the unique values I get for the column:
['8', '7', '9', '', nan]
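The two unexpected entries can be traced to the pattern and the replace step. `^\d*` is allowed to match zero digits, so a string that starts with a letter (such as the "M, T, W, Th: ..." row) produces `''` rather than NaN, while null rows stay NaN. Separately, `.str.replace('0', '')` deletes every zero, not only a leading one. The sketch below shows both effects; the `"10:00 AM-4:00 PM"` value is hypothetical and not in the sample.

```python
import pandas as pd

# "10:00 AM-4:00 PM" is a hypothetical value, added to show the zero-stripping problem
hours = pd.Series([
    "08:00 AM-03:00 PM",
    "M, T, W, Th: 7:45 AM-3:05 PM",
    None,
    "10:00 AM-4:00 PM",
])

# ^\d* may match zero digits, so a row starting with a letter gives '' instead of NaN
raw = hours.str.extract(r"(^\d*)", expand=False)
print(raw.tolist())  # ['08', '', nan, '10']

# Removing every '0' also mangles two-digit hours: '10' becomes '1'
stripped = raw.str.replace("0", "")
print(stripped.tolist())  # ['8', '', nan, '1']
```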
---------------------------------------------------------------------------------------------------------------------------------------------------
I would prefer not to have to replace the 0. A script that grabs the first non-zero digit would be best, but I couldn't get that to work. The expected values are 8, 7, and 9. The NaN and empty-string entries should be set to the mean of the column, and the column should have an int dtype.
Thanks
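One possible end-to-end sketch using only pandas, under a few stated assumptions: `df` and `school_df` share the same index (pandas aligns on the index when assigning a column across frames), and a fractional mean is rounded down to match the "round down to the hour" rule. The optional `0?` sits outside the capture group, so "08:45" captures "8" while a two-digit hour like "10:00" still captures "10". This handles the leading zero without a separate replace step and without breaking hours of 10 or later.

```python
import pandas as pd

# Hypothetical stand-ins for the asker's frames; both are assumed to share the same index
df = pd.DataFrame({"School_Hours": [
    "08:45 AM-03:45 PM",
    "7:30 AM-2:30 PM",
    "9:00 AM - 4:15 PM",
    "M, T, W, Th: 7:45 AM-3:05 PM F: 7:45 AM-2:07 PM",
    None,
]})
school_df = pd.DataFrame(index=df.index)

# 0? consumes an optional leading zero outside the group, so only the hour digits are captured
hours = df["School_Hours"].str.extract(r"0?(\d{1,2}):\d{2}", expand=False)
hours = pd.to_numeric(hours)  # float64, with NaN where no time was found

# Assumption: a fractional mean (7.75 here) is floored, so int() is used on the positive mean
fill_value = int(hours.mean())

# Fill the gaps, then cast to int now that no NaN remains
school_df["Starting Hour"] = hours.fillna(fill_value).astype(int)
print(school_df["Starting Hour"].tolist())  # [8, 7, 9, 7, 7]
```

If every row were missing, `hours.mean()` would be NaN and `int()` would raise, so a real dataset may need a guard for that case.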
