Question: (PYTHON) I am trying to convert a text file with lines like this: 199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] GET /history/apollo/ HTTP/1.0 200 6245 into a

(PYTHON)

I am trying to convert a text file with lines like this:

199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245 

into a pandas data frame like this:

host                  timestamp                   method  url                  version   response_code  content_size
199.72.81.55          01/Jul/1995:00:00:01 -0400  GET     /history/apollo/     HTTP/1.0  200            6245
unicomp6.unicomp.net  01/Jul/1995:00:00:06 -0400  GET     /shuttle/countdown/  HTTP/1.0  200            3985

I am really close with this method:

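import pandas

df = pandas.read_csv(
    src_log_filepath,
    # Raw string so the regex escapes reach the regex engine unchanged.
    sep=r"\s-\s-\s\[|\s(?=/)|\]\s\"|\"(?=\s)|\s(?=\d+)",
    engine="python",  # regex separators are only supported by the Python engine
    names=["host", "timestamp", "method", "url", "version", "response_code", "content_size"],
)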

The only problem is that it does not separate the "url" contents from what should go into the "version" column. The url column ends up holding both the URL and the version, and the version column is just NaN.
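A minimal sketch of why the version column comes out empty, assuming the Python parsing engine splits each line on a regex separator much like `re.split` does. The sample line is the one quoted above; the names `SEP`, `line` and `fields` are only illustrative.

```python
import re

# Same separator as in the read_csv call above.
SEP = r"\s-\s-\s\[|\s(?=/)|\]\s\"|\"(?=\s)|\s(?=\d+)"

line = ('199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] '
        '"GET /history/apollo/ HTTP/1.0" 200 6245')

fields = re.split(SEP, line)

# Print each field with its position so empty fields are visible.
for position, field in enumerate(fields):
    print(position, repr(field))

# Nothing in SEP matches the space before "HTTP", so the URL and the
# version stay together in field 3. The closing quote and the space
# before 200 are matched as two separate separators, which leaves an
# empty field (position 4) between them.
```

The empty field lands in the fifth position, which is where the "version" name sits, so that column would be read as missing.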

Everything else is fine, though. When I add "|\s(?=HTTP)" to the "sep" argument, it fixes this issue, but then the rest of the columns get messed up: the host column and the timestamp column now both show the IP for some reason:

Example host: 10.223.157.186 15/Jul/2009:14:58:59 -0700

Example timestamp: 10.223.157.186 GET

Somehow, adding "|\s(?=HTTP)" to "sep" causes this. The full separator is then:

sep=r"\s-\s-\s\[|\s(?=/)|\]\s\"|\"(?=\s)|\s(?=\d+)|\s(?=HTTP)"
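A rough way to see what changes with the extra alternative is to count the fields `re.split` produces for the sample line. This is only a sketch under the assumption that the Python engine splits lines the same way; the names `SEP_HTTP`, `NAMES` and `fields` are illustrative.

```python
import re

# Separator with the extra "\s(?=HTTP)" alternative appended.
SEP_HTTP = r"\s-\s-\s\[|\s(?=/)|\]\s\"|\"(?=\s)|\s(?=\d+)|\s(?=HTTP)"

NAMES = ["host", "timestamp", "method", "url", "version",
         "response_code", "content_size"]

line = ('199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] '
        '"GET /history/apollo/ HTTP/1.0" 200 6245')

fields = re.split(SEP_HTTP, line)

# The URL and version are now separate, but the empty field between the
# closing quote and the space before 200 is still there, so each line
# yields one more field than there are column names.
print(len(fields), "fields for", len(NAMES), "names")
print(fields)
```

With one more field than names, pandas appears to treat the first field as the row index and assign the names to the remaining fields. That would be consistent with the IP showing up beside every column while "host" holds the timestamp and "timestamp" holds the method.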

Why does this happen and how can I separate the URL from the method without this occurring?

(Some requirements for the assignment ask me to clean up the string before I put it into the table. That's why my regex is so weird.)
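One possible adjustment, offered as a sketch rather than a definitive fix: fold the closing quote and the space after it into a single separator (`\"\s` instead of `\"(?=\s)`), so that no empty field is produced, and keep the `\s(?=HTTP)` alternative. It is checked here only against the sample line and a second line reconstructed from the desired table, and it assumes URLs contain no spaces. The name `SEP_ADJUSTED` is hypothetical.

```python
import re

# Assumed adjustment: the closing quote plus the following space form
# one separator, so no empty field appears before the response code.
SEP_ADJUSTED = r"\s-\s-\s\[|\s(?=/)|\]\s\"|\"\s|\s(?=\d+)|\s(?=HTTP)"

lines = [
    '199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] '
    '"GET /history/apollo/ HTTP/1.0" 200 6245',
    # Reconstructed from the second row of the desired table.
    'unicomp6.unicomp.net - - [01/Jul/1995:00:00:06 -0400] '
    '"GET /shuttle/countdown/ HTTP/1.0" 200 3985',
]

rows = [re.split(SEP_ADJUSTED, line) for line in lines]

# Each row should now have exactly seven fields, one per column name.
for row in rows:
    print(len(row), row)
```

If the split looks right, the same pattern could be passed as `sep=` to `pandas.read_csv` together with `engine="python"` and the seven column names.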
