Question:
1. Open the file and process its contents in memory for later processing.
2. Search for all hits that are images (using a regular expression) and gather some stats on those images.
3. Find out which browser is most popular among people visiting this website.

Unlike previous homeworks, this one will not be outlined for you step by step. You can organize your program any way you want, as long as you submit a Python program that meets all the requirements. Make sure to create a GitHub repository for this assignment named IS211_Assignment3. All development should be done in this repository.

Useful Reminders
1. Read the assignment over a few times, at least twice. It always helps to have a clear picture of the overall assignment when working out how to build a solution.
2. Think about the problem for a while, and even try writing or drawing a solution using pencil and paper or a whiteboard.
3. Before submitting the assignment, review the "Functional Requirements" section and make sure you hit all the points. This will not guarantee a perfect score, however.

Part I - Pull Down Web Log File
Your program should download the web log file from the location provided by a --url parameter. This is just like the previous assignment (remember to use argparse). The URL you can use for testing is located here: TODO.

Part II - Process File Using CSV
The file should then be processed using the CSV module from this week. Here is an example line from the file, with an explanation of what each field represents:

/images/test.jpg, 01/27/2014 03:26:04, Mozilla/5.0 (Linux) Firefox/34.0, 200, 346547

Broken down by column (separated by commas), we have: path to file, datetime accessed, browser, status of request, request size in bytes. Some of this information you will use, some of it you will not. So, in our example, this line indicates that some user requested the file /images/test.jpg on 01/27/2014 03:26:04 using a Firefox browser.
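A minimal sketch of Parts I and II follows, under stated assumptions: it uses only the standard library (argparse, urllib.request, csv, datetime), parses the timestamp with the format %m/%d/%Y %H:%M:%S inferred from the example line, and assumes the file is UTF-8 text. The names `Hit`, `download_data`, `process_data` and `main` are hypothetical, chosen for illustration. Keeping each row as a small record in a list is one possible answer to the storage question, not the required one.

```python
import argparse
import csv
import io
import urllib.request
from dataclasses import dataclass
from datetime import datetime

# Assumed timestamp layout, inferred from the example line "01/27/2014 03:26:04".
DATETIME_FORMAT = "%m/%d/%Y %H:%M:%S"


@dataclass
class Hit:
    """One log row: path, datetime accessed, browser (User-Agent), status, size."""
    path: str
    accessed: datetime
    user_agent: str
    status: int
    size: int


def download_data(url):
    """Part I: fetch the raw log file and return it as text (UTF-8 assumed)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def process_data(text):
    """Part II: parse the CSV text into a list of Hit records held in memory."""
    hits = []
    # skipinitialspace drops the blank after each comma, as in the example line;
    # the csv module also keeps quoted User-Agent strings containing commas intact.
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(row) < 5:
            continue  # skip blank or short lines
        path, accessed, user_agent, status, size = row[:5]
        hits.append(Hit(path, datetime.strptime(accessed, DATETIME_FORMAT),
                        user_agent, int(status), int(size)))
    return hits


def main(argv=None):
    parser = argparse.ArgumentParser(description="Process a web log file.")
    parser.add_argument("--url", help="URL of the web log file to download")
    args = parser.parse_args(argv)
    if not args.url:
        parser.print_usage()  # nothing to download without --url
        return
    hits = process_data(download_data(args.url))
    print(f"Loaded {len(hits)} hits from {args.url}")


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as assignment3.py, the script would be run as `python assignment3.py --url <log-file-URL>`; the testing URL is still TODO in the assignment, so none is filled in here. Parsing with the csv module rather than `str.split(",")` matters because real User-Agent strings often contain commas (for example "KHTML, like Gecko") and are then quoted in the file.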
In that example, the status of the request was 200 (more on this later), and the file was 346547 bytes.

Food For Thought: As you read and process the file line by line, how exactly do you plan on storing this data? In what way would storing the data make your life easier? Make sure to read through the whole assignment a few times before coming to any conclusions.

Part III - Search for Image Hits
After processing the file, your next task is to search for all hits that are for an image file. To check whether a hit is for an image file, we will simply check that the file extension is .jpg, .gif or .png. Remember to use regular expressions for this. Once you have found all the hits relating to images, print out what percentage of all hits are for images. For example, your program should print to the screen something like:

"Image requests account for 45.3% of all requests"

Part IV - Finding Most Popular Browser
Once Part III is done, your program should find out which browser is the most popular. The third column of the file stores what is known as the User-Agent, which is a string that web browsers use to identify themselves. The program should use a regular expression to determine what kind of browser created each hit, and print out which browser is the most popular that day. For this exercise, all you need to do is determine whether the browser is Firefox, Chrome, Internet Explorer or Safari.

Part VI - Extra Credit
For extra credit, your program should output a list of the hours of the day sorted by the total number of hits that occurred in each hour. The datetime is given in the second column, and you can extract the hour from it using the datetime module from last week. Using that information, your program should print to the screen something like:

"Hour 12 has 1023 hits"
"Hour 13 has 983 hits"
"Hour 11 has 845 hits"
"Hour 03 has 3 hits"
"Hour 04 has 0 hits"

for all 24 hours of the day.
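Parts III, IV and VI can be sketched as a few functions that work on plain columns (paths, User-Agent strings, and parsed datetime values), so they do not depend on how the rows are stored. This is a hedged sketch: the function names are hypothetical, the image check is made case-insensitive as an assumption, and the browser patterns (`Firefox/`, `Chrome/`, `MSIE` or `Trident/`, `Safari/`) are common User-Agent tokens rather than anything the assignment specifies. Order matters because Chrome's User-Agent also contains `Safari/`, so Chrome is tested before Safari.

```python
import re
from collections import Counter
from datetime import datetime

# Part III: a hit is an image if the path ends in .jpg, .gif or .png.
# Case-insensitivity is an assumption, not stated in the assignment.
IMAGE_RE = re.compile(r"\.(?:jpg|gif|png)$", re.IGNORECASE)

# Part IV: checked in order; Chrome must come before Safari because
# Chrome User-Agents also contain "Safari/".
BROWSER_PATTERNS = [
    ("Firefox", re.compile(r"Firefox/")),
    ("Chrome", re.compile(r"Chrome/")),
    ("Internet Explorer", re.compile(r"MSIE|Trident/")),
    ("Safari", re.compile(r"Safari/")),
]


def image_percentage(paths):
    """Return the percentage of hits whose path is an image file."""
    paths = list(paths)
    if not paths:
        return 0.0
    image_hits = sum(1 for p in paths if IMAGE_RE.search(p))
    return 100.0 * image_hits / len(paths)


def classify_browser(user_agent):
    """Return the browser name for a User-Agent, or None if none match."""
    for name, pattern in BROWSER_PATTERNS:
        if pattern.search(user_agent):
            return name
    return None


def most_popular_browser(user_agents):
    """Return the most common recognized browser, or None if none matched."""
    counts = Counter(b for b in map(classify_browser, user_agents) if b)
    return counts.most_common(1)[0][0] if counts else None


def hits_per_hour(timestamps):
    """Part VI: (hour, hits) for all 24 hours, busiest first."""
    counts = Counter(ts.hour for ts in timestamps)
    return sorted(((h, counts[h]) for h in range(24)),
                  key=lambda item: item[1], reverse=True)


def report(paths, user_agents, timestamps):
    print(f"Image requests account for {image_percentage(paths):.1f}% of all requests")
    print(f"The most popular browser is {most_popular_browser(user_agents)}")
    for hour, count in hits_per_hour(timestamps):
        print(f"Hour {hour:02d} has {count} hits")
```

Given lists built from the parsed log, calling `report(paths, user_agents, timestamps)` prints the three kinds of lines the assignment asks for. `hits_per_hour` counts all 24 hours, so hours with no traffic still appear as `0 hits`, and because `sorted` is stable, hours with equal counts stay in ascending order.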
Expert Answer:
To accomplish the tasks outlined in your assignment, you can create a Python program that performs the following steps:
Part I: Download the Web Log File
Use the argparse module to accept a URL as a comm...