C++ This C++ assignment is to extract texts in a webpage and save them in a text file
Question:
C++
"This C++ assignment is to extract texts in a webpage and save them in a text file with predefined formats. Your program should use the following table to convert an HTML file into a pure text file"
*Here is the required sample .html file: http://www.mediafire.com/file/48jgo660pkb3qbj/Sample.html
Transcribed Image Text:
One simple way to extract the text in a webpage is to remove all HTML tags enclosed by <> pairs. However, the extracted text will be a long character string. This project is to extract texts in a webpage and save them in a text file with predefined formats. Your program should use the following table to convert an HTML file into a pure text file.

HTML Tags | HTML example | Converted text
Page title | <title> My webpage title </title> | === My webpage title
Headings (heading 3 to 6 use one '#') | <h1> Heading 1 </h1> | # Heading 1
 | <h2> Heading 2 </h2> | ## Heading 2
 | <h3> Heading 3 </h3> | ### Heading 3
Paragraphs | <p> This is paragraph 1. </p> | A blank line before and after the paragraph.
Line breaks | <br> | A new line
Unordered lists | <ul> |
 | <li>Bullet One </li> | Bullet One
 | <li>Bullet Two </li> | Bullet Two
 | <li> Bullet three</li> | Bullet Three
 | </ul> |
Ordered lists | <ol> |
 | <li>Item One </li> | 1. Item One
 | <li>Item Two</li> | 2. Item Two
 | <li> Item three</li> | 3. Item Three
 | </ol> |
Links | <a href="http://www.abc.com"> Link Description </a> | Link Description (http://www.abc.com)

In this project, you can define your own styles for the tags not in the above table, and/or ignore more complicated tags (e.g. tables).

Report
You should submit your C++ code and a report with captured screens of HTML files, HTML sources and formatted text files. A sample HTML file (sample.html) was uploaded to Blackboard for you to test your output.
Expert Answer:
Below is a simple C program that reads an HTML file, extracts text based on the provided table, and saves the formatted text into a new text file includ...