Question: I'm really confused by Python's BeautifulSoup and have no idea how to do my homework. :(
Please help me solve the first question and explain it to me.
The b_soup_1.py file contains the code from the Week 2 lecture notes, showing how to start with the HTML for a web site and process that HTML into a list of table data value strings (str) using the BeautifulSoup module.
First, modify b_soup_1.py so that the program's only output is the final sequence of table cell value lists: no yc_temp.txt file, no intermediate results being displayed, etc.
Then, modify the code at the end of the program so that the table cell values are accumulated into a list of lists, representing the table of rows, something like this:
daily_yield_curves = [
[ header list ],
[ first data list ],
[ final data list ]
]
The first inner list should represent the header row:
['Date', '1 mo', '3 mo', '6 mo', '1 yr', '2 yr', '3 yr',
'5 yr', '7 yr', '10 yr', '20 yr', '30 yr']
Following that should be a list for each data row. Be sure to convert each interest rate value from a string to a float:
['01/02/18', 1.29, 1.44, 1.61, 1.83, 1.92, 2.01,
2.25, 2.38, 2.46, 2.64, 2.81]
...
['09/14/18', 2.02, 2.16, 2.33, 2.56, 2.78, 2.85,
2.90, 2.96, 2.99, 3.07, 3.13]
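A minimal sketch of the string-to-float step, assuming every cell after the date is a numeric rate string. The helper name convert_row is hypothetical, and the two sample rows are the first and last rows quoted above, written in string form as the scraper would hand them over:

```python
def convert_row(cells):
    """Turn one row of cell strings into [date_str, float, float, ...].

    Assumes the first cell is the date and every remaining cell is an
    interest rate string such as '1.29'.
    """
    date, *rates = cells
    return [date] + [float(r) for r in rates]


header = ['Date', '1 mo', '3 mo', '6 mo', '1 yr', '2 yr', '3 yr',
          '5 yr', '7 yr', '10 yr', '20 yr', '30 yr']
raw_rows = [
    ['01/02/18', '1.29', '1.44', '1.61', '1.83', '1.92', '2.01',
     '2.25', '2.38', '2.46', '2.64', '2.81'],
    ['09/14/18', '2.02', '2.16', '2.33', '2.56', '2.78', '2.85',
     '2.90', '2.96', '2.99', '3.07', '3.13'],
]

# the header row stays as strings; each data row is converted
daily_yield_curves = [header] + [convert_row(r) for r in raw_rows]
print(daily_yield_curves[1])
# ['01/02/18', 1.29, 1.44, 1.61, 1.83, 1.92, 2.01, 2.25, 2.38, 2.46, 2.64, 2.81]
```

Note that float('2.90') is just 2.9; keeping two decimal places is a formatting concern for the output file, not for the stored value.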
Create a file named daily_yield_curves.txt containing a neatly formatted table of this information.
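One way to get a "neatly formatted" file is fixed-width columns built with f-strings. This sketch assumes the list of lists already has the shape described above (header first, then rows of a date string followed by floats); the function name write_table and the column widths are illustrative choices, not something the assignment specifies:

```python
def write_table(rows, path='daily_yield_curves.txt'):
    """Write a header row plus data rows as fixed-width text columns.

    Widths (10 for the date, 7 for each rate) are arbitrary choices.
    """
    header, *data = rows
    with open(path, 'wt', encoding='utf-8') as fout:
        # header: date label left-justified, rate labels right-justified
        fout.write(f"{header[0]:<10}" + "".join(f"{h:>7}" for h in header[1:]) + "\n")
        for row in data:
            # two decimals so that 2.9 is written as 2.90
            fout.write(f"{row[0]:<10}" + "".join(f"{v:>7.2f}" for v in row[1:]) + "\n")


sample = [
    ['Date', '1 mo', '3 mo', '6 mo', '1 yr', '2 yr', '3 yr',
     '5 yr', '7 yr', '10 yr', '20 yr', '30 yr'],
    ['01/02/18', 1.29, 1.44, 1.61, 1.83, 1.92, 2.01,
     2.25, 2.38, 2.46, 2.64, 2.81],
    ['09/14/18', 2.02, 2.16, 2.33, 2.56, 2.78, 2.85,
     2.90, 2.96, 2.99, 3.07, 3.13],
]
write_table(sample)
```

Using a `with` block closes the file automatically, which avoids the separate open/write/close calls in the lecture code.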
The contents of b_soup_1.py:
from urllib.request import urlopen  # b_soup_1.py
from bs4 import BeautifulSoup

# Treasury Yield Curve web site, known to be HTML code
html = urlopen('https://www.treasury.gov/resource-center/'
               'data-chart-center/interest-rates/Pages/'
               'TextView.aspx?data=yieldYear&year=2018')

# create the BeautifulSoup object (BeautifulSoup Yield Curve)
bsyc = BeautifulSoup(html.read(), "lxml")

# save it to a file that we can edit
fout = open('bsyc_temp.txt', 'wt', encoding='utf-8')
fout.write(str(bsyc))
fout.close()

# print the first table
print(' First table tag:', str(bsyc.table))  # ... not the one we want
input('Press Enter to continue...')

# so get a list of all table tags
table_list = bsyc.findAll('table')

# how many are there?
print(' there are', len(table_list), 'table tags')
input('Press Enter to continue...')

# look at the first 50 chars of each table
print(' the first 50 chars of each table:')
for t in table_list:
    print(str(t)[:50])
input('Press Enter to continue...')

# only one class="t-chart" table, so add that
# to findAll as a dictionary attribute
tc_table_list = bsyc.findAll('table', {"class": "t-chart"})

# how many are there?
print(' there are', len(tc_table_list), 't-chart tables')
input('Press Enter to continue...')

# only 1 t-chart table, so grab it
tc_table = tc_table_list[0]

# what are this table's components/children?
print(' the first 50 chars of each t-chart table child:')
for c in tc_table.children:
    print(str(c)[:50])
input('Press Enter to continue...')

# tag tr means table row, containing table data
# what are the children of those rows?
print(' the children of the children of the t-chart table:')
for c in tc_table.children:
    for r in c.children:
        print(str(r)[:50])
input('Press Enter to continue...')

# we have found the table data!
# just get the contents of each cell
print(' the contents of the children of the t-chart table:')
for c in tc_table.children:
    for r in c.children:
        print(r.contents)
input('Press Enter to continue...')

# the contents of each cell is a list of one
# string -- we can work with those!
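A sketch of the first two steps together: no temp file, no input() pauses, and the cells accumulated into a list of lists with the rates converted to float. It keeps the lecture's lookup of the single class="t-chart" table, but iterates `<tr>` and `<th>`/`<td>` tags with find_all() instead of .children, so any whitespace text nodes between tags (which .children would also yield) are skipped. It uses the built-in "html.parser" instead of "lxml" only to avoid an extra dependency. The names parse_yield_table and main are hypothetical, and treating the first row as the header (and every later cell after the date as a numeric rate) is an assumption based on the assignment's description:

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def parse_yield_table(html):
    """Return the t-chart table as [header_row, data_row, data_row, ...]."""
    soup = BeautifulSoup(html, "html.parser")
    # same lookup as b_soup_1.py: the one table with class="t-chart"
    table = soup.find('table', {'class': 't-chart'})
    rows = []
    for tr in table.find_all('tr'):
        # text of every header or data cell in this row, whitespace stripped
        cells = [cell.get_text(strip=True) for cell in tr.find_all(['th', 'td'])]
        if not cells:
            continue
        if rows:  # any row after the header: keep the date, convert rates
            cells = [cells[0]] + [float(v) for v in cells[1:]]
        rows.append(cells)
    return rows


def main():
    # URL copied from b_soup_1.py
    html = urlopen('https://www.treasury.gov/resource-center/'
                   'data-chart-center/interest-rates/Pages/'
                   'TextView.aspx?data=yieldYear&year=2018')
    daily_yield_curves = parse_yield_table(html.read())
    # the only output: the final sequence of row lists
    for row in daily_yield_curves:
        print(row)
```

main() is left uncalled here so that parse_yield_table can be tried offline on a small HTML string; calling it fetches the URL from b_soup_1.py, which may no longer serve the same page.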
