Question: I'm really confused by Python's Beautiful Soup and have no idea how to approach my homework :( Please help me solve the first question and explain it to me.

The b_soup_1.py file contains the code from the Week 2 lecture notes, showing how to start with the HTML for a web site and process that HTML into a list of table data value strings (str) using the BeautifulSoup module.

First, modify b_soup_1.py so that the program's only output is the final sequence of table cell value lists: no bsyc_temp.txt file, no intermediate results being displayed, etc.

Then, modify the code at the end of the program so that the table cell values are accumulated into a list of lists, representing the table of rows, something like this:

daily_yield_curves = [
    [ header list ],
    [ first data list ],
    [ final data list ]
]

The first inner list should represent the header row:

['Date', '1 mo', '3 mo', '6 mo', '1 yr', '2 yr', '3 yr',
 '5 yr', '7 yr', '10 yr', '20 yr', '30 yr']

Following that should be a list for each data row. Be sure to convert each interest rate value from a string to a float:

['01/02/18', 1.29, 1.44, 1.61, 1.83, 1.92, 2.01,
 2.25, 2.38, 2.46, 2.64, 2.81]
...
['09/14/18', 2.02, 2.16, 2.33, 2.56, 2.78, 2.85,
 2.90, 2.96, 2.99, 3.07, 3.13]
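One possible way to do the string-to-float conversion is sketched below. The helper names to_number and convert_row are hypothetical, and leaving non-numeric cells (for example a placeholder such as 'N/A') unchanged is an assumption, not something the assignment specifies.

```python
def to_number(cell):
    """Return the cell as a float, or unchanged if it is not numeric."""
    try:
        return float(cell)
    except ValueError:
        return cell  # assumption: leave placeholders such as 'N/A' untouched


def convert_row(cells):
    """Keep the leading date string and convert the remaining cells."""
    date, *rates = cells
    return [date] + [to_number(rate) for rate in rates]


raw_row = ['01/02/18', '1.29', '1.44', '1.61']
print(convert_row(raw_row))
```

The header row should be appended as-is; only the data rows would go through convert_row.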

Create a file named daily_yield_curves.txt containing a neatly formatted table of this information.
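A minimal sketch of writing the table with fixed-width columns follows. The function name format_table, the column widths (10 for the date, 8 for each rate), and the shortened sample rows are assumptions chosen for illustration; it also assumes every rate has already been converted to a float.

```python
def format_table(rows):
    """Return the header and data rows as aligned text, one row per line."""
    header, *data = rows
    # Left-align the date column, right-align every rate column.
    lines = [f'{header[0]:<10}' + ''.join(f'{h:>8}' for h in header[1:])]
    for date, *rates in data:
        lines.append(f'{date:<10}' + ''.join(f'{r:8.2f}' for r in rates))
    return '\n'.join(lines) + '\n'


# Shortened sample data (three columns only) so the sketch runs on its own.
sample_rows = [
    ['Date', '1 mo', '3 mo'],
    ['01/02/18', 1.29, 1.44],
    ['09/14/18', 2.02, 2.16],
]

with open('daily_yield_curves.txt', 'w', encoding='utf-8') as fout:
    fout.write(format_table(sample_rows))
```

Using the format specification `8.2f` keeps two decimal places, so a value such as 2.9 is written as 2.90 and the columns stay aligned.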

The content of b_soup_1.py is:

from urllib.request import urlopen  # b_soup_1.py
from bs4 import BeautifulSoup

# Treasury Yield Curve web site, known to be HTML code
html = urlopen('https://www.treasury.gov/resource-center/'
               'data-chart-center/interest-rates/Pages/'
               'TextView.aspx?data=yieldYear&year=2018')

# create the BeautifulSoup object (BeautifulSoup Yield Curve)
bsyc = BeautifulSoup(html.read(), "lxml")

# save it to a file that we can edit
fout = open('bsyc_temp.txt', 'wt', encoding='utf-8')
fout.write(str(bsyc))
fout.close()

# print the first table
print(' First table tag:', str(bsyc.table))
# ... not the one we want
input('Press Enter to continue...')

# so get a list of all table tags
table_list = bsyc.findAll('table')

# how many are there?
print(' there are', len(table_list), 'table tags')
input('Press Enter to continue...')

# look at the first 50 chars of each table
print(' the first 50 chars of each table:')
for t in table_list:
    print(str(t)[:50])
input('Press Enter to continue...')

# only one class="t-chart" table, so add that
# to findAll as a dictionary attribute
tc_table_list = bsyc.findAll('table', { "class" : "t-chart" } )

# how many are there?
print(' there are', len(tc_table_list), 't-chart tables')
input('Press Enter to continue...')

# only 1 t-chart table, so grab it
tc_table = tc_table_list[0]

# what are this table's components/children?
print(' the first 50 chars of each t-chart table child:')
for c in tc_table.children:
    print(str(c)[:50])
input('Press Enter to continue...')

# tag tr means table row, containing table data
# what are the children of those rows?
print(' the children of the children of the t-chart table:')
for c in tc_table.children:
    for r in c.children:
        print(str(r)[:50])
input('Press Enter to continue...')

# we have found the table data!
# just get the contents of each cell
print(' the contents of the children of the t-chart table:')
for c in tc_table.children:
    for r in c.children:
        print(r.contents)
input('Press Enter to continue...')

# the contents of each cell is a list of one
# string -- we can work with those!
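Since each cell holds a single string, the rows can be accumulated into a list of lists. The sketch below is one way this might look, not the lecture's own solution. SAMPLE_HTML is a made-up, three-column stand-in for the Treasury page so the code runs offline, table_to_rows is a hypothetical helper name, and 'html.parser' is used here in place of "lxml" only to avoid an extra dependency. find_all is the current spelling of findAll.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the Treasury page (assumption, for offline use).
SAMPLE_HTML = """
<table class="t-chart">
<tr><th>Date</th><th>1 mo</th><th>3 mo</th></tr>
<tr><td>01/02/18</td><td>1.29</td><td>1.44</td></tr>
<tr><td>01/03/18</td><td>1.29</td><td>1.41</td></tr>
</table>
"""


def table_to_rows(tc_table):
    """Return one list of cell strings for each <tr> in the table."""
    rows = []
    for tr in tc_table.find_all('tr'):
        cells = [cell.get_text(strip=True)
                 for cell in tr.find_all(['th', 'td'])]
        if cells:  # skip rows that contain no header or data cells
            rows.append(cells)
    return rows


bsyc = BeautifulSoup(SAMPLE_HTML, 'html.parser')
tc_table = bsyc.find('table', {'class': 't-chart'})
daily_yield_curves = table_to_rows(tc_table)
for row in daily_yield_curves:
    print(row)
```

In the real program, bsyc would come from the urlopen call above, and the intermediate print and input calls would be removed so that only the final rows are displayed.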
