Help with Python code:
I am doing my paper for C996 Programming in Python. I was able to write the code (I don't know if it is correct). I have to extract links that point to other HTML pages and include them in a CSV file. All links should be in absolute format.
Below are two different versions of the code, and neither of them gives me the correct answer (it should be 118 links).
Can you please advise me on what I need to add/correct?
CODE 1:
from bs4 import BeautifulSoup
import urllib.request
import urllib.parse
from urllib.parse import urljoin
import requests
import csv

# Scrape the website: specify the URL, send the request, and catch the response
source = 'https://www.census.gov/programs-surveys/popest.html'
url_results = requests.get(source)

# Parse the HTML using BeautifulSoup to get the content from the website
soup = BeautifulSoup(url_results.content, 'html.parser')
print(soup.prettify())

# Number of retrieved URLs
all_links = soup.find_all('a')
for link in soup.find_all('a', href=True):
    href = link.get('href')

urls = []
for link in soup.find_all('a', href=True):
    urls.append(link['href'])

# Deduplicate links
urls_list = []
for link in urls:
    if link not in urls_list:
        urls_list.append(link)

# Print the deduplicated links
for link in urls_list:
    print(link)

# Create a file to write to
with open("outputsep0903.csv", "w", newline="") as f:
    cw = csv.writer(f, delimiter=" ")
    cw.writerow(urls_list)
f.close()
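(Note on CODE 1: it collects raw hrefs but never converts relative paths to absolute form with urljoin, and it never filters for links that actually point to HTML pages, so its deduplicated count will not match 118. A minimal sketch of the missing step, continuing from the source and urls_list variables above; the .html filename filter is my assumption about what the assignment counts, not something stated in the post:

from urllib.parse import urljoin

# Convert each href to absolute form and keep only links to HTML pages
# (the .html filter is an assumption about which links the assignment wants)
absolute_html_links = []
for href in urls_list:
    absolute = urljoin(source, href)   # resolve relative paths against the page URL
    if absolute.endswith('.html') and absolute not in absolute_html_links:
        absolute_html_links.append(absolute)

)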
CODE 2:
from bs4 import BeautifulSoup
import urllib.request
import urllib.parse
from urllib.parse import urljoin
import requests
import csv

# Scrape the website: specify the URL, send the request, and catch the response
source_url = 'https://www.census.gov/programs-surveys/popest.html'
url_results = requests.get(source_url)

# Parse the HTML using BeautifulSoup
soup = BeautifulSoup(url_results.content, 'html.parser')
print(soup.prettify())

# Count the links: loop over all <a> tags and read each href
all_links = soup.find_all('a')
for link in all_links:
    link.get('href')
print("All links:", len(all_links))

# Unique links: absolute path, deduplication
def unique_links(all_links, source_url):
    output_links = set()
    for link in all_links:
        link = link.get('href')
        if link is None:
            continue
        if link.endswith('/') or link.endswith('#'):
            link = link[-1]
        actual_url = urllib.parse.urljoin(source_url, link)
        output_links.add(actual_url)
    return output_links

output_links = unique_links(all_links, source_url)
print('Number of not duplicated links:', len(output_links))

# Create a file to write to
with open("outputsep3.csv", "w", newline="") as f:
    cw = csv.writer(f, delimiter=" ")
    cw.writerow(output_links)
f.close()
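(Note on CODE 2: the slicing is the main bug. link[-1] keeps only the last character of the href, so every URL ending in '/' or '#' collapses to a one-character string; link[:-1] is what drops the trailing character you are checking for. Also, cw.writerow(output_links) writes every URL into a single space-delimited row, whereas one URL per row is the usual CSV layout, and nothing filters for links that point to HTML pages. Below is a corrected sketch combining both attempts; the .html filter is my assumption about which links "point to other HTML pages", and I can't verify it yields exactly 118 for the live page:

from bs4 import BeautifulSoup
from urllib.parse import urljoin
import requests
import csv

source_url = 'https://www.census.gov/programs-surveys/popest.html'
soup = BeautifulSoup(requests.get(source_url).content, 'html.parser')

output_links = set()
for tag in soup.find_all('a', href=True):
    href = tag['href'].strip()
    # Drop a trailing '/' or '#' so near-duplicates collapse to one entry
    if href.endswith('/') or href.endswith('#'):
        href = href[:-1]          # note: [:-1], not [-1]
    absolute = urljoin(source_url, href)   # make the link absolute
    # Keep only links that point to other HTML pages
    # (assumption: the assignment means URLs ending in .html)
    if absolute.endswith('.html'):
        output_links.add(absolute)

print('Number of deduplicated links:', len(output_links))

# One URL per row is the usual CSV layout
with open("output_links.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows([link] for link in sorted(output_links))

)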
