In this project, I have to make a TextAnalyzer class. The methods of the class are described below. I will do my work in the Analyzing Text Jupyter notebook included in the project files. Please comment my code well.

import re
import requests
from bs4 import BeautifulSoup
from collections import Counter
import statistics as stats
import string
import operator
import matplotlib.pyplot as plt

# Restore matplotlib's default style settings
plt.rcdefaults()

# Part 1 complete

# Create your class here

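class TextAnalyzer:
    """Loads text from a URL, a .txt file, or a string and analyzes it."""

    def __init__(self, src, src_type="discover"):
        """
        Keyword arguments:
        src (str) - URL, path to a .txt file, or the text itself
        src_type (str) - "discover" (default), "url", "path", or "text"
        """
        self._src_type = None
        self._content = None       # text being analyzed (may be narrowed to one tag)
        self._orig_content = None  # text exactly as it was loaded

        # Work out the source type if the caller did not specify it
        if src_type == "discover":
            if src.startswith("http"):
                src_type = "url"
            elif src.endswith(".txt"):
                src_type = "path"
            else:
                src_type = "text"

        self._src_type = src_type

        # Load the raw content according to the source type
        if self._src_type == "url":
            response = requests.get(src)
            self._orig_content = response.text
        elif self._src_type == "path":
            with open(src, "r") as f:
                self._orig_content = f.read()
        elif self._src_type == "text":
            self._orig_content = src
        else:
            # Without this check, _preprocess(None) would fail with a confusing error
            raise ValueError("src_type must be 'discover', 'url', 'path', or 'text'")

        # Analyze a cleaned-up copy; _orig_content is kept unchanged
        self._content = self._preprocess(self._orig_content)

    def _preprocess(self, text):
        """Removes punctuation and collapses whitespace; case is left as is."""
        # Remove punctuation
        text = text.translate(str.maketrans("", "", string.punctuation))

        # Collapse every run of whitespace into a single space
        text = re.sub(r"\s+", " ", text)

        # No lowercasing here: it would make _words(casesensitive=True)
        # impossible. _words() handles case itself.
        return text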

    # Part 2 complete

    def set_content_to_tag(self, tag, tag_id=None):
        """
        Changes _content to the text within a specific element of an HTML document.

        Keyword arguments:
        tag (str) - Tag to read
        tag_id (str) - ID of tag to read
        """
        # Parse the original HTML; _content has had its punctuation (including < and >) removed
        soup = BeautifulSoup(self._orig_content, "html.parser")

        # Find the first matching tag; when an ID is given, match both the tag name and the ID
        if tag_id:
            element = soup.find(tag, id=tag_id)
        else:
            element = soup.find(tag)

        # find() returns None when nothing matches
        if element is None:
            print("Error: tag not found in HTML document.")
            return

        # Clean the element's text the same way as the full document
        self._content = self._preprocess(element.get_text())

    # Part 3 complete

    def reset_content(self):
        """
        Resets _content to full text that was originally loaded.
        Useful after a call to set_content_to_tag().
        """
        # Rebuild _content from the untouched original text
        self._content = self._preprocess(self._orig_content)

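    # Part 4 complete

    def _words(self, casesensitive=False):
        """Returns the words in _content as a list (uppercase unless casesensitive is True)."""
        # Split on whitespace and strip punctuation from both ends of each word
        words = [word.strip(string.punctuation) for word in self._content.split()]

        # Drop empty strings left over from tokens that were only punctuation
        words = [word for word in words if word]

        # Uppercase everything so "You" and "you" count as the same word
        if not casesensitive:
            words = [word.upper() for word in words]

        return words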

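    # Part 5 complete

    def common_words(self, minlen=1, maxlen=100, count=10, casesensitive=False):
        """Returns the `count` most common words whose length is between minlen and
        maxlen, as (word, num) tuples sorted by num in descending order."""
        # Count how many times each word occurs
        word_counts = Counter(self._words(casesensitive))

        # Keep words within the length limits ("num" avoids reusing the name "count")
        filtered_words = [(word, num) for word, num in word_counts.items()
                          if minlen <= len(word) <= maxlen]

        # Sort by frequency, highest first, and keep the top `count`
        sorted_words = sorted(filtered_words, key=lambda pair: pair[1], reverse=True)
        return sorted_words[:count]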

Part 6 (help please)

char_distribution(self, casesensitive=False, letters_only=False)

Returns a list of 2-element tuples of the format (char, num), where num is the number of times char shows up in _content. The list should be sorted by num in descending order.

Keyword arguments:

casesensitive(bool) - Consider case?
letters_only(bool) - Exclude non-letters?
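A minimal sketch of char_distribution, assuming it sits inside TextAnalyzer and reads self._content directly. `CharDistributionDemo` is a hypothetical stand-in so the block runs on its own; only the method body would go into the class. Uppercasing when casesensitive is False mirrors what _words() does, and counting spaces when letters_only is False follows the spec's "number of times char shows up in _content" literally, which is a judgment call.

```python
from collections import Counter


class CharDistributionDemo:
    """Hypothetical stand-in for TextAnalyzer that only holds _content."""

    def __init__(self, content):
        self._content = content

    def char_distribution(self, casesensitive=False, letters_only=False):
        """Returns (char, num) tuples for _content, sorted by num, highest first."""
        text = self._content
        if not casesensitive:
            # Fold case the same way _words() does, so "a" and "A" are counted together
            text = text.upper()
        if letters_only:
            # Keep letters only; digits, spaces and punctuation are dropped
            text = "".join(char for char in text if char.isalpha())
        # Counter tallies every character; sort its (char, num) pairs by num
        return sorted(Counter(text).items(), key=lambda pair: pair[1], reverse=True)
```

Any characters that _preprocess() already removed from _content, such as punctuation, will not appear in the counts.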

Part 7 (help please)

plot_common_words(self, minlen=1, maxlen=100, count=10, casesensitive=False)

Plots most common words.

Keyword arguments:

minlen(int) - Minimum length of words to include.
maxlen(int) - Maximum length of words to include.
count(int) - Number of words to include.
casesensitive(bool) - If False makes all words uppercase.
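A sketch of plot_common_words, assuming matplotlib is available (the notebook already imports it as plt). It draws a bar chart of whatever common_words() returns, so the length filter and case handling stay identical between the list and the chart. `CommonWordsPlotDemo` is a hypothetical stand-in with a simplified common_words(); in the notebook only plot_common_words would be added to TextAnalyzer. The title and axis labels are illustrative choices, not part of the spec.

```python
from collections import Counter
import string


class CommonWordsPlotDemo:
    """Hypothetical stand-in for TextAnalyzer: just enough to plot common words."""

    def __init__(self, content):
        self._content = content

    def common_words(self, minlen=1, maxlen=100, count=10, casesensitive=False):
        # Simplified copy of common_words() from the question, for this sketch only
        words = [w.strip(string.punctuation) for w in self._content.split()]
        words = [w if casesensitive else w.upper() for w in words if w]
        pairs = [(w, n) for w, n in Counter(words).items() if minlen <= len(w) <= maxlen]
        return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:count]

    def plot_common_words(self, minlen=1, maxlen=100, count=10, casesensitive=False):
        """Plots most common words as a bar chart."""
        # The notebook already imports plt at the top; importing here keeps
        # this sketch runnable on its own
        import matplotlib.pyplot as plt

        pairs = self.common_words(minlen, maxlen, count, casesensitive)
        words = [word for word, _ in pairs]
        nums = [num for _, num in pairs]

        fig, ax = plt.subplots()
        ax.bar(words, nums)
        ax.set_title("Most Common Words")
        ax.set_xlabel("Word")
        ax.set_ylabel("Count")
        ax.tick_params(axis="x", labelrotation=45)  # keeps long words readable
        fig.tight_layout()
        plt.show()
```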

Part 8 (help please)

plot_char_distribution(self, casesensitive=False, letters_only=False)

Plots character distribution.

Keyword arguments:

casesensitive(bool) - If False makes all words uppercase.
letters_only(bool) - Exclude non-letters
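A sketch of plot_char_distribution along the same lines, assuming matplotlib is available and that the chart simply plots the output of char_distribution(). `CharPlotDemo` is a hypothetical stand-in with a simplified char_distribution() so the block runs on its own. Showing whitespace as a quoted label (for example `' '`) is an assumption made so the space bar is not left with a blank tick.

```python
from collections import Counter


class CharPlotDemo:
    """Hypothetical stand-in for TextAnalyzer: just enough to plot characters."""

    def __init__(self, content):
        self._content = content

    def char_distribution(self, casesensitive=False, letters_only=False):
        # Simplified char_distribution() for this sketch only
        text = self._content if casesensitive else self._content.upper()
        if letters_only:
            text = "".join(char for char in text if char.isalpha())
        return sorted(Counter(text).items(), key=lambda pair: pair[1], reverse=True)

    def plot_char_distribution(self, casesensitive=False, letters_only=False):
        """Plots character distribution as a bar chart."""
        # The notebook already imports plt at the top; importing here keeps
        # this sketch runnable on its own
        import matplotlib.pyplot as plt

        pairs = self.char_distribution(casesensitive, letters_only)
        # Show whitespace as a visible label such as ' ' instead of a blank tick
        chars = [repr(char) if char.isspace() else char for char, _ in pairs]
        nums = [num for _, num in pairs]

        fig, ax = plt.subplots()
        ax.bar(chars, nums)
        ax.set_title("Character Distribution")
        ax.set_xlabel("Character")
        ax.set_ylabel("Count")
        fig.tight_layout()
        plt.show()
```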

Part 9 (help please)

Properties

In addition, the class must include these properties:

avg_word_length(self)

The average word length in _content, rounded to two decimal places (e.g., 3.82).

word_count(self)

The number of words in _content.

distinct_word_count(self)

The number of distinct words in _content. This should not be case sensitive: "You" and "you" should be considered the same word.

words(self)

A list of all words used in _content, including repeats, in all uppercase letters.
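A minimal sketch of the four counting properties, assuming they are defined with @property inside TextAnalyzer and built on _words(). `PropertiesDemo` is a hypothetical stand-in with a simplified _words() so the block runs on its own; only the four property definitions would be pasted into the class. Because words is always uppercase, distinct_word_count is case-insensitive for free. Returning 0 for an empty text in avg_word_length is an assumption; the spec does not cover that case.

```python
import statistics as stats
import string


class PropertiesDemo:
    """Hypothetical stand-in for TextAnalyzer with just _content and _words()."""

    def __init__(self, content):
        self._content = content

    def _words(self, casesensitive=False):
        # Simplified version of the _words() method from the question
        words = [w.strip(string.punctuation) for w in self._content.split()]
        words = [w for w in words if w]
        return words if casesensitive else [w.upper() for w in words]

    @property
    def words(self):
        """All words in _content, including repeats, in uppercase."""
        return self._words(casesensitive=False)

    @property
    def word_count(self):
        """Number of words in _content."""
        return len(self.words)

    @property
    def distinct_word_count(self):
        """Number of distinct words; uppercase words make this case-insensitive."""
        return len(set(self.words))

    @property
    def avg_word_length(self):
        """Average word length rounded to two decimal places (e.g. 3.82)."""
        words = self.words
        if not words:
            return 0  # assumption: the spec does not say what an empty text should give
        return round(stats.mean(len(word) for word in words), 2)
```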

positivity(self)

A positivity score calculated as follows:

Create a local tally variable with an initial value of 0.
Increment tally by 1 for every word in self.words found in positive.txt (in the same directory).
Decrement tally by 1 for every word in self.words found in negative.txt (in the same directory).
Calculate the score as follows:

round( tally / self.word_count * 1000)
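A sketch of positivity following the four steps above. Assumptions: positive.txt and negative.txt hold one word per line, they may be lowercase (so they are uppercased to match self.words), and an empty text scores 0 instead of dividing by zero. `PositivityDemo`, `_load_word_set`, and the `POSITIVE_PATH`/`NEGATIVE_PATH` attributes are hypothetical names for this sketch; inside TextAnalyzer the property can rely on the real words and word_count properties.

```python
import string


class PositivityDemo:
    """Hypothetical stand-in for TextAnalyzer with just what positivity needs."""

    # Word-list files from the project, assumed to contain one word per line
    POSITIVE_PATH = "positive.txt"
    NEGATIVE_PATH = "negative.txt"

    def __init__(self, content):
        self._content = content

    @property
    def words(self):
        # Simplified words property: punctuation stripped, all uppercase
        words = [w.strip(string.punctuation) for w in self._content.split()]
        return [w.upper() for w in words if w]

    @property
    def word_count(self):
        return len(self.words)

    @staticmethod
    def _load_word_set(path):
        # Uppercase each entry so it matches self.words, which is uppercase
        with open(path) as f:
            return {line.strip().upper() for line in f if line.strip()}

    @property
    def positivity(self):
        """Positivity score: round(tally / word_count * 1000)."""
        words = self.words
        if not words:
            return 0  # assumption: avoid dividing by zero on an empty text
        positive = self._load_word_set(self.POSITIVE_PATH)
        negative = self._load_word_set(self.NEGATIVE_PATH)
        tally = 0
        for word in words:
            if word in positive:
                tally += 1  # +1 for each word found in positive.txt
            if word in negative:
                tally -= 1  # -1 for each word found in negative.txt
        return round(tally / self.word_count * 1000)
```

Loading each list into a set makes every membership check constant time, which matters when positive.txt and negative.txt contain thousands of words.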
