Question: I am working on a project using Python and Jupyter. This is what I have to work with:
# -*- coding: utf-8 -*-
"""
Created on Fri Sep
@author: CS Group Fall
"""
import pandas as pd
import json
import numpy as np
# Assign dataset JSON file paths
ENFILEPATH = "DataSetsEngsentences.json"
FRAFILEPATH = "DataSetsFrasentences.json"
MULTILANGSENTENCEFILEPATH = "DataSetsCCsentences.json"
LINKSFILEPATH = "DataSetsLinksjson"  # Contains translations to match the sentences from CCsentences.json
# Read each JSON Lines file into a DataFrame
endata = pd.read_json(ENFILEPATH, lines=True)  # English
fradata = pd.read_json(FRAFILEPATH, lines=True)  # French
sentencesdata = pd.read_json(MULTILANGSENTENCEFILEPATH, lines=True)  # Multi Lang Sentences
with open(LINKSFILEPATH, 'r') as linkfile:  # Link/Translation
    linkdata = []
    for line in linkfile:
        linkdata.append(json.loads(line))
# Index by 'Sentence id' and create the link DataFrame
endata.set_index('Sentence id', inplace=True)  # English
fradata.set_index('Sentence id', inplace=True)  # French
sentencesdata.set_index('Sentence id', inplace=True)  # Multi Lang Sentences
linkdf = pd.DataFrame(linkdata)  # Link/Translation
# Reset index to include 'Sentence id' as a column
endata.reset_index(inplace=True)  # English
fradata.reset_index(inplace=True)  # French
sentencesdata.reset_index(inplace=True)  # Multi Lang Sentences
linkdf.reset_index(drop=True, inplace=True)  # Link/Translation
# Display the first few rows of each DataFrame
print(endata.head())
print(fradata.head())
print(sentencesdata.head())
print(linkdf.head())
#
# Owen's part
#
# sentencesdata contains all the sentences available on the website
# Count the number of sentences in each available language
langcount = sentencesdata.groupby('Lang')['Sentence id'].agg('count').reset_index()
langcount = langcount.rename(columns={'Sentence id': 'langcount'})
idx = langcount.langcount.idxmax()
print('{} language has {} available sentences which is the most in dataset'.format(
    langcount.iloc[idx]['Lang'], langcount.iloc[idx]['langcount']))
# linkdf contains the links between the sentences: each row means that sentence 'Translation id'
# is the translation of sentence 'Sentence id'.
# The reciprocal link is also present, so the file will also contain a row with the two ids swapped.
# Count the number of translations linked to each sentence
linktranslate = linkdf.groupby('Sentence id').agg('count').reset_index()
linktranslate = linktranslate.rename(columns={'Translation id': 'translationcount'})
# Get the translations
linktranslate = linktranslate.merge(sentencesdata.iloc[:, :], how='left', on='Sentence id')
# Keep only the valid translations (rows whose language is known)
validtranslation = linktranslate[~linktranslate.Lang.isna()]
# Count the number of valid translations in each language
validtranslationcount = validtranslation.groupby('Lang')['Sentence id'].agg('count').reset_index()
validtranslationcount = validtranslationcount.rename(columns={'Sentence id': 'validcount'})
idx = validtranslationcount.validcount.idxmax()
print('{} language has {} valid sentences which is the most in dataset'.format(
    validtranslationcount.iloc[idx]['Lang'], validtranslationcount.iloc[idx]['validcount']))
# Most popular translated sentence
idx = validtranslation.translationcount.idxmax()
print('{} is the most translated sentence'.format(validtranslation.loc[idx]))
# Translation that has the most different meaning in English
endata = endata.merge(linkdf, how='left', right_on='Sentence id', left_on='Sentence id')
entranslatecount = endata['Translation id'].value_counts()
idx = entranslatecount.idxmax()  # 'Translation id' linked to the most English sentences
print('Translation that has the most different meaning in English:\n',
      endata[endata['Translation id'] == idx][['Translation id', 'Text']])
# Translation that has the most different meaning in French
fradata = fradata.merge(linkdf, how='left', right_on='Sentence id', left_on='Sentence id')
fratranslatecount = fradata['Translation id'].value_counts()
idx = fratranslatecount.idxmax()  # 'Translation id' linked to the most French sentences
print('Translation that has the most different meaning in French:\n',
      fradata[fradata['Translation id'] == idx][['Translation id', 'Text']])
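
The counting steps in the listing can be checked on a small in-memory example before running them on the full files. The following is only a minimal sketch: the toy rows and the names `sentences`, `links`, `lang_count`, `top_lang`, `translation_counts`, `most_linked_id` and `most_linked_text` are hypothetical and not part of the project; only the column names 'Sentence id', 'Lang', 'Text' and 'Translation id' are taken from the code above.

```python
import pandas as pd

# Hypothetical toy data standing in for the sentence and link files.
sentences = pd.DataFrame({
    "Sentence id": [1, 2, 3, 4, 5],
    "Lang": ["eng", "eng", "fra", "eng", "fra"],
    "Text": ["Hello.", "Thanks.", "Bonjour.", "Hi.", "Merci."],
})
# Every link is stored in both directions (reciprocal rows).
links = pd.DataFrame({
    "Sentence id":    [1, 3, 4, 3, 2, 5],
    "Translation id": [3, 1, 3, 4, 5, 2],
})

# Sentences per language; idxmax returns the row label of the largest count.
lang_count = (
    sentences.groupby("Lang")["Sentence id"].count()
    .reset_index(name="lang_count")
)
top_lang = lang_count.loc[lang_count["lang_count"].idxmax(), "Lang"]

# Number of link rows that name each sentence as the translation.
translation_counts = links["Translation id"].value_counts()
most_linked_id = translation_counts.idxmax()
most_linked_text = sentences.loc[
    sentences["Sentence id"] == most_linked_id, "Text"
].iloc[0]

print(top_lang, most_linked_id, most_linked_text)
```

Using `idxmax()` on the counts avoids relying on the sort order of `value_counts()`, and selecting by column name avoids depending on column positions.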
