Question: Task 1: Parallel Corpora

Parallel corpora contain a collection of texts in a given language and their translation to one or more other languages. In this task, you will build a small parallel corpus using data from OpenSubtitles.org, a database that allows you to search and download subtitles for various languages. It was previously used to build the OpenSubtitles corpus, which consists of around 2.6 billion sentences and covers 60 languages.

Search for the film Monty Python and the Holy Grail (1975) on OpenSubtitles.org and download subtitles for English, German, and a third language of your choosing. Open the files using a text editor (. . Code) and familiarise yourself with the format. Your corpus will include sentences from a famous scene that starts at 00:17:48 (first English sentence is: "Quiet! There are ways of telling whether she is a witch.") and ends at 00:20:31 (last English sentence is: "... knight of the Round Table.").
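
Although the task is meant to be done by hand, it can help to see how the scene window maps onto the file format. The following is a minimal sketch, assuming the downloaded files are in the common SubRip (.srt) layout (an index line, a timing line, one or more text lines, then a blank line). The function names `parse_srt` and `select_scene` and the sample text are hypothetical and not part of the assignment. Timings can also differ between subtitle files, so the window may need adjusting for each language.

```python
import re
from datetime import timedelta

# Matches an SRT timing line such as "00:17:48,120 --> 00:17:51,480".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)
TAG = re.compile(r"<[^>]+>")  # simple markup such as <i>...</i>


def parse_srt(text):
    """Return a list of (start, end, text) tuples from SRT-formatted text."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            m = TIMING.search(line)
            if m:
                h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, m.groups())
                start = timedelta(hours=h1, minutes=m1, seconds=s1, milliseconds=ms1)
                end = timedelta(hours=h2, minutes=m2, seconds=s2, milliseconds=ms2)
                # Join multi-line cues into one line and drop markup tags.
                body = " ".join(TAG.sub("", l).strip() for l in lines[i + 1:])
                cues.append((start, end, body.strip()))
                break
    return cues


def select_scene(cues, start, end):
    """Keep cues whose start time lies within [start, end]."""
    return [c for c in cues if start <= c[0] <= end]


# Hypothetical sample data, not taken from a real subtitle file.
SAMPLE = """1
00:17:40,000 --> 00:17:42,000
(earlier line)

2
00:17:48,100 --> 00:17:52,000
<i>Quiet!</i>
There are ways of telling whether she is a witch.

3
00:20:40,000 --> 00:20:42,000
(later line)
"""

# The upper bound is one second past 00:20:31 so that a cue starting
# at 00:20:31 plus some milliseconds is still included.
scene = select_scene(
    parse_srt(SAMPLE),
    timedelta(minutes=17, seconds=48),
    timedelta(minutes=20, seconds=32),
)
for start, end, text in scene:
    print(start, text)
```

Joining multi-line cues into a single line and stripping tags is one possible reading of "clean up the data"; the assignment itself does not specify the cleaning steps.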

Your goal is to clean up the data, match subtitles in different languages and put the lines together, transforming them into the following format:

line 1 in English
line 1 in German
line 1 in chosen language
line 2 in English
line 2 in German
line 2 in chosen language
...

You will see that this manual process is not feasible for greater amounts of data, and you will learn how to automate a process like this later on in the course.

Save the created corpus as grail_corpus.txt and submit the file together with the assignment.
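
To make the target layout concrete, here is a minimal sketch that interleaves three already matched lists of lines and writes them to grail_corpus.txt. It assumes the matching has been done by hand, as the task requires, so that all three lists have the same length. The function name `interleave` and the placeholder strings are hypothetical.

```python
from pathlib import Path


def interleave(english, german, other):
    """Return lines ordered as EN 1, DE 1, other 1, EN 2, DE 2, other 2, ..."""
    if not (len(english) == len(german) == len(other)):
        raise ValueError("all three lists must contain the same number of lines")
    rows = []
    for en, de, third in zip(english, german, other):
        rows.extend([en, de, third])
    return rows


# Placeholder strings only; real entries come from the matched subtitles.
english = ["line 1 in English", "line 2 in English"]
german = ["line 1 in German", "line 2 in German"]
other = ["line 1 in chosen language", "line 2 in chosen language"]

corpus_lines = interleave(english, german, other)

# UTF-8 keeps umlauts and other non-ASCII characters intact.
Path("grail_corpus.txt").write_text("\n".join(corpus_lines) + "\n", encoding="utf-8")
```

The length check is there because a missing or split subtitle in one language would otherwise silently shift every later line out of alignment.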

