Question:

Suppose we have a document collection with an extremely small vocabulary of only 6 words w1, w2, ..., w6. The following table shows the estimated background language model p(w|C) using the whole collection of documents (2nd column) and the word counts for documents d1 (3rd column) and d2 (4th column), where c(w, di) is the count of word w in document di. Let Q = {w1, w2, w3, w4} be a query.

Word    p(w|C)
w1      0.800
w2      0.100
w3      0.025
w4      0.025
w5      0.025
w6      0.025

(a) Suppose we do not smooth the language model for d1 and d2. Compute the likelihood of the query for both d1 and d2, i.e., p(Q|d1) and p(Q|d2). (Do not compute the log-likelihood. You should use scientific notation, e.g., 0.0061 should be 6.1 × 10^-3.) Which document would be ranked higher?

(b) Suppose we now smooth the language model for d1 and d2 using the Jelinek-Mercer smoothing method with λ = 0.8, i.e.,

$$p(w \mid d) = \lambda \, p_{\text{mle}}(w \mid M_d) + (1 - \lambda) \, p_{\text{mle}}(w \mid M_C)$$

Recompute the likelihood of the query for both d1 and d2, i.e., p(Q|d1) and p(Q|d2). (Do not compute the log-likelihood. You should use scientific notation.) Which document would be ranked higher?
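Both parts come down to the same product over query words: p(Q|d) is the product of p(w|d) for each w in Q, where p(w|d) is either the maximum-likelihood estimate c(w, d) / |d| (part a) or its Jelinek-Mercer mixture with p(w|C) (part b). The Python sketch below illustrates that computation. It is only a sketch under stated assumptions: the counts in `example_counts` are hypothetical placeholders rather than the values from the question's table, the query is assumed to be w1 through w4, and the function names are made up for illustration.

```python
from math import prod

# Background model p(w|C), taken from the question.
p_background = {
    "w1": 0.800, "w2": 0.100, "w3": 0.025,
    "w4": 0.025, "w5": 0.025, "w6": 0.025,
}

# HYPOTHETICAL counts c(w, d) for a single document. These are NOT the
# counts from the question's table; they only exercise the functions.
example_counts = {"w1": 2, "w2": 3, "w3": 1, "w4": 2, "w5": 2, "w6": 0}

# Assumed query words.
query = ["w1", "w2", "w3", "w4"]


def mle(counts):
    """Maximum-likelihood document model: c(w, d) / |d|."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def query_likelihood(query, counts, background, lam=1.0):
    """p(Q|d) as a product of per-word probabilities.

    lam is the Jelinek-Mercer weight on the document model;
    lam = 1.0 means no smoothing at all.
    """
    p_doc = mle(counts)
    return prod(
        lam * p_doc.get(w, 0.0) + (1 - lam) * background[w]
        for w in query
    )


unsmoothed = query_likelihood(query, example_counts, p_background, lam=1.0)
smoothed = query_likelihood(query, example_counts, p_background, lam=0.8)

# Scientific notation, as the question asks.
print(f"unsmoothed p(Q|d) = {unsmoothed:.1e}")
print(f"smoothed   p(Q|d) = {smoothed:.1e}")
```

To rank the two documents, call `query_likelihood` once per document with its own counts and compare the results. Without smoothing, any query word with a zero count in a document drives that document's likelihood to exactly zero; with λ = 0.8 such a word still contributes 0.2 · p(w|C), which is why the ranking can change between parts (a) and (b).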

