Question: Task 2: Design a Jelinek-Mercer based Language Model (JM_LM) that ranks documents in
each data collection using the corresponding topic query, for all data collections.
Inputs: the long query topics in theQueries.txt and the corresponding data collections
(DataCx, one per query topic Rx).
Output: ranked document files for all data collections (e.g., for Query R the output file
name is JMLMRRanking.dat), saved in the folder RankingOutputs.
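The output step can be sketched as follows. This is only a minimal sketch: the helper name `save_ranking`, the one-line-per-document "doc_id score" layout, and the way the query ID is spliced into the file name are all assumptions, since the task text does not spell out the file format.

```python
from pathlib import Path


def save_ranking(query_id, ranked, out_dir="RankingOutputs"):
    """Write (doc_id, score) pairs, best first, one pair per line.

    `ranked` is assumed to be already sorted by descending score.
    """
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    # Assumed naming pattern: the query ID sits between "JMLM" and "Ranking.dat".
    path = folder / f"JMLM{query_id}Ranking.dat"
    with path.open("w", encoding="utf-8") as fh:
        for doc_id, score in ranked:
            fh.write(f"{doc_id} {score}\n")
    return path
```

Calling `save_ranking(query_id, ranked)` once per query topic would leave one ranking file per data collection in the output folder.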
For each long query topic Rx, use the following equation to calculate a conditional
probability for each document D in the corresponding data collection:

$$p(R_x \mid D) = \prod_{q_i \in R_x} \left( (1-\lambda)\,\frac{f_{q_i,D}}{|D|} + \lambda\,\frac{c_{q_i}}{|\mathrm{DataC}_x|} \right)$$

where $f_{q_i,D}$ is the number of times query word $q_i$ occurs in document $D$, $|D|$ is the
number of word occurrences in $D$, $c_{q_i}$ is the number of times query word $q_i$ occurs in
the data collection DataCx, $|\mathrm{DataC}_x|$ is the total number of word occurrences in
data collection DataCx, and $\lambda$ is the smoothing parameter.
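A minimal Python sketch of this scoring, assuming the standard Jelinek-Mercer mixture: for each query word, the relative frequency in the document is weighted by (1 - lambda) and the relative frequency in the whole collection by lambda, and the per-word values are multiplied together. The function names, the `{doc_id: token list}` input shape, and the tokenization are hypothetical. The sketch sums logarithms rather than multiplying raw probabilities, which gives the same ranking while avoiding underflow on long queries. It also skips query words that never occur in the collection, an assumption made here so that one unseen word does not zero every document's score.

```python
import math
from collections import Counter


def jm_log_score(query_terms, doc_tokens, coll_counts, coll_len, lam):
    """Log of the Jelinek-Mercer smoothed query likelihood for one document."""
    doc_counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    log_p = 0.0
    for q in query_terms:
        if coll_counts[q] == 0:
            continue  # assumption: ignore words unseen in the whole collection
        p_doc = doc_counts[q] / doc_len if doc_len else 0.0
        p_coll = coll_counts[q] / coll_len
        log_p += math.log((1.0 - lam) * p_doc + lam * p_coll)
    return log_p


def rank_documents(query_terms, docs, lam):
    """Rank a {doc_id: token list} collection by descending score."""
    coll_counts = Counter()
    for tokens in docs.values():
        coll_counts.update(tokens)  # collection-wide word counts
    coll_len = sum(coll_counts.values())  # total word occurrences
    scores = {
        doc_id: jm_log_score(query_terms, tokens, coll_counts, coll_len, lam)
        for doc_id, tokens in docs.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

`lam` is left as an argument because its value is not stated in the text above. The returned scores are log-probabilities, so `math.exp(score)` recovers the conditional probability if the raw value is needed.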
