Question:

Task 2: Design a Jelinek-Mercer based Language Model (JM_LM) that ranks documents in
each data collection using the corresponding topic (query) for all 50 data collections.
Inputs: 50 long queries (topics) in the50Queries.txt and the corresponding 50 data collections
(Data_C101, Data_C102,..., Data_C150).
Output: 50 ranked document files, one for each data collection (e.g., for query R107 the output
file is named JM_LM_R107Ranking.dat), saved in the folder RankingOutputs.
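
The output step could look roughly like the sketch below. It is an illustration only: the helper name write_ranking is hypothetical, and the line format "<doc_id> <score>" (highest score first) is an assumption, because the task text does not specify what each line of a .dat file should contain.

```python
import os
from typing import Dict


def write_ranking(query_id: str, scores: Dict[str, float],
                  out_dir: str = "RankingOutputs") -> str:
    """Write one ranked document file, e.g. RankingOutputs/JM_LM_R107Ranking.dat.

    Hypothetical helper. Assumption: each line is "<doc_id> <score>",
    highest score first; the required line format is not given in the task.
    """
    os.makedirs(out_dir, exist_ok=True)  # create the output folder if missing
    path = os.path.join(out_dir, f"JM_LM_{query_id}Ranking.dat")
    # Sort by descending score; break ties by document ID for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    with open(path, "w", encoding="utf-8") as f:
        for doc_id, score in ranked:
            f.write(f"{doc_id} {score}\n")
    return path
```

Calling write_ranking("R107", scores) once per query produces the 50 files; breaking ties by document ID keeps the output reproducible between runs.
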
For each long query (topic) Rx, use the following equation to calculate the conditional
probability of the query given each document D in the corresponding data collection (dataset):

$$ p(R_x \mid D) = \prod_{i=1}^{n} \left( (1-\lambda)\,\frac{f_{q_i,D}}{|D|} + \lambda\,\frac{c_{q_i}}{|\mathit{Data\_C}_x|} \right) $$

where q_1, ..., q_n are the words of query Rx, f_{q_i,D} is the number of times query word q_i
occurs in document D, |D| is the number of word occurrences in D, c_{q_i} is the number of times
query word q_i occurs in the data collection Data_Cx, |Data_Cx| is the total number of word
occurrences in data collection Data_Cx, and the parameter λ = 0.4.
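
A minimal sketch of the equation in Python follows. It assumes the query and every document in one collection have already been tokenized into word lists (any stemming or stop-word removal is left open, since the task does not fix preprocessing); the function name jm_lm_scores is hypothetical. Each factor mixes the document estimate f_{q_i,D}/|D| with the collection estimate c_{q_i}/|Data_Cx| using λ = 0.4.

```python
from collections import Counter
from typing import Dict, List

LAMBDA = 0.4  # weight on the collection model, as specified in the task


def jm_lm_scores(query_terms: List[str],
                 docs: Dict[str, List[str]],
                 lam: float = LAMBDA) -> Dict[str, float]:
    """Return p(R_x | D) for every document D in one data collection.

    Hypothetical helper. Assumption: query_terms and each document are
    already tokenized; the task text does not fix preprocessing.
    """
    # Term frequencies per document: f_{q_i,D}
    doc_tf = {doc_id: Counter(tokens) for doc_id, tokens in docs.items()}
    # Term frequencies over the whole collection: c_{q_i}
    coll_tf: Counter = Counter()
    for tf in doc_tf.values():
        coll_tf.update(tf)
    coll_len = sum(coll_tf.values())  # |Data_Cx|

    scores = {}
    for doc_id, tokens in docs.items():
        doc_len = len(tokens)  # |D|
        tf = doc_tf[doc_id]
        p = 1.0
        for q in query_terms:  # product over query words q_1 .. q_n
            p_doc = tf[q] / doc_len if doc_len else 0.0
            p_coll = coll_tf[q] / coll_len if coll_len else 0.0
            p *= (1 - lam) * p_doc + lam * p_coll
        scores[doc_id] = p
    return scores
```

Sorting the returned dictionary by descending value gives the ranking for the query. With long queries the product can become very small; summing the logarithms of the factors instead gives the same ranking whenever every factor is positive (a factor is zero only when the word does not occur anywhere in Data_Cx).
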
