Project Description

Develop a spelling checker (i.e., a best-word predictor) using a 3-gram language model. Each student needs to collect an Arabic corpus of at least 1 million words. Students cannot share the same corpus with each other, fully or partially, and cannot reuse text from previous years.

Tokenize the corpus into tokens/words, and then build a trigram language model for this corpus. The language model should contain the token, its count, and the probability (or log probability) of the token, and should be saved in a CSV file.

Develop an interface that allows the user to write text and then click a "spell" button. If the user writes "#" in the text, the program will suggest the top five words (and their probabilities) as replacements for the #, using the language model.

Each student should submit his/her project via Moodle. The project should include the source code, the corpus, and the language model. The project should be written in Java.

Example: Spell # 0.81 0.4 0.38 0.21 0.75
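
One possible starting point for the model and the "#" lookup is sketched below. It is not a reference solution, and it rests on several assumptions that the assignment does not state: tokens are split on whitespace only, probabilities are plain maximum-likelihood estimates count(w1 w2 w3) / count(w1 w2 *) with no smoothing, the "token" column of the CSV holds the full trigram, and the two words before "#" are used as context. The class name `TrigramSpeller` and its methods are hypothetical, and the toy English sentence only stands in for an Arabic corpus (Java strings are Unicode, so Arabic tokens go through the same code).

```java
import java.util.*;

class TrigramSpeller {
    // context "w1 w2" -> (w3 -> number of times w3 followed that context)
    private final Map<String, Map<String, Integer>> trigramCounts = new HashMap<>();

    /** Whitespace tokenizer (assumption): real Arabic text would also need
     *  punctuation and diacritic handling. */
    public static List<String> tokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) return new ArrayList<>();
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    /** Counts every trigram in the token sequence. */
    public void train(List<String> tokens) {
        for (int i = 0; i + 2 < tokens.size(); i++) {
            String context = tokens.get(i) + " " + tokens.get(i + 1);
            trigramCounts.computeIfAbsent(context, k -> new HashMap<>())
                         .merge(tokens.get(i + 2), 1, Integer::sum);
        }
    }

    private static int total(Map<String, Integer> followers) {
        return followers.values().stream().mapToInt(Integer::intValue).sum();
    }

    /** One header row, then "w1 w2 w3,count,probability" rows in sorted order. */
    public List<String> toCsvRows() {
        List<String> rows = new ArrayList<>();
        rows.add("token,count,probability");
        for (Map.Entry<String, Map<String, Integer>> ctx : new TreeMap<>(trigramCounts).entrySet()) {
            int sum = total(ctx.getValue());
            for (Map.Entry<String, Integer> w : new TreeMap<>(ctx.getValue()).entrySet()) {
                double p = (double) w.getValue() / sum;
                rows.add(ctx.getKey() + " " + w.getKey() + "," + w.getValue() + "," + p);
            }
        }
        return rows;
    }

    /** Up to k candidates for the first "#" token, most probable first.
     *  Returns an empty list if there is no "#" or fewer than two words before it. */
    public List<Map.Entry<String, Double>> suggest(String text, int k) {
        List<String> tokens = tokenize(text);
        int pos = tokens.indexOf("#");
        if (pos < 2) return new ArrayList<>();
        String context = tokens.get(pos - 2) + " " + tokens.get(pos - 1);
        Map<String, Integer> followers =
                trigramCounts.getOrDefault(context, Collections.<String, Integer>emptyMap());
        int sum = total(followers);
        List<Map.Entry<String, Double>> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : followers.entrySet()) {
            out.add(new AbstractMap.SimpleEntry<String, Double>(
                    e.getKey(), (double) e.getValue() / sum));
        }
        // Highest probability first; ties broken alphabetically for stable output.
        out.sort((a, b) -> {
            int c = Double.compare(b.getValue(), a.getValue());
            return c != 0 ? c : a.getKey().compareTo(b.getKey());
        });
        return out.size() > k ? new ArrayList<>(out.subList(0, k)) : out;
    }

    public static void main(String[] args) {
        TrigramSpeller model = new TrigramSpeller();
        // Toy stand-in corpus; the assignment calls for 1M+ Arabic words.
        model.train(tokenize("the cat sat on the mat the cat ran on the road the cat sat down"));
        for (Map.Entry<String, Double> s : model.suggest("the cat #", 5)) {
            System.out.println(s.getKey() + " " + s.getValue());
        }
    }
}
```

The rows returned by `toCsvRows()` could be saved with `Files.write(Paths.get("model.csv"), rows, StandardCharsets.UTF_8)`, where the file name is just a placeholder, and a Swing button handler could call `suggest(text, 5)` when "spell" is clicked. A log-probability column would be `Math.log(p)` of the same value. With no smoothing, a context never seen in the corpus yields no suggestions, so backing off to bigram or unigram counts may be worth considering.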
