Question: In this questions, you will write the algorithms for a sample application using the Java Collection classes. This part deals with building a prototype for
In this questions, you will write the algorithms for a sample application using the Java Collection classes.
This part deals with building a prototype for the predictive text problem, which is not expected to be ecient, but it will be simple and allow you to compare it with the ecient implementation to be done in later parts. Write the rst two methods in a class named PredictivePrototype inside the package predictive.


Before the advent of touch screens, mobile telephones in English-speaking countries used a keypad like this one: 1 2 (abc) 3 (def) 4 (ghi) | 5 (jkl) | 6 (mno). 7 (pqrs) | 8 tuv) 9 (wxyz) space As you notice, there are keys for digits 1-9, used for dialing phone numbers. But these keys were also used to enter letters a-z. When a text message needed to be entered, the keys corresponding to the letters would be used. However, since there are multiple letters on each key, the required letter needed to be disambuated somehow. In the basic system without predictive text, the user must press the appropriate key a number of times for a particular letter to be shown. Consider the word "hello". With this method, the user must press 4, 4, 3, 3, 5, 5, 5, then pause, then 5, 5, 5, 6, 6, 6. To enter text more easily, the system of predictive text (also called "T9") was devised. The user presses each key only once and the mobile phone uses a dictionary to guess what word is being typed using a dictionary, and displays the possible matches. So the word "hello" can be typed in 5 button presses "13556" without pauses, instead of 12 in the standard system. The numeric string "43556" is referred to as a "signature" of the world "hello". If this is the only match, the user can press space and carry on. If there are multiple matches, the user might need to select one of them before proceeding. A given numeric-signature may correspond to more than one word. Predictive text technology is possible by restricting available words to those in a dictionary. Entering the numeric signature "4663" produces the words "gone" and "home" in many dictionaries. In this exercise, you will design and develop a predictive text system. For simplicity, assume that the user does not need punctuation or numerals. You must also limit your solutions to producing only lower-case words. The final version of your programs should use the dictionary found in /usr/share/dict/words on the School's file systems. However, during testing, it is better for you to create small dictionary files of your own for which you know what outputs to expect. All the classes in this worksheet should be placed in a package called predictive. Use the class/method names given in the question. 1. (5%) : Write a method wordToSignature with the type: public static String wordToSignature(String word) The method takes a word and returns a numeric signature. For example, "home" should return "4663". If the word has any non-alphabetic characters, replace them with a " " (space) in the resulting signature. Accumulate the result character-by-character. You should do this using the StringBuffer class rather than String. Explain, in your comments, why this will be more efficient. 2. (10%): Write another method signatureToWords with the type: public static Set signatureToWords (String signature) It takes the given numeric signature and returns a set of possible matching words from the dictionary. The returned list must not have duplicates and each word should be in lower-case. The method signatureToWords will need to use the dictionary to find words that match the string signature and return all the matching words. In this part of the exercise, you should not store the dictionary in your Java program. Explain in the comments why this implementation will be inefficient. 3. (10%): Create command-line programs (classes with main methods) as follows: Words2SigProto for calling the wordToSignature method, and Sigs2WordsProto for calling the signature ToWords method. Each program must accept a list of strings and call the appropriate method to do the conversion. Hints: Use the Scanner class to read the dictionary line by line, assuming there is only one word per line. When reading the dictionary, ignore lines with non-alphabetic characters. A useful helper method to accomplish this would be: static boolean isValidWord (String word) in PredictivePrototype, which checks if a given word is valid. . Words in the dictionary with upper case letters should be converted to lower-case because only lower-case letters should be returned by the signatureToWords method. . You should be able to complete this part of the Worksheet and test it in about one lab session. To create the command-line programs, you will need to use the args array of the method: public static void main(String[] args) which contains the command line input. For example, when executing sxs@cca112: $ java predictive. Words2SigProto Hello world! this is the input the args array will contain ["Hello", "World!", "this", "is", "the", "input"] . You should ignore any words with non-alphabetic characters given in the input of Sigs2WordsProto. Format the output of Sigs2WordsProto as one line per signature, as there may be more than one word for a given numeric signature. E.g. sxs@cca112: $ java predictive. Sigs2WordsProto 4663 329 4663 : good gone home hone hood hoof 329 : dax fax faz day fay daz the actual output you get will depend on the dictionary used. Notice that the package name predictive qualifies the class name, and this command works in the main directory. You can also use the -cp.. option to run the command from a different directory, e.g., sxsOcca112:"/predictive$ java -cp.. predictive. Sigs2WordsProto 4663 329 The program Words2SigProto can be tested by converting large amounts of text to signatures, the output can be used to test Sigs2WordsProto (and later, in timing comparisons). Try using news articles to start with. Before the advent of touch screens, mobile telephones in English-speaking countries used a keypad like this one: 1 2 (abc) 3 (def) 4 (ghi) | 5 (jkl) | 6 (mno). 7 (pqrs) | 8 tuv) 9 (wxyz) space As you notice, there are keys for digits 1-9, used for dialing phone numbers. But these keys were also used to enter letters a-z. When a text message needed to be entered, the keys corresponding to the letters would be used. However, since there are multiple letters on each key, the required letter needed to be disambuated somehow. In the basic system without predictive text, the user must press the appropriate key a number of times for a particular letter to be shown. Consider the word "hello". With this method, the user must press 4, 4, 3, 3, 5, 5, 5, then pause, then 5, 5, 5, 6, 6, 6. To enter text more easily, the system of predictive text (also called "T9") was devised. The user presses each key only once and the mobile phone uses a dictionary to guess what word is being typed using a dictionary, and displays the possible matches. So the word "hello" can be typed in 5 button presses "13556" without pauses, instead of 12 in the standard system. The numeric string "43556" is referred to as a "signature" of the world "hello". If this is the only match, the user can press space and carry on. If there are multiple matches, the user might need to select one of them before proceeding. A given numeric-signature may correspond to more than one word. Predictive text technology is possible by restricting available words to those in a dictionary. Entering the numeric signature "4663" produces the words "gone" and "home" in many dictionaries. In this exercise, you will design and develop a predictive text system. For simplicity, assume that the user does not need punctuation or numerals. You must also limit your solutions to producing only lower-case words. The final version of your programs should use the dictionary found in /usr/share/dict/words on the School's file systems. However, during testing, it is better for you to create small dictionary files of your own for which you know what outputs to expect. All the classes in this worksheet should be placed in a package called predictive. Use the class/method names given in the question. 1. (5%) : Write a method wordToSignature with the type: public static String wordToSignature(String word) The method takes a word and returns a numeric signature. For example, "home" should return "4663". If the word has any non-alphabetic characters, replace them with a " " (space) in the resulting signature. Accumulate the result character-by-character. You should do this using the StringBuffer class rather than String. Explain, in your comments, why this will be more efficient. 2. (10%): Write another method signatureToWords with the type: public static Set signatureToWords (String signature) It takes the given numeric signature and returns a set of possible matching words from the dictionary. The returned list must not have duplicates and each word should be in lower-case. The method signatureToWords will need to use the dictionary to find words that match the string signature and return all the matching words. In this part of the exercise, you should not store the dictionary in your Java program. Explain in the comments why this implementation will be inefficient. 3. (10%): Create command-line programs (classes with main methods) as follows: Words2SigProto for calling the wordToSignature method, and Sigs2WordsProto for calling the signature ToWords method. Each program must accept a list of strings and call the appropriate method to do the conversion. Hints: Use the Scanner class to read the dictionary line by line, assuming there is only one word per line. When reading the dictionary, ignore lines with non-alphabetic characters. A useful helper method to accomplish this would be: static boolean isValidWord (String word) in PredictivePrototype, which checks if a given word is valid. . Words in the dictionary with upper case letters should be converted to lower-case because only lower-case letters should be returned by the signatureToWords method. . You should be able to complete this part of the Worksheet and test it in about one lab session. To create the command-line programs, you will need to use the args array of the method: public static void main(String[] args) which contains the command line input. For example, when executing sxs@cca112: $ java predictive. Words2SigProto Hello world! this is the input the args array will contain ["Hello", "World!", "this", "is", "the", "input"] . You should ignore any words with non-alphabetic characters given in the input of Sigs2WordsProto. Format the output of Sigs2WordsProto as one line per signature, as there may be more than one word for a given numeric signature. E.g. sxs@cca112: $ java predictive. Sigs2WordsProto 4663 329 4663 : good gone home hone hood hoof 329 : dax fax faz day fay daz the actual output you get will depend on the dictionary used. Notice that the package name predictive qualifies the class name, and this command works in the main directory. You can also use the -cp.. option to run the command from a different directory, e.g., sxsOcca112:"/predictive$ java -cp.. predictive. Sigs2WordsProto 4663 329 The program Words2SigProto can be tested by converting large amounts of text to signatures, the output can be used to test Sigs2WordsProto (and later, in timing comparisons). Try using news articles to start with