Problems Goal Find the shortest common ancestor of a digraph in WordNet, a semantic lexicon for...
Fantastic news! We've Found the answer you've been seeking!
Question:
Transcribed Image Text:
Problems Goal Find the shortest common ancestor of a digraph in WordNet, a semantic lexicon for the English language that compu- tational linguists and cognitive scientists use extensively. For example, WordNet was a key component in IBM's Jeopardy- playing Watson computer system. WordNet groups words into sets of synonyms called synsets. For example, {AND circuit, AND gate} is a synset that represents a logical gate that fires only when all of its inputs fire. WordNet also describes semantic relationships between synsets. One such relationship is the is-a relationship, which connects a hyponym (more specific synset) to a hypernym (more general synset). For example, the synset {gate, logic gate} is a hypernym of {AND circuit, AND gate} because an AND gate is a kind of logic gate. The WordNet Digraph Your first task is to build the WordNet digraph: each vertex v is an integer that represents a synset, and each directed edge v→ w denotes that w is a hypernym of v. The WordNet digraph is a rooted DAG: it is acyclic and has one vertex - the root that is an ancestor of every other vertex. However, it is not necessarily a tree because a synset can have more than one hypernym. A small subgraph of the WordNet digraph is shown below. The WordNet Input File Formats We now describe the two data files that you will use to create the WordNet digraph. The files are in comma-separated values (CSV) format: each line contains a sequence of fields, separated by commas. List of synsets. The file synsets.txt contains all noun synsets in WordNet, one per line. Line i of the file (counting from 0) contains the information for synset i. The first field is the synset id, which is always the integer i; the second field is the synonym set (or synset); and the third field is its dictionary definition (or gloss), which is not relevant to this assignment. id % more synsets.txt ⠀ synset 34 (AIDS acquired_immune_deficiency_syndrome, a serious (often fatal) disease of the immune system 35, ALGOL, a programming language used to express computer programs as algorithms 36 AND_circuit AND_gate, a circuit in a computer that fires only when all of its inputs fire 37, APC, a drug combination found in some over-the-counter headache remedies 38, ASCII character, any member of the standard code for representing characters by binary numbers 39, ASCII character_set, (computer science) 128 characters that make up the ASCII coding scheme 40, ASCII_text_file, a text file that contains only ASCII characters without special formatting 41, ASL American_sign_language, the sign language used in the United States 42, AWOL (one who is away or absent without leave ⠀ gloss For example, line 36 implies that the synset AND_circuit AND_gate has an id number of 36 and it's gloss is "a circuit in a computer that fires only when all of its inputs fire". The individual nouns that constitute a synset are separated by spaces. If a noun contains more than one word, the words are connected by the underscore character. List of hypernyms. The file hypernyms.txt contains the hypernym relationships. Line i of the file contains the hypernyms of synset i. The first field is the synset id, which is always the integer i; subsequent fields are the id numbers of the synset's hypernyms. id % more hypernyms.txt ⠀ 34,47569,48084 35, 19983 36 42338 37,53717 38,28591 39,28597 40,76057 41,70206 42,18793 ⠀ hypernyms For example, line 36 implies that synset 36 (AND_circuit AND_Gate) has 42338 (gate logic_gate) as it only hypernym. Line 34 implies that synset 34 (AIDS acquired_immune_deficiency_syndrome) has two hypernyms: 47569 (immunodeficiency) and 48084 (infectious disease). damage harm impairment change alteration modification transition leap hump saltation happening occurrence occurrent natural event miracle increase jump leap event miracle forfeit forfeiture sacrifice resistance opposition demotion act human_action human activity action change motion movement move locomotion travel ↑ run running dash sprint transgression variation group_action descent jump parachuting Problem 1. (WordNet Data Type) Implement an immutable data type called WordNet with the following API: WordNet WordNet (String synsets, String hypernyms) Iterable<String> nouns () boolean is Noun (String word) String sca(String noun1, String noun2) int distance (String noun1, String noun2) constructs a WordNet object given the names of the input (synset and hypernym) files returns all WordNet nouns returns true if the given word is a WordNet noun, and false otherwise returns a synset that is a shortest common ancestor of nount and noun2 returns the length of the shortest ancestral path between nount and noun2 Corner Cases • The constructor should throw a NullPointerException() with the message "synsets is null" if synsets is null and the message "hypernyms is null" if hypernyms is null. • The isNoun () method should throw a NullPointerException ("word is null") if word is null. • The sca() and distance () methods should throw a NullPointerException() with the message "nounl is null" or "noun2 is null" if noun or noun2 is null. The methods should throw an Illegal ArgumentException () with the message "nounl is not a noun" or "noun2 is not a noun" if noun1 or noun2 is not a noun. Performance Requirements • The constructor and the nouns () method should run in time T(n) ~n, where n is the size of the WordNet lexicon. • The isNoun () method should run in time T(n) ~ 1. • The sca() and distance () methods should make exactly one call to the ancestor () and length() methods in Shortest Common Ancestor, respectively. >_"/workspace/project6 $ java WordNet data/synsets.txt data/hypernyms.txt worm bird # of nouns 119188 isNoun (worm)? true isNoun (bird)? true isNoun (worm bird)? false. sca (worm, bird) = animal animate_being beast brute creature fauna distance (worm, bird) = 5 Directions: Instance variables: A symbol table that maps a synset noun to a set of synset IDs (a synset noun can belong to multiple synsets), RedBlackBST<String, SET<Integer >> st. - A symbol table that maps a synset ID to the corresponding synset string, RedBlackBST<Integer, String> rst. - For shortest common ancestor computations, ShortestCommon Ancestor sca. WordNet (String synsets, String hypernyms) Initialize instance variables st and rst appropriately using the synset file. - Construct a DiGraph object & (representing a rooted DAG) with V vertices (equal to the number of entries in the synset file), and add edges to it, read in from the hypernyms file. - Initialize sca using G. • Iterable<String> nouns () - Return all WordNet nouns. boolean isNoun (String word) - Return true if the given word is a synset noun, and false otherwise. ●String sca(String noun1, String noun2) - Use sca to compute and return a synset that is a shortest common ancestor of the given nouns. • int distance (String noun1, String noun2) – Use sca to compute and return the length of the shortest ancestral path between the given nouns. Shortest Common Ancestor An ancestral path between two vertices v and w in a rooted DAG is a directed path from v to a common ancestor x, together with a directed path from w to the same ancestora. A shortest ancestral path is an ancestral path of minimum total length. We refer to the common ancestor in a shortest ancestral path as a shortest common ancestor. Note that a shortest common ancestor always exists because the root is an ancestor of every vertex. Note also that an ancestral path is a path, but not a directed path. v = 3, w = 10 shortest ancestral path: 3-1-5-9-10 associated length: 4 shortest common ancestor: 1 13 8 (14) 10 8 We generalize the notion of shortest common ancestor to subsets of vertices. A shortest ancestral path of two subsets of vertices A and B is a shortest ancestral path over all pairs of vertices v and w, with v in A and w in B. 16 22 A = { 13, 23, 24), B = {6, 16, 17 } ancestral path: 13-7-3-1-0-2-6 ancestral path: 23-20-12-5-10-17 ancestral path: 23-20-12-5-2-6 (17 18 v = 1, w = 5 ancestral path: 1-2-3-4-5 shortest ancestral path: 1-0-5 associated length: 2 shortest common ancestor: 0 19 0 20 (23) (24 shortest ancestral path: 13-7-3-9-16 associated length: 4 shortest common ancestor: 3 Problems Goal Find the shortest common ancestor of a digraph in WordNet, a semantic lexicon for the English language that compu- tational linguists and cognitive scientists use extensively. For example, WordNet was a key component in IBM's Jeopardy- playing Watson computer system. WordNet groups words into sets of synonyms called synsets. For example, {AND circuit, AND gate} is a synset that represents a logical gate that fires only when all of its inputs fire. WordNet also describes semantic relationships between synsets. One such relationship is the is-a relationship, which connects a hyponym (more specific synset) to a hypernym (more general synset). For example, the synset {gate, logic gate} is a hypernym of {AND circuit, AND gate} because an AND gate is a kind of logic gate. The WordNet Digraph Your first task is to build the WordNet digraph: each vertex v is an integer that represents a synset, and each directed edge v→ w denotes that w is a hypernym of v. The WordNet digraph is a rooted DAG: it is acyclic and has one vertex - the root that is an ancestor of every other vertex. However, it is not necessarily a tree because a synset can have more than one hypernym. A small subgraph of the WordNet digraph is shown below. The WordNet Input File Formats We now describe the two data files that you will use to create the WordNet digraph. The files are in comma-separated values (CSV) format: each line contains a sequence of fields, separated by commas. List of synsets. The file synsets.txt contains all noun synsets in WordNet, one per line. Line i of the file (counting from 0) contains the information for synset i. The first field is the synset id, which is always the integer i; the second field is the synonym set (or synset); and the third field is its dictionary definition (or gloss), which is not relevant to this assignment. id % more synsets.txt ⠀ synset 34 (AIDS acquired_immune_deficiency_syndrome, a serious (often fatal) disease of the immune system 35, ALGOL, a programming language used to express computer programs as algorithms 36 AND_circuit AND_gate, a circuit in a computer that fires only when all of its inputs fire 37, APC, a drug combination found in some over-the-counter headache remedies 38, ASCII character, any member of the standard code for representing characters by binary numbers 39, ASCII character_set, (computer science) 128 characters that make up the ASCII coding scheme 40, ASCII_text_file, a text file that contains only ASCII characters without special formatting 41, ASL American_sign_language, the sign language used in the United States 42, AWOL (one who is away or absent without leave ⠀ gloss For example, line 36 implies that the synset AND_circuit AND_gate has an id number of 36 and it's gloss is "a circuit in a computer that fires only when all of its inputs fire". The individual nouns that constitute a synset are separated by spaces. If a noun contains more than one word, the words are connected by the underscore character. List of hypernyms. The file hypernyms.txt contains the hypernym relationships. Line i of the file contains the hypernyms of synset i. The first field is the synset id, which is always the integer i; subsequent fields are the id numbers of the synset's hypernyms. id % more hypernyms.txt ⠀ 34,47569,48084 35, 19983 36 42338 37,53717 38,28591 39,28597 40,76057 41,70206 42,18793 ⠀ hypernyms For example, line 36 implies that synset 36 (AND_circuit AND_Gate) has 42338 (gate logic_gate) as it only hypernym. Line 34 implies that synset 34 (AIDS acquired_immune_deficiency_syndrome) has two hypernyms: 47569 (immunodeficiency) and 48084 (infectious disease). damage harm impairment change alteration modification transition leap hump saltation happening occurrence occurrent natural event miracle increase jump leap event miracle forfeit forfeiture sacrifice resistance opposition demotion act human_action human activity action change motion movement move locomotion travel ↑ run running dash sprint transgression variation group_action descent jump parachuting Problem 1. (WordNet Data Type) Implement an immutable data type called WordNet with the following API: WordNet WordNet (String synsets, String hypernyms) Iterable<String> nouns () boolean is Noun (String word) String sca(String noun1, String noun2) int distance (String noun1, String noun2) constructs a WordNet object given the names of the input (synset and hypernym) files returns all WordNet nouns returns true if the given word is a WordNet noun, and false otherwise returns a synset that is a shortest common ancestor of nount and noun2 returns the length of the shortest ancestral path between nount and noun2 Corner Cases • The constructor should throw a NullPointerException() with the message "synsets is null" if synsets is null and the message "hypernyms is null" if hypernyms is null. • The isNoun () method should throw a NullPointerException ("word is null") if word is null. • The sca() and distance () methods should throw a NullPointerException() with the message "nounl is null" or "noun2 is null" if noun or noun2 is null. The methods should throw an Illegal ArgumentException () with the message "nounl is not a noun" or "noun2 is not a noun" if noun1 or noun2 is not a noun. Performance Requirements • The constructor and the nouns () method should run in time T(n) ~n, where n is the size of the WordNet lexicon. • The isNoun () method should run in time T(n) ~ 1. • The sca() and distance () methods should make exactly one call to the ancestor () and length() methods in Shortest Common Ancestor, respectively. >_"/workspace/project6 $ java WordNet data/synsets.txt data/hypernyms.txt worm bird # of nouns 119188 isNoun (worm)? true isNoun (bird)? true isNoun (worm bird)? false. sca (worm, bird) = animal animate_being beast brute creature fauna distance (worm, bird) = 5 Directions: Instance variables: A symbol table that maps a synset noun to a set of synset IDs (a synset noun can belong to multiple synsets), RedBlackBST<String, SET<Integer >> st. - A symbol table that maps a synset ID to the corresponding synset string, RedBlackBST<Integer, String> rst. - For shortest common ancestor computations, ShortestCommon Ancestor sca. WordNet (String synsets, String hypernyms) Initialize instance variables st and rst appropriately using the synset file. - Construct a DiGraph object & (representing a rooted DAG) with V vertices (equal to the number of entries in the synset file), and add edges to it, read in from the hypernyms file. - Initialize sca using G. • Iterable<String> nouns () - Return all WordNet nouns. boolean isNoun (String word) - Return true if the given word is a synset noun, and false otherwise. ●String sca(String noun1, String noun2) - Use sca to compute and return a synset that is a shortest common ancestor of the given nouns. • int distance (String noun1, String noun2) – Use sca to compute and return the length of the shortest ancestral path between the given nouns. Shortest Common Ancestor An ancestral path between two vertices v and w in a rooted DAG is a directed path from v to a common ancestor x, together with a directed path from w to the same ancestora. A shortest ancestral path is an ancestral path of minimum total length. We refer to the common ancestor in a shortest ancestral path as a shortest common ancestor. Note that a shortest common ancestor always exists because the root is an ancestor of every vertex. Note also that an ancestral path is a path, but not a directed path. v = 3, w = 10 shortest ancestral path: 3-1-5-9-10 associated length: 4 shortest common ancestor: 1 13 8 (14) 10 8 We generalize the notion of shortest common ancestor to subsets of vertices. A shortest ancestral path of two subsets of vertices A and B is a shortest ancestral path over all pairs of vertices v and w, with v in A and w in B. 16 22 A = { 13, 23, 24), B = {6, 16, 17 } ancestral path: 13-7-3-1-0-2-6 ancestral path: 23-20-12-5-10-17 ancestral path: 23-20-12-5-2-6 (17 18 v = 1, w = 5 ancestral path: 1-2-3-4-5 shortest ancestral path: 1-0-5 associated length: 2 shortest common ancestor: 0 19 0 20 (23) (24 shortest ancestral path: 13-7-3-9-16 associated length: 4 shortest common ancestor: 3
Expert Answer:
Answer rating: 100% (QA)
WordNetjava import javaioFile import javautilArrayList import javautilHashSet import javautilList import javautilSet public class WordNet File synsetsFile File hypernymsFile Digraph hypernyms List syn... View the full answer
Related Book For
Principles of Information Systems
ISBN: 978-0324665284
9th edition
Authors: Ralph M. Stair, George W. Reynolds
Posted Date:
Students also viewed these programming questions
-
answer the question clearly You are building a flight-control system for which a convincing safety case must be made. Would you assign the tasks of safety requirements engineering, test case...
-
Monroe Inc. is an all-equity firm with 500,000 shares outstanding. It has $2,000,000 of EBIT, and EBIT is expected to remain constant in the future. The company pays out all of its earnings, so...
-
A first-year co-op student working for Insidz Corp. recorded the company's transactions for the month. He was a little unsure about the recording process, but he did the best he could. He had a few...
-
A square has perimeter 160 in. What would be the perimeter of an equilateral triangle whose sides each measure the same length as the side of the square?
-
On April 23, 2014, Calvin Loyer admitted his wife, Edeltrud Loyer, to a nursing home administered by Signature Healthcare. During the admissions process, Calvin signed an arbitration agreement...
-
Radmore Memorial Hospital has a problem in its fluids analysis lab. The lab has available three machines that analyze various fluid samples. Recently, the demand for analyzing blood samples has...
-
Explain why if assets are valuable resources and asset accounts have debit balances why do expense accounts also have debit balances. What is meant by the normal balance?
-
3. (a) Suppose the owner of a house has been confined to a wheelchair and so changes are needed to the house so that both the owner and the other residents can live there. Various possible changes...
-
what are some examples of when law enforcement can search a person, place or thing?
-
How do globalization and transnationalism impact the preservation and adaptation of ethnic cultures and traditions in an increasingly interconnected world?
-
The annexation of Crimea by Russia, how this conflict impacted on international order? Please explain in full details.
-
Would a law that allows juveniles to be treated as adults be constitutional if it disproportionately affect one particular group of people?
-
what extent do political factors influence the construction and maintenance of ethnic identities, particularly in regions marked by conflict or colonial legacies?
-
What is your opinion? John Martin, director of manufacturing, is not happy with his colleague, Pete Change, who is director of safety and health at Robbins Engineering Corporation. Change has ordered...
-
Is it a breach of fiduciary duty for a director of a real estate investment trust (REIT) negotiating a joint venture on behalf of the REIT with another director for the development of a portfolio of...
-
You have been hired to perform systems investigation for a French restaurant owner in a large metropolitan area. She is thinking of opening a new restaurant with a state-of-theart computer system...
-
What is the hierarchy of data in a database?
-
Assume that you want to start a new video rental business for students at your college or university. Go through logical design for a new information system to help you keep track of the videos in...
-
In the section of his 2007 letter to the shareholders of Berkshire Hathaway titled Fanciful FiguresHow Public Companies Juice Earnings, Warren Buffett referred to the investment return assumption...
-
Based on 2012 revenues, the six largest providers of oilfield services are: 1. Schlumberger Ltd. (NYSE: SLB) Revenues: $42.1 billion Net income: $5.5 billion 2. Halliburton (NYSE: HAL) Revenues:...
-
On 21 September 2000, Intel Corporation (NASDAQ -GS: INTC)3 issued a press release containing information about its expected revenue growth for the third quarter of 2000. The announced growth fell...
Study smarter with the SolutionInn App