You will construct a simple tokenizer class, URLLexer (in URLLexer.java), which takes an array of regular expression strings (one per token category, in the order of the token constants PROTOCOL through QUERY) and a string to tokenize. The class will implement the following methods:
URLLexer() (the constructor) sets up the tokenizer from the regular expressions.
reset(String) resets the tokenizer to the beginning of the given string and initializes any other state you need to track, such as the current position in the input and the matching index of the last token.
nextToken() returns the next token, or null if there are no more tokens or if it encounters text that none of the regular expressions can tokenize.
getMatchingIndex() returns the index into the array of regular expression strings of the pattern that matched the token most recently returned by nextToken().
getPosition() returns the current position in the input, i.e. where the next token will be extracted.
main(...) runs the driver program described below.
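One way these methods could fit together, sketched under assumptions: the hypothetical class SimpleLexer below takes the regex array as a constructor argument (the skeleton's URLLexer() instead reads its static REGULAR_EXPRESSION array), and when several patterns could match, the first one in array order wins. Calling Matcher.region(...) and then lookingAt() anchors each attempt at the current position, so text that no pattern recognizes makes nextToken() return null instead of being silently skipped, as find() would do.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the tokenizer; not the assignment's required URLLexer signature. */
class SimpleLexer {
    private final Pattern[] patterns;  // one compiled pattern per token category
    private String input = "";
    private int position = 0;          // where the next token starts
    private int matchingIndex = -1;    // category of the token most recently returned

    SimpleLexer(String[] regexes) {
        patterns = new Pattern[regexes.length];
        for (int i = 0; i < regexes.length; i++) {
            patterns[i] = Pattern.compile(regexes[i]);
        }
    }

    void reset(String input) {
        this.input = input;
        position = 0;
        matchingIndex = -1;
    }

    int getMatchingIndex() { return matchingIndex; }

    int getPosition() { return position; }

    /** Returns the next token, or null at end of input or on text no pattern matches. */
    String nextToken() {
        if (position >= input.length()) return null;
        for (int i = 0; i < patterns.length; i++) {
            // region + lookingAt: the match must start exactly at the current position.
            Matcher m = patterns[i].matcher(input).region(position, input.length());
            // m.end() > position rejects empty matches, which would never advance.
            if (m.lookingAt() && m.end() > position) {
                matchingIndex = i;
                position = m.end();  // end() is an absolute offset into input
                return m.group();
            }
        }
        return null;  // leftover text that no pattern recognizes
    }
}
```

Because the first matching pattern wins, array order matters: with the numerical-address pattern placed before the non-numerical one, a dotted quad is classified as numerical even if the non-numerical pattern would also accept it.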
Your main(...) function will work as follows. It repeatedly requests a URL by printing "URL: ". Once the user has provided a URL, trim it of whitespace, then tokenize it, printing each token along with its token type. If a token type appears twice, FAIL. Also FAIL if the tokenizer cannot recognize any further tokens while characters remain. If you finish tokenizing a URL, pass the tokens to the fetch(...) function provided below. Whenever a failure occurs, report it, then loop again to request another URL.
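The driver loop could be sketched as below. Everything here is illustrative: UrlDriverSketch, tokenize and compileAll are hypothetical names, the REGEX entries are simple placeholders rather than a worked answer, and stopping at end of input (readLine() returning null) is an added assumption, since the assignment only says to loop again. Note that fetch(...) takes query before fragment, whereas the skeleton's constants put FRAGMENT (5) before QUERY (6), so those two slots must be swapped when calling it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlDriverSketch {
    static final String[] NAME = {"protocol", "numerical address", "non-numerical address",
                                  "port", "file", "fragment", "query"};

    // Placeholder patterns (hypothetical, for illustration only), in the same category order.
    static final String[] REGEX = {
        "[a-z]+://", "[0-9]+(?:\\.[0-9]+){3}", "[A-Za-z0-9.-]+", ":[0-9]+",
        "/[^?#]*", "#.*", "\\?[^#]*"
    };

    static Pattern[] compileAll(String[] regexes) {
        Pattern[] patterns = new Pattern[regexes.length];
        for (int i = 0; i < regexes.length; i++) patterns[i] = Pattern.compile(regexes[i]);
        return patterns;
    }

    /**
     * Tokenizes url, printing each token with its type. Returns one slot per category
     * (null where a category never appeared), or null after printing FAIL when a
     * category repeats or leftover text cannot be tokenized.
     */
    static String[] tokenize(String url, Pattern[] patterns) {
        String[] slots = new String[patterns.length];
        int pos = 0;
        while (pos < url.length()) {
            Matcher found = null;
            int hit = -1;
            for (int i = 0; i < patterns.length; i++) {
                Matcher m = patterns[i].matcher(url).region(pos, url.length());
                if (m.lookingAt() && m.end() > pos) { found = m; hit = i; break; }
            }
            if (found == null) {
                System.out.println("FAIL: cannot tokenize \"" + url.substring(pos) + "\"");
                return null;
            }
            if (slots[hit] != null) {
                System.out.println("FAIL: duplicate " + NAME[hit]);
                return null;
            }
            slots[hit] = found.group();
            System.out.println(NAME[hit] + ": " + slots[hit]);
            pos = found.end();
        }
        return slots;
    }

    public static void main(String[] args) throws IOException {
        Pattern[] patterns = compileAll(REGEX);
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        while (true) {
            System.out.print("URL: ");
            System.out.flush();
            String line = in.readLine();
            if (line == null) break;  // assumption: stop at end of input
            String[] t = tokenize(line.trim(), patterns);
            if (t == null) continue;  // failure already reported; ask for another URL
            // In URLLexer this would be (note query before fragment):
            // fetch(t[PROTOCOL], t[NUMERICAL_ADDRESS], t[NON_NUMERICAL_ADDRESS], t[PORT],
            //       t[FILE], t[QUERY], t[FRAGMENT]);
            System.out.println("Tokenized OK; URLLexer would now call fetch(...).");
        }
    }
}
```

Keeping the tokens in an array indexed by category makes the duplicate check a single null test per token.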
import java.util.regex.*;
import java.io.*;
import java.util.*;
import java.net.*;

public class URLLexer {
    // These are the 7 tokens in our simplified URL definition
    public static final int PROTOCOL = 0;
    public static final int NUMERICAL_ADDRESS = 1;
    public static final int NON_NUMERICAL_ADDRESS = 2;
    public static final int PORT = 3;
    public static final int FILE = 4;
    public static final int FRAGMENT = 5;
    public static final int QUERY = 6;

    // Here you place regular expressions, one per token. Each is a string.
    public static final String[] REGULAR_EXPRESSION = new String[] {
        "Not Defined Yet",                               // protocol
        "Not Defined Yet. This one will be very long.",  // numerical address
        "Not Defined Yet",                               // non-numerical address
        "Not Defined Yet",                               // port
        "Not Defined Yet",                               // file
        "Not Defined Yet",                               // fragment
        "Not Defined Yet",                               // query
    };

    // This is an array of names for each of the tokens, which might be convenient for you to
    // use to print out stuff.
    public static final String[] NAME = new String[] {
        "protocol",
        "numerical address",
        "non-numerical address",
        "port",
        "file",
        "fragment",
        "query"
    };

    /** Creates a Blank URLLexer set up to do pattern-matching on the given regular expressions. */
    public URLLexer() {
        // IMPLEMENT ME (ABOUT 5 LINES)
    }

    /** Resets the URLLexer to a new string as input. */
    public void reset(String input) {
        // IMPLEMENT ME (ABOUT 3 LINES)
    }

    public int getMatchingIndex() {
        // IMPLEMENT ME (ABOUT 1 LINE)
    }

    public int getPosition() {
        // IMPLEMENT ME (ABOUT 1 LINE)
    }

    public String nextToken() {
        // IMPLEMENT ME (ABOUT 10 LINES)
    }

    public static void main(String[] args) throws IOException {
        // IMPLEMENT ME.
        //
        // You will repeatedly request a URL by printing "URL: ". Once the user has provided
        // a URL, you will trim it of whitespace, then tokenize it. As you tokenize it you
        // will print out the tokens one by one, including their token types. If you find a
        // duplicate token type, you will FAIL. You will also FAIL if the tokenizer cannot
        // recognize any further tokens but you still have characters left to tokenize. If
        // you manage to finish tokenizing a URL, you will pass the tokens to the fetch(...)
        // function provided below. Whenever a failure occurs, you will indicate it, then
        // loop again to request another URL.
    }

    // perhaps this function might come in use.
    // It takes various tokenized values, checks them for validity, then fetches the data
    // from a URL formed by them and prints it to the screen.
    public static void fetch(String protocol, String numericalAddress, String nonNumericalAddress,
                             String port, String file, String query, String fragment) {
        String address = numericalAddress;
        int iport = 80;

        // verify the URL
        if (protocol == null || !protocol.equals("http://")) {
            System.out.println("ERROR. I don't know how to use protocol " + protocol);
        } else if (query != null) {
            System.out.println("ERROR. I'm not smart enough to issue queries, like " + query);
        } else if (numericalAddress == null && nonNumericalAddress == null) {
            System.out.println("ERROR. No address was provided.");
        } else if (numericalAddress != null && nonNumericalAddress != null) {
            System.out.println("ERROR. Both types of addresses were provided.");
        } else {
            if (address == null) {
                address = nonNumericalAddress;
            }
            if (fragment != null) {
                System.out.println("NOTE. Fragment provided: I will not use it.");
            }
            if (port != null) {
                iport = Integer.parseInt(port.substring(1));  // strip off the ":"
            } else {
                System.out.println("NOTE. No port provided, defaulting to port 80.");
            }
            if (file == null) {
                System.out.println("NOTE. No file was provided. Assuming it's just /");
                file = "/";
            }
            System.out.println("Downloading ADDRESS: " + address + " PORT: " + iport + " FILE: " + file);
            System.out.println(" =======================================");
            java.io.InputStream stream = null;
            try {
                java.net.URL url = new java.net.URL("http", address, iport, file);
                java.net.URLConnection connection = url.openConnection();
                connection.connect();
                stream = connection.getInputStream();
                final int BUFLEN = 1024;
                byte[] buffer = new byte[BUFLEN];
                while (true) {
                    int len = stream.read(buffer, 0, BUFLEN);
                    if (len <= 0) break;
                    System.out.write(buffer, 0, len);
                }
            } catch (java.io.IOException e) {
                System.out.println("Error fetching data.");
            }
            try {
                if (stream != null) stream.close();
            } catch (java.io.IOException e) { }
            System.out.println(" =======================================");
        }
    }
}
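One possible set of patterns for the REGULAR_EXPRESSION array, offered as an unverified sketch under the simplified grammar that fetch(...) implies: the protocol token includes "://" (fetch compares it to "http://"), the port token keeps its leading ":" (fetch strips it with substring(1)), a file starts with "/", a fragment with "#", and a query with "?". The numerical address is assumed to be an IPv4 dotted quad with octets 0 to 255, and its negative lookahead refuses to stop partway through a longer run such as "10.0.0.1234", which then falls through to the non-numerical pattern. The class name UrlRegexSketch and the helper firstFullMatch are hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical example patterns, in the skeleton's category order. */
class UrlRegexSketch {
    // One octet: 250-255, 200-249, 100-199, or 0-99.
    static final String OCTET = "(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])";

    static final String[] REGULAR_EXPRESSION = {
        "[A-Za-z][A-Za-z0-9+.-]*://",                         // protocol, e.g. "http://"
        OCTET + "(?:\\." + OCTET + "){3}(?![0-9A-Za-z.-])",   // numerical address
        "[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)*",                 // non-numerical address
        ":[0-9]+",                                            // port, keeps the ':'
        "/[^?#]*",                                            // file
        "#.*",                                                // fragment
        "\\?[^#]*",                                           // query
    };

    /** Index of the first pattern (in array order) that matches all of s, or -1. */
    static int firstFullMatch(String s) {
        for (int i = 0; i < REGULAR_EXPRESSION.length; i++) {
            if (Pattern.matches(REGULAR_EXPRESSION[i], s)) return i;
        }
        return -1;
    }
}
```

Because the non-numerical pattern also accepts digits and dots, a string such as "256.1.1.1" is classified as a non-numerical address rather than rejected; whether to tighten that is a design choice.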
