Question: C LANGUAGE: Need help writing common.h , tokenizer.c , & recognizer.c so they are executable. Need to create two programs: a tokenizer and a recognizer.
Tokenizer will read an input file line by line and convert the textual input into an ordered collection of tokens and lexemes. Recognizer will use recursive descent to parse the output of the tokenizer and determine whether the given tokens form a valid program.

common.h: The purpose of common.h is to define all includes/imports, constants, globals, enums, and structs/objects that are shared between Tokenizer and Recognizer. common.h will be stored in the same directory as both Tokenizer and Recognizer. Though Tokenizer and Recognizer are distinct programs that compile independently of each other, it is likely they'll share some commonality.

Tokenizer will read two command-line arguments when run. The first command-line argument is the filepath of the input file and the second is the filepath of the output file. Your program must take in these two command-line arguments; no other methods for acquiring the filepaths are allowed. When run, Tokenizer will read all characters from the input file, convert these characters into lexemes, then associate each lexeme with a token class. My recommendation is to construct lexemes character by character: iterate over every individual character in the input file and determine whether the current character is part of an alphanumeric lexeme, whitespace, or part of a symbol lexeme. The type of character you identify determines the next action your tokenizer takes. After generating a lexeme, it must be associated with a token class. You can do this association as each lexeme is generated, or you can generate all lexemes first and then associate each with its token class. While either approach is valid, I recommend the latter: breaking lexeme generation apart from token association simplifies the overall program structure.
Two token classes in the provided lexical structure are defined via regular expressions: any string that matches the specified regex is part of that token class. The exception is strings explicitly defined in the lexical structure as reserved words. For example, "return" would be an IDENTIFIER token, as it matches the regex provided for IDENTIFIER in the given lexical structure, but "return" is explicitly defined as a RETURNKEYWORD token earlier in the lexical structure. To avoid any confusion, you ought to compare lexemes against the token classes in the order defined in the lexical structure, i.e. check whether the generated lexeme is a reserved word before checking whether it's an IDENTIFIER token.

Recognizer will read two command-line arguments when run. The first command-line argument is the filepath of the input file and the second is the filepath of the output file. The program must take in these two command-line arguments; no other methods for acquiring the filepaths will be accepted. When run, Recognizer will read a list of tokens and their associated lexemes from the input file; the output file from Tokenizer will be used as the input file for Recognizer. It will determine whether the ordered set of tokens from the input file is legal in the language defined by the given EBNF grammar. The purpose of our recognizer is to apply the given grammar rules and report any syntax errors. To accomplish this, Recognizer will implement a recursive descent parser. The implemented parser must be a recursive descent predictive parser that utilizes single-symbol lookahead, consuming each token one at a time. Parsers that utilize multi-symbol lookahead will not be accepted. An input is syntactically invalid if a token or nonterminal was required by the current EBNF grammar rule but was not present. If a syntax error is found, parsing should halt and the program should report the error by printing an error message to the output file.
If a token was expected but not present, the error must specify which grammar rule had the error, which number token was being examined, the expected token, and the actual token. Example format as follows: Error: In grammar rule body, expected token # to be RIGHTBRACKET but was IDENTIFIER. Given a grammatically valid input, every given token must be parsed. If the top-level grammar rule function is invoked and concludes without error, this indicates that the ordered set of input tokens was syntactically valid; however, the given set of tokens is only valid if all tokens have been consumed. That is, if the first tokens form a syntactically valid input but the input file contained additional tokens, this is a syntax error. It must be identified and reported with the following: Error: Only consumed # of the # given tokens. If all input tokens are consumed and no syntax errors are reported, Recognizer will output "PARSED!!!". Recognizers that don't consume every given token for a grammatically valid input will not be accepted.
