Question: Write a C or C++ program!

Write a tokenizer. The tokenizer should input a stream of ASCII characters and output the token, its line number in the stream, its type, and its value. You must write the code from scratch without the use of any lexicographical parsing libraries or utilities.

The tokenizer should recognize the following:

1. A few keywords: "if", "else", "for", "while"

2. A few single-character symbols: '&', '|', '+', '*', ':', ';'

3. Labels (alpha-numerical strings)

4. Integers (including negative integers)

5. Floating-point numbers in radix notation (such as pi, i.e. 3.14159265)

6. Floating-point numbers in exponential notation (such as Avogadro's number, i.e. 6.022140857E23)
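One way to read the six categories above is as a classification of an already-isolated lexeme. The following is only a sketch under stated assumptions, not a reference solution: the names `TokType`, `classify`, and `count_digits` are hypothetical, labels are assumed to start with a letter, a leading '-' is assumed to belong to the number, and both `E` and `e` are accepted as the exponent marker.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token-type names; the assignment does not prescribe them. */
typedef enum { T_ERROR, T_KEYWORD, T_SYMBOL, T_LABEL,
               T_INTEGER, T_FLOAT, T_FLOAT_EXP } TokType;

/* Count the decimal digits at the start of s. */
static size_t count_digits(const char *s) {
    size_t n = 0;
    while (isdigit((unsigned char)s[n])) n++;
    return n;
}

/* Classify one complete lexeme (no surrounding whitespace). */
TokType classify(const char *s) {
    static const char *keywords[] = {"if", "else", "for", "while"};
    size_t i, n;

    if (s[0] == '\0') return T_ERROR;
    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(s, keywords[i]) == 0) return T_KEYWORD;
    if (s[1] == '\0' && strchr("&|+*:;", s[0])) return T_SYMBOL;

    /* Assumed number syntax: [-] digits [. digits] [E [+|-] digits] */
    i = (s[0] == '-') ? 1 : 0;
    n = count_digits(s + i);
    if (n > 0) {
        TokType t = T_INTEGER;
        i += n;
        if (s[i] == '.') {                  /* fractional part */
            n = count_digits(s + i + 1);
            if (n == 0) return T_ERROR;     /* "3." is rejected here */
            i += 1 + n;
            t = T_FLOAT;
        }
        if (s[i] == 'E' || s[i] == 'e') {   /* exponent part */
            i++;
            if (s[i] == '+' || s[i] == '-') i++;
            n = count_digits(s + i);
            if (n == 0) return T_ERROR;     /* "1E" is rejected here */
            i += n;
            t = T_FLOAT_EXP;
        }
        return s[i] == '\0' ? t : T_ERROR;
    }

    /* Assumed label syntax: a letter followed by letters or digits. */
    if (!isalpha((unsigned char)s[0])) return T_ERROR;
    for (i = 1; s[i] != '\0'; i++)
        if (!isalnum((unsigned char)s[i])) return T_ERROR;
    return T_LABEL;
}
```

Keywords are checked before labels because every keyword would otherwise also match the label rule.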

Use at most 100 lines of code and include an I/O example. The input has to be read from a file!
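Reading from a file rather than from redirected standard input could look roughly like this. It is a minimal sketch that assumes the file name is passed as the first command-line argument; the helper name `open_source` and the usage text are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: open the file named by the first command-line
   argument for reading. Returns NULL after reporting the problem. */
FILE *open_source(int argc, char **argv) {
    FILE *in;
    if (argc < 2) {
        fprintf(stderr, "usage: %s <input-file>\n",
                argc > 0 ? argv[0] : "tokenizer");
        return NULL;
    }
    in = fopen(argv[1], "r");
    if (in == NULL) perror(argv[1]);   /* e.g. file does not exist */
    return in;
}
```

A `main(int argc, char **argv)` would call `open_source(argc, argv)`, read from the returned stream with `getc`, and `fclose` it when done.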

Example code is below:

#include <stdio.h>
#include <string.h>

// ---------------------------------------------------------------------
// this trivial program reads a stream and outputs tokens
// valid tokens:
//   0: error
//   1: whitespace (blank, tab, CR, LF)
//   2: word (lowercase only)
//   3: number (unsigned decimal integer)
// ---------------------------------------------------------------------
#define STATE_ERROR      0
#define STATE_WHITESPACE 1
#define STATE_WORD       2
#define STATE_NUMBER     3

char szWhiteSpace[]=" \t\r\n";
char szWord[]="abcdefghijklmnopqrstuvwxyz";
char szNumber[]="0123456789";
const char *szStates[]={"ERROR", "WHITESPACE", "WORD", "NUMBER"};

int main(void){
  int c;  // int rather than char, so EOF can be told apart from data
  char szToken[256];
  int nTokenSize=0;
  int nChars=0, nTokens=0, nLines=1;
  int nCurState=STATE_ERROR, nNextState=STATE_ERROR;

  while((c=getc(stdin))) {
    if(c==EOF) break;
    nChars++;

    // the character class decides the next state
    if(strchr(szWord, c)) nNextState=STATE_WORD;
    else if(strchr(szNumber, c)) nNextState=STATE_NUMBER;
    else if(strchr(szWhiteSpace, c)) nNextState=STATE_WHITESPACE;
    else nNextState=STATE_ERROR;

    if(nChars==1) nCurState=nNextState;

    // uncomment the following line to debug
    // printf("line %4d, char %4d: %c [%02X] (%d -> %d)\n", nLines, nChars, c, c, nCurState, nNextState);

    // same state: keep collecting characters of the current token
    if(nNextState==nCurState) {
      if(nTokenSize<(int)sizeof(szToken)-1) szToken[nTokenSize++]=c;  // drop overlong input
      if(c=='\n') nLines++;
      continue;
    }

    // state change: terminate the token and print it if it is a word or number
    szToken[nTokenSize]=0;
    if((nCurState==STATE_WORD)||(nCurState==STATE_NUMBER))
      printf("token %2d, line %2d: %10s (%s)\n", ++nTokens, nLines, szToken, szStates[nCurState]);

    // start the next token with the current character
    nTokenSize=0;
    szToken[nTokenSize++]=c;
    nCurState=nNextState;
    if(c=='\n') nLines++;
  }

  return(nTokens);
}
// ---------------------------------------------------------------------
// ---------------------------------------------------------------------

------------------------------------------------------------------------
SAMPLE input:
------------------------------------------------------------------------
one
two
three

1 11 123 13456

one 1
two 2
------------------------------------------------------------------------
corresponding output: (./cs305_lex < in.txt)
------------------------------------------------------------------------
token  1, line  1:        one (WORD)
token  2, line  2:        two (WORD)
token  3, line  3:      three (WORD)
token  4, line  5:          1 (NUMBER)
token  5, line  5:         11 (NUMBER)
token  6, line  5:        123 (NUMBER)
token  7, line  5:      13456 (NUMBER)
token  8, line  7:        one (WORD)
token  9, line  7:          1 (NUMBER)
token 10, line  8:        two (WORD)
token 11, line  8:          2 (NUMBER)
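The example only prints a token when the state changes, so a final token that is not followed by whitespace is never printed, and it has no notion of the single-character symbols. A possible first step toward the full assignment is to split the stream into lexemes and record the line each one starts on. The sketch below is one way to do that; the names `next_lexeme` and `is_symbol` are hypothetical, and it assumes that a lexeme ends at whitespace or at one of the six symbols. Under that assumption an exponent written with an explicit plus sign (such as `1E+5`) would be split at the '+', which a complete solution would have to handle.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* True for the six single-character symbols from the assignment. */
static int is_symbol(int c) {
    return c != '\0' && strchr("&|+*:;", c) != NULL;
}

/* Read the next lexeme from `in` into buf (capacity cap >= 2).
   *line is the running line counter, starting at 1.
   Returns the line the lexeme starts on, or 0 at end of input. */
int next_lexeme(FILE *in, char *buf, size_t cap, int *line) {
    int c, start;
    size_t n = 0;

    /* Skip whitespace, counting line breaks on the way. */
    while ((c = getc(in)) != EOF && isspace(c))
        if (c == '\n') (*line)++;
    if (c == EOF) return 0;

    start = *line;
    buf[n++] = (char)c;
    if (!is_symbol(c)) {
        /* Collect characters up to whitespace, a symbol, or EOF. */
        while ((c = getc(in)) != EOF && !isspace(c) && !is_symbol(c))
            if (n + 1 < cap) buf[n++] = (char)c;   /* truncate if too long */
        if (c != EOF) ungetc(c, in);               /* give the delimiter back */
    }
    buf[n] = '\0';
    return start;
}
```

Because the delimiter is pushed back with `ungetc`, a newline that ends a lexeme is counted on the next call, so every lexeme is reported with the line it began on, and the last lexeme is returned even when the file does not end in whitespace.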
