CSSE 404: Lexical Analyzer

For this milestone, you will implement a program that performs lexical analysis for MiniJava. Your lexical analyzer will take the name of a text file on disk as an argument. The program will read the file and output the tokens it recognizes in order, one per line, according to the lexical structure of MiniJava.

There are five sorts of tokens in the lexical syntax: ID, Integer, ReservedWord, Operator, and Delimiter. When your lexical analyzer recognizes a token, it should output the sort of token, followed by a comma, a space, then the value of the token. Case, spacing, and punctuation count. For example, if the input file were:

  /** This is a test. */
  class Test {
    public static void main (String[] args) {
      System.out.println(2 + 13); // cool
    }
  }

Your lexer should output exactly the following lines:

ReservedWord, class
ID, Test
Delimiter, {
ReservedWord, public
ReservedWord, static
ReservedWord, void
ReservedWord, main
Delimiter, (
ReservedWord, String
Delimiter, [
Delimiter, ]
ID, args
Delimiter, )
Delimiter, {
ReservedWord, System.out.println
Delimiter, (
Integer, 2
Operator, +
Integer, 13
Delimiter, )
Delimiter, ;
Delimiter, }
Delimiter, }

As shown in the example, your program should skip whitespace and comments.
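
The behavior described above can be sketched as a single hand-written scanning loop. This is only a minimal illustration, not a reference solution. The class name LexerSketch and its method names are hypothetical. The reserved words, delimiters, and operators are limited to those that appear in the example above, and the identifier rule (a letter followed by letters, digits, or underscores) is an assumption. Throwing an exception on an unrecognized character is a placeholder as well. The real rules come from the MiniJava lexical structure.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LexerSketch {
    // Assumption: only the reserved words visible in the handout's example.
    private static final Set<String> RESERVED = new HashSet<>(Arrays.asList(
            "class", "public", "static", "void", "main", "String"));
    private static final String PRINTLN = "System.out.println";
    // Assumption: only the delimiters and operators visible in the example.
    private static final String DELIMITERS = "{}()[];";
    private static final String OPERATORS = "+";

    // True if position idx holds a character that could continue an identifier.
    private static boolean isIdentChar(String s, int idx) {
        return idx < s.length()
                && (Character.isLetterOrDigit(s.charAt(idx)) || s.charAt(idx) == '_');
    }

    // Returns one "Sort, value" line per token, in source order.
    static List<String> tokenize(String src) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                   // skip whitespace
            } else if (src.startsWith("//", i)) {
                while (i < n && src.charAt(i) != '\n') {
                    i++;                               // skip line comment
                }
            } else if (src.startsWith("/*", i)) {
                int end = src.indexOf("*/", i + 2);    // skip block comment
                i = (end < 0) ? n : end + 2;
            } else if (src.startsWith(PRINTLN, i)
                    && !isIdentChar(src, i + PRINTLN.length())) {
                out.add("ReservedWord, " + PRINTLN);   // multi-part reserved word
                i += PRINTLN.length();
            } else if (Character.isLetter(c)) {
                int start = i;
                while (isIdentChar(src, i)) {
                    i++;
                }
                String word = src.substring(start, i);
                out.add((RESERVED.contains(word) ? "ReservedWord, " : "ID, ") + word);
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < n && Character.isDigit(src.charAt(i))) {
                    i++;
                }
                out.add("Integer, " + src.substring(start, i));
            } else if (DELIMITERS.indexOf(c) >= 0) {
                out.add("Delimiter, " + c);
                i++;
            } else if (OPERATORS.indexOf(c) >= 0) {
                out.add("Operator, " + c);
                i++;
            } else {
                // Placeholder: the handout does not say how to report errors.
                throw new IllegalArgumentException(
                        "Unexpected character '" + c + "' at offset " + i);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // The file name is the only command-line argument.
        String src = new String(Files.readAllBytes(Paths.get(args[0])),
                StandardCharsets.UTF_8);
        for (String line : tokenize(src)) {
            System.out.println(line);
        }
    }
}
```

Run on the example input above, this sketch produces the 23 output lines listed. Checking for System.out.println before the general identifier rule is one way to keep the dotted name together as a single token. A complete solution would extend the reserved word, operator, and delimiter sets to match the full lexical structure.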

Deliverables

By midnight on the milestone deadline, submit a zipped copy of the following to the corresponding drop-box on Moodle.

Grading

Your grade will be based on the percentage of the full set of sample input files that your program processes correctly. For each input file, you will receive either a zero or a one; there is no partial credit for correctly processing only part of a sample.
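
To see what all-or-nothing scoring implies, here is a small sketch of an exact-match check. The actual grading script is not described in this handout, so the class ExactMatchCheck and its score method are hypothetical. It assumes that the expected and actual outputs are compared line by line, with case, spacing, and punctuation all significant.

```java
import java.util.Arrays;
import java.util.List;

class ExactMatchCheck {
    // Returns 1 only if both outputs have the same lines in the same order.
    static int score(List<String> expected, List<String> actual) {
        return expected.equals(actual) ? 1 : 0;
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("Integer, 2", "Operator, +", "Integer, 13");

        // Identical output earns the point.
        System.out.println(score(expected,
                Arrays.asList("Integer, 2", "Operator, +", "Integer, 13")));

        // A missing space after the comma loses it.
        System.out.println(score(expected,
                Arrays.asList("Integer,2", "Operator, +", "Integer, 13")));

        // So does a missing token, even if every other line is right.
        System.out.println(score(expected,
                Arrays.asList("Integer, 2", "Operator, +")));
    }
}
```

Under these assumptions, a single formatting slip costs as much as a wrong token, so it is worth comparing your output against the expected lines with a diff tool before submitting.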

In addition to the example above, here are the remaining examples: