CSSE 404: Lexical Analyzer

For this milestone you will implement a program that performs lexical analysis for MiniJava. Your lexical analyzer will take the name of a text file on disk as an argument. The program will read the file and output the tokens it recognizes, in order and one per line, according to the lexical structure of MiniJava.

There are five sorts of tokens in the lexical syntax: ID, Integer, ReservedWord, Operator, and Delimiter. When your lexical analyzer recognizes a token, it should output the sort of token, followed by a comma, a space, then the value of the token. Case, spacing, and punctuation count. For example, if the input file were:

  /** This is a test. */
  class Test {
    public static void main (String[] args) {
      System.out.println(2 + 13); // cool
    }
  }

Your lexer should output exactly the following lines:

ReservedWord, class
ID, Test
Delimiter, {
ReservedWord, public
ReservedWord, static
ReservedWord, void
ReservedWord, main
Delimiter, (
ReservedWord, String
Delimiter, [
Delimiter, ]
ID, args
Delimiter, )
Delimiter, {
ReservedWord, System.out.println
Delimiter, (
Integer, 2
Operator, +
Integer, 13
Delimiter, )
Delimiter, ;
Delimiter, }
Delimiter, }

As shown in the example, your program should skip whitespace and comments.
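
One possible way to structure such a lexer is sketched below in Java. It is only an illustration under stated assumptions, not a reference solution. The class name LexerSketch and its helper names are hypothetical. The reserved word, operator, and delimiter sets contain only the entries visible in the example above; the full sets come from the MiniJava lexical structure. Identifiers are assumed to start with a letter and continue with letters, digits, or underscores, block comments are assumed not to nest, and throwing on an unrecognized character is a placeholder, since this handout does not say how errors should be reported.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class LexerSketch {
    // Assumption: only the entries seen in the example; extend from the MiniJava spec.
    static final Set<String> RESERVED =
            new HashSet<>(Arrays.asList("class", "public", "static", "void", "main", "String"));
    static final String PRINT = "System.out.println"; // reserved word containing dots
    static final String OPERATORS = "+";
    static final String DELIMITERS = "{}()[];";

    // True if index i is inside src and holds a character allowed inside an identifier.
    static boolean isIdPart(String src, int i) {
        return i < src.length()
                && (Character.isLetterOrDigit(src.charAt(i)) || src.charAt(i) == '_');
    }

    // Returns the first index at or after pos that is neither whitespace nor inside a comment.
    static int skipTrivia(String src, int pos) {
        while (pos < src.length()) {
            if (Character.isWhitespace(src.charAt(pos))) {
                pos++;
            } else if (src.startsWith("//", pos)) {
                int newline = src.indexOf('\n', pos);
                pos = (newline < 0) ? src.length() : newline + 1;
            } else if (src.startsWith("/*", pos)) {
                // Assumption: an unterminated block comment runs to the end of the input.
                int end = src.indexOf("*/", pos + 2);
                pos = (end < 0) ? src.length() : end + 2;
            } else {
                break;
            }
        }
        return pos;
    }

    // Produces one "Sort, value" line per token, in source order.
    static List<String> tokenize(String src) {
        List<String> out = new ArrayList<>();
        int pos = skipTrivia(src, 0);
        while (pos < src.length()) {
            int start = pos;
            char c = src.charAt(pos);
            if (src.startsWith(PRINT, pos) && !isIdPart(src, pos + PRINT.length())) {
                pos += PRINT.length();
                out.add("ReservedWord, " + PRINT);
            } else if (Character.isLetter(c)) {
                while (isIdPart(src, pos)) pos++;
                String word = src.substring(start, pos);
                out.add((RESERVED.contains(word) ? "ReservedWord, " : "ID, ") + word);
            } else if (Character.isDigit(c)) {
                while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
                out.add("Integer, " + src.substring(start, pos));
            } else if (OPERATORS.indexOf(c) >= 0) {
                pos++;
                out.add("Operator, " + c);
            } else if (DELIMITERS.indexOf(c) >= 0) {
                pos++;
                out.add("Delimiter, " + c);
            } else {
                // Placeholder error handling; the handout does not specify this case.
                throw new IllegalArgumentException(
                        "Unexpected character '" + c + "' at offset " + pos);
            }
            pos = skipTrivia(src, pos);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // The file name is the first command-line argument.
        String src = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        for (String line : tokenize(src)) {
            System.out.println(line);
        }
    }
}
```

In this sketch, multi-character operators would need a longest-match check before the single-character case, in the same way System.out.println is tried before ordinary identifiers.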


By midnight on the milestone deadline, submit a zipped copy of the following to the corresponding drop-box on Moodle.


Your grade will be based on the percentage of the full set of sample input files that your program processes correctly. Each input is scored as either zero or one; there is no partial credit for processing only a portion of a sample.
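
Because scoring is all-or-nothing per input, it may help to compare your output against an expected output line by line before submitting. The Java sketch below is a hypothetical self-check, not the actual grading script, which this handout does not describe. The class and method names are made up for illustration, and it assumes the comparison is an exact match on every line.

```java
import java.util.List;

final class OutputCheck {
    // Returns 1 if every line matches exactly and the line counts agree, otherwise 0.
    static int score(List<String> expected, List<String> actual) {
        return expected.equals(actual) ? 1 : 0;
    }

    // Returns the zero-based index of the first differing line, or -1 if the outputs match.
    // If one output is a strict prefix of the other, the index just past the shorter one is returned.
    static int firstMismatch(List<String> expected, List<String> actual) {
        int shared = Math.min(expected.size(), actual.size());
        for (int i = 0; i < shared; i++) {
            if (!expected.get(i).equals(actual.get(i))) {
                return i;
            }
        }
        return expected.size() == actual.size() ? -1 : shared;
    }
}
```

Reporting the first mismatching line makes it easier to see where case, spacing, or punctuation differs.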

In addition to the example above, here are the remaining examples