Practical 1: Assembler I

Objectives

This is not a list of tasks for you to do. It is a list of skills you will have or things you will know after you complete the practical.

Following completion of this practical you should be able to:

Guidelines

Time Estimate: This practical is estimated to take about 3-4 hours per student (so 6-8 hours total) on average. The time taken will vary depending on your familiarity with Python, your comfort with the course material, and how well you work with your partner.

Preliminary Tasks

Obtain the worksheet.

1 Install Python

Install Python 3.x. If you already have Python 3 installed from another class or project, you should be good to go. Otherwise, you can download an installer here.

2 Install VSCode

  1. We'll be using VSCode as the IDE for this class. You can download and install it from here. (Read the next step before you click that link.)

  2. During install you may be prompted for what languages you want the IDE to support. If so, select Python and it should install the basic Python extensions created by Microsoft.

    If it does not prompt you during install, you will need to manually install the "Python" and "Python Debugger" Microsoft extensions. The steps below show how.

    Installing Python support in VS Code
    • Once VSCode opens, select the Extensions button on the left (it looks like four squares with one of them detached).
    • In the search bar type "python", then select and install *both* the Python extension from Microsoft and the Python Debugger extension from Microsoft.
  3. While you're installing extensions, you should also install two more extensions for upcoming practicals:

    • Verilog HDL - by "leafvmaple"
    • Verilog-HDL/SystemVerilog/Bluespec System Verilog - by "Masahiro Hiramori"
  4. Next, in the search bar at the top of the VSCode window, type >terminal and select "Python: Create Terminal". This opens a terminal in the lower part of the screen.

  5. In that terminal, type python --version to check the version number and make sure you're running Python 3 or newer. You may need to specify a version of Python (e.g., type python3.12) to get the right version running in the terminal. If you start the interactive interpreter (by typing just python or python3.12), type exit() to close it.

3 Get your Assembler repository

To get your git repo, you'll need the git tools installed. If you are using Windows and need to install git, see the git scm website. If you use a different OS, you should use the install method best for your operating system.

  1. We'll be using Github For Education to manage repositories for this course. Read all of these instructions before you begin.

    1. You'll need to create a GitHub account. Your account name should be recognizable as your RHIT username.
      1. Use the pattern: "rhit-username", for example Robert's account name would be: "rhit-williarj".
      2. Even if you already have your own personal Github account please make a course-work specific account. We don't want to have to figure out who XxXSuperLuigi2022XxX is when we're grading. If we can't figure out who you are from the username we'll assume you didn't submit the assignment.
      3. You should set up SSH keys for github access. Scroll to the bottom of this page and follow the directions in "Fixing GitHub Authentication Issues". You can read details about that here.
    2. Join the GitHub Classroom for this class.
      1. Go to this url to join the classroom.
      2. You may be prompted to grant Github Education access to your account, grant access if asked.
      3. Select your Rose username from the list of students when you create your account.
        • If your username is not listed, please contact your instructor.
      4. STOP HERE until you have been assigned a group in class. You cannot accept the assignment on github until you have your team assignments from your instructor.
    3. Create and clone your Assembler repository.
      1. Once you've been assigned a group, have one team member create a team on Github Classroom.
        • Select "Accept this assignment" when prompted.
        • IMPORTANT: Name it with both team members RHIT usernames: e.g., username1-username2, and replace "username1" and "username2" with your Rose usernames.
        • This will create your repository. Copy the link on the next page; it will look something like this:
          https://github.com/rhit-csse232/risc-v-assembler-username1-username2
        • You can click your custom link to view your repository on github.com. The page will display a green "Code" button with instructions on how to clone your repo.
      2. Have the other team member find and join the team on GitHub Classroom.
  2. Clone the repository to your computers via SSH.

    1. Open a terminal window. For Windows, the program Git Bash will serve as your terminal. Right click in any folder and select Git Bash Here to start up the terminal.
      • Do NOT clone the repo into a OneDrive folder.
    2. Navigate to where you want your repo (DON'T DO THIS IN ONEDRIVE). For example:
      cd Desktop
      • Do NOT go into a OneDrive folder
    3. Use this command to get your repo, adjusting for your url from above.
      git clone git@github.com:rhit-csse232/risc-v-assembler-username1-username2.git

If you have problems, view the suggestion below.

Fixing GitHub Authentication Issues

Build an Assembler

You've been given a partial implementation of a 32-bit RISC-V assembler. Your job for this practical is to implement the parts of the assembler needed to make R-, I-, and S-types work. You only need to edit the methods listed below that contain a line like TODO: Practical 1 in their body. You can ignore the Practical 2 TODOs for now, and do not make changes to any other methods or classes. You are free to add your own helper methods and classes as you see fit.

At a high level, these are the objectives for this assembler:

  1. Read an .asm file and produce the correct machine code into a .txt file.
  2. Pass all the unit tests in the provided testing suite, including:
    • base_assembler_test.py (Practical 1)
    • advanced_assembler_test.py (Practical 2)

We recommend that you look at the practical worksheet and read the Grading Rubric section below before you start writing code.

Running code and tests in VSCode

Before diving in, you should get familiar with the VSCode development environment, the tests, and the debugger.

Running code and editing arguments

  1. To run your assembler you need to set up the debugger. Your repo should contain a .vscode folder that has a launch.json file inside it. This file specifies the arguments to give the program when it runs.

  2. Select the Run and Debug button on the left side of the screen (it looks like a play button with a bug on top of it). Then near the top you should see a green play button with "Python Debugger: Current File with Arguments". If you have your chosen python file open when you press this button it will start running.

  3. You can change the arguments given to your assembler by opening launch.json and editing the "args" list.

    • Try adding "--help" as an item to this list and then running the assembler.py file to see its effect.

    You can add to and edit this list to change the behavior of the program. This is really just for debugging, though; if you leave the options as provided, you can simply edit the contents of test.asm to test different instructions. By default the assembler writes the assembled machine code to out.txt, which will appear in the same folder.

Debugging and Testing in VSCode

The topics below cover the debugger and the unit tests. You may want to refer back to this info as you work on this practical.

Debugging in VSCode
  1. You should get familiar with and use the VSCode debugger. We will show you some tips in the `R-type` implementation part of this practical, but generally:
    • You can set breakpoints by clicking to the left of the line number in a file. (1)
    • Use the buttons that appear at the top of the window to navigate through the code as you're debugging. (2)
    • The pane on the left will show you variables in the code as you debug. You can also hover over variables in your code to quickly peek at their values. (3)
Running Unit Tests
  1. We're using the Python `unittest` library for our test cases. Each test is preceded by a decorator that looks like `@weight(N)`, where N is the number of points the test is worth during grading. These decorators are commented out; they are only used when you upload to Gradescope.
  2. The provided `settings.json` file should have the testing framework ready for you. Click the flask icon on the left to open the Testing window. (Note, sometimes I have to click or open a file before the button will appear.)
  3. The window will list all the testing files and individual tests. It will look something like this:

    When you hover over an individual test, or the header for a group of tests, three buttons appear to the right of the name. The first one lets you run the test(s).
  4. The second button is the debugger, you can put breakpoints in the tests or your code to use the debugger while the test cases are running to help you find what causes tests to fail.
  5. You should run the tests as soon as you get your repo to make sure that everything is hooked up correctly. You should see output that looks something like this:

    Each test is listed and then its status is printed. Yours will say "ERROR" instead of "ok" to start with. However, VSCode sometimes defaults to a different Python testing framework. If the output mentions `pytest`, close VSCode and reopen it, and it will likely use the correct tests. The tests will still give output under pytest; the incorrect output will look something like this:

    Notice the `pytest-8.2.2` under the first header, and that the individual tests are not listed, just the test file name.
Reading test errors

When a test fails it can be very intimidating. Here are the two main types of failures you'll see. First, when your code translates an instruction incorrectly, you'll see something like this:

This error box is pointing to the test that failed (`test_R_types_add`). In the box it shows the expected and actual results, just left of the orange circle number 1. The top binary string is always the expected (i.e., correct) answer; the bottom one is the result from calling your assembler.

Each test group runs several inputs through your assembler. To figure out which exact input caused the failure, you need to look at the traceback. Look for the entry that refers to the name of the failed test; here it is just left of orange circle number 2. We can see that the input at line 34 caused the failure, so we can go look at it.

Next I would set a breakpoint at that line and press the test debugger button to trace the code.

The second kind of error you'll see will occur when your code does not raise an exception when it should. Here is an example:

This box similarly points to the test that failed (`test_R_types_arguments`). At the top it tells us an exception was not raised when the test expected one: at orange circle 1 we can see that the test was expecting a `BadArguments` exception but did not get one. At orange circle 2 we can see the line number of the offending input, so we can debug the problem.

This practical provides several different kinds of exceptions for different cases. Some of them may be ambiguous, e.g. when is it a BadArgument vs a BadImmediate. Go with your gut and match the test cases later. The test cases are set up to minimize the number of places in the code you need to check for errors and raise exceptions, rather than detecting errors as soon as possible.
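Both kinds of failure come from standard `unittest` assertions. The sketch below is a self-contained stand-in, not the course's test file: `BadOperands` and `assemble_r_stub` are hypothetical placeholders for the real exception and helper in assembler.py. `assertEqual` reports a mismatch between its two arguments, and `assertRaises`, used as a context manager, fails with a message of the form "BadOperands not raised" when the block finishes without raising.

```python
import unittest


class BadOperands(Exception):
    """Hypothetical stand-in for the assembler's BadOperands exception."""


def assemble_r_stub(cmd, operands, line_num=0):
    """Hypothetical helper: checks only the operand count, returns 32 zeros."""
    if len(operands) != 3:
        raise BadOperands(
            f"{cmd} expects 3 operands on line {line_num}, got {len(operands)}")
    return "0" * 32


class TestRStub(unittest.TestCase):
    def test_output(self):
        # Expected value first, actual result second
        self.assertEqual("0" * 32, assemble_r_stub("add", ["x1", "x1", "x1"]))

    def test_arguments(self):
        # If no exception escapes this block, unittest reports
        # "BadOperands not raised"
        with self.assertRaises(BadOperands):
            assemble_r_stub("add", ["x1", "x1", "x1", "x1"])
```

Saved to a file such as stub_test.py (a name chosen only for this example), it can be run from a terminal with python -m unittest stub_test.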

0 Tips and Hints

You should open up assembler.py and review the general code structure (we'll walk you through it in a bit more detail in the next section). You can look through the docs/assembler.html file in your repo for an easy to read list of functions and classes. I recommend you keep this open so you can look at any helper methods at a glance.

Some helper methods are implemented for you, and you are free to use them. These are not tested, so you are free to change them as you see fit.

Seriously, go look at the docs/assembler.html file that is in your repository right now. It has a lot of handy information.

Python provides a few ways to convert integers between bases. First, the int() function takes an optional second argument that specifies the base:

    int("101", 2)  -> 5
    int("11", 2)   -> 3
    int("101", 16) -> 257
    int("11", 16)  -> 17

Additionally, the bin() function converts an integer into a binary string:

    bin(5)  -> '0b101'
    bin(3)  -> '0b11'
    bin(-3) -> '-0b11'

Note that it does not use two's complement for negative numbers, and the strings always start with '0b' (or '-0b' for negative numbers).
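Because bin() gives sign-and-magnitude output, immediates need an extra step to become fixed-width two's complement fields. One common approach is to mask the value to the field width and format it with the built-in format(). The helper name to_twos_complement below is hypothetical, and the repo's own dec_to_bin may already cover this, so check it first.

```python
def to_twos_complement(value, bits):
    """Return `value` as a `bits`-wide two's complement binary string.

    Hypothetical helper for illustration; raises ValueError if the value
    does not fit in a signed field of that width.
    """
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {bits} signed bits")
    # Python ints are unbounded, but masking with 2**bits - 1 keeps only the
    # low `bits` bits, which for a negative number is its two's complement.
    return format(value & ((1 << bits) - 1), f"0{bits}b")


print(bin(-3))                    # -0b11
print(to_twos_complement(-3, 8))  # 11111101
print(to_twos_complement(5, 12))  # 000000000101
```

The range check matters: without it, masking would silently wrap an out-of-range value into a valid-looking bit pattern.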

1 A Tour of the Assembler

This file is BIG and has a bunch of code. This section will take you on a little tour of the code, explaining the overall structure of the assembler. Don't forget that you can open up an html version of the documentation for this file by opening docs/assembler.html in your practical repo.

Near the top of the file is the assemble_asm() method. This method does all the heavy lifting: it takes in a text file of assembly instructions and breaks it down into binary one instruction at a time. It does this in 4 big passes:

  1. remove comments (comments_pass() method)
  2. process pseudoinstructions (pseudoinstruction_pass() method)
  3. process labels (parse_labels() method)
  4. translate individual lines into machine code (machine_pass() method)

Below assemble_asm() is a section where each "pass" of the assembler is broken up into helper methods, you do not need to worry about these for now.

For this practical you mainly care about step 4. You'll notice that machine_pass() is a loop that calls Assemble() on every instruction. Assemble takes three arguments; read the comments in the Assemble method to learn what they do. Two of the arguments have default values. Those will matter for the next practical, but for this practical only inst matters.

During this practical, you will implement Assemble to identify which type of instruction it assembles, and then call the appropriate helper method to do the assembling. We break it up by instruction type to make implementation and testing much easier.

You'll be writing this method later; it will do most of its work by calling the Assemble_* helper methods below.

Assembler's helpers

Each type of instruction in RISC-V has at least one helper (e.g., Assemble_I_Type). Each of these methods processes one instruction type and returns the binary representation of a given instruction. These methods are where you will write most of the code for this practical.

Help to speed things up

The next section (starting at around line 355) has some helpful methods. Some of these are implemented for you and some you will need to implement in practical 2. You should look at these methods and consider when you can use them. Some of these helpers also show you how to manipulate strings in Python in helpful ways. Using these is CRITICAL for efficiently completing practicals 1 and 2. While you don't strictly need them, they will make things go much faster. Spend some time skimming through the helpful utility methods.

Below this is the output() method, which is used to output the final result to a file. You do not need to edit this code.

Data Types

Next, around line 483 is the "Utilities" section, where the different types of instructions and fields are defined. Here you'll find a FieldData class, which is a simple struct that holds the data for different instructions for our use later. There are several other helpers defined in this section that you should look over and make use of.

There's a lot of good stuff down at the bottom of the file. Skim through it and look at functions and dictionaries in this area, specifically:

Exceptions

After the helpers, there are a bunch of exceptions defined. These should be raised when the appropriate conditions are met. For example, if the operands are wrong for an instruction, the assembler should raise the BadOperands exception. The comments in the code and unit tests will help you figure out when to raise an exception.

2 Assembling R-type instructions

Okay, let's begin! To start we'll implement assembling R-type instructions and see how to use the test bench to find errors in our code.

  1. Open up the Testing window (the flask icon), expand base_assembler_test.py and then the TestRType group. Hover over the test_R_types_add test until you see the run button on the right, and click it. You just ran a single set of tests for the add instruction; the test should fail, and the output will indicate that a NotImplementedError was raised.

  2. Go to the Assemble_R_Type method in the assembler file, replace the raise NotImplementedError with this:

    return "0000 0000 0000 0000 0000 0000 0000 0000"
  3. We made this method turn all R-types into all 0s. This isn't correct, but let's re-run the tests and see how it looks now. Re-run the test_R_types_add tests and look at the output. You will see an AssertionError showing the expected output and, just below it, the received output of all 0s.

    • Tip: Double-clicking the failed test will not re-run it; you must hover over the failed test and click the "play" button to re-run it.
  4. Okay, now we can get down to business. Let's start by figuring out which R-type we are translating. There is a dictionary we can use for that! Add this code to the method:

    field_data = inst_to_fields[cmd]

    You should look at the docs for details on this dictionary and the objects in it. But generally, it gives us a small object that has the binary for the opcode, funct3, and funct7 fields.

  5. Next, we need the binary codes for register operands in the input instruction. We can extract those using another helper and the operands list:

     rd  = get_register_bin(operands[1])
     rs1 = get_register_bin(operands[1])
     rs2 = get_register_bin(operands[2])
  6. Finally, we need to stick all these pieces together. You can do this however you want, but there is a built-in helper to make this a bit easier. One way is to start by combining all the parts of the instruction into a list and then calling the helper:

     inst_field_list = [field_data.func7,
                        rs2,
                        rs1,
                        field_data.func3,
                        rd, 
                        field_data.opcode]
    
     return join_inst_fields_bin(inst_field_list)
  7. The code above has a MISTAKE, but that is ok! We want the error to practice debugging. Re-run the test_R_types_add unit test. You should see something like this:

    Note that the parts of the expected and received results that differ are "underlined" with - and + symbols. Looking at that binary, it seems like there is a problem with rd. We could go fix it now, but instead we'll use the debugger to help us find the exact location of the error.

Debugging the test_R_types_add test failure.

  1. Double-click the failed test (test_R_types_add), and the code for the test will open so you can see what instructions it is testing. It also shows the failure. This is a little awkward to read, so let's step through it in the debugger.

  2. Put a breakpoint on line 34 of the base_assembler_test.py file, where the first test is called. Do this by hovering over the line number, then when a red dot appears, click it.

  3. Then, hover over the test_R_types_add entry in the test navigator on the left, and press the Debug test button, (it looks like a play button with a bug on it). That will open up the debugger and pause it on the line where we put the breakpoint.

  4. Press the step into button in the little control panel at the top of your screen until the code reaches the assembler.py file (it should take 8 step into clicks). You'll end up in your implementation of Assemble_R_Type. At this point you should see the variable explorer on the left, something like this:

    Here we see the instruction add is passed in as the cmd argument, and the operands are all x1. Step through the code until field_data, rd, rs1, and rs2, have all been assigned. You may want to use the step over button to not go into the helper method code.

  5. Now the variable explorer will show the values of each register ID field. They are all set to '00001', which seems right to me! Continue to step through this code until you are taken back to the test file. This test passes, so the error wasn't here. (We would have noticed this if we had read the error message more closely: it says the failure happened on line 38.)

  6. Continue stepping through the next test, for "add t0, s0, sp". Step through the test code until all the register fields are assigned in the Assemble_R_Type method again, and look at the value of rd. Is it right? You should see this in the variables pane:

    Both rd and rs1 have the same value, but they are different registers in the operands list! The correct value for t0 is "00101" so there must be something wrong with rd. Fix the call to get_register_bin for rd so that it uses the correct operand.

  7. Re-run the test_R_types_add tests now and you should see that they pass. And thanks to the magic of dictionaries, if you run the other R-type tests they should nearly all pass too! You can run the full group by pressing the run button while hovering over the TestRType group. Because the only differences between R-types are the opcode, funct3, and funct7 fields, the inst_to_fields dictionary already pulls the right data for each instruction.
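With rd fixed, the encoding can also be checked by hand. The sketch below concatenates the fields for add t0, s0, sp in the same order as inst_field_list (funct7, rs2, rs1, funct3, rd, opcode). The field values are the standard RV32I ones (opcode 0110011 and funct3 000 for both add and sub; funct7 0000000 for add and 0100000 for sub); in the assembler they come from inst_to_fields instead. The names reg_bin, encode_r, and nibbles and the tiny ABI_TO_NUM table are hypothetical, and the table covers only the registers used here.

```python
# Hypothetical ABI-name table: only the names used in this example
ABI_TO_NUM = {"sp": 2, "t0": 5, "s0": 8}


def reg_bin(name):
    """5-bit register field for 'x0'..'x31' or a name in ABI_TO_NUM."""
    num = int(name[1:]) if name.startswith("x") else ABI_TO_NUM[name]
    return format(num, "05b")


def encode_r(funct7, funct3, opcode, rd, rs1, rs2):
    """Concatenate R-type fields from bit 31 down to bit 0."""
    bits = funct7 + reg_bin(rs2) + reg_bin(rs1) + funct3 + reg_bin(rd) + opcode
    if len(bits) != 32:
        raise ValueError(f"expected 32 bits, got {len(bits)}")
    return bits


def nibbles(bits):
    """Group a bit string in fours, like the lines of out.txt."""
    return " ".join(bits[i:i + 4] for i in range(0, len(bits), 4))


add_bits = encode_r("0000000", "000", "0110011", rd="t0", rs1="s0", rs2="sp")
print(nibbles(add_bits))            # 0000 0000 0010 0100 0000 0010 1011 0011
print(f"0x{int(add_bits, 2):08x}")  # 0x002402b3

sub_bits = encode_r("0100000", "000", "0110011", rd="x5", rs1="x16", rs2="x31")
print(f"0x{int(sub_bits, 2):08x}")  # 0x41f802b3
```

Note that the operands appear in the instruction as rd, rs1, rs2, but the encoding places rs2 before rs1, which is an easy place to slip.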

Fixing the test_R_types_arguments failure.

  1. Now, we're going to track down any problems in the remaining R-type tests. In the Testing tab in VS code, re-run the R-type tests.

    Most of the R-type tests should pass, but test_R_types_arguments fails. Let's figure out why. To take a closer look at this test, double-click it in the test list:

  2. Reading the error message, we can see that it is complaining that "BadOperands not raised", meaning we failed to identify cases where the operands are bad. I guess we need to do some more error checking!

  3. Looking closer at the instructions the tests are running, you can see the types of situations where BadOperands should be raised. Before you fix the error, we will use the debugger to see the exact cause of the failure.

  4. Set a breakpoint in the test_R_types_arguments test on line 90 right before it runs its first test:

  5. Then, run just the one test again to hit your breakpoint:

  6. When it stops on your breakpoint, we can use the debug panel at the top of VSCode to take steps through the assembler.

  7. Stepping over/into a few times, you will arrive at Assemble_R_Type. Look at the inspection panel on the left side to see all the local variables and arguments:

    Notice here that there are FOUR operands! You can also see their values. Clearly it would be a good idea to check for the right number of operands, so let's do that.

  8. Add a check to ensure there are exactly three operands before constructing the machine code:

     if len(operands) != 3:
         raise BadOperands("Incorrect number of args found in R Type on line"
                           f" {line_num} with args:\n\t{cmd} {operands}\n")
  9. With this added, you should now pass the test_R_types_arguments test, and the assembler will raise an exception with useful info when an instruction does not have the right number of arguments. Re-run the tests to verify.

  10. There are other exceptions that should be raised. For instance, mv t0, t1 is not a legal instruction here: mv is not in the RISC-V base instruction set, and this assembler does not define it as a pseudoinstruction, so the BadInstruction exception should be raised.

    The exceptions you will encounter include BadImmediate, BadOperands, BadInstruction, BadRegister, BadField, BadFormat, and BadLabel.

    As you work through this and the next practical, be sure to check for these edge cases and raise the appropriate exception. The tests will not catch every possible situation, so to help you debug you may want to add more exception checking than the tests require.

  11. Now is a great time to commit your work. You will need to commit to your git repo at least every time you finish the implementation of each instruction assembly method. Commit your updates to git and push your repo with a meaningful commit message. As you debug and fix errors you should consider doing more commits. You must have at least 5 commits in your repo for this practical (one for each of the Assemble methods).

A way to commit to git using VSCode

3 Implementing Assemble()

The heart of the assembler is the Assemble() method. It takes a single instruction and returns its binary representation. At its core, though, this method simply figures out the type of the instruction and then calls the appropriate helper function (which you will write below). Try to make this function do as little work as possible; it will make your code easier to edit in the future.

You should not try to write this function all in one go. I recommend you add to it as you expand your assembler to support more types, adding support for one instruction type at a time.

  1. First, Assemble needs to call the appropriate helper function based on the instruction type. For now we only have R-type instructions, so let's just call that one directly. The helper functions take 3 arguments: the instruction name, the list of operands, and the line number. So we have to do a little bit of work to split the instruction up. Put this code into the body of the Assemble method, replacing the raise NotImplementedError line:

     split_inst = inst.strip().replace(",", " ").split()
     cmd = split_inst[0]
     operands = split_inst[1:]
     return Assemble_R_Type(cmd, operands, line_num)

    This breaks the instruction up and then calls the R-type helper. If the method is invoked like this: Assemble("add t0, t0, t0"), then Assemble would call Assemble_R_Type("add", ["t0", "t0", "t0"], line_num) and return the result.

  2. Let's make sure we can assemble some code. In the repo, notice there is a test.asm file that contains two R-type instructions and some comments.

  3. Make sure assembler.py is the open tab in VS Code, then run the Python Debugger.

    This should create a new file, out.txt with the instructions translated into binary. The file should contain:

     0000 0000 0010 0100 0000 0010 1011 0011 // 0x002402b3 ;;; 0x400000 - add t0, s0, sp
     0100 0001 1111 1000 0000 0010 1011 0011 // 0x41f802b3 ;;; 0x400004 - sub x5, x16, x31

    There are a few additional bits of info in the comments. The first part after // is the hex representation of the 32 bits. The second is the address of the instruction in memory (instruction memory starts at 0x00400000). The last part is the original RISC-V instruction, for your reference.

  4. Unfortunately, the code implemented in the first step above assumes everything is an R type instruction. Change the code so that instead of always calling Assemble_R_Type() and returning its result, it checks the type of instruction first:

     split_inst = inst.strip().replace(",", " ").split()
     cmd = split_inst[0]
     operands = split_inst[1:]
     if (inst_to_types[cmd] == Types.R):
         return Assemble_R_Type(cmd, operands, line_num)
     else:
         raise NotImplementedError
  5. You should add to the Assemble() method each time you implement one of the new types (or subtypes) of instructions in the next section. Do not try to implement it all at once.
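As Assemble() grows, the if/else chain above works fine. An alternative design, swapped in here only as an option, is a dictionary from instruction type to helper, so finishing a helper becomes a one-line addition. In this self-contained sketch, Types, inst_to_types, and BadInstruction are small stand-ins for the real names in assembler.py, assemble and assemble_r_stub are hypothetical, and the stub handler only echoes its inputs. It also shows that the comma-splitting keeps a base-offset operand like 8(sp) together, and that an unknown mnemonic such as mv can be rejected before any helper runs.

```python
from enum import Enum, auto


class Types(Enum):
    """Hypothetical stand-in for the assembler's instruction-type enum."""
    R = auto()
    I = auto()
    S = auto()


class BadInstruction(Exception):
    """Hypothetical stand-in for the assembler's BadInstruction exception."""


# Hypothetical subset of inst_to_types
inst_to_types = {"add": Types.R, "sub": Types.R, "addi": Types.I, "sw": Types.S}


def split_instruction(inst):
    """'add t0, s0, sp' -> ('add', ['t0', 's0', 'sp']), as in Assemble()."""
    split_inst = inst.strip().replace(",", " ").split()
    return split_inst[0], split_inst[1:]


def assemble_r_stub(cmd, operands, line_num):
    """Placeholder for Assemble_R_Type; just echoes what it was given."""
    return f"R-type {cmd} {operands} (line {line_num})"


# Register each helper here as you finish it
HANDLERS = {Types.R: assemble_r_stub}


def assemble(inst, line_num=0):
    cmd, operands = split_instruction(inst)
    if cmd not in inst_to_types:
        # e.g. "mv t0, t1": not a known instruction in this table
        raise BadInstruction(f"Unknown instruction {cmd!r} on line {line_num}")
    handler = HANDLERS.get(inst_to_types[cmd])
    if handler is None:
        raise NotImplementedError(f"{inst_to_types[cmd]} not implemented yet")
    return handler(cmd, operands, line_num)


print(split_instruction("lw t0, 8(sp)"))  # ('lw', ['t0', '8(sp)'])
print(assemble("add t0, s0, sp", 1))      # R-type add ['t0', 's0', 'sp'] (line 1)
```

Whichever form you choose, keep Assemble() itself thin; the per-type checks and bit manipulation belong in the helpers.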

4 Assemble Other Types

You will implement the methods for each individual type. Note that the I-types are broken up into separate methods for each slightly different format. Write the code to implement each of these:

  1. Assemble_R_Type
  2. Assemble_I_Type
  3. Assemble_I_Type_shift
  4. Assemble_I_Type_base_offset
  5. Assemble_S_Type
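Judging by its name, Assemble_I_Type_base_offset handles loads written as offset(base), as in lw t0, 8(sp); stores such as sw use the same notation. After the comma-splitting used in Assemble(), that operand arrives as the single string "8(sp)". A regular expression is one way to pull it apart. The function split_base_offset and the BadFormat stand-in below are hypothetical, so adapt them to the helpers and exceptions in assembler.py.

```python
import re


class BadFormat(Exception):
    """Hypothetical stand-in for the assembler's BadFormat exception."""


# A decimal offset (optionally negative) followed by a register in parentheses
BASE_OFFSET = re.compile(r"(-?\d+)\((\w+)\)")


def split_base_offset(operand, line_num=0):
    """Split '8(sp)' into ('8', 'sp'); raise BadFormat otherwise."""
    match = BASE_OFFSET.fullmatch(operand)
    if match is None:
        raise BadFormat(f"Expected offset(base) on line {line_num}, got {operand!r}")
    offset, base = match.groups()
    return offset, base


print(split_base_offset("8(sp)"))   # ('8', 'sp')
print(split_base_offset("-4(x2)"))  # ('-4', 'x2')
```

Using fullmatch rather than match rejects trailing junk such as "8(sp)x", and the register name is returned as text so it can still go through the usual register lookup.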

After you finish each of these methods (or pause working) you must commit and push your repo with a meaningful commit message. As you debug and fix errors you should consider doing more commits. You must have at least 5 commits in your repo for this practical (one for each of these methods). See VSCode Commit info for instructions.

Edit the Assemble() function to call the correct one of these for a given instruction as you implement them.

Your assembler only needs to support decimal immediates, assume all numbers passed as operands to an instruction are in decimal. As you work through the test cases you may want to consider the binary or hex representation of the numbers used in the tests.
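The I- and S-type immediate is a 12-bit signed field, so it can hold -2048 to 2047. The sketch below checks that range before encoding and shows how an S-type immediate is split into two pieces. The names imm12_bin and split_s_imm and the BadImmediate stand-in are hypothetical; the repo's dec_to_bin and its exceptions may already cover part of this.

```python
class BadImmediate(Exception):
    """Hypothetical stand-in for the assembler's BadImmediate exception."""


def imm12_bin(text, line_num=0):
    """Decimal string -> 12-bit two's complement string (range -2048..2047)."""
    try:
        value = int(text, 10)  # the practical only requires decimal immediates
    except ValueError:
        raise BadImmediate(
            f"Not a decimal immediate on line {line_num}: {text!r}") from None
    if not -2048 <= value <= 2047:
        raise BadImmediate(
            f"Immediate {value} out of 12-bit range on line {line_num}")
    return format(value & 0xFFF, "012b")


def split_s_imm(imm_bits):
    """Split a 12-bit immediate into (imm[11:5], imm[4:0]) for S-type."""
    return imm_bits[:7], imm_bits[7:]


print(imm12_bin("-1"))               # 111111111111
print(split_s_imm(imm12_bin("-4")))  # ('1111111', '11100')
```

In an S-type word, imm[11:5] fills bits 31-25 and imm[4:0] fills bits 11-7, so the two pieces sit on either side of rs2, rs1, and funct3. Shift instructions (slli, srli, srai) are a separate case: on RV32 the shift amount is an unsigned 5-bit field, 0 to 31.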

A few tips as you work on the remaining types:

There are several example exceptions raised throughout the template, take a look at dec_to_bin for a couple examples (and consider why these are raised here).

This program provides a suite of custom exceptions: BadImmediate, BadArguments, BadInstruction, BadRegister, BadField, BadFormat, BadLabel

You can check the code (or the assembler.html file) for a general description of each. Your assembler should probably raise every one of these in at least one place when it is complete. Not all of these are tested, but they are very useful as you debug. For this practical you will not need the BadLabel exception.

Write small test programs

Additionally, write your own small .asm input programs to assemble and check the validity of your assembler. You should write one for each instruction type you implement in this practical:

  1. RType.asm
  2. IType.asm
  3. SType.asm

The contents of these test programs are up to you; however, they should be substantial enough (i.e., include different immediate values, different funct3/funct7 values, etc.) to give you confidence that you have a valid, working assembler.

Add and commit these .asm files to your git repository.

Submission and Grading

Functional Requirements

At the end of this practical, you should:

Git Requirements

In addition to the list below, you should regularly commit and push whenever you fix a bug. This is a minimum set of commits you MUST have in your repo for this practical:

You must include your name in a comment at the top of all files you submit. If you didn't do this, go add your names and push another commit. See this info for commit instructions.

Having problems with github authentication? Go back to the Github setup section for tips.

Worksheet Requirement

All the practicals for CSSE232 have these general requirements:

General Requirements for all practicals

  1. The solution fits the need
  2. Aspects of performance are discussed
  3. The solution is tested for correctness
  4. The submission shows iteration and documentation

Some practicals will hit some of these requirements more than others. But you should always be thinking about them.

In the worksheet, explain how you satisfy each of these items. Some guidelines:

Final Checklist

Grading Breakdown

Practical 1 Rubric item    Possible points
Worksheet                  50
Code                       50
Total                      100