Practical 2: RISC-V Assembler II

Objectives

Following completion of this practical you should be able to:

Guidelines

Time Estimate: This practical is estimated to take about 3-5 hours per student (so 6-10 hours total) on average. This estimate varies depending on your familiarity with Python, the course material, and how well you work with your partner.

Preliminary Tasks

Obtain the worksheet.

Review your submission for Practical 1, and familiarize yourself with the utility functions provided in assembler.py.

Take a look at the practical worksheet before you start writing code for this practical. The methods you need to implement are flagged with a TODO: Practical 2 throughout the files.

1 Support the Remaining Instruction types

You need to extend your assembler from Practical 1 to support the missing instruction types listed below. The hints and tips on the Practical 1 page are worth reviewing again before you start this practical.

Tests for this practical are in advanced_assembler_test.py.

As you implement support for the new instruction types, be sure to run that type's corresponding tests in the testing suite.

Implement helper methods

First, start by implementing these helper methods:

  1. Assemble
  2. index_to_address
  3. parse_labels
  4. label_to_offset

You should push at least one commit to your repo that contains the implementation of all of these helpers.

You may find it helpful to use the has_label and split_out_label helper functions provided.

Read the comments inside each of the functions to learn how it is supposed to behave. Here are some hints for implementing these helpers:

Assemble

Add to this as you go to support the new instruction types. The other helpers here will be useful as you write the other methods. You should fully implement any other unimplemented helpers. Think of this as a warm-up: there are no explicit test cases for these helpers, so you'll need to figure out how to verify they are working as you go. (This is a good time to review the general requirements for practicals and consider how you will meet them.) You may need to come back and change the behavior of these helpers as you move further into the practical.

index_to_address

For this, you need to think about the starting address of a program in memory. Note that in RISC-V the PC does not start at 0x0000 0000; it has a different starting address. I wonder if there is some sort of reference document that might tell you that address? If there were, I bet it would be green.
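The arithmetic can be sketched as follows, assuming every instruction occupies 4 bytes (true for the base 32-bit encodings). The name index_to_address_sketch and its base_address parameter are hypothetical; the real function's signature is the one documented in assembler.py, and the sketch deliberately leaves the starting address for you to look up.

```python
def index_to_address_sketch(index, base_address):
    """Map a 0-based instruction index to a byte address.

    Hypothetical helper: base_address stands in for the program's starting
    address, which this sketch does not hard-code.
    """
    if index < 0:
        raise ValueError("instruction index must be non-negative")
    # Each base RISC-V instruction is 4 bytes wide.
    return base_address + 4 * index
```

For example, with an arbitrary placeholder base of 0x1000 (not the real starting address), index 3 would map to 0x100C.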

parse_labels

There are three cases you need to consider when processing the argument asm_list:

  1. An instruction (e.g., add x1, x2, x3),
  2. A label with an instruction (e.g., L: sub x4, x5, x6),
  3. A label on its own (e.g., D:); the associated instruction is on the subsequent line.

Your goal in this function is to handle each of these three cases and return two items:

  1. A tuple containing only the instructions (no labels). This will serve as clean_code for you to iterate through and assemble,
  2. A dictionary mapping labels to addresses in the instruction list. This will help you calculate the byte offset to be used for an instruction’s immediate (i.e., branches and jal).

At least one exception will be thrown in this helper function. Be on the lookout for these cases!
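One way to handle the three cases is sketched below. This is not the required implementation: parse_labels_sketch and its base_address parameter are hypothetical, the sketch splits on ":" by hand instead of using the provided has_label and split_out_label helpers (whose exact signatures are documented in assembler.py), and treating a duplicate label as the error case is an assumption.

```python
def parse_labels_sketch(asm_list, base_address):
    """Separate labels from instructions.

    Returns (clean_code, labels): a tuple of label-free instructions and a
    dict mapping each label to the byte address of the instruction it marks.
    base_address is a hypothetical stand-in for the program's start address.
    """
    clean_code = []
    labels = {}
    for line in asm_list:
        text = line.strip()
        if ":" in text:
            label, _, text = text.partition(":")
            label = label.strip()
            text = text.strip()
            if label in labels:
                # Assumed error case: the same label defined twice.
                raise ValueError(f"duplicate label: {label}")
            # The label marks the next instruction to be appended, whether it
            # is on this line (case 2) or on a later line (case 3).
            labels[label] = base_address + 4 * len(clean_code)
        if text:
            clean_code.append(text)
    return tuple(clean_code), labels
```

Note that this sketch does not reject a label at the very end of the program with no instruction after it; you may decide that case deserves an exception too.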

label_to_offset

Given an instruction that uses a label (e.g., beq x5, x6, FOO or jal x0, BAR), this function will calculate the correct byte offset value to be used for the immediate for the instruction. Note that the dictionary of labels passed in as the first argument labels maps labels to instruction addresses, while the index passed in as the third argument instruction_index is a numerical index. You will need to handle this mismatched “data type” to calculate the correct byte offset.

One of the previous helper functions in this section will be of great use in implementing this function. If the previous helper functions are implemented and you plan well, the implementation of this helper function can be as short as a single line.
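The mismatch between addresses and indices can be resolved by converting the index to an address before subtracting. A minimal sketch, assuming 4-byte instructions; the name label_to_offset_sketch, the separate label argument, and the base_address parameter are hypothetical and do not match the real signature in assembler.py.

```python
def label_to_offset_sketch(labels, label, instruction_index, base_address):
    """Return the signed byte offset from the current instruction to label.

    labels maps label names to byte addresses; instruction_index is the
    0-based index of the instruction that uses the label.
    """
    if label not in labels:
        raise KeyError(f"undefined label: {label}")
    # Convert the index to an address so both operands are in bytes.
    current_address = base_address + 4 * instruction_index
    return labels[label] - current_address
```

A label that appears earlier in the program than the instruction using it yields a negative offset, which is what a backward branch needs.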

Add new instruction types

You need to implement each of these methods:

  1. Assemble_U_Type
  2. Assemble_SB_Type
  3. Assemble_UJ_Type

After you finish each of these methods (or pause working) you must commit and push your repo with a meaningful commit message. As you debug and fix errors you should consider doing more commits.

SB and UJ types can have two different styles of target operands: a byte offset or a label. It is recommended that you start with the byte offset variant and get that working before implementing label operands for these instructions.

Consider using the helper method is_int() and the helpers you wrote above to help with the operands.

Here are some suggestions.

Byte-Offset Immediate Operands

In this situation, the byte offset is directly written into the instruction (e.g., beq x5, x6, 40). In this case, you can use this offset to directly construct the immediate. Be sure to review PC-relative addressing mode and double check the value you should put into the instruction's immediate bits. Remember, our calculations are in byte offsets, but the immediates for SB and UJ types are not in byte units.

For instance, beq x5, x6, 40 should assemble into:

        0 000001 00110 00101 000 0100 0 1100011

Notice that if we stitch the immediate together correctly we get 000000010100 = 20 half-words or 40 bytes. Double check your byte offset to half-word conversions when implementing this.
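The bit shuffling for the example above can be sketched as follows. The function name encode_sb_sketch and its numeric-register arguments are hypothetical (your Assemble_SB_Type works from the instruction text and the provided helpers); the field order follows the standard SB layout imm[12] | imm[10:5] | rs2 | rs1 | funct3 | imm[4:1] | imm[11] | opcode.

```python
def encode_sb_sketch(rs1, rs2, funct3, byte_offset, opcode=0b1100011):
    """Return the 32-bit SB-type encoding as a string of '0'/'1' characters."""
    if byte_offset % 2 != 0:
        raise ValueError("branch offsets must be a multiple of 2 bytes")
    if not -4096 <= byte_offset <= 4094:
        raise ValueError("branch offset does not fit in 13 bits")
    # Two's-complement 13-bit immediate; imm[0] is always 0 and is not stored,
    # which is the same as storing the offset in half-words.
    imm = byte_offset & 0x1FFF
    bits = format(imm, "013b")  # bits[0] is imm[12], bits[12] is imm[0]

    def imm_bits(high, low):
        # Slice imm[high:low] (inclusive) out of the MSB-first string.
        return bits[12 - high: 13 - low]

    return (
        imm_bits(12, 12) + imm_bits(10, 5)
        + format(rs2, "05b") + format(rs1, "05b") + format(funct3, "03b")
        + imm_bits(4, 1) + imm_bits(11, 11)
        + format(opcode, "07b")
    )
```

Calling encode_sb_sketch(5, 6, 0, 40) reproduces the beq x5, x6, 40 encoding shown above once the spaces are removed.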

Your assembler only needs to support decimal immediates; assume all numbers passed as operands to an instruction are in decimal. As you work through the test cases you may want to consider the binary or hex representation of the numbers used in the tests.

Label Operands

In the case where the operand provided is a label and not a byte offset, you will have to figure out how far the label is from the current instruction (this distance is the offset). Once you've computed that, you can use the same logic you implemented for byte-offset operands to encode the immediate field.

2 Support Pseudoinstructions

Your assembler needs to support a few pseudoinstructions. The behavior of individual pseudoinstructions is defined in pseudoinstruction_handler.py. Note that you can see the list of methods and their docs for this file by opening docs/pseudoinstruction_handler.html in your repo.

Implement individual pseudoinstruction methods

In pseudoinstruction_handler.py you will see these methods you need to implement:

method   | sample pseudoinstruction     | effect
double   | double r1, r2                | Reg[r1] = 2 * r2
diffsums | diffsums r1, r2, r3, r4, r5  | Reg[r1] = (r2 + r3) - (r4 + r5)
push     | push r1                      | sp = sp-4; Mem[sp] = r1
li       | li rd, imm                   | rd = imm
beqz     | beqz r1, LABEL               | if(r1 == 0) PC = LABEL
jalif    | jalif r1, r2, LABEL          | if(r1 == r2){ra = PC+4; PC = LABEL}

You should push at least one commit to your repo that contains the implementation of all of these pseudoinstructions.

How these pseudoinstruction methods work

The behavior of each of these pseudoinstructions is defined in the code comments. Each of these methods takes two arguments, so a method might be called like this:

double("double t5, s0", 7)

The first argument is the actual use of the pseudoinstruction. The second argument is the line/instruction number in the assembled program where this pseudoinstruction starts. For most of these methods this is just for error output, but some of them will need to use this argument in other ways.

This function should return a list of new core instructions that will have the same behavior as the pseudoinstruction.

Recall that pseudoinstructions should not change other registers beyond those implied by the instruction definition, with the exception of at (aka x31) which can be modified freely. Also, recall that any register could be used as any register operand in a pseudoinstruction. Additionally, think carefully about the size of immediates supported by each pseudoinstruction.
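As an illustration of the expected shape of these methods, here is a hedged sketch of one possible expansion for double. The name double_sketch and its parsing code are hypothetical (the real method may use the provided helpers), and other expansions, such as a shift, would be just as valid.

```python
def double_sketch(line, line_num):
    """Expand 'double r1, r2' into core instructions computing r1 = 2 * r2."""
    mnemonic, _, operand_text = line.strip().partition(" ")
    operands = [op.strip() for op in operand_text.split(",")]
    if mnemonic != "double" or len(operands) != 2 or not all(operands):
        # line_num is used here only to make the error message useful.
        raise ValueError(f"line {line_num}: malformed double: {line!r}")
    r1, r2 = operands
    # add reads r2 before writing r1, so this is safe even when r1 == r2,
    # and it needs no scratch register.
    return [f"add {r1}, {r2}, {r2}"]
```

With the call shown earlier, double_sketch("double t5, s0", 7) returns a one-element list containing add t5, s0, s0.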

The test cases for the pseudoinstructions do not directly test if your implementation of the instruction works. Instead they test general rules about the pseudoinstructions. It will be your job to explain how you know these pseudoinstructions behave correctly in the practical worksheet. You will need to run the code produced by your pseudoinstructions in a RISC-V simulator. You could use this online one, or jump to Practical 3 and install the one we use there.

Advice

Once you've implemented all of the pseudoinstruction handlers, tests in the TestPseudos unit test category should pass, except for the one called test_pseudoinstructions_pass.

Implement the pseudoinstruction pass

Look at the main assemble_asm() method. This function performs all the steps of the assembler. Notice that after removing comments, the next step is the processing of pseudoinstructions. For now, the assembler assumes there are no pseudoinstructions. Go look at the definition of pseudoinstruction_pass().

You need to implement this function. The big picture is this: raw code comes in that may contain pseudoinstructions, and this method should return a list of core instructions (and labels) where the pseudoinstructions have been replaced. You need to look at each line of code and determine if it is a pseudoinstruction. If it is, call the correct pseudoinstruction-replacement method (which you wrote above); otherwise, simply leave the line unchanged. The pseudoinstruction methods are passed in as the second argument to pseudoinstruction_pass(), so to apply the double method I could do this:

new_code = pseudos_dictionary["double"](my_line, inst_num)

Note that pseudos_dictionary allows you to quickly look up your implementation from pseudoinstruction_handler.py, and then call the corresponding function.

new_code will be a list of the new instructions that I can add to my growing program.

You should look at the other pass methods that are implemented for you if you need help starting. Keep in mind that one pseudoinstruction can become more than one core instruction. This will affect the line number/address of each instruction following a pseudoinstruction in the original code.

You should consider the different "cases" you may hit as this method goes over each line of code in a file:

Work on these one at a time, and make sure that your pseudoinstruction pass returns the correct number of instructions, with labels pointing to the right instructions.
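One possible breakdown of the cases is a plain core instruction, a pseudoinstruction, a label on its own line, and a label followed by either kind of instruction. The sketch below handles those four under stated assumptions: the name pseudoinstruction_pass_sketch is hypothetical, labels are detected by looking for ":" rather than with the provided helpers, every handler is assumed to return a non-empty list, and inst_num is assumed to count core instructions already emitted.

```python
def pseudoinstruction_pass_sketch(asm_list, pseudos_dictionary):
    """Replace pseudoinstructions with core instructions, keeping labels."""
    new_code = []
    inst_num = 0  # index of the next core instruction in the output
    for line in asm_list:
        label, sep, rest = line.partition(":")
        if not sep:
            label, rest = "", line
        rest = rest.strip()
        if not rest:
            # Label-only line: keep it, it emits no instruction.
            new_code.append(line)
            continue
        mnemonic = rest.split()[0]
        if mnemonic in pseudos_dictionary:
            expansion = pseudos_dictionary[mnemonic](rest, inst_num)
        else:
            expansion = [rest]
        if sep:
            # Reattach the label to the first instruction of the expansion.
            first = f"{label.strip()}: {expansion[0]}"
            expansion = [first] + list(expansion[1:])
        new_code.extend(expansion)
        inst_num += len(expansion)
    return new_code
```

Because inst_num advances by the length of each expansion, a handler that needs its own position in the program (for example, to compute a PC-relative offset) sees the position in the expanded code rather than in the original source.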

The tests for this method once again do not test the exact instructions you return, since implementations of pseudoinstructions can vary. Instead, they test general patterns about what the code should look like. You will need to explain how you know your code is correct on the practical worksheet.

By the time you commit this work to your repo you should have at least 6 commits with meaningful messages, if not more.

Submission and Grading

Functional Requirements

At the end of this practical, you should:

Git Requirements

In addition to the list below, you should regularly commit and push whenever you fix a bug. This is a minimum set of commits you MUST have in your repo for this practical:

You must include your name in a comment at the top of all files you submit. If you didn't do this, go add your names and push another commit. See This info for commit instructions.

Be sure to copy your final commit ID for the final question of the worksheet. This ID can be found on the commit history tab of your GitHub repository web page.

Having problems with GitHub authentication? Go back to Practical 1's GitHub setup section for tips.

Worksheet Requirement

All the practicals for CSSE232 have a worksheet that includes these general requirements to explain:

  1. The solution fits the need
  2. Aspects of performance are discussed
  3. The solution is tested for correctness
  4. The submission shows iteration and documentation

Some practicals will hit some of these requirements more than others. But you should always be thinking about them.

Complete the worksheet. Some guidelines:

Final Checklist

Grading Breakdown

Practical 2 Rubric items | Possible Points | Weight
Worksheet                | 86              | 50%
Code                     | 70              | 50%
Total out of             |                 | 100%