Practical 2: RISC-V Assembler II

Objectives

Following completion of this practical you should be able to:

Assemble RISC-V SB, U, and UJ types.
Understand how the design of instruction types influences and limits their behavior, especially when it comes to immediates.
Explain the relationship between pseudoinstructions and core instructions.
Discuss the benefits and drawbacks of different types of addressing.

Guidelines

This practical should be completed by each student, but you will work in collaboration with a partner assigned by your instructor.
Read the practical instructions completely before beginning.
Don't hesitate to ask for help.
Write both your name and your partner's name on the worksheet and in all code files you edit.
Commit your changes to git often to save your work.

Time Estimate

This practical is estimated to take about 3-5 hours per student (so 6-10 hours total) on average. This estimate varies depending on your familiarity with Python, the course material, and how well you work with your partner.

Preliminary Tasks

Obtain the worksheet.

Review your submission for Practical 1, and familiarize yourself with the utility functions provided in assembler.py.

Take a look at the practical worksheet before you start writing code for this practical. The methods you need to implement are flagged with a TODO: Practical 2 throughout the files.

1 Support the Remaining Instruction types

You need to extend your assembler from Practical 1 to support the missing instruction types listed below. The hints and tips on the Practical 1 page are worth reviewing again before you start this practical.

Tests for this practical are in advanced_assembler_test.py.

As you implement support for the new instruction types, be sure to run that type's corresponding tests in the testing suite.

Implement helper methods

First, start by implementing these helper methods:

Assemble
index_to_address
parse_labels
label_to_offset

You should push at least one commit to your repo that contains the implementation of all of these helpers.

You may find it helpful to use the has_label and split_out_label helper functions provided.

Read the comments inside each of the functions to learn how it is supposed to behave. Here are some hints for implementing these helpers:

`Assemble`

Add to this as you go to support the new instruction types. The other helpers here will be useful as you write the other methods. You should fully implement any other unimplemented helpers. Think of this as a warm-up: there are no explicit test cases for these, so you'll need to figure out how to verify they are working as you go. (This is a good time to review the general requirements for practicals and consider how you will meet them.) You may need to come back and change the behavior of these helpers as you move further into the practical.

`index_to_address`

For this, you need to think about the starting address of a program in memory. Note that in RISC-V the PC does not start at 0x0000 0000; it has a different starting address. I wonder if there is some sort of reference document that might tell you that address? If there was, I bet it would be green.
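The index-to-address arithmetic can be sketched as follows. This is only an illustration: the function name, the `base_address` parameter, and the value `0x1000` in the usage note are made up for the example and are not the real starting address or the signature used in assembler.py. The only fact relied on is that each RISC-V instruction handled here occupies 4 bytes.

```python
BYTES_PER_INSTRUCTION = 4  # every instruction in this assembler is 32 bits wide


def index_to_address_sketch(index, base_address):
    """Map a 0-based instruction index to a byte address.

    base_address is a stand-in for the program's real starting address,
    which you still need to look up yourself.
    """
    if index < 0:
        raise ValueError(f"instruction index must be non-negative, got {index}")
    return base_address + index * BYTES_PER_INSTRUCTION
```

With a placeholder base of `0x1000`, instruction index 3 would land at `0x100C`.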

`parse_labels`

There are three cases you need to consider when processing the argument asm_list:

An instruction (e.g., add x1, x2, x3),
A label with an instruction (e.g., L: sub x4, x5, x6),
A label alone (e.g., D:); the associated instruction is on the subsequent line.

Your goal in this function is to handle each of these three cases and return two items:

A tuple containing only the instructions (no labels). This will serve as clean_code for you to iterate through and assemble,
A dictionary mapping labels to addresses in the instruction list. This will help you calculate the byte offset to be used for an instruction’s immediate (i.e., branches and jal).

At least one exception will be thrown in this helper function. Be on the lookout for these cases!
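One possible shape for handling the three cases is sketched below. It is not the provided implementation: the function name, the `base_address` parameter (defaulting to a placeholder of 0, not the real start address), and the choice to raise `ValueError` on a duplicate label are all assumptions made for illustration. It also splits on `:` directly rather than calling has_label or split_out_label, whose exact signatures are not shown here.

```python
def parse_labels_sketch(asm_list, base_address=0):
    """Separate labels from instructions.

    Returns (clean_code, labels): a tuple of label-free instructions and a
    dict mapping each label to the byte address of its instruction.
    base_address=0 is a placeholder, not the real program start.
    """
    clean_code = []
    labels = {}
    for line in asm_list:
        line = line.strip()
        if not line:
            continue
        if ":" in line:
            label, _, rest = line.partition(":")
            label = label.strip()
            if label in labels:
                # Assumed error case: the same label defined twice.
                raise ValueError(f"duplicate label: {label}")
            # The label names the next instruction to be appended.
            labels[label] = base_address + 4 * len(clean_code)
            line = rest.strip()
            if not line:
                continue  # label-only line; its instruction comes next
        clean_code.append(line)
    return tuple(clean_code), labels
```

For example, the input `["add x1, x2, x3", "L: sub x4, x5, x6", "D:", "and x7, x8, x9"]` yields three clean instructions, with `L` bound to the second and `D` bound to the third.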

`label_to_offset`

Given an instruction that uses a label (e.g., beq x5, x6, FOO, jal x0, BAR), this function will calculate the correct byte offset value to be used for the immediate for the instruction. Note that the dictionary of labels passed in as the first argument labels maps labels to instruction addresses, while the index passed in as the third argument instruction_index is a numerical index. You will need to handle this mismatched “data type” to calculate the correct byte offset.

One of the previous helper functions in this section will be of great use in implementing this function. If the previous helper functions are implemented and you plan well, the implementation of this helper function can be as short as a single line.
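The mismatch between label addresses and instruction indices can be illustrated with a small sketch. The name, the parameter order, and the `base_address` placeholder (0 here, not the real start address) are assumptions for the example; the real helper should follow the comments in assembler.py.

```python
def label_to_offset_sketch(labels, label, instruction_index, base_address=0):
    """Byte offset from the current instruction to a label.

    labels maps label names to byte addresses, while instruction_index is a
    0-based index, so the index is converted to an address first.
    base_address=0 is a placeholder, not the real program start.
    """
    current_address = base_address + 4 * instruction_index
    return labels[label] - current_address
```

A label behind the current instruction gives a negative offset: with `LOOP` at address 0 and the branch at index 3, the offset is -12 bytes.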

Add new instruction types

You need to implement each of these methods:

Assemble_U_Type
Assemble_SB_Type
Assemble_UJ_Type

After you finish each of these methods (or pause working) you must commit and push your repo with a meaningful commit message. As you debug and fix errors you should consider doing more commits.

SB and UJ types can have two different styles of target operands: a byte offset or a label. It is recommended that you start with the byte offset variant and get that working before implementing label operands for these instructions.

Consider using the helper method is_int() and the helpers you wrote above to help with the operands.

Here are some suggestions.

Byte-Offset Immediate Operands

In this situation, the byte offset is written directly into the instruction (e.g., beq x5, x6, 40). In this case, you can use this offset to construct the immediate directly. Be sure to review PC-relative addressing mode and double check the value you should put into the instruction's immediate bits. Remember, our calculations are in byte offsets, but the immediates for SB and UJ types are not in byte units.

For instance, beq x5, x6, 40 should assemble into:

        0 000001 00110 00101 000 0100 0 1100011

Notice that if we stitch the immediate together correctly we get 000000010100 = 20 half-words or 40 bytes. Double check your byte offset to half-word conversions when implementing this.
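The worked example above can be checked mechanically. The sketch below follows the standard SB-type field order (imm[12], imm[10:5], rs2, rs1, funct3, imm[4:1], imm[11], opcode); the function names and the space-separated output format are made up for this illustration and are not part of assembler.py.

```python
def to_twos_complement(value, bits):
    """Return value as a bits-wide two's-complement binary string."""
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def encode_sb_sketch(rs1, rs2, funct3, byte_offset):
    """Encode an SB-type instruction as space-separated binary fields."""
    if byte_offset % 2 != 0:
        raise ValueError("branch offsets must be a multiple of 2 bytes")
    if not -4096 <= byte_offset <= 4094:
        raise ValueError("offset does not fit in the 13-bit SB range")
    # 13-bit string imm[12:0]; bit i sits at string index 12 - i.
    # Bit 0 is always 0 and is never stored in the instruction.
    imm = to_twos_complement(byte_offset, 13)
    imm_12, imm_11 = imm[0], imm[1]
    imm_10_5, imm_4_1 = imm[2:8], imm[8:12]
    opcode = "1100011"  # conditional branch opcode
    return " ".join([
        imm_12, imm_10_5,
        format(rs2, "05b"), format(rs1, "05b"),
        funct3, imm_4_1, imm_11, opcode,
    ])


# beq x5, x6, 40 (beq has funct3 = 000)
print(encode_sb_sketch(5, 6, "000", 40))
```

Dropping bit 0 of the byte offset is what turns 40 bytes into 20 half-words in the stored immediate.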

Your assembler only needs to support decimal immediates; assume all numbers passed as operands to an instruction are in decimal. As you work through the test cases you may want to consider the binary or hex representation of the numbers used in the tests.

Label Operands

In the case where the operand provided is a label and not a byte offset, you will have to figure out how far the label is from the current instruction (this is an offset). Once you've computed that, you can use the same logic you implemented for byte-offset operands to encode the immediate field.
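A minimal sketch of handling both operand styles in one place is shown below. Everything here is hypothetical: the function name, its parameters, and the `base_address` placeholder of 0 are chosen for illustration, and it uses Python's built-in `int()` rather than the course's is_int() helper, whose exact behavior is not shown here.

```python
def resolve_target_sketch(operand, labels, instruction_index, base_address=0):
    """Return the byte offset for a branch or jump target operand.

    operand is either a decimal byte offset (e.g. "40") or a label name.
    base_address=0 is a placeholder, not the real program start.
    """
    operand = operand.strip()
    try:
        return int(operand)  # already a byte offset
    except ValueError:
        pass
    if operand not in labels:
        raise KeyError(f"undefined label: {operand}")
    current_address = base_address + 4 * instruction_index
    return labels[operand] - current_address
```

Either way the result is a byte offset, so the same immediate-encoding code can consume it.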

2 Support Pseudoinstructions

Your assembler needs to support a few pseudoinstructions. The behavior of individual pseudoinstructions is defined in pseudoinstruction_handler.py. Note that you can see the list of methods and their docs for this file by opening docs/pseudoinstruction_handler.html in your repo.

Implement individual pseudoinstruction methods

In pseudoinstruction_handler.py you will see these methods you need to implement:

method	sample pseudoinstruction	effect
`double`	`double r1, r2`	`Reg[r1] = r2 + r2`
`diffsums`	`diffsums r1, r2, r3, r4, r5`	`Reg[r1] = (r2 + r3) - (r4 + r5)`
`push`	`push r1`	`sp = sp-4; Mem[sp] = r1`
`li`	`li rd, imm`	`rd = imm`
`beqz`	`beqz r1, LABEL`	`if(r1 == 0) PC = LABEL`
`jalif`	`jalif r1, r2, LABEL`	`if(r1 == r2){ra = PC+4; PC = LABEL}`

You should push at least one commit to your repo that contains the implementation of all of these pseudoinstructions.

How these pseudoinstruction methods work

The behavior of each of these pseudoinstructions is defined in the code comments. For each of these methods two arguments are given, the method might be called like this:

double("double t5, s0", 7)

The first argument is the actual use of the pseudoinstruction; the second argument is the line/instruction number in the assembled program where this pseudoinstruction starts. For most of these methods this is just for error output, but some of them will need to use this argument in other ways.

This function should return a list of new core instructions that will have the same behavior as the pseudoinstruction.
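To make the input and output shapes concrete, here is one way a handler for `double` might look. It is a sketch under assumptions: the function name, the parsing approach, and the error type are illustrative, and your own expansion may legitimately differ as long as the behavior matches the definition in the code comments.

```python
def double_sketch(line, line_num):
    """Expand 'double r1, r2' into core instructions.

    line_num is used here only to make the error message more helpful.
    """
    mnemonic, _, operand_text = line.strip().partition(" ")
    operands = [op.strip() for op in operand_text.split(",")]
    if mnemonic != "double" or len(operands) != 2 or not all(operands):
        raise ValueError(f"line {line_num}: malformed double: {line!r}")
    rd, rs = operands
    # r1 = r2 + r2 needs no temporary, so no other register is disturbed,
    # even when rd and rs name the same register.
    return [f"add {rd}, {rs}, {rs}"]
```

Calling `double_sketch("double t5, s0", 7)` returns a one-element list containing `add t5, s0, s0`.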

Recall that pseudoinstructions should not change other registers beyond those implied by the instruction definition, with the exception of at (aka x31) which can be modified freely. Also, recall that any register could be used as any register operand in a pseudoinstruction. Additionally, think carefully about the size of immediates supported by each pseudoinstruction.
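When thinking about immediate sizes, a small range check can help. The helper below is hypothetical (it is not one of the provided utilities) and simply tests whether a value fits in a two's-complement field of a given width; for example, a 12-bit signed field holds -2048 through 2047.

```python
def fits_signed(value, bits):
    """Return True if value fits in a bits-wide two's-complement field."""
    low = -(1 << (bits - 1))
    high = (1 << (bits - 1)) - 1
    return low <= value <= high
```

An immediate that fails this check for the field you intended to use cannot be placed in a single instruction of that type, so the pseudoinstruction has to expand differently.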

The test cases for the pseudoinstructions do not directly test if your implementation of the instruction works. Instead they test general rules about the pseudoinstructions. It will be your job to explain how you know these pseudoinstructions behave correctly in the practical worksheet. You will need to run the code produced by your pseudoinstructions in a RISC-V simulator. You could use this online one, or jump to Practical 3 and install the one we use there.

Advice

You may want to consider using the helpers: replace_all(), assembler.reverse(), assembler.is_int(), assembler.dec_to_bin(), assembler.index_to_address(), assembler.label_to_offset()
IMPORTANT: Implement push last, this pseudoinstruction requires knowledge from the "procedures" lectures that haven't come yet. But you can read ahead if you want to get it started now!

Once you've implemented all of the pseudoinstruction handlers, tests in the TestPseudos unit test category should pass, except for the one called test_pseudoinstructions_pass.

Implement the pseudoinstruction pass

Look at the main assemble_asm() method. This function does all the steps of the assembler. Notice that after removing comments, the next step is the processing of pseudoinstructions. For now, the assembler assumes there are no pseudoinstructions. Go look at the definition of pseudoinstruction_pass().

You need to implement this function. The big picture is this: raw code comes in that may contain pseudoinstructions, and this method should return a list of core instructions (and labels) in which the pseudoinstructions have been replaced. Look at each line of code and determine whether it is a pseudoinstruction. If it is, call the correct pseudoinstruction-replacement method (which you wrote above); otherwise, leave the line unchanged. The pseudoinstruction methods are passed in as the second argument to pseudoinstruction_pass(), so to apply the double method I could do this:

new_code = pseudos_dictionary["double"](my_line, inst_num)

Note that pseudos_dictionary allows you to quickly look up your implementation from pseudoinstruction_handler.py, and then call the corresponding function.

new_code will be a list of the new instructions that I can add to my growing program.

You should look at the other pass methods that are implemented for you if you need help starting. Keep in mind that one pseudoinstruction can become more than one core instruction; this will affect the line number/address of each instruction following a pseudoinstruction in the original code.

You should consider the different "cases" you may hit as this method goes over each line of code in a file:

case 1: a core instruction (e.g., add x1, x2, x3)
case 2: a label (e.g. LABEL:), the associated instruction will be on the next line
case 3: a label and another instruction (e.g. LABEL: add t0, t0, t0)
case 4: a pseudoinstruction (e.g., double x8, x9)
case 5: an unknown instruction (e.g., relprime 1729)

Work on these one at a time. Make sure that your pseudoinstruction pass returns the correct number of instructions and that labels point to the right instructions.
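The five cases can be sketched as a single loop. This is an illustration under assumptions, not the required implementation: the function name and the `core_mnemonics` parameter are invented for the example (the real method receives the handler dictionary as its second argument and may recognize core instructions differently), and raising `ValueError` for an unknown instruction is one plausible choice.

```python
def pseudoinstruction_pass_sketch(code, pseudos, core_mnemonics):
    """Replace pseudoinstructions with core instructions, keeping labels.

    pseudos maps a mnemonic to a handler(line, inst_num) that returns a
    non-empty list of core instructions. core_mnemonics is an assumed set
    of known core instruction names.
    """
    output = []
    inst_num = 0  # index of the next core instruction to be emitted
    for line in code:
        label, colon, rest = line.partition(":")
        if colon:
            prefix, body = label.strip() + ":", rest.strip()
        else:
            prefix, body = "", line.strip()
        if not body:
            output.append(prefix)  # case 2: label alone
            continue
        mnemonic = body.split()[0]
        if mnemonic in pseudos:  # case 4
            expanded = pseudos[mnemonic](body, inst_num)
        elif mnemonic in core_mnemonics:  # case 1
            expanded = [body]
        else:  # case 5
            raise ValueError(f"unknown instruction: {body!r}")
        if prefix:  # case 3: keep the label on the first emitted line
            expanded = [f"{prefix} {expanded[0]}"] + list(expanded[1:])
        output.extend(expanded)
        inst_num += len(expanded)  # expansions shift later instructions
    return output
```

Tracking `inst_num` by the number of emitted instructions, rather than by input line, is what keeps later handlers and labels aligned after a pseudoinstruction expands into several lines.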

The tests for this method once again do not test the exact instructions you return, since implementations of pseudoinstructions can be variable. Instead, it tests general patterns about what the code should look like. You will need to explain how you know your code is correct on the practical worksheet.

By the time you commit this work to your repo you should have at least 6 commits with meaningful messages, if not more.

Once you've implemented pseudoinstruction_pass (and have working pseudoinstruction handlers), all tests in the TestPseudosFileAssembly and TestPseudos unit test categories should pass.

Submission and Grading

Functional Requirements

At the end of this practical, you should:

Pass all the tests in the base_assembler_test.py file.
Pass all the tests in the advanced_assembler_test.py file.
- All in TestUType
- All in TestUJType
- All in TestSBType
- All in TestLabels
- All in TestPseudos
- All in TestPseudosFileAssembly
Have written test programs to verify the validity of your assembler
- UType.asm - to substantially check variants of the U type instructions
- SBType.asm - to substantially check variants of the SB type instructions
- UJType.asm - to substantially check variants of the UJ type instructions
- Pseudos.asm - to substantially check various pseudoinstructions
Complete the Practical Worksheet

Git Requirements

In addition to the list below, you should regularly commit and push whenever you fix a bug. This is a minimum set of commits you MUST have in your repo for this practical:

Git commit 1: completed Assemble_U_Type
Git commit 2: completed index_to_address
Git commit 3: completed parse_labels
Git commit 4: completed label_to_offset
Git commit 5: completed Assemble_SB_Type
Git commit 6: completed Assemble_UJ_Type
Git commit 7: completed pseudoinstruction_pass

You must include your name in a comment at the top of all files you submit. If you didn't do this, go add your names and push another commit. See This info for commit instructions.

Be sure to copy your final Commit ID number for the final question of the worksheet. This ID number can be found on the commit history tab of your Github repository web page.

Having problems with github authentication? Go back to Practical 1's Github setup section for tips.

Worksheet Requirement

All the practicals for CSSE232 have a worksheet that includes these general requirements to explain:

The solution fits the need
Aspects of performance are discussed
The solution is tested for correctness
The submission shows iteration and documentation

Some practicals will hit some of these requirements more than others. But you should always be thinking about them.

Complete the worksheet. Some guidelines:

These answers should be short and to the point, usually no more than 2 or 3 sentences.
You will upload this sheet to gradescope. Make sure you indicate your partner when you upload.

Final Checklist

Verify your git commits are pushed to github
(Check by viewing your repo on the github web site).
Submit your completed worksheet to gradescope.
Only 1 per team (make sure all team members' names are included). In gradescope add your team members' names to the submission.
Upload your code to gradescope.
Upload only assembler.py and pseudoinstruction_handler.py to the gradescope assignment for the code part of this practical. Drag multiple files onto one submission. Verify that the autograder runs on your code as you expect.

Grading Breakdown

Practical 2 Rubric items	Possible Points	Weight
Worksheet	86	50%
Code	70	50%
Total out of		100%