Practical 3: RISC-V Coding and Procedures

Objectives

Following completion of this practical you should be able to:

Write loops in RISC-V programs.
Write RISC-V programs that follow the RISC-V procedure calling conventions.
Understand the limitations of beq/bne/blt/bge, jal, and jalr.
Properly access memory in RISC-V programs.
Understand some of the issues surrounding register allocation.

Guidelines

This practical should be completed individually by each student.
Read the practical instructions completely before beginning.
Don't hesitate to ask for help.
Upload your worksheet to Gradescope upon completion.
Verify you've committed the final working code to your GitHub repository.

Time Estimate

This practical is estimated to take about 6-8 hours per student, varying based on your depth of understanding of RISC-V coding and procedure calling conventions. Even if you do not have a full grasp of the concepts, this practical is designed to elevate your understanding upon completion.

Preliminary Tasks

This practical is done alone: each student is expected to write their own code and demonstrate an understanding of assembly programming.

Obtain the worksheet.

1. Install Java and RARS

RARS is a Java program and should work on any machine that supports a Java Virtual Machine. If you need Java, you can get it here. Download RARS from here or from the main RARS webpage: https://github.com/TheThirdOne/rars/releases.

On Windows and macOS you should be able to double-click the .jar file.

Sometimes you need to launch it from a command line. For example, from a Linux command line:

> java -jar rars1_6.jar

Note: some macOS machines will not allow normal file navigation unless you launch the jar from the command line using the Linux instructions above.

Some help and info pages are available on the RARS GitHub page.

2. Clone your practical repo

Follow this link to get your new repository for this practical. Follow the instructions in Practical 1 if you have trouble getting access or setting up authentication.

1 Using RARS

Navigation

Launch RARS. A window with a title similar to "RARS 1.6" should appear, with three panes.

The right-most pane shows the contents of the registers. At the bottom of the register list is the pc register, which tracks the address of the current instruction. The tabs at the top of the pane allow you to look at control and status registers. If an error occurs in your code, the simulator might automatically show you the values of the "control and status" registers. We will not use these for a while, so you should switch back to the general registers for now.
The bottom panel is the output console. The 'Messages' tab displays messages from the simulator. The 'Run I/O' tab shows any messages generated from the running program.
The large upper-left pane shows the contents of the currently loaded file. It should be blank when you start up RARS. There are two tabs at the top of the pane.
1. The Edit tab allows you to edit the currently loaded assembly file.
2. The Execute tab shows the assembled code and processor state as the code runs.
One setting needs to be changed: under Settings, check Assemble all files in the directory.

Running a program

There should be a p1 directory in your repository. Inside that directory there should be a p1.asm file.
Open it in RARS by going to File > Open and browsing to your repository's p1 folder. The file's contents should appear in the editor panel. Look over the file in your Edit panel.

The `.globl` directive and the `.data` and `.text` segments

The .globl directive declares which labels are global, that is, visible from other files. This file simply declares main as global, which allows us to refer to the label main: in this file from other files. In later problems, the names of arrays and constants are declared here as well.

The .data portion is not used for p1; however, it will be used for later problems. For each array/constant declared with .globl, we will define its contents in the .data segment. We will see this in p2 and onwards.

The .text part defines the actual assembly code of the program. Labels can be added arbitrarily in this segment and you can use registers by number or by name.

Assembling and Executing

Click the Assemble button or go to Run > Assemble to assemble the file.

The assembler will translate the assembly instructions into machine code ready for execution.

RARS will automatically switch to the Execute tab. The execute tab shows two views of memory:
1. The text segment window looks messy, but each instruction has five columns of data. The first column is a checkbox that allows us to set a breakpoint. The second is the address of the instruction. The third item is the hexadecimal representation of the machine instruction. The fourth item is (basically) the assembly language representation of the machine instruction. The fifth item (if it's present) is the actual line of code (and line number) from the source file that generated the machine instruction shown in the "Basic" column.
2. The data segment window shows the contents of memory, including the stack contents. You can jump to different regions in memory using the drop-down selector. For now, we'll only be concerned with the (.data) region.
The bottom of this panel is a dropdown menu that allows you to switch what portion of memory you want to look at. The .data, current sp, and .text portions are the most important for us. Select the .text option to pull up the contents of the instruction memory.

Look at the text segment tab and the data view when the text segment is chosen from that drop-down menu. Notice that instructions start at address 0x00400000. Also notice that the PC (program counter) register has the value 0x00400000.

Take a look at how memory is being displayed. It runs as a table that reads left to right in 4-byte (32-bit) words. The addresses in the first column are the "starting" address for the bytes in the row, and the column headers show the offset for each word from that starting address.
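
The address of any word in the table is the row's starting address plus the column's offset. The following Python sketch works through that arithmetic. It assumes eight 4-byte words per row (adjust `WORDS_PER_ROW` if your RARS window shows a different layout), and the names `word_address` and `locate` are illustrative only.

```python
WORD_BYTES = 4       # one table cell holds a 32-bit word
WORDS_PER_ROW = 8    # assumed number of word columns per row


def word_address(row_address: int, column: int) -> int:
    """Address of the word in the given column (0-based) of a row."""
    return row_address + column * WORD_BYTES


def locate(address: int) -> tuple:
    """Return (row starting address, byte offset within the row) for an address."""
    row_bytes = WORD_BYTES * WORDS_PER_ROW
    offset = address % row_bytes
    return address - offset, offset


print(hex(word_address(0x10010000, 3)))   # 0x1001000c
row, offset = locate(0x10010024)
print(hex(row), hex(offset))              # 0x10010020 0x4
```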

Orient yourself with this view so you can identify the address of various bytes displayed.
Once the program is assembled, the ori x2, x0, 0x00000028 instruction will be highlighted.
Notice the value of the PC.
Notice that register 0 currently contains zero and register 2 currently contains a big number.
Click Run > Step or use the Step button to run the currently highlighted line. Notice that the contents of register 2 have changed, and that the new value is shown in hexadecimal form. Finally, notice the value of the PC.
Click Run > Reset or use the reset button. The register and text segment contents will return to the state prior to you running the program. This allows you to rerun the program easily.
Practice editing, navigating through the code, and executing it until you are comfortable using RARS.
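
Because RARS shows register contents in hexadecimal by default, it can help to convert a displayed word to the signed decimal value it represents. The sketch below is a small Python helper for that conversion. The function name `to_signed32` is made up for this illustration and is not part of RARS.

```python
def to_signed32(word: int) -> int:
    """Interpret a 32-bit register value as a two's-complement signed integer."""
    word &= 0xFFFFFFFF        # keep only the low 32 bits
    if word & 0x80000000:     # sign bit set, so the value is negative
        return word - (1 << 32)
    return word


# 0x00000028 is the immediate used by the ori instruction in p1.
print(to_signed32(0x00000028))   # 40
print(to_signed32(0xFFFFFFFF))   # -1
print(to_signed32(0x80000000))   # -2147483648
```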

2 Summing to N (p2)

In writing these and other assembly language programs you should follow this process:

Solve the problem before coding the solution. Usually this means writing the code in a high-level language or pseudo-code first, then converting it to assembly language.
IMPORTANT: Write down (perhaps in a comment in your code) the purpose that you have in mind for each register that you use. Here's an example of how you might document register use in your code.
```
  # x3 = i
  # x4 = j
  # x5 = n
  # x6 = key
  # x7 = ptr (address of array)
  # x8 = test
  # x9 = temp
```
These programs involve many loops, so it will be helpful to set breakpoints. Breakpoints mark positions in the code where execution should pause. After assembling a program, click the Bkpt checkbox to set a breakpoint at that line. Execution will stop before that line is run and you can observe the state of the program. Note: All breakpoints are deleted when you assemble the program!
IMPORTANT: Do not use registers 1 and 2 for this section of the practical.
Your solutions to these practicals should not use any pseudoinstructions. The only exception is the li pseudoinstruction.

Items below marked with a Q are to be answered on the practical worksheet.

Starting Code

Open the p2/p2.asm file in RARS. The goal of this program is to sum the integers between 0 and $N$.

The first portion of p2 is a chunk of comments. Read this carefully as it lays out a lot of defined uses of registers and expected behaviors. Make sure you regularly reference this documentation to ensure you follow its conventions.

x5 has several uses throughout the program. These three uses are summarized in the comments at the top of p2. Each of these three uses corresponds to specific lines in the program; identify these lines so you are clear about x5’s different uses before continuing. (Q)
To help us orient where PC will be jumping to, identify where the labels main, loop, and done are located in the .text portion of memory. Additionally, identify where the constants N and Sum are in the .data segment of memory. To ease this sleuthing process, bring up the Labels panel via Settings -> Show Labels Window (symbol table).

(Q) Where are main, loop, done, N and Sum located (i.e. what address)? Hint: Assemble and use Settings > Show Labels Window in the Execute view.
Before you run the program, calculate the number of instructions that will be executed within p2 and the final value of Sum. Only count the instructions that run between the test setup and teardown (the jal and the jalr).
Execute the program and verify that the program sums the values between 0 and N correctly.
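
Before running p2, the expected value of Sum can be worked out independently in a high-level language. Here is a minimal Python sketch, assuming N is non-negative. `expected_sum` is a hypothetical helper name, and the sketch says nothing about how many instructions the assembly version executes.

```python
def expected_sum(n: int) -> int:
    """Sum of the integers from 0 through n, for n >= 0."""
    if n < 0:
        raise ValueError("this sketch only covers n >= 0")
    return sum(range(n + 1))


for n in range(6):
    # The closed form n(n+1)/2 gives the same result and makes a quick cross-check.
    assert expected_sum(n) == n * (n + 1) // 2
    print(n, expected_sum(n))
```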

Modifying `p2.asm`

Modify p2.asm so that it will still calculate the sum correctly if N is equal to 0. Be sure to test your modifications to make sure they work. You can do this by changing the value of the test0 variable from 0 to 1. You should also test that it works with N equal to values 1 to 5. Set the testN variable to 1 to test this.
(Q) Assemble the code and set a breakpoint at address 0x00400004. Use Run > Go to run the program and it should stop at the breakpoint. Single step to the end of the program. How many instructions does your modified program execute when N is equal to 5? Can this be improved? If so, how? Hint: If your modified program executes more than 1 additional instruction, you can do better.
(Q) Will your modified program work if N is less than 0? You do not need to make any additional modifications yet, but answer the question and consider how you might address any problems.
Commit and push any files you changed while working on p2 files to git.

3 Swap max with last (p3)

In this part of the practical, you will be given code with an array of integers A and its size A_len and you must swap the maximum element of the array with the last element.

Note: You can use whatever registers you want in this part except for x1 and x2, but you will have an easier time on p4 if you avoid using s0-s11 (x8, x9, x18-x27) now.

Open the p3/p3.asm file. The goal of this program is to find the maximum element in an array and put it at the end of the array.

Review both the commented documentation at the top of p3.asm and the constants defined as .globl and the instantiations inside the .data segment.
Assemble the code and set a breakpoint at address 0x00400004. This is the start of the program. Set another breakpoint at address 0x00400050. This is the end of your program (should be a jalr).
(Q) Run p3.asm until your second breakpoint. What is the value of max and maxindex at the end of the program? Are they what you expect?
(Q) Comment out slli x10, x7, 2 and re-assemble and run the program. What happens? Compare the address table between the program with and without slli. Which is correct and why?

You may need to scroll the Messages box to the right to read the entire message. (Q) Examine the value of x10 and its use in lw x10, 0(x10). What does “Load address not aligned to word boundary 0x10010001” mean? Why did commenting out the slli instruction invalidate the program? Explain in terms of x10, its value, and how the .data memory segment is organized.
Undo the changes made in Step 4 (that is, restore the slli instruction you commented out).
Modify p3.asm so that the largest element of A is swapped with the last element of A.

Be sure you adequately test your modified program. You can do so by changing the contents of the array A defined in the .data segment alongside the value of A_len if needed and check the final contents of the array in the .data memory segment in the Execute tab. Note that if you customize the initial array contents and/or the size of the array, the Run I/O will default to [FAIL] as it’s only checking for the initial array configuration. You will have to manually verify the .data contents.
(Q) If you repeatedly apply your modified program to the subarrays of A from 0 to $ A_len-i $ where $i$ is the number of times you've applied your program, what is the final state of A?
(Q) Like p2.asm, this program doesn't work if A_len is equal to 0. It is brittle in other ways as well. For example, what happens if all of the elements are less than -1? What if they are all $-2^{31}$, the most negative 32-bit value? Is there a more robust way to ensure the max value is identified?
Make additional modifications to p3 so that it will still swap the maximum value in the array with the last element even if all initial values of the array are negative. Similar to p2, this should require only a minimal number of changes. If you are unsure how to approach this, consult your instructor on where to begin.
Commit and push any files you changed while working on p3 files to git. Get in the habit of committing your changed files periodically when you make progress because we are going to stop reminding you.
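
When you test p3 with your own arrays, the Run I/O check no longer applies, so it helps to compute the expected final array separately. The Python sketch below does that. `expected_after_swap` is a hypothetical name, and picking the first occurrence when the maximum appears more than once is an assumption that your p3 may not share.

```python
def expected_after_swap(values):
    """Return a copy of values with its maximum swapped into the last position."""
    result = list(values)
    if not result:
        return result                        # nothing to swap in an empty array
    max_index = result.index(max(result))    # first occurrence of the maximum
    result[max_index], result[-1] = result[-1], result[max_index]
    return result


print(expected_after_swap([3, 9, 2, 5]))    # [3, 5, 2, 9]
print(expected_after_swap([-5, -2, -9]))    # [-5, -9, -2]
```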

4 Writing and calling a procedure (p4)

The goal of p4 is to write a loop that repeatedly calls p3’s SwapMaxWithLast as a procedure on increasingly smaller subarrays of Array to sort Array, all while following RISC-V procedure calling conventions. You must have a working p3 implementation, as p4 will use your modified p3 program.

NOTE: You may not use the callee-saved (s) registers for this portion of the practical. Make sure to avoid x8-x9 (s0-s1) and x18-x27 (s2-s11). To make this easier, we recommend you use register names (t0-t6 and a0-a7) instead of numbers (xN).

NOTE: To avoid name clashes, the array A has been renamed to Array and its length, A_len has been renamed to Len.

Open the files p4/p4-loop.asm and p4/p4-swap.asm in RARS.

Note the two locations marked for adding your code.
In the previous part of this practical, you worked with a program for swapping the maximum element of an array with the last element. Copy your code for "swapping the maximum value of an array with the last element of the array". You'll need all the code between the label p3: and the 'jalr'. Don't copy the label or the 'jalr'.
Paste your copied code into p4-swap.asm in the spot indicated below the label SwapMaxWithLast:.

This copied/pasted code will not immediately work.
- First, the p3 implementation may not follow RISC-V procedure calling conventions. You'll need to fix that.
- Second, the constants are not defined in p4-swap, so you'll need to pass them in as arguments and modify your implementation to use the calling conventions for arguments.
Modify your pasted code so that it complies with the documentation and specifications in p4-swap.asm. Note that SwapMaxWithLast is a procedure which takes 2 arguments, in this order: the location (address) of an array of words in memory and the length (in words) of the array. Be sure your code conforms with the RISC-V procedure calling conventions and the documentation comments in the p4 files.

Specifically, you will need to:
1. Get the address of "Array" from an argument register rather than doing a "la" on a label.
2. Get the value of "Len" from an argument register rather than loading it from memory.
3. Return from the procedure by doing a "jalr x0, 0(x1)" (this should already be done in the code provided in p4-swap.asm).
4. You may need to refactor your code to use t registers or manage the stack to comply with the RISC-V procedure calling convention. You are strictly forbidden from using s registers x8-x9 or x18-x27.
5. Note that you will not be able to assemble and run this file on its own now. Since you made this a procedure, the arguments must be set up by some other piece of code. You'll write that next. If you get weird behavior while debugging, stop and check whether you tried to assemble this file on its own. Always check what is in the argument registers first while debugging.

Modify the p4 procedure in p4-loop.asm so that it calls SwapMaxWithLast a single time with Array and Len as arguments. Here's some sample code:

p4:
    la  a0, Array      # put address of Array into x10
    la  a1, Len        # put address of Len into x11
    lw  a1, 0(a1)      # get the *value* of Len and put into x11

    jal ra, SwapMaxWithLast

What output do you expect when you run p4-loop.asm?

Run p4-loop.asm. Is the actual output what you expected? You can check the results in a similar way as you did with p3.
Replace your call to "SwapMaxWithLast" with a call to the procedure "ProcedureConventionTester". This procedure takes the same arguments as SwapMaxWithLast and calls "SwapMaxWithLast", but also checks for compliance with the RISC-V procedure call convention. Run the program with the new call. If the test fails, fix your code so it can pass the test.
Modify the procedure p4 in p4-loop.asm so that it calls SwapMaxWithLast $ Len-1 $ times and with each successive call the length of the array passed is decreased by 1. See the comments in p4-loop.asm for exactly where to put your code. Do not use s registers.

In pseudocode:
```
  for (i=Len; i>1; i--) {
    SwapMaxWithLast(Array, i);
  }
```
What output do you expect when you run p4-loop.asm?
Run p4-loop.asm. Is the actual output what you expected?
Again test your compliance with the RISC-V procedure calling convention by calling "ProcedureConventionTester" instead of "SwapMaxWithLast"
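
The loop in the pseudocode above can be modeled in a high-level language to predict the output of p4-loop. This Python sketch is only a model, under the assumption that SwapMaxWithLast behaves as specified. The Python names are illustrative, and the sketch does not model registers or the calling convention.

```python
def swap_max_with_last(array, length):
    """Swap the maximum of array[0:length] with array[length - 1], in place."""
    max_index = max(range(length), key=lambda i: array[i])
    array[max_index], array[length - 1] = array[length - 1], array[max_index]


def p4_loop(array):
    """Call swap_max_with_last on shrinking prefixes, as in the pseudocode."""
    result = list(array)
    for i in range(len(result), 1, -1):   # i = Len, Len-1, ..., 2
        swap_max_with_last(result, i)
    return result


print(p4_loop([5, 2, 9, 1]))   # [1, 2, 5, 9]
```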

If you are having trouble: Remember, the RISC-V procedure calling convention dictates that all procedures must abide by the same set of rules. Even if you are the author of both p4-loop and p4-swap and know the exact register usage across both files, you must still abide by all the rules set by the conventions (e.g., even if your p4-swap doesn’t modify a0 and a1, p4-loop must assume they are modified when it calls SwapMaxWithLast).
With a working p4-loop and p4-swap that follows RISC-V procedure calling conventions, consider the following questions.
- (Q) If SwapMaxWithLast needed to return two values — the two values swapped in its execution — for debugging purposes, what changes to p4-swap would need to be made to accomplish this?
- (Q) If SwapMaxWithLast needs to call another procedure in its execution, what changes to p4-swap would need to be made to accomplish this?
- (Q) If p4-loop needs to be converted into a procedure — similar to what you did when refactoring p3 into p4-swap, what changes to p4-loop would need to be made to accomplish this?
You do not have to implement any of these items; just describe each one with enough detail for an experienced RISC-V programmer to implement it.
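
Under the calling convention, sp and s0-s11 are preserved across a call, while t0-t6, a0-a7, and ra are not guaranteed to be. The Python sketch below models one way a tester could check this: fill the callee-saved registers with known values, run the procedure, and report any that differ afterwards. It is a hypothetical model, not the implementation of ProcedureConventionTester, and the two sample procedures are invented for the example.

```python
CALLEE_SAVED = ["sp"] + [f"s{i}" for i in range(12)]   # a callee must preserve these


def violations(procedure, a0, a1):
    """Run procedure on a fake register file; return callee-saved registers it changed."""
    regs = {name: 0x5A5A0000 + i for i, name in enumerate(CALLEE_SAVED)}
    regs.update({"a0": a0, "a1": a1, "t0": 0, "ra": 0x00400010})
    before = {name: regs[name] for name in CALLEE_SAVED}
    procedure(regs)
    return [name for name in CALLEE_SAVED if regs[name] != before[name]]


def well_behaved(regs):
    regs["t0"] = regs["a1"] - 1   # temporaries may be overwritten freely
    regs["a0"] = 0                # argument registers may be overwritten too


def clobbers_s0(regs):
    regs["s0"] = regs["a0"]       # changes s0 without saving and restoring it


print(violations(well_behaved, 0x10010000, 5))   # []
print(violations(clobbers_s0, 0x10010000, 5))    # ['s0']
```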

5 Fixing a broken procedure (fib)

You have been given a broken implementation of a recursive procedure in the fib folder in your repo. The procedure call example on pages 108-110 of the book and the factorial example posted on Moodle may be helpful in understanding recursive procedures.

The Fibonacci sequence is defined over nonnegative integers as follows:

$$ \begin{aligned} F(0) &= 0\\ F(1) &= 1\\ F(i) &= F(i-1) + F(i-2), \quad i \geq 2 \end{aligned} $$

While there are several ways to calculate the Fibonacci sequence, for this practical you must use a recursive procedure.

The sequence should be:

N	0	1	2	3	4	5	6	7	8	9	10	..
Fibonacci number	0	1	1	2	3	5	8	13	21	34	55	..
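
To check the values your assembly procedure returns, the table can be regenerated with a short iterative computation. This Python sketch is only a checking aid (the practical itself still requires the recursive approach), and `fib_iterative` is a hypothetical name.

```python
def fib_iterative(n: int) -> int:
    """Return F(n), where F(0) = 0, F(1) = 1, and F(i) = F(i-1) + F(i-2)."""
    previous, current = 0, 1   # F(0), F(1)
    for _ in range(n):
        previous, current = current, previous + current
    return previous


print([fib_iterative(n) for n in range(11)])
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```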

Note: You can (and must) use s-registers for this part of the practical.

Open fib/fib.asm in RARS.
This file contains code to execute a recursive procedure which takes one argument i and returns F(i). Pseudo code is provided below - your final code must follow the algorithm presented below.
```
int
fib(int n)
{
  if (n == 0) {
    return 0;
  }
  else if (n == 1) {
    return 1;
  }
  else {
    return fib(n-1)+fib(n-2);
  }
}
```
Open fib.asm and testfib.asm. A portion of fib is implemented for you; take time to read through the commented documentation to understand fib’s specifications. Additionally, take time to read through the starting implementation of fib to familiarize yourself with what has been completed.

Namely, the first base case N==0 is implemented on lines 40–42. The second base case N==1 is implemented on lines 45–48. Review these thoroughly to fully understand how the conditionals are being checked and how the return values are being handled.
The provided code does not follow the RISC-V calling conventions, therefore it loops infinitely. You need to edit this code to follow the conventions.
- The recursive case is implemented after label CALC on line 50. Notice that the starting implementation currently uses s-registers — s0 to hold N and s1 to hold the return value of fib(N-1).
- (Q) An alternative is to use t-registers instead. What are the differences between using s and t registers to implement fib? Are there any performance differences between the two? Which would you choose, and why?
Choose whether you want to finish fib using s or t-registers, and modify fib accordingly. If you choose to use s-registers, you must preserve/restore them before changing their values; if you choose to use t-registers, you must preserve/restore them across the recursive fib procedure calls. Be sure to follow all procedure calling conventions, not just the ones associated with s and t-registers.
Implement and test fib with your choice of registers.
- Implement your choice of saving strategy using a stack frame inside your fib procedure. Note that fib is both caller and callee because it is recursive!
- The comment block at the top of fib.asm describes register use and allocation. Update the comments to reflect your design indicating which registers you chose.
- To check compliance with calling conventions, replace your fib calls with fibtest. This is similar to ProcedureConventionTester from p4.
- Test your program with the new fibtest calls. If there are any issues, fix them. Once your program works correctly, restore the original calls to fib. Note: this test confirms that your code follows most of the calling conventions, but doesn't test them all. You should check the green sheet to make sure you haven't missed anything.
With a working implementation of fib, answer the following questions.
- (Q) With your chosen approach, how is your stack frame organized for each fib call? Create a table describing how your stack frame is organized (each table row illustrates a word on the stack, detail the SP offset, content on the stack, etc.).
- (Q) If you call fib(5), does it behave as expected? How many times is fib recursively called?
In fib, you should have two internal calls to fib. Assemble and set a breakpoint at the line right AFTER the first internal call to fib. Then, while calculating fib(5), observe the state of the stack when the breakpoint is hit (i.e., when the first recursive jal ra, fib returns).

Recall that the stack “grows downwards” (sp decrements), therefore you will be reading the table “backwards”. You may also want to uncheck the Hexadecimal Values checkbox to read the stack contents in decimal.
- (Q) Take a screenshot of the stack in RARS (Execute tab → Data Segment panel → dropdown menu → current sp), and use colored boxes to mark each stack frame.
  
  The screenshot below demonstrates what is expected for the worksheet. Note that this is running fib(8) with the stack frames containing some junk values for demonstration; your stack should be minimally set up to implement fib.
  
  If you are stuck on this: the start of fib calls main in testfib; therefore, by the time fib is called for the first time, sp will already have moved quite a few bytes away from the top of the stack (0x7FFFEFFC). When fib is first called, sp should be at or near 0x7FFFEFBC.
(Q) What happens if fib does not restore the return address before using jalr x0, 0(ra)?
Complete the practical question sheet. You will need to trace the execution of the program and record values from the stack. You can view the stack in RARS: in Execute view, the Data Segment window has a drop down that allows you to check memory at current sp. Use this to see the stack frames.
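
To sanity-check the sp values you see, you can estimate how far the stack grows at the deepest point of the recursion. The Python sketch below assumes every call to fib, including the base cases, allocates one fixed-size frame. The 16-byte frame size is purely an example, so substitute your own, and the starting sp is the approximate value quoted above.

```python
def max_depth(n: int) -> int:
    """Largest number of fib frames on the stack at once while computing fib(n)."""
    if n < 2:
        return 1   # a base case is a single frame
    return 1 + max(max_depth(n - 1), max_depth(n - 2))


def lowest_sp(n: int, initial_sp: int, frame_bytes: int) -> int:
    """Smallest value sp reaches, since the stack grows toward lower addresses."""
    return initial_sp - frame_bytes * max_depth(n)


# Assumed values: sp near 0x7FFFEFBC on the first call, 16-byte frames.
print(hex(lowest_sp(5, 0x7FFFEFBC, 16)))   # 0x7fffef6c
```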

Working Ahead

If you so choose, work ahead into homework 10 to implement relPrime in RISC-V. A working version of relPrime is needed for the final practicals 9 and 10.

Submission and Grading

Functional Requirements

At the end of the practical you should have done these things:

Walk through p1 to familiarize yourself with RARS,
Modify p2 to accept N=0 as an input and still calculate a sum,
Test p2 for N=0 and N<0,
Modify p3 to swap the largest and the last value of array A,
Modify p3 to be more robust than setting the max to -1 initially,
Test p3 with arrays of different sizes and values,
Modify p4-swap to use p3 but accept initial arguments of a0 and a1,
Modify p4-swap to only use t-registers and abide by RISC-V procedure calling conventions,
Implement p4-loop to repeatedly call p4-swap to sort array A,
Pass p4-loop using ProcedureConventionTester instead of SwapMaxWithLast,
Test p4-loop with arrays of different sizes and values,
Modify fib to follow calling conventions to fix the recursive procedure,
Complete and submit the Practical Worksheet.

Git Requirements

In addition to the list below, you should regularly commit and push whenever you fix a bug, work to a stopping point, or make any incremental updates. You must have at least 5 commits in your repo for this practical (one for each function):

Git commit 1: upon completion of p2.
Git commit 2: upon completion of p3.
Git commit 3: upon completion of p4-swap.
Git commit 4: upon completion of p4-loop.
Git commit 5: upon completion of fib.

Commit and push via git using either VSCode’s built-in source control or the git bash terminal. Be sure to copy your final commit ID number for the final question on the worksheet. This ID number can be found on the commit history tab of your GitHub repository.

Once you are done with your implementation and fixes, please submit your modified .asm files to Gradescope in the "Practical 3 (Code)" gradebox. Please do not zip your files along with their directories; simply drag and drop your modified files into the submission box directly. The files you need to submit are p2.asm, p3.asm, p4-swap.asm, p4-loop.asm and fib.asm.

Worksheet Requirement

All the practicals for CSSE232 have these general requirements:

General Requirements for all Practicals

The solution fits the need
Aspects of performance are discussed
The solution is tested for correctness
The submission shows iteration and documentation

Some practicals will hit some of these requirements more than others. But you should always be thinking about them.

Complete the worksheet. Some guidelines:

These answers should be short and to the point, usually no more than 2 or 3 sentences.
You will upload this sheet to Gradescope.

Final Checklist

Verify your solutions are committed and the commits are pushed to GitHub.
Submit your completed worksheet to Gradescope.
Submit your .asm files to Gradescope.

Grading Breakdown

Practical 3 Rubric items	Possible Points	Weight
Worksheet	80	55%
Code	100	45%
Total out of		100%