Practical 8: Pipelined Branches and Jumps

Objectives

This section is not a list of tasks for you to do. It is a list of skills you will have or things you will know after you complete the practical.

Following completion of this practical you should be able to:

Add branch and jump instructions to the implementation of a pipelined processor
Trace code as it executes through a simulated pipelined processor
Use waveform diagrams to debug a processor implementation
Use Verilog test benches and a testing framework to test a processor implementation

Guidelines

Because you will be iteratively adding functionality to one processor module, we strongly recommend that you periodically add and commit your progress to git as a backup.

Time Estimate: This practical will take approximately 6-9 hours per student, depending on your familiarity with Verilog and the pipelined architecture covered in class.

Preliminary Tasks

You will be working in the same groups as Practical 7, so you should use the same repository: your RISC-V-pipelined-processor repository. This also means you should continue to use the same .mpf file that you created for the last practical.

Obtain the worksheet.

Add Branches (without flushing)

Recall from lecture that we implement branch detection logic in the ID (decode) stage to minimize the number of invalid instructions in the pipeline. For this first step, do not flush the invalid instructions; instead, we will test the core SB-type functionality first and fix the invalid instruction problem afterward.

IMPORTANT: implement the branch logic in the ID (decode) stage.

Branch detection logic

The ALU is not in the decode stage, so you will have to build a new comparator. Remember how the single-cycle branch logic used the ALU's zero and result[31] outputs to decide whether to branch? You will need to create new logic in ID that produces these signals.
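As a sketch of what the new ID-stage comparator has to compute, here is a small Python model (not Verilog; `id_compare` and the other names are hypothetical). It produces an equality flag in place of the ALU's zero output and a signed less-than flag for blt/bge. One caution, assuming your single-cycle design used result[31] of rs1 - rs2 as "less than": that sign bit is wrong when the subtraction overflows, which a direct signed comparison avoids.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def id_compare(rs1_val: int, rs2_val: int) -> tuple[bool, bool]:
    """Hypothetical ID-stage comparator model (not the required design).

    Returns (zero, lt):
      zero -> rs1 == rs2, replacing the ALU's zero output (beq/bne)
      lt   -> rs1 <  rs2 as signed values (blt/bge)
    """
    zero = (rs1_val & MASK32) == (rs2_val & MASK32)
    lt = to_signed(rs1_val) < to_signed(rs2_val)
    return zero, lt


def sub_bit31(rs1_val: int, rs2_val: int) -> bool:
    """Bit 31 of the 32-bit difference rs1 - rs2 (the old result[31] test)."""
    return bool(((rs1_val - rs2_val) & MASK32) >> 31)


print(id_compare(5, 5))                    # (True, False)
print(id_compare(0xFFFFFFFF, 3))           # (False, True): -1 < 3
print(id_compare(0x7FFFFFFF, 0xFFFFFFFF))  # (False, False): max int > -1
print(sub_bit31(0x7FFFFFFF, 0xFFFFFFFF))   # True: overflow flips the sign bit
```

In Verilog, comparing `$signed` operands gives the same answer as `to_signed` does here.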

In addition, you will need to drive the PCSrc mux selector from the ID stage: as soon as you decide whether to branch, set the source of the PC to either the branch target or newPC (PC + 4).
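The branch decision and the PCSrc mux can be modeled the same way. This hypothetical Python sketch assumes a zero flag (rs1 == rs2) and a signed lt flag (rs1 < rs2) are available in ID, uses the RISC-V funct3 encodings for beq/bne/blt/bge, and treats `if_pc` as the PC of the instruction currently in IF, so `if_pc + 4` plays the role of newPC.

```python
MASK32 = 0xFFFFFFFF

# funct3 values from the RISC-V base ISA for the branches in this practical
FUNCT3_BEQ = 0b000
FUNCT3_BNE = 0b001
FUNCT3_BLT = 0b100
FUNCT3_BGE = 0b101


def branch_taken(funct3: int, zero: bool, lt: bool) -> bool:
    """Decide a branch in ID from the comparator flags."""
    if funct3 == FUNCT3_BEQ:
        return zero
    if funct3 == FUNCT3_BNE:
        return not zero
    if funct3 == FUNCT3_BLT:
        return lt
    if funct3 == FUNCT3_BGE:
        return not lt
    raise ValueError(f"unsupported branch funct3 {funct3:03b}")


def next_pc(if_pc: int, id_target: int, id_is_branch: bool,
            funct3: int, zero: bool, lt: bool) -> int:
    """Hypothetical model of the PCSrc mux driven from ID.

    if_pc     -> PC of the instruction currently being fetched
    id_target -> branch target computed in ID (PC of the branch + imm)
    """
    new_pc = (if_pc + 4) & MASK32
    pc_src = id_is_branch and branch_taken(funct3, zero, lt)
    return id_target if pc_src else new_pc


# beq in ID at 0x100 (so IF is at 0x104), target 0x200
print(hex(next_pc(0x104, 0x200, True, FUNCT3_BEQ, zero=True, lt=False)))   # 0x200
print(hex(next_pc(0x104, 0x200, True, FUNCT3_BEQ, zero=False, lt=False)))  # 0x108
```

In hardware, `pc_src` is simply the PCSrc select line; the `and` mirrors ANDing the Branch control signal with the comparison result.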

Branch target computation

You'll also need to compute the branch target (PC + imm) in the ID stage. Luckily, the immediate generator should already be in your ID stage, so this won't be too hard.
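If you want to double-check the target your ID stage produces, here is a Python sketch of the SB-type (B-type) immediate decode followed by the PC + imm addition. It follows the RISC-V base ISA bit layout; the function names are hypothetical, and if your immediate generator already outputs the sign-extended offset, only the final add is new in ID.

```python
MASK32 = 0xFFFFFFFF


def b_type_imm(inst: int) -> int:
    """Reassemble and sign-extend the 13-bit SB-type branch offset."""
    imm = ((inst >> 31) & 0x1) << 12     # imm[12]   = inst[31]
    imm |= ((inst >> 7) & 0x1) << 11     # imm[11]   = inst[7]
    imm |= ((inst >> 25) & 0x3F) << 5    # imm[10:5] = inst[30:25]
    imm |= ((inst >> 8) & 0xF) << 1      # imm[4:1]  = inst[11:8]; imm[0] is always 0
    return imm - (1 << 13) if imm & (1 << 12) else imm


def branch_target(id_pc: int, inst: int) -> int:
    """Target = PC of the branch (in ID) + offset, wrapped to 32 bits."""
    return (id_pc + b_type_imm(inst)) & MASK32


print(b_type_imm(0x00208463))                 # 8   (beq x1, x2, 8)
print(b_type_imm(0xFE000EE3))                 # -4  (beq x0, x0, -4)
print(hex(branch_target(0x100, 0xFE000EE3)))  # 0xfc
```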

(Q) Modify the datapath diagram on the worksheet to show the branch logic in the ID stage.
- The diagram provided has the branch logic in the Memory stage. In class we improved performance by putting the branch logic in the ID (Decode) stage.
(Q) Like you did in the last practical, plan out what is needed in the various pipeline stage registers on the worksheet.
Implement beq
- Hint: you need a mux on the input to the PC.
Add control
Run the beq test we gave you
- Inspect the tests in test_asm/test_pipe_branches_nohaz.asm
Implement bne, blt, and bge (hint: the modifications are similar to the ones you made for the single-cycle processor)
Run the other branch tests we gave you. Add tests to the branches testing plan as described on the worksheet.
- Note that while each SB instruction has its own memory file (assembly code that you must assemble), the outcome is the same and takes the same number of cycles, so the same Verilog test task should work for all of them.

Add `jal` and `jalr` (without flushing)

Repeat the same process as above to implement jal and then jalr. Since flushing is not implemented, the instruction after each jump will execute even though the jump is taken. These are "invalids", which we will fix later.
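To see which instructions actually complete without flushing, here is a small program-order model in Python (hypothetical names, not cycle-accurate). It assumes the taken/not-taken decision happens in ID, so exactly one instruction, the one at the next address, is already in IF when the PC is redirected, and it assumes that instruction is not itself a taken branch or jump.

```python
def executed(program, flush: bool, max_steps: int = 50):
    """Names of the instructions that complete, in order.

    program: list of (name, taken_target) pairs, where taken_target is the
    index a taken branch/jump goes to, or None for anything else
    (including not-taken branches).
    """
    done = []
    pc = 0
    while pc < len(program) and len(done) < max_steps:
        name, taken_target = program[pc]
        done.append(name)
        if taken_target is None:
            pc += 1
            continue
        # The instruction already fetched behind the branch/jump is the "invalid".
        if not flush and pc + 1 < len(program):
            done.append(program[pc + 1][0])
        pc = taken_target
    return done


program = [
    ("beq x0, x0, skip", 3),   # always taken, goes to index 3
    ("addi x5, x0, 1", None),  # should be skipped
    ("addi x6, x0, 2", None),  # should be skipped
    ("addi x7, x0, 3", None),  # skip:
]
print(executed(program, flush=False))  # ['beq x0, x0, skip', 'addi x5, x0, 1', 'addi x7, x0, 3']
print(executed(program, flush=True))   # ['beq x0, x0, skip', 'addi x7, x0, 3']
```

With `flush=False` the first skipped instruction still runs, which is exactly the "invalid" described above; `flush=True` shows the behavior you are aiming for once flushing is added.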

(Q) Modify the datapath in the worksheet to make the jal and jalr instructions work.
- Note that the branch logic (and branch target computation) are in the ID stage, so your jump can take effect after ID (decode), just like the branches.
- For jalr, you'll need to use the output of the register file in the ID stage.
- For jal, the input to the PC is computed the same way as a branch target (PC + imm).
(Q) Like you did in the last practical, plan out what is needed in the various pipeline stage registers on the worksheet.
Implement datapath and control for jal and jalr
- Hint: linking is a bit tricky; you might need to keep newPC in your pipeline stage registers until WB.
- remember to save the link address in the register specified by the instruction as rd.
Test your implementation (run tests we gave you).
- You will need to update the test bench tasks for jal and jalr to look for the right value in x31. Review the .asm files for these tests to see what value you should expect.
- Consider adding a test to check combinations of jal/jalr that simulates procedure calling and return.
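The values a jump must produce can be sketched in Python as well (all names hypothetical). This follows the RISC-V base ISA: jal targets PC + J-type immediate, jalr targets rs1 + imm with bit 0 cleared, and both link PC + 4 of the jump itself, which is the newPC value the hint suggests carrying to WB. Here `id_pc` is the PC of the jump while it sits in ID.

```python
MASK32 = 0xFFFFFFFF


def j_type_imm(inst: int) -> int:
    """Reassemble and sign-extend the 21-bit jal offset."""
    imm = ((inst >> 31) & 0x1) << 20      # imm[20]    = inst[31]
    imm |= ((inst >> 12) & 0xFF) << 12    # imm[19:12] = inst[19:12]
    imm |= ((inst >> 20) & 0x1) << 11     # imm[11]    = inst[20]
    imm |= ((inst >> 21) & 0x3FF) << 1    # imm[10:1]  = inst[30:21]; imm[0] is always 0
    return imm - (1 << 21) if imm & (1 << 20) else imm


def jal_next(id_pc: int, inst: int) -> tuple[int, int]:
    """(target, link) for jal: target = PC + imm, link = PC + 4."""
    target = (id_pc + j_type_imm(inst)) & MASK32
    link = (id_pc + 4) & MASK32
    return target, link


def jalr_next(id_pc: int, rs1_val: int, i_imm: int) -> tuple[int, int]:
    """(target, link) for jalr: target = (rs1 + imm) with bit 0 cleared."""
    target = (rs1_val + i_imm) & 0xFFFFFFFE
    link = (id_pc + 4) & MASK32
    return target, link


print(j_type_imm(0x008000EF))                         # 8   (jal x1, 8)
print(j_type_imm(0xFF9FF06F))                         # -8  (jal x0, -8)
print([hex(v) for v in jal_next(0x100, 0x008000EF)])  # ['0x108', '0x104']
print([hex(v) for v in jalr_next(0x200, 0x1001, 4)])  # ['0x1004', '0x204']
```

The link value is what must end up in rd at WB, so whatever register you use to carry it has to travel with the jump through EX and MEM.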

Flush the invalid instructions

Now that you have branches and jumps working, it's time to flush the invalid instruction that follows each taken branch and jump, because you don't want it to execute when the processor jumps or branches over it.

Implement flushing functionality in the datapath. Recall that flushing involves wiping out the outdated instruction, or making it effectively a nop that does nothing. You should be able to do this with a simple modification to how the IF_ID pipeline stage register is written.
- HINT: this should be a very small change to how your datapath moves an instruction from IF to ID: when the branch or jump ahead of it (moving from ID to EX) is taken, the instruction in IF must not be written into IF_ID as-is.
Testing: update the tests in tb_Pipe_branch_jump_nohaz.v to check for the correct behavior, both in the final register values and in the types of instructions in the pipeline. Review the .asm files and figure out what must change when the invalid instructions are flushed.
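One way to picture the change is as a mux in front of the instruction field of IF_ID. The Python sketch below is a model under assumptions, not the required design: `flush` is asserted when the branch in ID is taken or the instruction in ID is jal/jalr, and the flushed slot is filled with add x0, x0, x0 (0x00000033), the nop this handout already uses. Some designs clear the register to zero instead; whether 0x00000000 behaves as a nop depends on your control unit.

```python
from dataclasses import dataclass

NOP = 0x00000033  # add x0, x0, x0


@dataclass
class IFID:
    """Hypothetical contents of the IF_ID pipeline register."""
    inst: int = NOP
    pc: int = 0


def flush_if_id(id_branch_taken: bool, id_is_jal: bool, id_is_jalr: bool) -> bool:
    """Flush whenever the instruction in ID redirects the PC."""
    return id_branch_taken or id_is_jal or id_is_jalr


def next_if_id(fetched_inst: int, fetched_pc: int, flush: bool) -> IFID:
    """Value written into IF_ID at the next clock edge."""
    if flush:
        # Replace the wrongly fetched instruction with a nop so it does nothing.
        return IFID(inst=NOP, pc=fetched_pc)
    return IFID(inst=fetched_inst, pc=fetched_pc)


f = flush_if_id(id_branch_taken=False, id_is_jal=True, id_is_jalr=False)
print(hex(next_if_id(0x00500293, 0x104, f).inst))      # 0x33 (flushed)
print(hex(next_if_id(0x00500293, 0x104, False).inst))  # 0x500293 (addi x5, x0, 5 passes through)
```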

Running a program to test your processor

For this test, you will assemble and run the program you wrote for HW 10. This is composed of two procedures: relPrime and gcd. Since relPrime calls gcd, it is good to test gcd first.

Assemble and Test `gcd`

Assemble your version of gcd into machine code using your own assembler.
- Put the assembled code into a file called memory-gcd.txt.
- IMPORTANT: since you do not have hazard detection or forwarding, your program may not work straight out of the assembler. You may have to manually separate instructions that cause hazards by inserting nop instructions (add x0, x0, x0) between them so that data and control hazards are avoided.
  Tip: how to insert "nop" instructions
  
  If you need to space instructions out, move independent instructions between the two that have a dependency, making sure the moved instructions do not introduce new hazards of their own. If nothing useful can be moved, you can use add x0, x0, x0 (adding zero to itself) as a nop instruction:
```
xori x2, x3, 4   # F D X M W
add  x0, x0, x0  #   F D X M W
add  x0, x0, x0  #     F D X M W
add  x0, x0, x0  #       F D X M W
add  x1, x2, x2  #         F D X M W
```
  In this code, xori writes x2 to the register file (WB) in the same cycle that the final add is fetched, so the add reads the new value of x2 when it reaches ID.
- Create a memory-gcd.txt file with your code and make sure it is added to your git repository.
Run your code from the test bench in tb_Processor_Program.v.
- Start by copying the testProgramA task into a new task called test_gcd.
- Have this new test_gcd task load your assembled memory-gcd.txt code using `LOAD_MEMH().
- Change the task to take two input arguments instead of one (since gcd() requires two), in addition to the expected result, called expected.
- Set the argument registers to the initial inputs using `SET_REG().
- Change the $display() and VU.SET_TEST_NAME lines to reflect your new test info.
- Call test_gcd() from the initial begin block (e.g. test_gcd(32'h6, 32'h18, 32'h6), since 32'h18 is 24 and gcd(6, 24) = 6).
- Run the test and see if the right value comes out in x31.
(Q) Take a screenshot of your ModelSim waveform running the test_gcd task in tb_Processor_Program. Put the screenshot into the practical worksheet.
(Q) Answer the worksheet question about what arguments you passed gcd, whether it worked, and what you had to change to make it work.

Assemble and Test `relPrime`

This will be the most difficult part of this practical.

Repeat the same steps as above (the ones for gcd), but this time with relPrime.

Assemble relPrime (and gcd) using your assembler.
- Put the output into a file called memory-relprime.txt.
- Remember to inject nop instructions to prevent data and control hazards.
Create a new test task in the tb_Processor_Program.v test bench called test_relprime
Call the task from the initial begin in the test bench multiple times. Try many possible inputs.
- Tip: start with small values of n (e.g., 2 or 8) to make debugging easier at first; then move on to larger values (36, 120, 5040).
(Q) Take a screenshot of your ModelSim waveform running the test_relprime task in tb_Processor_Program. Put the screenshot into the practical worksheet.
(Q) Answer the worksheet question about what arguments you tried passing to relPrime, whether it worked, and what you had to change to make it work.
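To choose expected values for test_gcd and test_relprime, it can help to run a reference model on your computer first. This Python sketch assumes gcd is the subtraction-based Euclid's algorithm and that relPrime(n) returns the smallest m >= 2 with gcd(n, m) = 1; check both against your own HW 10 code, since the expected x31 values depend on that definition.

```python
def gcd(a: int, b: int) -> int:
    """Subtraction-based Euclid (assumed to match the HW 10 version)."""
    if a == 0:
        return b
    while b != 0:
        if a > b:
            a -= b
        else:
            b -= a
    return a


def rel_prime(n: int) -> int:
    """Smallest m >= 2 with gcd(n, m) == 1 (assumed HW 10 definition).

    Note: in this model n = 0 never terminates, since gcd(0, m) = m.
    """
    m = 2
    while gcd(n, m) != 1:
        m += 1
    return m


print(gcd(6, 24))  # 6
for n in (2, 8, 36, 120, 5040):
    print(n, rel_prime(n))  # 2 3 / 8 3 / 36 5 / 120 7 / 5040 11
```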

Complete the worksheet

(Q) Answer the remaining questions in the practical worksheet.

Working Ahead

After completing this practical, you can work ahead and start Practical 9.

Submission and Grading

Functional Requirements

At the end of the practical you should have done these things:

Implemented Processor.v to support:
- SB-types
- jal
- jalr
Added flushing to the pipeline to flush invalid instructions.
Modified and passed the following test benches:
- test_branches_nohaz
- test_jal_nohaz
- test_jalr_nohaz
Assembled gcd, created a test bench task in tb_Processor_Program to test it, and passed the test you created.
Assembled relPrime, created a test bench task in tb_Processor_Program to test it, and passed the test you created.
Completed and submitted the Practical Worksheet.

Git Requirements

Remember: do not add and commit every file ModelSim creates. Only add, commit, and push .v, .do, .txt, .asm, and .mpf files.

In addition to the list below, you should regularly commit and push whenever you fix a bug, work to a stopping point, or make any incremental updates. At minimum, you must have at least 6 commits in your repo for this practical:

Git commit 1: after branches are implemented and tested
Git commit 2: after jal is implemented and tested
Git commit 3: after jalr is implemented and tested
Git commit 4: after flushing of invalid instructions is implemented
Git commit 5: after gcd is assembled and tested
Git commit 6: after relPrime is assembled and tested

Since this is a team-based practical, there should be numerous iterative commits from each team member.

Worksheet Requirement

All the practicals for CSSE232 have these general requirements:

General Requirements for all Practicals

The solution fits the need
Aspects of performance are discussed
The solution is tested for correctness
The submission shows iteration and documentation

Some practicals will hit some of these requirements more than others. But you should always be thinking about them.

(Q) Complete the practical worksheet. Specifically answer the need/iteration/performance questions, and write your final git commit on the worksheet where required.

Final Checklist

Verify that your code compiles and your tests pass (or at least run).
Verify that your Verilog code is committed and the commits are pushed to GitHub.
Submit your completed worksheet to Gradescope.

Grading Breakdown

Practical 8 Rubric Item	Possible Points	Weight
Worksheet	70	47%
Code	80	53%
Total	150	100%