Practical 8 Pipelined Branches and Jumps
Objectives
This section is not a list of tasks for you to do. It is a list of skills you will have or things you will know after you complete the practical.
Following completion of this practical you should be able to:
- Add branch and jump instructions to the implementation of a pipelined processor
- Trace code as it executes through a simulated pipelined processor
- Use waveform diagrams to debug a processor implementation
- Use Verilog test benches and a testing framework to test a processor implementation
Guidelines
- Because you will be iteratively adding functionality to one processor module, we strongly recommend that you periodically add and commit your progress to git as a backup.
Time Estimate: This practical will take approximately 6-9 hours per student, depending on your familiarity with Verilog and the pipelined architecture covered in class.
Preliminary Tasks
You will be working in the same groups as Practical 7, so you should use the same repository: your RISC-V-pipelined-processor repository.
This also means you should be continuing to use the same .mpf file that you created for the last practical.
Add Branches (without flushing)
Recall from lecture that we implement branch detection logic in the ID (decode) stage to minimize the number of invalid instructions in the pipeline. For this first step, do not flush the invalid instructions. Instead, we will test the core SB-type functionality before fixing the invalid instruction problem.
IMPORTANT: implement the branch logic in the ID (decode) stage.
Branch detection logic
The ALU is not in the decode stage, so you will have to make a new comparator. Remember how the branch logic uses the ALU's zero and result[31] outputs to decide whether to branch? You will need to create some new logic in the ID stage that produces these signals.
In addition, you will need to drive the PCSrc mux selector from the ID stage: as soon as you decide whether to branch or not, update the source of the pc to either be the target or newPC (PC + 4).
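The comparator described above can be sketched as follows. This is only a minimal sketch: the module name BranchCompare, its port names, and the is_branch control input are hypothetical and not taken from the course-provided files. It follows the zero / result[31] approach mentioned above, which does not account for signed overflow in the subtraction.

```verilog
// Hypothetical ID-stage branch comparator. All names here are
// illustrative assumptions, not part of the provided Processor.v.
module BranchCompare (
    input  wire [31:0] rs1_data,    // register file read port 1 (ID stage)
    input  wire [31:0] rs2_data,    // register file read port 2 (ID stage)
    input  wire [2:0]  funct3,      // instruction[14:12]
    input  wire        is_branch,   // control: instruction is an SB-type
    output wire        zero,        // 1 when rs1 == rs2
    output wire        negative,    // bit 31 of (rs1 - rs2)
    output reg         take_branch  // 1 when the branch should be taken
);
    // Same subtraction the ALU would do, duplicated in the ID stage.
    wire [31:0] diff = rs1_data - rs2_data;

    assign zero     = (diff == 32'b0);
    assign negative = diff[31];

    always @(*) begin
        case (funct3)
            3'b000:  take_branch = is_branch &  zero;      // beq
            3'b001:  take_branch = is_branch & ~zero;      // bne
            3'b100:  take_branch = is_branch &  negative;  // blt
            3'b101:  take_branch = is_branch & ~negative;  // bge
            default: take_branch = 1'b0;
        endcase
    end
endmodule
```

In a design like this, take_branch would drive the PCSrc mux selector directly from the ID stage.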
Branch target computation
You'll also need to be sure to compute the branch target (PC+imm) in the ID stage. Luckily the immediate generator should be in your ID stage already, so this won't be too hard.
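As a rough illustration of the target adder and the PC mux, here is a minimal sketch. The module name and the id_pc / id_imm port names are hypothetical, and it assumes your immediate generator already outputs the full byte offset. Note that the adder uses the PC of the branch itself (carried in the IF_ID register), not the PC currently being fetched.

```verilog
// Hypothetical sketch of ID-stage branch target computation plus the
// mux feeding the PC. Names other than PCSrc and newPC are assumptions.
module BranchTargetAndPCMux (
    input  wire [31:0] id_pc,          // PC of the instruction now in ID
    input  wire [31:0] id_imm,         // immediate generator output (byte offset)
    input  wire [31:0] newPC,          // PC + 4 computed in the IF stage
    input  wire        PCSrc,          // 1 = branch decided taken in ID
    output wire [31:0] branch_target,  // PC + imm
    output wire [31:0] next_pc         // value to load into the PC register
);
    assign branch_target = id_pc + id_imm;
    assign next_pc       = PCSrc ? branch_target : newPC;
endmodule
```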
- (Q) Modify the datapath diagram on the worksheet to show the branch logic in the ID stage.
  - The diagram provided has the branch logic in the Memory stage. In class we optimized performance by putting the branch logic in the ID (Decode) stage.
- (Q) Like you did in the last practical, plan out what is needed in the various pipeline stage registers on the worksheet.
- Implement beq
  - hint: need a mux as input to the PC
- Add control
- Run the beq test we gave you
  - Inspect the tests in test_asm/test_pipe_branches_nohaz.asm
- Implement bne, blt, and bge (hint: similar modifications to single-cycle)
- Run the other branch tests we gave you. Add tests to the branches testing plan as described on the worksheet.
  - Note that while there are different memory files (assembly code that you must assemble) for each SB instruction, the outcome is the same with the same number of cycles, so the same Verilog test task should work.
Add jal and jalr (without flushing)
Repeat the same process as above to implement jal and then jalr. Since flushing is not implemented, the instruction after each jump will execute even though the jump is taken. These are "invalids", which we will fix later.
- (Q) Modify the datapath in the worksheet to make the jal and jalr instructions work.
  - Note that the branch logic (and branch target computation) are in the ID stage, so your jump can take effect after ID (decode) just like the branches.
  - For jalr, you'll need to use the output of the register file in the ID stage
  - For jal, the input to the PC is the same as for a branch
- (Q) Like you did in the last practical, plan out what is needed in the various pipeline stage registers on the worksheet.
- Implement datapath and control for jal and jalr
  - hint: linking is a bit hard, you might need to keep newPC in your pipeline stage register until WB.
  - remember to save the link address in the register specified by the instruction as rd.
- Test your implementation (run tests we gave you).
  - You will need to update the test bench tasks for jal and jalr to look for the right value in x31. Review the .asm files for these tests to see what value you should expect.
  - Consider adding a test to check combinations of jal/jalr that simulates procedure calling and return.
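The difference between the jal and jalr targets can be sketched like this. It is a minimal sketch under assumptions: the module name, port names, and the is_jalr control signal are hypothetical. Per the RISC-V specification, jalr adds the immediate to rs1 and clears the lowest bit of the result, while jal adds the immediate to the PC of the jump itself. The link value (newPC, i.e. PC + 4) is not shown here, since it just rides along in the pipeline registers until WB.

```verilog
// Hypothetical ID-stage jump target selection. Names are assumptions
// for illustration only.
module JumpTarget (
    input  wire [31:0] id_pc,     // PC of the jump instruction in ID
    input  wire [31:0] id_imm,    // immediate generator output
    input  wire [31:0] rs1_data,  // register file output (used by jalr)
    input  wire        is_jalr,   // control: 1 for jalr, 0 for jal
    output wire [31:0] target     // value for the PC mux
);
    // jal is PC-relative; jalr is relative to the value in rs1.
    wire [31:0] base = is_jalr ? rs1_data : id_pc;
    wire [31:0] sum  = base + id_imm;

    // jalr clears bit 0 of the computed address.
    assign target = is_jalr ? {sum[31:1], 1'b0} : sum;
endmodule
```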
Flush the invalid instructions
Now that you have the branches and jumps working, it's time to flush the invalid instruction that follows each taken branch and jump, because you don't want it to execute when you branch or jump over it.
- Implement flushing functionality in the datapath. Recall that flushing involves wiping out the outdated instruction, or making it effectively a nop that does nothing. You should be able to do this with a simple modification to how the IF_ID pipeline stage register is written.
  - HINT: this should be a very, very small change to how your datapath moves an instruction from IF to ID when it follows a taken branch/jump that is moving from ID to EX.
- Testing: update the tests in tb_Pipe_branch_jump_nohaz.v to seek the correct behavior, both in final values of registers and in types of instructions in the pipeline. Review the .asm files and figure out what must change when any invalid instructions are flushed.
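One possible shape for a flushable IF_ID register is sketched below. This is an illustration under assumptions, not the required solution: the module name, port names, and the flush input are hypothetical, and your own pipeline register may be organized differently. The nop written on a flush is add x0, x0, x0, which encodes to 32'h00000033.

```verilog
// Hypothetical IF/ID pipeline register with a flush input.
// All names are assumptions for illustration.
module IF_ID_Reg (
    input  wire        clk,
    input  wire        reset,
    input  wire        flush,     // 1 when the instruction in ID is a taken branch/jump
    input  wire [31:0] if_instr,  // instruction fetched this cycle
    input  wire [31:0] if_pc,     // PC of that instruction
    output reg  [31:0] id_instr,
    output reg  [31:0] id_pc
);
    localparam [31:0] NOP = 32'h00000033;  // add x0, x0, x0

    always @(posedge clk) begin
        if (reset) begin
            id_instr <= NOP;
            id_pc    <= 32'b0;
        end else if (flush) begin
            // Replace the wrongly fetched instruction with a nop.
            id_instr <= NOP;
            id_pc    <= if_pc;
        end else begin
            id_instr <= if_instr;
            id_pc    <= if_pc;
        end
    end
endmodule
```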
Running a program to test your processor
For this test, you will assemble and run the program you wrote for HW 10. This is composed of two procedures: relPrime and gcd. Since relPrime calls gcd, it is good to test gcd first.
Assemble and Test gcd
- Assemble your version of gcd into machine code using your own assembler.
  - Put the assembled code into a file called memory-gcd.txt.
  - IMPORTANT: since you do not have hazard detection or forwarding, your program may not work right out of the assembler. You may have to manually separate instructions that introduce hazards by inserting nop instructions (add x0, x0, x0) between them so the data and control problems are avoided.

    Tip: how to insert "nop" instructions

    If you need to space instructions out, put independent instructions between the two that caused a dependency. You do NOT want to introduce hazards. You can add zero to itself as a nop instruction:

        xori x2, x3, 4    ; F D X M W
        add  x0, x0, x0   ;   F D X M W
        add  x0, x0, x0   ;     F D X M W
        add  x0, x0, x0   ;       F D X M W
        addi x1, x2, 2    ;         F D X M W

    In this code, x2 is written to the reg file while the final addi is being fetched from memory, so the addi reads the updated value when it reaches decode.
  - Create a memory-gcd.txt file with your code and make sure it is added to your git repository.
- Run your code from the test bench in tb_Processor_Program.v.
  - Start by copying the testProgramA task to test_gcd.
  - Have this new test_gcd task load your assembled memory-gcd.txt code using `LOAD_MEMH()
  - Change the arguments of the task to take in two input arguments instead of one (since gcd() requires two) in addition to the expected result called expected
  - Set the argument registers to have the initial inputs using `SET_REG()
  - Change $display() and the VU.SET_TEST_NAME lines to reflect your new test info.
  - Call test_gcd() from the initial begin block (e.g. test_gcd(32'h6, 32'h18, 32'h6), since the gcd of 6 and 24 is 6)
  - Run the test and see if the right value comes out in x31.
- (Q) Take a screenshot of your ModelSim waveform running the test_gcd task in tb_Processor_Program. Put the screenshot into the practical worksheet.
- (Q) Answer the worksheet question about what arguments you passed gcd, whether it worked, and what you had to change to make it work.
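When picking arguments and expected values for your gcd tests, it can help to compute the expected result independently of your processor. The sketch below is one way to do that in plain Verilog. The module and function names are hypothetical, and it assumes your gcd follows the usual mathematical definition and is called with positive inputs.

```verilog
// Hypothetical reference model for picking expected gcd results.
// Not part of the provided test framework.
module gcd_ref_demo;
    // Euclid's algorithm using remainders.
    function [31:0] gcd_ref;
        input [31:0] a;
        input [31:0] b;
        reg   [31:0] x, y, t;
        begin
            x = a;
            y = b;
            while (y != 0) begin
                t = x % y;
                x = y;
                y = t;
            end
            gcd_ref = x;
        end
    endfunction

    initial begin
        $display("gcd(6, 24)  = %0d", gcd_ref(32'd6,  32'd24));  // prints 6
        $display("gcd(12, 18) = %0d", gcd_ref(32'd12, 32'd18));  // prints 6
        $display("gcd(7, 5)   = %0d", gcd_ref(32'd7,  32'd5));   // prints 1
    end
endmodule
```

Running this in the simulator prints values you can pass as the expected argument of your test task.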
Assemble and Test relPrime
This will be the most difficult part of this practical.
Repeat the same steps as above (the ones for gcd), but this time with relPrime.
- Assemble relPrime (and gcd) using your assembler.
  - Put the output into a file called memory-relprime.txt
  - Remember to inject nop instructions to prevent data and control hazards.
- Create a new test task in the tb_Processor_Program.v test bench called test_relprime
- Call the task from the initial begin in the test bench multiple times. Try many possible inputs.
  - Tip: start with small values of n (e.g., 2 or 8) to make debugging easier at first; then move on to larger values (36, 120, 5040).
- (Q) Take a screenshot of your ModelSim waveform running the test_relprime task in tb_Processor_Program. Put the screenshot into the practical worksheet.
- (Q) Answer the worksheet question about what arguments you tried passing to relPrime, whether it worked, and what you had to change to make it work.
Complete the worksheet
(Q) Answer the remaining questions in the practical worksheet.
Working Ahead
After completing this practical, you can work ahead by starting Practical 9.
Submission and Grading
Functional Requirements
At the end of the practical you should have done these things:
- Implemented Processor.v to support:
  - SB-types
  - jal
  - jalr
- Added flushing to the pipeline to flush invalid instructions.
- Modified and passed the following test benches:
  - test_branches_nohaz
  - test_jal_nohaz
  - test_jalr_nohaz
- Assembled gcd, created a test bench task in tb_Processor_Program to test it, and passed the test you created.
- Assembled relPrime, created a test bench task in tb_Processor_Program to test it, and passed the test you created.
- Completed and submitted the Practical Worksheet.
Git Requirements
Remember: do not add and commit every single file ModelSim creates. Only add, commit, and push .v, .do, .txt, .asm, and .mpf files.
In addition to the list below, you should regularly commit and push whenever you fix a bug, work to a stopping point, or make any incremental updates. You must have at least 6 commits in your repo for this practical:
- Git commit 1: once branches are implemented and tested
- Git commit 2: once jal is implemented and tested
- Git commit 3: once jalr is implemented and tested
- Git commit 4: once invalid-instruction flushing is implemented
- Git commit 5: once gcd is assembled and tested
- Git commit 6: once relPrime is assembled and tested
Since this is a team-based practical, there should be numerous iterative commits from each team member.
Worksheet Requirement
All the practicals for CSSE232 have these general requirements:
General Requirements for all Practicals
- The solution fits the need
- Aspects of performance are discussed
- The solution is tested for correctness
- The submission shows iteration and documentation
Some practicals will hit some of these requirements more than others. But you should always be thinking about them.
(Q) Complete the practical worksheet. Specifically answer the need/iteration/performance questions, and write your final git commit on the worksheet where required.
Final Checklist
- Verify that your code compiles and your tests pass (or at least run).
- Verify your Verilog code is committed and the commits are pushed to GitHub.
- Submit your completed worksheet to Gradescope.
Grading Breakdown
| Practical 8 Rubric items | Possible Points | Weight |
|---|---|---|
| Worksheet | 70 | 47% |
| Code | 80 | 53% |
| Total | 150 | 100% |