Practical 9 Pipelined hazard resolution
Objectives
This section is not a list of tasks for you to do. It is a list of skills you will have or things you will know after you complete the practical.
Following completion of this practical you should be able to:
- Implement data hazard resolution in a pipelined processor by employing write-before-read, data forwarding, and stalls.
- Use waveform diagrams to debug a pipelined processor implementation with instructions in all stages of the pipe.
- Use Verilog test benches and a testing framework to test a processor implementation.
Guidelines
- Because you will be iteratively adding functionality to one processor module, we strongly recommend that you periodically add and commit your progress to git as a backup.
Time Estimate
This practical will take approximately 6-9 hours per student, varying depending on your familiarity with Verilog and the pipelined architecture covered in class.
Preliminary Tasks
You will be working in the same groups as Practical 8, so you should use the same repository: your RISC-V-pipelined-processor repository.
This also means you should be continuing to use the same .mpf file that you created for the last practical.
The general sequence for this practical is (1) try out the tests and see how they work, (2) implement data forwarding, then (3) add stalling.
Run the hazards tests
During this practical, you will gradually be fixing data hazards for R-types until you've fixed them all; then you will look at other types of hazards to fix.
- To begin, open up the file in `test_asm/datahaz/test_datahaz_x2.asm` and read the test code (and comments) provided.
- (Q) On the worksheet, answer the first question about the need for forwarding in a pipelined processor.
- Open up the `tb_Pipe_hazards.v` test bench and scroll to the bottom. Notice there is a sequence of test tasks commented out, much like in the last practical, and the first one (`test_no_hazard_detection()`) is the only one uncommented.
- Scroll up to the implementation of `test_no_hazard_detection()` and observe that it (and many other tasks) simply checks that the final states of the registers are correct. Answer the question in the worksheet about these tests.
- Review `check_data_hazard_general()` to ensure it will work with your pipelined processor implementation. ("Work" does not mean pass; it means that you understand the code and see where it will fail in your current implementation.) It uses the same shortcuts in `pipeline_test_tools.vh` that you may have edited for Practical 8, so hopefully there won't be much to change.
- Open the ModelSim project you created for Practical 8 and add `tb_Pipe_hazards.v` to the project. Compile it and simulate this test bench. Fix any bugs or errors until you can get `test_no_hazard_detection()` to pass its test. (Note that you may not pass this test if you have already implemented the write-then-read behavior. Consider your answer to the 1.4 question on the worksheet.)
Write then read
- Once you've passed the `check_data_hazard_general()` tests, comment it out in the test bench's main initial block.
- In that same initial block, uncomment the `test_write_then_read_hazard_detection()` task and the call to `CLEAR_PIPE()` that follows it. (See comments in that block.)
- Compile and run the test bench in ModelSim. It might fail if you've not implemented write-before-read in your datapath. That's ok!
- Figure out how to make your reg file write before it reads.
  - hint: consider when you should write to the register file so it can be read at the right time (but before the pipeline stage registers get written).
- Once you get this test to pass, answer the next question on the worksheet:
  Describe the process you plan to follow to incrementally address data hazards in your pipeline for R-type instructions. If you're not sure what process to follow, review the comments in the ASM file (`test_datahaz_x2.asm`) and the test bench (`tb_Pipe_hazards.v`).
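One common way to get write-before-read behavior is to write the register file on the falling clock edge, so that a read in the second half of the same cycle already sees the new value. The sketch below is only an illustration under that assumption: the module name, port names, and 32-bit width are hypothetical and may not match your Practical 8 register file, and it assumes your pipeline stage registers update on the rising edge.

```verilog
// Hypothetical register file sketch (names and widths are assumptions).
// Pipeline registers are assumed to update on the rising edge, so a write
// on the falling edge lands mid-cycle and is visible to that cycle's read.
module RegFileSketch (
    input  wire        clk,
    input  wire        RegWrite,    // write enable from the WB stage
    input  wire [4:0]  rs1,
    input  wire [4:0]  rs2,
    input  wire [4:0]  rd,
    input  wire [31:0] write_data,
    output wire [31:0] read_data1,
    output wire [31:0] read_data2
);
    reg [31:0] regs [0:31];
    integer i;

    // Start from a known state for simulation.
    initial begin
        for (i = 0; i < 32; i = i + 1)
            regs[i] = 32'b0;
    end

    // Write at the falling edge; x0 is never written.
    always @(negedge clk) begin
        if (RegWrite && (rd != 5'd0))
            regs[rd] <= write_data;
    end

    // Combinational reads; x0 always reads as zero.
    assign read_data1 = (rs1 == 5'd0) ? 32'b0 : regs[rs1];
    assign read_data2 = (rs2 == 5'd0) ? 32'b0 : regs[rs2];
endmodule
```

An alternative with the same effect is to keep the write on the rising edge and bypass `write_data` to the read ports inside the register file whenever `rd` matches `rs1` or `rs2`. Either choice should be visible in the waveform as the ID stage reading the value that WB is writing in that same cycle.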
Data forwarding
- Uncomment the next test in the test bench (`test_WB_to_EX_fwd()`) and compile then run the test bench again.
- (Q) On the worksheet, write some pseudocode that describes how you will detect the need to forward data to one of the two register operands (A or B) when an instruction in EX needs data from WB.
  Recall that forwarding happens when there is a data hazard between a register being written by one instruction and a second instruction that reads the same register before the writer puts it in the register file. We pass this "to be written" data from one pipeline stage to another to compensate for the fact that the writing happens too late when these two instructions are too close together in the pipeline.

- Create a forwarding unit module and add it to your Processor.

  Tip: forwarding unit module shape

      module ForwardingUnit (
          input  wire [6:0] opcode,   // to decide rs1 / rs2 check
          input  wire [4:0] rs1,      // to check dependency with rd
          input  wire [4:0] rs2,      // to check dependency with rd
          ...                         // fill in remaining input values
          output reg  [1:0] ALUSrcA,  // controls ALU source A mux
          output reg  [1:0] ALUSrcB   // controls ALU source B mux
      );

  - Get the first forwarding (WB -> EX) working before you try to address the other conditions.
- Handle WB -> EX forwarding
  - SUGGESTION: connect some outputs from the `MEM_WB` pipeline stage register and from the `ID_EX` pipeline stage register to determine whether the hazard exists, then create an output that will control a mux to use forwarded data (from `MEM_WB`) or the standard data from the EX cycle.
  - Uncomment the test for this in the test bench, and update your forwarding unit and Processor accordingly.
- Handle MEM -> EX forwarding
  - Uncomment the test for this in the test bench, and update your forwarding unit and Processor accordingly.
  Tip: suggestions for implementing forwarding
  Recall the forwarding logic we covered in lecture. Forwarding data into the EX stage might come from MEM or WB.
  Generally, this is the plan:
  - Any forwarding from `MEM` is prioritized over forwarding from `WB` since `MEM` is "newer" data.
  - The forwarding unit in `EX` will choose data from the `ALUOut` value in `MEM` instead of `A` if:
    - the instruction in `MEM` is writing `rd` and the instruction in `EX` has read the same register as `rs1`
    - and the instruction in `EX` has an opcode for an R-type, I-type, S-type, or SB-type.
  - The forwarding unit in `EX` will choose data from the `ALUOut` value in `MEM` instead of `B` if:
    - the instruction in `MEM` is writing `rd` and the instruction in `EX` has read the same register as `rs2`
    - and the instruction in `EX` has an opcode for an R-type, S-type, or SB-type.
  - The forwarding unit in `EX` will choose data from whatever is going into the register file (either `ALUOut`, `PC+4`, or `MemOut`) instead of `A` if:
    - the forwarding unit is not forwarding from `MEM` into `A`
    - and the instruction in `WB` is writing `rd` and the instruction in `EX` has read that register as `rs1`
    - and the instruction in `EX` has an opcode for an R-type, I-type, S-type, or SB-type.
  - The forwarding unit in `EX` will choose data from whatever is going into the register file (either `ALUOut`, `PC+4`, or `MemOut`) instead of `B` if:
    - the forwarding unit is not forwarding from `MEM` into `B`
    - and the instruction in `WB` is writing `rd` and the instruction in `EX` has read that register as `rs2`
    - and the instruction in `EX` has an opcode for an R-type, S-type, or SB-type (I-types have no `rs2`, matching the `MEM` case above).

  As bulky as the logic for the forwarding unit feels, you can leverage the behavior of `if` and `else` statements to implement prioritizing `MEM` forwarding over `WB` forwarding. If done correctly, the forwarding logic will be much more approachable.
- At the end of this step, your test bench should run and pass the following tests, in sequence:
  - `test_write_then_read_hazard_detection()`
  - `test_WB_to_EX_fwd()`
  - `test_MEM_to_EX_fwd()`
- Add, commit, and push your code changes to git. Be sure to add your assembled versions of the asm files.
- (Q) On the worksheet, answer the questions about implementing and testing forwarding, and whether your forwarding worked the first time you tried.
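The `if`/`else` priority described in the forwarding tip can be sketched roughly as follows. This is not a reference solution: the module names, the `ex_mem_*` and `mem_wb_*` port names, and the 2-bit select encoding are assumptions made for illustration, and the `rd != x0` check is the usual textbook guard rather than something this handout requires. The second module shows one possible 3-to-1 mux that such a select could drive.

```verilog
// Sketch only: names beyond the tip's module shape are hypothetical.
module ForwardingUnitSketch (
    input  wire [6:0] opcode,           // opcode of the instruction in EX
    input  wire [4:0] rs1,              // rs1 of the instruction in EX
    input  wire [4:0] rs2,              // rs2 of the instruction in EX
    input  wire [4:0] ex_mem_rd,        // rd of the instruction in MEM
    input  wire       ex_mem_RegWrite,  // instruction in MEM writes rd
    input  wire [4:0] mem_wb_rd,        // rd of the instruction in WB
    input  wire       mem_wb_RegWrite,  // instruction in WB writes rd
    output reg  [1:0] ALUSrcA,
    output reg  [1:0] ALUSrcB
);
    // Assumed select encoding
    localparam NO_FWD = 2'b00, FWD_WB = 2'b01, FWD_MEM = 2'b10;

    // RV32I base opcodes
    localparam OP_R    = 7'b0110011, OP_IMM = 7'b0010011, OP_LOAD = 7'b0000011,
               OP_JALR = 7'b1100111, OP_S   = 7'b0100011, OP_SB   = 7'b1100011;

    wire reads_rs2 = (opcode == OP_R) || (opcode == OP_S) || (opcode == OP_SB);
    wire reads_rs1 = reads_rs2 || (opcode == OP_IMM) ||
                     (opcode == OP_LOAD) || (opcode == OP_JALR);

    // Writes to x0 never need forwarding.
    wire mem_writes = ex_mem_RegWrite && (ex_mem_rd != 5'd0);
    wire wb_writes  = mem_wb_RegWrite && (mem_wb_rd != 5'd0);

    always @(*) begin
        // The if / else-if ordering gives MEM priority over WB.
        if (reads_rs1 && mem_writes && (ex_mem_rd == rs1))
            ALUSrcA = FWD_MEM;
        else if (reads_rs1 && wb_writes && (mem_wb_rd == rs1))
            ALUSrcA = FWD_WB;
        else
            ALUSrcA = NO_FWD;

        if (reads_rs2 && mem_writes && (ex_mem_rd == rs2))
            ALUSrcB = FWD_MEM;
        else if (reads_rs2 && wb_writes && (mem_wb_rd == rs2))
            ALUSrcB = FWD_WB;
        else
            ALUSrcB = NO_FWD;
    end
endmodule

// One possible mux driven by ALUSrcA or ALUSrcB (same assumed encoding).
module FwdMux3 (
    input  wire [1:0]  sel,
    input  wire [31:0] from_id_ex,  // value read from the register file
    input  wire [31:0] from_wb,     // value about to be written in WB
    input  wire [31:0] from_mem,    // ALUOut held in EX_MEM
    output reg  [31:0] out
);
    always @(*) begin
        case (sel)
            2'b10:   out = from_mem;
            2'b01:   out = from_wb;
            default: out = from_id_ex;
        endcase
    end
endmodule
```

Because the `MEM` check comes first in each `if` chain, a register written by both the instruction in MEM and the instruction in WB resolves to the newer value without any extra "not forwarding from MEM" term.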
Stalling the Pipeline
Recall that stalls happen when there is a data hazard and the data is not yet available. Commonly this happens when an instruction follows a lw and depends on what the lw loads.
- (Q) On the worksheet, answer the question about the need for stalls.
Adding the lw stall
- Examine test asm file `test_datahaz_lw.asm`, then assemble it.
- Create a hazard detection unit module.
- Add logic to the new module that creates a stall when `lw` is in EX and the next instruction will use its rd value (see page 322 in the textbook).
  - This should be handled in the decode stage.
  - hint: There is a special case for UJ and U types that follow a `lw`: they don't use register sources and don't need to stall!

  Suggestions for implementing the Hazard Detection Unit
  Here are some things to consider while implementing the HDU:
  - This reads `rs1` and `rs2` (source reg numbers) from the decode stage, and also the `opcode` to determine if those registers are getting read.
  - It compares those to the `rd` register in the `EX` stage, including the `MemRead` and `RegWrite` signals from `EX`, to determine if the instruction in EX is writing `rd`.
  - It then does three things when a stall is needed:
    - Turns off `PCWrite` so the instruction in IF stays there.
    - Disables writing to the `IF_ID` pipeline stage register, so the instruction in ID stays there.
    - Inserts a "bubble" into EX by writing all zeroes into the `ID_EX` register control bits.

  This effectively separates the instructions that were in ID and EX so that during the next cycle they are in ID and MEM (and there's a `nop` in EX).
- Uncomment and run our tests (`test_lw_stall()`).
- (Q) On the worksheet, answer the questions about your hazard detection unit and about the feasibility of stalling every instruction once.
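The HDU suggestions above could translate into something like the following. Treat it as a sketch under stated assumptions: the module and port names (`id_*`, `ex_*`, `ID_EX_Bubble`) are hypothetical, the outputs are assumed to be active as commented, and the opcode whitelist covers only the RV32I base formats discussed in this practical.

```verilog
// Hypothetical load-use hazard detection sketch (names are assumptions).
module HazardDetectionSketch (
    input  wire [6:0] id_opcode,    // opcode of the instruction in ID
    input  wire [4:0] id_rs1,       // rs1 field of the instruction in ID
    input  wire [4:0] id_rs2,       // rs2 field of the instruction in ID
    input  wire [4:0] ex_rd,        // rd of the instruction in EX
    input  wire       ex_MemRead,   // instruction in EX is a load
    input  wire       ex_RegWrite,  // instruction in EX writes rd
    output wire       PCWrite,      // 0 holds the PC
    output wire       IF_ID_Write,  // 0 holds the IF_ID register
    output wire       ID_EX_Bubble  // 1 zeroes the ID_EX control bits
);
    // RV32I base opcodes that read register sources.
    localparam OP_R    = 7'b0110011, OP_IMM = 7'b0010011, OP_LOAD = 7'b0000011,
               OP_JALR = 7'b1100111, OP_S   = 7'b0100011, OP_SB   = 7'b1100011;

    // U and UJ types fall outside both lists, so they never stall.
    wire reads_rs2 = (id_opcode == OP_R) || (id_opcode == OP_S) ||
                     (id_opcode == OP_SB);
    wire reads_rs1 = reads_rs2 || (id_opcode == OP_IMM) ||
                     (id_opcode == OP_LOAD) || (id_opcode == OP_JALR);

    wire load_in_ex = ex_MemRead && ex_RegWrite && (ex_rd != 5'd0);

    wire stall = load_in_ex &&
                 ((reads_rs1 && (id_rs1 == ex_rd)) ||
                  (reads_rs2 && (id_rs2 == ex_rd)));

    assign PCWrite      = ~stall;
    assign IF_ID_Write  = ~stall;
    assign ID_EX_Bubble = stall;
endmodule
```

After the one-cycle stall, the load has moved to MEM and then WB, so the ordinary WB -> EX forwarding path can supply the loaded value to the dependent instruction.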
Forwarding into sw
Special care needs to be taken for sw since it requires both the forwarded B value (from rs2) and the imm. Most other instructions only require one or the other. sw may also require a stall if a lw precedes it.
Ensure that your implementation forwards B into EX and carries that into the MEM stage of the pipeline while the ALU uses the imm value instead of B.
- Test `sw`. Read and assemble `test_datahaz_sw.asm`.

  Performance issues when forwarding into sw
  In the implementation we covered in lecture, as well as the one tested by the testbench, `sw`'s `rs2` dependencies are resolved in the `EX` stage by forwarding the correct `B` value into the `EX` stage, then saving it in the `EX_MEM` pipeline stage register.
  While there is another variation that forwards the value directly into the MEM stage from WB, that variation will require additional forwarding checks that we opted to remove for the sake of simplicity.
  Our variation of resolving `sw` in the `EX` stage does infrequently cause stalls (`lw` to `sw`), but that is a performance sacrifice we are making for simplicity of implementation.
- Uncomment and run our tests (`test_sw_forwarding()`). Note that one of the tests causes a stall in addition to forwarding. See the above performance note for details.
- Fix any errors that you need to make those tests pass. (You may have to update your forwarding unit.)
- (Q) On the worksheet, answer the question about forwarding from `lw` to `sw`.
- Now is a great time to commit your changes to git. Include any assembled versions of the asm files.
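The requirement that `sw` needs both the forwarded `B` value and the immediate can be pictured as two muxes in series in the EX stage: the forwarding mux picks the correct `B`, that value is saved for the store, and a second mux decides whether the ALU sees `B` or `imm`. The sketch below assumes hypothetical signal names (`fwd_b_sel`, `use_imm`, `store_data`) and a 2-bit select encoding of 00 = no forward, 01 = from WB, 10 = from MEM.

```verilog
// Hypothetical EX-stage operand selection for the B side (names are assumptions).
module ExOperandBSketch (
    input  wire [1:0]  fwd_b_sel,   // from the forwarding unit
    input  wire        use_imm,     // 1 when the ALU's second operand is the immediate
    input  wire [31:0] id_ex_B,     // B as read from the register file
    input  wire [31:0] wb_data,     // value about to be written in WB
    input  wire [31:0] mem_ALUOut,  // ALUOut held in EX_MEM
    input  wire [31:0] imm,         // immediate of the instruction in EX
    output wire [31:0] alu_in_b,    // second ALU operand
    output wire [31:0] store_data   // value to save in EX_MEM for sw
);
    reg [31:0] fwd_b;

    // First mux: resolve the rs2 dependency.
    always @(*) begin
        case (fwd_b_sel)
            2'b10:   fwd_b = mem_ALUOut;
            2'b01:   fwd_b = wb_data;
            default: fwd_b = id_ex_B;
        endcase
    end

    // The store data is the forwarded B, regardless of what the ALU uses.
    assign store_data = fwd_b;

    // Second mux: sw computes its address from rs1 + imm.
    assign alu_in_b = use_imm ? imm : fwd_b;
endmodule
```

Placing the immediate mux after the forwarding mux is the design choice that matters here: if the order were reversed, a forwarded value could overwrite the immediate on its way into the ALU.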
Handling Additional Hazards
There are some additional forwarding cases you should resolve that are not covered by the testbenches we provide.
Forwarding into Branches
Since branches are in the decode stage, there will be additional edge cases to consider. For example:
add x5, x8, x9   ; F D X m w   // x5 calculated in execute stage
beq x5, x0, L    ;   F d X M W // needs x5 in decode stage
Notice that add only has x5 ready for forwarding in the MEM and WB stage,
but beq needs x5 in the decode stage. This means stalling is required.
Work with your team to determine how this can be resolved. Note that there are two strategies. One approach optimizes performance but has more datapath complexity, while the other optimizes datapath consistency at the cost of lower performance. Explore both to see which one is most approachable for you and your team.
There is no test provided to you to test this. You should write your own .asm
test, assemble it, and write your own testbench task to verify you are resolving
this correctly.
(Q) After resolving and testing the add to beq hazard, answer the
questions on the worksheet and take a screenshot of your tests running in
ModelSim. Be sure to make the screenshot clearly show how you know the tests
are working as intended.
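As a starting point for discussion, here is a sketch of the lower-performance, more consistent strategy: add no forwarding paths into ID and simply hold the branch in decode until its producer reaches WB, where write-before-read makes the value readable. Everything here is an assumption for illustration (module name, `id_is_branch`, the `ex_*` and `mem_*` ports), and it is only one of the two strategies mentioned above.

```verilog
// Hypothetical stall-only handling for a branch resolved in ID.
// Assumes the register file supports write-before-read, so a producer
// that has reached WB no longer causes a stall.
module BranchStallSketch (
    input  wire       id_is_branch,  // instruction in ID is an SB-type
    input  wire [4:0] id_rs1,
    input  wire [4:0] id_rs2,
    input  wire [4:0] ex_rd,         // rd of the instruction in EX
    input  wire       ex_RegWrite,
    input  wire [4:0] mem_rd,        // rd of the instruction in MEM
    input  wire       mem_RegWrite,
    output wire       stall
);
    // A producer in EX or MEM has not written the register file yet.
    wire ex_hit  = ex_RegWrite  && (ex_rd  != 5'd0) &&
                   ((ex_rd  == id_rs1) || (ex_rd  == id_rs2));
    wire mem_hit = mem_RegWrite && (mem_rd != 5'd0) &&
                   ((mem_rd == id_rs1) || (mem_rd == id_rs2));

    assign stall = id_is_branch && (ex_hit || mem_hit);
endmodule
```

With this approach the `add` to `beq` pair above costs two stall cycles. The higher-performance strategy would instead forward into the decode-stage comparison, which shortens the stall but adds muxes and forwarding checks to ID.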
Forwarding lui immediate and LinkAddr from MEM
With the base forwarding datapath that we set up in lecture, the MEM stage
forwards ALUOut, whereas the WB stage forwards the value after the MemToReg mux
(in order to forward the correct value if the instruction is a lw, lui, jal, or jalr).
However, if the forwarding value needs to come from lui, jal, or jalr and the instruction
is in the MEM stage, the datapath can only forward ALUOut. This is an issue, especially for
lui+addi combos that we saw in the first two weeks of lecture:
li t0, IMM[31:0] ; li t0, IMM pseudoinstruction decomposition
; .... turns into: ....
lui t0, IMM[31:12] + IMM[11] ; F D X m W // fwd EX_MEM.imm
addi t0, t0, IMM[11:0] ; F D x M W // but EX_MEM.imm is not setup
There's a similar problem for the link address being written by jal or jalr.
You will need to set up some datapath structure to handle this. This should not be overly complicated as you can re-use an existing control signal to make this almost trivially simple.
There is no test provided to test this. You should write your own .asm test, assemble it, and write your own testbench task to verify you are resolving this correctly.
(Q) After resolving and testing this hazard, answer the questions on the worksheet and take a screenshot of your tests running in ModelSim. Be sure to make the screenshot clearly show how you know the tests are working as intended.
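One way to picture the extra datapath structure is a small mux in the MEM stage that picks which value is offered for forwarding. The sketch below assumes the reused control signal is a 2-bit `MemToReg`-style select carried in `EX_MEM`, with a made-up encoding; your own control encoding and signal names will almost certainly differ, so treat every name here as hypothetical.

```verilog
// Hypothetical MEM-stage forwarding source select (encoding is an assumption):
//   2'b00 = ALUOut, 2'b01 = MemOut (not available to forward from MEM),
//   2'b10 = PC+4 (jal / jalr link address), 2'b11 = imm (lui)
module MemFwdValueSketch (
    input  wire [1:0]  MemToReg,   // control bits carried in EX_MEM
    input  wire [31:0] ALUOut,     // EX_MEM.ALUOut
    input  wire [31:0] imm,        // EX_MEM.imm
    input  wire [31:0] pc_plus4,   // EX_MEM link address
    output reg  [31:0] fwd_value   // value offered for MEM -> EX forwarding
);
    always @(*) begin
        case (MemToReg)
            2'b10:   fwd_value = pc_plus4;
            2'b11:   fwd_value = imm;
            // A load's data is not ready in MEM; the lw stall keeps this
            // case from being forwarded, so ALUOut is a harmless default.
            default: fwd_value = ALUOut;
        endcase
    end
endmodule
```

The forwarding unit itself does not need to change in this arrangement: it still selects "forward from MEM", and the mux decides what that value is.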
Write and run bigger tests (programs!)
Examine the following code:
// Array A's memory location is in x5
int[] A = {1, 2, 3, 4, 5};
int idx = 0;
while(idx < 5) {
A[idx] = A[idx] + 1;
idx = idx + 1;
}
- Write RISC-V for this in an .asm file in the `test_asm` folder.
  - Use comments to explain what you're doing.
  - Add, commit, and push it to git.
  - To initialize the array, it is ok to pick an address in memory and put the integers in your assembled .txt file there. (You don't need to write RISC-V instructions to do that.)
  - To initialize `x5` to have the address of `A`, load the address as an immediate (remember `lui` and `addi`? Or maybe you have an assembler that supports pseudoinstructions like `li`?) in your code.
  - `idx` can be any register of your choice and does not need to be stored in memory.
- Open `tb_Processor_Program.v` in VS Code and observe how it loads a .txt file and runs the program in that file.
- Make a copy of the `testProgramA()` task in the test bench and modify the copy to run the code you wrote above.
  - HINT: you can use `CHECK_MEM()` to check contents of memory in your test bench. Do this to see what the array values are after the program runs.
  - HINT: `testProgramA` takes an argument and an expected result; you can remove those from your copy for this test.
- For the last practical, you used your `relPrime` and `gcd` program. Assemble your code again for those procedures into something that your processor can run, but this time do not add the `nop` instructions to eliminate hazards. Put that code in the `test_asm` folder in your git repo, replacing the code you added `nop` instructions to.
  - Add, commit, and push your assembly (.asm file) and the assembled code (.txt file).
- You should already have a copy of the `testProgramA()` task in the test bench that will run your `relPrime` program.
  - Verify that your newly assembled versions (without `nop`s to reduce hazards) run and produce the right answers.
- (Q) On the worksheet, explain how you plan to test that `relPrime` works; specifically, how will you pass the input argument to your program from the test bench, and how will your test bench know when the program has finished running (so it can check the result)?
  - There are many ways to do this; think about the Input/Output lecture from class for a few ideas, or think about how you could tell that the program is done by inspecting a register or the PC.
- Test your relPrime program on your processor with many inputs, including at least these three:
  - `relPrime(6) = 5`
  - `relPrime(5040) = 11`
  - `relPrime(30030) = 17`
Design a new instruction
Similar to what you did in Practical 6, your last task is to design a new instruction and implement it in your pipeline. You need to provide clear documentation for how it will work, and justify its inclusion in the instruction set.
- As you plan your design you should consider inventing an instruction that makes relPrime run faster (this generally would combine multiple instructions into one new instruction).
- (Q) Document the design and format (in the practical worksheet) and explain how you plan to resolve hazards in the pipeline.
- maybe add a stage to support extra work
- or stall the pipeline
- or add more hardware to existing stages
- (Q) Explain how you expect the new instruction to impact the performance of your processor.
- Implement your design.
- Run relPrime with your new instruction (you'll have to rewrite relPrime; make sure you keep both versions in your repository).
- (Q) Compare the two runtimes (number of cycles for each run, before and after your new instruction).
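To compare cycle counts, one option is a small counter that runs from reset until your program signals completion. The sketch below is an assumption-laden illustration: the `done` input stands for whatever "program finished" condition you chose for your relPrime tests, and the module name and ports are hypothetical. A counter inside the test bench itself would work just as well.

```verilog
// Hypothetical cycle counter for runtime comparisons.
module CycleCounterSketch (
    input  wire        clk,
    input  wire        reset,   // synchronous, clears the count
    input  wire        done,    // high once the program has finished
    output reg  [31:0] cycles
);
    always @(posedge clk) begin
        if (reset)
            cycles <= 32'd0;
        else if (!done)
            cycles <= cycles + 32'd1;  // freezes once done goes high
    end
endmodule
```

Run both versions of relPrime with the same input and record `cycles` after `done` is asserted; the difference is the saving (or cost) of the new instruction for that input.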
BONUS: Implement Memory-Mapped Input/Output (MMIO)
We discussed I/O in class; one way of implementing I/O is Memory-Mapped I/O. For extra points on this practical you can implement MMIO. You will need to write a test bench to show this works. If you do this you need to do the following:
- Add a datapath drawing to the worksheet which shows the modifications for MMIO.
- Put a Test Plan (following the format from previous practicals) together to show that I/O works.
- Include a clear screenshot of a waveform in your worksheet that shows that the IO succeeded. You should annotate this waveform to indicate key events (e.g. point an arrow at a signal when an input number gets into a register.)
Full credit will only be awarded if you communicate how this works sufficiently in your worksheet. The graders will not look at your code for this problem.
This is a challenge problem. There is less support for it, and you are expected to take ownership if you want to complete this challenge.
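If you want a mental model before drawing your datapath, MMIO usually comes down to an address decoder around data memory. The sketch below is deliberately minimal and entirely hypothetical: the two addresses, the port names, and the single input and output word are assumptions chosen for illustration, not a required design.

```verilog
// Hypothetical MMIO address decode around data memory.
module MmioDecodeSketch (
    input  wire        clk,
    input  wire        MemWrite,       // store in the MEM stage
    input  wire [31:0] addr,           // ALUOut used as the memory address
    input  wire [31:0] write_data,     // store data
    input  wire [31:0] mem_read_data,  // data memory's read output
    input  wire [31:0] io_in,          // external input port
    output wire [31:0] load_data,      // what a load actually receives
    output wire        mem_we,         // write enable passed to data memory
    output reg  [31:0] io_out          // external output port
);
    // Assumed addresses, chosen outside the program's data region.
    localparam IO_IN_ADDR  = 32'hFFFF0000;
    localparam IO_OUT_ADDR = 32'hFFFF0004;

    wire is_io = (addr == IO_IN_ADDR) || (addr == IO_OUT_ADDR);

    // Loads from the input address read the port instead of memory.
    assign load_data = (addr == IO_IN_ADDR) ? io_in : mem_read_data;

    // Stores to I/O addresses must not reach data memory.
    assign mem_we = MemWrite && !is_io;

    initial io_out = 32'd0;

    // Stores to the output address update the output port.
    always @(posedge clk) begin
        if (MemWrite && (addr == IO_OUT_ADDR))
            io_out <= write_data;
    end
endmodule
```

With a decoder like this, a test bench can drive `io_in` to pass an argument and watch `io_out` for the result, which is also one way to produce the annotated waveform the worksheet asks for.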
Working Ahead
Take a look at Practical 10 if you want to work ahead. This is mostly creating a presentation.
Submission and Grading
Functional Requirements
At the end of the practical you should have done these things:
- Implement data forwarding in `Processor.v` and pass the following test bench tasks:
  - `test_write_then_read_hazard_detection`
  - `test_WB_to_EX_fwd`
  - `test_MEM_to_EX_fwd`
  - `test_sw_forwarding`
- Implement pipeline stalls in `Processor.v` and pass the following test bench tasks:
  - `test_lw_stall`
- Handle additional hazard cases and create test tasks for them:
  - `lw` -> branch stalling
  - Forwarding into branches
  - Forwarding from `lui` and jumps
- Run `relPrime(5040)` without artificially-added `nop` instructions
- OPTIONALLY implement MMIO
- Complete and submit the Practical Worksheet.
Git Requirements
Remember, do not add and commit every single file ModelSim creates. Only add, commit, and push .v, .do, .asm, .txt, and .mpf files.
In addition to the list below, you should regularly commit and push whenever you fix a bug, work to a stopping point, or make any incremental updates. At minimum, you must have at least 5 commits in your repo for this practical:
- Git commit 1: upon completion of data forwarding
- Git commit 2: upon completion of stalling (because of lw)
- Git commit 3: upon completion and testing of additional hazard cases
- Git commit 4: upon completion and testing of `relPrime`
- Git commit 5: upon completion and testing of your new instruction
Since this is a team-based practical, there should be numerous iterative commits from each team member.
Worksheet Requirement
All the practicals for CSSE232 have these general requirements:
General Requirements for all Practicals
- The solution fits the need
- Aspects of performance are discussed
- The solution is tested for correctness
- The submission shows iteration and documentation
Some practicals will hit some of these requirements more than others. But you should always be thinking about them.
(Q) Complete the practical worksheet and write your final git commit on the worksheet where required.
Final Checklist
- Verify that your code compiles and your tests pass (or at least run).
- Verify your Verilog code is committed and the commits are pushed to GitHub.
- Submit your completed worksheet to Gradescope.
Grading Breakdown
| Practical 9 Rubric items | Possible Points | Weight |
|---|---|---|
| Worksheet | 86 | 52% |
| Code | 80 | 48% |
| Total | 166 | 100% |
