Practical 7 Building a Pipelined Processor

Objectives

This section is not a list of tasks for you to do. It is a list of skills you will have or things you will know after you complete the practical.

Following completion of this practical you should be able to:

Construct a Verilog implementation of a very basic pipelined processor that supports R-type, I-type, lw, sw, and U-type RISC-V instructions.
Trace code as it executes through a simulated pipelined processor.
Use waveform diagrams to debug a processor implementation.
Use Verilog test benches and a testing framework to test a processor implementation.

Guidelines

Because you will be iteratively adding functionality to one processor module, we strongly recommend that you periodically add and commit your progress to git as a backup.
Here is a reasonable quick-reference for Verilog: Compact Summary

Time estimate: Each of the pipeline practicals took students in previous terms ~7-11 hours per team member for a team of 3. This will vary based on your familiarity with Verilog and the pipelined architecture covered in class.

Preliminary Tasks

Follow this sequence of instructions to complete the practical.

Obtain the worksheet.

Obtain your `RISC-V-pipelined-processor` git repo

First, get your team name from your professor (it is probably on Moodle or CATME). Once you have been assigned a team name, you can create your repository via GitHub Classroom.

When you follow the link to get your repo you need to enter your team name exactly as your professor provided it.

Only the first person in your group to set up the repo needs to do this; teammates who join later can select the correct team from the list of teams.

Get your repo at this link. Don't click that link without reading the sentences above.

Repository Overview

Before diving in, let’s take a quick overview of the files in your repository:

vunit.v: the testing suite. This should be familiar from the previous practicals.
Processor.v: the main processor module. Reference one of your group members' single-cycle versions of this file and import the necessary sub-modules (ALU, etc.) from the single-cycle processor as a starting point.
Buffer_*.v: These files are the pipeline stage registers that sit between each pipeline stage. For example, Buffer_IF_ID.v sits between the fetch and decode stages of the pipeline.
opcodes.do: This is a waveform config that will add RISC-V opcodes and other shortcuts to the radix types in your waveform view.
A few testbenches:
- tb_Pipe_base_nohaz.v: testbench for practical 7
- tb_Pipe_branch_jump_nohaz.v: testbench for practical 8
- tb_Pipe_hazards.v: testbench for practical 9
- tb_Processor_Program.v: a testbench to run a single program on your processor. We'll use this in practical 9.
- pipeline_test_tools.vh: a testbench header file you can include in your tests to add convenient macros that make writing tests easier.
A directory test_asm that contains all the programs you will run in the test benches.

Building a pipelined RISC-V Processor

In this and the remaining practicals, planning before implementation is going to be a core exercise. Before diving into integrating the buffer files into your Processor.v, take the time needed to ensure your group understands each wire and detail of the basic pipelined processor.

The more time you spend familiarizing yourself with this datapath, the fewer potential bugs and accidental errors you will encounter when you wire the processor together. For complex, intricate projects like these, "measure twice, cut once" will be the main theme.

While this is no longer required by the worksheet, tracing out the datapath is going to greatly help you organize and check your work.

Implement R-types

(Q) In the worksheet, plan out what will be needed in each of the Buffer_* pipeline stage registers. For instance, the IF_ID stage register will need to hold the 32-bit instruction value (and the PC), but not much else.
Implement your pipeline stage registers' contents.
- Be sure the clock is included as an input.
- For each "thing" in the pipeline stage register, make the output named that thing, and an input should have the same name with an _in suffix. For example:
```
input wire [31:0] inst_in,
output reg [31:0] inst,
```
- Use an initial begin to initialize each output value.
- In the body of the module, use an always block to copy the inputs to their corresponding outputs.
- Remember: Both datapath values and control signals will need to be passed via these stage registers.
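
As a rough illustration of the steps above, here is a minimal sketch of what an IF/ID stage register could look like. The module name Buffer_IF_ID_sketch is hypothetical, the instr/instr_in spelling is borrowed from the instantiation example later in this practical, and the all-zero initial and reset values are assumptions; the Buffer_IF_ID.v provided in your repo may expect something different.

```verilog
// Sketch only: one possible IF/ID stage register.
// Each stored "thing" has an output (PC, instr) and an input with an
// _in suffix (PC_in, instr_in).
module Buffer_IF_ID_sketch (
    input  wire        CLK,
    input  wire        reset,
    input  wire [31:0] PC_in,
    input  wire [31:0] instr_in,
    output reg  [31:0] PC,
    output reg  [31:0] instr
);
    // Give every output a known value at time 0 (zero is an assumption).
    initial begin
        PC    = 32'b0;
        instr = 32'b0;
    end

    // Copy inputs to outputs on the rising clock edge.
    always @(posedge CLK) begin
        if (reset) begin
            PC    <= 32'b0;
            instr <= 32'b0;
        end else begin
            PC    <= PC_in;
            instr <= instr_in;
        end
    end
endmodule
```

The other stage registers would follow the same pattern with more ports: one output plus one _in input per datapath value or control signal.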

Tip: debugging values

While values such as the PC are not needed in stages beyond Fetch (at least for now), passing them into each stage is helpful because it gives you more information as you debug the waveform. By passing the PC into each stage, you can keep track of the instruction each stage is handling. You may choose to add additional values as you see fit, even if they are unused.

Add instances of each pipeline stage register to Processor.v.
- read the comments in Processor.v as you do this: they will provide some suggestions.
Between the registers, instantiate the components you need
- Register File
- PC
- ALU
- No data memory yet (use DP_Memory.v from practical, but only connect the "A" ports.)
Connect the components to your pipeline stage registers. Use the datapath diagram as your guide.

Tip: Wire connection by port name

At this point, you’ll notice that a pipelined processor needs far more wire connections than the single-cycle processor: improved performance comes at the cost of increased complexity. It is a good idea to wire by port names instead of declaring a new wire every single time. So instead of:

    wire [31:0] instr;
    DP_Memory Mem(
        // Port A for instruction read
        .addr_a( PC_output[11:2] ),
        .we_a( 1'b0 ),
        .data_a( 32'b0 ),
        .clk_a( CLK ),
        .q_a( instr ),
        ...
    );
    Buffer_IF_ID IF_ID (
        .PC_in( PC_output ), .PC(),
        .instr_in( instr ), .instr(),
        .reset( reset ), .CLK( CLK )
    );

You can skip the wire declaration and directly wire Mem.q_a into instr_in:

    DP_Memory Mem(
        // Port A for instruction read
        .addr_a( PC_output[11:2] ),
        .we_a( 1'b0 ),
        .data_a( 32'b0 ),
        .clk_a( CLK ),
        .q_a(),   // left unconnected here; read below as Mem.q_a
        ...
    );
    Buffer_IF_ID IF_ID (
        .PC_in( PC_output ), .PC(),
        .instr_in( Mem.q_a ), .instr(),
        .reset( reset ), .CLK( CLK )
    );

This will greatly reduce the amount of clutter in your Processor.v.

NOTE: if you plan to try putting your processor onto an FPGA board to get extra credit, you cannot use this technique. Instead, you will need to declare tons of wires and connect them like before.

The Memory cycle will just pass data through from EX_MEM to MEM_WB for R-types.
Your WB cycle will refer to the register file in your ID cycle.

Clock Timing

Now that we are switching to a pipelined processor, we no longer need to worry about completing everything in a single clock cycle. Instead, we need to worry about a different kind of timing.

All pipeline stage registers (the PC is technically one of these) need to update on the rising edge of the clock. When the clock ticks, all the pipeline stages should update. This means everything between the pipeline stage registers (the register file, the ALU, the control unit, the instruction memory/data memory, etc.) should finish executing before the rising edge.

Asynchronous components do this automatically, but all clocked components need to be synchronized to the falling edge instead to ensure they are updated before the instructions move to the next stage.
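
One way to read the falling-edge rule for a clocked component such as the register file is sketched below. The module and port names are hypothetical, and the register file you built in the single-cycle practicals likely differs; the only point is the `@(negedge CLK)` on the write path, which lets a value written by WB be read by ID before the next rising edge.

```verilog
// Sketch only: a register file that writes on the falling edge and
// reads asynchronously. Names and widths are assumptions.
module RegFile_negedge_sketch (
    input  wire        CLK,
    input  wire        RegWrite,
    input  wire [4:0]  rs1,
    input  wire [4:0]  rs2,
    input  wire [4:0]  rd,
    input  wire [31:0] write_data,
    output wire [31:0] read_data1,
    output wire [31:0] read_data2
);
    reg [31:0] regs [0:31];
    integer i;

    initial begin
        for (i = 0; i < 32; i = i + 1)
            regs[i] = 32'b0;
    end

    // Asynchronous reads; x0 always reads as zero.
    assign read_data1 = (rs1 == 5'd0) ? 32'b0 : regs[rs1];
    assign read_data2 = (rs2 == 5'd0) ? 32'b0 : regs[rs2];

    // Write in the middle of the cycle (falling edge), so the new value
    // is stable before the stage registers capture on the rising edge.
    always @(negedge CLK) begin
        if (RegWrite && rd != 5'd0)
            regs[rd] <= write_data;
    end
endmodule
```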

Add control to your datapath

Use your single cycle control component!
Put it into ID
Write the EX/MEM/WB outputs into your ID_EX pipeline stage register. Do NOT directly wire the control outputs to their destination ports unless they are in the same cycle as control (ID).
Update all your stage registers to have control blocks to "carry" the control signals through the pipeline.
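
The idea of carrying control through the pipeline can be sketched as follows. This shows only the control portion of an ID_EX register; the module name and the signal names (ALUSrc, ALUOp, MemWrite, RegWrite, MemToReg) are assumptions based on common textbook naming, so substitute whatever your single-cycle control unit actually outputs.

```verilog
// Sketch only: control signals latched in ID_EX instead of being wired
// straight from the control unit to later stages.
module Buffer_ID_EX_ctrl_sketch (
    input  wire       CLK,
    // EX-stage control: consumed right after this register
    input  wire       ALUSrc_in,
    input  wire [1:0] ALUOp_in,
    // MEM-stage control: must also be carried by EX_MEM
    input  wire       MemWrite_in,
    // WB-stage control: must also be carried by EX_MEM and MEM_WB
    input  wire       RegWrite_in,
    input  wire       MemToReg_in,
    output reg        ALUSrc,
    output reg  [1:0] ALUOp,
    output reg        MemWrite,
    output reg        RegWrite,
    output reg        MemToReg
);
    initial begin
        ALUSrc   = 1'b0;
        ALUOp    = 2'b00;
        MemWrite = 1'b0;
        RegWrite = 1'b0;
        MemToReg = 1'b0;
    end

    always @(posedge CLK) begin
        ALUSrc   <= ALUSrc_in;
        ALUOp    <= ALUOp_in;
        MemWrite <= MemWrite_in;
        RegWrite <= RegWrite_in;
        MemToReg <= MemToReg_in;
    end
endmodule
```

In this arrangement each later stage register drops the signals that have already been used, so EX_MEM would carry only the MEM and WB groups, and MEM_WB only the WB group.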

Examine the test bench

tb_Pipe_base_nohaz.v -- This is the test bench you will be running. Open this in VS Code to see how it works. Notice there are lots of shortcuts like SET_REG and CHECK_REG that assign register values and check them. These are defined in another file, and to get this test to work with your processor you may need to edit the shortcuts in that other file.
- Notice how the timing for these tests is set up. This test bench runs five cycles to start up the processor (the first instruction has to completely pass through the pipeline), then it expects a new instruction to complete after each subsequent cycle. Aside from start-up time, this is similar to your single cycle tests. This is a good general format for other tests you may write.
pipeline_test_tools.vh -- This is that "other file" where the shortcuts are defined. You can learn how it works by reading the comments.
- Change any references to subresources of UUT so that they correctly reference your wires, registers, and other components. For example, you may need to change UUT.Reg to something else if your register file instance is not called Reg.
- There will be many comments in that file that give you suggestions of what to change. Read the comments.
- Whenever you make a change to this file, you need to tell ModelSim it changed by making a "ghost" edit to your test bench. You can add or remove a blank line in your test bench file to cause it to recompile.
Open up test_asm/test_pipe_type_nohaz.asm to look at the provided code you will use for the tests. Each line of code here tests something valuable. Review the code and ensure you understand it.
Before the .asm files are useful, you will need to assemble the files in this directory to test your processor. Use your assembler from practicals 1 and 2 to do this!
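
The start-up timing described above for the test bench (five cycles to fill the pipeline, then one result per cycle) can be seen in a toy model. The sketch below is not your processor and does not use the provided macros: it is a hypothetical five-register delay line with its own small test bench, meant only to show the pattern of waiting for the fill latency and then checking once per cycle.

```verilog
// Sketch only: a toy 5-deep pipeline that just delays its input.
module toy_pipe_sketch (
    input  wire       CLK,
    input  wire [7:0] d,
    output wire [7:0] q
);
    reg [7:0] stage [0:4];
    integer i;

    initial begin
        for (i = 0; i < 5; i = i + 1)
            stage[i] = 8'd0;
    end

    always @(posedge CLK) begin
        stage[0] <= d;
        for (i = 1; i < 5; i = i + 1)
            stage[i] <= stage[i-1];
    end

    assign q = stage[4];
endmodule

module tb_toy_pipe_sketch;
    reg        CLK;
    reg  [7:0] d;
    wire [7:0] q;

    toy_pipe_sketch UUT (.CLK(CLK), .d(d), .q(q));

    always #5 CLK = ~CLK;

    initial begin
        CLK = 1'b0;
        d   = 8'd1;
        // Fill the pipe: five rising edges, feeding a new value each cycle.
        repeat (5) begin
            @(posedge CLK);
            @(negedge CLK);   // change the input away from the sampling edge
            d = d + 8'd1;
        end
        // The first value has now reached the last stage.
        if (q !== 8'd1) $display("FAIL: expected 1, got %0d", q);
        else            $display("PASS: first value after 5 cycles");
        // From here on, one new value should appear every cycle.
        @(posedge CLK); #1;
        if (q !== 8'd2) $display("FAIL: expected 2, got %0d", q);
        else            $display("PASS: next value one cycle later");
        $finish;
    end
endmodule
```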

Test your R-types

Create a new ModelSim project. Call it Processor.mpf or some similar name.
- Add all the .v files
- compile them
- simulate/start the tb_Pipe_nohaz test bench.
Build a waveform to show all your stages.
- add a divider between each stage to make it clear where the stages are separated.
- Group signals strategically in a way that clearly tracks instructions moving through each cycle.
HINT: typing do opcodes.do in the console will add some new "Radix" values to the "radix" menu. This lets you display opcodes and functs as human-readable words.
Save your waveform file. (DO THIS!)
Run the test bench and fix any errors.
Test R-types using tb_Pipe_nohaz.v
- Remember, you might need to edit the pipeline_test_tools.vh file to make the tests work

You should aim to both pass the tests and become a master of interpreting ModelSim waveforms to debug. Both are necessary to ensure you have a working processor.

Add I-types (no data memory yet)

(Q) In the worksheet, plan out what will be needed in each of the Buffer_* pipeline stage registers for I-types.
Add any new traced wires or logic to your Verilog datapath. You'll need to add the ImmGen module.
Add control (or connect it)
- You'll need an ALUSrc mux
Next, prepare some I-type tests.
- Read the tests we've provided (look in the test_asm folder in your repo for the test_pipe_itype_nohaz.asm file).
- Assemble this asm file with your assembler
- then edit the test_I_type_nohaz() task in tb_Pipe_nohaz.v to load and use it. Look at the R-type task for an example.
- At the bottom of the tb_Pipe_nohaz.v test bench, there is an initial block that calls a bunch of sub tasks. Uncomment the line that says test_I_type_nohaz(); and the one after it that clears the pipe.
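
The ALUSrc mux mentioned in the steps above can be sketched in a few lines. The module and port names are hypothetical, and the polarity (1 selects the immediate) is the common textbook convention rather than something this practical specifies; match whatever your control unit produces.

```verilog
// Sketch only: selects the ALU's second operand in the EX stage.
module ALUSrc_mux_sketch (
    input  wire        ALUSrc,    // assumed: 1 = immediate, 0 = register
    input  wire [31:0] rs2_data,  // second register operand from ID_EX
    input  wire [31:0] imm,       // ImmGen output carried by ID_EX
    output wire [31:0] alu_b      // second ALU input
);
    assign alu_b = ALUSrc ? imm : rs2_data;
endmodule
```

Note that both data inputs and the select signal would come from the ID_EX stage register outputs, not directly from the register file, ImmGen, or control unit.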

Hint: Checking what is in all the stages of the pipeline in a test bench

Since there are a lot more moving parts in a pipelined processor, checking which instruction is in each stage of the processor will help you build confidence. The CHECK_PIPE_STAGES task is a quick way to check the opcode and funct3 in each stage. Use this to your advantage.

Here's the task in pipeline_test_tools.vh that can help:

    //
    // This task checks that provided instructions are in the right part
    // of the pipeline at the current time.
    // Use don't cares (x's) for empty stages if applicable.
    // * Opcode constants are defined in opcodes.vh.
    // * Pass in 7'hxx to match *ANY* opcode (same as using ANY_OPCODE)
    //
    task CHECK_PIPE_STAGES (
        input [6:0] IF_op,  input [2:0] IF_funct3,
        input [6:0] ID_op,  input [2:0] ID_funct3,
        input [6:0] EX_op,  input [2:0] EX_funct3,
        input [6:0] MEM_op, input [2:0] MEM_funct3,
        input [6:0] WB_op,  input [2:0] WB_funct3
    );

Run tests
- Recompile everything in ModelSim.
- Restart the simulation, and run the tests. After the R-type tests complete, the I-type tests should run.
(Q) On the worksheet, answer the questions about how you decided to test I-types and what you checked. Explain why the provided set of instructions is sufficient to test I-types, or how you changed it to be better.

Add `lw` and `sw`

Follow the same process to implement memory instructions.

Plan carefully what control signals need to be carried through the pipeline stage registers (and to which stage).
Be sure you are wiring the address port of the memory block correctly. Remember you won't use all 32 bits of the address! Refer back to practicals 4 and 5 for details.
- This is especially important when understanding the testbenches. An instruction such as sw t0, 40(x0) will be storing content into UUT.Mem.ram[10] and not UUT.Mem.ram[40]. See this in action by looking at the CHECK_MEM convenience macro in pipeline_test_tools.vh: it shifts the address 2 bits to the right to translate from byte address to word number.
Review and run the test_mem_type_nohaz task in tb_Pipe_nohaz.v. Remember to assemble the asm file into something you can load into memory before running the tests.
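
The byte-address to word-number translation described above can be checked with a tiny stand-alone simulation. The 10-bit slice [11:2] mirrors the PC_output[11:2] connection used for the instruction port earlier in this practical; whether your data port uses the same width is an assumption, and the module name is hypothetical.

```verilog
// Sketch only: shows why sw t0, 40(x0) lands in word 10 of the memory.
module tb_word_index_sketch;
    reg  [31:0] byte_addr;
    // Dropping the low two bits divides the byte address by 4.
    wire [9:0]  word_index = byte_addr[11:2];

    initial begin
        byte_addr = 32'd40;   // address computed by: sw t0, 40(x0)
        #1;
        // 40 = 0b101000, so bits [11:2] are 0b1010 = 10
        $display("byte address %0d -> word index %0d", byte_addr, word_index);
        $finish;
    end
endmodule
```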

Add U-type instructions

Follow the same process to implement lui.

NOTE: You need to carry the immediate value all the way through to WB stage. Be sure to update your pipeline stage registers accordingly.
Review and run the test_lui_nohaz task in tb_Pipe_nohaz.v. Remember to assemble the asm file into something you can load into memory before running the tests.
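
One possible way to use the immediate once it reaches WB is a three-way write-back mux, sketched below. The 2-bit select WBSel, its encoding, and all port names are hypothetical; the sketch also assumes your ImmGen already produces the U-type immediate in its final position (upper 20 bits, low 12 bits zero). Your design may instead widen an existing MemToReg signal or route lui differently.

```verilog
// Sketch only: chooses what gets written back to the register file.
module WB_mux_sketch (
    input  wire [1:0]  WBSel,       // hypothetical select from MEM_WB
    input  wire [31:0] alu_result,  // R-type and I-type results
    input  wire [31:0] mem_data,    // lw data
    input  wire [31:0] imm,         // immediate carried through to WB (lui)
    output reg  [31:0] write_data
);
    always @(*) begin
        case (WBSel)
            2'b00:   write_data = alu_result;
            2'b01:   write_data = mem_data;
            2'b10:   write_data = imm;
            default: write_data = 32'b0;
        endcase
    end
endmodule
```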

Working Ahead

Work ahead (go start Practical 8)

Submission and Grading

Functional Requirements

At the end of the practical you should have done these things:

Implement Processor.v to support:
- R-types, I-types, lw and sw, lui
Update pipeline_test_tools.vh to define macros that reflect your design
Implement the test_I_type_nohaz testbench task
Pass the following testbench tasks:
- test_R_type_nohaz
- test_I_type_nohaz
- test_mem_type_nohaz
- test_lui_nohaz
Complete and submit the Practical Worksheet.

Git Requirements

Remember: do not add and commit every single file ModelSim creates. Only add, commit, and push .v, .do, and .mpf files.

In addition to the list below, you should regularly commit and push whenever you fix a bug, work to a stopping point, or make any incremental updates. At minimum, you must have at least 4 commits in your repo for this practical:

Git commit 1: upon completing and testing R-types.
Git commit 2: upon completing and testing I-types.
Git commit 3: upon completing and testing lw and sw.
Git commit 4: upon completing and testing lui.

Since this is a team-based practical, there should be numerous iterative commits from each team member.

Worksheet Requirement

All the practicals for CSSE232 have these general requirements:

General Requirements for all Practicals

The solution fits the need
Aspects of performance are discussed
The solution is tested for correctness
The submission shows iteration and documentation

Some practicals will hit some of these requirements more than others. But you should always be thinking about them.

(Q) Complete the practical worksheet. Specifically, answer the performance questions on page 9 and the iteration and reflection questions on pages 10 and 11, and write your final git commit on the worksheet where required.

Final Checklist

Verify that your code compiles and your tests pass (or at least run).
Verify your Verilog code is committed and the commits are pushed to GitHub.
Submit your completed worksheet to Gradescope.

Grading Breakdown

Practical 7 Rubric items	Possible Points	Weight
Worksheet	80	47%
Code	90	53%
Total out of		100%