Name: Box: Date:

## HW15 solution

1. (10 points) Draw the pipeline diagram for the following code running on a pipelined MIPS processor. Identify all of the data dependencies (draw forwards and stalls).

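| Instruction / cycle | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| add $3, $4, $2 | IF | ID | EX | ME | WB |  |  |  |  |
| sub $5, $3, $1 |  | IF | ID | EX | ME | WB |  |  |  |
| lw $6, 200($3) |  |  | IF | ID | EX | ME | WB |  |  |
| add $7, $3, $6 |  |  |  | IF | ID | nop | EX | ME | WB |

Forwards (the arrows in the diagram):

- $3 from the first add to the sub: EX/MEM -> EX, cycle 4.
- $3 from the first add to the lw: MEM/WB -> EX, cycle 5.
- $6 from the lw to the second add: MEM/WB -> EX, cycle 7, after a one-cycle stall (the nop in cycle 6).

Which of the above dependencies are data hazards that will be resolved via forwarding? Which dependencies are data hazards that will cause a stall?

The sub and lw depend on $3 from the first add, so that value must be forwarded. The second add depends on $6 from the lw, but since the loaded data isn't available until the MEM stage, a one-cycle stall is needed. The data can be forwarded after the stall. The second add also reads $3, but it reads registers (ID) in cycle 5, the same cycle in which the first add writes $3 back (WB); assuming the register file writes in the first half of a cycle and reads in the second half, that dependency needs neither forwarding nor a stall.

2. (5 points) Modify the following code to make use of the delayed branch slot on a pipelined MIPS processor.

loop: lw   $2, 100($3)
sub  $4, $4, $5
beq  $3, $4, loop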

There are several possibilities:

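loop: sub  $4, $4, $5
beq  $3, $4, loop
lw   $2, 100($3)     #this is the delay slot

The lw can move into the delay slot because neither the sub nor the beq reads $2, and $3 never changes inside the loop.

or something like:

sub  $4, $4, $5     #prep loop condition
loop: lw   $2, 100($3)
beq  $3, $4, loop
sub  $4, $4, $5     #this is the delay slot
add  $4, $4, $5     #fix the extra sub
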
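To sanity-check that both rewrites behave like the original loop, here is a minimal Python sketch (not part of the handout) that interprets just the instructions used here with one branch delay slot: the instruction after a taken beq always executes before the jump. The names run, original, option_1 and option_2, and the starting register and memory values, are hypothetical; the model tracks only final register values, not pipeline timing.

```python
def run(program, regs, mem, max_steps=1000):
    """Execute `program` with one branch delay slot; return the final registers.

    Each instruction is a tuple (label or None, opcode, *operands).
    """
    regs = dict(regs)
    labels = {ins[0]: i for i, ins in enumerate(program) if ins[0]}
    pc = 0
    pending_target = None  # set by a taken beq, applied after the delay slot
    for _ in range(max_steps):
        if pc >= len(program):
            return regs
        _, op, *args = program[pc]
        next_pc = pc + 1
        if op == "lw":
            rt, offset, rs = args
            regs[rt] = mem[regs[rs] + offset]
        elif op == "add":
            rd, rs, rt = args
            regs[rd] = regs[rs] + regs[rt]
        elif op == "sub":
            rd, rs, rt = args
            regs[rd] = regs[rs] - regs[rt]
        elif op == "beq":
            rs, rt, label = args
            if regs[rs] == regs[rt]:
                pending_target = labels[label]
        elif op != "nop":
            raise ValueError(f"unsupported opcode: {op}")
        if op != "beq" and pending_target is not None:
            # The instruction just executed was the delay slot; now branch.
            next_pc, pending_target = pending_target, None
        pc = next_pc
    raise RuntimeError("step limit reached; the loop may never exit")


original = [  # loop as given, with an empty (nop) delay slot
    ("loop", "lw", "$2", 100, "$3"),
    (None, "sub", "$4", "$4", "$5"),
    (None, "beq", "$3", "$4", "loop"),
    (None, "nop"),
]
option_1 = [  # lw moved into the delay slot
    ("loop", "sub", "$4", "$4", "$5"),
    (None, "beq", "$3", "$4", "loop"),
    (None, "lw", "$2", 100, "$3"),
]
option_2 = [  # sub hoisted, duplicated into the slot, then undone after the loop
    (None, "sub", "$4", "$4", "$5"),
    ("loop", "lw", "$2", 100, "$3"),
    (None, "beq", "$3", "$4", "loop"),
    (None, "sub", "$4", "$4", "$5"),
    (None, "add", "$4", "$4", "$5"),
]

# Hypothetical start state: $4 - $5 == $3 on the first pass, so the branch is
# taken once and the loop body runs twice.
start = {"$2": 0, "$3": 8, "$4": 9, "$5": 1}
mem = {108: 42}  # the word at 100($3)
results = [run(p, start, mem) for p in (original, option_1, option_2)]
assert results[0] == results[1] == results[2]
print(results[0])  # {'$2': 42, '$3': 8, '$4': 7, '$5': 1}
```

All three versions end with the same registers; the rewrites simply replace the wasted nop with useful work.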
1. (10 points) Consider the following code to be run on a pipelined MIPS processor:

lw   $4, 4($5)
lw   $3, 0($5)
add  $7, $7, $3
addi $5, $5, 4
sw   $6, 0($5)
add  $8, $8, $4
beq  $7, $8, loop
nop #this delay slot is currently empty

a. Reorder the instructions to maximize performance. Performance may already be maximized.

lw   $4, 4($5)
lw   $3, 0($5)
add  $8, $8, $4
add  $7, $7, $3
addi $5, $5, 4
beq  $7, $8, loop
sw   $6, 0($5)     #this is the delay slot

b. Reorder the instructions to minimize performance. Performance may already be minimized.

No change or something like:

lw   $4, 4($5)
add  $8, $8, $4
lw   $3, 0($5)
add  $7, $7, $3
addi $5, $5, 4
sw   $6, 0($5)
beq  $7, $8, loop
nop #this delay slot is currently empty
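
The stall counts behind parts a and b can be checked with a small timing model. This Python sketch assumes the textbook 5-stage pipeline with full forwarding, a one-cycle load-use stall, and branches resolved in ID with forwarding into ID (so a beq stalls one cycle after an ALU producer directly before it and two after a load directly before it). These timing rules and the helper names Instr, count_stalls, alu, load, store and branch are illustrative assumptions, not the handout's. Under them the original order and the part b order each suffer 2 stalls plus an empty delay slot, while the part a order has no stalls and fills the slot; the same model also reproduces the single load-use stall in the pipeline-diagram problem.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    text: str                   # assembly text, for readability only
    dst: Optional[str] = None   # register written, if any
    srcs: Tuple[str, ...] = ()  # registers read
    is_load: bool = False
    is_branch: bool = False


def alu(text, dst, *srcs):
    return Instr(text, dst, srcs)


def load(text, dst, base):
    return Instr(text, dst, (base,), is_load=True)


def store(text, data, base):
    return Instr(text, None, (data, base))


def branch(text, a, b):
    return Instr(text, None, (a, b), is_branch=True)


NOP = Instr("nop")


def count_stalls(program):
    """Stall cycles for an in-order 5-stage pipeline with full forwarding.

    An ALU result can be forwarded from the cycle after the producer's EX;
    a loaded value from the cycle after its MEM. Ordinary instructions need
    operands in EX; a branch compares (and so needs them) in ID.
    """
    id_cycle = []     # cycle in which each instruction occupies ID
    last_writer = {}  # register -> index of its most recent producer
    for i, ins in enumerate(program):
        cycle = id_cycle[-1] + 1 if id_cycle else 0  # slot if nothing stalls
        for reg in ins.srcs:
            p = last_writer.get(reg)
            if p is None:
                continue
            producer_ex = id_cycle[p] + 1
            ready = producer_ex + (2 if program[p].is_load else 1)
            cycle = max(cycle, ready if ins.is_branch else ready - 1)
        id_cycle.append(cycle)
        if ins.dst is not None:
            last_writer[ins.dst] = i
    return id_cycle[-1] - (len(program) - 1) if program else 0


LW4 = load("lw $4, 4($5)", "$4", "$5")
LW3 = load("lw $3, 0($5)", "$3", "$5")
ADD7 = alu("add $7, $7, $3", "$7", "$7", "$3")
ADDI5 = alu("addi $5, $5, 4", "$5", "$5")
SW6 = store("sw $6, 0($5)", "$6", "$5")
ADD8 = alu("add $8, $8, $4", "$8", "$8", "$4")
BEQ = branch("beq $7, $8, loop", "$7", "$8")

original = [LW4, LW3, ADD7, ADDI5, SW6, ADD8, BEQ, NOP]
maximized = [LW4, LW3, ADD8, ADD7, ADDI5, BEQ, SW6]  # part a
minimized = [LW4, ADD8, LW3, ADD7, ADDI5, SW6, BEQ, NOP]  # part b
diagram = [
    alu("add $3, $4, $2", "$3", "$4", "$2"),
    alu("sub $5, $3, $1", "$5", "$3", "$1"),
    load("lw $6, 200($3)", "$6", "$3"),
    alu("add $7, $3, $6", "$7", "$3", "$6"),
]

for name, prog in (("original", original), ("part a", maximized),
                   ("part b", minimized), ("pipeline diagram", diagram)):
    print(f"{name}: {len(prog)} instructions, {count_stalls(prog)} stall(s)")
```

The counts depend on the branch-resolution assumption: if the branch were resolved later in the pipeline, the beq in the original order would no longer need to wait for the add just before it.
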
2. (10 points) We wish to add a variant of the lw (load word) instruction, which increments the index register after loading the word from memory. This instruction (l_inc) corresponds to the following two instructions:

lw   $rt, L($rs)
addi $rs, $rs, 4

Describe the changes you would need to make to the datapath. You may need to make major changes to the pipeline! Do your changes affect other instructions?

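Insert a new adder in EX to compute $rs + 4 (the ALU is still busy computing the address L + $rs). Carry that result through the EX/MEM and MEM/WB pipeline registers to WB, and add a second write port to the register file so that $rt and $rs can both be written in the same cycle. More forwarding hardware will need to be added so that other instructions interacting with the new one (for example, one that reads the incremented $rs right after an l_inc) get the correct values.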

or

Use the ALU for two cycles. That is, the new instruction gets two EX stages in a row: one to compute the address L + $rs and one to compute $rs + 4. If the next instruction needs the ALU for its EX, stall that instruction; if not, proceed without a stall. Similarly, the new instruction needs two writebacks. Again, if the next instruction doesn't need WB, just use the register file for two cycles; otherwise, stall the next instruction. This would also need new forwarding and hazard logic.
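
Both designs exist because one l_inc performs two register writes. The Python sketch below models only the instruction's architectural effect, following the two-instruction expansion in order, not the pipeline hardware; the function name l_inc as a Python function, the offset parameter (the handout's L), and the sample registers and memory are hypothetical.

```python
def l_inc(regs, mem, rt, rs, offset):
    """Hypothetical model of `l_inc $rt, offset($rs)`.

    Follows the two-instruction expansion in order:
        lw   $rt, offset($rs)
        addi $rs, $rs, 4
    so a single instruction produces two register writes.
    """
    regs = dict(regs)
    regs[rt] = mem[regs[rs] + offset]  # write 1: the loaded word
    regs[rs] = regs[rs] + 4            # write 2: the post-incremented index
    return regs


# Hypothetical usage: sum a three-word array by walking $s0 through it.
mem = {0x1000: 10, 0x1004: 20, 0x1008: 30}
regs = {"$t0": 0, "$s0": 0x1000}
total = 0
for _ in range(3):
    regs = l_inc(regs, mem, "$t0", "$s0", 0)
    total += regs["$t0"]
print(total, hex(regs["$s0"]))  # 60 0x100c
```

Because the writes happen in that order, the expansion also implies that if $rt and $rs are the same register, it ends up holding the loaded word plus 4, a case the hazard logic for either design has to respect.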