Name: Box: Date:

HW15 solution

  1. (10 points) Draw the pipeline diagram for the following code running on a pipelined MIPS processor. Identify all of the data dependencies (draw forwards and stalls).

    add $3, $4, $2    IF   ID   EX   ME   WB
                                   |    |
                                   v    |
    sub $5, $3, $1         IF   ID   EX | ME   WB
                                        |
                                        v
    lw  $6, 200($3)             IF   ID   EX   ME   WB
                                                   |
                                                   v
    add $7, $3, $6                   IF   ID   nop  EX   ME   WB
    

    Which of the above dependencies are data hazards that will be resolved via forwarding? Which dependencies are data hazards that will cause a stall?

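    The sub and lw both depend on $3 from the first add, so that value must be forwarded to each of them. The second add depends on $6 from lw, but since the loaded data isn't available until the end of the MEM stage, a one-cycle stall is needed. The data can be forwarded after the stall. The second add also reads $3, but the first add writes it back in the same cycle the second add reads its registers, so (assuming the register file writes in the first half of the cycle and reads in the second half) that dependency needs neither forwarding nor a stall.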
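    The classification above can be sketched in a few lines of Python. This is only a rough model, not part of the assignment: the Instr tuple, the classify function, and the "distance in program order" rule are assumptions made for illustration. It assumes a classic five-stage pipeline with full forwarding and a register file that writes before it reads within a cycle, and it ignores the extra distance that earlier stalls would add.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str
    dest: Optional[int]      # register written, or None
    srcs: Tuple[int, ...]    # registers read
    is_load: bool = False


def classify(program):
    """Return (producer, consumer, register, kind) for each RAW dependency."""
    found = []
    last_writer = {}  # register number -> index of most recent writer
    for i, ins in enumerate(program):
        for reg in ins.srcs:
            if reg not in last_writer:
                continue
            j = last_writer[reg]
            distance = i - j
            if program[j].is_load and distance == 1:
                kind = "stall+forward"   # load data is ready only after MEM
            elif distance <= 2:
                kind = "forward"         # from EX/MEM or MEM/WB register
            else:
                kind = "register file"   # producer has already written back
            found.append((j, i, reg, kind))
        if ins.dest is not None:
            last_writer[ins.dest] = i
    return found


PROGRAM = [
    Instr("add $3, $4, $2", 3, (4, 2)),
    Instr("sub $5, $3, $1", 5, (3, 1)),
    Instr("lw  $6, 200($3)", 6, (3,), is_load=True),
    Instr("add $7, $3, $6", 7, (3, 6)),
]

for producer, consumer, reg, kind in classify(PROGRAM):
    print(f"${reg}: {PROGRAM[producer].text} -> {PROGRAM[consumer].text}: {kind}")
```

    Under these assumptions the sketch reports two forwards on $3, one plain register-file read of $3, and one stall-then-forward on $6, which matches the diagram.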

  2. (5 points) Modify the following code to make use of the delayed branch slot on a pipelined MIPS processor.

    loop: lw   $2, 100($3)
          sub  $4, $4, $5
          beq  $3, $4, loop

    There are several possibilities:

    loop: sub  $4, $4, $5
          beq  $3, $4, loop
          lw   $2, 100($3)   # this is the delay slot

    or something like

          sub  $4, $4, $5     # prep loop condition
    loop: lw   $2, 100($3)
          beq  $3, $4, loop
          sub  $4, $4, $5     # this is the delay slot
          add  $4, $4, $5     # fix the extra sub
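
    One way to convince yourself that both rewrites behave like the original loop is a toy interpreter. The sketch below is an illustration only: the run function, the tuple encoding of instructions, and the starting register values are all made up for this check. It does not model memory contents (lw is only counted), and it assumes the instruction after a branch always executes when delay_slot is true.

```python
def run(program, regs, delay_slot, max_steps=1000):
    """Execute (label, op, args) tuples; return (final registers, lw count)."""
    labels = {lab: i for i, (lab, _, _) in enumerate(program) if lab}
    regs = dict(regs)
    loads = 0
    pc = 0
    pending = None  # branch target to jump to after the delay slot
    steps = 0
    while pc < len(program):
        steps += 1
        assert steps <= max_steps, "program did not terminate"
        _, op, args = program[pc]
        next_pc = pc + 1
        if pending is not None:          # we are in a delay slot
            next_pc, pending = pending, None
        if op == "lw":
            loads += 1                   # loaded value is not modeled
        elif op == "sub":
            d, s, t = args
            regs[d] = regs[s] - regs[t]
        elif op == "add":
            d, s, t = args
            regs[d] = regs[s] + regs[t]
        elif op == "beq":
            s, t, target = args
            if regs[s] == regs[t]:
                if delay_slot:
                    pending = labels[target]
                else:
                    next_pc = labels[target]
        pc = next_pc
    return regs, loads


ORIGINAL = [("loop", "lw", (2, 100, 3)),
            (None, "sub", (4, 4, 5)),
            (None, "beq", (3, 4, "loop"))]
MOVED_LW = [("loop", "sub", (4, 4, 5)),
            (None, "beq", (3, 4, "loop")),
            (None, "lw", (2, 100, 3))]      # delay slot
MOVED_SUB = [(None, "sub", (4, 4, 5)),
             ("loop", "lw", (2, 100, 3)),
             (None, "beq", (3, 4, "loop")),
             (None, "sub", (4, 4, 5)),      # delay slot
             (None, "add", (4, 4, 5))]      # undo the extra sub

INIT = {3: 10, 4: 15, 5: 5}  # arbitrary values: the loop body runs twice
print(run(ORIGINAL, INIT, delay_slot=False))
print(run(MOVED_LW, INIT, delay_slot=True))
print(run(MOVED_SUB, INIT, delay_slot=True))
```

    With these starting values all three versions perform two loads and leave $4 equal to 5. A single test case is not a proof, but it does exercise both the taken and the not-taken branch.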

  3. (10 points) Consider the following code to be run on a pipelined MIPS processor:

    lw   $4, 4($5)
    lw   $3, 0($5)
    add  $7, $7, $3
    addi $5, $5, 4
    sw   $6, 0($5)
    add  $8, $8, $4
    beq  $7, $8, loop
    nop #this delay slot is currently empty

    a. Reorder the instructions to maximize performance. Performance may already be maximized.

    lw   $4, 4($5)
    lw   $3, 0($5)
    add  $8, $8, $4
    add  $7, $7, $3
    addi $5, $5, 4
    beq  $7, $8, loop
    sw   $6, 0($5)     #this is the delay slot

    b. Reorder the instructions to minimize performance. Performance may already be minimized.

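    No change is acceptable. An ordering like the following is slower still, because each add immediately follows the lw that produces its operand, so both adds incur a load-use stall: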

    lw   $4, 4($5)
    add  $8, $8, $4
    lw   $3, 0($5)
    add  $7, $7, $3
    addi $5, $5, 4
    sw   $6, 0($5)
    beq  $7, $8, loop
    nop #this delay slot is currently empty
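
    The three orderings in this question can be compared with a rough cycle count. The sketch below is a simplification and not part of the assignment: the slots_per_iteration function and the tuple encoding are made up for illustration. It charges one slot per instruction (a nop included) plus one extra slot whenever an instruction reads the register loaded by the lw directly before it, and it ignores every other source of stalls.

```python
def slots_per_iteration(program):
    """Count issue slots: one per instruction plus one per load-use stall."""
    slots = 0
    prev = None
    for op, dest, srcs in program:
        # Load-use hazard: the previous instruction is a load whose
        # destination this instruction reads.
        if prev is not None and prev[0] == "lw" and prev[1] in srcs:
            slots += 1
        slots += 1
        prev = (op, dest, srcs)
    return slots


# (opcode, destination register or None, source registers)
LW4 = ("lw", 4, (5,))        # lw   $4, 4($5)
LW3 = ("lw", 3, (5,))        # lw   $3, 0($5)
ADD7 = ("add", 7, (7, 3))    # add  $7, $7, $3
ADDI5 = ("addi", 5, (5,))    # addi $5, $5, 4
SW = ("sw", None, (6, 5))    # sw   $6, 0($5)
ADD8 = ("add", 8, (8, 4))    # add  $8, $8, $4
BEQ = ("beq", None, (7, 8))  # beq  $7, $8, loop
NOP = ("nop", None, ())

AS_GIVEN = [LW4, LW3, ADD7, ADDI5, SW, ADD8, BEQ, NOP]
MAXIMIZED = [LW4, LW3, ADD8, ADD7, ADDI5, BEQ, SW]
MINIMIZED = [LW4, ADD8, LW3, ADD7, ADDI5, SW, BEQ, NOP]

print(slots_per_iteration(AS_GIVEN))   # 9: eight instructions, one stall
print(slots_per_iteration(MAXIMIZED))  # 7: seven instructions, no stalls
print(slots_per_iteration(MINIMIZED))  # 10: eight instructions, two stalls
```

    Under this model the reordering in part a saves two slots per iteration (one stall removed, one nop replaced by useful work), and the ordering in part b costs one more slot than the code as given.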

  4. (10 points) We wish to add a variant of the lw (load word) instruction, which increments the index register after loading the word from memory. This instruction (l_inc) corresponds to the following two instructions:

    lw   $rt, L($rs)
    addi $rs, $rs, 4

    Describe the changes you would need to make to the datapath. You may need to make major changes to the pipeline! Do your changes affect other instructions?

    Add a new adder to the EX stage to compute $rs + 4. Carry the result to WB, and add an extra write port to the register file to accommodate the second write. More forwarding hardware will need to be added to support other instructions interacting with the new one.

    or

    Use the ALU for two cycles. That is, the new instruction gets two EX stages in a row. If the next instruction needs the ALU for EX, stall that instruction. If not, proceed without stall. Similarly, the new instruction would need two writebacks. Again, if the next instruction doesn't need WB, just use the register file for two cycles. Otherwise, stall the next instruction. This would also need new forwarding and hazard logic.
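
    Both designs have to produce the same architectural effect: two register writes from one instruction. A minimal functional model of that effect, assuming l_inc behaves exactly like the lw/addi pair in the question, is sketched below. The l_inc function, the dictionary-based registers and memory, and the returned list of writes are illustrative choices, not a description of real hardware.

```python
def l_inc(regs, mem, rt, rs, offset):
    """Model l_inc $rt, offset($rs) as lw followed by addi $rs, $rs, 4.

    Returns the register writes in order, to make the need for two
    write-backs explicit.
    """
    address = regs[rs] + offset
    loaded = mem[address]
    regs[rt] = loaded            # first write: the lw half
    regs[rs] = regs[rs] + 4      # second write: the addi half
    return [(rt, regs[rt]), (rs, regs[rs])]


regs = {4: 0, 5: 100}
mem = {100: 7}
writes = l_inc(regs, mem, rt=4, rs=5, offset=0)
print(writes)
```

    Because the model follows the two-instruction sequence literally, using the same register for $rt and $rs leaves it holding the loaded value plus 4. A real design would have to decide whether to keep or forbid that case.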