# CSSE 232 - Computer Architecture I Rose-Hulman Institute of Technology Computer Science and Software Engineering Department 

Exam 1

Name: $\qquad$ Section: $1 \quad 2 \quad 3 \quad 4 \quad 5$

This exam is closed book. You are allowed to use the reference card from the book and one $8.5 " \times 11 "$ single sided page of hand written notes. You may not use a computer, phone, etc. during the examination.

You may use a calculator on this exam.
Write all answers on these pages. Be sure to show all work and document your code. Do not use instructions that we have not covered (e.g. no mul or div but you can use instructions like slli, srl, etc).

RISC-V code is judged both by its correctness and its efficiency. Unless otherwise stated, you may not use RISC-V pseudoinstructions when writing RISC-V code.

For Pass/Fail problems there will be a redo opportunity for partial credit on a future date. You must submit a good faith effort to qualify for the redo opportunity.

| Question | Points | Score |
| :---: | :---: | :---: |
| Problem 1 | 20 |  |
| Problem 2 | 15 |  |
| Problem 3 | 20 |  |
| Problem 4 | 25 |  |
| Problem 5 | 15 |  |
| Total: | 95 |  |

Problem 1. Consider the following RISC-V assembly code snippet disassembled from the xv 6 operating system, with partial machine language translations to the right. Note that the translation is in decimal unless specified otherwise.

| 0x00 | 0000 | Lo | bge |  | a | a, L8 | [__-_] |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $0 \times 0040$ | 0004 |  | addi | s1, | a1 | , 8 |  |
| 0x004 | 0008 |  | srai | s1, | a5 | 25, 12 |  |
| $0 \times 0040$ | 000c |  | addi | a1, | , a1 | 1, 16 |  |
| 0x004 | 0010 |  | jal |  | , L | Loop | [___] |
| $0 \times 0040$ | 0014 | L8: | add | a1, | , a1 | 1, x0 |  |
| 0x004 | 0018 |  | jalr | x0 | , 0 | (ra) |  |



## Solution:

(a) (12 points) Fill in the 24 missing values in the instructions above. You may write in decimal or in hex, but must clearly indicate if it is in hex. Write any immediates for UJ and SB instructions in base 10 between the square brackets before filling the table. Do not use binary in the table.
(b) (5 points) Consider the following assembled branch instruction. Assuming this instruction is at address $0 x 0040$ 0020, what address will the branch go to when it is taken? Show your work.

```
111111100110001010000000 1 1100011
```

| Solution: beq t0, t1, Loop |
| :--- |
| new $P C=$ old $P C+$ immediate $* 2$ |
| $=0 x 00400020+2 *(-16)=0 x 00400020-32=0 x 00400020-0 x 20=0 x 00400000$ |

(c) (3 points) This block of 7 instructions is replicated multiple times throughout the kernel code. Would the bge and jal instructions have the same machine translation in every replicate as the ones in the table above? Why or why not?

Solution: Yes because everything is PC-relative.

Problem 2. (15 points) Pretend you are an assembler. For each pseudo-instruction in the following table, give a minimal sequence of actual RISC-V instructions to accomplish the same thing. You may need to use x31 for some of the sequences. BIG indicates an immediate value that is 32 bits and SML indicates an immediate value that fits in 12 bits. You may need to refer to specific bits of the immediate by index, e.g. SML[11].

| Pseudo-instruction | Description |  |
| :--- | :--- | :--- |
| lwByIndex t0, t1, t2 | t1 contains a pointer to <br> an array, and t2 an index <br> in that array, t0 will get <br> the data from the array <br> at index t2 $(\mathrm{t0}=\mathrm{t} 1[\mathrm{t} 2])$. | Solution: slli x31, t2, 2 <br> add x31, x31, t1 <br> lw t0, 0(x31) |
| PUSH a0 | Makes space on the stack <br> and then pushes the data <br> in a0 onto the stack. | Solution: addi sp, sp, -4 <br> sw a0, 0(sp) |
| LL12 t0, 0x888 | Loads the lower 12 bits <br> of a register, filling the <br> top 20 bits with 0s. | Solution: lui t0, \{SML, 8b'0\} <br> srli t0, t0, 20 |
|  |  |  |

Problem 3. Your team is designing a RISC-V-like machine with 16-bit instructions, 12 -bit addresses, and 16-bit words. Assume the machine has 64 different opcodes and has 8 registers.
(a) (5 points) Your team creates an instruction format that has an opcode, one register operand, and an immediate. Draw the instruction format for this instruction type. Label each field and show the size (in bits) of each field. Be sure to label any unused bits.

```
Solution: [ imm-7b ][ rs-3b ][ op-6 ]
```

(b) (5 points) If the above format is used for pc-relative branches, what is the range of branch targets? Express your answer as the number of instructions before and after the PC where the branch can go (for example, "from PC-400 to PC+200 instructions")

Solution: with a 7 -bit immediate, we can select from $2^{1} 28$ instructions by shifting the immediate one bit. Signed immediates mean half are negative, and one is zero. This means, we can branch from $P C-2^{6}$ instructions to $P C+2^{6}-1$ instructions; PC-64 to $\mathrm{PC}+63$.

1. Explain your addressing mode, specifically how you use the immediate field to calculate the branch target.

Solution: Something about the immediate being a signed "instruction offset" added to the PC...to make byte offset, shift one bit. PC = imm $\ll 1+\mathrm{pc}$.
(This problem continues on the next page...)
(c) (5 points) Consider the pseudo-instruction la that loads large immediate values (addresses that are 12-bits in size) into a register. How should la be implemented? Remember, you are the designer - you may use instructions with the format above or design new instruction formats.

Solution: Solutions vary. Beware of sign-extension/zero extension. If immediate field is six-bits and sign-extension is used, gotta deal with that. We're loading addresses, so nothing should be negative. Also NOTE: registers are 16 bits, but the immediate/address is 12 bits. One option: set/shift/ori (note, o below is a zero, but displayed as o for explanation).

```
li 0xBFC == li Ob10 1111 11 1100
# -- note: set and ori zero-extend the immediate --
set rs, 0x2F # 6 upper bits (-> Oboooo oooo oo10 1111)
sll rs, 6 # (-> Oboooo 1010 10oo oooo)
ori rs, Ox3C # next 6 (-> Oboooo 1010 1011 1100)
```

(d) (5 points) Justify your design; state both the major advantages and the major disadvantages of your design (more than one of each).

Solution: Advantages: doesn't require new format, only three instruction ops, simple.
Disadvantages: three instructions to load 12 bits. Requires use of "set" and zero-extension.

Problem 4. (25 points) Below is python code for a small program. Based on the code, answer the following questions and then complete the missing portions of the procedure below, adhering to the RISC-V procedure call conventions. Do NOT optimize or change the logic of this code.

```
def calc_value(A):
    x = A + 7
    y = adjust(x)
    z = y + 3
    z = modify(z, 3)
    result = x - z
    return (result)
```

You can assume that both adjust and modify are procedures that exist (you do not need to write them), and these procedures follow the RISC-V calling conventions. Assume all local variables (e.g. x and y ) are only stored in registers and not in main memory.
calc_value:
$\square$
jal ra, adjust ;call to adjust
$\square$
jal ra, modify ;call to modify
$\square$
jalr x0, 0(ra)

Solution: this solution does not use s-registers, but that would be a viable alternative.

```
calc_value:
```

    addi sp, sp, -8
    sw ra, 0(sp)
    addi a0, a0, 7
    sw a0, 4(sp)
    jal ra, adjust
    addi a0, a0, 3
    addi a1, x0, 3
    jal ra, modify
    lw t0, 4(sp)
    sub a0, t0, a0
    lw ra, 0(sp)
    addi sp, sp, 8
    jalr x0, O(ra)
    Problem 5. You are the lead designer for a real-time 32 -bits RISC-V processor. Your customer has demanded that the execution time of the benchmark program should not be longer than $5 \mu$ seconds.
(a) (5 points) Your customer has provided you with a sample set of benchmarks that they wish to run on your processor. After some calculations, your team collects the following data from the benchmark suite. Calculate the average CPI for your processor.

| Instruction Type | CPI | Count |
| :--- | :--- | :--- |
| mem | 10 | 400 |
| branch | 3 | 50 |
| arithmetic | 7 | 200 |
| logical | 3 | 60 |
| jump | 1 | 40 |

$$
\begin{aligned}
& \text { Solution: } 400+50+200+60+40=750 \text { inst } \\
& 4000+150+1400+180+40=5770 \text { cycles } \\
& 5770 / 750=7.70 \mathrm{CPI}
\end{aligned}
$$

(b) (5 points) What is the minimum clock frequency (remember, frequency $=\frac{1}{\text { cycle time }}$ ) that your processor must run at in order to meet the customer's requirements?

$$
\begin{aligned}
& \text { Solution: } \mathrm{ET}=\text { inst } * \mathrm{CPI} * \text { cycletime } \\
& 5 * 10^{-6}=750 * 5770 / 750 * F=5770 * F \\
& F=8.67 * 10^{-4} * 10^{-6}=8.67 * 10^{-10}=0.867 * 10^{-9} \text { sec/cycle } \\
& =1.15 * 10^{9} \text { cycles/sec }=1.15 \mathrm{GHz}
\end{aligned}
$$

question continues on next page...
(c) (5 points) Your team noticed that a large portion of the benchmark program is made up of memory operations, and thus they have suggested adding a few complex instructions that can read and write from memory in one instruction. After some analysis, this leads to reducing the number of memory instructions by $\mathbf{2 5 \%}$. However, memory instructions now take 12 cycles to execute. What is the new average CPI?

Solution: $300+50+200+60+40=650$ inst
$(300 * 12)+150+1400+180+40=5370$ cycles
$5370 / 650=8.26 \mathrm{CPI}$
(d) (5 points) Assuming you keep the processor running at the same minimum frequency calculated earlier, will the new design still meet the clients requirements? Show your work to support your answer.

$$
\begin{aligned}
& \text { Solution: } \mathrm{ET}=\text { inst } * \mathrm{CPI} * \text { cycletime } \\
& E T=650 * 5370 / 650 * 0.867 * 10^{-9}=5.37 * 10^{3} * 0.867 * 10^{-9}=4.66 * 10^{-6} \mathrm{sec}= \\
& 4.66 \mu \sec \\
& \text { Yes ET still less than } 5 \mu \mathrm{sec}
\end{aligned}
$$

## RISC-V Reference

## Base Integer Instructions

| Inst | Name | FMT | Opcode | funct3 | funct7 | Description | Note |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| add | ADD | R | 0110011 | 000 | 0000000 | $\mathrm{R}[\mathrm{rd}]=\mathrm{R}[\mathrm{rs} 1]+\mathrm{R}[\mathrm{rs} 2]$ |  |
| sub | SUB | R | 0110011 | 000 | 0100000 | $\mathrm{R}[\mathrm{rd}]=\mathrm{R}[\mathrm{rs} 1]-\mathrm{R}[\mathrm{rs} 2]$ |  |
| xor | XOR | R | 0110011 | 100 | 0000000 | $\mathrm{R}[\mathrm{rd}]=\mathrm{R}[\mathrm{rs} 1]^{\wedge} \mathrm{R}[\mathrm{rs} 2]$ |  |
| or | OR | R | 0110011 | 110 | 0000000 | $\mathrm{R}[\mathrm{rd}]=\mathrm{R}[\mathrm{rs} 1] \mid \mathrm{R}[\mathrm{rs} 2]$ |  |
| and | AND | R | 0110011 | 111 | 0000000 | $\mathrm{R}[\mathrm{rd}]=\mathrm{R}[\mathrm{rs} 1]$ \& $\mathrm{R}[\mathrm{rs} 2]$ |  |
| sll | Shift Left Logical | R | 0110011 | 001 | 0000000 | $\mathrm{R}[\mathrm{rd}]=\mathrm{R}[\mathrm{rs} 1] \ll \mathrm{R}[\mathrm{rs} 2]$ |  |
| srl | Shift Right Logical | R | 0110011 | 101 | 0000000 | $\mathrm{R}[\mathrm{rd}]=\mathrm{R}[\mathrm{rs} 1] \gg \mathrm{R}[\mathrm{rs} 2]$ |  |
| sra | Shift Right Arith* | R | 0110011 | 101 | 0100000 | $\mathrm{R}[\mathrm{rd}]=\mathrm{R}[\mathrm{rs} 1] \gg \mathrm{R}[\mathrm{rs} 2]$ | sign-extends |
| slt | Set Less Than | R | 0110011 | 010 | 0000000 | $\mathrm{R}[\mathrm{rd}]=(\mathrm{rs} 1<\mathrm{rs} 2)$ ? $1: 0$ |  |
| addi | ADD Immediate | I | 0010011 | 000 |  | $\mathrm{R}[\mathrm{rd}]=\mathrm{R}[\mathrm{rs1}]+\mathrm{SE}(\mathrm{imm})$ |  |
| xori | XOR Immediate | I | 0010011 | 100 |  | $\mathrm{R}[\mathrm{rd}]=\mathrm{R}[\mathrm{rs} 1]^{\wedge} \mathrm{SE}(\mathrm{imm})$ |  |
| ori | OR Immediate | I | 0010011 | 110 |  | $\mathrm{R}[\mathrm{rd}]=\mathrm{R}[\mathrm{rs} 1]$ \| SE(imm) |  |
| andi | AND Immediate | I | 0010011 | 111 |  | $\mathrm{R}[\mathrm{rd}]=\mathrm{R}[\mathrm{rs} 1]$ \& SE(imm) |  |
| slli | Shift Left Logical Imm | I | 0010011 | 001 | $\operatorname{imm}[11: 5]=0 \times 00$ | $\mathrm{R}[\mathrm{rd}]=\mathrm{R}[\mathrm{rs} 1] \ll \mathrm{imm}[4: 0]$ |  |
| srli | Shift Right Logical Imm | I | 0010011 | 101 | $\operatorname{imm}[11: 5]=0 \times 00$ | $\mathrm{R}[\mathrm{rd}]=\mathrm{R}[\mathrm{rs} 1] \gg \mathrm{imm}[4: 0]$ |  |
| srai | Shift Right Arith Imm | I | 0010011 | 101 | imm[11:5] $=0 \times 20$ | $\mathrm{R}[\mathrm{rd}]=\mathrm{R}[\mathrm{rs} 1] \gg \mathrm{imm}[4: 0]$ | sign-extends |
| lw | Load Word | I | 0000011 | 010 |  | $\mathrm{R}[\mathrm{rd}]=\mathrm{M}[\mathrm{R}[\mathrm{rs} 1]+\mathrm{SE}(\mathrm{imm})]$ |  |
| sw | Store Word | S | 0100011 | 010 |  | $\mathrm{M}[\mathrm{R}[\mathrm{rs} 1]+\mathrm{SE}(\mathrm{imm})]=\mathrm{R}[\mathrm{rs} 2]$ |  |
| beq | Branch = = | SB | 1100011 | 000 |  | $\begin{aligned} & \text { if(rs1 == rs2) } \\ & \text { PC }+=\text { SE(imm) } \ll 1 \end{aligned}$ |  |
| bne | Branch ! = | SB | 1100011 | 001 |  | if(rs1 != rs2) |  |
| blt | Branch < | SB | 1100011 | 100 |  | if(rs1 < rs2) |  |
|  |  |  |  |  |  | PC $+=$ SE(imm) $\ll 1$ |  |
| bge | Branch $>=$ | SB | 1100011 | 101 |  | $\begin{aligned} & \text { if(rs1 }>=\text { rs2) } \\ & \mathrm{PC}+=\text { SE(imm) } \ll 1 \end{aligned}$ |  |
|  |  | UJ | 1101111 |  |  | $\begin{aligned} & \mathrm{R}[\mathrm{rd}]=\mathrm{PC}+4 ; \\ & \mathrm{PC}+=\mathrm{SE}(\mathrm{imm}) \ll 1 \end{aligned}$ |  |
| jalr | Jump And Link Reg | I | 1100111 | 000 |  | $\begin{aligned} & \mathrm{R}[\mathrm{rd}]=\mathrm{PC}+4 ; \\ & \mathrm{PC}=\mathrm{R}[\mathrm{rs} 1]+\mathrm{SE}(\mathrm{imm}) \end{aligned}$ |  |
| lui auipc | Load Upper Imm Add Upper Imm to PC | $\begin{aligned} & \mathrm{U} \\ & \mathrm{U} \end{aligned}$ | $0110111$ |  |  | $\begin{aligned} & \mathrm{R}[\mathrm{rd}]=\mathrm{SE}(\mathrm{imm}) \ll 12 \\ & \mathrm{R}[\mathrm{rd}]=\mathrm{PC}+(\mathrm{SE}(\mathrm{imm}) \ll 12) \end{aligned}$ |  |
| csrrw | CSR read \& write | I | 1110011 | 001 |  | $\begin{aligned} & \mathrm{R}[\mathrm{rd}]=\mathrm{CSRs}[\mathrm{csr}] ; \\ & \mathrm{CSRs}[\mathrm{csr}]=\mathrm{R}[\mathrm{rs} 1] \end{aligned}$ |  |
| csrrs | CSR read \& set | I | 1110011 | 010 |  | $\begin{aligned} & \mathrm{R}[\mathrm{rd}]=\mathrm{CSRs}[\mathrm{csr}] ; \\ & \mathrm{CSRs}[\mathrm{csr}]=\mathrm{CSRs}[\mathrm{csr}] \mid \mathrm{R}[\mathrm{rs} 1] \end{aligned}$ |  |
| csrrc | CSR read \& clear | I | 1110011 | 011 |  | $\begin{aligned} & \mathrm{R}[\mathrm{rd}]=\mathrm{CSRs}[\mathrm{csr}] ; \\ & \mathrm{CSRs}[\mathrm{csr}]= \\ & \mathrm{CSRs}[\mathrm{csr}] \& \sim \mathrm{R}[\mathrm{rs} 1] \end{aligned}$ |  |
| ecall | Environment Call | I | 1110011 | 000 | imm=0x0 | Transfer control to OS |  |
| ebreak | Environment Break | I | 1110011 | 000 | imm=0x1 | Transfer control to debugger |  |

$\mathrm{R}=$ Register file access $\quad \mathrm{CSR}=$ Control and Status Register SE $=$ Sign extend $\quad$ CSRs $=$ Coprocessor registers

Core Instruction Formats

| $\begin{array}{llll}31 & 27 & 26 & 25\end{array}$ | 2420 | 19 | 15 | $14 \quad 12$ | $11 \quad 7$ | 6 |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| funct7 | rs2 | rs1 |  | funct3 | rd | opcode | R-type |
| imm[11:0] |  | rs1 |  | funct3 | rd | opcode | I-type |
| imm[11:5] | rs2 | rs1 |  | funct3 | imm[4:0] | opcode | S-type |
| imm[12\|10:5] | rs2 | rs1 |  | funct3 | imm[4:1\|11] | opcode | SB-type |
| imm[31:12] |  |  |  |  | rd | opcode | U-type |
| imm[20\|10:1|11|19:12] |  |  |  |  | rd | opcode | UJ-type |

## Opcodes, Base conversion

| Binary | Hex | Opcode | Binary | Hex | Opcode | Binary | Hex | Opcode | Binary | Hex | Opcode |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0000000 | 00 |  | 0100000 | 20 |  | 1000000 | 40 |  | 1100000 | 60 |  |
| 0000001 | 01 |  | 0100001 | 21 |  | 1000001 | 41 |  | 1100001 | 61 |  |
| 0000010 | 02 |  | 0100010 | 22 |  | 1000010 | 42 |  | 1100010 | 62 |  |
| 0000011 | 03 | $1 w$ | 0100011 | 23 | SW | 1000011 | 43 |  | 1100011 | 63 | SB-type |
| 0000100 | 04 |  | 0100100 | 24 |  | 1000100 | 44 |  | 1100100 | 64 |  |
| 0000101 | 05 |  | 0100101 | 25 |  | 1000101 | 45 |  | 1100101 | 65 |  |
| 0000110 | 06 |  | 0100110 | 26 |  | 1000110 | 46 |  | 1100110 | 66 |  |
| 0000111 | 07 |  | 0100111 | 27 |  | 1000111 | 47 |  | 1100111 | 67 | jalr |
| 0001000 | 08 |  | 0101000 | 28 |  | 1001000 | 48 |  | 1101000 | 68 |  |
| 0001001 | 09 |  | 0101001 | 29 |  | 1001001 | 49 |  | 1101001 | 69 |  |
| 0001010 | 0 A |  | 0101010 | 2A |  | 1001010 | 4A |  | 1101010 | 6 A |  |
| 0001011 | 0B |  | 0101011 | 2B |  | 1001011 | 4B |  | 1101011 | 6B |  |
| 0001100 | 0 C |  | 0101100 | 2 C |  | 1001100 | 4 C |  | 1101100 | 6 C |  |
| 0001101 | 0D |  | 0101101 | 2D |  | 1001101 | 4D |  | 1101101 | 6D |  |
| 0001110 | 0 E |  | 0101110 | 2E |  | 1001110 | 4E |  | 1101110 | 6E |  |
| 0001111 | 0 F |  | 0101111 | 2 F |  | 1001111 | 4F |  | 1101111 | 6F | jal |
| 0010000 | 10 |  | 0110000 | 30 |  | 1010000 | 50 |  | 1110000 | 70 |  |
| 0010001 | 11 |  | 0110001 | 31 |  | 1010001 | 51 |  | 1110001 | 71 |  |
| 0010010 | 12 |  | 0110010 | 32 |  | 1010010 | 52 |  | 1110010 | 72 |  |
| 0010011 | 13 | I-type | 0110011 | 33 | R-type | 1010011 | 53 |  | 1110011 | 73 | exceptions |
| 0010100 | 14 |  | 0110100 | 34 |  | 1010100 | 54 |  | 1110100 | 74 |  |
| 0010101 | 15 |  | 0110101 | 35 |  | 1010101 | 55 |  | 1110101 | 75 |  |
| 0010110 | 16 |  | 0110110 | 36 |  | 1010110 | 56 |  | 1110110 | 76 |  |
| 0010111 | 17 | auipc | 0110111 | 37 | lui | 1010111 | 57 |  | 1110111 | 77 |  |
| 0011000 | 18 |  | 0111000 | 38 |  | 1011000 | 58 |  | 1111000 | 78 |  |
| 0011001 | 19 |  | 0111001 | 39 |  | 1011001 | 59 |  | 1111001 | 79 |  |
| 0011010 | 1A |  | 0111010 | 3A |  | 1011010 | 5A |  | 1111010 | 7A |  |
| 0011011 | 1B |  | 0111011 | 3B |  | 1011011 | 5B |  | 1111011 | 7B |  |
| 0011100 | 1 C |  | 0111100 | 3 C |  | 1011100 | 5 C |  | 1111100 | 7 C |  |
| 0011101 | 1D |  | 0111101 | 3D |  | 1011101 | 5D |  | 1111101 | 7D |  |
| 0011110 | 1E |  | 0111110 | 3E |  | 1011110 | 5E |  | 1111110 | 7E |  |
| 0011111 | 1F |  | 0111111 | 3F |  | 1011111 | 5F |  |  |  |  |

## Registers

| Register | Name | Description | Saver |
| :--- | :--- | :--- | :--- |
| x0 | zero | Zero constant | - |
| x1 | ra | Return address | Caller |
| x2 | sp | Stack pointer | Callee |
| x3 | gp | Global pointer | - |
| x4 | tp | Thread pointer | - |
| x5-x7 | t0-t2 | Temporaries | Caller |
| x8 | s0 / fp | Saved / frame pointer | Callee |
| x9 | s1 | Saved register | Callee |
| x10-x11 | a0-a1 | Fn args/return values | Caller |
| x12-x17 | a2-a7 | Fn args | Caller |
| x18-x27 | s2-s11 | Saved registers | Callee |
| x28-x30 | t3-t5 | Temporaries | Caller |
| x31 | at | Assembler Temporary | Caller |

## Memory Allocation



