1. RISC-V general registers and program counter
This article begins a detailed look at the RISC-V instruction set. The CPU contains 32 general-purpose registers, collectively called the general-purpose register file, as shown in Figure 1. They are named x0-x31. The first register, x0, is hardwired to 0: it always reads as 0, and writes to it have no effect. The other registers, x1-x31, are readable and writable. The numbers 0-31 are called index numbers, and an index number can be thought of as the register's address: when an instruction needs to access a general-purpose register, it selects the register by its index number. A later article on the FPGA implementation will explain how to design, read, and write the register file. In a 32-bit system, every general-purpose register is 32 bits wide, and there are also 32 registers in total.
The PC (program counter) is also a register, but it is separate from the 32 general-purpose registers above and is not part of the register file. The PC has the same width as the general-purpose registers, and this width is denoted XLEN. The value of XLEN depends on the RISC-V CPU architecture. On a 32-bit CPU, XLEN is 32, so in Figure 1, XLEN-1 = 32-1 = 31, meaning the highest bit of a general-purpose register is bit 31. On a 64-bit CPU, the general-purpose registers and the PC are both 64 bits wide, and the highest bit is bit 64-1 = 63.
Figure 1 RISC-V general-purpose registers and PC [1]
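As a rough illustration of the register file behaviour described above, the following Python sketch models 32 registers with x0 hardwired to 0. The class name `RegisterFile` and its `read`/`write` methods are hypothetical names chosen for this example; they are not taken from the FPGA design mentioned in the text, and the 32-bit width is an assumption matching a 32-bit system.

```python
class RegisterFile:
    """Toy software model of the general-purpose register file (x0-x31)."""

    XLEN = 32        # assumed register width in bits (32-bit system)
    NUM_REGS = 32    # x0 through x31

    def __init__(self):
        self._regs = [0] * self.NUM_REGS

    def _check(self, index):
        if not 0 <= index < self.NUM_REGS:
            raise IndexError(f"register index out of range: {index}")

    def read(self, index):
        """Return the value of register x<index>."""
        self._check(index)
        return self._regs[index]

    def write(self, index, value):
        """Write value to register x<index>; writes to x0 are discarded."""
        self._check(index)
        if index == 0:
            return  # x0 is hardwired to 0
        # Keep only the low XLEN bits, as a fixed-width hardware register would.
        self._regs[index] = value & ((1 << self.XLEN) - 1)


rf = RegisterFile()
rf.write(0, 123)            # ignored: x0 always stays 0
rf.write(3, 0x1_0000_0005)  # truncated to 32 bits
print(rf.read(0))  # 0
print(rf.read(3))  # 5
```

The index passed to `read` and `write` plays the role of the register's address, which is how the rd, rs1, and rs2 fields of an instruction select a register.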
2. RISC-V assembly instruction type
RV32I can be divided into six basic instruction formats:
- R-type instructions for register-register operations
- I-type instructions for short immediates and loads
- S-type instructions for stores
- B-type instructions for conditional branches
- U-type instructions for long immediates
- J-type instructions for unconditional jumps
Figure 2 lists the machine code formats of the six basic instructions.
Figure 2 Machine code formats of the six basic instruction types [2]
2.1. R-Type
R-type instructions operate only on registers and carry no immediate (imm: a constant encoded directly in the instruction, so it is available without reading a register). The binary format of R-type machine code is shown in Figure 2. An instruction is 32 bits long. Bits 0-6 (7 bits) hold the opcode (operation code), which identifies the kind of instruction. Figure 3 shows the encodings of some R-type instructions; they all share the same opcode, 011_0011. What differs is funct3 in bits 12-14 and funct7 in bits 25-31, and these two fields distinguish one R-type instruction from another. In other words, the opcode determines the broad class of an instruction, while funct3 and funct7 select the specific operation within that class. Bits 7-11 of an R-type instruction hold the index number of rd (destination register), the register that receives the result. rs1 (source register 1) and rs2 (source register 2) are the source registers. In most cases, an instruction reads the values of these two source registers as its operands. The index number of rs1 is in bits 15-19, and the index number of rs2 is in bits 20-24. The following example shows how the register index numbers are used.
Figure 3 Opcode examples for some R-type instructions [2]
Suppose the rd field of an instruction holds 5'b00011, which is 3 in decimal. When the CPU receives this instruction, it sees that bits 7-11 contain 3, selects x3 among the 32 general-purpose registers as rd, and eventually writes the result to x3. If at the same time the rs1 field (bits 15-19) and the rs2 field (bits 20-24) hold 2 and 4, the CPU identifies x2 and x4 as the source registers, reads their values, and performs the operation on them.
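The field positions described above can be checked with a short Python sketch that splits a 32-bit instruction word into its R-type fields. The function name `decode_r_type` is a hypothetical helper for this example, and the sample word 0x004101B3 is assumed to be the encoding of `add x3, x2, x4` (funct7 = 0, funct3 = 0, opcode = 011_0011), which matches the rd = 3, rs1 = 2, rs2 = 4 scenario in the text.

```python
def decode_r_type(word):
    """Split a 32-bit R-type instruction word into its fields."""
    return {
        "opcode": word & 0x7F,          # bits 0-6
        "rd":     (word >> 7) & 0x1F,   # bits 7-11
        "funct3": (word >> 12) & 0x7,   # bits 12-14
        "rs1":    (word >> 15) & 0x1F,  # bits 15-19
        "rs2":    (word >> 20) & 0x1F,  # bits 20-24
        "funct7": (word >> 25) & 0x7F,  # bits 25-31
    }


# Assumed example word: add x3, x2, x4
fields = decode_r_type(0x004101B3)
print(bin(fields["opcode"]))                      # 0b110011
print(fields["rd"], fields["rs1"], fields["rs2"])  # 3 2 4
```

Each field is recovered with one shift and one mask, which is why fixed field positions keep the decoder simple.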
Notice:
The RISC-V instruction formats are very regular, and the placement of fields in the encoding is deliberate. For example, the three register index fields never move between formats: rd is always in bits 7-11, rs1 in bits 15-19, and rs2 in bits 20-24. A format may omit a register (I-type, the second format, has no rs2), but the registers it does use, here rs1 and rd, stay in their usual positions. Likewise, funct3 in S-type is in bits 12-14, the same position as in R-type. The opcode is present in every format and always occupies bits 0-6.
2.2. I-Type
The I-type machine code format is based on R-type. The only difference is that bits 20-31 (the high 12 bits), which hold rs2 and funct7 in R-type, hold a 12-bit immediate in I-type. The remaining fields (rs1, funct3, rd, and opcode) are in the same positions as in R-type. The opcode values of I-type instructions naturally differ from those of the other formats, because the operations they perform are different.
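A minimal Python sketch of I-type decoding follows. It assumes, as the RISC-V specification defines but the text above does not spell out, that the 12-bit immediate is sign-extended. The helper names `sign_extend` and `decode_i_type` are hypothetical, and the sample word 0xFFF00093 is assumed to be the encoding of `addi x1, x0, -1`.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def decode_i_type(word):
    """Split a 32-bit I-type instruction word into its fields."""
    return {
        "opcode": word & 0x7F,                          # bits 0-6
        "rd":     (word >> 7) & 0x1F,                   # bits 7-11
        "funct3": (word >> 12) & 0x7,                   # bits 12-14
        "rs1":    (word >> 15) & 0x1F,                  # bits 15-19
        "imm":    sign_extend((word >> 20) & 0xFFF, 12),  # bits 20-31
    }


# Assumed example word: addi x1, x0, -1
fields = decode_i_type(0xFFF00093)
print(fields["rd"], fields["rs1"], fields["imm"])  # 1 0 -1
```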
2.3. S-Type
The distinguishing feature of S-type instructions is that they have no rd register. The immediate (imm) is split into two parts: imm[11:5] is in bits 25-31, and imm[4:0] is in bits 7-11. The 5 bits of imm[4:0] take the place of rd, since instructions of this format do not write a result back to a register, and imm[11:5] takes the place of funct7.
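The split immediate can be reassembled as in the following Python sketch. The helper names are hypothetical, the sign extension of the 12-bit result follows the RISC-V specification rather than the text above, and the sample words are assumed to encode `sw x5, 8(x2)` and `sw x5, -4(x2)`.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def decode_s_imm(word):
    """Rebuild the 12-bit S-type immediate from its two fields."""
    imm_11_5 = (word >> 25) & 0x7F  # bits 25-31 hold imm[11:5]
    imm_4_0 = (word >> 7) & 0x1F    # bits 7-11 hold imm[4:0]
    return sign_extend((imm_11_5 << 5) | imm_4_0, 12)


print(decode_s_imm(0x00512423))  # 8   (assumed: sw x5, 8(x2))
print(decode_s_imm(0xFE512E23))  # -4  (assumed: sw x5, -4(x2))
```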
2.4. U-Type
A U-type instruction carries a 20-bit immediate in bits 12-31. The field is labelled imm[31:12] because the 20 bits form the upper bits of a 32-bit value, which is equivalent to shifting the 20-bit immediate left by 12 bits. The result of the operation is derived from this immediate and written back to rd. The opcode determines which operation is performed. U-type has no funct3, rs1, rs2, or funct7, so its structure is very simple.
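A short Python sketch of U-type decoding is given below. The function name `decode_u_type` is hypothetical, and the sample word 0x123452B7 is assumed to be the encoding of `lui x5, 0x12345`. Because the immediate already sits in bits 12-31, masking off the low 12 bits yields the shifted value directly.

```python
def decode_u_type(word):
    """Split a 32-bit U-type instruction word into its fields."""
    return {
        "opcode": word & 0x7F,        # bits 0-6
        "rd":     (word >> 7) & 0x1F,  # bits 7-11
        # Bits 12-31 are imm[31:12]; the low 12 bits of the value are 0.
        "imm":    word & 0xFFFFF000,
    }


# Assumed example word: lui x5, 0x12345
fields = decode_u_type(0x123452B7)
print(fields["rd"], hex(fields["imm"]))  # 5 0x12345000
```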
2.5. B-Type
B-type instructions are conditional jumps (branches): whether the jump is taken depends on whether a condition holds, much like an if statement in Verilog. The B-type machine code structure is shown in Figure 2. It has no rd or funct7, but it does contain rs1, rs2, funct3, and an immediate. The immediate is split across two fields, imm[12|10:5] in bits 25-31 and imm[4:1|11] in bits 7-11. Its bits are not stored in order; the reason is not covered in detail here, but the main purpose is to share as many bit positions as possible with the other formats. Because the encoding is scrambled, the CPU must put the immediate bits back in order when it decodes the instruction.
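The reordering step can be sketched in Python as follows. The helper names are hypothetical, the bit positions follow the B-type layout in the RISC-V specification (bit 31 = imm[12], bits 25-30 = imm[10:5], bits 8-11 = imm[4:1], bit 7 = imm[11]), and the sample words are assumed to encode `beq x1, x2, +8` and `beq x0, x0, -4`.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def decode_b_imm(word):
    """Reassemble the scrambled B-type immediate into a signed byte offset."""
    imm = (
        (((word >> 31) & 0x1) << 12)     # bit 31      -> imm[12]
        | (((word >> 7) & 0x1) << 11)    # bit 7       -> imm[11]
        | (((word >> 25) & 0x3F) << 5)   # bits 25-30  -> imm[10:5]
        | (((word >> 8) & 0xF) << 1)     # bits 8-11   -> imm[4:1]
    )                                    # imm[0] is not stored and stays 0
    return sign_extend(imm, 13)


print(decode_b_imm(0x00208463))  # 8   (assumed: beq x1, x2, +8)
print(decode_b_imm(0xFE000EE3))  # -4  (assumed: beq x0, x0, -4)
```

In hardware this reordering is only wiring, so it costs no extra logic; the sketch spells it out with shifts and masks.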
2.6. J-Type
The J-type format is very similar to U-type: it contains only rd, an immediate, and the opcode. As in B-type, the bits of the J-type immediate are stored out of order, so the CPU must reassemble them in order during decoding to recover the original immediate.
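A Python sketch of the J-type reassembly is shown below. The helper names are hypothetical, the bit positions follow the J-type layout in the RISC-V specification (bit 31 = imm[20], bits 21-30 = imm[10:1], bit 20 = imm[11], bits 12-19 = imm[19:12]), and the sample words are assumed to encode `jal x1, +8` and `jal x0, -4`.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def decode_j_imm(word):
    """Reassemble the scrambled J-type immediate into a signed byte offset."""
    imm = (
        (((word >> 31) & 0x1) << 20)      # bit 31      -> imm[20]
        | (((word >> 12) & 0xFF) << 12)   # bits 12-19  -> imm[19:12]
        | (((word >> 20) & 0x1) << 11)    # bit 20      -> imm[11]
        | (((word >> 21) & 0x3FF) << 1)   # bits 21-30  -> imm[10:1]
    )                                     # imm[0] is not stored and stays 0
    return sign_extend(imm, 21)


print(decode_j_imm(0x008000EF))  # 8   (assumed: jal x1, +8)
print(decode_j_imm(0xFFDFF06F))  # -4  (assumed: jal x0, -4)
```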
Notice:
The immediate field of B-type is the S-type immediate field rotated by 1 bit, and the immediate field of J-type is the U-type immediate field rotated by 12 bits. RISC-V therefore really has only four basic formats, although it can conservatively be counted as having six [2]. Because the B-type and J-type encodings do not store bit 0 of the immediate, that bit is always 0, and their immediates are always multiples of 2.
Machine code that is all 0s or all 1s is defined as an illegal RV32I instruction, which helps with debugging and with catching common errors [2].
3. Article references
[1] Riscv.org, 2021. [Online]. Available: https://riscv.org/wp-content/uploads/2019/12/riscv-spec-20191213.pdf. [Accessed: 22-Feb-2021].
[2] D. Patterson and A. Waterman, The RISC-V reader. Berkeley: Strawberry Canyon LLC, 2018.