1. Learn bus design
This article teaches bus design, moving from simple to complex. It first reviews existing buses to explain why a dedicated FII RISC-V bus is worth designing. The purpose of this bus is to transfer data within the CPU.
1.1. The complex but powerful AXI bus
The AXI (Advanced eXtensible Interface) bus is part of ARM's AMBA (Advanced Microcontroller Bus Architecture) specification. It is characterized by many bus interfaces, a comprehensive protocol, complex operation, and flexible control.
These strengths are also its weaknesses. For a CPU aimed at embedded IoT, AXI is too large and complex to use or debug comfortably, and fully supporting a bus such as AXI4 would consume too many logic resources and too much power.
1.2. Simple and easy to control SPI bus
The SPI bus is a synchronous serial bus invented by Motorola in 1979 and is often used to communicate with SD (Secure Digital) cards and LCDs (liquid crystal displays). Its hardware implementation is very simple and it needs few signal lines, generally only 4. SPI is not used here because data inside the CPU is generally transferred in parallel, while SPI is serial: using it as the internal bus of a RISC-V CPU would require repeated serial-to-parallel and parallel-to-serial conversions.
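To make the conversion cost concrete, here is a minimal Python sketch. It is not part of the FII design: it assumes a 32-bit word sent MSB first at one bit per clock, and the function names are hypothetical.

```python
def spi_shift_out(word, width=32):
    """Parallel-to-serial: emit the bits of `word`, MSB first, one per clock."""
    return [(word >> (width - 1 - i)) & 1 for i in range(width)]


def spi_shift_in(bits):
    """Serial-to-parallel: rebuild the word from the received bit stream."""
    word = 0
    for bit in bits:
        word = (word << 1) | bit
    return word


bits = spi_shift_out(0x12345678)
serial_clocks = len(bits)      # one clock per bit on a serial bus
parallel_clocks = 1            # a 32-bit parallel bus moves the word at once
recovered = spi_shift_in(bits)
```

Under these assumptions every 32-bit transfer costs 32 serial clocks plus a conversion at each end, which is the overhead a parallel internal bus avoids.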
These two representative buses show that existing buses are either too complicated and difficult to control, or too simple to suit transfers inside the CPU. Therefore, after weighing performance, power consumption, and resources, we choose to implement a small, streamlined bus.
2. FII RISC-V bus design
Related reference articles:
RISC-V teaching plan
The following code block shows the FII RISC-V bus design. The specific details are explained in the detailed comments in the code module.
//RISCV bus design: RIB (riscv internal bus) bus definition
- 32bit: rib_addr, // address bus
- 32bit: rib_dout, // data bus
- 32bit: rib_din, // data bus
- 1bit: rib_valid, // control bus
- 2bit: rib_ready, // control bus
- 4bit: rib_we, // control bus
- 1bit: rib_rd, // control bus
- 1bit: rib_op, // control bus
//RISCV master bus design (cpu, etc.):
output [31:0] o_rib_maddr,   // master send, 32-bit address
output [31:0] o_rib_mdout,   // master send, 32-bit data (write operation)
input  [31:0] i_rib_mdin,    // master receive, 32-bit data (read operation)
output        o_rib_mvalid,  // master send, indicates that an operation (read/write) is happening
input  [1:0]  i_rib_mready,  // master receive, bit[1] = bus error, bit[0] = peripheral ready
output [3:0]  o_rib_mwe,     // master send, write operation signal, each bit represents one byte (single edge)
output        o_rib_mrd,     // master send, read operation signal (single edge)
output        o_rib_mop,     // master send, indicates each operation (single edge)

//RISCV slave bus design (peripherals, etc.):
input  [31:0] i_rib_saddr,   // slave receive, 32-bit address
input  [31:0] i_rib_sdin,    // slave receive, 32-bit data (write operation)
output [31:0] o_rib_sdout,   // slave send, 32-bit data (read operation)
input         i_rib_svalid,  // slave receive, indicates that an operation (read/write) is happening
output [1:0]  o_rib_sready,  // slave send, bit[1] = bus error, bit[0] = peripheral ready
input  [3:0]  i_rib_swe,     // slave receive, write operation signal, each bit represents one byte (single edge)
input         i_rib_srd,     // slave receive, read operation signal (single edge)
input         i_rib_sop,     // slave receive, indicates each operation (single edge)
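The signal widths and the meaning of the 2-bit ready field can be summarised in a small Python model. This is only a sketch for checking values against the port declarations above; `RIB_WIDTHS`, `fits`, and `decode_ready` are hypothetical names and are not part of the RTL.

```python
# Widths in bits, taken from the master/slave port declarations above.
RIB_WIDTHS = {
    "addr": 32, "dout": 32, "din": 32,
    "valid": 1, "ready": 2, "we": 4, "rd": 1, "op": 1,
}


def fits(name, value):
    """True if `value` can be driven on the signal `name` without truncation."""
    return 0 <= value < (1 << RIB_WIDTHS[name])


def decode_ready(ready):
    """Split the ready field: bit[1] = bus error, bit[0] = peripheral ready."""
    return {
        "bus_error": bool((ready >> 1) & 1),
        "ready": bool(ready & 1),
    }
```

For example, `decode_ready(0b01)` describes a normal completed transfer, while a set bit[1] signals a bus error to the master.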
3. Analysis of read and write operations
Next, the individual read and write operations are analyzed. The timing diagram of the simplest case, a single clock cycle bus read operation, is shown in Figure 1. Note:
- mvalid, mready[0] are generated at the same time
- The read signal, address, read data and mop operation signals are all single cycle
In the figure, gray waveforms indicate invalid areas. The blue waveform represents the output (relative to the CPU) and the green waveform represents the input (relative to the CPU).
Figure 1 Timing diagram of single clock cycle read operation
A numerical example is as follows: for address 32'h9000_00xx, the read data (mdin) is 32'h1234_5678.
mrd = 1, 32'hxxxx_xx78, maddr = 32'h9000_0000, lb(u)
mrd = 1, 32'hxxxx_56xx, maddr = 32'h9000_0001, lb(u)
mrd = 1, 32'hxx34_xxxx, maddr = 32'h9000_0002, lb(u)
mrd = 1, 32'h12xx_xxxx, maddr = 32'h9000_0003, lb(u)
mrd = 1, 32'hxxxx_5678, maddr = 32'h9000_0000, lh(u)
mrd = 1, 32'h1234_xxxx, maddr = 32'h9000_0002, lh(u)
mrd = 1, 32'h1234_5678, maddr = 32'h9000_0000, lw
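The byte-lane selection in the read example can be sketched in Python as follows. The sketch assumes that the two low address bits pick the lane and that sign extension for lb/lh is applied after the lane is extracted. The helper name `load` and the error raised for misaligned accesses are assumptions, since the article does not describe them.

```python
def load(mdin, maddr, size, signed=False):
    """Extract the addressed lanes from a 32-bit read word.

    size is 1 for lb/lbu, 2 for lh/lhu, 4 for lw.
    The result is returned as a 32-bit unsigned value.
    """
    offset = maddr & 0x3                     # byte lane inside the word
    if offset % size != 0:
        raise ValueError("misaligned access")  # assumption: not handled here
    mask = (1 << (8 * size)) - 1
    value = (mdin >> (8 * offset)) & mask
    if signed and value & (1 << (8 * size - 1)):
        value |= 0xFFFFFFFF ^ mask           # sign-extend to 32 bits
    return value


byte2 = load(0x12345678, 0x90000002, 1)      # lbu from lane 2
half0 = load(0x12345678, 0x90000000, 2)      # lhu from lanes 1..0
```

With mdin = 32'h1234_5678 this reproduces the lanes listed above: bytes 78, 56, 34, 12 for offsets 0 to 3, and halfwords 5678 and 1234 for offsets 0 and 2.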
The single clock cycle bus write operation timing diagram is shown in Figure 2. Note that it is similar to a single clock cycle bus read operation:
- mvalid, mready[0] are generated at the same time
- Write signal, address, write data, and mop operation signals are all single clock cycle operation
Figure 2 Single clock cycle write operation timing diagram
Similar to the read data example, the write data example is as follows: for address 32'h9000_00xx, the write data (mdout) is 32'h1234_5678.
mwe = 4'b0001, 32'hxxxx_xx78, maddr = 32'h9000_0000, sb
mwe = 4'b0010, 32'hxxxx_56xx, maddr = 32'h9000_0001, sb
mwe = 4'b0100, 32'hxx34_xxxx, maddr = 32'h9000_0002, sb
mwe = 4'b1000, 32'h12xx_xxxx, maddr = 32'h9000_0003, sb
mwe = 4'b0011, 32'hxxxx_5678, maddr = 32'h9000_0000, sh
mwe = 4'b1100, 32'h1234_xxxx, maddr = 32'h9000_0002, sh
mwe = 4'b1111, 32'h1234_5678, maddr = 32'h9000_0000, sw
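The relationship between address, access size, and mwe in the write example can be sketched in Python. Here `write_strobes` derives the 4-bit mwe value and `apply_write` shows how a slave might merge the enabled lanes into a stored word. Both names are hypothetical, and the misaligned-access check is an assumption.

```python
def write_strobes(maddr, size):
    """Return the 4-bit mwe value for sb (size=1), sh (size=2) or sw (size=4)."""
    offset = maddr & 0x3
    if offset % size != 0:
        raise ValueError("misaligned access")  # assumption: not handled here
    return ((1 << size) - 1) << offset


def apply_write(old, mdout, mwe):
    """Merge the byte lanes of mdout selected by mwe into the stored word."""
    new = old
    for lane in range(4):
        if mwe & (1 << lane):
            mask = 0xFF << (8 * lane)
            new = (new & ~mask) | (mdout & mask)
    return new & 0xFFFFFFFF


strobe = write_strobes(0x90000002, 1)                  # sb to lane 2
stored = apply_write(0xAAAAAAAA, 0x12345678, strobe)   # only lane 2 changes
```

Note that the write data is already positioned in its lane on mdout, so the slave only needs mwe to decide which bytes to update.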
The timing diagram of the single clock cycle bus continuous read operation is shown in Figure 3. As with the single clock cycle read operation:
- mvalid, mready[0] are generated at the same time
- Read signal, address, read data, and mop operation signal are all single clock cycle operation signals
Figure 3. Timing diagram of continuous read operation in a single clock cycle
The timing diagram of the single clock cycle bus continuous write operation is shown in Figure 4:
- mvalid, mready[0] are generated at the same time
- Write signal, address, write data, and mop operation signal are all single clock cycle operation signals
Figure 4 Timing diagram of a continuous write operation in a single clock cycle
The timing diagram for a multiple clock cycle bus read operation is shown in Figure 5. Multiple cycles mean that ready cannot respond to valid immediately, i.e. the peripheral is slow.
- While mvalid is high, mready[0] is pulled high only when the read data is ready
- The read signal and address are held for multiple clock cycles; the read data and mop operation signals last a single clock cycle
Figure 5 Multiple clock cycle read operation timing diagram
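The handshake described for single and multiple clock cycle reads can be sketched as a per-cycle trace in Python. This is a behavioral model under stated assumptions, not the RTL: `read_transaction` and `wait_cycles` are hypothetical names, and mop is left out because the cycle in which it pulses is shown only in the figures.

```python
def read_transaction(wait_cycles, data=0x12345678):
    """Return one dict per clock cycle for a single read.

    The master holds mvalid and mrd until the slave raises mready[0];
    mdin is only meaningful in the cycle where mready[0] is high.
    """
    trace = []
    for cycle in range(wait_cycles + 1):
        done = cycle == wait_cycles
        trace.append({
            "mvalid": 1,
            "mrd": 1,
            "mready0": 1 if done else 0,
            "mdin": data if done else None,   # None marks an invalid (gray) area
        })
    return trace


fast = read_transaction(0)   # ready in the same cycle as valid
slow = read_transaction(2)   # slow peripheral: two wait cycles
```

With `wait_cycles=0` the trace has a single entry in which mvalid and mready[0] are high together, matching the single clock cycle case. A slow peripheral simply stretches the cycles in which the master keeps its request asserted.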
The timing diagram for a multiple clock cycle bus write operation is shown in Figure 6.
- While mvalid is high, mready[0] is pulled high only when the data is ready to be written
- The write signal, address, and write data are held for multiple clock cycles; the mop operation signal lasts a single clock cycle
Figure 6 Multiple clock cycle write operation timing diagram
Figure 7 shows the timing diagram of the multiple clock cycle bus continuous read operation.
- While mvalid is high, mready[0] is pulled high only when the read data is ready
- The read signal and address are held for multiple clock cycles; the read data and mop operation signals last a single clock cycle
Figure 7 Timing diagram of continuous read operation with multiple clock cycles
Figure 8 shows the timing diagram of the multiple clock cycle bus continuous write operation.
- While mvalid is high, mready[0] is pulled high only when the data is ready to be written
- The write signal, address, and write data are held for multiple clock cycles; the mop operation signal lasts a single clock cycle
Figure 8 Timing diagram of continuous write operation with multiple clock cycles
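The continuous cases can be modelled the same way in Python. The sketch below assumes that operations are issued back to back, with the next address presented in the cycle after mready[0] goes high. The figures may show additional detail that is not captured here, and `continuous_trace` is a hypothetical name.

```python
def continuous_trace(ops):
    """ops is a list of (maddr, wait_cycles) pairs.

    Returns one (maddr, mready0) tuple per clock cycle.
    """
    trace = []
    for maddr, wait_cycles in ops:
        trace += [(maddr, 0)] * wait_cycles   # address held while the slave is busy
        trace.append((maddr, 1))              # transfer completes in this cycle
    return trace


single = continuous_trace([(0x90000000, 0), (0x90000004, 0)])
multi = continuous_trace([(0x90000000, 2), (0x90000004, 1)])
```

Under this assumption, N single clock cycle operations finish in N cycles, while each slow operation adds its own wait cycles to the total.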
The corresponding code modules are shown below, and the specific details are explained in the detailed comments in the code modules.
localparam DEV_NUM = 10;

wire [DEV_NUM - 1:0] s_cs; // Peripheral device chip selection

assign s_cs[0] = ( i_rib_maddr[31:12] == DBG_BASEADDR[31:12] )  ? 1'b1 : 1'b0; // jtag debug
assign s_cs[1] = ( i_rib_maddr[31:16] == PLIC_BASEADDR[31:16] ) ? 1'b1 : 1'b0; // external interrupt
assign s_cs[2] = ( i_rib_maddr[31:16] == CPU_BASEADDR[31:16] )  ? 1'b1 : 1'b0; // CPU
assign s_cs[3] = ( i_rib_maddr[31:16] == MEM_BASEADDR[31:16] )  ? 1'b1 : 1'b0; // memory
assign s_cs[4] = ( i_rib_maddr[31:16] == TMR_BASEADDR[31:16] )  ? 1'b1 : 1'b0; // timer interrupt
assign s_cs[5] = ( i_rib_maddr[31:16] == GPIO_BASEADDR[31:16] ) ? 1'b1 : 1'b0; // GPIO
assign s_cs[6] = ( i_rib_maddr[31:16] == UART_BASEADDR[31:16] ) ? 1'b1 : 1'b0; // UART

// There are some unused options that can be extended later
assign s_cs[7] = 1'b0;
assign s_cs[8] = 1'b0;
assign s_cs[9] = 1'b0;

//===============================================================================
always @ ( * )
if(!rst_n)
begin
    o_rib_saddr  = 0;
    o_rib_sdin   = 0;
    o_rib_svalid = 0;
    o_rib_swe    = 0;
    o_rib_srd    = 0;
    o_rib_sop    = 0;
end
else
begin
    // The information sent by the host can be received by every peripheral device (broadcast)
    o_rib_saddr  = i_rib_maddr;
    o_rib_sdin   = i_rib_mdout;
    o_rib_svalid = i_rib_mvalid;
    o_rib_swe    = i_rib_mwe;
    o_rib_srd    = i_rib_mrd;
    o_rib_sop    = i_rib_mop;
end

//===============================================================================
wire bus_err_ack = (i_rib_maddr == i_PC) ? 1'b1 : 1'b0;

always @ ( * )
begin
    // Peripherals cannot send information at the same time.
    // If the current chip select is pulled high, the corresponding peripheral returns data and ready.
    // Together with the host information transfer, this forms a bus distributor/multiplexer.
    case (s_cs)
    10'b00_0000_0001: begin o_rib_mdin = i0_rib_sdout; o_rib_mready = i0_rib_sready; end // DBG_BASEADDR
    10'b00_0000_0010: begin o_rib_mdin = i1_rib_sdout; o_rib_mready = i1_rib_sready; end // PLIC_BASEADDR
    10'b00_0000_0100: begin o_rib_mdin = i2_rib_sdout; o_rib_mready = i2_rib_sready; end // CPU_BASEADDR
    10'b00_0000_1000: begin o_rib_mdin = i3_rib_sdout; o_rib_mready = i3_rib_sready; end // MEM_BASEADDR
    10'b00_0001_0000: begin o_rib_mdin = i4_rib_sdout; o_rib_mready = i4_rib_sready; end // TMR_BASEADDR
    10'b00_0010_0000: begin o_rib_mdin = i5_rib_sdout; o_rib_mready = i5_rib_sready; end // GPIO_BASEADDR
    10'b00_0100_0000: begin o_rib_mdin = i6_rib_sdout; o_rib_mready = i6_rib_sready; end // UART_BASEADDR
    10'b00_1000_0000: begin o_rib_mdin = i7_rib_sdout; o_rib_mready = i7_rib_sready; end
    10'b01_0000_0000: begin o_rib_mdin = i8_rib_sdout; o_rib_mready = i8_rib_sready; end
    10'b10_0000_0000: begin o_rib_mdin = i9_rib_sdout; o_rib_mready = i9_rib_sready; end
    default:          begin o_rib_mdin = 0; o_rib_mready = {1'b1, bus_err_ack}; end
    endcase
end
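The decode-and-multiplex behavior of this module can be checked against a small Python reference model. The base addresses below are hypothetical values chosen only for illustration, because the article does not give the real values of DBG_BASEADDR, MEM_BASEADDR, and the others. The names `SLAVES`, `chip_select`, and `route` are also assumptions.

```python
# name -> (hypothetical base address, number of low address bits ignored).
# The RTL compares bits [31:12] for the debug module and [31:16] for the rest.
SLAVES = {
    "DBG": (0x0000_0000, 12),
    "MEM": (0x9000_0000, 16),
    "GPIO": (0xF000_0000, 16),
}


def chip_select(maddr):
    """Names of all slaves whose upper address bits match maddr."""
    return [name for name, (base, low_bits) in SLAVES.items()
            if (maddr >> low_bits) == (base >> low_bits)]


def route(maddr, pc, responses):
    """Return (mdin, mready) as seen by the master.

    responses maps a slave name to its (sdout, sready) pair.
    """
    selected = chip_select(maddr)
    if len(selected) == 1:                 # exactly one chip select is high
        return responses[selected[0]]
    # default branch of the case statement: bit[1] = bus error,
    # bit[0] = bus_err_ack, which is set when maddr equals the PC
    bus_err_ack = 1 if maddr == pc else 0
    return (0, (1 << 1) | bus_err_ack)


hit = route(0x9000_0010, 0, {"MEM": (0x1234_5678, 0b01)})
miss = route(0x4000_0000, 0, {})
```

An address that matches no slave falls into the default branch and returns the bus error bit, mirroring the `default` case in the Verilog above.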