The RISC-V LSU, SRAM, and GPIO modules use the load and store instructions of the RISC-V instruction set to access memory and peripherals.
Related reference articles:
RISC-V teaching plan
RISC-V's load and store operations consist of two groups of instructions:
LOAD: LBU, LB, LHU, LH, LW;
STORE: SB, SH, SW
The load and store instructions let the RISC-V CPU access memory, UART, PWM, and the other modules outside the CPU core, so this part also serves as an interface. The current version of this RISC-V design is 2.01. In a subsequent RISC-V FPGA version this module will be reworked: a system bus will be added so that all peripherals connect to the CPU's internal bus, which makes it easier to develop peripheral modules. The LSU module will then generate the data-bus address, write signal, read signal, write data, read data, and so on.
RISC-V CPU LSU architecture:
LSU related code:
module exu_LSU
#(
    parameter [ 31: 0 ] TMR_BASEADDR  = 32'h0200_0000,
    parameter [ 31: 0 ] PLIC_BASEADDR = 32'h0c00_0000,
    parameter [ 31: 0 ] CPU_BASEADDR  = 32'h8000_0000,
    parameter [ 31: 0 ] MEM_BASEADDR  = 32'h9000_0000,
    parameter [ 31: 0 ] GPIO_BASEADDR = 32'hf000_0000,
    parameter [ 31: 0 ] UART_BASEADDR = 32'he000_0000,

    parameter MEM_D_DEEP = 1024, // memory data depth
    parameter MEM_D_W    = 32,   // memory data width
    parameter MEM_MSK_W  = 4,    // memory data mask width
    parameter MEM_ADDR_W = 32    // memory address width
)
(
    input sys_clk,                    // system clock
    //==============================================================================
    input i_EXE_vld,                  // execute command enable
    // load & store address PC
    input [ 31: 0 ] i_D_PC,           // data-bus address targeted by the load or store
    input i_LOAD,                     // load instruction set
    input [ 4: 0 ] i_load_instr,      // load instruction set includes: {rv32i_lbu,rv32i_lb,rv32i_lhu,rv32i_lh,rv32i_lw};
    input i_STORE,                    // store instruction set
    input [ 2: 0 ] i_store_instr,     // store instruction set includes: {rv32i_sb,rv32i_sh,rv32i_sw};
    input [ 4: 0 ] i_rd_idx,          // rd general register id decoded by the decoding module
    input [ 31: 0 ] i_rs2_val,        // rs2 general register value
    //==============================================================================
    input  [ 31: 0 ] i_GPIO_dina,     // GPIO module A group cpu write data
    output [ 31: 0 ] o_GPIO_douta,    // GPIO module A group cpu read data
    output [ 31: 0 ] o_GPIO_ta,       // direction control of gpio module A group
    input  [ 31: 0 ] i_GPIO_dinb,     // GPIO module B group cpu write data
    output [ 31: 0 ] o_GPIO_doutb,    // GPIO module B group cpu read data
    output [ 31: 0 ] o_GPIO_tb,       // direction control of gpio module B group
    input  [ 31: 0 ] i_GPIO_dinc,     // GPIO module C group cpu write data
    output [ 31: 0 ] o_GPIO_doutc,    // GPIO module C group cpu read data
    output [ 31: 0 ] o_GPIO_tc,       // direction control of gpio module C group
    input  [ 31: 0 ] i_GPIO_dind,     // GPIO module D group cpu write data
    output [ 31: 0 ] o_GPIO_doutd,    // GPIO module D group cpu read data
    output [ 31: 0 ] o_GPIO_td,       // direction control of gpio module D group
    //==============================================================================
    output txd_start,                 // notify the uart peripheral module to write a byte
    output [ 7: 0 ] txd_data,         // data sent to the uart peripheral module
    input txd_done,                   // returned by the uart peripheral module when the byte has been sent
    //==============================================================================
    output [ 31: 0 ] o_sft_int_v,     // software interrupt control register
    output [ 31: 0 ] o_timer_l,       // timer setting register lower 32 bits
    output [ 31: 0 ] o_timer_h,       // timer setting register upper 32 bits
    input  [ 31: 0 ] i_timer_l,       // read the lower 32 bits of the current timer counter
    input  [ 31: 0 ] i_timer_h,       // read the upper 32 bits of the current timer counter
    output [ 31: 0 ] o_tcmp_l,        // timer compare register lower 32 bits
    output [ 31: 0 ] o_tcmp_h,        // timer compare register upper 32 bits
    output [ 1: 0 ] o_timer_valid,    // timer valid flag
    output [ 31: 0 ] o_tm_ctrl,       // timer control register
    //==============================================================================
    output o_CPU_cs,                  // Princeton architecture: the selected data is in the ITCM space
    output [ 31: 0 ] o_CPU_PC,        // Princeton architecture: address of the selected data in the ITCM space
    input  [ 31: 0 ] i_CPU_load_data, // Princeton architecture: data read from the ITCM space

    output o_ls_need,                 // load or store instruction identifier
    output o_ls_rdy,                  // load or store is valid
    output o_rd_wen,                  // write back to general register enable
    output [ 4: 0 ] o_wb_rd_idx,      // write back the general register id
    output reg [ 31: 0 ] o_wb_data,   // write back the general register value

    input i_cpu_reset,                // cpu core reset
    input rst_n
);

wire [31: 0] cpu_data_in = i_rs2_val << {i_D_PC[1:0],3'b000};
//==============================================================================
// Memory section
wire mem_cs = ( i_D_PC[ 31: 16 ] == MEM_BASEADDR[ 31: 16 ] );
reg mem_we;
reg [ 3: 0 ] mem_wem;        // memory mask
wire [ 31: 0 ] mem_dout;     // memory dout
wire [ 31: 0 ] mem_addr_out; // not used
//wire mem_init_rdy;
//==============================================================================
// GPIO section
wire [ 31: 0 ] rb_GPIO_d;
wire GPIO_cs = ( i_D_PC[ 31: 16 ] == GPIO_BASEADDR[ 31: 16 ] ) ? 1'b1 : 1'b0;
wire GPIO_we;
wire [ 3: 0 ] GPIO_wem;      // gpio mask
//==============================================================================
wire UART_cs = ( i_D_PC[ 31: 16 ] == UART_BASEADDR[ 31: 16 ] ) ? 1'b1 : 1'b0;
wire [ 31: 0 ] o_UART_dout;
//==============================================================================
wire PLIC_cs = ( i_D_PC[ 31: 16 ] == PLIC_BASEADDR[ 31: 16 ] ) ? 1'b1 : 1'b0;
wire [ 31: 0 ] o_PLIC_dout;
//==============================================================================
wire t_sft_cs   = ( i_D_PC[ 31: 16 ] == TMR_BASEADDR[ 31: 16 ] ) ? 1'b1 : 1'b0;

wire sft_cs     = t_sft_cs & ( ( ~i_D_PC[ 12 ] ) & ( i_D_PC[ 5: 2 ] == 0 ) );
wire tm_ctrl_cs = t_sft_cs & ( ( ~i_D_PC[ 12 ] ) & ( i_D_PC[ 5: 2 ] == 1 ) );
wire t_cs0      = t_sft_cs & ( ( ~i_D_PC[ 12 ] ) & ( i_D_PC[ 5: 2 ] == 2 ) );
wire t_cs1      = t_sft_cs & ( ( ~i_D_PC[ 12 ] ) & ( i_D_PC[ 5: 2 ] == 3 ) );
wire tcmp_cs0   = t_sft_cs & ( ( ~i_D_PC[ 12 ] ) & ( i_D_PC[ 5: 2 ] == 4 ) );
wire tcmp_cs1   = t_sft_cs & ( ( ~i_D_PC[ 12 ] ) & ( i_D_PC[ 5: 2 ] == 5 ) );

wire fpga_ver_cs  = t_sft_cs & ( i_D_PC[ 12 ] & ( i_D_PC[ 5: 2 ] == 0 ) );
wire fpga_test_cs = t_sft_cs & ( i_D_PC[ 12 ] & ( i_D_PC[ 5: 2 ] == 1 ) );

wire [31:0] fpga_ver = 32'h0000_0201;
reg [31:0] fpga_test = 32'h0000_0000;

always @ (posedge sys_clk)
if ( fpga_test_cs )
begin
    if (mem_wem[0]) fpga_test[ 7: 0] <= cpu_data_in[ 7: 0] ;
    if (mem_wem[1]) fpga_test[15: 8] <= cpu_data_in[15: 8] ;
    if (mem_wem[2]) fpga_test[23:16] <= cpu_data_in[23:16] ;
    if (mem_wem[3]) fpga_test[31:24] <= cpu_data_in[31:24] ;
end
//==============================================================================
//assign o_CPU_cs = ( i_D_PC[ 31: 16 ] == CPU_BASEADDR[ 31: 16 ] ) ? i_LOAD & i_EXE_vld : 1'b0;
reg CPU_cs = 0;
always @ (*)
if(i_EXE_vld)
    CPU_cs = ( i_D_PC[ 31: 16 ] == CPU_BASEADDR[ 31: 16 ] ) ? i_LOAD : 1'b0;

assign o_CPU_cs = CPU_cs;
//assign o_CPU_cs = ( i_D_PC[ 31: 16 ] == CPU_BASEADDR[ 31: 16 ] ) ? i_LOAD : 1'b0;
assign o_CPU_PC = { i_D_PC[ 31: 2 ], 2'b00 };

// lock current data address
reg [ 1: 0 ] data_sft_r = 0;
always @ (posedge sys_clk)
if(o_CPU_cs & i_EXE_vld)
    data_sft_r <= i_D_PC[1:0];

wire [ 4: 0 ] data_sft = {data_sft_r[ 1: 0 ], 3'b000};
//wire [ 4: 0 ] data_sft = {i_D_PC[ 1: 0 ], 3'b000};
wire [ 31: 0 ] o_CPU_dout = i_CPU_load_data >> data_sft;
//==============================================================================
wire [ 31: 0 ] ls_rb_d_t_sft = sft_cs ? o_sft_int_v :
                             ( tm_ctrl_cs ? o_tm_ctrl :
                             ( t_cs0 ? i_timer_l :
                             ( t_cs1 ? i_timer_h :
                             ( fpga_ver_cs ? fpga_ver :
                             ( fpga_test_cs ? (fpga_test>>{i_D_PC[1:0],3'b000}) :
                             ( tcmp_cs0 ? o_tcmp_l :
                             ( tcmp_cs1 ? o_tcmp_h : o_CPU_dout ) ) ) ) )));

wire [ 31: 0 ] ls_rb_d = mem_cs ? mem_dout :
                       ( GPIO_cs ? rb_GPIO_d :
                       ( UART_cs ? o_UART_dout : ls_rb_d_t_sft ) );
//wire [ 31: 0 ] ls_rb_d = mem_cs ? mem_dout : ( GPIO_cs ? rb_GPIO_d : ( UART_cs ? o_UART_dout : o_CPU_dout ) );

always@( * )
begin
    mem_we    <= 1'b0;
    mem_wem   <= 4'b0;
    o_wb_data <= 32'b0;

    if ( i_LOAD )
    begin //&mem_init_rdy
        case ( i_load_instr ) // i_load_instr ={rv32i_lbu, rv32i_lb, rv32i_lhu, rv32i_lh, rv32i_lw};
        5'b00001: begin //rv32i_lw
            o_wb_data <= ls_rb_d;
        end
        5'b00010: begin //rv32i_lh
            o_wb_data <= { { 16{ ls_rb_d[ 15 ] } }, ls_rb_d[ 15: 0 ] };
        end
        5'b00100: begin //rv32i_lhu
            o_wb_data <= { { 16{ 1'b0 } }, ls_rb_d[ 15: 0 ] };
        end
        5'b01000: begin //rv32i_lb
            o_wb_data <= { { 24{ ls_rb_d[ 7 ] } }, ls_rb_d[ 7: 0 ] };
        end
        5'b10000: begin //rv32i_lbu
            o_wb_data <= { { 24{ 1'b0 } }, ls_rb_d[ 7: 0 ] };
        end
        default: ;
        endcase
    end

    if ( i_STORE )
    begin //&mem_init_rdy
        mem_we <= 1'b1;
        case ( i_store_instr ) //i_store_instr ={rv32i_sb, rv32i_sh, rv32i_sw};
        3'b001: begin //rv32i_sw
            mem_wem <= 4'b1111;
        end
        3'b010: begin //rv32i_sh
            mem_wem <= 4'b0011 << {i_D_PC[1],1'b0};
        end
        3'b100: begin //rv32i_sb
            mem_wem <= 4'b0001 << i_D_PC[1:0];
        end
        default: mem_wem <= 4'b0;
        endcase
    end
end
//==============================================================================
D_sram
#(
    .MEM_D_DEEP ( MEM_D_DEEP ),
    .MEM_D_W    ( MEM_D_W ),
    .MEM_MSK_W  ( MEM_MSK_W ),
    .MEM_ADDR_W ( MEM_ADDR_W )
)
D_sram_inst
(
    .clk    ( sys_clk ),
    .rst_n  ( rst_n ),

    .din    ( cpu_data_in ),
    .addr   ( i_D_PC ),
    .dout   ( mem_dout ),

    .cs     ( mem_cs ),
    .we     ( mem_we ),
    .wem    ( mem_wem ),
    //.mem_init_rdy (mem_init_rdy),
    .o_D_PC ( mem_addr_out ) // not used
);
//==============================================================================
assign GPIO_we  = mem_we;
assign GPIO_wem = mem_wem;

fii_GPIO
#(
    .GPIO_DEEP   ( 8 ), // register number
    .GPIO_W      ( MEM_D_W ),
    .GPIO_MSK_W  ( MEM_MSK_W ),
    .GPIO_ADDR_W ( MEM_ADDR_W )
)
fii_GPIO_inst
(
    .clk            ( sys_clk ),
    .rst_n          ( rst_n ),

    .i_ls_GPIO_din  ( cpu_data_in ),
    .o_rb_GPIO_dout ( rb_GPIO_d ),

    .i_addr         ( i_D_PC ),
    .i_cs           ( GPIO_cs ),
    .i_we           ( GPIO_we ),
    .i_wem          ( GPIO_wem ),

    .i_GPIO_dina    ( i_GPIO_dina ),
    .o_GPIO_douta   ( o_GPIO_douta ),
    .o_GPIO_ta      ( o_GPIO_ta ),

    .i_GPIO_dinb    ( i_GPIO_dinb ),
    .o_GPIO_doutb   ( o_GPIO_doutb ),
    .o_GPIO_tb      ( o_GPIO_tb ),

    .i_GPIO_dinc    ( i_GPIO_dinc ),
    .o_GPIO_doutc   ( o_GPIO_doutc ),
    .o_GPIO_tc      ( o_GPIO_tc ),

    .i_GPIO_dind    ( i_GPIO_dind ),
    .o_GPIO_doutd   ( o_GPIO_doutd ),
    .o_GPIO_td      ( o_GPIO_td )
);
//===============================================================================
wire UART_we  = mem_we;
wire UART_wem = mem_wem; // note: declared 1 bit wide, so only mem_wem[0] is passed on

fii_UART fii_UART_inst
(
    .clk         ( sys_clk ),

    .i_PERI_din  ( cpu_data_in ),
    .o_PERI_dout ( o_UART_dout ),

    .i_addr      ( i_D_PC ),
    .i_cs        ( UART_cs ),
    .i_we        ( UART_we ),
    .i_wem       ( UART_wem ),

    .txd_start   ( txd_start ),
    .txd_data    ( txd_data ),
    .txd_done    ( txd_done ),

    .rst_n       ( rst_n )
);
//===============================================================================
wire tmr_sft_we = mem_we;

fii_timer_lsu fii_timer_lsu_inst
(
    .clk             ( sys_clk ),

    .i_sft_timer_din ( cpu_data_in ),
    .i_tmr_sft_we    ( tmr_sft_we ),

    .i_tm_ctrl_cs    ( tm_ctrl_cs ),
    .o_tm_ctrl       ( o_tm_ctrl ),

    .i_sft_cs        ( sft_cs ),
    .o_sft_int_v     ( o_sft_int_v ),

    .i_tcs0          ( t_cs0 ),
    .o_timer_l       ( o_timer_l ),
    .i_tcs1          ( t_cs1 ),
    .o_timer_h       ( o_timer_h ),

    .i_tcmp_cs0      ( tcmp_cs0 ),
    .o_tcmp_l        ( o_tcmp_l ),
    .i_tcmp_cs1      ( tcmp_cs1 ),
    .o_tcmp_h        ( o_tcmp_h ),

    .o_timer_valid   ( o_timer_valid ),

    .rst_n           ( rst_n & (!i_cpu_reset))
);
//===============================
assign o_rd_wen    = i_LOAD; //o_wb_need
assign o_ls_need   = i_LOAD | i_STORE;
assign o_ls_rdy    = 1;
assign o_wb_rd_idx = i_rd_idx;
//===============================================================================
endmodule
Port description:
input i_EXE_vld , // execute the instruction enable
// load & store address PC
input [ 31: 0 ] i_D_PC , // data-bus address targeted by the load or store
input i_LOAD , // load instruction set
input [ 4: 0 ] i_load_instr , // load instruction set includes: {rv32i_lbu, rv32i_lb, rv32i_lhu, rv32i_lh, rv32i_lw};
input i_STORE , // store instruction set
input [ 2: 0 ] i_store_instr , // store instruction set includes: {rv32i_sb, rv32i_sh, rv32i_sw};
input [ 4: 0 ] i_rd_idx , // general register id of rd, decoded by the decoding module
input [ 31: 0 ] i_rs2_val , // rs2 general register value
The above signals are related to the cpu core, and all come from the decoding module and the execution module.
gpio group port:
input [ 31: 0 ] i_GPIO_dina , // gpio module group a cpu write data
output [ 31: 0 ] o_GPIO_douta , // gpio module group a cpu read data
output [ 31: 0 ] o_GPIO_ta , // gpio module group a direction control
input [ 31: 0 ] i_GPIO_dinb , // gpio module group b cpu write data
output [ 31: 0 ] o_GPIO_doutb , // gpio module group b cpu read data
output [ 31: 0 ] o_GPIO_tb , // gpio module group b direction control
input [ 31: 0 ] i_GPIO_dinc , // gpio module group c cpu write data
output [ 31: 0 ] o_GPIO_doutc , // gpio module group c cpu read data
output [ 31: 0 ] o_GPIO_tc , // gpio module group c direction control
input [ 31: 0 ] i_GPIO_dind , // gpio module group d cpu write data
output [ 31: 0 ] o_GPIO_doutd , // gpio module group d cpu read data
output [ 31: 0 ] o_GPIO_td , // gpio module group d direction control
The current LSU module interface includes a GPIO peripheral with four groups (a, b, c, d) of 32 pins each. Every GPIO pin is bidirectional (inout); the inout signals are declared in the top-level module of the project and connected to the GPIO group ports listed above.
Example of a top-level module for an entire RISC-V project:
fii_iobuf #( .IO_WIDTH( 32 ) )
fii_iobuf_insta
(
    .i_dio_t  ( gpio_ta ), // corresponds to o_GPIO_ta of the current module
    .i_dio    ( gpio_oa ), // corresponds to o_GPIO_douta of the current module
    .o_dio    ( gpio_ia ), // corresponds to i_GPIO_dina of the current module
    .io_dio_p ( gpio_a )   // corresponds to the physical FPGA pin
);
uart module port:
output txd_start , // notify the uart peripheral module to write a byte (enable)
output [ 7: 0 ] txd_data , // data sent to the uart peripheral module
input txd_done , // returned by the uart peripheral module when the byte has been sent
The current LSU module interface also defines the UART module. On one side it is driven by the CPU's load and store signals; on the other side, txd_start, txd_data, and txd_done connect to the actual UART communication module.
timer interface:
output [ 31: 0 ] o_sft_int_v , // software interrupt control register
output [ 31: 0 ] o_timer_l , // timer setting register lower 32 bits
output [ 31: 0 ] o_timer_h , // timer setting register upper 32 bits
input [ 31: 0 ] i_timer_l , // read the lower 32 bits of the current timer counter
input [ 31: 0 ] i_timer_h , // read the upper 32 bits of the current timer counter
output [ 31: 0 ] o_tcmp_l , // timer compare register lower 32 bits
output [ 31: 0 ] o_tcmp_h , // timer compare register upper 32 bits
output [ 1: 0 ] o_timer_valid , // timer valid flag
output [ 31: 0 ] o_tm_ctrl , // timer control register
In the current LSU module interface, we define the timer module.
Interface of ITCM module:
output o_CPU_cs , // Princeton architecture: the selected data is in the ITCM space
output [ 31: 0 ] o_CPU_PC , // Princeton architecture: address of the selected data in the ITCM space
input [ 31: 0 ] i_CPU_load_data , // Princeton architecture: data read from the ITCM space
These signals are used to read data stored in the ITCM. With assembly code, the data targeted by a load sometimes lies in the ITCM region. In that case the instruction pc has to switch temporarily to the ITCM data address to read the data (what is read here is data, not machine code to be executed). After the read, the pc is restored to the address of the instruction that was being executed.
External interface signal:
output o_ls_need , // load or store instruction identifier, signalled to other modules
output o_ls_rdy , // load or store is valid; this signal is always 1
General register write back port:
output o_rd_wen , // write back to general register enable
output [ 4: 0 ] o_wb_rd_idx , // write back the general register id
output reg[ 31: 0 ] o_wb_data , // write back the general register value
Code Analysis:
wire [31: 0] cpu_data_in = i_rs2_val << {i_D_PC[1:0],3'b000};
Instructions related to store include SB, SH, SW, among which:
The address of SB (i_D_PC) can be any address from 0 to 4G. Note: this is a data address in the Harvard architecture. Unlike an instruction address (RV32 instructions must be 4-byte aligned, and RV32C allows 2-byte alignment), it does not have to be aligned. For example, 0x9000_0000, 0x9000_0001, 0x9000_0002, and 0x9000_0003 are all valid targets of the sb instruction. We therefore look at i_D_PC[1:0], which has 4 cases:
2'b00: the address is exactly 4-byte aligned (divisible by 4); the data is i_rs2_val[7:0], shifted left by 0 bits
2'b01: the address divided by 4 leaves a remainder of 1; the data is i_rs2_val[7:0], shifted left by 8 bits
2'b10: the address divided by 4 leaves a remainder of 2; the data is i_rs2_val[7:0], shifted left by 16 bits
2'b11: the address divided by 4 leaves a remainder of 3; the data is i_rs2_val[7:0], shifted left by 24 bits
The address of SH is handled the same way, except that only 2 cases occur: 2'b00 and 2'b10.
The address of SW is handled the same way, except that only one case occurs: 2'b00.
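To make the shift concrete, here is a small Python model of the cpu_data_in expression. It is only an illustration and not part of the RTL; the function name align_store_data and the constant MASK32 are made up for this sketch.

```python
MASK32 = 0xFFFFFFFF  # the data bus is 32 bits wide


def align_store_data(rs2_val: int, addr: int) -> int:
    """Model of: cpu_data_in = i_rs2_val << {i_D_PC[1:0], 3'b000}."""
    shift = (addr & 0b11) * 8           # byte offset inside the word, in bits
    return (rs2_val << shift) & MASK32  # bits shifted past bit 31 are dropped


# An SB of 0xAB to each of the four byte addresses starting at 0x9000_0000
for offset in range(4):
    addr = 0x9000_0000 + offset
    print(f"{addr:#010x}: {align_store_data(0xAB, addr):#010x}")
```

The shift only moves the byte (or halfword) onto the right byte lane of the 32-bit word; which lanes are actually written is decided separately by the write mask mem_wem.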
memory segment:
wire mem_cs = ( i_D_PC[ 31: 16 ] == MEM_BASEADDR[ 31: 16 ] ); // Chip select signal, which corresponds to the DTCM address space.
reg mem_we ; // asserted when any instruction of the store group (sb, sh, sw) is active
reg [ 3: 0 ] mem_wem ; // 4-bit write mask; each bit that is 1 writes the corresponding byte to DTCM. Example: mem_wem = 4'b0100 writes cpu_data_in[23:16] to DTCM
wire [ 31: 0 ] mem_dout ; // read data from dtcm
wire [ 31: 0 ] mem_addr_out ; // currently not used.
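The effect of the write mask can be pictured with a short Python sketch. The internals of D_sram are not shown in this article, so this is only a model of the byte-lane behavior described in the comment on mem_wem; the function name masked_write is hypothetical.

```python
def masked_write(old_word: int, new_word: int, mask: int) -> int:
    """Return the stored word after a write with a 4-bit byte mask.

    Bit i of `mask` enables byte lane i, i.e. bits [8*i+7 : 8*i].
    """
    result = old_word
    for byte in range(4):
        if (mask >> byte) & 1:
            lane = 0xFF << (8 * byte)
            # replace only the enabled lane, keep the other bytes
            result = (result & ~lane) | (new_word & lane)
    return result & 0xFFFFFFFF


# mem_wem = 4'b0100: only bits [23:16] of the new data are written
print(hex(masked_write(0x11223344, 0xAABBCCDD, 0b0100)))  # 0x11bb3344
```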
gpio segment:
wire [ 31: 0 ] rb_GPIO_d ; // data read from gpio register
wire GPIO_cs = ( i_D_PC[ 31: 16 ] == GPIO_BASEADDR[ 31: 16 ] ) ? 1'b1 : 1'b0; // gpio register chip select
wire GPIO_we ; // gpio register write signal, used the same way as mem_we
wire [ 3: 0 ] GPIO_wem ; // gpio register write mask, same as mem_wem
uart segment:
wire UART_cs = ( i_D_PC[ 31: 16 ] == UART_BASEADDR[ 31: 16 ] ) ? 1'b1 : 1'b0; // uart register group chip select
wire [ 31: 0 ] o_UART_dout ; // data read from the uart registers
wire PLIC_cs = ( i_D_PC[ 31: 16 ] == PLIC_BASEADDR[ 31: 16 ] ) ? 1'b1 : 1'b0; // currently not used
wire [ 31: 0 ] o_PLIC_dout ; // currently not used
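All of the chip selects above compare only the upper 16 bits of the address with a base address. The following Python sketch models that decode using the default parameter values of exu_LSU; the dictionary and function names are made up for the illustration.

```python
# Default base addresses taken from the exu_LSU parameter list
BASE_ADDRESSES = {
    "TMR":  0x0200_0000,
    "PLIC": 0x0C00_0000,
    "CPU":  0x8000_0000,
    "MEM":  0x9000_0000,
    "UART": 0xE000_0000,
    "GPIO": 0xF000_0000,
}


def chip_select(addr: int):
    """Return the region whose base matches addr[31:16], or None."""
    for name, base in BASE_ADDRESSES.items():
        if (addr >> 16) == (base >> 16):
            return name
    return None


print(chip_select(0x9000_0004))  # MEM
print(chip_select(0xF000_0010))  # GPIO
print(chip_select(0x9001_0000))  # None
```

Because the low 16 bits do not take part in the comparison, each region covers a 64 KiB window starting at its base address.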
Timer register chip select:
wire t_sft_cs = ( i_D_PC[ 31: 16 ] == TMR_BASEADDR[ 31: 16 ] ) ? 1'b1 : 1'b0; // timer register segment chip select
wire sft_cs = t_sft_cs & ( ( ~i_D_PC[ 12 ] ) & ( i_D_PC[ 5: 2 ] == 0 ) ); // software interrupt select
wire tm_ctrl_cs = t_sft_cs & ( ( ~i_D_PC[ 12 ] ) & ( i_D_PC[ 5: 2 ] == 1 ) ); // timer control register select
wire t_cs0 = t_sft_cs & ( ( ~i_D_PC[ 12 ] ) & ( i_D_PC[ 5: 2 ] == 2 ) ); // timer register lower 32 bits select
wire t_cs1 = t_sft_cs & ( ( ~i_D_PC[ 12 ] ) & ( i_D_PC[ 5: 2 ] == 3 ) ); // timer register upper 32 bits select
wire tcmp_cs0 = t_sft_cs & ( ( ~i_D_PC[ 12 ] ) & ( i_D_PC[ 5: 2 ] == 4 ) ); // timer compare register lower 32 bits select
wire tcmp_cs1 = t_sft_cs & ( ( ~i_D_PC[ 12 ] ) & ( i_D_PC[ 5: 2 ] == 5 ) ); // timer compare register upper 32 bits select
Version and Test Register Chip Selects:
wire fpga_ver_cs = t_sft_cs & ( i_D_PC[ 12 ] & ( i_D_PC[ 5: 2 ] == 0 ) );
wire fpga_test_cs = t_sft_cs & ( i_D_PC[ 12 ] & ( i_D_PC[ 5: 2 ] == 1 ) );
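The register selects inside the timer segment depend only on address bit 12 and bits [5:2]. As a reading aid, the Python sketch below maps an address to the register it selects, assuming the default TMR_BASEADDR of 32'h0200_0000. The register names are taken from the signal names; the table and function names are made up for this sketch.

```python
TMR_BASEADDR = 0x0200_0000

# (addr[12], addr[5:2]) -> selected register
TIMER_REGS = {
    (0, 0): "sft_int_v",  # sft_cs
    (0, 1): "tm_ctrl",    # tm_ctrl_cs
    (0, 2): "timer_l",    # t_cs0
    (0, 3): "timer_h",    # t_cs1
    (0, 4): "tcmp_l",     # tcmp_cs0
    (0, 5): "tcmp_h",     # tcmp_cs1
    (1, 0): "fpga_ver",   # fpga_ver_cs
    (1, 1): "fpga_test",  # fpga_test_cs
}


def timer_block_select(addr: int):
    """Return the register selected by `addr`, or None if nothing matches."""
    if (addr >> 16) != (TMR_BASEADDR >> 16):
        return None                    # t_sft_cs is low
    bit12 = (addr >> 12) & 1
    index = (addr >> 2) & 0xF          # addr[5:2]
    return TIMER_REGS.get((bit12, index))


print(timer_block_select(0x0200_0008))  # timer_l
print(timer_block_select(0x0200_1000))  # fpga_ver
```

With the default base address the registers therefore sit at offsets 0x00 to 0x14 (software interrupt, control, timer, compare) and at 0x1000 and 0x1004 (version and test). Since the other low address bits are not decoded, aliases exist: for example, offset 0x40 selects the same register as offset 0x00.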
Test register write:
always @ (posedge sys_clk)
if ( fpga_test_cs )
begin
    if (mem_wem[0]) fpga_test[ 7: 0] <= cpu_data_in[ 7: 0] ;
    if (mem_wem[1]) fpga_test[15: 8] <= cpu_data_in[15: 8] ;
    if (mem_wem[2]) fpga_test[23:16] <= cpu_data_in[23:16] ;
    if (mem_wem[3]) fpga_test[31:24] <= cpu_data_in[31:24] ;
end
load data in ITCM:
In the Princeton architecture, these signals tell the ITCM that a data word (not an instruction) needs to be read from the ITCM region.
reg CPU_cs = 0;
always @ (*)
if(i_EXE_vld)
    CPU_cs = ( i_D_PC[ 31: 16 ] == CPU_BASEADDR[ 31: 16 ] ) ? i_LOAD : 1'b0;

assign o_CPU_cs = CPU_cs;
assign o_CPU_PC = { i_D_PC[ 31: 2 ], 2'b00 };

reg[1:0] data_sft_r = 0;
always @ (posedge sys_clk)
if(o_CPU_cs & i_EXE_vld)
    data_sft_r <= i_D_PC[1:0];

wire [ 4: 0 ] data_sft = {data_sft_r[ 1: 0 ], 3'b000};
wire [ 31: 0 ] o_CPU_dout = i_CPU_load_data >> data_sft; // the data read from itcm is aligned for LB, LH, LW, etc.
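The ITCM read path does two things: it requests the word-aligned address, and it shifts the returned word right so that the addressed byte ends up in bits [7:0]. A minimal Python model of these two steps follows (timing is ignored, and the function names are made up for this sketch).

```python
def itcm_word_address(addr: int) -> int:
    """Model of: o_CPU_PC = {i_D_PC[31:2], 2'b00}."""
    return addr & 0xFFFF_FFFC


def itcm_load_data(word: int, addr: int) -> int:
    """Model of: o_CPU_dout = i_CPU_load_data >> {addr[1:0], 3'b000}."""
    shift = (addr & 0b11) * 8
    return (word & 0xFFFF_FFFF) >> shift


addr = 0x8000_0101
print(hex(itcm_word_address(addr)))            # 0x80000100
print(hex(itcm_load_data(0x11223344, addr)))   # 0x112233
```

In the RTL the byte offset is first latched into data_sft_r on a clock edge while o_CPU_cs and i_EXE_vld are high, so the shift uses the offset of the request rather than whatever i_D_PC holds when the data arrives. The upper bits left over after the shift are removed later by the load extension logic for LB, LBU, LH, and LHU.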
Read-data selection for the load instructions of the whole module: depending on which chip-select signal is active, the data from the corresponding module is sent to the CPU.
wire[ 31: 0 ] ls_rb_d_t_sft = sft_cs ? o_sft_int_v :
                            ( tm_ctrl_cs ? o_tm_ctrl :
                            ( t_cs0 ? i_timer_l :
                            ( t_cs1 ? i_timer_h :
                            ( fpga_ver_cs ? fpga_ver :
                            ( fpga_test_cs ? (fpga_test>>{i_D_PC[1:0],3'b000}) :
                            ( tcmp_cs0 ? o_tcmp_l :
                            ( tcmp_cs1 ? o_tcmp_h : o_CPU_dout ) ) ) ) )));

wire[31:0] ls_rb_d = mem_cs ? mem_dout :
                   ( GPIO_cs ? rb_GPIO_d :
                   ( UART_cs ? o_UART_dout : ls_rb_d_t_sft ) );
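The nested conditional operators form a priority chain: the first active chip select wins, and the ITCM data (o_CPU_dout) is the fallback when none is active. A compact Python sketch of that selection order is given below; the list, the dictionary keys, and the function name are hypothetical, chosen only to mirror the order of the Verilog expression.

```python
# Same order as the nested ?: chain: ls_rb_d first, then ls_rb_d_t_sft
PRIORITY = [
    "mem", "gpio", "uart",
    "sft", "tm_ctrl", "timer_l", "timer_h",
    "fpga_ver", "fpga_test", "tcmp_l", "tcmp_h",
]


def select_readback(selects: dict, data: dict, cpu_dout: int) -> int:
    """Return the data of the first active select, else the ITCM data."""
    for name in PRIORITY:
        if selects.get(name):
            return data[name]
    return cpu_dout


print(select_readback({"gpio": True}, {"gpio": 5}, cpu_dout=9))  # 5
print(select_readback({}, {}, cpu_dout=9))                       # 9
```

In normal operation at most one select is active, because the regions occupy different address ranges; the priority order only matters for how the multiplexer is built.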
LOAD, STORE instruction set group read and write operations:
always@(*)
begin
    mem_we    <= 1'b0;
    mem_wem   <= 4'b0;
    o_wb_data <= 32'b0;

    if ( i_LOAD )
    begin //&mem_init_rdy
        case ( i_load_instr ) // i_load_instr ={rv32i_lbu, rv32i_lb, rv32i_lhu, rv32i_lh, rv32i_lw};
        5'b00001: begin //rv32i_lw
            o_wb_data <= ls_rb_d;
        end
        5'b00010: begin //rv32i_lh
            o_wb_data <= { { 16{ ls_rb_d[ 15 ] } }, ls_rb_d[ 15: 0 ] };
        end
        5'b00100: begin //rv32i_lhu
            o_wb_data <= { { 16{ 1'b0 } }, ls_rb_d[ 15: 0 ] };
        end
        5'b01000: begin //rv32i_lb
            o_wb_data <= { { 24{ ls_rb_d[ 7 ] } }, ls_rb_d[ 7: 0 ] };
        end
        5'b10000: begin //rv32i_lbu
            o_wb_data <= { { 24{ 1'b0 } }, ls_rb_d[ 7: 0 ] };
        end
        default: ;
        endcase
    end

    if ( i_STORE )
    begin //&mem_init_rdy
        mem_we <= 1'b1;
        case ( i_store_instr ) //i_store_instr ={rv32i_sb, rv32i_sh, rv32i_sw};
        3'b001: begin //rv32i_sw
            mem_wem <= 4'b1111;
        end
        3'b010: begin //rv32i_sh
            mem_wem <= 4'b0011 << {i_D_PC[1],1'b0};
        end
        3'b100: begin //rv32i_sb
            mem_wem <= 4'b0001 << i_D_PC[1:0];
        end
        default: mem_wem <= 4'b0;
        endcase
    end
end
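Depending on which load or store instruction is active, this block either sign-extends or zero-extends the data that was read and writes it back to the register, or asserts the write enable and generates the byte mask for the store.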
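The two case statements can be checked against a small Python model. This is a behavioral sketch only: the instructions are selected by name instead of the one-hot i_load_instr and i_store_instr codes, and all function names are made up for the illustration.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of `value` to 32 bits."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return ((value ^ sign) - sign) & MASK32


def load_result(instr: str, word: int) -> int:
    """Model of the i_load_instr case: value written to o_wb_data."""
    if instr == "lw":
        return word & MASK32
    if instr == "lh":
        return sign_extend(word, 16)
    if instr == "lhu":
        return word & 0xFFFF
    if instr == "lb":
        return sign_extend(word, 8)
    if instr == "lbu":
        return word & 0xFF
    raise ValueError(f"unknown load instruction: {instr}")


def store_write_mask(instr: str, addr: int) -> int:
    """Model of the i_store_instr case: value assigned to mem_wem."""
    if instr == "sw":
        return 0b1111
    if instr == "sh":
        return (0b0011 << (addr & 0b10)) & 0b1111  # shift by {addr[1], 1'b0}
    if instr == "sb":
        return (0b0001 << (addr & 0b11)) & 0b1111  # shift by addr[1:0]
    raise ValueError(f"unknown store instruction: {instr}")


print(hex(load_result("lb", 0x12345680)))            # 0xffffff80
print(hex(load_result("lbu", 0x12345680)))           # 0x80
print(bin(store_write_mask("sh", 0x9000_0002)))      # 0b1100
print(bin(store_write_mask("sb", 0x9000_0003)))      # 0b1000
```

Note that the load case only looks at the low byte or halfword of ls_rb_d, so the selected source has to deliver the addressed byte in bits [7:0]; the ITCM path and the fpga_test read do this with a right shift.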