RISC-V EXU Module and CPU Execution (2): CPU Execution (Pipeline)

Once the whole RISC-V core module is complete, the core reads the machine code produced by the software from the ITCM and executes each instruction through five steps: instruction fetch, decode, execute, memory access, and write-back. Not every instruction passes through all five steps; some finish early. For example, addi never accesses external memory. In version V3.00 and later of this RISC-V project, instructions are processed in a pipeline. We use a 3-stage pipeline.
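To make the point that not every instruction needs all five steps concrete, here is a small hypothetical Verilog helper (the module and port names are ours, not the project's) that flags which RV32I instructions need the memory-access step. It assumes only the standard RV32I major opcodes: only LOAD and STORE access memory, while an OP-IMM instruction such as addi goes from execute straight to write-back.

```verilog
// Hypothetical helper, not part of the original design: classify an
// RV32I instruction word by its 7-bit major opcode (instr[6:0]).
module instr_class (
    input  [31:0] instr,
    output        mem_access,   // 1: instruction needs the memory-access step
    output        is_load,      // LOAD  opcode: lb/lh/lw/lbu/lhu
    output        is_store      // STORE opcode: sb/sh/sw
);
    localparam [6:0] OPC_LOAD  = 7'b0000011;
    localparam [6:0] OPC_STORE = 7'b0100011;

    assign is_load    = (instr[6:0] == OPC_LOAD);
    assign is_store   = (instr[6:0] == OPC_STORE);
    // Everything else (OP-IMM such as addi, OP, branches, ...) skips memory access
    assign mem_access = is_load | is_store;
endmodule
```

For example, addi x1, x0, 1 (32'h0010_0093) gives mem_access = 0, while lw x1, 0(x2) (32'h0001_2083) gives mem_access = 1.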

Related reference articles:

RISC-V teaching plan

Pipeline block diagram:

 

The blue squares in the figure are D flip-flop registers; each one marks a boundary between pipeline stages.
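As a generic illustration of what one of these stage registers does, here is a hypothetical pipeline register (not taken from the project) with a hold input for stalls and a flush input that loads a NOP bubble. The NOP value 32'h0000_0013 is the standard RISC-V encoding of addi x0, x0, 0; giving flush priority over hold is a design choice of this sketch only.

```verilog
// Hypothetical pipeline register between two stages (names are illustrative).
// On each rising edge it captures the upstream PC/instruction, unless:
//   flush = 1 : load a NOP bubble instead (the slot then does nothing)
//   hold  = 1 : keep the current contents (stall)
module pipe_stage_reg (
    input             clk,
    input             rst_n,      // active-low asynchronous reset
    input             hold,
    input             flush,
    input      [31:0] pc_in,
    input      [31:0] instr_in,
    output reg [31:0] pc_out,
    output reg [31:0] instr_out
);
    localparam [31:0] NOP = 32'h0000_0013;   // addi x0, x0, 0

    always @(posedge clk or negedge rst_n)
        if (!rst_n) begin
            pc_out    <= 32'h0;
            instr_out <= NOP;
        end else if (flush) begin
            pc_out    <= pc_in;
            instr_out <= NOP;           // bubble
        end else if (!hold) begin
            pc_out    <= pc_in;
            instr_out <= instr_in;      // normal capture
        end
        // hold = 1 and flush = 0: both registers keep their values
endmodule
```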

 

 

First pipeline stage:

Since we use dual-port memory as the ITCM (instruction memory), the instruction data is available only one clock cycle after the address is applied, so an instruction fetch takes 2 clock cycles. Both cycles still belong to the first pipeline stage (instruction fetch). Building the ITCM from registers (a reg array with a combinational read) would save that clock cycle, but at the cost of a large amount of FPGA logic resources.
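The one-cycle read latency can be seen in a minimal behavioural model. This is a hypothetical sketch (it is not the TDP_RAM_INSTR IP used below): a memory with a registered read port, which is the behaviour FPGA block RAM provides. The data for a read address appears only after the next rising clock edge.

```verilog
// Hypothetical ITCM model: 2^AW x 32 memory with a registered
// (synchronous) read port, as FPGA block RAM behaves.
module itcm_sync_ram #(
    parameter AW = 10                 // word-address width (1K words here)
) (
    input              clk,
    input              we,            // write port (e.g. ITCM download)
    input  [AW-1:0]    waddr,
    input  [31:0]      wdata,
    input  [AW-1:0]    raddr,         // fetch word address (PC[AW+1:2])
    output reg [31:0]  rdata          // valid one clock AFTER raddr is applied
);
    reg [31:0] mem [0:(1<<AW)-1];

    always @(posedge clk) begin
        if (we) mem[waddr] <= wdata;
        rdata <= mem[raddr];          // registered read: 1-cycle latency
    end

    // A register-array ITCM with a combinational read, e.g.
    //   assign rdata_comb = mem[raddr];
    // would return data in the same cycle (saving one clock), but it
    // cannot map to block RAM and costs many LUTs and flip-flops.
endmodule
```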

Second pipeline stage:

Handles decoding and execution: the decoder, the ALU (execution), branch instruction execution, and CSR (control and status register) operations.

Third pipeline stage:

Writes results back to the 32 general-purpose registers (regfile). For LOAD/STORE instructions it also performs the peripheral access. This stage connects to the bus, which carries GPIO, the timer, the UART, and any peripherals added later to extend the RISC-V CPU.

The pipelined code of the CPU instruction fetch module:

`timescale 1ns / 1ps

`ifndef INST_NOP
`define INST_NOP 32'h0000_0013   // addi x0, x0, 0 (canonical RISC-V NOP)
`endif

module fii_instr_rib 
# (
    parameter [31:0] DBG_BASEADDR  = 32'h0000_0800,
    parameter [31:0] TMR_BASEADDR  = 32'h0200_0000,
    parameter [31:0] PLIC_BASEADDR = 32'h0c00_0000,
    parameter [31:0] CPU_BASEADDR  = 32'h8000_0000,
    parameter [31:0] MEM_BASEADDR  = 32'h9000_0000,
    parameter [31:0] UART_BASEADDR = 32'he000_0000,
    parameter [31:0] PWM_BASEADDR  = 32'hf000_2000,
    parameter [31:0] GPIO_BASEADDR = 32'hf000_0000
)
(
    input  sys_clk,

    output reg [31:0] o_curr_pc,
    output [31:0]   o_curr_instr,

//===============================================================================
    input           i_pc_chg_flag,
    input           i_ir_hold_flag,
    input  [31:0]   i_exe_next_PC,

    output          o_CPU_load_vld,
//===============================================================================

    input  i_ext_irq,
    input  i_sft_irq,
    input  i_tmr_irq,

    input  o_glb_irq,                 // mstatus[3]; enable all Machine Mode interrupt

    input  [31:0] i_dbg_entry_adr,    // jtag debug entry address
    input  i_dbg_entry_set,           // jtag debug entry address enable ( one clock )

    input  [31:0] i_dpc_r,
    input  [31:0] i_dbg_instr,
    input  dret,

    input  [31:0]  i_mepc,
    input  [31:0]  i_vect_pc,                    // mtvec register content
    output o_irq_src,
    output o_exp_src,
    input  mret,
    input  dbg_start,

//===============================================================================
    input  itcm_wea,
    input  [31:0] itcm_addra,
    input  [31:0] itcm_dina,    

//===============================================================================
    input  [31:0] i_rib_saddr,
    input  [31:0] i_rib_sdin,
    output [31:0] o_rib_sdout,
    input  i_rib_svalid,
    output [1:0] o_rib_sready,
    input  [3:0]  i_rib_swe,
    input  i_rib_srd,
    input  i_rib_sop,
//===============================================================================
    // master bus signal
    input  i_rib_mvalid,
    input  [1:0] i_rib_mready,
//===============================================================================

    input  i_cpu_reset,    
    input  rst_n
);

//===============================================================================
// cpu master bus operation

// long instruction: dec_instr should be set to `INST_NOP during a long instruction.
reg  rib_delay_op = 0;
always @ ( posedge sys_clk or negedge rst_n)
if(!rst_n) rib_delay_op <= 0;
//else if ( o_rib_mvalid & ~i_rib_mready[0] ) rib_delay_op <= 1;
else if ( i_rib_mvalid ^ i_rib_mready[0] ) rib_delay_op <= 1;
else rib_delay_op <= 0;


wire  bus_hsk_ok = i_rib_mvalid & i_rib_mready[0];

// long period load & store operation; normal operation long_ir_flag is 0
wire  long_ir_flag = ( i_rib_mvalid & (~i_rib_mready[0]) ) | rib_delay_op;

// long period instruction pos edge; normal operation long_ir_cs is 0 
wire  long_ir_cs = ( i_rib_mvalid & (~i_rib_mready[0]) ) & i_rib_sop;

//===============================================================================
wire o_CPU_load_cs = ( i_rib_saddr[ 31: 16 ] == CPU_BASEADDR[ 31: 16 ] ) ? i_rib_sop : 1'b0;
/*
ls_r is used for DATA load instructions that read from 0x8000_xxxx. Because
0x8000_xxxx is dual-port RAM, the data comes out of it 2 clocks later, so
ls_r[1] is o_CPU_load_cs delayed by 2 clocks.
*/


reg [1:0] ls_r = 0;
always @( posedge sys_clk or negedge rst_n)
if(!rst_n) ls_r <= 0;
else
    ls_r <= {ls_r[0], o_CPU_load_cs};


assign o_CPU_load_vld = ls_r[1];
assign o_rib_sready = {1'b0, ls_r[1]};
//===============================================================================
wire [31:0] pc_instr;

wire [31:0] i_CPU_load_data = pc_instr;

wire [ 4: 0 ] data_sft = {i_rib_saddr[ 1: 0 ], 3'b000};  // i_rib_saddr[1:0] * 8 (byte offset -> bit shift)
wire [ 31: 0 ] o_CPU_dout = i_CPU_load_data >> data_sft;

assign o_rib_sdout = o_CPU_dout;

//===============================================================================
reg [ 31: 0 ] instr_fch_PC = 0;
//===============================================================================

/*
next_addr is related to o_CPU_load_cs. For example, if the current program PC
is 0x8000_0004 and it holds a load instruction (data at a 0x8000_xxxx address),
next_addr holds the next program PC, 0x8000_0004 + 4, used after the load finishes.
*/

reg [31:0] next_addr = 0;
always @( posedge sys_clk )
if(!rst_n | i_cpu_reset) next_addr <= CPU_BASEADDR;
else if(i_rib_sop) next_addr <= instr_fch_PC;
//===============================================================================
assign o_exp_src = 0;
wire nop_flag; 
// don't generate interrupt when cpu load & flush instruction
assign o_irq_src = (long_ir_flag | nop_flag) ? 1'b0 : (i_ext_irq | i_sft_irq | i_tmr_irq);

wire irq_exp_flag = (o_irq_src & o_glb_irq) | o_exp_src ;  // edge o_irq_src high, then o_glb_irq low
//===============================================================================
reg  [31:0] jump_addr = 0;
always @( * )
begin
    if(i_dbg_entry_set)
        jump_addr = i_dbg_entry_adr;
    else if(dret)
        jump_addr = i_dpc_r;
    else if (long_ir_cs) // long address & (addr == 0x8000_xxxx)
        jump_addr = {i_rib_saddr[31:2], 2'b00};
    else if( irq_exp_flag ) 
        jump_addr = i_vect_pc;    // mtvec register content
    else if( mret )
        jump_addr = i_mepc;       // the PC before entering the interrupt
    else if( i_ir_hold_flag )
        jump_addr = o_curr_pc;    // long-term instruction: PC holds at the current position
    else jump_addr = i_exe_next_PC;
end

wire w_jump_flag = i_dbg_entry_set | dret | irq_exp_flag | i_pc_chg_flag | mret | long_ir_cs | i_ir_hold_flag;
//===============================================================================
//stage  1
always @( posedge sys_clk )
if (( !rst_n ) | i_cpu_reset ) instr_fch_PC <= CPU_BASEADDR;
else 
begin
    if (w_jump_flag)
        instr_fch_PC <= jump_addr;
    else if( ~bus_hsk_ok & rib_delay_op ) // long period load & store 
        instr_fch_PC <= next_addr;
    else
        instr_fch_PC <= instr_fch_PC + 4;
end


//===============================================================================
//stage  2
always @( posedge sys_clk or negedge rst_n )
if ( !rst_n ) o_curr_pc <= CPU_BASEADDR;
else  o_curr_pc <= i_ir_hold_flag ? o_curr_pc : (bus_hsk_ok & rib_delay_op) ? next_addr : instr_fch_PC;
//else  o_curr_pc <= instr_fch_PC;

//===============================================================================
/*
 After an address jump, the next clock must output INST_NOP: because of the
 dual-port RAM latency, the word read from the sequential pipeline address
 (current address + 4) is invalid and must be ignored.
*/

reg waiting_r = 0;
always @ (posedge sys_clk or negedge rst_n)
if(!rst_n) waiting_r <= 0;
else  waiting_r <= w_jump_flag;

//===============================================================================
// w_dbg_pc may point to a 0x8000_xxxx address, whose data comes from the dual-port RAM,
// so it takes 2 clocks to get that data; dbg_addr_sel is registered (dbg_addr_cs) to stay aligned

wire [31:0] w_dbg_pc = instr_fch_PC;

localparam [ 31: 0 ] DBG_RAM_ADDR    = 32'h0000_0400;

// 0x400 - 0x800


wire dbg_addr_sel = (w_dbg_pc[31:12] ==  DBG_BASEADDR[31:12]) & (w_dbg_pc[11:10] !=  2'b00) ? 1'b1 : 1'b0;
reg dbg_addr_cs = 0;

always @ (posedge sys_clk or negedge rst_n)
if(!rst_n) dbg_addr_cs <= 0;
else    dbg_addr_cs <= dbg_addr_sel;

//===============================================================================
wire [31:0] instr_data_sel = dbg_addr_cs ? i_dbg_instr : pc_instr ;
assign o_curr_instr = (waiting_r) ? `INST_NOP : instr_data_sel;


assign nop_flag = (o_curr_instr == `INST_NOP) ? 1'b1 : 1'b0;
//===============================================================================
wire instr_ena = ( instr_fch_PC[ 31: 16 ] == CPU_BASEADDR[31:16] ) ? 1'b1 : 1'b0;

TDP_RAM_INSTR  program_inst
(
    .clka   ( sys_clk ),
    .ena    ( 1'b1 ),
    .wea    ( itcm_wea ),
    .addra  ( itcm_addra[31:2] ),
    .dina   ( itcm_dina ),
    .douta  ( ),

    .clkb   ( sys_clk ),
    .enb    ( instr_ena ),
    .web    ( 1'b0 ),
    .addrb  ( instr_fch_PC[ 31: 2 ] ), //8K 32bits, 32K byte
    .dinb   ( 32'b0 ),
    .doutb  ( pc_instr )
);

//===============================================================================



endmodule

 

Port Description:

input sys_clk ,                // system clock

output reg [31:0] o_curr_pc ,  // current PC value
output [31:0] o_curr_instr ,   // current instruction

//===============================================================================
input i_pc_chg_flag ,          // from decode/execute: the PC must change (next PC is not the normal PC + 4)
input i_ir_hold_flag ,         // hold the PC while a multi-cycle instruction (multiply/divide) executes
input [31:0] i_exe_next_PC ,   // from decode/execute: the next PC value

output o_CPU_load_vld ,        // data read from the Princeton-architecture ITCM is valid
//===============================================================================

input i_ext_irq ,              // external interrupt
input i_sft_irq ,              // software interrupt
input i_tmr_irq ,              // timer interrupt

input o_glb_irq ,              // CSR global interrupt enable, mstatus[3]; enables all Machine Mode interrupts

input [31:0] i_dbg_entry_adr , // JTAG debug entry address
input i_dbg_entry_set ,        // JTAG debug entry address enable (one clock)

input [31:0] i_dpc_r ,         // JTAG debug PC
input [31:0] i_dbg_instr ,     // JTAG debug instruction
input dret ,                   // return from JTAG debug mode

input [31:0] i_mepc ,          // PC saved when the interrupt was taken
input [31:0] i_vect_pc ,       // interrupt entry PC (mtvec register content)
output o_irq_src ,             // interrupt request to the CSR module
output o_exp_src ,             // exception request to the CSR module
input mret ,                   // return from interrupt
input dbg_start ,              // enter JTAG debug mode

//===============================================================================
input itcm_wea ,               // ITCM update write enable
input [31:0] itcm_addra ,      // ITCM update address
input [31:0] itcm_dina ,       // ITCM update data

//===============================================================================
// RISC-V internal bus (RIB), slave side
input [31:0] i_rib_saddr ,     // RIB address
input [31:0] i_rib_sdin ,      // RIB write data
output [31:0] o_rib_sdout ,    // RIB read data
input i_rib_svalid ,           // RIB command valid
output [1:0] o_rib_sready ,    // RIB ready: data read from the peripheral is valid
input [3:0] i_rib_swe ,        // RIB write enable
input i_rib_srd ,              // RIB read
input i_rib_sop ,              // RIB command operation enable
//===============================================================================
// RISC-V internal bus (RIB), master side
input i_rib_mvalid ,           // RIB command valid
input [1:0] i_rib_mready ,     // RIB ready: externally read data is valid
//===============================================================================

input i_cpu_reset ,            // CPU reset: returns the fetch PC to CPU_BASEADDR
input rst_n                    // active-low reset

Long-cycle instruction operation:

During a long-cycle operation, NOPs (empty instructions) must be inserted to keep the pipeline flowing.

// cpu master bus operation

// long instruction: dec_instr should be set to `INST_NOP during a long instruction.
reg rib_delay_op = 0;
always @ ( posedge sys_clk or negedge rst_n)
if(!rst_n) rib_delay_op <= 0;
else if ( i_rib_mvalid ^ i_rib_mready[0] ) rib_delay_op <= 1;
else rib_delay_op <= 0;

wire bus_hsk_ok = i_rib_mvalid & i_rib_mready[0];

// long period load & store operation; normal operation long_ir_flag is 0
wire long_ir_flag = ( i_rib_mvalid & (~i_rib_mready[0]) ) | rib_delay_op;

// long period instruction pos edge; normal operation long_ir_cs is 0
wire long_ir_cs = ( i_rib_mvalid & (~i_rib_mready[0]) ) & i_rib_sop;

Long-cycle instructions include data reads from the Princeton-architecture ITCM (instructions and data share one memory) and JTAG-related instructions.

 

Read-data latency under the Princeton architecture:

reg [1:0] ls_r = 0;
always @( posedge sys_clk or negedge rst_n)
if(!rst_n) ls_r <= 0;
else
    ls_r <= {ls_r[0], o_CPU_load_cs};

assign o_CPU_load_vld = ls_r[1];
assign o_rib_sready = {1'b0, ls_r[1]};

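The dual-port RAM has a one-clock read latency, and the load address must first be registered into instr_fch_PC, so ls_r delays o_CPU_load_cs by two clocks. o_CPU_load_vld (and o_rib_sready[0] on the bus) is therefore asserted in the same cycle that the read data is available.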
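The alignment between the two-clock data path and the delayed valid flag can be checked with a reduced model. This is a hypothetical sketch with illustrative names: addr_q stands in for instr_fch_PC and vld_sr for ls_r. Unlike the real design, addr_q registers the requested address every cycle, not only on a jump.

```verilog
// Hypothetical model of the two-clock load path (names are illustrative):
//   clock 1: the load address is registered (stands in for instr_fch_PC)
//   clock 2: the synchronous RAM returns the word at that address
// A 2-bit shift register (like ls_r) delays the request so that load_vld
// is high exactly when load_data holds the requested word.
module load_path_model (
    input             clk,
    input             rst_n,
    input             load_cs,      // one-clock load request
    input      [7:0]  load_addr,    // word address (small RAM for the sketch)
    output     [31:0] load_data,
    output            load_vld
);
    reg [31:0] mem [0:255];
    reg [7:0]  addr_q;              // registered address
    reg [31:0] rdata;               // registered RAM output
    reg [1:0]  vld_sr;              // request delayed by two clocks

    integer i;
    // Recognisable contents: word i holds 0xA000_0000 + i
    initial for (i = 0; i < 256; i = i + 1) mem[i] = 32'hA000_0000 + i;

    always @(posedge clk) begin
        addr_q <= load_addr;        // clock 1: register the address
        rdata  <= mem[addr_q];      // clock 2: synchronous RAM read
    end

    always @(posedge clk or negedge rst_n)
        if (!rst_n) vld_sr <= 2'b00;
        else        vld_sr <= {vld_sr[0], load_cs};

    assign load_data = rdata;
    assign load_vld  = vld_sr[1];
endmodule
```

With a request for word 5, load_vld rises two clocks later and, in that cycle, load_data is 32'hA000_0005.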

 

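Handling of load instructions such as lb, lh, and lw: the word read from the ITCM is shifted right by 8 × i_rib_saddr[1:0] bits, so the addressed byte or halfword ends up in the low bits of o_rib_sdout.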

wire [ 4: 0 ] data_sft = {i_rib_saddr[ 1: 0 ], 3'b000}; // i_rib_saddr[1:0] * 8 (byte offset -> bit shift)
wire [ 31: 0 ] o_CPU_dout = i_CPU_load_data >> data_sft;

assign o_rib_sdout = o_CPU_dout;
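The shift only moves the addressed byte or halfword down to bit 0; lb and lh additionally need sign extension, and lbu/lhu zero extension. Where this project does that is not shown here, so the following is a hypothetical formatter (module name and ports are ours), assuming it receives the already-shifted word and the funct3 field of the RISC-V LOAD encoding.

```verilog
// Hypothetical load formatter. 'word' is the 32-bit word already shifted
// right by {addr[1:0], 3'b000}, as o_CPU_dout is in the code above.
// funct3 follows the RISC-V LOAD encoding:
//   000 lb, 001 lh, 010 lw, 100 lbu, 101 lhu
module load_format (
    input      [31:0] word,
    input      [2:0]  funct3,
    output reg [31:0] rd_data
);
    always @(*) begin
        case (funct3)
            3'b000:  rd_data = {{24{word[7]}},  word[7:0]};   // lb:  sign-extend byte
            3'b001:  rd_data = {{16{word[15]}}, word[15:0]};  // lh:  sign-extend halfword
            3'b100:  rd_data = {24'b0, word[7:0]};            // lbu: zero-extend byte
            3'b101:  rd_data = {16'b0, word[15:0]};           // lhu: zero-extend halfword
            default: rd_data = word;                          // lw (010)
        endcase
    end
endmodule
```

For a memory word 32'h8070_FF12 read with address offset 3, the shift leaves 32'h0000_0080; lb then returns 32'hFFFF_FF80 and lbu returns 32'h0000_0080.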

 

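Saving the resume PC under the Princeton architecture:

A data load borrows the ITCM read port, so when the bus operation starts (i_rib_sop) the current fetch PC is saved in next_addr; fetching resumes from that address once the load completes.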

reg [31:0] next_addr = 0;
always @( posedge sys_clk )
if(!rst_n | i_cpu_reset) next_addr <= CPU_BASEADDR;
else if(i_rib_sop) next_addr <= instr_fch_PC;

 

Interrupt source handling:

assign o_exp_src = 0;
wire nop_flag;
// don't generate an interrupt during a CPU load or a flushed (NOP) instruction
assign o_irq_src = (long_ir_flag | nop_flag) ? 1'b0 : (i_ext_irq | i_sft_irq | i_tmr_irq);

wire irq_exp_flag = (o_irq_src & o_glb_irq) | o_exp_src ; // edge o_irq_src high, then o_glb_irq low

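Interrupts are suppressed while a long-cycle instruction is in progress or while the current instruction is a NOP (for example, a slot flushed after a jump). Otherwise any pending external, software, or timer interrupt is reported, and it is taken when the global enable mstatus[3] (MIE) is set.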

 

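Selecting the jump target for each case (the order of the if/else chain sets the priority, from debug entry down to the execute-stage next PC):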

reg [31:0] jump_addr = 0;
always @( * )
begin
    if(i_dbg_entry_set)
        jump_addr = i_dbg_entry_adr;
    else if(dret)
        jump_addr = i_dpc_r;
    else if (long_ir_cs)                // long address & (addr == 0x8000_xxxx)
        jump_addr = {i_rib_saddr[31:2], 2'b00};
    else if( irq_exp_flag )
        jump_addr = i_vect_pc;   // mtvec register content
    else if( mret )
        jump_addr = i_mepc;      // the PC before entering the interrupt
    else if( i_ir_hold_flag )
        jump_addr = o_curr_pc;   // long-term instruction: PC holds at the current position
    else jump_addr = i_exe_next_PC;
end

 

Updating the fetch PC:

wire w_jump_flag = i_dbg_entry_set | dret | irq_exp_flag | i_pc_chg_flag | mret | long_ir_cs | i_ir_hold_flag;

always @( posedge sys_clk )
if (( !rst_n ) | i_cpu_reset ) instr_fch_PC <= CPU_BASEADDR;
else
begin
     if (w_jump_flag)
         instr_fch_PC <= jump_addr;
     else if( ~bus_hsk_ok & rib_delay_op ) // long period load & store
         instr_fch_PC <= next_addr ;
     else
         instr_fch_PC <= instr_fch_PC + 4;
end

Updating o_curr_pc, the PC of the instruction handed to the next stage:

always @( posedge sys_clk or negedge rst_n )
if ( !rst_n ) o_curr_pc <= CPU_BASEADDR;
else o_curr_pc <= i_ir_hold_flag ? o_curr_pc : (bus_hsk_ok & rib_delay_op) ? next_addr : instr_fch_PC;

 

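NOP insertion signal:

When a jump occurs (anything other than the normal PC + 4), the word that the dual-port RAM returns in the next cycle still belongs to the old sequential address, so a NOP is inserted in its place for one cycle to keep the pipeline consistent: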

reg waiting_r = 0;
always @ (posedge sys_clk or negedge rst_n)
if(!rst_n) waiting_r <= 0;
else waiting_r <= w_jump_flag;

assign o_curr_instr = (waiting_r) ? `INST_NOP : instr_data_sel;
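The reason for the NOP slot shows up clearly in a reduced fetch model. This is a hypothetical sketch, not the project's module: it is word-addressed, has no long-cycle or debug logic, and assumes INST_NOP is the standard RISC-V NOP, addi x0, x0, 0 (32'h0000_0013). It keeps only the sequential PC + 1 (word) step, a one-clock jump request, a synchronous instruction RAM, and a waiting_r-style register.

```verilog
`define INST_NOP 32'h0000_0013   // addi x0, x0, 0

// Hypothetical reduced fetch stage: sequential PC, one-clock jump request,
// synchronous instruction RAM, and waiting_r-style NOP insertion.
module fetch_nop_model (
    input             clk,
    input             rst_n,
    input             jump,          // one-clock jump request (like w_jump_flag)
    input      [7:0]  jump_pc,       // word address of the jump target
    output     [31:0] instr          // instruction presented to decode
);
    reg [31:0] rom [0:255];
    reg [7:0]  fch_pc;               // word address (like instr_fch_PC[9:2])
    reg [31:0] rdata;                // registered RAM output
    reg        waiting_r;

    integer i;
    // Recognisable contents: word i holds 0xC000_0000 + i
    initial for (i = 0; i < 256; i = i + 1) rom[i] = 32'hC000_0000 + i;

    always @(posedge clk or negedge rst_n)
        if (!rst_n) begin
            fch_pc    <= 8'd0;
            waiting_r <= 1'b0;
        end else begin
            fch_pc    <= jump ? jump_pc : fch_pc + 8'd1;
            waiting_r <= jump;       // remember that a jump just happened
        end

    always @(posedge clk) rdata <= rom[fch_pc];   // 1-cycle RAM latency

    // The word read in the cycle after a jump belongs to the old sequential
    // path, so it is replaced by a NOP.
    assign instr = waiting_r ? `INST_NOP : rdata;
endmodule
```

In simulation, if a jump to word 64 is requested while the word from address 1 is in decode, the output sequence is the word at 1, then a NOP (replacing the fall-through word at 2), then the words at 64 and 65.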

Pipelined operation is more complicated than the state-machine version, especially once the JTAG debug function is added. Use the code above as a reference and simulate the whole RISC-V project to understand how the pipeline handles each case.

Posted in FPGA, RISC-V, RISC-V Textbook, Textbook and Training Project
