
FII RISC-V3.01 FII-PRX100-D (ARTIX-7, XC7A100T) XILINX FPGA Board Coremark Migration Guide

1. Introduction to Coremark

 

Coremark has been EEMBC’s standard CPU benchmark since 2009. EEMBC (Embedded Microprocessor Benchmark Consortium) is a non-profit organization whose members include technology companies such as Huawei, Intel, ARM, and Analog Devices. EEMBC benchmarks are an important criterion for evaluating embedded processors and compilers.

Coremark mainly exercises the ALU (Arithmetic Logic Unit), memory references, the pipeline, and branch operations. It is designed so that the benchmark results cannot be computed ahead of time, which keeps the test fair and impartial. During the timed portion of the test, Coremark does not allow third-party library calls, so the result depends entirely on compiler optimization and CPU execution time. Because Coremark is primarily a test of CPU architecture, the final result is normalized to remove the effect of the hardware manufacturing process and clock speed: the score is divided by the system clock frequency and reported in Coremark/MHz. The main Coremark code is written in C and covers list operations, state machine processing, matrix operations, and CRC (Cyclic Redundancy Check) calculation.
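To give a feel for the kind of work the CRC part involves, here is a minimal sketch of a generic bitwise CRC-16 (reflected polynomial 0xA001, the CRC-16/ARC variant). It is an illustration only and not Coremark's own routine; the function name crc16_update is made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16 with the reflected polynomial 0xA001 (CRC-16/ARC).
 * Illustrative only: this is not Coremark's own CRC routine. */
uint16_t crc16_update(uint16_t crc, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];                      /* fold the next byte into the low bits */
        for (int bit = 0; bit < 8; bit++) {  /* one shift and one branch per bit */
            if (crc & 1u)
                crc = (uint16_t)((crc >> 1) ^ 0xA001u);
            else
                crc = (uint16_t)(crc >> 1);
        }
    }
    return crc;
}
```

Each input bit costs a shift, a test, and a conditional XOR, which is why this style of code stresses the ALU and branch handling rather than memory bandwidth.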

 

2. Coremark porting

 

The first step is to download the C source code folder from the EEMBC official website, or to search for the EEMBC repository on GitHub. Eight files in the source folder need to be copied to the project workspace (FreedomStudio is used as the platform here):

  • core_list_join.c
  • core_main.c
  • core_matrix.c
  • core_state.c
  • core_util.c
  • coremark.h
  • core_portme.c
  • core_portme.h

Three of the files in the list need to be changed: “core_portme.h”, “core_portme.c” and “coremark.h” (use the core_portme.h and core_portme.c from the “simple” folder). The other files can be added directly to the project without modification.

 

2.1 Porting core_portme.h

There are a total of 14 macros in the entire project. They are shown in Table 1 as follows:

| Macro | Description |
| --- | --- |
| HAS_FLOAT | Defined as 1 if the porting platform supports floating point. |
| HAS_TIME_H | Defines whether the porting platform has the time.h header file and the implementation of its functions. |
| USE_CLOCK | Defines whether the porting platform has the time.h header file and the implementation of its functions. |
| HAS_STDIO | If the porting platform has stdio.h, define it as 1. |
| HAS_PRINTF | Define it as 1 if the porting platform has stdio.h and implements the printf function. |
| COMPILER_VERSION | Put your compiler version here (e.g. GCC 7.2.0). |
| COMPILER_FLAGS | Put compiler flags here (e.g. O3). |
| MEM_LOCATION | Put the storage location of your code execution here (e.g. STACK). |
| CORTIMETYPE | Defines the return value type of the timing function. |
| SEED_METHOD | Defines methods to obtain seed values that cannot be computed at compile time. |
| MEM_METHOD | Defines a method for acquiring memory. |
| MULTITHREAD | Define parallel computing. |
| MAIN_HAS_NOARGC | This flag is required if the porting platform does not support passing arguments to main (this flag only matters if the value of MULTITHREAD is defined as greater than 1). |
| MAIN_HAS_NORETURN | This flag is required if the porting platform does not support returning values from main. |

Table 1 Macro definitions and their descriptions

 

The above macros should be configured to match the target system and platform. For example, if floating-point arithmetic is not supported, HAS_FLOAT should be set to 0. HAS_TIME_H and USE_CLOCK indicate whether the platform provides the time.h header file and an implementation of its functions. The time.h functions are used by the timer code in core_portme.c to measure the time spent in the iteration loop of the main function. If the platform measures time with a different function, the timer code should be overridden accordingly. HAS_PRINTF indicates whether the porting platform uses the standard I/O library for printed output, and can also be redefined as needed. MEM_LOCATION describes the memory location where the code is executed.

Figure 1 Configuring HAS_FLOAT
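As a rough illustration of the configuration described above, the following sketch shows how some of the Table 1 macros might be set in core_portme.h for a bare-metal target with no floating-point unit and no hosted time.h. Every value here is an assumption chosen for the example, not the setting actually used on the FII RISC-V3.01.

```c
/* Sketch of core_portme.h settings for a hypothetical bare-metal target.
 * Every value below is an assumption made for illustration. */
#define HAS_FLOAT   0   /* assumed: no floating-point support */
#define HAS_TIME_H  0   /* assumed: no hosted time.h, timing uses a hardware counter */
#define USE_CLOCK   0
#define HAS_STDIO   1   /* assumed: a stdio.h is available */
#define HAS_PRINTF  1   /* assumed: printf is retargeted, for example to a UART */

#define COMPILER_VERSION "GCC 7.2.0"  /* example string taken from Table 1 */
#define COMPILER_FLAGS   "-O3"        /* example optimization level */
#define MEM_LOCATION     "STACK"      /* example location taken from Table 1 */

#define MULTITHREAD       1  /* single execution context */
#define MAIN_HAS_NOARGC   1  /* assumed: main receives no arguments */
#define MAIN_HAS_NORETURN 0  /* assumed: main may return a value */
```

The string macros are only reported in the benchmark output, so they should describe the toolchain and memory layout that were really used.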

 

In addition, the execution mode can be configured. As shown in Figure 2, one of PROFILE_RUN, PERFORMANCE_RUN and VALIDATION_RUN can be selected. TOTAL_DATA_SIZE can be modified in coremark.h.

Figure 2 Configuring the execution mode

 

Some parameters can also be defined here, such as ITERATIONS, as shown in Figure 3.

Figure 3 Parameter configuration
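The mode and parameter selection can be sketched as follows. This is a minimal example that assumes the run mode and ITERATIONS are set with plain preprocessor definitions (in the header or with compiler -D options); the ITERATIONS value is hypothetical, and the consistency check is an addition for this example rather than part of the Coremark sources.

```c
/* Select exactly one run mode (assumed to be set here or with -D options). */
#define PERFORMANCE_RUN 1
/* #define VALIDATION_RUN 1 */
/* #define PROFILE_RUN 1 */

/* Number of benchmark iterations; 2000 is a hypothetical value. */
#ifndef ITERATIONS
#define ITERATIONS 2000
#endif

/* Compile-time guard added for this sketch: reject zero or several modes. */
#if (defined(PERFORMANCE_RUN) + defined(VALIDATION_RUN) + defined(PROFILE_RUN)) != 1
#error "Define exactly one of PERFORMANCE_RUN, VALIDATION_RUN, PROFILE_RUN"
#endif
```

Wrapping ITERATIONS in #ifndef lets a build script override it without editing the header.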

 

Finally, note that the corresponding libraries should be added or modified to match the configuration.

 

2.2 Porting core_portme.c

“EE_TICKS_PER_SEC” is the number of timer ticks per second, relative to the system clock. If the timer is changed, this value should be changed to match. The most important functions in core_portme.c are the timer functions “start_time”, “stop_time” and “get_time”. As mentioned above, these functions can be modified as needed; see Figure 4 for details. The first function called by main in core_main.c is “portable_init”. Initialization and debug printing can be implemented there, as shown in Figure 5. Note that any libraries this requires should also be added.

Figure 4 Time-dependent functions
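The timer functions might look roughly like the sketch below when the platform has no time.h. It assumes a free-running 32-bit counter that ticks at the 50 MHz system clock and integer seconds (as with HAS_FLOAT set to 0). The names platform_read_ticks and fake_timer_ticks are hypothetical stand-ins for a real timer register or cycle counter, and the typedefs only mirror the roles of Coremark's tick and seconds types.

```c
#include <stdint.h>

/* Assumption: the timer counts at the 50 MHz system clock. */
#define EE_TICKS_PER_SEC 50000000u

typedef uint32_t CORE_TICKS;  /* tick count type used by this sketch */
typedef uint32_t secs_ret;    /* whole seconds, assuming no floating point */

/* Hypothetical platform hook. On real hardware this would read a timer
 * register or cycle counter; here a variable stands in for it. */
static volatile uint32_t fake_timer_ticks;
static uint32_t platform_read_ticks(void) { return fake_timer_ticks; }

static CORE_TICKS start_time_val, stop_time_val;

/* Record the counter value when the timed loop begins. */
void start_time(void) { start_time_val = platform_read_ticks(); }

/* Record the counter value when the timed loop ends. */
void stop_time(void) { stop_time_val = platform_read_ticks(); }

/* Elapsed ticks; unsigned subtraction tolerates a single counter wrap. */
CORE_TICKS get_time(void) { return (CORE_TICKS)(stop_time_val - start_time_val); }

/* Convert ticks to whole seconds. */
secs_ret time_in_secs(CORE_TICKS ticks) { return (secs_ret)(ticks / EE_TICKS_PER_SEC); }
```

A 32-bit counter at 50 MHz wraps after roughly 85 seconds, so a longer run would need a wider counter or a slower tick source.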

 

Figure 5 Initialization function

 

2.3 Porting coremark.h

Memory-related definitions such as “malloc” and “free” can be modified here, as shown in Figure 6.

Figure 6 Functions related to memory
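On a platform without a heap, one possible replacement for “malloc” and “free” is a simple bump allocator over a static buffer, sketched below. The pool size is hypothetical, the function names follow the portable_ prefix used for portable_init above, and freeing is deliberately a no-op; treat this as an assumption-laden illustration rather than the port's actual code.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 2048u  /* hypothetical pool size in bytes */

/* The union forces 8-byte alignment of the byte pool. */
static union {
    uint8_t  bytes[POOL_SIZE];
    uint64_t force_align;
} pool;
static size_t pool_used;

/* Hand out the next free chunk, rounded up to a multiple of 8 bytes.
 * Returns NULL when the pool cannot satisfy the request. */
void *portable_malloc(size_t size)
{
    size_t rounded = (size + 7u) & ~(size_t)7u;
    if (rounded < size || rounded > POOL_SIZE - pool_used)
        return NULL;
    void *p = &pool.bytes[pool_used];
    pool_used += rounded;
    return p;
}

/* No-op in this sketch: memory is never reclaimed, which is acceptable
 * only because the benchmark allocates once and then exits. */
void portable_free(void *p)
{
    (void)p;
}
```

This trades flexibility for predictability: there is no fragmentation and no dependency on a C library heap, but the pool must be sized for the whole run.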

 

2.4 Others

The entire program must run for at least 10 seconds, so the number of iterations may need to be adjusted. In theory, the more iterations are run, the more accurate the result. The source directory also includes README.md, which contains further information. For example, as shown in Figure 7, README.md lists the rules that a Coremark result must follow to be valid.

 

Figure 7 README.md
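The 10-second rule can be turned into a quick check when choosing the iteration count. The helper names below are hypothetical, and the sketch assumes the iteration rate has already been measured in a short trial run.

```c
#include <stdint.h>

#define MIN_RUN_SECS 10u  /* minimum run time stated above */

/* Smallest iteration count that keeps the run at or above MIN_RUN_SECS,
 * given a measured rate in iterations per second. */
uint32_t min_iterations(uint32_t iterations_per_sec)
{
    return iterations_per_sec * MIN_RUN_SECS;
}

/* Returns 1 if the measured tick count covers at least MIN_RUN_SECS. */
int run_is_long_enough(uint64_t total_ticks, uint32_t ticks_per_sec)
{
    return total_ticks >= (uint64_t)ticks_per_sec * MIN_RUN_SECS;
}
```

For example, a CPU that completes 169 iterations per second would need at least 1690 iterations, and in practice somewhat more to leave a margin.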

 

 

3. Evaluation results

 

The system clock of FII RISC-V3.01 on the FII-PRX100-S (ARTIX-7, XC7A100T) XILINX FPGA board is 50 MHz. As shown in Figure 8, the Coremark result is 169, which gives a normalized score of 3.38 Coremark/MHz (169/50).

 

Figure 8 FII RISC-V3.01 Coremark
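The normalization behind the 169/50 figure can be written as a small helper. This sketch uses integer arithmetic scaled by 100 so that it also works on a target built without floating-point support; the function name is hypothetical.

```c
#include <stdint.h>

/* Coremark/MHz multiplied by 100, computed with integer math only.
 * iterations_per_sec is the raw Coremark result, clock_mhz the system clock. */
uint32_t coremark_per_mhz_x100(uint32_t iterations_per_sec, uint32_t clock_mhz)
{
    return (iterations_per_sec * 100u) / clock_mhz;
}
```

Calling coremark_per_mhz_x100(169, 50) returns 338, that is 3.38 Coremark/MHz.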

 

Figure 9 lists the Coremark scores of some other CPUs certified on the EEMBC website.

 

Figure 9 Coremark scores of some EEMBC-certified CPUs

 

Figure 10 CPU Coremark comparison chart

 

FII RISC-V3.01 is a single-core CPU with a mixed 2-stage and 3-stage pipeline. Figure 10 plots the Coremark scores of some EEMBC-certified single-core CPUs together with FII RISC-V3.01 as a line graph, with FII RISC-V3.01 highlighted in red. Among the 15 CPUs listed, the FII RISC-V3.01 score is above average. The three CPUs with significantly higher scores are STMicroelectronics’ STM32H72x/73x rev Z and STM32H7B3 rev Z, and Renesas Electronics’ RX66T (all marked in blue). According to the official STMicroelectronics manuals, the STM32H72x/73x rev Z and STM32H7B3 rev Z both use the Cortex-M7, which has a 6-stage superscalar pipeline. The Renesas RX66T uses the RXv3 core, which has an improved 5-stage pipeline and optimized branch prediction.

Because a superscalar pipeline can dispatch multiple instructions for execution in the same clock cycle, those CPUs are expected to perform better. FII RISC-V3.01 also implements neither branch prediction nor out-of-order execution, which may be another reason its Coremark score is lower than that of the RX66T.

The Texas Instruments Stellaris Cortex-M3 (marked in blue) implements branch prediction and has a 3-stage pipeline like FII RISC-V3.01, yet the Coremark score of FII RISC-V3.01 is more than double that of the Stellaris. Compared with the Cortex-M0-based Microchip ATSAML21J18B (marked in blue), which also has a 3-stage pipeline and no branch prediction, the score of FII RISC-V3.01 is still much higher. One possible reason is that the Cortex series is based on the ARM architecture, while FII RISC-V3.01 is based on the RISC-V architecture: the RISC-V instruction set is more streamlined than ARM’s, so a design built on it may achieve higher performance. That FII RISC-V3.01, without branch prediction, scores higher than the Stellaris Cortex-M3 with branch prediction also suggests that its internal logic is more efficient. All in all, the internal implementation of FII RISC-V3.01 gives it a Coremark score above the average of other CPUs with the same feature set (three-stage pipeline, no branch prediction, no superscalar pipeline).
