Performance Evaluation of FII RISC-V3.01 on FII-PRX100-D (ARTIX-7, XC7A100T) XILINX FPGA Board

William 2022-06-02 FPGA

FII mainly uses Coremark and Dhrystone as benchmarks to evaluate the performance of the RISC-V3.01 CPU (Central Processing Unit) on the FII-PRX100-D (ARTIX-7, XC7A100T) XILINX FPGA board.

Coremark has been EEMBC’s CPU benchmark standard since 2009. EEMBC (Embedded Microprocessor Benchmark Consortium) is a non-profit organization whose members include technology companies such as Huawei, Intel, ARM, and Analog Devices. EEMBC benchmarks are an important criterion for evaluating embedded processors and compilers.

Coremark mainly exercises the ALU (Arithmetic Logic Unit), memory access, the pipeline, and branch operations. It is designed so that its results cannot be computed ahead of time (for example, by the compiler at compile time), which keeps the benchmark fair and impartial. Within the timed portion of the test, Coremark makes no third-party library calls, so the result depends only on compiler optimization and on how fast the CPU executes the code. Because Coremark is meant to test the CPU architecture, its result is normalized to remove the effect of clock frequency, which depends largely on the manufacturing process: the score is divided by the system clock frequency in MHz, giving the unit Coremark/MHz. The main code of Coremark is written in C and includes list operations, state machine processing, matrix operations, and CRC (Cyclic Redundancy Check) calculation.
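CRC calculation is one of Coremark's workloads, and Coremark also folds the outputs of its other workloads into CRC checksums that are compared against known values to validate the run. As an illustration of this kind of computation, here is a minimal bitwise CRC-16 sketch in C using the reflected polynomial 0xA001 (the CRC-16/ARC variant). To our understanding this is the same polynomial Coremark's checksum routine uses, but the function names below are hypothetical and this is not Coremark's source code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: update a CRC-16 with one byte, processing bits
 * LSB-first with the reflected polynomial 0xA001 (CRC-16/ARC style). */
uint16_t crc16_update_byte(uint16_t crc, uint8_t data)
{
    for (int i = 0; i < 8; i++) {
        /* If the low bits of data and CRC differ, shift and apply the polynomial. */
        if ((crc ^ data) & 1u)
            crc = (uint16_t)((crc >> 1) ^ 0xA001u);
        else
            crc = (uint16_t)(crc >> 1);
        data >>= 1; /* move to the next data bit */
    }
    return crc;
}

/* Hypothetical helper: fold a whole buffer into the CRC, starting from seed. */
uint16_t crc16_buffer(const uint8_t *buf, size_t len, uint16_t seed)
{
    uint16_t crc = seed;
    for (size_t i = 0; i < len; i++)
        crc = crc16_update_byte(crc, buf[i]);
    return crc;
}
```

For example, running crc16_buffer over the 9 bytes of "123456789" with a seed of 0 yields 0xBB3D, the standard CRC-16/ARC check value.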

The FII RISC-V3.01 system clock is 50 MHz. As shown in Figure 1, its Coremark score is 169, which normalizes to 3.38 Coremark/MHz (169/50).
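The normalization is a single division. A minimal C sketch follows, assuming the score is expressed as Coremark iterations per second and the clock is given in MHz; the function names are hypothetical.

```c
/* Hypothetical helper: Coremark score as completed iterations per
 * second of the timed run. */
double coremark_score(unsigned long iterations, double seconds)
{
    return (double)iterations / seconds;
}

/* Hypothetical helper: normalize the score by the system clock
 * frequency to obtain Coremark/MHz. */
double coremark_per_mhz(double score, double clock_mhz)
{
    return score / clock_mhz;
}
```

With the figures above, coremark_per_mhz(169.0, 50.0) evaluates to approximately 3.38.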

Figure 1 FII RISC-V Coremark

Figure 2 shows the Coremark scores of other CPUs as published on the EEMBC website.

Figure 2 Coremark scores of selected CPUs from the EEMBC website

Figure 3 CPU Coremark comparison chart

FII RISC-V3.01 is a single-core CPU with a mixed 2-stage/3-stage pipeline. Figure 3 plots, as a line graph, the Coremark scores of several single-core CPUs certified by EEMBC together with that of FII RISC-V3.01, which is highlighted in red. Among the 15 CPUs listed, the Coremark of FII RISC-V3.01 is above average. The three CPUs with significantly higher scores are STMicroelectronics’ STM32H72x/73x rev Z and STM32H7B3 rev Z, and Renesas Electronics’ RX66T (all marked in blue). According to STMicroelectronics’ official manuals, the STM32H72x/73x rev Z and STM32H7B3 rev Z both use the Cortex-M7, which has a 6-stage superscalar pipeline. The Renesas RX66T uses the RXv3 core, which has an improved 5-stage pipeline. These deeper pipelines give those CPUs an architectural advantage, so it makes more sense to compare FII RISC-V3.01 with the Texas Instruments Stellaris Cortex-M3, which, like FII RISC-V3.01, uses a 3-stage pipeline. In this comparison, the Coremark of FII RISC-V3.01 is more than double that of the Stellaris Cortex-M3 (marked in blue). Compared with another 3-stage pipeline processor, the Microchip ATSAML21J18B (marked in blue), FII RISC-V3.01 still scores much higher. All in all, among CPUs with the same number of cores and pipeline stages, FII RISC-V3.01 performs well and stands out.

Before Coremark came along, Dhrystone was the generally accepted benchmark for evaluating CPU performance. It was created by Reinhold Weicker in 1984. Together with Livermore, Whetstone and Linpack, it is one of the “classic benchmarks” that were popular in the 1970s and 1980s. Each of the four has a different focus: Dhrystone measures integer performance, Livermore measures numerical computation, and Whetstone and Linpack measure floating-point performance. Dhrystone has two main versions, 1.1 and 2.1. Dhrystone 2.1 revised version 1.1 to prevent compilers from optimizing away parts of the code, which would otherwise distort the result.

The main code of Dhrystone is written in C. The raw result is the number of Dhrystone iterations the CPU completes per second (Dhrystones/s). This is normalized by dividing by 1757, the score of the DEC VAX-11/780 minicomputer (introduced in 1977), which was regarded as a 1 MIPS (Millions of Instructions Per Second) machine and executed 1757 Dhrystone iterations per second. The normalized result, Dhrystone MIPS (DMIPS), is therefore a value relative to the VAX-11/780.

The limitation of Dhrystone is that it calls C library functions, notably the string functions strcpy and strcmp, and compilers and toolchains often implement these with hand-optimized assembly. In practice, Dhrystone therefore also measures how well the toolchain's C library is optimized for a particular CPU. As shown in Figure 4, FII RISC-V3.01 scores 98360 Dhrystones/s on Dhrystone 2.1, which is 55.98 DMIPS (98360/1757), or 1.12 DMIPS/MHz (55.98/50 MHz).
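The DMIPS arithmetic can be written out the same way. The sketch below assumes the raw result is given in Dhrystones per second and the clock in MHz; the macro and function names are hypothetical.

```c
/* Dhrystones per second of the DEC VAX-11/780, the 1 DMIPS reference. */
#define VAX_DHRYSTONES_PER_SEC 1757.0

/* Hypothetical helper: convert a raw Dhrystones/s result to DMIPS. */
double dmips(double dhrystones_per_sec)
{
    return dhrystones_per_sec / VAX_DHRYSTONES_PER_SEC;
}

/* Hypothetical helper: normalize DMIPS by the clock to get DMIPS/MHz. */
double dmips_per_mhz(double dhrystones_per_sec, double clock_mhz)
{
    return dmips(dhrystones_per_sec) / clock_mhz;
}
```

For the Figure 4 result, dmips(98360.0) is about 55.98 and dmips_per_mhz(98360.0, 50.0) is about 1.12.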

Figure 4 Dhrystone test results

Figure 5 CPU Dhrystone comparison chart

Figure 5 shows Dhrystone results for several CPUs from the official ARM website, with FII RISC-V3.01 highlighted in red. Among the 10 CPUs, the Cortex-M7 (marked in blue) has the highest Dhrystone score. According to the official ARM website, the Cortex-M7, Cortex-R4 (marked in blue) and Cortex-A5 (marked in blue) all have pipelines longer than 3 stages, and the Cortex-R4 and Cortex-A5 have multiple cores. Again, to make the comparison fairer, FII RISC-V3.01 is compared with the Cortex-M3 (marked in blue), which, like FII RISC-V3.01, uses a 3-stage pipeline. Figure 5 shows that the Dhrystone score of the Cortex-M3 is slightly higher than that of FII RISC-V3.01. This may be because FII RISC-V3.01 was tested only with a generic compiler. Because Dhrystone relies heavily on standard library functions, and compiler vendors often hand-optimize these functions in assembly language, the Dhrystone result depends heavily on the optimization level of the compiler rather than purely on the hardware's actual processing capability. A specially tuned compiler can greatly improve Dhrystone results; experiments have shown improvements of up to a factor of 4. All in all, the Dhrystone result of FII RISC-V3.01 is only average compared with CPUs with the same number of cores and pipeline stages. In addition, Dhrystone results are widely cited in the industry but are rarely scrutinized, and because no authoritative organization certifies the results, Dhrystone is sometimes misused. Despite these limitations, Dhrystone is still a good self-assessment tool and a useful reference.
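To make the library dependence concrete, the sketch below mimics the kind of string work Dhrystone performs in each iteration: copying a fixed 30-character string with strcpy and comparing two such strings with strcmp. It is a simplified, hypothetical illustration, not Dhrystone's source; the strings and the function name are made up. Because both strings are constants, a compiler may inline these calls or evaluate parts of them at compile time; otherwise the time spent here is decided largely by the C library's implementation. Either way, the toolchain rather than the CPU core has a large say in the result.

```c
#include <string.h>

/* Hypothetical per-iteration string work in the spirit of Dhrystone:
 * copy a fixed 30-character string into dst, then compare it with
 * another 30-character string that differs only in the last character.
 * dst must hold at least 31 bytes (30 characters plus the terminator). */
int string_work(char dst[31])
{
    strcpy(dst, "THIS IS A 30-CHARACTER STRING!");
    /* '!' sorts before '?', so the result is negative. */
    return strcmp(dst, "THIS IS A 30-CHARACTER STRING?");
}
```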

In addition to Coremark and Dhrystone, there are many open-source benchmarks, such as the AIM Multiuser Benchmark, Embench and HINT, as well as paid industry-standard benchmarks such as SPEC (Standard Performance Evaluation Corporation) and BAPCo (Business Applications Performance Corporation), whose results can be audited and verified.

