
FII RISC-V3.01 FII-PRX100-D (ARTIX-7, XC7A100T) XILINX FPGA Board Dhrystone Migration Guide

When evaluating hardware, benchmarks are often used to measure the performance of CPUs (Central Processing Units). They are specially designed programs that run a fixed set of operations over many iterations. A benchmark generally targets one specific computing capability of the CPU, such as floating-point performance, and is written to mimic the corresponding workload on the component or system. When it comes to CPU testing, the Dhrystone benchmark is hard to avoid. Along with Livermore, Whetstone and Linpack, it was one of the most popular “classic benchmarks” of the 1980s. Dhrystone tests integer operations on UNIX systems, Whetstone tests floating-point operations on minicomputers, Linpack tests floating-point operations on workstations, and Livermore tests numerical operations on supercomputers. Here, FII uses the Dhrystone benchmark to test RISC-V3.01 on the FII-PRX100-D (ARTIX-7, XC7A100T) XILINX FPGA board and briefly explains the Dhrystone migration steps.

The original Dhrystone code is hard to find from an official source these days. Roy Longbottom maintains a website listing many of the benchmarks he has collected. Source files are available at http://www.roylongbottom.org.uk/dhrystone%20results.htm or https://wiki.cdot.senecacollege.ca/wiki/Dhrystone_howto#What_Dhrystone_really_does. Dhrystone ports for various systems can also be found on GitHub. Note that Dhrystone version 2.1 is used here.

Typically, the Dhrystone benchmark folder contains 3 main files: “dhry.h”, “dhry_1.c” and “dhry_2.c”. Most of the changes are made in “dhry_1.c”, mainly to the print and time functions. Source files found on the web may already have had both sets of functions modified. The ported print and time functions vary between operating systems and CPUs.

The first step in porting is to comment out all time functions together with their definitions and initial values. Note that there is an “HZ” parameter defined alongside the time functions; every use of HZ in the code must be changed to match the actual timer tick rate of the target.

Figure 1 Comment out all time-related functions

 

The second step is to comment out the print functions that are not needed. These were originally intended for debugging, to check whether the evaluation results were valid. See Figure 2 for details.

Figure 2 Comment out the unnecessary print function

 

The third step is to fix the number of iterations. In the original code, the number of iterations (the number of runs through the test loop) is entered manually at run time. Comment out the input function and hard-code the number of iterations instead.

Figure 3 Comment out the input function for the number of iterations
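The change in this step can be sketched as follows. This is only an illustration, assuming the source prompts for the run count with printf and scanf as the commonly distributed Dhrystone 2.1 code does. The names DHRY_ITERATIONS and get_number_of_runs are hypothetical and do not appear in the original source, and the default value is a placeholder to be tuned for the target.

```c
#include <assert.h>

/* Hypothetical compile-time iteration count (not part of the original source).
 * Override at build time with -DDHRY_ITERATIONS=<n> if needed. */
#ifndef DHRY_ITERATIONS
#define DHRY_ITERATIONS 2000000L
#endif

/* Hypothetical helper that replaces the interactive prompt, which a
 * bare-metal target without a console input cannot service. */
static long get_number_of_runs(void)
{
    /* Original interactive code, commented out:
     *   printf("Please give the number of runs through the benchmark: ");
     *   scanf("%d", &n);
     */
    assert(DHRY_ITERATIONS > 0);   /* a zero or negative count is meaningless */
    return DHRY_ITERATIONS;
}
```

In the benchmark's main function, the run count variable would then be assigned from get_number_of_runs() instead of from the value read by scanf.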

 

The fourth step is to add the time and print functions for the target platform. Remember to include the corresponding header files. Place the timer start immediately before the test loop and the timer stop immediately after it. Note that the system clock differs between CPUs. The print function is used to output the Dhrystone results and any custom debugging information. Avoid calling a print function inside the test loop (between starting and stopping the timer), as this consumes time and makes the measured Dhrystone run time inaccurate.

Figure 4 Add the corresponding time function and place it in the corresponding position
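The timer placement described in this step might look like the sketch below. It is not the code used on the FII board: as an assumption, the standard C clock() function stands in for the board's hardware timer so that the example runs on a hosted system. The names read_ticks, TICKS_PER_SECOND, ticks_to_seconds, placeholder_run and time_benchmark are all hypothetical.

```c
#include <assert.h>
#include <time.h>

typedef unsigned long ticks_t;

/* Assumption: clock() stands in for the target's tick counter.
 * On the FPGA target this function would read a hardware timer instead. */
static ticks_t read_ticks(void)
{
    return (ticks_t)clock();
}

/* Ticks per second of the timer above; plays the role of Dhrystone's HZ. */
#define TICKS_PER_SECOND ((unsigned long)CLOCKS_PER_SEC)

/* Convert a tick interval to seconds for a timer running at hz ticks/s. */
static double ticks_to_seconds(ticks_t begin, ticks_t end, unsigned long hz)
{
    assert(hz > 0);
    return (double)(end - begin) / (double)hz;
}

/* Placeholder workload standing in for one pass of the Dhrystone loop body. */
static volatile unsigned long sink;
static void placeholder_run(void)
{
    sink++;
}

/* Time `runs` passes of one_run. The timer is read immediately before and
 * immediately after the loop, and nothing is printed in between. */
static double time_benchmark(void (*one_run)(void), long runs)
{
    ticks_t begin = read_ticks();
    for (long i = 0; i < runs; ++i)
        one_run();                 /* no print calls inside the timed region */
    ticks_t end = read_ticks();
    return ticks_to_seconds(begin, end, TICKS_PER_SECOND);
}
```

Results and debugging output would be printed only after time_benchmark returns, so that the cost of printing never enters the measured interval.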

 

The fifth step, once all setup is done, is to check that the test loop runs for at least 2 seconds before accepting the benchmark result. In general, the more iterations the CPU runs, the more accurate the Dhrystone measurement will be. The Dhrystone printout for FII RISC-V3.01 on the FII-PRX100-D (ARTIX-7, XC7A100T) XILINX FPGA board is shown in Figure 5.

Figure 5 Dhrystone printout results
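The 2-second check from the fifth step can be expressed as a small helper. The sketch below is illustrative only: MIN_RUN_SECONDS, run_time_is_valid and suggest_runs are hypothetical names, and the factor-of-two headroom used when suggesting a larger run count is an arbitrary assumption rather than anything prescribed by Dhrystone.

```c
#include <assert.h>

/* Minimum measured run time, in seconds, taken from the text above. */
#define MIN_RUN_SECONDS 2.0

/* Returns 1 if the measured time is long enough to be meaningful. */
static int run_time_is_valid(double seconds)
{
    return seconds >= MIN_RUN_SECONDS;
}

/* If the run was too short, suggest a larger iteration count by scaling
 * linearly to twice the minimum time (the 2x headroom is an assumption). */
static long suggest_runs(long runs, double seconds)
{
    assert(runs > 0 && seconds > 0.0);
    if (run_time_is_valid(seconds))
        return runs;
    double scale = (MIN_RUN_SECONDS * 2.0) / seconds;
    return (long)((double)runs * scale) + 1;
}
```

For example, a run of 1000 iterations that took 1 second would be rejected, and suggest_runs would propose 4001 iterations for the next attempt.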

 

The test result was 98360 Dhrystones per second. For historical reasons, DMIPS (Dhrystone MIPS) is defined relative to the 1977 Digital VAX 11/780, which scored 1757 Dhrystones per second (1757 Dhrystone iterations per second). Normalized against the VAX 11/780, the test result is 55.98 DMIPS (98360/1757). Another common way of expressing the result is to divide it by the clock frequency in megahertz. In this case, the Dhrystone test result is 1.12 DMIPS/MHz (55.98/50, because the system clock is 50 MHz).
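The arithmetic above can be reproduced with two short functions. This is a minimal sketch using the figures quoted in the text (1757 Dhrystones per second for the VAX 11/780, 98360 Dhrystones per second measured, 50 MHz system clock). The function and macro names are hypothetical.

```c
#include <assert.h>

/* Reference score of the VAX 11/780 in Dhrystones per second. */
#define VAX_11_780_DHRYSTONES_PER_SEC 1757.0

/* Normalize a raw Dhrystones-per-second score to DMIPS. */
static double dmips(double dhrystones_per_sec)
{
    return dhrystones_per_sec / VAX_11_780_DHRYSTONES_PER_SEC;
}

/* Normalize further by the clock frequency to get DMIPS/MHz. */
static double dmips_per_mhz(double dhrystones_per_sec, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return dmips(dhrystones_per_sec) / clock_mhz;
}
```

Calling dmips(98360.0) gives roughly 55.98, and dmips_per_mhz(98360.0, 50.0) gives roughly 1.12, matching the values reported for FII RISC-V3.01.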

Figure 6 CPU Dhrystone comparison chart

 

Figure 6 shows some CPU Dhrystone results from the official ARM website, with FII RISC-V3.01 highlighted in red. Among the 10 CPUs, CORTEX-M7 (marked in blue) has the highest Dhrystone score. FII RISC-V3.01 is a single-core CPU with a mixed 2-stage and 3-stage pipeline. According to the official ARM website, CORTEX-M7, CORTEX-R4 (marked in blue) and CORTEX-A5 (marked in blue) all have pipelines of more than 3 stages, and CORTEX-R4 and CORTEX-A5 have multiple cores. With more pipeline stages or multiple cores, CPU performance is generally expected to be higher.

To make the comparison fairer, FII RISC-V3.01 is compared with CORTEX-M3 (marked in blue), since the latter is also a 3-stage pipelined processor. Figure 6 shows that the Dhrystone score of CORTEX-M3 is slightly higher than that of FII RISC-V3.01. This may be because FII RISC-V3.01 was tested only with a generic compiler. Dhrystone relies heavily on standard library functions, and compiler vendors often optimize these functions in assembly language. In other words, the Dhrystone result depends heavily on the optimization level of the compiler rather than on the actual capabilities of the hardware. A specially designed compiler can greatly improve Dhrystone results; experiments have shown improvements of as much as a factor of 4.

All in all, the Dhrystone results for FII RISC-V are mediocre compared to CPUs with the same number of cores and pipeline stages. Additionally, Dhrystone results are often cited and used in the industry without being scrutinized, and the benchmark is sometimes misused because there is no authoritative organization to certify test results. Despite its limitations, the Dhrystone benchmark is still a good self-assessment tool to use as a reference.

