In computing, benchmarks are used to measure CPU performance quantitatively. A benchmark is a specially designed program that runs many iterations of a specific set of operations. It normally assesses one characteristic of the CPU, for instance its floating-point performance. To do so, a benchmark is written to mimic a corresponding workload on a component or system [1]. Any discussion of CPU testing has to mention Dhrystone. Along with Livermore, Whetstone, and Linpack, it was one of the most popular benchmarks of the 1980s. Dhrystone tests integer operations on UNIX systems, Whetstone tests floating-point operations on minicomputers, Linpack tests floating-point operations on workstations, and Livermore tests numeric operations on supercomputers [2]. Dhrystone results generally also reflect the performance of the C compiler and libraries. Here, Dhrystone is used to benchmark the FII RISC-V3.01 on the FII-PRX100-D (ARTIX-7, XC7A100T) XILINX FPGA Board (https://fpgamarketing.com/FII-PRX100-S-ARTIX-100T-XC7A100T-Xilinx-RISC-V-FPGA-Board-FII-PRX100-S-1.htm), and a simple guide to porting Dhrystone is given.
The original Dhrystone source code is quite difficult to find on official websites. Roy Longbottom maintains a website that lists many benchmarks he has collected, and a source file is available there (http://www.roylongbottom.org.uk/dhrystone%20results.htm). Another source is the CDOT wiki (https://wiki.cdot.senecacollege.ca/wiki/Dhrystone_howto#What_Dhrystone_really_does). Dhrystone code adapted to different systems is also available on GitHub (https://github.com/sifive/benchmark-dhrystone). Note that Dhrystone 2.1 is used here.
A Dhrystone benchmark directory normally contains three main files: “dhry.h”, “dhry_1.c”, and “dhry_2.c”. Most of the code that needs modifying is in “dhry_1.c”, mainly the print and time functions. Source files found online may already have these two kinds of functions modified, because the print and time functions differ between operating systems [5].
The first step is to comment out all the time functions, along with their definitions and declarations. Note that a parameter called “HZ” is defined alongside the time functions, so every occurrence of HZ must be modified as well.
Figure 1 Comment out all the time related functions
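One way to approach the first step, instead of deleting code line by line, is to guard the host timing code with a preprocessor switch and redefine HZ as the tick rate of the port's own timer. The following is a minimal sketch, not the code shown in Figure 1: the switch DHRY_BARE_METAL and the helper ticks_to_ms are hypothetical names, the 50 MHz value assumes the timer runs at the system clock quoted later in this article, and Too_Small_Time is assumed to match the macro name in your copy of the sources.

```c
/* Sketch of a timing configuration for a bare-metal Dhrystone port.
   DHRY_BARE_METAL and the 50 MHz tick rate are assumptions for illustration. */
#define DHRY_BARE_METAL 1

#if DHRY_BARE_METAL
/* Host timing calls are left out entirely on the target.
   HZ is redefined as the tick rate of the port's own timer. */
#define HZ 50000000UL
#else
/* Hosted build: fall back to the C library clock. */
#include <time.h>
#define HZ ((unsigned long)CLOCKS_PER_SEC)
#endif

/* Shortest acceptable measurement, in timer ticks (2 seconds). */
#define Too_Small_Time (2UL * HZ)

/* Convert a tick count from the port's timer into whole milliseconds. */
unsigned long ticks_to_ms(unsigned long ticks)
{
    return ticks / (HZ / 1000UL);
}
```

Keeping HZ as a single definition means every later use of it (thresholds, conversions) follows the port's timer automatically.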
The second step is to comment out the unnecessary print functions, which were only used to check whether the evaluation result is valid.
Figure 2 Comment out the print function
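As an alternative to commenting out each print call by hand, the validation prints can be routed through a macro that compiles to nothing on the target. This is only a sketch under assumptions: DHRY_VERBOSE, DHRY_CHECK_PRINT, and report_checks are hypothetical names, and the expected value 5 for Int_Glob is taken from the usual Dhrystone 2.1 validation output, so check it against your own sources.

```c
#include <stdio.h>

/* Hypothetical switch: set to 1 to restore the validation printout. */
#define DHRY_VERBOSE 0

#if DHRY_VERBOSE
#define DHRY_CHECK_PRINT(...) printf(__VA_ARGS__)
#else
#define DHRY_CHECK_PRINT(...) ((void)0)   /* compiles to nothing */
#endif

/* Check one validation value without necessarily printing it.
   Returns 1 if the value matches the expected result, 0 otherwise. */
int report_checks(int int_glob)
{
    DHRY_CHECK_PRINT("Int_Glob:            %d\n", int_glob);
    DHRY_CHECK_PRINT("        should be:   %d\n", 5);
    return int_glob == 5;
}
```

The check itself is kept as a return value, so the port can still confirm that the benchmark produced valid results even when nothing is printed.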
The third step is to set the iteration count. The original code asks the user to enter the number of iterations manually. Comment out that input function and hard-code the desired iteration count instead.
Figure 3 Comment out the input function for iteration times
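The third step might look roughly like the sketch below, which disables the interactive prompt with `#if 0` and assigns a fixed count. DHRY_ITERATIONS and select_number_of_runs are hypothetical names, the value 500000 is only an example, and the variable name Number_Of_Runs and the prompt text are assumed to match your copy of dhry_1.c.

```c
#include <stdio.h>

/* Hypothetical fixed run count; tune it for your clock and CPU. */
#define DHRY_ITERATIONS 500000

int select_number_of_runs(void)
{
    int Number_Of_Runs;

#if 0   /* interactive prompt, disabled on the target */
    printf("Please give the number of runs through the benchmark: ");
    {
        int n;
        scanf("%d", &n);
        Number_Of_Runs = n;
    }
#else
    Number_Of_Runs = DHRY_ITERATIONS;   /* hard-coded instead of typed in */
#endif

    return Number_Of_Runs;
}
```

At the rate of 98360 Dhrystones per second reported later in this article, 500000 runs would take roughly 5 seconds.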
The fourth step is to add your own time and print functions. Remember to include the corresponding header files as well. Start and stop the timer at the right places, immediately before and after the test loop. Also note that the system clock differs between CPUs. The print function outputs the Dhrystone results and any debugging information you choose. Avoid calling the print function inside the test loop (between the timer start and stop), because printing takes time and makes the measured Dhrystone run time inaccurate.
Figure 4 Add your own time function and put in the corresponding place
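The timer placement described in the fourth step can be sketched as follows. This is not the code from Figure 4. It assumes the core implements the standard RISC-V cycle counter (read with rdcycle and rdcycleh), which may not hold for every implementation, and it falls back to clock() when built on a host. The names read_ticks, test_loop, timed_run, and runs_done are hypothetical, and test_loop is only a placeholder for the real Dhrystone loop.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Read a free-running tick counter. On RISC-V this assumes the core
   implements the standard cycle CSR; elsewhere it falls back to clock(). */
uint64_t read_ticks(void)
{
#if defined(__riscv) && (__riscv_xlen == 32)
    uint32_t hi, lo, hi2;
    do {    /* re-read if the low word wrapped between the two high reads */
        __asm__ volatile ("rdcycleh %0" : "=r"(hi));
        __asm__ volatile ("rdcycle %0"  : "=r"(lo));
        __asm__ volatile ("rdcycleh %0" : "=r"(hi2));
    } while (hi != hi2);
    return ((uint64_t)hi << 32) | lo;
#elif defined(__riscv)
    uint64_t cycles;
    __asm__ volatile ("rdcycle %0" : "=r"(cycles));
    return cycles;
#else
    return (uint64_t)clock();
#endif
}

volatile long runs_done;    /* stand-in for the benchmark's work */

/* Placeholder for the Dhrystone main loop; it must not print. */
void test_loop(long runs)
{
    long i;
    for (i = 0; i < runs; ++i)
        runs_done = i + 1;
}

uint64_t timed_run(long runs)
{
    uint64_t begin, end;

    begin = read_ticks();       /* start: immediately before the loop */
    test_loop(runs);
    end = read_ticks();         /* stop: immediately after the loop */

    /* Printing happens only after the timer has stopped. */
    printf("runs: %ld, ticks: %llu\n", runs, (unsigned long long)(end - begin));
    return end - begin;
}
```

Reading the 64-bit counter in two halves matters on a 32-bit core: at 50 MHz the low 32 bits alone wrap after roughly 86 seconds.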
The fifth step, once everything is set up and just before running the benchmark, is to check that the total run time of the iterations is at least 2 seconds. In general, the more iterations the benchmark runs, the more accurate the result. An example Dhrystone printout from the FII-PRX100-S (ARTIX-7, XC7A100T) XILINX FPGA Board with FII-RISC-V 3.01 is shown in Figure 5.
Figure 5 Sample Dhrystone result
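The 2-second rule from the fifth step can also be checked in code rather than by eye. The sketch below uses hypothetical helper names (run_time_is_sufficient, min_runs_for) and assumes the elapsed time is available as a tick count together with the timer's tick rate.

```c
#include <stdint.h>

/* Minimum measurement length recommended in the text. */
#define MIN_RUN_SECONDS 2u

/* Returns 1 if the measured interval is long enough to be meaningful. */
int run_time_is_sufficient(uint64_t elapsed_ticks, uint64_t ticks_per_second)
{
    return elapsed_ticks >= (uint64_t)MIN_RUN_SECONDS * ticks_per_second;
}

/* Smallest run count that reaches the minimum time at an observed rate
   (runs per second taken from a previous, shorter trial). */
uint64_t min_runs_for(uint64_t runs_per_second)
{
    return (uint64_t)MIN_RUN_SECONDS * runs_per_second;
}
```

For example, at a 50 MHz tick rate the threshold is 100000000 ticks, and at 98360 runs per second at least 196720 runs are needed.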
The result is 98360 Dhrystones per second. Historically, DMIPS (Dhrystone Millions of Instructions Per Second) is defined relative to the 1977 Digital VAX 11/780, whose Dhrystone test result is 1757 Dhrystones per second (that is, it executes 1757 Dhrystone iterations every second). Normalized to the VAX 11/780, the test result becomes 55.98 DMIPS (98360/1757). Another common way to express the result is to divide it by the clock frequency in MHz. In this case, the Dhrystone test result becomes 1.12 DMIPS/MHz (55.98/50, since the system clock is 50 MHz).
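The arithmetic above can be written as three small helpers. This is an illustrative sketch: the function and macro names are hypothetical, and the only constants used are the ones quoted in this article (1757 Dhrystones per second for the VAX 11/780 and a 50 MHz system clock).

```c
/* Reference score of the VAX 11/780, as quoted in the text. */
#define VAX_11_780_DHRYSTONES_PER_SECOND 1757.0

/* Raw score: completed runs divided by the measured time in seconds. */
double dhrystones_per_second(unsigned long runs, double elapsed_seconds)
{
    return (double)runs / elapsed_seconds;
}

/* Normalize the raw score to the VAX 11/780 to obtain DMIPS. */
double dmips(double dhrystones_per_sec)
{
    return dhrystones_per_sec / VAX_11_780_DHRYSTONES_PER_SECOND;
}

/* Divide by the clock frequency in MHz to compare across clock speeds. */
double dmips_per_mhz(double dmips_value, double clock_mhz)
{
    return dmips_value / clock_mhz;
}
```

With the figures from this article, dmips(98360.0) gives about 55.98, and dividing that by 50.0 gives about 1.12 DMIPS/MHz.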
Figure 6 CPU Dhrystone plot
Figure 6 shows some CPU Dhrystone results provided by the official ARM website. FII RISC-V3.01 is highlighted in red. The CORTEX-M7 (marked blue) has the highest Dhrystone score among the 10 CPUs. FII RISC-V3.01 is a single-core CPU with a mix of 2-stage and 3-stage pipelines. According to the ARM website, the CORTEX-M7, CORTEX-R4 (marked blue), and CORTEX-A5 (marked blue) all have pipelines with more than 3 stages, and the CORTEX-R4 and CORTEX-A5 have more than one core. Since more pipeline stages or more cores generally give better performance, a fairer comparison is with the CORTEX-M3 (marked blue), which is also single-core and has a 3-stage pipeline. In Figure 6, the Dhrystone score of the CORTEX-M3 is slightly higher than that of FII RISC-V3.01.
This is plausible, because a general-purpose compiler was used to test FII RISC-V3.01, whereas Dhrystone relies heavily on standard library functions, which compiler vendors usually optimize in assembly language. In other words, the Dhrystone result depends largely on the compiler's level of optimization rather than on the actual capability of the hardware [4]. A specialized compiler can improve the Dhrystone result significantly: experiments have shown that, with a specially designed compiler, Dhrystone scores can be as much as 4 times higher [6].
To conclude, FII RISC-V shows middling performance compared with a CPU that has the same number of cores and pipeline stages. Dhrystone results are frequently quoted in the industry, but they do not withstand scrutiny. Without an authoritative organization to certify test results, Dhrystone is sometimes misused. Although the Dhrystone benchmark has its limitations, it is still a useful reference for self-assessment.
References
- “Benchmark (computing) | Wikiwand”, Wikiwand, 2020. [Online]. Available: https://www.wikiwand.com/en/Benchmark_(computing). [Accessed: 28- Sep- 2020].
- “Roy Longbottom’s PC benchmark Collection – Classic Benchmarks”, roylongbottom.org.uk, 2020. [Online]. Available: http://www.roylongbottom.org.uk/classic.htm. [Accessed: 28- Sep- 2020].
- Staff and N. Dahad, “Benchmarking an ARM-based SoC using Dhrystone: A VFT perspective – Embedded.com”, Embedded.com, 2020. [Online]. Available: https://www.embedded.com/benchmarking-an-arm-based-soc-using-dhrystone-a-vft-perspective/. [Accessed: 28- Sep- 2020].
- “Dhrystone howto – CDOT Wiki”, cdot.senecacollege.ca, 2020. [Online]. Available: https://wiki.cdot.senecacollege.ca/wiki/Dhrystone_howto#What_Dhrystone_really_does. [Accessed: 28- Sep- 2020].
- Staff, “Benchmarking an ARM-based SoC using Dhrystone: A VFT perspective – Embedded.com”, Embedded.com, 2020. [Online]. Available: https://www.embedded.com/benchmarking-an-arm-based-soc-using-dhrystone-a-vft-perspective/. [Accessed: 28- Sep- 2020].
- T. Riemersma, “The Dhrystone benchmark, the LPC2106 and GNU GCC”, Compuphase.com, 2020. [Online]. Available: https://www.compuphase.com/dhrystone.htm. [Accessed: 28- Sep- 2020].