Scaling with Multiple Threads
We recently measured how well the RANGO algorithm scales with multiple threads. For this benchmark we used an Intel Core i7 860 at 2.8 GHz. This machine has 4 cores, and Hyper-Threading can be enabled.
In this benchmark, we set both stops.min_iterations and stops.max_iterations to 3,000,000, set stops.min_runtime to 0, and set stops.max_runtime to a ridiculously large number such as 99999999. This effectively makes RANGO run purely iteration-based, for exactly 3,000,000 iterations. See the API documentation for more information on the stops. The result of that matching can be seen in the picture above.
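To illustrate why this configuration yields a purely iteration-based run, here is a minimal sketch of one plausible stopping rule. The field names mirror the stops.* settings above, but the rule itself (both minimums must be met, then any maximum or convergence ends the run) and the `should_stop` helper and `converged` flag are hypothetical, not RANGO's documented logic.

```python
from dataclasses import dataclass


@dataclass
class Stops:
    # Names mirror the stops.* settings; the semantics below are assumed.
    min_iterations: int
    max_iterations: int
    min_runtime: float  # seconds
    max_runtime: float  # seconds


def should_stop(stops: Stops, iterations: int, elapsed: float,
                converged: bool = False) -> bool:
    """Hypothetical rule: never stop before both minimums are reached;
    afterwards stop at either maximum, or earlier if converged."""
    if iterations < stops.min_iterations or elapsed < stops.min_runtime:
        return False
    if iterations >= stops.max_iterations or elapsed >= stops.max_runtime:
        return True
    return converged


# Benchmark setup: min == max iterations, runtime limits effectively disabled.
bench = Stops(min_iterations=3_000_000, max_iterations=3_000_000,
              min_runtime=0, max_runtime=99999999)
```

Because the minimum and maximum iteration counts are equal and the runtime limits never bind, under this assumed rule `should_stop` is false for every iteration count below 3,000,000 and true exactly at 3,000,000, independent of convergence.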
Scaling with Cores
We then varied performance.threads. The default setting is 0, which means use as many threads as are available; with Hyper-Threading enabled, that is 8 on this CPU. We successively set the number of threads to 1, 2, 3, and 4, measured the runtime, and got this nice result.
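The "0 means all available threads" convention can be sketched as follows. The `resolve_threads` helper is hypothetical; it assumes "available" means logical CPUs, which is what Python's `os.cpu_count()` reports, so Hyper-Threading siblings are counted (8 on a 4-core i7 860 with Hyper-Threading enabled).

```python
import os
from typing import Optional


def resolve_threads(setting: int, logical_cpus: Optional[int] = None) -> int:
    """Map a performance.threads-style value to a worker count (assumed semantics).

    0 -> one thread per logical CPU; any positive value is used as given.
    """
    if setting < 0:
        raise ValueError("thread count must be >= 0")
    if setting == 0:
        if logical_cpus is None:
            # os.cpu_count() counts logical CPUs and may return None.
            logical_cpus = os.cpu_count() or 1
        return logical_cpus
    return setting
```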
It shows that RANGO scales very well with the number of cores. It is not completely linear, but each additional thread substantially improves performance.
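One common way to quantify "not completely linear" is speedup S(n) = T(1) / T(n) and parallel efficiency E(n) = S(n) / n, where T(n) is the runtime with n threads; perfectly linear scaling gives E(n) = 1. The sketch below computes both from measured runtimes. The `speedup_table` helper is hypothetical and no benchmark numbers are reproduced here.

```python
def speedup_table(runtimes: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Return {threads: (speedup, efficiency)} from runtimes keyed by thread count.

    The dict must contain the single-threaded runtime under key 1.
    """
    base = runtimes[1]
    return {n: (base / t, base / t / n) for n, t in sorted(runtimes.items())}
```

Substitute the runtimes you measure for 1 to 4 threads; efficiencies below 1 that shrink as n grows reflect the sublinear scaling described above.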
Scaling with Hyper-Threading
We continued increasing the number of threads to 5, 6, 7, and 8, so that Hyper-Threading was used.
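A thread-count sweep like this (1 through 8 threads) can be timed with a small harness. This is a sketch under assumptions: `run` stands for a hypothetical callable that performs one complete matching with the given number of threads, and taking the best of several repeats is our design choice to reduce timing noise, not necessarily how the original measurements were taken.

```python
import time
from typing import Callable, Iterable


def measure(run: Callable[[int], None], thread_counts: Iterable[int],
            repeats: int = 3) -> dict[int, float]:
    """Return the best wall-clock runtime (seconds) of run(n) for each n."""
    results: dict[int, float] = {}
    for n in thread_counts:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()  # monotonic, high-resolution clock
            run(n)
            best = min(best, time.perf_counter() - start)
        results[n] = best
    return results
```

For example, `measure(run_matching, range(1, 9))`, where `run_matching` is your own (hypothetical) wrapper around the matcher, yields runtimes that can be fed directly into a speedup calculation.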
The performance improvement continues, even though two threads now share a single core. It looks like Hyper-Threading does indeed improve performance for RANGO. The trend flattens quite a bit compared to adding physical cores, but the speedup is still clearly noticeable.
The moral of the story: using all available threads definitely pays off. Keeping performance.threads at 0 (the default configuration) uses all available threads for maximum performance. So just keep the setting at 0, and enable Hyper-Threading in the BIOS if your CPU supports it.