Fiddling with OpenMPI
So I begin my foray into OpenMPI, expecting to turn my many old computers into one supermachine that can get through heavy computations quickly. I set it up, start the make, and let it finish on a bunch of machines. What interests me most is how to split the work; after all, this is a heterogeneous cluster.
So I pick two of the most wildly different (performance-wise) machines I can find: one with a recent (Skylake) desktop Core i5 with AVX2, the other with an old 2nd-gen laptop Core i5. They sound similar, but the desktop does about 35% more in single-core performance AND has twice the number of cores. That doesn't even account for the larger cache and architectural improvements.
Since I don't have a good network bridge, I take a cookie-cutter example that is more processor-constrained than network-constrained: calculating pi to the nth digit. The only information sent across the network is n and the result, nothing else.
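To give a feel for the shape of such a problem, here is a minimal sketch, not the code from this post. It assumes the classic midpoint-rule integration of 4/(1+x^2) over [0, 1], where n is the number of intervals rather than a digit count, and it simulates the ranks in plain Python instead of calling MPI. The names `partial_pi` and `estimate_pi` are made up for illustration.

```python
import math


def partial_pi(rank, size, n):
    """One worker's share of the midpoint-rule integral of 4/(1+x^2) on [0, 1].

    Assumption: intervals are dealt out round-robin, so worker `rank`
    handles intervals rank, rank + size, rank + 2*size, ...
    """
    h = 1.0 / n  # width of one interval
    total = 0.0
    for i in range(rank, n, size):
        x = h * (i + 0.5)  # midpoint of interval i
        total += 4.0 / (1.0 + x * x)
    return h * total


def estimate_pi(size, n):
    """Add up the partial results of all workers (a stand-in for an MPI reduce)."""
    return sum(partial_pi(rank, size, n) for rank in range(size))


if __name__ == "__main__":
    approx = estimate_pi(size=6, n=100_000)
    print(f"pi ~ {approx:.10f}, error {abs(approx - math.pi):.2e}")
```

In a real MPI program each rank would receive n by broadcast and compute only its own partial sum, and a reduce would add the results up, so only a couple of numbers ever cross the network.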
Here is my hardware:
- Laptop: i5-2520M (dual core), 4 GB RAM, 128/512/3072 KB L1/L2/L3 cache
- Desktop: i5-6402P (quad core), 8 GB RAM, 256/1024/6144 KB L1/L2/L3 cache
Here are my results:
- Serial Time: 33 seconds (Desktop)
- OpenMP Time: 10 seconds (Desktop)
- Serial Time: 50 seconds (Laptop)
- OpenMP Time: 19 seconds (Laptop)
- MPI Cluster Time: 20 seconds
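As can be seen, the cluster (20 seconds) is no faster than the laptop alone with OpenMP (19 seconds) and twice as slow as the desktop alone (10 seconds). The performance difference between the machines means more sophisticated scheduling is required.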
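One simple form of smarter scheduling is to hand each machine work in proportion to how fast it is. Below is a rough sketch of that idea, assuming the OpenMP timings above are a fair measure of each machine's throughput, that the work divides cleanly into independent units, and that communication cost is negligible. `split_work` and `ideal_time` are hypothetical helpers, not part of OpenMPI.

```python
def split_work(total_units, times):
    """Split total_units in proportion to each machine's speed (1 / measured time)."""
    speeds = {name: 1.0 / t for name, t in times.items()}
    total_speed = sum(speeds.values())
    shares = {name: int(total_units * s / total_speed) for name, s in speeds.items()}
    # Rounding down can leave a few units unassigned; give them to the fastest machine.
    fastest = max(speeds, key=speeds.get)
    shares[fastest] += total_units - sum(shares.values())
    return shares


def ideal_time(times):
    """Best-case wall time if every machine finishes at the same moment."""
    return 1.0 / sum(1.0 / t for t in times.values())


# Measured OpenMP times from the results above, in seconds.
openmp_times = {"desktop": 10.0, "laptop": 19.0}

if __name__ == "__main__":
    print(split_work(1_000_000, openmp_times))
    print(f"ideal combined time: {ideal_time(openmp_times):.1f} s")
```

With these numbers the desktop would get roughly 65% of the work, and the best case for both machines together would be around 6.6 seconds, versus the 20 seconds measured.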
You can find my code here