So I begin my foray into OpenMPI, with the expectation of turning my many old computers into one powerful machine that can chew through big jobs quickly. I set it up, kick off the build, and let it finish on a bunch of machines. My biggest question about its usefulness is how one should split the work; after all, this is a heterogeneous cluster.
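
In case it helps anyone following along: once OpenMPI is built and installed on every node, the cluster is described to `mpirun` through a hostfile. A minimal sketch, with placeholder hostnames and a hypothetical binary name, might look like this:

```
# hostfile: one line per machine, slots = number of ranks to run there
desktop slots=4
laptop  slots=2
```

```sh
# assumes passwordless SSH between the nodes and the binary sitting at
# the same path on every machine; hostnames and ./my_program are placeholders
mpirun --hostfile hostfile -np 6 ./my_program
```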

So I pick two of the most wildly different (performance-wise) machines I can find: a recent desktop with a Skylake Core i5 and AVX2, and a laptop with an old 2nd-gen Core i5. They sound similar on paper, but the desktop delivers roughly 35% more single-core performance AND has twice as many cores, and that doesn't even account for its larger caches and other architectural improvements.

Since I don't have a good network bridge, I take a cookie-cutter example that is more processor-bound than network-bound: calculating pi to the nth digit. The only information sent across the network is n and the result, nothing else.
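
Rather than paste my whole program here, here is a minimal sketch of that communication pattern: the root broadcasts n, every rank computes its share locally, and the partial results are reduced back. For illustration it uses the classic midpoint-rule integration of 4/(1+x²) over n intervals rather than a digit-by-digit method, so treat it as a simplified stand-in, not my actual source.

```c
/* Sketch of the MPI communication pattern described above:
 * only n goes out, only the result comes back.
 * Uses midpoint-rule integration of 4/(1+x^2) as a stand-in workload. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    long long n = 100000000LL;       /* number of intervals; placeholder default */
    double local_sum = 0.0, pi = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0 && argc > 1)
        n = atoll(argv[1]);

    /* The only data sent out: n goes from rank 0 to everyone. */
    MPI_Bcast(&n, 1, MPI_LONG_LONG, 0, MPI_COMM_WORLD);

    /* Even round-robin split: every rank takes every size-th interval. */
    double h = 1.0 / (double)n;
    for (long long i = rank; i < n; i += size) {
        double x = h * ((double)i + 0.5);
        local_sum += 4.0 / (1.0 + x * x);
    }
    local_sum *= h;

    /* The only data sent back: partial sums reduced onto rank 0. */
    MPI_Reduce(&local_sum, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("pi ~= %.15f (n = %lld)\n", pi, n);

    MPI_Finalize();
    return 0;
}
```

Compile with `mpicc` and launch it with `mpirun` and the hostfile shown earlier.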

Here is my hardware:

  1. Laptop: Core i5-2520M (dual core), 4 GB RAM, 128/512/3072 KB L1/L2/L3 cache
  2. Desktop: Core i5-6402P (quad core), 8 GB RAM, 256/1024/6144 KB L1/L2/L3 cache

Here are my results:

  1. Serial Time: 33 seconds (Desktop)
  2. OpenMP Time: 10 seconds (Desktop)
  3. Serial Time: 50 seconds (Laptop)
  4. OpenMP Time: 19 seconds (Laptop)
  5. MPI Cluster Time: 20 seconds

As can be seen, the cluster of both machines (20 seconds) was actually slower than either machine running the OpenMP version on its own, presumably because an even split of the work leaves the desktop sitting idle while the laptop grinds through its half. With machines this different, more sophisticated scheduling is required.
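
I haven't implemented that yet, but one simple direction is to stop splitting the work evenly and instead weight each rank's share by how fast its host actually is. Here is a rough sketch of helpers that could be dropped into the pi program above; the `local_speed` micro-benchmark and the weighting scheme are mine for illustration, not something OpenMPI provides.

```c
/* Sketch: drop-in helpers for the pi program above that replace the
 * even round-robin split with a speed-weighted block split. */
#include <mpi.h>
#include <stdlib.h>

/* Time a short busy loop and report iterations per second as a weight. */
static double local_speed(void)
{
    volatile double x = 0.0;
    const long iters = 20000000L;
    double t0 = MPI_Wtime();
    for (long i = 0; i < iters; ++i)
        x += 1.0 / (double)(i + 1);
    return (double)iters / (MPI_Wtime() - t0);
}

/* Compute this rank's [start, end) block, proportional to its weight. */
static void weighted_range(long long n, int rank, int size,
                           const double *w, long long *start, long long *end)
{
    double total = 0.0, before = 0.0;
    for (int r = 0; r < size; ++r) total += w[r];
    for (int r = 0; r < rank; ++r) before += w[r];
    *start = (long long)((double)n * (before / total));
    *end   = (rank == size - 1) ? n    /* last rank absorbs rounding */
             : (long long)((double)n * ((before + w[rank]) / total));
}

/* Usage inside the pi program, after the MPI_Bcast of n:
 *
 *   double *w = malloc(size * sizeof *w);
 *   double mine = local_speed();
 *   MPI_Allgather(&mine, 1, MPI_DOUBLE, w, 1, MPI_DOUBLE, MPI_COMM_WORLD);
 *   long long start, end;
 *   weighted_range(n, rank, size, w, &start, &end);
 *   for (long long i = start; i < end; ++i) { ... midpoint sum ... }
 */
```

A fully dynamic master/worker scheme that hands out chunks as ranks finish would adapt even better, at the cost of more messages over my already slow network.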

You can find my code here.