HPC Scalability - "Why doesn't running on 96 cores yield results faster than a simple 4-core run?"

High Performance Computing (HPC) scalability is one of the major concerns of modern computational tools. Simply put, it describes how much time is saved (the speedup) as the total number of computing cores used in the simulation increases.
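For reference, these are the usual textbook definitions (general HPC terminology, not SimScale-specific), where T(N) is the wall-clock time of a run on N cores. Ideal (linear) scaling means S(N) = N, i.e. a parallel efficiency E(N) of 1:

```latex
S(N) = \frac{T(1)}{T(N)}, \qquad E(N) = \frac{S(N)}{N} = \frac{T(1)}{N \, T(N)}
```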

With SimScale, we can leverage cloud computing by running our simulations on up to 96 cores. In most cases, "the more the merrier" holds: a machine with more cores typically lowers computation time and speeds up our work.

However, this is not always the case. The achievable speedup also depends on the overall model size, typically expressed as the total cell or node count of the mesh.

Put very simply, what happens in the background is that each core takes a portion of the full problem, so the load is split among them. For a fixed-size problem, the more cores are put into play, the less load each of them has to handle. Ideally, this means that doubling the number of cores halves the required computation time.
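The ideal case of load splitting can be sketched in a few lines. This is a minimal illustration assuming a perfectly even split and zero overhead; the function names and the numbers in the example (a 960 s single-core run, a 2,000,000-cell mesh) are hypothetical.

```python
def ideal_time(serial_time: float, cores: int) -> float:
    """Ideal strong scaling: a fixed workload split evenly, with no overhead."""
    if cores < 1:
        raise ValueError("cores must be >= 1")
    return serial_time / cores


def cells_per_core(total_cells: int, cores: int) -> float:
    """Average share of the mesh each core handles (assumes an even split)."""
    return total_cells / cores


if __name__ == "__main__":
    t1 = 960.0  # hypothetical single-core run time in seconds
    for n in (4, 8, 16, 32):
        print(f"{n:>2} cores: {ideal_time(t1, n):7.1f} s, "
              f"{cells_per_core(2_000_000, n):10.0f} cells per core")
```

In this idealized model every doubling of the core count halves both the run time and the cells each core owns.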

Now, at the end of each computational iteration, the cores need to exchange all the simulation-related information with each other before they can move on to the next iteration. Roughly speaking, each core has to “talk” to the rest of the team to pass this information on. It becomes clear that the more team members (cores) there are, the more time is spent communicating.
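To make the communication cost concrete, here is a deliberately simple toy model, assuming compute time shrinks as 1/N while each core "talks" to every other core, so the communication term grows with (N - 1). Real CFD codes mostly exchange data with neighbouring partitions, so this is an illustration of the trend, not how SimScale's solvers actually behave; `run_time` and `comm_cost` are hypothetical names and values.

```python
def run_time(serial_time: float, cores: int, comm_cost: float) -> float:
    """Toy model of total run time on `cores` cores.

    compute:       the fixed workload split evenly (serial_time / cores)
    communication: each core exchanges data with the (cores - 1) others,
                   at a hypothetical cost of `comm_cost` seconds per partner
    """
    compute = serial_time / cores
    communication = comm_cost * (cores - 1)
    return compute + communication


if __name__ == "__main__":
    for n in (1, 4, 32, 96):
        print(f"{n:>2} cores: {run_time(100.0, n, 0.5):6.2f} s")
```

With these made-up numbers the run gets faster from 1 to 32 cores but slower again at 96, because the communication term takes over.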

These two opposing effects mean that there is always a limit up to which adding cores yields faster results. Beyond this limit, the cost of communication between cores outweighs any benefit gained from splitting the load.
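Under the same hypothetical toy model (T/N for compute plus c(N - 1) for communication), the limit can be located directly. Setting the derivative -T/N^2 + c to zero gives an optimum near N* = sqrt(T/c); the sketch below also does a brute-force search over 1 to 96 cores. The model and its parameters are assumptions for illustration only.

```python
import math


def run_time(serial_time: float, cores: int, comm_cost: float) -> float:
    """Toy model: even load split plus all-to-all communication cost."""
    return serial_time / cores + comm_cost * (cores - 1)


def analytic_optimum(serial_time: float, comm_cost: float) -> float:
    """Continuous minimiser of run_time: N* = sqrt(T / c)."""
    return math.sqrt(serial_time / comm_cost)


def best_core_count(serial_time: float, comm_cost: float, max_cores: int = 96) -> int:
    """Brute-force search for the core count with the shortest run time."""
    return min(range(1, max_cores + 1),
               key=lambda n: run_time(serial_time, n, comm_cost))


if __name__ == "__main__":
    # Small model: the optimum sits well below 96 cores.
    print(analytic_optimum(100.0, 0.5), best_core_count(100.0, 0.5))
    # A 100x larger model pushes the optimum past the 96-core ceiling.
    print(analytic_optimum(10_000.0, 0.5), best_core_count(10_000.0, 0.5))
```

The larger the workload T, the further the limit moves out, which is why model size matters.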

You can try this yourself on a small model with a low cell count. Run it on 4, 32 and 96 cores: you may be surprised to find that the 4-core run is the fastest of all.
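Before spending core-hours on the real experiment, the same comparison can be previewed with the hypothetical toy model. Here the serial time is assumed to scale linearly with cell count; `SECONDS_PER_CELL` and `COMM_COST` are made-up constants, not measured SimScale figures.

```python
SECONDS_PER_CELL = 1e-3  # hypothetical serial cost per cell for the whole run (s)
COMM_COST = 0.5          # hypothetical communication cost per extra core (s)
CANDIDATES = (4, 32, 96)


def run_time(cells: int, cores: int) -> float:
    """Toy model: serial time grows with cell count, split across cores,
    plus a communication term that grows with the number of cores."""
    serial_time = cells * SECONDS_PER_CELL
    return serial_time / cores + COMM_COST * (cores - 1)


def fastest(cells: int, candidates=CANDIDATES) -> int:
    """Core count from `candidates` with the shortest modelled run time."""
    return min(candidates, key=lambda n: run_time(cells, n))


if __name__ == "__main__":
    for cells in (20_000, 5_000_000):
        times = {n: round(run_time(cells, n), 1) for n in CANDIDATES}
        print(f"{cells:>9} cells: {times} -> fastest on {fastest(cells)} cores")
```

In this model the small mesh runs fastest on 4 cores, while the large mesh benefits from all 96.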

If unsure, you can always choose the Automatic option for the number of cores. Based on the model size, SimScale will then pick the optimum number of processors to ensure fast computation in the most economical way (i.e. optimizing the number of core-hours consumed).
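How SimScale's Automatic option decides is not described here, so the following is only a hypothetical heuristic showing the speed/cost trade-off: in the toy model, core-hours always grow with the core count, so pure cost minimisation would pick 1 core. Instead, the sketch picks the largest machine whose parallel efficiency stays above an assumed 70 % threshold. `CORE_OPTIONS` and `min_efficiency` are illustrative assumptions.

```python
CORE_OPTIONS = (1, 2, 4, 8, 16, 32, 64, 96)  # hypothetical selectable machine sizes


def run_time(serial_time: float, cores: int, comm_cost: float) -> float:
    """Toy model: even load split plus all-to-all communication cost."""
    return serial_time / cores + comm_cost * (cores - 1)


def efficiency(serial_time: float, cores: int, comm_cost: float) -> float:
    """Parallel efficiency E(N) = speedup / N."""
    speedup = serial_time / run_time(serial_time, cores, comm_cost)
    return speedup / cores


def core_hours(serial_time: float, cores: int, comm_cost: float) -> float:
    """Cost of the run: cores multiplied by wall-clock hours."""
    return cores * run_time(serial_time, cores, comm_cost) / 3600.0


def choose_cores(serial_time: float, comm_cost: float,
                 min_efficiency: float = 0.7) -> int:
    """Largest option whose efficiency is still acceptable (1 core always is)."""
    acceptable = [n for n in CORE_OPTIONS
                  if efficiency(serial_time, n, comm_cost) >= min_efficiency]
    return max(acceptable)


if __name__ == "__main__":
    for t1 in (20.0, 5000.0):
        n = choose_cores(t1, 0.5)
        print(f"T(1) = {t1:6.0f} s -> {n} cores, "
              f"{core_hours(t1, n, 0.5):.3f} core-hours")
```

The threshold controls the balance: raising it saves core-hours, lowering it buys speed at a higher cost.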
