Thanks for providing the results of your benchmark!
Let me shed some light on your numbers. Whenever a meshing job is executed, there is some overhead involved, for example copying data and creating the 3D visualizations for the browser. Not all of these steps scale with the number of cores. Since SimScale is a highly distributed system, some communication takes place over the network, and that usually does not speed up noticeably with the number of available processors.
Processing that is limited by the number and speed of the CPUs is called CPU bound, while operations that have to wait on reads, writes, or the latency of drives or network connections are called I/O bound (input/output bound). Increasing the number of cores reduces the time spent on CPU bound work, but it has almost no impact on I/O bound work.
The meshing jobs in your example are relatively short, which means that the CPU bound fraction (which would benefit from more CPUs) is rather low and the influence of the I/O bound steps is significant. Hence, your suspicion that a larger model would scale better is correct: a larger model involves proportionally more CPU bound processing than I/O bound processing.
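To make the effect concrete, here is a rough sketch of a toy timing model. It assumes the I/O bound part takes a fixed time and the CPU bound part divides perfectly across cores, which is optimistic. The function names and all numbers are made up for illustration; they are not measurements from your benchmark or from SimScale.

```python
def job_time(cpu_seconds, io_seconds, cores):
    """Estimated wall time: the CPU bound part is split across cores,
    the I/O bound part stays the same regardless of core count."""
    return io_seconds + cpu_seconds / cores


def speedup(cpu_seconds, io_seconds, cores):
    """Speedup relative to running the same job on a single core."""
    return job_time(cpu_seconds, io_seconds, 1) / job_time(cpu_seconds, io_seconds, cores)


# Hypothetical jobs: same fixed overhead, different amounts of CPU bound work.
small_job = speedup(cpu_seconds=120, io_seconds=120, cores=8)
large_job = speedup(cpu_seconds=3600, io_seconds=120, cores=8)

print(f"small job speedup on 8 cores: {small_job:.2f}x")
print(f"large job speedup on 8 cores: {large_job:.2f}x")
```

With the same fixed overhead, the job with more CPU bound work gets much closer to the ideal 8x, which is the behaviour you suspected.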
There is also such a thing as too many cores: if a rather small job is executed on too many CPUs, the processors lose more time communicating and waiting on each other than doing actual work. In your case you seem to have hit the sweet spot with 4 cores. A larger job can benefit from more processors more easily. For example, in the case of CFD, the optimum is usually somewhere around 100,000-250,000 cells per processor.
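As a quick way to apply the CFD rule of thumb above, here is a small sketch that turns a cell count into a range of core counts. The helper name `core_range` is hypothetical, and the result is only a starting point, since the real optimum depends on the solver and the case.

```python
import math


def core_range(cells, min_cells_per_core=100_000, max_cells_per_core=250_000):
    """Return (fewest, most) cores that keep the load per core within
    roughly 100,000-250,000 cells. Always returns at least one core."""
    # Fewest cores: each core carries at most max_cells_per_core cells.
    fewest = max(1, math.ceil(cells / max_cells_per_core))
    # Most cores: each core still carries at least min_cells_per_core cells.
    most = max(1, cells // min_cells_per_core)
    return fewest, max(fewest, most)


for cells in (50_000, 1_000_000, 2_000_000):
    fewest, most = core_range(cells)
    print(f"{cells:>9} cells: about {fewest} to {most} cores")
```

A mesh below 100,000 cells ends up at a single core here, which matches the point that small jobs gain little from extra processors.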
The queue also has to do with SimScale being a distributed system where computation takes place on many different machines. When a user issues a meshing or simulation job, we first check whether there is any free hardware on which the job can run. If yes, the job starts right away (this is what happened in the 4-core example). If not, we wait a little while to see whether any hardware becomes free (this explains the other delays). If there is still no free hardware after some time, we add more to the pool, which can take about 2-3 minutes. The job stays queued in the meantime.
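The decision described above can be sketched roughly like this. This is only an illustration of the logic, not our actual scheduler code: the function name, the return values, and especially the `patience_seconds` threshold are assumptions made up for the example.

```python
def queue_decision(free_machines, waited_seconds, patience_seconds=60):
    """Decide what happens to a queued job.

    patience_seconds is a hypothetical value standing in for the
    "little while" we wait before adding hardware to the pool.
    """
    if free_machines > 0:
        return "start"      # free hardware exists: run right away
    if waited_seconds < patience_seconds:
        return "wait"       # keep the job queued, hardware may free up
    return "provision"      # add hardware to the pool (about 2-3 minutes)


print(queue_decision(free_machines=1, waited_seconds=0))
print(queue_decision(free_machines=0, waited_seconds=10))
print(queue_decision(free_machines=0, waited_seconds=90))
```

The first call corresponds to your 4-core run, the second to the shorter delays, and the third to the case where new hardware has to be added first.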
I hope I was able to answer your questions. Have fun simulating!