I see that you used a large number of cores, which, as @ahmedhussain18 mentioned, is not recommended here. I would like to comment on this, hoping it might help in future simulations.
- Computational speed can generally be increased by increasing the number of processors. Parallel execution on more cores can reduce the run time of analyses with a large number of nodes and elements, as well as analyses that require a large number of increments. Two parallel implementations are possible: a thread-based domain decomposition (in which the threads communicate by sharing the same memory pool and perform different lightweight tasks simultaneously within the same application) and an MPI (Message Passing Interface) based domain decomposition (in which multiple analysis processes communicate with each other via messages), which is normally used on compute clusters.
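The difference between the two communication styles can be sketched with a toy Python example of my own (not any solver's actual machinery): the first set of workers writes into a shared results list directly, while the second set has no shared state and sends results back as messages, with a `Queue` standing in for MPI.

```python
import threading
from queue import Queue

# Shared-memory style: worker threads update a common results list directly.
results = [0] * 4

def thread_worker(i):
    results[i] = i * i  # every thread sees the same 'results' object

threads = [threading.Thread(target=thread_worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # [0, 1, 4, 9]

# Message-passing style: workers share nothing and send their results
# back as (rank, value) messages instead.
out = Queue()

def mpi_like_worker(i, out):
    out.put((i, i * i))

workers = [threading.Thread(target=mpi_like_worker, args=(i, out)) for i in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(sorted(out.get() for _ in range(4)))  # [(0, 0), (1, 1), (2, 4), (3, 9)]
```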
Parallelization generally works at either the loop level or the domain level. In the default domain-level method, the model is split into several topological domains of roughly equal computational cost, and each domain is assigned to a processor. These domains are then analyzed independently, and information is passed between them through a thread- or MPI-based mechanism.
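The domain-level idea can be illustrated with a minimal sketch (plain Python `multiprocessing`, my own toy balancing heuristic, not any solver's real algorithm): the "mesh" is just a list of per-element costs, split into domains of roughly equal total cost, each solved by its own worker.

```python
from multiprocessing import Pool

def split_into_domains(element_costs, n_domains):
    """Greedily assign elements so each domain has a similar total cost."""
    domains = [[] for _ in range(n_domains)]
    totals = [0.0] * n_domains
    # Place each element (largest cost first) into the currently lightest domain.
    for idx, cost in sorted(enumerate(element_costs), key=lambda x: -x[1]):
        d = totals.index(min(totals))
        domains[d].append(idx)
        totals[d] += cost
    return domains

def solve_domain(element_ids):
    """Stand-in for the per-domain solve; here it just sums element ids."""
    return sum(element_ids)

if __name__ == "__main__":
    costs = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
    domains = split_into_domains(costs, n_domains=4)
    with Pool(4) as pool:
        partial_results = pool.map(solve_domain, domains)
    # In a real solver, boundary information would be exchanged between
    # domains every increment; here we simply combine independent results.
    print(sum(partial_results))  # same value as solving the whole model serially
```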
The loop-level method parallelizes the low-level loops (element, node, and contact-pair operations) that are responsible for most of the computational cost. The speed-up may be significantly less than what can be achieved through domain-level parallelization, and it varies depending on the features of the analysis; some features, such as the general contact algorithm and kinematic constraints, do not make use of parallel loops at all. The results of this method do not depend on the number of processors, but it may scale poorly beyond about 4 processors.
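The poor scaling beyond a few processors follows directly from Amdahl's law: any serial fraction of the run time caps the achievable speed-up. The numbers below are my own illustration, not measurements from any particular code.

```python
def amdahl_speedup(serial_fraction, n_processors):
    """Amdahl's law: overall speed-up when only part of the work parallelizes."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_processors)

# Suppose 20% of the run time sits in loops that do not parallelize
# (e.g. contact or constraint handling). No matter how many processors
# are added, the speed-up is bounded by 1 / 0.2 = 5x.
for n in (1, 2, 4, 8, 16):
    print(n, round(amdahl_speedup(0.2, n), 2))
# 1 1.0
# 2 1.67
# 4 2.5
# 8 3.33
# 16 4.0
```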
In transient simulations (such as the particle method), I have personally seen that unless the number of particles exceeds roughly 10,000, using a larger number of processors actually increases the real (wall-clock) time taken for computation.
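This behaviour can be reasoned about with a crude cost model (my own toy numbers, not measured from any specific code): the useful work per increment divides across processors, but the per-increment communication overhead grows with the processor count, so for small particle counts the overhead dominates.

```python
def wall_time(n_particles, n_procs, work_per_particle=1e-6, overhead_per_proc=5e-3):
    """Toy model: compute time divides across processors, communication does not."""
    compute = n_particles * work_per_particle / n_procs
    communicate = overhead_per_proc * n_procs
    return compute + communicate

# Small model: adding processors makes it slower.
small = [wall_time(2_000, n) for n in (1, 2, 4, 8)]
# Large model: adding processors helps.
large = [wall_time(200_000, n) for n in (1, 2, 4, 8)]
print(small[0] < small[-1])  # True: 1 CPU beats 8 CPUs for the small model
print(large[0] > large[-1])  # True: 8 CPUs beat 1 CPU for the large model
```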
To speed up your simulation, especially if you are running it over a long time period, I suggest you try mass scaling and time scaling. Both methods can be tricky to handle, so apply them with care, but done properly they speed up the simulation substantially.
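To see why mass scaling works (and why it is tricky), recall that in explicit dynamics the stable time increment scales as dt ~ L_e / c with dilatational wave speed c = sqrt(E/rho): artificially increasing the density raises dt, so fewer increments are needed, but it also adds artificial inertia to the model. A minimal sketch with made-up steel-like material numbers:

```python
import math

def stable_increment(elem_length, youngs_modulus, density):
    """Explicit stability estimate: dt ~ element length / dilatational wave speed."""
    wave_speed = math.sqrt(youngs_modulus / density)
    return elem_length / wave_speed

# Hypothetical material and mesh: E = 210 GPa, rho = 7800 kg/m^3, 1 mm element.
dt = stable_increment(1e-3, 210e9, 7800.0)
# Scaling the mass by 100x increases the stable increment by sqrt(100) = 10x,
# cutting the number of increments tenfold -- at the cost of extra inertia,
# which is why the kinetic energy should be monitored when mass scaling.
dt_scaled = stable_increment(1e-3, 210e9, 7800.0 * 100.0)
print(round(dt_scaled / dt, 6))  # 10.0
```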
Do not use a high number of particles - only as many as required. This may sound like very general advice, and I see that you are not making this mistake, but I have seen many users employ an excessively fine mesh or an unnecessarily high particle count and lose a lot of time waiting for the simulation to run.
I hope this helped.