Every re-mesh using Hex-dominant parametric algorithm gives a different number of cells

DaleKramer · September 10, 2018, 5:56am

I have noticed that if I re-mesh an existing Hex-dominant parametric mesh with exactly the same parameters I end up with a mesh that has a different number of cells and faces than the original.

I re-meshed an 8.5 million cell mesh 7 times and the range of number of cells for the 8 meshes was about 16,000 cells.

All meshes were meshed with default parameter values and a 1.28 meter cell grid on the background mesh box.

The geometry was a complete aircraft and was refined and layered nicely to a yPlus of 50.

The background mesh box was significantly larger than the first region refinement box.

All meshes had 0 illegal cells but they all failed the OpenFoam quality check as ‘Mesh quality check failed. The mesh is not OK.’

The fact that all these meshes are not the same number of cells concerned me enough to run them all through an incompressible steady state simulation and make a chart of the results variations.

Here is the results chart for all 8 sim runs:

Basically and with some relief, I found that for my situation all the meshes produced virtually the same results.

I present this forum topic for a discussion thread about whether this meshing anomaly could have any other significance or could it affect results in a way I have not foreseen.

Dale

jousefm · September 10, 2018, 3:08pm

Hi Dale and sorry for the late response. This might take some time to investigate - same as with the other post about cores per cells.

As always I am inviting some of the PowerUsers to this discussion - @vgon_alves, @Get_Barried, @Anware, @oscarcorripio, @mlarreta! Do you have some experience with this type of problem?

Best,

Jousef

jprobst · September 11, 2018, 9:43am

Hi Dale,
I’m quite impressed by the thorough investigation you did
Your observation that not every run of hex-dominant meshing produces the same number of cells is correct and definitely relevant. We have observed this phenomenon ourselves at a number of occasions. Let me explain why it can happen:
The mesher is capable of running on multiple processors, using multiple processes. Each process is responsible for a continguous part of the mesh. In order to decompose the mesh domain into chunks which are suitable for multiprocessing, a graphing algorithm is used which tries to cut the mesh into chunks of equal size while keeping the number of faces shared between the chunks minimal. This is an optimization problem which does not necessarily have a unique solution.

I have drawn a minimal example which visualizes this. The graph network symbolizes a mesh where the dots are cells and the lines between are faces which connect them. We are considering the same mesh twice, once in the upper half and once again in the lower half of the picture.

If we distribute this mesh across 2 processors, we have 2 options which are equally good. The processor partitions are visualized with blue circles. In both scenarios we get one partition with 3 and one with 4 cells, and in both cases they share 1 face.

In such a case, the graph algorithm is free to choose either decomposition at random. Now, both processors are responsible for a different part of the mesh and that’s when they begin to produce different output. For example, one processor might decide to do more smoothing iterations due to the extra cells. And likewise, more cells may be added or removed. However, the meshes will not be totally different in the end, just a little bit here and there. It adds up to a small difference in cells overall which usually doesn’t cause problems, like you observed.

Johannes

DaleKramer · September 11, 2018, 9:49am

Thank you for this detailed explanation of the phenomenon.

Wouldn’t the decision on which cells are given to each processor result in the same split, on each re-mesh, if the job was split to the same number of cores each time? After all, we are dealing with the same code, parameters and mathematical precision each time?

After that, having any sort of randomly made decisions sounds so un-scientific , especially where repeatabilty is a concern.

I guess as long as the quality of the ‘different’ cells are the same then we should be OK.

Dale

EDIT: @jprobst Sorry I decided my first drowsy response was not so good after I looked at it when I got up this morning…

jprobst · September 13, 2018, 7:26am

Hi Dale

haha your response was fine, no worries.

You’re right - we don’t like this kind of random behaviour either. It makes testing and repeating impractical, especially when we get bug reports where a problem only appears with a certain probability. We had cases where certain issues would only manifest itself in about 10% of the jobs while all settings were identical. Imagine you’re trying to reproduce a bug on a meshing job which lasts 1 hour and the chance of it occurring is only 10 %. You end up spending about 10 hours or more for the bug to appear…

The reason for the randomness is a bit intricate and it has to do with the underlying data structures used to store the graphs described earlier. Most programming languages (C++ in this case) offer hash based data structures (e.g. maps, sets or certain kinds of trees) to perform fast lookups. However, their ordering is (usually) not guaranteed - for performance and security reasons. Interestingly, most languages will consciously use some “salting” to avoid a repeatable ordering. Knowing in advance in which order the elements in a map or a set appear can be a vector of attack in software. Hence, the elements will be shuffled differently each time, using for example the system clock as a source of (pseudo-)randomness.

This means, the code of the mesher might say “find all best ways of partitioning the mesh and take the first one”. If the data structure holding the partitionings is hash based (e.g. a hash set), then the first element could be a different one each time, unless the best way of partitioning the mesh is unique. This depends entirely on the topology of the mesh, and it’s well possible that there is a unique best partitioning for a mesh. In that case there will be no randomness. If there is more than one, things become unpredictable.

In principle this is something which could be fixed, for example by implementing an algorithm which looks at all partitionings and decides to take the same one each time. At the expense of performance and complexity of the code.

Hannes

DaleKramer · September 13, 2018, 9:20am

Been there, done that, not fun!!

In debugging that case it would be worth it to have a mode that re-meshes repeatably even if it took 4 hrs instead of 1 , one less variability out of the way… (and hope that the bug still exists by the way or else this anomaly I have seen was the cause and may need to be dealt with)

That is very interesting, never knew that, too bad what has to be done to protect us from the dark side…

Wouldn’t there be a way to turn salting off if the coder was not concerned of an attack?

Thanks,
Dale

jprobst · September 13, 2018, 11:32am

I hear you…

Most libraries that I’ve seen don’t have a switch which allows to configure this. There are some exceptions, for example Python has an ordered hash map. Other languages, for example Go, don’t offer anything out of the box (at least to my knowledge).

DaleKramer · September 14, 2018, 10:19pm

Hi Hannes,

Just when I thought I had this topic under control in my mind …

I had a meshing of the same geometry as above, that had a larger Background Box and which had 9,378,368 cells in it.

I then increased the size of the geometry refinement box (GB) that I had ‘enclosing’ the geometry by 0.01 meters in the 6 cardinal directions and other than that, it was refined exactly the same way.

Surprise, surprise, the new mesh had exactly the same number of cells AND faces AND points.

I believe some things are coincidental but I have a hard time believing this is one of them.

I had to check a number of times that I truly meshed the 2nd mesh with a 0.01m larger all around GB.

Even looking at the layering of each of the 23 faces that got layered, the # of layer cells on each face is exactly the same.

The only difference I can see at the end of the meshing log that verifies that I am not looking at the same log is the slightly different meshing and layering times, see for yourself:

So… if you want to have a re-mesh with exactly the SAME number of cells, faces, points and layer cells, I guess you just have to change a refinement a little and which apparently seems to dissolve that grain of salt a little

Seriously, though, the next step I did was to run two sims with the only difference being the mesh I chose for both sims, just to see if the solver would give the same results for each mesh.

Fortunately (I think ) , the results and convergence were exactly the same (to 10 decimal places):

So,

I know I only changed the GB box dimensions by 0.02m in each axis (1/2 of the grid for the refinement level of which the GB box is refined to, this rounds to 1 level 5 cell bigger), but I would still think that the mesh should have changed, why did it NOT change?
What about the fact that the 2nd mesh that was generated was exactly the same mesh as the mesh generated from the smaller GB box, surely that is not a coincidence and is this related at all to the re-mesh anomaly of this topic (which is not an anomaly is this case if the 0.02m was rounded down or truncated somehow)?

I’m very confused

Thanks,
Dale

jprobst · September 17, 2018, 4:17pm

Hi Dale

wow you’re cranking out all possible edge cases of our software. I love it

I believe these are roundoff errors, as you suspected. The refinements are the first thing appearing in meshing, at a time when the background mesh is still very coarse. I’ve drawn it up below. The blue box represents the refinement and the black squares are the background mesh. Let’s say that a cell only gets refined if the refinement box contains the cell centroid (we use the average of all coordinates to establish a midpoint). It can be seen that the cells being refined don’t change if the box is changed by an amount smaller than half of a cell width. Once the change is big enough to contain more cell centroids, it should take effect on those cells as well. If not - that would be a bug which we need to check.

Funny though that the mesh is finally reproducible when it is expected the least

Hannes

DaleKramer · September 17, 2018, 4:40pm

Thanks, understood … I may yet try to see how big I have to change it (or move it for that matter) to get a different mesh, it must be that I have to make the refinement box more than 1/2 the size of the level 0 grid to make it different, not 1/2 of the refinement level 5 of the refinement box itself.

This makes me think that there may be a simpler and more basic underlying reason that causes the same parameters for re-meshes to produce a different mesh and which gets skipped when the solver has to think about rounding off the refinement box size/position…

Dale