SimScale CAE Forum

OpenFabrics transport error, is this an issue?


#1

EDIT: I have re-titled this forum topic because I think I have come across a very broad error that causes sims to fail. I will leave the posts below alone, but as you will see, this issue is broader than this first post…

By looking at a similar simulation that does not error, I was able to determine that my floating point error has to do with ‘Selecting incompressible transport model Newtonian’.

Here is the error in the Solver Log:

Reading/calculating face flux field phi
--------------------------------------------------------------------------
[[21079,1],1]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
Module: OpenFabrics (openib)
  Host: 
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
[0] #0  Foam::error::printStack(Foam::Ostream&) at ??:?
[0] #1  Foam::sigFpe::sigHandler(int) at ??:?
[0] #2   in "
[0] #3  Foam::surfaceInterpolation::makeWeights() const at ??:?
[0] #4  Foam::surfaceInterpolation::weights() const at ??:?
[0] #5  
[0]  at ??:?
[0] #6  __libc_start_main in "
[0] #7  
[0]  at ??:?
[:00167] *** Process received signal ***
[:00167] Signal: Floating point exception (8)
[:00167] Signal code:  (-6)
[:00167] Failing at address: 0x3e9000000a7
[:00167] [ 0]  [0x7fd6ffc8bcb0]
[:00167] [ 1]  [0x7fd6ffc8bc37]
[:00167] [ 2]  [0x7fd6ffc8bcb0]
[:00167] [ 3]  [0x7fd702bc3dff]
[:00167] [ 4]  [0x7fd702bc3ff5]
[:00167] [ 5] simpleFoam() [0x41ad29]
[:00167] [ 6]  [0x7fd6ffc76f45]
[:00167] [ 7] simpleFoam() [0x41c610]
[:00167] *** End of error message ***

And here is the same location in a simulation which does not error:

Reading/calculating face flux field phi
Selecting incompressible transport model Newtonian
Selecting RAS turbulence model kOmegaSST
kOmegaSSTCoeffs
{
    alphaK1         0.85;
    alphaK2         1;
    alphaOmega1     0.5;
    alphaOmega2     0.856;
    gamma1          0.555555555556;
    gamma2          0.44;
    beta1           0.075;
    beta2           0.0828;
    betaStar        0.09;
    a1              0.31;
    b1              1;
    c1              10;
    F3              false;
}

I cannot figure out why ‘another transport’ model is needed. As far as I know, the only difference between the sims is the number of volumes in their meshes of the same geometry.

Any ideas?

Thanks,
Dale


#2

Hi @DaleKramer,

Hmm, a peculiar and interesting problem indeed. I am not sure what the cause could be. Quality of the mesh? Tough to tell. Maybe Darren (@1318980) has some ideas?

Cheers,

Regards,
Barry


#3

@Get_Barried

Hi Barry,

Interesting and peculiar eh :rofl:, like me…

Darren is on vacation :smile:

I am doing a Mesh Independence Study for a single geometry, and 2 of my 5 meshes fail with this error.

Here is a summary of the 5 study meshes:

  1. 1,841,719 volumes, 0 illegal faces, NO error
  2. 5,351,405 volumes, 6 illegal faces, NO error
  3. 11,149,405 volumes, 16 illegal faces, YES error
  4. 24,637,421 volumes, 16 illegal faces, NO error
  5. 48,326,067 volumes, 23 illegal faces, YES error
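
As a quick sanity check on the list above, here is a minimal Python sketch (the variable names are illustrative; the numbers are the five meshes listed) that groups the runs by illegal-face count. It suggests that the count alone does not separate failing from passing runs, since 16 illegal faces shows up in both groups.

```python
# Mesh independence study runs from the list above:
# (volumes, illegal faces, errored?)
runs = [
    (1_841_719, 0, False),
    (5_351_405, 6, False),
    (11_149_405, 16, True),
    (24_637_421, 16, False),
    (48_326_067, 23, True),
]

# Collect the outcomes observed for each illegal-face count.
outcomes_by_faces = {}
for volumes, illegal_faces, errored in runs:
    outcomes_by_faces.setdefault(illegal_faces, set()).add(errored)

# Counts that appear in both a failing and a passing run.
ambiguous = sorted(n for n, seen in outcomes_by_faces.items() if len(seen) == 2)
print("Illegal-face counts seen with and without the error:", ambiguous)
```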

I have only found this with Google so far, and its resolution method means nothing to me (it is in a totally unrelated field but it has a very similar transport message).

The closest I can come is that it may have something to do with setting up the cores, but I tried changing a lot of Simulation Control parameters (number of cores, initialization method and domain decomposition method) and still had the error.

Thanks,
Dale


#4

Hi @DaleKramer,

Interesting “solution”. Seems to me like it is some sort of back end issue. I honestly have no idea how to resolve the issue and there doesn’t seem to be any particular parameter that is giving the error. Since Darren is on vacation, maybe the other @PowerUsers_CFD can give some ideas, but this does seem like a rather unique problem indeed.

Apologies for the lack of resolution, not much contribution from me here.

Cheers.

Regards,
Barry


#5

Barry,

I have been receiving many discouraging sim errors lately, especially on larger meshes.

It is amazing to me that so many of them have this nasty little notice built into them, even though they do not report the same error as my Post 1:

--------------------------------------------------------------------------
[[21079,1],1]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
Module: OpenFabrics (openib)
  Host: 
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------

I have a sneaky feeling that if we fix the root cause of this, then many sim errors will suddenly vanish :smile:

I think very much that we should track this down :wink:

FWIW, sure would be nice if we could get the whole Solver Log files somehow :wink:

Dale


#6

Here are two more transport errors that may be the root cause of these errors.

On a sim of my mesh MIS5 with 34,808,259 volumes and 161 illegal faces:

[40] Unset SIGSEGV(11) signal handler
[40] Unset SIGTERM(15) signal handler
[40] Unset SIGQUIT(3) signal handler
[40] Writing old times:
[40] 1 times to write
[40] Writing current time 1
--------------------------------------------------------------------------
[[48070,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
Module: OpenFabrics (openib)
  Host: 
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
[:00150] 95 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[:00150] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[18] #0  Foam::error::printStack(Foam::Ostream&) at ??:?
[18] #1  Foam::writeOldTimesOnSignalFunctionObject::sigHandler(int) at ??:?
[18] #2   in "
[18] #3  Foam::GAMGSolver::scale(Foam::Field<double>&, Foam::Field<double>&, Foam::lduMatrix const&, Foam::FieldField<Foam::Field, double> const&, Foam::UPtrList<Foam::lduInterfaceField const> const&, Foam::Field<double> const&, unsigned char) const at ??:?
[18] #4  Foam::GAMGSolver::Vcycle(Foam::PtrList<Foam::lduMatrix::smoother> const&, Foam::Field<double>&, Foam::Field<double> const&, Foam::Field<double>&, Foam::Field<double>&, Foam::Field<double>&, Foam::Field<double>&, Foam::Field<double>&, Foam::PtrList<Foam::Field<double> >&, Foam::PtrList<Foam::Field<double> >&, unsigned char) const at ??:?
[18] #5  Foam::GAMGSolver::solve(Foam::Field<double>&, Foam::Field<double> const&, unsigned char) const at ??:?
[18] #6  Foam::fvMatrix<double>::solveSegregated(Foam::dictionary const&) at ??:?
[18] #7  Foam::fvMatrix<double>::solve(Foam::dictionary const&) at ??:?
[18] #8  
[18]  at ??:?
[18] #9  
[18]  at ??:?
[18] #10  __libc_start_main in "
[18] #11  
[18]  at ??:?
--------------------------------------------------------------------------
mpirun noticed that process rank 18 with PID 169 on node  exited on signal 15 (Terminated).
--------------------------------------------------------------------------

And on a sim of my mesh MIS6 with 67,113,721 volumes and 446 illegal faces:

[45] Unset SIGSEGV(11) signal handler
[45] Unset SIGTERM(15) signal handler
[45] Unset SIGQUIT(3) signal handler
[45] Writing old times:
[45] 1 times to write
[45] Writing current time 1
--------------------------------------------------------------------------
[[34099,1],20]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
Module: OpenFabrics (openib)
  Host: 
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
[:00151] 95 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[:00151] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[0] #0  Foam::error::printStack(Foam::Ostream&) at ??:?
[0] #1  Foam::writeOldTimesOnSignalFunctionObject::sigHandler(int) at ??:?
[0] #2   in "
[0] #3  Foam::GAMGSolver::scale(Foam::Field<double>&, Foam::Field<double>&, Foam::lduMatrix const&, Foam::FieldField<Foam::Field, double> const&, Foam::UPtrList<Foam::lduInterfaceField const> const&, Foam::Field<double> const&, unsigned char) const at ??:?
[0] #4  Foam::GAMGSolver::Vcycle(Foam::PtrList<Foam::lduMatrix::smoother> const&, Foam::Field<double>&, Foam::Field<double> const&, Foam::Field<double>&, Foam::Field<double>&, Foam::Field<double>&, Foam::Field<double>&, Foam::Field<double>&, Foam::PtrList<Foam::Field<double> >&, Foam::PtrList<Foam::Field<double> >&, unsigned char) const at ??:?
[0] #5  Foam::GAMGSolver::solve(Foam::Field<double>&, Foam::Field<double> const&, unsigned char) const at ??:?
[0] #6  Foam::fvMatrix<double>::solveSegregated(Foam::dictionary const&) at ??:?
[0] #7  Foam::fvMatrix<double>::solve(Foam::dictionary const&) at ??:?
[0] #8  
[0]  at ??:?
[0] #9  
[0]  at ??:?
[0] #10  __libc_start_main in "
[0] #11  
[0]  at ??:?
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 152 on node  exited on signal 15 (Terminated).

Anyone with any ideas after seeing all my errors?

Thanks
Dale


#8

Hi Dale and thanks for reporting this!

Creating an issue for our engineers who will have a look at that :+1:

Best,

Jousef


#9

@jousefm

I think it may be possible that this transport notice simply precedes all error screen output after ANY error. If this is correct, then I really need help finding out why all 4 sim failures in this topic happened and how to fix them please … :cry: (I can share the project with you and point you to the error sims if you would like to take a crack at it :smile:)

I may have been fooled into thinking that the transport model in the step ‘Selecting incompressible transport model Newtonian’ was what the nasty notice in my Post #1 was referring to.
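
One way to test that idea on a saved log is to drop the Open MPI banner blocks and look at what is left. The sketch below is only an illustration: the function names are hypothetical, the sample is an abridged copy of the log in Post #1, and it assumes banners are always framed by a pair of dashed rules (a log with an unpaired rule would need different handling).

```python
import re

# Open MPI frames its help messages with long dashed rules.
RULE = re.compile(r"^-{20,}\s*$")
# OpenFOAM stack frames look like "[18] #3  Foam::GAMGSolver::scale(...) at ??:?".
FRAME = re.compile(r"^\[(\d+)\]\s+#(\d+)\s+(\S.*)$")


def strip_mpi_banners(lines):
    """Drop everything between paired dashed rules, keep the rest."""
    kept, inside = [], False
    for line in lines:
        if RULE.match(line):
            inside = not inside  # a rule opens or closes a banner
            continue
        if not inside:
            kept.append(line)
    return kept


def first_solver_frame(lines):
    """Return (rank, frame number, symbol) of the first Foam:: frame
    that is not part of the error handling itself, or None."""
    skip = ("Foam::error::printStack", "::sigHandler")
    for line in strip_mpi_banners(lines):
        m = FRAME.match(line)
        if m and "Foam::" in m.group(3) and not any(s in m.group(3) for s in skip):
            return int(m.group(1)), int(m.group(2)), m.group(3)
    return None


sample = """Reading/calculating face flux field phi
--------------------------------------------------------------------------
[[21079,1],1]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
Module: OpenFabrics (openib)
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
[0] #0  Foam::error::printStack(Foam::Ostream&) at ??:?
[0] #1  Foam::sigFpe::sigHandler(int) at ??:?
[0] #3  Foam::surfaceInterpolation::makeWeights() const at ??:?
""".splitlines()

result = first_solver_frame(sample)
print(result)
```

If the same banner also appears in logs of runs that finish normally, that would support reading it as a notice rather than the cause of the crash.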

Thanks,
Dale


#10

Hi Dale,

feel free to share the project with me - will check it after my exam tomorrow :slight_smile: If the engineers find something I will share it with you of course.

Cheers!

Jousef


#11

I think this is due to negative cell volumes, or cells that are inside out. The error messages show where the divergence occurs:

[0] #3  Foam::surfaceInterpolation::makeWeights() const at ??:?
[0] #4  Foam::surfaceInterpolation::weights() const at ??:?

This is while the solver is reading/calculating face flux field phi. Cell-centre values can’t be interpolated to face-centres on illegal faces. The solver stops while calling the makeWeights() function.

and

[18] #3  Foam::GAMGSolver::scale(Foam::Field<double>&, Foam::Field<double>&, Foam::lduMatrix const&, Foam::FieldField<Foam::Field, double> const&, Foam::UPtrList<Foam::lduInterfaceField const> const&, Foam::Field<double> const&, unsigned char) const at ??:?
[18] #4  Foam::GAMGSolver::Vcycle(Foam::PtrList<Foam::lduMatrix::smoother> const&, Foam::Field<double>&, Foam::Field<double> const&, Foam::Field<double>&, Foam::Field<double>&, Foam::Field<double>&, Foam::Field<double>&, Foam::Field<double>&, Foam::PtrList<Foam::Field<double> >&, Foam::PtrList<Foam::Field<double> >&, unsigned char) const at ??:?

This is while the linear system solver is trying to converge the pressure field; as you can see, the multi-grid solver is in its Vcycle.
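
To illustrate why the makeWeights() frames above can end in a floating point exception, here is a rough Python sketch of a linear interpolation weight of the form w = SfdNei / (SfdOwn + SfdNei), where each term is the magnitude of the face-area vector dotted with the vector between the face centre and a cell centre. This is my recollection of what OpenFOAM does and may differ between versions; the function and argument names are illustrative only. If a face has zero area, or both cell centres lie in the plane of the face, the denominator is zero. Python raises ZeroDivisionError in that case, which stands in for the SIGFPE that the solver traps.

```python
def dot(a, b):
    """Dot product of two 3-vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))


def sub(a, b):
    """Component-wise difference a - b."""
    return tuple(x - y for x, y in zip(a, b))


def face_weight(face_area, face_centre, owner_centre, neighbour_centre):
    """Weight given to the owner cell value when interpolating to the face."""
    sfd_own = abs(dot(face_area, sub(face_centre, owner_centre)))
    sfd_nei = abs(dot(face_area, sub(neighbour_centre, face_centre)))
    return sfd_nei / (sfd_own + sfd_nei)  # 0/0 on a degenerate face


# A well-formed face midway between two cell centres.
good = face_weight((1.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
print("weight on a regular face:", good)

# A collapsed face with zero area vector.
try:
    face_weight((0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
    degenerate_failed = False
except ZeroDivisionError:
    degenerate_failed = True
print("degenerate face raised an error:", degenerate_failed)
```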

I don’t think you can check mesh quality on SimScale yet @jousefm, right?


#12

Thanks, good to know, I will work harder on removing illegal faces.

My main question was about the OpenFabrics transport error that seems to be everywhere.


#13

Hi @dylan,

Only the normal mesh log (I think you know about that), which gives the user information on how good or bad the mesh is. You could also check the mesh with the wireframe option in Paraview, which would be more visual.

Cheers,

Jousef


#14

So visually, how does your eye know it is a good mesh in Paraview?

I am just coming to realize that a good visual indicator might be that neighboring cells do not differ in volume by more than 8 to 1 (a ratio of 0.125), as this is the volume ratio across a refinement level change (I assume that refinement level changes ARE ok). Even a 2D mesh view would sort of confirm < 4 to 1 area ratio changes (or 8 to 1 volume changes, as long as cells are generally ‘squarish’ in the missing dimension).
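
The 8 to 1 and 4 to 1 figures follow from simple arithmetic, assuming each refinement level halves the cell edge in all three directions (octree-style splitting of hex cells). A small sketch, with a hypothetical helper name:

```python
def refinement_ratios(levels_apart=1):
    """Size ratios between cells that are `levels_apart` refinement levels apart,
    assuming every level halves the edge length in each direction."""
    edge = 2 ** levels_apart
    return {"edge": edge, "area": edge ** 2, "volume": edge ** 3}


ratios = refinement_ratios(1)
print(ratios)
print("fine/coarse volume ratio:", 1 / ratios["volume"])
```

For neighbors one level apart this gives an edge ratio of 2, an area ratio of 4 and a volume ratio of 8, so the fine cell has 0.125 of the coarse cell's volume.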

Edit:

@jousefm

Also, with regard to mesh log quality info:
How can I make sure the mesh log always includes the last 10 lines we see here? (I have not seen these 10 lines very often):

Dale


#15

Hey @jousefm

I am aware of the mesh quality output from snappyHexMesh, but I was referring to the checkMesh utility in OpenFOAM. The output log from snappyHexMesh doesn’t tell us much or export problematic faces to a set. It is next to impossible to visually examine a mesh without knowing where those face sets are.
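
For anyone who downloads the mesh and has a local OpenFOAM installation, a minimal sketch of running checkMesh from Python could look like this. The helper names and the case path `myCase` are hypothetical; `-case`, `-allGeometry` and `-allTopology` are standard checkMesh options, and the exact checks and the face sets written depend on the OpenFOAM version.

```python
import shutil
import subprocess


def build_checkmesh_command(case_dir):
    """Command line for checkMesh with the extra geometry and topology checks."""
    return ["checkMesh", "-case", str(case_dir), "-allGeometry", "-allTopology"]


def run_checkmesh(case_dir):
    """Run checkMesh and return its text output, or None if it is not on PATH."""
    if shutil.which("checkMesh") is None:
        return None  # OpenFOAM environment not sourced in this shell
    done = subprocess.run(
        build_checkmesh_command(case_dir), capture_output=True, text=True
    )
    return done.stdout


command = build_checkmesh_command("myCase")
print(" ".join(command))
```

Calling `run_checkmesh("myCase")` in a shell where OpenFOAM is sourced should return the report; faces that fail a check are normally written to face sets inside the case, which can then be loaded next to the mesh in Paraview.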


#16

Hi @DaleKramer!

Visually, you can see if you messed up in the building/pre-processing part, for example when you write the lines in the OpenFOAM script to create a wedge section, or when you build the geometry in other software and then mesh it with SnappyHexMesh (SHM) or blockMesh (not available on SimScale). We open the built mesh in Paraview and look for any weird artifact that has to be fixed before proceeding with the simulation setup. If I have the time I will post an example to demonstrate what I mean :slight_smile:

@dylan: For that you’d have to download the mesh and check it inside OF, I am not aware of lines inside the log indicating the checkMesh utility. But very good point you bring up here - that’s good step-by-step material (will note it and create a small tutorial if time allows).

All the best,

Jousef