Numerical error in kokkos-based 3d heat conduction examples
I've discovered a numerical error in the Kokkos 3d heat conduction example: it fails to properly initialize the computational domain.
The easiest way to see the problem is to search for the variable `T0`, which is supposed to be the initial temperature of elements in the 3d domain. This variable is initialized to zero and optionally parsed from the command line, but is never applied to the computational domain!
This error has been copied to all three examples in our repos based on this code:
- extras examples/kokkos_3dhalo/upcxx_heat_conduction.cpp
- extras examples/kokkos_3dhalo/host_upcxx_heat_conduction.cpp
- Impl example/prog-guide/rput-rpc.cpp
The pure Kokkos upstream version and the first two copies above (both using Kokkos) all allocate the computational domain using `T = Kokkos::View<double***>(...)`, which IIUC zero-initializes the data. The effect is that any `T0` input is ignored and the initial temperature is unconditionally forced to zero. The third example (from the Programming Guide) allocates the computational domain using `T = new double[(hi-lo)*X*X]` and never explicitly initializes it, which means it computes from a starting point of indeterminate (garbage) data.
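For the raw-array case, the missing step amounts to filling the allocation with `T0` before the first time step. The following is a minimal sketch under stated assumptions, not the code from any of the PRs: `make_domain` is a hypothetical helper name, and `lo`, `hi`, `X` and `T0` are taken to mean the same as in the allocation expression above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical helper (not from the example sources): allocate a slab of
// (hi - lo) planes of X*X cells and set every cell to the initial
// temperature T0.
std::unique_ptr<double[]> make_domain(std::size_t lo, std::size_t hi,
                                      std::size_t X, double T0) {
  assert(hi >= lo);
  const std::size_t n = (hi - lo) * X * X;
  // `new double[n]` alone default-initializes the elements, which for
  // double means their values are indeterminate.
  std::unique_ptr<double[]> T(new double[n]);
  // Apply the initial temperature explicitly to every cell.
  std::fill_n(T.get(), n, T0);
  return T;
}
```

For the Kokkos-based copies, one option would be `Kokkos::deep_copy(T, T0)` right after the view is allocated, since `deep_copy` accepts a scalar fill value. Whether that matches the eventual upstream fix is an open question here.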
Actions I think we need to take:
- I'm fixing example/prog-guide/rput-rpc.cpp immediately, which is the most egregious problem and has a straightforward solution. I will PR the solution soon.
- We should report this defect in the Kokkos tutorial example to the upstream Kokkos folks. There are at least two possible solutions:
  - use the `T0` parameter as documented to initialize the domain, or
  - remove the `T0` parameter that currently does nothing, fixing the documentation but preserving current behavior.
- We should eventually patch both our ported Kokkos examples to match whatever fix is applied upstream. If we end up adopting "solution 1" then our solution validation also needs to be adjusted when `T0` is changed.
On a related note, I'm finding that setting `T0` to properly initialize the domain with a value larger than about 1.7 leads to numerical overflows in the average temperature, which might indicate a secondary numerical problem (probably in `compute_T()`).
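One way to narrow down where the overflow first appears is to check the average for non-finite values after each time step. This is only a diagnostic sketch: `average_temperature` and `average_is_finite` are hypothetical helpers, and the domain is assumed to be available as a flat host-side `std::vector<double>` rather than the examples' actual storage.

```cpp
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical diagnostic: mean temperature over a flat copy of the domain.
double average_temperature(const std::vector<double>& T) {
  if (T.empty()) return 0.0;
  return std::accumulate(T.begin(), T.end(), 0.0) /
         static_cast<double>(T.size());
}

// Returns true while the average is a finite number; false once the sum
// has overflowed to infinity or a NaN has entered the domain.
bool average_is_finite(const std::vector<double>& T) {
  return std::isfinite(average_temperature(T));
}
```

Calling `average_is_finite` after every step and stopping at the first `false` gives the step at which the values blow up, which can then be compared across different `T0` settings.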
Comments (5)
- reporter: PR 408 deploys a proposed fix to example/prog-guide/rput-rpc.cpp. I'd like @Daniel Waters to handle the remaining tasks on this issue.
- reporter: PR 408 is merged, resolving the copy in example/prog-guide/rput-rpc.cpp. @Daniel Waters will handle the remaining tasks.
- I told the Kokkos team about the bug in their upstream code and they told me to create a pull request with the fix, which is mirrored in our `kokkos_3dhalo` example.
- reporter: Proposed solution in extras PR 37.
- reporter: changed status to resolved. Extras examples fixed in extras PR 37, merged at 7b57ffe.