Numerical error in Kokkos-based 3d heat conduction examples

Issue #537 resolved
Dan Bonachea created an issue

I've discovered a numerical error in the Kokkos 3d heat conduction example: it fails to properly initialize the computational domain.

The easiest way to see the problem is to search for the variable T0, which is supposed to be the initial temperature of the elements in the 3d domain. This variable is initialized to zero and optionally parsed from the command line, but it is never applied to the computational domain!
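A stripped-down model of the defect, written in plain C++ rather than Kokkos so it stands alone: T0 is parsed from the command line, but nothing ever writes it into the domain. The "-T0" flag handling, the Domain alias, parse_T0, and init_domain are hypothetical stand-ins, not the example's actual code. std::vector<double>(n) is used because, like a Kokkos::View allocated with a label, it zero-initializes its elements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical stand-in for the 3D temperature field, stored as a flat array.
// Like a labeled Kokkos::View, std::vector<double>(n) zero-initializes it.
using Domain = std::vector<double>;

// Hypothetical parse of "-T0 <value>"; T0 defaults to 0.0 as in the example.
double parse_T0(int argc, char* argv[]) {
  double T0 = 0.0;
  for (int i = 1; i + 1 < argc; ++i)
    if (std::string(argv[i]) == "-T0") T0 = std::strtod(argv[i + 1], nullptr);
  return T0;
}

// The step the examples are missing: write the parsed T0 into every cell.
// Without this call the domain silently keeps its zero-initialized values.
void init_domain(Domain& T, double T0) {
  for (double& t : T) t = T0;
}
```

In this model the fix amounts to calling init_domain(T, parse_T0(argc, argv)) right after the domain is allocated.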

This error has been copied into all three examples in our repos that are based on this code:

  1. extras examples/kokkos_3dhalo/upcxx_heat_conduction.cpp
  2. extras examples/kokkos_3dhalo/host_upcxx_heat_conduction.cpp
  3. Impl example/prog-guide/rput-rpc.cpp

The pure Kokkos upstream version and the first two copies above (which use Kokkos) all allocate the computational domain with T = Kokkos::View<double***>(...), which IIUC zero-initializes the data. As a result, any T0 input is ignored and the initial temperature is unconditionally zero. The third example (from the Programming Guide) allocates the computational domain with T = new double[(hi-lo)*X*X] and never explicitly initializes it, so it computes from a starting point of uninitialized (garbage) data.
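For the raw-pointer version, the difference comes down to C++ new-expression semantics: new double[n] leaves the elements with indeterminate values, new double[n]() value-initializes them to zero, and only an explicit fill applies T0. The sketch below assumes the flat (hi-lo)*X*X layout quoted above; alloc_domain is a hypothetical helper, not the actual fix merged for rput-rpc.cpp. In the Kokkos versions, the analogous explicit step would presumably be a Kokkos::deep_copy(T, T0) after the View is allocated.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: allocate the local slab [lo, hi) of an X*X*X domain
// and set every cell to the initial temperature T0.
double* alloc_domain(int lo, int hi, int X, double T0) {
  assert(hi > lo && X > 0);
  const std::size_t n = static_cast<std::size_t>(hi - lo) * X * X;
  // new double[n]   -> elements are indeterminate (the original bug)
  // new double[n]() -> elements are zero, which still ignores T0
  double* T = new double[n];
  std::fill_n(T, n, T0);  // explicit initialization actually applies T0
  return T;
}
```

The caller owns the returned array and must release it with delete[].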

Actions I think we need to take:

  1. I'm fixing example/prog-guide/rput-rpc.cpp immediately, since it is the most egregious problem and has a straightforward solution. I will submit a PR with the fix soon.
  2. We should report this defect in the Kokkos tutorial example to the upstream Kokkos developers. There are at least two possible solutions:
    1. use the T0 parameter as documented to initialize the domain, or
    2. remove the T0 parameter, which currently does nothing, fixing the documentation but preserving the current behavior.
  3. We should eventually patch both of our ported Kokkos examples to match whatever fix is applied upstream. If we end up adopting "solution 1", then our solution validation will also need to be adjusted whenever T0 is changed.
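If "solution 1" is adopted, one minimal shape for a T0-aware check is to derive the expected value from T0 rather than hard-coding a zero initial state. This sketch assumes validation compares the domain-average temperature against a reference; average_temperature and initial_state_ok are hypothetical names, not the examples' actual validation code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Mean temperature over a flat domain.
double average_temperature(const std::vector<double>& T) {
  assert(!T.empty());
  return std::accumulate(T.begin(), T.end(), 0.0) / static_cast<double>(T.size());
}

// Hypothetical validation hook: the reference value comes from T0
// instead of assuming the domain always starts at zero.
bool initial_state_ok(const std::vector<double>& T, double T0,
                      double rel_tol = 1e-12) {
  const double avg = average_temperature(T);
  return std::fabs(avg - T0) <= rel_tol * std::max(1.0, std::fabs(T0));
}
```

The relative tolerance scales with |T0| so the same check works for both small and large initial temperatures.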

On a related note, I'm finding that when T0 is properly applied to the domain, values larger than about 1.7 lead to numerical overflow in the average temperature, which might indicate a secondary numerical problem (probably in compute_T()).
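One hedged way to narrow down where the overflow starts is to record the average temperature after each time step and report the first non-finite value. first_nonfinite_step is a hypothetical helper, not part of the example, and it only locates the symptom; it does not diagnose compute_T() itself.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical diagnostic: given the average temperature recorded after each
// time step, return the index of the first non-finite value (inf or NaN),
// or -1 if every value is finite.
long first_nonfinite_step(const std::vector<double>& averages) {
  for (std::size_t i = 0; i < averages.size(); ++i)
    if (!std::isfinite(averages[i])) return static_cast<long>(i);
  return -1;
}
```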

Comments (5)

  1. Dan Bonachea reporter

    PR 408 has been merged, resolving the copy in example/prog-guide/rput-rpc.cpp.

    @Daniel Waters will handle the remaining tasks.

  2. Daniel Waters

    I told the Kokkos team about the bug in their upstream code, and they asked me to create a pull request with the fix. The same fix is mirrored in our kokkos_3dhalo example.
