Provide upcxx::is_init()?

Issue #9 resolved
Amir Kamil created an issue

This is a summary of our discussion on initialization and finalization in the UPC++ meeting on 5/6/15 and is a fork of issue #3.

1. Implicit calls to upcxx::init() and upcxx::finalize()

We'd like to remove the requirement that the user explicitly call init() and finalize() in main(). However, we cannot do so until GASNet allows us to initialize it without passing it &argc and &argv. Once this restriction is removed, we could implicitly call UPC++ init() and finalize() by declaring a static object that calls init() in its constructor and finalize() in its destructor. However, C++ provides us no way to guarantee that this constructor is the first to run and that the destructor is the last to run. (There is a std::atexit() function to queue things to run on program termination, but there is no way to guarantee that they run after all destructors.)

2. Initialization semantics

Regardless of whether init() is called implicitly or explicitly, we need to define what operations are legal before initialization. (If initialization is implicit, then we would specify that initialization happens at some arbitrary point before main, and what operations are legal before entering main.) So far, the operations we think are important are declaring shared variables and arrays and being able to query the number of ranks and the rank ID.

a) Shared variables and arrays

Currently, we queue initialization of shared variables and actually initialize them once init() is called. For shared arrays, we require the user to explicitly initialize them at some later point, but we think we can queue their initialization as well.

b) Ranks and rank IDs

We don't want to have to check for initialization when calling myrank() or ranks(). (Currently, the functions aren't implemented efficiently, but they can be implemented as just reads.) Instead, we can provide special functions to be used before main that check for and perform initialization, but tell users to otherwise use myrank() and ranks() for performance. Suggestions on function names are welcome. (myrank_preinit() and ranks_preinit()?)
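
As a rough sketch of this split (not actual UPC++ code): everything in the hypothetical sketch namespace below is a placeholder, and init() is a stand-in that pretends the job has a single rank rather than querying GASNet. The fast accessors are plain reads, while the _preinit variants check for and perform initialization first.

```cpp
namespace sketch {  // hypothetical stand-in, not the UPC++ API

// Runtime state. A real runtime would fill these in from GASNet.
static bool     g_initialized = false;
static unsigned g_myrank = 0;
static unsigned g_ranks  = 0;  // 0 means "not yet known"

// Stand-in for initialization: pretend this is a single-rank job.
void init() {
  if (g_initialized) return;
  g_myrank = 0;
  g_ranks = 1;
  g_initialized = true;
}

// Fast paths: plain reads, meaningful only after initialization.
unsigned myrank() { return g_myrank; }
unsigned ranks()  { return g_ranks; }

// Slow paths for code that may run before main(): initialize on demand.
unsigned myrank_preinit() { init(); return g_myrank; }
unsigned ranks_preinit()  { init(); return g_ranks; }

}  // namespace sketch
```

Under this sketch, a static constructor that runs before main would call ranks_preinit(), while code inside main keeps using ranks() and pays no check.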

c) Arbitrary code

Finally, we can provide the ability for the user to queue arbitrary code to run upon initialization, using a call such as upcxx::atinit(). We would specify that functions registered using atinit() would run sequentially in the order they are registered once UPC++ is initialized (which may be at some arbitrary point before main if initialization is implicit). If UPC++ is already initialized when atinit() is called, then the given function would run immediately.
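
The semantics described for atinit() could be sketched roughly as follows. This is a single-threaded illustration in a hypothetical sketch namespace, with init() standing in for the real UPC++ initialization; none of it is existing UPC++ code.

```cpp
#include <functional>
#include <utility>
#include <vector>

namespace sketch {  // hypothetical stand-in, not the UPC++ API

static bool g_initialized = false;

// Function-local static, so the queue exists no matter which static
// initializer calls atinit() first.
static std::vector<std::function<void()>>& pending() {
  static std::vector<std::function<void()>> queue;
  return queue;
}

// Run fn immediately if already initialized; otherwise queue it.
void atinit(std::function<void()> fn) {
  if (g_initialized) {
    fn();
  } else {
    pending().push_back(std::move(fn));
  }
}

// Stand-in for initialization: mark the runtime as up, then run the
// queued functions sequentially in registration order.
void init() {
  if (g_initialized) return;
  g_initialized = true;
  for (auto& fn : pending()) fn();
  pending().clear();
}

}  // namespace sketch
```

Because the flag is set before the queue is drained, a queued function that itself calls atinit() runs its argument immediately rather than appending to the queue being iterated.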

3. upcxx::is_init() and multiple calls to upcxx::init()/finalize()

For composability, Dan suggested in issue #3 either introducing an is_init() or reference counting calls to init()/finalize(). I actually think both should be provided. is_init() is useful to determine what UPC++ operations are legal to use as defined by UPC++ initialization semantics. Allowing multiple calls to init()/finalize() would allow us to introduce implicit initialization and finalization without breaking existing code.
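
A minimal sketch of how the two could fit together, assuming a single-threaded program and a hypothetical sketch namespace. The g_startups and g_shutdowns counters are stand-ins for the real one-time GASNet startup and shutdown; this is an illustration of the idea, not the UPC++ implementation.

```cpp
namespace sketch {  // hypothetical stand-in, not the UPC++ API

static int g_init_depth = 0;  // init() calls not yet matched by finalize()
static int g_startups   = 0;  // times the runtime was actually started
static int g_shutdowns  = 0;  // times the runtime was actually shut down

void init() {
  // Only the outermost call starts the runtime.
  if (g_init_depth++ == 0) ++g_startups;
}

void finalize() {
  if (g_init_depth == 0) return;  // unmatched finalize(): ignored in this sketch
  // Only the call that matches the outermost init() shuts the runtime down.
  if (--g_init_depth == 0) ++g_shutdowns;
}

// True between the outermost init() and its matching finalize().
bool is_init() { return g_init_depth > 0; }

}  // namespace sketch
```

With this shape, an implicit init()/finalize() pair issued by the runtime would simply wrap the explicit pair in existing user code, and utility code could check is_init() before using UPC++ operations.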

4. Finalization/exit semantics

Currently, calling upcxx::finalize() (whether explicitly now or implicitly in the future) results in program termination without performing the cleanup steps required by C++. In particular, C++ specifies that destructors be called on thread-local and static objects, functions registered by calling std::atexit() be called, C streams be flushed and closed, and files created with std::tmpfile be removed (see here for more details). None of these steps are done by UPC++. Moving to implicit finalization would result in an unpredictable subset of destructors and functions registered through atexit() being run before exit.

We have not been able to come up with a solution to this issue. Ideally, on normal termination, we'd call a barrier after C++ cleanup completes and then call GASNet exit. (The barrier is necessary to ensure that cleanup has completed on all ranks, since GASNet exit will cause all ranks to exit.) However, there don't appear to be any standard C++ hooks for running code after cleanup. Even if there were, we'd need some way of distinguishing between collective and non-collective termination, since calling a barrier in the latter case could lead to deadlock. However, this should be doable.

It seems like at the moment, we have to specify that the standard C++ cleanup process is not guaranteed to run in UPC++, which is unfortunate. Adding the ability to queue functions with upcxx::atexit() might help mitigate this, but it's not clear we can guarantee that such functions run in the case of non-collective termination.

5. Non-collective termination

UPC++ should provide a upcxx::exit() call for non-collective termination of a program. However, the issues with finalization occur here as well, and in fact may be worse since we wouldn't be able to use a barrier to ensure cleanup completion.

Comments (15)

  1. Yili Zheng

    Re: Finalization/exit semantics

    Should this be fixed in GASNet? Why does GASNet have to kill any process in a normal shutdown? This seems to cause trouble for any C/C++ program that requires some cleanup during a normal exit.

  2. Amir Kamil reporter

    I've been looking into this in more detail, so here are some updated thoughts about these issues.

    1. Implicit calls to upcxx::init() and upcxx::finalize()

    2. Initialization semantics

    Until GASNet provides the ability to call gasnet_init() without the command-line arguments, we cannot implement implicit calls to upcxx::init(). However, we can implement implicit calls to upcxx::finalize(). This means that we currently can only allow declarations of shared arrays and variables and provide upcxx::atinit(). We may not be able to provide myrank_preinit() and ranks_preinit().

    Another useful feature to have before main is the ability to allocate in-segment memory. This includes upcxx::allocate() as well as constructing multidimensional arrays/grids, which calls allocate() implicitly. Unfortunately, I see no way of providing the former prior to initialization. Grids may be able to use the same mechanism as shared variables.

    The C/C++ specs guarantee that static variables within a compilation unit are initialized in the order in which they appear in the source. There is no standard mechanism for specifying order between compilation units. In fact, at least for GCC, ordering between compilation units is controlled by the linker/loader and not by GCC itself.

    That being said, I think we can actually guarantee initialization of UPC++ prior to user code that calls into UPC++. The trick is to declare a static variable in upcxx.h that checks if UPC++ is initialized and calls init() if it is not. In that case, we can also register an atexit() handler (manually or through a destructor) to call finalize(). C++ guarantees that exit handlers and destructors are run in the reverse order of registration or construction, so this would ensure that finalize() is called after any destructor that might use UPC++. With this trick, any code that uses UPC++ features may safely do so at any point after including upcxx.h.

    Of course, the above can only be implemented if we can call gasnet_init() without command-line arguments.
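
    The header trick could look roughly like the single-file sketch below. The sketch namespace, auto_init, and user_global are hypothetical names, and init() only flips a flag instead of calling gasnet_init(), so this shows the ordering argument rather than a working runtime.

    ```cpp
    // ---- contents of a hypothetical "runtime.h" ----
    #include <cstdlib>

    namespace sketch {  // hypothetical stand-in, not the UPC++ API

    // One flag shared by the whole program (function-local static in an
    // inline function, so every translation unit sees the same object).
    inline bool& initialized_flag() {
      static bool flag = false;
      return flag;
    }

    // Counts how many times initialization really happened.
    inline int& init_calls() {
      static int calls = 0;
      return calls;
    }

    inline void finalize() { initialized_flag() = false; }

    inline void init() {
      if (initialized_flag()) return;  // another translation unit got here first
      initialized_flag() = true;
      ++init_calls();
      // Registered before any later static object finishes construction,
      // so it runs after the destructors of those objects.
      std::atexit(finalize);
    }

    // Each translation unit that includes the header gets its own copy of
    // this object, defined before any user code that follows the #include.
    struct auto_init {
      auto_init() { init(); }
    };
    static auto_init auto_init_instance;

    }  // namespace sketch

    // ---- user code, after the #include ----
    struct user_global {
      bool saw_runtime;
      user_global() : saw_runtime(sketch::initialized_flag()) {}
    };
    // Defined after auto_init_instance in this translation unit, so its
    // constructor runs after the runtime has been initialized.
    user_global g_user;
    ```

    The guarantee relies on both variables having ordered initialization within the same translation unit.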

    4. Finalization/exit semantics

    Currently, upcxx::finalize() does not actually call gasnet_exit(). So UPC++ execution terminates by falling off main, and gasnet_exit() is implicitly called using an exit handler. Since this handler isn't registered until upcxx::init() is called, which must be done explicitly at the moment, it will be among the first to run post main. This means that if GASNet does not ensure C/C++ exit handlers and destructors are run, then nearly all such handlers will not actually run before termination.

    We really do need some mechanism for ensuring that C++ termination happens properly in the case of normal termination. As Yili says, I think we have to push GASNet to provide this.

  3. Yili Zheng

    I think Amir's idea about implicit init and finalize is very nice and can improve user experience significantly. We can remove init and finalize from the spec. We will no longer need myrank_preinit() kind of thingies because UPC++ will always be automatically initialized before any UPC++ feature is used. And there is no need to check is_init because UPC++ is guaranteed to be inited from the user's perspective!

    BTW, I remember that this is what Cray co-array C++ does too.

  4. Amir Kamil reporter

    In the UPC++ meeting on 7/1/15, we discussed whether or not static initialization is a collective context. The fundamental question is whether or not we can guarantee that initialization order is consistent across all processes in a job. Since all processes run the same executable, the expectation is that the compiler, linker, and loader ensure that every time the same executable is run, initialization is done in a consistent manner. However, as far as we can tell, this consistency is not guaranteed by any spec.

    The C++ spec states that initialization of different translation units is unordered, meaning that it is done in some indeterminate sequence. In addition, initialization of static class template data members is unordered, since they are defined at the same source location, unless such members are explicitly specialized. Initialization of thread-local variables in different translation units is unsequenced, meaning that they can be initialized concurrently.

    That being said, it is unlikely that a compiler would produce an initialization order that is non-deterministic (excepting thread-local variables) between program runs. Similarly, the linker and loader for statically linked object files are also unlikely to cause non-deterministic ordering. However, it’s not clear whether or not this is the case for shared objects, and dynamically loading multiple libraries may result in inconsistent ordering between them if the loader performs lazy initialization.

    The possibility of inconsistent initialization is troubling, as it breaks our entire initialization/finalization scheme and semantics for what’s allowed in an initialization context. At a minimum, we’d have to outlaw collective operations in such a context, force manual initialization and finalization of UPC++, and require that shared objects be manually initialized after entry to main. We would not be able to defer and automatically initialize shared objects, since we would have no way of aligning them and determining a consistent order ourselves. (For example, if process 0 declares shared_var<int> A and then shared_var<int> B, but process 1 does the reverse, we would not be able to detect this in the runtime and fix the order.)

    If we could partially initialize UPC++/GASNet automatically in a non-collective manner, we might be able to enable non-collective UPC++ calls, such as to ranks() or myrank(). To do anything that requires memory allocation, such as construct a multidimensional array, would require the ability to set up the registered segment, and it’s not clear to me that this can be done non-collectively in the presence of PSHM.

    The alternative is to just assume and/or require that UPC++ be initialized in a consistent order across all processes. We’d have to ensure that this is the case on all systems of interest, or that the user has some mechanism to ensure so (e.g. in the case of dynamically loading UPC++ with a lazy loader, require that the user immediately call into UPC++ after loading). Yili and Paul have been looking into whether or not it’s possible to get an inconsistent order on current systems, and here are their reports:

    Yili: A quick update on the discussion of whether the OS loader may initialize static objects in different compilation units in random order when running an executable in SPMD fashion. I’ve been thinking about this and did experiments on both Mac OS and Linux with three different compilers (Intel, GCC, and Clang) using pure C++ programs (no GASNet or UPC++). For both statically linked executables and dynamically linked executables with shared libraries, the order of static object initialization for different processes is always identical. Thinking about it a bit more, I’m convinced that this is actually the expected behavior for most users because the same executable should give reproducible outputs when running multiple times or in multiple instances with the same input and same environment. Even for dynamically loading use cases, e.g., in Python, I think there are a couple of solutions, but let’s focus on one problem at a time. In summary, I think it’s reasonable to assume that the order of static object initialization of the same executable on the same platform is deterministically consistent.

    Paul: I can report that I failed in my attempts on Linux and Mac OSX to induce constructors from different shared libs to run in an order dependent on which one gets called first. This failure is additional evidence to support your view that it is safe to assume the order is deterministic.

    So it seems that at the moment, our inclination is to go forward with assuming consistent ordering of initialization across all processes. We should fully document this, however, in case inconsistent behavior arises in the future.

  5. Amir Kamil reporter

    We revisited this issue once again in the UPC++ meeting on 7/22/15. Since our last discussion, we've run into more issues with initialization of libraries that UPC++ depends on. Libraries that initialize automatically through dynamic initialization but provide no way to manually initialize are problematic, since we can't guarantee that they are initialized before UPC++ is. UPC++ itself is not prone to this problem, since we place the initializer in the upcxx.h header, ensuring that UPC++ is always initialized before use. The libraries that are causing us problems apparently place their initializers in their source rather than header files, so that their initialization is located in a different translation unit and is thus ordered arbitrarily with respect to UPC++ code.

    As a result, we decided in the UPC++ meeting to go back to manual initialization in main. While this is not ideal for users, it's the only way we can ensure that UPC++ and its dependencies are properly initialized on new systems with a minimal amount of development effort.

    Shortly after the meeting, we discovered a hole in our reasoning of initialization order when it comes to static class template members. Full details can be found at https://bitbucket.org/upcxx/upcxx/commits/559cf7dac1ab21abf9e233431d0ce19c23fd58a0#general-comments, but to summarize, the C++ spec states the following in Section 3.6.2:

    Dynamic initialization of a non-local variable with static storage duration is either ordered or unordered. Definitions of explicitly specialized class template static data members have ordered initialization. Other class template static data members (i.e., implicitly or explicitly instantiated specializations) have unordered initialization. Other non-local variables with static storage duration have ordered initialization. Variables with ordered initialization defined within a single translation unit shall be initialized in the order of their definitions in the translation unit... Otherwise, the unordered initialization of a variable is indeterminately sequenced with respect to every other dynamic initialization.

    Thus, initialization of non-specialized class template static data members is arbitrarily ordered with respect to other initialization. This means that it is legal for a C++ compiler to initialize such members before running initializers that appear before them in source order in the same translation unit. And in fact, we've observed an example of Clang doing so. So we cannot actually guarantee static class template data members that use UPC++ features are initialized after UPC++.
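
    The hazard can be illustrated with a small stand-alone sketch. All names here (tracer, holder, runtime_init, user_global) are hypothetical, and the tracer object declared first stands in for the UPC++ initializer from upcxx.h. The only ordering the language guarantees below is that runtime_init is initialized before user_global; the template member may land anywhere relative to them.

    ```cpp
    #include <string>
    #include <vector>

    // Log of initializer runs. Function-local static, so it exists on first use.
    std::vector<std::string>& init_log() {
      static std::vector<std::string> log;
      return log;
    }

    // Records its name when constructed.
    struct tracer {
      explicit tracer(const char* name) { init_log().push_back(name); }
    };

    // Ordered initialization; first in the translation unit.
    // Stands in for the runtime initializer pulled in from a header.
    static tracer runtime_init("runtime");

    template <typename T>
    struct holder {
      static tracer member;  // class template static data member
    };

    // Not explicitly specialized, so its initialization is unordered:
    // it is allowed to run before runtime_init despite appearing after it.
    template <typename T>
    tracer holder<T>::member("holder<T>::member");

    // Ordered initialization; guaranteed to run after runtime_init.
    static tracer user_global("user_global");

    // Explicit instantiation, which also instantiates holder<int>::member.
    template struct holder<int>;
    ```

    Printing init_log() from main shows the order a particular compiler chose; the position of holder<T>::member in that log is not something portable code can rely on.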

    The end result is that it seems we have no choice but to revert to manual initialization and finalization. This also means that we have to define what is legal before and after UPC++ is initialized in main. We've also chosen to require that shared objects, including shared_arrays and shared_vars, be manually initialized after UPC++ initialization in order to avoid the inconsistent initialization issue discussed in the previous comment. Though we haven't observed inconsistent initialization, this guarantees that we will not run into it on future platforms since it makes it the user's responsibility to properly order initialization of shared objects. And of course, initialization of shared objects should be defined as collective operations.

    As for what is legal outside of main, it is illegal to call any UPC++ function (including ranks() and myrank()) before initialization or after finalization. We expect it to be legal to declare UPC++ types, but we need to ensure that the default constructor for each type does not do anything illegal, and we need to specify for each type which constructors, if any, other than the default are legal to call outside of main. We also should audit all UPC++ types to ensure that the set of constructors they provide makes sense under the new initialization scheme, and decide what initialization/finalization mechanisms to provide post-entry into and prior to exit from main.

  6. BrianS

    Standard-compliant static initialization is tricky, and gets trickier with dynamic libraries. For V1.0 John, Bryce and I discussed init at length.

    init and finalize need push/pop semantics. For each package that uses UPC++ there will be an init, and there are an equal number of finalize calls, such that gasnet is initialized just once, and shut down just once.

    we also need

    static bool upcxx::initialized();
    static bool upcxx::finalized();

    There was a proposal to use RAII style for the upcxx runtime.
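
    A rough sketch of what the RAII style might look like on top of push/pop init and finalize. The sketch namespace and the runtime_scope class are hypothetical names, and init()/finalize() here only adjust a depth counter instead of touching gasnet.

    ```cpp
    namespace sketch {  // hypothetical stand-in, not the UPC++ API

    static int g_depth = 0;  // init() calls not yet matched by finalize()

    void init()     { ++g_depth; }
    void finalize() { if (g_depth > 0) --g_depth; }
    bool is_init()  { return g_depth > 0; }

    // RAII guard: the constructor pushes an init(), the destructor pops the
    // matching finalize(), including during exception unwinding.
    class runtime_scope {
     public:
      runtime_scope() { init(); }
      ~runtime_scope() { finalize(); }

      // Non-copyable, so each guard accounts for exactly one init/finalize pair.
      runtime_scope(const runtime_scope&) = delete;
      runtime_scope& operator=(const runtime_scope&) = delete;
    };

    }  // namespace sketch
    ```

    Each package would declare a runtime_scope at the top of the region where it needs the runtime, so the number of finalize calls always equals the number of init calls.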

  7. Dan Bonachea

    I believe this issue is mostly resolved in the current 1.0 spec and implementation.

    The main outstanding question/proposal is whether to provide a upcxx::is_init() function, to allow utility-type code to be written generically for correct operation either inside or outside init (or to assert the caller has initialized UPCXX, if the utility has such a precondition).

  8. BrianS

    In my experience, building and running future.cpp as the Programmer's Guide directs results in upcxx::barrier being called even though init was not called. That was because I built future.cpp the way the Programmer's Guide suggests:

    clang++ -std=c++11 $($upcxx PPFLAGS) $($upcxx LDFLAGS) $($upcxx LIBFLAGS) future.cpp -o future.exe
    

    Since this compile line includes the PPFLAG -DUPCXX_BACKEND=gasnet1_seq, the wrong util code is included, which invokes upcxx::barrier, and the program's assert fails.

    So it seems we still need upcxx::initialized to protect such backend calls in utility code.

  9. BrianS

    proposal

    uint32 upcxx::initialized();
    

    Preconditions: None.

    Returns the number of calls to upcxx::init() that have not been closed out by a matching call to upcxx::finalize(). For many users it is sufficient to know that the result is either zero (there is no upcxx runtime active) or non-zero (upcxx is running). The function is callable from any persona.

    UPC++ progress level: none

    we will need to figure out how to make this function safely reentrant.

  10. BrianS

    Re: init depth

    I was thinking it would come up more in how we configure our own tests, and when we start building utility layers on top of upcxx. For now all I would need is a bool, but I’m not sure if that is tying our hands further down the road. Our spec does discuss the multiple init/finalize nature of upcxx. If we expect users to correctly finalize programs (say, in their own library’s abort function) then they either have to know how much to really finalize, or we also need an abort function, like MPI has, or we need to allow them to throw an exception and unwind to the levels of the call stack that can execute their own finalize as the exception propagates. Querying the scope depth should be something we put in a test program when we implement scoped initialization.

    From a consistency perspective, if we have spec'd an init/finalize scope design, then a user deserves the ability to query this state. This is akin to Scott asking for global_ptr<T>::where, although I can readily think of cases where I want to know where, such as when a design wants to send a function to the data instead of bringing the data to the function.

  11. Paul Hargrove

    Unless I am misunderstanding the intent, I think I disagree with exposing the "init depth".
    In the case of any composition of multiple code modules using upc++, one cannot assume that one's own code is the only one to have called init. So, asking how many times the library has been initialized cannot safely be used to make a corresponding number of fini calls.

    Bottom line: I think we should expose a Boolean, not a count.
