Jeff Squyres committed b2dbb89

Fixes #3642: Move r28658 to v1.7 branch

---svn-pre-commit-ignore-below---

r28658 [[BR]]
Per an off-list discussion, it appears possible for a system to report failure when executing getpwuid. There are several reasons for this error to occur, most notably if the system uses a network-based authentication protocol (e.g., NIS) and that system gets overwhelmed when we launch on a lot of nodes.

There is no good way to recover from this scenario, and from past experience, using the user's name in the session directory (as opposed to the uid) is very helpful when things go wrong. So print a help message when this happens (it is extremely rare, but has happened at least once now) and return an error.

cmr:v1.7.3,reviewer=jsquyres
cmr:v1.6.5,reviewer=jsquyres

r28662 [[BR]]
Tweaked the help message a bit (this is the end result of iterating on
the message in email among Mike, Ralph, and Jeff).

Add this to CMR #3642 and #3643.


Files changed (2)

orte/runtime/help-orte-runtime.txt

 a different location to be used (use -h to see the cmd line option), or
 simply let the system pick a default location.
 #
+[orte:session:dir:nopwname]
+Open MPI was unable to obtain the username in order to create a path
+for its required temporary directories.  This type of error is usually
+caused by a transient failure of network-based authentication services
+(e.g., LDAP or NIS failure due to network congestion), but can also be
+an indication of system misconfiguration.
+
+Please consult your system administrator about these issues and try
+again.
 #
 [orte_nidmap:too_many_nodes]
 An error occurred while trying to pack the information about the job. More nodes

orte/util/session_dir.c

     if (NULL != pwdent) {
         user = strdup(pwdent->pw_name);
     } else {
-        if (0 > asprintf(&user, "%d", uid)) {
-            return ORTE_ERR_OUT_OF_RESOURCE;
-        }
+        orte_show_help("help-orte-runtime.txt",
+                       "orte:session:dir:nopwname", true);
+        return ORTE_ERR_OUT_OF_RESOURCE;
     }
 #else 
     if (!GetUserName(info_buf, &info_buf_length)) {
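
The behavior change in the session_dir.c hunk above (fail loudly when getpwuid returns NULL instead of silently falling back to the numeric uid) can be illustrated outside of ORTE with a small standalone sketch. This is not the Open MPI code: the function name lookup_username and the LOOKUP_* status codes are hypothetical, and a plain fprintf stands in for orte_show_help. Only getpwuid, strdup, and fprintf are real library calls here.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <pwd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical status codes for this sketch; these are not ORTE constants. */
enum { LOOKUP_OK = 0, LOOKUP_FAILED = -1 };

/*
 * Look up the login name for uid and store a heap-allocated copy in
 * *user_out. On any failure, *user_out is left NULL and LOOKUP_FAILED is
 * returned; there is deliberately no fallback to the numeric uid.
 * The caller owns the returned string and must free() it.
 */
static int lookup_username(uid_t uid, char **user_out)
{
    struct passwd *pwdent;

    assert(NULL != user_out);
    *user_out = NULL;

    pwdent = getpwuid(uid);
    if (NULL == pwdent) {
        /* Stand-in for the help message shown by the real code. */
        fprintf(stderr,
                "Unable to obtain the username for uid %lu; this is often a\n"
                "transient failure of a network-based authentication service.\n",
                (unsigned long) uid);
        return LOOKUP_FAILED;
    }

    *user_out = strdup(pwdent->pw_name);
    if (NULL == *user_out) {
        /* strdup failed to allocate memory. */
        return LOOKUP_FAILED;
    }
    return LOOKUP_OK;
}
```

A caller would invoke lookup_username(getuid(), &user), stop session directory setup if the result is not LOOKUP_OK, and free(user) once the path has been built. Whether a given uid resolves depends on the host's passwd/NSS configuration, so no particular output is assumed here.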