fatal: Out of memory? mmap failed: No such device

Issue #30 resolved
Aaron Bartell created an issue

I have installed ibmichroot on a customer's machine. I created five custom chroot environments, one for each developer. Each environment is Node.js + Git.

Git worked great for a couple of weeks for all developers, and then two days ago it started throwing the error message below in each and every chroot environment, including one that hasn't been in use since we originally created it.

fatal: Out of memory? mmap failed: No such device

I've had this issue before with other customers and it was remedied by creating chroot environments (seclusion and selection of exact binaries and libs), so now I am scratching my head on how five separate chroot environments all started getting the same Git error at the same time.

This occurs for all Git commands (e.g. git init in a new folder, git status for an existing repo, etc.) in all the chroot environments. Further, Git is not installed outside of chroot (shouldn't make a difference, but wanted to note).

Google searches have turned up a few suggestions, but none of them have resolved the issue:

- removing the ~/.gitconfig folder
- git gc
- git fsck

Thoughts on how to further debug this?

Comments (61)

  1. Former user Account Deleted

    We risk public debugging by water cooler ... cool ... how about them republicans ...

    So, taken face value, git tripped over itself ... see this related link

    Possible git 32 bit is using HUGE memory model, allocated too much to heap space, as mmap/shmat and heap/alloc/new share 32 bits. You can dial down HUGE heap by using environment variable LDR_CNTRL=MAXDATA=0x20000000 (or maybe 0x10000000).

    So, git is a mmap pig (reference link). Another answer, compile git 64 bit, then use env var PASE_MAXSHR64 to let git64 take over the world with mmap (Muah-ha-ha-ha-ha!)

    Git, oh git, maybe leaking ... but ... well ... that would mean a git error ... those guys are never wrong (Muah-ha-ha-ha-ha!).

    All above fails, we need to debug, easy way, you need my help, and, access to STRSST to look into the kernel (Muah-ha-ha-ha-ha!).

  2. Former user Account Deleted

    BTW -- related link refers to different version of git out of memory. Therefore, ignoring our message about mmap, maybe git-y-up needs more heap LDR_CNTRL=MAXDATA=0x60000000 (try less heap first). Anyway, LDR_CNTRL has many settings including DSA settings to reclaim the shared object segments away from shared libx.a/thing.so (D, F segments). Aka, LDR_CNTRL experimentation may work for you, before jumping to the git 64-bit train.
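
To make the heap-versus-mmap trade-off above a bit more concrete: a 32-bit AIX process address space is 16 segments of 256 MB each, and MAXDATA states how many bytes of that space go to heap, leaving less for mmap/shmat. The following is only a sketch of that arithmetic, assuming the same 256 MB segment size applies under PASE. The helper name is hypothetical and nothing here queries the loader.

```python
SEGMENT_SIZE = 0x10000000  # 256 MB segment size (assumed to apply to PASE as on AIX)
TOTAL_SEGMENTS = 16        # 16 x 256 MB = 4 GB of 32-bit address space


def heap_segments(maxdata: int) -> int:
    """Number of 256 MB segments a MAXDATA value asks for as heap.

    Rounds up on the assumption that a partial segment still
    occupies a whole one.
    """
    return -(-maxdata // SEGMENT_SIZE)


for value in (0x10000000, 0x20000000, 0x60000000):
    print(f"MAXDATA=0x{value:08x} -> {heap_segments(value)} "
          f"of {TOTAL_SEGMENTS} segments requested for heap")
```

The larger the value, the fewer segments remain for mappings, which is why trying the smaller values first is the cheaper experiment.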

  3. Aaron Bartell reporter

    Tried both heap sizes and neither fixed the issue.

    > All above fails, we need to debug, easy way, you need my help, and, access to STRSST to look into the kernel (Muah-ha-ha-ha-ha!).

    How should we proceed if I can only reproduce the error on the customer's machine? Should I follow these autopsy/STRSST steps you've authored on YiPs? This is a production machine.

  4. Former user Account Deleted

    Try this setting ...

    $ export LDR_CNTRL=MAXDATA=0@DSA
    $ git (operation that fails)
    
  5. Former user Account Deleted

    Ok, something bad in git-y-up requires debug. I am going home for the weekend, ping me Monday.

  6. Aaron Bartell reporter

    > ping me Monday.

    $ ping ranger.rochester.ibm.com
    

    What would you recommend for next steps?

  7. Former user Account Deleted

    Well, let's see if we can make you into a pase object code debug person (should be interesting).

    The following zzmini example demonstrates dbx stop in libc.a export syscall mmap. In your case, of course, substitute git as main program, not zzmini. As you can see, dbx stops at mmap start every time it is called (only once in this example), then I use stepi to instruction step (assembler step) until it reaches the actual branch to system call mmap kernel/SLIC (4e800420 bctr), then one more stepi and the system call kernel takes over (we do not get to see the kernel), and the return is back into our main program (git or zzmini). We can see the address of the file mmap in $r3=0x30000000, meaning the mmap was successful. In your case, I would expect to see $r3=0xffffffff or -1, indicating the git mmap fails. After seeing 0xffffffff (-1), we can find the address of errno at 0x2ff22ff8 and dump some memory via dbx 0x2ff22ff8 / X; the first 4 bytes are the errno in hex (see PASE file /usr/include/errno.h), where we would see our ENOMEM.

    So, I am assuming git will open a ton of files, which leads us to the eventual mmap fail. So, debug skills 101, make sure you record (on paper) each stop mmap return address $r3=0xnnnnnnnn, thereby we can watch as git slowly fills all the memory of a 32 bit process. This MAY be a loooooooooong process, wherein you may want to fall asleep at the wheel, but stiff upper lip grasshopper, you must suffer for your answer. Or, then again, maybe something happens quickly, and we still have an answer.

    At this point, with all your good data recorded, maybe we have an ah-ha moment. Or perhaps, with dbx stuck at ENOMEM, we MAY want to look at STRSST and peek around in the kernel. You will need my help for that ...

    Good luck new prince of c code.

    bash-4.3$ dbx zzmini
    Type 'help' for help.
    reading symbolic information ...
    (dbx) stopi in mmap  <--- set my assembler stop (object code libc.a)
    [1] stopi in glink.mmap   
    (dbx) cont
    [1] stopped in glink.mmap at 0x100006a0
    0x100006a0 (mmap)    81820058         lwz   r12,0x58(r2)
    (dbx) stepi
    stopped in glink.mmap at 0x100006a4
    0x100006a4 (mmap+0x4) 90410014         stw   r2,0x14(r1)
    (dbx) stepi
    stopped in glink.mmap at 0x100006a8
    0x100006a8 (mmap+0x8) 800c0000         lwz   r0,0x0(r12)
    (dbx) stepi
    stopped in glink.mmap at 0x100006ac
    0x100006ac (mmap+0xc) 804c0004         lwz   r2,0x4(r12)
    (dbx) stepi
    stopped in glink.mmap at 0x100006b0
    0x100006b0 (mmap+0x10) 7c0903a6       mtctr   r0
    (dbx) stepi
    stopped in glink.mmap at 0x100006b4
    0x100006b4 (mmap+0x14) 4e800420        bctr
    (dbx) stepi <--- i am going into kernel/slic (and, you do not get to see that, you, you, user)
    stopped in main at 0x10000454 <--- i am back in your program (zzmini or git) 
    0x10000454 (main+0xd4) 80410014         lwz   r2,0x14(r1)
    (dbx) registers
      $r0:0x00003608  $stkp:0x2ff22b80   $toc:0x00415e7d    $r3:0x30000000  <- $r3 where mapped file, 0xffffffff fail
      $r4:0x00000031    $r5:0x00000000    $r6:0x00000000    $r7:0x00000008  
      $r8:0x80556000    $r9:0x2200000a   $r10:0x051f8000   $r11:0x051f8f30  
     $r12:0x0000f032   $r13:0xdeadbeef   $r14:0x00000001   $r15:0x2ff22ce0  
     $r16:0x2ff22ce8   $r17:0xdeadbeef   $r18:0xdeadbeef   $r19:0xf0174f6c  
     $r20:0xdeadbeef   $r21:0xdeadbeef   $r22:0xdeadbeef   $r23:0xdeadbeef  
     $r24:0xdeadbeef   $r25:0xdeadbeef   $r26:0xdeadbeef   $r27:0x0000000a  
     $r28:0xf010cb70   $r29:0xd010cd80   $r30:0x00000003   $r31:0x100007b8  
     $iar:0x10000454   $msr:0x0002f032    $cr:0x2200000a  $link:0x10000454  
     $ctr:0x30000000   $xer:0x34000000    $mq:0x00000000  
              Condition status = 0:e 1:e 7:le 
            [unset $noflregs to view floating point registers]
            [unset $novregs to view vector registers]
    in main at 0x10000454
    0x10000454 (main+0xd4) 80410014         lwz   r2,0x14(r1)
    (dbx) print &errno
    0x2ff22ff8 
    (dbx) 0x2ff22ff8 / 10X <-- dump errno, 1st four hex bytes error number in /usr/include/errno.h
    0x2ff22ff8:  00000000 2ff22ff8 00000000 00000000
    0x2ff23008:  00000000 00000000 00000000 00000000
    0x2ff23018:  00000000 00000000
    (dbx) quit
    bash-4.3$
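
The $r3 check described above comes down to reading a 32-bit register as a signed integer: mmap reports failure by returning (void *)-1, and in a 32-bit process that bit pattern is 0xffffffff. A small sketch of that conversion follows. The function names are made up for illustration, and the two addresses come from the sessions in this thread.

```python
def as_signed32(value: int) -> int:
    """Interpret a 32-bit register value as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def mmap_failed(r3: int) -> bool:
    """True only when r3 holds (void *)-1, the mmap failure value."""
    return as_signed32(r3) == -1


print(mmap_failed(0x30000000))  # mapping address seen in the zzmini session
print(mmap_failed(0xFFFFFFFF))  # the failure pattern to watch for
```

Note that a high address with the top bit set is still a valid mapping; only the exact value 0xffffffff means failure.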
    
  8. Former user Account Deleted

    Oh, one trick with dbx. If your git has parameters (> git do this here thing tex), then you need to start dbx with just the main program, set the stop, then run with parameters.

    bash-4.3$ dbx zzmini
    Type 'help' for help.
    reading symbolic information ...
    (dbx) stopi in mmap
    [1] stopi in glink.mmap
    (dbx) run do this here tex
    [1] stopped in glink.mmap at 0x100006a0
    0x100006a0 (mmap)    81820058         lwz   r12,0x58(r2)
    (dbx) 
    
  9. Former user Account Deleted

    Oh one more ... dbx may have nest depth issue on your open source git project ... use -d 100.

    bash-4.3$ dbx -d 100 zzmini
    Type 'help' for help.
    reading symbolic information ...
    (dbx) stopi in mmap
    [1] stopi in glink.mmap
    (dbx) run do this here tex
    [1] stopped in glink.mmap at 0x100006a0
    0x100006a0 (mmap)    81820058         lwz   r12,0x58(r2)
    (dbx) cont
    i am not a frog.
    i am a toad.
    so there.
    sniffle.
    
    execution completed
    (dbx) quit
    
  10. Aaron Bartell reporter

    Thanks for guiding this adventure. My first hopping of grass:

    % mkdir git_dbx && cd git_dbx
    % dbx -d 100 git
    Type 'help' for help.
    cannot read git
    enter object file name (default is `a.out', ^D to exit): libc.a  <---- tried libc.a as a wild guess
    cannot read libc.a
    enter object file name (default is `a.out', ^D to exit):   <---- tried a.out 
    cannot read a.out
    

    I am reading through the AIX dbx docs to see what I can try next.

    NOTE: I am trying this on my machine, where git works, before trying on customer's machine.

  11. Former user Account Deleted

    No, no, no, bad grasshopper, you give up too soon.

    Further to previous post, it appears dbx needs git to be compiled with the -g option

    Forget -g, is ONLY source level debug, aka, wimps that need source code debugging. We are teaching 'real man' skills here, no bloody source c code, only binary level assembler debugging.

    cannot read git

    You need the full path to git (relative will not work), again, manly debugging grasshopper. Also make very sure the LIBPATH is set correctly, because these objects actually are relative.

    dbx -d 100 /opt/freeware/bin/git
    ... so on ...
    
  12. Aaron Bartell reporter

    See below line with "<-------" in it.

    % echo $PATH
    /opt/freeware/bin:/QOpenSys/usr/bin:/usr/ccs/bin:/QOpenSys/usr/bin/X11:/usr/sbin:.:/usr/bin:/home/AARON/bin
    % echo $LIBPATH
    /opt/freeware/lib
    % dbx -d 100 /opt/freeware/bin/git
    Type 'help' for help.
    reading symbolic information ...warning: no source compiled with -g
    
    (dbx) stopi in mmap
    "mmap" is not a subprogram   <------- Guessing this is a problem.  Have I set my LIBPATH correctly?
    (dbx) run init
    Initialized empty Git repository in /home/aaron/git_dbx/.git/
    
    execution completed
    (dbx)
    
  13. Former user Account Deleted

    Uf Da!!!! Silly AIX OFF_MAX large file workarounds bite average user (again), try stopi in mmap64 (below). Yes, misleading, because your git program is still 32 bit ... argh ... AIX OFF_MAX monkey business.

    bash-4.3$ dbx /QOpenSys/usr/bin/git      
    Type 'help' for help.
    reading symbolic information ...
    (dbx) stopi in mmap64
    [1] stopi in mmap64
    (dbx) where
    __start() at 0x10000128
    (dbx) 
    
  14. Aaron Bartell reporter

    Now we're cooking with oil. Below is what I run for commands.

    export PATH=/opt/freeware/bin:$PATH
    export LIBPATH=/opt/freeware/lib
    dbx -d 100 /opt/freeware/bin/git
    stopi in mmap64
    run init
    return     <------ Loop through return, registers, cont commands until error occurs.  
    registers
    cont
    print &errno        <----- now get errno and use the resulting value on next line
    0x2ff22ff8 / 10X      <-- dump errno, 1st four hex bytes error number in /usr/include/errno.h
    

    Here's what the full session looks like.

    $ ssh -o ServerAliveInterval=5 aaron@ibmi
    % export PATH=/opt/freeware/bin:$PATH
    export LIBPATH=/opt/freeware/lib
    dbx -d 100 /opt/freeware/bin/git
    Type 'help' for help.
    reading symbolic information ...warning: no source compiled with -g
    
    (dbx) stopi in mmap64
    [1] stopi in mmap64
    (dbx) run init
    [1] stopped in mmap64 at 0x20280400 ($t1)
    0x20280400 (mmap64)    7c0802a6        mflr   r0
    (dbx) registers
      $r0:0x20280400  $stkp:0x2ff22530   $toc:0x207aba88    $r3:0x00000000
      $r4:0x00000024    $r5:0x00000001    $r6:0x00000002    $r7:0x0000000f
      $r8:0x00000000    $r9:0x00000000   $r10:0x078f8000   $r11:0x078f8f30
     $r12:0x207a7a3c   $r13:0xdeadbeef   $r14:0x00000002   $r15:0x2ff22d20
     $r16:0x3003c030   $r17:0x101a907c   $r18:0x00000005   $r19:0x101a90f4
     $r20:0x101a90ec   $r21:0x2ff22970   $r22:0x101a90cc   $r23:0x101a90d8
     $r24:0x00000000   $r25:0x00000024   $r26:0x3003bff0   $r27:0x00000000
     $r28:0x101a8f78   $r29:0x3003bff0   $r30:0x30009be8   $r31:0x00000024
     $iar:0x20280400   $msr:0x0002f032    $cr:0x84203088  $link:0x10045554
     $ctr:0x20280400   $xer:0x04000000
              Condition status = 0:l 1:g 2:e 4:eo 6:l 7:l
            [unset $noflregs to view floating point registers]
            [unset $novregs to view vector registers]
    in mmap64 at 0x20280400 ($t1)
    0x20280400 (mmap64)    7c0802a6        mflr   r0
    (dbx) cont
    [1] stopped in mmap64 at 0x20280400 ($t1)
    0x20280400 (mmap64)    7c0802a6        mflr   r0
    (dbx) registers
      $r0:0x20280400  $stkp:0x2ff22530   $toc:0x207aba88    $r3:0x00000000
      $r4:0x00000035    $r5:0x00000001    $r6:0x00000002    $r7:0x0000000f
      $r8:0x00000000    $r9:0x00000000   $r10:0x078f8000   $r11:0x078f8f30
     $r12:0x207a7a3c   $r13:0xdeadbeef   $r14:0x00000002   $r15:0x2ff22d20
     $r16:0x3003c1d0   $r17:0x101a907c   $r18:0x00000005   $r19:0x101a90f4
     $r20:0x101a90ec   $r21:0x2ff22970   $r22:0x101a90cc   $r23:0x101a90d8
     $r24:0x00000000   $r25:0x00000035   $r26:0x3003bff0   $r27:0x00000000
     $r28:0x101a8f80   $r29:0x3003bff0   $r30:0x30009be8   $r31:0x00000035
     $iar:0x20280400   $msr:0x0002f032    $cr:0x86203088  $link:0x10045554
     $ctr:0x20280400   $xer:0x04000000
              Condition status = 0:l 1:ge 2:e 4:eo 6:l 7:l
            [unset $noflregs to view floating point registers]
            [unset $novregs to view vector registers]
    in mmap64 at 0x20280400 ($t1)
    0x20280400 (mmap64)    7c0802a6        mflr   r0
    (dbx) cont
    [1] stopped in mmap64 at 0x20280400 ($t1)
    0x20280400 (mmap64)    7c0802a6        mflr   r0
    (dbx) registers
      $r0:0x20280400  $stkp:0x2ff22530   $toc:0x207aba88    $r3:0x00000000
      $r4:0x00000043    $r5:0x00000001    $r6:0x00000002    $r7:0x0000000f
      $r8:0x00000000    $r9:0x00000000   $r10:0x078f8000   $r11:0x078f8f30
     $r12:0x207a7a3c   $r13:0xdeadbeef   $r14:0x00000002   $r15:0x2ff22d20
     $r16:0x3003c650   $r17:0x101a907c   $r18:0x00000005   $r19:0x101a90f4
     $r20:0x101a90ec   $r21:0x2ff22970   $r22:0x101a90cc   $r23:0x101a90d8
     $r24:0x00000000   $r25:0x00000043   $r26:0x3003c610   $r27:0x00000000
     $r28:0x101a8f78   $r29:0x3003c610   $r30:0x30009be8   $r31:0x00000043
     $iar:0x20280400   $msr:0x0002f032    $cr:0x86203088  $link:0x10045554
     $ctr:0x20280400   $xer:0x04000000
              Condition status = 0:l 1:ge 2:e 4:eo 6:l 7:l
            [unset $noflregs to view floating point registers]
            [unset $novregs to view vector registers]
    in mmap64 at 0x20280400 ($t1)
    0x20280400 (mmap64)    7c0802a6        mflr   r0
    (dbx) cont
    [1] stopped in mmap64 at 0x20280400 ($t1)
    0x20280400 (mmap64)    7c0802a6        mflr   r0
    (dbx) registers
      $r0:0x20280400  $stkp:0x2ff22530   $toc:0x207aba88    $r3:0x00000000
      $r4:0x0000005c    $r5:0x00000001    $r6:0x00000002    $r7:0x0000000f
      $r8:0x00000000    $r9:0x00000000   $r10:0x078f8000   $r11:0x078f8f30
     $r12:0x207a7a3c   $r13:0xdeadbeef   $r14:0x00000002   $r15:0x2ff22d20
     $r16:0x3003c610   $r17:0x101a907c   $r18:0x00000005   $r19:0x101a90f4
     $r20:0x101a90ec   $r21:0x2ff22970   $r22:0x101a90cc   $r23:0x101a90d8
     $r24:0x00000000   $r25:0x0000005c   $r26:0x3003bff0   $r27:0x00000000
     $r28:0x101a8f78   $r29:0x3003bff0   $r30:0x30009be8   $r31:0x0000005c
     $iar:0x20280400   $msr:0x0002f032    $cr:0x86203088  $link:0x10045554
     $ctr:0x20280400   $xer:0x04000000
              Condition status = 0:l 1:ge 2:e 4:eo 6:l 7:l
            [unset $noflregs to view floating point registers]
            [unset $novregs to view vector registers]
    in mmap64 at 0x20280400 ($t1)
    0x20280400 (mmap64)    7c0802a6        mflr   r0
    (dbx) cont
    Initialized empty Git repository in /home/aaron/git_dbx/.git/
    
    execution completed
    (dbx) registers
      $r0:0x00000000  $stkp:0x2ff22b00   $toc:0x207aba88    $r3:0x00000000
      $r4:0x30034384    $r5:0x00000008    $r6:0x00000006    $r7:0x00000000
      $r8:0x100ce013    $r9:0x100ce013   $r10:0x078f8000   $r11:0x00000000
     $r12:0x2007c3bc   $r13:0xdeadbeef   $r14:0x00000002   $r15:0x2ff22d20
     $r16:0x2ff22d2c   $r17:0x00000000   $r18:0xdeadbeef   $r19:0xdeadbeef
     $r20:0xdeadbeef   $r21:0xdeadbeef   $r22:0xdeadbeef   $r23:0xdeadbeef
     $r24:0xdeadbeef   $r25:0x2073a1c0   $r26:0x20739f20   $r27:0x00000000
     $r28:0x00000000   $r29:0x300064fc   $r30:0x207ccf54   $r31:0xffffffff
     $iar:0x200828a8   $msr:0x0002f032    $cr:0x28200086  $link:0x2007c3c8
     $ctr:0x20425d00   $xer:0x04000002
              Condition status = 0:e 1:l 2:e 6:l 7:ge
            [unset $noflregs to view floating point registers]
            [unset $novregs to view vector registers]
    in mmap64 at 0x200828a8 ($t1)
    0x200828a8 (_exit)    81820948         lwz   r12,0x948(r2)
    

    I will now have this run on the machine where Git isn't working. Stay tuned.

  15. Former user Account Deleted

    Reminder -- we are looking for why git throws the error 'out of memory', so remember to record $r3 mmap64 locations. We want to see if git chews up all the memory ...

    (dbx) registers
      $r0:0x00003608  $stkp:0x2ff22b80   $toc:0x00415e7d    $r3:0x30000000  <- $r3 where mapped file, 0xffffffff fail
    
  16. Former user Account Deleted

    You need registers at every stop, so we can see where/when it starts to go wrong. No short cuts grasshopper, and remember to write down the addresses returned from mmap until 0xffffffff (-1) is returned.

    dbx>cont
    dbx>registers
    dbx>cont
    dbx>registers
    ... so on ...
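
If writing the addresses down on paper gets tedious, the same bookkeeping can be done on a saved copy of the terminal output. The sketch below assumes the session text was captured to a string by some means (not shown) and that each registers dump was taken after mmap64 returned. The function names are hypothetical and the two sample lines are copied from dumps in this thread.

```python
import re
from typing import List, Optional

# Matches "$r3:0x........" but not "$r30:" or "$r31:".
R3_PATTERN = re.compile(r"\$r3:0x([0-9a-fA-F]{8})")


def r3_values(log_text: str) -> List[int]:
    """Return every $r3 value found in captured dbx output, in order."""
    return [int(match, 16) for match in R3_PATTERN.findall(log_text)]


def first_failure(values: List[int]) -> Optional[int]:
    """Index of the first stop where $r3 is 0xffffffff (-1), else None."""
    for index, value in enumerate(values):
        if value == 0xFFFFFFFF:
            return index
    return None


sample = """
  $r0:0x00003608  $stkp:0x2ff22b80   $toc:0x00415e7d    $r3:0x30000000
  $r0:0x00003608  $stkp:0x2ff22520   $toc:0xf193ea50    $r3:0xffffffff
"""
values = r3_values(sample)
print([hex(v) for v in values], first_failure(values))
```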
    
  17. Aaron Bartell reporter

    Below is a dbx session of the failed git attempt.

    Above you focused on register $r3 but it appears $r29 and $r30 are where it falls apart for me. How does one know which register is the return code for mmap64?

    Also, do you have a good resource for me to read that would give me more insight into why I am looking for particular things? I am reading the AIX docs, but those mostly tell me how to use the commands, not what to watch for when diagnosing issues. A lot of the other dbx docs are Oracle based and I am hesitant to pursue those because I don't know whether they are in the same vein as AIX or not.

    $ mkdir git_dbx
    $ cd git_dbx/
    $ export PATH=/opt/freeware/bin:$PATH
    $ export LIBPATH=/opt/freeware/lib
    $ dbx -d 100 /opt/freeware/bin/git
    Type 'help' for help.
    reading symbolic information ...warning: no source compiled with -g
    
    (dbx) stopi in mmap64
    [1] stopi in mmap64
    (dbx) run init
    [1] stopped in mmap64 at 0xde2555e0 ($t1)
    0xde2555e0 (mmap64)    7c0802a6        mflr   r0
    (dbx) registers
      $r0:0xde2555e0  $stkp:0x2ff22520   $toc:0xf193ea50    $r3:0x00000000  
      $r4:0x00000024    $r5:0x00000001    $r6:0x00000002    $r7:0x0000000f  
      $r8:0x00000000    $r9:0x00000000   $r10:0x04131000   $r11:0x04131f30  
     $r12:0xf193aecc   $r13:0xdeadbeef   $r14:0x00000002   $r15:0x2ff22d18  
     $r16:0x3003d230   $r17:0x101a907c   $r18:0x00000005   $r19:0x101a90f4  
     $r20:0x101a90ec   $r21:0x2ff22960   $r22:0x101a90cc   $r23:0x101a90d8  
     $r24:0x00000000   $r25:0x00000024   $r26:0x3003d1f0   $r27:0x00000000  
     $r28:0x101a8f78   $r29:0x3003d1f0   $r30:0x30009be8   $r31:0x00000024  
     $iar:0xde2555e0   $msr:0x0002f032    $cr:0x84203089  $link:0x10045554  
     $ctr:0xde2555e0   $xer:0x04000000    $mq:0x00000000  
              Condition status = 0:l 1:g 2:e 4:eo 6:l 7:lo 
            [unset $noflregs to view floating point registers]
            [unset $novregs to view vector registers]
    in mmap64 at 0xde2555e0 ($t1)
    0xde2555e0 (mmap64)    7c0802a6        mflr   r0
    (dbx) cont
    [1] stopped in mmap64 at 0xde2555e0 ($t1)
    0xde2555e0 (mmap64)    7c0802a6        mflr   r0
    (dbx) registers
      $r0:0xde2555e0  $stkp:0x2ff22520   $toc:0xf193ea50    $r3:0x00000000  
      $r4:0x00000024    $r5:0x00000001    $r6:0x00000002    $r7:0x0000000f  
      $r8:0x00000000    $r9:0x00000000   $r10:0x04131000   $r11:0x04131f30  
     $r12:0xf193aecc   $r13:0xdeadbeef   $r14:0x00000002   $r15:0x2ff22d18  
     $r16:0x3003d230   $r17:0x101a907c   $r18:0x00000005   $r19:0x101a90f4  
     $r20:0x101a90ec   $r21:0x2ff22960   $r22:0x101a90cc   $r23:0x101a90d8  
     $r24:0x00000000   $r25:0x00000024   $r26:0x3003d1f0   $r27:0x00000000  
     $r28:0x101a8f78   $r29:0xffffffff   $r30:0xffffffff   $r31:0x00000024              <---------- Falls apart?
     $iar:0xde2555e0   $msr:0x0002f032    $cr:0x33203089  $link:0x100455a0  
     $ctr:0xde2555e0   $xer:0xf4ffffff    $mq:0x00000000  
              Condition status = 0:eo 1:eo 2:e 4:eo 6:l 7:lo 
            [unset $noflregs to view floating point registers]
            [unset $novregs to view vector registers]
    in mmap64 at 0xde2555e0 ($t1)
    0xde2555e0 (mmap64)    7c0802a6        mflr   r0
    (dbx) cont
    fatal: Out of memory? mmap failed: No such device
    
    execution completed (exit code 128)
    (dbx) registers
      $r0:0x00000000  $stkp:0x2ff21f10   $toc:0xf193ea50    $r3:0x00000080  
      $r4:0x300356e8    $r5:0x00000008    $r6:0xffff8006    $r7:0x00000000  
      $r8:0x105aa04f    $r9:0x105aa04f   $r10:0x04131000   $r11:0x00000000  
     $r12:0xde08783c   $r13:0xdeadbeef   $r14:0x00000002   $r15:0x2ff22d18  
     $r16:0x3003d230   $r17:0x101a907c   $r18:0x00000005   $r19:0x101a90f4  
     $r20:0x101a90ec   $r21:0x2ff22960   $r22:0x101a90cc   $r23:0x101a90d8  
     $r24:0x00000000   $r25:0x00000024   $r26:0x00000000   $r27:0x300064fc  
     $r28:0x00000000   $r29:0xf18d8bc8   $r30:0xf19609a8   $r31:0xffffffff  
     $iar:0xde090750   $msr:0x0002f032    $cr:0x28203086  $link:0xde087848  
     $ctr:0xd62c2f00   $xer:0x04000002    $mq:0x00000000  
              Condition status = 0:e 1:l 2:e 4:eo 6:l 7:ge 
            [unset $noflregs to view floating point registers]
            [unset $novregs to view vector registers]
    in mmap64 at 0xde090750 ($t1)
    0xde090750 (_exit)    81820b4c         lwz   r12,0xb4c(r2)
    (dbx) 
    
  18. Former user Account Deleted

    Almost c-code student, but you are stopping at the entrance to mmap64, aka, nothing interesting in registers until exit of mmap64 (after it runs). You need to add the dbx command return (see below). BTW -- my stepi instructions until back to caller also work, but are much less elegant than dbx return.

    [adc@oc7083008330 ~]$ ssh -X ranger@lp0364d
    Welcome to LP0364D.rchland.ibm.com
    $ bash
    bash-4.3$ which git
    /QOpenSys/usr/bin/git
    bash-4.3$ dbx -d 100 /QOpenSys/usr/bin/git
    Type 'help' for help.
    reading symbolic information ...
    (dbx) stopi in mmap64
    [1] stopi in mmap64
    (dbx) run init
    [1] stopped in mmap64 at 0xd5512360 ($t1)
    0xd5512360 (mmap64)    7c0802a6        mflr   r0
    (dbx) return <--- need to actually run mmap64, then stop back in caller
    stopped in . at 0x10035d40 ($t1)
    0x10035d40 (???) 80410014         lwz   r2,0x14(r1)
    (dbx) registers
      $r0:0x00003608  $stkp:0x2ff22560   $toc:0xf11ff7d0    $r3:0xb0000000  <-- mmap64 address
      $r4:0x00000024    $r5:0x00000000    $r6:0x00000000    $r7:0x00000008  
      $r8:0x80557000    $r9:0x22203089   $r10:0x08ef4000   $r11:0x08ef4f30  
     $r12:0x10035d40   $r13:0xdeadbeef   $r14:0x00000002   $r15:0x2ff22d48  
     $r16:0x30043c10   $r17:0x10187dec   $r18:0x00000005   $r19:0x10187e64  
     $r20:0x10187e5c   $r21:0x2ff229a0   $r22:0x10187e3c   $r23:0x10187e48  
     $r24:0x00000000   $r25:0x00000024   $r26:0x300434f0   $r27:0x00000000  
     $r28:0x10187ce8   $r29:0x300434f0   $r30:0x3000cbd0   $r31:0x00000024  
     $iar:0x10035d40   $msr:0x0002f032    $cr:0x22203089  $link:0x10035d40  
     $ctr:0xb0000000   $xer:0xb4000000    $mq:0x00000000  
              Condition status = 0:e 1:e 2:e 4:eo 6:l 7:lo 
            [unset $noflregs to view floating point registers]
            [unset $novregs to view vector registers]
    in . at 0x10035d40 ($t1)
    0x10035d40 (???) 80410014         lwz   r2,0x14(r1)
    (dbx)
    
  19. Aaron Bartell reporter

    Ok, I've updated the previous comment to include return in the instructions.

    For others following along you can read this SO to understand 0xffffffff.

    Here's the new log that includes invoking return:

    $ dbx -d 100 /opt/freeware/bin/git
    Type 'help' for help.
    reading symbolic information ...warning: no source compiled with -g
    
    (dbx) stopi in mmap64
    [1] stopi in mmap64
    (dbx) run init
    [1] stopped in mmap64 at 0xde2555e0 ($t1)
    0xde2555e0 (mmap64)    7c0802a6        mflr   r0
    (dbx) return
    stopped in . at 0x10045554 ($t1)
    0x10045554 (???) 80410014         lwz   r2,0x14(r1)
    (dbx) registers
      $r0:0x00003608  $stkp:0x2ff22520   $toc:0xf193ea50    $r3:0xffffffff    <----   -1 right out of the gates, though no "Out of memory" error (yet)
      $r4:0x00000024    $r5:0x00000017    $r6:0x3003d5f8    $r7:0x073939d8  
      $r8:0x80558000    $r9:0x86203089   $r10:0x09a5e000   $r11:0x09a5ef30  
     $r12:0x10045554   $r13:0xdeadbeef   $r14:0x00000002   $r15:0x2ff22d18  
     $r16:0x3003d230   $r17:0x101a907c   $r18:0x00000005   $r19:0x101a90f4  
     $r20:0x101a90ec   $r21:0x2ff22960   $r22:0x101a90cc   $r23:0x101a90d8  
     $r24:0x00000000   $r25:0x00000024   $r26:0x3003d1f0   $r27:0x00000000  
     $r28:0x101a8f78   $r29:0x3003d1f0   $r30:0x30009be8   $r31:0x00000024  
     $iar:0x10045554   $msr:0x0002f032    $cr:0x86203089  $link:0x10045554  
     $ctr:0xffffffff   $xer:0xf4ffffff    $mq:0x00000000  
              Condition status = 0:l 1:ge 2:e 4:eo 6:l 7:lo 
            [unset $noflregs to view floating point registers]
            [unset $novregs to view vector registers]
    in . at 0x10045554 ($t1)
    0x10045554 (???) 80410014         lwz   r2,0x14(r1)
    (dbx) cont
    [1] stopped in mmap64 at 0xde2555e0 ($t1)
    0xde2555e0 (mmap64)    7c0802a6        mflr   r0
    (dbx) return
    stopped in . at 0x100455a0 ($t1)
    0x100455a0 (???) 80410014         lwz   r2,0x14(r1)
    (dbx) registers
      $r0:0x00003608  $stkp:0x2ff22520   $toc:0xf193ea50    $r3:0xffffffff  
      $r4:0x00000024    $r5:0x00000017    $r6:0x3003d5f8    $r7:0x073992f0  
      $r8:0x80556000    $r9:0x33203089   $r10:0x09a5e000   $r11:0x09a5ef30  
     $r12:0x100455a0   $r13:0xdeadbeef   $r14:0x00000002   $r15:0x2ff22d18  
     $r16:0x3003d230   $r17:0x101a907c   $r18:0x00000005   $r19:0x101a90f4  
     $r20:0x101a90ec   $r21:0x2ff22960   $r22:0x101a90cc   $r23:0x101a90d8  
     $r24:0x00000000   $r25:0x00000024   $r26:0x3003d1f0   $r27:0x00000000  
     $r28:0x101a8f78   $r29:0xffffffff   $r30:0xffffffff   $r31:0x00000024  
     $iar:0x100455a0   $msr:0x0002f032    $cr:0x33203089  $link:0x100455a0  
     $ctr:0xffffffff   $xer:0xf4ffffff    $mq:0x00000000  
              Condition status = 0:eo 1:eo 2:e 4:eo 6:l 7:lo 
            [unset $noflregs to view floating point registers]
            [unset $novregs to view vector registers]
    in . at 0x100455a0 ($t1)
    0x100455a0 (???) 80410014         lwz   r2,0x14(r1)
    (dbx) cont
    fatal: Out of memory? mmap failed: No such device
    
    execution completed (exit code 128)
    (dbx) return
    cannot continue execution
    (dbx) registers
      $r0:0x00000000  $stkp:0x2ff21f10   $toc:0xf193ea50    $r3:0x00000080  
      $r4:0x300356e8    $r5:0x00000008    $r6:0xffff8006    $r7:0x00000000  
      $r8:0x10546033    $r9:0x10546033   $r10:0x09a5e000   $r11:0x00000000  
     $r12:0xde08783c   $r13:0xdeadbeef   $r14:0x00000002   $r15:0x2ff22d18  
     $r16:0x3003d230   $r17:0x101a907c   $r18:0x00000005   $r19:0x101a90f4  
     $r20:0x101a90ec   $r21:0x2ff22960   $r22:0x101a90cc   $r23:0x101a90d8  
     $r24:0x00000000   $r25:0x00000024   $r26:0x00000000   $r27:0x300064fc  
     $r28:0x00000000   $r29:0xf18d8bc8   $r30:0xf19609a8   $r31:0xffffffff  
     $iar:0xde090750   $msr:0x0002f032    $cr:0x28203086  $link:0xde087848  
     $ctr:0xd62c2f00   $xer:0x04000002    $mq:0x00000000  
              Condition status = 0:e 1:l 2:e 4:eo 6:l 7:ge 
            [unset $noflregs to view floating point registers]
            [unset $novregs to view vector registers]
    in . at 0xde090750 ($t1)
    0xde090750 (_exit)    81820b4c         lwz   r12,0xb4c(r2)
    (dbx)
    
  20. Former user Account Deleted

    You failed with 0xffffffff, but you did not dump the errno.

    (dbx) return
    stopped in . at 0x10045554 ($t1)
    0x10045554 (???) 80410014         lwz   r2,0x14(r1)
    (dbx) registers
      $r0:0x00003608  $stkp:0x2ff22520   $toc:0xf193ea50    $r3:0xffffffff    <----   -1 right out of the gates, though no "Out of memory" error (yet)
      $r4:0x00000024    $r5:0x00000017    $r6:0x3003d5f8    $r7:0x073939d8  
      $r8:0x80558000    $r9:0x86203089   $r10:0x09a5e000   $r11:0x09a5ef30  
     $r12:0x10045554   $r13:0xdeadbeef   $r14:0x00000002   $r15:0x2ff22d18  
     $r16:0x3003d230   $r17:0x101a907c   $r18:0x00000005   $r19:0x101a90f4  
     $r20:0x101a90ec   $r21:0x2ff22960   $r22:0x101a90cc   $r23:0x101a90d8  
     $r24:0x00000000   $r25:0x00000024   $r26:0x3003d1f0   $r27:0x00000000  
     $r28:0x101a8f78   $r29:0x3003d1f0   $r30:0x30009be8   $r31:0x00000024  
     $iar:0x10045554   $msr:0x0002f032    $cr:0x86203089  $link:0x10045554  
     $ctr:0xffffffff   $xer:0xf4ffffff    $mq:0x00000000  
              Condition status = 0:l 1:ge 2:e 4:eo 6:l 7:lo 
            [unset $noflregs to view floating point registers]
            [unset $novregs to view vector registers]
    in . at 0x10045554 ($t1)
    0x10045554 (???) 80410014         lwz   r2,0x14(r1)
    
    =====
    like this
    =====
    
    (dbx) 0x2ff22ff8 / X
    xxxxxxxxx <- errno in hex 
    
  21. Former user Account Deleted

    As long as we are re-doing, let's gather some more information about what is going on, find out the name of the file causing the problem.

    bash-4.3$ dbx -d 100 /QOpenSys/usr/bin/git
    Type 'help' for help.
    reading symbolic information ...
    (dbx) stopi in open  <--- add a stop in open
    [1] stopi in open
    (dbx) stopi in mmap64 <-- also our failure location 
    [2] stopi in mmap64
    (dbx) run init
    [1] stopped in open at 0xd52fe6e0 ($t1)
    0xd52fe6e0 (open)    7c0802a6        mflr   r0
    (dbx) print (char *)$r3  <--- register 3 has the name of file to be open
    /unix 
    (dbx) return
    stopped in open64 at 0xd52fe0fc ($t1)
    0xd52fe0fc (open64+0x3c) 60000000         ori   r0,r0,0x0
    (dbx) print $r3 <--- register 3 has the return code -1 (0xffffffff), failed, but we are not done
    0xffffffff 
    (dbx) 0x2ff22ff8 / X <-- display errno hex (convert decimal and see /usr/include/errno.h)
    0x2ff22ff8:  00000002
    (dbx) cont <--- next open file, aka, we did not fail mmap64, so keep going ... and going ... until mmap64 fails
    [1] stopped in open at 0xd52fe6e0 ($t1)
    0xd52fe6e0 (open)    7c0802a6        mflr   r0
    (dbx) print (char *)$r3 <--- name of the file to be open ... so on
    /dev/null 
    (dbx) return
    stopped in open64 at 0xd52fe0fc ($t1)
    0xd52fe0fc (open64+0x3c) 60000000         ori   r0,r0,0x0
    (dbx) print $r3        
    0x0000000e 
    (dbx) cont
    [1] stopped in open at 0xd52fe6e0 ($t1)
    0xd52fe6e0 (open)    7c0802a6        mflr   r0
    (dbx) print (char *)$r3
    /opt/freeware/lib/charset.alias 
    (dbx) return
    stopped in localcharset.get_charset_aliases [/opt/freeware/lib/libiconv.a] at 0xd5809a60 ($t1)
    0xd5809a60 (get_charset_aliases+0x108) 80410014         lwz   r2,0x14(r1)
    (dbx) print $r3                
    0xffffffff 
    (dbx) 0x2ff22ff8 / X
    0x2ff22ff8:  00000002
    (dbx)
    
  22. Aaron Bartell reporter

    Ok, I ran it again and did the addtl print (char *)$r3 etc stuff. The log was big so I created a snippet(click here).

    The final errno is 13 (permission denied), though leading up to it we have the following...

    0x2ff22ff8:  00000002  <-- "not found" (expected)
    0x2ff22ff8:  00000011  <-- "try again" (not sure if expected, may be beginning of issue)
    0x2ff22ff8:  00000016  <-- "resource busy" (happens on .git/config and .git/config.lock)
    0x2ff22ff8:  00000013  <-- "permission denied" (errno after 'return' from mmap64, though I assume it relates to previous open attempt?)
    

    Going down the 'permission denied' hole, here's the current state of the .git directory.

    bash-4.3$ ls -all .git/
    total 208
    drwxr-sr-x    6 myuser  0             12288 Mar  4 12:26 .
    drwxr-sr-x    3 myuser  0             12288 Mar  4 11:56 ..
    -rw-r--r--    1 myuser  0                23 Mar  4 11:56 HEAD
    drwxr-sr-x    2 myuser  0             12288 Mar  4 11:56 branches
    -rw-r--r--    1 myuser  0                36 Mar  4 11:56 config
    -rw-r--r--    1 myuser  0                 0 Mar  4 12:26 config.lock
    -rw-r--r--    1 myuser  0                73 Mar  4 11:56 description
    drwxr-sr-x    2 myuser  0             16384 Mar  4 11:56 hooks
    drwxr-sr-x    2 myuser  0             12288 Mar  4 11:56 info
    drwxr-sr-x    4 myuser  0             12288 Mar  4 11:56 refs
    

    Both config and config.lock are rw- for the owner. Same permissions exist on a machine where Git works so I wonder if "permission denied" is a misnomer.

    Guess: The "resource busy" is because one Git "thread" created config.lock and a subsequent "thread" is trying to gain access to it? Or in short, a race condition?

  23. Former user Account Deleted

    errno after 'return' from mmap64, though I assume it relates to previous open attempt?

    You did not include the file opened information in your cut/paste. I am assuming the following sequence is needed before we can speculate ...

    dbx> cont
    stop in open
    dbx> print $r3
    /this/file/caused/issue/with/mmap64 <--- missing from your cut/paste
    dbx> cont
    stop in mmap64
    dbx> return
    dbx> print $r3
    -1 or 0xffffffff
    
  24. Former user Account Deleted

    Mmm ... this is not out of memory error. Instead, is ENODEV, see below. Maybe you needed to take one more dbx>cont??? But in any event, what is up with /home/MYUSER/git_dbx/.git/config ???

    What does this look like???
    
    ls -l /home/MYUSER/git_dbx/.git/config
    
    (dbx) print (char *)$r3
    /home/MYUSER/git_dbx/.git/config 
    (dbx) return
    stopped in open64 at 0xde04137c ($t1)
    0xde04137c (open64+0x3c) 60000000         ori   r0,r0,0x0
    (dbx) print $r3
    0x0000000f <-------------- good open (not -1) file descriptor is 15/0xf, forget errno below
    (dbx) 0x2ff22ff8 / X
    0x2ff22ff8:  00000016 <-- useless, no error above
    (dbx) cont
    [2] stopped in open at 0xde041960 ($t1)
    0xde041960 (open)    7c0802a6        mflr   r0
    (dbx) print (char *)$r3
    /home/MYUSER/git_dbx/.git/config 
    (dbx) return
    stopped in open64 at 0xde04137c ($t1)
    0xde04137c (open64+0x3c) 60000000         ori   r0,r0,0x0
    (dbx) print $r3
    0x00000010 <-------------- good open file descriptor is 16/0x10 (not -1), forget errno below
    (dbx) 0x2ff22ff8 / X
    0x2ff22ff8:  00000016 <-- useless, no error above
    (dbx) cont
    [1] stopped in mmap64 at 0xde2555e0 ($t1)
    0xde2555e0 (mmap64)    7c0802a6        mflr   r0
    (dbx) print (char *)$r3
    (nil) <--- we needed all registers here, but "guessing" from previous post...
      $r0:0xde2555e0  $stkp:0x2ff22520   $toc:0xf193ea50    $r3:0x00000000  
      $r4:0x00000024    $r5:0x00000001    $r6:0x00000002    $r7:0x0000000f  
     $r8:0x00000000    $r9:0x00000000   $r10:0x04131000   $r11:0x04131f30  
    void *mmap64 (
    ($r3) addr = 0x0 (print $r3 = nil), 
    ($r4) len = 0x24, (36 bytes)
    ($r5) prot = 1, (PROT_READ)
    ($r6) flags = 2, (MAP_PRIVATE)
    ($r7) fildes = 0xf, <-- file descriptor 0xf ... /home/MYUSER/git_dbx/.git/config (above)
    ($r8) off 0)
    (dbx) return
    stopped in . at 0x10045554 ($t1)
    0x10045554 (???) 80410014         lwz   r2,0x14(r1)
    (dbx) print $r3
    0xffffffff <--- mmap failed  ...  
    (dbx) 0x2ff22ff8 / X
    0x2ff22ff8:  00000013 <-- yes error above ... ENODEV 19 - No such device
    (dbx)
    
    bash-4.3$ grep 19 /usr/include/errno.h 
    #define ENODEV  19      /* No such device                       */
    ENODEV The fildes parameter refers to an object that cannot be mapped, such as a terminal.
    bash-4.3$ grep MAP_ /usr/include/sys/mman.h      
    #define MAP_SHARED      0x1             /* share changes */
    #define MAP_PRIVATE     0x2             /* changes are private */
    #define MAP_FIXED       0x100           /* map addr must be exactly as specified */
    #define MAP_VARIABLE    0x00            /* system can place new region */
    #define MAP_FAILED      ((void *)-1)
    #define MAP_FILE        0x00            /* map from a file */
    #define MAP_ANONYMOUS   0x10            /* map an unnamed region */
    #define MAP_ANON        0x10            /* map an unnamed region */
    #define MAP_TYPE        0xf0            /* the type of the region */
    bash-4.3$ grep PROT_ /usr/include/sys/mman.h
    #define PROT_NONE       0               /* no access to these pages */
    #define PROT_READ       0x1             /* pages can be read */
    #define PROT_WRITE      0x2             /* pages can be written */
    #define PROT_EXEC       0x4             /* pages can be executed */
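
    A side note on reading these dumps: the dbx "/ X" output is hexadecimal, so 00000013 is decimal 19 (ENODEV), not 13. The sketch below is not from the thread. It decodes an errno word and the mmap64 prot/flags registers in Python 3. The bit tables are copied from the mman.h output above; the errno names come from whatever machine runs the script and are only assumed to match PASE for the small values seen here. The function names are made up for illustration.

```python
import errno

# mmap constants copied from the AIX/PASE /usr/include/sys/mman.h output
# quoted above; they are not read from the local platform.
PROT_BITS = {0x1: "PROT_READ", 0x2: "PROT_WRITE", 0x4: "PROT_EXEC"}
MAP_BITS = {
    0x1: "MAP_SHARED",
    0x2: "MAP_PRIVATE",
    0x10: "MAP_ANONYMOUS",
    0x100: "MAP_FIXED",
}


def decode_errno(hex_word):
    """Turn a dbx '/ X' word such as '00000013' into (decimal, symbolic name)."""
    number = int(hex_word, 16)
    # errno.errorcode is the local platform's table (assumption: it agrees
    # with PASE for the low-numbered values in this thread).
    return number, errno.errorcode.get(number, "unknown")


def decode_bits(value, table):
    """List the names of the bits set in value, using the given table."""
    names = [name for bit, name in sorted(table.items()) if value & bit]
    return names or ["0"]


if __name__ == "__main__":
    for word in ("00000002", "00000013"):
        print(word, decode_errno(word))
    print("prot ", decode_bits(0x1, PROT_BITS))
    print("flags", decode_bits(0x2, MAP_BITS))
```

    Feeding it the words from the transcript is a quick way to avoid mixing up hex and decimal when comparing against errno.h.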
    
  25. Former user Account Deleted

    We are geeks grasshopper, means, we script things when we do not want to type (make dbx a slave). I am using Python 2.7.5, so, I think 3.4 needs parentheses around print.

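    import subprocess
    # Drive dbx through a pipe: read its output line by line and answer each
    # stop with the commands we would otherwise type by hand.
    command_line = "/QOpenSys/usr/bin/dbx -d 100 /QOpenSys/usr/bin/git"
    args = command_line.split()
    process = subprocess.Popen(args, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    while True:
      result = process.stdout.readline()
      if not result:
        break  # dbx closed its output; do not loop forever
      if result.strip():
        print result.rstrip()
      if "reading symbolic" in result:
        # dbx is ready: set the breakpoints, then run 'git init'
        process.stdin.write("stopi in mmap64\n")
        process.stdin.write("stopi in open\n")
        process.stdin.write("run init\n")
      elif "stopped in open" in result:
        # $r3 holds the file name on entry to open
        process.stdin.write("print (char *)$r3\n")
        process.stdin.write("return\n")
      elif "stopped in mmap" in result:
        # dump every register so all mmap64 arguments are visible
        process.stdin.write("registers\n")
        process.stdin.write("cont\n")
      elif "stopped in" in result:
        # back in the caller: $r3 is the return value
        process.stdin.write("print $r3\n")
        process.stdin.write("cont\n")
      elif "execution completed" in result:
        process.stdin.write("quit\n")
        break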
    
    > python dbxme.py
    
  26. Aaron Bartell reporter

    Here's the entirety of the .git directory:

    $ ls -all .git/
    total 208
    drwxr-sr-x    6 myuser  0             12288 Mar  4 12:26 .
    drwxr-sr-x    3 myuser  0             12288 Mar  4 11:56 ..
    -rw-r--r--    1 myuser  0                23 Mar  4 11:56 HEAD
    drwxr-sr-x    2 myuser  0             12288 Mar  4 11:56 branches
    -rw-r--r--    1 myuser  0                36 Mar  4 11:56 config
    -rw-r--r--    1 myuser  0                 0 Mar  4 12:26 config.lock
    -rw-r--r--    1 myuser  0                73 Mar  4 11:56 description
    drwxr-sr-x    2 myuser  0             16384 Mar  4 11:56 hooks
    drwxr-sr-x    2 myuser  0             12288 Mar  4 11:56 info
    drwxr-sr-x    4 myuser  0             12288 Mar  4 11:56 refs
    

    We are geeks grasshopper, means, we script things when we do not want to type (make dbx a slave)

    I was about to try the same with an expect script (which I am not well versed in) so the python approach is much better. I will use that from now on.

    I read through all of your last post and I believe the only question you had was the results of ls, but let me know if you want more or for me to run it again with the dbxme.py.

  27. Former user Account Deleted

    We should really try dbxme.py, because previous dump missing full registers at mmap64 fail (not all parms known). In fact, I was only "guessing" mmap file descriptor was 0xf ($r7), based on your old posts (maybe config is innocent).

  28. Aaron Bartell reporter

    Here are the results from dbxme.py run on the machine with the Git issue.

    ##Commentary

    • Line 412 has a mmap error. This is interesting because the script didn't yet reach its stop in mmap.
    • Line 453 has mmap64 stop. Where in a procedure does it stop, at the beginning, at the end? Guessing at the beginning but wanted to be sure.
    • Given the error occurring on Line 412 was before the stop in mmap64 I am wondering if we need to add a stop for a read. Curious as to whether a read is being attempted when a file is no longer available (i.e. xxxx.lock files)

    Also, here's the ls of the .git directory after running dbxme.py:

    $ ls -all ~/gittest/.git
    total 200
    drwxr-sr-x    6 myuser  0             12288 Mar  8 11:58 .
    drwxr-sr-x    3 myuser  0             12288 Mar  8 11:58 ..
    -rw-r--r--    1 myuser  0                23 Mar  8 11:58 HEAD
    drwxr-sr-x    2 myuser  0             12288 Mar  8 11:58 branches
    -rw-r--r--    1 myuser  0                36 Mar  8 11:58 config
    -rw-r--r--    1 myuser  0                73 Mar  8 11:58 description
    drwxr-sr-x    2 myuser  0             16384 Mar  8 11:58 hooks
    drwxr-sr-x    2 myuser  0             12288 Mar  8 11:58 info
    drwxr-sr-x    4 myuser  0             12288 Mar  8 11:58 refs
    
  29. Former user Account Deleted

    Nuts!!! Well, training you to debug c code. We forgot a few things in dbxme.py (below). I added sbrk, as this will include heap memory to go with our mmap memory (out of memory, could be heap or map).

    import subprocess
    # Same pipe-driven dbx session as before, now also stopping in sbrk and
    # dumping errno after every return.
    command_line = "/QOpenSys/usr/bin/dbx -d 100 /QOpenSys/usr/bin/git"
    args = command_line.split()
    process = subprocess.Popen(args, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    while True:
      result = process.stdout.readline()
      if not result:
        break  # dbx closed its output; do not loop forever
      if result.strip():
        print result.rstrip()
      if "reading symbolic" in result:
        # dbx is ready: set the breakpoints, then run 'git init'
        process.stdin.write("stopi in mmap64\n")
        process.stdin.write("stopi in open\n")
        process.stdin.write("stopi in sbrk\n")
        process.stdin.write("run init\n")
      elif "stopped in open" in result:
        # $r3 holds the file name on entry to open
        process.stdin.write("print (char *)$r3\n")
        process.stdin.write("return\n")
      elif "stopped in glink.sbrk" in result:
        # $r3 holds the heap increment on entry to sbrk
        process.stdin.write("print $r3\n")
        process.stdin.write("return\n")
      elif "stopped in mmap" in result:
        # dump every register so all mmap64 arguments are visible
        process.stdin.write("registers\n")
        process.stdin.write("return\n")
      elif "stopped in" in result:
        # back in the caller: $r3 is the return value, then dump errno
        process.stdin.write("print $r3\n")
        process.stdin.write("0x2ff22ff8 / 4X\n")
        process.stdin.write("cont\n")
      elif "execution completed" in result:
        process.stdin.write("quit\n")
        break
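
    For anyone on Python 3 (see the print remark earlier in the thread), here is a rough port. It is a sketch, not the script that was actually run: the dispatch rules are copied from dbxme.py, but commands_for and drive are hypothetical names, and the errno address 0x2ff22ff8 is simply the one seen in this particular dbx session. Keeping the dispatch in its own function means it can be checked without dbx installed.

```python
import subprocess

# errno address observed in this thread's dbx session; it is specific to
# that process and is an assumption anywhere else.
ERRNO_ADDR = "0x2ff22ff8"


def commands_for(line):
    """Return the dbx commands to send in response to one line of dbx output."""
    if "reading symbolic" in line:
        return ["stopi in mmap64", "stopi in open", "stopi in sbrk", "run init"]
    if "stopped in open" in line:
        return ["print (char *)$r3", "return"]
    if "stopped in glink.sbrk" in line:
        return ["print $r3", "return"]
    if "stopped in mmap" in line:
        return ["registers", "return"]
    if "stopped in" in line:
        return ["print $r3", ERRNO_ADDR + " / 4X", "cont"]
    if "execution completed" in line:
        return ["quit"]
    return []


def drive(argv):
    """Run dbx, echo its output, and answer each stop using commands_for()."""
    proc = subprocess.Popen(
        argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    for line in proc.stdout:
        if line.strip():
            print(line.rstrip())
        commands = commands_for(line)
        for command in commands:
            proc.stdin.write(command + "\n")
        proc.stdin.flush()  # text-mode pipes are buffered in Python 3
        if commands == ["quit"]:
            break
    proc.stdin.close()
    proc.wait()


if __name__ == "__main__":
    drive(["/QOpenSys/usr/bin/dbx", "-d", "100", "/QOpenSys/usr/bin/git"])
```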
    
  30. Former user Account Deleted

    Mmm ... i dunno ... may look at the IBM i side of the file ...

    WRKLNK OBJ('/QOpenSys/ranger/home/RANGER/dbxme/.git/config')
    
    8     config                 STMF
    
    
                                   Display Attributes
    
     Object . . . . . . :   /QOpenSys/ranger/home/RANGER/dbxme/.git/config
    
     Creation date/time . . . . . . . . . . :   03/08/16  14:48:11
     Last access date/time  . . . . . . . . :   03/08/16  14:48:11
     Data change date/time  . . . . . . . . :   03/08/16  14:48:11
     Attribute change date/time . . . . . . :   03/08/16  14:48:11
    
     Size of object data in bytes . . . . . :   92
     Allocated size of object . . . . . . . :   8192
     File format  . . . . . . . . . . . . . :   *TYPE2
     Size of extended attributes  . . . . . :   0
     Storage freed  . . . . . . . . . . . . :   No
     Temporary object . . . . . . . . . . . :   No
     Disk storage option  . . . . . . . . . :   *NORMAL
     Main storage option  . . . . . . . . . :   *NORMAL
    
     Auditing value . . . . . . . . . . . . :   *NONE
    
  31. Aaron Bartell reporter

    May have found the culprit.... journaling.

    Creation date/time . . . . . . . . . . :   08/03/16  16:16:44       
    Last access date/time  . . . . . . . . :   08/03/16  16:16:44       
    Data change date/time  . . . . . . . . :   08/03/16  16:16:44       
    Attribute change date/time . . . . . . :   08/03/16  16:16:44  <-----
    
    . . .
    
    Auditing value . . . . . . . . . . . . :   *CHANGE    <----
    
     . . .
    
    Object is currently journaled  . . . . :   Yes                   
      Current or last journal  . . . . . . :   A1IJRA                
        Library  . . . . . . . . . . . . . :   MYLIB             
      Journal images . . . . . . . . . . . :   *AFTER                
      Journal entries to be omitted  . . . :   *OPNCLOSYN       
      Last journal start date/time . . . . :   08/03/16  16:16:44     <---- same time as 'Attribute change date/time'
      Partial Transactions:                                          
        Apply journaled changes required . :   No                    
        Rollback was ended . . . . . . . . :   No                    
      Starting journal receiver for apply  :                         
        Library  . . . . . . . . . . . . . :                         
        ASP Device . . . . . . . . . . . . : 
    

    I am going to run some tests on my system to see if that is the case.

  32. Aaron Bartell reporter

    ##Customer stopped replicating and now git init works as expected.

    I am now asking customer about how the vendor(n1) does replication so we learn whether it is something to do with vendor's approach or the IBM i journal feature.

    n1 - I will withhold the name to protect the (currently) innocent.

    FWIW, I tried the below (not sure if I setup correctly) and was not able to reproduce error.

    CRTJRNRCV JRNRCV(LIB1/DBX_JRN)
    
    CRTJRN JRN(LIB1/DBX_JRN) JRNRCV(LIB1/DBX_JRN)
    
    STRJRN OBJ(('/home/aaron/dbx_jrn' *INCLUDE)) JRN('/QSYS.LIB/lib1.lib/dbx_jrn.jrn') SUBTREE(*ALL)   
    
    ENDJRN OBJ(('/home/aaron/dbx_jrn')) SUBTREE(*ALL)
    
  33. Former user Account Deleted

    Yep-R-doodle ... you can NOT mmap a journal file. Hilarious, we just spent a week-or-so chasing a retentive IBM i administrator. These IBM i HA applications should come with a label like tobacco "use of this product on IFS files may kill your PASE application".

  34. Aaron Bartell reporter

    Does PowerHA for IBM i have this problem? I've looked at the documentation and it appears PowerHA does replication at a lower level than journaling, though it doesn't directly call it out from what I've seen.

    Is there a list of procedures, like mmap, that don't work with journaling? I reviewed a couple redbooks and sites but am coming up empty handed.

  35. Aaron Bartell reporter

    Further to my last question, I see mmap does document it doesn't work with journaling, so now I am wondering if there's a list of other APIs that have the same issue.

    Also, it appears we should have been given the ENOTSUP error given the mmap docs, snippet below.

    The mmap() function will fail with ENOTSUP if the file is journaled.

  36. Former user Account Deleted

    The mmap() function will fail with ENOTSUP if the file is journaled.

    PASE gives ENODEV.

    HA

    Every HA application potentially starts journal in IFS directories, which, may kill a PASE application due to mmap not allowed with IFS journal files. As far as i know, anything memory mapped file (mmap, shmat, etc.), is only prominent API failure in journal-my-IFS-world. IFS people clearly understand this issue, and, badness that is occurring in PASE kingdom (whining will not help).

    These IBM i HA applications should come with a label like tobacco "use of this product on IFS files may kill your PASE application".

    Welcome to IBM i, where administrators are king, everyone else is not. March, 8-9, 2016. The days Aaron became aware IFS journal files and mmap do not mix. Personally, i never remember, spend hours debugging (weeks for you), then remember to ask the client if they journal IFS files.
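
    For applications we control (Git is a different story), one way to survive a file that refuses to be mapped is to fall back to a plain read. The Python 3 sketch below is only an illustration under assumptions: read_bytes and UNMAPPABLE are made-up names, the errno set is just the two values mentioned in this thread (ENODEV from PASE, ENOTSUP from the mmap() documentation), and it has not been tried against a journaled IFS file.

```python
import errno
import mmap

# errno values treated as "this file cannot be mapped, read it instead".
# ENODEV is what PASE reported in this thread; ENOTSUP is what the quoted
# mmap() documentation names for journaled files.
UNMAPPABLE = {errno.ENODEV, errno.ENOTSUP}


def read_bytes(path, mapper=mmap.mmap):
    """Return the contents of a non-empty file, preferring mmap over read().

    mapper is injectable only so the fallback path can be exercised
    without a file that really refuses to be mapped.
    """
    with open(path, "rb") as f:
        try:
            with mapper(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
                return view[:]
        except OSError as exc:
            if exc.errno not in UNMAPPABLE:
                raise  # some other failure: do not hide it
        # Mapping was refused; the file offset is still 0, so read it all.
        return f.read()
```

    Any other OSError is re-raised on purpose, so a real out-of-memory or permission problem is not masked by the fallback.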

  37. Aaron Bartell reporter

    Every HA application potentially starts journal in IFS directories, which, may kill a PASE application due to mmap not allowed with IFS journal files.

    My understanding is that PowerHA replicates at the iASP level without the need for journals (though journals could still exist for SYSBAS). I am belaboring the point because we (KrengelTech) are noticing a lot more HA/DR usage over the years (we do more than open source) and given Git uses mmap, and given Git is a near necessity in development and deployment of PASE, well, this obviously is an issue if there are zero ways to do HA.

  38. Former user Account Deleted

    I am a PASE guy (and open source), go ask your HA product expert about exact technology implementations.

  39. Aaron Bartell reporter

    go ask your HA product expert about exact technology implementations.

    Will do.

    Thanks for going deep on this one. I learned a boat-load. I plan on documenting a tutorial on what we accomplished so others can learn from it.

    I am marking this issue as resolved.

  40. Chris Hird

    Aaron

    OK just read through the entire entry and see that journalling is going to be a problem for any PASE based application using mmap(). Not sure it's a game breaker for most as I would hope that the majority are not going to use mmap functions against files/objects that you would want to replicate. (You should not be replicating everything, that's just bad practice).

    Anyhow there are a few ways around the issue that I know of and we have used at some clients with our HA4i product.

    1. Drop back down to object level replication which is triggered by the auditing flag (this is not journalling as we know it for logical replication, and if it does affect mmap you have a bigger problem, as it is something most auditors are going to require). The problem with this approach is file locking; again there are alternatives to ensure the files do eventually get replicated, but when the change notification is fired into the audit journal it could be that the file is still locked by some process.
    2. Build a process around the CPY API which could copy the object to a temporary object (seems to ignore locks in IFS) and then reverse that on the target.
    3. Investigate using rsync??

    So we can replicate IFS without journalling, the choices available are more than sufficient to make it a non issue in my mind.

    Chris...

  41. Former user Account Deleted

    Hi fellows ... few adds from a evil PASE guy.

    1) The IFS journal issue is a problem for API mmap (above), and, also the AIX API shmat.

    2) We tried out rsync in past, works just fine. You can find one on Perzl, and, IBM will likely PTF one someday.

    3) Speaking to "depth" of problem. Unfortunately, as Open Source products strive for "performance" related to files (IFS), they inevitably use memory mapped files. Worse, product may change implementation in a heart beat minor version. So, yes, i suspect journal IFS issue will 'pop up' in open source products 'randomly'.

  42. Aaron Bartell reporter

    2) We tried out rsync in past, works just fine. You can find one on Perzl, and, IBM will likely PTF one someday.

    For the archives...

    The rsync command arrived earlier this spring. Learn more here

    Here's how I implemented it for a customer that had the mmap issue because of IFS Journaling:

    $ cat /REPLICATE/replicate.sh
    #!/QOpenSys/usr/bin/sh
    
    SECONDS=0
    mkdir -p /REPLICATE
    echo "Replicating /QOpenSys/ibmichroot_spaces/git-server/repos"
    rsync -a --delete /QOpenSys/ibmichroot_spaces/git-server/repos /REPLICATE
    
    echo "Replicating /home"
    rsync -a --delete /home /REPLICATE
    
    echo "Replicating /www"
    rsync -a --delete /www /REPLICATE
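
    The same three rsync calls can also be driven from a list, which makes it easier to add directories and to stop on the first failure. This Python 3 variant is a sketch, not what was deployed at the customer: rsync_command and replicate are made-up names, while the paths and the -a --delete flags are copied from replicate.sh above. The run parameter exists only so the loop can be checked without rsync installed.

```python
import subprocess

# Source trees and target directory copied from replicate.sh above.
SOURCES = [
    "/QOpenSys/ibmichroot_spaces/git-server/repos",
    "/home",
    "/www",
]
TARGET = "/REPLICATE"


def rsync_command(source, target=TARGET):
    """Build the argv for one 'rsync -a --delete SOURCE TARGET' call."""
    return ["rsync", "-a", "--delete", source, target]


def replicate(sources=SOURCES, target=TARGET, run=subprocess.check_call):
    """Run rsync for each source in order.

    With the default run=subprocess.check_call, a non-zero rsync exit
    status raises CalledProcessError and the remaining sources are skipped.
    """
    for source in sources:
        print("Replicating " + source)
        run(rsync_command(source, target))


if __name__ == "__main__":
    replicate()
```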
    
  43. Chris Hird

    There is another option for those who still have SNA (need Object Connect installed as well) use the SAVRST command which does a save and restore in one command. I found it to be very slow so we built our own internal product to do the same things and it uses TCP/IP (So much faster).

  44. Former user Account Deleted

    Thanks Aaron.

    I see Kevin already PTF'd rsync. Kevin Adler is a great new face for IBM Open Source and PASE. Incredibly talented.

    Also, Jesse Gorzinski, IBM Open Source architect is doing a fantastic job trying to work a good chroot friendly path to Open Source packaging based on our yum/rpm work last year.

    I hope you had time to meet with both of these guys at Common.

    Unapologetic plug ... i am working on a new PASE DB2 super driver (another libdb400.a). I hope to include current db2 support (slip under), and possibly all toolkit functions JSON based. If it works (expect yes), we will have a very fast alternative to XMLSERVICE (*). I am doing in Open db2sock, so there will be no more mysteries about DB2 and PASE interactions.

    (*) BTW -- I originally did XMLSERVICE as a fun RPG refresher project in my spare time. XMLSERVICE has grown way beyond original intent. I will maintain the old XMLSERVICE. However, about time to replace it with better technology anyway.

  45. Chris Hird

    Tony

    I would like to test and help where I can. Deleted last post after I found embedded link with information. Let me know where I can help.

    Chris...

  46. Former user Account Deleted

    Outstanding! Your help testing, especially a healthy dose of checking against performance goals (yours), would be fantastic. Please open an issue on db2sock, you can say something like checking new functions for performance and ease of use, whatever. We can carry on an open conversation about what is liked or not liked, to everyone's benefit.

    Caution: The only thing I am not allowed to do is compare this open source project to another commercial product. All fine, if you wish to do some of that compare, even write anything you like positive or negative, but we must never put good vendors or hopefully an innocent open source project into VP level war. I am just a geek, have skills, think I can help make a better world with IBM i. I am not a VP. Ok, said my piece brother IBM i guy, let's make PASE something better.

  47. Chris Hird

    Sounds Good! Will attempt to get started ASAP.. Reviews will always be sensitive and considerate of the audience/possible audience. Making things better is the only goal I have :-)

  48. Former user Account Deleted

    Hey, another side note. I built a Javascript tool to help generate php toolkit calls from RPG source. Maybe take a look and see what you think. Yes, this is only PHP, but I am thinking of Open Sourcing the code to help build other toolkits (including the new json based db2sock when finished).

    Both versions are anchored at XMLSERVICE -> PHP Toolkit

    Old version rpg D spec only

    New version cobol, rpg D spec, RPG free 2 php toolkit -- added cobol and RPG free

    Aaron, if you want to open a project in litmis, we could publish the Javascript code.

    Security note: Everything runs in the browser (Javascript). The customer RPG cut/paste never goes through the server. Check the new code in your browser source/debug Javascript window, you will see the html form never leaves the browser.

  49. Aaron Bartell reporter

    Aaron, if you want to open a project in litmis, we could publish the Javascript code.

    New itoolkit-generator repo. Tony, you have admin rights. Let me know who else needs them or add people yourself.

    Did you want me to put the code in there or did you want to? I am willing, just didn't know if there were additional things to be aware of.

  50. Former user Account Deleted

    Thanks Aaron. Please put your standard LICENSE text file into the project only. I will then clone and add my Javascript code and the little index.html (only two parts, maybe split someday).

  51. Former user Account Deleted

    Chris, How much time do you have to play around with db2sock? Also when would you like to start?

    The db2sock project ... at moment i have only minimum JSON working as DB2 calls to Apache w/Basic Authentication (see php tests). This will work for any of your favourite languages Node, python, php, ruby, java, even just curl. I write code in any of these languages, is php ok?

    Also, to be clear the JSON will transport directly on a DB2 driver as well, but we have to modify the db2 driver for the language to call the new DB2 CLI 'semi-architecture' API SQL400Json. Aka, REST is slower, Apache and all, BUT direct driver call fast through SQL400Json (obviously). However, again, we would need to modify one of the drivers like php ibm_db2 to make the JSON calls directly. Again php ok (ibm_db2)?

    I changed the makefile to build both ILE (RPG) and PASE c code from same make file in a chroot. I also put a copy of the pase driver on yips (link in project). However, may be difficult to compile the ILE parts without all the gmake stuff. Do you need a little script to compile this part without make???

    Last, IF (big if), you have time soon ... I can switch back to finishing the SQL400Json calls to new toolkit. The new toolkit will look nothing like the old xmlservice. In fact 90% PASE c code with just a slim(ish) RPG stored procedure as target of SQL400Json API. However, i do not have this in the project yet (just on my pc and 400). SO ... if you have time ... i want to know when, so i can put all that stuff up into the project. Again, when?

  52. Chris Hird

    Tony

    My main focus at the moment is php. I am in the process of developing a new PHP quotation system for a client so time is a little constrained, but willing to eke out as much as I can to help. I am not coding in RPG (not a focus area of mine, I can do limited RPG programming, but I am no expert!) I favor C for my development on IBM i (ILE or PASE) because most of the time I am not working on applications where RPG would give me the benefits (C is very good at the low level stuff especially the API's and pointer stuff).

    I am OK with hacking the compiles as long as I know the parms etc. But if you have a script that would be very helpful (that's where I would go eventually).

    I can start as soon as you need me to.. I may not be able to spend days at a time but I do enjoy this enough to make long days long enough to do something. I also want to look at the cross compiler capabilities in gcc so I have lots of things to squeeze into my long days :-)

    I have a number of LPARs on my system with one dedicated to Open Source projects, chroot is there and all of the other items I need. It's kept up to date so should be a good testing ground.

    Chris...

  53. Former user Account Deleted

    I am not coding in RPG (not a focus area of mine

    No problem man. I am expert in c/C++ and RPG (and RPG free). Also good in javascript, python, php, node, ruby, perl, shell scripts, java, and many more languages you have not likely seen (i am a old dude).

    Frankly, I write some things like XMLSERVICE in RPG just so RPG people feel they are not abandoned by IBM Rochester folks. Actually, I like RPG very much, but i am trained as a c/C++ developer into 400 kernel/PASE for many, many, many years.

    I do enjoy this enough to make long days long enough to do something

    Ok. If you open an issue in the db2sock project, i can notify via append that the SQL400Json stuff is ready to test (new toolkit).

    I also want to look at the cross compiler capabilities in gcc

    Oh my! I think I saw a Linux cat! (Tweety Bird cartoon).

    Optional read ...

    I am PASE guy, so i do everything on 400 PASE. However, my laptop(s) both work and home are Linux for over twenty years (can't hardly use Windows at all, just for Turbo Tax).

    PASE gcc compile is messy ...

    Yeah, yeah, we know, messy to set up 'compile environment' with perzl chroot/pkg scripts. PASE ninja turtles back at Rochester understand (*).

    (*) Opinions are my own, do not reflect IBM plans or promises. See Jesse Gorzinski for IBM i Open Source plans. I am just a PASE geek Dude.
