fatal: Out of memory? mmap failed: No such device
I have installed ibmichroot on a customer's machine. I created five custom chroot environments, one for each developer. Each environment is Node.js + Git.
Git worked great for a couple of weeks for all developers, and then two days ago it started throwing the error message below in every chroot environment, including one that hasn't been in use since we originally created it.
fatal: Out of memory? mmap failed: No such device
I've had this issue before with other customers and it was remedied by creating chroot environments (seclusion and selection of exact binaries and libs), so now I am scratching my head on how five separate chroot environments all started getting the same Git error at the same time.
This occurs for all Git commands (e.g. git init in a new folder, git status for an existing repo, etc.) in all the chroot environments. Further, Git is not installed outside of chroot (shouldn't make a difference, but wanted to note).
Google searches have turned up a couple of things, but none of them have resolved the issue:
- removing the ~/.gitconfig file
- git gc
- git fsck
## Thoughts on how to further debug this?
Comments (61)
-
Account Deleted BTW -- related link refers to a different version of git out of memory. Therefore, ignoring our message about mmap, maybe git-y-up needs more heap LDR_CNTRL=MAXDATA=0x60000000 (try less heap first). Anyway, LDR_CNTRL has many settings, including DSA settings to reclaim the shared object segments away from shared libx.a/thing.so (D, F segments). Aka, LDR_CNTRL experimentation may work for you, before jumping to the git 64-bit train.
-
reporter Tried both heap sizes and neither fixed the issue.
All above fails, we need to debug, easy way, you need my help, and, access to STRSST to look into the kernel (Muah-ha-ha-ha-ha!).
How should we proceed if I can only reproduce the error on the customer's machine? Should I follow these autopsy/STRSST steps you've authored on YiPs? This is a production machine.
-
Account Deleted Try this setting ...
$ export LDR_CNTRL=MAXDATA=0@DSA
$ git (operation that fails)
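For anyone who wants to cycle through the LDR_CNTRL settings without retyping exports, here is a minimal sketch. It assumes Python 3.5+ is available in the chroot. The helper names are hypothetical, and the smaller MAXDATA value is only an assumed example of "less heap", not a value given in this thread.

```python
import os
import subprocess

# Settings to try. Two come from this thread; the smaller MAXDATA value is an
# assumption standing in for "try less heap first".
CANDIDATES = (
    "MAXDATA=0x20000000",  # assumption: a smaller heap to try first
    "MAXDATA=0x60000000",  # from the thread
    "MAXDATA=0@DSA",       # from the thread
)


def env_with_ldr_cntrl(setting, base_env=None):
    """Return a copy of the environment with LDR_CNTRL set to `setting`."""
    env = dict(os.environ if base_env is None else base_env)
    env["LDR_CNTRL"] = setting
    return env


def try_settings(command, settings=CANDIDATES):
    """Run `command` once per setting; return {setting: (exit_code, stderr)}."""
    results = {}
    for setting in settings:
        proc = subprocess.run(
            command,
            env=env_with_ldr_cntrl(setting),
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
        results[setting] = (proc.returncode, proc.stderr.strip())
    return results
```

Calling try_settings(["git", "status"]) inside a repository would show, per setting, whether the exit code or the error text changes.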
-
reporter Same error with
export LDR_CNTRL=MAXDATA=0@DSA
-
Account Deleted Ok, something bad in git-y-up requires debug. I am going home for the weekend, ping me Monday.
-
reporter ping me Monday.
$ ping ranger.rochester.ibm.com
What would you recommend for next steps?
-
Account Deleted Well, let's see if we can make you into a pase object code debug person (should be interesting).
The following zzmini example demonstrates dbx stop in libc.a export syscall mmap. In your case, of course, substitute git as main program, not zzmini. As you can see, dbx stops at mmap start every time it is called (only once in this example), then I use stepi to instruction step (assembler step) until it reaches the actual branch to system call mmap kernel/slic (4e800420 bctr), then one more stepi and the system call kernel takes over (we do not get to see the kernel), and the return is back into our main program (git or zzmini). We can see the address of the file mmap in $r3=0x30000000, meaning the mmap was successful. In your case, I would expect to see $r3=0xffffffff or -1, indicating the git mmap fails. After seeing 0xffffffff (-1), we can find the address of errno at 0x2ff22ff8 and dump some memory via dbx 0x2ff22ff8 / X; the first 4 bytes are the errno in hex (see PASE file /usr/include/errno.h), where we would see our ENOMEM.
So, I am assuming git will open a ton of files, which leads us to the eventual mmap fail. So, debug skills 101, make sure you record (on paper) each stop mmap return address $r3=0xnnnnnnnn, thereby we can watch as git slowly fills all the memory of a 32 bit process. This MAY be a loooooooooong process, wherein you may want to fall asleep at the wheel, but stiff upper lip grasshopper, you must suffer for your answer. Or, then again, maybe something happens quickly, and we still have an answer.
At this point, with all your good data recorded, maybe we have an ah-ha moment. Or perhaps, with dbx stuck at ENOMEM, we MAY want to look at STRSST, and peek around in the kernel. You will need my help for that ...
Good luck new prince of c code.
bash-4.3$ dbx zzmini
Type 'help' for help.
reading symbolic information ...
(dbx) stopi in mmap <--- set my assembler stop (object code libc.a)
[1] stopi in glink.mmap
(dbx) cont
[1] stopped in glink.mmap at 0x100006a0
0x100006a0 (mmap) 81820058 lwz r12,0x58(r2)
(dbx) stepi
stopped in glink.mmap at 0x100006a4
0x100006a4 (mmap+0x4) 90410014 stw r2,0x14(r1)
(dbx) stepi
stopped in glink.mmap at 0x100006a8
0x100006a8 (mmap+0x8) 800c0000 lwz r0,0x0(r12)
(dbx) stepi
stopped in glink.mmap at 0x100006ac
0x100006ac (mmap+0xc) 804c0004 lwz r2,0x4(r12)
(dbx) stepi
stopped in glink.mmap at 0x100006b0
0x100006b0 (mmap+0x10) 7c0903a6 mtctr r0
(dbx) stepi
stopped in glink.mmap at 0x100006b4
0x100006b4 (mmap+0x14) 4e800420 bctr
(dbx) stepi <--- i am going into kernel/slic (and, you do not get to see that, you, you, user)
stopped in main at 0x10000454 <--- i am back in your program (zzmini or git)
0x10000454 (main+0xd4) 80410014 lwz r2,0x14(r1)
(dbx) registers
$r0:0x00003608 $stkp:0x2ff22b80 $toc:0x00415e7d $r3:0x30000000 <- $r3 where mapped file, 0xffffffff fail
$r4:0x00000031 $r5:0x00000000 $r6:0x00000000 $r7:0x00000008
$r8:0x80556000 $r9:0x2200000a $r10:0x051f8000 $r11:0x051f8f30
$r12:0x0000f032 $r13:0xdeadbeef $r14:0x00000001 $r15:0x2ff22ce0
$r16:0x2ff22ce8 $r17:0xdeadbeef $r18:0xdeadbeef $r19:0xf0174f6c
$r20:0xdeadbeef $r21:0xdeadbeef $r22:0xdeadbeef $r23:0xdeadbeef
$r24:0xdeadbeef $r25:0xdeadbeef $r26:0xdeadbeef $r27:0x0000000a
$r28:0xf010cb70 $r29:0xd010cd80 $r30:0x00000003 $r31:0x100007b8
$iar:0x10000454 $msr:0x0002f032 $cr:0x2200000a $link:0x10000454
$ctr:0x30000000 $xer:0x34000000 $mq:0x00000000
Condition status = 0:e 1:e 7:le
[unset $noflregs to view floating point registers]
[unset $novregs to view vector registers]
in main at 0x10000454
0x10000454 (main+0xd4) 80410014 lwz r2,0x14(r1)
(dbx) print &errno
0x2ff22ff8
(dbx) 0x2ff22ff8 / 10X <-- dump errno (1st four hex bytes are the error number in /usr/include/errno.h)
0x2ff22ff8: 00000000 2ff22ff8 00000000 00000000
0x2ff23008: 00000000 00000000 00000000 00000000
0x2ff23018: 00000000 00000000
(dbx) quit
bash-4.3$
-
Account Deleted Oh, one trick with dbx. If your git has parameters (> git do this here thing tex), then you need to start dbx with just the main program, set the stop, then run with parameters.
bash-4.3$ dbx zzmini
Type 'help' for help.
reading symbolic information ...
(dbx) stopi in mmap
[1] stopi in glink.mmap
(dbx) run do this here tex
[1] stopped in glink.mmap at 0x100006a0
0x100006a0 (mmap) 81820058 lwz r12,0x58(r2)
(dbx)
-
Account Deleted Oh one more ... dbx may have nest depth issue on your open source git project ... use -d 100.
bash-4.3$ dbx -d 100 zzmini
Type 'help' for help.
reading symbolic information ...
(dbx) stopi in mmap
[1] stopi in glink.mmap
(dbx) run do this here tex
[1] stopped in glink.mmap at 0x100006a0
0x100006a0 (mmap) 81820058 lwz r12,0x58(r2)
(dbx) cont
i am not a frog. i am a toad. so there. sniffle.
execution completed
(dbx) quit
-
reporter Thanks for guiding this adventure. My first hopping of grass:
% mkdir git_dbx && cd git_dbx
% dbx -d 100 git
Type 'help' for help.
cannot read git
enter object file name (default is `a.out', ^D to exit): libc.a <---- tried libc.a as a wild guess
cannot read libc.a
enter object file name (default is `a.out', ^D to exit): <---- tried a.out
cannot read a.out
I am reading through the AIX dbx docs to see what I can try next.
NOTE: I am trying this on my machine, where git works, before trying on customer's machine.
-
reporter Further to previous post, it appears dbx needs git to be compiled with the -g option. I reviewed Perzl's build docs and it doesn't appear he uses it.
-
Account Deleted No, no, no, bad grasshopper, you give up too soon.
Further to previous post, it appears dbx needs git to be compiled with the -g option
Forget -g, is ONLY source level debug, aka, wimps that need source code debugging. We are teaching 'real man' skills here, no bloody source c code, only binary level assembler debugging.
cannot read git
You need the full path to git (relative will not work), again, manly debugging grasshopper. Also make very sure the LIBPATH is set correctly, because these objects actually are relative.
dbx -d 100 /opt/freeware/bin/git ... so on ...
-
reporter See below line with "<-------" in it.
% echo $PATH
/opt/freeware/bin:/QOpenSys/usr/bin:/usr/ccs/bin:/QOpenSys/usr/bin/X11:/usr/sbin:.:/usr/bin:/home/AARON/bin
% echo $LIBPATH
/opt/freeware/lib
% dbx -d 100 /opt/freeware/bin/git
Type 'help' for help.
reading symbolic information ...warning: no source compiled with -g
(dbx) stopi in mmap
"mmap" is not a subprogram <------- Guessing this is a problem. Have I set my LIBPATH correctly?
(dbx) run init
Initialized empty Git repository in /home/aaron/git_dbx/.git/
execution completed
(dbx)
-
Account Deleted Uf Da!!!! Silly AIX OFF_MAX large file workarounds bite average user (again), try stopi in mmap64 (below). Yes, misleading, because your git program is still 32 bit ... argh ... AIX OFF_MAX monkey business.
bash-4.3$ dbx /QOpenSys/usr/bin/git
Type 'help' for help.
reading symbolic information ...
(dbx) stopi in mmap64
[1] stopi in mmap64
(dbx) where
__start() at 0x10000128
(dbx)
-
reporter Now we're cooking with oil. Below is what I run for commands.
export PATH=/opt/freeware/bin:$PATH
export LIBPATH=/opt/freeware/lib
dbx -d 100 /opt/freeware/bin/git
stopi in mmap64
run init
return <------ Loop through return, registers, cont commands until error occurs.
registers
cont
print &errno <----- now get errno and use the resulting value on next line
0x2ff22ff8 / 10X <-- dump errno (1st four hex bytes are the error number in /usr/include/errno.h)
Here's what the full session looks like.
$ ssh -o ServerAliveInterval=5 aaron@ibmi
% export PATH=/opt/freeware/bin:$PATH
export LIBPATH=/opt/freeware/lib
dbx -d 100 /opt/freeware/bin/git
Type 'help' for help.
reading symbolic information ...warning: no source compiled with -g
(dbx) stopi in mmap64
[1] stopi in mmap64
(dbx) run init
[1] stopped in mmap64 at 0x20280400 ($t1)
0x20280400 (mmap64) 7c0802a6 mflr r0
(dbx) registers
$r0:0x20280400 $stkp:0x2ff22530 $toc:0x207aba88 $r3:0x00000000
$r4:0x00000024 $r5:0x00000001 $r6:0x00000002 $r7:0x0000000f
$r8:0x00000000 $r9:0x00000000 $r10:0x078f8000 $r11:0x078f8f30
$r12:0x207a7a3c $r13:0xdeadbeef $r14:0x00000002 $r15:0x2ff22d20
$r16:0x3003c030 $r17:0x101a907c $r18:0x00000005 $r19:0x101a90f4
$r20:0x101a90ec $r21:0x2ff22970 $r22:0x101a90cc $r23:0x101a90d8
$r24:0x00000000 $r25:0x00000024 $r26:0x3003bff0 $r27:0x00000000
$r28:0x101a8f78 $r29:0x3003bff0 $r30:0x30009be8 $r31:0x00000024
$iar:0x20280400 $msr:0x0002f032 $cr:0x84203088 $link:0x10045554
$ctr:0x20280400 $xer:0x04000000
Condition status = 0:l 1:g 2:e 4:eo 6:l 7:l
[unset $noflregs to view floating point registers]
[unset $novregs to view vector registers]
in mmap64 at 0x20280400 ($t1)
0x20280400 (mmap64) 7c0802a6 mflr r0
(dbx) cont
[1] stopped in mmap64 at 0x20280400 ($t1)
0x20280400 (mmap64) 7c0802a6 mflr r0
(dbx) registers
$r0:0x20280400 $stkp:0x2ff22530 $toc:0x207aba88 $r3:0x00000000
$r4:0x00000035 $r5:0x00000001 $r6:0x00000002 $r7:0x0000000f
$r8:0x00000000 $r9:0x00000000 $r10:0x078f8000 $r11:0x078f8f30
$r12:0x207a7a3c $r13:0xdeadbeef $r14:0x00000002 $r15:0x2ff22d20
$r16:0x3003c1d0 $r17:0x101a907c $r18:0x00000005 $r19:0x101a90f4
$r20:0x101a90ec $r21:0x2ff22970 $r22:0x101a90cc $r23:0x101a90d8
$r24:0x00000000 $r25:0x00000035 $r26:0x3003bff0 $r27:0x00000000
$r28:0x101a8f80 $r29:0x3003bff0 $r30:0x30009be8 $r31:0x00000035
$iar:0x20280400 $msr:0x0002f032 $cr:0x86203088 $link:0x10045554
$ctr:0x20280400 $xer:0x04000000
Condition status = 0:l 1:ge 2:e 4:eo 6:l 7:l
[unset $noflregs to view floating point registers]
[unset $novregs to view vector registers]
in mmap64 at 0x20280400 ($t1)
0x20280400 (mmap64) 7c0802a6 mflr r0
(dbx) cont
[1] stopped in mmap64 at 0x20280400 ($t1)
0x20280400 (mmap64) 7c0802a6 mflr r0
(dbx) registers
$r0:0x20280400 $stkp:0x2ff22530 $toc:0x207aba88 $r3:0x00000000
$r4:0x00000043 $r5:0x00000001 $r6:0x00000002 $r7:0x0000000f
$r8:0x00000000 $r9:0x00000000 $r10:0x078f8000 $r11:0x078f8f30
$r12:0x207a7a3c $r13:0xdeadbeef $r14:0x00000002 $r15:0x2ff22d20
$r16:0x3003c650 $r17:0x101a907c $r18:0x00000005 $r19:0x101a90f4
$r20:0x101a90ec $r21:0x2ff22970 $r22:0x101a90cc $r23:0x101a90d8
$r24:0x00000000 $r25:0x00000043 $r26:0x3003c610 $r27:0x00000000
$r28:0x101a8f78 $r29:0x3003c610 $r30:0x30009be8 $r31:0x00000043
$iar:0x20280400 $msr:0x0002f032 $cr:0x86203088 $link:0x10045554
$ctr:0x20280400 $xer:0x04000000
Condition status = 0:l 1:ge 2:e 4:eo 6:l 7:l
[unset $noflregs to view floating point registers]
[unset $novregs to view vector registers]
in mmap64 at 0x20280400 ($t1)
0x20280400 (mmap64) 7c0802a6 mflr r0
(dbx) cont
[1] stopped in mmap64 at 0x20280400 ($t1)
0x20280400 (mmap64) 7c0802a6 mflr r0
(dbx) registers
$r0:0x20280400 $stkp:0x2ff22530 $toc:0x207aba88 $r3:0x00000000
$r4:0x0000005c $r5:0x00000001 $r6:0x00000002 $r7:0x0000000f
$r8:0x00000000 $r9:0x00000000 $r10:0x078f8000 $r11:0x078f8f30
$r12:0x207a7a3c $r13:0xdeadbeef $r14:0x00000002 $r15:0x2ff22d20
$r16:0x3003c610 $r17:0x101a907c $r18:0x00000005 $r19:0x101a90f4
$r20:0x101a90ec $r21:0x2ff22970 $r22:0x101a90cc $r23:0x101a90d8
$r24:0x00000000 $r25:0x0000005c $r26:0x3003bff0 $r27:0x00000000
$r28:0x101a8f78 $r29:0x3003bff0 $r30:0x30009be8 $r31:0x0000005c
$iar:0x20280400 $msr:0x0002f032 $cr:0x86203088 $link:0x10045554
$ctr:0x20280400 $xer:0x04000000
Condition status = 0:l 1:ge 2:e 4:eo 6:l 7:l
[unset $noflregs to view floating point registers]
[unset $novregs to view vector registers]
in mmap64 at 0x20280400 ($t1)
0x20280400 (mmap64) 7c0802a6 mflr r0
(dbx) cont
Initialized empty Git repository in /home/aaron/git_dbx/.git/
execution completed
(dbx) registers
$r0:0x00000000 $stkp:0x2ff22b00 $toc:0x207aba88 $r3:0x00000000
$r4:0x30034384 $r5:0x00000008 $r6:0x00000006 $r7:0x00000000
$r8:0x100ce013 $r9:0x100ce013 $r10:0x078f8000 $r11:0x00000000
$r12:0x2007c3bc $r13:0xdeadbeef $r14:0x00000002 $r15:0x2ff22d20
$r16:0x2ff22d2c $r17:0x00000000 $r18:0xdeadbeef $r19:0xdeadbeef
$r20:0xdeadbeef $r21:0xdeadbeef $r22:0xdeadbeef $r23:0xdeadbeef
$r24:0xdeadbeef $r25:0x2073a1c0 $r26:0x20739f20 $r27:0x00000000
$r28:0x00000000 $r29:0x300064fc $r30:0x207ccf54 $r31:0xffffffff
$iar:0x200828a8 $msr:0x0002f032 $cr:0x28200086 $link:0x2007c3c8
$ctr:0x20425d00 $xer:0x04000002
Condition status = 0:e 1:l 2:e 6:l 7:ge
[unset $noflregs to view floating point registers]
[unset $novregs to view vector registers]
in mmap64 at 0x200828a8 ($t1)
0x200828a8 (_exit) 81820948 lwz r12,0x948(r2)
I will now have this run on the machine where Git isn't working. Stay tuned.
-
Account Deleted Reminder -- we are looking for why git throws the error 'out of memory', so remember to record $r3 mmap64 locations. We want to see if git chews up all the memory ...
(dbx) registers $r0:0x00003608 $stkp:0x2ff22b80 $toc:0x00415e7d $r3:0x30000000 <- $r3 where mapped file, 0xffffffff fail
-
reporter I've updated my previous post to include registers at the end.
-
Account Deleted You need registers every stop, so we can see where/when starts to go wrong. No short cuts grasshopper, and, remember to write down the addresses returned from mmap until 0xffffffff is returned (-1).
dbx>cont
dbx>registers
dbx>cont
dbx>registers
... so on ...
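Instead of writing each address down on paper, the recorded session can be scanned afterwards. The sketch below is only an illustration. It assumes the dbx session was saved to text and that each register dump was taken after the call returned, so that $r3 holds the result. The function names are hypothetical.

```python
import re

# Matches "$r3:0x30000000" but not "$r30:..." because the colon must follow "r3".
R3_PATTERN = re.compile(r"\$r3:(0x[0-9a-fA-F]{8})")
MAP_FAILED = 0xFFFFFFFF  # (void *)-1 as seen in a 32-bit process


def r3_values(log_text):
    """Return every $r3 value found in a saved dbx session, in order."""
    return [int(value, 16) for value in R3_PATTERN.findall(log_text)]


def first_failure(values):
    """Return the index of the first 0xffffffff value, or None if none failed."""
    for index, value in enumerate(values):
        if value == MAP_FAILED:
            return index
    return None
```

Printing the list as hex (for example with "0x%08x" % value) gives the same column of addresses one would otherwise record by hand, and first_failure says which stop to look at.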
-
reporter Sounds good. I've updated the previous post.
-
reporter Below is a dbx session of the failed git attempt.
Above you focused on register $r3 but it appears $r29 and $r30 are where it falls apart for me. How does one know which register is the return code for mmap64?
Also, do you have a good resource for me to read that would give me more insight as to why I am looking for particular things? I am reading the AIX docs but those more tell me how to use the commands and not what to watch for concerning issues. A lot of the other dbx docs are Oracle based and I am hesitant to pursue those because I don't know whether they are in the same vein as AIX or not.
$ mkdir git_dbx
$ cd git_dbx/
$ export PATH=/opt/freeware/bin:$PATH
$ export LIBPATH=/opt/freeware/lib
$ dbx -d 100 /opt/freeware/bin/git
Type 'help' for help.
reading symbolic information ...warning: no source compiled with -g
(dbx) stopi in mmap64
[1] stopi in mmap64
(dbx) run init
[1] stopped in mmap64 at 0xde2555e0 ($t1)
0xde2555e0 (mmap64) 7c0802a6 mflr r0
(dbx) registers
$r0:0xde2555e0 $stkp:0x2ff22520 $toc:0xf193ea50 $r3:0x00000000
$r4:0x00000024 $r5:0x00000001 $r6:0x00000002 $r7:0x0000000f
$r8:0x00000000 $r9:0x00000000 $r10:0x04131000 $r11:0x04131f30
$r12:0xf193aecc $r13:0xdeadbeef $r14:0x00000002 $r15:0x2ff22d18
$r16:0x3003d230 $r17:0x101a907c $r18:0x00000005 $r19:0x101a90f4
$r20:0x101a90ec $r21:0x2ff22960 $r22:0x101a90cc $r23:0x101a90d8
$r24:0x00000000 $r25:0x00000024 $r26:0x3003d1f0 $r27:0x00000000
$r28:0x101a8f78 $r29:0x3003d1f0 $r30:0x30009be8 $r31:0x00000024
$iar:0xde2555e0 $msr:0x0002f032 $cr:0x84203089 $link:0x10045554
$ctr:0xde2555e0 $xer:0x04000000 $mq:0x00000000
Condition status = 0:l 1:g 2:e 4:eo 6:l 7:lo
[unset $noflregs to view floating point registers]
[unset $novregs to view vector registers]
in mmap64 at 0xde2555e0 ($t1)
0xde2555e0 (mmap64) 7c0802a6 mflr r0
(dbx) cont
[1] stopped in mmap64 at 0xde2555e0 ($t1)
0xde2555e0 (mmap64) 7c0802a6 mflr r0
(dbx) registers
$r0:0xde2555e0 $stkp:0x2ff22520 $toc:0xf193ea50 $r3:0x00000000
$r4:0x00000024 $r5:0x00000001 $r6:0x00000002 $r7:0x0000000f
$r8:0x00000000 $r9:0x00000000 $r10:0x04131000 $r11:0x04131f30
$r12:0xf193aecc $r13:0xdeadbeef $r14:0x00000002 $r15:0x2ff22d18
$r16:0x3003d230 $r17:0x101a907c $r18:0x00000005 $r19:0x101a90f4
$r20:0x101a90ec $r21:0x2ff22960 $r22:0x101a90cc $r23:0x101a90d8
$r24:0x00000000 $r25:0x00000024 $r26:0x3003d1f0 $r27:0x00000000
$r28:0x101a8f78 $r29:0xffffffff $r30:0xffffffff $r31:0x00000024 <---------- Falls apart?
$iar:0xde2555e0 $msr:0x0002f032 $cr:0x33203089 $link:0x100455a0
$ctr:0xde2555e0 $xer:0xf4ffffff $mq:0x00000000
Condition status = 0:eo 1:eo 2:e 4:eo 6:l 7:lo
[unset $noflregs to view floating point registers]
[unset $novregs to view vector registers]
in mmap64 at 0xde2555e0 ($t1)
0xde2555e0 (mmap64) 7c0802a6 mflr r0
(dbx) cont
fatal: Out of memory? mmap failed: No such device
execution completed (exit code 128)
(dbx) registers
$r0:0x00000000 $stkp:0x2ff21f10 $toc:0xf193ea50 $r3:0x00000080
$r4:0x300356e8 $r5:0x00000008 $r6:0xffff8006 $r7:0x00000000
$r8:0x105aa04f $r9:0x105aa04f $r10:0x04131000 $r11:0x00000000
$r12:0xde08783c $r13:0xdeadbeef $r14:0x00000002 $r15:0x2ff22d18
$r16:0x3003d230 $r17:0x101a907c $r18:0x00000005 $r19:0x101a90f4
$r20:0x101a90ec $r21:0x2ff22960 $r22:0x101a90cc $r23:0x101a90d8
$r24:0x00000000 $r25:0x00000024 $r26:0x00000000 $r27:0x300064fc
$r28:0x00000000 $r29:0xf18d8bc8 $r30:0xf19609a8 $r31:0xffffffff
$iar:0xde090750 $msr:0x0002f032 $cr:0x28203086 $link:0xde087848
$ctr:0xd62c2f00 $xer:0x04000002 $mq:0x00000000
Condition status = 0:e 1:l 2:e 4:eo 6:l 7:ge
[unset $noflregs to view floating point registers]
[unset $novregs to view vector registers]
in mmap64 at 0xde090750 ($t1)
0xde090750 (_exit) 81820b4c lwz r12,0xb4c(r2)
(dbx)
-
Account Deleted Almost c-code student, but, you are stopping at entrance to mmap64, aka, nothing interesting in registers until exit of mmap64 (after it runs). You need to add the dbx command return (see below). BTW -- my stepi instructions until back to caller also work, but are much less elegant than dbx return.
[adc@oc7083008330 ~]$ ssh -X ranger@lp0364d
Welcome to LP0364D.rchland.ibm.com
$ bash
bash-4.3$ which git
/QOpenSys/usr/bin/git
bash-4.3$ dbx -d 100 /QOpenSys/usr/bin/git
Type 'help' for help.
reading symbolic information ...
(dbx) stopi in mmap64
[1] stopi in mmap64
(dbx) run init
[1] stopped in mmap64 at 0xd5512360 ($t1)
0xd5512360 (mmap64) 7c0802a6 mflr r0
(dbx) return <--- need to actually run mmap64, then stop back in caller
stopped in . at 0x10035d40 ($t1)
0x10035d40 (???) 80410014 lwz r2,0x14(r1)
(dbx) registers
$r0:0x00003608 $stkp:0x2ff22560 $toc:0xf11ff7d0 $r3:0xb0000000 <-- mmap64 address
$r4:0x00000024 $r5:0x00000000 $r6:0x00000000 $r7:0x00000008
$r8:0x80557000 $r9:0x22203089 $r10:0x08ef4000 $r11:0x08ef4f30
$r12:0x10035d40 $r13:0xdeadbeef $r14:0x00000002 $r15:0x2ff22d48
$r16:0x30043c10 $r17:0x10187dec $r18:0x00000005 $r19:0x10187e64
$r20:0x10187e5c $r21:0x2ff229a0 $r22:0x10187e3c $r23:0x10187e48
$r24:0x00000000 $r25:0x00000024 $r26:0x300434f0 $r27:0x00000000
$r28:0x10187ce8 $r29:0x300434f0 $r30:0x3000cbd0 $r31:0x00000024
$iar:0x10035d40 $msr:0x0002f032 $cr:0x22203089 $link:0x10035d40
$ctr:0xb0000000 $xer:0xb4000000 $mq:0x00000000
Condition status = 0:e 1:e 2:e 4:eo 6:l 7:lo
[unset $noflregs to view floating point registers]
[unset $novregs to view vector registers]
in . at 0x10035d40 ($t1)
0x10035d40 (???) 80410014 lwz r2,0x14(r1)
(dbx)
-
reporter Ok, I've updated the previous comment to include return in the instructions.
For others following along you can read this SO to understand 0xffffffff.
Here's the new log that includes invoking return:
$ dbx -d 100 /opt/freeware/bin/git
Type 'help' for help.
reading symbolic information ...warning: no source compiled with -g
(dbx) stopi in mmap64
[1] stopi in mmap64
(dbx) run init
[1] stopped in mmap64 at 0xde2555e0 ($t1)
0xde2555e0 (mmap64) 7c0802a6 mflr r0
(dbx) return
stopped in . at 0x10045554 ($t1)
0x10045554 (???) 80410014 lwz r2,0x14(r1)
(dbx) registers
$r0:0x00003608 $stkp:0x2ff22520 $toc:0xf193ea50 $r3:0xffffffff <---- -1 right out of the gates, though no "Out of memory" error (yet)
$r4:0x00000024 $r5:0x00000017 $r6:0x3003d5f8 $r7:0x073939d8
$r8:0x80558000 $r9:0x86203089 $r10:0x09a5e000 $r11:0x09a5ef30
$r12:0x10045554 $r13:0xdeadbeef $r14:0x00000002 $r15:0x2ff22d18
$r16:0x3003d230 $r17:0x101a907c $r18:0x00000005 $r19:0x101a90f4
$r20:0x101a90ec $r21:0x2ff22960 $r22:0x101a90cc $r23:0x101a90d8
$r24:0x00000000 $r25:0x00000024 $r26:0x3003d1f0 $r27:0x00000000
$r28:0x101a8f78 $r29:0x3003d1f0 $r30:0x30009be8 $r31:0x00000024
$iar:0x10045554 $msr:0x0002f032 $cr:0x86203089 $link:0x10045554
$ctr:0xffffffff $xer:0xf4ffffff $mq:0x00000000
Condition status = 0:l 1:ge 2:e 4:eo 6:l 7:lo
[unset $noflregs to view floating point registers]
[unset $novregs to view vector registers]
in . at 0x10045554 ($t1)
0x10045554 (???) 80410014 lwz r2,0x14(r1)
(dbx) cont
[1] stopped in mmap64 at 0xde2555e0 ($t1)
0xde2555e0 (mmap64) 7c0802a6 mflr r0
(dbx) return
stopped in . at 0x100455a0 ($t1)
0x100455a0 (???) 80410014 lwz r2,0x14(r1)
(dbx) registers
$r0:0x00003608 $stkp:0x2ff22520 $toc:0xf193ea50 $r3:0xffffffff
$r4:0x00000024 $r5:0x00000017 $r6:0x3003d5f8 $r7:0x073992f0
$r8:0x80556000 $r9:0x33203089 $r10:0x09a5e000 $r11:0x09a5ef30
$r12:0x100455a0 $r13:0xdeadbeef $r14:0x00000002 $r15:0x2ff22d18
$r16:0x3003d230 $r17:0x101a907c $r18:0x00000005 $r19:0x101a90f4
$r20:0x101a90ec $r21:0x2ff22960 $r22:0x101a90cc $r23:0x101a90d8
$r24:0x00000000 $r25:0x00000024 $r26:0x3003d1f0 $r27:0x00000000
$r28:0x101a8f78 $r29:0xffffffff $r30:0xffffffff $r31:0x00000024
$iar:0x100455a0 $msr:0x0002f032 $cr:0x33203089 $link:0x100455a0
$ctr:0xffffffff $xer:0xf4ffffff $mq:0x00000000
Condition status = 0:eo 1:eo 2:e 4:eo 6:l 7:lo
[unset $noflregs to view floating point registers]
[unset $novregs to view vector registers]
in . at 0x100455a0 ($t1)
0x100455a0 (???) 80410014 lwz r2,0x14(r1)
(dbx) cont
fatal: Out of memory? mmap failed: No such device
execution completed (exit code 128)
(dbx) return
cannot continue execution
(dbx) registers
$r0:0x00000000 $stkp:0x2ff21f10 $toc:0xf193ea50 $r3:0x00000080
$r4:0x300356e8 $r5:0x00000008 $r6:0xffff8006 $r7:0x00000000
$r8:0x10546033 $r9:0x10546033 $r10:0x09a5e000 $r11:0x00000000
$r12:0xde08783c $r13:0xdeadbeef $r14:0x00000002 $r15:0x2ff22d18
$r16:0x3003d230 $r17:0x101a907c $r18:0x00000005 $r19:0x101a90f4
$r20:0x101a90ec $r21:0x2ff22960 $r22:0x101a90cc $r23:0x101a90d8
$r24:0x00000000 $r25:0x00000024 $r26:0x00000000 $r27:0x300064fc
$r28:0x00000000 $r29:0xf18d8bc8 $r30:0xf19609a8 $r31:0xffffffff
$iar:0xde090750 $msr:0x0002f032 $cr:0x28203086 $link:0xde087848
$ctr:0xd62c2f00 $xer:0x04000002 $mq:0x00000000
Condition status = 0:e 1:l 2:e 4:eo 6:l 7:ge
[unset $noflregs to view floating point registers]
[unset $novregs to view vector registers]
in . at 0xde090750 ($t1)
0xde090750 (_exit) 81820b4c lwz r12,0xb4c(r2)
(dbx)
-
Account Deleted You failed with 0xffffffff, but you did not dump the errno.
(dbx) return
stopped in . at 0x10045554 ($t1)
0x10045554 (???) 80410014 lwz r2,0x14(r1)
(dbx) registers
$r0:0x00003608 $stkp:0x2ff22520 $toc:0xf193ea50 $r3:0xffffffff <---- -1 right out of the gates, though no "Out of memory" error (yet)
$r4:0x00000024 $r5:0x00000017 $r6:0x3003d5f8 $r7:0x073939d8
$r8:0x80558000 $r9:0x86203089 $r10:0x09a5e000 $r11:0x09a5ef30
$r12:0x10045554 $r13:0xdeadbeef $r14:0x00000002 $r15:0x2ff22d18
$r16:0x3003d230 $r17:0x101a907c $r18:0x00000005 $r19:0x101a90f4
$r20:0x101a90ec $r21:0x2ff22960 $r22:0x101a90cc $r23:0x101a90d8
$r24:0x00000000 $r25:0x00000024 $r26:0x3003d1f0 $r27:0x00000000
$r28:0x101a8f78 $r29:0x3003d1f0 $r30:0x30009be8 $r31:0x00000024
$iar:0x10045554 $msr:0x0002f032 $cr:0x86203089 $link:0x10045554
$ctr:0xffffffff $xer:0xf4ffffff $mq:0x00000000
Condition status = 0:l 1:ge 2:e 4:eo 6:l 7:lo
[unset $noflregs to view floating point registers]
[unset $novregs to view vector registers]
in . at 0x10045554 ($t1)
0x10045554 (???) 80410014 lwz r2,0x14(r1)
===== like this =====
(dbx) 0x2ff22ff8 / X
xxxxxxxxx <- errno in hex
-
Account Deleted As long as we are re-doing, let's gather some more information about what is going on, find out the name of the file causing the problem.
bash-4.3$ dbx -d 100 /QOpenSys/usr/bin/git
Type 'help' for help.
reading symbolic information ...
(dbx) stopi in open <--- add a stop in open
[1] stopi in open
(dbx) stopi in mmap64 <-- also our failure location
[2] stopi in mmap64
(dbx) run init
[1] stopped in open at 0xd52fe6e0 ($t1)
0xd52fe6e0 (open) 7c0802a6 mflr r0
(dbx) print (char *)$r3 <--- register 3 has the name of file to be open
/unix
(dbx) return
stopped in open64 at 0xd52fe0fc ($t1)
0xd52fe0fc (open64+0x3c) 60000000 ori r0,r0,0x0
(dbx) print $r3 <--- register 3 has the return code -1 (0xffffffff), failed, but we are not done
0xffffffff
(dbx) 0x2ff22ff8 / X <-- display errno hex (convert decimal and see /usr/include/errno.h)
0x2ff22ff8: 00000002
(dbx) cont <--- next open file, aka, we did not fail mmap64, so keep going ... and going ... until mmap64 fails
[1] stopped in open at 0xd52fe6e0 ($t1)
0xd52fe6e0 (open) 7c0802a6 mflr r0
(dbx) print (char *)$r3 <--- name of the file to be open ... so on
/dev/null
(dbx) return
stopped in open64 at 0xd52fe0fc ($t1)
0xd52fe0fc (open64+0x3c) 60000000 ori r0,r0,0x0
(dbx) print $r3
0x0000000e
(dbx) cont
[1] stopped in open at 0xd52fe6e0 ($t1)
0xd52fe6e0 (open) 7c0802a6 mflr r0
(dbx) print (char *)$r3
/opt/freeware/lib/charset.alias
(dbx) return
stopped in localcharset.get_charset_aliases [/opt/freeware/lib/libiconv.a] at 0xd5809a60 ($t1)
0xd5809a60 (get_charset_aliases+0x108) 80410014 lwz r2,0x14(r1)
(dbx) print $r3
0xffffffff
(dbx) 0x2ff22ff8 / X
0x2ff22ff8: 00000002
(dbx)
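Because the `/ X` dump prints errno in hex while errno.h lists decimal numbers, the conversion is easy to get wrong by eye (0x13 is 19, not 13). Here is a minimal sketch of that conversion. It assumes the machine running the script uses the same numbers as PASE for the errors of interest. The symbolic names come from the local Python errno table, not from PASE's /usr/include/errno.h, so confirm against that file on the target. The function name is hypothetical.

```python
import errno
import os
import re

# A dbx memory dump line looks like "0x2ff22ff8: 00000013"; the first word is errno.
DUMP_LINE = re.compile(r"^\s*0x[0-9a-fA-F]+:\s+([0-9a-fA-F]{8})")


def decode_errno_dump(line):
    """Turn a dbx 'address / X' line into (decimal number, name, message)."""
    match = DUMP_LINE.match(line)
    if match is None:
        raise ValueError("not a dbx memory dump line: %r" % line)
    number = int(match.group(1), 16)  # the dump is hex, errno.h is decimal
    # Assumption: the local errno table matches the target for this number.
    name = errno.errorcode.get(number, "unknown")
    return number, name, os.strerror(number)
```

For example, decode_errno_dump("0x2ff22ff8: 00000002") yields errno 2, which a typical Unix table names ENOENT.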
-
reporter Ok, I ran it again and did the addtl print (char *)$r3 etc stuff. The log was big so I created a snippet.
The final errno is 13 (permission denied), though leading up to it we have the following...
0x2ff22ff8: 00000002 <-- "not found" (expected)
0x2ff22ff8: 00000011 <-- "try again" (not sure if expected, may be beginning of issue)
0x2ff22ff8: 00000016 <-- "resource busy" (happens on .git/config and .git/config.lock)
0x2ff22ff8: 00000013 <-- "permission denied" (errno after 'return' from mmap64, though I assume it relates to previous open attempt?)
Going down the 'permission denied' hole, here's the current state of the .git directory.
bash-4.3$ ls -all .git/
total 208
drwxr-sr-x 6 myuser 0 12288 Mar 4 12:26 .
drwxr-sr-x 3 myuser 0 12288 Mar 4 11:56 ..
-rw-r--r-- 1 myuser 0 23 Mar 4 11:56 HEAD
drwxr-sr-x 2 myuser 0 12288 Mar 4 11:56 branches
-rw-r--r-- 1 myuser 0 36 Mar 4 11:56 config
-rw-r--r-- 1 myuser 0 0 Mar 4 12:26 config.lock
-rw-r--r-- 1 myuser 0 73 Mar 4 11:56 description
drwxr-sr-x 2 myuser 0 16384 Mar 4 11:56 hooks
drwxr-sr-x 2 myuser 0 12288 Mar 4 11:56 info
drwxr-sr-x 4 myuser 0 12288 Mar 4 11:56 refs
Both config and config.lock are rw- for the owner. Same permissions exist on a machine where Git works so I wonder if "permission denied" is a misnomer.
Guess: The "resource busy" is because one Git "thread" created config.lock and a subsequent "thread" is trying to gain access to it? Or in short, a race condition?
-
Account Deleted errno after 'return' from mmap64, though I assume it relates to previous open attempt?
You did not include the file opened information in your cut/paste. I am assuming the following sequence is needed before we can speculate ...
dbx> cont
stop in open
dbx> print $r3
/this/file/caused/issue/with/mmap64 <--- missing from your cut/paste
dbx> cont
stop in mmap64
dbx> return
dbx> print $r3
-1 or 0xffffffff
-
reporter I believe it is right here. Specifically, /home/MYUSER/git_dbx/.git/config.
What am I missing?
-
Account Deleted Mmm ... this is not out of memory error. Instead, is ENODEV, see below. Maybe you needed to take one more dbx>cont??? But in any event, what is up with /home/MYUSER/git_dbx/.git/config ???
What does this look like??? ls -l /home/MYUSER/git_dbx/.git/config
(dbx) print (char *)$r3
/home/MYUSER/git_dbx/.git/config
(dbx) return
stopped in open64 at 0xde04137c ($t1)
0xde04137c (open64+0x3c) 60000000 ori r0,r0,0x0
(dbx) print $r3
0x0000000f <-------------- good open (not -1) file descriptor is 15/0xf, forget errno below
(dbx) 0x2ff22ff8 / X
0x2ff22ff8: 00000016 <-- useless, no error above
(dbx) cont
[2] stopped in open at 0xde041960 ($t1)
0xde041960 (open) 7c0802a6 mflr r0
(dbx) print (char *)$r3
/home/MYUSER/git_dbx/.git/config
(dbx) return
stopped in open64 at 0xde04137c ($t1)
0xde04137c (open64+0x3c) 60000000 ori r0,r0,0x0
(dbx) print $r3
0x00000010 <-------------- good open file descriptor is 16/0x10 (not -1), forget errno below
(dbx) 0x2ff22ff8 / X
0x2ff22ff8: 00000016 <-- useless, no error above
(dbx) cont
[1] stopped in mmap64 at 0xde2555e0 ($t1)
0xde2555e0 (mmap64) 7c0802a6 mflr r0
(dbx) print (char *)$r3
(nil)
<--- we needed all registers here, but "guessing" from previous post...
$r0:0xde2555e0 $stkp:0x2ff22520 $toc:0xf193ea50 $r3:0x00000000
$r4:0x00000024 $r5:0x00000001 $r6:0x00000002 $r7:0x0000000f
$r8:0x00000000 $r9:0x00000000 $r10:0x04131000 $r11:0x04131f30
void *mmap64 (
  ($r3) addr = 0x0 (print $r3 = nil),
  ($r4) len = 0x24,
  ($r5) prot = 1, (PROT_READ)
  ($r6) flags = 2, (MAP_PRIVATE)
  ($r7) fildes = 0xf, <-- file descriptor 0xf ... /home/MYUSER/git_dbx/.git/config (above)
  ($r8) off = 0)
(dbx) return
stopped in . at 0x10045554 ($t1)
0x10045554 (???) 80410014 lwz r2,0x14(r1)
(dbx) print $r3
0xffffffff <--- mmap failed ...
(dbx) 0x2ff22ff8 / X
0x2ff22ff8: 00000013 <-- yes error above ... ENODEV 19 - No such device
(dbx)
bash-4.3$ grep 19 /usr/include/errno.h
#define ENODEV 19 /* No such device */
ENODEV The fildes parameter refers to an object that cannot be mapped, such as a terminal.
bash-4.3$ grep MAP_ /usr/include/sys/mman.h
#define MAP_SHARED 0x1 /* share changes */
#define MAP_PRIVATE 0x2 /* changes are private */
#define MAP_FIXED 0x100 /* map addr must be exactly as specified */
#define MAP_VARIABLE 0x00 /* system can place new region */
#define MAP_FAILED ((void *)-1)
#define MAP_FILE 0x00 /* map from a file */
#define MAP_ANONYMOUS 0x10 /* map an unnamed region */
#define MAP_ANON 0x10 /* map an unnamed region */
#define MAP_TYPE 0xf0 /* the type of the region */
bash-4.3$ grep PROT_ /usr/include/sys/mman.h
#define PROT_NONE 0 /* no access to these pages */
#define PROT_READ 0x1 /* pages can be read */
#define PROT_WRITE 0x2 /* pages can be written */
#define PROT_EXEC 0x4 /* pages can be executed */
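The register-to-parameter mapping above can be applied mechanically to any register dump taken at the entry to mmap64. The following is a rough sketch, not a definitive decoder. It assumes the argument order shown in this comment ($r3 through $r8) and uses the flag values from the mman.h grep. It treats $r8 alone as the offset, as this comment does, although a 64-bit offset may in fact occupy more than one register. The function names are hypothetical.

```python
import re

# Flag values copied from the grep of /usr/include/sys/mman.h in this thread.
PROT_FLAGS = {0x1: "PROT_READ", 0x2: "PROT_WRITE", 0x4: "PROT_EXEC"}
MAP_FLAGS = {0x1: "MAP_SHARED", 0x2: "MAP_PRIVATE",
             0x10: "MAP_ANONYMOUS", 0x100: "MAP_FIXED"}

# Assumption: arguments arrive in $r3..$r8 in this order, as annotated in the
# thread. The offset is read from $r8 only, which may be incomplete.
ARG_NAMES = ["addr", "len", "prot", "flags", "fildes", "off"]

REGISTER = re.compile(r"\$r(\d+):0x([0-9a-fA-F]{8})")


def names_for(value, table, zero_name):
    """Spell out the bits set in `value` using `table`."""
    names = [name for bit, name in sorted(table.items()) if value & bit]
    return "|".join(names) if names else zero_name


def mmap64_args(register_dump):
    """Map a dbx `registers` dump taken at mmap64 entry to named arguments."""
    regs = {int(n): int(v, 16) for n, v in REGISTER.findall(register_dump)}
    args = {name: regs[3 + i] for i, name in enumerate(ARG_NAMES)}
    args["prot"] = names_for(args["prot"], PROT_FLAGS, "PROT_NONE")
    args["flags"] = names_for(args["flags"], MAP_FLAGS, "MAP_FILE")
    return args
```

Fed the partial dump quoted above, this reports len as 36 (0x24), which is the same size the ls listing shows for .git/config, along with PROT_READ, MAP_PRIVATE and fildes 15.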
-
Account Deleted We are geeks grasshopper, means, we script things when we do not want to type (make dbx a slave). I am using Python 2.7.5, so I think 3.4 needs parentheses around print.
import subprocess

command_line = "/QOpenSys/usr/bin/dbx -d 100 /QOpenSys/usr/bin/git"
args = command_line.split()
process = subprocess.Popen(args, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
while True:
    result = process.stdout.readline()
    if not result:
        # dbx closed its output; stop instead of looping forever
        break
    if result.strip():
        print result.rstrip()
    if "reading symbolic" in result:
        # dbx has loaded git: set breakpoints, then run "git init"
        process.stdin.write("stopi in mmap64\n")
        process.stdin.write("stopi in open\n")
        process.stdin.write("run init\n")
    elif "stopped in open" in result:
        # $r3 holds the path argument on entry
        process.stdin.write("print (char *)$r3\n")
        process.stdin.write("return\n")
    elif "stopped in mmap" in result:
        process.stdin.write("registers\n")
        process.stdin.write("cont\n")
    elif "stopped in" in result:
        # back in the caller: $r3 is the return value
        process.stdin.write("print $r3\n")
        process.stdin.write("cont\n")
    elif "execution completed" in result:
        process.stdin.write("quit\n")
        break

> python dbxme.py
-
reporter Here's the entirety of the .git directory:
$ ls -all .git/
total 208
drwxr-sr-x 6 myuser 0 12288 Mar 4 12:26 .
drwxr-sr-x 3 myuser 0 12288 Mar 4 11:56 ..
-rw-r--r-- 1 myuser 0 23 Mar 4 11:56 HEAD
drwxr-sr-x 2 myuser 0 12288 Mar 4 11:56 branches
-rw-r--r-- 1 myuser 0 36 Mar 4 11:56 config
-rw-r--r-- 1 myuser 0 0 Mar 4 12:26 config.lock
-rw-r--r-- 1 myuser 0 73 Mar 4 11:56 description
drwxr-sr-x 2 myuser 0 16384 Mar 4 11:56 hooks
drwxr-sr-x 2 myuser 0 12288 Mar 4 11:56 info
drwxr-sr-x 4 myuser 0 12288 Mar 4 11:56 refs
We are geeks grasshopper, means, we script things when we do not want to type (make dbx a slave)
I was about to try the same with an expect script (which I am not well versed in) so the python approach is much better. I will use that from now on.
I read through all of your last post and I believe the only question you had was the results of ls, but let me know if you want more or for me to run it again with the dbxme.py.
-
Account Deleted We should really try dbxme.py, because previous dump missing full registers at mmap64 fail (not all parms known). In fact, I was only "guessing" mmap file descriptor was 0xf ($r7), based on your old posts (maybe config is innocent).
-
reporter Here are the results from dbxme.py run on the machine with the Git issue.
##Commentary
- Line 412 has a mmap error. This is interesting because the script didn't yet reach its stop in mmap.
- Line 453 has mmap64 stop. Where in a procedure does it stop, at the beginning, at the end? Guessing at the beginning but wanted to be sure.
- Given the error occurring on Line 412 was before the stop in mmap64 I am wondering if we need to add a stop for a read. Curious as to whether a read is being attempted when a file is no longer available (i.e. xxxx.lock files)
Also, here's the ls of the .git directory after running dbxme.py:
$ ls -all ~/gittest/.git
total 200
drwxr-sr-x 6 myuser 0 12288 Mar 8 11:58 .
drwxr-sr-x 3 myuser 0 12288 Mar 8 11:58 ..
-rw-r--r-- 1 myuser 0 23 Mar 8 11:58 HEAD
drwxr-sr-x 2 myuser 0 12288 Mar 8 11:58 branches
-rw-r--r-- 1 myuser 0 36 Mar 8 11:58 config
-rw-r--r-- 1 myuser 0 73 Mar 8 11:58 description
drwxr-sr-x 2 myuser 0 16384 Mar 8 11:58 hooks
drwxr-sr-x 2 myuser 0 12288 Mar 8 11:58 info
drwxr-sr-x 4 myuser 0 12288 Mar 8 11:58 refs
-
Account Deleted Nuts!!! Well, training you to debug c code. We forgot a few things in dbxme.py (below). I added sbrk, as this will include heap memory to go with our mmap memory (out of memory, could be heap or map).
import subprocess

command_line = "/QOpenSys/usr/bin/dbx -d 100 /QOpenSys/usr/bin/git"
args = command_line.split()
process = subprocess.Popen(args, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
while True:
    result = process.stdout.readline()
    if not result:
        # dbx closed its output; stop instead of looping forever
        break
    if result.strip():
        print result.rstrip()
    if "reading symbolic" in result:
        # dbx has loaded git: set breakpoints, then run "git init"
        process.stdin.write("stopi in mmap64\n")
        process.stdin.write("stopi in open\n")
        process.stdin.write("stopi in sbrk\n")
        process.stdin.write("run init\n")
    elif "stopped in open" in result:
        # $r3 holds the path argument on entry
        process.stdin.write("print (char *)$r3\n")
        process.stdin.write("return\n")
    elif "stopped in glink.sbrk" in result:
        process.stdin.write("print $r3\n")
        process.stdin.write("return\n")
    elif "stopped in mmap" in result:
        # dump all registers so every mmap64 parameter is visible
        process.stdin.write("registers\n")
        process.stdin.write("return\n")
    elif "stopped in" in result:
        # back in the caller: $r3 is the return value, 0x2ff22ff8 is where errno was read earlier
        process.stdin.write("print $r3\n")
        process.stdin.write("0x2ff22ff8 / 4X\n")
        process.stdin.write("cont\n")
    elif "execution completed" in result:
        process.stdin.write("quit\n")
        break
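Since the script above targets Python 2, here is a rough Python 3 sketch of the same branching. It is not the author's code: the function names `commands_for` and `drive_dbx` are invented for this note, and the decision logic is pulled into a pure function so it can be checked without a dbx binary. The strings matched and the dbx subcommands sent are copied from the script above; the branch order is kept identical.

```python
import subprocess


def commands_for(line):
    """Return the dbx subcommands to send in reply to one line of dbx output."""
    if "reading symbolic" in line:
        return ["stopi in mmap64", "stopi in open", "stopi in sbrk", "run init"]
    if "stopped in open" in line:
        return ["print (char *)$r3", "return"]
    if "stopped in glink.sbrk" in line:
        return ["print $r3", "return"]
    if "stopped in mmap" in line:
        return ["registers", "return"]
    if "stopped in" in line:
        return ["print $r3", "0x2ff22ff8 / 4X", "cont"]
    if "execution completed" in line:
        return ["quit"]
    return []


def drive_dbx(argv):
    """Start dbx with argv and answer its output line by line.

    Defined but not called here, because it needs a real dbx binary.
    """
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, universal_newlines=True)
    # readline() returns "" at end of file, which ends the loop cleanly.
    for line in iter(proc.stdout.readline, ""):
        if line.strip():
            print(line.rstrip())
        for command in commands_for(line):
            proc.stdin.write(command + "\n")
        proc.stdin.flush()
        if "execution completed" in line:
            break
    proc.stdin.close()
    proc.wait()
```

On the machine from the thread this would presumably be started with `drive_dbx(["/QOpenSys/usr/bin/dbx", "-d", "100", "/QOpenSys/usr/bin/git"])`.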
-
reporter New output with sbrk mods to dbxme.py.
-
Account Deleted Mmm ... i dunno ... may look at the IBM i side of the file ...
WRKLNK OBJ('/QOpenSys/ranger/home/RANGER/dbxme/.git/config')
8 config STMF
Display Attributes
Object . . . . . . : /QOpenSys/ranger/home/RANGER/dbxme/.git/config
Creation date/time . . . . . . . . . . : 03/08/16 14:48:11
Last access date/time . . . . . . . . : 03/08/16 14:48:11
Data change date/time . . . . . . . . : 03/08/16 14:48:11
Attribute change date/time . . . . . . : 03/08/16 14:48:11
Size of object data in bytes . . . . . : 92
Allocated size of object . . . . . . . : 8192
File format . . . . . . . . . . . . . : *TYPE2
Size of extended attributes . . . . . : 0
Storage freed . . . . . . . . . . . . : No
Temporary object . . . . . . . . . . . : No
Disk storage option . . . . . . . . . : *NORMAL
Main storage option . . . . . . . . . : *NORMAL
Auditing value . . . . . . . . . . . . : *NONE
-
reporter May have found the culprit.... journaling.
Creation date/time . . . . . . . . . . : 08/03/16 16:16:44
Last access date/time . . . . . . . . : 08/03/16 16:16:44
Data change date/time . . . . . . . . : 08/03/16 16:16:44
Attribute change date/time . . . . . . : 08/03/16 16:16:44 <-----
. . .
Auditing value . . . . . . . . . . . . : *CHANGE <----
. . .
Object is currently journaled . . . . : Yes
Current or last journal . . . . . . : A1IJRA
Library . . . . . . . . . . . . . : MYLIB
Journal images . . . . . . . . . . . : *AFTER
Journal entries to be omitted . . . : *OPNCLOSYN
Last journal start date/time . . . . : 08/03/16 16:16:44 <---- same time as 'Attribute change date/time'
Partial Transactions:
Apply journaled changes required . : No
Rollback was ended . . . . . . . . : No
Starting journal receiver for apply :
Library . . . . . . . . . . . . . :
ASP Device . . . . . . . . . . . . :
I am going to run some tests on my system to see if that is the case.
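For anyone checking many files, the attribute screen above can be scanned mechanically. The following is only a sketch under assumptions: it presumes the Display Attributes text has been captured to a string somehow (the thread does not say how), and `parse_attributes` is a made-up helper name. The sample lines are copied from the output above.

```python
import re


def parse_attributes(text):
    """Parse 'Label . . . : value' lines into a dict of label -> value."""
    attrs = {}
    for line in text.splitlines():
        # A label, then one or more " ." fillers, then ":" and the value.
        m = re.match(r"^\s*(.*?)(?:\s+\.)+\s*:\s*(.*?)\s*$", line)
        if m:
            attrs[m.group(1).strip()] = m.group(2)
    return attrs


# Sample lines taken from the attribute display in the thread.
sample = """\
Auditing value . . . . . . . . . . . . : *CHANGE
Object is currently journaled . . . . : Yes
Current or last journal . . . . . . : A1IJRA
Library . . . . . . . . . . . . . : MYLIB
Journal images . . . . . . . . . . . : *AFTER
"""

attrs = parse_attributes(sample)
journaled = attrs.get("Object is currently journaled") == "Yes"
print("journaled:", journaled, "journal:", attrs.get("Current or last journal"))
```

A "Yes" here is the hint that mmap on that file may fail, as the rest of the thread confirms.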
-
reporter
##Customer stopped replicating and now git init works as expected.
I am now asking customer about how the vendor(n1) does replication so we learn whether it is something to do with vendor's approach or the IBM i journal feature.
n1 - I will withhold the name to protect the (currently) innocent.
FWIW, I tried the below (not sure if I set it up correctly) and was not able to reproduce the error.
CRTJRNRCV JRNRCV(LIB1/DBX_JRN)
CRTJRN JRN(LIB1/DBX_JRN) JRNRCV(LIB1/DBX_JRN)
STRJRN OBJ(('/home/aaron/dbx_jrn' *INCLUDE)) JRN('/QSYS.LIB/lib1.lib/dbx_jrn.jrn') SUBTREE(*ALL)
ENDJRN OBJ(('/home/aaron/dbx_jrn')) SUBTREE(*ALL)
-
Account Deleted Yep-R-doodle ... you can NOT mmap a journal file. Hilarious, we just spent a week-or-so chasing a retentive IBM i administrator. These IBM i HA applications should come with a label like tobacco "use of this product on IFS files may kill your PASE application".
-
reporter Does PowerHA for IBM i have this problem? I've looked at the documentation and it appears PowerHA does replication at a lower level than journaling, though it doesn't directly call it out from what I've seen.
Is there a list of procedures, like mmap, that don't work with journaling? I reviewed a couple redbooks and sites but am coming up empty handed.
-
reporter Further to my last question, I see mmap does document it doesn't work with journaling, so now I am wondering if there's a list of other APIs that have the same issue.
Also, it appears we should have been given the ENOTSUP error given the mmap docs, snippet below.
The mmap() function will fail with ENOTSUP if the file is journaled.
-
Account Deleted The mmap() function will fail with ENOTSUP if the file is journaled.
PASE gives ENODEV.
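One way an application can cope with either error code is to fall back to a plain read when the mapping is refused. The sketch below is illustrative only: `read_file` is a hypothetical helper, it is not how Git handles the failure, and it has not been run on PASE. It treats both ENODEV (what PASE reported here) and ENOTSUP (what the mmap docs quote) as "cannot map this file".

```python
import errno
import mmap
import os
import tempfile

# Error codes treated as "this file cannot be memory mapped".
UNMAPPABLE = {errno.ENODEV, getattr(errno, "ENOTSUP", errno.ENODEV)}


def read_file(path):
    """Return the file contents, preferring mmap and falling back to read()."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            return b""  # mmap rejects zero-length mappings
        try:
            mapped = mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ)
        except OSError as exc:
            if exc.errno not in UNMAPPABLE:
                raise
            return f.read()  # e.g. a journaled IFS file
        try:
            return mapped[:]  # copy the mapped bytes out
        finally:
            mapped.close()


# Small demonstration on an ordinary temporary file (36 bytes, like .git/config).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 36)
    path = tmp.name
data = read_file(path)
os.remove(path)
print(len(data))
```

The fallback costs a copy, which is the performance trade memory mapping was meant to avoid, but it keeps the program working on files that cannot be mapped.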
HA
Every HA application potentially starts journal in IFS directories, which, may kill a PASE application due to mmap not allowed with IFS journal files. As far as i know, anything memory mapped file (mmap, shmat, etc.), is only prominent API failure in journal-my-IFS-world. IFS people clearly understand this issue, and, badness that is occurring in PASE kingdom (whining will not help).
These IBM i HA applications should come with a label like tobacco "use of this product on IFS files may kill your PASE application".
Welcome to IBM i, where administrators are king, everyone else is not. March, 8-9, 2016. The days Aaron became aware IFS journal files and mmap do not mix. Personally, i never remember, spend hours debugging (weeks for you), then remember to ask the client if they journal IFS files.
-
reporter Every HA application potentially starts journal in IFS directories, which, may kill a PASE application due to mmap not allowed with IFS journal files.
My understanding is that PowerHA replicates at the iASP level without the need for journals (though journals could still exist for SYSBAS). I am belaboring the point because we (KrengelTech) are noticing a lot more HA/DR usage over the years (we do more than open source) and given Git uses mmap, and given Git is a near necessity in development and deployment of PASE, well, this obviously is an issue if there are zero ways to do HA.
-
Account Deleted I am a PASE guy (and open source), go ask your HA product expert about exact technology implementations.
-
reporter - changed status to resolved
go ask your HA product expert about exact technology implementations.
Will do.
Thanks for going deep on this one. I learned a boat-load. I plan on documenting a tutorial on what we accomplished so others can learn from it.
I am marking this issue as resolved.
-
Aaron
OK just read through the entire entry and see that journalling is going to be a problem for any PASE based application using mmap(). Not sure it's a game breaker for most as I would hope that the majority are not going to use mmap functions against files/objects that you would want to replicate. (You should not be replicating everything, that's just bad practice).
Anyhow there are a few ways around the issue that I know of and we have used at some clients with our HA4i product.
- Drop back down to object level replication which is triggered by the auditing flag (this is not journalling as we know it for logical replication and if it does affect mmap you have a bigger problem as it is something most auditors are going to require). Problem with this approach is file locking; again there are alternatives to ensure they do eventually get replicated, but when the change notification is fired into the audit journal it could be the file is still locked by some process.
- Build a process around the CPY API which could copy the object to a temporary object (seems to ignore locks in IFS) and then reverse that on the target.
- Investigate using rsync??
So we can replicate IFS without journalling, the choices available are more than sufficient to make it a non issue in my mind.
Chris...
-
Account Deleted Hi fellows ... few adds from a evil PASE guy.
1) The IFS journal issue is a problem for API mmap (above), and, also the AIX API shmat.
2) We tried out rsync in past, works just fine. You can find one on perzl, and, IBM will likely PTF one someday.
3) Speaking to "depth" of problem. Unfortunately, as Open Source products strive for "performance" related to files (IFS), they inevitably use memory mapped files. Worse, product may change implementation in a heart beat minor version. So, yes, i suspect journal IFS issue will 'pop up' in open source products 'randomly'.
-
reporter 2) We tried out rsync in past, works just fine. You can find one on perzl, and, IBM will likely PTF one someday.
For the archives...
The rsync command arrived earlier this spring. Learn more here
Here's how I implemented it for a customer that had the mmap issue because of IFS Journaling:
$ cat /REPLICATE/replicate.sh
#!/QOpenSys/usr/bin/sh
SECONDS=0
mkdir /REPLICATE
echo "Replicating /QOpenSys/ibmichroot_spaces/git-server/repos"
rsync -a --delete /QOpenSys/ibmichroot_spaces/git-server/repos /REPLICATE
echo "Replicating /home"
rsync -a --delete /home /REPLICATE
echo "Replicating /www"
rsync -a --delete /www /REPLICATE
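For comparison, the same replication loop could be written in Python, the language used for dbxme.py earlier in the thread. This is a sketch, not what was deployed: the function names are invented, the paths and rsync options are copied from the shell script above, and the default is a dry run that only prints what would be executed. Unlike the shell version, a real run stops at the first rsync failure.

```python
import subprocess

# Paths copied from replicate.sh in the thread.
SOURCES = [
    "/QOpenSys/ibmichroot_spaces/git-server/repos",
    "/home",
    "/www",
]
TARGET = "/REPLICATE"


def rsync_command(source, target=TARGET):
    """Build the rsync argument list used by replicate.sh for one source."""
    return ["rsync", "-a", "--delete", source, target]


def replicate(sources=SOURCES, target=TARGET, dry_run=True):
    """Mirror each source into target; only print the commands when dry_run."""
    for source in sources:
        command = rsync_command(source, target)
        print("Replicating", source)
        if dry_run:
            print("  would run:", " ".join(command))
        else:
            # check_call raises CalledProcessError if rsync exits non-zero.
            subprocess.check_call(command)


replicate()  # dry run: prints the three rsync commands without running them
```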
-
There is another option for those who still have SNA (Object Connect needs to be installed as well): use the SAVRST command, which does a save and restore in one command. I found it to be very slow so we built our own internal product to do the same things and it uses TCP/IP (So much faster).
-
Account Deleted Thanks Aaron.
I see Kevin already PTF'd rsync. Kevin Adler is a great new face for IBM Open Source and PASE. Incredibly talented.
Also, Jesse Gorzinski, IBM Open Source architect is doing a fantastic job trying to work a good chroot friendly path to Open Source packaging based on our yum/rpm work last year.
I hope you had time to meet with both of these guys at Common.
Unapologetic plug ... i am working on a new PASE DB2 super driver (another libdb400.a). I hope to include current db2 support (slip under), and possibly all toolkit functions JSON based. If it works (expect yes), we will have a very fast alternative to XMLSERVICE (*). I am doing in Open db2sock, so there will be no more mysteries about DB2 and PASE interactions.
(*) BTW -- I originally did XMLSERVICE as a fun RPG refresher project in my spare time. XMLSERVICE has grown way beyond original intent. I will maintain the old XMLSERVICE. However, about time to replace it with better technology anyway.
-
Tony
I would like to test and help where I can. Deleted last post after I found embedded link with information. Let me know where I can help.
Chris...
-
Account Deleted Outstanding! Your help testing, especially a healthy dose of meet performance goals (yours), would be fantastic. Please open an issue on db2sock, you can say something like checking new functions for performance and ease of use, whatever. We can carry an open conversation about what liked or not liked that to everyone's benefit.
Caution: The only thing I am not allowed to do is compare this open source project to another commercial product. All fine, if you wish to do some of that compare, even write anything you like positive or negative, but we must never put good vendors or hopefully an innocent open source project into VP level war. I am just a geek, have skills, think i cn help make a better world with IBM i. I am not a VP. Ok, said my peace brother IBM i guy, let's make PASE something better.
-
Sounds Good! Will attempt to get started ASAP.. Reviews will always be sensitive and considerate of the audience/possible audience. Making things better is the only goal I have :-)
-
Account Deleted Hey, another side note. I built a Javascript tool to help generate php toolkit calls from RPG source. Maybe take a look and see what you think. Yes, this is only PHP, but I am thinking of Open Sourcing the code to help build other toolkits (including the new json based db2sock when finished).
Both versions are anchored at XMLSERVICE -> PHP Toolkit
Old version rpg D spec only
New version cobol, rpg D spec, RPG free 2 php toolkit -- added cobol and RPG free
Aaron, if you want to open a project in litmis, we could publish the Javascript code.
Security note: Everything runs on the browser (Javascript). The customer RPG cut/paste never goes through the server. Check the new code in your browser source/debug Javascript window, you will see the html form never leaves the browser.
-
reporter Aaron, if you want to open a project in litmis, we could publish the Javascript code.
New itoolkit-generator repo. Tony, you have admin rights. Let me know who else needs them or add people yourself.
Did you want me to put the code in there or did you want to? I am willing, just didn't know if there were additional things to be aware of.
-
Account Deleted Thanks Aaron. Please put your standard LICENSE text file into the project only. I will then clone and add my Javascript code and the little index.html (only two parts, maybe split someday).
-
reporter Please put your standard LICENSE text file into the project only.
Done.
-
Account Deleted Chris, How much time do you have to play around with db2sock? Also when would you like to start?
The db2sock project ... at moment i have only minimum JSON working as DB2 calls to Apache w/Basic Authentication (see php tests). This will work for any of your favourite languages Node, python, php, ruby, java, even just curl. I write code in any of these languages, is php ok?
Also, to be clear the JSON will transport directly on a DB2 driver as well, but we have to modify the db2 driver for the language to call the new DB2 CLI 'semi-architecture' API SQL400Json. Aka, REST is slower, Apache and all, BUT direct driver call fast through SQL400Json (obviously). However, again, we would need to modify one of the drivers like php ibm_db2 to make the JSON calls directly. Again php ok (ibm_db2)?
I changed the makefile to build both ILE (RPG) and PASE c code from same make file in a chroot. I also put a copy of the pase driver on yips (link in project). However, may be difficult to compile the ILE parts without all the gmake stuff. Do you need a little script to compile this part without make???
Last, IF (big if), you have time soon ... I can switch back to finishing the SQL400Json calls to new toolkit. The new toolkit will look nothing like the old xmlservice. In fact 90% PASE c code with just a slim(ish) RPG stored procedure as target of SQL400Json API. However, i do not have this in the project yet (just on my pc and 400). SO ... if you have time ... i want to know when, so i can put all that stuff up into the project. Again, when?
-
Account Deleted Aaron, I populated new project itoolkit-generator with the Javascript mapping RPG/Free/Cobol to PHP toolkit.
-
Tony
My main focus at the moment is php. I am in the process of developing a new PHP quotation system for a client so time is a little constrained, but willing to eke out as much as I can to help. I am not coding in RPG (not a focus area of mine, I can do limited RPG programming, but I am no expert!) I favor C for my development on IBM i (ILE or PASE) because most of the time I am not working on applications where RPG would give me the benefits (C is very good at the low level stuff especially the API's and pointer stuff).
I am OK with hacking the compiles as long as I know the parms etc. But if you have a script that would be very helpful (that's where I would go eventually).
I can start as soon as you need me to.. I may not be able to spend days at a time but I do enjoy this enough to make long days long enough to do something. I also want to look at the cross compiler capabilities in gcc so I have lots of things to squeeze into my long days :-)
I have a number of LPARs on my system with one dedicated to Open Source projects, chroot is there and all of the other items I need. It's kept up to date so should be a good testing ground.
Chris...
-
Account Deleted I am not coding in RPG (not a focus area of mine
No problem man. I am expert in c/C++ and RPG (and RPG free). Also good in javascript, python, php, node, ruby, perl, shell scripts, java, and many more languages you have not likely seen (i am a old dude).
Frankly, I write some things like XMLSERVICE in RPG just so RPG people feel they are not abandoned by IBM Rochester folks. Actually, I like RPG very much, but i am trained as a c/C++ developer into 400 kernel/PASE for many, many, many years.
I do enjoy this enough to make long days long enough to do something
Ok. If you open an issue in the db2sock project, i can notify via append that the SQL400Json stuff is ready to test (new toolkit).
I also want to look at the cross compiler capabilities in gcc
Oh my! I think I saw a Linux cat! (Tweety Bird cartoon).
Optional read ...
I am PASE guy, so i do everything on 400 PASE. However, my laptop(s) both work and home are Linux for over twenty years (can't hardly use Windows at all, just for Turbo Tax).
PASE gcc compile is messy ...
Yeah, yeah, we know, messy to set up 'compile environment' with perzl chroot/pkg scripts. PASE ninja turtles back at Rochester understand (*).
(*) Opinions are my own, do not reflect IBM plans or promises. See Jesse Gorzinski for IBM i Open Source plans. I am just a PASE geek Dude.
We risk public debugging by water cooler ... cool ... how about them republicans ...
So, taken face value, git tripped over itself ... see this related link
Possible git 32 bit is using HUGE memory model, allocated too much to heap space, as mmap/shmat and heap/alloc/new share 32 bits. You can dial down HUGE heap by using environment variable LDR_CNTRL=MAXDATA=0x20000000 (or maybe 0x10000000).
So, git is a mmap pig (reference link). Another answer, compile git 64 bit, then use env var PASE_MAXSHR64 to let git64 take over the world with mmap (Muah-ha-ha-ha-ha!)
Git, oh git, maybe leaking ... but ... well ... that would mean a git error ... those guys are never wrong (Muah-ha-ha-ha-ha!).
All above fails, we need to debug, easy way, you need my help, and, access to STRSST to look into the kernel (Muah-ha-ha-ha-ha!).