db2sock in-job calling (fast, may not be best)

Issue #16 resolved
Former user created an issue

This issue is an open debate about pros and cons of adding 'fast' in-job calling to the toolkit part of db2sock.

Comments (17)

  1. Former user Account Deleted reporter

    PASE system call technology _PGMCALL/_ILECALL (brief)

    PASE provides IBM i specific system calls _PGMCALL and _ILECALL to call PGMs and SRVPGMs respectively. These system calls provide the best possible performance when calling from PASE to ILE within the current job. This is also known as in-job, in-process, in-memory, or 'up' calling to ILE.

    Technically, the system calls _PGMCALL and _ILECALL enable a 'binary' marshaling or mapping technique to call ILE. However, only primitive types like word, double, float, pointer, 'aggregate' (blob), etc., are supported. Therefore, toolkits must provide IBM i specific types like packed decimal, zoned decimal, RPG structures, and so on. Please search the IBM i documentation for a full understanding of the capabilities.

    For this debate, we only need to understand that the PASE system calls _PGMCALL/_ILECALL could be used for 'fast' binary-style calling of ILE. Essentially, this is an additional and/or alternative transport for db2sock toolkit calling beyond the current db2 connection (QSQSRVR jobs) and REST (fastcgi, Apache, nginx) transports.
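
    For readers new to these system calls, below is a minimal C sketch of the _ILELOAD/_ILESYM/_ILECALL flow. The service program MYLIB/MYSRV and procedure MyProc are hypothetical, the exact types come from as400_types.h/as400_protos.h on your system, and this is only an illustration of the 'binary' argument marshaling, not db2sock code.

    /* Hedged sketch: call hypothetical MyProc(char *text, int len) exported by
     * service program MYLIB/MYSRV from PASE, in the current job. */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <as400_types.h>
    #include <as400_protos.h>

    /* 16-byte alignment helper, as used in the IBM PASE examples */
    #define ROUND_QUAD(x) (((size_t)(x) + 0xf) & ~0xf)

    /* argument list: base header followed by the marshaled arguments */
    typedef struct {
        ILEarglist_base base;
        ILEpointer      text;   /* pointer argument */
        int32           len;    /* 4-byte integer argument */
    } MyProc_arglist_t;

    /* signature lists the primitive types _ILECALL knows how to marshal */
    static const arg_type_t MyProc_sig[] = { ARG_MEMPTR, ARG_INT32, ARG_END };

    int main(void)
    {
        char sym_buf[sizeof(ILEpointer) + 15];
        char arg_buf[sizeof(MyProc_arglist_t) + 15];
        ILEpointer *proc = (ILEpointer *)ROUND_QUAD(sym_buf);
        MyProc_arglist_t *args = (MyProc_arglist_t *)ROUND_QUAD(arg_buf);
        char text[] = "Hi there";

        /* activate the service program and resolve the exported procedure */
        unsigned long long actmark = _ILELOAD("MYLIB/MYSRV", ILELOAD_LIBOBJ);
        if (actmark == (unsigned long long)-1) { perror("_ILELOAD"); return 1; }
        if (_ILESYM(proc, actmark, "MyProc") == -1) { perror("_ILESYM"); return 1; }

        /* marshal the primitive arguments and call into ILE (same job) */
        args->text.s.addr = (address64_t)(uintptr_t)text;
        args->len = (int32)strlen(text);
        _ILECALL(proc, &args->base, MyProc_sig, RESULT_INT32);

        /* result layout per as400_types.h */
        printf("MyProc returned %d\n", (int)args->base.result.s_int32.r_int32);
        return 0;
    }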

    debate (your vote)

    Today in db2sock we implement db2 connection and REST 'connection' calling of ILE programs. These are well-understood 'connections' for toolkit calls to existing RPG programs (cmds, shell, etc.). They are also relatively safe when used by knowledgeable script writers.

    Technically, the db2sock toolkit design allows essentially unlimited transport options. That is, you can add your own db2sock toolkit packet transport mechanism to fit a given environment (ssl sockets, data queues, etc.). However, it is not clear that all methods should be included in this base project. The in-memory call via _ILECALL/_PGMCALL discussed below should be considered carefully as a base included option.

    Personal: I am hesitant to implement an additional _PGMCALL/_ILECALL option due to the security side effects of in-memory calling in web scripting languages. However, Open Source should bow to the popular vote. Therefore, as best I can, I lay out the facts as understood for voting, aka, the pros and cons in the next sections.

  2. Former user Account Deleted reporter

    pro (_ILECALL/_PGMCALL)

    In-memory calling from PASE to ILE is substantially faster than the current db2 connection and REST toolkit options. In fact, _ILECALL/_PGMCALL was implemented by IBM i experts in both the PASE and ILE program models. Basically, _ILECALL/_PGMCALL is the fastest toolkit option, because nobody in the world could have done better than the IBM i experts.

    At this point you would think 'case closed'. This must be the best option for db2sock toolkit.

  3. Former user Account Deleted reporter

    con (_ILECALL/_PGMCALL)

    However, the security side effects of this alternative in-memory calling in web scripting languages are substantial.

    Why? All fast web servers supporting scripting languages use the idea of 'daemon' scripting jobs. That is, the scripting language stays active in a job (or jobs) handling requests. Obviously, we should be careful what 'company data' we leave hanging around live in memory in scripting language jobs. To wit, any company data still 'live' in a script job can be hacked. Herein lies the major design flaw of in-memory calling for the db2sock project (con).

    Quick list of negative issues when using _ILECALL/_PGMCALL in a 'toolkit':

    1) Security - Activated PGM/SRVPGM data stays 'alive' forever in the web server job (php job). This is relatively easily hacked by simply re-calling a run procedure and getting the last data in memory (pgm function getLastSocialSecurityNumber ... please). Also, for more sophisticated hackers, relatively simple memory pointer hacks reveal the last active data in the PGM/SRVPGM.

    2) Security - Undeleted QTEMP work files stay 'alive' forever in the web server job (php job). This hack is obvious: QTEMP work files left active in the web server job can be read by the same toolkit. Aka, simply use the in-memory toolkit to open, read, close ... bingo (list of QTEMP credit cards).

    3) Security - Switching profiles or job description attributes can leave the web server job with elevated authority (super-power php job forever). All of the languages would need to add special code to monitor for profile switching in the web server job. It is unlikely IBM i can convince the entire open source world to add post-profile-switch checking (Uff da). Anyway, the really evil hack is to switch to a 'higher authority' profile to access a restricted PGM/SRVPGM, let it activate, then switch back to lower authority; the activated functions remain callable in memory (completely out of your control).

    4) Functional - Most RPG programs cannot handle multi-threaded calls from scripting languages (async calls like node). Existing RPG programs doing more than 'Hello World' simply cannot handle async-style requests. This means languages like node, when properly calling RPG asynchronously, would simply blow up or deadlock (see next).

    5) Functional - Activated PGM/SRVPGM open/locked files can deadlock in the web server daemon job (php job alive forever). This is already an issue with misused xmlservice 'private connection' jobs (ask your operator).

    6) Functional - Multiple QTEMPs disable sharing between db2 queries and toolkit calls. All scripting db2 database drivers on IBM i use 'server mode'. This means a QSQSRVR job is attached/detached to run db2 queries away from the web server job (a different job). With the in-memory call, QTEMP lives in the web server job while the db2 query runs in the QSQSRVR job. Therefore the popular RPG idea of 'sharing' QTEMP(s) between script db2 and toolkit calls becomes impossible with in-memory toolkit calling. Note: DB2 'server mode' QSQSRVR jobs are required to support multiple requests (threads/async), multiple connections (different profiles), multiple transactions, secure data (not in the php job), etc. Also, less understood, many database operations are not thread safe, so uneducated users see 'missing data', etc., using languages like node (db2sock wants to fix this for you).

    There are probably more issues, but this is enough for a starter list for your input.

  4. Former user Account Deleted reporter

    up to you (public vote / debate)

    I can enable _ILECALL/_PGMCALL in a 'toolkit'. I am an IBM i PASE guy, so, well, easy for me. However, I am not at all sure we should provide this interface, as the security side effects of in-memory calling in web scripting languages are substantial.

    You decide. I will go with the flow here.

  5. Jesse G

    In my view, the primary (possibly only) use case for in-process support is to help with simple scripting and automation tasks (namely changing current job attributes). It should never be used in any kind of web deployment for the reasons Tony mentioned. I agree 100% there.

    Granted, some of your concerns can be mitigated, depending on the application / deployment scenario. For instance a CGI job using an in-process call would avoid most concerns (fresh job each request), but a fastCGI deployment would amplify the worst attributes. My concern is that most users cannot confidently assess these impacts.

    However, the in-process support does have value for very specific scenarios. So, my current vote is to allow in-process calls..... But how would we advise users of the pitfalls? If we can't think of a good way to do so, I change my vote to a "no."

  6. Jesse G

    In my previous response, I forgot the other reason I am in support of the in-process support: simplicity. The in-process transport removes the requirement of setting up an apache/nginx instance to host REST calls, while also not requiring a database connection and an understanding of its intricacies.

    While that's good, maybe it would "trap" too many people into using it who don't understand the implications of in-process work.

  7. Former user Account Deleted reporter

    My concern is that most users cannot confidently assess these impacts.

    Technical (small ray of hope) ... We may be able to mitigate some gross hacks by avoiding PGM and/or SRVPGM resolve in PASE. Aka, no cached pgm/function pointer (tagged pointer) in PASE memory, by simply avoiding PASE 'resolve' system calls like _RSLOBJ2, _ILESYM, etc. That is, call the current-design ILE db2sock resolve/call templates (Teemu does not like the current ILE-based design). A bit of a band-aid, but it makes things less 'hackable'.

    Alas, even the current design will be pushed by 'faster is better' people to cache the resolve look-up of pgms and functions. Thereby, the hacks above dealing with 're-call' or 'wrong profile call' will still apply. But ... it is better than leaving active tagged pgm/srvpgm pointers in PASE memory.

    At least it would be something.
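
    To make the trade-off concrete, below is a hedged C sketch of what 'caching the resolve' means in PASE terms. MYLIB/MYPGM is a hypothetical *PGM, error handling is simplified, and the prototypes come from as400_protos.h; this is an illustration, not db2sock code.

    #include <stddef.h>
    #include <as400_types.h>
    #include <as400_protos.h>

    #define ROUND_QUAD(x) (((size_t)(x) + 0xf) & ~0xf)

    /* Per-call resolve (slower, safer): authority is checked at every resolve,
     * and no tagged system pointer lingers in PASE memory between requests. */
    static int call_pgm_no_cache(void *argv[])
    {
        char buf[sizeof(ILEpointer) + 15];
        ILEpointer *pgm = (ILEpointer *)ROUND_QUAD(buf);
        if (_RSLOBJ2(pgm, RSLOBJ_TYPE_PGM, "MYPGM", "MYLIB") != 0)
            return -1;                        /* not found / not authorized */
        return _PGMCALL(pgm, argv, 0);
    }

    /* Cached resolve (faster, riskier): the tagged pointer stays live in the
     * daemon job, so later calls skip the resolve-time authority check --
     * the 're-call' / 'wrong profile call' exposure described above. */
    static char cached_buf[sizeof(ILEpointer) + 15];
    static ILEpointer *cached_pgm = NULL;

    static int call_pgm_cached(void *argv[])
    {
        if (cached_pgm == NULL) {
            ILEpointer *pgm = (ILEpointer *)ROUND_QUAD(cached_buf);
            if (_RSLOBJ2(pgm, RSLOBJ_TYPE_PGM, "MYPGM", "MYLIB") != 0)
                return -1;
            cached_pgm = pgm;
        }
        return _PGMCALL(cached_pgm, argv, 0); /* no authority re-check here */
    }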

  8. Former user Account Deleted reporter

    I am in support of the in-process support: simplicity.

    Ok, added the in-memory call (in the php job). Yes, it is fast. Just add "qual":"*memory" to the connection.

    Yips Super Driver - 1.1.2-sg5 - test driver - toolkit add in memory call PGM/SRVPGM/CMD (not rexx-rtvjoba or qsh)

    bash-4.3$ ./test3000_sql400json_memory64 ../json/j0301_cmd_pgm_hello
    input(5000000):
    {"connect":[{"qual":"*memory"},{"script":[
      {"cmd":{"exec":"CHGLIBL LIBL(DB2JSON QTEMP) CURLIB(DB2JSON)"}},
      {"pgm":[{"name":"HELLO"},
            {"s":{"name":"char", "type":"128a", "value":"Hi there"}}
           ]}
    ]}
    
    ]}
    output(117):
    {"script":[{"cmd":["CHGLIBL LIBL(DB2JSON QTEMP) CURLIB(DB2JSON)"]},{"pgm":["HELLO","*LIBL",{"char":"Hello World"}]}]}
    
    result:
    success (0)
    

    Limits: JSON requests for qsh or rexx (RTVJOBA) remain running in a QSQSRVR job (not the php job). Basically, rexx and qsh do not work well in chroot. Also, rexx and qsh are slow already, so there is no real performance benefit. Not worth the effort in my opinion.

    BTW -- There are system APIs for RTVJOBA, therefore you can make an in-memory toolkit call and get php job attribute data (much faster than the RTVJOBA cmd). Or you can use the beloved RTVJOBA, specifying the current php job details.

    change your mind (too dangerous at any speed)?

    If you change your mind, I can easily disable the in-memory call pgm, srvpgm, cmd support.

  9. Teemu Halmela

    This seems to work reasonably well. It even works with my PHP module, where my _RSLOBJ2/_PGMCALL couldn't find any programs.

    A few problems with it. Currently the whole driver crashes if a program has an error or it isn't found. The crashing stops when using ILECALL_EXCP_NOSIGNAL with _ILECALLX:

    rc = _ILECALLX((ILEpointer *)ileSymPtr, &arglist->base, iCall400PgmIleSigStruct, RESULT_INT32, ILECALL_NOINTERRUPT|ILECALL_EXCP_NOSIGNAL);
    

    But this doesn't return any useful error messages from the joblog. Is there something wrong with my system (wouldn't be the first time) or does this also happen elsewhere?

  10. Former user Account Deleted reporter

    A few problems with it. Currently the whole driver crashes if a program has an error or it isn't found.

    First, for readers watching two geeks talk ... this is about in-memory calling ... and ... called user programs that throw an error (or missing programs).

    Ok. A debate, 'to be or not to be' (php job). That is, for clarity, your argument (to be) is to add ILE exception monitors to keep the php process alive at all costs, even with a bad user program dying or a bad user script (wrong program name). The other argument (not to be): let php die, another php will start in its absence (how fastcgi works). As for errors on the dead php, well, go look in the job log for errors, assuming a joblog is cut (joblog ... can be admin controlled).

    So, as you can see, you do not need _ILECALL (my tagged pointer security concern). This can be framed under the current ILE call design, and is more a debate about what it means to run in a php job.

    future

    There are actually two items to be decided on the 'run faster' path.

    1) On the in-memory path. Should we put up exception monitors to watch for bad user programs and/or bad user scripts (above discussion)?

    2) All paths. Should we cache the resolve to PGM/SRVPGM to make things run faster (aka, maybe 2X faster)? This goes back to security, of course. That is, if we cache the tagged function/pgm pointer then we will 'short circuit' the natural 'are you authorised' processing during ILE resolve (same as storing an _RSLOBJ2 result in PASE memory ... bad idea).

    Delay. I will think about both. I am not sure, to be honest. This 'faster' debate always comes back to security vs. speed. Also, when we get to 'async' (node instead of php), it is not clear what a monitored exception in a secondary thread handling a toolkit call would do to a node web site. Maybe only 'good' programs should be called in-memory, not scripts/rpg programs designed to die.

    BTW -- Non-in-memory ... RPG dying in the QSQSRVR job proxy is not so bad. These bad user programs should report errors ok (toolkit joblog).

    Mmm ... a duality of 'bad' standards.

    A PASE program dies (php extension): call out the national guard (support, VP, governor). A user RPG program dies in-memory: just fine, it always does that. Perhaps only good RPG programs should be called when running in memory (faster than fast, no exceptions).

  11. Teemu Halmela

    I think it would be better to not crash the whole driver and in this case the PHP with it. To me it sounds bad that the middleman is crashing because of bad things the user did. It should just tell what went wrong and the caller could do whatever it wanted with the information. And it already works that way when using the non-in-memory way. Of course the easier thing is to crash the whole thing, as that won't leave any dangling pointers around and you don't have to worry about cleaning things up.

    BTW -- Non-in-memory ... RPG dying in the QSQSRVR job proxy is not so bad. These bad user programs should report errors ok (toolkit joblog).

    I'm also confused why the errors won't show in the same toolkit joblog. That again might be something I don't understand about how these IBM systems work.

    At some point I tried adding some signal handling to my code, but I couldn't get it to work. There is very little documentation about how you should catch these errors from PGMCALL/ILECALL. If you have some insight into how one would do that, it would be great to see.

  12. Former user Account Deleted reporter

    There is very little documentation about how you should catch these errors from PGMCALL/ILECALL.

    The manual is explicit ... ILECALL_EXCP_NOSIGNAL (0x00000020) - Suppresses signals for IBM i exceptions. This option causes the system to return a function result of -1 (and set errno to 3474) instead of raising a signal for any exception during the call. You can use the QMHRCVPM()--Receive Program Message for IBM PASE for i function specifying message type *EXCP to determine what exception occurred.
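
    In code terms, the pattern boils down to the following hedged sketch (assuming a target/arglist/signature built as for any _ILECALL-style call; the QMHRCVPM() step is only outlined in a comment, its prototype and formats are in the IBM i PASE documentation):

    #include <errno.h>
    #include <stdio.h>
    #include <as400_types.h>
    #include <as400_protos.h>

    /* Suppress ILE exception signals; report the failure instead of dying. */
    static int call_ile_nosignal(ILEpointer *target, ILEarglist_base *arglist,
                                 const arg_type_t *signature)
    {
        int rc = _ILECALLX(target, arglist, signature, RESULT_INT32,
                           ILECALL_NOINTERRUPT | ILECALL_EXCP_NOSIGNAL);
        if (rc == -1 && errno == 3474) {
            /* An IBM i exception occurred but no signal was raised. The
             * exception message is still in the job: receive it with
             * QMHRCVPM() (message type *EXCP) to see what actually failed,
             * or dump the joblog for diagnostics. */
            fprintf(stderr, "ILE call failed: exception suppressed (errno %d)\n",
                    errno);
            return -1;
        }
        return rc;
    }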

    Again, I do not recommend this interface for the in-memory toolkit. Basically, you are on your own collecting 'joblog(ish)' information using system APIs. It is much, much easier (and safer) to use QSQSRVR jobs with natural DB2 'system API' stored procs.

    in memory call ... better to not crash the whole driver and in this case the PHP with it

    I am not going to completely dismiss the idea of 'special handling' for ILE user programs (global monitors in the toolkit). However, I do not agree fundamentally. In fact, holding PHP alive as the 'walking dead' seems illogical and probably a security danger.

    Why?

    Basically, the toolkit's in-memory called RPG wanted to end (a suicidal RPG program). Technically, the user RPG program did not handle a given exception because it was never designed to handle that exception. All the half-baked state of the RPG program is in flux, which is simply an unpredictable security exposure (data left hanging in memory). By its lack of exception handling, the user RPG program running in the PHP job declares it wants the job/process to die (take PHP down). Point of fact, the RPG program during an MI exception was already on its way to the death of the process, which is especially problematic when toolkit 'in-memory' threads are involved (async PHP call, node callback, etc.).

    This is the same as any PASE bad-actor extension, such as an ibm_db2 SEGFAULT/core dump (boom goes the PHP process). The PASE vs. ILE dualism seems absurd.

    Yet?

    You want to keep the PHP/node job on life support with a dying/dead RPG program waddling through its death throes in memory???

    Mmm ... I am glad we are having this conversation in the open. The 'obsession with in-memory speed' appears completely snow blind to the practical dangers. I am simply showing the rocks at the bottom of the in-memory cliff dive. Whoosh ... splat! I assumed most people simply would not leap off the in-memory call cliff. Aka, a nice, safe QSQSRVR job is a bit slower, but a lot better. However, again, if the popular vote design wants in-memory cliff diving ... I can add something to keep PHP alive on the rocks.

  13. Teemu Halmela

    I could probably use forking so the whole PHP does not crash when program errors happen. The fork would run db2sock inside its own memory space. Normally the fork would stay alive after finishing, but when something bad happens we would exit and create a new one.

    But when bad things start to happen, it would be great if we could show some error other than 'We crashed!'.

  14. Former user Account Deleted reporter

    I could probably use forking so the whole PHP does not crash when program errors happen. The fork would run db2sock inside its own memory space.

    Yep ... safe ... but ...

    Stateless job (safe) ... the bad news is fork is so slow and resource intensive that using fork 'stateless' will drag the whole machine to ruin (don't take my word for it, simply run a recursive PASE fork test ... but ... you will have to IPL the machine eventually). In the case of a daemon like PHP, the constant fork(ing) would kill the machine CPU.

    Stateful job (re-used) ... If you fork and leave the job alive to handle work (cough QSQSRVR jobs), then you have to build an IPC mechanism to 'attach' (cough QSQSRVR jobs). Aka, shared memory w/semaphore, data queue, pipe, socket, etc. Eventually you will realise that you need to pre-start your fork jobs (cough QSQSRVR again). Then you will need a subsystem (cough QSQSRVR again). Then you will need an adopt-authority 'connection' to 'change the job description' (cough QSQSRVR again) ... cough ... cough ... wheeze ... re-inventing QSQSRVR jobs.

    As far as errors in the fork job ...

    Generally, Linux/Unix/AIX/PASE processes (jobs) are always 'fork' children of a parent. When a child job dies (crashes), the Unix convention is to leave the process exit status to be harvested by the parent (usually a parent shell). The dead fork jobs enter a state known as 'zombies' (really, not making it up) until the exit status is collected by the parent. (There are other funky Unix rules about the parent dying, killing children, etc., but that adds confusion to the general question.)
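
    For readers who have not lived this, here is a tiny plain-POSIX sketch of the fork/zombie/harvest convention just described (nothing IBM i or db2sock specific):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid = fork();            /* parent gets the child pid, child gets 0 */
        if (pid < 0) { perror("fork"); return 1; }
        if (pid == 0) {
            /* child: stand-in for a worker that hits a bad call and dies */
            _exit(42);                 /* exit status left for the parent */
        }

        /* Until the parent collects the status, the dead child is a 'zombie'. */
        int status = 0;
        if (waitpid(pid, &status, 0) == pid) {
            if (WIFEXITED(status))
                printf("child %d exited with status %d\n",
                       (int)pid, WEXITSTATUS(status));
            else if (WIFSIGNALED(status))
                printf("child %d killed by signal %d\n",
                       (int)pid, WTERMSIG(status));
        }
        return 0;
    }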

    Anyway, back to 'where is my full joblog info'???

    I cannot fully recall (PASE guy forgot), but I think IBM i work management did something special when setting up child/fork PASE jobs to avoid mass 'junk' plugging up the job logs. Basically, IBM i proper never considered fork a good idea (not supported in ILE), therefore PASE 'fork' was an accommodation between full-on IBM i jobs and thin Unix jobs. So ... you kinda get 'half a loaf' of joblog bread if you use fork as your mechanism ... cough ... better to use full-on QSQSRVR jobs (hint).

    BTW -- I do not consider this fork discussion a valid course of action for the toolkit. To wit, I am giving you the jet-plane fly-by of fork on IBM i, a bit less than the faithful researcher of every aspect. If you are still serious about such a fork course, we can go into every detail later (you will probably not like the answers).

    back to in memory again...

    So, again, with the in-memory call you (clever) are on the only path to faster-than-fast speed nirvana. However, again, I am simply showing you the rocks at the bottom of the in-memory design cliff dive. Whoosh ... splat!

    Also, I am not intentionally picking on you (only a little). In fact, I consider you very clever. I am simply older, not smarter. As clever as you are, you 'publicly' hold sway over folks that do not have time to work through all this deep technical monkey business (db2sock in-memory design). This conversation needed to happen in the open, because many, many folks think faster is always better. Well, now we are having the real debate openly, with the many problems of in-memory calling in a web language.

    In the end, we really probably should direct most people to using safe QSQSRVR jobs. However, again, if the popular vote design wants in-memory cliff diving ... I can add something to keep PHP alive on the rocks.

    Understand?

  15. Former user Account Deleted reporter

    Halmela. Good help rendered to db2sock project, you have. (Yoda -- Star Wars).

    Do you really need/want a 'pure PASE' option (PGMCALL/ILECALL)? Aka, bypass all DB2JSON ILE stuff via another memory call option (pure PASE)?

    Not my choice ... but if you really want ... I could add another memory call option, connect "qual":"*pase" (similar to the current "qual":"*memory"). Unfortunately, db2sock toolkit programming work is messy, so it would take more than a few days (I work Mon-Wed).

  16. Teemu Halmela

    Do you really need/want a 'pure PASE' option (PGMCALL/ILECALL)? Aka, bypass all DB2JSON ILE stuff via another memory call option (pure PASE)?

    I don't think there is necessarily a need for that. *memory works well and with minimal overhead. The only problem, as stated, is that it doesn't return errors. In normal use that isn't such a big problem because errors shouldn't be happening in production. It just makes for easier debugging when you see the error right in your face.

    To combat the whole PHP crashing, I made a simple PHP extension which runs db2sock inside a fork. So only the fork needs to be restarted when the call fails. Normally the fork stays alive and the communication happens through a socket, which seems to be plenty fast.

    I've also seen one more weird problem with these in-memory calls. When calling an SQL RPG program, sometimes it can't find the table from *LIBL although the library list is correctly set. The query returns error -204. I haven't seen this error for a while now, and maybe it got fixed when we applied PTFs to our system. But I have to keep my eyes on it.
