(This repo contains a detailed bug report for the Dynatrace agent.)

Summary
==============

Preloading liboneagentproc.so breaks DT_RPATH/DT_RUNPATH processing
during dlopen calls.

Background
==============

DT_RPATH/DT_RUNPATH lets one modify the dynamic library search path
used by a specific executable or shared library, by means of a link
option. For example:

    c++ -o some.exe  -Wl,--rpath,/my/custom/libs  some.o  -lmylib  …

and some.exe, when run, will look for libmylib.so in /my/custom/libs in
addition to LD_LIBRARY_PATH or ld.so.conf settings (see `man dlopen` for details).
This also works for shared libraries:

    c++ --shared -o libparent.so  -Wl,--rpath,/my/custom/libs  some.o  …

and libparent.so will look in /my/custom/libs for the libraries it needs
(both those linked in and those loaded dynamically with dlopen).

One particularly useful idiom is that of using $ORIGIN. For example:

    c++ -o some.exe  -Wl,--rpath,\$ORIGIN  …otherflags…

and some.exe will look for needed libraries in the very directory
in which it is installed. Things like

    c++ -o some.exe  -Wl,--rpath,\$ORIGIN/../lib  …otherflags…

work too.
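
The expansion rule can be illustrated with a small C sketch. It is a simplified model, not glibc's implementation: the function name `expand_origin` and the path `/opt/app/bin/some.exe` are made up for illustration, only a leading `$ORIGIN` is handled, and the object path is assumed to be absolute.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Simplified model of $ORIGIN expansion (illustration only).
 *   object_path: absolute path of the executable or shared library
 *                that carries the RPATH/RUNPATH entry
 *   entry:       one RPATH/RUNPATH entry, e.g. "$ORIGIN/../lib"
 * Writes the expanded directory into out.
 * Returns 0 on success, -1 if object_path has no directory part
 * or the result does not fit into out.
 */
int expand_origin(const char *object_path, const char *entry,
                  char *out, size_t out_size)
{
    static const char token[] = "$ORIGIN";
    const size_t token_len = sizeof token - 1;
    const char *slash;
    size_t dir_len;
    int n;

    assert(object_path != NULL && entry != NULL && out != NULL);

    /* Entries that do not start with $ORIGIN are used unchanged. */
    if (strncmp(entry, token, token_len) != 0) {
        n = snprintf(out, out_size, "%s", entry);
        return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
    }

    /* $ORIGIN is the directory that contains the object itself. */
    slash = strrchr(object_path, '/');
    if (slash == NULL)
        return -1;
    dir_len = (slash == object_path) ? 1 : (size_t)(slash - object_path);

    /* Replace the token with that directory, keep the rest of the entry. */
    n = snprintf(out, out_size, "%.*s%s",
                 (int)dir_len, object_path, entry + token_len);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Under these assumptions, `expand_origin("/opt/app/bin/some.exe", "$ORIGIN/../lib", buf, sizeof buf)` fills `buf` with `/opt/app/bin/../lib`, while an entry such as `/my/custom/libs` is copied unchanged.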

The same resolution works for libraries which are dynamically
loaded during execution via ``dlopen`` (like plugins) instead
of being linked-in.

The problem
=================

Installing the Dynatrace agent creates the /etc/ld.so.preload file, which
causes the liboneagentproc.so library to be preloaded into every process
run on the machine.

This library provides its own replacements for various crucial libc
functions (connect, execve, dlopen, dlsym, fopen, fclose, open64,
open, close, fopen64), and once it is preloaded, those replace the
originals (I didn't try debugging, but I guess each one records info like
"which process opens what" and then calls the original, or does something similar):

    $ objdump -TC /lib/x86_64-linux-gnu/liboneagentproc.so | grep DF
    000000000000d948 g     DF .text	000000000000031f connect
    000000000000ad30 g     DF .text	000000000000022a execve
    0000000000004010 g     DF .init	0000000000000000 _init
    000000000000cf70 g     DF .text	0000000000000812 dlopen
    000000000000cdc0 g     DF .text	00000000000001a8 dlsym
    000000000000f480 g     DF .text	00000000000002b0 fopen
    000000000000f9e0 g     DF .text	00000000000001fd fclose
    000000000000ff90 g     DF .text	00000000000003a1 open64
    0000000000069fd8 g     DF .fini	0000000000000000 _fini
    0000000000004c70 g     DF .text	000000000000007d ruxitBareMetalLog
    000000000000fbe0 g     DF .text	00000000000003a1 open
    0000000000010340 g     DF .text	00000000000001f9 close
    000000000000f730 g     DF .text	00000000000002b0 fopen64

In particular, this means that for dlopen calls, instead of the
standard sequence

    (my program or library)
          ---[calls]--->
                glibc.dlopen

I now have

    (my program or library)
          ---[calls]--->
                liboneagentproc.dlopen
                  ---[calls]--->
                        glibc.dlopen
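
A wrapper of this kind could be built roughly as in the C sketch below. This is my guess at the general technique, not the agent's actual code (which I have not inspected): the counter `wrapper_calls` is hypothetical, and the only real APIs used are `dlopen` and `dlsym` with `RTLD_NEXT`.

```c
#define _GNU_SOURCE            /* for RTLD_NEXT; must precede the includes */
#include <dlfcn.h>
#include <assert.h>
#include <stddef.h>

/* Signature of the real dlopen, used to call through to glibc. */
typedef void *(*dlopen_fn)(const char *, int);

/* Hypothetical bookkeeping: how many times the wrapper was entered. */
int wrapper_calls = 0;

/*
 * Exported under the name "dlopen", so that it is found before glibc's
 * definition when it lives in a preloaded library (or in the program itself).
 */
void *dlopen(const char *filename, int flags)
{
    static dlopen_fn real_dlopen = NULL;

    if (real_dlopen == NULL) {
        /* Look up the next "dlopen" in the search order, i.e. glibc's. */
        real_dlopen = (dlopen_fn)dlsym(RTLD_NEXT, "dlopen");
        assert(real_dlopen != NULL);
    }

    wrapper_calls++;           /* a real agent would record something here */

    /*
     * From glibc's point of view the caller is now the object that contains
     * this wrapper, not the program that originally asked for the library.
     */
    return real_dlopen(filename, flags);
}
```

Compiled into a shared library and preloaded, such a function would sit between the program and glibc exactly as in the second diagram. On older glibc versions the link step may additionally need `-ldl`.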

Now: glibc.dlopen inspects the RPATH/RUNPATH of *the calling object*,
i.e. the executable or shared library that contains the dlopen call.

In the normal case, the caller is my program or library,
which has RPATH/RUNPATH set, so dlopen considers this setting
and finds my library.

With liboneagentproc preloaded, the caller is
liboneagentproc.so, which of course doesn't have my runpath set. So
dlopen doesn't consider any rpath and doesn't find my library.
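
The effect of the caller on the search can be modelled with a toy C function. This is a deliberately simplified sketch, not glibc's algorithm: the names `struct object`, `model_dlopen` and `existing_file` and the directories `/app/OUTPUT` and `/oracle/lib` are invented for illustration, each path list is reduced to a single directory, and ld.so.cache is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Toy description of a loaded object: its name and its single RUNPATH
 * directory (NULL when it has none). */
struct object {
    const char *name;
    const char *runpath;
};

/* Hypothetical stand-in for the file system: the only library that exists. */
static const char *const existing_file = "/app/OUTPUT/libmylib.so";

/* Build "dir/lib" in out and report whether that file "exists". */
static int try_dir(const char *dir, const char *lib,
                   char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "%s/%s", dir, lib);
    if (n < 0 || (size_t)n >= out_size)
        return 0;
    return strcmp(out, existing_file) == 0;
}

/*
 * Simplified search for a dlopen-ed library: LD_LIBRARY_PATH first, then
 * the RUNPATH of the object that *called* dlopen, then system directories.
 * Returns 1 and leaves the full path in out when found, 0 otherwise.
 */
int model_dlopen(const struct object *caller, const char *ld_library_path,
                 const char *lib, char *out, size_t out_size)
{
    const char *dirs[4];
    size_t count = 0, i;

    assert(caller != NULL && lib != NULL && out != NULL);

    if (ld_library_path != NULL)
        dirs[count++] = ld_library_path;
    if (caller->runpath != NULL)
        dirs[count++] = caller->runpath;   /* depends on who is calling */
    dirs[count++] = "/lib";
    dirs[count++] = "/usr/lib";

    for (i = 0; i < count; i++)
        if (try_dir(dirs[i], lib, out, out_size))
            return 1;

    if (out_size > 0)
        out[0] = '\0';
    return 0;
}
```

With a caller such as `{ "myprog_dlopening.exe", "/app/OUTPUT" }` the model finds `/app/OUTPUT/libmylib.so`. With `{ "liboneagentproc.so", NULL }` as the caller, the same request fails, because the directory holding the library is never tried.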

Example program
=====================

This repo contains a simple example program (an executable calling dlopen).
To run the example, just execute:

    ./compile_and_run.sh

On a system without Dynatrace it prints:

    ===> Compiling lib
    ===> Compiling prog_linked
    ===> Compiling prog_dlopening
    ===> Running prog_linked
    Hello from program
    Hello from MyLib
    ===> Running prog_dlopening
    Hello from program
    Hello from MyLib
    Finished OK

(The same happens if I remove /etc/ld.so.preload from a system where
the Dynatrace agent is installed.)

On a system with the Dynatrace agent installed and its preload active, it
prints:

    ===> Compiling lib
    ===> Compiling prog_linked
    ===> Compiling prog_dlopening
    ===> Running prog_linked
    Hello from program
    Hello from MyLib
    ===> Running prog_dlopening
    Hello from program
    ERROR, dlopen failed
    libmylib.so: cannot open shared object file: No such file or directory

Note: this is the simplest case. To test the problem in detail, one should also
consider the case where one shared library dlopens another shared library.

For more details, try running with LD_DEBUG=all set.

On a system without Dynatrace:

    $  LD_DEBUG=all  ./OUTPUT/myprog_dlopening.exe
    …
    4836:     file=libmylib.so [0];   dynamically loaded by OUTPUT/myprog_dlopening.exe [0]
    4836:     find library=libmylib.so [0]; searching
    4836:      search path=/oracle/app/oracle/product/11.2.0/client_1/lib:/oracle/tuxedo12.1.1.0/lib          (LD_LIBRARY_PATH)
    4836:       trying file=/oracle/app/oracle/product/11.2.0/client_1/lib/libmylib.so
    4836:       trying file=/oracle/tuxedo12.1.1.0/lib/libmylib.so
    4836:      search path=/home/marcink/DEV_hg/bugs/dynatrace_rpath/OUTPUT           (RUNPATH from file OUTPUT/myprog_dlopening.exe)
    4836:       trying file=/home/marcink/DEV_hg/bugs/dynatrace_rpath/OUTPUT/libmylib.so
    4836:     
    4836:     file=libmylib.so [0];  generating link map
    4836:       dynamic: 0x00007f8d03cd6d48  base: 0x00007f8d03ad5000   size: 0x0000000000202018
    4836:         entry: 0x00007f8d03ad5b70  phdr: 0x00007f8d03ad5040  phnum:                  7
    …

With Dynatrace installed:

    $  LD_DEBUG=all  ./OUTPUT/myprog_dlopening.exe
    …
      5137:     file=libmylib.so [0];  dynamically loaded by /lib/x86_64-linux-gnu/liboneagentproc.so [0]
      5137:     find library=libmylib.so [0]; searching
      5137:      search path=/oracle/app/oracle/product/11.2.0/client_1/lib:/oracle/tuxedo12.1.1.0/lib          (LD_LIBRARY_PATH)
      5137:       trying file=/oracle/app/oracle/product/11.2.0/client_1/lib/libmylib.so
      5137:       trying file=/oracle/tuxedo12.1.1.0/lib/libmylib.so
      5137:      search cache=/etc/ld.so.cache
      5137:      search path=…various-system-paths…/usr/lib/x86_64:/usr/lib  (system search path)
      5137:       …
      5137:       trying file=/lib/x86_64-linux-gnu/libmylib.so
      5137:       …
      5137:       trying file=/lib/libmylib.so
      5137:       …
      5137:       trying file=/usr/lib/x86_64/libmylib.so
      5137:       trying file=/usr/lib/libmylib.so
      5137:     
      5137:     symbol=_dl_exception_create;  lookup in file=OUTPUT/myprog_dlopening.exe [0]
      5137:     symbol=_dl_exception_create;  lookup in file=/lib/x86_64-linux-gnu/liboneagentproc.so [0]

The caller is different (liboneagentproc.so instead of
myprog_dlopening.exe), so the program's RUNPATH is never searched.
 
Realistic use cases
=======================

RUNPATH has various realistic use cases. For example, AFAIK Qt uses it to smoothly
handle multiple versions installed in parallel on the same system.

My actual case is related to Python scripting. RUNPATH makes it easier
for pyextension.so to find its dependent libraries, especially when
there are various versions of pyextension.so scattered around
different virtualenvs on the system (each needing libs from a
different location). Attempts to use LD_LIBRARY_PATH in such a case
are really painful.

An even more important case is that of setuid/setgid programs, where
LD_LIBRARY_PATH doesn't work at all and RUNPATH is the only smooth
solution if the libraries aren't installed system-wide.