Overview

-----------------------------------------------------------------------
Note (2011-12-14)
-----------------------------------------------------------------------

The taskstats API (which tstime uses) no longer works for non-root
users under kernels >= 3.0.0.

tstime still works when called as root, though.

See http://www.openwall.com/lists/kernel-hardening/2011/09/19/9
for a related discussion about the current state of the taskstats API.

-----------------------------------------------------------------------

-----------------------------------------------------------------------
Note (2013-08-23)
-----------------------------------------------------------------------

See https://github.com/gsauthof/cgmemtime for a highwater memory
measurement approach using cgroups and an overview of related tools.

-----------------------------------------------------------------------

I appreciate feedback and comments (especially about changing APIs which are
related to the programs and Solaris-specific APIs):

mail@georg.so
gsauthof@sdf.lonestar.org

The source code is licensed under the GPL version 2 or later.
If you did not receive the license text, go to: http://www.gnu.org/licenses/

The repository contains two programs:
- tstime

  A 'time(1)'-like command that, in addition to the runtime, also
  prints the highwater memory usage (rss+vmem) of the controlled process.

- tsmon

  Prints the runtime and highwater memory usage of every process that
  exits on the system until tsmon is quit.

Warning: Using tsmon is dangerous. Suddenly you know, for example, that
'psi' consumes 32 MB rss, that 'bash' forks 5 processes each time you hit
TAB, or that hald does a lot of forking from time to time.


Surprisingly, it is difficult to get the highwater memory usage
information under Linux (the situation is similar on Solaris).

There are different approaches to solve this problem:

1. For rss-only highwater just use the ru_maxrss field from struct rusage (see
   getrusage(2) or wait3(2)). Unfortunately, POSIX does not specify it
   and currently (2.6.27) Linux does not write this field
   (Solaris does not either; AIX and the BSDs are reported to write it).

   From time to time patches are sent to the lkml to support ru_maxrss.
   AFAIK, nothing has been applied so far.

   The last patch is discussed in
   http://thread.gmane.org/gmane.linux.kernel.mm/29880/focus=32531
   http://thread.gmane.org/gmane.linux.kernel.api/449

   Previous patches:
   http://thread.gmane.org/gmane.linux.kernel/582381
   http://thread.gmane.org/gmane.linux.kernel/568053

2. Sample from /proc during the lifetime of the watched process. Yes, this is
   polling. The old tool 'memtime' does this. Of course, this approach is ultra
   ugly and not suitable for short-running processes.

3. Use the taskstats delay accounting API from the Linux 2.6 Kernel.
   For tsmon this is a perfect fit. But to get the taskstats information
   of a single exiting process there is a bit of overhead involved:
   you have to retrieve the exit information of all exiting processes
   (and ignore it) until the watched process exits.

   See tstime.c for details.

   I tried to wait for SIGCHLD and then use the child pid to get
   the accounting information via the taskstats API, which didn't work: at
   this point the information of the child is already gone.

   nl.c contains an attempt to use libnl for communicating with the taskstats
   delay accounting subsystem of the kernel, but it does not work.
   Patches/hints on how to correctly use libnl are appreciated.
   Thus, taskstat.c uses the low-level code from getdelays.c from
   the kernel documentation.

4. Use dtrace. Under Solaris dtrace looks like an elegant way to
   solve the problem. Unfortunately, dtrace has no provider that
   exposes the rss/vmem-highwater information to the user. This is
   surprising since the Solaris kernel internally already maintains
   this information.

   Since you can use dtrace to look into kernel structs whose
   fields are not exposed by any provider, you can work around this
   limitation. But obviously, you need to be root for this.

   Even when some dtrace provider is extended in the future,
   another problem with dtrace is that under a default Solaris 10
   installation a normal user is not allowed to use dtrace (even for
   their own processes). The admin has to give your account extra
   privileges for this. Depending on your favourite Solaris admin,
   this may not be an option, since it is feared that malicious
   users would trace every instruction and things like that.
   Obviously, such arguments do not hold up. For example, malicious
   users are already able to truss every program they start or fork
   bomb the system.
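
As an illustration of approach 1, here is a minimal sketch that runs a
command as a child and reads ru_maxrss via wait4(2). It is not taken from
tstime; the helper name run_and_measure is made up for this example. Whether
the field carries a useful value depends on the kernel: the getrusage(2) man
page lists ru_maxrss as maintained since Linux 2.6.32 (in kilobytes), and
older kernels leave it at 0.

```c
#define _DEFAULT_SOURCE          /* for wait4() under strict -std= modes */
#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of tstime): runs argv as a child process,
 * waits for it and stores the child's resource usage in *ru.
 * Returns the raw wait status, or -1 if fork() or wait4() failed. */
static int run_and_measure(char *const argv[], struct rusage *ru)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);              /* only reached if exec failed */
    }
    int status = 0;
    if (wait4(pid, &status, 0, ru) < 0)
        return -1;
    return status;
}

/* Hypothetical helper: prints the rss highwater mark of a finished child.
 * On Linux ru_maxrss is reported in kilobytes; 0 may simply mean that
 * the kernel does not fill the field. */
static void print_maxrss(const struct rusage *ru)
{
    fprintf(stderr, "rss highwater: %ld KiB\n", ru->ru_maxrss);
}
```

A caller would pass something like the argument vector of the command to be
timed, check the returned status with WIFEXITED()/WEXITSTATUS(), and then
call print_maxrss(). Note that this only covers rss, not vmem.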


Compile hints:

- Since the taskstats API of the Linux kernel is used, you obviously need to
  have the kernel headers available.

  Under a Debian-style distribution they are included in the package
  'linux-libc-dev'.

- Under current distributions (e.g. Debian-style) the default kernel is
  configured to provide the taskstats interface.

  If you get errors like 'Error getting family id, errno 0', your kernel
  is probably not configured for taskstats.

  To check your kernel config, type something like this:

$ grep -i taskstat /boot/config-2.*-*-generic 
/boot/config-2.6.27-11-generic:CONFIG_TASKSTATS=y
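
The same check can be sketched in C, for example to print a friendlier
hint before trying to talk to the taskstats interface. This is only a
sketch and not part of tstime: both helper names are made up, and the
/boot/config-<release> location is an assumption that holds for
Debian-style distributions but not necessarily elsewhere.

```c
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

/* Hypothetical helper: returns 1 if the config file at path contains the
 * line CONFIG_TASKSTATS=y, 0 if it does not, -1 if it cannot be opened.
 * Assumes config lines fit into the buffer, which kernel configs do. */
static int config_has_taskstats(const char *path)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    char line[512];
    int found = 0;
    while (fgets(line, sizeof line, f)) {
        line[strcspn(line, "\r\n")] = '\0';    /* strip the line ending */
        if (strcmp(line, "CONFIG_TASKSTATS=y") == 0) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}

/* Hypothetical helper: writes /boot/config-<running kernel release> into
 * buf (assumed Debian-style layout). Returns 0 on success, -1 on error. */
static int running_kernel_config_path(char *buf, size_t len)
{
    struct utsname u;
    if (uname(&u) != 0)
        return -1;
    int n = snprintf(buf, len, "/boot/config-%s", u.release);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A return value of -1 from config_has_taskstats() only means the file could
not be read; it says nothing about whether taskstats is available.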