arbtt-stats: access surrounding samples

Issue #52 new
leromarinvit created an issue

I'd like to assign tags based on tags and/or other attributes of surrounding samples. What I'm looking for, at the end of the day, is the ability to say "assign this tag if, of the +/- N samples around the current sample, at least M have property xy (e.g. tag, active window, etc.)".

I think the most basic primitive to make this work is a way to access a sample at a specific offset from the current one. The aggregation functions could build on that.
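
To make the proposal concrete, here is a minimal sketch of that primitive and one aggregation built on it. It is written in Python rather than arbtt's Haskell, purely as an illustration. The `Sample` type and the names `sample_at` and `at_least` are hypothetical and not part of arbtt. It also assumes the window of +/- N samples includes the current sample, which the request above leaves open.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Sample:
    """Hypothetical stand-in for one arbtt log entry."""
    program: str                                   # program of the active window
    tags: frozenset = field(default_factory=frozenset)


def sample_at(samples: Sequence[Sample], i: int, offset: int) -> Optional[Sample]:
    """Return the sample `offset` positions from index `i`, or None if out of range."""
    j = i + offset
    return samples[j] if 0 <= j < len(samples) else None


def at_least(samples: Sequence[Sample], i: int, n: int, m: int,
             pred: Callable[[Sample], bool]) -> bool:
    """True if at least `m` samples within +/- `n` of index `i` satisfy `pred`.

    Assumption: the current sample (offset 0) counts towards the total.
    """
    hits = 0
    for offset in range(-n, n + 1):
        s = sample_at(samples, i, offset)
        if s is not None and pred(s):
            hits += 1
    return hits >= m


def is_browser(s: Sample) -> bool:
    return s.program == "Navigator"


log = [Sample(p) for p in ["Navigator", "Navigator", "emacs", "Navigator", "emacs"]]
print(at_least(log, 2, 1, 2, is_browser))  # neighbours at indices 1 and 3 are browsers
```

Samples beyond either end of the log are simply skipped here; a real implementation would have to decide whether they count as non-matching instead.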

This is something I want/need for my own purposes, so I'm probably going to implement it one way or another. It would be great if this could be upstreamed when it's ready, so I'd like to know if this is something you'd consider including at all, and any considerations on the design of such a feature you might have.

I'll volunteer the implementation, but beware that I don't know Haskell, so it might take some time to get it right... ;-)

Comments (6)

  1. nomeata repo owner

    That was actually part of the original design of arbtt! But I never actually had a good use for it, and it got in the way of efficient processing of the samples (because you’d have to keep all the other samples around). So unless there is a very compelling use case, I am hesitant to implement it.

  2. leromarinvit reporter

    My use case is that for time recording on my work laptop, I want to filter out periods when I'm using a browser for more than a certain percentage of the time outside of normal working hours. The idea is that when I'm using the laptop, I'm likely to be working, but when I'm just using a browser, I likely got distracted and am actually doing something else. But I don't want to just filter out all browser-using samples, since I do use a browser as part of my job as well.

    Why would you have to keep all samples in memory? Couldn't you just read them on demand? Assuming that the samples of interest are near the current one (e.g. maybe +/- a few hours at most), the range of samples being used at any one time would be relatively small compared to the total size of the log, with good temporal and spatial locality. So the additional accesses would probably just hit the cache.

  3. nomeata repo owner

    Hmm, interesting use case.

    Let’s put it this way: Given arbtt’s current internal architecture, as well as the log file format (which is not perfect, no doubt about it), on-demand loading is tricky.

    So it’s a valid proposal, but don’t hold your breath: I don’t expect to get around to implementing it any time soon, sorry.

  4. leromarinvit reporter

    Without looking at the source much - is it because of the string sharing described here? https://www.joachim-breitner.de/blog/381-Exploiting_sharing_in_arbtt

    Maybe a sliding window would work then, with the window size depending on how many samples the rules want to look ahead/behind?
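
A rough sketch of the sliding-window idea, again in Python for illustration only: the log is consumed as a stream, and only as many samples are buffered as the rules need to look behind and ahead. The function name `windows` and the `behind`/`ahead` parameters are hypothetical. In arbtt they would have to be derived from the largest offsets used in the rules, which this sketch does not attempt.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def windows(samples: Iterable[T], behind: int, ahead: int
            ) -> Iterator[Tuple[List[T], T, List[T]]]:
    """Yield (previous, current, following) for every sample in the stream.

    At most `behind` earlier samples and `ahead` later samples (plus the
    one just read) are held in memory at any time.
    """
    past: Deque[T] = deque(maxlen=behind)   # oldest entries fall off automatically
    future: Deque[T] = deque()              # samples read but not yet "current"

    for s in samples:
        future.append(s)
        if len(future) > ahead:
            current = future.popleft()
            yield list(past), current, list(future)
            past.append(current)

    # End of the log: the remaining samples have a shrinking look-ahead.
    while future:
        current = future.popleft()
        yield list(past), current, list(future)
        past.append(current)


for prev, cur, nxt in windows(iter(["a", "b", "c"]), behind=1, ahead=1):
    print(prev, cur, nxt)
```

The buffer size is fixed by a sample count, so this only bounds memory when the look-ahead and look-behind are expressed in samples rather than in time.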

    If I go ahead and try my hand at this, what sort of config syntax would you prefer? I've had something like [+-]n ( expr ) in mind, but maybe it's a little too non-obvious:

    +3 (current window program == "Navigator") ==> tag browser-in-3-minutes
    -1 (current window program == "Navigator") && current window program == "x-terminal-emulator" ==> tag switch-from-browser-to-terminal
    
  5. nomeata repo owner

    I am still not convinced. Another problem is that so far, arbtt is pretty independent of the sample rate: You can increase or decrease it as you wish. Looking forward or backward by a certain amount of time means that the size of the sliding window needs to adjust dynamically.

    As for the syntax, I would probably go for something more verbose; maybe something along the lines of

    [always|sometimes] during the [previous|next] /timespec/ ( expr  )
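
One possible reading of that syntax, sketched in Python for illustration only: the window is selected by timestamp rather than by sample count, so the result does not depend on the sample rate. The `Sample` type and the `during` function are hypothetical. The sketch also assumes that the current sample is excluded from the window and that `always` over an empty window is true, neither of which is settled in this thread.

```python
from datetime import datetime, timedelta
from typing import Callable, NamedTuple, Sequence


class Sample(NamedTuple):
    """Hypothetical stand-in for one timestamped arbtt log entry."""
    time: datetime
    program: str


def during(samples: Sequence[Sample], i: int, span: timedelta,
           direction: str, quantifier: str,
           pred: Callable[[Sample], bool]) -> bool:
    """Evaluate '[always|sometimes] during the [previous|next] span (pred)'."""
    now = samples[i].time
    if direction == "previous":
        window = [s for s in samples if now - span <= s.time < now]
    elif direction == "next":
        window = [s for s in samples if now < s.time <= now + span]
    else:
        raise ValueError(f"unknown direction: {direction!r}")

    if quantifier == "always":
        return all(pred(s) for s in window)   # vacuously True for an empty window
    if quantifier == "sometimes":
        return any(pred(s) for s in window)
    raise ValueError(f"unknown quantifier: {quantifier!r}")


def is_browser(s: Sample) -> bool:
    return s.program == "Navigator"


start = datetime(2024, 1, 1, 12, 0)
programs = ["Navigator", "Navigator", "x-terminal-emulator",
            "Navigator", "x-terminal-emulator"]
log = [Sample(start + timedelta(minutes=k), p) for k, p in enumerate(programs)]

print(during(log, 2, timedelta(minutes=2), "previous", "always", is_browser))
```

The linear scan over the whole log keeps the sketch short. Because the number of samples inside a time span varies with the sample rate, a streaming version would need a buffer whose size adjusts dynamically, which is the difficulty raised above.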
    
  6. nomeata repo owner

    Maybe before implementing this, a refactoring to use sqlite instead of my simple file format is the right thing to do… but that would require pretty much a complete rewrite of half the codebase.
