HdrHistogramResetOnSnapshotReservoir resets on querying size

Issue #8 new
Ole Langbehn created an issue

When using resetOnSnapshot, I can't query the size without resetting the whole snapshot. I need to query the size to determine if there have been measurements in the last period. Would it be ok to keep a separate counter?

Comments (28)

  1. Marshall Pierce repo owner

    Can you provide a little more context on what you're trying to accomplish? Maybe there should be another accessor, but it will be helpful to know more about the high-level goal.

  2. Ole Langbehn

    same person, different account, sorry.

    I am measuring Real User Monitoring statistics in DataDog using dropwizard metrics. I am using tags quite a lot. Now I have the problem that dropwizard metrics keeps sending stats that have not seen any measurements in a period with all-zero-values. Therefore, in DataDog, aggregation across percentiles is not possible, since all these zero values skew averages.

  3. Marshall Pierce repo owner

    I see. Are you sending the actual histograms themselves to DataDog, or just percentiles? If you're only sending percentiles, that's not mathematically valid to combine with any other percentile, so this count will not help you. (e.g. you cannot "average" the 99.9th percentile of two different histograms to come up with anything meaningful)

  4. Ole Langbehn

    Only sending percentiles. And mathematically, you're correct, but for getting estimates, I think that averaging certain histograms is valid. As long as they're interpreted as such...Taking the max is saying something less relevant in this use case.

  5. Ole Langbehn

    Also, skipping the all-zeroes even helps with the amount of stats sent to datadog (I already had busted their payload max, introduced gz'ing in coursera/metrics-datadog for that) and hopefully also with their bill.

  6. Marshall Pierce repo owner

    I think Gil would disagree with you about the validity of such numbers, but as long as you're aware of the limitations...

    Unfortunately, Recorder doesn't expose the size of its underlying histograms, so it doesn't look like I have a cheap (overhead-wise) way of doing this. I'd suggest using a separate Counter that you update/reset at the same time. It's a little messy, but if your traffic is sparse enough that you're trying to elide measurements for idle periods, it might be a practical solution.

    I don't quite follow why resetting is a problem, though -- if it was all zeros, resetting didn't do anything really, and if it's nonzero, now you have data to send. Right?

  7. Ole Langbehn

    The problem is that I am using a com.codahale.metrics.MetricFilter in order to filter the list of metrics to be reported for a period. I want to filter based on whether there have been measurements in the period, but when I do that (by accessing the size method), the reservoir gets reset before reporting.

  8. Ole Langbehn

    And yes, Gil would absolutely (and correctly so) have a deep discussion with me, I am aware of what I am doing here.

  9. Ole Langbehn

    What about having a separate counter in HdrHistogramResetOnSnapshotReservoir that gets updated on calls to the update method?

  10. Marshall Pierce repo owner

    Hmm.. I don't want to add the overhead and memory use for most users that don't need this feature. What about just wrapping that reservoir type in one that includes whatever sort of counter you prefer and delegates to HdrHistogramResetOnSnapshotReservoir as needed?

  11. Ole Langbehn

    I can follow your reasoning, but do you see the irony in the implementation, resetting on calling the size method?

  12. Marshall Pierce repo owner

    Yep, I see how that would be confusing. At the very least I can document that better!

  13. Marshall Pierce repo owner

    It's not just an integer; it has to be thread safe. So, either I lock (which ruins the nice concurrency of the underling Recorder), or I use an atomic counter and compare-and-swap on snapshot, which also has nonzero, though not ruinous, cost.

  14. Ole Langbehn

    Are you set on not considering introducing a separate counter?

    Also, just a question: If you are set, why not just throw something like an UnsupportedOperationException or something similar in size? That would make the proposed usage most obvious.

  15. Marshall Pierce repo owner

    Also, you might find it easier to implement the filtering you need if you wrapped the datadog reporter so you could intercept what is actually being transmitted.

  16. Marshall Pierce repo owner

    Hm, perhaps throwing would be a good idea. I could also provide a wrapper to provide the atomic counter so that users could opt-in to that overhead if they need size() to be useful (for most definitions of useful).

  17. Ole Langbehn

    Meta: Thanks for your time and thoughts, I appreciate it.

    I don't think wrapping DataDogReporter would work, because the public interface (report method) still gets a set of metrics, which I would need to query beforehand. Or what point of access did you have in mind?

  18. Ole Langbehn

    Your other idea of custom-wrapping a timer would probably work, because I can directly register it with the dropwizard metricsregistry.

  19. Marshall Pierce repo owner

    I try to be a responsive maintainer. :)

    Hm, looks like you'd need an extension point of some kind in DatadogReporter that lets you hook into reportHistogram().

    I think MetricFilter is a somewhat poorly designed interface; it should really let you filter on the value that's about to be operated on, not just externally observable state. If I had more spare time, I have an extensive overhaul of the core abstractions of Metrics I'd like to do. But I don't, so instead we hack around problems like this...

    In the meantime, I do think it would be useful to have a way for size() to not be so startling on the resetting reservoir, but I make no guarantees about when I'll have time to actually implement that. :(

  20. Ole Langbehn

    I can provide a pull request, if you can get the idea across.

    But TBH, probably custom-wrapping a timer would be quicker to do, so I'm also not handing out guarantees ;)

  21. Marshall Pierce repo owner

    Sure -- I think making size() on the resetting reservoir and providing an optional wrapper class that maintains an AtomicLong counter is probably what I would do, but you're welcome to propose other implementation approaches. No worries if that PR doesn't happen. :)

  22. Log in to comment