HdrHistogramResetOnSnapshotReservoir resets on querying size
When using resetOnSnapshot, I can't query the size without resetting the whole snapshot. I need to query the size to determine if there have been measurements in the last period. Would it be ok to keep a separate counter?
Comments (28)
-
repo owner -
same person, different account, sorry.
I am measuring Real User Monitoring statistics in DataDog using dropwizard metrics. I am using tags quite a lot. Now I have the problem that dropwizard metrics keeps sending stats that have not seen any measurements in a period with all-zero-values. Therefore, in DataDog, aggregation across percentiles is not possible, since all these zero values skew averages.
-
My current plan is to filter out these all-zero-values measurements.
-
repo owner I see. Are you sending the actual histograms themselves to DataDog, or just percentiles? If you're only sending percentiles, that's not mathematically valid to combine with any other percentile, so this count will not help you. (e.g. you cannot "average" the 99.9th percentile of two different histograms to come up with anything meaningful)
-
Only sending percentiles. And mathematically, you're correct, but for getting estimates, I think that averaging certain histograms is valid. As long as they're interpreted as such...Taking the max is saying something less relevant in this use case.
-
Also, skipping the all-zeroes even helps with the amount of stats sent to datadog (I already had busted their payload max, introduced gz'ing in coursera/metrics-datadog for that) and hopefully also with their bill.
-
https://github.com/coursera/metrics-datadog/pull/79 for reference
-
repo owner I think Gil would disagree with you about the validity of such numbers, but as long as you're aware of the limitations...
Unfortunately,
Recorder
doesn't expose the size of its underlying histograms, so it doesn't look like I have a cheap (overhead-wise) way of doing this. I'd suggest using a separateCounter
that you update/reset at the same time. It's a little messy, but if your traffic is sparse enough that you're trying to elide measurements for idle periods, it might be a practical solution.I don't quite follow why resetting is a problem, though -- if it was all zeros, resetting didn't do anything really, and if it's nonzero, now you have data to send. Right?
-
The problem is that I am using a com.codahale.metrics.MetricFilter in order to filter the list of metrics to be reported for a period. I want to filter based on whether there have been measurements in the period, but when I do that (by accessing the size method), the reservoir gets reset before reporting.
-
And yes, Gil would absolutely (and correctly so) have a deep discussion with me, I am aware of what I am doing here.
-
What about having a separate counter in HdrHistogramResetOnSnapshotReservoir that gets updated on calls to the update method?
-
repo owner Hmm.. I don't want to add the overhead and memory use for most users that don't need this feature. What about just wrapping that reservoir type in one that includes whatever sort of counter you prefer and delegates to
HdrHistogramResetOnSnapshotReservoir
as needed? -
I can follow your reasoning, but do you see the irony in the implementation, resetting on calling the size method?
-
I at least was surprised that the reservoir was reset after calling the size method
-
And (simplified) is an integer that much of an overhead for a reservoir?
-
repo owner Yep, I see how that would be confusing. At the very least I can document that better!
-
repo owner It's not just an integer; it has to be thread safe. So, either I lock (which ruins the nice concurrency of the underling Recorder), or I use an atomic counter and compare-and-swap on snapshot, which also has nonzero, though not ruinous, cost.
-
Are you set on not considering introducing a separate counter?
Also, just a question: If you are set, why not just throw something like an UnsupportedOperationException or something similar in size? That would make the proposed usage most obvious.
-
repo owner Also, you might find it easier to implement the filtering you need if you wrapped the datadog reporter so you could intercept what is actually being transmitted.
-
repo owner Hm, perhaps throwing would be a good idea. I could also provide a wrapper to provide the atomic counter so that users could opt-in to that overhead if they need size() to be useful (for most definitions of useful).
-
Meta: Thanks for your time and thoughts, I appreciate it.
I don't think wrapping DataDogReporter would work, because the public interface (report method) still gets a set of metrics, which I would need to query beforehand. Or what point of access did you have in mind?
-
Your other idea of custom-wrapping a timer would probably work, because I can directly register it with the dropwizard metricsregistry.
-
repo owner I try to be a responsive maintainer. :)
Hm, looks like you'd need an extension point of some kind in
DatadogReporter
that lets you hook intoreportHistogram()
.I think MetricFilter is a somewhat poorly designed interface; it should really let you filter on the value that's about to be operated on, not just externally observable state. If I had more spare time, I have an extensive overhaul of the core abstractions of Metrics I'd like to do. But I don't, so instead we hack around problems like this...
In the meantime, I do think it would be useful to have a way for
size()
to not be so startling on the resetting reservoir, but I make no guarantees about when I'll have time to actually implement that. :( -
repo owner -
assigned issue to
-
assigned issue to
-
I can provide a pull request, if you can get the idea across.
But TBH, probably custom-wrapping a timer would be quicker to do, so I'm also not handing out guarantees ;)
-
repo owner Sure -- I think making
size()
on the resetting reservoir and providing an optional wrapper class that maintains anAtomicLong
counter is probably what I would do, but you're welcome to propose other implementation approaches. No worries if that PR doesn't happen. :) -
I'll do a POC with a wrapped timer, and if it is easy, revamp it to the reservoir
-
see pull request #3
- Log in to comment
Can you provide a little more context on what you're trying to accomplish? Maybe there should be another accessor, but it will be helpful to know more about the high-level goal.