density method of findThreshold might need a bandwidth adjustment
findThreshold with method="density" on the webinar example (HD13M) doesn't look right. Bandwidth problem?
Comments (12)
-
reporter -
I have played with the parameters I was allowed to tune. As we discussed before, the main issue is the 'bandwidth' argument of the KernSmooth::bkde function, which is calculated automatically (internally) by the kedd::h.ucv function. Tuning the arguments really doesn't change it, and if we change them the "density" method will lose its generality.
-
reporter We don't know that it was a bandwidth problem. That was just a hypothesis.
-
That was a hypothesis. It's a fact now :)
-
reporter Okay, then are you saying that changing the bandwidth did fix the issue?
-
True... the bandwidth calculated by kedd::h.ucv is smaller (0.01) than it should be (e.g. 0.025).
-
reporter Did you tune the kedd::h.ucv parameters and/or try alternative implementations like stats::bandwidth?
-
Yes, I tuned the parameters... It doesn't change. Most of the parameters are calculated internally and are therefore fixed. I am not sure what stats::bandwidth does, but the bandwidth needs to be calculated with the 4th derivative of the kernel density, a requirement that kedd::h.ucv meets. A skim over the stats::bandwidth function doesn't say anything about it.
-
reporter stats::bandwidth was just an example. I kind of suspect that one has already been tried in the past. Sounds like we need an alternative approach to bandwidth selection.
-
reporter Okay @nimanouri, it looks like part of the problem is that the least-squares cross-validation approach to bandwidth detection isn't intended to work on data with ties (duplicate values). More details: http://www.ism.ac.jp/editsec/aism/pdf/060_1_0021.pdf
I initially swapped the package used for bandwidth detection from kedd::h.ucv to ks::hucv because the ks package has a lot more parameters and alternative methods for bandwidth selection, but I just went back to the existing packages for now with some minor tweaks.
Right now, the only change I made is from this:
bandwidth <- kedd::h.ucv(distances, 4)$h
dens <- KernSmooth::bkde(distances, bandwidth=bandwidth)
To this:
bandwidth <- kedd::h.ucv(unique(distances), 4)$h
dens <- KernSmooth::bkde(distances, bandwidth=bandwidth, canonical=TRUE)
Which is infinitely faster as an added benefit.
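For anyone who wants to see how ties interact with cross-validated bandwidth selection, here is a minimal sketch. It is only an illustration under stated assumptions: the data are simulated (not the HD13M example), and it uses base R's stats::bw.ucv as a stand-in selector rather than kedd::h.ucv with a derivative order of 4, so the numbers will not match the ones quoted in this thread.

```r
# Illustrative only: simulated distances, NOT the HD13M data.
set.seed(1)

# Rounding to 2 decimals forces many duplicate values (ties).
distances <- round(abs(c(rnorm(500, mean = 0.05, sd = 0.02),
                         rnorm(500, mean = 0.30, sd = 0.05))), 2)

# Stand-in selector (assumption): base R's unbiased cross-validation,
# stats::bw.ucv, instead of kedd::h.ucv(..., deriv.order = 4).
# Warnings are suppressed because bw.ucv may report that the minimum
# fell on a boundary of its search range.
bw_ties   <- suppressWarnings(stats::bw.ucv(distances))
bw_unique <- suppressWarnings(stats::bw.ucv(unique(distances)))

cat("number of tied values:", sum(duplicated(distances)), "\n")
cat("bandwidth with ties:  ", bw_ties, "\n")
cat("bandwidth on unique:  ", bw_unique, "\n")
```

Comparing the two printed bandwidths on your own data is a quick way to check whether duplicates are driving the selector toward a very small value.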
We can probably swap entirely over to the ks package because there are a lot more ways we could tune this with ks, but let's test the changes with kedd and KernSmooth for now. If we do swap to ks, the following should be the same as the current changes:
bandwidth <- ks::hucv(unique(distances), deriv.order=4)
gaussian_scaling <- (1/(4 * pi))^(1/10)
dens <- ks::kde(distances, h=bandwidth*gaussian_scaling, binned=TRUE)
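A note on where the (1/(4 * pi))^(1/10) factor comes from, as a sketch rather than a statement about either package's internals: it matches the canonical-kernel rescaling for a Gaussian kernel, (R(K) / mu2(K)^2)^(1/5), where R(K) = 1/(2 * sqrt(pi)) is the integral of the squared kernel and mu2(K) = 1 is its second moment. The variable names below are hypothetical, and the 0.01 bandwidth is just the value mentioned earlier in the thread.

```r
# Canonical rescaling factor for a Gaussian kernel, written two ways.
roughness <- 1 / (2 * sqrt(pi))   # R(K): integral of K(u)^2 for the standard normal kernel
mu2       <- 1                    # second moment of the standard normal kernel
delta0    <- (roughness / mu2^2)^(1 / 5)

# The form used in the ks snippet in this comment.
gaussian_scaling <- (1 / (4 * pi))^(1 / 10)

# Both expressions are the same number (about 0.776).
stopifnot(isTRUE(all.equal(delta0, gaussian_scaling)))

# Hypothetical input: the 0.01 bandwidth quoted earlier in the thread.
bandwidth   <- 0.01
h_effective <- bandwidth * gaussian_scaling
print(c(delta0 = delta0, h_effective = h_effective))
```

So the bandwidth actually applied to the kernel is roughly three quarters of the selected value when this rescaling is used.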
I'm passing it back to you to test the old density method against the changes to the density approach and the gmm method. I'll email the plots separately.
-
reporter Oh, I did experiment with some of the other methods in ks for bandwidth selection, i.e., ks::hscv and ks::hpi, but they yielded similar (or the same) values, so I didn't end up swapping. I didn't adjust parameters other than deriv.order=4, though.
-
reporter - changed status to resolved
Sticking with unique values for bandwidth detection.