Cut2 odd behaviour

Issue #35 new
Pablo Acera created an issue

This is not a bug in CHiCAGO itself, but I think it directly affects the output of the OE normalization.

I have seen that one of the functions used in .addTLB() is cut2, and I have noticed an odd behavior of this function that you may want to be aware of. With the "m" parameter, it is supposed to bin a vector of elements so that each bin contains at least m elements. Example:

library(Hmisc)  # cut2 comes from the Hmisc package
x <- c(1,1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6)
cut2(x, m=4)

[1] [1,3) [1,3) [1,3) [1,3) [1,3) [1,3) [1,3) 3 3 3 [4,6) [4,6)
[13] [4,6) [4,6) [4,6) [4,6) 6 6 6
Levels: [1,3) 3 [4,6) 6

In this case I want bins with at least 4 elements each. The resulting bins are [1,3), 3, [4,6) and 6, even though bins 3 and 6 contain only 3 elements each. I would have expected something like [1,3), [3,5), [5,6], where every bin has at least 4 elements.
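
To make the bin sizes explicit, here is a minimal base R sketch that counts the elements per bin for both partitions. It does not call cut2 itself: the bin edges are typed in by hand from the output above, and the names reported_edges, reported_counts, expected_bins and expected_counts are only illustrative.

```r
# Same vector and minimum bin size as in the example above.
x <- c(1,1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6)
m <- 4

# Bins as reported: [1,3), 3, [4,6), 6.
# findInterval() assigns each value to the bin whose left edge is the
# largest edge not exceeding that value.
reported_edges <- c(1, 3, 4, 6)
reported_counts <- as.vector(table(findInterval(x, reported_edges)))

# Bins I would have expected: [1,3), [3,5), [5,6].
# right = FALSE makes the intervals left-closed; include.lowest = TRUE
# closes the last interval on the right so that 6 is included.
expected_bins <- cut(x, breaks = c(1, 3, 5, 6),
                     right = FALSE, include.lowest = TRUE)
expected_counts <- as.vector(table(expected_bins))

print(reported_counts)            # elements per reported bin
print(all(reported_counts >= m))  # does every reported bin reach m?
print(expected_counts)            # elements per expected bin
print(all(expected_counts >= m))  # does every expected bin reach m?
```

The reported partition has two bins with only 3 elements, so the first check fails, while the alternative partition passes it.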

I have already filed a ticket with the package author and I am working on a fix for the function. I don't think this will massively affect the CHiCAGO output, but I wanted you to be aware of it and to hear your thoughts. Thank you very much for your time, and sorry for the long ticket. Best.

Comments (1)

  1. Mikhail Spivakov

    Thanks for the report, Pablo! Hopefully the package author will address this issue. You're right though that this is unlikely to make any difference for Chicago (unless this is the tip of the iceberg of a larger problem with cut2).
