Allow Bandwidth Selection by Category

Issue #58 closed
Former user created an issue

What an excellent visual, Daniel! It can convey a lot of statistical info in a single view. I am trying to get more detail out of the violin so that it more accurately represents a true histogram. My data is organized into date "categories". If I look at a single day, the "Sampling Resolution" can give me a great representation of a histogram with many bins. However, if I am displaying more than one day (multiple violins), I lose any high sampling resolution in the visual. I can force the bandwidth to 1, but even that is not equal to what we saw on the individual day.

I am attaching a PDF with pictures of what I am seeing. Thanks for any help!

Comments (14)

  1. Daniel Marsh-Patrick repo owner

    Hi - thanks very much for your kind words, and thanks for a really great report on this one. Could I ask you to just provide me with a couple of extra details if that's okay?

    • For each individual chart with your desired result, in the Tooltip menu, enable KDE Bandwidth
    • Hover over the chart and get the KDE (Estimated) value for each
    • Pop the values in a comment on this issue

    You have many more data points in one day than the other, so I suspect that the bandwidths derived for each day individually are different enough that, when the bandwidth is estimated over your combined data, it comes out as shown.

    If this confirms my suspicions then I think we can extend the visual to accommodate this particular use case, but I'll review after I can get those individual bandwidth values and see what I can do in terms of enhancing things.

  2. Marc Ford

    Thanks Daniel. Here are some numbers for the 7 days and number of samples. (looks like you have to right-click open image to get it big enough to read)

    2019-01-02_1622.png

    Note that if I select only the 2 days again, they both say the estimate is -7.2. 2019-01-02_1641.png

    I wonder if the use of Category should allow an option to plot each category with its own bandwidth calculation instead of forcing a single value across every category (day).

    I completely expect wild swings in the number of samples I can expect each day.

    thanks!

  3. Daniel Marsh-Patrick repo owner

    Thanks, Marc - that's pretty much what I expected.

    I had such an enhancement (bandwidth by category) on my mental to-do list, so I'll appropriate your issue for the work.

    The custom visuals framework doesn't have as much functionality open to it as the core visuals do, so settings by measure/category are a bit harder to implement, but I envisage the following options off the top of my head:

    • A new toggle will be added to the bottom of the Violin Options, called Bandwidth by Category.
    • This will only be visible if multiple categories are present (similar to how the colouring options work).
    • If Specify Bandwidth is disabled (meaning that we're estimating bandwidth), enabling the Bandwidth by Category option will work out the bandwidth across that series's data only.
    • If Specify Bandwidth is enabled (meaning that we're manually providing bandwidth), enabling the Bandwidth by Category option will provide text boxes for each category.
      • This will allow you to override on a per category basis.
      • This is contingent on the custom visuals framework supporting such a feature, so some R&D will be required to confirm it's possible to do this way
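
As a rough illustration of why one pooled estimate can differ so much from per-category estimates, here is a minimal TypeScript sketch. It assumes Silverman's rule of thumb as the estimator, which this thread does not confirm, and `quantile`, `silvermanBandwidth` and `bandwidthByCategory` are hypothetical names rather than the visual's own code.

```typescript
// Hypothetical helpers for illustration; not taken from the visual's source.

/** Linearly interpolated quantile of an ascending-sorted array. */
function quantile(sorted: number[], p: number): number {
    const pos = (sorted.length - 1) * p;
    const lo = Math.floor(pos);
    const hi = Math.ceil(pos);
    return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

/** Assumed estimator: Silverman's rule, 0.9 * min(sd, IQR / 1.34) * n^(-1/5). */
function silvermanBandwidth(values: number[]): number {
    const n = values.length;
    if (n < 2) {
        return NaN; // not enough data to estimate a spread
    }
    const sorted = values.slice().sort((a, b) => a - b);
    const mean = sorted.reduce((s, v) => s + v, 0) / n;
    const variance = sorted.reduce((s, v) => s + (v - mean) * (v - mean), 0) / (n - 1);
    const sd = Math.sqrt(variance);
    const iqr = quantile(sorted, 0.75) - quantile(sorted, 0.25);
    const spread = iqr > 0 ? Math.min(sd, iqr / 1.34) : sd;
    return 0.9 * spread * Math.pow(n, -0.2);
}

/** One bandwidth per category, plus the pooled bandwidth for comparison. */
function bandwidthByCategory(
    data: Record<string, number[]>
): { pooled: number; perCategory: Record<string, number> } {
    const perCategory: Record<string, number> = {};
    let all: number[] = [];
    Object.keys(data).forEach((category) => {
        perCategory[category] = silvermanBandwidth(data[category]);
        all = all.concat(data[category]);
    });
    return { pooled: silvermanBandwidth(all), perCategory };
}
```

When the categories sit far apart or have very different spreads, the pooled value is driven by the combined spread and will generally be wider than what a tightly clustered category would get on its own.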

    I'll schedule for 1.1.0.0, which is the current version I'm working on. I don't have an ETA at present, as I have a few other commitments on at the moment, but I'm hoping to have a new release ready for submission by mid/end Jan.

    If I can implement, would you be keen on testing/validating? If so, I can give you a specific build to review the changes when they're in a suitable state.

  4. Marc Ford

    Sure, I can test it when you are ready.

    Oh, one other thing (see, now I'm a pain)... Notice my days show up as "Wed Dec 12 2018 00:00:00 GMT -0500"

    This Date field is actually formatted as just Date, format = "2018-12-12". Would be great if it displayed just that and not all the extra baggage.

  5. Daniel Marsh-Patrick repo owner

    I've done some feasibility analysis on the capabilities to deliver the bandwidth changes above and it will be possible to do them as intended. I'll hopefully be in touch soon with some additional info and a build for testing.

    The date issue appears to be something to do with the latest custom visuals API. I've created a new issue (#59) to track that one and hopefully I'll be able to fix it relatively easily.
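
For context, the long label Marc describes looks like what JavaScript produces when a `Date` is turned into a string with no format applied. The sketch below is only an illustration of that difference in plain TypeScript; `pad2` and `formatIsoDate` are hypothetical helpers and are not how the visual or the custom visuals API applies a field's format.

```typescript
// Hypothetical helpers for illustration only.

/** Left-pad a number to two digits. */
function pad2(n: number): string {
    return n < 10 ? "0" + n : String(n);
}

/** Format a Date as yyyy-MM-dd using its local date parts. */
function formatIsoDate(d: Date): string {
    return d.getFullYear() + "-" + pad2(d.getMonth() + 1) + "-" + pad2(d.getDate());
}

const day = new Date(2018, 11, 12); // month is zero-based: 11 = December
const unformatted = String(day);    // long form, including time and zone
const formatted = formatIsoDate(day);
```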

  6. Daniel Marsh-Patrick repo owner

    Hi Marc,

    I have uploaded a release candidate of 1.1.0 to this site for you to have a look at, should you so wish. This also contains a fix to #59, which you might be interested in, as well as the barcode plot option (the Box Plot menu has now changed to Combo Plot).

    A few notes:

    • Because this visual is in the marketplace, Power BI desktop/service will always serve the published (1.0.0.1) version, even if you upload the file manually
    • As such, I have had to give this version a new GUID so that this does not happen
    • The published 1.1.0.x version (once it gets to the marketplace) will have the same GUID as 1.0.0.1, ensuring that this will auto-update for all users once it goes live
    • Therefore, please don't use the attached one for production purposes, as it will get out of cycle from the marketplace one

    One bonus of this is that you can effectively side-load both versions of the visual in the same report to compare them. So, my suggested approach would be as follows:

    • Add in the 1.1.0.2 release candidate visual via the Import from file option in the visual palette
    • You'll have two instances of the visual in the palette (one, likely the second one, will have a Violin Plot 1.1.0 tooltip)
    • Create a similar chart to your affected one
    • Expected results:
      • If you have a category field in your visual, there will now be an additional Bandwidth By Category option
      • If Specify Bandwidth is disabled, enabling this option will apply the rule-of-thumb bandwidth to each category individually (can be verified by using the KDE Bandwidth option in the Tooltip menu)
      • If Specify Bandwidth is enabled, enabling this option will provide you with a box per category, and specifying a value will apply this to the individual category (can be verified by using the KDE Bandwidth option in the Tooltip menu)
      • For #59, the category formatting will be respected in the x-axis labelling and the legend (if colouring by category)

    I'd be keen to know how you get on, and hopefully this works as you expect.

    If you're happy, I'll kick off the publication process with MS, which takes roughly 3 weeks from start to finish, depending on their capacity (yay!)

    Thanks,

    Daniel

  7. Marc Ford

    Thanks Daniel. I'm playing with it. #59 date issue looks fixed. Great.

    Take a look at these 2 pics and let me know your thoughts. It seems that adding a 4th day impacted the shape of one of the days with a higher number of samples. I would have expected it to remain rock steady the same. (picture typo... lower left chart is the old plugin)

    2019-01-09_1316_no1.png

    2019-01-09_1319_no2.png

    P.S. If you haven't seen this link already, great selection of different visual presentations. All that in BI would be brilliant. http://bl.ocks.org/asielen/1a5e8d77ae8feb464167

  8. Daniel Marsh-Patrick repo owner

    Hi Marc, and thanks for the feedback. The block you supplied was actually my original inspiration for this visual (did you know there's a v3?).

    I'm not 100% sure on adding more functionality similar to this at the moment - I have already coded the clamping feature, but it's unpredictable with different sampling rates, so it's turned off internally until I can do some more work with it. Andrew's example is the same if you clone the code and start to play with the values a bit - its usefulness is going to vary quite significantly depending on the number of data points. What I learned very quickly is that Andrew's example is great for the data he's provided but when you start to look at all the possibilities, it starts to get hard to make it behave correctly, which is probably why no-one's attempted it in Power BI before, until I decided to stick with it...

    Based on how the custom visuals APIs work, and the complexity required to manage properties for a number of different visuals within one, the beeswarm and scatter plots would be better implemented as their own custom visual. There are also at least two other dedicated box plot visuals in the marketplace, and the MAQ Software one is incredibly comprehensive and probably better than I could do within the confines of this one.

    But anyway, on to your other questions...

    Regarding the adding of a new value with manual bandwidth, and its default - we're kind of limited here by the custom visuals framework and what it allows us to do. That being said, we could try keeping the existing manual bandwidth field (if Bandwidth By Category is not selected) and use whatever's in there if you haven't manually specified a value for any new categories that might pop in. In this case, you could specify a 'default' of 1 for any new categories that hadn't been specified. Would that work for you?

    Regarding the estimated bandwidth scenario, the extension of the bottom of the chart is due to the KDE algorithm not always being able to converge at zero due to JavaScript floating-point issues, and this has reached the cap-off point we manually put in to stop it going on forever (again, you'll see similar behaviour in Andrew's block if you put your data into it). I'm continuing to look into this over time to see if we can improve it. I would not expect to see this change for an existing series unless something in the newly-added category has an overall effect on the profile of the data, although there may be something hanging over from the 'common bandwidth' code that existed previously. Is it possible to privately get a copy of your report so that I can test in situ? I appreciate that this may not be possible but there's usually no better way to get to the root of the problem.
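
The tail-extension behaviour described above can be sketched roughly as follows. This is a guess at the general shape of such a loop, not the visual's implementation: the Gaussian kernel, the step size, the tolerance and the step cap are all assumptions, and `gaussianKernel`, `kde` and `extendLowerTail` are hypothetical names.

```typescript
// Hypothetical sketch: kernel, step, tolerance and cap are all assumptions.

function gaussianKernel(u: number): number {
    return Math.exp(-0.5 * u * u) / Math.sqrt(2 * Math.PI);
}

/** Kernel density estimate at x for the given sample and bandwidth h. */
function kde(x: number, sample: number[], h: number): number {
    let sum = 0;
    for (let i = 0; i < sample.length; i++) {
        sum += gaussianKernel((x - sample[i]) / h);
    }
    return sum / (sample.length * h);
}

/**
 * Walk downwards from the sample minimum until the density is within
 * `tolerance` of zero, or until `maxSteps` is reached (the cap-off point).
 */
function extendLowerTail(
    sample: number[],
    h: number,
    step: number,
    tolerance: number,
    maxSteps: number
): { x: number; capped: boolean } {
    let x = sample.reduce((m, v) => Math.min(m, v), Infinity);
    for (let i = 0; i < maxSteps; i++) {
        if (kde(x, sample, h) <= tolerance) {
            return { x, capped: false }; // tail has effectively reached zero
        }
        x -= step;
    }
    return { x, capped: true }; // gave up: the tail is cut off at the cap
}
```

With a strict tolerance the loop may hit the cap before the density gets close enough to zero, which would show up as a violin whose bottom stretches further than expected and then stops abruptly.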

    Anyway, let me know what you think, although I think I'll code back in the 'default bandwidth' option for new categories, as it seems fairly sensible.

  9. Daniel Marsh-Patrick repo owner

    Hi Marc - I have uploaded a new release candidate. This follows the same process as before and will create another distinct visual in the image palette (tooltip should read 1.1.0-RC2). I'm hoping that this matches your requirements for manually specified bandwidth and makes more sense.

    Behaviour changed as follows:

    • If using Specify Bandwidth, the Bandwidth property will remain
    • Type your default in here
    • Any new categories that get added will pick up this value
    • You can 'reset' a category to use the global manual value by deleting out the value in there (highlight all text and delete), which will refresh it with the value from above
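
The fallback rules in the list above could be expressed along these lines. This is a minimal sketch under assumptions: the `BandwidthSettings` shape, its property names and `resolveBandwidth` are hypothetical and only mirror the option names used in this thread.

```typescript
// Hypothetical settings shape; property names are illustrative only.
interface BandwidthSettings {
    specifyBandwidth: boolean;     // "Specify Bandwidth" toggle
    bandwidthByCategory: boolean;  // "Bandwidth by Category" toggle
    defaultBandwidth: number;      // the global "Bandwidth" property
    overrides: Record<string, number | null | undefined>; // per-category boxes
}

/**
 * Pick the bandwidth for one category.
 * `estimated` is whatever the estimator produced for that category
 * (or for the pooled data), computed elsewhere.
 */
function resolveBandwidth(
    category: string,
    settings: BandwidthSettings,
    estimated: number
): number {
    if (!settings.specifyBandwidth) {
        return estimated; // estimating, so manual values are ignored
    }
    if (!settings.bandwidthByCategory) {
        return settings.defaultBandwidth; // one manual value for everything
    }
    const override = settings.overrides[category];
    // New or cleared categories fall back to the global manual value.
    return override == null ? settings.defaultBandwidth : override;
}
```

Treating a cleared box and a never-seen category the same way keeps the rule simple: anything without an explicit value picks up the default.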

    I've updated the mouseover text for the Bandwidth and individual category values to try and explain this; would appreciate your feedback in confirming the help text makes sense also.

  10. Marc Ford

    Tested the new one. Looks pretty good to me. Offers about as much flexibility as you could ask for.

    I also tried the barcode option. Looks pretty good as well. Wouldn't mind option to display the "mean" circle or similar in it.

    That's about it, puts me in business, so thank you. Not sure if it could be incorporated or would need to be a new plugin, but the beeswarm with an option to color by a newly selected category would be very useful and easier to digest than a traditional scatterplot. pic... A combination of the category colors on the left with the more organized/ordered view on the right (keeping colors in order red-red-red-blue-blue-green-orange)

    2019-01-10_0818.png

  11. Daniel Marsh-Patrick repo owner

    Awesome - thanks for your help in validating for me!

    I'll package this up and get it sent off to Microsoft (will look into the Mean plotting on the barcode before I do) - usually takes a couple of weeks to make it all the way through to the marketplace but I'll update this ticket with details of progress. Here's what typically happens:

    • MS upload files
    • Visual submitted to AppSource and QA'd by those folks
    • Once approved, the visual shows as updated in the marketplace listing, but the changes do not actually trickle through until the visuals team have done compatibility testing
    • At this point, any reports using an instance of the marketplace visual will automatically update next time they are accessed in either desktop or the service - no further steps are required to add in from your side (it's just confusing while the custom visuals team are doing the last bit)

    Cheers,

    Daniel
