Power BI Visuals - Violin Plot / Usage and Visual Properties



Where possible, the visual utilises properties and concepts common to typical Power BI visuals, but some properties are specific to this visual and are explained below so that you can get the most out of them.

Fields

In order to plot a chart, you will need the following added to the visual fields:

| Field | Purpose | Requirements | Max. Data Points | Data Reduction Algorithm* |
| --- | --- | --- | --- | --- |
| Sampling | Field to represent the individual level of each value. | One | 30,000 | Top |
| Measure Data | Measure to analyse across all Sampling instances. | One | | |
| Category | Field to categorise your data by. | Optional (max. one) | 100 | |

If the above requirements are not met, then the chart will not render for you.

* The Data Reduction method applies if the maximum number of data points for that field is exceeded. This keeps the rendered chart manageable for large amounts of data. If you need to exceed these amounts, it is suggested that you create multiple instances of the chart.

Recent changes in the custom visuals API will allow us to load more data and it is intended to introduce this feature once the API stabilises.

For an overview of how these data reduction algorithms are applied, you can refer to the PowerBI-Visuals documentation.

Category limitations are a special case - refer to More About Category Limitations below for more details.

About Sampling

Power BI will aggregate values if something is not supplied to split them by. We therefore need a field to make sure these values are kept separate so that we can do the correct analysis. The Sampling field lets us do this.

Good values for Sampling would either be a unique ID such as a primary key or an index column in the query for your table.
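To illustrate why the Sampling field matters, here is a minimal Python sketch. It is not Power BI code: the rows, the `aggregate` helper and the choice of a sum aggregation are all assumptions made for illustration. It shows how grouping by category alone collapses the values, while including a unique index keeps each one separate.

```python
from collections import defaultdict

# Hypothetical rows: (index, category, value)
rows = [
    (1, "A", 10.0),
    (2, "A", 12.0),
    (3, "B", 7.0),
    (4, "B", 9.0),
]

def aggregate(rows, key):
    """Sum the value column for each distinct grouping key."""
    totals = defaultdict(float)
    for row in rows:
        totals[key(row)] += row[2]
    return dict(totals)

# Grouping by category alone collapses the values into one total per category.
by_category = aggregate(rows, key=lambda r: r[1])

# Adding the unique index to the grouping keeps every value separate.
by_sample = aggregate(rows, key=lambda r: (r[0], r[1]))
```

With only one aggregated value per category there is no distribution left to estimate, which is why a unique Sampling value is needed.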

Categories

The plot allows you to split your data by category, by adding a suitable field into the Category box. This will plot a violin for each unique value, e.g.:

categories.png

Colors

If providing categories, the Data Colors menu will offer a By Category option. Select this to apply individual colors, e.g.:

category_colours.png

Note that Legend options also become available if you are specifying Category.

More About Category Limitations

As noted above, categories are limited to a maximum of 100 unique values per visual instance.

The primary reason for this is that the kernel density estimation (KDE) for each unique category is computationally expensive, and too many categories can make the browser unresponsive for an undesirable amount of time. This can be frustrating for users who are exploring their data and have too many unique category values, or who have unintentionally got the Sampling and Category fields the wrong way around (it happens!).

Note that it's also possible that the expected number of categories might not be plotted if your data exceeds the maximum number of samples (30,000).

In these cases, it is recommended that you filter your data accordingly and use multiple instances of the visual.
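The effect of the sample limit on categories can be sketched as follows. This assumes that a "Top" style reduction simply keeps the first rows it receives; the `take_top` helper, the data and the row ordering are hypothetical and only serve to show how whole categories can drop out.

```python
from itertools import islice

def take_top(rows, limit):
    """Keep only the first `limit` rows, as a 'Top' style reduction would."""
    return list(islice(rows, limit))

# Hypothetical data: 5 categories of 10,000 rows each, ordered by category.
rows = [(cat, i) for cat in "ABCDE" for i in range(10_000)]

kept = take_top(rows, 30_000)
categories_before = {cat for cat, _ in rows}
categories_after = {cat for cat, _ in kept}
```

In this sketch only three of the five categories survive the cut, so filtering the data and splitting it across several visuals is the safer approach.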

Sorting

If specifying a category, the visual will sort in ascending order of category name by default. You can change this by using the Sorting menu, which becomes visible if your plot contains categories.

Current options are:

  • Category
  • Samples
  • Median
  • Mean
  • Maximum
  • Minimum

For any sort value, you can specify Ascending or Descending order.
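The sort options above can be thought of as one key function per option. The following is a minimal sketch under that assumption; the `SORT_KEYS` table, the `sort_categories` helper and the data are hypothetical and are not taken from the visual's source.

```python
from statistics import mean, median

# Hypothetical category -> values mapping
data = {
    "North": [4.0, 8.0, 9.0],
    "South": [1.0, 2.0, 3.0, 10.0],
    "East": [5.0, 6.0],
}

# One key function per sort option listed above.
SORT_KEYS = {
    "Category": lambda name, values: name,
    "Samples": lambda name, values: len(values),
    "Median": lambda name, values: median(values),
    "Mean": lambda name, values: mean(values),
    "Maximum": lambda name, values: max(values),
    "Minimum": lambda name, values: min(values),
}

def sort_categories(data, by="Category", descending=False):
    """Return category names ordered by the chosen option and direction."""
    key = SORT_KEYS[by]
    return sorted(data, key=lambda name: key(name, data[name]), reverse=descending)
```

For example, `sort_categories(data, by="Samples", descending=True)` orders the categories from most to fewest samples.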

Tuning Your Violin

The distribution of your data can have a very big effect on the shape, or "effectiveness" of your kernel density plot.

For instance, there are a number of different kernels that can be applied to your data when smoothing.

The sampling resolution (or bins) can also have an effect on the shape your line may take.

The kernel also has a bandwidth parameter, which can sometimes have a drastic effect on the resulting estimate.

As a result, it is tricky to produce a good "out of the box" violin plot and as such, this visual provides some options in the Violin Options menu to help those who need a bit more control.
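To show how the kernel, sampling resolution and bandwidth interact, here is a minimal Python sketch of a kernel density estimate. It is a generic textbook formulation, not the visual's implementation: the `kde` helper, the choice to evaluate only between the minimum and maximum of the data, and the sample values are assumptions.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def kde(values, bandwidth, resolution, kernel=epanechnikov):
    """Estimate density at `resolution` evenly spaced points across the data range."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / (resolution - 1)
    points = [lo + i * step for i in range(resolution)]
    n = len(values)
    return [
        (x, sum(kernel((x - v) / bandwidth) for v in values) / (n * bandwidth))
        for x in points
    ]

values = [1.0, 2.0, 2.5, 3.0, 7.0]  # hypothetical sample
curve = kde(values, bandwidth=1.5, resolution=25)
```

Raising `resolution` adds more evaluation points along the curve, while raising `bandwidth` widens each kernel and smooths the result.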

Kernels

The visual provides 4 different kernels. Here's an example of how they look with the same data set:

kernels.png

Sampling Resolution

There are 3 different options for the sampling resolution:

sampling_resolution.png

The above visuals are using the Epanechnikov (default) kernel, with an overridden bandwidth (more on that below), in order to help illustrate the differences in resolution.

At higher resolutions, the plot attempts to identify more features within the data, but this may not always be desirable, depending on your visualisation objectives.

The distribution and features of your data will affect how useful this setting is in each case, but it gives you another facet on which to apply your analysis.

Bandwidth

The "correct" bandwidth can be hard to determine automatically, so by default the visual calculates a "rule-of-thumb" estimate of the bandwidth and applies it to your data. If you wish to tune the bandwidth, select the Specify Bandwidth option to override the estimate.

For example, here's the previous example with its estimated bandwidth value of 4.8, and further examples of how overridden values will have an effect on the plot:

bandwidth.png

As you can see, results can vary significantly when the bandwidth is modified. The estimated one in this case does quite a good job, but sometimes tuning it may help to further assist with identifying features in a more granular way, or to help smooth out the plot a bit more, in the case of overestimation.
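This page does not state which rule of thumb the visual uses for its estimate. As an illustration only, the sketch below implements Silverman's rule, one widely used rule-of-thumb estimator; the `silverman_bandwidth` helper is hypothetical and may not match the values the visual reports.

```python
from statistics import quantiles, stdev

def silverman_bandwidth(values):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)."""
    n = len(values)
    sd = stdev(values)
    # Quartiles use Python's default method; other tools may differ slightly.
    q1, _, q3 = quantiles(values, n=4)
    spread = min(sd, (q3 - q1) / 1.34)
    return 0.9 * spread * n ** (-0.2)

sample = [float(v) for v in range(1, 101)]  # hypothetical data
h = silverman_bandwidth(sample)
h_scaled = silverman_bandwidth([2.0 * v for v in sample])
```

The estimate scales with the spread of the data and shrinks slowly as the number of samples grows, which is why a manual override can help for unusual distributions.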

Identifying Bandwidth Values

You can see the bandwidth values by enabling the KDE Bandwidth option in the Tooltip menu.

If Specify Bandwidth is enabled, both the specified and estimated bandwidth will be shown in the tooltip, to help you to compare your manually specified value vs. what the visual calculated.

Applying Bandwidth by Category

If you're plotting by multiple categories, the Bandwidth by Category property is made available underneath the other bandwidth properties. If Specify Bandwidth is set to OFF, then this will calculate the "rule-of-thumb" estimation against each individual category, e.g.:

bandwidth-by-category.png

If Specify Bandwidth is set to ON, then you will have the option to provide a manual value for each category, e.g.:

manual-bandwidth-by-category.png

Note that the original Bandwidth property serves as a default for all categories, unless that particular category has been manually overridden. If you wish to 'revert' a category you've manually specified back to the default, delete the entire value so that the field goes blank. It will replace itself with the default.
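The fallback behaviour described above can be sketched as a simple lookup. The `resolve_bandwidth` helper, the `overrides` mapping and the way a blank entry is represented are hypothetical and only illustrate the logic.

```python
def resolve_bandwidth(category, overrides, default):
    """Return the category's manual bandwidth, or the default when none is set."""
    value = overrides.get(category)
    # A blank (None or empty string) entry means "revert to the default".
    if value is None or value == "":
        return default
    return float(value)

overrides = {"North": 3.2, "South": ""}  # hypothetical per-category entries
default_bandwidth = 4.8

north = resolve_bandwidth("North", overrides, default_bandwidth)
south = resolve_bandwidth("South", overrides, default_bandwidth)
east = resolve_bandwidth("East", overrides, default_bandwidth)
```

Here "North" keeps its manual value, while "South" (blanked) and "East" (never set) both fall back to the default.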

Combo Plots

The Combo Plot properties menu contains a Plot Type setting, which allows you to specify options for the accompanying plot. The two options are detailed here, but you can also turn the combo plot off, should you wish to do so.

Box Plot (Default)

This will display the conventional box plot that would normally be part of a violin plot. A number of other options are provided in order to help you tailor this to your liking, e.g.:

box-options.png

Barcode Plot

This will render your individual data points as a barcode (or strip) plot within the violin. This can also be further customised, e.g.:

barcode-options.png

Tooltips

Why are There 2 Tooltip Property Menus?

This visual contains 2 tooltip-related property menus:

  • Tooltip (the standard Power BI menu)
  • Default Tooltip Details (known as the Tooltip menu in versions of this visual prior to 1.2.0)

Power BI Desktop (March 2019) introduced improved tooltip formatting options. We recently saw the introduction of report page tooltips as well.

In order to take advantage of these features in the violin plot, we need to add in the standard Tooltip menu from Power BI. This will allow you to customise the tooltip look and feel - including report page tooltips, if you would like to use them - in the same way you would for other visuals.

Custom visuals cannot integrate well with standard property menus at this time (we can add properties but can't use the behaviour of standard properties to show/hide others). As such, it hinders the user experience somewhat if we can't disable or enable our custom properties in this way, so we have kept our custom tooltip properties to their own menu.

If using a Default-type tooltip (i.e. not a report page) then the violin plot-specific tooltip data points are configurable under the Default Tooltip Details menu. If your reports were originally using an older version of the visual, any properties you had set previously will be retained.

Customising Default Tooltips

The Default Tooltip Details menu provides a number of data points you can enable or disable as required:

  • Maximum & Minimum (Default)
  • Span (Min to Max)
  • Median (Default)
  • Mean (Default)
  • Standard Deviation (Default)
  • Upper & Lower Quartiles
  • Inter Quartile Range
  • Whisker (Confidence) Values
  • KDE Bandwidth (Refer to Bandwidth above for more details)
  • (If using Barcode combo plot) Data Point Value
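For reference, most of the statistics listed above can be reproduced with Python's standard library, as in the sketch below. The `tooltip_stats` helper is hypothetical, and the quartile method and the use of the sample (rather than population) standard deviation are assumptions, so results may differ slightly from what the visual displays. Whisker values and KDE bandwidth are omitted because their calculation is not described here.

```python
from statistics import mean, median, quantiles, stdev

def tooltip_stats(values):
    """Compute summary statistics similar to those offered in the tooltip."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    lo, hi = min(values), max(values)
    return {
        "Minimum": lo,
        "Maximum": hi,
        "Span": hi - lo,
        "Median": median(values),
        "Mean": mean(values),
        "Standard Deviation": stdev(values),  # sample standard deviation
        "Lower Quartile": q1,
        "Upper Quartile": q3,
        "Inter Quartile Range": q3 - q1,
    }

stats = tooltip_stats([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical data
```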
