-.. index:: Distributions

-:obj:`Distribution` and derived classes store empirical

-distributions of discrete and continuous variables.

- store absolute or relative frequencies. It provides a convenience constructor

- which constructs instances of derived classes. ::

- >>> data = Orange.data.Table("adult_sample")

- >>> disc = Orange.statistics.distribution.Distribution("workclass", data)

- <685.000, 72.000, 28.000, 29.000, 59.000, 43.000, 2.000>

- <type 'DiscDistribution'>

- The resulting distribution is of type :obj:`DiscDistribution` since variable

- `workclass` is discrete. The printed numbers are counts of examples that have particular

- >>> workclass = data.domain["workclass"]

- >>> for i in range(len(workclass.values)):

- ... print "%20s: %5.3f" % (workclass.values[i], disc[i])

- Self-emp-not-inc: 72.000

- Distributions resemble dictionaries, supporting indexing by instances of

- :obj:`Orange.data.Value`, integers or floats (depending on the distribution

- type), and symbolic names (if :obj:`variable` is defined).

- For instance, the number of examples with `workclass="Private"` can be

- obtained in three ways::

- print "Private: ", disc["Private"]

- print "Private: ", disc[0]

- print "Private: ", disc[Orange.data.Value(workclass, "Private")]

- Elements cannot be removed from distributions.
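The dictionary-like indexing described above can be mimicked in plain Python. The sketch below is not Orange code: the class name ``TinyDiscrete`` and its fields are hypothetical, and it only illustrates how an integer index and a symbolic name can resolve to the same stored count.

```python
class TinyDiscrete(object):
    """Hypothetical stand-in for a discrete distribution (not the Orange class)."""

    def __init__(self, values, counts):
        self.values = list(values)                # symbolic names, in index order
        self.counts = [float(c) for c in counts]  # one count per value

    def _index(self, key):
        # Accept either an integer position or a symbolic name.
        if isinstance(key, int):
            return key
        return self.values.index(key)

    def __getitem__(self, key):
        return self.counts[self._index(key)]

    def __len__(self):
        # Length equals the number of possible values.
        return len(self.counts)


disc = TinyDiscrete(["Private", "Self-emp-not-inc"], [685, 72])
by_name = disc["Private"]
by_index = disc[0]
```

Both lookups return the same count, and the sketch deliberately offers no way to delete an element.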

- Length of distribution equals the number of possible values for discrete

- distributions (if :obj:`variable` is set), the value with the highest index

- encountered (if distribution is discrete and :obj:`variable` is

- :obj:`None`) or the number of different values encountered (for continuous

- .. attribute:: variable

- Variable to which the distribution applies; may be :obj:`None` if not

- .. attribute:: unknowns

- The number of instances for which the value of the variable was

- Sum of all elements in the distribution. Usually it equals either

- :obj:`cases` if the instance stores absolute frequencies or 1 if the

- stored frequencies are relative, e.g. after calling :obj:`normalize`.

- The number of instances from which the distribution is computed,

- excluding those on which the value was undefined. If instances were

- weighted, this is the sum of weights.
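How the bookkeeping attributes relate to one another can be shown with a small plain-Python tally. This is a sketch under assumptions, not the Orange implementation: the function name ``tally`` is hypothetical, ``None`` stands for an undefined value, and counting unknowns by weight rather than by instance is a guess.

```python
def tally(values, weights=None):
    """Return (counts, unknowns, cases, abs_) for a list of values.

    None marks an undefined value. All names here are illustrative.
    """
    if weights is None:
        weights = [1.0] * len(values)
    counts = {}
    unknowns = 0.0
    for value, weight in zip(values, weights):
        if value is None:
            unknowns += weight          # assumption: unknowns are weighted too
        else:
            counts[value] = counts.get(value, 0.0) + weight
    cases = sum(counts.values())        # sum of weights over defined values
    abs_ = cases                        # equal to cases while frequencies are absolute
    return counts, unknowns, cases, abs_


counts, unknowns, cases, abs_ = tally(["a", "b", None, "a"], [1.0, 2.0, 1.0, 0.5])
```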

- .. attribute:: normalized

- :obj:`True` if distribution is normalized.

- .. attribute:: random_generator

- A pseudo-random number generator used for method :obj:`Orange.misc.Random`.

- .. method:: __init__(variable[, data[, weightId=0]])

- Construct either :obj:`DiscDistribution` or :obj:`ContDistribution`,

- depending on the variable type. If the variable is the only argument, it

- must be an instance of :obj:`Orange.feature.Descriptor`. In that case,

- an empty distribution is constructed. If data is given as well, the

- variable can also be specified by name or index in the

- domain. Constructor then computes the distribution of the specified

- variable on the given data. If instances are weighted, the id of

- meta-attribute with weights can be passed as the third argument.

- If variable is given by descriptor, it doesn't need to exist in the

- domain, but it must be computable from given instances. For example, the

- variable can be a discretized version of a variable from data.

- Return a list of possible values (if distribution is discrete and

- :obj:`variable` is set) or a list of encountered values otherwise.

- Return a list of frequencies of values such as described above.

- Return a list of pairs of elements of the above lists.

- Return the distribution as a list (for discrete distributions) or as a

- dictionary (for continuous distributions).

- .. method:: add(value[, weight=1])

- Increase the count of the element corresponding to ``value`` by

- :type value: :obj:`Orange.data.Value`, string (if :obj:`variable` is set), :obj:`int` for discrete distributions or :obj:`float` for continuous distributions

- :param weight: Weight to be added to the count for ``value``

- .. method:: normalize()

- Divide the counts by their sum, set :obj:`normalized` to :obj:`True` and

- :obj:`abs` to 1. Attributes :obj:`cases` and :obj:`unknowns` are

- unchanged. This changes absolute frequencies into relative ones.
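The arithmetic behind normalization is a division by the sum. The following plain-Python sketch (the helper name ``normalize`` is illustrative and does not touch any Orange object) applies it to the counts printed for ``workclass`` earlier in this document.

```python
def normalize(counts):
    """Return relative frequencies; an all-zero input is returned unchanged."""
    total = float(sum(counts))
    if total == 0:
        return list(counts)
    return [c / total for c in counts]


counts = [685.0, 72.0, 28.0, 29.0, 59.0, 43.0, 2.0]
relative = normalize(counts)   # the relative frequencies sum to 1
```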

- Return the most common value. If there are multiple such values, one is

- chosen at random, although the chosen value will always be the same for

- Return a random value based on the stored empirical probability

- distribution. For continuous distributions, this will always be one of

- the values which actually appeared (e.g. one of the values from

- The method uses :obj:`random_generator`. If none has been constructed or

- assigned yet, a new one is constructed and stored for further use.
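Drawing a value according to stored frequencies can be sketched with the standard ``random`` module. This is an illustration only: the function ``sample`` and the fixed seed are assumptions, and Orange's own generator may produce a different sequence.

```python
import random


def sample(frequencies, rng):
    """Draw an index with probability proportional to its frequency."""
    total = float(sum(frequencies))
    threshold = rng.random() * total
    cumulative = 0.0
    for index, freq in enumerate(frequencies):
        cumulative += freq
        if threshold < cumulative:
            return index
    return len(frequencies) - 1   # guard against floating-point round-off


rng = random.Random(42)           # hypothetical fixed seed for repeatability
draws = [sample([0.5, 0.3, 0.2], rng) for _ in range(20)]
```

Reusing one generator object across calls, as the method does with ``random_generator``, keeps a sequence of draws reproducible for a given seed.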

- Stores a discrete distribution of values. The class differs from its parent

- class in having a few additional constructors.

- .. method:: __init__(variable)

- Construct an instance of :obj:`Discrete` and set the variable

- :param variable: A discrete variable

- :type variable: Orange.feature.Discrete

- .. method:: __init__(frequencies)

- Construct an instance and initialize the frequencies from the list, but

- leave `Distribution.variable` empty.

- :param frequencies: A list of frequencies

- :type frequencies: list

- Distribution constructed in this way can be used, for instance, to

- generate random numbers from a given discrete distribution::

- disc = Orange.statistics.distribution.Discrete([0.5, 0.3, 0.2])

- This prints out approximately ten 0's, six 1's and four 2's. The values

- can be named by assigning a variable::

- v = Orange.feature.Discrete(values=["red", "green", "blue"])

- .. method:: __init__(distribution)

- Copy constructor; makes a shallow copy of the given distribution.

- :param distribution: An existing discrete distribution

- :type distribution: Discrete

- Stores a continuous distribution, that is, a dictionary-like structure with

- values and their frequencies.

- .. method:: __init__(variable)

- Construct an instance of :obj:`ContDistribution` and set the variable

- :param variable: A continuous variable

- :type variable: Orange.feature.Continuous

- .. method:: __init__(frequencies)

- Construct an instance of :obj:`Continuous` and initialize it from

- the given dictionary with frequencies, whose keys and values must be integers.

- :param frequencies: Values and their corresponding frequencies

- :type frequencies: dict

- .. method:: __init__(distribution)

- Copy constructor; makes a shallow copy of the given distribution.

- :param distribution: An existing continuous distribution

- :type distribution: Continuous

- Return the average value. Note that the average can also be

- computed using simpler and faster classes from module

- :obj:`Orange.statistics.basic`.

- Return the variance of distribution.

- Return the standard deviation.

- Return the standard error.
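The four statistics can be computed directly from a mapping of values to frequencies. The sketch below is plain Python with a hypothetical function name, and it assumes the population convention (dividing by the total frequency); Orange may use a different denominator.

```python
import math


def summary(freqs):
    """freqs maps value -> frequency. Returns (average, variance, deviation, error)."""
    n = float(sum(freqs.values()))
    mean = sum(v * f for v, f in freqs.items()) / n
    # Assumption: population variance (divide by n, not n - 1).
    var = sum(f * (v - mean) ** 2 for v, f in freqs.items()) / n
    dev = math.sqrt(var)
    err = dev / math.sqrt(n)      # standard error of the mean
    return mean, var, dev, err


mean, var, dev, err = summary({1.0: 1, 2.0: 2, 3.0: 1})
```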

- .. method:: percentile(p)

- Return the value at the `p`-th percentile.

- :param p: The percentile, must be between 0 and 100

- For example, if `d_age` is a continuous distribution, the quartiles can

- print "Quartiles: %5.3f - %5.3f - %5.3f" % (

- d_age.percentile(25), d_age.percentile(50), d_age.percentile(75))
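One simple way to read a percentile off a value-to-frequency mapping is to walk the sorted values until the cumulative frequency reaches the requested share. The sketch below uses that convention as an assumption; Orange may interpolate between values and return different numbers. The function name and the sample data are hypothetical.

```python
def percentile(freqs, p):
    """Smallest value whose cumulative frequency reaches p percent of the total."""
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    total = float(sum(freqs.values()))
    target = total * p / 100.0
    cumulative = 0.0
    for value in sorted(freqs):
        cumulative += freqs[value]
        if cumulative >= target:
            return value
    return max(freqs)             # only reached through floating-point round-off


ages = {20.0: 2, 30.0: 3, 40.0: 3, 50.0: 2}   # made-up illustration data
quartiles = [percentile(ages, p) for p in (25, 50, 75)]
```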

- Return the probability density at `x`. If the value is not in

- :obj:`Distribution.keys`, it is interpolated.

- A class imitating :obj:`Continuous` by returning the statistics and

- densities for Gaussian distribution. The class is meant only as a

- convenient substitution for code which expects an instance of

- :obj:`Distribution`. For general use, Python module :obj:`random`

- provides a comprehensive set of functions for various random distributions.

- The mean of the Gaussian distribution.

- The standard deviation of the distribution.

- The simulated number of instances; in effect, the Gaussian distribution

- density, as returned by method :obj:`density` is multiplied by

- .. method:: __init__([mean=0, sigma=1])

- Construct an instance, set :obj:`mean` and :obj:`sigma` to the given

- values and :obj:`abs` to 1.

- .. method:: __init__(distribution)

- Construct a distribution which approximates the given distribution,

- which must be either :obj:`Continuous`, in which case its

- average and deviation will be used for mean and sigma, or an existing

- :obj:`GaussianDistribution`, which will be copied. Attribute :obj:`abs`

- is set to the given distribution's ``abs``.

- Return square of :obj:`sigma`.

- Return the density at point ``x``, that is, the Gaussian distribution

- density multiplied by :obj:`abs`.
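The scaled density can be written out with the standard normal formula. The function below is a plain-Python sketch, not the Orange method; the parameter name ``abs_`` stands in for the ``abs`` attribute to avoid shadowing the built-in.

```python
import math


def gaussian_density(x, mean=0.0, sigma=1.0, abs_=1.0):
    """Normal probability density at x, multiplied by abs_."""
    z = (x - mean) / sigma
    return abs_ * math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


peak = gaussian_density(0.0)                 # density of N(0, 1) at its mean
scaled = gaussian_density(0.0, abs_=100.0)   # the same point with abs_ set to 100
```

With ``abs_`` left at 1 the function is an ordinary probability density; larger values scale the curve as if it described that many instances.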

-There is a convenience function for computing empirical class distributions from

-.. function:: getClassDistribution(data[, weightID=0])

- Return a class distribution for the given data.

- :param data: A set of instances.

- :type data: Orange.data.Table

- :param weightID: An id for meta attribute with weights of instances

- :rtype: :obj:`Discrete` or :obj:`Continuous`, depending on the class type

-Distributions of all variables

-==============================

-Distributions of all variables can be computed and stored in

-:obj:`Domain`. The list-like object can be indexed by variable

-indices in the domain, as well as by variables and their names.

- .. method:: __init__(data[, weightID=0])

- Construct an instance with distributions of all discrete and continuous

- variables from the given data.

- :param data: A set of instances.

- :type data: Orange.data.Table

- :param weightID: An id for meta attribute with weights of instances

-The script below computes distributions for all attributes in the data and

-prints out distributions for discrete and averages for continuous attributes. ::

- dist = Orange.statistics.distribution.Domain(data)

- if d.variable.var_type == Orange.feature.Type.Discrete:

- print "%30s: %s" % (d.variable.name, d)

- print "%30s: avg. %5.3f" % (d.variable.name, d.average())

-The distribution for, say, attribute `age` can be obtained by its index and also

from Orange.core import Distribution

from Orange.core import DiscDistribution as Discrete

from Orange.core import ContDistribution as Continuous