Logistic regression (logreg)
============================

Logistic regression is a statistical classification method that fits data to a logistic function. Orange provides various enhancements of the method, such as stepwise selection of variables and handling of constant variables and singularities.

A logistic regression classification model. Stores estimated values of regression coefficients and their significances, and uses them to predict classes and class probabilities.

:obj:`LogRegFitter` is the abstract base class for logistic fitters. A fitter is called with a data table and returns either a vector of coefficients with the corresponding statistics, or a status signifying an error.

The sole fitter currently available. It is a C++ translation of Alan Miller's logistic regression code, which uses the Newton-Raphson algorithm to iteratively minimize the least squares error computed from the training data.
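
The mechanics of such a fitter can be illustrated with a minimal pure-Python sketch (this is an illustration of Newton-Raphson fitting in general, not Orange's actual C++ implementation): iterate Newton steps on a single-feature logistic model with an intercept.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logreg(xs, ys, iterations=25):
    """Newton-Raphson fit of a one-feature logistic model with intercept.

    Returns the coefficients (b0, b1). Illustrative only: real fitters
    handle many features, convergence tests, and singular Hessians.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Gradient of the log-likelihood and (negated) Hessian entries.
        g0 = g1 = 0.0
        h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            g0 += y - p
            g1 += (y - p) * x
            w = p * (1.0 - p)           # observation weight
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:            # singular Hessian: cannot invert
            break
        # Newton step: beta += H^-1 @ gradient (2x2 inverse written out).
        b0 += ( h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1
```

On a small non-separable sample such as `fit_logreg([0, 1, 2, 3, 4, 5], [0, 0, 1, 0, 1, 1])` the iteration converges to a positive slope, so predicted probabilities increase with x.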

Examples
--------

The first example shows straightforward use of logistic regression (:download:`logreg-run.py <code/logreg-run.py>`).

Result:

Classification accuracy: 0.778282598819

class attribute = survived
class values = <no, yes>

    Attribute       beta  st. error     wald Z          P OR=exp(beta)

    Intercept      -1.23       0.08     -15.15      -0.00
 status=first       0.86       0.16       5.39       0.00       2.36
status=second      -0.16       0.18      -0.91       0.36       0.85
 status=third      -0.92       0.15      -6.12       0.00       0.40
    age=child       1.06       0.25       4.30       0.00       2.89
   sex=female       2.42       0.14      17.04       0.00      11.25
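
The statistics in this table can be recomputed from the coefficients alone. The sketch below rederives the Wald Z, two-sided P value (normal approximation) and odds ratio for the sex=female row from its rounded beta and standard error (so the Z differs slightly from the table's 17.04), and then combines betas into a predicted survival probability.

```python
import math

def wald_stats(beta, se):
    """Wald Z, two-sided P value (normal approximation), and odds ratio."""
    z = beta / se
    # Two-sided tail probability via the standard normal CDF.
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return z, p, math.exp(beta)

# sex=female row: beta = 2.42, st. error = 0.14 (rounded table values).
z, p, odds = wald_stats(2.42, 0.14)     # odds ratio ~ 11.25, as in the table

# Predicted probability for a first-class female adult from the betas:
# intercept + status=first + sex=female.
logit = -1.23 + 0.86 + 2.42
prob = 1.0 / (1.0 + math.exp(-logit))
```

With these rounded inputs the odds ratio comes out at about 11.25, matching the OR=exp(beta) column, and the survival probability for a first-class adult woman is well above one half.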

The next example shows how to handle singularities in data sets (:download:`logreg-singularities.py <code/logreg-singularities.py>`).

The first few lines of the output of this script are:

<=50K <=50K
<=50K <=50K
<=50K <=50K
>50K >50K
<=50K >50K

class attribute = y
class values = <>50K, <=50K>

                           Attribute       beta  st. error     wald Z          P OR=exp(beta)

                           Intercept       6.62      -0.00       -inf       0.00
                                 age      -0.04       0.00       -inf       0.00       0.96
                              fnlwgt      -0.00       0.00       -inf       0.00       1.00
                       education-num      -0.28       0.00       -inf       0.00       0.76
             marital-status=Divorced       4.29       0.00        inf       0.00      72.62
        marital-status=Never-married       3.79       0.00        inf       0.00      44.45
            marital-status=Separated       3.46       0.00        inf       0.00      31.95
              marital-status=Widowed       3.85       0.00        inf       0.00      46.96
marital-status=Married-spouse-absent       3.98       0.00        inf       0.00      53.63
    marital-status=Married-AF-spouse       4.01       0.00        inf       0.00      55.19
             occupation=Tech-support      -0.32       0.00       -inf       0.00       0.72

If :obj:`remove_singular` is set to 0, inducing a logistic regression classifier raises an error:

Traceback (most recent call last):
  File "logreg-singularities.py", line 4, in <module>
    lr = classification.logreg.LogRegLearner(table, removeSingular=0)
  File "/home/jure/devel/orange/Orange/classification/logreg.py", line 255, in LogRegLearner
    return lr(examples, weightID)
  File "/home/jure/devel/orange/Orange/classification/logreg.py", line 291, in __call__
    lr = learner(examples, weight)
orange.KernelException: 'orange.LogRegLearner': singularity in workclass=Never-worked

The variable that causes the singularity is workclass.
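
What "singularity" means here can be shown with the Gram determinant of two design-matrix columns: it is zero exactly when the columns are linearly dependent, the degenerate case (for instance a dummy indicator that is constant on the training data) that :obj:`remove_singular` works around by dropping the offending variable. A hypothetical sketch, not Orange's detection code:

```python
def gram_det(u, v):
    """Determinant of the 2x2 Gram matrix [[u.u, u.v], [v.u, v.v]].

    By the Cauchy-Schwarz inequality this is zero iff the two columns
    are linearly dependent, i.e. the normal equations are singular.
    """
    uu = sum(a * a for a in u)
    vv = sum(b * b for b in v)
    uv = sum(a * b for a, b in zip(u, v))
    return uu * vv - uv * uv

intercept      = [1, 1, 1, 1]
constant_dummy = [0, 0, 0, 0]   # an indicator that never fires in the data
informative    = [0, 1, 0, 1]

gram_det(intercept, constant_dummy)  # zero: the fit cannot invert X'X
gram_det(intercept, informative)     # positive: columns are independent
```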

The example below shows how stepwise logistic regression can improve classification performance (:download:`logreg-stepwise.py <code/logreg-stepwise.py>`):
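
Stepwise selection itself is a simple greedy loop. The sketch below shows generic forward selection with a caller-supplied `score` function (for instance cross-validated accuracy); it illustrates the idea only and is not Orange's actual stepwise criterion, which also considers removing variables.

```python
def stepwise_forward(features, score):
    """Greedy forward selection.

    Repeatedly adds the feature that most improves score(selected);
    stops as soon as no remaining feature improves the score.
    """
    selected = []
    remaining = list(features)
    best = score(selected)
    while remaining:
        candidate, candidate_score = None, best
        for f in remaining:
            s = score(selected + [f])
            if s > candidate_score:
                candidate, candidate_score = f, s
        if candidate is None:        # no feature helps any more
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best = candidate_score
    return selected
```

With a score that rewards two useful features and slightly penalizes every extra one, the loop picks exactly the useful pair and then stops.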

The output of this script is:

Learner      CA
logistic     0.841
filtered     0.846

Number of times attributes were used in cross-validation:
 1 x a21
10 x a22
 8 x a23
 7 x a24
 1 x a25
10 x a26
10 x a27
 3 x a28
 7 x a29
 9 x a31
 2 x a16
 7 x a12
 1 x a32
 8 x a15
10 x a14
 4 x a17
 7 x a30
10 x a11
 1 x a10
 1 x a13
10 x a34
 2 x a19
 1 x a18
10 x a3
10 x a5
 4 x a4
 4 x a7
 8 x a6
10 x a9
10 x a8