Logistic regression (logreg)

Logistic regression is a statistical classification method that fits data to a logistic function. Orange provides various enhancements of the method, such as stepwise selection of variables and handling of constant variables and singularities.

A logistic regression classification model stores the estimated values of the regression coefficients and their significances, and uses them to predict classes and class probabilities.
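To show how the stored coefficients become predictions, here is a minimal pure-Python sketch. It is not Orange's code, and the function names are hypothetical. The linear predictor intercept + sum(beta * x) is passed through the logistic function 1 / (1 + e^(-z)). The coefficients are the rounded values from the Titanic example output later in this section. Consistent with the signs in that table, the sketch assumes the model gives the probability of the second class value, yes. An example with every indicator at 0 belongs to the values left out of each group: a crew member, adult, male.

```python
import math

def logistic(z: float) -> float:
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(intercept: float, betas: dict, x: dict) -> float:
    """P(second class value | x) for a fitted logistic model.

    `betas` maps attribute names (e.g. "sex=female") to coefficients and
    `x` maps the same names to values; names missing from `x` count as 0.
    """
    z = intercept + sum(b * x.get(name, 0.0) for name, b in betas.items())
    return logistic(z)

def predict_class(intercept, betas, x, values=("no", "yes"), threshold=0.5):
    """Pick the second class value when its probability reaches the threshold."""
    p = predict_proba(intercept, betas, x)
    return values[1] if p >= threshold else values[0]

# Rounded coefficients from the Titanic output (class attribute: survived).
TITANIC_INTERCEPT = -1.23
TITANIC_BETAS = {
    "status=first": 0.86,
    "status=second": -0.16,
    "status=third": -0.92,
    "age=child": 1.06,
    "sex=female": 2.42,
}

# An adult woman travelling first class: z = -1.23 + 0.86 + 2.42 = 2.05.
woman_first = {"status=first": 1, "sex=female": 1}
print(predict_proba(TITANIC_INTERCEPT, TITANIC_BETAS, woman_first))
print(predict_class(TITANIC_INTERCEPT, TITANIC_BETAS, {}))  # adult male crew member
```

Because the intercept is negative, the reference group (adult male crew) gets a probability below 0.5. Each positive coefficient pushes the prediction towards yes.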

LogRegFitter is the abstract base class for logistic regression fitters. A fitter is called with a data table and returns either a vector of coefficients together with the corresponding statistics, or a status signifying an error. The possible statuses are:

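The only fitter available at the moment is a C++ translation of Alan Miller's logistic regression code. It fits the coefficients iteratively with the Newton-Raphson algorithm: each iteration solves a weighted least-squares problem built from the training data, and the iterations converge to the maximum-likelihood coefficients.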
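The sketch below illustrates what a Newton-Raphson fitter does. It is a minimal pure-Python version and not a translation of Miller's code or Orange's fitter. Each step solves the system (X^T W X) step = X^T (y - p), where W = diag(p(1 - p)), using plain Gaussian elimination with partial pivoting. Standard errors come from the inverse of that information matrix. Following the fitter contract described above, it returns either coefficients with statistics or an error status. The status strings "ok", "singularity" and "divergence" and all function names are hypothetical, not Orange's constants.

```python
import math

def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def _solve(a, b, eps=1e-10):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Returns None when a pivot is (numerically) zero, i.e. a is singular.
    """
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < eps:
            return None
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def _information(rows, beta):
    """Probabilities p_i and the information matrix X^T W X, W = diag(p_i (1 - p_i))."""
    p = [_sigmoid(sum(b * v for b, v in zip(beta, r))) for r in rows]
    k = len(beta)
    info = [[sum(r[i] * r[j] * pi * (1.0 - pi) for r, pi in zip(rows, p))
             for j in range(k)] for i in range(k)]
    return p, info

def fit_logreg(X, y, max_iter=25, tol=1e-8):
    """Newton-Raphson fit of P(y=1 | x) = sigmoid(b0 + b1*x1 + ...).

    X holds feature rows without the intercept column; y holds 0/1 labels.
    Returns (status, betas, std_errors) with betas[0] the intercept; the
    status strings are illustrative, not Orange's constants.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    beta = [0.0] * len(rows[0])
    for _ in range(max_iter):
        p, info = _information(rows, beta)
        # Gradient of the log-likelihood: X^T (y - p).
        grad = [sum(r[j] * (yi - pi) for r, yi, pi in zip(rows, y, p))
                for j in range(len(beta))]
        step = _solve(info, grad)  # Newton step = one weighted least-squares solve
        if step is None:
            return "singularity", beta, None
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    else:
        return "divergence", beta, None
    # Standard errors: square roots of the diagonal of the inverse information.
    _, info = _information(rows, beta)
    se = []
    for j in range(len(beta)):
        unit = [1.0 if i == j else 0.0 for i in range(len(beta))]
        col = _solve(info, unit)
        if col is None:
            return "singularity", beta, None
        se.append(math.sqrt(col[j]))
    return "ok", beta, se

status, beta, se = fit_logreg([[0], [0], [0], [0], [1], [1], [1], [1]],
                              [0, 0, 0, 1, 0, 1, 1, 1])
print(status, [round(b, 3) for b in beta], [round(s, 3) for s in se])
```

For this toy data the fitted intercept is log(1/3) and the slope is 2 log 3. These are the log-odds for x = 0 and the log of the odds ratio between the two groups. Two identical feature columns make the information matrix singular, so the sketch returns "singularity" instead of coefficients.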

Examples

The first example shows a straightforward use of logistic regression (script: code/logreg-run.py).

Result:

Classification accuracy: 0.778282598819

class attribute = survived
class values = <no, yes>

Attribute        beta  st. error     wald Z          P  OR=exp(beta)

Intercept       -1.23       0.08     -15.15      -0.00
status=first     0.86       0.16       5.39       0.00          2.36
status=second   -0.16       0.18      -0.91       0.36          0.85
status=third    -0.92       0.15      -6.12       0.00          0.40
age=child        1.06       0.25       4.30       0.00          2.89
sex=female       2.42       0.14      17.04       0.00         11.25
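The last three columns of the table above follow from beta and its standard error. The sketch below recomputes them in plain Python. It assumes P is the usual two-sided normal p-value of the Wald statistic, which agrees with the table: Z = -0.91 gives P of about 0.36. The function names are hypothetical. The table prints beta and st. error rounded to two decimals, so recomputed Z values differ slightly from the printed ones (2.42 / 0.14 is about 17.3 rather than 17.04). The odds ratios match.

```python
import math

def wald_z(beta: float, se: float) -> float:
    """Wald statistic: the coefficient divided by its standard error."""
    return beta / se

def two_sided_p(z: float) -> float:
    """Two-sided p-value of z under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def odds_ratio(beta: float) -> float:
    """Multiplicative change in the odds when the indicator goes from 0 to 1."""
    return math.exp(beta)

# Rounded values from the Titanic table (attribute, beta, st. error).
rows = [("status=first", 0.86, 0.16),
        ("status=second", -0.16, 0.18),
        ("sex=female", 2.42, 0.14)]
for name, beta, se in rows:
    z = wald_z(beta, se)
    print(f"{name:15s} Z={z:6.2f}  P={two_sided_p(z):.2f}  OR={odds_ratio(beta):.2f}")
```

Read this way, OR = 11.25 for sex=female means that, with the other attributes fixed, the model's odds of survival for women are about 11 times those for men.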


The next example shows how to handle singularities in data sets (script: code/logreg-singularities.py).

The first few lines of the output of this script are:

<=50K <=50K
<=50K <=50K
<=50K <=50K
>50K >50K
<=50K >50K

class attribute = y
class values = <>50K, <=50K>

Attribute                               beta  st. error     wald Z          P  OR=exp(beta)

Intercept                               6.62      -0.00       -inf       0.00
age                                    -0.04       0.00       -inf       0.00          0.96
fnlwgt                                 -0.00       0.00       -inf       0.00          1.00
education-num                          -0.28       0.00       -inf       0.00          0.76
marital-status=Divorced                 4.29       0.00        inf       0.00         72.62
marital-status=Never-married            3.79       0.00        inf       0.00         44.45
marital-status=Separated                3.46       0.00        inf       0.00         31.95
marital-status=Widowed                  3.85       0.00        inf       0.00         46.96
marital-status=Married-spouse-absent    3.98       0.00        inf       0.00         53.63
marital-status=Married-AF-spouse        4.01       0.00        inf       0.00         55.19
occupation=Tech-support                -0.32       0.00       -inf       0.00          0.72


If remove_singular is set to 0 (passed as removeSingular=0 in the call shown in the traceback), inducing a logistic regression classifier raises an error:

Traceback (most recent call last):
  File "logreg-singularities.py", line 4, in <module>
    lr = classification.logreg.LogRegLearner(table, removeSingular=0)
  File "/home/jure/devel/orange/Orange/classification/logreg.py", line 255, in LogRegLearner
    return lr(examples, weightID)
  File "/home/jure/devel/orange/Orange/classification/logreg.py", line 291, in __call__
    lr = learner(examples, weight)
orange.KernelException: 'orange.LogRegLearner': singularity in workclass=Never-worked


The variable that causes the singularity is workclass; the error message points to its indicator for the value Never-worked.
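The traceback does not say how Orange detects or removes singular attributes. One common way to find them is sketched below as a hypothetical helper, not Orange's procedure. It walks through the columns of the design matrix and drops any column that is numerically a linear combination of the intercept and the columns already kept. The projections are subtracted with Gram-Schmidt. Constant columns and all-zero indicators are caught the same way. The column names and values are made up for illustration.

```python
import math

def drop_dependent_columns(columns, names, tol=1e-9):
    """Return the names of columns that are linearly dependent on earlier ones.

    The intercept (a column of ones) is considered first, so constant
    columns are reported as dependent too. Hypothetical helper.
    """
    if not columns:
        return []
    n = len(columns[0])
    basis = [[1.0 / math.sqrt(n)] * n]  # orthonormal vector for the intercept
    dropped = []
    for name, col in zip(names, columns):
        v = [float(x) for x in col]
        for q in basis:  # remove the components along the kept directions
            d = sum(a * b for a, b in zip(v, q))
            v = [a - d * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        scale = max(1.0, math.sqrt(sum(float(x) ** 2 for x in col)))
        if norm < tol * scale:
            dropped.append(name)  # nothing new left: singular column
        else:
            basis.append([a / norm for a in v])
    return dropped

names = ["x1", "x2", "x3", "x4", "x5"]
columns = [
    [1, 0, 1, 0, 1, 0],
    [0, 1, 1, 0, 0, 1],
    [1, 1, 2, 0, 1, 1],  # x1 + x2: linearly dependent
    [5, 5, 5, 5, 5, 5],  # constant: a multiple of the intercept
    [0, 0, 0, 0, 0, 0],  # indicator for a value that never occurs
]
print(drop_dependent_columns(columns, names))
```

Which of two dependent columns gets dropped depends on the order in which they are visited. A real implementation would also have to decide how to report or reuse the dropped attributes.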

The example below shows how stepwise logistic regression can improve classification performance (script: code/logreg-stepwise.py):

The output of this script is:

Learner      CA
logistic     0.841
filtered     0.846

Number of times attributes were used in cross-validation:
1 x a21
10 x a22
8 x a23
7 x a24
1 x a25
10 x a26
10 x a27
3 x a28
7 x a29
9 x a31
2 x a16
7 x a12
1 x a32
8 x a15
10 x a14
4 x a17
7 x a30
10 x a11
1 x a10
1 x a13
10 x a34
2 x a19
1 x a18
10 x a3
10 x a5
4 x a4
4 x a7
8 x a6
10 x a9
10 x a8
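In this output, "filtered" reads as logistic regression preceded by stepwise attribute selection. The counts report in how many cross-validation folds each attribute was kept. The sketch below shows both ideas in plain Python. It uses greedy forward selection with a pluggable score function, which may differ from the criteria Orange's stepwise procedure uses. A Counter tallies the selections per fold. The function names and the toy score are hypothetical.

```python
from collections import Counter

def forward_stepwise(candidates, score, min_gain=0.0):
    """Greedy forward selection of attributes.

    `score(selected)` rates a model built on the list `selected` (higher is
    better). Each round adds the candidate with the best score; selection
    stops when the best addition improves the score by no more than min_gain.
    """
    selected, remaining = [], list(candidates)
    best = score(selected)
    while remaining:
        new_score, choice = max(((score(selected + [c]), c) for c in remaining),
                                key=lambda t: t[0])
        if new_score - best <= min_gain:
            break
        selected.append(choice)
        remaining.remove(choice)
        best = new_score
    return selected

def usage_counts(selections):
    """Count in how many folds each attribute was selected."""
    counts = Counter()
    for selected in selections:
        counts.update(set(selected))
    return counts

# Toy score (hypothetical): each attribute adds a fixed amount, minus a size penalty.
usefulness = {"a3": 0.05, "a5": 0.03, "a7": 0.0, "a9": 0.01}

def toy_score(selected):
    return sum(usefulness[a] for a in selected) - 0.005 * len(selected)

print(forward_stepwise(usefulness, toy_score))
print(usage_counts([["a3", "a5"], ["a3"], ["a3", "a9"]]))
```

In a real run, `score` would fit a logistic model on the selected attributes, for example using its log-likelihood with a penalty or a significance test. Forward selection would run once per training fold, and `usage_counts` would produce a report like the one above.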