Commits

Blaz Zupan committed eca373f

new tutorial (refresh after removal of old files)

  • Parent commits 7b3d84b


Files changed (42)

File docs/tutorial/rst/classification.rst

+Classification
+==============
+
+.. index:: classification
+.. index:: 
+   single: data mining; supervised
+
+Much of Orange is devoted to machine learning methods for classification, or supervised data mining. These methods rely on
+data with class-labeled instances, such as the senate voting data. Here is code that loads this data set, displays the first data instance and shows its predicted class (``republican``)::
+
+   >>> data = Orange.data.Table("voting")
+   >>> data[0]
+   ['n', 'y', 'n', 'y', 'y', 'y', 'n', 'n', 'n', 'y', '?', 'y', 'y', 'y', 'n', 'y', 'republican']
+   >>> data[0].get_class()
+   <orange.Value 'party'='republican'>
+
+Learners and Classifiers
+------------------------
+
+.. index::
+   single: classification; learner
+.. index::
+   single: classification; classifier
+.. index::
+   single: classification; naive Bayesian classifier
+
+Classification uses two types of objects: learners and classifiers. Learners consider class-labeled data and return a classifier. Given a data instance (a vector of feature values), classifiers return a predicted class::
+
+    >>> import Orange
+    >>> data = Orange.data.Table("voting")
+    >>> learner = Orange.classification.bayes.NaiveLearner()
+    >>> classifier = learner(data)
+    >>> classifier(data[0])
+    <orange.Value 'party'='republican'>
+
+Above, we read the data, constructed a `naive Bayesian learner <http://en.wikipedia.org/wiki/Naive_Bayes_classifier>`_, gave it the data set to construct a classifier, and used it to predict the class of the first data item. We also use these concepts in the following code that predicts the classes of the first five instances in the data set:
+
+.. literalinclude:: code/classification-classifier1.py
+   :lines: 4-
+
+The script outputs::
+
+    republican; originally republican
+    republican; originally republican
+    republican; originally democrat
+      democrat; originally democrat
+      democrat; originally democrat
+
+The naive Bayesian classifier made a mistake on the third instance, but otherwise predicted correctly. No wonder, since it was tested on the same data it was trained on.
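
The two-step protocol used above (train a learner to obtain a classifier, then call the classifier on instances) does not depend on Orange; here is a minimal illustrative sketch in plain Python, with a hypothetical majority-class learner standing in for naive Bayes:

```python
# A minimal sketch of the learner/classifier protocol: a learner takes
# class-labeled data and returns a classifier; the classifier maps an
# instance to a predicted class. Illustrative only, not Orange's code.
from collections import Counter

class MajorityLearner:
    def __call__(self, data):
        # data: list of (feature_vector, class_label) pairs
        majority = Counter(label for _, label in data).most_common(1)[0][0]
        return MajorityClassifier(majority)

class MajorityClassifier:
    def __init__(self, majority):
        self.majority = majority

    def __call__(self, instance):
        # Ignores the features: always predicts the majority class.
        return self.majority

data = [(['n', 'y'], 'republican'), (['y', 'n'], 'democrat'),
        (['n', 'y'], 'republican')]
learner = MajorityLearner()
classifier = learner(data)     # training step
print(classifier(['y', 'y']))  # prediction step -> republican
```

Separating the two objects lets the same trained classifier be reused on any number of new instances, which is exactly the pattern the Orange scripts follow.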
+
+Probabilistic Classification
+----------------------------
+
+To find out the probability that the classifier assigns
+to, say, the democrat class, we call the classifier with an
+additional parameter that specifies the output type. If this is ``Orange.classification.Classifier.GetProbabilities``, the classifier will output class probabilities:
+
+.. literalinclude:: code/classification-classifier2.py
+   :lines: 4-
+
+The output of the script also shows how badly the naive Bayesian classifier missed the class for the third data item::
+
+   Probabilities for democrat:
+   0.000; originally republican
+   0.000; originally republican
+   0.005; originally democrat
+   0.998; originally democrat
+   0.957; originally democrat
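
Probabilities of this kind come from counting class and feature-value frequencies in the training data; a library-agnostic sketch of the naive Bayesian estimate (toy two-feature data with Laplace smoothing, not Orange's implementation or the voting set):

```python
# Naive Bayesian class probabilities from counts, with Laplace smoothing.
# Toy data; illustrative only.
data = [(('y', 'n'), 'democrat'), (('y', 'y'), 'democrat'),
        (('n', 'y'), 'republican'), (('n', 'n'), 'republican')]
classes = sorted(set(c for _, c in data))

def class_probabilities(instance):
    scores = {}
    for c in classes:
        rows = [x for x, cls in data if cls == c]
        p = float(len(rows)) / len(data)            # prior P(c)
        for i, value in enumerate(instance):
            match = sum(1 for x in rows if x[i] == value)
            p *= (match + 1.0) / (len(rows) + 2.0)  # smoothed P(x_i | c)
        scores[c] = p
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}  # normalize

probs = class_probabilities(('y', 'n'))
print(probs['democrat'])  # 0.75
```

The normalization in the last step is why the probabilities printed by the Orange script sum to one across classes.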
+
+Cross-Validation
+----------------
+
+.. index:: cross-validation
+
+Validating the accuracy of classifiers on the training data, as we did above, serves demonstration purposes only. Any performance measure that assesses accuracy should be estimated on an independent test set. One such procedure is `cross-validation <http://en.wikipedia.org/wiki/Cross-validation_(statistics)>`_, which averages performance estimates across several runs, each time considering different training and test subsets sampled from the original data set:
+
+.. literalinclude:: code/classification-cv.py
+   :lines: 3-
+
+.. index::
+   single: classification; scoring
+.. index::
+   single: classification; area under ROC
+.. index::
+   single: classification; accuracy
+
+Cross-validation expects a list of learners. The performance estimators also return a list of scores, one for every learner. There was just one learner in the script above, hence a list of size one was used. The script estimates classification accuracy and area under the ROC curve. The latter score is very high, indicating a very good performance of the naive Bayesian learner on the senate voting data set::
+
+   Accuracy: 0.90
+   AUC:      0.97
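
The mechanics behind such an estimate are simple to write down; a library-agnostic sketch of k-fold cross-validation, scored here with a hypothetical majority-class learner rather than naive Bayes:

```python
# k-fold cross-validation from first principles: partition indices into
# k disjoint folds, train on k-1 folds, test on the held-out fold,
# and average the per-fold accuracies. Illustrative sketch only.
import random

def cross_validation(data, learner, k=5, seed=0):
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint index sets
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in indices if i not in held_out]
        test = [data[i] for i in fold]
        model = learner(train)
        correct = sum(1 for x, y in test if model(x) == y)
        scores.append(float(correct) / len(test))
    return sum(scores) / k

def majority_learner(train):
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

data = [((i,), 'a') for i in range(8)] + [((i,), 'b') for i in range(8, 12)]
print("CA: %.2f" % cross_validation(data, majority_learner, k=4))  # CA: 0.67
```

Each instance is tested exactly once, on a model that never saw it during training; that is what makes the averaged score an honest estimate.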
+
+
+Handful of Classifiers
+----------------------
+
+Orange implements a wide range of classification algorithms, including:
+
+- logistic regression (``Orange.classification.logreg``)
+- k-nearest neighbors (``Orange.classification.knn``)
+- support vector machines (``Orange.classification.svm``)
+- classification trees (``Orange.classification.tree``)
+- classification rules (``Orange.classification.rules``)
+
+Some of these are used in the code below, which estimates the probability of a target class on test data. This time, the training and test data sets are disjoint:
+
+.. index::
+   single: classification; logistic regression
+.. index::
+   single: classification; trees
+.. index::
+   single: classification; k-nearest neighbors
+
+.. literalinclude:: code/classification-other.py
+
+For these five data items, there are no major differences among the predictions of the classification algorithms considered::
+
+   Probabilities for republican:
+   original class  tree      k-NN      lr       
+   republican      0.949     1.000     1.000
+   republican      0.972     1.000     1.000
+   democrat        0.011     0.078     0.000
+   democrat        0.015     0.001     0.000
+   democrat        0.015     0.032     0.000
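
The disjoint train/test split used by the script can be built with the standard library alone; a small sketch (the ``split`` helper is hypothetical, not part of Orange):

```python
# Disjoint train/test split by random sampling of indices, mirroring
# what the script above does with random.sample. Illustrative sketch.
import random

def split(data, n_test, seed=42):
    rng = random.Random(seed)
    test_idx = set(rng.sample(range(len(data)), n_test))
    train = [d for i, d in enumerate(data) if i not in test_idx]
    test = [d for i, d in enumerate(data) if i in test_idx]
    return train, test

data = list(range(100))
train, test = split(data, 5)
print(len(train), len(test))
```

Sampling indices rather than items guarantees the two sets are disjoint even when the data contains duplicate instances.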
+
+The following code cross-validates several learners. Notice the difference between this and the code above: cross-validation requires learners, while in the script above the learners were immediately given the data and the calls returned classifiers.
+
+.. literalinclude: code/classification-cv2.py
+
+Logistic regression wins on the area under the ROC curve::
+
+            nbc  tree lr  
+   Accuracy 0.90 0.95 0.94
+   AUC      0.97 0.94 0.99
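
AUC itself has a direct interpretation: the probability that a randomly chosen positive instance is scored above a randomly chosen negative one. A library-agnostic sketch that computes it by explicit pair counting (fine for small result sets; production scorers use rank sums instead):

```python
# AUC by pair counting: the fraction of (positive, negative) pairs in
# which the positive instance receives the higher predicted score;
# ties count one half. Illustrative sketch only.
def auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.3, 0.6, 0.2]
labels = [1,   1,   0,   1,   0]
print(auc(scores, labels))  # 1.0 -- every positive outscores every negative
```

A score of 0.5 corresponds to random ranking, which is why the 0.97-0.99 values in the table indicate very good separation.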
+
+Reporting on Classification Models
+----------------------------------
+
+Classification models are objects, exposing every component of their structure. For instance, one can traverse a classification tree in code and observe the associated data instances, probabilities and conditions. Often, however, it suffices to provide a textual output of the model. For logistic regression and trees, this is illustrated in the script below:
+
+.. literalinclude:: code/classification-models.py
+
+The logistic regression part of the output is::
+
+   class attribute = survived
+   class values = <no, yes>
+
+         Feature       beta  st. error     wald Z          P OR=exp(beta)
+
+       Intercept      -1.23       0.08     -15.15      -0.00
+    status=first       0.86       0.16       5.39       0.00       2.36
+   status=second      -0.16       0.18      -0.91       0.36       0.85
+    status=third      -0.92       0.15      -6.12       0.00       0.40
+       age=child       1.06       0.25       4.30       0.00       2.89
+      sex=female       2.42       0.14      17.04       0.00      11.25
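
The last column is simply the exponentiated coefficient, OR = exp(beta); a quick arithmetic check against the values printed above:

```python
# Odds ratios from logistic regression coefficients: OR = exp(beta).
# Coefficients taken from the output table above.
import math

betas = {'status=first': 0.86, 'status=second': -0.16,
         'status=third': -0.92, 'age=child': 1.06, 'sex=female': 2.42}
for name, beta in sorted(betas.items()):
    print("%-15s OR = %5.2f" % (name, math.exp(beta)))
# e.g. sex=female -> OR = 11.25, matching the table
```

An odds ratio above 1 means the feature value raises the odds of the target class (here, survival), below 1 that it lowers them.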
+
+Trees can also be rendered in `dot <http://en.wikipedia.org/wiki/DOT_language>`_::
+
+   tree.dot(file_name="0.dot", node_shape="ellipse", leaf_shape="box")
+
+The following figure shows an example of such rendering.
+
+.. image:: files/tree.png
+   :alt: A graphical presentation of a classification tree

File docs/tutorial/rst/code/assoc1.py

+import orngAssoc
+import Orange
+
+data = Orange.data.Table("zoo")
+#data = Orange.data.preprocess.Discretize(data, \
+#  method=Orange.data.discretization.EqualFreq(numberOfIntervals=3))
+# data = data.select(range(10))
+
+rules = Orange.associate.AssociationRulesInducer(data, support=0.4)
+
+print "%i rules with support higher than or equal to %5.3f found.\n" % (len(rules), 0.4)
+
+orngAssoc.sort(rules, ["support", "confidence"])
+
+orngAssoc.printRules(rules[:5], ["support", "confidence"])
+print
+
+del rules[:3]
+orngAssoc.printRules(rules[:5], ["support", "confidence"])
+print

File docs/tutorial/rst/code/assoc2.py

+# Description: Association rule sorting and filtering
+# Category:    description
+# Uses:        imports-85
+# Classes:     orngAssoc.build, Preprocessor_discretize, EquiNDiscretization
+# Referenced:  assoc.htm
+
+import orngAssoc
+import Orange
+
+data = Orange.data.Table("imports-85")
+data = Orange.data.preprocess.Discretize(data, \
+  method=Orange.data.discretization.EqualFreq(numberOfIntervals=3))
+data = data.select(range(10))
+
+rules = Orange.associate.AssociationRulesInducer(data, support=0.4)
+
+n = 5
+print "%i most confident rules:" % (n)
+orngAssoc.sort(rules, ["confidence", "support"])
+orngAssoc.printRules(rules[0:n], ['confidence', 'support', 'lift'])
+
+conf = 0.8; lift = 1.1
+print "\nRules with confidence>%5.3f and lift>%5.3f" % (conf, lift)
+rulesC = rules.filter(lambda x: x.confidence > conf and x.lift > lift)
+orngAssoc.sort(rulesC, ['confidence'])
+orngAssoc.printRules(rulesC, ['confidence', 'support', 'lift'])

File docs/tutorial/rst/code/bagging.py

+# Description: An implementation of bagging (only bagging class is defined here)
+# Category:    modelling
+# Referenced:  c_bagging.htm
+
+import random
+import Orange
+
+def Learner(examples=None, **kwds):
+    learner = Learner_Class(**kwds)
+    if examples:
+        return learner(examples)
+    else:
+        return learner
+
+class Learner_Class:
+    def __init__(self, learner, t=10, name='bagged classifier'):
+        self.t = t
+        self.name = name
+        self.learner = learner
+
+    def __call__(self, examples, weight=None):
+        r = random.Random()
+        r.seed(0)
+
+        n = len(examples)
+        classifiers = []
+        for i in range(self.t):
+            selection = []
+            for j in range(n):
+                selection.append(r.randrange(n))
+            data = examples.getitems(selection)
+            classifiers.append(self.learner(data))
+            
+        return Classifier(classifiers = classifiers, name=self.name, domain=examples.domain)
+
+class Classifier:
+    def __init__(self, **kwds):
+        self.__dict__.update(kwds)
+
+    def __call__(self, example, resultType = Orange.classification.Classifier.GetValue):
+        freq = [0.] * len(self.domain.classVar.values)
+        for c in self.classifiers:
+            freq[int(c(example))] += 1
+        index = freq.index(max(freq))
+        value = Orange.data.Value(self.domain.classVar, index)
+        for i in range(len(freq)):
+            freq[i] = freq[i]/len(self.classifiers)
+        if resultType == Orange.classification.Classifier.GetValue: return value
+        elif resultType == Orange.classification.Classifier.GetProbabilities: return freq
+        else: return (value, freq)
+        

File docs/tutorial/rst/code/bagging_test.py

+# Description: Test for bagging as defined in bagging.py
+# Category:    modelling
+# Uses:        adult_sample.tab, bagging.py
+# Referenced:  c_bagging.htm
+# Classes:     orngTest.crossValidation
+
+import bagging
+import Orange
+data = Orange.data.Table("adult_sample.tab")
+
+tree = Orange.classification.tree.TreeLearner(mForPruning=10, minExamples=30)
+tree.name = "tree"
+baggedTree = bagging.Learner(learner=tree, t=5)
+
+learners = [tree, baggedTree]
+
+results = Orange.evaluation.testing.cross_validation(learners, data, folds=5)
+for i in range(len(learners)):
+    print "%s: %5.3f" % (learners[i].name, Orange.evaluation.scoring.CA(results)[i])

File docs/tutorial/rst/code/classification-classifier1.py

+import Orange
+
+data = Orange.data.Table("voting")
+classifier = Orange.classification.bayes.NaiveLearner(data)
+for d in data[:5]:
+    c = classifier(d)
+    print "%10s; originally %s" % (c, d.get_class())

File docs/tutorial/rst/code/classification-classifier2.py

+import Orange
+
+data = Orange.data.Table("voting")
+classifier = Orange.classification.bayes.NaiveLearner(data)
+target = 1
+print "Probabilities for %s:" % data.domain.class_var.values[target]
+for d in data[:5]:
+    ps = classifier(d, Orange.classification.Classifier.GetProbabilities)
+    print "%5.3f; originally %s" % (ps[target], d.get_class())

File docs/tutorial/rst/code/classification-cv.py

+import Orange
+
+data = Orange.data.Table("voting")
+bayes = Orange.classification.bayes.NaiveLearner()
+res = Orange.evaluation.testing.cross_validation([bayes], data, folds=5)
+print "Accuracy: %.2f" % Orange.evaluation.scoring.CA(res)[0]
+print "AUC:      %.2f" % Orange.evaluation.scoring.AUC(res)[0]

File docs/tutorial/rst/code/classification-cv2.py

+import Orange
+
+data = Orange.data.Table("voting")
+
+tree = Orange.classification.tree.TreeLearner(sameMajorityPruning=1, mForPruning=2)
+tree.name = "tree"
+nbc = Orange.classification.bayes.NaiveLearner()
+nbc.name = "nbc"
+lr = Orange.classification.logreg.LogRegLearner()
+lr.name = "lr"
+
+learners = [nbc, tree, lr]
+print " "*9 + " ".join("%-4s" % learner.name for learner in learners)
+res = Orange.evaluation.testing.cross_validation(learners, data, folds=5)
+print "Accuracy %s" % " ".join("%.2f" % s for s in Orange.evaluation.scoring.CA(res))
+print "AUC      %s" % " ".join("%.2f" % s for s in Orange.evaluation.scoring.AUC(res))

File docs/tutorial/rst/code/classification-models.py

+import Orange
+
+data = Orange.data.Table("titanic")
+lr = Orange.classification.logreg.LogRegLearner(data)
+print Orange.classification.logreg.dump(lr)
+
+tree = Orange.classification.tree.TreeLearner(data)
+print tree.to_string()

File docs/tutorial/rst/code/classification-other.py

+import Orange
+import random
+
+data = Orange.data.Table("voting")
+test = Orange.data.Table(random.sample(data, 5))
+train = Orange.data.Table([d for d in data if d not in test])
+
+tree = Orange.classification.tree.TreeLearner(train, same_majority_pruning=1, m_pruning=2)
+tree.name = "tree"
+knn = Orange.classification.knn.kNNLearner(train, k=21)
+knn.name = "k-NN"
+lr = Orange.classification.logreg.LogRegLearner(train)
+lr.name = "lr"
+
+classifiers = [tree, knn, lr]
+
+target = 0
+print "Probabilities for %s:" % data.domain.class_var.values[target]
+print "original class ",
+print " ".join("%-9s" % l.name for l in classifiers)
+
+return_type = Orange.classification.Classifier.GetProbabilities
+for d in test:
+    print "%-15s" % (d.getclass()),
+    print "     ".join("%5.3f" % c(d, return_type)[target] for c in classifiers)

File docs/tutorial/rst/code/data-domain1.py

+import Orange
+
+data = Orange.data.Table("imports-85.tab")
+m = len(data.domain.features)
+m_cont = sum(1 for x in data.domain.features if x.var_type==Orange.feature.Type.Continuous)
+m_disc = sum(1 for x in data.domain.features if x.var_type==Orange.feature.Type.Discrete)
+print "%d features, %d continuous and %d discrete" % (m, m_cont, m_disc)
+
+print "First three features:"
+for i in range(3):
+    print "   ", data.domain.features[i].name
+
+print "First three features (again):"
+for x in data.domain.features[:3]:
+    print "   ", x.name
+
+print "Class:", data.domain.class_var.name

File docs/tutorial/rst/code/data-domain2.py

+import Orange
+
+data = Orange.data.Table("imports-85.tab")
+
+print "Name of the first feature:", data.domain[0].name
+name = 'fuel-type'
+print "Values of feature '%s'" % name,
+print data.domain[name].values

File docs/tutorial/rst/code/data-featureselection.py

+import Orange
+
+data = Orange.data.Table("iris.tab")
+new_domain = Orange.data.Domain(data.domain.features[:2] + [data.domain.class_var])
+new_data = Orange.data.Table(new_domain, data)
+
+print data[0]
+print new_data[0]

File docs/tutorial/rst/code/data-instances1.py

+import Orange
+
+data = Orange.data.Table("iris")
+print "First three data instances:"
+for d in data[:3]:
+    print d
+
+print "25th data instance:"
+print data[24]
+
+name = "sepal width"
+print "Value of '%s' for the first instance:" % name, data[0][name]
+print "The 3rd value of the 25th data instance:", data[24][2]

File docs/tutorial/rst/code/data-instances2.py

+import Orange
+
+average = lambda xs: sum(xs)/float(len(xs))
+
+data = Orange.data.Table("iris")
+print "%-15s %s" % ("Feature", "Mean")
+for x in data.domain.features:
+    print "%-15s %.2f" % (x.name, average([d[x] for d in data]))

File docs/tutorial/rst/code/data-instances3.py

+import Orange
+
+average = lambda xs: sum(xs)/float(len(xs))
+
+data = Orange.data.Table("iris")
+targets = data.domain.class_var.values
+print "%-15s %s" % ("Feature", " ".join("%15s" % c for c in targets))
+for x in data.domain.features:
+    dist = ["%15.2f" % average([d[x] for d in data if d.get_class()==c]) for c in targets]
+    print "%-15s" % x.name, " ".join(dist)

File docs/tutorial/rst/code/data-instances4.py

+import Orange
+from collections import Counter
+
+data = Orange.data.Table("lenses")
+print Counter(str(d.get_class()) for d in data)

File docs/tutorial/rst/code/data-lenses.py

+import Orange
+data = Orange.data.Table("lenses")
+print "Attributes:", ", ".join(x.name for x in data.domain.features)
+print "Class:", data.domain.class_var.name
+print "Data instances", len(data)
+
+target = "soft"
+print "Data instances with %s prescriptions:" % target
+for d in data:
+    if d.get_class() == target:
+        print " ".join(["%-15s" % str(v) for v in d])
+

File docs/tutorial/rst/code/data-missing.py

+import Orange
+
+data = Orange.data.Table("voting.tab")
+for x in data.domain.features:
+    n_miss = sum(1 for d in data if d[x].is_special())
+    print "%4.1f%% %s" % (100.*n_miss/len(data), x.name)

File docs/tutorial/rst/code/data-save.py

+import Orange
+data = Orange.data.Table("lenses")
+print "N1=%d" % len(data)
+new_data = Orange.data.Table([d for d in data if d["prescription"]=="myope"])
+print "N2=%d" %len(new_data)
+new_data.save("lenses-subset.tab")

File docs/tutorial/rst/code/data-subsetting.py

+import Orange
+
+data = Orange.data.Table("iris.tab")
+new_data = Orange.data.Table([d for d in data if d["petal length"]>3.0])
+print "Subsetting from %d to %d instances." % (len(data), len(new_data))

File docs/tutorial/rst/code/ensemble-bagging.py

+import Orange
+
+data = Orange.data.Table("promoters")
+
+tree = Orange.classification.tree.TreeLearner(m_pruning=2, name="tree")
+boost = Orange.ensemble.boosting.BoostedLearner(tree, name="boost")
+bagg = Orange.ensemble.bagging.BaggedLearner(tree, name="bagg")
+
+learners = [tree, boost, bagg]
+results = Orange.evaluation.testing.cross_validation(learners, data, folds=10)
+for l, s in zip(learners, Orange.evaluation.scoring.AUC(results)):
+    print "%5s: %.2f" % (l.name, s)

File docs/tutorial/rst/code/ensemble-cmd.py

+import Orange
+data = Orange.data.Table("housing")
+tree = Orange.classification.tree.TreeLearner()
+# btree = Orange.ensemble.boosting.BoostedLearner(tree)
+btree = Orange.ensemble.bagging.BaggedLearner(tree)
+#btree
+#btree(data)
+model = btree(data)
+print model(data[0])

File docs/tutorial/rst/code/ensemble-forest.py

+import Orange
+
+data = Orange.data.Table("promoters")
+
+bayes = Orange.classification.bayes.NaiveLearner(name="bayes")
+knn = Orange.classification.knn.kNNLearner(name="knn")
+forest = Orange.ensemble.forest.RandomForestLearner(name="forest")
+
+learners = [forest, bayes, knn]
+res = Orange.evaluation.testing.cross_validation(learners, data, 5)
+print "\n".join(["%6s: %5.3f" % (l.name, r) for r, l in zip(Orange.evaluation.scoring.AUC(res), learners)])

File docs/tutorial/rst/code/ensemble-stacking.py

+import Orange
+
+data = Orange.data.Table("promoters")
+
+bayes = Orange.classification.bayes.NaiveLearner(name="bayes")
+tree = Orange.classification.tree.SimpleTreeLearner(name="tree")
+knn = Orange.classification.knn.kNNLearner(name="knn")
+
+base_learners = [bayes, tree, knn]
+stack = Orange.ensemble.stacking.StackedClassificationLearner(base_learners)
+
+learners = [stack, bayes, tree, knn]
+res = Orange.evaluation.testing.cross_validation(learners, data, 10)
+print "\n".join(["%8s: %5.3f" % (l.name, r) for r, l in zip(Orange.evaluation.scoring.AUC(res), learners)])

File docs/tutorial/rst/code/fss6.py

+# Author:      B Zupan
+# Version:     1.0
+# Description: Same as fss5.py but uses FilterRelieff class from orngFSS
+# Category:    preprocessing
+# Uses:        adult_sample.tab
+# Referenced:  o_fss.htm
+
+import orngFSS
+import Orange
+data = Orange.data.Table("adult_sample.tab")
+
+def report_relevance(data):
+  m = Orange.feature.scoring.score_all(data)
+  for i in m:
+    print "%5.3f %s" % (i[1], i[0])
+
+print "Before feature subset selection (%d attributes):" % len(data.domain.attributes)
+report_relevance(data)
+data = Orange.data.Table("adult_sample.tab")
+
+marg = 0.01
+filter = Orange.feature.selection.FilterRelief(margin=marg)
+ndata = filter(data)
+print "\nAfter feature subset selection with margin %5.3f (%d attributes):" % (marg, len(ndata.domain.attributes))
+report_relevance(ndata)

File docs/tutorial/rst/code/fss7.py

+# Author:      B Zupan
+# Version:     1.0
+# Description: Shows the use of feature subset selection and compares
+#              plain naive Bayes (with discretization) and the same classifier with
+#              feature subset selection. On the crx data set, both classifiers achieve similar
+#              accuracy, but naive Bayes with feature subset selection uses substantially
+#              fewer features. The wrappers FilteredLearner and DiscretizedLearner are used,
+#              and the example illustrates how to analyze classifiers used in ten-fold cross
+#              validation (how many and which attributes were used?).
+# Category:    preprocessing
+# Uses:        crx.tab
+# Referenced:  o_fss.htm
+
+import orngFSS
+import Orange
+
+data = Orange.data.Table("crx.tab")
+
+bayes = Orange.classification.bayes.NaiveLearner()
+dBayes = Orange.feature.discretization.DiscretizedLearner(bayes, name='disc bayes')
+fss = Orange.feature.selection.FilterAboveThreshold(threshold=0.05)
+fBayes = Orange.feature.selection.FilteredLearner(dBayes, filter=fss, name='bayes & fss')
+
+learners = [dBayes, fBayes]
+results = Orange.evaluation.testing.cross_validation(learners, data, folds=10, storeClassifiers=1)
+
+# how many attributes did each classifier use?
+
+natt = [0.] * len(learners)
+for fold in range(results.numberOfIterations):
+  for lrn in range(len(learners)):
+    natt[lrn] += len(results.classifiers[fold][lrn].domain.attributes)
+for lrn in range(len(learners)):
+  natt[lrn] = natt[lrn] / 10.
+
+print "\nLearner         Accuracy  #Atts"
+for i in range(len(learners)):
+  print "%-15s %5.3f     %5.2f" % (learners[i].name, Orange.evaluation.scoring.CA(results)[i], natt[i])
+
+# which attributes were used in filtered case?
+
+print '\nAttribute usage (in how many folds attribute was used?):'
+used = {}
+for fold in range(results.numberOfIterations):
+  for att in results.classifiers[fold][1].domain.attributes:
+    a = att.name
+    if a in used.keys(): used[a] += 1
+    else: used[a] = 1
+for a in used.keys():
+  print '%2d x %s' % (used[a], a)

File docs/tutorial/rst/code/lenses.tab

+age	prescription	astigmatic	tear_rate	lenses
+discrete	discrete	discrete	discrete	discrete
+				class
+young	myope	no	reduced	none
+young	myope	no	normal	soft
+young	myope	yes	reduced	none
+young	myope	yes	normal	hard
+young	hypermetrope	no	reduced	none
+young	hypermetrope	no	normal	soft
+young	hypermetrope	yes	reduced	none
+young	hypermetrope	yes	normal	hard
+pre-presbyopic	myope	no	reduced	none
+pre-presbyopic	myope	no	normal	soft
+pre-presbyopic	myope	yes	reduced	none
+pre-presbyopic	myope	yes	normal	hard
+pre-presbyopic	hypermetrope	no	reduced	none
+pre-presbyopic	hypermetrope	no	normal	soft
+pre-presbyopic	hypermetrope	yes	reduced	none
+pre-presbyopic	hypermetrope	yes	normal	none
+presbyopic	myope	no	reduced	none
+presbyopic	myope	no	normal	none
+presbyopic	myope	yes	reduced	none
+presbyopic	myope	yes	normal	hard
+presbyopic	hypermetrope	no	reduced	none
+presbyopic	hypermetrope	no	normal	soft
+presbyopic	hypermetrope	yes	reduced	none
+presbyopic	hypermetrope	yes	normal	none

File docs/tutorial/rst/code/py-score-features.py

+import Orange
+
+data = Orange.data.Table("promoters")
+gain = Orange.feature.scoring.InfoGain()
+best = [f for _, f in sorted((gain(x, data), x) for x in data.domain.features)[-5:]]
+print "Features:", len(data.domain.features)
+print "Best ones:", ", ".join([x.name for x in best])

File docs/tutorial/rst/code/py-small.py

+import Orange
+
+class SmallLearner(Orange.classification.PyLearner):
+    def __init__(self, base_learner=Orange.classification.bayes.NaiveLearner,
+                 name='small', m=5):
+        self.name = name
+        self.m   = m
+        self.base_learner = base_learner
+
+    def __call__(self, data, weight=None):
+        gain = Orange.feature.scoring.InfoGain()
+        m = min(self.m, len(data.domain.features))
+        best = [f for _, f in sorted((gain(x, data), x) for x in data.domain.features)[-m:]]
+        domain = Orange.data.Domain(best + [data.domain.class_var])
+
+        model = self.base_learner(Orange.data.Table(domain, data), weight)
+        return Orange.classification.PyClassifier(classifier=model, name=self.name)
+
+class OptimizedSmallLearner(Orange.classification.PyLearner):
+    def __init__(self, name="opt_small", ms=range(1,30,3)):
+        self.ms = ms
+        self.name = name
+
+    def __call__(self, data, weight=None):
+        scores = []
+        for m in self.ms:
+            res = Orange.evaluation.testing.cross_validation([SmallLearner(m=m)], data, folds=5)
+            scores.append((Orange.evaluation.scoring.AUC(res)[0], m))
+        _, best_m = max(scores)
+
+        return SmallLearner(data, m=best_m)
+
+data = Orange.data.Table("promoters")
+s_learner = SmallLearner(m=3)
+classifier = s_learner(data)
+print classifier(data[20])
+print classifier(data[20], Orange.classification.Classifier.GetProbabilities)
+
+nbc = Orange.classification.bayes.NaiveLearner(name="nbc")
+s_learner = SmallLearner(m=3)
+o_learner = OptimizedSmallLearner()
+
+learners = [o_learner, s_learner, nbc]
+res = Orange.evaluation.testing.cross_validation(learners, data, folds=10)
+print ", ".join("%s: %.3f" % (l.name, s) for l, s in zip(learners, Orange.evaluation.scoring.AUC(res)))
+

File docs/tutorial/rst/code/regression-cv.py

+import Orange
+
+data = Orange.data.Table("housing.tab")
+
+lin = Orange.regression.linear.LinearRegressionLearner()
+lin.name = "lin"
+earth = Orange.regression.earth.EarthLearner()
+earth.name = "mars"
+tree = Orange.regression.tree.TreeLearner(m_pruning = 2)
+tree.name = "tree"
+
+learners = [lin, earth, tree]
+
+res = Orange.evaluation.testing.cross_validation(learners, data, folds=5)
+mse = Orange.evaluation.scoring.RMSE(res)
+
+print "Learner  RMSE"
+for i in range(len(learners)):
+  print "%-7s %5.2f" % (learners[i].name, mse[i])

File docs/tutorial/rst/code/regression-other.py

+import Orange
+import random
+
+data = Orange.data.Table("housing")
+test = Orange.data.Table(random.sample(data, 5))
+train = Orange.data.Table([d for d in data if d not in test])
+
+lin = Orange.regression.linear.LinearRegressionLearner(train)
+lin.name = "lin"
+earth = Orange.regression.earth.EarthLearner(train)
+earth.name = "mars"
+tree = Orange.regression.tree.TreeLearner(train)
+tree.name = "tree"
+
+models = [lin, earth, tree]
+
+print "y    " + " ".join("%-4s" % l.name for l in models)
+for d in test[:3]:
+    print "%.1f" % (d.get_class()),
+    print " ".join("%4.1f" % model(d) for model in models)

File docs/tutorial/rst/code/regression-tree.py

+import Orange
+
+data = Orange.data.Table("housing.tab")
+tree = Orange.regression.tree.TreeLearner(data, m_pruning=2., min_instances=20)
+print tree.to_string()

File docs/tutorial/rst/code/regression.py

+import Orange
+
+data = Orange.data.Table("housing")
+learner = Orange.regression.linear.LinearRegressionLearner()
+model = learner(data)
+
+print "pred obs"
+for d in data[:3]:
+    print "%.1f %.1f" % (model(d), d.get_class())

File docs/tutorial/rst/conf.py

+# -*- coding: utf-8 -*-
+#
+# tutorial documentation build configuration file, created by
+# sphinx-quickstart on Fri Jul 16 13:29:06 2010.
+#
+# This file is execfile()d with the current directory set to its containing dir.
+#
+# Note that not all possible configuration values are present in this
+# autogenerated file.
+#
+# All configuration values have a default; values that are commented out
+# serve to show the default.
+
+import os, sys
+
+sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..")))
+
+from conf import *
+
+TITLE = "%s v%s" % ("Orange Tutorial", VERSION)
+
+html_title = TITLE
+epub_title = TITLE
+
+latex_documents = [
+    ('index', 'reference.tex', TITLE,
+     AUTHOR, 'manual'),
+    ]
+

File docs/tutorial/rst/data.rst

+The Data
+========
+
+.. index:: data
+
+This section describes how to load and save the data. We also show how to explore the data, its domain description, how to report on basic data set statistics, and how to sample the data.
+
+Data Input
+----------
+
+.. index:: 
+   single: data; input
+
+Orange can read files in its native format and in several others. The native format starts with a line of feature (attribute) names, followed by a line with their types (continuous, discrete, string). The third line contains meta information that identifies the dependent feature (class), irrelevant features (ignore) or meta features (meta). Here are the first few lines from a data set :download:`lenses.tab <code/lenses.tab>` on prescription of eye
+lenses [CJ1987]::
+
+   age       prescription  astigmatic    tear_rate     lenses
+   discrete  discrete      discrete      discrete      discrete 
+                                                       class
+   young     myope         no            reduced       none
+   young     myope         no            normal        soft
+   young     myope         yes           reduced       none
+   young     myope         yes           normal        hard
+   young     hypermetrope  no            reduced       none
+
+
+Values are tab-delimited. The data set has four attributes (age of the patient, spectacle prescription, indication of astigmatism, and information on tear production rate) and an associated three-valued dependent variable encoding lens prescription for the patient (hard contact lenses, soft contact lenses, no lenses). Feature descriptions could use one letter only, so the header of this data set could also read::
+
+   age       prescription  astigmatic    tear_rate     lenses
+   d         d             d             d             d 
+                                                       c
+
+The rest of the table gives the data. Note that there are 5
+instances in our table above (check the original file to see the
+others). Orange is rather liberal about attribute value names,
+so they need not all start with a letter as in our
+example.
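
The three-line header is easy to parse by hand; a small illustrative sketch (plain Python on an inlined snippet of the file, not Orange's actual reader):

```python
# Parsing the three-line header of Orange's native tab-delimited format:
# line 1 gives feature names, line 2 their types, line 3 optional flags
# (e.g. which column is the class). Illustrative only.
sample = (
    "age\tprescription\tastigmatic\ttear_rate\tlenses\n"
    "discrete\tdiscrete\tdiscrete\tdiscrete\tdiscrete\n"
    "\t\t\t\tclass\n"
    "young\tmyope\tno\treduced\tnone\n"
)
lines = sample.splitlines()
names = lines[0].split("\t")
types = lines[1].split("\t")
flags = lines[2].split("\t")
rows = [line.split("\t") for line in lines[3:]]

class_var = names[flags.index("class")]
print(class_var)  # lenses
```

Columns without a flag get an empty string on the third line, which is why that line appears mostly blank in the file.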
+
+You may download :download:`lenses.tab <code/lenses.tab>` to a target directory and open a Python shell there. Alternatively, just execute the code below; this particular data set comes with the Orange installation, and Orange knows where to find it:
+
+    >>> import Orange
+    >>> data = Orange.data.Table("lenses")
+    >>>
+
+Note that no suffix is needed for the file name, as Orange checks whether any file in the current directory is of a readable type. The call to ``Orange.data.Table`` creates an object called ``data`` that holds your data set and information about the lenses domain:
+
+>>> print data.domain.features
+<Orange.feature.Discrete 'age', Orange.feature.Discrete 'prescription', Orange.feature.Discrete 'astigmatic', Orange.feature.Discrete 'tear_rate'>
+>>> print data.domain.class_var
+Orange.feature.Discrete 'lenses'
+>>> for d in data[:3]:
+   ...:     print d
+   ...:
+['young', 'myope', 'no', 'reduced', 'none']
+['young', 'myope', 'no', 'normal', 'soft']
+['young', 'myope', 'yes', 'reduced', 'none']
+>>>
+
+The following script wraps up everything we have done so far and lists the first 5 data instances with ``soft`` prescription:
+
+.. literalinclude:: code/data-lenses.py
+
+Note that ``data`` is an object that holds both the data and information on the domain. We showed above how to access attribute and class names, but there is much more information there, including feature types, sets of values for categorical features, and more.
+
+Saving the Data
+---------------
+
+Data objects can be saved to a file:
+
+>>> data.save("new_data.tab")
+>>>
+
+This time, we have to provide the file extension so that Orange knows which data format to use. The extension for Orange's native data format is ".tab". The following code saves only the data items with myope prescription:
+
+.. literalinclude:: code/data-save.py
+
+Exploration of Data Domain
+--------------------------
+
+.. index::
+   single: data; features
+.. index::
+   single: data; domain
+.. index::
+   single: data; class
+
+A data table object stores information on data instances as well as on the data domain. The domain holds the names and types of the features and the optional class and, for categorical features, the names of their values.
+
+.. literalinclude:: code/data-domain1.py
+
+Orange's objects often behave like Python lists and dictionaries, and can be indexed or accessed through feature names.
+
+.. literalinclude:: code/data-domain2.py
+    :lines: 5-
+
+Data Instances
+--------------
+
+.. index::
+   single: data; instances
+.. index::
+   single: data; examples
+
+A data table stores data instances (or examples). These can be indexed or traversed like any Python list. Data instances can be considered as vectors, accessed through an element index or through a feature name.
+
+.. literalinclude:: code/data-instances1.py
+
+The script above displays the following output::
+
+   First three data instances:
+   [5.1, 3.5, 1.4, 0.2, 'Iris-setosa']
+   [4.9, 3.0, 1.4, 0.2, 'Iris-setosa']
+   [4.7, 3.2, 1.3, 0.2, 'Iris-setosa']
+   25-th data instance:
+   [5.0, 3.4, 1.6, 0.4, 'Iris-setosa']
+   Value of 'sepal width' for the first instance: 3.5
+   The 3rd value of the 25th data instance: 1.6
+
+The iris data set used above has four continuous attributes. Here's a script that computes their means:
+
+.. literalinclude:: code/data-instances2.py
+   :lines: 3-
+
+The script above also illustrates indexing of data instances with objects that store features; in ``d[x]``, the variable ``x`` is an Orange feature object. Here's the output::
+
+   Feature         Mean
+   sepal length    5.84
+   sepal width     3.05
+   petal length    3.76
+   petal width     1.20
+
+
+Slightly more complicated, but more interesting, is the code that computes per-class averages:
+
+.. literalinclude:: code/data-instances3.py
+   :lines: 3-
+
+Of the four features, petal width and length look quite discriminative for the type of iris::
+
+   Feature             Iris-setosa Iris-versicolor  Iris-virginica
+   sepal length               5.01            5.94            6.59
+   sepal width                3.42            2.77            2.97
+   petal length               1.46            4.26            5.55
+   petal width                0.24            1.33            2.03
+
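Without Orange, the same per-class bookkeeping takes only a few lines of plain Python. A sketch on a tiny made-up sample with a single feature (values invented for illustration):

```python
from collections import defaultdict

# Per-class averaging of one feature, on a tiny hand-made sample
# in the iris format (feature value, class label).
rows = [
    (5.1, "Iris-setosa"),
    (4.9, "Iris-setosa"),
    (7.0, "Iris-versicolor"),
    (6.4, "Iris-versicolor"),
]

sums = defaultdict(float)
counts = defaultdict(int)
for value, cls in rows:
    sums[cls] += value
    counts[cls] += 1

means = dict((cls, sums[cls] / counts[cls]) for cls in sums)
for cls in sorted(means):
    print("%-18s %.2f" % (cls, means[cls]))
```
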
+Finally, here is a short script that computes the class distribution for another data set:
+
+.. literalinclude:: code/data-instances4.py
+
+Missing Values
+--------------
+
+.. index::
+   single: data; missing values
+
+Consider the following exploration of senate voting data set::
+
+   >>> data = Orange.data.Table("voting.tab")
+   >>> data[2]
+   ['?', 'y', 'y', '?', 'y', 'y', 'n', 'n', 'n', 'n', 'y', 'n', 'y', 'y', 'n', 'n', 'democrat']
+   >>> data[2][0].is_special()
+   1
+   >>> data[2][1].is_special()
+   0
+
+This particular data instance has missing data (represented with '?') for the first and the fourth feature. The method ``is_special()`` detects which parts of the data are missing. In the original data set file, missing values are, by default, represented with a blank space. Below, we use ``is_special()`` to examine each feature and report the proportion of instances for which its value is undefined:
+
+.. literalinclude:: code/data-missing.py
+
+The first few lines of the script's output are::
+
+    2.8% handicapped-infants
+   11.0% water-project-cost-sharing
+    2.5% adoption-of-the-budget-resolution
+    2.5% physician-fee-freeze
+    3.4% el-salvador-aid
+
+A one-liner that reports the number of data instances with at least one missing value is::
+
+    >>> sum(any(d[x].is_special() for x in data.domain.features) for d in data)
+    203
+
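Both statistics are easy to reproduce without Orange. A plain-Python sketch over rows that use '?' for missing entries (the data here is made up for illustration):

```python
# Per-column proportion of missing values, and the count of rows
# with at least one missing value, over plain lists of strings.
rows = [
    ["?", "y", "n"],
    ["n", "?", "?"],
    ["n", "y", "n"],
]

n = len(rows)
for j in range(len(rows[0])):
    missing = sum(1 for row in rows if row[j] == "?")
    print("%5.1f%% column %d" % (100.0 * missing / n, j))

# rows with at least one missing value (mirrors the one-liner above)
incomplete = sum(any(v == "?" for v in row) for row in rows)
print(incomplete)
```
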
+
+Data Subsetting
+---------------
+
+.. index::
+   single: data; subsetting
+
+``Orange.data.Table`` accepts a list of data items and returns a new data set. This is useful for any data subsetting:
+
+.. literalinclude:: code/data-subsetting.py
+   :lines: 3-
+
+The code outputs::
+
+   Subsetting from 150 to 99 instances.
+
+and inherits the data description (domain) from the original data set. Changing the domain requires setting up a new domain descriptor. This feature is useful for any kind of feature selection:
+
+.. literalinclude:: code/data-featureselection.py
+   :lines: 3-
+
+.. index::
+   single: feature; selection
+
+By default, ``Orange.data.Domain`` assumes that the last feature in the argument feature list is the class variable. This can be changed with an optional argument::
+
+   >>> nd = Orange.data.Domain(data.domain.features[:2], False)
+   >>> print nd.class_var
+   None
+   >>> nd = Orange.data.Domain(data.domain.features[:2], True)
+   >>> print nd.class_var
+   Orange.feature.Continuous 'sepal width'
+
+The first call to ``Orange.data.Domain`` constructed a class-less domain, while the second used the last feature in the list and constructed a domain with one input feature and a continuous class.
+
+**References**
+
+.. [CJ1987] Cendrowska J (1987) PRISM: An algorithm for inducing modular rules, International Journal of Man-Machine Studies, 27, 349-370.

File docs/tutorial/rst/ensembles.rst

+.. index:: ensembles
+
+Ensembles
+=========
+
+`Learning of ensembles <http://en.wikipedia.org/wiki/Ensemble_learning>`_ combines the predictions of separate models to gain in accuracy. The models may come from different training data samples, or may use different learners on the same data sets. Learners may also be diversified by changing their parameter sets.
+
+In Orange, ensembles are simply wrappers around learners. They behave just like any other learner. Given the data, they return models that can predict the outcome for any data instance::
+
+   >>> import Orange
+   >>> data = Orange.data.Table("housing")
+   >>> tree = Orange.classification.tree.TreeLearner()
+   >>> btree = Orange.ensemble.bagging.BaggedLearner(tree)
+   >>> btree
+   BaggedLearner 'Bagging'
+   >>> btree(data)
+   BaggedClassifier 'Bagging'
+   >>> btree(data)(data[0])
+   <orange.Value 'MEDV'='24.6'>
+
+The last line builds a predictor (``btree(data)``) and then uses it on the first data instance.
+
+Most ensemble methods can wrap either classification or regression learners. Exceptions are task-specialized techniques such as boosting.
+
+Bagging and Boosting
+--------------------
+
+.. index:: 
+   single: ensembles; bagging
+
+`Bootstrap aggregating <http://en.wikipedia.org/wiki/Bootstrap_aggregating>`_, or bagging, samples the training data uniformly and with replacement to train different predictors. Majority vote (classification) or mean (regression) across predictions then combines independent predictions into a single prediction. 
+
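Stripped of Orange, the mechanics of bagging fit in a few lines. The sketch below is our own toy code, not Orange's implementation; it uses a deliberately weak majority-class learner just to show the bootstrap-and-vote loop:

```python
import random

# A minimal sketch of bagging: train one model per bootstrap sample
# of the data, then combine the models' predictions by voting.

def majority_learner(data):
    """A deliberately weak learner: always predict the majority class."""
    labels = [y for _, y in data]
    majority = max(sorted(set(labels)), key=labels.count)
    return lambda x: majority

def bagged(base_learner, data, k=11, seed=42):
    rng = random.Random(seed)
    models = []
    for _ in range(k):
        sample = [rng.choice(data) for _ in data]  # bootstrap sample
        models.append(base_learner(sample))
    def classify(x):
        votes = [m(x) for m in models]
        return max(sorted(set(votes)), key=votes.count)  # majority vote
    return classify

data = [(i, "a") for i in range(9)] + [(9, "b")]
model = bagged(majority_learner, data)
print(model(0))
```
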
+.. index:: 
+   single: ensembles; boosting
+
+In general, boosting is a technique that combines weak learners into a single strong learner. Orange implements `AdaBoost <http://en.wikipedia.org/wiki/AdaBoost>`_, which assigns weights to data instances according to the performance of the learner. AdaBoost uses these weights to iteratively resample the instances, focusing on those that are harder to classify. In the aggregation, AdaBoost emphasizes the individual classifiers with better performance on their training sets.
+
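The weight update at the heart of AdaBoost can be sketched directly. This is a schematic of the standard update rule, not Orange's code; instances that the current weak classifier gets wrong are scaled up, the rest down, and weights are renormalized:

```python
import math

# One AdaBoost-style reweighting step. `correct` flags whether the
# current weak classifier got each instance right.
weights = [0.25, 0.25, 0.25, 0.25]
correct = [True, True, True, False]

err = sum(w for w, c in zip(weights, correct) if not c)  # weighted error
alpha = 0.5 * math.log((1 - err) / err)                  # classifier weight

# misclassified instances grow, correct ones shrink, then renormalize
new = [w * math.exp(alpha if not c else -alpha)
       for w, c in zip(weights, correct)]
total = sum(new)
weights = [w / total for w in new]
print(weights)  # the misclassified instance now carries half the weight
```
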
+The following script wraps a classification tree into boosted and bagged learners, and tests all three learners through cross-validation:
+
+.. literalinclude:: code/ensemble-bagging.py
+
+The benefit of the two ensemble techniques, assessed in terms of the area under the ROC curve, is obvious::
+
+    tree: 0.83
+   boost: 0.90
+    bagg: 0.91
+
+Stacking
+--------
+
+.. index:: 
+   single: ensembles; stacking
+
+Consider partitioning the training set into a held-in and a held-out set. Assume that our task is the prediction of y, either the probability of the target class in classification or a real value in regression. We are given a set of learners. We train them on the held-in set and obtain a vector of predictions on the held-out set, where each element of the vector corresponds to the prediction of an individual predictor. We can now learn how to combine these predictions into a target prediction by training a new predictor on the data set of predictions and true values of y from the held-out set. This technique is called `stacked generalization <http://en.wikipedia.org/wiki/Ensemble_learning#Stacking>`_, or stacking for short. Instead of a single split into held-in and held-out data sets, the vectors of predictions are usually obtained through cross-validation.
+
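The construction of meta-features through cross-validation can be sketched without Orange. The base learners below are toy functions of our own, and the combiner simply averages the base predictions; in real stacking the combiner would itself be trained on the ``meta`` table:

```python
# A schematic of stacking on a toy regression problem: build a meta
# data set of cross-validated base predictions, then combine them.

def mean_learner(data):
    avg = sum(y for _, y in data) / float(len(data))
    return lambda x: avg

def nn_learner(data):
    def predict(x):
        return min(data, key=lambda d: abs(d[0] - x))[1]
    return predict

base_learners = [mean_learner, nn_learner]
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9)]

# leave-one-out predictions of the base learners become meta features
meta = []
for i, (x, y) in enumerate(data):
    held_in = data[:i] + data[i + 1:]
    preds = [learner(held_in)(x) for learner in base_learners]
    meta.append((preds, y))

# the simplest possible combiner: average the base predictions
def meta_model(preds):
    return sum(preds) / float(len(preds))

errors = [abs(meta_model(preds) - y) for preds, y in meta]
print(len(meta), round(sum(errors) / len(errors), 3))
```
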
+Orange provides a wrapper for stacking that is given a set of base learners and a meta learner:
+
+.. literalinclude:: code/ensemble-stacking.py
+   :lines: 3-
+
+By default, the meta classifier is a naive Bayesian classifier. Changing it to logistic regression may be a good idea as well::
+
+    stack = Orange.ensemble.stacking.StackedClassificationLearner(base_learners, \
+               meta_learner=Orange.classification.logreg.LogRegLearner)
+
+Stacking is often better than each of the base learners alone, as also demonstrated by running our script::
+
+   stacking: 0.967
+      bayes: 0.933
+       tree: 0.836
+        knn: 0.947
+
+Random Forests
+--------------
+
+.. index:: 
+   single: ensembles; random forests
+
+`Random forest <http://en.wikipedia.org/wiki/Random_forest>`_ is an ensemble of tree predictors. The diversity of trees is achieved through randomization of the feature selection for node split criteria: instead of the best feature, one is picked arbitrarily from a set of the best features. Another source of randomization is the bootstrap sample of the data from which the trees are developed. Predictions from, usually, several hundred trees are aggregated by voting. Constructing that many trees may be computationally demanding. Orange uses a special tree inducer (``Orange.classification.tree.SimpleTreeLearner``, used by default) optimized for speed in random forest construction:
+
+.. literalinclude:: code/ensemble-forest.py
+   :lines: 3-
+
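The feature-subset randomization at each node is the easiest part of a random forest to sketch. In the plain-Python snippet below, ``random_split_feature`` and its toy ``score`` are hypothetical stand-ins of ours, not Orange's tree code:

```python
import random

# At each node, a random forest scores only a random subset of the
# features; `score` stands in for a real split criterion.

def random_split_feature(features, score, k, rng):
    candidates = rng.sample(features, k)   # random subset of features
    return max(candidates, key=score)      # best feature within it

features = ["f%d" % i for i in range(10)]
score = lambda f: int(f[1:])               # toy score: the feature index
rng = random.Random(0)
chosen = random_split_feature(features, score, k=3, rng=rng)
print(chosen)
```
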
+Random forests are often superior when compared to other base classification or regression learners::
+
+   forest: 0.976
+    bayes: 0.935
+      knn: 0.952

File docs/tutorial/rst/files/tree.png

Added
New image

File docs/tutorial/rst/index.rst

+###############
+Orange Tutorial
+###############
+
+This is a gentle introduction to scripting in Orange. Orange is a `Python <http://www.python.org/>`_ library, and the tutorial is a guide through Orange scripting in this language.
+
+We here assume you have already `downloaded and installed Orange <http://orange.biolab.si/download/>`_ and have a working version of Python. Python scripts can run in a terminal window, integrated environments like `PyCharm <http://www.jetbrains.com/pycharm/>`_ and `PythonWin <http://wiki.python.org/moin/PythonWin>`_,
+or shells like `IPython <http://ipython.scipy.org/moin/>`_. Whichever environment you are using, try now to import Orange. Below, we use a Python shell::
+
+   % python
+   >>> import Orange
+   >>> Orange.version.version
+   '2.6a2.dev-a55510d'
+   >>>
+
+If this produces no errors or warnings, Orange and Python are properly
+installed and you are ready to continue with this tutorial.
+
+********
+Contents
+********
+
+.. toctree::
+   :maxdepth: 1
+
+   data.rst
+   classification.rst
+   regression.rst
+   ensembles.rst
+   python-learners.rst
+
+****************
+Index and Search
+****************
+
+* :ref:`genindex`
+* :ref:`search`

File docs/tutorial/rst/python-learners.rst

+Learners in Python
+==================
+
+.. index::
+   single: classifiers; in Python
+
+Orange comes with plenty of classification and regression algorithms. But it is also fun to make new ones. You can build them anew, or wrap existing learners and add preprocessing to construct new variants. Notice that learners in Orange have to adhere to certain rules. Let us observe them on a classification algorithm::
+
+   >>> import Orange
+   >>> data = Orange.data.Table("titanic")
+   >>> learner = Orange.classification.logreg.LogRegLearner()
+   >>> classifier = learner(data)
+   >>> classifier(data[0])
+   <orange.Value 'survived'='no'>
+
+When a learner is given the data, it returns a predictor; in our case, a classifier. Classifiers are passed data instances and return a class value. They can also return a probability distribution, or both the class value and the distribution::
+
+   >>> classifier(data[0], Orange.classification.Classifier.GetProbabilities)
+   <0.593, 0.407>
+   >>> classifier(data[0], Orange.classification.Classifier.GetBoth)
+   (<orange.Value 'survived'='no'>, <0.593, 0.407>)
+
+Regression is similar, except that the regression model returns only the predicted continuous value.
+
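The protocol just described is, in essence, a callable that returns another callable. A plain-Python skeleton of that contract (no Orange machinery, names are ours):

```python
# A skeleton of the learner/classifier protocol: a learner is called
# with data and returns a classifier; a classifier is called with an
# instance and returns a class value.

class MajorityLearner(object):
    def __call__(self, data):
        labels = [y for _, y in data]
        majority = max(sorted(set(labels)), key=labels.count)
        return MajorityClassifier(majority)

class MajorityClassifier(object):
    def __init__(self, majority):
        self.majority = majority
    def __call__(self, instance):
        return self.majority

data = [("a", "no"), ("b", "no"), ("c", "yes")]
learner = MajorityLearner()
classifier = learner(data)
print(classifier("d"))  # prints "no"
```
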
+Notice also that the constructor for the learner can be given the data, and in that case it will construct a classifier (what else could it do?)::
+
+   >>> classifier = Orange.classification.logreg.LogRegLearner(data)
+   >>> classifier(data[42])
+   <orange.Value 'survived'='no'>
+
+Now we are ready to build our own learner. We will do this for a classification problem.
+
+Classifier with Feature Selection
+---------------------------------
+
+Consider naive Bayesian classifiers. They perform well, but can lose accuracy when there are many features, especially when these are correlated. Feature selection can help. We may want to wrap the naive Bayesian classifier with feature subset selection, so that it learns only from the few most informative features. We will assume the data contains only discrete features and will score them with information gain. Here is an example that sets the scorer (``gain``) and uses it to find the best five features of a classification data set:
+
+.. literalinclude:: code/py-score-features.py
+   :lines: 3-
+
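Information gain itself is a short formula: the entropy of the class distribution minus the weighted entropy after splitting on a feature. A plain-Python sketch (not Orange's scorer):

```python
import math

def entropy(labels):
    n = float(len(labels))
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log(p, 2) for p in probs)

def info_gain(pairs):
    """pairs: list of (feature value, class label) tuples."""
    labels = [c for _, c in pairs]
    gain = entropy(labels)
    for v in set(v for v, _ in pairs):
        subset = [c for fv, c in pairs if fv == v]
        gain -= len(subset) / float(len(pairs)) * entropy(subset)
    return gain

# a feature that perfectly separates the classes has maximal gain,
# a feature independent of the class has none
perfect = [("y", "+"), ("y", "+"), ("n", "-"), ("n", "-")]
useless = [("y", "+"), ("y", "-"), ("n", "+"), ("n", "-")]
print(info_gain(perfect), info_gain(useless))
```
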
+We need to incorporate the feature selection within the learner, at the point where it gets the data. Learners for classification tasks inherit from ``Orange.classification.PyLearner``:
+
+.. literalinclude:: code/py-small.py
+   :lines: 3-17
+
+The initialization part of the learner (``__init__``) simply stores the base learner (in our case, a naive Bayesian classifier), the name of the learner, and the number of features we would like to use. Invocation of the learner (``__call__``) scores the features, stores the best ones in a list (``best``), constructs a data domain, and then uses it to transform the data (``Orange.data.Table(domain, data)``) by including only the set of the best features. Besides the most informative features, we also needed to include the class. The learner then returns the classifier by using the generic classifier ``Orange.classification.PyClassifier``, where the actual prediction model is passed through the ``classifier`` argument.
+
+Note that classifiers in Orange can also use a weight vector, which records the importance of training data items. This is useful for several algorithms, like boosting.
+
+Let's check if this works::
+
+   >>> data = Orange.data.Table("promoters")
+   >>> s_learner = SmallLearner(m=3)
+   >>> classifier = s_learner(data)
+   >>> classifier(data[20])
+   <orange.Value 'y'='mm'>
+   >>> classifier(data[20], Orange.classification.Classifier.GetProbabilities)
+   <0.439, 0.561>
+
+It does! We constructed a naive Bayesian classifier that uses only three features. But how do we know what the best number of features is? It's time to construct one more learner.
+
+Estimation of Feature Set Size
+------------------------------
+
+Given training data, what is the best number of features to use with a training algorithm? We can estimate it through cross-validation: we check the possible feature set sizes and estimate how well the classifier behaves on each reduced feature set. When done, we use the feature set size with the best performance and build a classifier on the entire training set. This procedure is often referred to as internal cross-validation. We wrap it into a new learner:
+
+.. literalinclude:: code/py-small.py
+   :lines: 19-31
+
+Again, our code stores the arguments at initialization (``__init__``). The learner invocation part selects the best value of parameter ``m``, the size of the feature set, and uses it to construct the final classifier.
+
+We can now compare the three classification algorithms. That is, the base classifier (naive Bayesian), the classifier with a fixed number of selected features, and the classifier that estimates the optimal number of features from the training set:
+
+.. literalinclude:: code/py-small.py
+   :lines: 39-45
+
+And the result? The classifier with the optimized feature set size wins, though not substantially. The results would be more pronounced had we run this on data sets with a larger number of features::
+
+   opt_small: 0.942, small: 0.937, nbc: 0.933
+

File docs/tutorial/rst/regression.rst

+Regression
+==========
+
+.. index:: regression
+
+From the interface point of view, regression methods in Orange are very similar to classification. Both are intended for supervised data mining and require class-labeled data. Just like in classification, regression is implemented with learners and regression models (regressors). Regression learners are objects that accept data and return regressors. Regression models are given data items and predict the value of the continuous class:
+
+.. literalinclude:: code/regression.py
+
+
+Handful of Regressors
+---------------------
+
+.. index::
+   single: regression; tree
+
+Let us start with regression trees. Below is an example script that builds the tree from data on housing prices and prints out the tree in textual form:
+
+.. literalinclude:: code/regression-tree.py
+   :lines: 3-
+
+The script outputs the tree::
+   
+   RM<=6.941: 19.9
+   RM>6.941
+   |    RM<=7.437
+   |    |    CRIM>7.393: 14.4
+   |    |    CRIM<=7.393
+   |    |    |    DIS<=1.886: 45.7
+   |    |    |    DIS>1.886: 32.7
+   |    RM>7.437
+   |    |    TAX<=534.500: 45.9
+   |    |    TAX>534.500: 21.9
+
+The following script initializes a few other regressors and prints their predictions for the first five data instances of the housing price data set:
+
+.. index::
+   single: regression; mars
+   single: regression; linear
+
+.. literalinclude:: code/regression-other.py
+   :lines: 3-
+
+Looks like the housing prices are not that hard to predict::
+
+   y    lin  mars tree
+   21.4 24.8 23.0 20.1
+   15.7 14.4 19.0 17.3
+   36.5 35.7 35.6 33.8
+
+Cross Validation
+----------------
+
+Just like for classification, the same evaluation module (``Orange.evaluation``) is available for regression. Its ``testing`` submodule includes procedures such as cross-validation and leave-one-out testing, and functions in the ``scoring`` submodule can assess accuracy from the testing results:
+
+.. literalinclude:: code/regression-other.py
+   :lines: 3-
+
+.. index::
+   single: regression; root mean squared error
+
+`MARS <http://en.wikipedia.org/wiki/Multivariate_adaptive_regression_splines>`_ has the lowest root mean squared error::
+
+   Learner  RMSE
+   lin      4.83
+   mars     3.84
+   tree     5.10
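The root mean squared error reported above is straightforward to compute by hand. A plain-Python sketch, illustrated on the three rows of the prediction table shown earlier (the scores above are, of course, computed over the entire data set):

```python
import math

def rmse(truth, predictions):
    """Root mean squared error between two equal-length sequences."""
    se = [(t - p) ** 2 for t, p in zip(truth, predictions)]
    return math.sqrt(sum(se) / float(len(se)))

# the y column vs. the linear model's column from the table above
print(round(rmse([21.4, 15.7, 36.5], [24.8, 14.4, 35.7]), 2))  # 2.15
```
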
+