Machine Learning Workshop
=========================
:author: Ragib Morshed <rmorshed@adconion.com>, Miki Tebeka <miki@adconion.com>
:backend: slidy
:max-width: 45em
:data-uri:
:icons:

The Required Humor
------------------
image:dilbert.gif[]

''''

What is the difference between statistics, machine learning, AI and data mining?

* If there are up to 3 variables, it is statistics.
* If the problem is NP-complete, it is machine learning.
* If the problem is PSPACE-complete, it is AI.
* If you don't know what PSPACE-complete is, it is data mining.


Machine Learning Overview
-------------------------
* What is Machine Learning (ML)?
* Supervised vs Unsupervised Learning.
* Regression, Classification, and Clustering.
* Quick note on cross validation.
=====================================================================
* +Why is ML trending?+
** +Large, complex systems with tons of data are difficult to program
entirely by hand.+
** +Demand for content personalization.+
** +Very applied.+
=====================================================================

What is Machine Learning (ML)?
------------------------------
* Algorithms that improve their performance at some task with experience.
** experience === data
** task === classification, filtering, prediction, ...
* Tom Mitchell's definition:
A computer program is said to learn from experience E
with respect to some class of tasks T and performance measure P,
if its performance at tasks in T, as measured by P, improves
with experience E.
* Key Issues
** Training and test data.
** Performance measure.
* Machine learning in action
** Document classification
** Spam filtering
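
The two key issues above (training/test data and a performance measure) can be
sketched as follows. This is a minimal illustration, not the workshop's own
code: the iris dataset, the naive Bayes model, the 30% hold-out and a recent
scikit-learn (with +sklearn.model_selection+) are all assumptions.

[source,python]
---------------------------------------------------
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Experience E: labeled examples from the bundled iris dataset (assumed here).
X, y = load_iris(return_X_y=True)

# Hold out 30% of the data; the model never sees it during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Task T: classify flowers into species.
model = GaussianNB().fit(X_train, y_train)

# Performance measure P: accuracy on the held-out test data.
accuracy = accuracy_score(y_test, model.predict(X_test))
print("test accuracy: %.2f" % accuracy)
---------------------------------------------------

Scoring on data the model was trained on would overstate P, which is why the
test set is kept apart.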


Supervised vs Unsupervised Learning
------------------------------------
* Supervised Learning
** Learning with a teacher. Labeled data.
** Examples: Classification, Regression
* Unsupervised Learning
** Learning without a teacher. No labeled data.
** Examples: Clustering, Dimensionality reduction.
* Other categories
** Semi-supervised learning.
** Active learning.
** Reinforcement Learning.

Regression
-----------
* Predict a continuous variable based on some features.
** link:http://en.wikipedia.org/wiki/Linear_regression[Linear Regression].
** link:http://en.wikipedia.org/wiki/Support_vector_machine[Support Vector Machines (SVM)].
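
A minimal sketch of predicting a continuous variable with linear regression.
The synthetic data (a noisy line with slope 3 and intercept 2) is an
assumption made purely for illustration.

[source,python]
---------------------------------------------------
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumed for illustration): y = 3x + 2 plus Gaussian noise.
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(scale=0.5, size=100)

model = LinearRegression().fit(X, y)

# The fitted slope and intercept should land close to 3 and 2.
print("slope: %.2f, intercept: %.2f" % (model.coef_[0], model.intercept_))

# Predict the continuous target for a new feature value.
prediction = model.predict([[4.0]])[0]
print("prediction at x=4: %.2f" % prediction)
---------------------------------------------------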

Classification
--------------
* Predict a discrete variable based on some features.
** link:http://en.wikipedia.org/wiki/Decision_tree_learning[Decision Trees].
** link:http://en.wikipedia.org/wiki/Random_forests[Random Forests].
** link:http://en.wikipedia.org/wiki/Naive_bayes[Naive Bayes].
** link:http://en.wikipedia.org/wiki/Logistic_regression[Logistic Regression].
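
The four classifiers listed above share the same +fit+/+predict+ interface, so
they can be compared in one loop. The sketch below scores each with 5-fold
cross validation; the iris dataset and the hyperparameters are assumptions, and
it targets a recent scikit-learn release.

[source,python]
---------------------------------------------------
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

classifiers = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "naive Bayes": GaussianNB(),
    "logistic regression": LogisticRegression(max_iter=1000),
}

# 5-fold cross validation: train on 4/5 of the data, score on the rest,
# and rotate so that every sample is used for testing exactly once.
scores = {}
for name, clf in classifiers.items():
    scores[name] = cross_val_score(clf, X, y, cv=5).mean()
    print("%-20s %.3f" % (name, scores[name]))
---------------------------------------------------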

Clustering
----------
* Group related features/data points together based on some "measure" of similarity
or dissimilarity.
** link:http://en.wikipedia.org/wiki/K-means_clustering[K Means clustering].
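
A minimal K-means sketch. The two synthetic blobs are an assumption for
illustration, and the similarity measure is Euclidean distance, which is what
K-means minimizes within each cluster.

[source,python]
---------------------------------------------------
import numpy as np
from sklearn.cluster import KMeans

# Two synthetic, well separated blobs (assumed for illustration).
rng = np.random.RandomState(0)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])

# No labels are passed to fit(): the grouping comes from the data alone.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

labels = kmeans.labels_
print("cluster sizes:", np.bincount(labels))
print("cluster centers:")
print(kmeans.cluster_centers_)
---------------------------------------------------

The number of clusters has to be chosen up front; here it is known to be 2
only because the data was generated that way.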

image:scikit-learn-logo.png[]
-----------------------------
* A link:http://scipy.org/[SciPy] toolkit ("scikit"), built on NumPy and SciPy
* link:http://scikit-learn.org/stable/modules/classes.html[Many machine learning algorithms]
** Both supervised and unsupervised
* Provides a 
  link:http://scikit-learn.org/stable/tutorial/statistical_inference/putting_together.html[Lego like]
  framework
* Oh, and it's fast ...
** K-means clustering on a sparse 8493 x 1005686 matrix (8,541,291,198 cells) took 4.5 min
** Multiplication of 1M-element vectors done in about 2.5 msec
* Comes with 
  link:http://scikit-learn.org/stable/modules/classes.html#module-sklearn.datasets[several datasets]
  to play with
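
The "Lego like" idea can be sketched with +Pipeline+, which chains transformers
and a final estimator into one object. The step names, the digits dataset and
the choice of 20 PCA components are assumptions for illustration.

[source,python]
---------------------------------------------------
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each step is an interchangeable brick: swap PCA for another transformer
# or LogisticRegression for another classifier without touching the rest.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("reduce", PCA(n_components=20)),
    ("classify", LogisticRegression(max_iter=1000)),
])

# fit() runs every step in order; score() reports accuracy on unseen data.
pipeline.fit(X_train, y_train)
score = pipeline.score(X_test, y_test)
print("test accuracy: %.2f" % score)
---------------------------------------------------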


Workshop
--------
Open your link:http://localhost:8888[IPython notebooks!]

[source,bash,numbered]
---------------------------------------------------
ipython notebook --pylab=inline
---------------------------------------------------

* http://nbviewer.ipython.org/url/dl.dropbox.com/u/706094/ml-workshop/01-Numpy.ipynb[NumPy]
* http://nbviewer.ipython.org/url/dl.dropbox.com/u/706094/ml-workshop/02-Regression.ipynb[Regression]
* http://nbviewer.ipython.org/url/dl.dropbox.com/u/706094/ml-workshop/03-Classification.ipynb[Classification]
* http://nbviewer.ipython.org/url/dl.dropbox.com/u/706094/ml-workshop/04-RandomForest.ipynb[Random Forest]
* http://nbviewer.ipython.org/url/dl.dropbox.com/u/706094/ml-workshop/05-Clustering.ipynb[Clustering]
* http://nbviewer.ipython.org/url/dl.dropbox.com/u/706094/ml-workshop/06-Pipeline.ipynb[Pipeline]


References
----------
* link:https://www.coursera.org/course/ml[Coursera Machine Learning course]
* link:http://scikit-learn.org/stable/index.html[scikit-learn]
* link:http://astroml.github.com/sklearn_tutorial/[scikit-learn tutorial]
* link:http://numpy.scipy.org/[NumPy]
* link:http://matplotlib.sourceforge.net/[Matplotlib]
* link:http://ipython.org/[IPython]
* link:http://scipy.org/[SciPy]


'''

This presentation was made with
link:http://www.methods.co.nz/asciidoc/[asciidoc] using the
link:http://www.w3.org/Talks/Tools/Slidy2/Overview.html[slidy] backend and
link:http://pygments.org/[Pygments] syntax highlighter.

Thank You
---------
image:scikit-learn-logo.png[]

// vim: ft=asciidoc spell