Machine Learning Workshop
:author: Ragib Morshed <>, Miki Tebeka <>
:backend: slidy
:max-width: 45em

The Required Humor


What is the difference between statistics, machine learning, AI and data mining?

* If there are up to 3 variables, it is statistics.
* If the problem is NP-complete, it is machine learning.
* If the problem is PSPACE-complete, it is AI.
* If you don't know what PSPACE-complete is, it is data mining.

Machine Learning Overview
* What is Machine Learning (ML)?
* Supervised vs Unsupervised Learning.
* Regression, Classification, and Clustering.
* Quick note on cross validation.
* +Why is ML trending?+
** +Large, complex systems with tons of data make it difficult to write
everything by hand.+
** +Demand for content personalization.+
** +Very applied.+

What is Machine Learning (ML)?
* Algorithms that improve their performance at some task with experience.
** experience === data
** task === classification, filtering, prediction, ...
* Tom Mitchell's definition:
A computer program is said to learn from experience E
with respect to some class of tasks T and performance measure P,
if its performance at tasks in T, as measured by P, improves
with experience E
* Key Issues
** Training and test data.
** Performance measure.
* Machine learning in action
** Document classification
** Spam filtering
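
To make Mitchell's definition concrete, here is a minimal sketch of the spam filtering task using scikit-learn. The messages are made-up toy data, and the choice of `CountVectorizer` with `MultinomialNB` is an assumption for illustration, not something the slides prescribe. Task T is spam filtering, experience E is the labeled training messages, and performance measure P is accuracy on held-out test messages.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

# Experience E: toy labeled messages (made up for illustration).
train_messages = [
    "win money now",
    "free prize money",
    "claim your free prize",
    "meeting at noon",
    "lunch tomorrow at noon",
    "project meeting notes",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Held-out test data: never shown to the model during training.
test_messages = ["free money", "meeting tomorrow"]
test_labels = ["spam", "ham"]

# Turn each message into a vector of word counts.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_messages)
X_test = vectorizer.transform(test_messages)

# Task T: classify messages as spam or ham.
model = MultinomialNB()
model.fit(X_train, train_labels)
predicted = model.predict(X_test)

# Performance measure P: fraction of test messages labeled correctly.
accuracy = accuracy_score(test_labels, predicted)
print(list(predicted), accuracy)
```

Keeping the test messages out of training is what makes the accuracy a fair estimate; scoring on the training messages would only show how well the model memorized them.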

Supervised vs Unsupervised Learning
* Supervised Learning
** Learning with a teacher. Labeled data.
** Examples: Classification, Regression
* Unsupervised Learning
** Learning without a teacher. No labeled data.
** Examples: Clustering, Dimensionality reduction.
* Other categories
** Semi-supervised learning.
** Active learning.
** Reinforcement Learning.

Regression
* Predict a continuous variable based on some features.
** link:[Linear Regression].
** link:[Support Vector Machines (SVM)].
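
As a minimal sketch of regression, the following fits scikit-learn's `LinearRegression` to synthetic data. The data-generating line `y = 3x + 2` and the noise level are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration): y = 3x + 2 plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 3.0 * x + 2.0 + rng.normal(0.0, 0.5, size=x.shape)

# scikit-learn expects a 2-D feature matrix: one row per sample.
X = x.reshape(-1, 1)

model = LinearRegression()
model.fit(X, y)

# The fitted slope and intercept should land close to 3 and 2.
slope = model.coef_[0]
intercept = model.intercept_

# Predict the continuous target for an unseen feature value.
prediction = model.predict(np.array([[12.0]]))[0]
print(slope, intercept, prediction)
```

Support vector regression (`sklearn.svm.SVR`) follows the same `fit`/`predict` pattern, so it can be swapped in for `LinearRegression` here.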

Classification
* Predict a discrete variable based on some features.
** link:[Decision Trees].
** link:[Random Forests].
** link:[Naive Bayes].
** link:[Logistic Regression].
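
The four classifiers listed above can be compared with a short sketch like the one below. It assumes a recent scikit-learn release (where `cross_val_score` lives in `sklearn.model_selection`) and uses the bundled iris dataset; the hyperparameters are illustrative defaults, not tuned values. It also shows the cross validation idea from the overview: each model is trained and scored on 5 different train/test splits and the scores are averaged.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Features (flower measurements) and discrete labels (species).
X, y = load_iris(return_X_y=True)

classifiers = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "naive Bayes": GaussianNB(),
    "logistic regression": LogisticRegression(max_iter=1000),
}

# 5-fold cross validation: mean accuracy over 5 held-out folds.
scores = {}
for name, clf in classifiers.items():
    fold_scores = cross_val_score(clf, X, y, cv=5)
    scores[name] = fold_scores.mean()
    print(f"{name}: {scores[name]:.3f}")
```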

Clustering
* Group related features/data points together based on some "measure" of similarity
or dissimilarity.
** link:[K Means clustering].
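
A minimal K-means sketch follows, assuming synthetic two-dimensional data with three well-separated groups (the blob centers and spread are made up for illustration). The similarity measure here is Euclidean distance to the nearest cluster center, and no labels are given to the algorithm.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic points around three assumed centers; true_blob is kept
# only to check the result, it is never passed to KMeans.
centers = [[0, 0], [10, 10], [0, 10]]
X, true_blob = make_blobs(
    n_samples=150, centers=centers, cluster_std=0.5, random_state=0
)

# Ask for 3 clusters; n_init=10 reruns with different starting
# centers and keeps the best result.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# One row per discovered cluster center.
print(kmeans.cluster_centers_)
```

The cluster numbers that K-means assigns are arbitrary, so cluster 0 need not correspond to the first blob; only the grouping is meaningful.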

scikit-learn
* Part of link:[SciPy]
* link:[Many machine learning algorithms]
** Both supervised and unsupervised
* Provides a 
  link:[Lego like]
* Oh, and it's fast ...
** K-means clustering on a sparse 8493 x 1005686 matrix (8,541,291,198 entries) took 4.5 min
** 1M vector multiplication done in about 2.5 msec
* Comes with 
  link:[several datasets]
  to play with
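
One way to read "Lego like" is that every estimator exposes the same `fit`/`predict`/`score` methods, so pieces snap together. The sketch below is an illustration under that assumption: it chains a scaler and a classifier with `make_pipeline` and tries them on the bundled wine dataset. The choice of dataset, scaler, and classifier is arbitrary.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# One of the datasets that ships with scikit-learn.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Two building blocks joined into one estimator: scale, then classify.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipeline.fit(X_train, y_train)

# The pipeline itself behaves like any other estimator.
test_accuracy = pipeline.score(X_test, y_test)
print(X.shape, test_accuracy)
```

Swapping `LogisticRegression` for another classifier changes one line and nothing else, which is the practical payoff of the shared interface.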

Open your link:http://localhost:8888[IPython notebooks!]

ipython notebook --pylab=inline

* link:[Random Forest]

* link:[Coursera Machine Learning course]
* link:[scikit-learn]
* link:[scikit-learn tutorial]
* link:[NumPy]
* link:[Matplotlib]
* link:[IPython]
* link:[SciPy]


This presentation was made with
link:[asciidoc] using the
link:[slidy] backend and
link:[Pygments] syntax highlighter.

Thank You

// vim: ft=asciidoc spell