# Commits

committed 756432e

Completed first draft of lecture 16

• Parent commits 7f196c8
• Branches default

# File docs/mlclass-notes/_sources/index.txt

    You can adapt this file completely to your liking, but it should at least
    contain the root toctree directive.

-Welcome to My Machine Learning class notes's documentation!
-===========================================================
+Machine Learning class notes
+============================

 Contents:


# File docs/mlclass-notes/_sources/lecture16.txt


 Lecture 16: Anomaly Detection
 =============================
-
 Video 16-1: Problem Motivation
 ------------------------------


 Question: Is the new engine anomalous, or should it merely receive further testing? The two possibilities are shown in the following graph.

-.. image:: images/aircraft_engines.png
+.. image:: images/lecture16/aircraft_engines.png


 * Assumption is that the dataset provided is non-anomalous or normal

 In the following example, the closer a point is to the inner circle, the higher the likelihood of it being non-anomalous. On the other hand, for a point which lies far out (e.g. the *x* near the bottom of the image), the likelihood of the engine being anomalous is high.

-.. image:: images/aircraft_engines_2.png
+.. image:: images/lecture16/aircraft_engines_2.png

-Anomaly detection example
-+++++++++++++++++++++++++
+Video 16-2: Anomaly detection example
+-------------------------------------

 One of the most frequent usages of anomaly detection is that of fraud detection.


    p(x;\mu,\sigma^2) = \frac{1}{\sqrt{2 \pi}\  \sigma} exp\left(- \frac{(x - \mu)^2}{2 \sigma^2}\right)

-.. image:: images/gaussian_probability_distribution.png
+.. image:: images/lecture16/gaussian_probability_distribution.png
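
As a quick sanity check of this formula, here is a small NumPy sketch (my own illustration, not part of the lecture) evaluating the density and confirming that its total area is 1:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma2):
    """p(x; mu, sigma^2), exactly as in the formula above."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / (np.sqrt(2 * np.pi) * np.sqrt(sigma2))

# The density peaks at x = mu, and its area over the real line is 1.
xs = np.linspace(-10.0, 10.0, 10001)
area = (gaussian_pdf(xs, 0.0, 2.0) * (xs[1] - xs[0])).sum()  # Riemann sum ~ 1
```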

 The impact of varying :math:`\mu` and :math:`\sigma^2` on the distribution function is shown in the image below.

-.. image:: images/gaussian_distribution_example.png
+.. image:: images/lecture16/gaussian_distribution_example.png

 The equation for computing the mean is:

 Let's say we have a

 * Training set : :math:`{x^{(1)}, ..., x^{(m)}}`, and
-* each example is :math:\left(x \epsilon \mathbb{R}^n\right)
+* each example is :math:`\left(x \epsilon \mathbb{R}^n\right)`, i.e. has *n* features.

--- End of File -- Lecture not fully documented yet --
+Assume that each feature follows a gaussian probability distribution, i.e. :math:`x_1 \sim N(\mu_1,\sigma_1^2)`, :math:`x_2 \sim N(\mu_2,\sigma_2^2)`, and so on.

+The computed probability is thus

+.. math::
+
+   p(x) = p(x_1;\mu_1,\sigma_1^2) *  p(x_2;\mu_2,\sigma_2^2) * ... *  p(x_n;\mu_n,\sigma_n^2)
+
+.. note::
+
+   Although the above formula is, strictly speaking, the joint probability of independent variables, in practice it works quite well even when the features are not independent.
+
+The above expression could be summarised as
+
+.. math::
+
+   p(x) = \prod_{j=1}^n p(x_j;\mu_j,\sigma_j^2)
+
+.. note::
+
+   The symbol :math:`\prod_{j=1}^n` is similar to :math:`\sum_{j=1}^n`, except that it computes the product of all the values in the series rather than adding them up.
+
+This computation of the probability :math:`p(x)` is often referred to as *Density Estimation*.
+
+The anomaly detection algorithm is then:
+
+1. Choose features :math:`x_i` that you think might be indicative of anomalous examples. In particular, choose those for which unusually high or unusually low values of :math:`x_i` might indicate an anomaly.
+
+2. Fit parameters :math:`\mu_1, ..., \mu_n, \sigma_1^2, ..., \sigma_n^2` using
+
+.. math::
+
+   \mu_j = \frac{1}{m} \sum_{i=1}^m x_j^{(i)}
+
+.. math::
+
+   \sigma_j^2 = \frac{1}{m} \sum_{i=1}^m (x_j^{(i)} - \mu_j)^2
+
+The corresponding vectorised implementations are :math:`\mu = \frac{1}{m} \sum_{i=1}^m x^{(i)}` and :math:`\sigma^2 = \frac{1}{m} \sum_{i=1}^m (x^{(i)} - \mu)^2`, where :math:`x \epsilon \mathbb{R}^n`.
+
+3. Given a new example :math:`x`, compute :math:`p(x)`
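
The steps above can be sketched in NumPy as follows. This is my own minimal illustration, not code from the lecture; the function names (`fit_gaussian_params`, `density`) and the synthetic data are made up:

```python
import numpy as np

def fit_gaussian_params(X):
    """Fit per-feature mu_j and sigma_j^2 from an (m, n) training matrix.

    Vectorised form of the summation formulas above (note the 1/m
    normalisation, not 1/(m-1))."""
    mu = X.mean(axis=0)
    sigma2 = ((X - mu) ** 2).mean(axis=0)
    return mu, sigma2

def density(X, mu, sigma2):
    """p(x) = prod_j p(x_j; mu_j, sigma_j^2) for each row of X."""
    per_feature = np.exp(-(X - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return per_feature.prod(axis=1)

# Synthetic training set: 1000 examples of n = 2 well-behaved features.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=[5.0, 10.0], scale=[1.0, 2.0], size=(1000, 2))

mu, sigma2 = fit_gaussian_params(X_train)
p_normal = density(np.array([[5.0, 10.0]]), mu, sigma2)[0]   # near the mean
p_outlier = density(np.array([[0.0, 25.0]]), mu, sigma2)[0]  # far away
```

A far-out point receives a vastly smaller :math:`p(x)` than one near the centre of the training data, which is what the threshold test below exploits.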
+
+Anomaly detection example
++++++++++++++++++++++++++
+
+Anomaly if :math:`p(x) < \epsilon`
+
+.. image:: images/lecture16/anomaly_detection_example.png
+
+In the above example, :math:`x_1` and :math:`x_2` are two different features. The graphs on the right show their gaussian distribution curves, which are different from each other.
+
+At the top left is the plot of the known combinations of :math:`x_1` and :math:`x_2` (which, of course, were used to compute the necessary :math:`\mu` and :math:`\sigma^2` values).
+
+The figure at the bottom left shows the effective probability of occurrence of particular combinations of :math:`x_1` and :math:`x_2`. Any point for which the height of the surface at its :math:`x_1` and :math:`x_2` values is very low can be viewed as likely anomalous.
+
+Video 16-4: Developing and evaluating an anomaly detection system
+-----------------------------------------------------------------
+
+* An important prerequisite for developing an anomaly detection system is having a way of evaluating it. This makes it possible to decide later whether specific feature additions or removals actually help or hurt the system.
+
+* The starting point would be some labeled data, the label on each example indicating whether it is anomalous or non-anomalous.
+
+* The training set should consist of the non-anomalous subset of the data referred to above, i.e. :math:`x^{(1)},x^{(2)}, ..., x^{(m)}`. Ideally this data should *not* contain any anomalous data points; however, if a few of them do seep through, that is probably not a big deal.
+
+* On the other hand, both the cross validation set :math:`(x_{cv}^{(1)},y_{cv}^{(1)}), ..., (x_{cv}^{(m_{cv})},y_{cv}^{(m_{cv})})` and the test set :math:`(x_{test}^{(1)},y_{test}^{(1)}), ..., (x_{test}^{(m_{test})},y_{test}^{(m_{test})})` should contain some elements which are known to be anomalous.
+
+Example : Aircraft Engines
+++++++++++++++++++++++++++
+
+Let us consider a scenario where we have data for 10,000 good/normal engines and 20 flawed/anomalous engines. A reasonable split of this data would be:
+
+* Training set: 6000 good engines (unlabeled since all are considered good)
+* Cross validation set: 2000 good engines, 10 anomalous
+* Test set : 2000 good engines, 10 anomalous
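
The split above can be expressed as a short sketch (my own illustration; the stand-in data is random and purely hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
good = rng.normal(size=(10000, 2))         # stand-ins for the 10,000 good engines
bad = rng.normal(5.0, 1.0, size=(20, 2))   # stand-ins for the 20 anomalous ones

train = good[:6000]                        # unlabeled; all assumed good
cv_X = np.vstack([good[6000:8000], bad[:10]])
cv_y = np.r_[np.zeros(2000), np.ones(10)]  # y = 1 marks an anomaly
test_X = np.vstack([good[8000:], bad[10:]])
test_y = np.r_[np.zeros(2000), np.ones(10)]
```

Note that the 10 anomalous engines in the cross validation set are disjoint from the 10 in the test set.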
+
+Use the training set to compute :math:`\mu_1, \sigma_1^2, \mu_2, \sigma_2^2,...` and thus the density estimation as well, i.e. fit the model :math:`p(x)`.
+
+Now, on a cross validation/test example :math:`x`, predict
+
+.. math::
+
+   y = \left\{\begin{matrix}1 & \text{if } p(x) < \epsilon \text{ (anomaly)} \\ 0 & \text{if } p(x) \geq \epsilon \text{ (normal)}\end{matrix}\right.
+
+Note: :math:`y` above is a prediction, which you can now contrast with the actual labels in your cross validation set. Note that the data is extremely skewed, i.e. the number of normal points is substantially greater than the number of anomalous ones, so classification accuracy would not be a good evaluation metric. Instead, computing the following might be useful:
+
+* % of True/False +ves/-ves, or
+* Precision / Recall
+* :math:`F_1` score
+
+One could apply different values of :math:`\epsilon` on the cross validation set, and choose the value that maximises the :math:`F_1` score. Finally, apply the selected value on the test set, and recompute the metrics above.
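
Selecting :math:`\epsilon` by maximising :math:`F_1` on the cross validation set could look like the following NumPy sketch (my own illustration; the tiny CV set and the 1000-step sweep are made up):

```python
import numpy as np

def f1_score(y_true, y_pred):
    """Precision/recall based F1 for the rare 'anomaly = 1' class."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def select_epsilon(p_cv, y_cv):
    """Sweep candidate epsilons over the range of p(x) on the CV set
    and keep the one with the best F1 score."""
    best_eps, best_f1 = 0.0, 0.0
    for eps in np.linspace(p_cv.min(), p_cv.max(), 1000):
        f1 = f1_score(y_cv, (p_cv < eps).astype(int))
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1

# Hypothetical CV densities: the two anomalies have much lower p(x).
p_cv = np.array([0.08, 0.09, 0.07, 0.10, 0.0001, 0.0002])
y_cv = np.array([0, 0, 0, 0, 1, 1])
eps, f1 = select_epsilon(p_cv, y_cv)
```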
+
+Video 16-5: Anomaly Detection vs. Supervised Learning
+-----------------------------------------------------
+
+If there is labeled data, i.e. examples labeled anomalous or normal, why don't we just use techniques for supervised learning?
+
+* Usually anomaly detection is likely to be useful in scenarios where there is a very small number of positive (i.e. anomalous, or :math:`y = 1`) examples.
+* Anomaly detection might be more useful where it is hard for an algorithm to learn from positive examples what the anomalies look like (this also covers situations where future anomalies may look nothing like the anomalies we have seen so far).
+* Candidate uses for anomaly detection: fraud detection, manufacturing defects, monitoring machines in a data center.
+* Candidate uses for supervised learning: email spam classification, weather prediction (sunny / rainy etc.), cancer classification.
+
+.. note::
+
+   If you are a large online retailer and you have data about a large number of users who have been identified to commit fraud, then fraud detection in such a context might shift to a supervised learning approach rather than an anomaly detection one.
+
+Video 16-6: Choosing what features to use
+-----------------------------------------
+
+Non-gaussian features
++++++++++++++++++++++
+
+It is useful to plot a histogram of each feature to get a sense of whether it is gaussian. In many situations, even if a feature does not show a gaussian distribution, it may still be OK to go ahead assuming it is. However, some features turn out to be substantially non-gaussian. In such cases it can be useful to find a transformation that processes the feature into a gaussian one, e.g. using :math:`log(x)` instead of :math:`x`. Other options could be :math:`log(x_2 + c)`, :math:`\sqrt{x_3}`, etc.
+
+.. image:: images/lecture16/transformation_to_gaussian_distribution_1.png
+
+*Above: Transformation of a non-gaussian to a gaussian distribution using log(x)*
+
+.. image:: images/lecture16/transformation_to_gaussian_distribution_2.png
+
+*Above: Transformation using x_new = x^0.05*
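
A small NumPy sketch (my own illustration) of why the log transform helps: sample skewness, which is near zero for gaussian-shaped data, drops dramatically after the transform:

```python
import numpy as np

def skewness(x):
    """Third standardised moment; close to 0 for a gaussian-shaped histogram."""
    z = (x - x.mean()) / x.std()
    return (z ** 3).mean()

# A heavily right-skewed (clearly non-gaussian) feature...
rng = np.random.default_rng(1)
x = rng.lognormal(mean=0.0, sigma=1.0, size=5000)
# ...becomes gaussian after the log(x) transform mentioned above.
x_log = np.log(x)
```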
+
+Error Analysis
+++++++++++++++
+
+How does one come up with features appropriate for anomaly detection?
+
+* Try to study the features by applying them on the cross validation set.
+* You might find situations where :math:`p(x)` is high for anomalous examples as well. Say you find an example where :math:`p(x)` is high for a clearly anomalous situation. Study that particular example to identify an additional feature that would lead to this situation getting flagged as an anomaly.
+
+.. image:: images/lecture16/identifying_a_new_feature.png
+
+*Above: identifying a new feature by looking at anomalies with a high p(x)*
+
+* Also prefer features that might take on unusually large or small values in the event of an anomaly. Imagine that memory use, disk accesses, CPU load and network traffic are the features being monitored for computers in a data center, and that anomalies are more likely to occur when a computer gets stuck in a long while loop, in which case the CPU load is likely to be quite high and the network traffic quite low. This is a candidate case for creating yet another feature: the ratio of CPU load to network traffic (or perhaps even the square of CPU load to network traffic). This will help you spot anomalies which are based on an unusual combination of features.
+
+Video 16-7: Multivariate Gaussian distribution
+----------------------------------------------
+
+Sometimes the features have some correlation with each other.
+
+.. image:: images/lecture16/gaussian_distribution_of_correlated_features.png
+
+You can see the positive, seemingly linear correlation between the two features :math:`x_1` and :math:`x_2`. Yet the algorithm above largely assumed these features to be independent. This creates a difficulty, as shown in the diagram below.
+
+.. image:: images/lecture16/gaussian_distribution_of_correlated_features_2.png
+
+The contours of the probability function computed from independent gaussian variables are similar to the magenta circles drawn above. Yet a casual look can convince us that the contours need to be more along the lines of the blue ellipse. Thus the point marked in green should ideally get flagged as an anomaly, but given the seemingly circular contours, in this case it will not. Enter the multivariate gaussian distribution.
+
+So for :math:`x \epsilon \mathbb{R}^n`, do not model :math:`p(x_1), p(x_2),...` separately, assuming them to be independent variables; model :math:`p(x)` in one go. The parameters to such a computation are :math:`\mu \epsilon \mathbb{R}^n` and :math:`\Sigma \epsilon \mathbb{R}^{n \times n}`. Note that we have now introduced the covariance matrix :math:`\Sigma` in place of :math:`\sigma^2`, which was just a vector of per-feature variances.
+
+In such a case the probability function is computed as follows:
+
+.. math::
+
+   p(x;\mu,\Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} exp(-\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x - \mu))
+
+where :math:`|\Sigma|` is the determinant of :math:`\Sigma`.
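
The formula can be sketched directly in NumPy (my own illustration; `multivariate_gaussian` is a made-up name). With a diagonal :math:`\Sigma`, i.e. no correlation, it reduces to the product of independent univariate densities used so far:

```python
import numpy as np

def multivariate_gaussian(X, mu, Sigma):
    """p(x; mu, Sigma) for each row of X, following the formula above."""
    n = mu.shape[0]
    diff = X - mu
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))
    # Row-wise quadratic form (x - mu)^T Sigma^{-1} (x - mu).
    expo = -0.5 * np.sum(diff @ np.linalg.inv(Sigma) * diff, axis=1)
    return np.exp(expo) / norm

# Diagonal Sigma: two uncorrelated features with variances 1 and 4.
mu = np.array([0.0, 0.0])
Sigma = np.diag([1.0, 4.0])
X = np.array([[0.0, 0.0], [1.0, 2.0]])
p = multivariate_gaussian(X, mu, Sigma)
```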
+
+.. image:: images/lecture16/multivariate_gaussian_distribution_1.png
+
+In the figure above it can be seen that by changing the values on the diagonal of the covariance matrix simultaneously, the contours can be made either broader or narrower.
+
+.. image:: images/lecture16/multivariate_gaussian_distribution_2.png
+
+In this figure it can be seen that by independently changing the values on the diagonal of the covariance matrix, the contour profiles can be made to be elliptical along the horizontal and vertical axes.
+
+.. image:: images/lecture16/multivariate_gaussian_distribution_3.png
+
+The above figure shows that by changing the values of :math:`\mu`, the overall position of the contour profile can be moved.
+
+.. image:: images/lecture16/multivariate_gaussian_distribution_4.png
+
+Finally, the image above shows that by changing the values of the covariance matrix off the diagonal, the contour profile changes to an elliptical shape along arbitrary axes. In fact the rightmost profile is probably closest to the one that we started with. Setting the off-diagonal elements of the covariance matrix to non-zero values is an admission that these features are correlated and not independently variable.
+
+So the multivariate gaussian distribution should help us model the situation to better fit the actual behaviour of the two features :math:`x_1` and :math:`x_2` that we started out with. Thus, by using the modified probability function above, we can better predict anomalies when the features show some correlation with each other.
+
+
+Video 16-8: Anomaly detection using the multivariate gaussian distribution
+--------------------------------------------------------------------------
+
+To compute the probability function using a multivariate gaussian distribution, the following could be used.
+
+.. math::
+
+   p(x;\mu,\Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} exp(-\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x - \mu))
+
+We start by computing :math:`\mu` and :math:`\Sigma` as follows:
+
+.. math::
+
+   \mu = \frac{1}{m} \sum_{i=1}^m x^{(i)}
+
+.. math::
+
+   \Sigma = \frac{1}{m} \sum_{i=1}^m (x^{(i)} - \mu)(x^{(i)} - \mu)^T
+
+
+Then compute :math:`p(x)`, and flag an anomaly if :math:`p(x) < \epsilon`.
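
The full recipe (fit :math:`\mu` and :math:`\Sigma`, then threshold :math:`p(x)`) can be sketched as follows. This is my own illustration with made-up correlated toy data, echoing the earlier two-feature example:

```python
import numpy as np

def fit_multivariate(X):
    """mu and Sigma estimated with the formulas above (1/m, not 1/(m-1))."""
    m = X.shape[0]
    mu = X.mean(axis=0)
    diff = X - mu
    Sigma = diff.T @ diff / m
    return mu, Sigma

def flag_anomalies(X, mu, Sigma, epsilon):
    """True for rows with p(x; mu, Sigma) < epsilon."""
    n = mu.shape[0]
    diff = X - mu
    expo = -0.5 * np.sum(diff @ np.linalg.inv(Sigma) * diff, axis=1)
    p = np.exp(expo) / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma)))
    return p < epsilon

# Strongly correlated training data: x2 tracks x1 closely.
rng = np.random.default_rng(2)
x1 = rng.normal(0, 1, 2000)
x2 = x1 + rng.normal(0, 0.1, 2000)
X_train = np.column_stack([x1, x2])

mu, Sigma = fit_multivariate(X_train)
# (2, -2) is anomalous as a *combination*, even though each value alone is mild.
flags = flag_anomalies(np.array([[2.0, 2.0], [2.0, -2.0]]), mu, Sigma, 1e-4)
```

The point lying along the correlation, (2, 2), is not flagged, while (2, -2), which breaks it, is.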
+
+Relationship to the original model
+++++++++++++++++++++++++++++++++++
+
+It turns out that the gaussian distribution is simply a special case of the multivariate gaussian distribution, with the constraint that all the off-diagonal elements of the covariance matrix are set to zero.
+
+However, the original gaussian model still tends to be used more frequently than its multivariate cousin, given that the former is computationally cheaper and can even deal with situations where :math:`m` (the training set size) is small, or even less than :math:`n` (the number of features). If :math:`m \leq n`, the multivariate gaussian distribution cannot be used, since :math:`\Sigma` is non-invertible. It is generally preferred to have :math:`m \geq 10 n`.
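
The non-invertibility claim is easy to verify numerically; a small sketch (my own, with arbitrary m = 3 and n = 5):

```python
import numpy as np

# With m <= n the estimated covariance matrix is rank-deficient
# (rank at most m - 1 after centering), hence singular and non-invertible.
rng = np.random.default_rng(3)
m, n = 3, 5                        # fewer examples than features
X = rng.normal(size=(m, n))
diff = X - X.mean(axis=0)
Sigma = diff.T @ diff / m          # (n, n) matrix, but rank <= m - 1 < n
rank = np.linalg.matrix_rank(Sigma)
```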
+
+Quite often (as in the case above of the two correlated features), it might still be preferable to model the correlations by creating new features, e.g. :math:`x_1 - x_2` or :math:`\frac{x_1}{x_2}`, and using the original gaussian model rather than the multivariate gaussian, because of the latter's additional computational complexity, or because :math:`m` is not substantially larger than :math:`n`.

# File docs/mlclass-notes/genindex.html

   <head>
     <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

-    <title>Index &mdash; My Machine Learning class notes v1.0.0 documentation</title>
+    <title>Index &mdash; Machine Learning class notes v1.0.0 documentation</title>
     <link rel="stylesheet" href="_static/haiku.css" type="text/css" />
     <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
     <link rel="stylesheet" href="_static/print.css" type="text/css" />
     <script type="text/javascript" src="_static/underscore.js"></script>
     <script type="text/javascript" src="_static/doctools.js"></script>
     <script type="text/javascript" src="_static/theme_extras.js"></script>
-    <link rel="top" title="My Machine Learning class notes v1.0.0 documentation" href="index.html" />
+    <link rel="top" title="Machine Learning class notes v1.0.0 documentation" href="index.html" />
   </head>
   <body>
       <div class="header"><h1 class="heading"><a href="index.html">
-          <span>My Machine Learning class notes v1.0.0 documentation</span></a></h1>
+          <span>Machine Learning class notes v1.0.0 documentation</span></a></h1>
         <h2 class="heading"><span>Index</span></h2>
       </div>
       <div class="topnav">

# File docs/mlclass-notes/index.html

   <head>
     <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

-    <title>Welcome to My Machine Learning class notes’s documentation! &mdash; My Machine Learning class notes v1.0.0 documentation</title>
+    <title>Machine Learning class notes &mdash; Machine Learning class notes v1.0.0 documentation</title>
     <link rel="stylesheet" href="_static/haiku.css" type="text/css" />
     <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
     <link rel="stylesheet" href="_static/print.css" type="text/css" />
     <script type="text/javascript" src="_static/underscore.js"></script>
     <script type="text/javascript" src="_static/doctools.js"></script>
     <script type="text/javascript" src="_static/theme_extras.js"></script>
-    <link rel="top" title="My Machine Learning class notes v1.0.0 documentation" href="#" />
+    <link rel="top" title="Machine Learning class notes v1.0.0 documentation" href="#" />
     <link rel="next" title="Lecture 16: Anomaly Detection" href="lecture16.html" />
   </head>
   <body>
       <div class="header"><h1 class="heading"><a href="#">
-          <span>My Machine Learning class notes v1.0.0 documentation</span></a></h1>
-        <h2 class="heading"><span>Welcome to My Machine Learning class notes’s documentation!</span></h2>
+          <span>Machine Learning class notes v1.0.0 documentation</span></a></h1>
+        <h2 class="heading"><span>Machine Learning class notes</span></h2>
       </div>
       <div class="topnav">

       <div class="content">


-  <div class="section" id="welcome-to-my-machine-learning-class-notes-s-documentation">
-<h1>Welcome to My Machine Learning class notes&#8217;s documentation!<a class="headerlink" href="#welcome-to-my-machine-learning-class-notes-s-documentation" title="Permalink to this headline">¶</a></h1>
+  <div class="section" id="machine-learning-class-notes">
+<h1>Machine Learning class notes<a class="headerlink" href="#machine-learning-class-notes" title="Permalink to this headline">¶</a></h1>
 <p>Contents:</p>
 <div class="toctree-wrapper compound">
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="lecture16.html">Lecture 16: Anomaly Detection</a><ul>
 <li class="toctree-l2"><a class="reference internal" href="lecture16.html#video-16-1-problem-motivation">Video 16-1: Problem Motivation</a></li>
+<li class="toctree-l2"><a class="reference internal" href="lecture16.html#video-16-2-anomaly-detection-example">Video 16-2: Anomaly detection example</a></li>
 <li class="toctree-l2"><a class="reference internal" href="lecture16.html#video-16-2-gaussian-distribution">Video 16-2: Gaussian Distribution</a></li>
 <li class="toctree-l2"><a class="reference internal" href="lecture16.html#video-16-3-anomaly-detection-algorithm">Video 16-3: Anomaly detection algorithm</a></li>
+<li class="toctree-l2"><a class="reference internal" href="lecture16.html#video-16-4-developing-and-evaluating-an-anomaly-detection-system">Video 16-4: Developing and evaluating an anomaly detection system</a></li>
+<li class="toctree-l2"><a class="reference internal" href="lecture16.html#video-16-5-anomaly-detection-vs-supervised-learning">Video 16-5: Anomaly Detection vs. Supervised Learning</a></li>
+<li class="toctree-l2"><a class="reference internal" href="lecture16.html#video-16-6-choosing-what-features-to-use">Video 16-6: Choosing what features to use</a></li>
+<li class="toctree-l2"><a class="reference internal" href="lecture16.html#video-16-7-multivariate-gaussian-distribution">Video 16-7: Multivariate Gaussian distribution</a></li>
+<li class="toctree-l2"><a class="reference internal" href="lecture16.html#video-16-8-anomaly-detection-using-the-multivariate-gaussian-distribution">Video 16-8: Anomaly detection using the multivariate gaussian distribution</a></li>
 </ul>
 </li>
 </ul>

# File docs/mlclass-notes/lecture16.html

   <head>
     <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

-    <title>Lecture 16: Anomaly Detection &mdash; My Machine Learning class notes v1.0.0 documentation</title>
+    <title>Lecture 16: Anomaly Detection &mdash; Machine Learning class notes v1.0.0 documentation</title>
     <link rel="stylesheet" href="_static/haiku.css" type="text/css" />
     <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
     <link rel="stylesheet" href="_static/print.css" type="text/css" />
     <script type="text/javascript" src="_static/underscore.js"></script>
     <script type="text/javascript" src="_static/doctools.js"></script>
     <script type="text/javascript" src="_static/theme_extras.js"></script>
-    <link rel="top" title="My Machine Learning class notes v1.0.0 documentation" href="index.html" />
-    <link rel="prev" title="Welcome to My Machine Learning class notes’s documentation!" href="index.html" />
+    <link rel="top" title="Machine Learning class notes v1.0.0 documentation" href="index.html" />
+    <link rel="prev" title="Machine Learning class notes" href="index.html" />
   </head>
   <body>
       <div class="header"><h1 class="heading"><a href="index.html">
-          <span>My Machine Learning class notes v1.0.0 documentation</span></a></h1>
+          <span>Machine Learning class notes v1.0.0 documentation</span></a></h1>
         <h2 class="heading"><span>Lecture 16: Anomaly Detection</span></h2>
       </div>
       <div class="topnav">

         <p>
-        «&#160;&#160;<a href="index.html">Welcome to My Machine Learning class notes&#8217;s documentation!</a>
+        «&#160;&#160;<a href="index.html">Machine Learning class notes</a>
         &#160;&#160;::&#160;&#160;
         <a class="uplink" href="index.html">Contents</a>
         </p>
 <p>In the following example, the closer the point is to the inner circle, the higher is the likelihood of it being non-anomalous. On the other hand in the point which is far out (eg. the <em>x</em> near the bottom of the image), the likelihood of the engine being anomalous is high.</p>
 <img alt="_images/aircraft_engines_2.png" src="_images/aircraft_engines_2.png" />
 </div>
-<div class="section" id="anomaly-detection-example">
-<h3>Anomaly detection example<a class="headerlink" href="#anomaly-detection-example" title="Permalink to this headline">¶</a></h3>
+</div>
+<div class="section" id="video-16-2-anomaly-detection-example">
+<h2>Video 16-2: Anomaly detection example<a class="headerlink" href="#video-16-2-anomaly-detection-example" title="Permalink to this headline">¶</a></h2>
 <p>One of the most frequent usages of anomaly detection is that of fraud detection.</p>
 <ul class="simple">
 <li><img class="math" src="_images/math/ccb122c9db21d2f7e393bf3482a8a8339e8559e4.png" alt="x^{(i)}"/> = features of user i&#8217;s activities</li>
 </ul>
 <p>Identify machines that are likely to fail and flag off for attention.</p>
 </div>
-</div>
 <div class="section" id="video-16-2-gaussian-distribution">
 <h2>Video 16-2: Gaussian Distribution<a class="headerlink" href="#video-16-2-gaussian-distribution" title="Permalink to this headline">¶</a></h2>
 <div class="admonition note">
 <p>Lets say we have a</p>
 <ul class="simple">
 <li>Training set : <img class="math" src="_images/math/292602c20f8b9532c22cbb1e307e501b71d6133f.png" alt="{x^{(1)}, ..., x^{(m)}}"/>, and</li>
-<li>each example is <img class="math" src="_images/math/bd361fdd36a8faac093f9381fe0ae6bd63018076.png" alt="\left(x \epsilon \mathbb{R}^n\right)"/></li>
+<li>each example is <img class="math" src="_images/math/bd361fdd36a8faac093f9381fe0ae6bd63018076.png" alt="\left(x \epsilon \mathbb{R}^n\right)"/> ie. has <em>n</em> features.</li>
 </ul>
-<p>&#8211; End of File &#8211; Lecture not fully documented yet &#8211;</p>
+<p>Assume that each feature is distributed as per gaussian probability distribution. ie. <img class="math" src="_images/math/ccada11db7b2b90693e2fac4f887a57fce6f96bf.png" alt="x_1"/> ~ <img class="math" src="_images/math/a8979a050202c057200d7782967be6b57c3dc76a.png" alt="N(\mu_1,\sigma_1^2)"/> and <img class="math" src="_images/math/6a7d010bbff66a0c41e43310a51efbaa6bf63396.png" alt="x_2"/> ~ <img class="math" src="_images/math/cc738f96d54057ec67e21b2b016a93bafc431a9d.png" alt="N(\mu_2,\sigma_2^2)"/> and so on...</p>
+<p>The computed probability is thus</p>
+<div class="math">
+<p><img src="_images/math/d19fed368badd5b80be06781a692aa4918101b20.png" alt="p(x) = p(x_1;\mu_1,\sigma_1^2) *  p(x_2;\mu_2,\sigma_2^2) * ... *  p(x_n;\mu_n,\sigma_n^2)" /></p>
+</div><div class="admonition note">
+<p class="first admonition-title">Note</p>
+<p class="last">Even if the above formula is that for computing probability for independent variables, in practice it works quite well even if the features are not independent.</p>
+</div>
+<p>The above expression could be summarised as</p>
+<div class="math">
+<p><img src="_images/math/1b1dc4863312e09adc0f8455d00dadf132e602c5.png" alt="p(x) = \prod_{j=1}^n p(x_j;\mu_j,\sigma_j^2)" /></p>
+</div><div class="admonition note">
+<p class="first admonition-title">Note</p>
+<p class="last">The symbol <img class="math" src="_images/math/1ce5edce2fa716ab2273b2e1f8f3baf1bafeb2b0.png" alt="\prod_{j=1}^n"/> is similar to <img class="math" src="_images/math/24632172b6dcaaf2c10ca285a2579de3752ef20b.png" alt="\sum_{j=1}^n"/> except that it computes the product of all the values in the series rather than adding them up.</p>
+</div>
+<p>This computation of the probability <img class="math" src="_images/math/2751d79d3bbfb34440d68c685fe6ba7414951749.png" alt="p(x)"/> is often referred to as <em>Density Estimation</em>.</p>
+<ol class="arabic simple">
+<li>Choose features <img class="math" src="_images/math/67bc6daa9d6b964201d6cef60cbeb1ac5fd26ead.png" alt="x_i"/> that you think might be indicative of anomalous examples. Especially choose those for whom either unusually high or unusually low values of <img class="math" src="_images/math/67bc6daa9d6b964201d6cef60cbeb1ac5fd26ead.png" alt="x_i"/> might be indicative of existence of anomalies.</li>
+</ol>
+<ol class="arabic simple">
+<li>Fit parameters <img class="math" src="_images/math/f1f81f5b9aaf3528a3c70e36d3a55852001aeb2f.png" alt="\mu_1, ..., \mu_n, \sigma_1^2, ..., \sigma_n^2"/> using</li>
+</ol>
+<div class="math">
+<p><img src="_images/math/f8f15bb2d539a6bea8efebad938fbf3dd596acd1.png" alt="\mu_j = \frac{1}{m} \sum_{i=1}^m x_j^{(i)}" /></p>
+</div><div class="math">
+<p><img src="_images/math/6f69e0f05cad975087f7f82cca9023ff0a9a8db2.png" alt="\sigma_j^2 = \frac{1}{m} \sum_{i=1}^m (x_j^{(i)} - \mu_j)^2" /></p>
+</div><p>The corresponding vectorised implementations is <img class="math" src="_images/math/bcc79e2cc8ef3a02b2ac47562065557be394c44f.png" alt="\mu = \frac{1}{m} \sum_{i=1}^m x^{(i)}"/> and <img class="math" src="_images/math/dd7eca5ef31f422dc196a71f1a6e92590476ea84.png" alt="\sigma^2 = \frac{1}{m} \sum_{i=1}^m (x^{(i)} - \mu)^2"/> where <img class="math" src="_images/math/cf5aee440ad95fcea57dc7ceeb987da04f303bed.png" alt="x E \mathbb{R}^n"/></p>
+<ol class="arabic simple">
+<li>Given new example <img class="math" src="_images/math/26eeb5258ca5099acf8fe96b2a1049c48c89a5e6.png" alt="x"/>, compute <img class="math" src="_images/math/2751d79d3bbfb34440d68c685fe6ba7414951749.png" alt="p(x)"/></li>
+</ol>
+</div>
+<div class="section" id="anomaly-detection-example">
+<h3>Anomaly detection example<a class="headerlink" href="#anomaly-detection-example" title="Permalink to this headline">¶</a></h3>
+<p>Anomaly if <img class="math" src="_images/math/eee9890a2b68316c4489fd4b81b1f6b9dc430807.png" alt="p(x) &lt; \epsilon"/></p>
+<img alt="_images/anomaly_detection_example.png" src="_images/anomaly_detection_example.png" />
+<p>In the above example, <img class="math" src="_images/math/ccada11db7b2b90693e2fac4f887a57fce6f96bf.png" alt="x_1"/> and <img class="math" src="_images/math/6a7d010bbff66a0c41e43310a51efbaa6bf63396.png" alt="x_2"/> are two different features. The graphs on the right show their gaussian distribution curves, which are different from each other.</p>
+<p>At the top left is the plot of the known combinations of <img class="math" src="_images/math/ccada11db7b2b90693e2fac4f887a57fce6f96bf.png" alt="x_1"/> and <img class="math" src="_images/math/6a7d010bbff66a0c41e43310a51efbaa6bf63396.png" alt="x_2"/> (which of course was used to compute the necessary <img class="math" src="_images/math/2d8c833ed800824727cd7bd2fb9de1a12ad7e674.png" alt="\mu"/> and <img class="math" src="_images/math/741fb9098efcb98055f467f87630a5d0ca599b6b.png" alt="\sigma^2"/> values.</p>
+<p>The figure at the bottom left shows the effective probability of occurrence of particular combinations of <img class="math" src="_images/math/ccada11db7b2b90693e2fac4f887a57fce6f96bf.png" alt="x_1"/> and <img class="math" src="_images/math/6a7d010bbff66a0c41e43310a51efbaa6bf63396.png" alt="x_2"/>. Thus any points in this graph where the height of the point on the surface matching the particular point values of <img class="math" src="_images/math/ccada11db7b2b90693e2fac4f887a57fce6f96bf.png" alt="x_1"/> and <img class="math" src="_images/math/6a7d010bbff66a0c41e43310a51efbaa6bf63396.png" alt="x_2"/>  is very low, can be viewed as likely anomalous.</p>
+</div>
+</div>
+<div class="section" id="video-16-4-developing-and-evaluating-an-anomaly-detection-system">
+<h2>Video 16-4: Developing and evaluating an anomaly detection system<a class="headerlink" href="#video-16-4-developing-and-evaluating-an-anomaly-detection-system" title="Permalink to this headline">¶</a></h2>
+<ul class="simple">
+<li>One of the important aspects of being able to develop an anomaly detection system is being able to first have a way of evaluating the anomaly detection system. This can help decide later whether specific feature additions or removals are actually helping or hurting the anomaly detection system.</li>
+<li>The starting point would be some labeled data of anomalous and non-anomalous data (labels being whether the particular case is anomalous or non-anomalous).</li>
+<li>The training set should consist of the non-anomalous subset of the data referred to above, i.e. these would be <img class="math" src="_images/math/8e78ca9aed5b929b06ed8c2a6f8e3824299b114b.png" alt="x^{(1)},x^{(2)}, ..., x^{(m)},"/> Ideally this data should <em>not</em> contain the anomalous data points. However if a few of them do seep through that&#8217;s probably not such a big deal.</li>
+<li>On the other hand both the cross validation set <img class="math" src="_images/math/4bed66985fe6ca121e26803e52f9e477c3a708c9.png" alt="(x_{cv}^{(1)},y_{cv}^{(1)}), ..., (x_{cv}^{(m_{cv})},y_{cv}^{(m_{cv})})"/> and the test set <img class="math" src="_images/math/1623c2a72712d8cabce2923cb0ee063edbb26683.png" alt="(x_{test}^{(1)},y_{test}^{(1)}), ..., (x_{test}^{(m_{test})},y_{test}^{(m_{test})})"/> should contain some elements which are known to be anomalous.</li>
+</ul>
+<div class="section" id="example-aircraft-engines">
+<h3>Example : Aircraft Engines<a class="headerlink" href="#example-aircraft-engines" title="Permalink to this headline">¶</a></h3>
+<p>Let us consider a scenario where we have data for 10,000 good/normal engines and 20 flawed/anomalous engines. A reasonable way to split this data up would be as follows:</p>
+<ul class="simple">
+<li>Training set: 6000 good engines (unlabeled since all are considered good)</li>
+<li>Cross validation set: 2000 good engines, 10 anomalous</li>
+<li>Test set: 2000 good engines, 10 anomalous</li>
+</ul>
+<p>Use the training set to compute <img class="math" src="_images/math/c5a0b04f1d5caad73b444f56f3154abe7995e87a.png" alt="\mu_1, \sigma_1^2, \mu_2, \sigma_2^2,..."/> and thus the density estimation as well, ie. fit the model <img class="math" src="_images/math/2751d79d3bbfb34440d68c685fe6ba7414951749.png" alt="p(x)"/>.</p>
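<p>The fitting step above can be sketched in Python with NumPy (the course itself uses Octave, so this is an illustrative translation, not the official implementation). The training examples are assumed to be rows of an array, one column per feature:</p>

```python
import numpy as np

def fit_gaussian(X):
    """Estimate per-feature mean and variance from the (normal-only) training set.

    X is an (m, n) array: m training examples, n features."""
    mu = X.mean(axis=0)
    sigma2 = X.var(axis=0)
    return mu, sigma2

def density(X, mu, sigma2):
    """p(x) = product over features of the univariate gaussian pdf."""
    coeff = 1.0 / np.sqrt(2 * np.pi * sigma2)
    exponent = -((X - mu) ** 2) / (2 * sigma2)
    return np.prod(coeff * np.exp(exponent), axis=1)
```

The density of a point near the estimated means will be much higher than that of a point far away, which is exactly what the threshold check below exploits.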
+<p>Now on the cross validation/test example <img class="math" src="_images/math/26eeb5258ca5099acf8fe96b2a1049c48c89a5e6.png" alt="x"/>, predict,</p>
+<div class="math">
+<p><img src="_images/math/e5434fe22b3c88b1254c4544ed1311e12c686dc6.png" alt="y = \left\{\begin{matrix}1 &amp; if p(x) &lt; \epsilon (anomaly) \\0 &amp; if p(x) \geq \epsilon (normal)\end{matrix}\right." /></p>
+</div><p>Note: <img class="math" src="_images/math/092e364e1d9d19ad5fffb0b46ef4cc7f2da02c1c.png" alt="y"/> above is a prediction. You can now contrast it with the actual labels in your cross validation set. Note that the data is extremely skewed, ie. the number of normal points is substantially greater than the number of anomalous ones. Thus classification accuracy would not be a good evaluation metric. Instead, computing the following might be useful.</p>
+<ul class="simple">
+<li>% of true/false positives/negatives, or</li>
+<li>Precision / Recall</li>
+<li><img class="math" src="_images/math/3a2731abbb0e2524327bba81e355d6d886e5a6cb.png" alt="F_1"/> score</li>
+</ul>
+<p>One could attempt to apply different values of <img class="math" src="_images/math/eaf4418fbe935c15a606516d8f55dc380cd8e822.png" alt="\epsilon"/> on the cross validation set, and choose the value that maximises the <img class="math" src="_images/math/3a2731abbb0e2524327bba81e355d6d886e5a6cb.png" alt="F_1"/> score. Finally apply the selected value on the test set, and recompute the metrics above.</p>
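<p>The threshold search described above can be sketched as follows, assuming the densities of the cross validation examples and their labels (1 = anomalous, 0 = normal) are already available as NumPy arrays; the candidate grid of 1000 values is an arbitrary choice for illustration:</p>

```python
import numpy as np

def select_epsilon(p_cv, y_cv):
    """Scan candidate thresholds, keep the one maximising F1 on the CV set."""
    best_eps, best_f1 = 0.0, 0.0
    for eps in np.linspace(p_cv.min(), p_cv.max(), 1000):
        pred = (p_cv < eps).astype(int)          # 1 = predicted anomaly
        tp = np.sum((pred == 1) & (y_cv == 1))
        fp = np.sum((pred == 1) & (y_cv == 0))
        fn = np.sum((pred == 0) & (y_cv == 1))
        if tp == 0:
            continue                             # F1 undefined / zero
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1
```

The selected epsilon would then be applied once to the test set to report the final metrics.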
+</div>
+</div>
+<div class="section" id="video-16-5-anomaly-detection-vs-supervised-learning">
+<h2>Video 16-5: Anomaly Detection vs. Supervised Learning<a class="headerlink" href="#video-16-5-anomaly-detection-vs-supervised-learning" title="Permalink to this headline">¶</a></h2>
+<p>If there is labeled data ie. labeled anomalous or normal, why don&#8217;t we just use techniques for supervised learning?</p>
+<ul class="simple">
+<li>Usually anomaly detection is likely to be useful in scenarios where there is a very small number of positive (ie. anomalous, or <img class="math" src="_images/math/092e364e1d9d19ad5fffb0b46ef4cc7f2da02c1c.png" alt="y"/> = 1) examples.</li>
+<li>Anomaly detection might be more useful where it is hard for an algorithm to learn from the positive examples what anomalies look like (this also covers situations where future anomalies may look nothing like the anomalies seen so far).</li>
+<li>Candidate uses for anomaly detection: fraud detection, manufacturing defects, monitoring machines in a data center.</li>
+<li>Candidate uses for supervised learning: email spam classification, weather prediction (sunny/rainy etc.), cancer classification.</li>
+</ul>
+<div class="admonition note">
+<p class="first admonition-title">Note</p>
+<p class="last">If you are a large online retailer and you have data about a large number of users who have been identified to commit fraud, then fraud detection in such a context might shift to a supervised learning approach rather than an anomaly detection one.</p>
+</div>
+</div>
+<div class="section" id="video-16-6-choosing-what-features-to-use">
+<h2>Video 16-6: Choosing what features to use<a class="headerlink" href="#video-16-6-choosing-what-features-to-use" title="Permalink to this headline">¶</a></h2>
+<div class="section" id="non-gaussian-features">
+<h3>Non-gaussian features<a class="headerlink" href="#non-gaussian-features" title="Permalink to this headline">¶</a></h3>
+<p>It would be useful to plot a histogram of each feature to get a sense of whether it is gaussian. Even if a feature does not show a gaussian distribution, it still might be Ok to go ahead assuming it is so. However, sometimes a feature shows itself to be substantially non-gaussian. In such a situation it might be useful to figure out a transformation that turns it into a gaussian feature, eg. using <img class="math" src="_images/math/f7186728532ac1d4f26412326b2a417917ae0a87.png" alt="log(x)"/> instead of <img class="math" src="_images/math/26eeb5258ca5099acf8fe96b2a1049c48c89a5e6.png" alt="x"/>. Other options could be <img class="math" src="_images/math/00f1a8afbaec432d9e837124fedd8aef433101d9.png" alt="log(x2 + c)"/>, <img class="math" src="_images/math/c1f2062eb155dc8db26af11d63d78574dcf45bcd.png" alt="\sqrt{x_3}"/>, etc.</p>
+<img alt="_images/transformation_to_gaussian_distribution_1.png" src="_images/transformation_to_gaussian_distribution_1.png" />
+<p><em>Above: Transformation of a non-gaussian to a gaussian distribution using log(x)</em></p>
+<img alt="_images/transformation_to_gaussian_distribution_2.png" src="_images/transformation_to_gaussian_distribution_2.png" />
+<p><em>Above: Transformation using xNew = x^0.05</em></p>
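<p>The transformations discussed above can be sketched as follows; the exponential feature here is a hypothetical stand-in for whatever skewed feature a real histogram check would reveal:</p>

```python
import numpy as np

# Hypothetical right-skewed (clearly non-gaussian) feature values.
x = np.random.default_rng(1).exponential(scale=2.0, size=1000)

x_log   = np.log(x + 1)    # log(x + c), with c = 1 guarding against log(0)
x_sqrt  = np.sqrt(x)
x_power = x ** 0.05        # the xNew = x^0.05 transform shown in the figure
```

In practice one would re-plot the histogram of each transformed version and keep the one that looks closest to a bell curve.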
+</div>
+<div class="section" id="error-analysis">
+<h3>Error Analysis<a class="headerlink" href="#error-analysis" title="Permalink to this headline">¶</a></h3>
+<p>How does one come up with features appropriate for anomaly detection?</p>
+<ul class="simple">
+<li>Try to study the features by applying them on the cross validation set.</li>
+<li>You might find situations where <img class="math" src="_images/math/2751d79d3bbfb34440d68c685fe6ba7414951749.png" alt="p(x)"/> is high for anomalous examples as well. Say you find an example where <img class="math" src="_images/math/2751d79d3bbfb34440d68c685fe6ba7414951749.png" alt="p(x)"/> is high even though the situation is clearly anomalous. Study that particular example to identify perhaps an additional feature that would lead to this particular situation getting flagged as an anomaly.</li>
+</ul>
+<img alt="_images/identifying_a_new_feature.png" src="_images/identifying_a_new_feature.png" />
+<p><em>Above: identifying a new feature by looking at anomalies with a high p(x)</em></p>
+<ul class="simple">
+<li>Also prefer to choose features that might take on unusually large or small values in the event of an anomaly. Let us imagine memory use, disk accesses, CPU load and network traffic are the features being looked at for monitoring computers in a data center. Let&#8217;s imagine that anomalies are more likely to occur when a computer gets stuck in a long while loop, in which case the CPU load is likely to be quite high and the network traffic quite low. This is a candidate case for the identification of yet another feature: the ratio of CPU load to network traffic (or perhaps even the square of the CPU load to network traffic). This will help you spot anomalies which are based on unusual combinations of features.</li>
+</ul>
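<p>The data center example above can be sketched with hypothetical metric values (the third machine is the one stuck in a loop):</p>

```python
import numpy as np

# Hypothetical monitoring metrics for three machines in a data center.
cpu_load        = np.array([0.30, 0.40, 0.95])
network_traffic = np.array([0.50, 0.60, 0.02])

# A machine stuck in a while loop shows high CPU load but little traffic,
# so the ratio becomes unusually large even though neither raw feature
# is extreme enough on its own to be flagged.
x_new = cpu_load / network_traffic
```

For the values above, the ratio is below 1 for the two healthy machines but far larger for the stuck one.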
+</div>
+</div>
+<div class="section" id="video-16-7-multivariate-gaussian-distribution">
+<h2>Video 16-7: Multivariate Gaussian distribution<a class="headerlink" href="#video-16-7-multivariate-gaussian-distribution" title="Permalink to this headline">¶</a></h2>
+<p>Sometimes the features have some correlation with each other.</p>
+<img alt="_images/gaussian_distribution_of_correlated_features.png" src="_images/gaussian_distribution_of_correlated_features.png" />
+<p>You can see the positive seemingly linear correlation between the two features <img class="math" src="_images/math/ccada11db7b2b90693e2fac4f887a57fce6f96bf.png" alt="x_1"/> and <img class="math" src="_images/math/6a7d010bbff66a0c41e43310a51efbaa6bf63396.png" alt="x_2"/>. Yet the algorithm above largely assumed these features to be independent. This creates a difficulty as shown in the diagram below.</p>
+<img alt="_images/gaussian_distribution_of_correlated_features_2.png" src="_images/gaussian_distribution_of_correlated_features_2.png" />
+<p>The contours of the probability function computed from independent gaussian variables are similar to the magenta circles drawn above. Yet a casual look can convince us that the contours need to be more along the lines of the blue curve. Thus if you focus on the point marked in green, it should ideally get flagged as an anomaly, but given the seemingly circular contours, in this case it will not. This is where the multivariate gaussian distribution comes in.</p>
+<p>So for <img class="math" src="_images/math/44b124c13c28c6db6c44874940682f0d4f43dfa4.png" alt="x \epsilon \mathbb{R}^n"/>, do not model <img class="math" src="_images/math/0ae85bd46c9ac5337c1e179a5c3e47437e921e01.png" alt="p(x_1), p(x_2),..."/> separately assuming them to be independent variables. Model <img class="math" src="_images/math/2751d79d3bbfb34440d68c685fe6ba7414951749.png" alt="p(x)"/> in one go. The parameters to such computations here are <img class="math" src="_images/math/77726b61e69ce6e6e5c5766183ba454e894bd784.png" alt="\mu \epsilon \mathbb{R}^n"/> and <img class="math" src="_images/math/5220bb9025475f3f04418a25a04f40828994a31d.png" alt="\Sigma \epsilon \mathbb{R}^{nxn}"/>. Note that we have now introduced <img class="math" src="_images/math/c8f77e3035db5fe9a4975967750ac1a6454bda8c.png" alt="\Sigma"/> which is the covariance matrix instead of <img class="math" src="_images/math/741fb9098efcb98055f467f87630a5d0ca599b6b.png" alt="\sigma^2"/> which was just a vector.</p>
+<p>In such a case the probability function will be computed as follows:</p>
+<div class="math">
+<p><img src="_images/math/cc40bc4c210629de628deeff2088b5a266b25bdb.png" alt="p(x;\mu,\Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} exp(-\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x - \mu))" /></p>
+</div><p>where <img class="math" src="_images/math/1df78b36915fffcd7ac04943cec0534e67a5c49d.png" alt="|\Sigma|"/> is the determinant of <img class="math" src="_images/math/c8f77e3035db5fe9a4975967750ac1a6454bda8c.png" alt="\Sigma"/></p>
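<p>The formula above translates fairly directly into NumPy; this is a sketch, with one example per row of <code>X</code>:</p>

```python
import numpy as np

def multivariate_gaussian(X, mu, Sigma):
    """Evaluate p(x; mu, Sigma) for each row of X using the formula above."""
    n = mu.shape[0]
    diff = X - mu                                   # (m, n)
    inv = np.linalg.inv(Sigma)
    norm = 1.0 / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma)))
    # quadratic form (x - mu)^T Sigma^{-1} (x - mu), one value per row
    quad = np.einsum('ij,jk,ik->i', diff, inv, diff)
    return norm * np.exp(-0.5 * quad)
```

For a diagonal Sigma this reduces to the product of independent univariate gaussians, which is exactly the relationship discussed in video 16-8 below.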
+<img alt="_images/multivariate_gaussian_distribution_1.png" src="_images/multivariate_gaussian_distribution_1.png" />
+<p>In the figure above it can be seen that by changing the values on the diagonal of the covariance matrix simultaneously, the contours can be made either broader or narrower.</p>
+<img alt="_images/multivariate_gaussian_distribution_2.png" src="_images/multivariate_gaussian_distribution_2.png" />
+<p>In this figure it can be seen that by independently changing the values on the diagonal of the covariance matrix, the contour profiles can be made to be elliptical along the horizontal and vertical axes.</p>
+<img alt="_images/multivariate_gaussian_distribution_3.png" src="_images/multivariate_gaussian_distribution_3.png" />
+<p>The above figure shows, that by changing the values of <img class="math" src="_images/math/2d8c833ed800824727cd7bd2fb9de1a12ad7e674.png" alt="\mu"/>, the overall position of the contour profile could be moved.</p>
+<img alt="_images/multivariate_gaussian_distribution_4.png" src="_images/multivariate_gaussian_distribution_4.png" />
+<p>Finally, the image above shows that by changing the off-diagonal values of the covariance matrix, the contour profile changes to an elliptical shape along arbitrary axes. In fact the rightmost profile is probably closest to the one that we started with. Setting the off-diagonal elements of the covariance matrix to be non-zero is an admission of the fact that the features are correlated and not independent.</p>
+<p>So the multivariate gaussian distribution should help us model the situation to better fit the actual behaviour of the two features <img class="math" src="_images/math/ccada11db7b2b90693e2fac4f887a57fce6f96bf.png" alt="x_1"/> and <img class="math" src="_images/math/6a7d010bbff66a0c41e43310a51efbaa6bf63396.png" alt="x_2"/> that we started out with. Thus by using the modified probability function above we can better predict anomalies when the features are correlated with each other.</p>
+</div>
+<div class="section" id="video-16-8-anomaly-detection-using-the-multivariate-gaussian-distribution">
+<h2>Video 16-8: Anomaly detection using the multivariate gaussian distribution<a class="headerlink" href="#video-16-8-anomaly-detection-using-the-multivariate-gaussian-distribution" title="Permalink to this headline">¶</a></h2>
+<p>In computing the probability function using a multivariate gaussian distribution the following could be used.</p>
+<div class="math">
+<p><img src="_images/math/cc40bc4c210629de628deeff2088b5a266b25bdb.png" alt="p(x;\mu,\Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} exp(-\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x - \mu))" /></p>
+</div><p>We would need to start by first computing <img class="math" src="_images/math/2d8c833ed800824727cd7bd2fb9de1a12ad7e674.png" alt="\mu"/> and <img class="math" src="_images/math/c8f77e3035db5fe9a4975967750ac1a6454bda8c.png" alt="\Sigma"/> as follows</p>
+<div class="math">
+<p><img src="_images/math/d4db8293349ad6251d1d917791b594318775e285.png" alt="\mu = \frac{1}{m} \sum_{i=1}^m x^{(i)}" /></p>
+</div><div class="math">
+<p><img src="_images/math/8ae32f355fc83427443c8ae66b78d7f38f9b410f.png" alt="\Sigma = \frac{1}{m} \sum_{i=1}^m (x^{(i)} - \mu)(x^{(i)} - \mu)^T" /></p>
+</div><p>Then compute <img class="math" src="_images/math/2751d79d3bbfb34440d68c685fe6ba7414951749.png" alt="p(x)"/>. And flag an anomaly if <img class="math" src="_images/math/eee9890a2b68316c4489fd4b81b1f6b9dc430807.png" alt="p(x) &lt; \epsilon"/></p>
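<p>Putting the two estimation equations and the threshold check together, a minimal sketch might be:</p>

```python
import numpy as np

def fit_multivariate(X):
    """Estimate mu and Sigma exactly as in the two equations above."""
    mu = X.mean(axis=0)
    diff = X - mu
    Sigma = diff.T @ diff / X.shape[0]   # (1/m) sum (x - mu)(x - mu)^T
    return mu, Sigma

def flag_anomalies(X, mu, Sigma, epsilon):
    """Return a boolean mask: True where p(x) < epsilon."""
    n = mu.shape[0]
    diff = X - mu
    inv = np.linalg.inv(Sigma)
    norm = 1.0 / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma)))
    p = norm * np.exp(-0.5 * np.einsum('ij,jk,ik->i', diff, inv, diff))
    return p < epsilon
```

Note the division by m (not m - 1) in the covariance estimate, matching the formula in the lecture rather than the unbiased sample covariance.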
+<div class="section" id="relationship-to-the-original-model">
+<h3>Relationship to the original model<a class="headerlink" href="#relationship-to-the-original-model" title="Permalink to this headline">¶</a></h3>
+<p>It turns out that gaussian distribution is simply a special case of multivariate gaussian distribution with the constraint that all the non-diagonal elements of the covariance matrix should be set to zero.</p>
+<p>However, the original gaussian model still tends to be used more frequently than its multivariate cousin, given that the former is computationally cheaper and can even deal with situations where <img class="math" src="_images/math/f5047d1e0cbb50ec208923a22cd517c55100fa7b.png" alt="m"/> (the training set size) is small or even less than <img class="math" src="_images/math/174fadd07fd54c9afe288e96558c92e0c1da733a.png" alt="n"/> (the number of features). If <img class="math" src="_images/math/2c3c3b257a6ed04ad40a382504e92231a48bc45c.png" alt="m \leq n"/>, the multivariate gaussian distribution cannot be used since <img class="math" src="_images/math/c8f77e3035db5fe9a4975967750ac1a6454bda8c.png" alt="\Sigma"/> is non-invertible. It is generally preferred to have <img class="math" src="_images/math/558c61176dc4032b7c99938d2dc50e0e0cac019f.png" alt="m \geq 10 n"/>.</p>
+<p>Quite often (as in the case above of the two correlated features), it might still be preferable to capture the correlations by creating new features, eg. <img class="math" src="_images/math/ec34b0e29c6530b107781a3858fdac694683cd33.png" alt="x_1 - x_2"/> or <img class="math" src="_images/math/05bf96b416ca512229c57ae7da5f4839aa1888ad.png" alt="\frac{x_1}{x_2}"/>, and using the original gaussian model rather than the multivariate one, either because of the additional computational complexity or because <img class="math" src="_images/math/f5047d1e0cbb50ec208923a22cd517c55100fa7b.png" alt="m"/> is not substantially larger than <img class="math" src="_images/math/174fadd07fd54c9afe288e96558c92e0c1da733a.png" alt="n"/>.</p>
 </div>
 </div>
 </div>
       <div class="bottomnav">

         <p>
-        «&#160;&#160;<a href="index.html">Welcome to My Machine Learning class notes&#8217;s documentation!</a>
+        «&#160;&#160;<a href="index.html">Machine Learning class notes</a>
         &#160;&#160;::&#160;&#160;
         <a class="uplink" href="index.html">Contents</a>
         </p>

# File docs/mlclass-notes/objects.inv

Binary file modified.

# File docs/mlclass-notes/search.html

   <head>
     <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

-    <title>Search &mdash; My Machine Learning class notes v1.0.0 documentation</title>
+    <title>Search &mdash; Machine Learning class notes v1.0.0 documentation</title>
     <link rel="stylesheet" href="_static/haiku.css" type="text/css" />
     <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
     <link rel="stylesheet" href="_static/print.css" type="text/css" />
     <script type="text/javascript" src="_static/doctools.js"></script>
     <script type="text/javascript" src="_static/searchtools.js"></script>
     <script type="text/javascript" src="_static/theme_extras.js"></script>
-    <link rel="top" title="My Machine Learning class notes v1.0.0 documentation" href="index.html" />
+    <link rel="top" title="Machine Learning class notes v1.0.0 documentation" href="index.html" />
   <script type="text/javascript">
     jQuery(function() { Search.loadIndex("searchindex.js"); });
   </script>
   </head>
   <body>
       <div class="header"><h1 class="heading"><a href="index.html">
-          <span>My Machine Learning class notes v1.0.0 documentation</span></a></h1>
+          <span>Machine Learning class notes v1.0.0 documentation</span></a></h1>
         <h2 class="heading"><span>Search</span></h2>
       </div>
       <div class="topnav">

# File docs/mlclass-notes/searchindex.js

-Search.setIndex({objects:{},terms:{identifi:1,focus:1,shape:1,aspect:1,follow:1,disk:1,anamol:[],yet:1,impact:1,access:1,monitor:1,graph:1,point:1,activ:1,should:1,manufactur:1,deviat:1,densiti:1,non:1,realin:[],far:1,express:1,traffic:1,closer:1,discuss:1,like:1,each:1,fulli:1,page:0,mean:1,set:1,some:1,assumpt:1,anomal:1,sec:1,fail:1,further:1,out:1,anamoli:[],index:0,shown:1,detect:[0,1],sub:[],bottom:1,vari:1,asid:1,per:1,content:0,estim:1,sup:[],"new":1,machin:[0,1],vibrat:1,gener:1,usag:1,standard:1,leq:[],aircraft:1,let:1,likelihood:1,miscellan:[],search:0,xv1:[],could:1,xv2:[],etc:1,equat:1,frequent:1,load:1,foral:[],emphasi:[],bell:1,question:1,modul:0,number:1,dataset:1,"03b5":[],from:1,area:[],anomali:[0,1],two:1,width:1,inner:1,circl:1,rxm:[],more:1,"function":1,epsilon:[],imag:1,gaussian:[0,1],heat:1,train:1,quad:[],line:1,than:1,"case":1,fraud:1,possibl:1,provid:1,xr1:[],below:1,unusu:1,can:1,learn:[0,1],video:[0,1],problem:[0,1],similar:1,unsupervis:1,curv:1,normal:1,sai:1,featur:1,comput:1,have:1,predict:1,centr:1,indic:[0,1],lectur:[0,1],high:1,xtest:[],motiv:[0,1],file:1,tabl:0,need:1,exist:[],check:1,probabl:1,assembl:1,engin:1,welcom:0,receiv:1,titl:[],note:[0,1],also:1,other:1,build:1,which:1,test:1,document:[0,1],roll:1,higher:1,xrm:[],distribut:[0,1],though:1,earlier:1,hand:1,most:1,intens:1,user:1,attent:1,mai:1,end:1,varianc:1,data:1,"class":0,network:1,memori:1,appropri:1,off:1,center:1,algorithm:[0,1],supervis:1,flag:1,rather:1,exampl:1,thi:1,anoth:1,model:1,sigma:[],cpu:1,xr2:[]},objtypes:{},titles:["Welcome to My Machine Learning class notes&#8217;s documentation!","Lecture 16: Anomaly Detection"],objnames:{},filenames:["index","lecture16"]})
+Search.setIndex({objects:{},terms:{all:1,focus:1,correl:1,ellipt:1,computation:1,follow:1,disk:1,content:0,decid:1,graph:1,matrix:1,deviat:1,sens:1,introduc:1,far:1,candid:1,veri:1,seemingli:1,maximis:1,"try":1,vector:1,small:1,broader:1,zero:1,video:[0,1],further:1,casual:1,blue:1,supervis:[0,1],what:[0,1],asid:1,abl:1,access:1,"new":1,unlabel:1,vibrat:1,themselv:1,gener:1,here:1,behaviour:1,let:1,xnew:1,along:1,vertic:1,modifi:1,sinc:1,valu:1,search:0,ahead:1,shift:1,unsupervis:1,larger:1,precis:1,extrem:1,studi:1,narrow:1,appli:1,modul:0,prefer:1,select:1,plot:1,from:1,would:1,memori:1,two:1,few:1,more:1,flaw:1,hurt:1,flag:1,train:1,particular:1,known:1,sometim:1,work:1,focu:1,can:1,learn:[0,1],predict:1,indic:[0,1],high:1,want:1,onlin:1,magenta:1,occur:1,surfac:1,cours:1,rather:1,anoth:1,how:1,instead:1,circular:1,product:1,earlier:1,spot:1,diagram:1,attent:1,mai:1,varianc:1,data:1,attempt:1,practic:1,ani:1,correspond:1,element:1,green:1,enter:1,multivari:[0,1],help:1,move:1,becaus:1,through:1,still:1,vari:1,paramet:1,monitor:1,fit:1,better:1,might:1,non:1,good:1,recal:1,greater:1,overal:1,now:1,discuss:1,separ:1,each:1,mean:1,subset:1,hard:1,gaussian:[0,1],event:1,special:1,out:1,variabl:1,shown:1,admiss:1,profil:1,formula:1,occurr:1,math:1,difficulti:1,linear:1,situat:1,given:1,standard:1,base:1,likelihood:1,could:1,turn:1,perhap:1,retail:1,think:1,frequent:1,first:1,origin:1,independ:1,number:1,wedg:1,summaris:1,size:1,differ:1,width:1,top:1,system:[0,1],circl:1,"final":1,inner:1,option:1,relationship:1,especi:1,referredt:1,than:1,provid:1,remov:1,horizont:1,posit:1,seri:1,"function":1,sai:1,comput:1,seep:1,clearli:1,have:1,tabl:0,need:1,seen:1,imagin:1,engin:1,techniqu:1,accuraci:1,note:[0,1],also:1,ideal:1,build:1,which:1,combin:1,noth:1,even:1,distribut:[0,1],normal:1,who:1,most:1,why:1,don:1,later:1,cover:1,doe:1,determin:1,left:1,fact:1,show:1,vectoris:1,label:1,find:1,impact:1,ratio:1,activ:1,should:1,manufactur:1,variou:1,get:1,express:1,cheaper:1,acc
es:1,cannot:1,"import":1,sunni:1,contrast:1,contain:1,where:1,view:1,set:1,see:1,sec:1,fail:1,contour:1,closer:1,detect:[0,1],tend:1,figur:1,score:1,between:1,drawn:1,approach:1,email:1,assumpt:1,aircraft:1,cousin:1,come:1,addit:1,both:1,howev:1,etc:1,equat:1,context:1,mani:1,load:1,simpli:1,cancer:1,bell:1,stuck:1,height:1,assum:1,quit:1,evalu:[0,1],skew:1,anomali:[0,1],been:1,mark:1,whom:1,fals:1,imag:1,defect:1,former:1,those:1,"case":1,look:1,"while":1,abov:1,error:1,loop:1,howeev:1,centr:1,metric:1,them:1,motiv:[0,1],substanti:1,develop:[0,1],receiv:1,cross:1,complex:1,split:1,higher:1,closest:1,effect:1,hand:1,covari:1,user:1,implement:1,squar:1,appropri:1,off:1,center:1,scenario:1,thu:1,well:1,exampl:[0,1],thi:1,choos:[0,1],model:1,usual:1,identifi:1,just:1,less:1,shape:1,aspect:1,simultan:1,yet:1,point:1,except:1,identif:1,valid:1,densiti:1,match:1,take:1,big:1,traffic:1,like:1,specif:1,arbitrari:1,necessari:1,either:1,page:0,right:1,often:1,deal:1,some:1,anomal:1,lead:1,bottom:1,though:1,recomput:1,per:1,estim:1,larg:1,refer:1,machin:[0,1],process:1,usag:1,about:1,actual:1,constraint:1,convinc:1,commit:1,invert:1,within:1,dataset:1,diagon:1,weather:1,chang:1,fraud:1,your:1,log:1,wai:1,spam:1,question:1,transform:1,"long":1,"class":0,start:1,low:1,analysi:1,histogram:1,heat:1,line:1,"true":1,made:1,consist:1,possibl:1,whether:1,below:1,unusu:1,problem:[0,1],similar:1,curv:1,classif:1,featur:[0,1],creat:1,lectur:[0,1],exist:1,check:1,probabl:1,assembl:1,index:0,when:1,raini:1,other:1,futur:1,test:1,you:1,roll:1,symbol:1,intens:1,consid:1,network:1,algorithm:[0,1],cpu:1},objtypes:{},titles:["Machine Learning class notes","Lecture 16: Anomaly Detection"],objnames:{},filenames:["index","lecture16"]})