
Abstract


Given a destination and a date range (2001-2008), which is the better airport to fly out from: SFO or OAK?
We wanted to apply machine learning techniques to build a predictive model that can help a flyer decide which airport to choose.

Approach


Before applying machine learning algorithms, the following steps were followed.

  1. Clean and handle invalid data, and apply normalization techniques.
  2. Apply feature engineering techniques.
    • Feature scaling.
    • Visualize data to understand which features are driving the output.
    • High-dimensional data requires a large dataset because of the curse of dimensionality, so it is important to find correlations between features and remove features that are highly correlated with one another. PCA can also be used to reduce a high-dimensional feature space to a lower-dimensional one.
  3. Use k-fold cross-validation for training and testing.
  4. Apply machine learning models with regularization.
  5. Because models tend to overfit, it is better to use the VC dimension to plot a graph of training error and generalization error, and choose the model that gives the minimum difference between generalization error and training error.
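
The steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual code. The data is synthetic (three made-up numeric features and a made-up binary label), and every function name, the 0.9 correlation threshold, and the learning-rate and regularization values are assumptions chosen for the example. The sketch does not compute a VC-dimension bound; it uses the held-out fold error as an empirical stand-in for the generalization error in step 5. PCA is omitted to keep the example short.

```python
import math
import random

def standardize(rows):
    """Feature scaling: shift each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va and vb else 0.0

def drop_correlated(rows, threshold=0.9):
    """Keep a column only if it is not highly correlated with a kept one."""
    cols = list(zip(*rows))
    kept = []
    for j, col in enumerate(cols):
        if all(abs(pearson(col, cols[k])) < threshold for k in kept):
            kept.append(j)
    return kept

def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]

def fit_logistic_l2(X, y, lam=0.1, lr=0.1, epochs=200):
    """Logistic regression with an L2 penalty, fit by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0  # gradient of the L2 term
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            for j, xj in enumerate(xi):
                gw[j] += (p - yi) * xj / n
            gb += (p - yi) / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b

def error_rate(w, b, X, y):
    """Fraction of rows whose predicted class differs from the label."""
    wrong = 0
    for xi, yi in zip(X, y):
        pred = 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0
        wrong += pred != yi
    return wrong / len(y)

# Synthetic stand-in data: column 1 is almost a copy of column 0.
rng = random.Random(42)
raw, labels = [], []
for _ in range(200):
    a, c = rng.gauss(0, 1), rng.gauss(0, 1)
    raw.append([a, 2 * a + rng.gauss(0, 0.01), c])
    labels.append(1 if a + c > 0 else 0)

# For brevity the scaler is fit on all rows; fitting it on each
# training fold only would avoid leaking test-fold statistics.
scaled = standardize(raw)
kept = drop_correlated(scaled)
X = [[row[j] for j in kept] for row in scaled]

train_errs, test_errs = [], []
for train, test in k_fold_indices(len(X), k=5):
    Xtr, ytr = [X[i] for i in train], [labels[i] for i in train]
    Xte, yte = [X[i] for i in test], [labels[i] for i in test]
    w, b = fit_logistic_l2(Xtr, ytr)
    train_errs.append(error_rate(w, b, Xtr, ytr))
    test_errs.append(error_rate(w, b, Xte, yte))

gap = sum(test_errs) / len(test_errs) - sum(train_errs) / len(train_errs)
print("kept columns:", kept)
print("mean train error:", sum(train_errs) / len(train_errs))
print("mean held-out error:", sum(test_errs) / len(test_errs))
print("gap:", gap)
```

Repeating the cross-validation loop for several candidate models (for example, different values of `lam`) and comparing the resulting gaps is one simple way to approximate the model selection described in step 5.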

Data Set


Click on dataset to visit