Given a destination and a date range (2001-2008), which is the better airport to fly out of: SFO or OAK?
We wanted to apply machine learning techniques to build a predictive model that can help a flyer decide which airport to choose.
Before applying machine learning algorithms, we followed the steps below.
- Clean the data, handle invalid values, and apply normalization techniques.
- Apply feature engineering techniques.
- Apply feature scaling.
- Visualize data to understand which features are driving the output.
- Because high-dimensional data requires a large dataset (the curse of dimensionality), it is important to find correlations between features and remove features that are highly correlated with others. PCA can also be used to reduce a high-dimensional feature space to a lower-dimensional one.
- Use k-fold cross-validation for training and testing.
- Apply machine learning models with regularization.
- Because models tend to overfit, it is better to plot training error and generalization error against model complexity (for example, VC dimension) and choose the model with the smallest gap between generalization error and training error.
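
The steps above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the data is synthetic, the column names (`distance`, `dep_hour`, `air_time`) and helper functions are hypothetical, and ridge regression stands in for "a model with regularization" because the list does not name a specific model. It drops highly correlated features, scales features inside each fold, runs k-fold cross-validation, and compares training error with validation error for several regularization strengths.

```python
import numpy as np

def drop_correlated(X, threshold=0.9):
    """Return column indices to keep, dropping any column that is
    highly correlated with a column already kept."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return keep

def kfold_indices(n, k, rng):
    """Shuffle row indices and split them into k nearly equal folds."""
    return np.array_split(rng.permutation(n), k)

def ridge_fit(X, y, lam):
    """Closed-form ridge regression; the intercept is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w

def cv_errors(X, y, lam, k=5, seed=0):
    """Mean training and validation MSE over k folds for one lambda."""
    folds = kfold_indices(len(y), k, np.random.default_rng(seed))
    train_mse, val_mse = [], []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit the scaler on the training fold only, so the validation
        # fold does not leak into the preprocessing.
        mu, sigma = X[train].mean(axis=0), X[train].std(axis=0)
        Xtr, Xva = (X[train] - mu) / sigma, (X[val] - mu) / sigma
        w, b = ridge_fit(Xtr, y[train], lam)
        train_mse.append(np.mean((Xtr @ w + b - y[train]) ** 2))
        val_mse.append(np.mean((Xva @ w + b - y[val]) ** 2))
    return float(np.mean(train_mse)), float(np.mean(val_mse))

# Synthetic stand-in for the flight data (all values are made up).
rng = np.random.default_rng(42)
n = 200
distance = rng.uniform(100, 3000, n)
dep_hour = rng.uniform(0, 24, n)
air_time = distance / 8 + rng.normal(0, 1, n)  # nearly duplicates distance
X = np.column_stack([distance, dep_hour, air_time])
y = 0.01 * distance + 1.5 * dep_hour + rng.normal(0, 5, n)  # hypothetical delay

keep = drop_correlated(X)  # air_time is dropped as redundant
X = X[:, keep]

# Compare training and validation error across regularization strengths,
# then pick the one with the smallest gap between the two.
results = {lam: cv_errors(X, y, lam) for lam in (0.01, 1.0, 100.0)}
best_lam = min(results, key=lambda lam: abs(results[lam][1] - results[lam][0]))
for lam, (tr, va) in results.items():
    print(f"lambda={lam}: train MSE={tr:.2f}, validation MSE={va:.2f}")
print("smallest train/validation gap at lambda =", best_lam)
```

In practice, a small gap is only useful if the validation error itself is also acceptably low, so it is worth looking at both numbers before settling on a model.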
Click on dataset to visit