"**Get Data** - Our data set will consist of an Excel file containing customer counts per date. We will learn how to read in the excel file for processing. \n",
- "**Prepare Data** - The data is an irregular time series having duplicate dates. We will be challenged in compressing the data and comming up with next years forecasted customer count. \n",
- "**Analyze Data** - We use graphs to visualize trends and spot outliers. Some built in computatiopnal tools will be used to calculate next years forecasted customer count. \n",
- "**Present Data** - The results will be graphed. \n",
+ "**Prepare Data** - The data is an irregular time series with duplicate dates. The challenge is to compress the data and come up with next year's forecasted customer count. \n",
+ "**Analyze Data** - We use graphs to visualize trends and spot outliers. Some built-in computational tools will be used to calculate next year's forecasted customer count. \n",
+ "**Present Data** - The results will be plotted. \n",
- "Make sure you have looked through all previous lessons as the knowledge learned in previous lessons will be\n",
+ "Make sure you have looked through all previous lessons, as the knowledge learned in previous lessons will be\n",
"needed for this exercise.***"
- "We are now going to save this dataframe into an Excel file, to then bring it back to a dataframe. We simply do this to show you how to read and write to excel files. \n",
+ "We are now going to save this dataframe into an Excel file and then read it back into a dataframe. We do this simply to show you how to read from and write to Excel files. \n",
- "We do not write the index values of the dataframe to the excel file since they are no meant to be part of our initial test data set."
+ "We do not write the index values of the dataframe to the Excel file, since they are not meant to be part of our initial test data set."
"# Grab Data from Excel \n",
- "We will be using the ***read_excel*** function to read in data from an excel file. The function allows you to read in specfic tabs by name or location."
+ "We will be using the ***read_excel*** function to read in data from an Excel file. The function allows you to read in specific tabs by name or location."
- "**Note: The location on the excel file will be in the same folder as the notebook unless specified otherwise.**"
+ "**Note: The location of the Excel file will be in the same folder as the notebook, unless specified otherwise.**"
"Daily['Upper'] = StateYearMonth['CustomerCount'].transform( lambda x: x.quantile(q=.75) + (1.5*x.quantile(q=.75)-x.quantile(q=.25)) )\n",
"Daily['Outlier'] = (Daily['CustomerCount'] < Daily['Lower']) | (Daily['CustomerCount'] > Daily['Upper']) \n",
"Daily = Daily[Daily['Outlier'] == False]"
- "The dataframe named ***Daily*** will hold customer counts that have been aggregated per day. The original data (df) has multiple records per day. We are left with a data set that is indexed by both the state and the StatusDate. The Oulier column should be equal to ***false*** signifying that the record is not an outlier."
+ "The dataframe named ***Daily*** will hold customer counts that have been aggregated per day. The original data (df) has multiple records per day. We are left with a data set that is indexed by both the State and the StatusDate. The Outlier column should be equal to ***False***, signifying that the record is not an outlier."
- "We create a seperate dataframe named ***ALL*** which groups the Daily dataframe by StatusDate. We are essentially getting rid of the State column. The ***Max*** column represents the maximum customer count per month. The Max column is used to smooth out the graph."
+ "We create a separate dataframe named ***ALL*** which groups the Daily dataframe by StatusDate. We are essentially getting rid of the ***State*** column. The ***Max*** column represents the maximum customer count per month. The ***Max*** column is used to smooth out the graph."
- "As you can see from the ***ALL*** dataframe above, in the month of January 2009, the maximum customer count was 901. If we used ***apply***, we would have a dataframe with (Year and Month) as the index and just the *Max* column with the value of 901. "
+ "As you can see from the ***ALL*** dataframe above, in the month of January 2009, the maximum customer count was 901. If we had used ***apply***, we would have got a dataframe with (Year and Month) as the index and just the *Max* column with the value of 901. "
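
The difference between ***transform*** and ***apply*** described above can be sketched as follows. This is a minimal illustration on made-up numbers (only the January 2009 maximum of 901 comes from the lesson); the small `ALL` frame and the `monthly_max` name are assumptions for the example, not the notebook's actual data.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's ALL dataframe (values are illustrative)
dates = pd.to_datetime(["2009-01-05", "2009-01-12", "2009-02-02", "2009-02-09"])
ALL = pd.DataFrame({"CustomerCount": [877, 901, 650, 700]}, index=dates)

# Group by the year and month of the date index
YearMonth = ALL.groupby([lambda x: x.year, lambda x: x.month])

# apply collapses each group to one value, indexed by (year, month)
monthly_max = YearMonth["CustomerCount"].apply(lambda x: x.max())

# transform keeps the original index and repeats the group maximum on every row
ALL["Max"] = YearMonth["CustomerCount"].transform("max")

print(monthly_max)
print(ALL)
```

With ***transform***, every row keeps its date and gains the monthly maximum next to it, which is why it suits adding a smoothing column to the existing dataframe.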