Commits

Hernan Rojas committed 9a72558 Merge

Merged in aghisla/learn-pandas (pull request #3)

Fix: s/True/False/ in Lesson 2

Files changed (3)

lessons/02 - Lesson.ipynb

      "cell_type": "markdown",
      "metadata": {},
      "source": [
-      "The only parameters we will use is ***index*** and ***header***. Setting these parameters to True will prevent the index and header names from being exported. Change the values of these parameters to get a better understanding of their use."
+      "The only parameters we will use are ***index*** and ***header***. Setting these parameters to False will prevent the index and header names from being exported. Change the values of these parameters to get a better understanding of their use."
      ]
     },
     {
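
The effect of ***index*** and ***header*** described in this change can be checked with a small sketch. The dataframe below is hypothetical sample data, not the lesson's data set, and the sketch assumes a reasonably recent pandas, where `to_csv` returns the CSV text when no path is given.

```python
import pandas as pd

# Hypothetical sample data; the names and counts are illustrative only.
df = pd.DataFrame({"Names": ["Bob", "Mary"], "Births": [968, 77]})

# With no path argument, to_csv returns the CSV text instead of writing a file.
with_both = df.to_csv()                               # defaults: index=True, header=True
without_both = df.to_csv(index=False, header=False)   # neither is exported

print(with_both)
print(without_both)
```

With both parameters set to False, only the data rows are written, which matches the corrected wording of the lesson.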

lessons/03 - Lesson.ipynb

      "source": [
       "# Lesson 3  \n",
       "**Get Data** - Our data set will consist of an Excel file containing customer counts per date. We will learn how to read in the excel file for processing.  \n",
-      "**Prepare Data** - The data is an irregular time series having duplicate dates. We will be challenged in compressing the data and comming up with next years forecasted customer count.  \n",
-      "**Analyze Data** - We use graphs to visualize trends and spot outliers. Some built in computatiopnal tools will be used to calculate next years forecasted customer count.  \n",
-      "**Present Data** - The results will be graphed.  \n",
+      "**Prepare Data** - The data is an irregular time series having duplicate dates. We will be challenged in compressing the data and coming up with next year's forecasted customer count.  \n",
+      "**Analyze Data** - We use graphs to visualize trends and spot outliers. Some built-in computational tools will be used to calculate next year's forecasted customer count.  \n",
+      "**Present Data** - The results will be plotted.  \n",
       "\n",
       "***NOTE:\n",
-      "Make sure you have looked through all previous lessons as the knowledge learned in previous lessons will be\n",
+      "Make sure you have looked through all previous lessons, as the knowledge learned in previous lessons will be\n",
       "needed for this exercise.***"
      ]
     },
      "cell_type": "markdown",
      "metadata": {},
      "source": [
-      "We are now going to save this dataframe into an Excel file, to then bring it back to a dataframe. We simply do this to show you how to read and write to excel files.  \n",
+      "We are now going to save this dataframe into an Excel file, to then bring it back to a dataframe. We simply do this to show you how to read and write to Excel files.  \n",
       "\n",
-      "We do not write the index values of the dataframe to the excel file since they are no meant to be part of our initial test data set."
+      "We do not write the index values of the dataframe to the Excel file, since they are not meant to be part of our initial test data set."
      ]
     },
     {
      "source": [
       "# Grab Data from Excel  \n",
       "\n",
-      "We will be using the ***read_excel*** function to read in data from an excel file. The function allows you to read in specfic tabs by name or location."
+      "We will be using the ***read_excel*** function to read in data from an Excel file. The function allows you to read in specific tabs by name or location."
      ]
     },
     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
-      "**Note: The location on the excel file will be in the same folder as the notebook unless specified otherwise.**"
+      "**Note: The location of the Excel file will be in the same folder as the notebook, unless specified otherwise.**"
      ]
     },
     {
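
The Excel round trip described in the cells above (write without the index, then read a tab back by name or by location) might look like the following sketch. The helper name `excel_round_trip` and the default sheet name are hypothetical. The sketch assumes a recent pandas, where the keyword is `sheet_name`, and an installed Excel engine such as openpyxl.

```python
import pandas as pd

def excel_round_trip(frame, path, sheet_name="Sheet1"):
    """Write frame to an Excel file without its index, then read it back."""
    # index=False keeps the dataframe's index out of the spreadsheet.
    frame.to_excel(path, sheet_name=sheet_name, index=False)
    # A tab can be selected by name (a string) or by location (a 0-based integer).
    by_name = pd.read_excel(path, sheet_name=sheet_name)
    by_position = pd.read_excel(path, sheet_name=0)
    return by_name, by_position
```

A call such as `excel_round_trip(df, "Lesson3.xlsx")` (file name assumed) writes next to the notebook, because a relative path resolves against the current working directory.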
       "Daily['Upper'] = StateYearMonth['CustomerCount'].transform( lambda x: x.quantile(q=.75) + (1.5*x.quantile(q=.75)-x.quantile(q=.25)) )\n",
       "Daily['Outlier'] = (Daily['CustomerCount'] < Daily['Lower']) | (Daily['CustomerCount'] > Daily['Upper']) \n",
       "\n",
-      "# Remove Ouliers\n",
+      "# Remove Outliers\n",
       "Daily = Daily[Daily['Outlier'] == False]"
      ],
      "language": "python",
      "cell_type": "markdown",
      "metadata": {},
      "source": [
-      "The dataframe named ***Daily*** will hold customer counts that have been aggregated per day. The original data (df) has multiple records per day.  We are left with a data set that is indexed by both the state and the StatusDate. The Oulier column should be equal to ***false*** signifying that the record is not an outlier."
+      "The dataframe named ***Daily*** will hold customer counts that have been aggregated per day. The original data (df) has multiple records per day. We are left with a data set that is indexed by both the state and the StatusDate. The Outlier column should be equal to ***False***, signifying that the record is not an outlier."
      ]
     },
     {
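
As a rough illustration of the outlier flagging discussed above, here is a minimal sketch using the conventional Tukey fences, which place the 1.5 multiplier on the whole interquartile range. Note that this differs from the parenthesization in the lesson's `Upper` expression. The data is hypothetical, and the grouping is simplified to State only rather than state, year, and month.

```python
import pandas as pd

# Hypothetical daily counts for a single state; 500 is a planted outlier.
Daily = pd.DataFrame({
    "State": ["FL"] * 6,
    "CustomerCount": [10, 12, 11, 13, 12, 500],
})

grouped = Daily.groupby("State")["CustomerCount"]

# transform broadcasts each group's quartile back onto every row of that group.
q1 = grouped.transform(lambda x: x.quantile(q=.25))
q3 = grouped.transform(lambda x: x.quantile(q=.75))
iqr = q3 - q1

# Conventional Tukey fences: 1.5 * IQR beyond each quartile.
Daily["Lower"] = q1 - 1.5 * iqr
Daily["Upper"] = q3 + 1.5 * iqr
Daily["Outlier"] = (Daily["CustomerCount"] < Daily["Lower"]) | (Daily["CustomerCount"] > Daily["Upper"])

# Keep only the rows that were not flagged.
Daily = Daily[~Daily["Outlier"]]
print(Daily)
```

After the filter, every remaining row has `Outlier` equal to False, as the lesson text states.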
      "cell_type": "markdown",
      "metadata": {},
      "source": [
-      "We create a seperate dataframe named ***ALL*** which groups the Daily dataframe by StatusDate. We are essentially getting rid of the State column. The ***Max*** column represents the maximum customer count per month. The Max column is used to smooth out the graph."
+      "We create a separate dataframe named ***ALL***, which groups the Daily dataframe by StatusDate. We are essentially getting rid of the ***State*** column. The ***Max*** column represents the maximum customer count per month. The ***Max*** column is used to smooth out the graph."
      ]
     },
     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
-      "As you can see from the ***ALL*** dataframe above, in the month of January 2009, the maximum customer count was 901. If we used ***apply***, we would have a dataframe with (Year and Month) as the index and just the *Max* column with the value of 901. "
+      "As you can see from the ***ALL*** dataframe above, in the month of January 2009, the maximum customer count was 901. If we had used ***apply***, we would have got a dataframe with (Year and Month) as the index and just the *Max* column with the value of 901. "
      ]
     },
     {
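
The contrast between ***transform*** and ***apply*** drawn above can be sketched on a tiny stand-in for the ***ALL*** dataframe. The dates and counts are hypothetical, apart from reusing 901 as the January 2009 maximum mentioned in the text.

```python
import pandas as pd

# Hypothetical stand-in for the ALL dataframe: totals indexed by StatusDate.
dates = pd.to_datetime(["2009-01-05", "2009-01-12", "2009-02-02", "2009-02-09"])
ALL = pd.DataFrame({"CustomerCount": [877, 901, 300, 310]}, index=dates)
ALL.index.name = "StatusDate"

# Group the rows by (year, month), derived from each index value.
YearMonth = ALL.groupby([lambda d: d.year, lambda d: d.month])

# transform returns one value per original row, so it can be stored as a column.
ALL["Max"] = YearMonth["CustomerCount"].transform(lambda x: x.max())

# apply with the same function returns one value per (year, month) group instead.
per_month = YearMonth["CustomerCount"].apply(lambda x: x.max())

print(ALL)
print(per_month)
```

Here `transform` keeps the StatusDate index and repeats the monthly maximum on each row, while `apply` collapses the result to a (year, month) index.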

lessons/04 - Lesson.ipynb

      "source": [
       "We can now start to select pieces of the dataframe using ***loc***.  \n",
       "\n",
-      "note: ***loc*** is strictly label based"
+      "Note: ***loc*** is strictly label-based. It is available from [version 0.11.0](http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#v0-11-0-april-22-2013)."
      ]
     },
     {
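
The difference between label-based selection with ***loc*** and integer-position selection with ***iloc*** can be sketched as follows. The dataframe and its string labels are hypothetical, chosen so that labels and positions are easy to tell apart.

```python
import pandas as pd

# Hypothetical dataframe with string labels so labels and positions differ visibly.
df = pd.DataFrame({"Rev": [11, 22, 33, 44]}, index=["a", "b", "c", "d"])

by_label = df.loc["a":"c"]    # label slice: both endpoints are included
by_position = df.iloc[0:3]    # position slice: start inclusive, end exclusive

print(by_label)
print(by_position)
```

Both selections return the rows labeled a, b, and c, but for different reasons: the label slice includes its end label, while the position slice stops before position 3.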
      "collapsed": false,
      "input": [
       "# df.iloc[inclusive:exclusive]\n",
-      "# note: .iloc is strictly integer position based \n",
+      "# Note: .iloc is strictly integer-position based. It is available from version 0.11.0 (http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#v0-11-0-april-22-2013) \n",
       "df.iloc[0:3]"
      ],
      "language": "python",