Source

Learn Pandas / lessons / 08 - Lesson.ipynb

{
 "metadata": {
  "name": ""
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "# Lesson 8  \n",
      "\n",
      "How to pull data from a microsoft sql database"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Import libraries\n",
      "from pandas import DataFrame\n",
      "import pandas as pd\n",
      "from sqlalchemy import create_engine, MetaData, Table, select"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 1
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print 'Pandas version: ' + pd.__version__"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Pandas version: 0.11.0\n"
       ]
      }
     ],
     "prompt_number": 2
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "# Grab Data from SQL  \n",
      "\n",
      "In this section we use the ***sqlalchemy*** library to grab data from a sql database. Make sure to use your own ***ServerName***, ***Database***, ***TableName***."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Parameters\n",
      "ServerName = \"RepSer2\"\n",
      "Database = \"BizIntel\"\n",
      "TableName = \"DimDate\"\n",
      "\n",
      "# Create the connection\n",
      "engine = create_engine('mssql+pyodbc://' + ServerName + '/' + Database)\n",
      "conn = engine.connect()\n",
      "\n",
      "# Required for querying tables\n",
      "metadata = MetaData(conn)\n",
      "\n",
      "# Table to query\n",
      "tbl = Table(TableName, metadata, autoload=True, schema=\"dbo\")\n",
      "#tbl.create(checkfirst=True)\n",
      "\n",
      "# Select all\n",
      "sql = tbl.select()\n",
      "\n",
      "# run sql code\n",
      "result = conn.execute(sql)\n",
      "\n",
      "# Insert to a dataframe\n",
      "df = DataFrame(data=list(result), columns=result.keys())\n",
      "\n",
      "# Close connection\n",
      "conn.close()\n",
      "\n",
      "print 'Done'"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Done\n"
       ]
      }
     ],
     "prompt_number": 3
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Select the contents in the dataframe."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "df"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<pre>\n",
        "&ltclass 'pandas.core.frame.DataFrame'&gt\n",
        "Int64Index: 5479 entries, 0 to 5478\n",
        "Data columns (total 16 columns):\n",
        "DateSK          5479  non-null values\n",
        "Date            5479  non-null values\n",
        "Day             5479  non-null values\n",
        "DaySuffix       5479  non-null values\n",
        "DayOfWeek       5479  non-null values\n",
        "DOWInMonth      5479  non-null values\n",
        "DayOfYear       5479  non-null values\n",
        "WeekOfYear      5479  non-null values\n",
        "WeekOfMonth     5479  non-null values\n",
        "Month           5479  non-null values\n",
        "MonthName       5479  non-null values\n",
        "Quarter         5479  non-null values\n",
        "QuarterName     5479  non-null values\n",
        "Year            5479  non-null values\n",
        "StandardDate    5479  non-null values\n",
        "HolidayText     210  non-null values\n",
        "dtypes: object(16)\n",
        "</pre>"
       ],
       "output_type": "pyout",
       "prompt_number": 4,
       "text": [
        "<class 'pandas.core.frame.DataFrame'>\n",
        "Int64Index: 5479 entries, 0 to 5478\n",
        "Data columns (total 16 columns):\n",
        "DateSK          5479  non-null values\n",
        "Date            5479  non-null values\n",
        "Day             5479  non-null values\n",
        "DaySuffix       5479  non-null values\n",
        "DayOfWeek       5479  non-null values\n",
        "DOWInMonth      5479  non-null values\n",
        "DayOfYear       5479  non-null values\n",
        "WeekOfYear      5479  non-null values\n",
        "WeekOfMonth     5479  non-null values\n",
        "Month           5479  non-null values\n",
        "MonthName       5479  non-null values\n",
        "Quarter         5479  non-null values\n",
        "QuarterName     5479  non-null values\n",
        "Year            5479  non-null values\n",
        "StandardDate    5479  non-null values\n",
        "HolidayText     210  non-null values\n",
        "dtypes: object(16)"
       ]
      }
     ],
     "prompt_number": 4
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "df.head()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<pre>\n",
        "&ltclass 'pandas.core.frame.DataFrame'&gt\n",
        "Int64Index: 5 entries, 0 to 4\n",
        "Data columns (total 16 columns):\n",
        "DateSK          5  non-null values\n",
        "Date            5  non-null values\n",
        "Day             5  non-null values\n",
        "DaySuffix       5  non-null values\n",
        "DayOfWeek       5  non-null values\n",
        "DOWInMonth      5  non-null values\n",
        "DayOfYear       5  non-null values\n",
        "WeekOfYear      5  non-null values\n",
        "WeekOfMonth     5  non-null values\n",
        "Month           5  non-null values\n",
        "MonthName       5  non-null values\n",
        "Quarter         5  non-null values\n",
        "QuarterName     5  non-null values\n",
        "Year            5  non-null values\n",
        "StandardDate    5  non-null values\n",
        "HolidayText     1  non-null values\n",
        "dtypes: object(16)\n",
        "</pre>"
       ],
       "output_type": "pyout",
       "prompt_number": 5,
       "text": [
        "     DateSK                 Date Day DaySuffix  DayOfWeek DOWInMonth  \\\n",
        "0  20000101  2000-01-01 00:00:00   1       1st   Saturday          1   \n",
        "1  20000102  2000-01-02 00:00:00   2       2nd     Sunday          1   \n",
        "2  20000103  2000-01-03 00:00:00   3       3rd     Monday          1   \n",
        "3  20000104  2000-01-04 00:00:00   4       4th    Tuesday          1   \n",
        "4  20000105  2000-01-05 00:00:00   5       5th  Wednesday          1   \n",
        "\n",
        "  DayOfYear WeekOfYear WeekOfMonth Month MonthName Quarter QuarterName  Year  \\\n",
        "0         1          1           1     1   January       1       First  2000   \n",
        "1         2          2           2     1   January       1       First  2000   \n",
        "2         3          2           2     1   January       1       First  2000   \n",
        "3         4          2           2     1   January       1       First  2000   \n",
        "4         5          2           2     1   January       1       First  2000   \n",
        "\n",
        "  StandardDate     HolidayText  \n",
        "0   01/01/2000  New Year's Day  \n",
        "1   01/02/2000            None  \n",
        "2   01/03/2000            None  \n",
        "3   01/04/2000            None  \n",
        "4   01/05/2000            None  "
       ]
      }
     ],
     "prompt_number": 5
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Convert to specific data types. The code below will have to be modified to match your table."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Convert data types \n",
      "df.Date = df.Date.astype('datetime64')\n",
      "df.StandardDate = df.StandardDate.astype('datetime64')\n",
      "df.DateSK = df.DateSK.astype('int')\n",
      "df.Day = df.Day.astype('int')\n",
      "df.DOWInMonth = df.DOWInMonth.astype('int')\n",
      "df.DayOfYear = df.DayOfYear.astype('int')\n",
      "df.WeekOfYear = df.WeekOfYear.astype('int')\n",
      "df.WeekOfMonth = df.WeekOfMonth.astype('int')\n",
      "df.Month = df.Month.astype('int')\n",
      "df.Quarter = df.Quarter.astype('int')\n",
      "df.Year = df.Year.astype('int')\n",
      "\n",
      "print 'Data Types'\n",
      "print df.dtypes"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Data Types\n",
        "DateSK                   int32\n",
        "Date            datetime64[ns]\n",
        "Day                      int32\n",
        "DaySuffix               object\n",
        "DayOfWeek               object\n",
        "DOWInMonth               int32\n",
        "DayOfYear                int32\n",
        "WeekOfYear               int32\n",
        "WeekOfMonth              int32\n",
        "Month                    int32\n",
        "MonthName               object\n",
        "Quarter                  int32\n",
        "QuarterName             object\n",
        "Year                     int32\n",
        "StandardDate    datetime64[ns]\n",
        "HolidayText             object\n",
        "dtype: object\n"
       ]
      }
     ],
     "prompt_number": 6
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "df.head()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<pre>\n",
        "&ltclass 'pandas.core.frame.DataFrame'&gt\n",
        "Int64Index: 5 entries, 0 to 4\n",
        "Data columns (total 16 columns):\n",
        "DateSK          5  non-null values\n",
        "Date            5  non-null values\n",
        "Day             5  non-null values\n",
        "DaySuffix       5  non-null values\n",
        "DayOfWeek       5  non-null values\n",
        "DOWInMonth      5  non-null values\n",
        "DayOfYear       5  non-null values\n",
        "WeekOfYear      5  non-null values\n",
        "WeekOfMonth     5  non-null values\n",
        "Month           5  non-null values\n",
        "MonthName       5  non-null values\n",
        "Quarter         5  non-null values\n",
        "QuarterName     5  non-null values\n",
        "Year            5  non-null values\n",
        "StandardDate    5  non-null values\n",
        "HolidayText     1  non-null values\n",
        "dtypes: datetime64[ns](2), int32(9), object(5)\n",
        "</pre>"
       ],
       "output_type": "pyout",
       "prompt_number": 7,
       "text": [
        "     DateSK                Date  Day DaySuffix  DayOfWeek  DOWInMonth  \\\n",
        "0  20000101 2000-01-01 00:00:00    1       1st   Saturday           1   \n",
        "1  20000102 2000-01-02 00:00:00    2       2nd     Sunday           1   \n",
        "2  20000103 2000-01-03 00:00:00    3       3rd     Monday           1   \n",
        "3  20000104 2000-01-04 00:00:00    4       4th    Tuesday           1   \n",
        "4  20000105 2000-01-05 00:00:00    5       5th  Wednesday           1   \n",
        "\n",
        "   DayOfYear  WeekOfYear  WeekOfMonth  Month MonthName  Quarter QuarterName  \\\n",
        "0          1           1            1      1   January        1       First   \n",
        "1          2           2            2      1   January        1       First   \n",
        "2          3           2            2      1   January        1       First   \n",
        "3          4           2            2      1   January        1       First   \n",
        "4          5           2            2      1   January        1       First   \n",
        "\n",
        "   Year        StandardDate     HolidayText  \n",
        "0  2000 2000-01-01 00:00:00  New Year's Day  \n",
        "1  2000 2000-01-02 00:00:00            None  \n",
        "2  2000 2000-01-03 00:00:00            None  \n",
        "3  2000 2000-01-04 00:00:00            None  \n",
        "4  2000 2000-01-05 00:00:00            None  "
       ]
      }
     ],
     "prompt_number": 7
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**Author:** [David Rojas LLC](http://hdrojas.pythonanywhere.com/)  "
     ]
    }
   ],
   "metadata": {}
  }
 ]
}
Tip: Filter by directory path e.g. /media app.js to search for public/media/app.js.
Tip: Use camelCasing e.g. ProjME to search for ProjectModifiedEvent.java.
Tip: Filter by extension type e.g. /repo .js to search for all .js files in the /repo directory.
Tip: Separate your search with spaces e.g. /ssh pom.xml to search for src/ssh/pom.xml.
Tip: Use ↑ and ↓ arrow keys to navigate and return to view the file.
Tip: You can also navigate files with Ctrl+j (next) and Ctrl+k (previous) and view the file with Ctrl+o.
Tip: You can also navigate files with Alt+j (next) and Alt+k (previous) and view the file with Alt+o.