Commits

Anonymous committed 3144d50

homework 5 added

Comments (0)

Files changed (10)

codes/homework5/Makefile

+
+OBJECTS = functions.o quadrature.o test.o
+FFLAGS = -fopenmp
+
+.PHONY: test clean 
+
+test: test.exe
+	./test.exe
+
+test.exe: $(OBJECTS)
+	gfortran $(FFLAGS) $(OBJECTS) -o test.exe
+
+%.o : %.f90
+	gfortran $(FFLAGS) -c  $< 
+
+clean:
+	rm -f *.o *.exe *.mod
+

codes/homework5/README.txt

+
+Sample code to start with for Homework 5.
+
+For convenience a Makefile is provided; type
+   $ make test
+
+to compile and run.
+
+A notebook describing Simpson's rule is in the notebook subdirectory, in
+various formats.
+

codes/homework5/functions.f90

+
+module functions
+
+    use omp_lib
+    implicit none
+    integer :: fevals(0:7)
+    real(kind=8) :: k
+    save
+
+contains
+
+    real(kind=8) function f(x)
+        implicit none
+        real(kind=8), intent(in) :: x 
+        integer :: thread_num
+
+        ! keep track of number of function evaluations by
+        ! each thread:
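+        ! (each thread increments only its own entry of fevals,
+        ! so no synchronization is needed)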
+        thread_num = 0   ! serial mode
+        !$ thread_num = omp_get_thread_num()
+        fevals(thread_num) = fevals(thread_num) + 1
+        
+        f = 1.d0 + x**3 + sin(k*x)
+        
+    end function f
+
+end module functions

codes/homework5/notebook/quadrature2.ipynb

+{
+ "metadata": {
+  "name": "quadrature2"
+ },
+ "nbformat": 3,
+ "nbformat_minor": 0,
+ "worksheets": [
+  {
+   "cells": [
+    {
+     "cell_type": "heading",
+     "level": 1,
+     "metadata": {},
+     "source": [
+      "Numerical Quadrature"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "Numerical quadrature refers to approximating a definite integral numerically, \n",
+      "$$~~ \\int_a^b f(x) dx.$$\n",
+      "Many numerical analysis textbooks describe a variety of quadrature methods or \"rules\".  "
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "First define a simple function for which we know the exact answer:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "def f1(x):\n",
+      "    return 1. + x**3\n",
+      "\n",
+      "a1 = 0.\n",
+      "b1 = 2.\n",
+      "int_true1 = (b1-a1) + (b1**4 -a1**4) / 4.\n",
+      "print \"true integral: %22.14e\" % int_true1"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": []
+    },
+    {
+     "cell_type": "heading",
+     "level": 2,
+     "metadata": {},
+     "source": [
+      "The Trapezoid Rule"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "We will first look at the Trapezoid method.  This method is implemented by evaluating the function at $n$ points and then computing the areas of the trapezoids defined by a piecewise linear approximation to the original function defined by these points.  In the figure below, we are approximating the integral of the blue curve by the sum of the areas of the red trapezoids."
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "def plot_trap(f,a,b,n):\n",
+      "    x = linspace(a-0.2, b+0.2, 10000) # points for smooth plot\n",
+      "    plot(x,f(x),'b-')\n",
+      "    xj = linspace(a,b,n)\n",
+      "    plot(xj,f(xj),'ro-')\n",
+      "    for xi in xj:\n",
+      "        plot([xi,xi], [0,f(xi)], 'r')\n",
+      "    plot([a,b], [0,0], 'r') # along x-axis\n",
+      "\n",
+      "plot_trap(f1,a1,b1,5)"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": []
+    },
+    {
+     "cell_type": "heading",
+     "level": 3,
+     "metadata": {},
+     "source": [
+      "The Trapezoid rule formula"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "The area of a single trapezoid is the width of the base times the average height, so between points $x_j$ and $x_{j+1}$ this gives:\n",
+      "$$ \\frac{h}{2} (f(x_j) + f(x_{j+1}). $$\n",
+      "\n",
+      "Summing this up over all the trapezoids gives:\n",
+      "$$ h\\left(\\frac 1 2 f(x_0) + f(x_1) + f(x_2) + \\cdots + f(x_{n-2}) + \\frac 1 2 f(x_{n-1})\\right) = h\\sum_{j=0}^{n-1} f(x_j) - \\frac h 2 \\left(f(x_0) + f(x_{n-1})\\right) =  h\\sum_{j=0}^{n-1} f(x_j) - \\frac h 2 \\left(f(a) + f(b))\\right). $$\n",
+      "\n",
+      "This can be implemented as follows (note that in Python fj[-1] refers to the last element of fj, and similarly fj[-2] would be the next to last element)."
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "def trapezoid(f,a,b,n):\n",
+      "    h = (b-a)/(n-1)\n",
+      "    xj = linspace(a,b,n)\n",
+      "    fj = f(xj)\n",
+      "    int_trapezoid = h*sum(fj) - 0.5*h*(fj[0] + fj[-1])\n",
+      "    return int_trapezoid\n"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": []
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "We can test it out for the points used in the figure above:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "n = 5\n",
+      "int_trap = trapezoid(f1,a1,b1,n)\n",
+      "error = abs(int_trap - int_true1)\n",
+      "print \"trapezoid rule approximation: %22.14e,  error: %10.3e\" % (int_trap, error)"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": []
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "Using more points will give a better approximation, try changing it in the cell above."
+     ]
+    },
+    {
+     "cell_type": "heading",
+     "level": 3,
+     "metadata": {},
+     "source": [
+      "Convergence tests"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "If we increase n, the number of points used, and hence decrease h, the spacing between points, we expect the error to converge to zero for reasonable functions $f(x)$.\n",
+      "\n",
+      "The trapezoid rule is \"second order accurate\", meaning that the error goes to zero like $O(h^2)$ for a function that is sufficiently smooth (for example if its second derivative is continuous).  For small $h$, the error is expected to be behave like $Ch^2 + O(h^3)~$ as $h$ goes to zero, where $C$ is some constant that depends on how smooth $h$ is.  \n",
+      "\n",
+      "If we double n (and halve h) then we expect the error to go down by a factor of 4 roughly (from $Ch^2$ to $C(h/2)^2~$).\n",
+      "\n",
+      "We can check this by trying several values of n and making a table of the errors and the ratio from one n to the next:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "def error_table(f,a,b,nvals,int_true,method=trapezoid):\n",
+      "    \"\"\"\n",
+      "    An improved version that takes the function defining the method as an\n",
+      "    input argument.\n",
+      "    \"\"\"\n",
+      "    print \"      n         approximation        error       ratio\"\n",
+      "    last_error = 0.  # need something for first ratio\n",
+      "    for n in nvals:\n",
+      "        int_approx = method(f,a,b,n)\n",
+      "        error = abs(int_approx - int_true)\n",
+      "        ratio = last_error / error\n",
+      "        last_error = error # for next n\n",
+      "        print \"%8i  %22.14e  %10.3e  %10.3e\" % (n,int_approx, error, ratio)\n",
+      "    \n",
+      "nvals = array([5, 10, 20, 40, 80, 160, 320])\n",
+      "error_table(f1,a1,b1,nvals,int_true1,trapezoid)"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": []
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "(Note that the first ratio reported is meaningless.)\n",
+      "\n",
+      "Convergence might be easier to see in a plot.  If a method is p'th order accurate then we expect the error to behave like $E\\approx Ch^p$ for some constant $C$, for small $h$.  This is hard to visualize.  It is much easier to see what order accuracy we are achieving if we produce a log-log plot instead, since $E = Ch^p~$ means that $\\log E = \\log C + p\\log h$ \n",
+      "\n",
+      "In other words $\\log E~$ is a linear function of $\\log h~$."
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "def error_plot(f,a,b,nvals,int_true,method=trapezoid):\n",
+      "    errvals = zeros(nvals.shape)  # initialize to right shape\n",
+      "    for i in range(len(nvals)):\n",
+      "        n = nvals[i]\n",
+      "        int_approx = method(f,a,b,n)\n",
+      "        error = abs(int_approx - int_true)\n",
+      "        errvals[i] = error\n",
+      "    hvals = (b - a) / (nvals - 1)  # vector of h values for each n\n",
+      "    loglog(hvals,errvals, 'o-')\n",
+      "    xlabel('spacing h')\n",
+      "    ylabel('error')\n",
+      "    \n",
+      "error_plot(f1,a1,b1,nvals,int_true1,trapezoid)"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": []
+    },
+    {
+     "cell_type": "heading",
+     "level": 3,
+     "metadata": {},
+     "source": [
+      "An oscillatory function"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "If the function $f(x)$ is not as smooth (has larger second derivative at various places) then the accuracy with a small number of points will not be nearly as good.  For example, consider the function $f_2(x) = 1 + x^3 + \\sin(kx)~~~$ where $k$ is a parameter.  For large $k$ this function is very oscillatory.  In order to experiment with different values of $k$, we can define a \"function factory\" that creates this function for any given $k$, and also returns the true integral over a given interval:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "def f2_factory(k, a, b):\n",
+      "    def f2(x):\n",
+      "        return 1 + x**3 + sin(k*x)\n",
+      "    int_true = (b-a) + (b**4 - a**4) / 4. - (1./k) * (cos(k*b) - cos(k*a))\n",
+      "    return f2, int_true\n",
+      "    "
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": []
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "First create a version of $f_2$ with $k=50$:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "k = 50.\n",
+      "a2 = 0.\n",
+      "b2 = 2.\n",
+      "f2, int_true2 = f2_factory(k, a2, b2)\n",
+      "print \"true integral: %22.14e\" % int_true2"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": []
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "For this function with k=50, using n=10 points is not going to give a very good approximation:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "plot_trap(f2,a2,b2,10)"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": []
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "This doesn't look very good, but for larger values of $n$ we still see the expected convergence rate:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "error_plot(f2,a2,b2,nvals,int_true2)"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": []
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "Now make the function much more oscillatory with a larger value of $k$..."
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "k = 1000.\n",
+      "f2, int_true2 = f2_factory(k,a2,b2)\n",
+      "print \"true integral: %22.14e\" % int_true2"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": []
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "For the previous choice of nvals the method does not seem to be doing well:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "nvals = array([5, 10, 20, 40, 80, 160, 320])\n",
+      "print \"nvals = \",nvals\n",
+      "error_plot(f2,a2,b2,nvals,int_true2, trapezoid)"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": []
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "In this case the $O(h^2)~$ behavior does not become apparent unless we use much smaller $h$ values so that we are resolving the oscillations:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "nvals = array([5 * 2**i for i in range(12)])\n",
+      "print \"nvals = \",nvals\n",
+      "error_plot(f2,a2,b2,nvals,int_true2,trapezoid)"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": []
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "Eventually we see second order convergence and ratios that approach 4:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "error_table(f2,a2,b2,nvals,int_true2,trapezoid)"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": []
+    },
+    {
+     "cell_type": "heading",
+     "level": 2,
+     "metadata": {},
+     "source": [
+      "Simpson's Rule"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "There are much better methods than the Trapezoidal rule that are not much harder to implement but get much smaller errors with the same number of function evaluations. One such method is Simpson\u2019s rule, which approximates the integral over a single interval from $x_i$ to $x_{i+1}$ by\n",
+      "$$\\int_{x_i}^{x_{i+1}} f(x)\\, dx \\approx \\frac h 6 (f(x_i) + 4f(x_{i+1/2}) + f(x_{i+1})),$$\n",
+      "where $x_{i+1/2} = \\frac 1 2 (x_i + x_{i+1}) = x_i + h/2.$\n",
+      "\n",
+      "Derivation: The trapezoid method is derived by approximating the function on each interval by a linear function interpolating at the two endpoints of each interval and then integrating this linear function.  Simpson's method is derived by approximating the function by a quadratic function interpolating at the endpoints and the center of the interval and integrating this quadratic function."
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "Adding this up over $n-1$  intervals gives the approximation\n",
+      "$$\\frac{h}{6}[f(x_0) + 4f(x_{1/2}) + 2f(x_1) + 4f(x_{3/2}) + 2f(x_2) + \\cdots + 2f(x_{n-2}) + 4f(x_{n-3/2}) + f(x_{n-1})].$$\n",
+      "In Python this can be implemented by the following code:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "def simpson(f,a,b,n):\n",
+      "    h = (b-a)/(n-1)\n",
+      "    xj = linspace(a,b,n)\n",
+      "    fj = f(xj)\n",
+      "    xc = linspace(a+h/2,b-h/2,n-1)  # midpoints of cells\n",
+      "    fc = f(xc)\n",
+      "    int_simpson = (h/6.) * (2.*sum(fj) - (fj[0] + fj[-1]) + 4.*sum(fc))\n",
+      "    return int_simpson"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": []
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "This method is 4th order accurate, which means that on fine enough grids the error is proportional to \\Delta x^4. Hence increasing n by a factor of 2 should decrease the error by a factor of 2^4 = 16.  Let's try it on the last function we were experimenting with:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "k = 1000.\n",
+      "f2, int_true2 = f2_factory(k,a2,b2)\n",
+      "print \"true integral: %22.14e\" % int_true2\n",
+      "\n",
+      "error_table(f2,a2,b2,nvals,int_true2,simpson)"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": []
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "Note that the errors get smaller much faster and the ratio approaches 16.  The improvement over the trapezoid method is seen more clearly if we plot the errors together:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "error_plot(f2,a2,b2,nvals,int_true2,trapezoid)\n",
+      "error_plot(f2,a2,b2,nvals,int_true2,simpson)"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": []
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "You might want to experiment with changing $k$ in the two cells above."
+     ]
+    },
+    {
+     "cell_type": "heading",
+     "level": 4,
+     "metadata": {},
+     "source": [
+      "Simpson's method integrates cubic functions exactly"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "Even though Simpson'e method is derived by integrating a quadratic approximation of the function, rather than linear as with the Trapezoid Rule, in fact it also integrates a cubic exactly, as seen if we try it out with the function f1 defined at the top of this notebook.  (This is because the error between the cubic and the quadratic approximation on each interval is not zero but does have integral equal to zero since it turns out to be an odd function about the midpoint.)  For this reason Simpson's Rule is fourth order accurate in general rather than only third order, as one might expect when going from a linear to quadratic approximation.\n",
+      "\n",
+      "Note the error ratios are whacky as a result."
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "error_table(f1,a1,b1,nvals,int_true1,simpson)"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": []
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [],
+     "language": "python",
+     "metadata": {},
+     "outputs": []
+    }
+   ],
+   "metadata": {}
+  }
+ ]
+}

codes/homework5/notebook/quadrature2.pdf

Binary file added.

codes/homework5/notebook/quadrature2.py

+
+# Saved from the notebook and then edited a bit by hand, e.g. added
+# from pylab import *
+
+# As is, this will draw all the plots on top of one another, so you might
+# want to add figure() commands before each plot to create a new figure,
+# and savefig(filename) afterwards if you want to save them.
+
+#------------------------------------
+
+
+# -*- coding: utf-8 -*-
+# <nbformat>3.0</nbformat>
+
+# <headingcell level=1>
+
+# Numerical Quadrature
+
+# <markdowncell>
+
+# Numerical quadrature refers to approximating a definite integral numerically, 
+# $$~~ \int_a^b f(x) dx.$$
+# Many numerical analysis textbooks describe a variety of quadrature methods or "rules".  
+
+# <markdowncell>
+
+# First define a simple function for which we know the exact answer:
+
+# <codecell>
+
+from pylab import *  # added by hand
+
+
+def f1(x):
+    return 1. + x**3
+
+a1 = 0.
+b1 = 2.
+int_true1 = (b1-a1) + (b1**4 -a1**4) / 4.
+print "true integral: %22.14e" % int_true1
+
+# <headingcell level=2>
+
+# The Trapezoid Rule
+
+# <markdowncell>
+
+# We will first look at the Trapezoid method.  This method is implemented by evaluating the function at $n$ points and then computing the areas of the trapezoids defined by a piecewise linear approximation to the original function defined by these points.  In the figure below, we are approximating the integral of the blue curve by the sum of the areas of the red trapezoids.
+
+# <codecell>
+
+def plot_trap(f,a,b,n):
+    x = linspace(a-0.2, b+0.2, 10000) # points for smooth plot
+    plot(x,f(x),'b-')
+    xj = linspace(a,b,n)
+    plot(xj,f(xj),'ro-')
+    for xi in xj:
+        plot([xi,xi], [0,f(xi)], 'r')
+    plot([a,b], [0,0], 'r') # along x-axis
+
+plot_trap(f1,a1,b1,5)
+
+# <headingcell level=3>
+
+# The Trapezoid rule formula
+
+# <markdowncell>
+
+# The area of a single trapezoid is the width of the base times the average height, so between points $x_j$ and $x_{j+1}$ this gives:
+# $$ \frac{h}{2} (f(x_j) + f(x_{j+1})). $$
+# 
+# Summing this up over all the trapezoids gives:
+# $$ h\left(\frac 1 2 f(x_0) + f(x_1) + f(x_2) + \cdots + f(x_{n-2}) + \frac 1 2 f(x_{n-1})\right) = h\sum_{j=0}^{n-1} f(x_j) - \frac h 2 \left(f(x_0) + f(x_{n-1})\right) =  h\sum_{j=0}^{n-1} f(x_j) - \frac h 2 \left(f(a) + f(b)\right). $$
+# 
+# This can be implemented as follows (note that in Python fj[-1] refers to the last element of fj, and similarly fj[-2] would be the next to last element).
+
+# <codecell>
+
+def trapezoid(f,a,b,n):
+    h = (b-a)/(n-1)
+    xj = linspace(a,b,n)
+    fj = f(xj)
+    int_trapezoid = h*sum(fj) - 0.5*h*(fj[0] + fj[-1])
+    return int_trapezoid
+
+# <markdowncell>
+
+# We can test it out for the points used in the figure above:
+
+# <codecell>
+
+n = 5
+int_trap = trapezoid(f1,a1,b1,n)
+error = abs(int_trap - int_true1)
+print "trapezoid rule approximation: %22.14e,  error: %10.3e" % (int_trap, error)
+
+# <markdowncell>
+
+# Using more points will give a better approximation; try changing $n$ in the cell above.
+
+# <headingcell level=3>
+
+# Convergence tests
+
+# <markdowncell>
+
+# If we increase n, the number of points used, and hence decrease h, the spacing between points, we expect the error to converge to zero for reasonable functions $f(x)$.
+# 
+# The trapezoid rule is "second order accurate", meaning that the error goes to zero like $O(h^2)$ for a function that is sufficiently smooth (for example if its second derivative is continuous).  For small $h$, the error is expected to behave like $Ch^2 + O(h^3)~$ as $h$ goes to zero, where $C$ is some constant that depends on how smooth $f$ is.
+# 
+# If we double n (and halve h) then we expect the error to go down by a factor of 4 roughly (from $Ch^2$ to $C(h/2)^2~$).
+# 
+# We can check this by trying several values of n and making a table of the errors and the ratio from one n to the next:
+
+# <codecell>
+
+def error_table(f,a,b,nvals,int_true,method=trapezoid):
+    """
+    An improved version that takes the function defining the method as an
+    input argument.
+    """
+    print "      n         approximation        error       ratio"
+    last_error = 0.  # need something for first ratio
+    for n in nvals:
+        int_approx = method(f,a,b,n)
+        error = abs(int_approx - int_true)
+        ratio = last_error / error
+        last_error = error # for next n
+        print "%8i  %22.14e  %10.3e  %10.3e" % (n,int_approx, error, ratio)
+    
+nvals = array([5, 10, 20, 40, 80, 160, 320])
+error_table(f1,a1,b1,nvals,int_true1,trapezoid)
+
+# <markdowncell>
+
+# (Note that the first ratio reported is meaningless.)
+# 
+# Convergence might be easier to see in a plot.  If a method is p'th order accurate then we expect the error to behave like $E\approx Ch^p$ for some constant $C$, for small $h$.  This is hard to visualize.  It is much easier to see what order of accuracy we are achieving if we produce a log-log plot instead, since $E = Ch^p~$ means that $\log E = \log C + p\log h$.
+# 
+# In other words $\log E~$ is a linear function of $\log h~$.
+
+# <codecell>
+
+def error_plot(f,a,b,nvals,int_true,method=trapezoid):
+    errvals = zeros(nvals.shape)  # initialize to right shape
+    for i in range(len(nvals)):
+        n = nvals[i]
+        int_approx = method(f,a,b,n)
+        error = abs(int_approx - int_true)
+        errvals[i] = error
+    hvals = (b - a) / (nvals - 1)  # vector of h values for each n
+    loglog(hvals,errvals, 'o-')
+    xlabel('spacing h')
+    ylabel('error')
+    
+error_plot(f1,a1,b1,nvals,int_true1,trapezoid)
+
+# <headingcell level=3>
+
+# An oscillatory function
+
+# <markdowncell>
+
+# If the function $f(x)$ is not as smooth (has larger second derivative at various places) then the accuracy with a small number of points will not be nearly as good.  For example, consider the function $f_2(x) = 1 + x^3 + \sin(kx)~~~$ where $k$ is a parameter.  For large $k$ this function is very oscillatory.  In order to experiment with different values of $k$, we can define a "function factory" that creates this function for any given $k$, and also returns the true integral over a given interval:
+
+# <codecell>
+
+def f2_factory(k, a, b):
+    def f2(x):
+        return 1 + x**3 + sin(k*x)
+    int_true = (b-a) + (b**4 - a**4) / 4. - (1./k) * (cos(k*b) - cos(k*a))
+    return f2, int_true
+    
+
+# <markdowncell>
+
+# First create a version of $f_2$ with $k=50$:
+
+# <codecell>
+
+k = 50.
+a2 = 0.
+b2 = 2.
+f2, int_true2 = f2_factory(k, a2, b2)
+print "true integral: %22.14e" % int_true2
+
+# <markdowncell>
+
+# For this function with k=50, using n=10 points is not going to give a very good approximation:
+
+# <codecell>
+
+plot_trap(f2,a2,b2,10)
+
+# <markdowncell>
+
+# This doesn't look very good, but for larger values of $n$ we still see the expected convergence rate:
+
+# <codecell>
+
+error_plot(f2,a2,b2,nvals,int_true2)
+
+# <markdowncell>
+
+# Now make the function much more oscillatory with a larger value of $k$...
+
+# <codecell>
+
+k = 1000.
+f2, int_true2 = f2_factory(k,a2,b2)
+print "true integral: %22.14e" % int_true2
+
+# <markdowncell>
+
+# For the previous choice of nvals the method does not seem to be doing well:
+
+# <codecell>
+
+nvals = array([5, 10, 20, 40, 80, 160, 320])
+print "nvals = ",nvals
+error_plot(f2,a2,b2,nvals,int_true2, trapezoid)
+
+# <markdowncell>
+
+# In this case the $O(h^2)~$ behavior does not become apparent unless we use much smaller $h$ values so that we are resolving the oscillations:
+
+# <codecell>
+
+nvals = array([5 * 2**i for i in range(12)])
+print "nvals = ",nvals
+error_plot(f2,a2,b2,nvals,int_true2,trapezoid)
+
+# <markdowncell>
+
+# Eventually we see second order convergence and ratios that approach 4:
+
+# <codecell>
+
+error_table(f2,a2,b2,nvals,int_true2,trapezoid)
+
+# <headingcell level=2>
+
+# Simpson's Rule
+
+# <markdowncell>
+
+# There are much better methods than the Trapezoidal rule that are not much harder to implement but get much smaller errors with the same number of function evaluations. One such method is Simpson’s rule, which approximates the integral over a single interval from $x_i$ to $x_{i+1}$ by
+# $$\int_{x_i}^{x_{i+1}} f(x)\, dx \approx \frac h 6 (f(x_i) + 4f(x_{i+1/2}) + f(x_{i+1})),$$
+# where $x_{i+1/2} = \frac 1 2 (x_i + x_{i+1}) = x_i + h/2.$
+# 
+# Derivation: The trapezoid method is derived by approximating the function on each interval by a linear function interpolating at the two endpoints of each interval and then integrating this linear function.  Simpson's method is derived by approximating the function by a quadratic function interpolating at the endpoints and the center of the interval and integrating this quadratic function.
+
+# <markdowncell>
+
+# Adding this up over $n-1$  intervals gives the approximation
+# $$\frac{h}{6}[f(x_0) + 4f(x_{1/2}) + 2f(x_1) + 4f(x_{3/2}) + 2f(x_2) + \cdots + 2f(x_{n-2}) + 4f(x_{n-3/2}) + f(x_{n-1})].$$
+# In Python this can be implemented by the following code:
+
+# <codecell>
+
+def simpson(f,a,b,n):
+    h = (b-a)/(n-1)
+    xj = linspace(a,b,n)
+    fj = f(xj)
+    xc = linspace(a+h/2,b-h/2,n-1)  # midpoints of cells
+    fc = f(xc)
+    int_simpson = (h/6.) * (2.*sum(fj) - (fj[0] + fj[-1]) + 4.*sum(fc))
+    return int_simpson
+
+# <markdowncell>
+
+# This method is 4th order accurate, which means that on fine enough grids the error is proportional to $h^4$. Hence increasing n by a factor of 2 should decrease the error by a factor of $2^4 = 16$.  Let's try it on the last function we were experimenting with:
+
+# <codecell>
+
+k = 1000.
+f2, int_true2 = f2_factory(k,a2,b2)
+print "true integral: %22.14e" % int_true2
+
+error_table(f2,a2,b2,nvals,int_true2,simpson)
+
+# <markdowncell>
+
+# Note that the errors get smaller much faster and the ratio approaches 16.  The improvement over the trapezoid method is seen more clearly if we plot the errors together:
+
+# <codecell>
+
+error_plot(f2,a2,b2,nvals,int_true2,trapezoid)
+error_plot(f2,a2,b2,nvals,int_true2,simpson)
+
+# <markdowncell>
+
+# You might want to experiment with changing $k$ in the two cells above.
+
+# <headingcell level=4>
+
+# Simpson's method integrates cubic functions exactly
+
+# <markdowncell>
+
+# Even though Simpson's method is derived by integrating a quadratic approximation of the function, rather than linear as with the Trapezoid Rule, in fact it also integrates a cubic exactly, as seen if we try it out with the function f1 defined at the top of this notebook.  (This is because the error between the cubic and the quadratic approximation on each interval is not zero but does have integral equal to zero, since it turns out to be an odd function about the midpoint.)  For this reason Simpson's Rule is fourth order accurate in general rather than only third order, as one might expect when going from a linear to a quadratic approximation.
+# 
+# Note that the error ratios are erratic as a result: the errors are already at the level of rounding error.
+
+# <codecell>
+
+error_table(f1,a1,b1,nvals,int_true1,simpson)
+
+# <codecell>
+
+

codes/homework5/quadrature.f90

+
+module quadrature
+
+    use omp_lib
+
+contains
+
+real(kind=8) function trapezoid(f, a, b, n)
+
+    ! Estimate the integral of f(x) from a to b using the
+    ! Trapezoid Rule with n points.
+
+    ! Input:
+    !   f:  the function to integrate
+    !   a:  left endpoint
+    !   b:  right endpoint
+    !   n:  number of points to use
+    ! Returns:
+    !   the estimate of the integral
+     
+    implicit none
+    real(kind=8), intent(in) :: a,b
+    real(kind=8), external :: f
+    integer, intent(in) :: n
+
+    ! Local variables:
+    integer :: j
+    real(kind=8) :: h, trap_sum, xj
+
+    h = (b-a)/(n-1)
+    trap_sum = 0.5d0*(f(a) + f(b))  ! endpoint contributions
+    
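+    ! each thread accumulates a private partial copy of trap_sum (the
+    ! reduction clause); the partial sums are combined when the loop ends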
+    !$omp parallel do private(xj) reduction(+ : trap_sum) 
+    do j=2,n-1
+        xj = a + (j-1)*h
+        trap_sum = trap_sum + f(xj)
+        enddo
+
+    trapezoid = h * trap_sum
+
+end function trapezoid
+
+
+subroutine error_table(f,a,b,nvals,int_true,method)
+
+    ! Compute and print out a table of errors when the quadrature
+    ! rule specified by the input function method is applied for
+    ! each value of n in the array nvals.
+
+    implicit none
+    real(kind=8), intent(in) :: a,b, int_true
+    real(kind=8), external :: f, method
+    integer, dimension(:), intent(in) :: nvals
+
+    ! Local variables:
+    integer :: j, n
+    real(kind=8) :: ratio, last_error, error, int_approx
+
+    print *, "      n         approximation        error       ratio"
+    last_error = 0.d0   
+    do j=1,size(nvals)
+        n = nvals(j)
+        int_approx = method(f,a,b,n)
+        error = abs(int_approx - int_true)
+        ratio = last_error / error
+        last_error = error  ! for next n
+
+        print 11, n, int_approx, error, ratio
+ 11     format(i8, es22.14, es13.3, es13.3)
+        enddo
+
+end subroutine error_table
+
+
+end module quadrature
+

codes/homework5/test.f90

+
+program test
+
+    use omp_lib
+
+    use quadrature, only: trapezoid, error_table
+    use functions, only: f, fevals, k
+
+    implicit none
+    real(kind=8) :: a,b,int_true
+    integer :: nvals(12)
+    integer :: i, nthreads
+
+    real(kind=8) :: t1, t2, elapsed_time
+    integer(kind=8) :: tclock1, tclock2, clock_rate
+
+    nthreads = 1      ! for serial mode
+    !$ nthreads = 4   ! for openmp
+    !$ call omp_set_num_threads(nthreads)
+    print 100, nthreads
+100 format("Using ",i2," threads")
+
+    fevals = 0
+
+    k = 1.d3   ! functions module variable for function f2
+    a = 0.d0
+    b = 2.d0
+    int_true = (b-a) + (b**4 - a**4) / 4.d0 - (1.d0/k) * (cos(k*b) - cos(k*a))
+
+    print 10, int_true
+ 10 format("true integral: ", es22.14)
+    print *, " "  ! blank line
+
+    ! values of n to test:   (larger values than before)
+    do i=1,12
+        nvals(i) = 50 * 2**(i-1)
+        enddo
+
+    ! time the call to error_table:
+    call system_clock(tclock1)  
+    call cpu_time(t1)
+    call error_table(f, a, b, nvals, int_true, trapezoid)
+    call cpu_time(t2)   
+    call system_clock(tclock2, clock_rate)
+
+    elapsed_time = real(tclock2 - tclock1, kind=8) / real(clock_rate, kind=8)
+    print *, " "
+    print 11, elapsed_time
+ 11 format("Elapsed time = ",f12.8, " seconds")
+
+    print 12, t2-t1
+ 12 format("CPU time = ",f12.8, " seconds")
+
+    
+    ! print the number of function evaluations by each thread:
+    do i=0,nthreads-1
+        print 101,  i, fevals(i)
+101     format("fevals by thread ",i2,": ",i13)
+        enddo
+
+    print 102, sum(fevals)
+102 format("Total number of fevals: ",i10)
+
+end program test

notes/homework5.rst

+
+.. _homework5:
+
+==========================================
+Homework 5 
+==========================================
+
+
+Due Wednesday, May 22, 2013, by 11:00pm PDT.
+
+The goals of this homework are to:
+
+* Get more experience with Fortran and OpenMP.
+* Experiment with coarse grain parallelism.
+* Learn a bit more about quadrature.
+
+#. I wrote an article recently urging applied mathematicians to share 
+   research code, which appeared in SIAM News.
+   Some colleagues at other universities have told me it's required
+   reading for their students, so of course I have to assign it too!  
+   It's pretty light reading.   
+   A short quiz will be available on the Canvas page.
+
+     * `<http://www.siam.org/news/news.php?id=2064>`_
+     * `some related links  <http://faculty.washington.edu/rjl/pubs/topten/index.html>`_
+
+
+#. The IPython notebook `$UWHPSC/codes/homework5/notebook/quadrature2.ipynb`
+   is an improved version of the notebook from the last homework.  Some of
+   the functions have been made more general, and a discussion of Simpson's
+   Rule, a more accurate formula than the Trapezoid rule, has been added.
+
+   This notebook is best viewed live so that you can experiment with
+   changing things in order to explore these examples.  If you have a
+   sufficiently recent version of the IPython notebook installed (see
+   :ref:`ipython_notebook`) then you should be able to do::
+
+        $ cd $UWHPSC/codes/homework5/notebook
+        $ ipython notebook --pylab inline 
+
+   and then click on the `quadrature2` notebook.
+
+   I will also try to post it on wakari, but that seems to be down
+   currently.
+
+   .. comment ::
+       You can also view it at `wakari <?>`_ 
+       and if you have created a Wakari account, download it to your account to
+       try it out.
+
+   You can also view a static version of the notebook, which is in 
+   `$UWHPSC/codes/homework5/notebook/quadrature2.pdf`.
+
+   There is also a Python script version of the code in the notebook at
+   `$UWHPSC/codes/homework5/notebook/quadrature2.py` if you
+   find that easier to experiment with. 
+   (But see the comments at the top of that file.)
+
+   Experiment with the notebook or the script and study the code to make
+   sure you understand the material presented.
+   
+#. The directory `$UWHPSC/codes/homework5/` also contains Fortran code
+   that implements the last part of homework 4, with some added
+   enhancements.  In particular:
+
+   * timing has been added
+   * a counter has been added to the f2 function to count how many times it
+     is called, and this is printed at the end.
+   * fevals is a module variable (shared between threads) to store the
+     number of function calls by each thread when OpenMP is used.
+   * the `error_table` subroutine has a new input parameter `method` to
+     pass in the function that approximates the integral.  When this is
+     called from the main program `test.f90` the function `trapezoid` is
+     passed in.  In this homework you will also test Simpson's rule.
+   * The function `f2` has been moved to a module (where it is named `f`)
+     and the parameter `k` is a module variable.
+     
+
+   Study this code and experiment with it.
+
+   With 4 threads it might produce something like the following (timings
+   will depend on how many cores you have)::
+        
+        Using  4 threads
+        true integral:   6.00136745954910E+00
+          
+               n         approximation        error       ratio
+              50  6.00200615142458E+00    6.387E-04    0.000E+00
+             100  6.01762134207395E+00    1.625E-02    3.929E-02
+             200  5.99787907396672E+00    3.488E-03    4.659E+00
+             400  5.99537682567465E+00    5.991E-03    5.823E-01
+             800  6.00057196798962E+00    7.955E-04    7.531E+00
+            1600  6.00118591794817E+00    1.815E-04    4.382E+00
+            3200  6.00132301603504E+00    4.444E-05    4.085E+00
+            6400  6.00135640717690E+00    1.105E-05    4.021E+00
+           12800  6.00136470029559E+00    2.759E-06    4.006E+00
+           25600  6.00136677000209E+00    6.895E-07    4.002E+00
+           51200  6.00136728718235E+00    1.724E-07    4.000E+00
+          102400  6.00136741645906E+00    4.309E-08    4.000E+00
+          
+        Elapsed time =   0.00554200 seconds
+        CPU time =   0.01890300 seconds
+        fevals by thread  0:         51211
+        fevals by thread  1:         51187
+        fevals by thread  2:         51187
+        fevals by thread  3:         51165
+        Total number of fevals:     204750
+        
+
+   You do not need to submit anything for this part.
+
+#. Create a new subdirectory `$MYHPSC/homework5` for the code you write
+   below.  You can use the code provided as a starting point.
+
+   Create a new module `quadrature2.f90` by starting with `quadrature.f90`
+   and adding a new function `simpson` that
+   implements Simpson's rule.  It should have the same input arguments as
+   `trapezoid`.  
+
+   Write a new main program `test2.f90` to test this.
+   Check that it is 4th order accurate on the function `f2`
+   provided with various values of `k`.  Check it also with some other
+   functions if you want, since we will test it with something other than
+   the provided function `f2`.
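+
+   As a starting point, here is one possible shape for such a function
+   (a minimal serial sketch following the Python `simpson` in the
+   notebook; the variable names are only illustrative)::
+
+        real(kind=8) function simpson(f, a, b, n)
+            implicit none
+            real(kind=8), intent(in) :: a, b
+            real(kind=8), external :: f
+            integer, intent(in) :: n
+            integer :: j
+            real(kind=8) :: h, xj, xc, simp_sum
+
+            h = (b-a) / (n-1)
+            simp_sum = f(a) + f(b)      ! endpoint values appear once each
+            do j=2,n-1
+                xj = a + (j-1)*h
+                simp_sum = simp_sum + 2.d0*f(xj)   ! interior nodes count twice
+                enddo
+            do j=1,n-1
+                xc = a + (j-0.5d0)*h               ! midpoint of cell j
+                simp_sum = simp_sum + 4.d0*f(xc)
+                enddo
+            simpson = h * simp_sum / 6.d0
+        end function simpson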
+
+   **Note:** this module should also be called `quadrature2`, not just the
+   file name, i.e. it should start with the line::
+
+        module quadrature2
+
+   and `test2.f90` should::
+
+        use quadrature2, only: ...
+   
+   This is important for grading purposes since we might have a different
+   main program that will `use` your module!
+
+#. Your `simpson` routine should include an `omp parallel do` loop similar
+   to `trapezoid`.  Make sure it gives the same results in the error table
+   both with and without the `-fopenmp` flag during compilation, and for
+   different choices of the number of threads.
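+
+   Following the pattern already used in `trapezoid`, each of the two
+   function-evaluation loops might be prefixed by a directive along these
+   lines (with `xj` or `xc` as appropriate, and assuming the accumulator
+   is named `simp_sum` as in the sketch above)::
+
+        !$omp parallel do private(xj) reduction(+ : simp_sum)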
+
+   Remember that you can run with more threads than your computer has cores
+   and it should still work, but will probably make it run slower rather
+   than faster.  We will not be checking timings although you might want to
+   pay attention to this to see if your computer behaves as expected.
+
+#. Create a new version of the `quadrature` module named `quadrature3` that
+   has no parallel loops in `trapezoid` and instead has a parallel do loop 
+   in the `error_table` routine when it loops over the different values of
+   `n` to test from the `nvals` array.
+
+   In this loop make `last_error` a *firstprivate* variable and think about
+   what other variables need to be *private*.  More about this below.
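+
+   For example, the directive might look something like this (a sketch
+   only; check for yourself which variables belong in each clause)::
+
+        !$omp parallel do firstprivate(last_error) &
+        !$omp             private(n, int_approx, error, ratio)
+        do j=1,size(nvals)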
+
+   Test this version with a new test program `test3.f90` that calls
+   `error_table` with `method = trapezoid`. 
+
+   Note the following:
+
+   * If you run this with more than one thread, the different lines of the
+     error table probably will not print out in the same order as on a
+     single thread.
+   * The values of `ratio` in the table will be wrong relative to the single 
+     thread code for various `n`.  Make sure you understand why.
+     (The values of the `error` should still agree with the single-thread
+     code, however.)
+   * This is not a very good way to try to parallelize this code because
+     it does not have good *load balancing*.  If you run with 2 threads, for
+     example, one of them will do many more function evaluations than the
+     other thread, if you allow OpenMP to split up the values of `n` between
+     threads in the default manner.  Think about why this is so and make
+     sure you understand what's going on.  
+
+#.  Because of the load-balancing issue just mentioned, it is useful to
+    include another clause in the `omp parallel do` loop directive in
+    `error_table`::
+
+        !$omp parallel do ...  &   ! whatever you needed before
+        !$omp          schedule(dynamic)
+        do j=1,size(nvals)
+
+    This instructs the compiler to split up the values of `j` from 1 to
+    `size(nvals)` dynamically rather than deciding in advance that the first
+    half of the values will go to Thread 0 and the second half to Thread 1,
+    for example.  Instead the two threads would start working on `j=1` and
+    `j=2` and whichever finishes first would start on `j=3`.  This should
+    give a somewhat better balance between threads.
+
+    Note that it can't do a perfect job for this example since computing the
+    error for the last value of `j` (the largest value of `n`)
+    takes more function evaluations than all the others put together!
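+
+    (To see why: with `nvals(i) = 50 * 2**(i-1)` for `i = 1, ..., 12`,
+    the largest test uses `n = 102400` function evaluations, while the
+    eleven smaller tests together use only `50 * (2**11 - 1) = 102350`.
+    This also explains the total of 204750 function evaluations in the
+    sample output above.)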
+
+#.  In order to improve load balancing, reorder the parallel loop so that
+    `n` is decreasing rather than increasing via::
+
+            do j=size(nvals),1,-1
+
+    Put this change in your `quadrature3` module and test it again with
+    `test3.f90`.  Think about why this is better.
+
+    In this case you might get results like this::
+        
+        Using  4 threads
+        true integral:   6.00136745954910E+00
+          
+               n         approximation        error       ratio
+           12800  6.00136470029559E+00    2.759E-06    0.000E+00
+            6400  6.00135640717688E+00    1.105E-05    2.497E-01
+           25600  6.00136677000212E+00    6.895E-07    0.000E+00
+            1600  6.00118591794817E+00    1.815E-04    3.798E-03
+            3200  6.00132301603504E+00    4.444E-05    2.487E-01
+             800  6.00057196798962E+00    7.955E-04    2.282E-01
+             400  5.99537682567465E+00    5.991E-03    7.419E-03
+             200  5.99787907396672E+00    3.488E-03    2.280E-01
+             100  6.01762134207395E+00    1.625E-02    3.686E-01
+              50  6.00200615142457E+00    6.387E-04    5.462E+00
+           51200  6.00136728718236E+00    1.724E-07    0.000E+00
+          102400  6.00136741645906E+00    4.309E-08    0.000E+00
+          
+        Elapsed time =   0.00621600 seconds
+        CPU time =   0.01550900 seconds
+        fevals by thread  0:         51200
+        fevals by thread  1:        102400
+        fevals by thread  2:         22600
+        fevals by thread  3:         28550
+        Total number of fevals:     204750
+
+    (Can you guess from this which thread got which values of `n`?)
+    Notice that the table is very much out of order in this case, since lines
+    were printed as threads finished their work.
+
+    One could clean up the table by keeping the approximation and error
+    values for each n in a short array and then printing at the end in 
+    the proper order, along with the correct ratios.  But you don't need
+    to do this for the assignment.
+
+.. warning:: An additional problem for 583 students is still to appear.
+
+To submit
+---------
+
+Your homework5 directory should contain:
+
+* functions.f90   (unchanged)
+* quadrature2.f90
+* test2.f90
+* quadrature3.f90
+* test3.f90
+* Makefile (optional if you find it useful to enhance what's provided)
+
+* Some files for 583...
+
+As usual, commit your results, push to bitbucket, and see the Canvas
+course page for the link to submit the SHA-1 hash code.  These should be 
+submitted by the due date/time to receive full credit.
+
+    

notes/homeworks.rst

  * :ref:`homework2`: Wednesday of Week 3, April 17
  * :ref:`homework3`: Wednesday of Week 5, May 1 
  * :ref:`homework4`: Wednesday of Week 6, May 8
- * Homework 5: Wednesday of Week 8, May 22
+ * :ref:`homework5`: Wednesday of Week 8, May 22
  * Homework 6: Wednesday of Week 9, May 29
 
 There will be a "final project" tentatively due on Wednesday, June 12.