
Chapter Overview

According to the instructor's manual, the main purpose of this chapter is to introduce procedures as a way of describing computational processes.

Section 1.1

The Elements of Programming consist of (see the short example after this list):

  • primitive expressions: represent the simplest entities the language is concerned with
  • means of combination: compound elements are built from simpler ones
  • means of abstraction: compound elements can be named and manipulated as units
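
For example, here are the textbook's opening illustrations of all three elements in Scheme:

#!scheme
486                 ; a primitive expression
(+ 137 349)         ; a combination; => 486
(define size 2)     ; a means of abstraction: name the value 2
(* 5 size)          ; => 10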

1.1.5 Substitution Model
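
Both evaluation traces below assume the textbook's definitions of square and sum-of-squares:

#!scheme
(define (square x) (* x x))

(define (sum-of-squares x y)
  (+ (square x) (square y)))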

Applicative order

#!scheme
(sum-of-squares (+ 5 1) (* 5 2))  ;; to evaluate the expression, first evaluate the operands
(sum-of-squares 6 10)
(+ (square 6) (square 10))
(+ (* 6 6) (* 10 10))
(+ 36 100)
136

Normal order

#!scheme
(sum-of-squares (+ 5 1) (* 5 2))
(+ (square (+ 5 1)) (square (* 5 2)))
(+ (* (+ 5 1) (+ 5 1)) (* (* 5 2) (* 5 2)))
(+ (* 6 6) (* 10 10))
(+ 36 100)
136

Normal order can be viewed as "fully expand and then reduce", whereas applicative order is "evaluate the arguments and then apply". The Scheme interpreter uses applicative-order evaluation.

Exercise 1.5

Exercise 1.5 is a good example showing the difference between applicative order and normal order.

#!scheme
(define (p) (p))
(define (test x y)
    (if (= x 0) 0 y))
Evaluating (test 0 (p)) shows which order the interpreter actually uses. If the interpreter uses applicative order, it falls into an infinite loop; if it uses normal order, it returns 0.

The explanation is as follows. If the interpreter uses applicative order, the evaluation proceeds like this:

#!scheme
(test 0 (p))  ; since (p) evaluates to itself
(test 0 (p))  ; continue
(test 0 (p))
....
(test 0 (p))
If the interpreter works in normal order, the evaluation steps look like this:
#!scheme
(test 0 (p))  ; expand the expression
(if (= 0 0) 0 (p))
(if #t 0 (p))
0
Notice that if in Scheme is actually a special form; a special form is an expression that follows special evaluation rules. For more information, see mit-scheme-special-form.

The explanation above draws on the Instructor's Manual of SICP and this solution.

1.1.7 Square Roots by Newton's Method

The textbook uses Newton's method to compute the square root of a number. The code is as follows:

#!scheme
(define (sqrt-iter guess x)
  (if (good-enough? guess x)
      guess
      (sqrt-iter (improve guess x) x)))

(define (improve guess x)
  (average guess (/ x guess)))

(define (average x y)
  (/ (+ x y) 2))

(define (good-enough? guess x)
  (< (abs (- (square guess) x)) 0.001))

(define (sqrt x)
  (sqrt-iter 1.0 x))

(sqrt 9)
; => 3.00009155413138

Exercise 1.6

Why can the described new-if not replace Scheme's if? Re-implementing sqrt with the new-if described in Exercise 1.6 gives the following result:

#!scheme
(define (new-if predicate then-clause else-clause)
  (cond (predicate then-clause)
    (else else-clause)))

(define (sqrt-iter-bad-if guess x)
  (new-if (good-enough? guess x)
      guess
      (sqrt-iter-bad-if (improve guess x) x)))

(define (sqrt-bad-if x)
  (sqrt-iter-bad-if 1.0 x))

(sqrt-bad-if 9.0)
; => Aborting! maximum recursion depth exceeded

When we evaluate (sqrt-bad-if 9.0), we get maximum recursion depth exceeded. This is because new-if is an ordinary procedure, so predicate, then-clause, and else-clause are all evaluated before being applied to the body of the procedure. Therefore the recursion does not terminate when the guess is good enough: the else-clause (the recursive call) is evaluated every time. But if in Scheme is a special form and does not follow the evaluation rule of ordinary procedures.

According to this page, the if form looks like this:

if predicate consequent [alternative]

predicate, consequent, and alternative are expressions. An if expression is evaluated as follows: first, predicate is evaluated; if it yields a true value, then consequent is evaluated and its value is returned; otherwise alternative is evaluated and its value is returned.

Notice that, unlike an ordinary Scheme procedure call, an if form does not evaluate all of its subexpressions.
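
A quick sketch of the difference, reusing (p) from Exercise 1.5:

#!scheme
(if #t 0 (p))       ; => 0, the alternative (p) is never evaluated
(new-if #t 0 (p))   ; loops forever: (p) is evaluated before new-if is applied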

1.1.8 Procedure as Black-Box Abstraction

This section introduces the basics of the lambda calculus and of scope in Scheme. The lambda calculus is the mathematical basis of much of functional programming.

Lambda calculus definition, from Wikipedia:

  • variables v1, v2, ..., vn, ...
  • the abstraction symbols λ and .
  • parentheses ( )

The set of lambda expressions, Λ, can be defined inductively:

  • If x is a variable, then x ∈ Λ
  • If x is a variable and M ∈ Λ, then (λx.M) ∈ Λ
  • If M, N ∈ Λ, then (M N) ∈ Λ

A λ-abstraction λx.t is an anonymous function capable of taking a single input x and substituting it into the expression t.

Free and Bound variables

The set of free variables of a lambda expression M is written FV(M) and is defined by recursion on the structure of the terms, as follows:

  • FV(x) = {x}, where x is a variable
  • FV(λx.M) = FV(M) \ {x}
  • FV(M N) = FV(M) ∪ FV(N)

In the above expression, \ means set minus.
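
For example, FV(λx.(x y)) = (FV(x) ∪ FV(y)) \ {x} = ({x} ∪ {y}) \ {x} = {y}, so y is free and x is bound.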

Let's use a textbook code example to illustrate:

#!scheme
(define (good-enough? guess x)
  (< (abs (- (square guess) x)) 0.001))

In the definition of good-enough? above, guess and x are bound variables, while <, -, abs, and square are free. This matches the definitions from the lambda calculus.

Another important point of this section is the illustration of scope.

In my opinion, this page illustrates the idea of lexical scope and dynamic scope most clearly:

Scoping is how you search for a variable with a given name.

In short, lexical scope resolves a variable by where the function is defined, while dynamic scope resolves it by the chain of function calls. Lexical scoping can therefore be done while compiling the source code, whereas dynamic scoping can only be done at run time. This is why lexical scope is sometimes called static scope: the scoping can be determined from the source code alone.

Some people criticize dynamic scope heavily, saying it leads to more bugs and harder debugging, because variable lookup depends on the function call chain, while with lexical scope it depends only on the function definitions. Here is a section by Richard Stallman arguing that dynamic scope should not be the only scoping rule, but that it is useful for it to be available.

Here is a concrete example of the difference between lexical scope and dynamic scope.

#!pascal
program A;
var K: char;

  procedure B;
  begin
    writeln(K);   { which K this refers to depends on the scoping rule }
  end;

  procedure C;
  var K: char;
  begin
    K := 'c';
    B;
  end;

begin
  K := 'a';
  C;
end.

If the language uses lexical scope, the call to procedure B looks up K in the environment where B was defined, so it prints a. If the language uses dynamic scope, the lookup of K happens at run time and finds the binding made by procedure C, which called B, so the output is c.
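
For comparison, here is roughly the same situation in Scheme, which is lexically scoped (a small sketch, not from the textbook):

#!scheme
(define k 'a)

(define (b) k)        ; k refers to the binding in scope where b is defined

(define (c)
  (define k 'c)       ; a new local k; it does not affect b
  (b))

(c)
; => a                ; lexical scope: b sees the top-level k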

Lexical scope was introduced by ALGOL, and C inherited it from ALGOL, so most programming languages nowadays use lexical scope. Lisp originally used dynamic scope, while the Scheme dialect uses lexical scope exclusively. Modern Common Lisp provides both lexical and dynamic scope. Emacs Lisp once had only dynamic scope, but since version 24.x it also supports lexical scope.

Section 1.2

The ability to visualize the consequences of the actions under consideration is crucial to becoming an expert programmer, just as it is in any synthetic, creative activity.

This section mainly teaches several programming patterns that come up constantly, including linear recursion and iteration, tree recursion, fast exponentiation, the fast Fibonacci method, greatest common divisors, and primality testing.

Fast Exponentiation method

In fact, the fast Fibonacci method is just fast exponentiation applied to a matrix. The Fibonacci computation can be written in matrix form as

$$\begin{pmatrix} F_{n+1} \\ F_n \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} F_n \\ F_{n-1} \end{pmatrix}$$

If we let

$$A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \text{so that} \qquad \begin{pmatrix} F_{n+1} \\ F_n \end{pmatrix} = A^n \begin{pmatrix} F_1 \\ F_0 \end{pmatrix},$$

then, since matrix multiplication is associative, we get

$$A^n = (A^{n/2})^2 \ \text{for even } n, \qquad A^n = A \cdot A^{n-1} \ \text{for odd } n,$$

so A^n can be computed with O(log n) matrix multiplications. This trick can be applied to many problems to speed up an O(n) algorithm to O(log n).
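
For reference, the same repeated-squaring idea applied to ordinary numbers is the textbook's fast-expt from section 1.2.4:

#!scheme
(define (fast-expt b n)
  (cond ((= n 0) 1)
        ((even? n) (square (fast-expt b (/ n 2))))   ; b^n = (b^(n/2))^2
        (else (* b (fast-expt b (- n 1))))))

(fast-expt 2 10)
; => 1024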

Exercise 1.19

According to the book's method, the transformation T_pq maps the pair (a, b) to (bq + aq + ap, bp + aq). Applying T_pq twice and collecting terms shows that the result is again a transformation of the same form, T_p'q', with

$$p' = p^2 + q^2, \qquad q' = 2pq + q^2$$

so the transformation can be squared just as fast-expt squares its base, and we can easily implement the code:

#!scheme
(define (fast-fib n)
  (fast-fib-iter 1 0 0 1 n))

(define (fast-fib-iter a b p q count)
  (cond ((= count 0) b)
        ((even? count)
         (fast-fib-iter a
                        b
                        (+ (square p) (square q))
                        (+ (* 2 p q) (square q))
                        (/ count 2)))
        (else (fast-fib-iter (+ (* b q) (* a q) (* a p))
                             (+ (* b p) (* a q))
                             p
                             q
                             (- count 1)))))

(fast-fib 0)
;Value: 0
(fast-fib 1)
;Value: 1
(fast-fib 2)
;Value: 1
(fast-fib 3)
;Value: 2
(fast-fib 4)
;Value: 3
(fast-fib 5)
;Value: 5

Greatest Common Divisor

In fact, MIT Scheme has a built-in gcd procedure. One interesting property of the greatest-common-divisor algorithm is Lamé's theorem: if Euclid's algorithm requires k steps to compute the GCD of some pair, then the smaller number of the pair must be greater than or equal to the kth Fibonacci number.

The key step is the following inequality. If the pair (a, b) with a ≥ b reduces in one step to (b, a mod b), then

$$a \geq b + (a \bmod b),$$

because a = qb + (a mod b) with q ≥ 1. Reading the reduction steps backwards, the smaller numbers therefore grow at least as fast as the Fibonacci recurrence Fib(k) = Fib(k-1) + Fib(k-2), which gives the bound by induction.
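
The textbook's version of Euclid's algorithm (note that defining it at top level shadows MIT Scheme's built-in gcd):

#!scheme
(define (gcd a b)
  (if (= b 0)
      a
      (gcd b (remainder a b))))

(gcd 206 40)
; => 2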

Example : Testing for Primality

Proof of Fermat's Little Theorem

Here is a sketch of a standard proof of Fermat's Little Theorem: if p is prime and a is not divisible by p, then a^(p-1) ≡ 1 (mod p).

The numbers a, 2a, ..., (p-1)a are pairwise incongruent modulo p and none of them is congruent to 0, so modulo p they are a permutation of 1, 2, ..., p-1. Multiplying them all together gives

$$a^{p-1} (p-1)! \equiv (p-1)! \pmod{p},$$

and since (p-1)! is coprime to p it can be cancelled, leaving a^(p-1) ≡ 1 (mod p), or equivalently a^p ≡ a (mod p) for every a.

The code implementing the Fermat primality test is as follows:

#!scheme
(define (expmod base exp m)
  (cond ((= exp 0) 1)
    ((even? exp) (remainder (square (expmod base (/ exp 2) m)) m))
    (else
     (remainder (* base (expmod base (- exp 1) m)) m))))

(expmod 3 1000 21)
;Value : 18

(define (fermat-test n)
  (define (try-it a)
    (= (expmod a n n) a))
  (try-it (+ 1 (random (- n 1)))))

(define (fast-prime? n times)
  (cond ((= times 0) true)
    ((fermat-test n) (fast-prime? n (- times 1)))
    (else false)))

(fast-prime? 21 4)

Exercise 1.28

The key idea of the Miller-Rabin test is to check, inside the expmod procedure, whether we have found a nontrivial square root of 1 modulo n. A nontrivial square root of 1 modulo n is a number that is not equal to 1 or n - 1 but whose square is congruent to 1 modulo n. This extra check is what keeps the test from being fooled by Carmichael numbers such as 561 and 2821, as the runs below show.

The following content comes from Wikipedia:

When the number n to be tested is small, trying all a < 2(ln n)² is not necessary, as much smaller sets of potential witnesses are known to suffice. For example, Pomerance, Selfridge and Wagstaff[6] and Jaeschke[7] have verified that

  • if n < 1,373,653, it is enough to test a = 2 and 3;
  • if n < 9,080,191, it is enough to test a = 31 and 73;
  • if n < 4,759,123,141, it is enough to test a = 2, 7, and 61;
  • if n < 2,152,302,898,747, it is enough to test a = 2, 3, 5, 7, and 11;
  • if n < 3,474,749,660,383, it is enough to test a = 2, 3, 5, 7, 11, and 13;
  • if n < 341,550,071,728,321, it is enough to test a = 2, 3, 5, 7, 11, 13, and 17.

Here is example code for the Miller-Rabin test:

#!scheme
(define (expmod base exp m)
  (define (nontrivial? num n)
    (and (not (= num 1))
         (not (= num (- n 1)))
         (= (remainder (square num) n) 1)))
  (cond ((= exp 0) 1)
        ((even? exp)
         (let ((half (expmod base (/ exp 2) m)))   ; compute the recursive call only once
           (if (nontrivial? half m)
               0                                   ; signal that a nontrivial square root of 1 was found
               (remainder (square half) m))))
        (else
         (remainder (* base (expmod base (- exp 1) m)) m))))

(define (miller-rabin-test n)
  (define (try-it a)
    (= (expmod a n n) a))
  (try-it (+ 1 (random (- n 1)))))

(define (fast-prime? n times)
  (cond ((= times 0) true)
        ((miller-rabin-test n) (fast-prime? n (- times 1)))
        (else false)))

(fast-prime? 561 10)
;Value: #f

(fast-prime? 2821 20)
;Value: #f

Section 1.3

1.3.1 Procedure as Arguments

The subject matter of section 1.3 is the expressive power we get from higher-order procedures. In the first part, the authors use three summation examples to show that the procedures share a common pattern.

#!scheme
(define (cube x) (* x x x))

(define (sum-integers a b)
  (if (> a b)
      0
      (+ a (sum-integers (+ a 1) b))))

(sum-integers 1 100)
;Value: 5050

(define (sum-cubes a b)
  (if (> a b)
      0
      (+ (cube a) (sum-cubes (+ a 1) b))))

(sum-cubes 1 100)
;Value: 25502500

(define (pi-sum a b)
  (if (> a b)
      0
      (+ (/ 1.0 (* a (+ a 2)))
         (pi-sum (+ a 4) b))))

(pi-sum 1 41)
;Value: .38702019080795513

In fact, we can see that sum-integers, sum-cubes, and pi-sum all share the common pattern captured by the following procedure:

#!scheme
(define (sum-common term a next b)
  ;; term is the procedure applied to each integer a; next returns the integer after a
  (if (> a b)
      0
      (+ (term a)
         (sum-common term (next a) next b))))

Then we can define sum-integers, sum-cubes, and pi-sum as follows:

#!scheme
(define (inc n) (+ n 1))
(define (identity x) x)
(define (sum-integers2 a b)
  (sum-common identity a inc b))

(sum-integers2 1 101)
;Value: 5151

(define (sum-cubes2 a b)
  (sum-common cube a inc b))
(sum-cubes2 1 100)
;Value: 25502500

(define (pi-sum2 a b)
  (define (pi-term x)
    (/ 1.0 (* x (+ x 2))))
  (define (pi-next x)
    (+ x 4))
  (sum-common pi-term a pi-next b))

(pi-sum2 1 41)
;Value: .38702019080795513

We can see that after we abstract out the common pattern of the procedures, it becomes much easier to build procedures in an abstract and clear way. In my view, higher-order procedures make code reuse easier in functional programming than in object-oriented programming. The article Object Oriented Programming is inherently harmful collects some criticism of OOP.

Exercise 1.29

It is awkward to sum over the y values in the interval from a to b directly; instead we sum over the integer index k. This gives the following solution:

#!scheme
(define (integral-simpson f a b n)
  ;; n must be an even number
  (define h (/ (* 1.0 (- b a)) n))
  (define (term-simpson ind)
    (* (cond ((or (= ind 0) (= ind n)) 1)
             ((= (remainder ind 2) 1) 4)
             (else 2))
       (f (+ a (* ind h)))))
  (/ (* h (sum-common term-simpson 0 inc n)) 3))

(integral-simpson cube 0 1 100)
;Value: .24999999999999992

(integral cube 0 1 0.01)
;Value: .24998750000000042
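
For comparison, the integral used above is the textbook's procedure from section 1.3.1, adapted here to call the sum-common defined earlier instead of the book's sum:

#!scheme
(define (integral f a b dx)
  (define (add-dx x) (+ x dx))
  (* (sum-common f (+ a (/ dx 2.0)) add-dx b)
     dx))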

Exercise 1.30

Exercise 1.30 illustrates an important concept of functional programming: the tail-recursive call. At one time, functional programming languages such as Lisp were famous for being extremely inefficient.

LISP programmers know the value of everything and the cost of nothing. --Alan Perlis

But nowadays a Common Lisp compiler can produce code almost as efficient as a C compiler's, and by using the tail-recursion technique we can get code almost as efficient as explicit iteration.

A tail-recursive call is one where every recursive call appears as the last thing the function does, so the function can return immediately once the termination case is reached. In practice, a good compiler generates machine code for a tail-recursive call in an iterative manner, reusing the current stack frame on every recursive call, whereas a non-tail-recursive call may build a deep stack to record the caller/callee relationships.

#!scheme
(define (factorial-recursive n)
  (if (= n 0)
      1
      (* n (factorial-recursive (- n 1)))))

(factorial-recursive 10)
;Value: 3628800

(define (factorial-iterative n)
  (define (iter n ans)
    (if (= n 0)
        ans
        (iter (- n 1) (* ans n))))
  (iter n 1))

(factorial-iterative 10)
;Value: 3628800

In the example above, factorial-recursive is an ordinary recursive procedure while factorial-iterative is tail-recursive. When we call (factorial-recursive 3), the evaluation looks like the following:

#!scheme
(factorial-recursive 3)
(* 3 (factorial-recursive 2))
(* 3 (* 2 (factorial-recursive 1)))
(* 3 (* 2 (* 1 (factorial-recursive 0))))
(* 3 (* 2 (* 1 1)))
(* 3 (* 2 1))
(* 3 2)
6

With the tail-recursive factorial-iterative, the interpreter/compiler effectively produces the following efficient code:

call (factorial-iterative 3)
    (call (iter 3 1))
        replace argument with (2 3) jump to "iter"
        replace argument with (1 6) jump to "iter"
        replace argument with (0 6) jump to "iter"
    iter return 6
factorial-iterative return 6

So we can see that the tail-recursive form reuses the top stack frame by using a jump, which produces more efficient code than the plain recursive form and is equivalent to an iterative loop.

And we can easily write tail-recursive procedures just as in the example: use an extra parameter to carry the current result, so the procedure can return immediately when the termination case is reached.
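
Applying this to Exercise 1.30 itself, the sum procedure can be rewritten as an iterative (tail-recursive) process; a sketch following the exercise's template:

#!scheme
(define (sum-iter term a next b)
  (define (iter a result)
    (if (> a b)
        result
        (iter (next a) (+ result (term a)))))   ; tail call: result carries the running sum
  (iter a 0))

(sum-iter cube 1 inc 10)
; => 3025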

Another important topic of this section is the idea of abstraction. The more abstractly we write our code, the more bugs we avoid, and I think this is another reason why many good programmers benefit from SICP. Consider the following examples:

#!scheme
(define (accumulate combiner null-value term a next b)
  (if (> a b)
      null-value
      (combiner (term a) (accumulate combiner null-value term (next a) next b))))

We can abstract the procedures sum and product into a more general pattern, as in the code above. Then we can use this more general procedure to define sum and product as follows:

#!scheme
(define (sum-acc term a next b)
  (accumulate + 0 term a next b))

(define (product-acc term a next b)
  (accumulate * 1 term a next b))

(sum-acc cube 1 inc 10)
;Value: 3025

(sum-acc identity 1 inc 100)
;Value: 5050

(define (fact-acc n)
  (product-acc identity 1 inc n))

(fact-acc 5)
;Value: 120

1.3.2 Constructing Procedure Using Lambda

In general, lambda is used to create procedures in the same way as define, except that no name is specified for the procedure.

From the footnote, we know that the term lambda comes from the λ-calculus, which provides a rigorous foundation for studying the notions of function and function application, and has become a basic tool for mathematical investigations of the semantics of programming languages.
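
For example, pi-sum can be written without naming the auxiliary procedures, using the sum-common defined earlier (called pi-sum3 here to avoid clashing with the earlier definitions):

#!scheme
(define (pi-sum3 a b)
  (sum-common (lambda (x) (/ 1.0 (* x (+ x 2))))   ; the term, written inline
              a
              (lambda (x) (+ x 4))                 ; the next procedure, written inline
              b))

(pi-sum3 1 41)
; => .38702019080795513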

1.3.3 Procedures as General Methods

This section and the next are among the most fascinating parts of the book, including the wonderful abstraction of the fixed-point procedure. The fixed point is one of the most important ideas in the λ-calculus; the most famous fixed-point combinator is probably the Y combinator, from which the famous Lisper Paul Graham's company Y Combinator takes its name.

Fixed-point

For the definition, see fixed point on Wikipedia. Here I will explain, following the textbook, why some fixed-point iterations converge while others do not.

Here is the argument. Suppose r is a fixed point of f, i.e. f(r) = r, and f is differentiable near r. By the mean value theorem, for a guess y close to r,

$$|f(y) - r| = |f(y) - f(r)| = |f'(\xi)| \cdot |y - r| \quad \text{for some } \xi \text{ between } y \text{ and } r,$$

so if |f'(r)| < 1 each iteration brings the guess closer to r and the process converges, while if |f'(r)| ≥ 1 the guesses fail to get closer.

According to the argument above, we can easily see why the two fixed-point functions in the textbook behave differently. For f(y) = x / y we get f'(y) = -x / y². Since r = √x is the fixed point, for x = 4 we have r = 2 and |f'(r)| = |f'(2)| = 1, so the iteration does not converge: the guesses simply oscillate between y and x / y. For the average-damped function f(y) = (y + x / y) / 2 we get f'(y) = (1 - x / y²) / 2, so f'(2) = 0 and the iteration converges quickly.
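
The examples in the next subsection also use the textbook's fixed-point and average-damp procedures; for completeness, here they are (SICP 1.3.3 and 1.3.4):

#!scheme
(define tolerance 0.00001)

(define (fixed-point f first-guess)
  (define (close-enough? v1 v2)
    (< (abs (- v1 v2)) tolerance))
  (define (try guess)
    (let ((next (f guess)))
      (if (close-enough? guess next)
          next
          (try next))))
  (try first-guess))

(define (average-damp f)
  (lambda (x) (average x (f x))))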

Section 1.3.4 Procedures as Returned Values

The ability to return procedures distinguishes Scheme and other functional programming languages from procedural languages like C and from some OO languages like Java. In C we can pass a function to another function via a function pointer, but we cannot construct and return a new procedure (a closure) at run time. The ability to return procedures is key to making functions first-class citizens, which is a central idea of the lambda calculus.

Experienced programmers know how to choose procedural formulations that are particularly perspicuous, and where useful elements of the process are exposed as separate entities that can be reused in other applications.

We can see from the Newton's method example how much power this abstraction gives us.

#!scheme
(define (deriv g)
  (lambda (x) (/ (- (g (+ x dx)) (g x)) dx)))

(define dx 0.00001)

(define (newton-transform g)
  (lambda (x) (- x (/ (g x) ((deriv g) x)))))

(define (fixed-point-of-transform g transform guess)
  (fixed-point (transform g) guess))

(define (sqrt-2 x)
  (fixed-point-of-transform
   (lambda (y) (/ x y)) average-damp 1.0))

(sqrt-2 5)
;Value: 2.236067977499978

(define (sqrt-3 x)
  (fixed-point-of-transform
   (lambda (y) (- (square y) x)) newton-transform 1.0))

(sqrt-3 5)
;Value: 2.2360679775020436

We can see from the example above how clear and powerful this kind of abstraction is when we write code.

Rights and privileges of first-class elements are:

  • They may be named by variables.
  • They may be passed as arguments to procedures.
  • They may be returned as the results of procedures.
  • They may be included in data structures.
