Commits

certainty  committed af14266

more work on testing article

  • Parent commits dbcf614

Files changed (1)

File _posts/2014-04-05-tesing_your_chicken_code.md

      caption: test
    - url: http://wiki.call-cc.org/eggref/4/test-generative
      caption: test-generative
-   - url: https://bitbucket.org/certainty/lisp-unleashed-examples/src/tip/code-breaker/
-     caption: code-breaker-example
 tags: [scheme,test,chicken,testing]
 ---
 
 Hello everybody and welcome back. In this article I attempt to introduce you to the very excellent [test egg](http://wiki.call-cc.org/eggref/4/test), which is a great way to do your unit-testing in CHICKEN.
 
-It will start with a gentle introduction to unit testing in general before we dive into the **test egg** itself. I'll show you a few **best practices** that will help you to benifit most from your test code. After I've outlined the bells and whistles that come with **test**, I'll introduce you to **random testing** on top of **test**. Finally I'll give you some hints on how a useful Emacs setup to do CHICKEN unit tests may look like.
+It will start with a gentle introduction to unit testing in general, before we dive into the **test egg** itself. I'll show you a few **best practices** that will help you to benefit most from your test code. After I've outlined the bells and whistles that come with **test**, I'll introduce you to **random testing** on top of **test**. Finally I'll give you some hints on what a useful Emacs setup for CHICKEN unit testing might look like.
 
 You can either read this article as a whole, which is what I recommend, or cherry-pick the parts that you're interested in. Both will hopefully work out.
-It will be a very practical article with few theory.
 
 For those of you who are fluent with **test**, there is probably not much new here. Still I'd love you to be my guest and read on. Don't hesitate to come back to me and tell me about things I've missed or things that you do differently that work great. I'm eager to learn about your setup.
 
 
 ### A gentle introduction to testing
 
-You probably heard that testing your code is good practice and that every *serious* software engineer must do it. I do agree with this in general and most software engineers out there do as well. There are different schools though. Some do the tests after their code, some do them before and yet others do them while they flesh out their functions in the REPL. I don't want to tell you that there is only one true way to do it, but there are a few arguments that I'd like to make, that suggest that a particulare school may have advantages.
+You have probably heard that testing your code is good practice and that every *serious* software engineer must do it. I agree with this in general, and so do most software engineers out there. There are different schools though. Some write their tests after their code, some before, and yet others test while they flesh out their functions in the REPL. I don't want to tell you that there is only one true way to do it, but there are a few arguments I'd like to make that suggest a particular school may have advantages. First though, I want to give you a very brief overview
+of what testing gains you.
 
-**What does testing give you?**
+#### What does testing give you?
 
 If it is done right, it gives you a reasonable amount of confidence that the code you're testing works correctly. Your tests act like a specification for the code under test.
-Secondly they give you a safety net that enables you to change your code and still make sure that the code works as expected. This means that you're free to improve your code without having to worry, that you broke some, probably distant, part of the code, while doing so. Closely related to this are regression tests, that are used to detect bugs that have been fixed but which now pop up again, after you have changed some portion of your code. Regression tests are an important part of a test suite.
-Once you discover a bug you generally write a test that reproduces it. This test will be naturally failing as the system/code under test doesn't behave as expected. The next step is to fix this
-bug and make the test pass. This way you have made sure that this particular bug has been fixed.
+Secondly they give you a safety net that enables you to change your code and still make sure that it works as expected. This means that you're free to refactor your code without having to worry that you broke some, probably distant, part of it. It also means that someone else who wants to contribute to your code can do so with confidence, which is a benefit that must not be underestimated.
+
+Closely related to this are regression tests, which are used to detect bugs that have been fixed but pop up again after you have changed some portion of your code. Regression tests are an important part of a test suite. Once you discover a bug you generally write a test that reproduces it. This test will naturally fail, as the system/code under test doesn't behave as expected. The next step is to fix the bug and make the test pass. This way you have made sure that this particular bug has been fixed.
 This must be contrasted with the sort of tests you use to test your features. While those are
-estimates, testing for bugs and fixing them can act as a proof. Of course there is no rule without an exception. [Bugs tend to come in clusters](http://testingreflections.com/node/7584) and can be grouped into categories or families.
-This means in practice that you may have fixed this particular bug but you're advised to look
+estimates, testing for bugs and fixing them can act as a proof. Of course there is no rule without an exception. [Bugs tend to come in clusters](http://testingreflections.com/node/7584) and can be grouped into categories or families. This means in practice that you may have fixed this particular bug but you're advised to look
 for a generalisation of that bug that might occur elsewhere. Also you likely want to check
 the code that surrounds the part that caused the bug. It has been shown empirically that it is
 likely to contain bugs as well. For a critical view on this theory you might want to have
 a look at [this](http://www.developsense.com/blog/2009/01/ideas-around-bug-clusters).
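+
+To make the regression workflow concrete, here is a sketch of how such a test might look with the **test egg** that this article introduces later. The bug number and the `safe-div` procedure are made up for illustration:
+
+~~~ clojure
+(use test)
+
+;; Hypothetical bug report #42: safe-div crashed on a zero divisor.
+;; First a test that reproduces the bug, then the fix; the test
+;; stays in the suite to guard against the bug coming back.
+(define (safe-div a b)
+  (if (zero? b) 0 (quotient a b)))
+
+(test "regression #42: zero divisor yields 0 instead of crashing"
+  0
+  (safe-div 10 0))
+~~~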
-Also tests often are a form of documentation. They describe the expected bevaviour of your code and thus give strong hints about how it shall be used.
+
+Also tests often are a form of documentation. They describe the expected behaviour of your code and thus give strong hints about how it is meant to be used. Often you find that the documentation
+of a project isn't very good. If it at least has a thorough test suite, you can quickly learn the most important aspects of the library.
 
 There are many more testing categories that all have their particular value. The literature is
 full of those and I very much recommend reading some of it. The [reference section](#references) has a few links
 that you may find useful.
 
-**What does testing not give you?**
+#### What does testing not give you?
 
 Tests for features can never prove that your code doesn't contain bugs and is correct. It is only an estimation. You write as many tests as needed to reach the level of confidence that you need.
-This level may be either perceived intuitively or measured. A common way to measure it is a so called, code coverage analysis. This just means that an analyzer runs your tests and checks which code paths they exercise. The result may be used to derive a metric for the developer on when he/she has enough or better good enough tests. But all this is not nearly a proof for the absence of bugs or misbehaviour.
-This is what tests normally can't give you, possibly with the exception of regression tests.
-So having great many tests that verify features of your application is comforting and all, but be assured that there will be a time
-when a bug pops up in your application. This means that all your tests didn't do anything to prevent this bug. You're on your own now.
+This level may be either perceived intuitively or measured. A common way to measure it is a so-called code coverage analysis. This just means that an analyzer runs your tests and checks which code paths they exercise. The result may be used to derive a metric that tells the developer when he/she has good enough tests. This approach has some flaws though, and 100% coverage says nothing about
+the quality of your tests. You can easily have tests that execute all of your code paths but simply do not verify their outputs. In this case you have 100% coverage, but actually
+zero confidence that the code is correct.
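+
+A tiny sketch of this blind spot, using the **test egg** forms explained later in this article (`add` is a made-up, deliberately buggy procedure):
+
+~~~ clojure
+(use test)
+
+(define (add a b) (- a b)) ; deliberately buggy
+
+;; This executes add, so a coverage tool counts its line as covered,
+;; but the test only checks that the expression ran at all:
+(test "add was exercised" #t (begin (add 1 2) #t)) ; passes despite the bug
+
+;; Only a test that checks the actual output catches the bug:
+(test "add computes a sum" 3 (add 1 2)) ; fails, since add subtracts
+~~~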
+
+While code coverage gives you a qualitative measure of your test code, there is also a quantitative measure: the code to test ratio. It's as simple as it can be; it just tells you the proportion of your code to your tests. Most people tend to agree that a ratio of 1:2 is about right. That means you have twice as many tests as you've got actual code. In my opinion that very much depends on the kind of project. If you happen to have many internal helper procedures and very few procedures that belong to the public API, then you most likely won't reach that ratio. If your code is mostly public API though, it may be close to the truth, as each procedure is likely to have at least two tests. Again my advice is not to use that as an absolute measure but only as a guideline to verify that you're on the right track.
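+
+If you want a rough estimate of this ratio for a small project like the one used below, a throwaway snippet will do. This is only a sketch: the file names follow this article's layout, and `read-line`/`printf` are assumed to come from CHICKEN 4's extras unit.
+
+~~~ clojure
+(use extras)
+
+;; Count the lines of a file (a crude proxy for "amount of code").
+(define (line-count file)
+  (with-input-from-file file
+    (lambda ()
+      (let loop ((n 0))
+        (if (eof-object? (read-line)) n (loop (+ n 1)))))))
+
+(let ((code  (line-count "stack.scm"))
+      (tests (line-count "tests/run.scm")))
+  (printf "code: ~a lines, tests: ~a lines~n" code tests))
+~~~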
+
+Let's resume after that little detour.
+
+Another aspect that must be emphasized is that tests can never prove the absence of bugs, possibly with the exception of regression tests. If a test has been written **after** a certain bug occurred, you have a high probability that this specific bug has been fixed. Apart from that though, tests are by no means a proof of the correctness of your code.
+
+Tests are not a silver bullet and not a replacement for good design and solid software engineering skills. Having a great many tests that verify features of your application is comforting and all, but be assured that there will be a time when a bug pops up in your application. This means that all your tests didn't do anything to prevent this bug. You're on your own now.
 Now you actually have to understand your system, reason about it and figure out what went wrong. This is another crucial part of developing
-an application. You must make sure that you have a system that you can actually understand and reason about. Tests can help to develop such a system, as it has been shown that software that is easy to test is often also [simpler](http://www.infoq.com/presentations/Simple-Made-Easy) more focused and easier to comprehend.
+an application. You must make sure that you have a system that you can actually understand and reason about. Tests can help to develop such a system, as it has been shown that software that is easy to test is often also [simpler](http://www.infoq.com/presentations/Simple-Made-Easy), more focused and easier to comprehend.
 
+#### If testing is that great, why do some people still not do it?
 
+I can't give you a universal answer to this, as there is probably a great variety of reasons, which might or might not be sensible. I've heard some reasons repeatedly though.
 
-**Ok, I want to test. How do I do it?**
+* **It is more work than just writing your application code**
 
-While there is value in doing manual testing in the REPL, while you develop a certain function, you really also want a suite of **automated tests**. Automated means in practice that you have written
+  This one is true. Writing tests is an investment. It costs more time, more money, more energy and so on. But as with all good investments, it had better pay off in the end. It turns out that
+  most of the time this is indeed the case. The longer a project exists, the more often you or someone else comes back to your code and changes it. This involves fixing bugs, adding new features, improving performance, you name it. In all those cases, you will spend significantly less time
+  if you have a test suite that helps you ensure that all those changes didn't break anything.
+
+* **It's hard to break the thing that you just carefully built**
+
+  It's just not fun to try to destroy what you just built. Suppose you wrote a procedure that has
+  been really hard to get right. Now you're supposed to find a possible invocation in which it
+  misbehaves. If you succeed you will have to get back at it and fix it, which may again be very hard. There is an inner barrier that subconsciously holds you back. I think we all agree that having found this misbehavior is better than keeping it buried, but the back of your
+  mind might see this slightly differently, especially when it's Friday afternoon at 6pm.
+
+* **It's not fun**
+
+  I don't agree with that one, but I have heard it many times. I think it is possibly a
+  consequence of the points above. If you create a mindset where tests are actually part of your
+  code and are first-class citizens of your project, then I think tests are at least as much fun as the
+  application code.
+
+Of course there may be many more reasons. Just take these as a sample.
+
+
+#### Ok, I want to test. How do I do it?
+
+While there is value in doing manual testing in the REPL or by executing your application by hand, you really also want a suite of **automated tests**. Automated means in practice that you have written
 code that tests your application. You can run these tests and the result will tell you if and which tests have failed or passed. This makes your tests reproducible with minimum effort. You want to develop this test suite as you develop your application. Whether you write your tests before or after your actual code is really up to you. There is one thing though that I want to point out. There is a general problem with tests, well a few of those, but one is particularly important now: how do you make sure that your test code is correct? It doesn't make much sense to put trust in your code because
 of your shiny test-suite when the tests in there are incorrect. This means they pass but shouldn't, or they don't pass but really should. While you could write tests for your tests, you may immediately see that this is a recursive problem and might lead to endless tests testing tests testing tests ....
 
 
 **«Always think about the value of the tests»**
 
-Don't write tests just because someone said you must. Don't write tests that don't improve the trust in your system. This can be a difficult decision. Test code is code just like other your application code. It has to be written and maintained.
+Don't write tests just because someone said you must. Don't write tests that don't improve the trust in your system. This can be a difficult decision. Test code is code just like your application code. It has to be written and maintained.
 
 **«Think about interfaces not implementation»**
 
-This means that your tests should not need to know about the internals of your function. You should just run them against your interface
+This means that your tests should not need to know about the internals of the procedures. You should just run them against your interface
 boundaries. Doing so enables you to change your implementation and yet have your test-suite telling the truth about your system.
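+
+A small sketch of the difference; the procedures are made up for illustration, and the `%` prefix marks the internal helper:
+
+~~~ clojure
+(use test)
+
+(define (%celsius->fahrenheit c) (+ 32 (* c 9/5))) ; internal helper
+(define (describe-temperature c)                   ; public interface
+  (if (> (%celsius->fahrenheit c) 100) 'hot 'cold))
+
+;; Good: tests the observable behaviour at the interface boundary.
+(test "40 degrees celsius is hot" 'hot (describe-temperature 40))
+
+;; Brittle: inlining or renaming the helper breaks this test
+;; even though the behaviour of the module is unchanged.
+(test "internal conversion" 212 (%celsius->fahrenheit 100))
+~~~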
 
 **«Keep your tests focused»**
 bad thing as it makes your tests unpredictable. One way to automatically detect these kinds of dependencies is to randomize the
 order in which tests are executed. This is useful as sometimes you're simply not aware of one test depending on another.
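+
+One way to sketch this randomization on top of the **test egg**, assuming CHICKEN 4 where `random` lives in the extras unit and `take`/`drop` come from srfi-1:
+
+~~~ clojure
+(use test extras srfi-1)
+
+;; Wrap each test in a thunk so the execution order can be controlled.
+(define test-thunks
+  (list (lambda () (test "addition" 3 (+ 1 2)))
+        (lambda () (test "car" 'x (car '(x y))))
+        (lambda () (test "max" 5 (max 1 5)))))
+
+;; Shuffle by repeatedly removing a randomly chosen element.
+(define (shuffle lst)
+  (if (null? lst)
+      '()
+      (let ((i (random (length lst))))
+        (cons (list-ref lst i)
+              (shuffle (append (take lst i) (drop lst (add1 i))))))))
+
+;; Hidden dependencies between tests show up as sporadic failures.
+(for-each (lambda (run-test) (run-test)) (shuffle test-thunks))
+~~~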
 
+**«Keep your tests simple»**
+Naturally tests are a critical part of your system. They are the safety net. You don't want them to contain bugs. Keeping them simple means it is easier to get them right. Secondly, they are easier to comprehend. Test code should state as clearly as possible what it is supposed to do.
+
+
+**«Keep your tests fast»**
+This turns out to be a crucial feature of your test suite as well. If your tests are slow they will disrupt your workflow. Ideally testing and writing code are smoothly intertwined. You test a little, then you code a little, then you repeat. If you have to wait a long time for your tests to finish, there will come a point where you don't run them regularly anymore. Of course you can trim down your test suite to just the tests that are currently important, but after you've finished the implementation of a particular procedure you will likely want to run the entire suite.
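+
+One common trick, sketched here with a made-up environment variable and CHICKEN 4's posix unit, is to make the slow tests opt-in so the fast ones stay in your edit-test loop:
+
+~~~ clojure
+(use test posix)
+
+;; Run the slow tests only when RUN_SLOW_TESTS is set in the environment.
+(define run-slow? (get-environment-variable "RUN_SLOW_TESTS"))
+
+(test-begin "suite")
+(test "fast: arithmetic" 4 (+ 2 2))
+(when run-slow?
+  (test "slow: waits a second" 0 (begin (sleep 1) 0)))
+(test-end "suite")
+(test-exit)
+~~~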
 
 These are all general guidelines that apply to unit tests. There are specific dos and don'ts that apply to other kinds
 of tests that I don't want to cover here. I hope this little introduction gave you enough information to go on with the rest of the article and you now have a firm grasp of what I'm going to be talking about.
 
-
 ### Putting the test egg to work
 
-Ok good, you're still here and not bored away by the little introduction. This is finally where the fun starts and we will be seeing
-actual code. I'll show you how to obtain and use the very excellent testing facility that is the current defacto standard test library
-for CHICKEN. We will be using a very tiny project that implements the score-breaker game. I've borrowed the idea from the excellent book: [Programming Clojure](http://pragprog.com/book/shcloj/programming-clojure), so thanks to the authors.
+You're still here and not bored away by the little introduction. Very good, since this is finally where the fun starts and we will be seeing actual code. CHICKEN is actually a good environment to do testing. Almost every egg is covered by unit tests and within the community there seems to be a general agreement that tests are useful. Tests for CHICKEN extensions are particularly encouraged. We have a great continuous integration (CI) setup that will automatically run the unit tests of your eggs, even on different platforms and CHICKEN versions. You can find more information at [tests.call-cc.org](http://tests.call-cc.org/). I'll tell you a little more about this later. For now just be assured that you're in good company.
 
-You can find the entire code for this project at [score-breaker-example](https://bitbucket.org/certainty/lisp-unleashed-examples/src/tip/score-breaker).
+Let's continue our little journey now. We'll be implementing the well known [stack](https://en.wikipedia.org/wiki/Stack_(abstract_data_type)) and build a suite of unit tests for it. This is a fairly simple task and allows us to concentrate on the tests. You can find all the code that is used here at:
 
-Let's start with the first thing you need. You obviously need [CHICKEN](https://http://code.call-cc.org/) and you need the test egg.
-You can obtain it with the lovely **chicken-install**. I assume you're familiar with it but I'll give you
-the command line anyway.
+#### Prerequisites
+
+You obviously need [CHICKEN](https://code.call-cc.org/) and you need the [test egg](http://wiki.call-cc.org/eggref/4/test).
+You can obtain it with the lovely **chicken-install**. I assume you're familiar with it but I'll give you the command line anyway.
 
 ~~~ bash
  $ chicken-install test
 ~~~
 
-
 #### The project layout
 
 Once you've installed the test egg you can have a closer look at the directory layout of our example project.
-There is the top-level directory **code-breaker** which holds one scheme source file named **code-breaker.scm**. This is where your application code goes. Furthermore you'll notice that there is a directory called **tests** which holds a single file named **run.scm**. The entire layout looks like this:
+There is the top-level directory **stack** which holds one scheme source file named **stack.scm**. This is where your application code goes. Furthermore you'll notice that there is a directory called **tests** which holds a single file named **run.scm**. The entire layout looks like this:
 
 <pre>
- code-breaker
- | - code-breaker.scm
+ stack
+ | - stack.scm
  | - tests
      | - run.scm
 </pre>
 and structure their files differently but the majority of projects look like this, so it is a good practice
 to follow it. You may have noticed that this is also the standard layout of CHICKEN eggs. They contain egg-specific files like *.release-info, *.meta and *.setup but apart from that, they look very much like this. Another reason to arrange your tests the way I showed you is that CHICKEN's CI at [salmonella](https://tests.call-cc.org) expects this layout. You can benefit from this service once you follow this convention. It's time to give **mario** and **peter** a big thank you, as they made it possible.
 
-**The game**
+#### Basic layout of the test file
 
-Back to our project you might want to have a look at the source code to understand what it actually does. I will take the opportunity to give you a little explanation here though. As I already said it implements the code-breaker game.
-The game constists of the code-maker (the program) that generates a random code and a code-breaker (the player) that tries to
-guess this code. The code is a tuple of colored pegs. Once the code-breaker submitted a guess the program scores the guess
-like so:
-
-* One black peg for each peg of the right color in the right position
-* One white peg for each peg ot the right color but not in the right position
-
-The game ends when either the code was guessed correctly or a predetermined amount of tries has been exceeded.
-
-**The code**
-Here we present the code that is needed to implement the code breaker. It is almost literally translated from clojure
-to scheme. Please bear with me as I introduced some inaccuracies. Clojure maps don't map to scheme's alists but I still chose
-this representation. This is all not important though as we want to concentrate on the tests. The following code snippets
-outlines the code we want to concentrate on. It uses some helpers that you can look up in the repository, which are not
-important right now.
-
-~~~ clojure
-;; how many pegs in code are also in guess taking into account the position
-(define (exact-matches code guess)
-  (length (remove (cut not <>) (list-diff code guess))))
-
-;; how many pegs of code are in guess not taking into account the position
-(define (unordered-matches code guess)
-  (let ((f1 (select-keys (frequencies code) guess))
-        (f2 (select-keys (frequencies guess) code)))
-    (merge-with min f1 f2)))
-
-;; finally we can welcome our score function
-(define (score code guess)
-  (let ((exact (exact-matches code guess))
-        (unordered (apply + (map cdr (unordered-matches code guess)))))
-    (cons exact (- unordered exact))))
-~~~
-
-The main **interface** and thus the subject to our tests will be the **score-procedure**. It takes a list containing the code
-and a list containing the guess as arguments and produces a pair with the amount of exact matches in the car and the amount
-of unordered matches in the cdr.
-
-**The tests**
-
-Now let's finally write some tests for the given piece of code. We start with the template for every test file.
+Let's dive in now and start writing our first tests. For this purpose we're going to add a little skeleton to tests/run.scm, so that it looks like this.
 
 ~~~ clojure
  (use test)
 
- (test-begin "code-breaker")
- (test-end "code-breaker")
+ (test-begin "stack")
+ (test-end "stack")
 
  (test-exit)
 ~~~
 with a status code that indicates the outcome of your tests. If any tests have failed it will exit with a non-zero
 status code, which can be passed as an argument to the procedure and defaults to one. Zero will be the status code if all tests have passed.
 
-Now let's add some smoke tests that are there to verify that our score procedure behaves as expected.
+
+We'll start by adding the procedure that we obviously need first: a way to create an empty stack. I'll start with a test for it.
 
 ~~~ clojure
- (use test)
- (load "../code-breaker.scm")
+(use test)
+(load "../stack.scm")
 
- (test-begin "code-breaker")
+(test-begin "stack")
+(test "make-stack creates an empty stack"
+   #t
+   (stack-empty? (make-stack)))
+(test-end "stack")
 
-  (test-group "score"
-    (test "empty guess"
-      '(0 . 0)
-       (score '(r g g r) '()))
-
-    (test "empty code"
-      '(0 . 0)
-       (score '() '(r g g r)))
-
-    (test "exact matches only"
-      '(2 . 2)
-       (score '(r g g b) '(r g b g)))
-
-    (test "exact matches don't show up in unordered matches"
-      '(2 . 2)
-       (score '(r g g b) '(r b g y)))
-
-    (test "unordered matches only"
-      '(0 . 2)
-       (score '(r g g b) '(b r y y))))
-
-    (test "ordered and unordered matches"
-      '(1 . 3)
-       (score '(r g g b) '(r b b g)))
-
- (test-end "code-breaker")
-
- (test-exit)
+(test-exit)
 ~~~
 
-Ignore the test-group form for now and have a close look at the various invocations of the **(test)** form.
-As you may have noticed it takes multiple arguments. The first argument is a description string that gives a hint about
-what this particalar test attempts to verify. The next argument is the **expected value**. It can be any scheme value.
-The last argument is the scheme expression that shall be tested. It will be evaluated and compared with the **expected value**.
-This is actually the long form. You can get by with the shorter form that omits the description string.
+Let's have a closer look at this now. You see a new form there:
+
+ **(test description expected expression)**
+
+It takes multiple arguments. The first argument is a description string that gives a hint about
+what this particular test attempts to verify. The next argument is the **expected value**. It can be any scheme value. The last argument is the scheme expression that shall be tested. It will be evaluated and compared with the **expected value**.
+This is actually the long form. You can get by with the shorter form that omits the description string like so:
 
 ~~~ clojure
  (test 3 (+ 1 2))
 You just have to create a form that reads nicely. Let me clarify this:
 
 ~~~ clojure
- (test-assert (member 3 (list 1 2 3)))
+ (test '(3) (member 3 (list 1 2 3)))
 ~~~
 
 This will generate a description like this when you run the tests.
 
 <pre>
-(member 3 (list 2 3 4)) .............................................. [ PASS]
+(member 3 (list 1 2 3)) .............................................. [<span style="color:green"> PASS</span>]
 </pre>
 
-This is pretty expressive, even though we didn't supply a proper description. The **test-assert** form can be used
-whenever you'd otherwise write tests like the following
+
+Ok, going back to the example above. I've added a little test that attempts to verify
+that a stack that has been created with make-stack is initially empty.
+Let's run the tests now. You can do this by changing into the tests directory and running
+the file with csi.
+
+<pre>
+  cd tests
+  csi -s run.scm
+</pre>
+
+The output looks like this:
+
+<pre>
+make-stack creates an empty stack .................................... [<span style="color:red">ERROR</span>]
+
+Error: unbound variable: stack-empty?
+    (stack-empty? (make-stack))
+1 test completed in 0.001 seconds.
+<span style="color:red">1 error (100%).</span>
+0 out of 1 (0%) tests passed.
+-- done testing stack --------------------------------------------------------
+</pre>
+
+As you can see the test output indicates that something went wrong. The red ERROR clearly indicates this. The text following it shows the details that make things clearer. It tells us that
+we attempted to use a procedure that doesn't actually exist. This makes perfect sense since we didn't write any code yet. That's easy enough to mitigate.
 
 ~~~ clojure
- (test #t ....)
+(define-record-type stack (create-stack elements) stack? (elements stack-elements stack-elements-set!))
+(define (stack-empty? stack) #t)
+(define (make-stack . elements) (create-stack (reverse elements)))
 ~~~
 
-Ok, going back to the example above. First I've written two tests that handle common edge cases, namely when one of the input is empty. The next test cases simply test a very specific aspect of the score function. I want to make sure that it recognizes exact matches. Also I want to make sure that the exact matches are not counted twice in the unordered matches. This is pretty good so far as we have a couple of cases covered. These tests should all pass now as the following output shows.
+I've added the minimal set of procedures that are needed to make the error go away.
+Please note that I've chosen to represent the stack as a list internally.
 
 <pre>
--- testing code-breaker ------------------------------------------------------
-
-    -- testing score ---------------------------------------------------------
-    empty guess ...................................................... [ PASS]
-    empty code ....................................................... [ PASS]
-    exact matches only ............................................... [ PASS]
-    exact matches don't show up in unordered matches ................. [ PASS]
-    unordered matches only ........................................... [ PASS]
-    ordered and unordered matches .................................... [ PASS]
-    6 tests completed in 0.002 seconds.
-    6 out of 6 (100%) tests passed.
-    -- done testing score ----------------------------------------------------
-
-1 subgroup completed in 0.007 seconds.
-1 out of 1 (100%) subgroup passed.
--- done testing code-breaker -------------------------------------------------
+-- testing stack -------------------------------------------------------------
+make-stack creates an empty stack .................................... [ <span style="color:green">PASS</span>]
+1 test completed in 0.0 seconds.
+<span style="color:green">1 out of 1 (100%) test passed.</span>
+-- done testing stack --------------------------------------------------------
 </pre>
 
-Before we go any further we want to make sure that our score function only accepts arguments of a distinct type. If one
-of the arguments is not a list it shall signal an error. Let's express this in our tests.
+This looks better. You can see that all tests we've written are now passing, as indicated by the green PASS on the right side. We've written enough code to make the tests pass, but it's easy to
+see that these tests are lying: stack-empty? always returns #t regardless of the content of a given stack. So let's add a test that verifies that a non-empty stack is indeed not empty. Our make-stack procedure allows us to specify initial elements of the stack, so we have all we need to create our tests.
 
 ~~~ clojure
-(test-begin "code-breaker")
-  (test-group "score"
-    ;; .... tests as above
-    (test-error "guess must be a list" (score '() #f))
-	(test-error "code must be a list" (score #f '())))
-(test-end "code-breaker")
+(use test)
+(load "../stack.scm")
+
+(test-begin "stack")
+(test "make-stack creates an empty stack"
+   #t
+   (stack-empty? (make-stack)))
+
+(test "make-stack with arguments creates a non-empty stack"
+   #f
+   (stack-empty? (make-stack 'one 'two)))
+
+(test-end "stack")
 
 (test-exit)
 ~~~
 
+Running these tests reveals the following:
+
 <pre>
--- testing code-breaker ------------------------------------------------------
-
-    -- testing score ---------------------------------------------------------
-    empty guess ...................................................... [ PASS]
-    empty code ....................................................... [ PASS]
-    exact matches only ............................................... [ PASS]
-    exact matches don't show up in unordered matches ................. [ PASS]
-    unordered matches only ........................................... [ PASS]
-    ordered and unordered matches .................................... [ PASS]
-    guess must be a list ............................................. [ PASS]
-    code must be a list .............................................. [ PASS]
-    8 tests completed in 0.003 seconds.
-    8 out of 8 (100%) tests passed.
-    -- done testing score ----------------------------------------------------
-
-1 subgroup completed in 0.009 seconds.
-1 out of 1 (100%) subgroup passed.
--- done testing code-breaker -------------------------------------------------
+-- testing stack -------------------------------------------------------------
+make-stack creates an empty stack .................................... [ <span style="color:green">PASS</span>]
+make-stack with arguments creates a non-empty stack .................. [ <span style="color:red">FAIL</span>]
+    expected #f but got #t
+    (stack-empty? (make-stack 'one 'two))
+2 tests completed in 0.002 seconds.
+<span style="color:red">1 failure (50.0%).</span>
+1 out of 2 (50.0%) test passed.
+-- done testing stack --------------------------------------------------------
 </pre>
 
-**Please pay close attention to the output.** They **succeed**, all of them!
-How can that be? We didn't even implement the logic yet. This is a good example why it is good to write your tests first. If we had written the code before we would've never noticed that the tests succeed even without the proper implementation, which pretty much renders these particular tests useless for this case. They do more harm than good because they lie to you and you most likely
-believe them. Obviously just checking that an error accured is not enough. We should verify that a particular error has accured.
-The test library doesn't provide a procdure or macro that does this so we have to come up with our own. We need a way to tell
-if and which condition has been signaled in a given expression. For this purpose I'll add a little helper to the very top
+The output tells us that one of our tests has passed and one has failed. The red FAIL indicates that an assertion didn't hold. In this case **stack-empty?** returned #t for the non-empty stack, which is expected, as **stack-empty?** doesn't do anything useful yet. Together with PASS and ERROR, FAIL completes the set of possible result types of a test. Be sure not to confuse FAIL with ERROR: ERROR indicates that a condition has been signaled, whereas FAIL indicates that an assertion did not hold.
+Let's quickly fix this and make all tests pass. stack.scm now looks like this:
+
+~~~clojure
+(define-record-type stack (create-stack elements) stack? (elements stack-elements stack-elements-set!))
+(define (stack-empty? stack) (null? (stack-elements stack)))
+(define (make-stack . elements) (create-stack (reverse elements)))
+~~~
+
+Running the tests for these definitions results in the following output:
+
+<pre>
+-- testing stack -------------------------------------------------------------
+make-stack creates an empty stack .................................... [ <span style="color:green">PASS</span>]
+make-stack with arguments creates a non-empty stack .................. [ <span style="color:green">PASS</span>]
+2 tests completed in 0.002 seconds.
+<span style="color:green">2 out of 2 (100%) tests passed.</span>
+-- done testing stack --------------------------------------------------------
+</pre>
+
+Very good, all tests are passing. We're in the green. Let's take the opportunity to refactor the test code a bit. The first test asserts that the outcome of the procedure invocation is the boolean #t. Whenever you find yourself writing tests that look like **(test description #t code)**, you might want to use the shorter **(test-assert)** form. Note that **(test-assert)** succeeds for any non-#f value, not only #t, but since **stack-empty?** only ever returns a boolean, that makes no difference here. Let's quickly do this in the test file.
+
+~~~clojure
+(use test)
+(load "../stack.scm")
+
+(test-begin "stack")
+
+(test-assert "make-stack creates an empty stack"
+   (stack-empty? (make-stack)))
+
+(test "make-stack with arguments creates a non-empty stack"
+   #f
+   (stack-empty? (make-stack 'one 'two)))
+
+(test-end "stack")
+
+(test-exit)
+~~~
+
+That reads a bit nicer. Like every good refactoring, this one didn't change the semantics of our tests, and consequently it didn't change the generated output, so I'll omit it here.
+
+There are some more procedures that are needed to make the stack actually useful. Let's continue by implementing **push** which will allow us to add a single value to the stack.
+
+~~~clojure
+(use test)
+(load "../stack.scm")
+
+(test-begin "stack")
+
+(test-assert "make-stack creates an empty stack"
+   (stack-empty? (make-stack)))
+
+(test "make-stack with arguments creates a non-empty stack"
+   #f
+   (stack-empty? (make-stack 'one 'two)))
+
+(test-group "stack-push!"
+  (test #f (stack-empty? (stack-push! (make-stack) 'item))))
+
+(test-end "stack")
+
+(test-exit)
+~~~
+
+You'll notice that I not only added a new test for **stack-push!** but also introduced the **(test-group)** form, which you haven't seen yet. This form allows you to group related tests into a named context. Every group runs the tests it contains and finishes with a status report that tells you how many of its tests have passed, how many have failed, and so on. I've added the group "stack-push!" that will hold all tests needed to cover the **stack-push!** procedure. While we're at it, let's also create a group for **make-stack**. The test file now looks like this:
+
+~~~clojure
+(use test)
+(load "../stack.scm")
+
+(test-begin "stack")
+
+(test-group "make-stack"
+  (test-assert "without arguments creates an empty stack"
+     (stack-empty? (make-stack)))
+
+  (test "with arguments creates a non-empty stack"
+     #f
+     (stack-empty? (make-stack 'one 'two))))
+
+(test-group "stack-push!"
+  (test #f (stack-empty? (stack-push! (make-stack) 'item))))
+
+(test-end "stack")
+
+(test-exit)
+~~~
+
+The output that is generated reads like this:
+
+<pre>
+-- testing stack -------------------------------------------------------------
+
+    -- testing make-stack ----------------------------------------------------
+    without arguments creates an empty stack ......................... [ <span style="color:green">PASS</span>]
+    with arguments creates a non-empty stack ......................... [ <span style="color:green">PASS</span>]
+    2 tests completed in 0.0 seconds.
+    <span style="color:green">2 out of 2 (100%) tests passed.</span>
+    -- done testing make-stack -----------------------------------------------
+
+
+    -- testing stack-push! ----------------------------------------------------------
+    (stack-empty? (stack-push! (make-stack) 'item)) ................... [<span style="color:red">ERROR</span>]
+
+Error: unbound variable: stack-push!
+    1 test completed in 0.0 seconds.
+    <span style="color:red">1 error (100%).</span>
+    0 out of 1 (0%) tests passed.
+    -- done testing stack-push! -----------------------------------------------------
+
+2 subgroups completed in 0.007 seconds.
+1 out of 2 (50.0%) subgroup passed.
+-- done testing stack --------------------------------------------------------
+</pre>
+
+Look how groups are nicely formatted and separate your test output into focused chunks that
+deal with one aspect of your API. Of course we see an ERROR indicating a signaled condition, as we haven't
+yet implemented the **stack-push!** procedure. Let's fix this now.
+
+~~~clojure
+ (define-record-type stack (create-stack elements) stack? (elements stack-elements stack-elements-set!))
+ (define (stack-empty? stack) (null? (stack-elements stack)))
+ (define (make-stack . elements) (create-stack (reverse elements)))
+
+ (define (stack-push! stack item)
+   (stack-elements-set! stack (cons item (stack-elements stack)))
+   stack)
+~~~
+
+With these definitions all of our tests pass and we're back in the green. I'll fast forward now and show you the code and the tests that cover a little bit more of the API.
+The test file now looks like this:
+
+~~~clojure
+ (use test)
+ (load "../stack.scm")
+
+ (test-begin "stack")
+
+ (test-group "make-stack"
+   (test-assert "without arguments creates an empty stack"
+      (stack-empty? (make-stack)))
+
+   (test "with arguments creates a non-empty stack"
+      #f
+      (stack-empty? (make-stack 'one 'two))))
+
+ (test-group "stack-push!"
+   (test #f (stack-empty? (stack-push! (make-stack) 'item)))
+   (test "pushing an item makes it the new top item"
+       'two
+        (let ((stack (make-stack 'one)))
+          (stack-top (stack-push! stack 'two)))))
+
+ (test-group "stack-top"
+   (test "returns the only element for a stack with one element"
+      'one
+      (let ((stack (make-stack 'one)))
+        (stack-top stack)))
+   (test "returns the top-most element"
+      'two
+      (let ((stack (make-stack 'one 'two)))
+        (stack-top stack))))
+
+ (test-end "stack")
+
+ (test-exit)
+~~~
+
+The code now looks like this:
+
+~~~clojure
+(define-record-type stack (create-stack elements) stack? (elements stack-elements stack-elements-set!))
+(define (stack-empty? stack) (null? (stack-elements stack)))
+(define (make-stack . elements) (create-stack (reverse elements)))
+
+(define (stack-push! stack item)
+ (stack-elements-set! stack (cons item (stack-elements stack)))
+ stack)
+
+(define (stack-top stack)
+  (car (stack-elements stack)))
+~~~
+
+These tests all pass so far, and we've added a few more tests for the **stack-top** API.
+Let's take a closer look at that procedure. It behaves well when the stack is non-empty, but what should happen if the stack is empty? Let's just signal a condition indicating that taking
+the top item of an empty stack is an error. The test egg gives us another form that allows
+us to assert that a condition has been signaled. Let's see what this looks like.
+
+~~~clojure
+(use test)
+(load "../stack.scm")
+
+(test-begin "stack")
+
+(test-group "make-stack"
+  (test-assert "without arguments creates an empty stack"
+     (stack-empty? (make-stack)))
+
+  (test "with arguments creates a non-empty stack"
+     #f
+     (stack-empty? (make-stack 'one 'two))))
+
+(test-group "stack-push!"
+  (test #f (stack-empty? (stack-push! (make-stack) 'item)))
+  (test "pushing an item makes it the new top item"
+      'two
+       (let ((stack (make-stack 'one)))
+         (stack-top (stack-push! stack 'two)))))
+
+(test-group "stack-top"
+  (test "returns the only element for a stack with one element"
+     'one
+     (let ((stack (make-stack 'one)))
+       (stack-top stack)))
+  (test "returns the top-most element"
+     'two
+     (let ((stack (make-stack 'one 'two)))
+       (stack-top stack)))
+  (test-error "taking the top item from an empty stack is an error"
+     (stack-top (make-stack))))
+
+(test-end "stack")
+
+(test-exit)
+~~~
+
+The last test in the test-group "stack-top" attempts to codify our assertion. Let's see
+what its output looks like. Instead of just invoking the tests as before, I want to show you another feature of **test** that comes in handy. As we're currently
+working on the implementation of **stack-top**, we're not interested in the results of the other tests and would like to omit them. We can do so by applying a test filter. Take a look:
+
+<pre>
+TEST_FILTER="empty stack is an error" csi -s run.scm
+</pre>
+
+This will run only the tests whose description includes the given text. The filter can actually be a regular expression, so it is much more versatile than it appears here. There is also the variable TEST_GROUP_FILTER, which allows you to run only the test-groups that match the filter. However, in the current implementation of test it doesn't seem possible to filter groups nested within other groups. So setting TEST_GROUP_FILTER="stack-top" doesn't currently work: it will not run any tests, since the filter doesn't match the surrounding group "stack". It would be a nice addition though.
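+
+Because the filter can be a regular expression, you can also select a handful of otherwise unrelated tests in one run. As a sketch (the pattern below is my own choice, matching the test descriptions from our stack test file), this would run every test whose description mentions "empty" or "top-most":
+
+<pre>
+TEST_FILTER="empty|top-most" csi -s run.scm
+</pre>
+
+For now, though, let's stick with the simple substring filter from above.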
+
+The output with the filter expression looks like this:
+
+<pre>
+-- testing stack -------------------------------------------------------------
+    -- done testing make-stack -----------------------------------------------
+
+    -- done testing stack-push! ----------------------------------------------
+
+
+    -- testing stack-top -----------------------------------------------------
+    taking the top item from an empty stack is an error .............. [ <span style="color:green">PASS</span>]
+    1 test completed in 0.0 seconds (2 tests skipped).
+    1 out of 1 (100%) test passed.
+    -- done testing stack-top ------------------------------------------------
+
+3 subgroups completed in 0.007 seconds.
+<span style="color:green">3 out of 3 (100%) subgroups passed.</span>
+-- done testing stack --------------------------------------------------------
+</pre>
+
+**Please pay close attention to the output.** The test passes!
+How can that be? We didn't even implement the part of the code that signals an error for an empty stack. This is a good example of why it pays off to write your tests first. Had we written the code first, we would never have noticed that the test succeeds even without the proper implementation, which pretty much renders this particular test useless. It does more harm than good because it lies to you. The test passes because it is an error to take the **car** of the empty list. Obviously, just checking that an error occurred is not enough; we should verify that a particular error occurred. The test library doesn't provide a procedure or macro that does this, so we have to come up with our own. We need a way to tell if and which condition has been signaled in a given expression. For this purpose I'll add a little helper to the very top
 of the test file and update the tests to use that little helper.
 
 ~~~ clojure
+(use test)
+(load "../stack.scm")
 
 (define-syntax condition-of
   (syntax-rules ()
     ((_ code)
      (or (handle-exceptions exn (map car (condition->list exn)) code #f)
          '()))))
 
- (test "it expects guess to be a list"
-   '(exn code-breaker invalid-argument)
-    (condition-of (score '() #f)))
+(test-begin "stack")
 
- (test "it expects code to be a list"
-   '(exn code-breaker invalid-argument)
-    (condition-of (score #f '())))
+; ... other tests
+
+(test-group "stack-top"
+  (test "returns the only element for a stack with one element"
+     'one
+     (let ((stack (make-stack 'one)))
+       (stack-top stack)))
+  (test "returns thet top-most element"
+     'two
+     (let ((stack (make-stack 'one 'two)))
+       (stack-top stack)))
+  (test "taking the top item from an empty stack is an error"
+     '(exn stack empty)
+      (condition-of (stack-top (make-stack)))))
+
+(test-end "stack")
+
+(test-exit)
 ~~~
 
 With these definitions, let's see now if our tests fail. Running the tests reveals:
 
 <pre>
--- testing code-breaker ------------------------------------------------------
+-- testing stack -------------------------------------------------------------
+    -- done testing make-stack -----------------------------------------------
 
-    -- testing score ---------------------------------------------------------
-    empty guess ...................................................... [ PASS]
-    empty code ....................................................... [ PASS]
-    exact matches only ............................................... [ PASS]
-    exact matches don't show up in unordered matches ................. [ PASS]
-    unordered matches only ........................................... [ PASS]
-    ordered and unordered matches .................................... [ PASS]
-    it expects guess to be a list .................................... [ FAIL]
-        expected (exn code-breaker invalid-argument) but got (exn type)
-    (condition-of (score '() #f))
-    it expects code to be a list ..................................... [ FAIL]
-        expected (exn code-breaker invalid-argument) but got (exn type)
-    (condition-of (score #f '()))
-    8 tests completed in 0.003 seconds.
-    2 failures (25.0%).
-    6 out of 8 (75.0%) tests passed.
-    -- done testing score ----------------------------------------------------
+    -- done testing stack-push! ----------------------------------------------
 
-1 subgroup completed in 0.01 seconds.
-0 out of 1 (0%) subgroups passed.
--- done testing code-breaker -------------------------------------------------
+
+    -- testing stack-top -----------------------------------------------------
+    taking the top item from an empty stack is an error .............. [ <span style="color:red">FAIL</span>]
+        expected (exn stack empty) but got (exn type)
+    (condition-of (stack-top (make-stack)))
+    1 test completed in 0.0 seconds (2 tests skipped).
+    <span style="color:red">1 failure (100%).</span>
+    0 out of 1 (0%) tests passed.
+    -- done testing stack-top ------------------------------------------------
+
+3 subgroups completed in 0.007 seconds.
+2 out of 3 (66.7%) subgroups passed.
+-- done testing stack --------------------------------------------------------
 </pre>
 
 Aha! We have a failing test telling us that we expected a condition of type **(exn stack empty)** but got **(exn type)** instead.
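+
+For completeness, here is a sketch of how **stack-top** could signal such a condition, using CHICKEN's condition primitives (**make-property-condition**, **make-composite-condition** and **signal**); the message text is my own invention:
+
+~~~clojure
+(define (stack-top stack)
+  (if (stack-empty? stack)
+      ;; signal a composite condition whose kinds are (exn stack empty)
+      (signal (make-composite-condition
+               (make-property-condition 'exn 'message "cannot take the top of an empty stack")
+               (make-property-condition 'stack)
+               (make-property-condition 'empty)))
+      (car (stack-elements stack))))
+~~~
+
+With a definition along these lines, **condition-of** would report **(exn stack empty)** and the failing test should pass.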
 
 This looks very good. We have added tests for this case and while doing so we introduced a nice little helper to handle
 specific kinds of conditions. This is the usual way to do it. The test egg provides us with all the primitives that are needed
-to build on. It does not attempt to solve every possible problem for us. This is very much in alignment with the [prime clingerism](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.154.5197) (Hi Peter!).
+to build on. It does not attempt to solve every possible problem for us. This is very much in the spirit of Scheme and the [prime clingerism](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.154.5197) (greetings to Peter).
 I want to stop for a moment with these tests and tell you something about the configuration options you have when you use the
 test egg. We will come back to our little toy project later, when I introduce you to random testing using a little library along
 with the test egg. Stay tuned ....
 
+
 ### Bells and whistles of the test egg
 
 The test egg is very configurable. It gives you a knob for almost every aspect of it. I often found myself wanting features from