Anonymous committed dbcf614

half way through the test article

Files changed (1)

_posts/2014-04-05-tesing_your_chicken_code.md

 code:
    - url: http://bitbucket.org/certainty/chicken-test-mode
      caption: chicken-test-mode
-   - url: http://wiki.call-cc.or/eggref/test
+   - url: http://wiki.call-cc.org/eggref/4/test
      caption: test
+   - url: http://wiki.call-cc.org/eggref/4/test-generative
+     caption: test-generative
+   - url: https://bitbucket.org/certainty/lisp-unleashed-examples/src/tip/code-breaker/
+     caption: code-breaker-example
 tags: [scheme,test,chicken,testing]
 ---
+
+Hello everybody and welcome back. In this article I attempt to introduce you to the very excellent [test egg](http://wiki.call-cc.org/eggref/4/test), which is a great way to do your unit-testing in CHICKEN.
+
+It will start with a gentle introduction to unit testing in general before we dive into the **test egg** itself. I'll show you a few **best practices** that will help you benefit the most from your test code. After I've outlined the bells and whistles that come with **test**, I'll introduce you to **random testing** on top of **test**. Finally I'll give you some hints on what a useful Emacs setup for CHICKEN unit testing might look like.
+
+You can either read this article as a whole, which is what I recommend, or cherry-pick the parts that you're interested in. Both will hopefully work out.
+It will be a very practical article with little theory.
+
+For those of you who are fluent with **test**, there is probably not much new here. Still, I'd love you to be my guest and read on. Don't hesitate to come back to me and tell me about things I've missed, or things that you do differently that work great. I'm eager to learn about your setup.
+
+But now without further ado, let's go down this rabbit hole.
+
+### A gentle introduction to testing
+
+You've probably heard that testing your code is good practice and that every *serious* software engineer must do it. I agree with this in general, and most software engineers out there do as well. There are different schools though: some write the tests after their code, some write them before, and yet others write them while they flesh out their functions in the REPL. I don't want to tell you that there is only one true way to do it, but there are a few arguments I'd like to make that suggest a particular school may have advantages.
+
+**What does testing give you?**
+
+If it is done right, it gives you a reasonable amount of confidence that the code you're testing works correctly. Your tests act like a specification for the code under test.
+Secondly, they give you a safety net that enables you to change your code while making sure it still works as expected. This means that you're free to improve your code without having to worry that you've broken some, possibly distant, part of it while doing so. Closely related to this are regression tests, which detect bugs that have been fixed but pop up again after you change some portion of your code. Regression tests are an important part of a test suite.
+Once you discover a bug, you generally write a test that reproduces it. This test will naturally fail, as the system/code under test doesn't behave as expected. The next step is to fix the
+bug and make the test pass. This way you have made sure that this particular bug has been fixed.
+This must be contrasted with the sort of tests you use to test your features: while those are
+estimates, testing for bugs and fixing them can act as a proof. Of course there is no rule without an exception. [Bugs tend to come in clusters](http://testingreflections.com/node/7584) and can be grouped into categories or families.
+In practice this means that you may have fixed this particular bug, but you're advised to look
+for a generalisation of that bug that might occur elsewhere. You also likely want to check
+the code that surrounds the part that caused the bug; it has been shown empirically that it is
+likely to contain bugs as well. For a critical view on this theory you might want to have
+a look at [this](http://www.developsense.com/blog/2009/01/ideas-around-bug-clusters).
+Finally, tests are often a form of documentation. They describe the expected behaviour of your code and thus give strong hints about how it is meant to be used.
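+
+To make the regression workflow concrete, here is a minimal sketch of what such a test could look like with the test egg we'll meet below; the bug and the procedure involved are entirely hypothetical:
+
+~~~ scheme
+;; Hypothetical regression test: parse-peg used to choke on upper-case
+;; input. It failed before the fix (red) and it stays in the suite
+;; forever to make sure the bug never sneaks back in (green).
+(test "regression: parse-peg accepts upper-case colors"
+  'red
+   (parse-peg "R"))
+~~~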
+
+There are many more testing categories that all have their particular value. The literature is
+full of them, and I very much recommend reading some of it. The [reference section](#references) has a few links
+that you may find useful.
+
+**What does testing not give you?**
+
+Tests for features can never prove that your code doesn't contain bugs and is correct. They are only an estimation. You write as many tests as needed to reach the level of confidence you need.
+This level may be either perceived intuitively or measured. A common way to measure it is a so-called code coverage analysis. This just means that an analyzer runs your tests and checks which code paths they exercise. The result may be used to derive a metric that tells the developer when they have enough, or better, good enough tests. But all this is nowhere near a proof of the absence of bugs or misbehaviour.
+This is what tests normally can't give you, possibly with the exception of regression tests.
+So having a great many tests that verify features of your application is comforting and all, but rest assured that there will be a time
+when a bug pops up in your application, which means that all your tests did nothing to prevent it. You're on your own now.
+Now you actually have to understand your system, reason about it and figure out what went wrong. This is another crucial part of developing
+an application. You must make sure that you have a system that you can actually understand and reason about. Tests can help you develop such a system, as it has been shown that software that is easy to test is often also [simpler](http://www.infoq.com/presentations/Simple-Made-Easy), more focused and easier to comprehend.
+
+**Ok, I want to test. How do I do it?**
+
+While there is value in manually testing a function in the REPL while you develop it, you really also want a suite of **automated tests**. Automated means in practice that you have written
+code that tests your application. You can run these tests and the result will tell you whether and which tests have failed or passed. This makes your tests reproducible with minimum effort. You want to develop this test suite as you develop your application. Whether you write the tests before or after your actual code is really up to you. There is one thing though that I want to point out. There is a general problem with tests, well a few of those, but one is particularly important now: how do you make sure that your test code is correct? It doesn't make much sense to put trust in your code because
+of your shiny test-suite when the tests in there are incorrect, meaning they pass but shouldn't, or they don't pass but really should. While you could write tests for your tests, you may immediately see that this is a recursive problem and might lead to endless tests testing tests testing tests ....
+
+This is one reason why writing tests before code might be helpful. This discipline is called [TDD](https://en.wikipedia.org/wiki/Test-driven_development). It suggests a workflow that we refer to as **"Red-Green-Refactor"**. **Red** means that we start with a failing test. **Green** means that we implement just as much of the application code as is needed to make this test pass. **Refactor** means changing details of your code without affecting the overall functionality. I don't want to go into details, but there is one aspect that is particularly useful: when we start with a **red test**, we at least
+have some good evidence that our tests exercise portions of our code that don't yet work as expected.
+We have some confidence that we're testing the right thing before we make the test pass.
+Contrast this with tests that you write after your code. You never know whether those tests would
+fail if the code didn't work correctly. You could mutate parts of your application code to emulate this, but that's often more work. This is what the TDD folks consider good enough
+to make sure that the tests work correctly, so that they don't need a test-suite for a test-suite for a test-suite ....
+There are other aspects of TDD that I don't cover here, like responding to difficult tests by changing your application code instead of
+the tests. There is much more, and I invite you to have a look at this methodology even if you don't apply it.
+Personally I test before, I test after, and I also test while I develop application code. I do try to test first if it's feasible.
+
+There are many best practices when it comes to testing. I cannot name and explain all of them here. One reason is that I certainly don't know them all, and the other is that there are too many that are very well explained elsewhere.
+A few of them are essential though, and I have often seen people violate them, which made their tests brittle.
+
+**«Always think about the value of the tests»**
+
+Don't write tests just because someone said you must. Don't write tests that don't improve the trust in your system. This can be a difficult decision. Test code is code, just like your application code: it has to be written and maintained.
+
+**«Think about interfaces not implementation»**
+
+This means that your tests should not need to know about the internals of your functions. You should just run them against your interface
+boundaries. Doing so enables you to change your implementation and still have your test-suite telling the truth about your system.
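+
+A small sketch of the difference; the procedure under test (dedupe) and its internal helper are hypothetical:
+
+~~~ scheme
+;; Good: exercises the public interface only.
+(test "dedupe removes duplicates" '(1 2 3) (dedupe '(1 2 2 3 3)))
+
+;; Brittle: reaches into the implementation. A harmless refactoring of
+;; dedupe would break this test even though the observable behaviour
+;; stayed exactly the same.
+(test "presort orders the input" '(1 2 2 3 3) (%dedupe-presort '(3 1 2 2 3)))
+~~~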
+
+**«Keep your tests focused»**
+
+Write enough test code to *"verify"* one aspect of your function, but not more. For example, if
+you have three invariants that you can test for a given function, then you likely want three tests for them. The reason may not be
+obvious, but it should become clear in a moment. There should be only one reason for a test to fail. The next step after you notice a failing test is to find out
+what went wrong. If there are multiple possibilities why the test may have failed, because you verified three invariants in one test, you
+have to investigate all three paths. Having one test for each of the invariants makes this task trivial: you immediately see what the
+culprit is. The other aspect is that it tends to keep your test code small, which means that you have less code to maintain and
+fewer places to be wrong in one test. The attentive reader might have noticed that a consequence of this guideline is that you
+have more tests. This is totally true, so you want to make sure that they execute fast. A typical suite of unit-tests often contains
+a rather large number of small tests.
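+
+To make this concrete, here is a little sketch using the test egg's **(test)** form, which we'll meet properly further down, and plain old abs as the toy subject:
+
+~~~ scheme
+;; Unfocused: three invariants crammed into one test. When it fails,
+;; you first have to figure out which of the three actually broke.
+(test "abs" '(1 1 0) (list (abs -1) (abs 1) (abs 0)))
+
+;; Focused: one invariant per test; a failure points straight at the culprit.
+(test "abs negates negative numbers" 1 (abs -1))
+(test "abs leaves positive numbers alone" 1 (abs 1))
+(test "abs maps zero to zero" 0 (abs 0))
+~~~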
+
+**«Keep your tests independent»**
+
+This just means that tests should be implemented in such a way that only the code inside the test you're looking at can make
+the test fail or pass. It must not depend on other tests. Violations of this are likely to occur when your code involves mutation of shared state.
+Suddenly you may find that your test only passes if you run the entire suite, but fails if you run it in isolation. This is obviously a
+bad thing, as it makes your tests unpredictable. One way to automatically detect these kinds of dependencies is to randomize the
+order in which tests are executed. This is useful, as sometimes you're simply not aware of one test depending on another.
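+
+Here is a minimal sketch of how shared mutable state sneaks such a dependency in:
+
+~~~ scheme
+(define counter 0)
+
+;; This test mutates state that outlives it.
+(test "increment bumps the counter" 1
+  (begin (set! counter (+ counter 1)) counter))
+
+;; This test only passes if the one above ran first -- a hidden
+;; dependency that a randomized execution order would expose.
+(test "counter is one" 1 counter)
+~~~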
+
+
+These are all just general guidelines that apply to unit-tests. There are specific dos and don'ts that apply to other kinds
+of tests that I don't want to cover here. I hope this little introduction gave you enough information to go on with the rest of the article, and that you now have a firm grasp of what I'm going to be talking about.
+
+
+### Putting the test egg to work
+
+Ok good, you're still here and haven't been bored away by the little introduction. This is finally where the fun starts and we will be seeing
+actual code. I'll show you how to obtain and use the very excellent testing facility that is the current de facto standard test library
+for CHICKEN. We will be using a very tiny project that implements the code-breaker game. I've borrowed the idea from the excellent book [Programming Clojure](http://pragprog.com/book/shcloj/programming-clojure), so thanks to the authors.
+
+You can find the entire code for this project at [code-breaker-example](https://bitbucket.org/certainty/lisp-unleashed-examples/src/tip/code-breaker/).
+
+Let's start with the first thing you need. You obviously need [CHICKEN](http://code.call-cc.org/) and you need the test egg.
+You can obtain it with the lovely **chicken-install**. I assume you're familiar with it but I'll give you
+the command line anyway.
+
+~~~ bash
+ $ chicken-install test
+~~~
+
+
+#### The project layout
+
+Once you've installed the test egg you can have a closer look at the directory layout of our example project.
+There is the top-level directory **code-breaker** which holds one scheme source file named **code-breaker.scm**. This is where your application code goes. Furthermore you'll notice that there is a directory called **tests** which holds a single file named **run.scm**. The entire layout looks like this:
+
+<pre>
+ code-breaker
+ | - code-breaker.scm
+ | - tests
+     | - run.scm
+</pre>
+
+This is the standard layout of a scheme project for CHICKEN. There are projects that have additional folders
+and structure their files differently but the majority of projects look like this, so it is a good practice
+to follow it. You may have noticed that this is also the standard layout of CHICKEN eggs. They contain egg-specific files like *.release-info, *.meta and *.setup but apart from that, they look very much like this. Another reason to arrange your tests the way I showed you is that CHICKEN's CI at [salmonella](https://tests.call-cc.org) expects this layout. You can benefit from this service once you follow this convention. It's time to give **mario** and **peter** a big thank you, as they made it possible.
+
+**The game**
+
+Back to our project: you might want to have a look at the source code to understand what it actually does. I will take the opportunity to give you a little explanation here though. As I already said, it implements the code-breaker game.
+The game consists of the code-maker (the program) that generates a random code and the code-breaker (the player) that tries to
+guess this code. The code is a tuple of colored pegs. Once the code-breaker submits a guess, the program scores the guess
+like so:
+
+* One black peg for each peg of the right color in the right position
+* One white peg for each peg of the right color but not in the right position
+
+The game ends when either the code was guessed correctly or a predetermined number of tries has been exceeded. As a quick worked example (the same pegs show up in the tests below): for the code `(r g g b)` and the guess `(b r y y)`, no peg sits in the right position, but `b` and `r` both occur in the code, so the code-maker answers with zero black and two white pegs.
+
+**The code**
+
+Here we present the code that is needed to implement the code-breaker. It is almost literally translated from Clojure
+to Scheme. Please bear with me, as I introduced some inaccuracies: Clojure maps don't map cleanly to Scheme's alists, but I still chose
+this representation. This is all not that important though, as we want to concentrate on the tests. The following snippet
+outlines the code we want to concentrate on. It uses some helpers that you can look up in the repository, which are not
+important right now.
+
+~~~ scheme
+;; how many pegs in code are also in guess taking into account the position
+(define (exact-matches code guess)
+  (length (remove (cut not <>) (list-diff code guess))))
+
+;; how many pegs of code are in guess not taking into account the position
+(define (unordered-matches code guess)
+  (let ((f1 (select-keys (frequencies code) guess))
+        (f2 (select-keys (frequencies guess) code)))
+    (merge-with min f1 f2)))
+
+;; finally we can welcome our score function
+(define (score code guess)
+  (let ((exact (exact-matches code guess))
+        (unordered (apply + (map cdr (unordered-matches code guess)))))
+    (cons exact (- unordered exact))))
+~~~
+
+The main **interface**, and thus the subject of our tests, will be the **score procedure**. It takes a list containing the code
+and a list containing the guess as arguments, and produces a pair with the number of exact matches in the car and the number
+of unordered matches in the cdr.
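+
+A quick look at how it is called (the values are taken from the tests below) shows the shape of that interface:
+
+~~~ scheme
+(score '(r g g b) '(b r y y))  ; => (0 . 2)  no exact, two unordered matches
+(score '(r g g b) '(r g b g))  ; => (2 . 2)  two exact, two unordered matches
+~~~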
+
+**The tests**
+
+Now let's finally write some tests for the given piece of code. We start with the template for every test file.
+
+~~~ scheme
+ (use test)
+
+ (test-begin "code-breaker")
+ (test-end "code-breaker")
+
+ (test-exit)
+~~~
+
+This little snippet is a useful template for the tests you write. It loads and imports the test egg. It encloses your
+tests in a **(test-begin)** and **(test-end)** form. You will want to do this, as **test** will print a summary for every
+test within these boundaries. This means that you get a summary of how many tests have passed and how many have failed
+at the end of test's output, so you can't miss a failing test that has flitted across the screen. I've been bitten by
+that many times. Finally, the last line in your test file should be **(test-exit)**. This will make your test process exit
+with a status code that indicates the status of your tests. If any tests have failed, it will exit with a non-zero
+status code, which can be passed as an argument to the procedure and defaults to one. Zero will be the status code if all tests have passed.
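+
+That last detail is handy when a build script needs to distinguish failure modes:
+
+~~~ scheme
+(test-exit)    ; exit code 0 on success, 1 if any test failed
+(test-exit 2)  ; the same, but exit with status 2 on failure
+~~~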
+
+Now let's add some smoke tests to verify that our score procedure behaves as expected.
+
+~~~ scheme
+ (use test)
+ (load "../code-breaker.scm")
+
+ (test-begin "code-breaker")
+
+  (test-group "score"
+    (test "empty guess"
+      '(0 . 0)
+       (score '(r g g r) '()))
+
+    (test "empty code"
+      '(0 . 0)
+       (score '() '(r g g r)))
+
+    (test "exact matches only"
+      '(2 . 2)
+       (score '(r g g b) '(r g b g)))
+
+    (test "exact matches don't show up in unordered matches"
+      '(2 . 2)
+       (score '(r g g b) '(r b g y)))
+
+    (test "unordered matches only"
+      '(0 . 2)
+       (score '(r g g b) '(b r y y))))
+
+    (test "ordered and unordered matches"
+      '(1 . 3)
+       (score '(r g g b) '(r b b g)))
+
+ (test-end "code-breaker")
+
+ (test-exit)
+~~~
+
+Ignore the test-group form for now and have a close look at the various invocations of the **(test)** form.
+As you may have noticed, it takes multiple arguments. The first argument is a description string that gives a hint about
+what this particular test attempts to verify. The next argument is the **expected value**; it can be any scheme value.
+The last argument is the scheme expression that shall be tested. It will be evaluated and compared with the **expected value**.
+This is actually the long form. You can get by with a shorter form that omits the description string.
+
+~~~ scheme
+ (test 3 (+ 1 2))
+~~~
+
+This is very handy for multiple reasons. The most obvious reason is that you don't have to think of a fitting description.
+The test egg is smart enough to use a pretty-printed form of the expression, which is (+ 1 2) in our little example, as the
+description. Secondly, you can use this feature to generate descriptions out of your expressions that are still meaningful.
+You just have to write a form that reads nicely. Let me clarify this:
+
+~~~ scheme
+ (test-assert (member 3 (list 1 2 3)))
+~~~
+
+This will generate a description like this when you run the tests.
+
+<pre>
+(member 3 (list 1 2 3)) .............................................. [ PASS]
+</pre>
+
+This is pretty expressive, even though we didn't supply a proper description. The **test-assert** form can be used
+whenever you'd otherwise write tests like the following
+
+~~~ scheme
+ (test #t ....)
+~~~
+
+Ok, going back to the example above. First I've written two tests that handle common edge cases, namely when one of the inputs is empty. The next test cases each exercise one very specific aspect of the score function: I want to make sure that it recognizes exact matches, and also that exact matches are not counted twice among the unordered matches. This is pretty good so far, as we have a couple of cases covered. These tests should all pass now, as the following output shows.
+
+<pre>
+-- testing code-breaker ------------------------------------------------------
+
+    -- testing score ---------------------------------------------------------
+    empty guess ...................................................... [ PASS]
+    empty code ....................................................... [ PASS]
+    exact matches only ............................................... [ PASS]
+    exact matches don't show up in unordered matches ................. [ PASS]
+    unordered matches only ........................................... [ PASS]
+    ordered and unordered matches .................................... [ PASS]
+    6 tests completed in 0.002 seconds.
+    6 out of 6 (100%) tests passed.
+    -- done testing score ----------------------------------------------------
+
+1 subgroup completed in 0.007 seconds.
+1 out of 1 (100%) subgroup passed.
+-- done testing code-breaker -------------------------------------------------
+</pre>
+
+Before we go any further, we want to make sure that our score procedure only accepts arguments of a distinct type. If one
+of the arguments is not a list, it shall signal an error. Let's express this in our tests.
+
+~~~ scheme
+(test-begin "code-breaker")
+  (test-group "score"
+    ;; .... tests as above
+    (test-error "guess must be a list" (score '() #f))
+    (test-error "code must be a list" (score #f '())))
+(test-end "code-breaker")
+
+(test-exit)
+~~~
+
+<pre>
+-- testing code-breaker ------------------------------------------------------
+
+    -- testing score ---------------------------------------------------------
+    empty guess ...................................................... [ PASS]
+    empty code ....................................................... [ PASS]
+    exact matches only ............................................... [ PASS]
+    exact matches don't show up in unordered matches ................. [ PASS]
+    unordered matches only ........................................... [ PASS]
+    ordered and unordered matches .................................... [ PASS]
+    guess must be a list ............................................. [ PASS]
+    code must be a list .............................................. [ PASS]
+    8 tests completed in 0.003 seconds.
+    8 out of 8 (100%) tests passed.
+    -- done testing score ----------------------------------------------------
+
+1 subgroup completed in 0.009 seconds.
+1 out of 1 (100%) subgroup passed.
+-- done testing code-breaker -------------------------------------------------
+</pre>
+
+**Please pay close attention to the output.** They **succeed**, all of them!
+How can that be? We didn't even implement the logic yet. This is a good example of why it is good to write your tests first. If we had written the code first, we would've never noticed that the tests succeed even without the proper implementation, which pretty much renders these particular tests useless for this case. They do more harm than good, because they lie to you and you will most likely
+believe them. Obviously, just checking that an error occurred is not enough. We should verify that a particular error has occurred.
+The test library doesn't provide a procedure or macro that does this, so we have to come up with our own. We need a way to tell
+if and which condition has been signaled in a given expression. For this purpose I'll add a little helper to the very top
+of the test file and update the tests to use it.
+
+~~~ scheme
+;; Evaluate code and return the list of condition kinds it signaled,
+;; or the empty list if no condition was signaled.
+(define-syntax condition-of
+  (syntax-rules ()
+    ((_ code)
+     (or (handle-exceptions exn (map car (condition->list exn)) code #f)
+         '()))))
+
+ (test "it expects guess to be a list"
+   '(exn code-breaker invalid-argument)
+    (condition-of (score '() #f)))
+
+ (test "it expects code to be a list"
+   '(exn code-breaker invalid-argument)
+    (condition-of (score #f '())))
+~~~
+
+With this definition in place, let's see if our tests fail now. Running the tests reveals:
+
+<pre>
+-- testing code-breaker ------------------------------------------------------
+
+    -- testing score ---------------------------------------------------------
+    empty guess ...................................................... [ PASS]
+    empty code ....................................................... [ PASS]
+    exact matches only ............................................... [ PASS]
+    exact matches don't show up in unordered matches ................. [ PASS]
+    unordered matches only ........................................... [ PASS]
+    ordered and unordered matches .................................... [ PASS]
+    it expects guess to be a list .................................... [ FAIL]
+        expected (exn code-breaker invalid-argument) but got (exn type)
+    (condition-of (score '() #f))
+    it expects code to be a list ..................................... [ FAIL]
+        expected (exn code-breaker invalid-argument) but got (exn type)
+    (condition-of (score #f '()))
+    8 tests completed in 0.003 seconds.
+    2 failures (25.0%).
+    6 out of 8 (75.0%) tests passed.
+    -- done testing score ----------------------------------------------------
+
+1 subgroup completed in 0.01 seconds.
+0 out of 1 (0%) subgroups passed.
+-- done testing code-breaker -------------------------------------------------
+</pre>
+
+Aha! We have two failing tests saying that we were expecting a condition of type **(exn code-breaker invalid-argument)**
+but actually got a condition of type **(exn type)**. This is why the tests succeeded earlier: some code within score did
+validate that the inputs are lists, but it was certainly not our score procedure. Now we can go on and add the code that
+verifies that the arguments are of the expected type and signals an error if they're not. That's easy enough to add.
+
+~~~ scheme
+ (define (assert-list ls message)
+   (unless (list? ls)
+     (signal
+      (make-composite-condition
+       (make-property-condition
+        'exn
+        'message message
+        'arguments (list ls))
+       (make-property-condition 'code-breaker)
+       (make-property-condition 'invalid-argument)))))
+
+ (define (score code guess)
+   (assert-list code "code must be a list")
+   (assert-list guess "guess must be a list")
+
+   (let ((exact (exact-matches code guess))
+         (unordered (apply + (map cdr (unordered-matches code guess)))))
+     (cons exact (- unordered exact))))
+~~~
+
+This simply adds a little helper that checks whether a given argument is a list and signals a condition of the desired kinds
+if it isn't. Let's see if we did a good job and made the tests pass.
+
+<pre>
+-- testing code-breaker ------------------------------------------------------
+
+    -- testing score ---------------------------------------------------------
+    empty guess ...................................................... [ PASS]
+    empty code ....................................................... [ PASS]
+    exact matches only ............................................... [ PASS]
+    exact matches don't show up in unordered matches ................. [ PASS]
+    unordered matches only ........................................... [ PASS]
+    ordered and unordered matches .................................... [ PASS]
+    it expects guess to be a list .................................... [ PASS]
+    it expects code to be a list ..................................... [ PASS]
+    8 tests completed in 0.002 seconds.
+    8 out of 8 (100%) tests passed.
+    -- done testing score ----------------------------------------------------
+
+1 subgroup completed in 0.01 seconds.
+1 out of 1 (100%) subgroup passed.
+-- done testing code-breaker -------------------------------------------------
+</pre>
+
+This looks very good. We have added tests for this case and, while doing so, introduced a nice little helper to handle
+specific kinds of conditions. This is the usual way to do it. The test egg provides us with all the primitives that are needed
+to build on; it does not attempt to solve every possible problem for us. This is very much in alignment with the [prime clingerism](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.154.5197) (Hi Peter!).
+I want to stop for a moment with these tests and tell you something about the configuration options you have when you use the
+test egg. We will come back to our little toy project later, when I introduce you to random testing using a little library along
+with the test egg. Stay tuned ....
+
+### Bells and whistles of the test egg
+
+The test egg is very configurable. It gives you a knob for almost every aspect of it. I often found myself wanting a feature from
+test only to realize that it was already there. Test's author **Alex Shinn** did a very good job.
+
+- test-equals
+- test-epsilon
+- test-evaluater
+- test-handler
+- test-skipper
+- environment
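+
+As a teaser for what's to come, here is a sketch of two of those knobs; the parameter names (current-test-epsilon, current-test-comparator) are my assumption based on the egg's documentation, so double-check them before relying on this:
+
+~~~ scheme
+;; Widen the tolerance used when comparing inexact numbers ...
+(parameterize ((current-test-epsilon 0.01))
+  (test 3.14 3.1415))
+
+;; ... or swap out the equality predicate altogether.
+(parameterize ((current-test-comparator string-ci=?))
+  (test "chicken" "CHICKEN"))
+~~~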
+
+### Best practices when using the test egg
+
+- where to put the tests
+- general layout of the test-file
+- test-begin and test-end
+- test-exit
+
+### Random testing with test-generative
+
+- introduction
+- model based testing
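+
+As a teaser, here is a sketch of where this section is headed; the exact macro and generator names (test-generative, gen-list-of, gen-fixnum) are assumptions based on my reading of the test-generative and data-generators eggs:
+
+~~~ scheme
+(use test test-generative data-generators)
+
+;; Property: scoring a code against itself yields all exact matches
+;; and no unordered ones, whatever the code happens to be.
+(test-generative ((code (gen-list-of (gen-fixnum 0 5))))
+  (test (cons (length code) 0) (score code code)))
+~~~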
+
+### Integrating tests into your Emacs workflow
+
+- chicken-test-mode
+- what does it give you
+- alternatives
+- run command
+
+### Wrap up
+
+
+### References
+
+* [xunit test patterns](http://xunitpatterns.com/)
+* [regression testing](https://en.wikipedia.org/wiki/Regression_testing)
+* [black swans](http://testingreflections.com/node/7584)
+* [ideas around bug clusters](http://www.developsense.com/blog/2009/01/ideas-around-bug-clusters)