>From version 0.0 to 0.2
Solve-State(state, prev_states, ret)
if (state == empty_state)
for each move possible on state
new_state <- state
if (new_state in prev_states)
; Do nothing
add new_state to prev_states
if (Solve-State(new_state, prev_states, ret) == SOLVED)
if (Solve-State(init_state, Null, ret) == SOLVED)
print "State was solved"
while (state <- Pop(ret))
print "I could not solve this board";
Basically a DFS scan.
* Initial programming in perl:
** State was represnted as a perl5 data-structure.
** State was duplicated by serializing it and unserializing it.
** @prev_states contained the states unsorted.
** Only a small amount of the tests were implmented before the conversion to
** Running the program from the command-line gave it a very slow output.
** Conclusion: perl is slow, if you want your application to be as fast
as possible, code it in C from the first stage.
* Converting to C:
** Wrote the functions card_user2perl card_perl2user etc. in C (show the card.c module)
** For representing a board used the following scheme
- include some code of state.h
- Note that non-vacant cards in stacks have to be set to the
empty card so the states can be memcmp'ed.
** I decided to use memcmp for comparing states. It's faster than a stack by
stack, card by card comparison of states.
** For getting the state information I wrote some macros:
stack_card_card_num, stack_card_deck() etc.
I figured out that they will be used a lot in solve_for_state
so writing them as functions will consume a lot of CPU.
Thus, they were written as macros.
** In order to avoid a case where two states which differ only with their
order of the stacks and/or freecells were generated, the states were
- The stacks were qsorted according to their bottommost card.
- The freecells were qsorted according to their card.
That way, the possibility is avoided.
- Downside: the order of the stacks and freecells is not preserved.
Show the code of canonize_state, card_compare, stack_compare
** The prev_states were stored in a sorted array, with an unsorted
- bsearch() was used to search for the current state inside
the sorted array.
- lfind() was used to search for it inside the sort margin.
- A state which had not been previously encountered was placed at
the end of the sort margin.
- Once the sort margin grew beyond PREV_STATES_SORT_MARGIN the
array was qsorted(), and the length of the sort margin was
set to 0.
- When the array exceeds its allocated size it is
- PREV_STATES_GROW_BY was defined as 256 while
PREV_STATES_SORT_MARGIN was defined as 16. Both could be
defined to a different value by the user.
** The various tests conducted:
(Enumerate the tests)
** Two tests are special: moving a freecell or stack card to the foundations.
Moving a card to the founds where both of the founds of the other colour
are higher than its card number plus 2. In that case if the child-state is
not solveable neither is the original state.
** Note that the entire checking and adding code for new states was
implemented inside the tests code. (not in a separate function).
- Show the code of freecell.c
* Testing the final code:
** I generated 1000 test boards. (unfortunately not in a reproducible manner)
and ran the function of them.
** Most of the boards ran fine. Some took too long to solve (several dozen of
** Out of which, some were terminated in the middle.
** Some boards were reported by the program as unsolveable. (as I discovered
later, they were all solveable).
** I placed Freecell Solver version 0.2 on a web-site, and posted an
announcement for it on Freshmeat.
>From version 0.2 to 0.4
* Changes I had in mind:
** The states are quite large in memory size. Reduce their size by using bytes
for their representation.
** Put the "check and add state" code in a separate function, so duplicate code
will be avoided.
* Implementing COMPACT_STATES:
- card is a char - low half is card num upper half is deck
- deck is a char
- stack counter is a char
** Because of the macros I used to access and modify the state's data, I was
able to rewrite the macros as COMPACT_STATES ones and have most of the work
done for me.
** I decided to include a compile-time directive to choose between the
COMPACT_STATES and the FAST_STATES.
** state_as_string required a thorough modification because it did not use the
macros in the first place.
** I noticed that COMPACT_STATES were much faster than the so called
FAST_STATES. As some of you remember I posted a message to linux-il
The answers I received were:
1. Less memory means less cache misses, thus faster running.
2. Operations on chars don't take more time than operations on 16-bit
or 32-bit integers.
3. The Pentium cache is 8-bits wide.
To me, #1 seems the most likely.
** Thus, I decided to rename FAST_STATES as DEBUG_STATES because
they are easier to debug using gdb and friends.
** Conclusion: use as little bit-space as you need, and use a lot of
typedefs and macros to manage them.
* Implementing check_and_add_state as a separate function:
** I took a look at the code and realized that the two distinct tests are
going to cause me problems.
** I implemented the function check_and_add_state(), which:
1. Accepts a pointer to the new state.
2. Checks if it is already present in prev_states.
3. If so, returns STATE_ALREADY_EXISTS
4. If not, calls solve_for_state on it.
5. If the inner solve_for_state succeeeds returns STATE_WAS_SOLVED
6. If it didn't return STATE_IS_NOT_SOLVEABLE to solve_for_state.
- Show the code of check_and_add_state.
** I then coded four macros:
All the tests excepts the two NSFOS (not solveable for original state)
used sfs_check_state_begin and sfs_check_state_end.
The two NSFOS states used just sfs_check_state_init and
- Show the code of the macros.
** At the end, the routines at the end of the tests, only had the code that
requires to perform the moves on the new state. All the other code was
written inside the macros and the function check_and_add_state.
Avoid duplicate code, use functions and macros to encapsulate it.
* Switching from qsort to merge:
** In my "Introduction to Data Structures and Algorithms" class I learned
that Quick Sort has a minimal complexity of n*log(n) and a maximal
complexity of n^2. Plus, when the array is sorted maximal complexity
In FCS, the array of prev_states is mostly sorted, except for the sort
margin at the end. Thus, it will rougly take Omega(n^2) to sort it.
** Since the lion's share of the array is already sorted, we can use a merge
function (which I also studied in class) to sort it. Merge works on two
sorted arrays, so we need to sort the sort margin before that, or keep it
Merge takes O(n) to perform.
** I decided to keep the sort margin sorted, and place it separately from the
main prev_states array. That way, I can allocate an extra place at the
end of the main array for the merging.
** By performing the merge from the end I was able to merge without allocating
a new array and then copying it to the original one.
** I figured that since the main array will usually be considerably larger
than the sort margin, then an item-by-item check will be wasteful. Thus, I
decided to base it on binary searches.
** The bsearch() function was inadequate because it doesn't the place
in which an unfound item should be inserted. Thus, I had to write one of
- show the code of SFO_bsearch
** Based on SFO_bsearch I wrote SFO_merge_large_and_small_sorted_arrays.
- Show the code.
** SFO_bsearch and SFO_merge_large_and_small_sorted_arrays were originally
written in perl, and then translated to C. This time, writing them in perl
first was a good idea, because it reduced the development time. Remember
that the running time for the testing was not that critical.
** SFO_bsearch also allowed me to add a new state at the appropriate place in
the sort margin.
** I rewrote check_and_add_state to use the merge.
- Show the code in freecell.c
For 50 test boards:
- FCS 0.2 takes 1 minute and 11 seconds to solve them.
- FCS 0.4.1 takes 14 seconds
A factor of 5 improvement.
* Implementing INDIRECT_STATE_STORAGE
** Motivation: I noticed that when I ran FCS on Windows NT 4.0 Workstation,
with DEBUG_STATES, I ran out of stack space. The reason: solve_for_state
accepts the entire state data structure as a function argument, and the
DFS depth of that board was considerable.
Therefore, I decided to implement a scheme in which only pointers to the
states will be passed to solve_for_state.
** I decided to replace prev_states with indirect_prev_states which would
be an array of pointers to states.
** Because of the algorithm, only the last allocated state needs to be
** At first, I tried to implement the actual states, to which the
pointers will point as a stack, that will be realloced. Then I realized,
that because of the realloc() call, the physical location of the states
** In Linux, at least on the kernel level, the memory is allocated in
powers of 2. Thus, it would be a waste if every state were allocated
separately. I decided to allocate the states in chunks that fit a 64KB
boundary. I called these chunks packs.
** I wrote the state_ia (state indirect allocation) routines.
- show the fcs_isa.c file.
** Note that there is a compile-time option to compile with either the
DIRECT_STATES_STORAGE mechanism or the INDIRECT_STATE_STORAGE.
** check_and_add_state and the sfs_check_state_ macros were adapted to
allocate the states indirectly.
** Some macros make sure that "state", "ptr_state", "new_state" and
"ptr_new_state" are all available in the code scope.
For 100 test boards on a Pentium 16 with 48 MB of RAM:
Debug Direct: 44 minutes and 37 seconds
Debug Indirect: 8 minutes and 10 seconds
Compact Direct: 7 minutes and 7 seconds
Compact Indirect: 1 minutes and 54 seconds
As you can see, the indirect state management improves the running time
by a big factor in both cases.
- When passing structs as arguments to functions always do that by a
- Never sort, search in, etc with a direct vector of structs. Instead
use an array of pointers.
* Usability features:
** modified initial_user_state_to_c so it will also accept states in the
middle of play:
1. With stacks of variable length.
2. With filled freecells.
3. With non-initial decks.
- Possibly show the source, although it's just an ugly C parsing
** Wrote a function check_state_validity that checks that all cards are
present in an initial board once and only once.
- When I used the program I noticed that I had many mistakes when
trying to manually input boards.
* I placed Freecell Solver 0.4 on its web-site and posted an announcement
for it on Freshmeat.
>From version 0.4 to 0.6
* I randomized a 1000 board and noticed that FCS could not solve all of them.
I decided to investigate and after patching GNOME Freecell so it will
input a board from a file, I realized that I forgot to add a test that
will move a card to a parent card on the same stack.
** I added the extra move and afterwards it managed to solve all of them.
* Moving the global variables to an instance strucutre.
** FCS made extensive use of global variables. But I wanted it also to be
available as a library sooner or later. Now, if only one instance of the
library can be run we are set. But what about multi-threaded applications?
** I decided to create an instance structure that will be passed as the first
parameters to solve_for_state() and check_and_add_state(). That way, more
than one simultaenous instances can be run at any given time.
** I moved prev_states, indirect_prev_states, num_prev_states, etc. to the
instance variable. Gradually, I eventually moved all the variables their.
** I created functions to initialize and terminate an instance. (show the code
** Conclusion: Global variables are wrong because they do not go well with
multi-threading. In C++, use objects. In ANSI C, create an instance struct
that will hold all the "global" variables of the function, and create
functions to instantize it.
* Remembering the stack and freecell order.
** Because Freecell Solver "canonizes" the order of the stacks and freecells
they are not preserved throught the solution that is displayed to the
user. (You can give an example).
** This is a big usability flaw. I wanted to correct it.
** The reason for the canonization is that I wanted that states with a
different ordering of the stacks or freecells will not be generated.
** If I place an ordering integer inside each stack and freecell, I won't be
able to memcmp() two states.
** But what if I put them outside the main state sturcture?
** Solution: I created a structure called state_with_locations_t that:
1. Contain the state as its first element. (So its pointers can still
2. Contain 8 integers that specify the order of the stacks.
3. Contain 4 integers that specify the order of the freecells.
** I had to modify canonize_state() that it will swap the order specifiers
as well as the stacks or freecells themselves. I could not do that
with C's qsort() so I wrote my own insertion-sort routine. (Show the
code of canonize_state() in state.h)
** I also had to modify the code itself to access the s element of the
state_with_locations stucture. I did some of it using macros.
** I also had to modify the state_ia routines to manage state_with_locations
instead of regular states.
** state_as_string was modified to make use of the new locations.
** Eventually, the order of the states was preserved.
** Conculsion: The user is always right. If you make some software design
considerations with possible implications for the user, always make
sure the user does not notice.
* Adding the fcs_ prefix.
** Despite moving the global variables to the instance struct, FCS still
have a lot of functions. But names like state_as_string may might
as well be names of user's routines. Thus, FCS may interfere with the
main program of the user.
** Solution: Add a "fcs_" or "freecell_solver" prefixes to the names of
the global functions. That way, they won't collide with names of
other functions, in case Freecell Solver is compiled as a library.
** Conclusion: Avoid namespace collision. In C++ use the namespaces facility,
In C append prefixes to the functions you create.
* Each test in a separate function.
** To enumerate the different moves possible on each state, solve_for_state
performs several tests. Those tests include:
1. Moving a top card to the foundations.
2. Moving a card to a parent card.
3. Moving a non-top card to the foundations.
4. Moving a sequence of cards to a different parent.
5. Emptying a stack into the freecells.
** Those tests can be done in any order whatsoever. It's hard to know in
advance what will be the order which will solve a state more quickly.
** Solution: Let the user specify a test order from the command line.
** Implementation: The code of solve_for_state was split into 10 test
functions and one main function. Each test function implemented one of
the tests and called check_and_add_state().
** The solve_for_state() function holds ten pointers to test functions, and
executes them one by one according to a test order which is specified in
the instance struct.
** The instance can also specify the number of tests to conduct, thus enabling
the user to omit some of the tests.
- For every board there's usually a test order that can solve it very
fast. For other orders it usually get stuck and doesn't finish
- For example, running FCS 0.6.2 with a limit of 150000 boards with the
following test orders:
Duration: 19:25 minutes
Number of unsolved boards: 19
Duration: 14:27 minutes
Number of unsolved boards: 12
Duration: 2:50 minutes
Number of unsolved boards: 5 (some of them after a complete scan)
** Conclusion: If several things can be done with any given order, make sure
they are implemented as separate routines. You may wish to play with the
order and see which one gives the optimal results.
* Interface separation and code cleaning.
** Like I said, I wanted FCS to be also available as a library.
** I placed the initialization, solution and de-allocation routines in their
own functions, instead of inside main.
** The following functions were created: