\chapter{Finding and fixing your mistakes}
\label{chap:undo}

To err might be human, but to really handle the consequences well
takes a top-notch revision control system.  In this chapter, we'll
discuss some of the techniques you can use when you find that a
problem has crept into your project.  Mercurial has some highly
capable features that will help you to isolate the sources of
problems, and to handle them appropriately.

\section{Erasing local history}

\subsection{The accidental commit}

I have the occasional but persistent problem of typing rather more
quickly than I can think, which sometimes results in me committing a
changeset that is either incomplete or plain wrong.  In my case, the
usual kind of incomplete changeset is one in which I've created a new
source file, but forgotten to \hgcmd{add} it.  A ``plain wrong''
changeset is not as common, but no less annoying.

\subsection{Rolling back a transaction}
\label{sec:undo:rollback}

In section~\ref{sec:concepts:txn}, I mentioned that Mercurial treats
each modification of a repository as a \emph{transaction}.  Every time
you commit a changeset or pull changes from another repository,
Mercurial remembers what you did.  You can undo, or \emph{roll back},
exactly one of these actions using the \hgcmd{rollback} command.  (See
section~\ref{sec:undo:rollback-after-push} for an important caveat
about the use of this command.)

Here's a mistake that I often find myself making: committing a change
in which I've created a new file, but forgotten to \hgcmd{add} it.
\interaction{rollback.commit}
Looking at the output of \hgcmd{status} after the commit immediately
confirms the error.
\interaction{rollback.status}
The commit captured the changes to the file \filename{a}, but not the
new file \filename{b}.  If I were to push this changeset to a
repository that I shared with a colleague, the chances are high that
something in \filename{a} would refer to \filename{b}, which would not
be present in their repository when they pulled my changes.  I would
thus become the object of some indignation.

However, luck is with me---I've caught my error before I pushed the
changeset.  I use the \hgcmd{rollback} command, and Mercurial makes
that last changeset vanish.
\interaction{rollback.rollback}
Notice that the changeset is no longer present in the repository's
history, and the working directory once again thinks that the file
\filename{a} is modified.  The commit and rollback have left the
working directory exactly as it was prior to the commit; the changeset
has been completely erased.  I can now safely \hgcmd{add} the file
\filename{b}, and rerun my commit.
\interaction{rollback.add}

\subsection{The erroneous pull}

It's common practice with Mercurial to maintain separate development
branches of a project in different repositories.  Your development
team might have one shared repository for your project's ``0.9''
release, and another, containing different changes, for the ``1.0''
release.

Given this, you can imagine that the consequences could be messy if
you had a local ``0.9'' repository, and accidentally pulled changes
from the shared ``1.0'' repository into it.  At worst, you could be
paying insufficient attention, and push those changes into the shared
``0.9'' tree, confusing your entire team (but don't worry, we'll
return to this horror scenario later).  However, it's more likely that
you'll notice immediately, because Mercurial will display the URL it's
pulling from, or you will see it pull a suspiciously large number of
changes into the repository.

The \hgcmd{rollback} command will work nicely to expunge all of the
changesets that you just pulled.  Mercurial groups all changes from
one \hgcmd{pull} into a single transaction, so one \hgcmd{rollback} is
all you need to undo this mistake.

\subsection{Rolling back is useless once you've pushed}
\label{sec:undo:rollback-after-push}

The value of the \hgcmd{rollback} command drops to zero once you've
pushed your changes to another repository.  Rolling back a change
makes it disappear entirely, but \emph{only} in the repository in
which you perform the \hgcmd{rollback}.  Because a rollback eliminates
history, there's no way for the disappearance of a change to propagate
between repositories.

If you've pushed a change to another repository---particularly if it's
a shared repository---it has essentially ``escaped into the wild,''
and you'll have to recover from your mistake in a different way.  If
you push a changeset somewhere, roll it back, and then pull from the
repository you pushed to, the changeset will reappear in your
repository.
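To see why, it can help to model each repository as nothing more than a set of changeset IDs, with a pull (or push) copying over whichever changesets the destination lacks.  The sketch below is such a toy model, written in Python; it is not how Mercurial stores history, and the names \texttt{pull}, \texttt{local}, and \texttt{shared} are hypothetical.

```python
# Toy model (not Mercurial's implementation): a repository is just a
# set of changeset IDs, and pulling copies over whatever is missing.

def pull(dest, source):
    """Add every changeset in source that dest lacks (mutates dest)."""
    dest |= source


local = {"c1", "c2"}
shared = {"c1"}

# Push c2 to the shared repository (a push is a pull in reverse).
pull(shared, local)

# Roll back c2 locally: it vanishes only from this repository.
local.discard("c2")
assert "c2" in shared

# The next pull from the shared repository brings c2 straight back.
pull(local, shared)
print(sorted(local))  # prints ['c1', 'c2']
```

The rollback only ever edits one repository's set, while the union performed by the next pull restores whatever any other repository still holds.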

(If you absolutely know for sure that the change you want to roll back
is the most recent change in the repository that you pushed to,
\emph{and} you know that nobody else could have pulled it from that
repository, you can roll back the changeset there, too, but you
should really not rely on this working reliably.  If you do this,
sooner or later a change really will make it into a repository that
you don't directly control (or have forgotten about), and come back to
bite you.)

\subsection{You can only roll back once}

Mercurial stores exactly one transaction in its transaction log; that
transaction is the most recent one that occurred in the repository.
This means that you can only roll back one transaction.  If you expect
to be able to roll back one transaction, then its predecessor, this is
not the behaviour you will get.
\interaction{rollback.twice}
Once you've rolled back one transaction in a repository, you can't
roll back again in that repository until you perform another commit or
pull.
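The following Python sketch models the behaviour described in this section: a repository that remembers only its most recent transaction, where one commit or one whole pull counts as a single transaction.  The class \texttt{ToyRepo} and its methods are hypothetical and deliberately simplified; Mercurial keeps its real rollback information on disk.

```python
# Toy model of a repository that keeps a one-entry transaction log.
# Hypothetical and simplified; Mercurial keeps real rollback data on disk.

class ToyRepo:
    def __init__(self):
        self.history = []      # changesets, oldest first
        self.last_txn = None   # size of the most recent transaction, if any

    def _transaction(self, changesets):
        self.history.extend(changesets)
        self.last_txn = len(changesets)   # overwrites any older entry

    def commit(self, changeset):
        self._transaction([changeset])

    def pull(self, changesets):
        # Everything fetched by one pull is a single transaction.
        self._transaction(changesets)

    def rollback(self):
        if self.last_txn is None:
            raise RuntimeError("nothing to roll back")
        del self.history[len(self.history) - self.last_txn:]
        self.last_txn = None   # the log is empty until the next transaction


repo = ToyRepo()
repo.commit("a")
repo.pull(["b", "c", "d"])
repo.rollback()                # undoes the whole pull at once
print(repo.history)            # prints ['a']
try:
    repo.rollback()            # a second rollback is refused
except RuntimeError as err:
    print(err)                 # prints nothing to roll back
```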

\section{Reverting the mistaken change}

If you make a modification to a file, and decide that you really
didn't want to change the file at all, and you haven't yet committed
your changes, the \hgcmd{revert} command is the one you'll need.  It
looks at the changeset that's the parent of the working directory, and
restores the contents of the file to their state as of that changeset.
(That's a long-winded way of saying that, in the normal case, it
undoes your modifications.)

Let's illustrate how the \hgcmd{revert} command works with yet another
small example.  We'll begin by modifying a file that Mercurial is
already tracking.
\interaction{daily.revert.modify}
If we don't want that change, we can simply \hgcmd{revert} the file.
\interaction{daily.revert.unmodify}
The \hgcmd{revert} command provides us with an extra degree of safety
by saving our modified file with a \filename{.orig} extension.
\interaction{daily.revert.status}

Here is a summary of the cases that the \hgcmd{revert} command can
deal with.  We will describe each of these in more detail in the
section that follows.
\begin{itemize}
\item If you modify a file, it will restore the file to its unmodified
  state.
\item If you \hgcmd{add} a file, it will undo the ``added'' state of
  the file, but leave the file itself untouched.
\item If you delete a file without telling Mercurial, it will restore
  the file to its unmodified contents.
\item If you use the \hgcmd{remove} command to remove a file, it will
  undo the ``removed'' state of the file, and restore the file to its
  unmodified contents.
\end{itemize}
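As a rough mental model, the four cases above can be written as a small Python function that takes a file's state and either restores the file or ``unmarks'' it.  This is a hypothetical sketch, not Mercurial code: \texttt{parent} stands for the file contents as of the working directory's parent, \texttt{workdir} for the files on disk, and the \filename{.orig} backup mirrors the behaviour shown earlier for a modified file.

```python
# Toy model of the four revert cases above (not Mercurial code).
# parent:  file name -> contents as of the working directory's parent
# workdir: file name -> contents currently on disk

def revert(name, state, parent, workdir):
    """Revert one file in `workdir`; return the file's resulting state."""
    if state == "modified":
        workdir[name + ".orig"] = workdir[name]   # keep a backup copy
        workdir[name] = parent[name]
    elif state in ("removed", "missing"):
        workdir[name] = parent[name]              # bring the file back
    elif state == "added":
        pass                                      # only "unmark" the file
    else:
        raise ValueError(f"nothing to revert for state {state!r}")
    return "clean" if name in parent else "untracked"


parent = {"a": "old\n"}
workdir = {"a": "new\n", "b": "draft\n"}
print(revert("a", "modified", parent, workdir))   # prints clean
print(revert("b", "added", parent, workdir))      # prints untracked
print(sorted(workdir))                            # prints ['a', 'a.orig', 'b']
```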

\subsection{File management errors}
\label{sec:undo:mgmt}

The \hgcmd{revert} command is useful for more than just modified
files.  It lets you reverse the results of all of Mercurial's file
management commands---\hgcmd{add}, \hgcmd{remove}, and so on.

If you \hgcmd{add} a file, then decide that in fact you don't want
Mercurial to track it, use \hgcmd{revert} to undo the add.  Don't
worry; Mercurial will not modify the file in any way.  It will just
``unmark'' the file.
\interaction{daily.revert.add}

Similarly, if you ask Mercurial to \hgcmd{remove} a file, you can use
\hgcmd{revert} to restore it to the contents it had as of the parent
of the working directory.
\interaction{daily.revert.remove}
This works just as well for a file that you deleted by hand, without
telling Mercurial (recall that in Mercurial terminology, this kind of
file is called ``missing'').
\interaction{daily.revert.missing}

If you revert a \hgcmd{copy}, the copied-to file remains in your
working directory afterwards, untracked.  Since a copy doesn't affect
the copied-from file in any way, Mercurial doesn't do anything with
the copied-from file.
\interaction{daily.revert.copy}

\subsubsection{A slightly special case: reverting a rename}

If you \hgcmd{rename} a file, there is one small detail that
you should remember.  When you \hgcmd{revert} a rename, it's not
enough to provide the name of the renamed-to file, as you can see
here.
\interaction{daily.revert.rename}
As you can see from the output of \hgcmd{status}, the renamed-to file
is no longer identified as added, but the renamed-\emph{from} file is
still removed!  This is counter-intuitive (at least to me), but it is
easy to deal with.
\interaction{daily.revert.rename-orig}
So remember, to revert a \hgcmd{rename}, you must provide \emph{both}
the source and destination names.  

% TODO: the output doesn't look like it will be removed!

(By the way, if you rename a file, then modify the renamed-to file,
then revert both components of the rename, when Mercurial restores the
file that was removed as part of the rename, it will be unmodified.
If you need the modifications in the renamed-to file to show up in the
renamed-from file, don't forget to copy them over.)

These fiddly aspects of reverting a rename arguably constitute a small
bug in Mercurial.
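One way to see why both names are needed is to think of a rename as two separate records: an ``added'' entry for the new name and a ``removed'' entry for the old one.  Reverting touches only the records for the names you pass.  The Python sketch below models just that bookkeeping; the helpers \texttt{rename} and \texttt{revert} are hypothetical simplifications, not Mercurial's internals.

```python
# Toy model: a rename is recorded as an add of the new name plus a
# remove of the old one. Hypothetical helpers; not Mercurial code.

def rename(status, src, dst):
    status[dst] = "added"
    status[src] = "removed"


def revert(status, *names):
    for name in names:
        if status.get(name) == "added":
            del status[name]          # forget the add (file left untracked)
        elif status.get(name) == "removed":
            status[name] = "clean"    # restore the old file


status = {}
rename(status, "a", "b")
revert(status, "b")               # name only the renamed-to file...
print(status)                     # prints {'a': 'removed'}

status = {}
rename(status, "a", "b")
revert(status, "a", "b")          # ...or name both, as recommended
print(status)                     # prints {'a': 'clean'}
```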

\section{Dealing with committed changes}

Consider a case where you have committed a change $a$, and another
change $b$ on top of it; you then realise that change $a$ was
incorrect.  Mercurial lets you ``back out'' an entire changeset
automatically, and provides building blocks that let you reverse part
of a changeset by hand.

Before you read this section, here's something to keep in mind: the
\hgcmd{backout} command undoes changes by \emph{adding} history, not
by modifying or erasing it.  It's the right tool to use if you're
fixing bugs, but not if you're trying to undo some change that has
catastrophic consequences.  To deal with those, see
section~\ref{sec:undo:aaaiiieee}.

\subsection{Backing out a changeset}

The \hgcmd{backout} command lets you ``undo'' the effects of an entire
changeset in an automated fashion.  Because Mercurial's history is
immutable, this command \emph{does not} get rid of the changeset you
want to undo.  Instead, it creates a new changeset that
\emph{reverses} the effect of the to-be-undone changeset.
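A toy model makes the ``adding history'' point concrete.  In the Python sketch below, which assumes a single file and ignores where in the file each line lives, a changeset is a list of line additions and deletions, and backing one out appends a new changeset containing the inverse edits.  The function names are hypothetical.

```python
# Toy model: a changeset is a list of ("add", line) / ("del", line)
# edits to one file. Backing out appends a NEW changeset whose edits
# invert the target's; nothing already in history is erased.
# Line positions are ignored, which real merges must handle.

def apply_changeset(lines, changeset):
    lines = list(lines)
    for op, line in changeset:
        if op == "add":
            lines.append(line)
        else:
            lines.remove(line)
    return lines


def backout(changeset):
    inverse = {"add": "del", "del": "add"}
    return [(inverse[op], line) for op, line in reversed(changeset)]


history = [
    [("add", "first change")],
    [("add", "second change")],
    [("add", "third change")],
]
history.append(backout(history[1]))   # back out the second change

contents = []
for cs in history:
    contents = apply_changeset(contents, cs)
print(len(history))   # prints 4
print(contents)       # prints ['first change', 'third change']
```

Backing out the second of three changes leaves the first and third in the file while the history grows from three changesets to four.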

The operation of the \hgcmd{backout} command is a little intricate, so
let's illustrate it with some examples.  First, we'll create a
repository with some simple changes.
\interaction{backout.init}

The \hgcmd{backout} command takes a single changeset ID as its
argument; this is the changeset to back out.  Normally,
\hgcmd{backout} will drop you into a text editor to write a commit
message, so you can record why you're backing the change out.  In this
example, we provide a commit message on the command line using the
\hgopt{backout}{-m} option.

\subsection{Backing out the tip changeset}

We're going to start by backing out the last changeset we committed.
\interaction{backout.simple}
You can see that the second line from \filename{myfile} is no longer
present.  Taking a look at the output of \hgcmd{log} gives us an idea
of what the \hgcmd{backout} command has done.
\interaction{backout.simple.log}
Notice that the new changeset that \hgcmd{backout} has created is a
child of the changeset we backed out.  It's easier to see this in
figure~\ref{fig:undo:backout}, which presents a graphical view of the
change history.  As you can see, the history is nice and linear.

\begin{figure}[htb]
  \centering
  \grafix{undo-simple}
  \caption{Backing out a change using the \hgcmd{backout} command}
  \label{fig:undo:backout}
\end{figure}

\subsection{Backing out a non-tip change}

If you want to back out a change other than the last one you
committed, pass the \hgopt{backout}{--merge} option to the
\hgcmd{backout} command.
\interaction{backout.non-tip.clone}
This makes backing out any changeset a ``one-shot'' operation that's
usually simple and fast.
\interaction{backout.non-tip.backout}

If you take a look at the contents of \filename{myfile} after the
backout finishes, you'll see that the first and third changes are
present, but not the second.
\interaction{backout.non-tip.cat}

As the graphical history in figure~\ref{fig:undo:backout-non-tip}
illustrates, Mercurial actually commits \emph{two} changes in this
kind of situation (the box-shaped nodes are the ones that Mercurial
commits automatically).  Before Mercurial begins the backout process,
it first remembers what the current parent of the working directory
is.  It then backs out the target changeset, and commits that as a
changeset.  Finally, it merges back to the previous parent of the
working directory, and commits the result of the merge.

% TODO: to me it looks like mercurial doesn't commit the second merge automatically!

\begin{figure}[htb]
  \centering
  \grafix{undo-non-tip}
  \caption{Automated backout of a non-tip change using the \hgcmd{backout} command}
  \label{fig:undo:backout-non-tip}
\end{figure}

The result is that you end up ``back where you were'', only with some
extra history that undoes the effect of the changeset you wanted to
back out.

\subsubsection{Always use the \hgopt{backout}{--merge} option}

In fact, since the \hgopt{backout}{--merge} option will do the ``right
thing'' whether or not the changeset you're backing out is the tip
(i.e.~it won't try to merge if it's backing out the tip, since there's
no need), you should \emph{always} use this option when you run the
\hgcmd{backout} command.

\subsection{Gaining more control of the backout process}

While I've recommended that you always use the
\hgopt{backout}{--merge} option when backing out a change, the
\hgcmd{backout} command lets you decide how to merge a backout
changeset.  Taking control of the backout process by hand is something
you will rarely need to do, but it can be useful to understand what
the \hgcmd{backout} command is doing for you automatically.  To
illustrate this, let's clone our first repository, but omit the
backout change that it contains.

\interaction{backout.manual.clone}
As with our earlier example, we'll commit a third changeset, then back
out its parent, and see what happens.
\interaction{backout.manual.backout} 
Our new changeset is again a descendant of the changeset we backed
out; it's thus a new head, \emph{not} a descendant of the changeset
that was the tip.  The \hgcmd{backout} command was quite explicit in
telling us this.
\interaction{backout.manual.log}

Again, it's easier to see what has happened by looking at a graph of
the revision history, in figure~\ref{fig:undo:backout-manual}.  This
makes it clear that when we use \hgcmd{backout} to back out a change
other than the tip, Mercurial adds a new head to the repository (the
change it committed is box-shaped).

\begin{figure}[htb]
  \centering
  \grafix{undo-manual}
  \caption{Backing out a change using the \hgcmd{backout} command}
  \label{fig:undo:backout-manual}
\end{figure}

After the \hgcmd{backout} command has completed, it leaves the new
``backout'' changeset as the parent of the working directory.
\interaction{backout.manual.parents}
Now we have two isolated sets of changes.
\interaction{backout.manual.heads}

Let's think about what we expect to see as the contents of
\filename{myfile} now.  The first change should be present, because
we've never backed it out.  The second change should be missing, as
that's the change we backed out.  Since the history graph shows the
third change as a separate head, we \emph{don't} expect to see the
third change present in \filename{myfile}.
\interaction{backout.manual.cat}
To get the third change back into the file, we just do a normal merge
of our two heads.
\interaction{backout.manual.merge}
Afterwards, the graphical history of our repository looks like
figure~\ref{fig:undo:backout-manual-merge}.

\begin{figure}[htb]
  \centering
  \grafix{undo-manual-merge}
  \caption{Manually merging a backout change}
  \label{fig:undo:backout-manual-merge}
\end{figure}

\subsection{Why \hgcmd{backout} works as it does}

Here's a brief description of how the \hgcmd{backout} command works.
\begin{enumerate}
\item It ensures that the working directory is ``clean'', i.e.~that
  the output of \hgcmd{status} would be empty.
\item It remembers the current parent of the working directory.  Let's
  call this changeset \texttt{orig}.
\item It does the equivalent of an \hgcmd{update} to sync the working
  directory to the changeset you want to back out.  Let's call this
  changeset \texttt{backout}.
\item It finds the parent of that changeset.  Let's call that
  changeset \texttt{parent}.
\item For each file that the \texttt{backout} changeset affected, it
  does the equivalent of a \hgcmdargs{revert}{-r parent} on that file,
  to restore it to the contents it had before that changeset was
  committed.
\item It commits the result as a new changeset.  This changeset has
  \texttt{backout} as its parent.
\item If you specify \hgopt{backout}{--merge} on the command line, it
  merges with \texttt{orig}, and commits the result of the merge.
\end{enumerate}
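The steps above can be sketched in Python if we track only the shape of the history graph and ignore file contents.  In this hypothetical model, \texttt{dag} maps each changeset to its list of parents; the cleanliness check in step~1 and the file reverts of steps~3 to~5 are left out.

```python
# Toy sketch of the backout steps, tracking only the shape of history.
# dag maps each changeset to its parents; node names are hypothetical.

def backout(dag, wdir_parent, target, merge=True):
    """Back out `target`; return the new parent of the working directory."""
    orig = wdir_parent                    # step 2: remember orig
    node = f"backout-{target}"            # steps 3-6: new child of target
    dag[node] = [target]
    if merge and orig != target:          # step 7: merge with orig
        merged = f"merge-{node}"
        dag[merged] = [node, orig]
        return merged
    return node                           # tip backout: nothing to merge


def heads(dag):
    """Changesets that no other changeset names as a parent."""
    parents = {p for ps in dag.values() for p in ps}
    return sorted(n for n in dag if n not in parents)


manual = {"c0": [], "c1": ["c0"], "c2": ["c1"], "c3": ["c2"]}
print(backout(manual, "c3", "c2", merge=False), heads(manual))
# prints backout-c2 ['backout-c2', 'c3']

auto = {"c0": [], "c1": ["c0"], "c2": ["c1"], "c3": ["c2"]}
print(backout(auto, "c3", "c2"), heads(auto))
# prints merge-backout-c2 ['merge-backout-c2']
```

Without the merge, the backout leaves two heads, as in figure~\ref{fig:undo:backout-manual}; with it, a single head remains, as in figure~\ref{fig:undo:backout-non-tip}.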

An alternative way to implement the \hgcmd{backout} command would be
to \hgcmd{export} the to-be-backed-out changeset as a diff, then use
the \cmdopt{patch}{--reverse} option to the \command{patch} command to
reverse the effect of the change without fiddling with the working
directory.  This sounds much simpler, but it would not work nearly as
well.

The reason that \hgcmd{backout} does an update, a commit, a merge, and
another commit is to give the merge machinery the best chance to do a
good job when dealing with all the changes \emph{between} the change
you're backing out and the current tip.  

If you're backing out a changeset that's~100 revisions back in your
project's history, the chances that the \command{patch} command will
be able to apply a reverse diff cleanly are not good, because
intervening changes are likely to have ``broken the context'' that
\command{patch} uses to determine whether it can apply a patch (if
this sounds like gibberish, see section~\ref{sec:mq:patch} for a
discussion of the \command{patch} command).  Also, Mercurial's merge
machinery will handle files and directories being renamed, permission
changes, and modifications to binary files, none of which
\command{patch} can deal with.
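The Python sketch below illustrates the ``broken context'' problem with a deliberately strict patch applier: a hunk replaces one line and carries one line of context on each side, and it applies only where all three lines match exactly.  The real \command{patch} command is more forgiving (it can search nearby lines and apply hunks with ``fuzz''), so treat this as an illustration rather than a model of its algorithm; the function names are hypothetical.

```python
# Deliberately strict toy patch applier (not patch(1)'s algorithm).
# A hunk (before, old, new, after) says: line `old`, found between the
# context lines `before` and `after`, becomes `new`.

def apply_hunk(lines, before, old, new, after):
    for i in range(1, len(lines) - 1):
        if lines[i - 1:i + 2] == [before, old, after]:
            return lines[:i] + [new] + lines[i + 1:]
    raise ValueError("hunk failed: context does not match")


def reverse(before, old, new, after):
    """The reverse hunk turns `new` back into `old`."""
    return before, new, old, after


hunk = ("int x;", "f(1);", "f(2);", "return;")   # the change to back out

right_after = ["int x;", "f(2);", "return;"]
print(apply_hunk(right_after, *reverse(*hunk)))
# prints ['int x;', 'f(1);', 'return;']

at_tip = ["int x;", "f(2);", "return 0;"]        # a later change edited the context
try:
    apply_hunk(at_tip, *reverse(*hunk))
except ValueError as err:
    print(err)                                   # prints hunk failed: context does not match
```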

\section{Changes that should never have been}
\label{sec:undo:aaaiiieee}

Most of the time, the \hgcmd{backout} command is exactly what you need
if you want to undo the effects of a change.  It leaves a permanent
record of exactly what you did, both when committing the original
changeset and when you cleaned up after it.

On rare occasions, though, you may find that you've committed a change
that really should not be present in the repository at all.  For
example, it would be very unusual, and generally considered a mistake,
to commit a software project's object files as well as its source
files.  Object files have almost no intrinsic value, and they're
\emph{big}, so they increase the size of the repository and the amount
of time it takes to clone or pull changes.

Before I discuss the options that you have if you commit a ``brown
paper bag'' change (the kind that's so bad that you want to pull a
brown paper bag over your head), let me first discuss some approaches
that probably won't work.

Since Mercurial treats history as cumulative---every change builds
on top of all changes that preceded it---you generally can't just make
disastrous changes disappear.  The one exception is when you've just
committed a change, and it hasn't been pushed or pulled into another
repository.  That's when you can safely use the \hgcmd{rollback}
command, as I detailed in section~\ref{sec:undo:rollback}.

After you've pushed a bad change to another repository, you
\emph{could} still use \hgcmd{rollback} to make your local copy of the
change disappear, but it won't have the consequences you want.  The
change will still be present in the remote repository, so it will
reappear in your local repository the next time you pull.

If a situation like this arises, and you know which repositories your
bad change has propagated into, you can \emph{try} to get rid of the
change from \emph{every} one of those repositories.  This is, of
course, not a satisfactory solution: if you miss even a single
repository while you're expunging, the change is still ``in the
wild'', and could propagate further.

If you've committed one or more changes \emph{after} the change that
you'd like to see disappear, your options are further reduced.
Mercurial doesn't provide a way to ``punch a hole'' in history,
removing one changeset while leaving the changesets after it intact.

% XXX This needs filling out.  The \texttt{hg-replay} script in the
% \texttt{examples} directory works, but doesn't handle merge
% changesets.  Kind of an important omission.

\subsection{Protect yourself from ``escaped'' changes}

If you've committed some changes to your local repository and they've
been pushed or pulled somewhere else, this isn't necessarily a
disaster.  You can protect yourself ahead of time against some classes
of bad changeset.  This is particularly easy if your team usually
pulls changes from a central repository.

By configuring some hooks on that repository to validate incoming
changesets (see chapter~\ref{chap:hook}), you can automatically
prevent some kinds of bad changeset from being pushed to the central
repository at all.  With such a configuration in place, some kinds of
bad changeset will naturally tend to ``die out'' because they can't
propagate into the central repository.  Better yet, this happens
without any need for explicit intervention.

For instance, an incoming change hook that verifies that a changeset
will actually compile can prevent people from inadvertently ``breaking
the build''.
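As an example of such a check, here is a sketch of an in-process Python hook that refuses incoming changesets which add or modify object files, the kind of ``brown paper bag'' change described earlier.  It assumes Mercurial's Python hook conventions (the hook receives \texttt{ui}, \texttt{repo}, and \texttt{hooktype} as keyword arguments, a \texttt{pretxnchangegroup} hook also receives the first incoming \texttt{node}, and a true return value rejects the transaction) and a Python~3 based Mercurial; the file name \filename{checks.py} and the function name are hypothetical.

```python
# checks.py (hypothetical): an in-process pretxnchangegroup hook that
# rejects incoming changesets which add or modify compiled object files.
# Assumes a Python 3 based Mercurial, where file names are bytes.
import os

BAD_SUFFIXES = (".o", ".obj")


def reject_object_files(ui, repo, hooktype, node=None, **kwargs):
    """Return True (reject) if any incoming changeset adds an object file.

    For pretxnchangegroup, `node` is the first changeset in the incoming
    group; every revision from there to the end of the repo is new.
    """
    for rev in range(repo[node].rev(), len(repo)):
        ctx = repo[rev]
        for path in ctx.files():           # files touched by this changeset
            name = os.fsdecode(path)
            # `path in ctx` is False for a deleted file, so removing an
            # object file that slipped in earlier is still allowed.
            if name.endswith(BAD_SUFFIXES) and path in ctx:
                ui.warn(b"rejecting changeset %d: object file %s\n"
                        % (rev, os.fsencode(name)))
                return True                # a true value fails the hook
    return False
```

You might enable it in the central repository's \filename{.hg/hgrc} with a line such as \texttt{pretxnchangegroup.objfiles = python:/path/to/checks.py:reject\_object\_files} in the \texttt{[hooks]} section, where the path is a placeholder.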

\section{Finding the source of a bug}
\label{sec:undo:bisect}

While it's all very well to be able to back out a changeset that
introduced a bug, this requires that you know which changeset to back
out.  Mercurial provides an invaluable command, called
\hgcmd{bisect}, that helps you to automate this process and accomplish
it very efficiently.

The idea behind the \hgcmd{bisect} command is that a changeset has
introduced some change of behaviour that you can identify with a
simple binary test.  You don't know which piece of code introduced the
change, but you know how to test for the presence of the bug.  The
\hgcmd{bisect} command uses your test to direct its search for the
changeset that introduced the code that caused the bug.

Here are a few scenarios to help you understand how you might apply
this command.
\begin{itemize}
\item The most recent version of your software has a bug that you
  remember wasn't present a few weeks ago, but you don't know when it
  was introduced.  Here, your binary test checks for the presence of
  that bug.
\item You fixed a bug in a rush, and now it's time to close the entry
  in your team's bug database.  The bug database requires a changeset
  ID when you close an entry, but you don't remember which changeset
  you fixed the bug in.  Once again, your binary test checks for the
  presence of the bug.
\item Your software works correctly, but runs~15\% slower than the
  last time you measured it.  You want to know which changeset
  introduced the performance regression.  In this case, your binary
  test measures the performance of your software, to see whether it's
  ``fast'' or ``slow''.
\item The sizes of the components of your project that you ship
  exploded recently, and you suspect that something changed in the way
  you build your project.
\end{itemize}

From these examples, it should be clear that the \hgcmd{bisect}
command is not useful only for finding the sources of bugs.  You can
use it to find any ``emergent property'' of a repository (anything
that you can't find from a simple text search of the files in the
tree) for which you can write a binary test.

We'll introduce a little bit of terminology here, just to make it
clear which parts of the search process are your responsibility, and
which are Mercurial's.  A \emph{test} is something that \emph{you} run
when \hgcmd{bisect} chooses a changeset.  A \emph{probe} is what
\hgcmd{bisect} runs to tell whether a revision is good.  Finally,
we'll use the word ``bisect'', as both a noun and a verb, to stand in
for the phrase ``search using the \hgcmd{bisect} command''.

One simple way to automate the searching process would be simply to
probe every changeset.  However, this scales poorly.  If it took ten
minutes to test a single changeset, and you had 10,000 changesets in
your repository, the exhaustive approach would take on average~35
\emph{days} to find the changeset that introduced a bug.  Even if you
knew that the bug was introduced by one of the last 500 changesets,
and limited your search to those, you'd still be looking at over 40
hours to find the changeset that introduced your bug.

What the \hgcmd{bisect} command does is use its knowledge of the
``shape'' of your project's revision history to perform a search in
time proportional to the \emph{logarithm} of the number of changesets
to check (the kind of search it performs is called a dichotomic
search).  With this approach, searching through 10,000 changesets will
take less than three hours, even at ten minutes per test (the search
will require about 14 tests).  Limit your search to the last hundred
changesets, and it will take only about an hour (roughly seven tests).
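The arithmetic behind these figures is easy to check.  The Python sketch below assumes ten minutes per test, that an exhaustive scan tests about half of the candidates on average, and that bisection needs about $\lceil \log_2 n \rceil$ tests for $n$ changesets (the worst case for a linear history).

```python
import math

MINUTES_PER_TEST = 10


def exhaustive_hours(changesets):
    # On average, testing every changeset in turn finds the bad one halfway.
    return changesets / 2 * MINUTES_PER_TEST / 60


def bisect_tests(changesets):
    # Each test halves the candidates: about ceil(log2(n)) tests in all.
    return math.ceil(math.log2(changesets))


for n in (10_000, 500, 100):
    tests = bisect_tests(n)
    print(f"{n:>6} changesets: exhaustive ~{exhaustive_hours(n):.0f} h, "
          f"bisect {tests} tests ~{tests * MINUTES_PER_TEST} min")
```

For 10,000 changesets this reports about 833 hours (roughly 35 days) for the exhaustive scan against 14 tests, or 140 minutes, for bisection; for 100 changesets, 7 tests take 70 minutes.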

The \hgcmd{bisect} command is aware of the ``branchy'' nature of a
Mercurial project's revision history, so it has no problems dealing
with branches, merges, or multiple heads in a repository.  It can
prune entire branches of history with a single probe, which is how it
operates so efficiently.

\subsection{Using the \hgcmd{bisect} command}

Here's an example of \hgcmd{bisect} in action.

\begin{note}
  In versions 0.9.5 and earlier of Mercurial, \hgcmd{bisect} was not a
  core command: it was distributed with Mercurial as an extension.
  This section describes the built-in command, not the old extension.
\end{note}

Now let's create a repository, so that we can try out the
\hgcmd{bisect} command in isolation.
\interaction{bisect.init}
We'll simulate a project that has a bug in it in a simple-minded way:
create trivial changes in a loop, and nominate one specific change
that will have the ``bug''.  This loop creates 35 changesets, each
adding a single file to the repository.  We'll represent our ``bug''
with a file that contains the text ``i have a gub''.
\interaction{bisect.commits}

The next thing that we'd like to do is figure out how to use the
\hgcmd{bisect} command.  We can use Mercurial's normal built-in help
mechanism for this.
\interaction{bisect.help}

The \hgcmd{bisect} command works in steps.  Each step proceeds as follows.
\begin{enumerate}
\item You run your binary test.
  \begin{itemize}
  \item If the test succeeded, you tell \hgcmd{bisect} by running the
    \hgcmdargs{bisect}{--good} command.
  \item If it failed, run the \hgcmdargs{bisect}{--bad} command.
  \end{itemize}
\item The command uses your information to decide which changeset to
  test next.
\item It updates the working directory to that changeset, and the
  process begins again.
\end{enumerate}
The process ends when \hgcmd{bisect} identifies a unique changeset
that marks the point where your test transitioned from ``succeeding''
to ``failing''.
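For a linear history, this loop is an ordinary binary search.  The Python sketch below shows it with a stand-in test function \texttt{is\_bad}; the function names and the example data are hypothetical, and the real \hgcmd{bisect} command also copes with branches and merges, which this sketch ignores.

```python
# Toy bisection over a linear history of numbered revisions.
# `is_bad(rev)` stands in for your binary test.

def bisect(good, bad, is_bad):
    """good < bad; good passes the test and bad fails it.
    Returns (first bad revision, number of tests run)."""
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2      # the changeset to check out next
        tests += 1
        if is_bad(mid):
            bad = mid                # bug is at mid or earlier
        else:
            good = mid               # bug was introduced after mid
    return bad, tests


# Revisions 0-34, revision 10 known good, tip bad, and a hypothetical
# bug introduced at revision 22.
first_bad, tests = bisect(10, 34, lambda rev: rev >= 22)
print(first_bad, tests)  # prints 22 5
```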

To start the search, we must run the \hgcmdargs{bisect}{--reset} command.
\interaction{bisect.search.init}

In our case, the binary test we use is simple: we check to see if any
file in the repository contains the string ``i have a gub''.  If it
does, this changeset contains the change that ``caused the bug''.  By
convention, a changeset that has the property we're searching for is
``bad'', while one that doesn't is ``good''.

Most of the time, the revision to which the working directory is
synced (usually the tip) already exhibits the problem introduced by
the buggy change, so we'll mark it as ``bad''.
\interaction{bisect.search.bad-init}

Our next task is to nominate a changeset that we know \emph{doesn't}
have the bug; the \hgcmd{bisect} command will ``bracket'' its search
between the first pair of good and bad changesets.  In our case, we
know that revision~10 didn't have the bug.  (I'll have more words
about choosing the first ``good'' changeset later.)
\interaction{bisect.search.good-init}

Notice that this command printed some output.
\begin{itemize}
\item It told us how many changesets it must consider before it can
  identify the one that introduced the bug, and how many tests that
  will require.
\item It updated the working directory to the next changeset to test,
  and told us which changeset it's testing.
\end{itemize}

We now run our test in the working directory.  We use the
\command{grep} command to see if our ``bad'' file is present in the
working directory.  If it is, this revision is bad; if not, this
revision is good.
\interaction{bisect.search.step1}

This test looks like a perfect candidate for automation, so let's turn
it into a shell function.
\interaction{bisect.search.mytest}
We can now run an entire test step with a single command,
\texttt{mytest}.
\interaction{bisect.search.step2}
A few more invocations of our canned test step command, and we're
done.
\interaction{bisect.search.rest}

Even though we had~40 changesets to search through, the \hgcmd{bisect}
command let us find the changeset that introduced our ``bug'' with
only five tests.  Because the number of tests that the \hgcmd{bisect}
command performs grows logarithmically with the number of changesets to
search, the advantage that it has over the ``brute force'' search
approach increases with every changeset you add.
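
A rough worked estimate makes the logarithmic claim concrete.  Assume a linear history with $n$ candidate changesets, any one of which could be the first bad one.  Each test discards roughly half of the remaining candidates, so in the worst case a bisection needs
\[
  T(n) = \lceil \log_2 n \rceil
\]
tests, whereas checking the candidates one at a time may need up to $n-1$.  For the 40 changesets above, $2^5 = 32 < 40 \le 64 = 2^6$, so $T(40) = 6$, consistent with the five tests this search took.  Growing the search to 1000 changesets raises $T$ only to 10 (since $2^{10} = 1024$), and a million changesets need only 20.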

\subsection{Cleaning up after your search}

When you're finished using the \hgcmd{bisect} command in a
repository, you can use the \hgcmdargs{bisect}{--reset} command to drop
the information it was using to drive your search.  That information
doesn't use much space, so it doesn't matter if you forget to run this
command.  However, \hgcmd{bisect} won't let you start a new search in
that repository until you do a \hgcmdargs{bisect}{--reset}.
\interaction{bisect.search.reset}

\section{Tips for finding bugs effectively}

\subsection{Give consistent input}

The \hgcmd{bisect} command requires that you correctly report the
result of every test you perform.  If you tell it that a test failed
when it really succeeded, it \emph{might} be able to detect the
inconsistency; if it does, it will tell you that a particular
changeset is both good and bad.  However, it can't do this reliably:
a wrong report is about as likely to go unnoticed and lead it to name
the wrong changeset as the source of the bug.

\subsection{Automate as much as possible}

When I started using the \hgcmd{bisect} command, I tried a few times
to run my tests by hand, on the command line.  This is an approach
that I, at least, am not suited to.  After a few tries, I found that I
was making enough mistakes that I was having to restart my searches
several times before finally getting correct results.

My initial problems with driving the \hgcmd{bisect} command by hand
occurred even with simple searches on small repositories; if the
problem you're looking for is more subtle, or the number of tests that
\hgcmd{bisect} must perform increases, the likelihood of operator
error ruining the search is much higher.  Once I started automating my
tests, I had much better results.

The key to automated testing is twofold:
\begin{itemize}
\item always test for the same symptom, and
\item always feed consistent input to the \hgcmd{bisect} command.
\end{itemize}
In my tutorial example above, the \command{grep} command tests for the
symptom, and the \texttt{if} statement takes the result of this check
and ensures that we always feed the same input to the \hgcmd{bisect}
command.  The \texttt{mytest} function marries these together in a
reproducible way, so that every test is uniform and consistent.
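
A function along these lines captures both halves of that recipe.  It is a sketch, not necessarily the exact function used in the transcripts above.  The \texttt{HG} variable is a hypothetical hook added so that the function can be dry-run with \texttt{HG=echo}, and the \texttt{--exclude-dir} option assumes GNU or BSD \command{grep}.

```bash
#!/usr/bin/env bash
# mytest: run the binary test in the working directory, then report the
# result to "hg bisect" in exactly one, consistent way.
# HG is a hypothetical override for dry runs (e.g. HG=echo); defaults to hg.
mytest() {
  local result
  # The symptom: some file outside .hg contains the marker string.
  if grep -rqs --exclude-dir=.hg 'i have a gub' .; then
    result=bad
  else
    result=good
  fi
  echo "this revision is $result"
  "${HG:-hg}" bisect --"$result"
}
```

With \texttt{HG=echo}, the last line merely prints \texttt{bisect --good} or \texttt{bisect --bad}, which is a cheap way to check the function before letting it drive a real search.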

\subsection{Check your results}

Because the output of a \hgcmd{bisect} search is only as good as the
input you give it, don't take the changeset it reports as the
absolute truth.  A simple way to cross-check its report is to manually
run your test at each of the following changesets:
\begin{itemize}
\item The changeset that it reports as the first bad revision.  Your
  test should still report this as bad.
\item The parent of that changeset (either parent, if it's a merge).
  Your test should report this changeset as good.
\item A child of that changeset.  Your test should report this
  changeset as bad.
\end{itemize}
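
The cross-check can itself be scripted.  The sketch below assumes a Mercurial release with revset support (for \texttt{parents()} and \texttt{children()}), and that the hypothetical \texttt{TESTFN} argument names a shell function that prints \texttt{good} or \texttt{bad} for the working directory without calling \hgcmd{bisect}.  It updates the working directory, so run it only when you have no uncommitted changes.

```bash
#!/usr/bin/env bash
# crosscheck REV TESTFN: re-run a test at REV, at its parent(s) and at its
# child(ren), printing "revision: verdict" for each one.
# If the search was sound: REV is bad, parents are good, children are bad.
crosscheck() {
  local rev=$1 testfn=$2 r
  for r in "$rev" \
           $(hg log -r "parents($rev)" --template '{rev}\n') \
           $(hg log -r "children($rev)" --template '{rev}\n'); do
    hg update -q -r "$r" || return 1   # check out the revision to test
    printf '%s: %s\n' "$r" "$("$testfn")"
  done
}
```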

\subsection{Beware interference between bugs}

It's possible that your search for one bug could be disrupted by the
presence of another.  For example, let's say your software crashes at
revision 100, and worked correctly at revision 50.  Unknown to you,
someone else introduced a different crashing bug at revision 60, and
fixed it at revision 80.  This could distort your results in one of
several ways.

It is possible that this other bug completely ``masks'' yours, which
is to say that it occurs before your bug has a chance to manifest
itself.  If you can't avoid that other bug (for example, it prevents
your project from building), and so can't tell whether your bug is
present in a particular changeset, the \hgcmd{bisect} command cannot
help you directly.  Instead, you can mark a changeset as untested by
running \hgcmdargs{bisect}{--skip}.

A different problem could arise if your test for a bug's presence is
not specific enough.  If you check for ``my program crashes'', then
both your crashing bug and an unrelated crashing bug that masks it
will look like the same thing, and mislead \hgcmd{bisect}.
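
One way to make a test more specific is to look for the particular symptom, such as a distinctive error message, instead of treating any failure as your bug.  In this sketch, \texttt{has\_symptom} is a hypothetical helper, and the pattern and command you pass it are up to you.

```bash
#!/usr/bin/env bash
# has_symptom PATTERN CMD [ARGS...]: run CMD and succeed only if its
# combined stdout/stderr contains PATTERN as a literal substring.
# CMD's own exit status is ignored on purpose: "it failed" is not
# the same as "it failed because of my bug".
has_symptom() {
  local pattern=$1 out
  shift
  out=$("$@" 2>&1) || true
  case $out in
    *"$pattern"*) return 0 ;;   # the specific symptom appeared
    *) return 1 ;;              # absent, or some unrelated failure
  esac
}
```

A revision where the program fails without showing this symptom tells you nothing about your bug either way, so it is a better candidate for \hgcmdargs{bisect}{--skip} than for \hgcmdargs{bisect}{--bad}.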

Another situation in which \hgcmdargs{bisect}{--skip} is useful is
when you can't test a revision because your project was in a broken,
and hence untestable, state at that revision, perhaps because someone
checked in a change that prevented the project from building.
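
Newer Mercurial releases can also drive the whole search with \hgcmdargs{bisect}{--command}, which runs a command at each step and reads its exit status: 0 means good, 125 means skip, 127 (command not found) aborts the search, and any other non-zero status means bad.  The sketch below maps a build step and a test step onto that convention.  \texttt{classify} is a hypothetical helper, and the commands you pass it are placeholders for your project's own.

```bash
#!/usr/bin/env bash
# classify BUILD TEST: run two shell command strings and turn the outcome
# into the exit-status convention of "hg bisect --command":
#   0 = good, 125 = skip (untestable revision), 1 = bad.
classify() {
  local build_cmd=$1 test_cmd=$2
  if ! sh -c "$build_cmd" >/dev/null 2>&1; then
    return 125   # could not build this revision: ask bisect to skip it
  fi
  if sh -c "$test_cmd" >/dev/null 2>&1; then
    return 0     # test passed: good
  fi
  return 1       # test failed: bad
}
```

A script that defines this function and ends with a call such as \texttt{classify make ./run-tests} (both commands hypothetical) could then be passed to \hgcmdargs{bisect}{--command}; since a script's exit status is that of its last command, the search would advance without further input from you.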

\subsection{Bracket your search lazily}

Choosing the first ``good'' and ``bad'' changesets that will mark the
end points of your search is often easy, but it bears a little
discussion nevertheless.  From the perspective of \hgcmd{bisect}, the
``newest'' changeset is conventionally ``bad'', and the older
changeset is ``good''.

If you're having trouble remembering when a suitable ``good'' change
was, so that you can tell \hgcmd{bisect}, you could do worse than
testing changesets at random.  Just remember to eliminate contenders
that can't possibly exhibit the bug (perhaps because the feature with
the bug isn't present yet) and those where another problem masks the
bug (as I discussed above).

Even if you end up ``early'' by thousands of changesets or months of
history, you will only add a handful of tests to the total number that
\hgcmd{bisect} must perform, thanks to its logarithmic behaviour.
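
A quick way to see how little lazy bracketing costs is to count the halvings.  The helper below is a hypothetical illustration that assumes a linear history, where singling out one changeset among $n$ candidates takes at most $\lceil \log_2 n \rceil$ tests.

```bash
#!/usr/bin/env bash
# tests_needed N: smallest k with 2^k >= N, i.e. the worst-case number
# of bisection tests needed to single out one of N candidate changesets
# on a linear history.
tests_needed() {
  local n=$1 k=0 span=1
  while [ "$span" -lt "$n" ]; do
    span=$((span * 2))   # each test can at best halve the candidates
    k=$((k + 1))
  done
  echo "$k"
}
```

For example, \texttt{tests\_needed 1000} prints 10 and \texttt{tests\_needed 100000} prints 17, so starting a search a hundred times further back in history costs only seven extra tests.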

%%% Local Variables: 
%%% mode: latex
%%% TeX-master: "00book"
%%% End: