\chapter{Managing change with Mercurial Queues}
\label{chap:mq}

\section{The patch management problem}
\label{sec:mq:patch-mgmt}

Here is a common scenario: you need to install a software package from
source, but you find a bug that you must fix in the source before you
can start using the package.  You make your changes, forget about the
package for a while, and a few months later you need to upgrade to a
newer version of the package.  If the newer version of the package
still has the bug, you must extract your fix from the older source
tree and apply it against the newer version.  This is a tedious task,
and it's easy to make mistakes.

This is a simple case of the ``patch management'' problem.  You have
an ``upstream'' source tree that you can't change; you need to make
some local changes on top of the upstream tree; and you'd like to be
able to keep those changes separate, so that you can apply them to
newer versions of the upstream source.

The patch management problem arises in many situations.  Probably the
most visible is that a user of an open source software project will
contribute a bug fix or new feature to the project's maintainers in the
form of a patch.

Distributors of operating systems that include open source software
often need to make changes to the packages they distribute so that
they will build properly in their environments.

When you have few changes to maintain, it is easy to manage a single
patch using the standard \texttt{diff} and \texttt{patch} programs
(see section~\ref{sec:mq:patch} for a discussion of these tools).
Once the number of changes grows, it starts to make sense to maintain
patches as discrete ``chunks of work,'' so that for example a single
patch will contain only one bug fix (the patch might modify several
files, but it's doing ``only one thing''), and you may have a number
of such patches for different bugs you need fixed and local changes
you require.  In this situation, if you submit a bug fix patch to the
upstream maintainers of a package and they include your fix in a
subsequent release, you can simply drop that single patch when you're
updating to the newer release.

Maintaining a single patch against an upstream tree is a little
tedious and error-prone, but not difficult.  However, the complexity
of the problem grows rapidly as the number of patches you have to
maintain increases.  With more than a tiny number of patches in hand,
understanding which ones you have applied and maintaining them moves
from messy to overwhelming.

Fortunately, Mercurial includes a powerful extension, Mercurial Queues
(or simply ``MQ''), that massively simplifies the patch management
problem.

\section{The prehistory of Mercurial Queues}
\label{sec:mq:history}

During the late 1990s, several Linux kernel developers started to
maintain ``patch series'' that modified the behaviour of the Linux
kernel.  Some of these series were focused on stability, some on
feature coverage, and others were more speculative.

The sizes of these patch series grew rapidly.  In 2002, Andrew Morton
published some shell scripts he had been using to automate the task of
managing his patch queues.  Andrew was successfully using these
scripts to manage hundreds (sometimes thousands) of patches on top of
the Linux kernel.

\subsection{A patchwork quilt}
\label{sec:mq:quilt}


In early 2003, Andreas Gruenbacher and Martin Quinson borrowed the
approach of Andrew's scripts and published a tool called ``patchwork
quilt''~\cite{web:quilt}, or simply ``quilt''
(see~\cite{gruenbacher:2005} for a paper describing it).  Because
quilt substantially automated patch management, it rapidly gained a
large following among open source software developers.

Quilt manages a \emph{stack of patches} on top of a directory tree.
To begin, you tell quilt to manage a directory tree; it stores away
the names and contents of all files in the tree.  To fix a bug, you
create a new patch (using a single command), edit the files you need
to fix, then ``refresh'' the patch.  

The refresh step causes quilt to scan the directory tree; it updates
the patch with all of the changes you have made.  You can create
another patch on top of the first, which will track the changes
required to modify the tree from ``tree with one patch applied'' to
``tree with two patches applied''.

You can \emph{change} which patches are applied to the tree.  If you
``pop'' a patch, the changes made by that patch will vanish from the
directory tree.  Quilt remembers which patches you have popped,
though, so you can ``push'' a popped patch again, and the directory
tree will be restored to contain the modifications in the patch.  Most
importantly, you can run the ``refresh'' command at any time, and the
topmost applied patch will be updated.  This means that you can, at
any time, change both which patches are applied and what
modifications those patches make.

Quilt knows nothing about revision control tools, so it works equally
well on top of an unpacked tarball or a Subversion repository.

\subsection{From patchwork quilt to Mercurial Queues}
\label{sec:mq:quilt-mq}

In mid-2005, Chris Mason took the features of quilt and wrote an
extension that he called Mercurial Queues, which added quilt-like
behaviour to Mercurial.

The key difference between quilt and MQ is that quilt knows nothing
about revision control systems, while MQ is \emph{integrated} into
Mercurial.  Each patch that you push is represented as a Mercurial
changeset.  Pop a patch, and the changeset goes away.

This integration makes understanding patches and debugging their
effects \emph{enormously} easier.  Since every applied patch has an
associated changeset, you can use \hgcmdargs{log}{\emph{filename}} to
see which changesets and patches affected a file.  You can use the
\hgext{bisect} extension to binary-search through all changesets and
applied patches to see where a bug got introduced or fixed.  You can
use the \hgcmd{annotate} command to see which changeset or patch
modified a particular line of a source file.  And so on.

Because quilt does not care about revision control tools, it is still
a tremendously useful piece of software to know about for situations
where you cannot use Mercurial and MQ.

\section{Getting started with Mercurial Queues}
\label{sec:mq:start}

Because MQ is implemented as an extension, you must explicitly enable
it before you can use it.  (You don't need to download anything; MQ ships
with the standard Mercurial distribution.)  To enable MQ, edit your
\tildefile{.hgrc} file, and add the lines in figure~\ref{ex:mq:config}.

\begin{figure}[ht]
  \begin{codesample4}
    [extensions]
    hgext.mq =
  \end{codesample4}
  \caption{Contents to add to \tildefile{.hgrc} to enable the MQ extension}
  \label{ex:mq:config}
\end{figure}

Once the extension is enabled, it will make a number of new commands
available.  To verify that the extension is working, you can use
\hgcmd{help} to see if the \hgcmd{qinit} command is now available; see
the example in figure~\ref{ex:mq:enabled}.

\begin{figure}[ht]
  \interaction{mq.qinit-help.help}
  \caption{How to verify that MQ is enabled}
  \label{ex:mq:enabled}
\end{figure}

You can use MQ with \emph{any} Mercurial repository, and its commands
only operate within that repository.  To get started, simply prepare
the repository using the \hgcmd{qinit} command (see
figure~\ref{ex:mq:qinit}).  This command creates an empty directory
called \sdirname{.hg/patches}, where MQ will keep its metadata.  As
with many Mercurial commands, the \hgcmd{qinit} command prints nothing
if it succeeds.

\begin{figure}[ht]
  \interaction{mq.tutorial.qinit}
  \caption{Preparing a repository for use with MQ}
  \label{ex:mq:qinit}
\end{figure}

\begin{figure}[ht]
  \interaction{mq.tutorial.qnew}
  \caption{Creating a new patch}
  \label{ex:mq:qnew}
\end{figure}

\subsection{Creating a new patch}

To begin work on a new patch, use the \hgcmd{qnew} command.  This
command takes one argument, the name of the patch to create.  MQ will
use this as the name of an actual file in the \sdirname{.hg/patches}
directory, as you can see in figure~\ref{ex:mq:qnew}.

Also newly present in the \sdirname{.hg/patches} directory are two
other files, \sfilename{series} and \sfilename{status}.  The
\sfilename{series} file lists all of the patches that MQ knows about
for this repository, with one patch per line.  MQ uses the
\sfilename{status} file for internal book-keeping; it tracks all of the
patches that MQ has \emph{applied} in this repository.

\begin{note}
  You may sometimes want to edit the \sfilename{series} file by hand;
  for example, to change the sequence in which some patches are
  applied.  However, manually editing the \sfilename{status} file is
  almost always a bad idea, as it's easy to corrupt MQ's idea of what
  is happening.
\end{note}

Once you have created your new patch, you can edit files in the
working directory as you usually would.  All of the normal Mercurial
commands, such as \hgcmd{diff} and \hgcmd{annotate}, work exactly as
they did before.

\subsection{Refreshing a patch}

When you reach a point where you want to save your work, use the
\hgcmd{qrefresh} command (figure~\ref{ex:mq:qrefresh}) to update the patch
you are working on.  This command folds the changes you have made in
the working directory into your patch, and updates its corresponding
changeset to contain those changes.

\begin{figure}[ht]
  \interaction{mq.tutorial.qrefresh}
  \caption{Refreshing a patch}
  \label{ex:mq:qrefresh}
\end{figure}

You can run \hgcmd{qrefresh} as often as you like, so it's a good way
to ``checkpoint'' your work.  Refresh your patch at an opportune
time; try an experiment; and if the experiment doesn't work out,
\hgcmd{revert} your modifications back to the last time you refreshed.

\begin{figure}[ht]
  \interaction{mq.tutorial.qrefresh2}
  \caption{Refresh a patch many times to accumulate changes}
  \label{ex:mq:qrefresh2}
\end{figure}

\subsection{Stacking and tracking patches}

Once you have finished working on a patch, or need to work on another,
you can use the \hgcmd{qnew} command again to create a new patch.
Mercurial will apply this patch on top of your existing patch.  See
figure~\ref{ex:mq:qnew2} for an example.  Notice that the patch
contains the changes in our prior patch as part of its context (you
can see this more clearly in the output of \hgcmd{annotate}).

\begin{figure}[ht]
  \interaction{mq.tutorial.qnew2}
  \caption{Stacking a second patch on top of the first}
  \label{ex:mq:qnew2}
\end{figure}

So far, with the exception of \hgcmd{qnew} and \hgcmd{qrefresh}, we've
been careful to only use regular Mercurial commands.  However, there
are more ``natural'' commands you can use when thinking about patches
with MQ, as illustrated in figure~\ref{ex:mq:qseries}:

\begin{itemize}
\item The \hgcmd{qseries} command lists every patch that MQ knows
  about in this repository, from oldest to newest (most recently
  \emph{created}).
\item The \hgcmd{qapplied} command lists every patch that MQ has
  \emph{applied} in this repository, again from oldest to newest (most
  recently applied).
\end{itemize}

\begin{figure}[ht]
  \interaction{mq.tutorial.qseries}
  \caption{Understanding the patch stack with \hgcmd{qseries} and
    \hgcmd{qapplied}}
  \label{ex:mq:qseries}
\end{figure}

\subsection{Manipulating the patch stack}

The previous discussion implied that there must be a difference
between ``known'' and ``applied'' patches, and there is.  MQ can
manage a patch without it being applied in the repository.

An \emph{applied} patch has a corresponding changeset in the
repository, and the effects of the patch and changeset are visible in
the working directory.  You can undo the application of a patch using
the \hgcmd{qpop} command.  MQ still \emph{knows about}, or manages, a
popped patch, but the patch no longer has a corresponding changeset in
the repository, and the working directory does not contain the changes
made by the patch.  Figure~\ref{fig:mq:stack} illustrates the
difference between applied and tracked patches.

\begin{figure}[ht]
  \centering
  \grafix{mq-stack}
  \caption{Applied and unapplied patches in the MQ patch stack}
  \label{fig:mq:stack}
\end{figure}

You can reapply an unapplied, or popped, patch using the \hgcmd{qpush}
command.  This creates a new changeset to correspond to the patch, and
the patch's changes once again become present in the working
directory.  See figure~\ref{ex:mq:qpop} for examples of \hgcmd{qpop}
and \hgcmd{qpush} in action.  Notice that once we have popped one or
two patches, the output of \hgcmd{qseries} remains the same, while
that of \hgcmd{qapplied} has changed.

\begin{figure}[ht]
  \interaction{mq.tutorial.qpop}
  \caption{Modifying the stack of applied patches}
  \label{ex:mq:qpop}
\end{figure}

MQ does not limit you to pushing or popping one patch.  You can have
no patches, all of them, or any number in between applied at some
point in time.
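
The distinction between known and applied patches can be modelled in a
few lines of Python.  This is only a toy sketch of the bookkeeping, not
MQ's implementation: the class \texttt{PatchStack} and its methods are
hypothetical names chosen to echo the MQ commands, and the sketch
assumes that a new patch is placed directly above the current topmost
applied patch.

```python
class PatchStack:
    """Toy model of MQ's bookkeeping: known patches vs. applied patches."""

    def __init__(self):
        self.series = []   # every patch the stack knows about, in order
        self.applied = 0   # how many patches, counted from the bottom, are applied

    def qnew(self, name):
        # Assumption: a new patch goes right above the topmost applied
        # patch and is applied immediately.
        self.series.insert(self.applied, name)
        self.applied += 1

    def qpop(self, all_patches=False):
        if self.applied == 0:
            raise RuntimeError("no patches applied")
        self.applied = 0 if all_patches else self.applied - 1

    def qpush(self, all_patches=False):
        if self.applied == len(self.series):
            raise RuntimeError("all patches are already applied")
        self.applied = len(self.series) if all_patches else self.applied + 1

    def qseries(self):
        return list(self.series)

    def qapplied(self):
        return self.series[:self.applied]

    def top(self):
        # The patch that a refresh would update, or None if nothing is applied.
        return self.series[self.applied - 1] if self.applied else None


stack = PatchStack()
stack.qnew("first.patch")
stack.qnew("second.patch")
stack.qpop()
print(stack.qseries())    # popping does not change the list of known patches
print(stack.qapplied())   # but it does shrink the list of applied patches
```

Keeping a single ordered list plus a count of applied patches makes it
impossible, in this model, to apply a patch without first applying
everything beneath it, which is the stack discipline described above.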

\subsection{Working on several patches at once}

The \hgcmd{qrefresh} command always refreshes the \emph{topmost}
applied patch.  This means that you can suspend work on one patch (by
refreshing it), pop or push to make a different patch the top, and
work on \emph{that} patch for a while.

Here's an example that illustrates how you can use this ability.
Let's say you're developing a new feature as two patches.  The first
is a change to the core of your software, and the second--layered on
top of the first--changes the user interface to use the code you just
added to the core.  If you notice a bug in the core while you're
working on the UI patch, it's easy to fix the core.  Simply
\hgcmd{qrefresh} the UI patch to save your in-progress changes, and
\hgcmd{qpop} down to the core patch.  Fix the core bug,
\hgcmd{qrefresh} the core patch, and \hgcmd{qpush} back to the UI
patch to continue where you left off.

\section{Mercurial Queues and GNU patch}
\label{sec:mq:patch}

MQ uses the GNU \command{patch} command to apply patches.  Because MQ
doesn't hide its patch-oriented nature, it is helpful to understand
the data that MQ and \command{patch} work with, and a few aspects of
how \command{patch} operates.

The \command{diff} command generates a list of modifications by
comparing two files.  The \command{patch} command applies a list of
modifications to a file.  The kinds of files that \command{diff} and
\command{patch} work with are referred to as both ``diffs'' and
``patches;'' there is no difference between a diff and a patch.

A patch file can start with arbitrary text; MQ uses this text as the
commit message when creating changesets.  It treats the first line
that starts with the string ``\texttt{diff~-}'' as the separator
between header and content.
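
As a rough illustration of that rule, the following Python sketch
splits the text of a patch file at the first line that starts with
``\texttt{diff~-}''.  The function name \texttt{split\_patch} and the
sample patch (including its changeset ID and file name) are made up for
this example; the sketch does not reproduce MQ's actual parsing code.

```python
def split_patch(text):
    """Split patch text into (message, diff) at the first 'diff -' line."""
    lines = text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.startswith("diff -"):
            message = "".join(lines[:i]).strip()
            return message, "".join(lines[i:])
    # No separator found: treat the whole text as header.
    return text.strip(), ""


# An illustrative patch; the changeset ID and file name are invented.
example = """\
Fix an off-by-one error in the parser.

diff -r 000000000000 parser.c
--- a/parser.c
+++ b/parser.c
@@ -1,1 +1,1 @@
-old line
+new line
"""

message, diff = split_patch(example)
print(message)
```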

MQ works with \emph{unified} diffs (\command{patch} can accept several
other diff formats, but MQ doesn't).  A unified diff contains two
kinds of header.  The \emph{file header} describes the file being
modified; it contains the name of the file to modify.  When
\command{patch} sees a new file header, it looks for a file with that
name to start modifying.

After the file header comes a series of \emph{hunks}.  Each hunk
starts with a header; this identifies the range of line numbers within
the file that the hunk should modify.  Following the header, a hunk
starts and ends with a few (usually three) lines of text from the
unmodified file; these are called the \emph{context} for the hunk.
Each unmodified line begins with a space character.  Within the hunk,
a line that begins with ``\texttt{-}'' means ``remove this line,''
while a line that begins with ``\texttt{+}'' means ``insert this
line.''  For example, a line that is modified is represented by one
deletion and one insertion.
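
To see these pieces in a concrete diff, the sketch below uses the
\texttt{difflib} module from Python's standard library to generate a
unified diff for a one-line change, then labels each line of the
result.  The file names and contents are invented for the example, and
the labelling loop is a simplification that is only adequate for this
small input.

```python
import difflib

old = ["one\n", "two\n", "three\n", "four\n", "five\n",
       "six\n", "seven\n", "eight\n", "nine\n"]
new = list(old)
new[4] = "FIVE\n"   # modify a single line in the middle of the file

diff = list(difflib.unified_diff(old, new,
                                 fromfile="a/numbers.txt",
                                 tofile="b/numbers.txt"))

for line in diff:
    # Simplified classification; a removed line whose text begins with
    # "--" would fool the first test, which is fine for this example.
    if line.startswith(("--- ", "+++ ")):
        kind = "file header"
    elif line.startswith("@@"):
        kind = "hunk header"
    elif line.startswith("-"):
        kind = "remove"
    elif line.startswith("+"):
        kind = "insert"
    else:
        kind = "context"
    print(f"{kind:12} {line}", end="")
```

The single modified line shows up as one removal followed by one
insertion, surrounded by three lines of context on each side.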

The \command{diff} command runs hunks together when there's not enough
context between modifications to justify keeping them as separate
hunks.

When \command{patch} applies a hunk, it tries a handful of
successively less accurate strategies to try to make the hunk apply.
This falling-back technique often makes it possible to take a patch
that was generated against an old version of a file, and apply it
against a newer version of that file.

First, \command{patch} tries an exact match, where the line numbers,
the context, and the text to be modified must apply exactly.  If it
cannot make an exact match, it tries to find an exact match for the
context, without honouring the line numbering information.  If this
succeeds, it prints a line of output saying that the hunk was applied,
but at some \emph{offset} from the original line number.

If a context-only match fails, \command{patch} removes the first and
last lines of the context, and tries a \emph{reduced} context-only
match.  If the hunk with reduced context succeeds, it prints a message
saying that it applied the hunk with a \emph{fuzz factor} (the number
after the fuzz factor indicates how many lines of context
\command{patch} had to trim before the patch applied).
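
The sequence of fallbacks can be sketched in Python as follows.  This
is a deliberately simplified model, not the algorithm that GNU
\command{patch} implements: the function \texttt{locate\_hunk} is a
hypothetical name, it only works out \emph{where} a hunk would apply,
and it assumes that the lines trimmed from each end of the hunk are
context lines.

```python
def locate_hunk(file_lines, old_lines, start, max_fuzz=2):
    """Find where a hunk's "before" text matches in file_lines.

    old_lines holds the leading context, the lines to be replaced, and
    the trailing context.  start is the 0-based line number that the
    hunk header asks for.  Returns (position, offset, fuzz), or None if
    the hunk would be rejected.
    """
    for fuzz in range(max_fuzz + 1):
        # Trim `fuzz` lines from each end (assumed to be context lines).
        wanted = old_lines[fuzz:len(old_lines) - fuzz]
        if not wanted:
            break
        expected = start + fuzz
        last = len(file_lines) - len(wanted)
        # Try the expected position first, then positions further away.
        for pos in sorted(range(last + 1), key=lambda p: abs(p - expected)):
            if file_lines[pos:pos + len(wanted)] == wanted:
                return pos, pos - expected, fuzz
    return None


base = [f"line {i}\n" for i in range(1, 10)]
old = base[1:8]                       # three context lines, one change, three more

shifted = ["new 1\n", "new 2\n"] + base
fuzzy = list(base)
fuzzy[1] = "LINE 2\n"                 # the first context line has changed
broken = list(base)
broken[4] = "something else\n"        # the line to be replaced has changed

exact_result = locate_hunk(base, old, 1)       # exact match
offset_result = locate_hunk(shifted, old, 1)   # context matches at an offset
fuzz_result = locate_hunk(fuzzy, old, 1)       # matches with reduced context
reject_result = locate_hunk(broken, old, 1)    # no match at all
print(exact_result, offset_result, fuzz_result, reject_result)
```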

When none of these techniques works, \command{patch} prints a
message saying that the hunk in question was rejected.  It saves
rejected hunks to a file with the same name, and an added
\sfilename{.rej} extension.  It also saves an unmodified copy of the
file with a \sfilename{.orig} extension; the copy of the file without
any extensions will contain any changes made by hunks that \emph{did}
apply cleanly.  If you have a patch that modifies \filename{foo} with
six hunks, and one of them fails to apply, you will have: an
unmodified \filename{foo.orig}, a \filename{foo.rej} containing one
hunk, and \filename{foo}, containing the changes made by the five
hunks that applied successfully.

\subsection{Beware the fuzz}

While applying a hunk at an offset, or with a fuzz factor, will often
be completely successful, these inexact techniques naturally leave
open the possibility of corrupting the patched file.  The most common
cases typically involve applying a patch twice, or at an incorrect
location in the file.  If \command{patch} or \hgcmd{qpush} ever
mentions an offset or fuzz factor, you should make sure that the
modified files are correct afterwards.  

It's often a good idea to refresh a patch that has applied with an
offset or fuzz factor; refreshing the patch generates new context
information that will make it apply cleanly.  I say ``often,'' not
``always,'' because sometimes refreshing a patch will make it fail to
apply against a different revision of the underlying files.  In some
cases, such as when you're maintaining a patch that must sit on top of
multiple versions of a source tree, it's acceptable to have a patch
apply with some fuzz, provided you've verified the results of the
patching process in such cases.

\subsection{Handling rejection}

If \hgcmd{qpush} fails to apply a patch, it will print an error
message and exit.  If it has left \sfilename{.rej} files behind, it is
usually best to fix up the rejected hunks before you push more patches
or do any further work.

If your patch \emph{used to} apply cleanly, and no longer does because
you've changed the underlying code that your patches are based on,
Mercurial Queues can help; see section~\ref{sec:mq:merge} for details.

Unfortunately, there aren't any great techniques for dealing with
rejected hunks.  Most often, you'll need to view the \sfilename{.rej}
file and edit the target file, applying the rejected hunks by hand.

If you're feeling adventurous, Neil Brown, a Linux kernel hacker,
wrote a tool called \command{wiggle}~\cite{web:wiggle}, which is more
vigorous than \command{patch} in its attempts to make a patch apply.

Another Linux kernel hacker, Chris Mason (the author of Mercurial
Queues), wrote a similar tool called \command{rej}~\cite{web:rej},
which takes a simple approach to automating the application of hunks
rejected by \command{patch}.  \command{rej} can help with four common
reasons that a hunk may be rejected:

\begin{itemize}
\item The context in the middle of a hunk has changed.
\item A hunk is missing some context at the beginning or end.
\item A large hunk might apply better--either entirely or in part--if
  it was broken up into smaller hunks.
\item A hunk removes lines with slightly different content than those
  currently present in the file.
\end{itemize}

If you use \command{wiggle} or \command{rej}, you should be doubly
careful to check your results when you're done.

\section{Updating your patches when the underlying code changes}
\label{sec:mq:merge}

XXX.

\section{Managing patches in a repository}

Because MQ's \sdirname{.hg/patches} directory resides outside a
Mercurial repository's working directory, the ``underlying'' Mercurial
repository knows nothing about the management or presence of patches.

This presents the interesting possibility of managing the contents of
the patch directory as a Mercurial repository in its own right.  This
can be a useful way to work.  For example, you can work on a patch for
a while, \hgcmd{qrefresh} it, then \hgcmd{commit} the current state of
the patch.  This lets you ``roll back'' to that version of the patch
later on.

In addition, you can then share different versions of the same patch
stack among multiple underlying repositories.  I use this when I am
developing a Linux kernel feature.  I have a pristine copy of my
kernel sources for each of several CPU architectures, and a cloned
repository under each that contains the patches I am working on.  When
I want to test a change on a different architecture, I push my current
patches to the patch repository associated with that kernel tree, pop
and push all of my patches, and build and test that kernel.

Managing patches in a repository makes it possible for multiple
developers to work on the same patch series without colliding with
each other, all on top of an underlying source base that they may or
may not control.

\subsection{MQ support for managing a patch repository}

MQ helps you to work with the \sdirname{.hg/patches} directory as a
repository; when you prepare a repository for working with patches
using \hgcmd{qinit}, you can pass the \hgopt{qinit}{-c} option to
create the \sdirname{.hg/patches} directory as a Mercurial repository.

\begin{note}
  If you forget to use the \hgopt{qinit}{-c} option, you can simply go
  into the \sdirname{.hg/patches} directory at any time and run
  \hgcmd{init}.  Don't forget to add an entry for the
  \filename{status} file to the \filename{.hgignore} file, though
  (\hgopt{qinit}{-c} does this for you automatically); you
  \emph{really} don't want to manage the \filename{status} file.
\end{note}

As a convenience, if MQ notices that the \dirname{.hg/patches}
directory is a repository, it will automatically \hgcmd{add} every
patch that you create and import.

Finally, MQ provides a shortcut command, \hgcmd{qcommit}, that runs
\hgcmd{commit} in the \sdirname{.hg/patches} directory.  This saves
some cumbersome typing.

\subsection{A few things to watch out for}

MQ's support for working with a repository full of patches is limited
in a few small respects.

MQ cannot automatically detect changes that you make to the patch
directory.  If you \hgcmd{pull}, manually edit, or \hgcmd{update}
changes to patches or the \sfilename{series} file, you will have to
\hgcmdargs{qpop}{-a} and then \hgcmdargs{qpush}{-a} in the underlying
repository to see those changes show up there.  If you forget to do
this, you can confuse MQ's idea of which patches are applied.

\section{Commands for working with patches}

Once you've been working with patches for a while, you'll find
yourself hungry for tools that will help you to understand and
manipulate the patches you're dealing with.

The \command{diffstat} command~\cite{web:diffstat} generates a
histogram of the modifications made to each file in a patch.  It
provides a good way to ``get a sense of'' a patch--which files it
affects, and how much change it introduces to each file and as a
whole.  (I find that it's a good idea to use \command{diffstat}'s
\texttt{-p} option as a matter of course, as otherwise it will try to
do clever things with prefixes of file names that inevitably confuse
at least me.)
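
To give a feel for the kind of summary involved, here is a small
Python sketch that counts inserted and removed lines per file in a
unified diff.  It is a rough imitation of the idea, not
\command{diffstat} itself: the function name
\texttt{change\_counts} and the sample patch are invented, and the
sketch only understands unified diffs.

```python
import re

# Hunk header such as "@@ -1,3 +1,4 @@"; a missing count means 1.
HUNK = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def change_counts(patch_text):
    """Return {filename: (insertions, deletions)} for a unified diff."""
    stats = {}
    current = None
    old_left = new_left = 0   # lines still expected in the current hunk
    for line in patch_text.splitlines():
        if old_left > 0 or new_left > 0:
            if line.startswith("+"):
                stats[current][0] += 1
                new_left -= 1
            elif line.startswith("-"):
                stats[current][1] += 1
                old_left -= 1
            elif line.startswith("\\"):
                pass              # "\ No newline at end of file"
            else:
                old_left -= 1     # a context line counts on both sides
                new_left -= 1
            continue
        if line.startswith("+++ "):
            current = line[4:].split("\t")[0]
            stats.setdefault(current, [0, 0])
        else:
            match = HUNK.match(line)
            if match and current is not None:
                old_left = int(match.group(1) or 1)
                new_left = int(match.group(2) or 1)
    return {name: tuple(counts) for name, counts in stats.items()}


# An illustrative patch; the changeset ID and file name are invented.
example = """\
diff -r 111111111111 hello.c
--- a/hello.c
+++ b/hello.c
@@ -1,3 +1,4 @@
 #include <stdio.h>
-int main() {
+int main(void) {
+    puts("hello");
 }
"""

counts = change_counts(example)
print(counts)
```

Tracking the line counts from each hunk header, rather than looking
only at the first character of every line, keeps a removed line whose
text happens to begin with ``\texttt{--}'' from being mistaken for a
file header.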

The \package{patchutils} package~\cite{web:patchutils} is invaluable.
It provides a set of small utilities that follow the ``Unix
philosophy;'' each does one useful thing with a patch.  The
\package{patchutils} command I use most is \command{filterdiff}, which
extracts subsets from a patch file.  For example, given a patch that
modifies hundreds of files across dozens of directories, a single
invocation of \command{filterdiff} can generate a smaller patch that
only touches files whose names match a particular glob pattern.
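
The idea behind that kind of filtering can be sketched in Python.
This is not how \command{filterdiff} is implemented, and it supports
none of its options: the function names are hypothetical, the sample
patch is invented, and the sketch assumes that every per-file section
begins with a line starting with ``\texttt{diff~}'', as it does in
patches that Mercurial generates.

```python
import fnmatch


def split_sections(patch_text):
    """Split a patch into per-file sections, each starting at a 'diff ' line."""
    sections, current = [], None
    for line in patch_text.splitlines(keepends=True):
        if line.startswith("diff "):
            current = []
            sections.append(current)
        if current is not None:   # text before the first section is dropped
            current.append(line)
    return sections


def target_name(section):
    """Return the file name from the '+++' header, without a leading 'b/'."""
    for line in section:
        if line.startswith("+++ "):
            name = line[4:].split("\t")[0].strip()
            return name[2:] if name.startswith("b/") else name
    return None


def filter_patch(patch_text, pattern):
    """Keep only the sections whose target file matches a glob pattern."""
    kept = []
    for section in split_sections(patch_text):
        name = target_name(section)
        if name is not None and fnmatch.fnmatchcase(name, pattern):
            kept.extend(section)
    return "".join(kept)


# An illustrative patch; the changeset ID and file names are invented.
example = """\
Two unrelated changes in one patch.

diff -r 111111111111 src/core.c
--- a/src/core.c
+++ b/src/core.c
@@ -1,1 +1,1 @@
-int x;
+long x;
diff -r 111111111111 docs/readme.txt
--- a/docs/readme.txt
+++ b/docs/readme.txt
@@ -1,1 +1,1 @@
-old text
+new text
"""

only_c = filter_patch(example, "src/*.c")
print(only_c, end="")
```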

%%% Local Variables: 
%%% mode: latex
%%% TeX-master: "00book"
%%% End: