hgbook / en / ch01-intro.xml

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : -->

<chapter id="chap:intro">
  <?dbhtml filename="how-did-we-get-here.html"?>
  <title>How did we get here?</title>

  <sect1>
    <title>Why revision control? Why Mercurial?</title>

    <para id="x_6d">Revision control is the process of managing multiple
      versions of a piece of information.  In its simplest form, this
      is something that many people do by hand: every time you modify
      a file, save it under a new name that contains a number, each
      one higher than the number of the preceding version.</para>

    <para id="x_6e">Manually managing multiple versions of even a single file is
      an error-prone task, though, so software tools to help automate
      this process have long been available.  The earliest automated
      revision control tools were intended to help a single user to
      manage revisions of a single file.  Over the past few decades,
      the scope of revision control tools has expanded greatly; they
      now manage multiple files, and help multiple people to work
      together.  The best modern revision control tools have no
      problem coping with thousands of people working together on
      projects that consist of hundreds of thousands of files.</para>

    <para id="x_6f">The arrival of distributed revision control is relatively
      recent, and so far this new field has grown due to people's
      willingness to explore ill-charted territory.</para>

    <para id="x_70">I am writing a book about distributed revision control
      because I believe that it is an important subject that deserves
      a field guide. I chose to write about Mercurial because it is
      the easiest tool to learn the terrain with, and yet it scales to
      the demands of real, challenging environments where many other
      revision control tools buckle.</para>

    <sect2>
      <title>Why use revision control?</title>

      <para id="x_71">There are a number of reasons why you or your team might
	want to use an automated revision control tool for a
	project.</para>

      <itemizedlist>
	<listitem><para id="x_72">It will track the history and evolution of
	    your project, so you don't have to.  For every change,
	    you'll have a log of <emphasis>who</emphasis> made it;
	    <emphasis>why</emphasis> they made it;
	    <emphasis>when</emphasis> they made it; and
	    <emphasis>what</emphasis> the change
	    was.</para></listitem>
	<listitem><para id="x_73">When you're working with other people,
	    revision control software makes it easier for you to
	    collaborate.  For example, when people more or less
	    simultaneously make potentially incompatible changes, the
	    software will help you to identify and resolve those
	    conflicts.</para></listitem>
	<listitem><para id="x_74">It can help you to recover from mistakes.  If
	    you make a change that later turns out to be in error, you
	    can revert to an earlier version of one or more files.  In
	    fact, a <emphasis>really</emphasis> good revision control
	    tool will even help you to efficiently figure out exactly
	    when a problem was introduced (see <xref
	      linkend="sec:undo:bisect"/> for details).</para></listitem>
	<listitem><para id="x_75">It will help you to work simultaneously on,
	    and manage the drift between, multiple versions of your
	    project.</para></listitem>
      </itemizedlist>

      <para id="x_76">Most of these reasons are equally
	valid&emdash;at least in theory&emdash;whether you're working
	on a project by yourself, or with a hundred other
	people.</para>

      <para id="x_77">A key question about the practicality of revision control
	at these two different scales (<quote>lone hacker</quote> and
	<quote>huge team</quote>) is how its
	<emphasis>benefits</emphasis> compare to its
	<emphasis>costs</emphasis>.  A revision control tool that's
	difficult to understand or use is going to impose a high
	cost.</para>

      <para id="x_78">A five-hundred-person project is likely to collapse under
	its own weight almost immediately without a revision control
	tool and process. In this case, the cost of using revision
	control might hardly seem worth considering, since
	<emphasis>without</emphasis> it, failure is almost
	guaranteed.</para>

      <para id="x_79">On the other hand, a one-person <quote>quick hack</quote>
	might seem like a poor place to use a revision control tool,
	because surely the cost of using one must be close to the
	overall cost of the project.  Right?</para>

      <para id="x_7a">Mercurial uniquely supports <emphasis>both</emphasis> of
	these scales of development.  You can learn the basics in just
	a few minutes, and due to its low overhead, you can apply
	revision control to the smallest of projects with ease.  Its
	simplicity means you won't have a lot of abstruse concepts or
	command sequences competing for mental space with whatever
	you're <emphasis>really</emphasis> trying to do.  At the same
	time, Mercurial's high performance and peer-to-peer nature let
	you scale painlessly to handle large projects.</para>

      <para id="x_7b">No revision control tool can rescue a poorly run project,
	but a good choice of tools can make a huge difference to the
	fluidity with which you can work on a project.</para>

    </sect2>

    <sect2>
      <title>The many names of revision control</title>

      <para id="x_7c">Revision control is a diverse field, so much so that it is
	referred to by many names and acronyms.  Here are a few of the
	more common variations you'll encounter:</para>
      <itemizedlist>
	<listitem><para id="x_7d">Revision control (RCS)</para></listitem>
	<listitem><para id="x_7e">Software configuration management (SCM), or
	    configuration management</para></listitem>
	<listitem><para id="x_7f">Source code management</para></listitem>
	<listitem><para id="x_80">Source code control, or source
	    control</para></listitem>
	<listitem><para id="x_81">Version control
	    (VCS)</para></listitem></itemizedlist>
      <para id="x_82">Some people claim that these terms actually have different
	meanings, but in practice they overlap so much that there's no
	agreed or even useful way to tease them apart.</para>

    </sect2>
  </sect1>

  <sect1>
    <title>About the examples in this book</title>

    <para id="x_84">This book takes an unusual approach to code samples.  Every
      example is <quote>live</quote>&emdash;each one is actually the result
      of a shell script that executes the Mercurial commands you see.
      Every time an image of the book is built from its sources, all
      the example scripts are automatically run, and their current
      results compared against their expected results.</para>

    <para id="x_85">The advantage of this approach is that the examples are
      always accurate; they describe <emphasis>exactly</emphasis> the
      behavior of the version of Mercurial that's mentioned at the
      front of the book.  If I update the version of Mercurial that
      I'm documenting, and the output of some command changes, the
      build fails.</para>

    <para id="x_86">There is a small disadvantage to this approach, which is
      that the dates and times you'll see in examples tend to be
      <quote>squashed</quote> together in a way that they wouldn't be
      if the same commands were being typed by a human.  Where a human
      can issue no more than one command every few seconds, with any
      resulting timestamps correspondingly spread out, my automated
      example scripts run many commands in one second.</para>

    <para id="x_87">As an instance of this, several consecutive commits in an
      example can show up as having occurred during the same second.
      You can see this occur in the <literal
	role="hg-ext">bisect</literal> example in <xref
	linkend="sec:undo:bisect"/>, for instance.</para>

    <para id="x_88">So when you're reading examples, don't place too much weight
      on the dates or times you see in the output of commands.  But
      <emphasis>do</emphasis> be confident that the behavior you're
      seeing is consistent and reproducible.</para>

  </sect1>

  <sect1>
    <title>Trends in the field</title>

    <para id="x_89">There has been an unmistakable trend in the development and
      use of revision control tools over the past four decades, as
      people have become familiar with the capabilities of their tools
      and constrained by their limitations.</para>

    <para id="x_8a">The first generation began by managing single files on
      individual computers.  Although these tools represented a huge
      advance over ad-hoc manual revision control, their locking model
      and reliance on a single computer limited them to small,
      tightly-knit teams.</para>

    <para id="x_8b">The second generation loosened these constraints by moving
      to network-centered architectures, and managing entire projects
      at a time.  As projects grew larger, they ran into new problems.
      With clients needing to talk to servers very frequently, server
      scaling became an issue for large projects.  An unreliable
      network connection could prevent remote users from being able to
      talk to the server at all.  As open source projects started
      making read-only access available anonymously to anyone, people
      without commit privileges found that they could not use the
      tools to interact with a project in a natural way, as they could
      not record their changes.</para>

    <para id="x_8c">The current generation of revision control tools is
      peer-to-peer in nature.  All of these systems have dropped the
      dependency on a single central server, and allow people to
      distribute their revision control data to where it's actually
      needed.  Collaboration over the Internet has moved from
      constrained by technology to a matter of choice and consensus.
      Modern tools can operate offline indefinitely and autonomously,
      with a network connection only needed when syncing changes with
      another repository.</para>

  </sect1>
  <sect1>
    <title>A few of the advantages of distributed revision
      control</title>

    <para id="x_8d">Even though distributed revision control tools have for
      several years been as robust and usable as their
      previous-generation counterparts, people using older tools have
      not yet necessarily woken up to their advantages.  There are a
      number of ways in which distributed tools shine relative to
      centralised ones.</para>

    <para id="x_8e">For an individual developer, distributed tools are almost
      always much faster than centralised tools.  This is for a simple
      reason: a centralised tool needs to talk over the network for
      many common operations, because most metadata is stored in a
      single copy on the central server.  A distributed tool stores
      all of its metadata locally.  All else being equal, talking over
      the network adds overhead to a centralised tool.  Don't
      underestimate the value of a snappy, responsive tool: you're
      going to spend a lot of time interacting with your revision
      control software.</para>

    <para id="x_8f">Distributed tools are indifferent to the vagaries of your
      server infrastructure, again because they replicate metadata to
      so many locations.  If you use a centralised system and your
      server catches fire, you'd better hope that your backup media
      are reliable, and that your last backup was recent and actually
      worked.  With a distributed tool, you have many backups
      available on every contributor's computer.</para>

    <para id="x_90">The reliability of your network will affect distributed
      tools far less than it will centralised tools.  You can't even
      use a centralised tool without a network connection, except for
      a few highly constrained commands.  With a distributed tool, if
      your network connection goes down while you're working, you may
      not even notice.  The only thing you won't be able to do is talk
      to repositories on other computers, something that is relatively
      rare compared with local operations.  If you have a far-flung
      team of collaborators, this may be significant.</para>

    <sect2>
      <title>Advantages for open source projects</title>

      <para id="x_91">If you take a shine to an open source project and decide
	that you would like to start hacking on it, and that project
	uses a distributed revision control tool, you are at once a
	peer with the people who consider themselves the
	<quote>core</quote> of that project.  If they publish their
	repositories, you can immediately copy their project history,
	start making changes, and record your work, using the same
	tools in the same ways as insiders.  By contrast, with a
	centralised tool, you must use the software in a <quote>read
	  only</quote> mode unless someone grants you permission to
	commit changes to their central server.  Until then, you won't
	be able to record changes, and your local modifications will
	be at risk of corruption any time you try to update your
	client's view of the repository.</para>

      <sect3>
	<title>The forking non-problem</title>

	<para id="x_92">It has been suggested that distributed revision control
	  tools pose some sort of risk to open source projects because
	  they make it easy to <quote>fork</quote> the development of
	  a project.  A fork happens when there are differences in
	  opinion or attitude between groups of developers that cause
	  them to decide that they can't work together any longer.
	  Each side takes a more or less complete copy of the
	  project's source code, and goes off in its own
	  direction.</para>

	<para id="x_93">Sometimes the camps in a fork decide to reconcile their
	  differences. With a centralised revision control system, the
	  <emphasis>technical</emphasis> process of reconciliation is
	  painful, and has to be performed largely by hand.  You have
	  to decide whose revision history is going to
	  <quote>win</quote>, and graft the other team's changes into
	  the tree somehow. This usually loses some or all of one
	  side's revision history.</para>

	<para id="x_94">What distributed tools do with respect to forking is
	  they make forking the <emphasis>only</emphasis> way to
	  develop a project.  Every single change that you make is
	  potentially a fork point.  The great strength of this
	  approach is that a distributed revision control tool has to
	  be really good at <emphasis>merging</emphasis> forks,
	  because forks are absolutely fundamental: they happen all
	  the time.</para>

	<para id="x_95">If every piece of work that everybody does, all the
	  time, is framed in terms of forking and merging, then what
	  the open source world refers to as a <quote>fork</quote>
	  becomes <emphasis>purely</emphasis> a social issue.  If
	  anything, distributed tools <emphasis>lower</emphasis> the
	  likelihood of a fork:</para>
	<itemizedlist>
	  <listitem><para id="x_96">They eliminate the social distinction that
	      centralised tools impose: that between insiders (people
	      with commit access) and outsiders (people
	      without).</para></listitem>
	  <listitem><para id="x_97">They make it easier to reconcile after a
	      social fork, because all that's involved from the
	      perspective of the revision control software is just
	      another merge.</para></listitem></itemizedlist>

	<para id="x_98">Some people resist distributed tools because they want
	  to retain tight control over their projects, and they
	  believe that centralised tools give them this control.
	  However, if you're of this belief, and you publish your CVS
	  or Subversion repositories publicly, there are plenty of
	  tools available that can pull out your entire project's
	  history (albeit slowly) and recreate it somewhere that you
	  don't control.  So while your control in this case is
	  illusory, you are forgoing the ability to fluidly
	  collaborate with whatever people feel compelled to mirror
	  and fork your history.</para>

      </sect3>
    </sect2>
    <sect2>
      <title>Advantages for commercial projects</title>

      <para id="x_99">Many commercial projects are undertaken by teams that are
	scattered across the globe.  Contributors who are far from a
	central server will see slower command execution and perhaps
	less reliability.  Commercial revision control systems attempt
	to ameliorate these problems with remote-site replication
	add-ons that are typically expensive to buy and cantankerous
	to administer.  A distributed system doesn't suffer from these
	problems in the first place.  Better yet, you can easily set
	up multiple authoritative servers, say one per site, so that
	there's no redundant communication between repositories over
	expensive long-haul network links.</para>

      <para id="x_9a">Centralised revision control systems tend to have
	relatively low scalability.  It's not unusual for an expensive
	centralised system to fall over under the combined load of
	just a few dozen concurrent users.  Once again, the typical
	response tends to be an expensive and clunky replication
	facility.  Since the load on a central server&emdash;if you have
	one at all&emdash;is many times lower with a distributed tool
	(because all of the data is replicated everywhere), a single
	cheap server can handle the needs of a much larger team, and
	replication to balance load becomes a simple matter of
	scripting.</para>

      <para id="x_9b">If you have an employee in the field, troubleshooting a
	problem at a customer's site, they'll benefit from distributed
	revision control. The tool will let them generate custom
	builds, try different fixes in isolation from each other, and
	search efficiently through history for the sources of bugs and
	regressions in the customer's environment, all without needing
	to connect to your company's network.</para>

    </sect2>
  </sect1>
  <sect1>
    <title>Why choose Mercurial?</title>

    <para id="x_9c">Mercurial has a unique set of properties that make it a
      particularly good choice as a revision control system.</para>
    <itemizedlist>
      <listitem><para id="x_9d">It is easy to learn and use.</para></listitem>
      <listitem><para id="x_9e">It is lightweight.</para></listitem>
      <listitem><para id="x_9f">It scales excellently.</para></listitem>
      <listitem><para id="x_a0">It is easy to
	  customise.</para></listitem></itemizedlist>

    <para id="x_a1">If you are at all familiar with revision control systems,
      you should be able to get up and running with Mercurial in less
      than five minutes.  Even if not, it will take no more than a few
      minutes longer.  Mercurial's command and feature sets are
      generally uniform and consistent, so you can keep track of a few
      general rules instead of a host of exceptions.</para>

    <para id="x_a2">On a small project, you can start working with Mercurial in
      moments. Creating new changes and branches; transferring changes
      around (whether locally or over a network); and history and
      status operations are all fast.  Mercurial attempts to stay
      nimble and largely out of your way by combining low cognitive
      overhead with blazingly fast operations.</para>

    <para id="x_a3">The usefulness of Mercurial is not limited to small
      projects: it is used by projects with hundreds to thousands of
      contributors, each containing tens of thousands of files and
      hundreds of megabytes of source code.</para>

    <para id="x_a4">If the core functionality of Mercurial is not enough for
      you, it's easy to build on.  Mercurial is well suited to
      scripting tasks, and its clean internals and implementation in
      Python make it easy to add features in the form of extensions.
      There are a number of popular and useful extensions already
      available, ranging from helping to identify bugs to improving
      performance.</para>

  </sect1>
  <sect1>
    <title>Mercurial compared with other tools</title>

    <para id="x_a5">Before you read on, please understand that this section
      necessarily reflects my own experiences, interests, and (dare I
      say it) biases.  I have used every one of the revision control
      tools listed below, in most cases for several years at a
      time.</para>


    <sect2>
      <title>Subversion</title>

      <para id="x_a6">Subversion is a popular revision control tool, developed
	to replace CVS.  It has a centralised client/server
	architecture.</para>

      <para id="x_a7">Subversion and Mercurial have similarly named commands for
	performing the same operations, so if you're familiar with
	one, it is easy to learn to use the other.  Both tools are
	portable to all popular operating systems.</para>

      <para id="x_a8">Prior to version 1.5, Subversion had no useful support for
	merges. At the time of writing, its merge tracking capability
	is new, and known to be <ulink
	  url="http://svnbook.red-bean.com/nightly/en/svn.branchmerge.advanced.html#svn.branchmerge.advanced.finalword">complicated 
	  and buggy</ulink>.</para>

      <para id="x_a9">Mercurial has a substantial performance advantage over
	Subversion on every revision control operation I have
	benchmarked.  I have measured its advantage as ranging from a
	factor of two to a factor of six when compared with Subversion
	1.4.3's <emphasis>ra_local</emphasis> file store, which is the
	fastest access method available.  In more realistic
	deployments involving a network-based store, Subversion will
	be at a substantially larger disadvantage.  Because many
	Subversion commands must talk to the server and Subversion
	does not have useful replication facilities, server capacity
	and network bandwidth become bottlenecks for modestly large
	projects.</para>

      <para id="x_aa">Additionally, Subversion incurs substantial storage
	overhead to avoid network transactions for a few common
	operations, such as finding modified files
	(<literal>status</literal>) and displaying modifications
	against the current revision (<literal>diff</literal>).  As a
	result, a Subversion working copy is often the same size as,
	or larger than, a Mercurial repository and working directory,
	even though the Mercurial repository contains a complete
	history of the project.</para>

      <para id="x_ab">Subversion is widely supported by third party tools.
	Mercurial currently lags considerably in this area.  This gap
	is closing, however, and indeed some of Mercurial's GUI tools
	now outshine their Subversion equivalents.  Like Mercurial,
	Subversion has an excellent user manual.</para>

      <para id="x_ac">Because Subversion doesn't store revision history on the
	client, it is well suited to managing projects that deal with
	lots of large, opaque binary files.  If you check in fifty
	revisions to an incompressible 10MB file, Subversion's
	client-side space usage stays constant The space used by any
	distributed SCM will grow rapidly in proportion to the number
	of revisions, because the differences between each revision
	are large.</para>

      <para id="x_ad">In addition, it's often difficult or, more usually,
	impossible to merge different versions of a binary file.
	Subversion's ability to let a user lock a file, so that they
	temporarily have the exclusive right to commit changes to it,
	can be a significant advantage to a project where binary files
	are widely used.</para>

      <para id="x_ae">Mercurial can import revision history from a Subversion
	repository. It can also export revision history to a
	Subversion repository.  This makes it easy to <quote>test the
	  waters</quote> and use Mercurial and Subversion in parallel
	before deciding to switch.  History conversion is incremental,
	so you can perform an initial conversion, then small
	additional conversions afterwards to bring in new
	changes.</para>


    </sect2>
    <sect2>
      <title>Git</title>

      <para id="x_af">Git is a distributed revision control tool that was
	developed for managing the Linux kernel source tree.  Like
	Mercurial, its early design was somewhat influenced by
	Monotone.</para>

      <para id="x_b0">Git has a very large command set, with version 1.5.0
	providing 139 individual commands.  It has something of a
	reputation for being difficult to learn.  Compared to Git,
	Mercurial has a strong focus on simplicity.</para>

      <para id="x_b1">In terms of performance, Git is extremely fast.  In
	several cases, it is faster than Mercurial, at least on Linux,
	while Mercurial performs better on other operations.  However,
	on Windows, the performance and general level of support that
	Git provides is, at the time of writing, far behind that of
	Mercurial.</para>

      <para id="x_b2">While a Mercurial repository needs no maintenance, a Git
	repository requires frequent manual <quote>repacks</quote> of
	its metadata.  Without these, performance degrades, while
	space usage grows rapidly.  A server that contains many Git
	repositories that are not rigorously and frequently repacked
	will become heavily disk-bound during backups, and there have
	been instances of daily backups taking far longer than 24
	hours as a result.  A freshly packed Git repository is
	slightly smaller than a Mercurial repository, but an unpacked
	repository is several orders of magnitude larger.</para>

      <para id="x_b3">The core of Git is written in C.  Many Git commands are
	implemented as shell or Perl scripts, and the quality of these
	scripts varies widely. I have encountered several instances
	where scripts charged along blindly in the presence of errors
	that should have been fatal.</para>

      <para id="x_b4">Mercurial can import revision history from a Git
	repository.</para>


    </sect2>
    <sect2>
      <title>CVS</title>

      <para id="x_b5">CVS is probably the most widely used revision control tool
	in the world.  Due to its age and internal untidiness, it has
	been only lightly maintained for many years.</para>

      <para id="x_b6">It has a centralised client/server architecture.  It does
	not group related file changes into atomic commits, making it
	easy for people to <quote>break the build</quote>: one person
	can successfully commit part of a change and then be blocked
	by the need for a merge, causing other people to see only a
	portion of the work they intended to do.  This also affects
	how you work with project history.  If you want to see all of
	the modifications someone made as part of a task, you will
	need to manually inspect the descriptions and timestamps of
	the changes made to each file involved (if you even know what
	those files were).</para>

      <para id="x_b7">CVS has a muddled notion of tags and branches that I will
	not attempt to even describe.  It does not support renaming of
	files or directories well, making it easy to corrupt a
	repository.  It has almost no internal consistency checking
	capabilities, so it is usually not even possible to tell
	whether or how a repository is corrupt.  I would not recommend
	CVS for any project, existing or new.</para>

      <para id="x_b8">Mercurial can import CVS revision history.  However, there
	are a few caveats that apply; these are true of every other
	revision control tool's CVS importer, too.  Due to CVS's lack
	of atomic changes and unversioned filesystem hierarchy, it is
	not possible to reconstruct CVS history completely accurately;
	some guesswork is involved, and renames will usually not show
	up.  Because a lot of advanced CVS administration has to be
	done by hand and is hence error-prone, it's common for CVS
	importers to run into multiple problems with corrupted
	repositories (completely bogus revision timestamps and files
	that have remained locked for over a decade are just two of
	the less interesting problems I can recall from personal
	experience).</para>

      <para id="x_b9">Mercurial can import revision history from a CVS
	repository.</para>


    </sect2>
    <sect2>
      <title>Commercial tools</title>

      <para id="x_ba">Perforce has a centralised client/server architecture,
	with no client-side caching of any data.  Unlike modern
	revision control tools, Perforce requires that a user run a
	command to inform the server about every file they intend to
	edit.</para>

      <para id="x_bb">The performance of Perforce is quite good for small teams,
	but it falls off rapidly as the number of users grows beyond a
	few dozen. Modestly large Perforce installations require the
	deployment of proxies to cope with the load their users
	generate.</para>


    </sect2>
    <sect2>
      <title>Choosing a revision control tool</title>

      <para id="x_bc">With the exception of CVS, all of the tools listed above
	have unique strengths that suit them to particular styles of
	work.  There is no single revision control tool that is best
	in all situations.</para>

      <para id="x_bd">As an example, Subversion is a good choice for working
	with frequently edited binary files, due to its centralised
	nature and support for file locking.</para>

      <para id="x_be">I personally find Mercurial's properties of simplicity,
	performance, and good merge support to be a compelling
	combination that has served me well for several years.</para>


    </sect2>
  </sect1>
  <sect1>
    <title>Switching from another tool to Mercurial</title>

    <para id="x_bf">Mercurial is bundled with an extension named <literal
	role="hg-ext">convert</literal>, which can incrementally
      import revision history from several other revision control
      tools.  By <quote>incremental</quote>, I mean that you can
      convert all of a project's history to date in one go, then rerun
      the conversion later to obtain new changes that happened after
      the initial conversion.</para>

    <para id="x_c0">The revision control tools supported by <literal
	role="hg-ext">convert</literal> are as follows:</para>
    <itemizedlist>
      <listitem><para id="x_c1">Subversion</para></listitem>
      <listitem><para id="x_c2">CVS</para></listitem>
      <listitem><para id="x_c3">Git</para></listitem>
      <listitem><para id="x_c4">Darcs</para></listitem></itemizedlist>

    <para id="x_c5">In addition, <literal role="hg-ext">convert</literal> can
      export changes from Mercurial to Subversion.  This makes it
      possible to try Subversion and Mercurial in parallel before
      committing to a switchover, without risking the loss of any
      work.</para>

    <para id="x_c6">The <command role="hg-ext-convert">convert</command> command
      is easy to use.  Simply point it at the path or URL of the
      source repository, optionally give it the name of the
      destination repository, and it will start working.  After the
      initial conversion, just run the same command again to import
      new changes.</para>
  </sect1>

  <sect1>
    <title>A short history of revision control</title>

    <para id="x_c7">The best known of the old-time revision control tools is
      SCCS (Source Code Control System), which Marc Rochkind wrote at
      Bell Labs, in the early 1970s.  SCCS operated on individual
      files, and required every person working on a project to have
      access to a shared workspace on a single system.  Only one
      person could modify a file at any time; arbitration for access
      to files was via locks.  It was common for people to lock files,
      and later forget to unlock them, preventing anyone else from
      modifying those files without the help of an
      administrator.</para>

    <para id="x_c8">Walter Tichy developed a free alternative to SCCS in the
      early 1980s; he called his program RCS (Revision Control System).
      Like SCCS, RCS required developers to work in a single shared
      workspace, and to lock files to prevent multiple people from
      modifying them simultaneously.</para>

    <para id="x_c9">Later in the 1980s, Dick Grune used RCS as a building block
      for a set of shell scripts he initially called cmt, but then
      renamed to CVS (Concurrent Versions System).  The big innovation
      of CVS was that it let developers work simultaneously and
      somewhat independently in their own personal workspaces.  The
      personal workspaces prevented developers from stepping on each
      other's toes all the time, as was common with SCCS and RCS. Each
      developer had a copy of every project file, and could modify
      their copies independently.  They had to merge their edits prior
      to committing changes to the central repository.</para>

    <para id="x_ca">Brian Berliner took Grune's original scripts and rewrote
      them in C, releasing in 1989 the code that has since developed
      into the modern version of CVS.  CVS subsequently acquired the
      ability to operate over a network connection, giving it a
      client/server architecture.  CVS's architecture is centralised;
      only the server has a copy of the history of the project. Client
      workspaces just contain copies of recent versions of the
      project's files, and a little metadata to tell them where the
      server is.  CVS has been enormously successful; it is probably
      the world's most widely used revision control system.</para>

    <para id="x_cb">In the early 1990s, Sun Microsystems developed an early
      distributed revision control system, called TeamWare.  A
      TeamWare workspace contains a complete copy of the project's
      history.  TeamWare has no notion of a central repository.  (CVS
      relied upon RCS for its history storage; TeamWare used
      SCCS.)</para>

    <para id="x_cc">As the 1990s progressed, awareness grew of a number of
      problems with CVS.  It records simultaneous changes to multiple
      files individually, instead of grouping them together as a
      single logically atomic operation.  It does not manage its file
      hierarchy well; it is easy to make a mess of a repository by
      renaming files and directories.  Worse, its source code is
      difficult to read and maintain, which made the <quote>pain
	level</quote> of fixing these architectural problems
      prohibitive.</para>

    <para id="x_cd">In 2001, Jim Blandy and Karl Fogel, two developers who had
      worked on CVS, started a project to replace it with a tool that
      would have a better architecture and cleaner code.  The result,
      Subversion, does not stray from CVS's centralised client/server
      model, but it adds multi-file atomic commits, better namespace
      management, and a number of other features that make it a
      generally better tool than CVS. Since its initial release, it
      has rapidly grown in popularity.</para>

    <para id="x_ce">More or less simultaneously, Graydon Hoare began working on
      an ambitious distributed revision control system that he named
      Monotone. While Monotone addresses many of CVS's design flaws
      and has a peer-to-peer architecture, it goes beyond earlier (and
      subsequent) revision control tools in a number of innovative
      ways.  It uses cryptographic hashes as identifiers, and has an
      integral notion of <quote>trust</quote> for code from different
      sources.</para>

    <para id="x_cf">Mercurial began life in 2005.  While a few aspects of its
      design are influenced by Monotone, Mercurial focuses on ease of
      use, high performance, and scalability to very large
      projects.</para>
  </sect1>
</chapter>

<!--
local variables: 
sgml-parent-document: ("00book.xml" "book" "chapter")
end:
-->
Tip: Filter by directory path e.g. /media app.js to search for public/media/app.js.
Tip: Use camelCasing e.g. ProjME to search for ProjectModifiedEvent.java.
Tip: Filter by extension type e.g. /repo .js to search for all .js files in the /repo directory.
Tip: Separate your search with spaces e.g. /ssh pom.xml to search for src/ssh/pom.xml.
Tip: Use ↑ and ↓ arrow keys to navigate and return to view the file.
Tip: You can also navigate files with Ctrl+j (next) and Ctrl+k (previous) and view the file with Ctrl+o.
Tip: You can also navigate files with Alt+j (next) and Alt+k (previous) and view the file with Alt+o.