Supporting a Fast-Forward-Only Pull Request Workflow with Interdiffs
- Supporting a Fast-Forward-Only Pull Request Workflow with Interdiffs
- Value of fast-forward merge workflow
- Supporting a fast-forward merge workflow
- Implementing interdiffs in pull requests
- Related work/links
Fast-forward-only merge workflows improve the relevance of code reviews, protect against unexpected merge results, and improve the readability of commit history. Incremental reviews mitigate the rebase contention that is inherent to this workflow.
A pull-request-based Git host, Stash, is contrasted with two popular code review tools (Gerrit and ReviewBoard) that are not based on pull requests. Strategies to implement incremental reviews (interdiffs) in pull requests are discussed.
Even without enforcing a fast-forward-only merge workflow, adapting Gerrit's incremental review capability to the pull request workflow can bring significant benefit.
Value of fast-forward merge workflow
A fast-forward merge results when a feature branch is merged back into its original branch without any commits having meanwhile been pushed to the original branch. If commits have since been added to the original branch from some other source, one can rebase a feature branch, transforming it to a state that can be fast-forward-merged.
There are at least two major motivations to restrict a target branch to fast-forward-only ("ff-only") merges:
- Merge transparency
- History readability
The final step of a pull request often involves clicking a "Merge" button in the Git host's web interface. In a non-ff situation, this step can suddenly result in the production of content that is different than the content under review. The Git host may alter some files, if it determines that it can perform a "conflict-free" merge.
The automated merge process typically operates on a line-by-line basis, without considering the overall structure particular to the file format or programming language at hand. A merge that succeeds without textual conflicts may not remain unscathed by semantic conflicts.
Assume that the target branch is shared by many people. Build breakage or functional breakage can bring development on the branch to a standstill. Workflows are disrupted while attention is diverted to the emergency of repairing the branch.
In a ff-only merge, only the exact commit content and history that has been vetted by reviewers ever makes it into the target branch. The review in this case is completely germane, as the code merged into the target branch represents the code under review with 100% correspondence.
I considered a number terms to describe the concept of "only merging exactly what was reviewed":
- Review correspondence
- Review fidelity
- Review pertinence
- Review relevace
I'm still not sure which of these most succinctly captures the idea.
Arguably, the highly linear history produced by enforcing ff-only merges into a target branch is easier to comprehend than a history replete with merges. This is not to say that merge commits cannot find their way into the target branch's history, however, if performed fully within the feature branch before merging the pull request.
Supporting a fast-forward merge workflow
A fundamental requirement for a ff-only merge workflow is that the rebase cycle must be shorter than the time between commits made to the target branch by the organization at large. Rebase contention results when the commit frequency to the target branch is too high or the rebase cycle is too slow. To minimize the annoyance of rebase contention, the average period between commits to the target branch should be lower than the average rebase cycle time by a considerable margin.
The rebase cycle consists of:
- performing the rebase locally and updating the pull request
- obtaining re-approval by reviewers
For the purpose of this discussion, let us consider the continuous integration (CI), or "build" server as a reviewer. The CI server automatically gives its approval to a review upon a successful build.
In this workflow, the rebase cycle is the last phase in the lifecycle of a pull request into a ff-only branch. It begins after reviewers are satisfied with the content of the developer's pull request and the developer attempts to merge into the target branch. If commits have been made to the target branch since the pull request was opened, the developer must rebase upon the new tip of the target branch.
The act of rebasing will, in general, change the content of the pull request, and the developer may seek re-approval by reviewers (especially the build server).
If the build takes too long, or if the human reviewers are unable to re-submit their approval in a timely manner, it may happen that another commit is made to the target branch, forcing yet another rebase of the feature branch. Let us describe this condition as rebase contention.
In the worst case, a pull request may never get merged into the target branch, being stymied by an endless cycle of rebases. In a productive workflow, the bulk of the pull request duration should not be spent trying to actively merge into the target branch.
Note: It is not unheard of for pull requests to remain open for a long time (e.g. while a reviewer is on vacation or the work is de-prioritized). Additionally, a developer may iterate several times on the pull request in response to reviewer feedback. This time does not contribute to what we define as the rebase cycle.
Enter: incremental reviews
The rebase cycle time may be reduced by the implementation of incremental reviews for the human reviewers. Equally important is the minimization of build time, one strategy for which is incremental builds. However, build optimization strategies are beyond the scope of this document.
Gerrit and ReviewBoard
These interdiffs facilitate an efficient iterative review process, allowing reviewers to concentrate only on the changes made in response to their feedback, instead of requiring re-examination a developer's entire pending changeset.
In contrast with a pull request workflow, ReviewBoard is "patch-oriented" rather than "branch-oriented". Because its code review lacks full integration with the authoritative Git host, it is possible for different code to be merged than was under review. This is a "review correspondence" problem even more significant than that of a pull request workflow in which non-fast-forward merges are allowed.
Gerrit is similar in its patch-oriented workflow and support for diffs-of-diffs, but it at least also serves as an authoritative Git host, closing the review correspondence gap. This video demonstrates a typical Gerrit workflow.
Code reviews in Gerrit revolve around entities called changes, which are generated by and result in a single commit to the repository. Gerrit changes may be revised over the course of several patch sets.
In Gerrit, you must review one commit at a time. Dependence across commits (which is represented in Git as a "parent" relationship) is tracked by "change IDs" in Gerrit. These change IDs must be referenced in commit messages when pushing to the Gerrit server.
Git hosts that support pull requests (e.g. GitHub, GitLab, Stash, and BitBucket) allow one not only to review the file content of a patch, but also review and preserve the exact commit messages and commit graph structure the developer has carefully crafted in their feature branch.
Implementation of interdiffs in a pull request workflow would win the biggest selling point of Gerrit without its limitations regarding the preservation of commit graph structure.
Currently, the best approximation of "incremental diffs" that Stash has to offer is by additive commits, having commit messages such as "Addressed John Q. Reviewer's feedback". Of course, you don't want a log message like that ending up in your repository's permanent history.
A pull request should give reviewers transparency to and voice in the process of crafting of a portion of commit history. In contrast with Gerrit, this means that that the unit of review must be able to span more than a single commit. But in contrast with the current Stash implementation, we must be able to see changes to the set of commits in aggregate.
Stash has done well to at least introduce the concept of the "rescope" (described below). They must take this concept further by allowing the author to annotate the rescope action with text like "Squashing commits in preparation for final merge" and to view a diff against the pre-rescope content.
General utility of interdiffs
Even without enforcing a fast-forward-only merge workflow, incremental reviews would be useful. Rebasing can be a common operation, especially to squash all of those "Addressed John Q. Reviewer's feedback" commit messages before final submission. Stash reviewers are still on the hook to re-review and approve the changeset after such a rebase.
Implementing interdiffs in pull requests
This section first defines some relevant terms, then goes into Stash implementation specifics.
Let us define a rescope as a modification of pull request content. Two cases must be considered in computing interdiffs for pull request rescopes:
- static base
- dynamic base
The base commit is the the first commit in the series spanned by the pull request, i.e. the greatest ancestor commit whose parent is not considered a member of the pull request.
Static base rescopes
The static base case may involve strictly additive commits to the existing series, or it may involve an interactive rebase contained within the series of commits spanned by the pull request. This is the simpler interdiff to compute, as it reduces to a mere diff between the former tip of the feature branch and the new tip:
$ git diff <old_tip_sha1> <new_tip_sha1>
A rescope with a static base may result when:
Fixing a bug
Reviewers suggest code changes, which may call for an extra commit or the re-write of an existing commit.
Cleaning up branch history (e.g. squashing)
Developers may remove superfluous commits at the suggestion of reviewers.
Fixing commit messages
Reviewers may request that commit messages be made more descriptive, or perhaps that they include a reference to the issue tracker.
Dynamic base rescopes
The dynamic base commit case results when the pull request is rescoped by rebasing the feature branch upon a new target branch tip. This case is integral to the rebase cycle. Computing diffs in this case is not as simple as diffing the pre- and post-rebase feature branch tips.
The pre-rebase aggregate patch set must be computed and diffed against the post-rebase aggregate patch set. Implementation in ReviewBoard and Gerrit is more direct here, since they already operate on patch sets rather than commit sequences within a review.
However, it is straightforward to obtain these two patchsets with Git and to pass them to the
interdiff program (available in the
patchutils Ubuntu package):
$ interdiff <(git diff <old_base_commit> <old_feature_branch_tip>)\ <(git diff <new_base_commit> <new_feature_branch_tip>)
TODO: Check whether the above
interdiff command generates the same output as this multistep git procedure.
Stash implementation considerations
I've moved my plugin writeup to a separate document.
- Stash feature request: Diff between original commit and amended/rebased commit
- I have not evaluated Phabricator, which is another open-source Git host along the lines of GitLab.
- Git team workflows: merge or rebase?
- The State of GitHub's Code Review
- I have not tried PullReview, but they have a sweet logo.
- You may be interested in my open source Stash plugins on the Atlassian Marketplace.
- Git host: A server that hosts a centralized, "authoritative" clone of the repository, while providing access control and other services
- feature branch: The "source" branch in a pull request
- target branch: The branch from which the "source" branch was originally derived and will be merged back into
- review correspondence: the degree to which the content that was merged matches the content that was code reviewed
- interdiff: A diff between two diffs
- rebase cycle: Iteration on a pull request in response to reviewer feedback or preparation for a merge into a fast-forward-only branch
- rebase contention: The phenomenon in a fast-forward-only workflow in which a developer is forced to rebase repeatedly when other developers "make it in first" to merge to the target branch
- pull request rescope: The action of rebasing a branch that is involved in a pull request
- static base commit: The rescope case in which the feature branch is modified only additively
- dynamic base commit: The rescope case in which the feature branch was rebased
A generalized mitigation strategy for rebase contention
Say that a large team of developers all regularly commit to the same target branch. What happens of the mitigation strategies of incremental reviews and incremental builds are still not enough to shorten the rebase cycle?
If the frequency of commits made directly to the target branch is too high, a tiered branch hierarchy may be adopted to reduce contention. Note that because shared branches should not be rebased, the "feeder branches" to the toplevel target branch must relax the ff-only constraint. The hierarchy should be exactly deep enough to reduce the merge period to within the safety factor of the re-build/re-review cycle.
More on merge conflicts
Note: Not sure where this section would fit in yet, or if it even applies to my argument
Resolving merge conflicts can be a risky and time consuming consequence of branching, especially if you are forced to merge code that you didn't write. One must sometimes seek out original developer for their domain expertise to resolve the merge.
Even the original developer may have difficulty merging their own code when time has passed. For example, a developer may change a numeric constant in one branch, then some time later, in a different branch, make a whitespace change (indentation) to the file, touching the same line. This could mask the change made to the constant in the merge conflict view. Such small resolution details can be easy to overlook when resolving a large set of conflicts.