Issue #7723 open

Improve performance of large code reviews (BB-7555)

sethmoore
created an issue

Currently, if you are viewing a diff that happens to have a bunch of autogenerated files that you don't really care about, site performance is severely impacted.

Example: A current pull request I'm reviewing has a reference.cs file with ~3000 lines of new auto-generated code and a generated wsdl of ~2000 lines of code, along with other programmer code that I actually want to review. This situation basically renders the code review tool worthless.

We've also run into this issue numerous times when we have test files that are in xml (in our case rdl's).

It would be great if the reviewer had the ability to show/hide files at will to improve the site's performance, or some other way to break the diff up into manageable pieces.

Comments (42)

  1. Samuel Haddad

    I second this issue. It would also be great to hide files of an extension type, such as don't automatically render the diff .rdl files. Rdl's while they are just xml, I want them to be treated as binary and not create huge diffs.

  2. Erik van Zijst staff

    I see your issue, but I'm unclear as to what you are suggesting in terms of improvement.

    I see the following options:

    • the ability to collapse individual file diffs (they'd still all be rendered and initially visible and be relatively slow, but then the reviewer can collapse them in javascript)
    • the ability to let the PR creator "uncheck" some files from the diff at create time and those files would be missing in the PR view
    • Refactor the diff page entirely, moving from a all-on-one-page design to something that only ever opens one file diff at a time, maybe with something like a directory tree on the left akin to Crucible or Stash

    What is your opinion on this and what should I raise an internal issue for?

  3. Erik van Zijst staff

    I'm closing this issue for lack of further information.

    We agree that performance of certain PR related operations can be improved and will continue to work on that.

    Feel free to reopen this issue and provide suggestions.

  4. Samuel Haddad

    Erik van Zijst I can follow up with sethmoore. He is a colleague of mine, I will ping him in person.

    For me I would love some way to tell the diff to ignore certain files. This may not be the end all solution, but it will help greatly with those auto generated files, or files that I want to be treated as binary.

    This is probably were most of our slow rendering comes from.

    The RDL's is a good example. While the file is actually XML and can be diffed, I don't really care about the xml that makes up the file and I want it to be treated as binary, and just tell me the file has changed.

    If I commit 2 new RDL files to my repository that could easily be 4000+ lines of XML that I don't care about at all.

    Web service proxy classes is another good example of a common auto generated class I don't care to read.

    So to me I see two obvious solutions: * Uncheck files you don't care to see diffed. - I feel like this just treats the symptom and not the problem, since there could be a case where you have a large diff and it isn't really auto generated, or you can't really exclude any file.

    *Refactor the diff page entirely and move away from an all-in-one design - This solves the problem in all cases that I can think of. You may still be able to get the all-in-one feel parts of the diff are dynamically loaded and removed.

    This also doesn't assume any importance of a file. While I may want to hide a file I believe is low importance, another code reviewer might want to review it in detail.

    This is just my thoughts, I hope it helps. Keep up the great work guys!

  5. sethmoore reporter
    • changed status to open

    Sorry, I was a little late getting back to you. I had turned off email alerts so I didn't realize Erik was waiting on my response.

  6. sethmoore reporter

    Erik van Zijst : Thanks for the follow-up. I think the 2nd option you presented (unchecking files) would solve the issue we're running into.

    Like Samuel Haddad said though, you might still run into this issue if someone has a huge diff with thousands of lines of code changes they're trying to commit (I'd hate to be that reviewer though!)

    I think for the normal case though where the actually code changes aren't in the thousands of lines of code, the ability to let the PR creator "uncheck" some files from the diff at create time would solve the problem, as long as reviewers would still see a list of those ignored files so that they could still go review them if they really want to.

    Like Samuel Haddad said, we love BitBucket so keep up the good work!

  7. Erik van Zijst staff

    Thanks for taking the time to jot down your thoughts so carefully, guys. Much appreciated. We're well aware of the performance issues related to our everything-on-one-page approach and are trying to come up with the right solution and so feedback like this is very helpful.

    I've raised an internal issue for this. I can't give an ETA, but it is very much on our minds.

  8. reuben fletcher-costin

    I hit this a similar problem recently - but it does not seem to be precisely the same performance problem - bitbucket is stating that "This merge is too large to display" in the "overview" tab of the pull request. There is also a warning indicator next to the merge button saying "cannot merge due to conflicts", but it is not possible to see what the conflicts are (if any?) using the bitbucket interface.

    more information: my scenario is a pull request of about 15 changesets. In one of the changesets I rename ~280 binary files. The other changesets are small code changes to a small number of files. For this situation, none of us want to read the changeset with the large number of file renames. The binary files in question all have predictable file name extensions / live in predictable places in the repository.

  9. Erik van Zijst staff

    We suffered severe performance issues over the weekend due to a denial of service attack and following from that we had some regressions that we didn't fully resolve until this morning. This resulted in an increased level of "merge too large" errors.

    Now you raise another issue. That of Bitbucket not tracking file renames. This in turn makes small changes balloon to large diff pages if you rename a lot of files.

    Because Bitbucket treats these as real changes, you can get into the diff-too-large territory quite easily.

    We should fix the site so that it can track renames and we already have an issue for that. You can vote for it and watch it to be notified of progress: https://bitbucket.org/site/master/issue/589/file-history-should-follow-copies-and

  10. Mårten Rånge

    I am struggling with Performance during large pull requests. Due to the poor performance experienced with lot of auto-generated files we try to find ways to split prs but that is unfortunate because then we start sub-optimizing our process because of tool performance.

    I personally would have been happy with an option on the user (ie me) that doesn't expand any source files during PR and I then expand them on demand (ignoring files I know are auto-generated)

  11. Ege AKPINAR

    I second this issue. Github does a clever thing (at least for iOS projects) by hiding such large diffs (e.g. pbxproj in an Xcode project). Would be great if bitbucket did a similar optimisation

  12. sethmoore reporter

    I'm viewing a large pull request now and every file in the diff now says "This diff was too big for us to render on page load. Click here to try to load it", so I can just choose which files I want to view.

    Is this the fix or something unrelated?

    It might also just be happening because all of my binary un-diffable files are listed first.

  13. Jon Anderson

    I know I'm late to the party, but here's my take on it:

    Coming from Crucible, I feel a 1-at-a-time workflow works very well. I'd +1 that approach.

    I'd very much support an interface that allows you to toggle individual files. You wouldn't need to load/render the whole PR if toggling a file on was done via AJAX. Then folks that need to see more than one file at once could do so, whether loooong-page scroll fashion like current PR layout (-1), or as tabs/splitpane (+1).

    As an aside, since I liked Crucible so much I applied a user-stylesheet to my BB PR pages with the Chrome Stylebot plugin with this CSS. It approximates the Crucible layout. Yay for duck punching punch

    #compare-summary > h1 {
        border-top: none;
    }
    
    #content {
        margin-right: 0;
        width: 70%;
    }
    
    #edit-pullrequest {
        border-left-width: 1px;
        border-radius: 3.01px 0 0 3.01px;
    }
    
    #fulfill-pullrequest {
        display: none;
    }
    
    #pullrequest-diff > section.main {
        background: white;
        border: solid 1px #DDD;
        border-left: none;
        bottom: 0;
        left: 0;
        overflow: scroll;
        padding: 20px;
        position: fixed;
        top: 210px;
        width: 26%;
        z-index: 99;
    }
    
  14. Jon Anderson

    exclamation oh, and if PR's ever change from being one long page, I think it would make my workflow cough up blood without a way to search within the pull request. I currently use Find-on-page frequently and would be lost w/o it.

  15. Mårten Rånge

    Personally I am at the point where I can accept any solution that allows me to control what part of the PR is expanded; repository settings, user settings, pr settings, secret query string parameters I don't care just as long as the PR to starts as unexpanded.

    If it was my choice this performance issue alone would make me look for alternatives to bitbucket. It's not nice to threaten to switch but when you can't work the way you want to because of tool quality it's frustrating.

  16. doggan

    Running into the same issue here.

    We have 20000 line long YAML files for some of our data. It is more efficient to keep these files as text (as opposed to binary) for delta compression considerations. Often when a single GUID in the application changes, multiple references in this YAML file need to be updated. Due to the large size, the commit view in Bitbucket is pretty much unusable, so we are unable to continue code reviews until this issue is resolved.

    One temporary solution that I am currently contemplating is to split these types of files into their own separate repository, and manage them through a Mercurial subrepo. In our main repo, we won't have to deal with these long files slowing down the diff view. Not a great solution, but it's the only thing I can come up with (aside from injecting null bytes into the files to trick Mercurial into thinking they are binary files... which I'd rather not do) until this issue gets resolved.

    Thanks.

  17. David Allen

    Bitbuckets code review feels just good enough to be useful, but with lots of room for improvement.

    Crucible wasn't perfect, but it was much better than this and I have never understood why since it can work with git why it isn't an OnDemand option. Can't you looks for the best parts of that UI even if the implementation isn't suitable?

    An option to be able to choose files to exclude from the review would be better than nothing, but isn't a good long-term solution.

    Some sort of interface based on a list of files to select from in a panel or instantly available popup so files can be reviewed one at a time would be better.

    However, it seems a single page with 5000 lines across 20 files and 10 commits should perform significantly better than it does.

    Adding automatic links to related JIRA issues would be helpful.

    Code review usability is my single biggest gripe with Bitbucket and improvements would be very welcome.

  18. Tim Bevan

    Suffering from the "too big" issue and the performance of the large pull requests.

    Also finding that Firefox has a tendency of locking up for considerable periods. Chrome is not so bad.

    Again it would be nice to exclude specific files from the review (or even commits), but in some cases we really do have large changes (re-factoring!).

    Thanks.

  19. Mattias Seebergs

    Also suffering from this issue. I would really appreciate an option to have them all collapsed when the page is rendered and instead fetched dynamically on demand. When I replied on a comment yesterday my Chrome tab hang every now and then because of this, very frustrating. What's the latest on this?

  20. Dylan Bevan

    We have the same problem with the .edmx files that Visual Studio automatically generates. If we have a pull request that has any database changes it won't display the actual code changes because the web page's "diff display quota" is all taken up by the generated files that we don't care about.

  21. Nick Geiger

    I'm new to bitbucket, but I've never had this problem before on github, so maybe you could look into their solution.

    In my case, I'm trying to view large diffs for pull requests and it's common for a PR to have several commits and lots of files changed.

    First of all, the anchor link from the file changed to the diff for that file never seems to work (goes nowhere).

    Second, if I scroll down, at some point the diffs are suppressed with a message "This diff was too big for us to render on page load. Click here to try to load it"

    I can usually click the "try to load it" links for a few files to load them and then I start to get: "While impressive, your diff is just too big!" trying to load the diff for any/all individual files.

    Thanks.

  22. Erik van Zijst staff

    Hey Nick Geiger I'm happy to have a look at a specific pull request to get a feel for how big these diffs really are and whether we're showing those messages incorrectly. Assuming your repo is private, can you email us the link of the pull request at support@bitbucket.org and mention this issue?

  23. Rob Widdick

    Bumping this: It would be great to list all files - without the source - and give the reviewer the option to "view changes" instead of outputting the source automatically. We've had pull requests which contained a ton of commits and ASCII file changes of which Bitbucket will then try to output the source of every file, leading to chrome crashing. Unacceptable behavior.

  24. Log in to comment