1. Bitbucket
  2. Public Issue Tracker
  3. master
  4. Issues

Issues

Issue #7723 open

Improve performance of large code reviews (BB-7555)

Seth Moore
created an issue

Currently, if you are viewing a diff that happens to have a bunch of autogenerated files that you don't really care about, site performance is severely impacted.

Example: A current pull request I'm reviewing has a reference.cs file with ~3000 lines of new auto-generated code and a generated wsdl of ~2000 lines of code, along with other programmer code that I actually want to review. This situation basically renders the code review tool worthless.

We've also run into this issue numerous times when we have test files that are in xml (in our case rdl's).

It would be great if the reviewer had the ability to show/hide files at will to improve the site's performance, or some other way to break the diff up into manageable pieces.

Comments (66)

  1. Samuel Haddad

    I second this issue. It would also be great to hide files of an extension type, such as don't automatically render the diff .rdl files. Rdl's while they are just xml, I want them to be treated as binary and not create huge diffs.

  2. Erik van Zijst staff

    I see your issue, but I'm unclear as to what you are suggesting in terms of improvement.

    I see the following options:

    • the ability to collapse individual file diffs (they'd still all be rendered and initially visible and be relatively slow, but then the reviewer can collapse them in javascript)
    • the ability to let the PR creator "uncheck" some files from the diff at create time and those files would be missing in the PR view
    • Refactor the diff page entirely, moving from a all-on-one-page design to something that only ever opens one file diff at a time, maybe with something like a directory tree on the left akin to Crucible or Stash

    What is your opinion on this and what should I raise an internal issue for?

  3. Erik van Zijst staff

    I'm closing this issue for lack of further information.

    We agree that performance of certain PR related operations can be improved and will continue to work on that.

    Feel free to reopen this issue and provide suggestions.

  4. Samuel Haddad

    Erik van Zijst I can follow up with Seth Moore. He is a colleague of mine, I will ping him in person.

    For me I would love some way to tell the diff to ignore certain files. This may not be the end all solution, but it will help greatly with those auto generated files, or files that I want to be treated as binary.

    This is probably were most of our slow rendering comes from.

    The RDL's is a good example. While the file is actually XML and can be diffed, I don't really care about the xml that makes up the file and I want it to be treated as binary, and just tell me the file has changed.

    If I commit 2 new RDL files to my repository that could easily be 4000+ lines of XML that I don't care about at all.

    Web service proxy classes is another good example of a common auto generated class I don't care to read.

    So to me I see two obvious solutions: * Uncheck files you don't care to see diffed. - I feel like this just treats the symptom and not the problem, since there could be a case where you have a large diff and it isn't really auto generated, or you can't really exclude any file.

    *Refactor the diff page entirely and move away from an all-in-one design - This solves the problem in all cases that I can think of. You may still be able to get the all-in-one feel parts of the diff are dynamically loaded and removed.

    This also doesn't assume any importance of a file. While I may want to hide a file I believe is low importance, another code reviewer might want to review it in detail.

    This is just my thoughts, I hope it helps. Keep up the great work guys!

  5. Seth Moore reporter

    Erik van Zijst : Thanks for the follow-up. I think the 2nd option you presented (unchecking files) would solve the issue we're running into.

    Like Samuel Haddad said though, you might still run into this issue if someone has a huge diff with thousands of lines of code changes they're trying to commit (I'd hate to be that reviewer though!)

    I think for the normal case though where the actually code changes aren't in the thousands of lines of code, the ability to let the PR creator "uncheck" some files from the diff at create time would solve the problem, as long as reviewers would still see a list of those ignored files so that they could still go review them if they really want to.

    Like Samuel Haddad said, we love BitBucket so keep up the good work!

  6. Erik van Zijst staff

    Thanks for taking the time to jot down your thoughts so carefully, guys. Much appreciated. We're well aware of the performance issues related to our everything-on-one-page approach and are trying to come up with the right solution and so feedback like this is very helpful.

    I've raised an internal issue for this. I can't give an ETA, but it is very much on our minds.

  7. rfc

    I hit this a similar problem recently - but it does not seem to be precisely the same performance problem - bitbucket is stating that "This merge is too large to display" in the "overview" tab of the pull request. There is also a warning indicator next to the merge button saying "cannot merge due to conflicts", but it is not possible to see what the conflicts are (if any?) using the bitbucket interface.

    more information: my scenario is a pull request of about 15 changesets. In one of the changesets I rename ~280 binary files. The other changesets are small code changes to a small number of files. For this situation, none of us want to read the changeset with the large number of file renames. The binary files in question all have predictable file name extensions / live in predictable places in the repository.

  8. Erik van Zijst staff

    We suffered severe performance issues over the weekend due to a denial of service attack and following from that we had some regressions that we didn't fully resolve until this morning. This resulted in an increased level of "merge too large" errors.

    Now you raise another issue. That of Bitbucket not tracking file renames. This in turn makes small changes balloon to large diff pages if you rename a lot of files.

    Because Bitbucket treats these as real changes, you can get into the diff-too-large territory quite easily.

    We should fix the site so that it can track renames and we already have an issue for that. You can vote for it and watch it to be notified of progress: https://bitbucket.org/site/master/issue/589/file-history-should-follow-copies-and

  9. Mårten Rånge

    I am struggling with Performance during large pull requests. Due to the poor performance experienced with lot of auto-generated files we try to find ways to split prs but that is unfortunate because then we start sub-optimizing our process because of tool performance.

    I personally would have been happy with an option on the user (ie me) that doesn't expand any source files during PR and I then expand them on demand (ignoring files I know are auto-generated)

  10. Ismail Ege Akpinar

    I second this issue. Github does a clever thing (at least for iOS projects) by hiding such large diffs (e.g. pbxproj in an Xcode project). Would be great if bitbucket did a similar optimisation

  11. Seth Moore reporter

    I'm viewing a large pull request now and every file in the diff now says "This diff was too big for us to render on page load. Click here to try to load it", so I can just choose which files I want to view.

    Is this the fix or something unrelated?

    It might also just be happening because all of my binary un-diffable files are listed first.

  12. Jon

    I know I'm late to the party, but here's my take on it:

    Coming from Crucible, I feel a 1-at-a-time workflow works very well. I'd +1 that approach.

    I'd very much support an interface that allows you to toggle individual files. You wouldn't need to load/render the whole PR if toggling a file on was done via AJAX. Then folks that need to see more than one file at once could do so, whether loooong-page scroll fashion like current PR layout (-1), or as tabs/splitpane (+1).

    As an aside, since I liked Crucible so much I applied a user-stylesheet to my BB PR pages with the Chrome Stylebot plugin with this CSS. It approximates the Crucible layout. Yay for duck punching punch

    #compare-summary > h1 {
        border-top: none;
    }
    
    #content {
        margin-right: 0;
        width: 70%;
    }
    
    #edit-pullrequest {
        border-left-width: 1px;
        border-radius: 3.01px 0 0 3.01px;
    }
    
    #fulfill-pullrequest {
        display: none;
    }
    
    #pullrequest-diff > section.main {
        background: white;
        border: solid 1px #DDD;
        border-left: none;
        bottom: 0;
        left: 0;
        overflow: scroll;
        padding: 20px;
        position: fixed;
        top: 210px;
        width: 26%;
        z-index: 99;
    }
    
  13. Jon

    exclamation oh, and if PR's ever change from being one long page, I think it would make my workflow cough up blood without a way to search within the pull request. I currently use Find-on-page frequently and would be lost w/o it.

  14. Mårten Rånge

    Personally I am at the point where I can accept any solution that allows me to control what part of the PR is expanded; repository settings, user settings, pr settings, secret query string parameters I don't care just as long as the PR to starts as unexpanded.

    If it was my choice this performance issue alone would make me look for alternatives to bitbucket. It's not nice to threaten to switch but when you can't work the way you want to because of tool quality it's frustrating.

  15. Shyam Guthikonda

    Running into the same issue here.

    We have 20000 line long YAML files for some of our data. It is more efficient to keep these files as text (as opposed to binary) for delta compression considerations. Often when a single GUID in the application changes, multiple references in this YAML file need to be updated. Due to the large size, the commit view in Bitbucket is pretty much unusable, so we are unable to continue code reviews until this issue is resolved.

    One temporary solution that I am currently contemplating is to split these types of files into their own separate repository, and manage them through a Mercurial subrepo. In our main repo, we won't have to deal with these long files slowing down the diff view. Not a great solution, but it's the only thing I can come up with (aside from injecting null bytes into the files to trick Mercurial into thinking they are binary files... which I'd rather not do) until this issue gets resolved.

    Thanks.

  16. David Allen

    Bitbuckets code review feels just good enough to be useful, but with lots of room for improvement.

    Crucible wasn't perfect, but it was much better than this and I have never understood why since it can work with git why it isn't an OnDemand option. Can't you looks for the best parts of that UI even if the implementation isn't suitable?

    An option to be able to choose files to exclude from the review would be better than nothing, but isn't a good long-term solution.

    Some sort of interface based on a list of files to select from in a panel or instantly available popup so files can be reviewed one at a time would be better.

    However, it seems a single page with 5000 lines across 20 files and 10 commits should perform significantly better than it does.

    Adding automatic links to related JIRA issues would be helpful.

    Code review usability is my single biggest gripe with Bitbucket and improvements would be very welcome.

  17. Tim Bevan

    Suffering from the "too big" issue and the performance of the large pull requests.

    Also finding that Firefox has a tendency of locking up for considerable periods. Chrome is not so bad.

    Again it would be nice to exclude specific files from the review (or even commits), but in some cases we really do have large changes (re-factoring!).

    Thanks.

  18. Mattias Seebergs

    Also suffering from this issue. I would really appreciate an option to have them all collapsed when the page is rendered and instead fetched dynamically on demand. When I replied on a comment yesterday my Chrome tab hang every now and then because of this, very frustrating. What's the latest on this?

  19. Dylan Bevan

    We have the same problem with the .edmx files that Visual Studio automatically generates. If we have a pull request that has any database changes it won't display the actual code changes because the web page's "diff display quota" is all taken up by the generated files that we don't care about.

  20. Nick Geiger

    I'm new to bitbucket, but I've never had this problem before on github, so maybe you could look into their solution.

    In my case, I'm trying to view large diffs for pull requests and it's common for a PR to have several commits and lots of files changed.

    First of all, the anchor link from the file changed to the diff for that file never seems to work (goes nowhere).

    Second, if I scroll down, at some point the diffs are suppressed with a message "This diff was too big for us to render on page load. Click here to try to load it"

    I can usually click the "try to load it" links for a few files to load them and then I start to get: "While impressive, your diff is just too big!" trying to load the diff for any/all individual files.

    Thanks.

  21. Erik van Zijst staff

    Hey Nick Geiger I'm happy to have a look at a specific pull request to get a feel for how big these diffs really are and whether we're showing those messages incorrectly. Assuming your repo is private, can you email us the link of the pull request at support@bitbucket.org and mention this issue?

  22. Rob Widdick

    Bumping this: It would be great to list all files - without the source - and give the reviewer the option to "view changes" instead of outputting the source automatically. We've had pull requests which contained a ton of commits and ASCII file changes of which Bitbucket will then try to output the source of every file, leading to chrome crashing. Unacceptable behavior.

  23. Kevin Conaway

    Erik van Zijst Atlassian support forwarded me here relating to an internal ticket I opened (BBS-21792).

    That issue contains a link to a branch where we are running in to issues showing diffs for some branches.

    I think a few things could help improve the UX:

    • Don't automatically show the file contents for deletes.
    • Better move support. As of now, Bitbucket will show a delete and a new file when a file is merely moved.

    At a minimum, there should be an error message that displays that indicates that the diff is too large. Right now, I see all of the affected files in the table of contents but clicking on some of them does nothing because the individual diff for that file isn't loaded on the page.

  24. shaunequilife

    We just started using BitBucket and we have issues with this as well. We have an app the auto-generates html files with hashes in the file names. We keep these in the repo so we can compare the output with any changes to the code. However, when looking at one of the pages that does a diff, it opens and displays not only the new generated files in their entirety, but all the code that was removed from the previous iteration. By the time it gets to do a diff on some relevant code changes it will not display.

    Can you add a setting that they are all closed on default, and then you can open them manually. Or at the very least, why display ALL the code in a diff for a file you are removing?

  25. Nick Barone

    Adding my vote/voice to this.

    What I'd like is to be able to supply a list of extensions and/or set a maximum number of lines for my user and/or project. If a file has one of those extensions, and/or is longer than the line length, it is not automatically loaded when the PR is loaded. Instead, I see "This diff was too big for us to render on page load. Click here to try to load it".

    I experience significant performance issues when reviewing large pull-requests, and I think it's all performance issues on my browser (Chrome) rather than anything to do with the server.

  26. Chad Windnagle

    Hi All

    I'm experiencing this issue and the diff / pull-request merge view is unusable (not just slow) in Chrome. We have some auto generated files from Grunt / asset build tools, and when the built files are compiled and included in a pull request, the diff takes so long to load that Chrome says the javascript has crashed and asks to kill the page.

    The end result is, I can't even merge pull requests from the UI as a result. Not sure what the best user-interface solution is, but something needs to be done because this is piling up as making Bitbucket unsable for these types of commits.

    I know on Beanstalk (beanstalkapp.com) they load large commits with a "oops this commit is to big to be previewed" message, and then require the user to expand previews for large files. That'd at least make this usable.

    Chad

  27. Dave H

    Rob Widdick Wasn't using a dependency, this happened on generated html templates, which was included in versioning from original developers, who used github, which had no problems with the pull request diffs. We had to put the folder on gitignore because of the BitBucket problem.

  28. Chris Fairclough

    This is an important issue. I would guess / have some evidence that that Bitbucket's being dropped in favour of other tools because of this problem. The integration with JIRA is neat but if every time someone has to do a code review they have to sit and wait until the site gets it's finger out people will find alternatives pretty quickly.

  29. Dave H

    Chris Fairclough I notice that your code that shows the "couldn't load" error (which we got) was for three.js, which is like 150MB, so I wouldn't expect BitBucket to show that in the browser, and clicking "give it another chance" shows "There's simply too much in this diff for us to render it all" which I'd agree with, though showing its size would be nice.

    The problem we're running into are files that are up to 300KB in size each (less than 10 of them), and I'm guessing BitBucket just has a download cap less than that, though I'm not sure where to see that.

  30. Chad Windnagle

    Rob Widdick Sorry just saw your reply. These aren't dependencies as much as they are compiled assets. I don't want to .gitignore them because I use git to actually deploy the file to our production system, so having them ignored would not be suitable.

  31. Dave H

    Chad Windnagle I think your compiled assets may be better in a git-submodule (since you're already versioning them, this would at least keep those changes separate from the top-level repo changes), then I think BitBucket may not look at them as a diff in the top-level repo.

    BitBucket does have a link to click on the "couldn't load" error to try again (like the Beanstalk expand action), but that of course has a limit as I mentioned in my last comment to Chris.

  32. Chris Fairclough

    Dave H Not sure what you mean - I didn't post any code, maybe you meant to @ someone else. I don't think it's useful defining what size files are seeing this problem as most of our pull requests see issues, and some have impossible usability issues. Github is better but still not perfect. Even large PRs should be reviewable. Crucible or other tools are a fix but if BB had robust code review capability it would be a winning differentiator.

    I would think there's plenty of avenues of investigation to improve performance on this - the site could use javascript caching with an additional server-side cache generated from new commits. Look at the way facebook loads timelines dynamically. Scrolling through a PR should be smooth with no visible loading times.

    EDIT safari perf seems to be much better. I'm using chrome on OSX.

  33. Chris Fairclough

    Hmm ok fair enough Dave H. That's not what I was referring to however, I've seen problems in most standard size pull requests, and was trying to describe the issue in general as it seems many others are seeing similar issues especially on chrome.

  34. Eyal Roth

    I'm just about to review a PR with 123 modified files, and the performance of the page is simply unbearable.

    It's probably worth mentioning that this performance issue is present only with Chrome (Incognito as well); not with Firefox, IE 11, nor Edge.

    I'm running on Windows 10.

    Update:

    After re-installing Windows 10 the problem has been solved. My previous installation was an upgrade from Windows 8.1, and that perhaps caused an unknown issue.

  35. Tony Banik

    A nice way to do this would be to add a modifier to .gitignore like @these/autogenerated/files/*

    and then those files matched would be hidden under a dropdown or something

  36. Log in to comment