Illuminated Cloud Freezes IntelliJ, Requiring Forced Quit

Issue #1243 resolved
quietopus created an issue

For the last couple of weeks, this happens roughly once or more per session while editing Apex code. Attached stack trace log from idea.log, obtained manually after forced shutdown of IDE.

Comments (18)

  1. quietopus reporter

    Same edit consistently causes freeze, even after after invalidating caches, restarting and waiting for indexing to complete.

  2. Brandt Elison

    I’m seeing the exact same problem that you describe. There’s a specific method that I cannot delete without IntelliJ locking up. It doesn’t matter if I delete the method all at once or if I comment it out or start deleting it one character at a time.

    While frozen IntelliJ doesn’t appear to be doing anything:

    Even remove all references to the method before removing the method isn’t fixing the problem.

  3. Scott Wells repo owner

    Can you verify that you're on the most recent build of the IDE? I fixed a few deadlock issues a few builds back. If you're on the latest version and you're still seeing this, please provide the files found under Help>Show Log in Explorer/Finder/Files for review, specifically the thread dumps.

  4. Brandt Elison

    I hadn’t upgraded in a while. I have upgraded now and I’ll let you know if I can reproduce the problem.

  5. Scott Wells repo owner

    Thanks, Xander. That thread dump does help. I'll take a look and see if I can resolve this issue.

  6. Scott Wells repo owner

    Xander, looking at this thread dump, it appears that the issue is that the Salesforce CLI is running interminably. Basically I see a few threads (including the event dispatch thread) waiting on a common mutex, and that mutex is owned by a thread that is executing sfdx force:org:list and waiting for it to complete. Once that CLI invocation completes, all of this will unlock properly, but it sounds like perhaps the CLI never completes...or is taking a VERY long time to complete.

    Since it sounds like you're able to reproduce this reliably, can you get it into this state and see what's going on with the CLI at that point? You'll be looking for sfdx and node processes that are either still running or suspended.

    You might also try a few other things here. The first would be to run sfdx force:org:list --clean to remove all stale org entries. Then run sfdx force:org:list again and look for any orgs with a CONNECTED STATUS that's not Connected. Remove the corresponding files for those...they're named based on the value for USERNAME...from ~/.sfdx. Users who have 3-5 minute execution times for sfdx force:org:list have seen it drop to just a few seconds after doing that type of explicit clean-up.

    If that doesn't do it, how long is it taking for sfdx force:org:list to run from the command-line? If it still takes a long time, I'd recommend a scorched earth reinstall of the Salesforce CLI. If it completes quickly but is still causing IC to hang, please enable debug logging for Salesforce DX, reproduce the issue, and provide the resulting idea.log at the point that it's hanging so that I can see what's going on.

    Thanks!

  7. Xander Victory

    Hiya Scott. Yes sfdx takes a while to start up & do its thing (this is because it does a filesystem scan for its commands and I have 11 orgs configured). Just now it’s taking ~10s to show the list.

    The time it takes however is not my point. From one developer to another, you should never be running external processes from a UI thread. It really does not matter how quickly it “normally” runs

  8. Scott Wells repo owner

    Hi, Xander. Yes, I'm definitely aware of how the EDT should and should not be used to avoid issues with repaints. In the thread dumps you've provided, the EDT isn't running the external process; it's waiting for a semaphore owned by another thread that's running the external process:

    "AWT-EventQueue-0 2019.2#IU-192.5728.98 IDEA, eap:false, os:Windows 10 10.0, java-version:JetBrains s.r.o 11.0.3+12-b304.10 2019.2#IU-192.5728.98 IDEA, eap:false, os:Windows 10 10.0, java-version:JetBrains s.r.o 11.0.3+12-b304.10" prio=0 tid=0x0 nid=0x0 blocked
         java.lang.Thread.State: BLOCKED
     on com.illuminatedcloud.intellij.settings.project.IlluminatedCloudConnection@6777e3a7 owned by "JobScheduler FJ pool 2/11" Id=50
        at com.illuminatedcloud.intellij.settings.project.IlluminatedCloudConnection.config(SourceFile:234)
    

    and:

    "JobScheduler FJ pool 2/11" prio=0 tid=0x0 nid=0x0 waiting on condition
         java.lang.Thread.State: TIMED_WAITING
        at java.base@11.0.3/java.lang.Thread.sleep(Native Method)
        at com.illuminatedcloud.util.CommandLineUtil.runCommandLine(SourceFile:130)
        at com.illuminatedcloud.intellij.sfdx.SfdxUtil.runSfdx(SourceFile:196)
    

    There are in fact quite a few threads waiting on the latter thread to complete. That semaphore is required to ensure that IC's internal connection list and details get initialized properly. I guess that technically the EDT is indirectly waiting for the external process to complete...

    Are you seeing a specific thread dump where the EDT is actually running the external process? If so, please let me know which one because that's definitely an issue. The EDT should be rendering data and making tweaks the presentation layer state, not doing anything with more weight.

  9. Xander Victory

    Thanks for the detailed response - there’s only so far I can look into the traces for someone else’s plugin 😉

    It does indeed look like your synchronized block in com.illuminatedcloud.intellij.settings.project.IlluminatedCloudConnection.config() is causing quite the deadlock. I suppose there’d be some decent refactoring needed to make that less blocking, more 'try again later'?

  10. Xander Victory

    I also apologise for coming off a bit snarky. I did not take the time to truly read (and understand) the thread dump

  11. Scott Wells repo owner

    Oh, I didn't take it as snarky at all. If anything I took it as well-informed about the idiosyncrasies of Java Swing development. And I don't ever expect others to have to understand raw thread dumps. I do, however, appreciate group brainstorming on issues, so don't hesitate to throw your own thoughts into the ring!

    Regarding the actual issue, yes, the problem here is that to initialize an SFDX-based connection with all the relevant details, IC must consult the CLI that is authoritative on those details. Because multiple threads often care about those details--the EDT for presentation of connection info to the user and one or more worker threads for actually doing work against a connection--that initialization becomes a critical section that must be properly guarded. Right now that guard is a standard Java synchronized block (technically it's using DCL against a volatile member to avoid the synchronized block whenever possible). When a worker thread invokes the CLI to fill in the connection details, other threads that need those details block until the worker completes. I'm 99% sure that's what you're seeing, and in your case that CLI invocation is long-running. At that point in time the EDT is trying to paint the connection details for the new status bar widget. One thing I'm considering is making that (and other similar presentation-oriented, read-only usages) try to retrieve the connection details in a non-blocking manner and then refresh when the connection details are available. But as you pointed out, that's going to be a non-trivial refactoring.

    I'll definitely see what I can do, even if it's broken into short-term and more long-term approaches to solving the problem. I'll keep you posted here and potentially make a test build available with any prospective fix.

  12. Xander Victory

    I dug a little deeper on this today and thought I’d post here, even if just for completeness.

    When running sfdx force:org:list, the CLI will refresh auth info (i.e. potentially calling out to login/test.salesforce.com) for EVERY org, and for Dev hubs will also list the scratch org infos to check they’re still valid.
    I imagine it does this because the JSON output includes the access token, and the way sfdx is structured, commands generate both printable output and json regardless of whether the --json flag is used.

    So it starts to make sense why the command can take quite a while for me, as I have ~14 aliases right now, and the login endpoints are over an international link (~180ms round trip). There is limited concurrency of the auth checks, but only 3 at a time (at time of writing).

    It would be useful for someone like me if (at least on startup) IC could just scan the $HOME/.sfdx folder for aliases/usernames and then rely on force:org:display for when it needed an access token, but that does run the risk of breaking if the CLI changes its storage format.

  13. Scott Wells repo owner

    Yeah, that's the root cause of this issue in many cases...a VERY long runtime for force:org:list. I've spoken with the SFDX team about that and they've hinted about having another CLI command that encapsulates exactly what you suggested...scanning the local files and not trying to verify each of the associated connections. I'm meeting with that team next Monday during Dreamforce. I'll raise this again and see if/when they're planning to deliver that before I do it myself. In general I strongly prefer not to break the CLI's state encapsulation because the representation of that state has definitely changed frequently over time.

    Note that to help with this--for some a bit and for others in a huge way--you can do the following:

    1. Run sfdx force:org:list --clean to remove many stale orgs
    2. Run sfdx force:auth:logout on any remaining orgs that show as anything but Connected in the output for sfdx force:org:list and/or that you know you're not using any longer.

    Those two steps will reduce the CLI's list of orgs to the ones that you're truly using and will make force:org:list as fast as it can be until a better/real fix is available.

  14. Scott Wells repo owner

    Issue tracker grooming. If this is still an issue, please feel free to reopen, ideally with a concrete reproduction scenario.

  15. Log in to comment