Project creation is slow with CLI connections

Issue #1699 resolved
Aidan Harding created an issue

When I create a new project (MD, not DX) with a CLI connection, the metadata seems to get fetched indexed one item at a time. With an old-style connection, everything is fetched first and indexing happens after all the local files are present.

The time difference between these seems to be 10-20 minutes, making the CLI connections unusable for short-term projects with lots of code. I’ll often want to create an IC project against a UAT sandbox just so I can poke around to quickly navigate the code. Since indexing disables functionality on all IDE windows, I can’t even leave the one project to sort itself out while I work on something else.

I it possible to get CLI connections to behave more like the old ones in this respect?

Comments (21)

  1. Scott Wells repo owner

    Aidan, I haven't seen this behavior even on larger projects. Almost all of my projects are now based on CLI connections. Could you provide a log from such a project creation--even in part--so I could see if something else is going on. I wouldn't ever expect indexing to take 10-20 minutes no matter what.

  2. Aidan Harding reporter

    Hi Scott,

    Imagine my surprise, when I try this for the purpose of collecting logs and it just works quickly!

    Do you mind if we just keep this one open for a bit? I’ll leave the log settings on and send you data if/when it happens again

  3. Scott Wells repo owner

    By the way, I had a friend who called these "Heisenbugs" because they change behavior when observed! Always thought that was quite clever...

  4. Aidan Harding reporter

    Haha… that’s a good name!

    I managed to catch the issue. It took 15 minutes to generate a new project with a fairly sizeable, but not totally huge org. One thing that was different this time was that I had many Webstorm windows open.

    The bit that seemed to take a long time was unpacking the archive of metadata into the project. This seemed to kick off file-system processing/indexing on each individual file, rather than waiting until they were all unpacked.

    Here’s my Webstorm info:

    WebStorm 2020.2.1
    Build #WS-202.6948.73, built on August 25, 2020
    Licensed to Nebula Consulting Ltd / Aidan Harding
    Subscription is active until February 5, 2021
    Runtime version: 11.0.8+10-b944.31 x86_64
    VM: OpenJDK 64-Bit Server VM by JetBrains s.r.o.
    macOS 10.15.5
    GC: ParNew, ConcurrentMarkSweep
    Memory: 4966M
    Cores: 4
    Non-Bundled Plugins: Statistic, com.illuminatedcloud2.intellij

    And Illuminated Cloud is on 2.1.3.7

  5. Scott Wells repo owner

    Thanks, Aidan. Here's a breakdown of where the time was spent:

    Beginning of log

    2020-09-03 14:25:48,265 [536560875]   INFO - cloud.client.ForceComApiClient - Logging into Salesforce DX as servomex-org@nebulaconsulting.co.uk.fsluat. 
    

    Logging into org using CLI: 30 secs

    2020-09-03 14:25:48,276 [536560886]   INFO - cloud.client.ForceComApiClient - Retrieving org details for servomex-org@nebulaconsulting.co.uk.fsluat from Salesforce CLI. 
    2020-09-03 14:25:48,276 [536560886]  DEBUG - atedcloud.util.CommandLineUtil - Running command line: '/usr/local/bin/sfdx force:org:display -u servomex-org@nebulaconsulting.co.uk.fsluat --json' from working directory: '/usr/local/bin'. 
    ...
    2020-09-03 14:26:18,423 [536591033]  DEBUG - atedcloud.util.CommandLineUtil - Returning command-line response: CommandLineResponse{exitCode=0, output='{  "status": 0,  "result": {    "username": "servomex-org@nebulaconsulting.co.uk.fsluat",    "id": "00D3G0000008aKtUAI",    "connectedStatus": "Connected",    "accessToken": "00D3G0000008aKt!ARcAQByoHnwzIGhmXeF_5IqQ9AuArBMoUxno2OFadB1PFRkGRdl7x8rM.11FBScwHzeKEyRz6ANZ8x62pqAs.HQBmGDU5DNy",    "instanceUrl": "https://servomex--fsluat.my.salesforce.com",    "clientId": "PlatformCLI"  }}', error='', duration=30147} 
    

    Enumerating org metadata: 13 secs

    2020-09-03 14:26:18,482 [536591092]   INFO - cloud.client.ForceComApiClient - Retrieving org details from Salesforce CLI. 
    ...
    2020-09-03 14:26:31,701 [536604311]   INFO - onfigCache.FilePropertiesCache - Fixing packaged layout metadata name issue: pi__Category_Contact_Score__c-Pardot Category Score %28Contact%29 Layout => pi__Category_Contact_Score__c-pi__Pardot Category Score %28Contact%29 Layout 
    ...
    2020-09-03 14:26:37,173 [536609783]   INFO - onfigCache.FilePropertiesCache - Enumerated metadata for connection servomex-org@nebulaconsulting.co.uk.fsluat in 18 s 425 ms. 
    

    Missing 10 secs here?

    2020-09-03 14:26:47,515 [536620125]   INFO - BridgeProjectLifecycleListener - Using legacy project model to open project 
    

    Retrieving org metadata: 24 secs

    2020-09-03 14:26:52,367 [536624977]  DEBUG - lder.ForceComMetadataRetriever - Retrieving metadata for module servomex_org_nebulaconsulting_co_uk_fsluat 
    ...
    2020-09-03 14:27:00,473 [536633083]  DEBUG - lder.ForceComMetadataRetriever - Retrieve complete. 
    2020-09-03 14:27:16,365 [536648975]  DEBUG - lder.ForceComMetadataRetriever - Retrieved an archive of size 4382632 bytes. 
    

    Populating local project based on retrieved org metadata: 8 mins?!

    2020-09-03 14:27:16,663 [536649273]  DEBUG - lder.ForceComMetadataRetriever - Placing retrieved files in the diff directory. 
    ...
    2020-09-03 14:27:17,471 [536650081]  DEBUG - lder.ForceComMetadataRetriever - Refreshing the virtual file system before comparison. 
    2020-09-03 14:27:17,857 [536650467]  DEBUG - lder.ForceComMetadataRetriever - Populating files for module = servomex_org_nebulaconsulting_co_uk_fsluat, diffRoot = /private/var/folders/y8/nmc4sm411k5dqlzs76xrgrtr0000gn/T/servomex_org_nebulaconsulting_co_uk_fsluat-servomex_org_nebulaconsulting_co_uk_fsluat-retrieve/diff, postRetrieveAction = POPULATE, onFinish = set 
    ...
    2020-09-03 14:35:05,687 [537118297]   INFO - lder.ForceComMetadataRetriever - Populated /Users/aidan/WebstormProjects/servomex_org_nebulaconsulting_co_uk_fsluat/src/classes/dlrs_OpportunityLineItemTest.cls-meta.xml. 
    ...
    2020-09-03 14:35:06,414 [537119024]   INFO - lder.ForceComMetadataRetriever - Retrieved 619/619 components from servomex-org@nebulaconsulting.co.uk.fsluat in 8 m 14 s 47 ms with status SUCCEEDED. 
    

    Updating metadata subscription: 3 secs

    2020-09-03 14:35:06,914 [537119524]  DEBUG - ontentsSynchronizationListener -   Removing all files which should not be processed: 
    ...
    2020-09-03 14:35:09,147 [537121757]  DEBUG - ontentsSynchronizationListener - Updating Selected subscription. 
    

    Missing 20 secs here? Indexing?

    Generating OST: 2 mins

    2020-09-03 14:35:30,999 [537143609]   INFO - ct.OfflineSymbolTableGenerator - Starting activity Loading installed package namespaces with 1 steps. 
    ...
    2020-09-03 14:37:34,354 [537266964]   INFO - ct.OfflineSymbolTableGenerator - Offline symbol table generated for servomex-org@nebulaconsulting.co.uk.fsluat in 2 m 3 s 357 ms. 
    

    That accounts for ~12 minutes of which the most curious is the 8 minutes spent populating the local project based on the retrieved metadata. Given that only 619 files were actually retrieved, that's perplexing. Out of that, by far the most time is spent actually copying the organized retrieved files into your local project. That covers the following range of time:

    2020-09-03 14:27:17,857 [536650467]  DEBUG - lder.ForceComMetadataRetriever - Populating files for module = servomex_org_nebulaconsulting_co_uk_fsluat, diffRoot = /private/var/folders/y8/nmc4sm411k5dqlzs76xrgrtr0000gn/T/servomex_org_nebulaconsulting_co_uk_fsluat-servomex_org_nebulaconsulting_co_uk_fsluat-retrieve/diff, postRetrieveAction = POPULATE, onFinish = set 
    2020-09-03 14:27:17,858 [536650468]  DEBUG - lder.ForceComMetadataRetriever - retrievedRoot = /private/var/folders/y8/nmc4sm411k5dqlzs76xrgrtr0000gn/T/servomex_org_nebulaconsulting_co_uk_fsluat-servomex_org_nebulaconsulting_co_uk_fsluat-retrieve/diff 
    2020-09-03 14:27:17,858 [536650468]  DEBUG - lder.ForceComMetadataRetriever - localRoot = /Users/aidan/WebstormProjects/servomex_org_nebulaconsulting_co_uk_fsluat 
    2020-09-03 14:27:18,246 [536650856]   INFO - lder.ForceComMetadataRetriever - Populated /Users/aidan/WebstormProjects/servomex_org_nebulaconsulting_co_uk_fsluat/src/classes/TestHttpMock.cls-meta.xml. 
    2020-09-03 14:27:18,269 [536650879]   INFO - lder.ForceComMetadataRetriever - Populated /Users/aidan/WebstormProjects/servomex_org_nebulaconsulting_co_uk_fsluat/src/classes/TaskCheckForHoliday.cls. 
    2020-09-03 14:27:18,306 [536650916]   INFO - lder.ForceComMetadataRetriever - Populated /Users/aidan/WebstormProjects/servomex_org_nebulaconsulting_co_uk_fsluat/src/classes/MetadataApexApiLoader.cls-meta.xml. 
    ...
    2020-09-03 14:35:04,234 [537116844]   INFO - lder.ForceComMetadataRetriever - Populated /Users/aidan/WebstormProjects/servomex_org_nebulaconsulting_co_uk_fsluat/src/classes/AccountHierarchyTest.cls-meta.xml. 
    2020-09-03 14:35:05,582 [537118192]   INFO - lder.ForceComMetadataRetriever - Populated /Users/aidan/WebstormProjects/servomex_org_nebulaconsulting_co_uk_fsluat/src/classes/CPAppChecklistController.cls-meta.xml. 
    2020-09-03 14:35:05,687 [537118297]   INFO - lder.ForceComMetadataRetriever - Populated /Users/aidan/WebstormProjects/servomex_org_nebulaconsulting_co_uk_fsluat/src/classes/dlrs_OpportunityLineItemTest.cls-meta.xml. 
    

    What kind of disk is this? SSD or platter-based? I could look at parallelizing that step, though if it's a platter-based disk it's likely just going to thrash your disk. That may be the best next step, though, as this is definitely where the lion's share of time is spent, specifically time that could be tuned for better performance. Things like the metadata retrieval and OST generation are likely as tuned as they're going to get, or very close to it.

    I'll go ahead and look at parallelization of the actual local project file population for the next-next build (the next one will be released today). I'll likely give you early access to that change here to see if it helps first.

  6. Aidan Harding reporter

    Thanks for the analysis Scott!

    It’s an internal SSD on a 2015 Macbook Pro.

    I’m totally fine with the retrieval and OST generation. It takes a little while, but it’s manageable.

    I’ve really never seen this problem with traditional connections - I don’t see anything indicating why CLI would be worse, but that’s been my experience of it.

    Thanks again for your help

  7. Scott Wells repo owner

    Sure, no problem. So odd that it would take that long on an SSD to copy 619 files. Let me make sure that I'm doing that with indexing temporarily disabled and it's not just sitting there starting and restarting the file-based indexer.

    As for the difference between connection types, where the time is being spent here is going to be identical between the two. 2/3 of the time is spent copying those files, and that happens either way. I'm not saying there's not some difference in terms of the invocation path, but the long pole code here is going to be the same.

    I'll keep investigating a bit. I'm sure we can get that down to where it should be.

  8. Scott Wells repo owner

    Okay, I've confirmed that local file population occurs with indexing paused, so that's not the issue. I'll tinker with parallelization of that step today and provide you a test build to try out once I feel like it's good to go.

  9. Scott Wells repo owner

    I've implemented parallel copying of files from the temporary retrieve directory into the local project, and the results aren't terribly promising. I benchmarked (briefly) with two projects, one that includes ~500 metadata files and another that includes ~4000 metadata files. I gathered times for local project population when there are no existing files as would be the case with new project creation and when it's overwriting existing files, and I did each of these with parallel processing enabled and disabled. Here are the results:

    537 metadata files

    Parallel fresh - 6s

    2020-09-03 10:55:20,577 [ 336380]   INFO - .ForceComSfdxMetadataRetriever - Populated <first>
    2020-09-03 10:55:26,518 [ 342321]   INFO - .ForceComSfdxMetadataRetriever - Populated <last>
    

    Parallel replace - 100ms

    2020-09-03 10:57:17,059 [ 452862]   INFO - .ForceComSfdxMetadataRetriever - Populated <first>
    2020-09-03 10:57:17,153 [ 452956]   INFO - .ForceComSfdxMetadataRetriever - Populated <last>
    

    Serial fresh - 7s

    2020-09-03 11:05:06,723 [ 922526]   INFO - .ForceComSfdxMetadataRetriever - Populated <first>
    2020-09-03 11:05:13,839 [ 929642]   INFO - .ForceComSfdxMetadataRetriever - Populated <last>
    

    Serial replace - 250ms

    2020-09-03 11:03:59,708 [ 855511]   INFO - .ForceComSfdxMetadataRetriever - Populated <first>
    2020-09-03 11:03:59,950 [ 855753]   INFO - .ForceComSfdxMetadataRetriever - Populated <last>
    

    4000 metadata files

    Parallel fresh - 61s

    2020-09-03 11:23:59,295 [2055098]   INFO - .ForceComSfdxMetadataRetriever - Populated <first>
    2020-09-03 11:25:00,252 [2116055]   INFO - .ForceComSfdxMetadataRetriever - Populated <last>
    

    Parallel replace - 2s

    2020-09-03 11:35:21,467 [2737270]   INFO - .ForceComSfdxMetadataRetriever - Populated <first>
    2020-09-03 11:35:23,508 [2739311]   INFO - .ForceComSfdxMetadataRetriever - Populated <last>
    

    Serial fresh - 40s

    2020-09-03 11:12:20,096 [1355899]   INFO - .ForceComSfdxMetadataRetriever - Populated <first>
    2020-09-03 11:13:00,139 [1395942]   INFO - .ForceComSfdxMetadataRetriever - Populated <last>
    

    Serial replace - 6.5s

    2020-09-03 11:20:09,159 [1824962]   INFO - .ForceComSfdxMetadataRetriever - Populated <first>
    2020-09-03 11:20:15,483 [1831286]   INFO - .ForceComSfdxMetadataRetriever - Populated <last>
    

    As you can likely see, at least on my machine (Surface Laptop 3, i7, 16GB, 512GB SSD), the gains for parallel processing aren't very significant, and for the larger data set, the performance is actually worse on initial population. However, I'm also not seeing the same behavior that you're seeing where the population phase is taking an exorbitant amount of time. It also sounds like you're not seeing it consistently. Unfortunately my Mac test machine doesn't have an SSD, so any testing I might do on it isn't going to be an accurate representation either.

    I'm happy to provide this build with parallel processing enabled to you if you'd like to measure the impact on your specific machine. Without that additional information, I'm hesitant to introduce the change into the main build because I'm not seeing significant (or even consistent) gains.

    Thoughts?

  10. Aidan Harding reporter

    Looking at the status bar when it’s running, it looks to me like it is trying to redo the indexing on each individual file. It appears that the new project has indexing paused, but the other instances have indexing running.

    I captured a short video of the status bar of another instance while my new project is being created. You can see the indexing hammering away on/off

  11. Scott Wells repo owner

    Ah, I see. I never had another project open. I'm not sure if I'm going to be able to do anything about that as I'm already running the population phase telling the indexer to pause. I think I can only do that for a single project at a time, though. I'll investigate and see if there's a way I can pause it process-wide instead.

  12. Aidan Harding reporter

    That seems like a bug in IntelliJ/Webstorm then… having two projects open with disjoint file systems shouldn’t mean that they affect each other.

    I really hope this can be solved because I often have multiple instances (right now: 6). And it’s going to be common for folks working with unlocked packaging to sometimes be working in multiple environments simultaneously.

  13. Scott Wells repo owner

    The good news is that I can reproduce this easily now with another project open. I've opened a dialog with the folks at JetBrains as the other open project shouldn't care about those files being copied at all since they're not within the other project's configured content roots. Hopefully they'll produce a workaround and/or a fix.

  14. Scott Wells repo owner

    For now your workaround is going to be to close other projects while retrieving large sets of metadata, but hopefully we'll get a workaround/fix from JetBrains quickly.

  15. Scott Wells repo owner

    I'm resolving this since the core issue is in the base JetBrains IDE. I've seen status changes on the linked YouTrack issue that lead me to believe that it will be addressed in an upcoming base IDE build.

  16. Log in to comment