Issue #345 resolved

ehcache eating up all memory

jedevivie
created an issue

Hi Sebastian,

I run scm-manager on a linux box. Very quickly, the scm-server becomes unresponsiveness. I have dump server memory w/ jmap and analyse it w/ eclipse memory analyser tool (pls find here attached a snapshot of heap analyse).

The analyse shows that ehcache is filling up 98% of jvm memory and this memory is not freed by GC.

Could you help me in tracking down this issue, pls ? Thank you

BR Jérôme

Comments (45)

  1. Sebastian Sdorra repo owner
    • changed status to open

    First of all we have to find which cache takes so much memory. Could you please install the scm-script-plugin and execute the following script, when you think the memory is filled:

    def humanReadable(long bytes) {
        if (bytes < 1024) return bytes + " B";
        int exp = (int) (Math.log(bytes) / Math.log(1024));
        String pre = "KMGTPE".charAt(exp-1);
        return String.format("%.1f %sB", bytes / Math.pow(1024, exp), pre);
    }
    
    def cacheManager = injector.getInstance(sonia.scm.cache.CacheManager.class).cacheManager;
    def cacheNames = cacheManager.getCacheNames();
    for ( def cacheName : cacheNames ){
        def cache = cacheManager.getCache(cacheName);
        println cache.name + ": " + humanReadable(cache.calculateInMemorySize());
    }
    
  2. Sebastian Sdorra repo owner

    How did you installed the script-plugin? It looks like the groovy-all.jar is missing.

    Yesterday i have released a new version of the activity-plugin. This new version should reduce the size of the "sonia.cache.activity" cache massively.

    Fix for the activity-plugin: c18980a0

  3. jedevivie reporter

    Hi Sebastian,

    Thank you for this quick reply. I have install scm-script-plugin on our test environment and execute script w/o success (pls see here attached, a screenshoot).

    Then I have moved our repos on another server and continue my investigations on java heap dump file. "heap6.jpg" and "scm-activity.jpg" screenshot might help us to nail down this issue. It seems that "sonia.scm.activity.Activity" object is keeping in memory every file path of every repo. The zip of dump file is too big to be sent (750MB) but pls let me know if I can further analysis.Thank you

    BR Jérôme

  4. jedevivie reporter

    Hi Sebastian,

    I've relaunch the script w/ another browser (chrome instead of firefox) and I get an output for the script. Here is it:

    sonia.cache.repository.blame: 0 B sonia.cache.search.users: 0 B sonia.cache.search.groups: 0 B sonia.cache.plugins: 32.0 KB sonia.cache.activity: 233.4 MB sonia.cache.cmd.branches: 212.8 KB sonia.cache.repository.browser: 0 B sonia.cache.cmd.blame: 0 B sonia.cache.auth: 2.8 KB sonia.scm.hg.packages: 0 B sonia.cache.repository.changesets: 0 B sonia.cache.cmd.browse: 41.5 KB sonia.cache.authorizing: 832 B sonia.cache.cmd.tags: 3.1 MB sonia.cache.cmd.log: 828.4 MB

    BR Jérôme.

  5. jedevivie reporter

    Thank you Sebastian. I will be ooo and will not have time to test new versions before 13 march. I will keep you posted

    BR Jérôme

  6. jedevivie reporter

    Hi Sebastian,

    I have try 1.29-SNAPSHOT w/ latest plugins for scripts and activity. Here is the outcome: After startup, scm-manager consume 129 MB of memory. After 1st login, activity plugin is scanning all repos and memory consumption raise to 660MB and stops once latest activities is displayed. Now, script plugin works w/ firefox and here is output: sonia.cache.repository.blame: 0 B sonia.cache.search.users: 0 B sonia.cache.search.groups: 0 B sonia.cache.plugins: 32.0 KB sonia.cache.activity: 57.3 KB sonia.cache.cmd.branches: 209.1 KB sonia.cache.repository.browser: 0 B sonia.cache.cmd.blame: 0 B sonia.cache.auth: 2.8 KB sonia.cache.repository.changesets: 0 B sonia.cache.cmd.browse: 480 B sonia.cache.authorizing: 848 B sonia.cache.cmd.tags: 778.1 KB sonia.cache.cmd.log: 51.0 MB

    I have continue to create multiple local admins account and logged w/ each of them. The memory consumption raise by 50MB w/ each new account but stops at 1GB. (Note: the overall memory is 8GB and Java VM limit is 2GB). One file continue to grow: sonia%002ecache%002ecmd%002elog.data to reach 3GB. I stopped my analysis here. The good point is that now memory consumption seems limited. Is it possible to set a size limit for the cache file, pls ?

    I have also try the script "clear cache" from script plugin but w/o success. It dumps a stack trace on the browser. Maybe i can log a separate ticket for this issue.

    Thank you BR Jérôme

  7. Sebastian Sdorra repo owner

    The clear script clears only the cache of a single repository. You have to configure the type and the name variable:

    // identify your repository
    def type = "hg";
    def name = "scm-script-plugin";
    

    The cache files are limited, but not by size in byte, they are limited by the count of the items. Please have a look at the ehcache configuration:

    Could you please execute the following script and post the output?

    def humanReadable(def bytes) {
        if (bytes < 1024) return bytes + " B";
        int exp = (int) (Math.log(bytes) / Math.log(1024));
        String pre = "KMGTPE".charAt(exp-1);
        return String.format("%.1f %sB", bytes / Math.pow(1024, exp), pre);
    }
    
    def sizeOf(Serializable object){
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(baos);
        oos.writeObject(object);
        oos.close();
        return baos.size();
    }
    
    Runtime runtime = Runtime.getRuntime();
    def maxMemory = runtime.maxMemory();
    def allocatedMemory = runtime.totalMemory();
    def freeMemory = runtime.freeMemory();
    
    println "memory:"
    println "  - max       : " + humanReadable(maxMemory);
    println "  - allocated : " + humanReadable(allocatedMemory);
    println "  - free      : " + humanReadable(freeMemory);
    
    println "";
    println "";
    
    def cacheManager = injector.getInstance(sonia.scm.cache.CacheManager.class).cacheManager;
    
    def totalCMax = 0;
    
    def cacheNames = cacheManager.getCacheNames();
    for ( def cacheName : cacheNames ){
        def cache = cacheManager.getCache(cacheName);
        def totalMemory = cache.calculateInMemorySize();
        def average = cache.getSize() > 0 ? Math.round( totalMemory / cache.getSize() ) : 0;
        def max = cache.getCacheConfiguration().getMaxEntriesLocalHeap();
        def maxSize = average * max;
    
        totalCMax += maxSize;
    
        println cache.name + ":";
        println "  - size    : " + cache.getSize();
        println "  - maxsize : " + max;
        println "  - average : " + humanReadable( average );
        println "  - current : " + humanReadable( totalMemory );
        println "  - cmax    : " + humanReadable( maxSize );
    }
    
    println "";
    println "";
    println "Total CMax: " + humanReadable(totalCMax);
    
  8. jedevivie reporter

    Sure. Here is the output:

    memory: - max : 1.9 GB - allocated : 558.4 MB - free : 82.1 MB

    sonia.cache.repository.blame: - size : 0 - maxsize : 100 - average : 0 B - current : 0 B - cmax : 0 B sonia.cache.search.users: - size : 0 - maxsize : 10000 - average : 0 B - current : 0 B - cmax : 0 B sonia.cache.search.groups: - size : 0 - maxsize : 1000 - average : 0 B - current : 0 B - cmax : 0 B sonia.cache.plugins: - size : 1 - maxsize : 5 - average : 32.0 KB - current : 32.0 KB - cmax : 160.1 KB sonia.cache.activity: - size : 2 - maxsize : 100 - average : 57.3 KB - current : 114.5 KB - cmax : 5.6 MB sonia.cache.cmd.branches: - size : 137 - maxsize : 1000 - average : 1.5 KB - current : 209.1 KB - cmax : 1.5 MB sonia.cache.repository.browser: - size : 0 - maxsize : 200 - average : 0 B - current : 0 B - cmax : 0 B sonia.cache.cmd.blame: - size : 0 - maxsize : 1000 - average : 0 B - current : 0 B - cmax : 0 B sonia.cache.auth: - size : 2 - maxsize : 1000 - average : 2.5 KB - current : 4.9 KB - cmax : 2.4 MB sonia.cache.repository.changesets: - size : 0 - maxsize : 200 - average : 0 B - current : 0 B - cmax : 0 B sonia.cache.cmd.browse: - size : 1 - maxsize : 3000 - average : 480 B - current : 480 B - cmax : 1.4 MB sonia.cache.authorizing: - size : 2 - maxsize : 1000 - average : 832 B - current : 1.6 KB - cmax : 812.5 KB sonia.cache.cmd.tags: - size : 1 - maxsize : 500 - average : 778.1 KB - current : 778.1 KB - cmax : 379.9 MB sonia.cache.cmd.log: - size : 2386 - maxsize : 200 - average : 21.9 KB - current : 51.1 MB - cmax : 4.3 MB

    Total CMax: 396.0 MB

  9. Sebastian Sdorra repo owner

    I've done some more test. I've imported the following repositories:

    And used the following script to fill the "sonia.cache.cmd.log" cache:

    import sonia.scm.repository.*;
    import sonia.scm.repository.api.*;
    
    def repositoryManager = injector.getInstance(RepositoryManager.class);
    def repositoryServiceFactory = injector.getInstance(RepositoryServiceFactory.class);
    
    def pageSize = 20;
    
    for ( def repository : repositoryManager.getAll() ){
        def repositoryService = null;
        try {
            repositoryService = repositoryServiceFactory.create(repository);
            def log = repositoryService.getLogCommand();
            log.setPagingLimit(pageSize);
    
            def result = log.getChangesets();
    
            int total = result.getTotal();
    
            for (int i = pageSize; i < total; i = i + pageSize)
            {
                result = log.setPagingStart(i)
                    .setPagingLimit(pageSize)
                    .getChangesets();
            }
        } finally {
            repositoryService?.close();
        }
    }
    

    After the script has finished i get the following stats from ehcache for this cache:

    sonia.cache.cmd.log:
      - size    : 2027
      - maxsize : 200
      - average : 1.8 KB
      - current : 3.5 MB
      - cmax    : 358.0 KB
    

    During script run the "used heap" of the jvm claims never above 750MB (started with -Xms2G -Xmx2G and checked with jvisualvm) and the disk cache was about 6MB:

    $ ll -h sonia%002ecache%002ecmd%002elog.data
    -rw-r--r--  1 sdorra  staff   5.9M Mar 15 09:39 sonia%002ecache%002ecmd%002elog.data
    

    So i'm not sure how your disk cache grows up to 3GB. There must be something special in the case of your repositories. Perhaps we should find a way that you can modify the cache sizes by yourself. What do you think?

  10. jedevivie reporter

    Yes, i think too. I've keep continue charging my test server. The cache file for sonia.cache.cmd.log have now 28536 objects and is 6GB on disk.

    ls -l sonia%002ecache%002ecmd%002elog.data -rw-rw-r-- 1 scmmgr scmmgr 6194735889 Mar 15 08:36 sonia%002ecache%002ecmd%002elog.data

    The cache size has double during the night (w/o any action). Maybe, we should set a time to live. Other caches does not seems so hungry. Here are new cache values from scripts:

    memory: - max : 1.9 GB - allocated : 1.3 GB - free : 703.4 MB

    sonia.cache.repository.blame: - size : 0 - maxsize : 100 - average : 0 B - current : 0 B - cmax : 0 B sonia.cache.search.users: - size : 0 - maxsize : 10000 - average : 0 B - current : 0 B - cmax : 0 B sonia.cache.search.groups: - size : 0 - maxsize : 1000 - average : 0 B - current : 0 B - cmax : 0 B sonia.cache.plugins: - size : 1 - maxsize : 5 - average : 32.0 KB - current : 32.0 KB - cmax : 160.1 KB sonia.cache.activity: - size : 22 - maxsize : 100 - average : 154.4 KB - current : 3.3 MB - cmax : 15.1 MB sonia.cache.cmd.branches: - size : 137 - maxsize : 1000 - average : 1.6 KB - current : 212.5 KB - cmax : 1.5 MB sonia.cache.repository.browser: - size : 0 - maxsize : 200 - average : 0 B - current : 0 B - cmax : 0 B sonia.cache.cmd.blame: - size : 0 - maxsize : 1000 - average : 0 B - current : 0 B - cmax : 0 B sonia.cache.auth: - size : 22 - maxsize : 1000 - average : 977 B - current : 21.0 KB - cmax : 954.1 KB sonia.cache.repository.changesets: - size : 0 - maxsize : 200 - average : 0 B - current : 0 B - cmax : 0 B sonia.cache.cmd.browse: - size : 4 - maxsize : 3000 - average : 1.4 KB - current : 5.7 KB - cmax : 4.2 MB sonia.cache.cmd.tags: - size : 4 - maxsize : 500 - average : 798.2 KB - current : 3.1 MB - cmax : 389.7 MB sonia.cache.authorizing: - size : 23 - maxsize : 1000 - average : 833 B - current : 18.7 KB - cmax : 813.5 KB sonia.cache.cmd.log: - size : 28536 - maxsize : 200 - average : 2.2 KB - current : 60.6 MB - cmax : 435.0 KB

    Total CMax: 412.8 MB

  11. Sebastian Sdorra repo owner

    This is really strange, because the output show a size of 28536 for the log cache. But the maximum for this cache should be 10200 (200 in memory and 10000 on disk). I will try to reproduce this.

  12. Sebastian Sdorra repo owner

    I've created a mechanism that allows you to override cache configurations and the default configuration: 443c26ce

    To override a cache, you have to create a file called ext/cache.xml in the home directory of scm-manager. Here is a sample which overrides the configuration for the sonia.cache.cmd.log cache:

    <ehcache xmlns="http://ehcache.org/ehcache.xsd">
    
      <cache
            name="sonia.cache.cmd.log"
            maxEntriesLocalHeap="200"
            maxEntriesLocalDisk="10000"
            eternal="true"
            overflowToDisk="true"
            diskPersistent="false"
            copyOnRead="true"
            copyOnWrite="true"
      />
    
    </ehcache>
    

    The effective cache configuration is printed to the trace log after merging.

    Could you please test the following version:

  13. Sebastian Sdorra repo owner

    Ok, this log shows a problem with ehcache:

    15:56:41.439 [qtp1590441341-25] WARN  net.sf.ehcache.pool.sizeof.ObjectGraphWalker - The configured limit of 1,000 object references was reached while attempting to calculate the size of the object graph. Severe performance degradation could occur if the sizing operation continues. This can be avoided by setting the CacheManger or Cache <sizeOfPolicy> elements maxDepthExceededBehavior to "abort" or adding stop points with @IgnoreSizeOf annotations. If performance degradation is NOT an issue at the configured limit, raise the limit value using the CacheManager or Cache <sizeOfPolicy> elements maxDepth attribute. For more information, see the Ehcache configuration documentation
    

    And this seems to result in the following exception:

    16:05:02.008 [sonia%002ecache%002ecmd%002elog.data] ERROR net.sf.ehcache.store.disk.DiskStorageFactory - Disk Write of sonia.scm.repository.api.LogCommandBuilder$CacheKey@d9e602d8 failed: 
    java.lang.NullPointerException: null
        at net.sf.ehcache.store.FrontEndCacheTier.isEvictionCandidate(FrontEndCacheTier.java:713) ~[ehcache-core-2.6.3.jar:na]
        at net.sf.ehcache.store.disk.Segment.fault(Segment.java:789) ~[ehcache-core-2.6.3.jar:na]
        at net.sf.ehcache.store.disk.DiskStore.fault(DiskStore.java:823) ~[ehcache-core-2.6.3.jar:na]
        at net.sf.ehcache.store.disk.DiskStorageFactory$DiskWriteTask.call(DiskStorageFactory.java:486) ~[ehcache-core-2.6.3.jar:na]
        at net.sf.ehcache.store.disk.DiskStorageFactory$PersistentDiskWriteTask.call(DiskStorageFactory.java:1088) [ehcache-core-2.6.3.jar:na]
        at net.sf.ehcache.store.disk.DiskStorageFactory$PersistentDiskWriteTask.call(DiskStorageFactory.java:1072) [ehcache-core-2.6.3.jar:na]
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source) [na:1.7.0]
        at java.util.concurrent.FutureTask.run(Unknown Source) [na:1.7.0]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source) [na:1.7.0]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) [na:1.7.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.7.0]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.7.0]
        at java.lang.Thread.run(Unknown Source) [na:1.7.0]
    

    This looks like there is a really deep reference in some changesets. Could you create a another memory dumb and look for those deep references or send me the memory dumb?

  14. jedevivie reporter

    My dump is 2GB: too big to send...

    Here attached, i've do a snapshot of dominator tree. It appears that elements of type "sonia.scm.repository.api.LogCommandBuilder$CacheKey" are duplicated 20 times in cache (=number of users connected in my test) . I've circled two twin in red. This problem appraise many times. Could you have a look, pls ? Thank you

  15. Sebastian Sdorra repo owner

    Disable ehcache makes it nearly impossible to use features like the activity-plugin. Could you show me the references of a "sonia.scm.repository.api.LogCommandBuilder$CacheKey"?

  16. Sebastian Sdorra repo owner

    Hi, I've spend a few hours for investigate the dumb, but i found nothing special. After that i tried to reproduce the problems and spend hours of debugging ehcache without success. I could only reproduce one problem: ehcache does not honor the local disk limit and start producing npe's after the disk byte limit is reached. But i could not reproduce such a big disk store. I think we have to investigate the elements of the cache.

    Next steps for me:

    • create a small program to reproduce the ehcache problems and open a ticket at terracotta
    • script for the script-plugin to check size of the elements in the log cache
    • check other cache engines for a possible switch
  17. jedevivie reporter

    Thank you for your time. To reproduce such a big disk store, you can use the linux kernel tree. Fyi, we're using crowd plugin and my scenario was based on using multiple accounts. Could it be related to our issue?

    Pls, let me know how can i help.

    BR Jérôme

  18. Sebastian Sdorra repo owner

    I mean the cached disk store, the linux kernel is already one of my test repositories. I don't think it is multi user problem.

    I've create a second cache implementation based on google guava:

    This implementation does not have a disk store and can only be limited by elements and not by size. You can check the actual size of the caches, with the following script:

    def cacheManager = injector.getInstance( sonia.scm.cache.CacheManager.class );
    for ( def cache : cacheManager.cacheMap.values() ){
        println cache.name + ": " + cache.cache.size();
    }
    

    If you want to override cache configuration you could create a file called ext/gcache.xml in the scm home directory. For configuration examples have a look at the default implementation, the cache configuration classes and the guava cache builder:

    I've done a some test with the new implementation and this new implementation seems much more faster and does respect the limits. Could you please test the version below:

  19. Log in to comment