Linux generic cache detection is incorrect

Issue #63 resolved
Al Grant created an issue

The logic in linux-memory.c generic_get_memory_info() isn't correct.

It looks at the cpu0/cache node and iterates through the caches. The intention is to collect information about caches at each level. There may be multiple caches at a given level (typically at L1 there will be I and D). PAPI's data structure allows for this. There is a 'level count' that is incremented so that multiple caches can be collected per level.

The bug is in the lines

if (level != last_level) {
  level_count = 0;
  last_level=level;
} else {
  level_count++;
}

This assumes that for a given level, you see all the caches at that level, then you go to the next level. But in fact sysfs may return the caches in random order.

An actual example: * index2: level 2, unified cache * index0: level 1, data cache * index3: level 3, unified cache * index1: level 1, instruction cache

Because index1 is at a different level from index3, level_count will be reset to 0. So in PAPI's structures, the L1I information will overwrite the L1D information. The knowledge about L1D will be lost.

This could be fixed one of two ways: * iterate through the sysfs cache nodes in index order * in PAPI_mh_level_t, maintain a count of the caches seen at this level

Comments (4)

  1. Al Grant reporter
    diff --git a/src/linux-memory.c b/src/linux-memory.c
    index e39b5742..f540b69c 100644
    --- a/src/linux-memory.c
    +++ b/src/linux-memory.c
    @@ -907,7 +907,7 @@ generic_get_memory_info( PAPI_hw_info_t *hw_info )
        char filename[BUFSIZ],type_string[BUFSIZ];
        struct dirent *d;
        int max_level=0;
    -   int level_count=0,last_level=-1,level_index=0;
    +   int level_count,level_index;
    
        PAPI_mh_level_t *L = hw_info->mem_hierarchy.level;
    
    @@ -920,6 +920,12 @@ generic_get_memory_info( PAPI_hw_info_t *hw_info )
            goto unrecoverable_error;
        }
    
    +   for (level_index=0; level_index < PAPI_MAX_MEM_HIERARCHY_LEVELS; ++level_index) {
    +       for (level_count = 0; level_count < PAPI_MH_MAX_LEVELS; ++level_count) {
    +           L[level_index].cache[level_count].type = PAPI_MH_TYPE_EMPTY;
    +       }
    +   }
    +
        while(1) {
            d = readdir(dir);
            if (d==NULL) break;
    @@ -950,11 +956,12 @@ generic_get_memory_info( PAPI_hw_info_t *hw_info )
            /* Index arrays from 0 */
            level_index=level-1;
    
    -       if (level!=last_level) {
    -           level_count=0;
    -           last_level=level;
    -       } else {
    +       level_count = 0;
    +       while (L[level_index].cache[level_count].type != PAPI_MH_TYPE_EMPTY) {
                level_count++;
    +           if (level_count>=PAPI_MH_MAX_LEVELS) {
    +               break;
    +           }
            }
    
            if (level_count>=PAPI_MH_MAX_LEVELS) {
    
  2. Vince Weaver

    this does look reasonable, let me test the results on a few more machines and then I can get it committed.

  3. Log in to comment