Currently, Carpet's prolongation operators are parallelized along only one direction (the largest). This split happens in call_operator() (CarpetLib), before the actual operator is called. With increasing core (thread) counts this becomes a problem: there are hardly enough points in any single direction to keep all threads busy.
Wouldn't it be much better (in terms of OpenMP efficiency) to let the operators handle OpenMP parallelization themselves? I currently see quite a large overhead for real-world parfiles with 32 threads just because of this.
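To illustrate the point, here is a rough C++ sketch under simplified assumptions. Work is modeled as equal-cost slabs scheduled statically. `Box`, `chunks_largest_direction`, `chunks_collapsed`, `static_efficiency` and `apply_operator_collapsed` are hypothetical names, not CarpetLib's API, and the loop body is a placeholder rather than an actual prolongation stencil. For a 20^3 box on 32 threads, splitting only the largest direction yields 20 work items, so at most 20 of the 32 threads do anything (62.5% efficiency in this model). Collapsing the two outer loops inside the operator yields 400 items instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical box layout: linear index = i + ni*(j + nj*k), i fastest.
struct Box { int ni, nj, nk; };

// Work items when only the largest direction is split (simplified model
// of the current call_operator() behaviour).
long chunks_largest_direction(const Box& b) {
  return std::max({b.ni, b.nj, b.nk});
}

// Work items when the operator itself collapses the k and j loops.
long chunks_collapsed(const Box& b) {
  return static_cast<long>(b.nj) * b.nk;
}

// Parallel efficiency for `chunks` equal-cost items statically spread over
// `nthreads` threads: ideal time divided by time of the most loaded thread.
double static_efficiency(long chunks, int nthreads) {
  const long per_thread_max = (chunks + nthreads - 1) / nthreads;  // ceil
  return static_cast<double>(chunks) /
         (static_cast<double>(per_thread_max) * nthreads);
}

// Sketch of an operator that handles OpenMP itself. The body is a
// placeholder (scale by 2), NOT Carpet's interpolation. Without -fopenmp
// the pragma is ignored and the loops run serially.
void apply_operator_collapsed(const Box& b, const std::vector<double>& src,
                              std::vector<double>& dst) {
#pragma omp parallel for collapse(2) schedule(static)
  for (int k = 0; k < b.nk; ++k)
    for (int j = 0; j < b.nj; ++j)
      for (int i = 0; i < b.ni; ++i) {
        const std::size_t idx =
            i + static_cast<std::size_t>(b.ni) *
                    (j + static_cast<std::size_t>(b.nj) * k);
        dst[idx] = 2.0 * src[idx];
      }
}
```

Collapsing only k and j keeps the innermost i loop contiguous, which should leave it vectorizable. Whether collapse(2) or a finer split is better for the real stencils would need measuring.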