Profiling has identified voxel access as one of the main bottlenecks of CubicSurfaceExtraction. For each voxel being processed, the algorithm looks at the current voxel and three of its neighbours to determine whether to generate quads. That's four voxel reads per iteration.
It would probably be beneficial to cache slices of voxel data, similar to how the MarchingCubesSurfaceExtractor does it. We should read a whole slice into an array, run the algorithm on that slice using the cached data, and then repeat for the next slice. This way we shouldn't have to do a full read on any voxel more than once.
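
To make the idea concrete, here is a rough sketch of slice caching. It is not PolyVox code: `ToyVolume`, `getVoxel`, `countQuadsNaive` and `countQuadsWithSliceCache` are hypothetical names made up for illustration, and the "extractor" only counts the faces between solid and empty voxels instead of building quads. It assumes the three neighbours are the ones in the negative x, y and z directions, so the cached version keeps the previous slice around as well as the current one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a volume (NOT the PolyVox API).
// It counts how many times a voxel is read so the two approaches can be compared.
struct ToyVolume {
    int width, height, depth;
    std::vector<std::uint8_t> data;  // 0 = empty, non-zero = solid
    std::size_t reads = 0;           // number of full voxel reads so far

    ToyVolume(int w, int h, int d)
        : width(w), height(h), depth(d),
          data(static_cast<std::size_t>(w) * h * d, 0) {
        assert(w > 0 && h > 0 && d > 0);
    }

    std::size_t index(int x, int y, int z) const {
        return (static_cast<std::size_t>(z) * height + y) * width + x;
    }

    void setVoxel(int x, int y, int z, std::uint8_t v) { data[index(x, y, z)] = v; }

    std::uint8_t getVoxel(int x, int y, int z) {
        ++reads;
        return data[index(x, y, z)];
    }
};

// Current approach: read the voxel and up to three neighbours on every iteration.
std::size_t countQuadsNaive(ToyVolume& vol) {
    std::size_t quads = 0;
    for (int z = 0; z < vol.depth; ++z)
        for (int y = 0; y < vol.height; ++y)
            for (int x = 0; x < vol.width; ++x) {
                const bool solid = vol.getVoxel(x, y, z) != 0;
                if (x > 0 && solid != (vol.getVoxel(x - 1, y, z) != 0)) ++quads;
                if (y > 0 && solid != (vol.getVoxel(x, y - 1, z) != 0)) ++quads;
                if (z > 0 && solid != (vol.getVoxel(x, y, z - 1) != 0)) ++quads;
            }
    return quads;
}

// Proposed approach: read each slice into an array once, then work from the cache.
std::size_t countQuadsWithSliceCache(ToyVolume& vol) {
    const std::size_t sliceSize = static_cast<std::size_t>(vol.width) * vol.height;
    std::vector<std::uint8_t> prev(sliceSize, 0);  // cached slice z - 1
    std::vector<std::uint8_t> curr(sliceSize, 0);  // cached slice z
    std::size_t quads = 0;

    for (int z = 0; z < vol.depth; ++z) {
        // The only place the volume is read: one full read per voxel.
        for (int y = 0; y < vol.height; ++y)
            for (int x = 0; x < vol.width; ++x)
                curr[static_cast<std::size_t>(y) * vol.width + x] = vol.getVoxel(x, y, z);

        // Run the algorithm on the cached data only.
        for (int y = 0; y < vol.height; ++y)
            for (int x = 0; x < vol.width; ++x) {
                const std::size_t i = static_cast<std::size_t>(y) * vol.width + x;
                const bool solid = curr[i] != 0;
                if (x > 0 && solid != (curr[i - 1] != 0)) ++quads;
                if (y > 0 && solid != (curr[i - vol.width] != 0)) ++quads;
                if (z > 0 && solid != (prev[i] != 0)) ++quads;
            }

        std::swap(prev, curr);  // the current slice becomes the previous one
    }
    return quads;
}
```

In this toy setup, a 3x3x3 volume with one solid voxel in the centre gives 6 faces either way, but the naive version performs 81 voxel reads while the cached version performs 27, one per voxel. The real saving depends on how expensive a voxel read is in the actual volume class compared to an array lookup.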