Nick Coghlan / walkdir
Issue #17 (open)

Support changing the match function for filtering

Nick Coghlan
repo owner created an issue

Current filtering forces the use of fnmatch.fnmatch for inclusion and exclusion checks.

This has the signature "fnmatch(value, pattern)".

It would be nice if the filtering operations could easily swap in re.search instead, which takes its arguments in the opposite order: "search(pattern, value)".
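The argument-order mismatch can be pictured with a pair of small adapters that put both matchers behind the same (value, pattern) call shape. This is only a sketch: `glob_match` and `regex_match` are hypothetical names for illustration, not part of walkdir.

```python
import fnmatch
import re


def glob_match(value, pattern):
    """fnmatch.fnmatch already takes (value, pattern)."""
    return fnmatch.fnmatch(value, pattern)


def regex_match(value, pattern):
    """re.search takes (pattern, value), so the arguments are swapped here."""
    return re.search(pattern, value) is not None


names = ["setup.py", "README.rst", "test_walk.py"]
py_glob = [n for n in names if glob_match(n, "*.py")]
py_regex = [n for n in names if regex_match(n, r"\.py$")]
```

With both callables sharing one shape, a filtering stage could take either as its match function without caring which pattern language is in use.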

Comments (3)

  1. Nick Coghlan reporter
    • changed status to open

    In addition to supporting regex matching, the glob.rglob() discussion on the Python tracker suggested it would also be good to support matching against the full path of an entry, rather than just the final path segment.

    Current plan:

    Create a "WalkFilter" ABC.

        import abc

        class WalkFilter(abc.ABC):
            """Handle path pattern matching for file and directory walk filtering"""
            def __init__(self, pattern):
                self.pattern = pattern
            @abc.abstractmethod
            def match(self, dirpath, entry):
                """Does the supplied directory entry meet the filter criteria?"""
                pass
            def filter(self, dirpath, entries):
                """Filter a list of directory entries to only those that meet the filter criteria"""
                _match = self.match  # bind the method once, outside the loop
                for entry in entries:
                    if _match(dirpath, entry):
                        yield entry
    

    Define 4 concrete subclasses:

        import fnmatch
        import re
    
        class NameFilter(WalkFilter):
            def match(self, dirpath, entry):
                return fnmatch.fnmatch(entry, self.pattern)
            def filter(self, dirpath, entries):
                # fnmatch.filter checks the whole list in a single call
                return fnmatch.filter(entries, self.pattern)
    
        class PathFilter(WalkFilter):
            def match(self, dirpath, entry):
                # crib match algorithm from glob.glob()
                raise NotImplementedError
            def filter(self, dirpath, entries):
                # More efficient version that lifts any invariants out of the loop
                raise NotImplementedError
    
        class NameRegex(WalkFilter):
            # Like NameFilter, but compiling the pattern as a regular expression
            ...
    
        class PathRegex(WalkFilter):
            # Like PathFilter, but compiling the pattern as a regular expression
            ...
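To make the two regex variants more concrete, here is one way they might be filled in. This is a sketch under assumptions, not the planned implementation: it assumes "full path" means `os.path.join(dirpath, entry)` and that `re.search` semantics (match anywhere in the string) are wanted. The base class is repeated so the example runs on its own.

```python
import abc
import os.path
import re


class WalkFilter(abc.ABC):
    """Base class as outlined above, repeated to keep the sketch self-contained."""

    def __init__(self, pattern):
        self.pattern = pattern

    @abc.abstractmethod
    def match(self, dirpath, entry):
        ...

    def filter(self, dirpath, entries):
        _match = self.match
        for entry in entries:
            if _match(dirpath, entry):
                yield entry


class NameRegex(WalkFilter):
    """Match the regex against the final path segment only."""

    def __init__(self, pattern):
        # Assumption: compile once up front so each match reuses the compiled pattern
        super().__init__(re.compile(pattern))

    def match(self, dirpath, entry):
        return self.pattern.search(entry) is not None


class PathRegex(WalkFilter):
    """Match the regex against the joined directory path and entry name."""

    def __init__(self, pattern):
        super().__init__(re.compile(pattern))

    def match(self, dirpath, entry):
        # Assumption: the "full path" is dirpath joined with the entry name
        return self.pattern.search(os.path.join(dirpath, entry)) is not None


by_name = list(NameRegex(r"\.py$").filter("src", ["a.py", "b.txt"]))
by_path = list(PathRegex(r"^src.*\.py$").filter("src", ["a.py", "b.txt"]))
other_dir = list(PathRegex(r"^src.*\.py$").filter("docs", ["a.py"]))
```

Here `by_name` and `by_path` both keep only `a.py`, while `other_dir` is empty because the path pattern is anchored to the `src` directory.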
    

    Accept an optional filter argument (defaulting to NameFilter) in:

    • include_files, exclude_files, include_dirs, exclude_dirs
    • filtered_walk (this will then apply to *all* pipeline stages that accept a filter definition. To use different filters for different stages, construct the pipeline manually instead of using the convenience functions)
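As a rough illustration of the optional filter argument, the sketch below shows how an inclusion stage could build its filters from a caller-supplied class. Everything here is hypothetical: `include_files_sketch` and the `filter_type` keyword are illustrative names and do not reflect walkdir's actual signatures, and the `NameFilter` shown is a cut-down stand-in for the class proposed above.

```python
import fnmatch


class NameFilter:
    """Cut-down stand-in for the proposed NameFilter (glob match on entry names)."""

    def __init__(self, pattern):
        self.pattern = pattern

    def filter(self, dirpath, entries):
        return fnmatch.filter(entries, self.pattern)


def include_files_sketch(walk_iter, *patterns, filter_type=NameFilter):
    """Keep only the files accepted by at least one pattern.

    filter_type is the hypothetical optional argument: any class that is
    constructed from a pattern and exposes filter(dirpath, entries).
    """
    filters = [filter_type(pattern) for pattern in patterns]
    for dirpath, subdirs, files in walk_iter:
        keep = set()
        for f in filters:
            keep.update(f.filter(dirpath, files))
        # Rewrite the list in place, preserving the original order
        files[:] = [name for name in files if name in keep]
        yield dirpath, subdirs, files


fake_walk = [("top", ["sub"], ["a.py", "b.txt", "c.rst"])]
result = list(include_files_sketch(fake_walk, "*.py", "*.rst"))
```

Passing a different class as `filter_type` would switch the stage to regex or full-path matching without changing the stage itself.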
  2. Nick Coghlan reporter

    While the above is my current preferred approach, I also considered an alternative strategy that makes the choice between path and basename filtering a boolean flag, and uses separate filter definitions only to change the underlying match technique.

    This ended up feeling a lot clumsier, since only the path matching or the base name matching was likely to be used for any given filter instance. You pass in different kinds of patterns for filtering base names than you do for filtering paths, so it doesn't make sense to invoke both sets of methods against a common pattern definition.
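The clumsiness of the flag-based alternative is easier to see in code. The sketch below is a hypothetical rendering of that rejected design (`GlobFilter`, `RegexFilter` and `match_path` are illustrative names only), assuming the full path is `os.path.join(dirpath, entry)`.

```python
import fnmatch
import os.path
import re


class GlobFilter:
    """Rejected design: one class per match technique, path vs. name chosen by a flag."""

    def __init__(self, pattern, match_path=False):
        self.pattern = pattern
        self.match_path = match_path

    def _target(self, dirpath, entry):
        # The flag decides what the pattern is compared against
        return os.path.join(dirpath, entry) if self.match_path else entry

    def match(self, dirpath, entry):
        return fnmatch.fnmatch(self._target(dirpath, entry), self.pattern)


class RegexFilter(GlobFilter):
    """Same flag handling, but using re.search as the match technique."""

    def match(self, dirpath, entry):
        return re.search(self.pattern, self._target(dirpath, entry)) is not None


name_hit = RegexFilter(r"^a\.py$").match("src", "a.py")
path_hit = RegexFilter(r"^a\.py$", match_path=True).match("src", "a.py")
```

The same pattern matches in name mode but not in path mode, so a pattern is effectively written for one mode only and the flag adds a branch that a given instance never takes the other way.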
