Issue #17 resolved

How to diagnose a MissingModule

david_cortesi created an issue

Hello again. The current PyInstaller issues warning diagnostics for missing modules. These warnings have 3 pieces of information. I would like to generate the same kinds of warnings, when I find a MissingModule node in a ModuleGraph.

(1) Name of the missing module. This is node.identifier -- no problem.

(2) Name of what imported it. After some experiments it seems I can get this with code similar to the following -- please say if this is approved.

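    import os

    # Compare by class name so the modulegraph node classes need not be imported.
    ntype = type(node).__name__
    if ntype == 'MissingModule':
        print("Missing module " + node.identifier)
        # get_edges() returns (outgoing, incoming) iterators; incoming are the importers.
        _, iter_inc = myModuleGraph.get_edges(node)
        for importer in iter_inc:
            imptype = type(importer).__name__
            impident = importer.identifier
            if imptype == 'Script':
                # For a Script node, show only the file name part of the identifier.
                impident = os.path.basename(impident)
            print("   it was imported by {0} {1}".format(imptype, impident))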

So far, so good. Now the hard part, (3) The kind of import. PyInstaller's warnings tell the user HOW the missing module is imported:

  • "top level" meaning, named in an import or from at the global level.
  • "deferred" meaning, import is inside a function definition
  • "conditional" meaning, import is inside an if statement

Are these bits of info stored in the ModuleGraph in any form? Can I get them out, given an importer-node and an imported-node?

For example, I note that I could test as follows:

    if node.identifier in importer.globalnames:

and also

    if node.identifier in importer.starimports:

Do these tests help me at all?

Thank you for your time.

Comments (15)

  1. Ronald Oussoren repo owner

    The code for getting who imports a missing module looks correct.

    Modulegraph currently does not store the type of import, but adding such functionality would be fine. The "top-level" vs. "deferred" distinction is pretty easy to make, "conditional" requires updates to the bytecode scanner.

    globalnames and starimports aren't useful for your use case (they track global variables and 'from foo import *'). Both have very vague semantics at the moment.

  2. david_cortesi reporter

    Thanks for the info. I will assume that I cannot provide the import-type for now. Hope to see that feature in the future.

    I am not familiar with bytecodes. Just guessing, I suspect when you say top-level vs. deferred is pretty easy, you are referring to where scan_code recurses, around line 1071? Is that where it enters a def'd function? Anyway, thanks.

  3. Ronald Oussoren repo owner

    It is possible to determine if a code object is top level in scan_code. This is more or less the recursion part, but class bodies make it slightly more complicated (iirc the class body has its own code object, but probably should be considered top level).
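As a rough illustration of the recursion described above (a sketch only, not modulegraph's scan_code): the standard `dis` module can list the IMPORT_NAME instructions of a code object, and nested code objects found in `co_consts` can be treated as not top level. The function name `find_imports` is made up for this example, and `dis.get_instructions` needs Python 3.4 or later.

```python
import dis
import types


def find_imports(code, toplevel=True):
    """Yield (module_name, is_toplevel) for every import in a code object.

    Only the code object passed in by the caller counts as top level;
    anything reached through co_consts (functions, lambdas, and also
    class bodies) is reported with is_toplevel=False.
    """
    for instr in dis.get_instructions(code):
        if instr.opname == "IMPORT_NAME":
            yield instr.argval, toplevel
    # Function and class bodies are stored as constants of the enclosing code.
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            for item in find_imports(const, toplevel=False):
                yield item


source = (
    "import os\n"
    "def f():\n"
    "    import json\n"
    "class C:\n"
    "    import re\n"
)
module_code = compile(source, "<example>", "exec")
for name, is_top in find_imports(module_code):
    print(name, "top level" if is_top else "nested")
```

Note that this naive version reports the import in the class body as nested, which is exactly the complication with class bodies mentioned in the comment.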

    Detecting if an import is in the body of an if statement is harder, but should be doable. I'll have to experiment a little to know for sure (probably with py 3.3 because it has a nicer "dis" module)

    I'm at EuroPython next week and should be able to spend time on this there.

  4. david_cortesi reporter

    You have probably thought of this already, but I'll say it: one module could be imported different ways by different importers. So the type of import is not a property of either module, but of the link between them. Can you store a type flag with a link? (edge_data in ObjectGraph seems always to be a string.) Anyway, can edge_data be reached using edge iterators?

  5. Ronald Oussoren repo owner

    The edge data can be an arbitrary object. I'm currently not really using edge data and will probably change it to be a structured object in the future, for adding information such as the type of reference.

  6. david_cortesi reporter

    Any further thoughts on this? It turns out that for PyInstaller we really need the conditional-import detection for several things, not only the error message as I first described. Deferred would be nice but less essential.

  7. Ronald Oussoren repo owner

    I've just added a report for missing modules to py2app and now want to have this feature myself as well.

    There's a couple of things I want to do for this:

    1. modulegraph reports some false positives that can easily be suppressed, such as reporting "collections.defaultdict" as a MissingModule just because a script contains "from collections import defaultdict"

    2. find a way to report conditional imports for missing modules (such as by adding a MissingOptionalModule node class)

    3. add a nicer API for reporting modules imported by a node, and the nodes that imported this module.

    4. add an API for getting a user-friendly name for a node (either adding __str__ or adding a new method to Node)
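To make the false positive in item 1 concrete, here is a minimal sketch of how one could tell a real submodule from a plain attribute for `from package import name`. It is not how modulegraph works: modulegraph analyses code without importing it, whereas this sketch imports the package through `importlib`. The function name `classify_from_import` is invented for the example.

```python
import importlib
import importlib.util


def classify_from_import(package, name):
    """Classify `from package import name` as 'submodule', 'attribute' or 'missing'.

    Assumption: importing `package` is acceptable here, which is not true
    for a static analyser such as modulegraph.
    """
    try:
        # find_spec imports the parent package, then looks for a submodule.
        spec = importlib.util.find_spec(package + "." + name)
    except ModuleNotFoundError:
        spec = None
    if spec is not None:
        return "submodule"
    try:
        module = importlib.import_module(package)
    except ImportError:
        return "missing"
    # Not a submodule: it may still be a class, function or variable.
    return "attribute" if hasattr(module, name) else "missing"


print(classify_from_import("collections", "defaultdict"))
print(classify_from_import("collections", "abc"))
```

With this distinction, "collections.defaultdict" would be recognised as an attribute of an existing package rather than reported as a missing module.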

  8. Ronald Oussoren repo owner

    I'm working on 2. by adding attributes to MissingModule, starting with a 'toplevel' attribute that is True for imports at module scope and False otherwise.

    This attribute will be False for imports in a class body:

        class MyClass:
            import NoSuchModule

    The MissingModule for NoSuchModule is not toplevel. I'm not too happy about that: for reporting missing modules this should IMHO be reported as toplevel, because the class body is evaluated unconditionally on import.

  9. Ronald Oussoren repo owner

    I'm now trying to replace the bytecode scanner by an AST walker for the majority of scanned code. The bytecode scanner will still be used for modules where no .py file is present (e.g. some compiled eggs).

    It should be fairly easy to extract import information from this, and the AST already contains the code structure we're interested in in a nice format. The alternative would be to reconstruct a partial AST from bytecode objects, and that would be fairly fragile code (and could easily depend on implementation details of the current compiler)

  10. Ronald Oussoren repo owner

    use_ast.txt is a very rough initial patch for using the python AST module to find imports.

    The patch does not work properly yet (although it passes more tests than I expected), but does show the direction I'm moving in: graph edges will be labelled with a dictionary with two keys: 'function' and 'conditional'. "Function" is true iff the import is done in a function and "conditional" is true iff the import is done in a conditional block (currently only inside "if" statements, but I'd like to detect try blocks that catch ImportError as well).
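The labelling described here can be sketched with the standard `ast` module. This is an independent illustration, not the contents of use_ast.txt; the class name `ImportFinder` and its attributes are assumptions made for the example. It only handles "if" statements as conditional blocks, and it deliberately leaves class bodies unflagged so that imports there count as module level.

```python
import ast


class ImportFinder(ast.NodeVisitor):
    """Collect imports with 'function' and 'conditional' flags."""

    def __init__(self):
        self.found = []           # list of (module name, flags dict)
        self._in_function = 0     # nesting depth of def/async def
        self._in_conditional = 0  # nesting depth of if statements

    def _record(self, name):
        flags = {
            "function": self._in_function > 0,
            "conditional": self._in_conditional > 0,
        }
        self.found.append((name, flags))

    def visit_Import(self, node):
        for alias in node.names:
            self._record(alias.name)

    def visit_ImportFrom(self, node):
        # Keep leading dots so relative imports stay recognisable.
        self._record("." * node.level + (node.module or ""))

    def visit_FunctionDef(self, node):
        self._in_function += 1
        self.generic_visit(node)
        self._in_function -= 1

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_If(self, node):
        self._in_conditional += 1
        self.generic_visit(node)
        self._in_conditional -= 1


source = """
import a
def f():
    import b
if x:
    import c
class K:
    import d
"""
finder = ImportFinder()
finder.visit(ast.parse(source))
for name, flags in finder.found:
    print(name, flags)
```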

    The bug in this patch should be fairly shallow, I'm pretty sure it is just the calculation of the _safe_import_hook arguments that isn't done properly.

    BTW. When this works there will also be an update to altgraph to easily extract the edge data from an ObjectGraph.

    BTW2. The patch does not include tests, those will be added as well (and given another bug I fixed recently I need to add more(?) tests that validate the graph structure).

  11. Ronald Oussoren repo owner

    I've attached an updated version of the patch that uses an AST walker to find imports. This still doesn't recognise that the two imports in this block are conditional:

        try:
            import foo
        except ImportError:
            import bar

    This gives some false positives (in particular, some unconditional imports in pkg_resources that are actually optional).
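One possible way to recognise such blocks with the `ast` module is sketched below. It is an illustration under stated assumptions, not the attached patch: the helper names are invented, and the set of exception names treated as "catches ImportError" is a guess that a real implementation might choose differently.

```python
import ast

# Assumption: these except clauses (plus a bare "except:") guard an import.
GUARD_NAMES = {"ImportError", "ModuleNotFoundError", "Exception", "BaseException"}


def _catches_import_error(handler):
    """Return True if an except clause would catch a failed import."""
    if handler.type is None:          # bare "except:"
        return True
    if isinstance(handler.type, ast.Tuple):
        candidates = handler.type.elts
    else:
        candidates = [handler.type]
    return any(isinstance(c, ast.Name) and c.id in GUARD_NAMES for c in candidates)


def guarded_imports(source):
    """Return names imported in the body or handlers of a guarded try block."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Try):
            continue
        if not any(_catches_import_error(h) for h in node.handlers):
            continue
        statements = list(node.body)
        for handler in node.handlers:
            statements.extend(handler.body)
        for stmt in statements:
            for sub in ast.walk(stmt):
                if isinstance(sub, ast.Import):
                    names.extend(alias.name for alias in sub.names)
                elif isinstance(sub, ast.ImportFrom):
                    names.append(sub.module or "")
    return names


example = (
    "try:\n"
    "    import foo\n"
    "except ImportError:\n"
    "    import bar\n"
    "import baz\n"
)
print(guarded_imports(example))
```

Nested try blocks would make this simple version report the same import more than once, so a real scanner would need to track nesting rather than re-walk each block.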

    Now that I'm diving into the import system emulation code again I'm increasingly unhappy with the structure of that code, in particular with the fact that modulegraph.ModuleGraph.import_hook raises ImportError, which makes it unnecessarily hard to build a full graph.

    My next step will therefore be to enhance the test suite to make it validate edges as well as specific nodes (for more interesting test cases than what's used now), and then rewrite the import code.

    BTW. The edge data in this patch is now a dictionary with some keys, I'm considering changing this to a NamedTuple (or some other object with attributes) because that leads to cleaner code for users of the feature.

  12. Ronald Oussoren repo owner

    I've implemented something in the current tip of the tree: edges in the graph already had optional data associated with them. That is now used to store a DependencyInfo object that contains information about the kind of import and its location ("import foo.bar" vs "from foo import bar", and flags to indicate imports in functions, if-statements and try-statements).
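For readers who want a picture of per-edge import information, here is a self-contained sketch. It does not use modulegraph or altgraph: `ImportInfo`, its field names, `TinyGraph` and `describe` are all hypothetical stand-ins, and the real DependencyInfo may differ. The mapping onto PyInstaller's "top level", "deferred" and "conditional" wording is likewise only one possible choice.

```python
from collections import namedtuple

# Hypothetical edge payload; the field names are assumptions for this sketch.
ImportInfo = namedtuple("ImportInfo", ["conditional", "function", "tryexcept", "fromlist"])


class TinyGraph:
    """Toy graph that keeps one ImportInfo per (importer, imported) edge."""

    def __init__(self):
        self._edges = {}

    def add_import(self, importer, imported, info):
        self._edges[(importer, imported)] = info

    def importers_of(self, imported):
        """Return (importer, info) pairs for everything importing `imported`."""
        return [(src, info) for (src, dst), info in self._edges.items() if dst == imported]


def describe(info):
    """Map edge flags onto warning wording (conditional wins over deferred)."""
    if info.conditional or info.tryexcept:
        return "conditional"
    if info.function:
        return "deferred"
    return "top level"


graph = TinyGraph()
graph.add_import("script.py", "nosuch", ImportInfo(False, False, False, False))
graph.add_import("helper", "nosuch", ImportInfo(False, True, False, False))
graph.add_import("compat", "nosuch", ImportInfo(False, False, True, True))

for importer, info in graph.importers_of("nosuch"):
    print("nosuch imported by {0} ({1})".format(importer, describe(info)))
```

Because the flags live on the edge rather than on either node, the same missing module can be "top level" for one importer and "conditional" for another.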
