Issue #14 new

Cleaner output / only main packages?

Anonymous created an issue

Hi, I've been trying modulegraph after trying to reinvent the wheel...

However, the output is a bit too detailed to be useful: the graphs produced are too complicated to be rendered intelligibly by "dot".

It would be helpful to have a couple of options:

1) Only output dependencies for the modules the user specifies. Right now modulegraph outputs information about the internal dependencies of numpy, scipy, etc., which are not really useful to the user.

2) Group modules by main packages. Suppose that there are modules A, A.a1, A.a2, A.a2.aa1, B, B.b1, B.b2, and suppose that B.b2 -> A.a1. Rather than see all the details of the subpackages, I would only be interested in the dependency B -> A.
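
Option 1 can be sketched on a plain list of (importer, imported) name pairs. This is only an illustration: the edge list and the helper name `restrict_to_requested` are assumptions made for the example, not part of modulegraph.

```python
def restrict_to_requested(edges, requested):
    """Keep only edges whose importing module belongs to a requested package."""
    def belongs(module):
        # A module belongs to a package if it is the package or one of its submodules.
        return any(module == pkg or module.startswith(pkg + ".") for pkg in requested)

    return [(src, dst) for src, dst in edges if belongs(src)]


# Hypothetical dependency edges, written as (importer, imported).
edges = [
    ("myapp", "numpy"),
    ("myapp.io", "scipy.io"),
    ("numpy", "numpy.core"),
    ("scipy.io", "numpy"),
]
filtered = restrict_to_requested(edges, {"myapp"})
```

With this filter, the internal edges of numpy and scipy are dropped while the edges that start in the user's own package are kept.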

I have tried looking into the code, but there isn't any obvious place to make these modifications...

Comments (3)

  1. Ronald Oussoren repo owner

    You can do some editing of the graph through the interfaces in the altgraph package, but doing what you want will still be a lot of work: you'd have to walk the graph and add dependencies from submodules to their parent, then remove or hide the submodules.

    I'm definitely interested in making modulegraph more useful beyond what I use it for (which is primarily py2app).

    I guess an API to merge the dependencies on/to submodules in a package would be useful, something like:

    mg = modulegraph.ModuleGraph()
    # ... add modules and dependencies
    mg.merge_children('scipy')
    

    "merge_children" would then look for all submodules of scipy, copy edges that start or end in those submodules to the scipy node instead, and hide the submodule nodes.

    The altgraph API should even be rich enough to make it possible to undo the action (by adding an annotation to the new edges), although undo could be hard when multiple packages are merged and the merges are not undone in exactly the reverse order.

    A higher-level API could then be created to merge submodules into top-level packages for interesting subsets of packages (for example, all packages in the stdlib, or all packages in site-packages).
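
    The merge described above can be sketched on a plain dict that maps each module name to the set of names it depends on. This is not modulegraph or altgraph code: the function below is a hypothetical stand-in that only borrows the name "merge_children", it returns a new dict instead of mutating the graph, and it drops the submodule nodes rather than hiding them.

```python
def merge_children(graph, package):
    """Return a copy of graph with every submodule of package folded into it.

    graph maps a module name to the set of module names it depends on.
    """
    prefix = package + "."

    def fold(name):
        # Submodules of the package are represented by the package itself.
        return package if name.startswith(prefix) else name

    merged = {}
    for source, targets in graph.items():
        new_source = fold(source)
        bucket = merged.setdefault(new_source, set())
        for target in targets:
            new_target = fold(target)
            # Skip self-loops, e.g. scipy.io -> scipy.linalg after folding.
            if new_target != new_source:
                bucket.add(new_target)
    return merged


# Hypothetical dependency data for illustration.
graph = {
    "app": {"scipy.io", "numpy"},
    "scipy": {"scipy.linalg"},
    "scipy.io": {"numpy", "scipy.linalg"},
    "scipy.linalg": {"numpy.core"},
    "numpy": set(),
    "numpy.core": set(),
}
merged = merge_children(graph, "scipy")
```

    Because the input dict is left untouched, nothing needs to be undone: the caller simply keeps the original graph around.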

  2. Anonymous

    Hi Ronald,

    Re: undo, generally I think it's better to think of these operations in a functional way (get a new graph from the old graph) or using a "view" approach (get a "view" of the old graph) rather than as operations that can be undone, because undo is hard to get right.

    It seems to me that there would be a simple way to "collapse" the information to only main packages. For example, suppose that modulegraph exposed a method .dependencies() that enumerates the edges:

    m.dependencies()  => [('a', 'b'), ('a.a1', 'b.b2'), ...]
    

    Then I would be able to create the reduced graph using 3 lines:

    for submodule1, submodule2 in m.dependencies():
        # get main module
        main1 = submodule1.split('.')[0] 
        main2 = submodule2.split('.')[0]
    
        new_graph.add_edge(main1, main2)
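
    A self-contained version of the same idea, written in the functional style, might look like the sketch below. It assumes the edges are already available as (importer, imported) name pairs; the function name `collapse_to_top_level` is made up for this example, and edges inside a single top-level package are dropped.

```python
def collapse_to_top_level(edges):
    """Reduce module-level edges to edges between top-level packages."""
    collapsed = set()
    for source, target in edges:
        # The top-level package is the part of the dotted name before the first dot.
        main_source = source.split(".")[0]
        main_target = target.split(".")[0]
        # Dependencies inside one package carry no information at this level.
        if main_source != main_target:
            collapsed.add((main_source, main_target))
    return sorted(collapsed)


# Hypothetical edges between the modules A, A.a1, A.a2, A.a2.aa1, B, B.b1, B.b2.
edges = [
    ("A.a2", "A.a1"),
    ("A.a2.aa1", "A.a2"),
    ("B.b2", "A.a1"),
    ("B", "B.b1"),
]
top_level = collapse_to_top_level(edges)
```

    Using a set also merges duplicate edges, so many submodule-level imports between two packages end up as a single edge.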
    

    More generally, I would try to decouple the collection of dependencies from the presentation of the results.

    For example, it would be very useful to have modulegraph output in a standard format like GraphML.

    If the output were in a standard format, people would be able to process it with different tools and for their specific applications; it is impossible to anticipate all possible uses and all the ways that people want the information visualized.
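
    To give an idea of how little is needed for a basic GraphML export, here is a sketch that uses only the standard library. It assumes the dependencies are available as (importer, imported) name pairs; the function name `edges_to_graphml` is an invention for this example and is not an existing modulegraph or altgraph API.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def edges_to_graphml(edges):
    """Serialize (importer, imported) pairs as a minimal GraphML document."""
    # Emit GraphML as the default namespace instead of an ns0: prefix.
    ET.register_namespace("", GRAPHML_NS)
    root = ET.Element(f"{{{GRAPHML_NS}}}graphml")
    graph = ET.SubElement(
        root, f"{{{GRAPHML_NS}}}graph", id="G", edgedefault="directed"
    )
    # Every module that appears in an edge becomes a node.
    names = sorted({name for edge in edges for name in edge})
    for name in names:
        ET.SubElement(graph, f"{{{GRAPHML_NS}}}node", id=name)
    for index, (source, target) in enumerate(edges):
        ET.SubElement(
            graph,
            f"{{{GRAPHML_NS}}}edge",
            id=f"e{index}",
            source=source,
            target=target,
        )
    return ET.tostring(root, encoding="unicode")


document = edges_to_graphml([("B", "A"), ("B.b2", "A.a1")])
```

    The resulting string can be written to a file and opened in any tool that reads GraphML; node attributes such as file names would need additional key and data elements.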

  3. Ronald Oussoren repo owner

    There is no method "dependencies", but the existing methods "nodes" and "get_edges" from altgraph's ObjectGraph can be used instead:

    for node in m.nodes:
        name = node.name
        for uses in node.get_edges()[0]:
            uses_name = uses.name
    
            main1 = name.split('.')[0]
            main2 = uses_name.split('.')[0]
            ...
    

    The subscript after the call to get_edges is necessary because get_edges returns two iterators: one for outgoing edges (the module's dependencies) and one for incoming edges (the modules that use this module). I wouldn't add this method if I were designing the API today, but I inherited it from an older code base.

    Adding support for GraphML would be fine, and I'd welcome a patch for altgraph that adds such support (that's where the current dot and HTML output formats are implemented as well).

    BTW, the reason I proposed adding a mutating method is that it seems to match the current API style better, but I agree that a functional version would be nicer.
