Issue #3 open

Missing nodes in nodes_callback

Alex Plugaru
created an issue

I've got a pbf (http://download.geofabrik.de/osm/europe/moldova.osm.pbf) which has 'ways' that contain specific 'nodes' I'm interested in. I found that the parser misses a lot of nodes when parsing this file (and maybe others too). Here is a piece of code:

{{{
#!python

from imposm.parser import OSMParser

def parse_nodes(nodes):
    for node in nodes:
        if node[0] in node_ids:  # node_ids is a precomputed list of node ids
            pass  # do something

OSMParser(concurrency=4, nodes_callback=parse_nodes).parse(args.src)
}}}

The node_ids list in this example contains valid node ids. I verified this by converting the pbf file to an osm file with osmosis and checking that a few of the node ids exist in the osm file - they do. The pbf file is valid and contains the nodes that are not parsed.
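For anyone reproducing this, the callback can be exercised on its own, without a pbf file. The following is only a minimal sketch: it assumes the node tuple layout `(osm_id, tags, (lon, lat))` described in the imposm.parser documentation, and the IDs, the sample batch, and the `matched` list are made up for illustration. `node_ids` is a set here so that membership tests stay cheap on large files.

```python
# Hypothetical IDs of interest; a set gives constant-time membership tests.
node_ids = {1001, 1003}

# Collects the nodes the callback recognised (illustrative only).
matched = []

def parse_nodes(nodes):
    # Each item is assumed to be (osm_id, tags, (lon, lat)).
    for osm_id, tags, coord in nodes:
        if osm_id in node_ids:
            matched.append((osm_id, tags, coord))

# Made-up batch standing in for what the parser would pass to the callback.
sample_batch = [
    (1001, {"place": "town"}, (28.83, 47.02)),
    (1002, {"highway": "bus_stop"}, (28.85, 47.03)),
    (1003, {"amenity": "cafe"}, (28.86, 47.01)),
]
parse_nodes(sample_batch)
print(sorted(osm_id for osm_id, _, _ in matched))
```

With a real file, the same function would be passed as `nodes_callback` to `OSMParser`, as in the snippet above.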

Comments (7)

  1. Alex Plugaru reporter

    I think I found the reason why this is happening. In PBFParser class it says: Nodes and relations without tags will not passed to the callback.

    Is there any particular reason for this? As far as I know nodes can have no metadata - that's normal in case of country boundaries.

  2. Oliver Tonnhofer repo owner

    That is by design (see http://imposm.org/docs/imposm.parser/latest/concepts.html#types). You should use coords_callback if you only need coordinates.

    I see that you forked imposm.parser and changed it to include empty nodes. What about making it optional? OSMParser(..., include_empty_nodes=True) (with False as the default). I would pull your changes if you also add this to the XML parser (the change should be similar) and if you add a test case for it (see imposm/parser/tests; run with nosetests (pip install nose)).

  3. Alex Plugaru reporter

    I think that a flag making this optional would be too much. Imho it would clutter the API. I have no problem with using coords_callback. I think I'll just do that for now.

    Some thoughts on the library: Using the nodes callback I would expect (without reading the documentation) that it will pass all the nodes to the callback regardless. I would expect the task of filtering out what I do not need to fall on me, not the library. Filtering in the library makes it too inflexible. A library should do simple things, have a simple API and let the user decide what to do with it. Too much filtering can be bad and confusing - at least in my world view.

  4. Oliver Tonnhofer repo owner

    I like the callback design from imposm.parser and that it works transparently on XML and PBF files and even with multiple processes, but yes, the API could be even simpler.

    imposm.parser was extracted from imposm and so there are parts in the API that were added to make imposm as fast as possible (like filtering and marshaling in the parser process). Maybe it's time for a new general API, only with nodes, ways and relations callbacks? There are people who are also interested in metadata (version, timestamp, user), so it would be a good idea to add that too in this step.

  5. Alex Plugaru reporter

    I think the callback design is great. And yes, I'm interested in the metadata as well. Also, would you happen to know if I can get an OSM writer without installing the whole imposm package? I really don't need the whole PostgreSQL thing, but I do need to generate OSM files.

  6. Oliver Tonnhofer repo owner

    Maybe we should create a Parser class with the three callbacks, deprecate the current OSMParser and "copy" the old API to ImposmParser?

    The writer in imposm does not write OSM XML, but it writes into PostgreSQL or other DB backends.

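The coords_callback route suggested in comment 2 could look roughly like the following. This is a minimal sketch rather than the library's reference usage: it assumes coords arrive as `(osm_id, lon, lat)` tuples, as described in the imposm.parser documentation, and `wanted_ids`, `found`, `collect_coords`, and `run` are hypothetical names chosen for illustration. The import is deferred into `run` so the callback can be tried on made-up data without the library installed.

```python
# Hypothetical set of node IDs referenced by the ways of interest.
wanted_ids = {2001, 2003}

# Maps osm_id -> (lon, lat) for every wanted node seen so far.
found = {}

def collect_coords(coords):
    # Each item is assumed to be (osm_id, lon, lat); tags are not included.
    for osm_id, lon, lat in coords:
        if osm_id in wanted_ids:
            found[osm_id] = (lon, lat)

def run(path):
    # Deferred import so the callback can be tested without imposm.parser.
    from imposm.parser import OSMParser
    OSMParser(concurrency=4, coords_callback=collect_coords).parse(path)

# Made-up batch standing in for what the parser would pass to the callback.
collect_coords([
    (2001, 28.83, 47.02),
    (2002, 28.85, 47.03),
    (2003, 28.86, 47.01),
])
print(sorted(found))
```

Calling `run("moldova.osm.pbf")` (the path is illustrative) would feed the same callback from a real file. Because coords carry no tags, this covers untagged nodes that nodes_callback skips by design.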