4.7 Importing non-tabular data

The default adapters that come with Databridge are designed to read tabular data, such as CSV files or SQL query result sets. Many data structures are not tabular, however, and the default adapters cannot read them. For these, you need a custom adapter.

In this tutorial, we walk through the steps required to create and use an adapter that can read Portable Game Notation (PGN) files. We will use Java, but you can use any JVM language.


The PGN file format

PGN is a non-tabular plain text format for recording chess games (both the moves and related data). Here is a sample chess game described in PGN format:

#!text
[Event "F/S Return Match"]
[Site "Belgrade, Serbia Yugoslavia|JUG"]
[Date "1992.11.04"]
[Round "29"]
[White "Fischer, Robert J."]
[Black "Spassky, Boris V."]
[WhiteElo "2645"]
[BlackElo "2622"]
[Result "1/2-1/2"]

1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 {This opening is called the Ruy Lopez.}
4. Ba4 Nf6 5. O-O Be7 6. Re1 b5 7. Bb3 d6 8. c3 O-O 9. h3 Nb8 10. d4 Nbd7
11. c4 c6 12. cxb5 axb5 13. Nc3 Bb7 14. Bg5 b4 15. Nb1 h6 16. Bh4 c5 17. dxe5
Nxe4 18. Bxe7 Qxe7 19. exd6 Qf6 20. Nbd2 Nxd6 21. Nc4 Nxc4 22. Bxc4 Nb6
23. Ne5 Rae8 24. Bxf7+ Rxf7 25. Nxf7 Rxe1+ 26. Qxe1 Kxf7 27. Qe3 Qg5 28. Qxg5
hxg5 29. b3 Ke6 30. a3 Kd6 31. axb4 cxb4 32. Ra5 Nd5 33. f3 Bc8 34. Kf2 Bf5
35. Ra7 g6 36. Ra6+ Kc5 37. Ke1 Nf4 38. g3 Nxh3 39. Kd2 Kb5 40. Rd6 Kc5 41. Ra6
Nf2 42. g4 Bd3 43. Re6 1/2-1/2
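
The sample shows the two parts of a PGN game: a block of tag pairs of the form [Name "Value"], followed by the movetext. As a minimal sketch of how the tag pairs could be recognised, the class below collects them into a map with a regular expression. PgnTagPairs is a hypothetical helper invented for this illustration and is not part of Databridge; it also assumes each tag pair sits on its own line, as in the sample.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; it is not part of Databridge.
final class PgnTagPairs {

    // Matches a tag pair line such as: [White "Fischer, Robert J."]
    private static final Pattern TAG_PAIR =
            Pattern.compile("\\[(\\w+)\\s+\"(.*)\"\\]");

    // Collects every tag pair found in the given lines, in file order.
    // Lines that are not tag pairs (blank lines, movetext) are ignored.
    static Map<String, String> parse(Iterable<String> lines) {
        Map<String, String> tags = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = TAG_PAIR.matcher(line.trim());
            if (m.matches()) {
                tags.put(m.group(1), m.group(2)); // tag name -> quoted value
            }
        }
        return tags;
    }
}
```

Because the movetext lines do not match the pattern, they are left for separate handling.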

The PGNRecord

We will create a simple PGNRecord class to represent this data structure. To be usable by a custom adapter, it must implement the AbstractRecord interface and return its properties as a map.

#!java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class PGNRecord implements AbstractRecord {

    String uuid     = UUID.randomUUID().toString();
    // provide a default value for any fields that don't exist in the input file
    String white    = "Unknown";
    String black    = "Unknown";
    String whiteELO = "Unknown";
    String blackELO = "Unknown";
    String date     = "Unknown";
    String venue    = "Unknown";
    String event    = "Unknown";
    String result   = "Unknown";
    String moves    = "Unknown";

    // We must implement a method to return the data object's properties as a map.
    public Map<String, Object> properties() {

        Map<String, Object> properties = new HashMap<>();

        properties.put("uuid",      uuid);
        properties.put("white",     white);
        properties.put("black",     black);
        properties.put("whiteELO",  whiteELO);
        properties.put("blackELO",  blackELO);
        properties.put("date",      date);
        properties.put("venue",     venue);
        properties.put("event",     event);
        properties.put("result",    result);
        properties.put("moves",     moves);

        return properties;
    }
}

The PGNAdapter

Custom adapters that process file-based resources should extend TextFileAdapter. They must define a constructor that takes a Resource argument and implement a single callback method, parse. Here is a skeleton implementation:

#!java
public class PGNAdapter extends TextFileAdapter {

    public PGNAdapter(Resource resource) {
        super(resource);
    }

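    @Override
    public AbstractRecord parse(String data) {
        // placeholder so the skeleton compiles; the real logic is developed below
        return new SkipRecord();
    }
}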

In the next sections we'll discuss the implementation of the callback.

Implementing the parse() callback

The callback supplies us with a line of data from the file and requires us to return an AbstractRecord. We want to create a new PGNRecord whenever the current line from the file starts with '[Event'.

If we get any line other than the one we expect at the start of a PGN record, we want to ignore it. To ignore a line, return an instance of SkipRecord.

#!java
@Override
public AbstractRecord parse(String data) {
    if (data.startsWith("[Event")) {
        return parsePGNRecord(data); // create the PGN Record
    } else {
        return new SkipRecord(); // ignore anything else
    }
}
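
To make the contract of the callback concrete, here is a toy sketch of a loop that hands each line to parse and keeps only the real records. Every type in it (ToyRecord, ToySkip, ToyGame, ToyDriver) is a stand-in invented for this illustration. How Databridge itself drives an adapter is not shown in this tutorial, so the loop only assumes that parse is called once per unread line and that skipped results are discarded.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-ins, NOT the Databridge types: they only mimic the contract described above.
interface ToyRecord {}

final class ToySkip implements ToyRecord {}

final class ToyGame implements ToyRecord {
    final String firstLine;
    ToyGame(String firstLine) { this.firstLine = firstLine; }
}

final class ToyDriver {

    // Mimics parse(): start a record on "[Event", skip everything else.
    static ToyRecord parse(String data) {
        if (data.startsWith("[Event")) {
            return new ToyGame(data);
        }
        return new ToySkip();
    }

    // Hypothetical driver loop: pass each line to parse() and drop the skipped ones.
    static List<ToyGame> importAll(List<String> lines) {
        List<ToyGame> games = new ArrayList<>();
        for (String line : lines) {
            ToyRecord record = parse(line);
            if (record instanceof ToyGame) {
                games.add((ToyGame) record);
            }
        }
        return games;
    }
}
```

In this toy version every line that does not start a game is skipped, so the tag pairs and moves that belong to a game are lost.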

A PGNRecord contains data spanning multiple lines in the file, so we need a way to request subsequent lines once we have received the starting [Event line. The TextFileAdapter class exposes a method for this, readLine(), which returns the next unread line from the file.

The basic principle is therefore simple: keep calling readLine() until all of the data for the current PGNRecord has been consumed, then stop and return the completed record.

The example below is very simple, but it illustrates the idea.

#!java
    // create a PGNRecord object from a sequence of lines in the input file
    private PGNRecord parsePGNRecord(String line) {

        PGNRecord game = new PGNRecord();
        game.event = getFieldValue(line, "[Event"); // the first line was given to us - we need to fetch the rest.

        while ((line = readLine()) != null) {
            if (line.startsWith("[Site ")) {
                game.venue = getFieldValue(line, "[Site");
            } else if (line.startsWith("[White ")) {
                game.white = getFieldValue(line, "[White");
            } else if (line.startsWith("[Black ")) {
                game.black = getFieldValue(line, "[Black");
            } else if (line.startsWith("[Date ")) {
                game.date = getFieldValue(line, "[Date");
            } else if (line.startsWith("[Result ")) {
                game.result = getFieldValue(line, "[Result");
            } else if (line.startsWith("[WhiteElo ")) {
                game.whiteELO = getFieldValue(line, "[WhiteElo");
            } else if (line.startsWith("[BlackElo ")) {
                game.blackELO = getFieldValue(line, "[BlackElo");
            } else if (line.startsWith("1.")) {
                game.moves = getMovesField(game, line);
                break;
            }
        }
        return game;
    }
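
The method above relies on two helpers, getFieldValue and getMovesField, which are not shown here. The sketch below is one possible way to write such helpers and is not the sample code's implementation. PgnFieldHelpers and collectMoves are hypothetical names, the line source is passed in as a Supplier so that the sketch runs on its own, and we assume that a blank line or the end of the file marks the end of the movetext.

```java
import java.util.function.Supplier;

// Hypothetical helpers for illustration; they are not the sample code's implementation.
final class PgnFieldHelpers {

    // Returns the quoted value of a tag pair line, for example
    // getFieldValue("[Site \"Belgrade\"]", "[Site") gives Belgrade.
    // Falls back to "Unknown" when the line has no quoted value.
    static String getFieldValue(String line, String tag) {
        if (!line.startsWith(tag)) {
            return "Unknown";
        }
        String rest = line.substring(tag.length()).trim(); // drop the tag prefix
        int open = rest.indexOf('"');
        int close = rest.lastIndexOf('"');
        if (open < 0 || close <= open) {
            return "Unknown";
        }
        return rest.substring(open + 1, close);
    }

    // Joins the movetext into one string. firstLine is the line starting with "1.";
    // readLine supplies the following lines and returns null at end of file.
    // Assumption: a blank line (or end of file) ends the movetext of a game.
    static String collectMoves(String firstLine, Supplier<String> readLine) {
        StringBuilder moves = new StringBuilder(firstLine.trim());
        String line;
        while ((line = readLine.get()) != null && !line.trim().isEmpty()) {
            moves.append(' ').append(line.trim());
        }
        return moves.toString();
    }
}
```

Inside the adapter, the supplier would simply be the adapter's own readLine method, so the helper consumes exactly the lines that belong to the current game.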

Installing a custom adapter

To install a custom adapter, compile the adapter classes and create a jar file from them, then copy the jar file to the lib folder of your Databridge installation.

Using a custom adapter

To use the adapter you must specify its full class name in the appropriate resource file, as in the following example:

#!json
{
  "name"     : "anand-resource",
  "adapter"  : "com.graphaware.neo4j.databridge.adapters.custom.pgn.PGNAdapter",
  "resource" : "demo/chess/resources/anand.pgn"
}

Sample code

The Databridge demo/chess example imports all the championship games played by Indian GM Viswanathan Anand between 1998 and 2005 from a PGN game file. The sample code for the custom PGNAdapter is here.
