Newsfeed

- And we have a winner! Congratulations to Sven Hertling, Markus Schröder, Christian Jilek and Andreas Dengel for their work on "Top-k Shortest Paths in Directed, Labeled Multigraphs"!
- The evaluation results of the three Challengers that determined the winner will be published shortly.
- The evaluation input data is available online in the <<Downloads>> section!
- The winner of the Challenge will also receive an annual subscription to the Commercial Licence of Blazegraph's graph database! (see the 'Award' section below) 13:26, Friday, 29th January, 2016
- The "Top-k Shortest Paths in large typed RDF Datasets Challenge" has officially started! 13:00, Friday, 15th January, 2016

Top-k Shortest Paths in large typed RDF Datasets Challenge

Winner

Sven Hertling, Markus Schröder, Christian Jilek and Andreas Dengel "Top-k Shortest Paths in Directed, Labeled Multigraphs"

Evaluation Results


Challenger | Task1, part1 (no. of paths) | Task1, part2 (no. of paths) | Task2, part1 (no. of paths) | Task2, part2 (no. of paths) | Paper review
1 | 377 | 450 | 370 | 4201 | 7
2 | 377 | 53008 | 374 | 52664 | 4
3 | 10 | | | | -2

Qualified Challengers to participate in the ESWC 2016 "Top-k Shortest Paths in large typed RDF Datasets Challenge"

The following teams have qualified to participate in the "Top-k Shortest Paths in large typed RDF Datasets Challenge", which will be held from May 29th to June 2nd, 2016 in Anissaras, Crete, Greece:

  1. Sven Hertling, Markus Schröder, Christian Jilek and Andreas Dengel Top-k Shortest Paths in Directed, Labeled Multigraphs

  2. Zohaib Hassan, Mohammad Abdul Qadir, Muhammad Arshad Islam, Umer Shahzad and Nadeem Akhter Modified MinG Algorithm to Find Top-K Shortest Paths from large RDF Graphs

  3. Laurens De Vocht, Ruben Verborgh, Erik Mannens and Rik Van de Walle Using Triple Pattern Fragments To Enable Streaming of Top-k Shortest Paths via the Web

At least one member of each team must register for and attend the ESWC 2016 Conference in order to present the qualified approach.

Overview

In the context of SPARQL, queries are patterns that are matched against an RDF graph. Until recently, it was not possible to create a pattern without specifying the exact path of the underlying graph (in terms of real data and/or variables) that would match against the graph. The advent of SPARQL 1.1 introduced property paths as a new graph matching paradigm that allows the use of the Kleene star * (and its variant +) unary operators to build SPARQL queries that are agnostic of the underlying RDF graph structure. For example, the graph pattern: :Mike (foaf:knows)+ ?person . is a SPARQL query that asks for every person that :Mike knows, directly or through a chain of foaf:knows edges (somebody :Mike knows, somebody known by somebody :Mike knows, and so on).

Property path queries extend the traditional graph pattern matching expressiveness of SPARQL 1.0 with navigational expressiveness.
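
As an illustration, here is a minimal Python sketch of such a query using the rdflib library (the file foaf.ttl and the :Mike prefix binding are hypothetical, not part of the challenge material). Note that it returns only the matching end nodes, not the paths that connect them:

    # Minimal sketch, assuming rdflib is installed and a hypothetical foaf.ttl
    # file with foaf:knows triples is available.
    from rdflib import Graph

    g = Graph()
    g.parse("foaf.ttl", format="turtle")  # hypothetical input file

    # SPARQL 1.1 property path: every person :Mike knows, directly or transitively.
    query = """
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX :     <http://example.org/>
    SELECT ?person WHERE { :Mike foaf:knows+ ?person . }
    """
    for row in g.query(query):
        print(row.person)  # only the reachable persons are returned, not the paths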

The ability to express path patterns that are agnostic of the underlying graph structure is certainly a step forward. Even so, it is impossible to retrieve the actual paths through a property path SPARQL query. In this challenge, we ask for a system capable of returning the actual paths between two URIs in an RDF graph.

Moreover, it is reasonable to assume that an RDF graph contains numerous paths of varying length between two arbitrary nodes. Therefore, another requirement of this challenge is to return the paths ranked according to their length.

Requested system

This challenge revolves around the development and deployment of a system that returns a specific number of ordered paths between two nodes in a given RDF graph. For each of the Tasks of the Challenge (see the following section), the Challengers should develop a system that makes use of the following information:

  1. Start node: The first node of every path that will be returned.
  2. End node: The last node of every path that will be returned.
  3. k: The required number of paths.
  4. ppath expression: A property path expression describing the pattern of the required paths.
  5. Dataset: The RDF graph containing the start node and end node.

The system must respond with a set of paths pertaining to the above information. More specifically, each Challenger should provide k paths between the Start node and the End node within the Dataset, ordered by their length. Such paths should comply with the given ppath expression.

A path is defined as a sequence of nodes and edges within the RDF graph:

(A, P1, U1, P2, U2, P3, U3, ..., Pn-1, Un-1, Pn, B)

where A corresponds to the Start node (i.e., subject), B to the End node (i.e., object), Pi to an edge (i.e., predicate) and Ui to an intermediate node. The path length equals the number of edges it contains (e.g., the length of the above path is n).
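
For concreteness, a minimal Python sketch (illustrative only; the function and variable names are not part of the challenge specification) of representing such a path as an alternating node/edge sequence and computing its length:

    # Minimal sketch: a path is stored as [node, edge, node, ..., node].
    from typing import List

    def path_length(path: List[str]) -> int:
        """Number of edges in a path stored as an alternating node/edge list."""
        assert len(path) >= 1 and len(path) % 2 == 1, "must start and end with a node"
        return (len(path) - 1) // 2

    example = ["A", "P", "u3", "p7", "B"]  # first example path from dataset D1 below
    print(path_length(example))            # -> 2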

After developing a system pertaining to the aforementioned requirements, each Challenger should deploy the system in the context of the Tasks below. Essentially, each Task provides a slightly altered set of input data. The next section outlines the Tasks of the Challenge.

Tasks

The Challenge consists of two tasks (see sections below). Both tasks require the implementation of a system capable of returning an ordered set of paths. Such a system should be able to take into consideration the following five input parameters:

  1. Start node: The first node of every path that will be returned.
  2. End node: The last node of every path that will be returned.
  3. k: The required number of paths.
  4. ppath expression: A property path expression describing the pattern of the required paths.
  5. Dataset: The RDF graph containing the start node and end node.

A practical example is given for each task (see sections below). Such an example serves as a visual representation of the requirements of each task.

Task1 - part1

The first part of the first task requires a certain number (i.e., k) of paths between two nodes (i.e., A and B) of the evaluation dataset (i.e., RDF graph D), ordered by their length. There is no specific pattern such paths should comply with. In the SPARQL dialect, such paths should comply with the following property path expression: !:*.

Practical Example

The image below depicts the example RDF dataset D1. Each node is a subject or object while each edge is a predicate.

example_graph.png

D1 contains exactly 4 paths from A to B:

    1. (A,P,u3,p7,B)
    2. (A,p3,u6,P,B)
    3. (A,p1,u1,p2,u2,p8,B)
    4. (A,P,u3,p4,u4,p5,u5,p6,u3,p7,B)

At this point, it should be mentioned that a path is valid only if it contains unique triples. For example, the path (A,P,u3,p4,u4,p5,u5,p6,u3,p4,u4,p5,u5,p6,u3,p7,B) is not valid, since the triple (u3,p4,u4) appears more than once.

If task1 requires 3 paths between A and B within D1, the expected results should be the top-3 shortest paths from A to B:

1. (A,P,u3,p7,B)
2. (A,p3,u6,P,B)
3. (A,p1,u1,p2,u2,p8,B)

Note that the first two paths have the same length, 2, since they both contain 2 edges. The third path has length 3, so it comes after the first two paths.
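
To make the ordering and the unique-triple rule concrete, here is a minimal Python sketch that enumerates the paths of the toy dataset D1 (its triples are read off the four example paths above) and keeps the k shortest. It is an illustration only, not an approach suitable for the large evaluation graphs:

    # Illustrative sketch for the toy dataset D1 only.
    triples = [
        ("A", "P", "u3"), ("u3", "p7", "B"),
        ("A", "p3", "u6"), ("u6", "P", "B"),
        ("A", "p1", "u1"), ("u1", "p2", "u2"), ("u2", "p8", "B"),
        ("u3", "p4", "u4"), ("u4", "p5", "u5"), ("u5", "p6", "u3"),
    ]

    def top_k_paths(start, end, k):
        out = {}
        for s, p, o in triples:
            out.setdefault(s, []).append((p, o))
        results = []

        def walk(node, path, used):
            if node == end:                 # simplification: stop once the end node is reached
                results.append(path)
                return
            for p, o in out.get(node, []):
                t = (node, p, o)
                if t not in used:           # each triple may appear at most once per path
                    walk(o, path + [p, o], used | {t})

        walk(start, [start], frozenset())
        results.sort(key=lambda p: (len(p) - 1) // 2)   # order by number of edges
        return results[:k]

    for p in top_k_paths("A", "B", 3):
        print(p)   # prints the top-3 shortest paths listed above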

Task1 - part2

The second part of the first task is meant to test the systems implemented by the Challengers in terms of scalability. More specifically, the second part differs from the first part in the number k of requested paths: this time, k is a very large number. Thus, part2 requires a certain number (i.e., k) of paths between two nodes (i.e., A and B) of the evaluation dataset (i.e., RDF graph D), ordered by their length. There is no specific pattern such paths should comply with. In the SPARQL dialect, such paths should comply with the following property path expression: !:*.

Task2 - part1

The first part of the second task differs from the corresponding part of the first task by imposing a specific pattern on the required paths. More specifically, the second task requires a certain number (i.e., l) of paths between two nodes (i.e., A' and B') of the evaluation dataset (i.e., RDF graph D), ordered by their length. Every path should have the edge P either as the outgoing edge of A' or as the incoming edge of B'. In the SPARQL dialect, such paths should comply with the following property path expression: (P/(!:)*)|((!:)*/P).

Practical Example

The image below depicts the example RDF dataset D1. Each node is a subject or object while each edge is a predicate.

example_graph.png

If task2 requires 2 paths between A and B within D1, the expected results should be the top-2 shortest paths from A to B that have P as their first or last edge:

1. (A,P,u3,p7,B)
2. (A,p3,u6,P,B)

If task2 requires 3 paths between A and B within D1, the expected results should be the top-3 shortest paths from A to B that have P as their first or last edge:

1. (A,P,u3,p7,B)
2. (A,p3,u6,P,B)
3. (A,P,u3,p4,u4,p5,u5,p6,u3,p7,B)

Note that the path (A,p1,u1,p2,u2,p8,B) is omitted since P is neither the first nor the last edge of the path, so it does not fit the requirements of the path pattern.
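
Continuing the toy sketch from Task1 (same illustrative triples and top_k_paths helper, not part of the challenge material), the Task2 pattern can be checked with a simple predicate on the first and last edge of each path:

    # Illustrative sketch: keep only paths whose first or last edge is P, then rank.
    def has_p_as_first_or_last_edge(path, p="P"):
        # path is [node, edge, node, ..., node]; edges sit at the odd positions
        return len(path) >= 3 and (path[1] == p or path[-2] == p)

    candidates = top_k_paths("A", "B", k=10)          # all four paths of D1, shortest first
    filtered = [p for p in candidates if has_p_as_first_or_last_edge(p)]
    for p in filtered[:3]:
        print(p)   # the three paths listed above; (A,p1,u1,p2,u2,p8,B) is dropped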

Task2 - part2

The second part of the second task is meant to test the solutions provided by the Challengers in terms of scalability. More specifically, the second part differs from the first part in the number l of requested paths: this time, l is a very large number. Thus, the second part of Task2 requires a certain number (i.e., l) of paths between two nodes (i.e., A' and B') of the evaluation dataset (i.e., RDF graph D), ordered by their length. Every path should have the edge P either as the outgoing edge of A' or as the incoming edge of B'. In the SPARQL dialect, such paths should comply with the following property path expression: (P/(!:)*)|((!:)*/P).

Instructions for Challengers

Challengers should provide both a paper describing their approach and a working system. The working system should be capable of addressing as many of the aforementioned tasks as possible. There are no special requirements for the technological environment of the system. The Challengers may use any infrastructure they see fit (laptops, PCs, software-as-a-service, etc.). More specifically:

  • Initially, Challengers have to declare their participation in the Challenge by sending an email to the Organisers (as soon as you are ready!).
  • Challengers are able to access the training dataset and the corresponding training material (see section below) so that they can test their implementations.
  • A paper must also be submitted by the Challenge papers submission deadline (see the Important Dates section: http://2016.eswc-conferences.org/call-challenges). Papers must be submitted in PDF format, following the style of Springer's Lecture Notes in Computer Science (LNCS) series http://www.springer.com/computer/lncs/lncs+authors, and not exceeding 12 pages in length. All papers will be peer-reviewed. The paper reviewing process will determine the systems that qualify to participate in the event and compete for the highest ranking.
  • As soon as the notification to the Challengers is sent, the Organising Committee will make the evaluation dataset available online (Evaluation dataset). Challengers will have a specific time-frame to load the dataset into their system (see note 1 below).
  • The actual input parameters of each task will be published at a specific date (some days after the "Test data (and other participation tools) published" deadline) and the Challengers will have to upload their results for each task within a specific time-frame.
  • The Challenge will have its session during the ESWC 2016 Conference. The qualified Challengers will be asked to present their work with a poster and/or laptop.
  • The winner of the Challenge as well as the runner-up(s) will be invited to submit a slightly expanded paper as part of the ESWC Satellite Events proceedings (LNCS volume).

Training material

The organising committee provides training material for the Challengers to experiment with their approaches until the official evaluation dataset is published. More specifically, the following information is available:

Task1


Dataset | Start node | End node | k | ppath expression (edge P) | Output data
training dataset | http://dbpedia.org/resource/Felipe_Massa | http://dbpedia.org/resource/Red_Bull | 8 | - | task1_q1_8.txt
training dataset | http://dbpedia.org/resource/Felipe_Massa | http://dbpedia.org/resource/Red_Bull | 344 | - | task1_q1_344.txt
training dataset | http://dbpedia.org/resource/Felipe_Massa | http://dbpedia.org/resource/Red_Bull | 1068 | - | task1_q1_1068.txt
training dataset | http://dbpedia.org/resource/Felipe_Massa | http://dbpedia.org/resource/Red_Bull | 20152 | - | task1_q1_20152.txt
training dataset | http://dbpedia.org/resource/1952_Winter_Olympics | http://dbpedia.org/resource/Elliot_Richardson | 3 | - | task1_q2_3.txt
training dataset | http://dbpedia.org/resource/1952_Winter_Olympics | http://dbpedia.org/resource/Elliot_Richardson | 4 | - | task1_q2_4.txt
training dataset | http://dbpedia.org/resource/1952_Winter_Olympics | http://dbpedia.org/resource/Elliot_Richardson | 79 | - | task1_q2_79.txt
training dataset | http://dbpedia.org/resource/1952_Winter_Olympics | http://dbpedia.org/resource/Elliot_Richardson | 154 | - | task1_q2_154.txt
training dataset | http://dbpedia.org/resource/Karl_W._Hofmann | http://dbpedia.org/resource/Elliot_Richardson | 36 | - | task1_q3_36.txt
training dataset | http://dbpedia.org/resource/Karl_W._Hofmann | http://dbpedia.org/resource/Elliot_Richardson | 336 | - | task1_q3_336.txt
training dataset | http://dbpedia.org/resource/Karl_W._Hofmann | http://dbpedia.org/resource/Elliot_Richardson | 4866 | - | task1_q3_4866.txt
training dataset | http://dbpedia.org/resource/James_K._Polk | http://dbpedia.org/resource/Felix_Grundy | 2 | - | task1_q4_2.txt
training dataset | http://dbpedia.org/resource/James_K._Polk | http://dbpedia.org/resource/Felix_Grundy | 16 | - | task1_q4_16.txt
training dataset | http://dbpedia.org/resource/James_K._Polk | http://dbpedia.org/resource/Felix_Grundy | 250 | - | task1_q4_250.txt
training dataset | http://dbpedia.org/resource/James_K._Polk | http://dbpedia.org/resource/Felix_Grundy | 1906 | - | task1_q4_1906.txt
training dataset | http://dbpedia.org/resource/James_K._Polk | http://dbpedia.org/resource/Felix_Grundy | 20224 | - | task1_q4_20224.txt
training dataset | http://dbpedia.org/resource/James_K._Polk | http://dbpedia.org/resource/Felix_Grundy | 175560 | - | task1_q4_175560.txt

Task2


Dataset | Start node | End node | k | ppath expression (edge P) | Output data
training dataset | http://dbpedia.org/resource/Felipe_Massa | http://dbpedia.org/resource/Red_Bull | 32 | http://dbpedia.org/property/firstWin | task2_q1_32.txt
training dataset | http://dbpedia.org/resource/Felipe_Massa | http://dbpedia.org/resource/Red_Bull | 98 | http://dbpedia.org/property/firstWin | task2_q1_98.txt
training dataset | http://dbpedia.org/resource/Felipe_Massa | http://dbpedia.org/resource/Red_Bull | 1914 | http://dbpedia.org/property/firstWin | task2_q1_1914.txt
training dataset | http://dbpedia.org/resource/Felipe_Massa | http://dbpedia.org/resource/Red_Bull | 16632 | http://dbpedia.org/property/firstWin | task2_q1_16632.txt
training dataset | http://dbpedia.org/resource/Felipe_Massa | http://dbpedia.org/resource/Red_Bull | 212988 | http://dbpedia.org/property/firstWin | task2_q1_212988.txt
training dataset | http://dbpedia.org/resource/1952_Winter_Olympics | http://dbpedia.org/resource/Elliot_Richardson | 3 | http://dbpedia.org/property/after | task2_q2_3.txt
training dataset | http://dbpedia.org/resource/1952_Winter_Olympics | http://dbpedia.org/resource/Elliot_Richardson | 4 | http://dbpedia.org/property/after | task2_q2_4.txt
training dataset | http://dbpedia.org/resource/1952_Winter_Olympics | http://dbpedia.org/resource/Elliot_Richardson | 76 | http://dbpedia.org/property/after | task2_q2_76.txt
training dataset | http://dbpedia.org/resource/1952_Winter_Olympics | http://dbpedia.org/resource/Elliot_Richardson | 151 | http://dbpedia.org/property/after | task2_q2_151.txt
training dataset | http://dbpedia.org/resource/1952_Winter_Olympics | http://dbpedia.org/resource/Elliot_Richardson | 2311 | http://dbpedia.org/property/after | task2_q2_2311.txt
training dataset | http://dbpedia.org/resource/Karl_W._Hofmann | http://dbpedia.org/resource/Elliot_Richardson | 12 | http://dbpedia.org/property/predecessor | task2_q3_12.txt
training dataset | http://dbpedia.org/resource/Karl_W._Hofmann | http://dbpedia.org/resource/Elliot_Richardson | 76 | http://dbpedia.org/property/predecessor | task2_q3_76.txt
training dataset | http://dbpedia.org/resource/Karl_W._Hofmann | http://dbpedia.org/resource/Elliot_Richardson | 1440 | http://dbpedia.org/property/predecessor | task2_q3_1440.txt
training dataset | http://dbpedia.org/resource/Karl_W._Hofmann | http://dbpedia.org/resource/Elliot_Richardson | 8088 | http://dbpedia.org/property/predecessor | task2_q3_8088.txt
training dataset | http://dbpedia.org/resource/James_K._Polk | http://dbpedia.org/resource/Felix_Grundy | 1 | http://dbpedia.org/ontology/president | task2_q4_1.txt
training dataset | http://dbpedia.org/resource/James_K._Polk | http://dbpedia.org/resource/Felix_Grundy | 6 | http://dbpedia.org/ontology/president | task2_q4_6.txt
training dataset | http://dbpedia.org/resource/James_K._Polk | http://dbpedia.org/resource/Felix_Grundy | 72 | http://dbpedia.org/ontology/president | task2_q4_72.txt
training dataset | http://dbpedia.org/resource/James_K._Polk | http://dbpedia.org/resource/Felix_Grundy | 614 | http://dbpedia.org/ontology/president | task2_q4_614.txt
training dataset | http://dbpedia.org/resource/James_K._Polk | http://dbpedia.org/resource/Felix_Grundy | 5483 | http://dbpedia.org/ontology/president | task2_q4_5483.txt
training dataset | http://dbpedia.org/resource/James_K._Polk | http://dbpedia.org/resource/Felix_Grundy | 52649 | http://dbpedia.org/ontology/president | task2_q4_52649.txt
training dataset | http://dbpedia.org/resource/James_K._Polk | http://dbpedia.org/resource/Felix_Grundy | 471199 | http://dbpedia.org/ontology/president | task2_q4_471199.txt

Notice

The training dataset (corresponding to input parameter 1) contains 9,996,907 triples.

The output data comprises 38 files that consist of paths. Each file is named after the pattern task_query_numberOfPaths (i.e., k). For example, the file task2_q2_3.txt refers to a sample query of the second task (see the Tasks section above) containing 3 paths ranked by their length.

Each line of every file is a path represented as a JSON-encoded array of alternating nodes and edges. For example, the following line: ["http://dbpedia.org/resource/1952_Winter_Olympics", "http://dbpedia.org/property/after", "http://dbpedia.org/resource/1956_Winter_Olympics", "http://dbpedia.org/property/after", "http://dbpedia.org/resource/1960_Winter_Olympics", "http://dbpedia.org/property/officiallyOpenedBy", "http://dbpedia.org/resource/Richard_Nixon", "http://dbpedia.org/property/defense", "http://dbpedia.org/resource/Elliot_Richardson"]

corresponds to a path, which starts with node http://dbpedia.org/resource/1952_Winter_Olympics and ends with node http://dbpedia.org/resource/Elliot_Richardson.

The start node reaches node http://dbpedia.org/resource/1956_Winter_Olympics through edge http://dbpedia.org/property/after.

Then, node http://dbpedia.org/resource/1956_Winter_Olympics reaches node http://dbpedia.org/resource/1960_Winter_Olympics through edge http://dbpedia.org/property/after.

Then, node http://dbpedia.org/resource/1960_Winter_Olympics reaches node http://dbpedia.org/resource/Richard_Nixon through edge http://dbpedia.org/property/officiallyOpenedBy.

Finally, node http://dbpedia.org/resource/Richard_Nixon reaches the final node http://dbpedia.org/resource/Elliot_Richardson through edge http://dbpedia.org/property/defense.

The length of the aforementioned path equals 4 (the total number of edges in the path).
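
As a minimal Python sketch (illustrative only; it assumes the training file task2_q2_3.txt mentioned above is in the working directory), each output file can be read line by line and the length of every path recovered as follows:

    # Illustrative sketch: read one of the training output files and print the
    # length (number of edges) of every path it contains.
    import json

    with open("task2_q2_3.txt") as f:
        for line in f:
            path = json.loads(line)        # [node, edge, node, ..., node]
            length = (len(path) - 1) // 2  # edges sit between the nodes
            print(length, path[0], "->", path[-1])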

Assessment

Each Challenger will deploy the system they have created in the context of each task. Moreover, a paper describing the proposed approach must be submitted. The Challengers will be assessed according to the following criteria:

A. Overall paper quality (40%)

The Program Committee will assess each paper based on well-defined criteria including but not limited to:

  • Presentation and organisation of the paper
  • Scientific / technical quality of the paper
  • Technical data of the paper
  • English style, writing quality
  • Figures and tables
  • Innovation of approach
  • Efficiency of approach (e.g., by measuring the complexity of an algorithm)
  • Consumption of resources
  • Scalability

B. System quality (40%)

Upon registration, Challengers will be able to work on and test their implementations against the training dataset. By the time the evaluation dataset is published (see the "Test data (and other participation tools) published" deadline), the Challengers must be ready to load the evaluation dataset into their system. Then, the Organising Committee will announce the input data of each task. The Challengers must accomplish as many tasks as possible and upload the corresponding result sets of each task within a specific time frame. The submitted sets will be compared against each other in terms of the paths they contain. More specifically, the set that contains the highest number of shortest paths will be ranked first, the set that contains the second highest number of shortest paths will be ranked second, and so on. In case of a tie, the tied sets will be compared on the second-shortest paths. If the sets remain tied, they will be compared on the third-shortest paths, and so on. All tasks have equal weight in the "System quality" criterion. Each task will be assessed separately from the others.

C. People's vote (20%)

At the end of the event, all Conference participants will have the opportunity to vote for the best approach. Their vote will decide the winners of the Challenge.

Important Notice

Participants with a qualified system must register for the conference and present their system during the Challenge session. A space for demonstration will be allocated for each participant. Participants should bring an A0 poster and/or use their own laptop. The organizers should be contacted in case of any special requirements.

Organising Committee

  • Ioannis Papadakis, Assistant Professor, Faculty of Informatics and Information Science, Ionian University, member of the organising committee of AImashup Challenge 2013, 2014 - aimashup.org. email: papadakis'at'ionio.gr
  • Michalis Stefanidakis, Assistant Professor, Faculty of Informatics and Information Science, Ionian University, member of the organising committee of AImashup Challenge 2014 - aimashup.org. email: mistral'at'ionio.gr
  • Phivos Mylonas, Assistant Professor, Faculty of Informatics and Information Science, Ionian University. email: fmylonas'at'ionio.gr
  • Brigitte Endres-Niggemeyer, Former professor at Hannover University of Applied Sciences, member of the organising committee of AImashup Challenge 2010 - 2013. email: brigitteen'at'googlemail.com

Program Committee

  • Sherif Sakr, Senior Researcher in the Software Systems Research Group at National ICT Australia (NICTA), ATP lab, Sydney, Australia
  • Emanuele Della Valle, Assistant Professor at the Department of Electronics, Information and Bioengineering of the Politecnico di Milano
  • Heiko Paulheim, Assistant Professor at the University of Mannheim, head of Data and Web Science Group
  • Dimitrios Tsoumakos, Assistant Professor, Faculty of Informatics and Information Science, Ionian University, member of Computing Systems Laboratory, NTUA
  • Guohui Xiao, Assistant professor, Free University of Bozen-Bolzano, member of KRDB Research Centre for Knowledge and Data
  • Axel-Cyrille Ngonga Ngomo, Department of Computer Science at the University of Leipzig, member of Semantic Abstraction Group
  • Olivier Corby, Researcher at Inria & I3S
  • Prof. Dr. Jens Lehmann, Computer Science Institute, University of Bonn, Knowledge Discovery Department, Fraunhofer IAIS

Award

blazegraph.png

The winner of the Challenge will get an annual subscription to the Commercial Licence of Blazegraph's graph database with GPU acceleration.


  1. A note about the datasets: The training and the evaluation datasets correspond to transformations of the 10% dataset and the 100% dataset of the DBpedia SPARQL Benchmark, respectively. More specifically, both datasets will be free of blank nodes and untyped nodes. An untyped node is a node which does not appear as subject in triples adhering to the pattern: "Node rdf:type Type". Scripts will also be available to enable the Challengers to build the datasets. The training dataset consists of 9,996,907 triples, while the evaluation dataset consists of 110,621,287 triples (about 11 times larger).
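
For orientation only, a minimal Python sketch (using rdflib, with hypothetical file names; the official build scripts may interpret the filtering differently) of removing blank nodes and triples that touch untyped nodes:

    # Illustrative sketch only; the official dataset-building scripts may differ.
    from rdflib import Graph, BNode, URIRef, RDF

    g = Graph()
    g.parse("dbpedia_subset.nt", format="nt")   # hypothetical input file

    # A node is "typed" if it appears as the subject of an rdf:type triple.
    typed = set(g.subjects(RDF.type, None))

    clean = Graph()
    for s, p, o in g:
        if isinstance(s, BNode) or isinstance(o, BNode):
            continue                            # drop triples with blank nodes
        if s not in typed:
            continue                            # drop triples whose subject is untyped
        if isinstance(o, URIRef) and o not in typed:
            continue                            # drop triples whose object node is untyped
        clean.add((s, p, o))

    clean.serialize("training_dataset.nt", format="nt")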