Knowledge Expansion
===================

The knowledge expansion algorithm is the inference engine of ProbKB, a PROBabilistic Knowledge Base system. It applies first-order inference rules to infer implicit knowledge from existing knowledge bases. ProbKB models knowledge bases as database relations, so knowledge expansion can be expressed as a few joins among the facts and rules tables, applying the rules *in batches*. This approach achieves a 237× speedup on the [TextRunner]( knowledge base over the state-of-the-art system, [Tuffy]( Furthermore, ProbKB runs on massively parallel processing (MPP) databases, including Pivotal Greenplum and Apache HAWQ, where the queries are executed in parallel. ProbKB uses semantic constraints to improve both quality and efficiency during the expansion task; applying these constraints improves the precision of inferred facts by 0.61.

This repository provides the knowledge expansion software and the datasets we use in our experiments.

License
-------

ProbKB is released under the [BSD license]( If you use ProbKB in your research, please cite our paper:

```
@inproceedings{chen2014knowledge,
  title={Knowledge expansion over probabilistic knowledge bases},
  author={Chen, Yang and Wang, Daisy Zhe},
  booktitle={Proceedings of the 2014 ACM SIGMOD international conference on Management of data},
  pages={649--660},
  year={2014},
  organization={ACM}
}
```

Quick Start
-----------

To install the knowledge expansion software, you need PostgreSQL. The latest version can be downloaded from <>. After installation, create a database and install the SQL scripts into it. Suppose the database is called `probkb`; then the following scripts install the `probkb` schema into the database, import the data, and create the core `probkb.ground()` and `probkb.groundFactors()` procedures that perform the grounding task.
```
$ createdb probkb
$ psql probkb -f sql/create.sql  # Create the probkb schema and tables.
$ psql probkb -f sql/qc.sql      # Create quality control procedures.
$ psql probkb -f sql/load.sql    # Load the files in CSV format.
$ psql probkb -f sql/ground.sql  # Create grounding procedures.
```

To apply the procedures, first log in to the `probkb` database:

```
$ psql probkb
```

and make the procedure calls:

```
probkb=# SELECT probkb.ground();
probkb=# SELECT probkb.groundFactors();
```

It is also useful to tune the PostgreSQL session for better performance:

```
probkb=# SET work_mem = '8GB';        -- Give joins more working memory.
probkb=# SET enable_mergejoin = OFF;  -- Use hash joins.
```

These queries can be parallelized on MPP databases, e.g., Pivotal Greenplum, achieving better performance depending on the hardware. The installation steps are the same once you have an MPP database installed.

Data
----

This repository contains the following datasets for our experiments:

* A. Fader, S. Soderland, and O. Etzioni. Identifying relations for open information extraction. In EMNLP, 2011.
* S. Schoenmackers, O. Etzioni, D. S. Weld, and J. Davis. Learning first-order horn clauses from web text. In EMNLP, 2010.
* T. Lin, O. Etzioni, et al. Identifying functional relations in web text. In EMNLP, 2010.

We include the original datasets in the `data/` directory and the parsed CSV files in the `csv/` directory.

Acknowledgments
---------------

The [ProbKB project]( is partially supported by NSF IIS Award # 1526753, DARPA under FA8750-12-2-0348-2 (DEFT/CUBISM), and a generous gift from Google. We also thank [Dr. Milenko Petrovic]( and [Dr. Alin Dobra]( for helpful discussions on query optimization.

Contact
-------

If you have any questions about ProbKB, please visit the [project website]( or contact [Yang Chen](, [Dr. Daisy Zhe Wang](, [DSR Lab @ UF](
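Example: Batched, Join-Based Expansion
--------------------------------------

To give a feel for the batched, join-based expansion described above, here is a minimal, self-contained Python sketch. It applies Horn rules of the form `head(x, y) :- body1(x, z), body2(z, y)` to a small set of facts, one batch of rules per round, until a fixpoint is reached. All relation names and the schema below are illustrative only; this is not ProbKB's actual schema or code, which expresses the same joins in SQL over the facts and rules tables.

```python
# Toy sketch of batched rule application as joins (illustrative, not ProbKB code).
# Facts are triples (relation, subject, object).
facts = {
    ("bornIn", "alice", "paris"),
    ("cityOf", "paris", "france"),
}
# Rules: (head, body1, body2) means head(x, y) :- body1(x, z), body2(z, y).
rules = [
    ("nationality", "bornIn", "cityOf"),
]

def expand(facts, rules):
    """One batch: apply every rule via a hash join on the shared variable z."""
    # Index facts by relation, analogous to hashing the facts table.
    by_rel = {}
    for rel, s, o in facts:
        by_rel.setdefault(rel, []).append((s, o))
    new = set()
    for head, b1, b2 in rules:
        for x, z in by_rel.get(b1, []):
            for z2, y in by_rel.get(b2, []):
                if z == z2:  # join condition on the shared variable z
                    new.add((head, x, y))
    return new - facts  # keep only genuinely new facts

# Iterate batches to fixpoint: stop when a round infers nothing new.
while True:
    delta = expand(facts, rules)
    if not delta:
        break
    facts |= delta

print(sorted(facts))
```

Running this infers `nationality(alice, france)` from the two base facts. In ProbKB itself, each such round is a join query between the facts and rules tables, which is what lets an MPP database execute a whole batch of rules in parallel.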