


Documentation

The following is comprehensive documentation for version 2.2. For additional details, refer to the Development Roadmap.

Documentation of Ontobuilder v2.1 can be found here: Ontobuilder v2.1 Documentation


Authors: Giovanni Modica, Haggai Roitman, Tomer Sagi, Nimrod Busany

1. Overview

The Ontobuilder project supports research in the Data Integration domain by providing tools grouped under the following 4 frameworks:

In the links above more details are provided for each framework and the related research field it participates in. The rest of this page deals with the various options for using Ontobuilder tools.

OntoBuilder started as a tool (with a user interface) developed in Java. Later, additional access options were required, leading to the current state, where OntoBuilder can be used as a graphical tool, as a jar package, or as a command line tool. The following sections describe how to run OntoBuilder in each of these modes.

1.0. Downloading Ontobuilder

OntoBuilder is composed of 5 separate projects (to control the dependencies between projects and, mainly, to keep the other projects independent of the GUI):

  • ontobuilder.core (Ontology, Thesaurus, Utilities - DOM, Files, Graphs, Network...)
  • ontobuilder.extraction.webform (WebForm to Ontology and HTML)
  • ontobuilder.gui
  • ontobuilder.io (Importers, Exporters, XML...)
  • ontobuilder.matching (1st Line and 2nd Line Match Algorithms, Match Matrix, Statistics...)

A pre-built version of OntoBuilder can be downloaded from the following link: Ontobuilder With GUI

Building ontobuilder from source code

1. Download the Ontobuilder project from: Ontobuilder Source

2. Run build.xml, located in the ontobuilder.build project, using Ant, and select the Build With/Without GUI option in the configuration as desired.

To start the build process in Eclipse, go to the ontobuilder.build project, right-click build.xml, then choose Run As > Ant Build. A JDK must be installed on your machine.

3. Import the following jars into your project:

  • ontobuilder.core (Ontology, Thesaurus, Utilities - DOM, Files, Graphs, Network...)
  • ontobuilder.extraction.webform (WebForm to Ontology and HTML)
  • ontobuilder.io (Importers, Exporters, XML...)
  • ontobuilder.matching (1st Line and 2nd Line Match Algorithms, Match Matrix, Statistics...)
  • ontobuilder.gui (if you chose to build with GUI)

4. Copy into your project’s directory the following three folders from ontobuilder.build/ontobuilder - ‘config’, ‘xsd’ and ‘dtds’.

Note: There are additional jar files in the ontobuilder.build/lib folder. If you run a project from Eclipse and reference the different OntoBuilder projects, you don’t need to do anything; but if you reference the generated OntoBuilder jars, you also need to reference these lib jars.

Important files in the project:

  • ontobuilder.core.config.application.properties
  • ontobuilder.core.config.configuration.xml
  • ontobuilder.core.config.resources.properties
  • ontobuilder.core.dtds.*
  • ontobuilder.core.xsd.*

1.1. Graphical Tool

1.1.1. Installing the OntoBuilder Graphical Tool

Download OntoBuilder from: Ontobuilder With GUI

To run the tool, execute ontobuilder.bat, located in the OntoBuilder directory.

1.1.2. Browsing Features of OntoBuilder

OntoBuilder was designed to work like a web browser. Figure 1 shows the OntoBuilder browser interface. To navigate to a page, simply enter the URL into the address bar (e.g. www.avis.com) and press Enter or click the “Go” button. By default, OntoBuilder uses the HTTP protocol when no protocol is specified, so a URL such as www.avis.com is automatically changed to http://www.avis.com. URLs can also be entered using the common copy/paste commands, either by right-clicking on the address bar or by using keyboard shortcuts; these shortcuts follow the MS Windows standards (e.g. Ctrl-C for copy, Ctrl-V for paste, etc.).

      Figure 1. The OntoBuilder browser interface. 
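The default-protocol rule can be illustrated with a small sketch. `AddressBar.normalize` is a hypothetical helper written for this page, not part of the OntoBuilder API; it assumes that any input already containing a `scheme://` prefix should be left as typed.

```java
// Hypothetical helper mirroring the documented rule: no protocol given -> assume HTTP.
final class AddressBar {
    private AddressBar() {}

    static String normalize(String input) {
        String trimmed = input.trim();
        // A scheme is a letter followed by letters/digits/+.- and then "://".
        if (trimmed.matches("^[A-Za-z][A-Za-z0-9+.-]*://.*")) {
            return trimmed; // already has a protocol, keep it
        }
        return "http://" + trimmed;
    }

    public static void main(String[] args) {
        System.out.println(normalize("www.avis.com"));         // http://www.avis.com
        System.out.println(normalize("https://www.avis.com")); // https://www.avis.com
    }
}
```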

Once the “Go” button is clicked, the HTML page associated with the URL is displayed in the “HTML View” panel. OntoBuilder maintains a history of visited URLs, which can be accessed using a combo box list in the address bar. The user can use the back and forward buttons in the toolbox to navigate the history. The number of entries in the history is limited by an option in the tool options dialog, as shown in figure 2. The history can be cleared (all entries in the history will be deleted) by clicking the “Clear History” button.

      Figure 2. The OntoBuilder browser options. 
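A capped history like the one described above can be sketched as follows. `BoundedHistory` is a hypothetical class, not OntoBuilder's implementation; it assumes that, as in common browsers, visiting a new URL after going back discards the forward entries, and that the oldest entries are dropped once the limit is reached.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical browser-style history with a maximum number of entries.
final class BoundedHistory {
    private final int maxEntries;
    private final List<String> entries = new ArrayList<>();
    private int position = -1; // index of the displayed page, -1 when empty

    BoundedHistory(int maxEntries) {
        if (maxEntries < 1) throw new IllegalArgumentException("maxEntries must be >= 1");
        this.maxEntries = maxEntries;
    }

    void visit(String url) {
        // Discard "forward" entries beyond the current position.
        while (entries.size() > position + 1) entries.remove(entries.size() - 1);
        entries.add(url);
        // Drop the oldest entries once the limit is exceeded.
        while (entries.size() > maxEntries) entries.remove(0);
        position = entries.size() - 1;
    }

    String back() { if (position > 0) position--; return current(); }
    String forward() { if (position < entries.size() - 1) position++; return current(); }
    String current() { return position >= 0 ? entries.get(position) : null; }
    List<String> entries() { return List.copyOf(entries); }
    void clear() { entries.clear(); position = -1; } // the "Clear History" button

    public static void main(String[] args) {
        BoundedHistory history = new BoundedHistory(2);
        history.visit("http://www.avis.com");
        history.visit("http://www.alamo.com");
        history.visit("http://www.nationalcar.com");
        System.out.println(history.entries()); // [http://www.alamo.com, http://www.nationalcar.com]
        System.out.println(history.back());    // http://www.alamo.com
    }
}
```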

Other navigational aspects can also be set in the “Browser” options tab. The “Automatic META navigation” option is for pages containing redirection META tags such as the following:

<META HTTP-EQUIV="Refresh" CONTENT="10; URL=http://www.another.com/">

By checking this option, OntoBuilder will automatically load the URL specified in the CONTENT attribute of the META tag. The connection timeout indicates the amount of time to wait before abandoning a URL connection; specifying -1 sec. makes OntoBuilder use the system default connection timeout. This option is very useful for slow connection links. OntoBuilder can also be directed to use a proxy server for the Internet connection. By specifying a proxy host and port, OntoBuilder will retrieve HTML pages through the proxy instead of a direct connection (the default). This option is very useful when running OntoBuilder behind firewalls. OntoBuilder supports HTTP cookies; however, cookies do not persist outside OntoBuilder wizard sessions. This means that cookies persist while an ontology is being retrieved using the ontology creation wizard, but once the wizard finishes generating the ontology, any cookie information is lost.
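The META refresh, timeout and proxy options map naturally onto standard `java.net` calls. The sketch below is a hypothetical illustration, not OntoBuilder's code: `refreshTarget` extracts the URL from a CONTENT value such as `10; URL=http://www.another.com/`, and `open` creates (without yet connecting) a connection that uses an optional HTTP proxy and treats a negative timeout as "keep the system default".

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.net.URLConnection;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helpers illustrating the "Browser" options; not OntoBuilder's API.
final class BrowserOptionsSketch {
    // Matches CONTENT values such as "10; URL=http://www.another.com/" (URL optionally quoted).
    private static final Pattern REFRESH = Pattern.compile(
            "^\\s*\\d+\\s*[;,]\\s*url\\s*=\\s*['\"]?([^'\"\\s]+)['\"]?\\s*$",
            Pattern.CASE_INSENSITIVE);

    private BrowserOptionsSketch() {}

    // Returns the redirect target of a META refresh CONTENT value, if it has one.
    static Optional<String> refreshTarget(String content) {
        Matcher m = REFRESH.matcher(content);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // openConnection only creates the connection object; nothing is sent yet.
    // timeoutSeconds < 0 keeps the JDK default (0 would mean "no timeout" in java.net).
    // proxyHost == null or empty means a direct connection.
    static URLConnection open(String url, int timeoutSeconds, String proxyHost, int proxyPort)
            throws IOException {
        Proxy proxy = (proxyHost == null || proxyHost.isEmpty())
                ? Proxy.NO_PROXY
                // createUnresolved defers the DNS lookup until a connection is actually made
                : new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(proxyHost, proxyPort));
        URLConnection connection = URI.create(url).toURL().openConnection(proxy);
        if (timeoutSeconds >= 0) {
            connection.setConnectTimeout(timeoutSeconds * 1000);
        }
        return connection;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(refreshTarget("10; URL=http://www.another.com/").orElse("(none)"));
        URLConnection c = open("http://www.avis.com/", 30, null, 0);
        System.out.println(c.getConnectTimeout()); // 30000
    }
}
```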

1.1.3. Generating Ontologies

Once the web page from which we want to extract the ontology is loaded in OntoBuilder, we can launch the “Ontology Creation Wizard” by selecting the appropriate submenu command under the “Ontology” menu, by clicking the appropriate icon in the application toolbox, or by using the hot-key Ctrl-W. To show how the wizard works, we will build a multi-page ontology (an ontology that is spread across multiple pages) for the Avis.com web site. The first step of the wizard is shown in figure 3.

      Figure 3. The first step of the ontology wizard.

The ontology title defaults to the title of the HTML page, and the ontology name defaults to the host from which the HTML page is retrieved. Clicking the “Next” button opens the “Form Selection” dialog, as shown in figure 4. In this dialog OntoBuilder shows all the HTML forms of the page along with their input elements. Since only one form can be submitted at a time while browsing a web page, the user is required to select the form he/she wants to submit from the forms listed under the “<form>” node in the “HTML Elements” panel on the upper left. Notice that this panel shows a hierarchical view of all the ontological structures of the HTML page. Clicking on a node in the “HTML Elements” panel shows all the attributes (default value, label, etc.) of the represented element in the “Properties” panel in the lower left. It is worth noting that for HTML frame pages, the FORM elements are located under the “<frame>” node in the “HTML Elements” panel.


      Figure 4. The “Form Selection” wizard dialog
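The two wizard defaults mentioned above can be reproduced with standard Java. This is a hypothetical sketch, assuming the title is taken from the page's `<title>` element and the name is the URL's host part; OntoBuilder's own extraction code may differ.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the wizard's default title and name; not OntoBuilder's code.
final class OntologyDefaults {
    // Non-greedy match of the first <title>...</title>, spanning line breaks.
    private static final Pattern TITLE =
            Pattern.compile("<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    private OntologyDefaults() {}

    static String defaultTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : "";
    }

    static String defaultName(String pageUrl) {
        return URI.create(pageUrl).getHost();
    }

    public static void main(String[] args) {
        String url = "https://www.avis.com/car-rental/reservation/time-place.ac";
        System.out.println(defaultName(url)); // www.avis.com
        System.out.println(defaultTitle("<html><head><title> Example page </title></head></html>"));
    }
}
```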

The “Form Preview” panel is where the user enters the values required for form submission. To determine which fields are required, we suggest simulating the process in an Internet browser such as MS Internet Explorer or Netscape Navigator. Figure 5 shows the minimum required values for our Avis.com example.


     Figure 5. The reservation process in Avis.com

The same process must be simulated in OntoBuilder. Figure 6 shows the equivalent reservation in OntoBuilder. The only difference is that OntoBuilder does not submit the form when the form's submission button is clicked, but when the wizard's “Next” button is clicked.

     Figure 6. The reservation process in OntoBuilder

Failing to simulate the process correctly in OntoBuilder will produce unexpected results (most of the time the web site will return a page indicating that some information is missing, or an error page with a brief description). Generally speaking, when using OntoBuilder to retrieve an ontology from a web application, the user must simulate the user interaction as if working in a common browser. Returning to our example, the remaining wizard forms are the same, except that they contain new form elements to be added to the final ontology. The rest of the process is straightforward, so we will just mention how to get to the end. There are four more pages (i.e. three more wizard dialogs) to retrieve the whole ontology, and none of these four pages has required fields; the default values are enough. All the user is required to do is select the appropriate form in the “HTML Elements” panel and simulate the form submission by clicking the “Continue” button in each of the next three pages. The last page allows the user to actually make the car reservation in Avis, as shown in figure 7.

     Figure 7. Last step in the ontology creation wizard

During the wizard operation, the user can use the “Back” button to go back to the previously submitted form in case a mistake was detected. Once finished, the wizard displays the generated ontology in the “Main Panel”, as depicted in figure 8. The generated ontology can be saved in different formats using the appropriate commands in the “File” menu.

     Figure 8. The generated ontology

1.1.4. Entering the Right URLs in OntoBuilder

Sometimes, entering into OntoBuilder the same URL used in a common browser is not the best approach. Due to OntoBuilder's limited HTML rendering capabilities, some URLs may not be displayed correctly (and are thus difficult to navigate). As an example, consider the Alamo.com web site. By entering www.alamo.com in OntoBuilder we will see that it does a bad job of rendering the HTML page (see figure 9), and no ontology will be generated from such a URL. It is worth noting that a bad rendering of the HTML page does not always mean that no useful ontology can be generated; sometimes OntoBuilder has trouble rendering the HTML page, but its source code is retrieved correctly. It is recommended to run the ontology creation wizard even if a bad rendering occurs; in most cases the wizard will identify the form elements even if the HTML rendering didn't work.

     Figure 9. An example of bad HTML rendering in OntoBuilder

In these cases, it is advisable to use an Internet browser to navigate to the page where ontological structures may be identified. In the case of Alamo.com, clicking the “Rates & Reservations” button in the menu makes the browser display the reservation form under the URL http://res.alamo.com. Figure 10 shows that this time OntoBuilder correctly identifies the form elements in the page.

     Figure 10. An example of correct HTML rendering in OntoBuilder

For HTML pages containing frames, it may be useful to “break” the frames using the URL in the frameset. As an example, the http://res.alamo.com URL is an HTML page containing frames (see the empty space in the upper section of the page in figure 10), and its source is the following:

<frameset rows="100,1*" frameborder="NO" border="0" framespacing="0"> 
  <frame name="topFrame" scrolling="NO" noresize src="topnav.asp">
  <frame name="mainFrame" src="http://res8.alamo.com/res/page1.asp">
</frameset>

In this case it may be better to enter the URL of the mainFrame frame (i.e. http://res8.alamo.com/res/page1.asp) in OntoBuilder, thus “breaking” the frame. Although OntoBuilder is designed to support frames (for an example, load the NationalCar.com web site to see three levels of frames correctly handled by OntoBuilder), we suggest following the previous points when dealing with frames. Most common Internet browsers allow you to view the source of an HTML page. In OntoBuilder, you can enable the “Source Panel” tab to see the HTML source of the loaded page. To do this, check the “Source Panel” checkbox in the “View” tab of the OntoBuilder options dialog (figure 11).

     Figure 11. View options for OntoBuilder
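Finding the frame URLs to enter can be automated with a small sketch. `FrameBreaker` is hypothetical (not part of OntoBuilder) and deliberately simple: it uses regular expressions rather than an HTML parser and only reads double-quoted attributes, which is enough for framesets like the Alamo example; relative `src` values are resolved against the frameset page's URL.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper listing the absolute URL of each <frame> in a frameset page.
final class FrameBreaker {
    // "<frame" followed by a word boundary, so "<frameset" is not matched.
    private static final Pattern FRAME = Pattern.compile("<frame\\b([^>]*)>", Pattern.CASE_INSENSITIVE);
    // name="value" pairs; unquoted or single-quoted attributes are ignored (a simplification).
    private static final Pattern ATTR = Pattern.compile("(\\w+)\\s*=\\s*\"([^\"]*)\"");

    private FrameBreaker() {}

    static Map<String, String> frameUrls(String pageUrl, String html) {
        URI base = URI.create(pageUrl);
        // Give a host-only URL such as http://res.alamo.com a "/" path before resolving.
        if (base.getRawPath() == null || base.getRawPath().isEmpty()) base = base.resolve("/");
        Map<String, String> result = new LinkedHashMap<>();
        Matcher frame = FRAME.matcher(html);
        while (frame.find()) {
            String name = null;
            String src = null;
            Matcher attr = ATTR.matcher(frame.group(1));
            while (attr.find()) {
                String key = attr.group(1).toLowerCase();
                if (key.equals("name")) name = attr.group(2);
                else if (key.equals("src")) src = attr.group(2);
            }
            if (src != null) result.put(name != null ? name : src, base.resolve(src).toString());
        }
        return result;
    }

    public static void main(String[] args) {
        String frameset = "<frameset rows=\"100,1*\"><frame name=\"topFrame\" src=\"topnav.asp\">"
                + "<frame name=\"mainFrame\" src=\"http://res8.alamo.com/res/page1.asp\"></frameset>";
        frameUrls("http://res.alamo.com", frameset)
                .forEach((name, url) -> System.out.println(name + " -> " + url));
    }
}
```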

1.1.5. Troubleshooting Ontology Generation

Not all web sites run as smoothly as the Avis.com site. Chances are you will not get a clean ontology on the first run. This is because most web sites are complex and designed using technologies not supported by OntoBuilder. At this time OntoBuilder does not support any scripting at all, while current web sites rely on scripting for validation, automatic field filling, etc. As an example, consider a page that has two fields, Pickup Location and Dropoff Location, each with an assigned hidden field. Using scripting, the web page automatically assigns the keyword same to the hidden field for the Dropoff Location, indicating that the dropoff location will be the same as the pickup location. All this is transparent to the user and also to OntoBuilder. If this page is loaded into OntoBuilder, the keyword won't be assigned to the hidden field, and thus the page won't be submitted appropriately (the web site will return a missing information error message). In this section we explore some advanced techniques that help discover what should actually be submitted when interacting with an HTML page loaded in OntoBuilder.

1.1.5.1. Identifying Errors

The first step is to identify that an error occurred. An error occurs if the information returned by the ontology creation wizard (forms) is different from the information returned by simulating the process in a normal Internet browser. There are two ways to see what the error was: (i) by looking at the “HTML Page” tab in the ontology creation wizard, and (ii) by looking at the lastPageSubmitted.html page in the current directory. Using either method, we can try to identify an error message returned by the web server that hints at what the error is about (such as missing required fields). The difference between the two methods is that the former relies on the HTML rendering capabilities of OntoBuilder, while the latter allows using any browser to view the returned page.

1.1.5.2. Testing for Submission Parameters and Headers

Sometimes it is not obvious why we received an error page. A more advanced technique can be used to check whether the information submitted to a web site is correct. This technique requires some basic knowledge of HTML. Appended to this document is a file called form.jsp. This JSP (JavaServer Pages) page lists all the submitted parameters, along with the header information, when the page is called from a form action attribute (either by GET or POST).

form.jsp must be installed in a web server with support for JSP applications; Tomcat is one such server. You can download Tomcat from http://jakarta.apache.org (it’s a free application). Instructions on how to install the Tomcat server can be found on the same web site under documentation. Once Tomcat is installed and running, we need to install our form.jsp page. The easiest way is to copy the file into the HTML root directory of Tomcat (usually located in C:\Program Files\Apache Tomcat 4.0\webapps\ROOT). Advanced users may want to create a web context for this (see the Tomcat documentation on how to create web contexts). Tomcat listens by default on TCP port 8080, so to call our JSP page we need to specify the following URL: http://localhost:8080/form.jsp.

The next step is to save the HTML page that is causing problems to the local computer, so that we can edit its source code. In Internet Explorer, we can save the page to disk using the menu “File->Save As…”; if it is a frame we want to save, right-click the frame, select “View Source” and save the source to a file on disk. Open the saved page in any text editor and locate the FORM tag of the form that is causing the problem. Change the form's action attribute to the URL of the form.jsp page, as shown in Figure 12.

     Figure 12. Action URL change for Avis.com
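If installing Tomcat is inconvenient, a stand-in for form.jsp can be written with the JDK's built-in `com.sun.net.httpserver` server. This is a hypothetical replacement, not the form.jsp file appended to this document: it listens on the same URL (port 8080, path /form.jsp) and echoes the request method, headers and URL-encoded parameters from the query string (GET) or body (POST); multipart bodies are not decoded.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical form.jsp substitute: prints everything a form submits.
final class FormEcho {
    private FormEcho() {}

    // Parses "a=1&b=2" (application/x-www-form-urlencoded) into an ordered multimap.
    static Map<String, List<String>> parseParams(String encoded) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        if (encoded == null || encoded.isEmpty()) return params;
        for (String pair : encoded.split("&")) {
            if (pair.isEmpty()) continue;
            int eq = pair.indexOf('=');
            String key = URLDecoder.decode(eq >= 0 ? pair.substring(0, eq) : pair, StandardCharsets.UTF_8);
            String value = eq >= 0 ? URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8) : "";
            params.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return params;
    }

    private static void handle(HttpExchange exchange) throws IOException {
        StringBuilder out = new StringBuilder("Method: ").append(exchange.getRequestMethod()).append("\n\nHeaders:\n");
        exchange.getRequestHeaders().forEach((k, v) -> out.append(k).append(": ").append(v).append('\n'));
        String body;
        try (InputStream in = exchange.getRequestBody()) {
            body = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        // GET parameters arrive in the query string, POST parameters in the body.
        Map<String, List<String>> params = parseParams(exchange.getRequestURI().getRawQuery());
        parseParams(body).forEach((k, v) -> params.computeIfAbsent(k, x -> new ArrayList<>()).addAll(v));
        out.append("\nParameters:\n");
        params.forEach((k, v) -> out.append(k).append(" = ").append(v).append('\n'));
        byte[] bytes = out.toString().getBytes(StandardCharsets.UTF_8);
        exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
        exchange.sendResponseHeaders(200, bytes.length);
        try (OutputStream os = exchange.getResponseBody()) {
            os.write(bytes);
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("localhost", 8080), 0);
        server.createContext("/form.jsp", FormEcho::handle);
        server.start();
        System.out.println("Listening on http://localhost:8080/form.jsp");
    }
}
```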

Now load the saved page (with the action URL modification) using a browser, enter the values and submit the form. You should see a page similar to the one shown in figure 13, listing all the values submitted.

     Figure 13. The form.jsp page output

Now you can compare those values with the values presented in the “Last Submission” tab of the ontology creation wizard (see figure 14). Fill in the values not submitted by OntoBuilder with the appropriate values, as indicated by the form.jsp page, and try again. Parameters ending with .x or .y are image parameters indicating the x and y coordinates where the user clicked; these parameters don't need to have the same values, but their presence is required.


     Figure 14. Parameters submitted by OntoBuilder
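The comparison can also be done programmatically once both parameter lists are copied into maps. The sketch below is hypothetical: it assumes single-valued parameters, and the parameter names in `main` are invented for illustration. It reports parameters that OntoBuilder did not send or sent with a different value, and, following the rule above, only checks that `.x`/`.y` image coordinates are present.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical comparison of browser parameters (from form.jsp) with OntoBuilder's ("Last Submission").
final class SubmissionDiff {
    private SubmissionDiff() {}

    // Image-button parameters such as "go.x" / "go.y" only need to be present.
    static boolean isImageCoordinate(String name) {
        return name.endsWith(".x") || name.endsWith(".y");
    }

    static List<String> differences(Map<String, String> browser, Map<String, String> ontoBuilder) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, String> entry : browser.entrySet()) {
            String name = entry.getKey();
            String expected = entry.getValue();
            if (!ontoBuilder.containsKey(name)) {
                problems.add("missing: " + name + " = " + expected);
            } else if (!isImageCoordinate(name) && !Objects.equals(expected, ontoBuilder.get(name))) {
                problems.add("different: " + name + " (browser: " + expected
                        + ", OntoBuilder: " + ontoBuilder.get(name) + ")");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, String> browser = new LinkedHashMap<>();
        browser.put("pickupLocation", "SFO");
        browser.put("dropoffHidden", "same"); // filled in by a script in the real page
        browser.put("go.x", "14");
        Map<String, String> ontoBuilder = new LinkedHashMap<>();
        ontoBuilder.put("pickupLocation", "SFO");
        ontoBuilder.put("go.x", "3");
        differences(browser, ontoBuilder).forEach(System.out::println); // missing: dropoffHidden = same
    }
}
```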
1.1.5.3. When Everything Else Fails

If you have tried every trick outlined in this document and it is still not possible to get the ontology, there is one last thing to try: load the page from disk. Using any Internet browser, save the HTML page to disk, then open it in OntoBuilder using the “Open…” submenu of the “File” menu, and use the ontology creation wizard as explained previously. This technique is also useful for multi-page ontologies. Just use your Internet browser to interact with the web application and save the HTML page to disk at each step. Then use OntoBuilder to open each page individually and generate a partial ontology for each page. Next, save all the partial ontologies as XML files. Finally, open the first XML ontology in any text editor and append all the text between the <terms></terms> tags (not including these tags) of the other partial ontologies after the last </term> tag (and before the </terms> tag) of the first partial ontology. Figure 15 shows how this process works.


     Figure 15. Multi-page ontology merging
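The manual merge can also be scripted with the JDK's DOM API. This is a sketch under stated assumptions: the partial ontologies are well-formed XML, the first `<terms>` element in document order is the top-level term list described above, and external DTDs are not loaded (the `load-external-dtd` feature belongs to the Xerces-based parser bundled with the JDK). The serialized output may omit the original DOCTYPE line.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: appends the <term> elements of other partial ontologies
// to the top-level <terms> element of the first one.
final class OntologyMerger {
    private OntologyMerger() {}

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Do not fetch DTDs referenced by a DOCTYPE declaration.
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Assumption: the first <terms> element in document order is the top-level list.
    private static Element topLevelTerms(Document doc) {
        NodeList list = doc.getElementsByTagName("terms");
        if (list.getLength() == 0) throw new IllegalArgumentException("no <terms> element found");
        return (Element) list.item(0);
    }

    static String merge(String first, String... others) throws Exception {
        Document merged = parse(first);
        Element mergedTerms = topLevelTerms(merged);
        for (String other : others) {
            NodeList children = topLevelTerms(parse(other)).getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    // Deep copy into the merged document, placed after its last existing child.
                    mergedTerms.appendChild(merged.importNode(child, true));
                }
            }
        }
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(merged), new StreamResult(out));
        return out.toString();
    }
}
```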

1.1.6. Creating a New Ontology

To create a new ontology:

1. Choose File -> New Ontology.

2. The following screen will be displayed:

new ontology

2.1. Fill in the fields: Name – the name of the ontology. Title – a description of the ontology.

3. To add new terms to the ontology, right-click on terms -> Add Term…

main panel

4. In the name field, write the name of the term, and in the domain field choose a category for the term (boolean, date, text, number, etc.)

new terms screen

5. Click OK.

6. To view the new term, double-click on terms.

7. To view the term's features or add a new subterm, double-click on the term.

8. To add a subterm, right-click on the subterm option and choose Add Term.

adding a subterm

8.1. Repeat steps 3-4.

9. When finished, save the ontology as a light ontology via:

File -> Save Light Ontology

- Note: *.ont files saved in OntoBuilder versions prior to 2.2 cannot be opened in OntoBuilder 2.2, because objects serialized under Java 1.4 are incompatible with Java 1.6.

10. Done

1.1.7. Matching Two Ontologies and Comparing to an Exact Match

OntoBuilder allows you to match different ontologies using different ontology matchers (Similarity Flooding, Combined Algorithm, Precedence Algorithm, Term&Value Combined Algorithm, Graph Algorithm, Value Algorithm, Term Algorithm). To learn more about the matchers, you may want to read the following papers:

  • http://bitbucket.org/tomers77/ontobuilder/downloads/dis.pdf
  • http://ilpubs.stanford.edu:8090/497/1/2001-25.pdf
  • http://cdn.bitbucket.org/tomers77/smb-service/downloads/SMB_Article_InformationSystems.pdf

To do this:

1. Open the 2 ontologies you want to match. For your convenience, 2 ontologies and an exact match between them are attached here (Windows users: right-click on the link and choose "Save link as").

Galei Eilat Ontology, Daniel Hotel Ontology, Exact match

2. Go to: Ontology -> Ontology Merging Wizard

3. Pick the 2 ontologies you wish to match, then select Next.


     Figure 16. Step 1: Ontologies Selection

4. On the following page you may add an exact match file (optional). Press Next when done.

5. On the next page you can choose the algorithm to match the ontologies with and adjust its parameters. The matching process (for the given files) may take a minute or two.


     Figure 17. Set matching algorithm

Note: Make sure the parameters you enter are valid.

6. Finally a matching information page will be displayed.


     Figure 18. Matching Information Page

You can use the different tabs to analyze your results. Most of the options in the tabs are self-explanatory. Notice that in the Matching graph tab you can navigate through the different matching results that the algorithm has produced. You can also see the results under 1-1 and 1-many constraints.

1.1.8. Using the Top K Framework Graphic Tool

You can use the Top K Framework Graphic Tool to view and save the best mappings of selected candidate and target ontology XML files.

Follow these steps to get the best mappings:

1. Pick a candidate ontology XML file from your file system, or from one of the previously selected files.

2. Pick a target ontology XML file from your file system, or from one of the previously selected files.

3. Select a matching algorithm to use for the match (this is an OntoBuilder match algorithm).

4. (optional) Define the threshold for the match.

- The threshold is a real number in the interval [0,1].

- Once a threshold is defined, the “Match Information” table shows only those matches whose match confidence is greater than or equal to the threshold.

5. Press the (go) button; the first best mapping (1:1) will be created and shown in the table.

6. You may click on a row of the table to see the matched terms' information.

7. Press the (right arrow) button to get the next best mapping.

8. Press the (left) button to view the previous best mapping.

- Pressing the (go) button returns you to the first best mapping view.

9. Press the (disc) button to save the currently displayed best mapping to an XML file.

- The best mapping XML file is named in this format: <candidate ontology filename>||<target ontology filename>||<best map index>||.xml

10. Press the (printer picture) button to print the current best mapping (not implemented yet).
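The threshold rule in step 4 amounts to a simple filter. `MatchRow` and `ThresholdFilter` are hypothetical types written for illustration; they are not the Top K Framework's classes.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical row of the "Match Information" table.
record MatchRow(String candidateTerm, String targetTerm, double confidence) {}

// Hypothetical filter; not the Top K Framework's implementation.
final class ThresholdFilter {
    private ThresholdFilter() {}

    static List<MatchRow> apply(List<MatchRow> rows, double threshold) {
        // The threshold must lie in [0,1]; NaN fails both comparisons, so check it explicitly.
        if (Double.isNaN(threshold) || threshold < 0.0 || threshold > 1.0) {
            throw new IllegalArgumentException("threshold must be in [0,1]: " + threshold);
        }
        // Keep only matches whose confidence is greater than or equal to the threshold.
        return rows.stream()
                .filter(row -> row.confidence() >= threshold)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<MatchRow> rows = List.of(
                new MatchRow("name", "fullName", 0.9),
                new MatchRow("age", "birthYear", 0.5),
                new MatchRow("city", "phone", 0.2));
        System.out.println(apply(rows, 0.5)); // keeps the first two rows
    }
}
```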

1.1.9. Schema Mappings Utilities

The Top K Framework includes a utility class named “SchemaMatchingsUtilities”, which offers the following facilities:

  • Differentiating two given best mappings (encapsulated within the SchemaTranslator Objects).
  • Saving the best mappings differentiation into XML file.
  • Printing the differentiation of best mapping into standard output.
  • Creation of threshold based best mapping.
  • Returning all possible translations for a given attribute under a mapping.
  • Reading XML best mapping file and recovering the SchemaTranslator Object out of it.
  • Calculating best mapping precision against the exact mapping.
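As an illustration of the last item, precision is the fraction of the pairs in a best mapping that also appear in the exact mapping. The sketch below uses a hypothetical `TermPair` record rather than `SchemaTranslator` (whose methods are not documented here) and is not the `SchemaMatchingsUtilities` implementation; it returns 0 for an empty mapping by convention.

```java
import java.util.Set;

// Hypothetical (candidate term, target term) pair; records give value-based equals/hashCode.
record TermPair(String candidate, String target) {}

final class MappingMetrics {
    private MappingMetrics() {}

    // precision = |found ∩ exact| / |found|; an empty mapping gets precision 0 by convention.
    static double precision(Set<TermPair> found, Set<TermPair> exact) {
        if (found.isEmpty()) {
            return 0.0;
        }
        long correct = found.stream().filter(exact::contains).count();
        return (double) correct / found.size();
    }

    public static void main(String[] args) {
        Set<TermPair> exact = Set.of(new TermPair("name", "fullName"), new TermPair("age", "age"));
        Set<TermPair> found = Set.of(
                new TermPair("name", "fullName"),
                new TermPair("age", "birthYear"),
                new TermPair("city", "town"));
        System.out.println(precision(found, exact)); // 1 of the 3 found pairs is correct
    }
}
```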

1.1.10. Using the Exact Mapping Creation Tool

You can use the Exact Mapping Creation Tool to create exact mapping XML files that can later be used by the framework to evaluate best mappings. Currently, only 1:1 mappings are supported by the tool.

Please follow these steps in order to create the exact mapping:

1. Pick a candidate ontology XML file from your file system, or from one of the previously selected files.

2. Pick a target ontology XML file from your file system, or from one of the previously selected files.

3. Press the (wand) button; the candidate and target terms will be displayed in the table.

4. Mark the exact pairs in the table (third column).

5. Press the (disk) button to save the exact mapping. If the mapping is not 1:1, you will be notified to change your selections.
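The 1:1 check performed when saving can be expressed as: no candidate term and no target term may appear in more than one selected pair. `OneToOneCheck` and `Pair` are a hypothetical sketch, not the tool's code.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical selected pair from the exact mapping table.
record Pair(String candidate, String target) {}

final class OneToOneCheck {
    private OneToOneCheck() {}

    // Set.add returns false when the term was already used by an earlier pair.
    static boolean isOneToOne(List<Pair> pairs) {
        Set<String> candidates = new HashSet<>();
        Set<String> targets = new HashSet<>();
        for (Pair p : pairs) {
            if (!candidates.add(p.candidate()) || !targets.add(p.target())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isOneToOne(List.of(new Pair("name", "fullName"), new Pair("age", "age")))); // true
        System.out.println(isOneToOne(List.of(new Pair("name", "fullName"), new Pair("name", "age")))); // false
    }
}
```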

1.1.11. Evaluating schema matching results

1.2. Using OntoBuilder as a .jar File

1.2.1. Integrating OntoBuilder into Your Project

1. Download the Ontobuilder project from: Ontobuilder Without GUI

2. Run build.xml, located in the ontobuilder.build project, using Ant; make sure you select the Build Without GUI option.

To start the build process in Eclipse, go to the ontobuilder.build project, right-click build.xml, then choose Run As > Ant Build. A JDK must be installed on your machine.

3. Import the following jars into your project:

  • ontobuilder.core (Ontology, Thesaurus, Utilities - DOM, Files, Graphs, Network...)
  • ontobuilder.extraction.webform (WebForm to Ontology and HTML)
  • ontobuilder.io (Importers, Exporters, XML...)
  • ontobuilder.matching (1st Line and 2nd Line Match Algorithms, Match Matrix, Statistics...)

4. Copy into your project’s directory the following three folders from ontobuilder.build/ontobuilder - ‘config’, ‘xsd’ and ‘dtds’.

Note: There are additional jar files in the ontobuilder.build/lib folder. If you run a project from Eclipse and reference the different OntoBuilder projects, you don’t need to do anything; but if you reference the generated OntoBuilder jars, you also need to reference these lib jars.

Important files in the project:

  • ontobuilder.core.config.application.properties
  • ontobuilder.core.config.configuration.xml
  • ontobuilder.core.config.resources.properties
  • ontobuilder.core.dtds.*
  • ontobuilder.core.xsd.*
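A quick way to confirm step 4 was done is to check that the three folders exist in the directory the application runs from. This is a hypothetical helper, not part of OntoBuilder, and it assumes (as step 4 implies) that OntoBuilder looks for 'config', 'xsd' and 'dtds' relative to the project directory, which is normally the working directory when running from Eclipse.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical setup check for the folders copied from ontobuilder.build/ontobuilder.
final class OntoBuilderSetupCheck {
    static final String[] REQUIRED_DIRS = {"config", "xsd", "dtds"};

    private OntoBuilderSetupCheck() {}

    static List<String> missingDirs(Path projectDir) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED_DIRS) {
            if (!Files.isDirectory(projectDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Defaults to the current working directory (assumed to be the project directory).
        Path dir = Paths.get(args.length > 0 ? args[0] : System.getProperty("user.dir"));
        List<String> missing = missingDirs(dir);
        if (missing.isEmpty()) {
            System.out.println("config, xsd and dtds found in " + dir);
        } else {
            System.err.println("Missing in " + dir + ": " + missing + " (copy from ontobuilder.build/ontobuilder)");
        }
    }
}
```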

The following link contains a sample project that imports OntoBuilder: OntobuilderSample.rar

This project references the

1.2.2. Performing ontology extraction from webforms

Here is an example of extracting an ontology from a URL:

package Application;

import java.util.Vector;
import schemamatchings.ontobuilder.OntoBuilderWrapper;
import schemamatchings.ontobuilder.OntoBuilderWrapperException;
import com.modica.ontology.Term;
import com.modica.ontology.Ontology;
import com.modica.ontology.OntologyModel;


/**
 * Example of using the OntoBuilder ontology extraction methods.
 * Usage: provide a URL for extraction as the first argument
 * (for example: "https://www.avis.com/car-rental/reservation/time-place.ac").
 * @author Nimrod Busany
 */

public class OntologyExtractionExample {

	private static void printInstructions()
	{
		System.err.println("Missing / invalid argument");
		System.err.println("Enter URL, for example: 'https://www.avis.com/car-rental/reservation/time-place.ac'");
		System.exit(1);
	}

	/**
	 * Prints each top-level term and counts it together with its children.
	 * @param terms the terms vector you wish to print
	 * @param modelId the id of the ontology model
	 */
	private static void countTermChilds(Vector<Term> terms, long modelId){
		int termCounter = 0;
		for (Term term : terms) {
			System.out.println(term);
			// count all children (as returned by getAllChildren)
			termCounter += term.getAllChildren().size();
			// count the term itself
			termCounter++;
		}
		System.out.println("\n ------------------------------------------------------- \n");
		System.out.println("\n Terms Count: " + termCounter + "\nModel ID:" + modelId);
	}

	/**
	 * @param args args[0] is the URL to extract the ontology from
	 * @throws OntoBuilderWrapperException if the ontology cannot be generated
	 */
	public static void main(String[] args) throws OntoBuilderWrapperException
	{
		// Check input
		if (args.length < 1) printInstructions();
		String url = args[0];
		OntoBuilderWrapper obw = new OntoBuilderWrapper();
		Ontology ontology = obw.generateOntology(url, false);
		OntologyModel model = ontology.getModel();
		// Prints all terms; assumes no node has more than one parent
		countTermChilds(model.getTerms(), model.getId());
	}

}
  • The OntoBuilder wrapper object also offers the following options:
  • Generating an Ontology Object from a given web form URL:

1.2.3. Schema Matching tools

Here is an example of performing a match. Two ontologies and their exact match can be found below:

4-sportsillustrated.xml

6-people.xml

6-people.xml_4-sportsillustrated.xml_EXACT.xml

package Application;

import java.util.StringTokenizer;
import schemamatchings.ontobuilder.MatchMatrix;
import schemamatchings.ontobuilder.OntoBuilderWrapper;
import schemamatchings.ontobuilder.OntoBuilderWrapperException;
import schemamatchings.util.SchemaMatchingsUtilities;
import schemamatchings.util.SchemaTranslator;
import com.modica.ontology.Ontology;
import com.modica.ontology.match.MatchInformation;
import java.io.File;

/** 
 * <p>Title: Ontobuilder Matching Ontologies example</p>
 * <p>Description: This class is an example for using the Ontobuilder ontologies extraction methods
 * @param Provide a URL with two .xml files consisting of ontologies and an exact match in a file with .xml_EXACT.xml extension
 * (for example: 6-people.xml,4-sportsillustrated.xml_EXACT,6-people.xml_4-sportsillustrated.xml_EXACT.xml)
 * @author Nimrod Busany, Tomer Sagi
 * @version 1.
 */

public class OntologyMatchingExample {
	
	private static OntoBuilderWrapper ontoBuilderWrapper = new OntoBuilderWrapper();
	private static String URL="C:\\schema\\6-people.xml_4-sportsillustrated.xml_EXACT";
	private static int availableMatchers = 7;
	private static String Matchers[] = {"Term Match","Value Match","Term and Value Match","Graph Match","Precedence Match","Combined Match", "similarityFlooding Match"}; 
	private static Ontology  target;
	private static Ontology candidate;
	private static String available2ndLMatchers[] = MappingAlgorithms.ALL_ALGORITHM_NAMES;
	private static SchemaTranslator exactMapping;
	private static void printInstructions() 
	{
		System.err.println("Missing / invalid argument");
		System.out.println("Enter URL, for example: 'https://www.avis.com/car-rental/reservation/time-place.ac'");
		System.exit(1);
	}
	
	
	//Assumes the folder holds 3 files according to the class input parameters description 
	private static void loadXML(File subDir) {
		
	    File[] aXmlFiles = subDir.listFiles();
	    if (aXmlFiles == null) {
	      return;
	    }
	    else {
	      String sTargetOnologyName = null;
	      String sCandidateOntologyName = null;
	      String sExactMappingFileName = null;
	      String sTargetOnologyFileName = null;
	      String sCandidateOntologyFileName = null;

	      for (int i = 0; i < aXmlFiles.length; i++) {
	        File sXmlFile = aXmlFiles[i];
	        String sXmlFileName = sXmlFile.getName();
	        if (sXmlFileName.matches(".*xml_.*xml_EXACT.xml")) {
	          StringTokenizer st = new StringTokenizer(sXmlFileName, "_");
	          if (st.countTokens() != 3) {
	            return;
	          }
	          sExactMappingFileName = sXmlFile.getPath();
	          sCandidateOntologyName = st.nextToken();
	          sTargetOnologyName = st.nextToken();
	          break;
	        }
	      }
	      for (int i = 0; i < aXmlFiles.length; i++) {
	        File sXmlFile = aXmlFiles[i];
	        String sXmlFileName = sXmlFile.getName();
	        if (sXmlFileName.equals(sTargetOnologyName)) {
	          sTargetOnologyFileName = sXmlFile.getPath();
	        }
	        if (sXmlFileName.equals(sCandidateOntologyName)) {
	          sCandidateOntologyFileName = sXmlFile.getPath();
	        }
	      }
	     
	      try {target = ontoBuilderWrapper.readOntologyXMLFile(sTargetOnologyFileName,false);}
	      catch (Exception e) {
	        System.err.println("XML Load failed on: " + sTargetOnologyFileName);
	        System.exit(1); // non-zero exit status signals the failure
	      }
	      try {candidate = ontoBuilderWrapper.readOntologyXMLFile(sCandidateOntologyFileName,false);}
	      catch (Exception e) {
	        System.err.println("XML Load failed on: " + sCandidateOntologyFileName);
	        System.exit(1);
	      }

	      try {exactMapping = SchemaMatchingsUtilities.readXMLBestMatchingFile(sExactMappingFileName);}
	      catch (Exception e) {
	        System.err.println("XML Load failed on: " + sExactMappingFileName);
	        System.exit(1);
	      }
	    }
	  }
	

	public static void main(String[] args) throws OntoBuilderWrapperException
	{
		
		//Check input
		if (args.length<1) printInstructions();
		URL = args[0];
		
		loadXML(new File(URL));
		MatchInformation firstLineMI[] = new MatchInformation[availableMatchers];
		SchemaTranslator firstLineST[] = new SchemaTranslator[availableMatchers];
		
		//first line matchers		
		for (int i=0;i<availableMatchers;i++)
        {		
			long mm_gen_time = System.currentTimeMillis();
			firstLineMI[i] = ontoBuilderWrapper.matchOntologies(candidate,target,Matchers[i]);
			// wrap the first line match results in a SchemaTranslator and import the term ids from the match information
			firstLineST[i] = new SchemaTranslator(firstLineMI[i]);
			firstLineST[i].importIdsFromMatchInfo(firstLineMI[i],true);
			System.out.println("Number of matches: " + firstLineST[i].getMatchedAttributesPairsCount());
			mm_gen_time = System.currentTimeMillis() - mm_gen_time;
    	    System.out.println("MatchMatrix generation Time: " + mm_gen_time);
    	    System.out.println("----------------------------------------------------------------------");
    	   
        }	
		System.out.println("True number of matches: " + exactMapping.getMatchedPairs().length);
	
	}
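loadXML relies on a naming convention: the exact match file is named <candidate>_<target>_EXACT.xml, and the two ontology files sit next to it in the same folder (for example 1-time.xml, 2-surfer.xml and 1-time.xml_2-surfer.xml_EXACT.xml, as in the simpler example that follows). The standalone sketch below, with the hypothetical class name ExactFileName, isolates that parsing step so you can check a folder's file names before running the matchers. Like loadXML, which splits the name on "_", it assumes the ontology file names themselves contain no underscore.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper mirroring the file-name parsing done in loadXML.
public class ExactFileName {

    // <candidate>.xml_<target>.xml_EXACT.xml, where neither name contains '_'
    private static final Pattern EXACT =
        Pattern.compile("([^_]+\\.xml)_([^_]+\\.xml)_EXACT\\.xml");

    /** Returns {candidateFile, targetFile}, or null if the name does not follow the convention. */
    public static String[] parse(String fileName) {
        Matcher m = EXACT.matcher(fileName);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] names = parse("1-time.xml_2-surfer.xml_EXACT.xml");
        System.out.println("candidate: " + names[0] + ", target: " + names[1]);
    }
}
```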
  • Here is a simpler example:
package ac.technion.schemamatching.test;

import java.io.File;
import ac.technion.iem.ontobuilder.core.ontology.Ontology;
import ac.technion.iem.ontobuilder.core.utils.files.XmlFileHandler;
import ac.technion.iem.ontobuilder.io.matchimport.NativeMatchImporter;
import ac.technion.iem.ontobuilder.matching.algorithms.line1.common.MatchingAlgorithmsNamesEnum;
import ac.technion.iem.ontobuilder.matching.match.MatchInformation;
import ac.technion.iem.ontobuilder.matching.wrapper.OntoBuilderWrapper;


public class TomerTest {

	
	public static void main(String[] args) throws Exception{
		OntoBuilderWrapper obw = new OntoBuilderWrapper();
		XmlFileHandler xhf = new XmlFileHandler();
		//replace with the folder holding the two ontologies and their EXACT match file
		String folder = "C:\\Users\\tomer_s\\Dropbox\\workspace\\OntobuilderResearchEnvironment\\schema\\WebForm\\1-time.xml_2-surfer.xml_EXACT";
		Ontology o1 = xhf.readOntologyXMLFile(folder+"\\"+"1-time.xml",true);
		Ontology o2 = xhf.readOntologyXMLFile(folder+"\\"+"2-surfer.xml",true);
		MatchInformation mi = obw.matchOntologies(o1,o2,MatchingAlgorithmsNamesEnum.TERM.getName());
		NativeMatchImporter imp = new NativeMatchImporter();
		MatchInformation exact = imp.importMatch(mi,new File(folder+"\\"+"1-time.xml_2-surfer.xml_EXACT.xml"));
		System.out.println("Match Complete, Precision: " + mi.getPrecision(exact) + ", Recall: " + mi.getRecall(exact));
	}
}
  • You may also match web forms:
//match two web forms by supplying their URLs (requires import java.net.URL)
MatchInformation match = obw.matchOntologies(new URL(candidateURL), new URL(targetURL), MatchingAlgorithms.TERM);

1.2.4. Schema meta matching

To use the second line matchers, you can use the following code (it reuses the fields and methods from the previous section):

	public static void main(String[] args) throws OntoBuilderWrapperException
	{
		
		//Check input
		if (args.length<1) printInstructions();
		URL = args[0];
		
		loadXML(new File(URL));
		MatchInformation firstLineMI[] = new MatchInformation[availableMatchers];
		SchemaTranslator firstLineST[] = new SchemaTranslator[availableMatchers];
		SchemaTranslator secondLineST[] = new SchemaTranslator[available2ndLMatchers.length*Matchers.length];
        MatchMatrix firstLineMM[]= new MatchMatrix[Matchers.length];
		//first line matchers		
		for (int i=0;i<availableMatchers;i++)
        {		
			
			firstLineMI[i] = ontoBuilderWrapper.matchOntologies(candidate,target,Matchers[i]);
			firstLineST[i] = new SchemaTranslator(firstLineMI[i]);
			firstLineST[i].importIdsFromMatchInfo(firstLineMI[i],true);
			firstLineMM[i] = firstLineMI[i].getMatrix();
			BestMappingsWrapper.matchMatrix = firstLineMI[i].getMatrix();
			System.out.println(Matchers[i] + ": ------------>>>");
    	    
    	    //second line matchers (combining the results of several matchers)
    	    for (int mp=0;mp<available2ndLMatchers.length;mp++) {   
    	    		SchemaTranslator ST = BestMappingsWrapper.GetBestMapping(available2ndLMatchers[mp]);
    	    		if (ST==null)
					{
						System.err.println("Empty match: " + Matchers[i] + ", 2nd line matcher: " + available2ndLMatchers[mp]);
						continue;
					}
    	    		ST.importIdsFromMatchInfo(firstLineMI[i],true);
    	     		System.out.println ("Finished 2nd line matching with: " + Matchers[i] + " and " + available2ndLMatchers[mp]);
    	     		System.out.println ("Number of matches: " + ST.getMatchedPairs().length);
    		}
    	    System.out.println("----------------------------------------------------------------------");
        }
    	 System.out.println("True number of matches: " + exactMapping.getMatchedPairs().length);
}
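Second line matchers take the similarity matrix produced by a first line matcher and select a mapping from it. As a rough illustration only (this is not one of OntoBuilder's own 2nd line algorithms), the sketch below applies a simple greedy rule: repeatedly keep the highest-scoring remaining pair whose candidate and target terms are both still unused, until no pair above a threshold is left. The class name GreedyOneToOne and the threshold parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative greedy 1:1 selection over a candidate x target similarity matrix.
public class GreedyOneToOne {

    /** Returns the selected pairs as {candidateIndex, targetIndex}. */
    public static List<int[]> select(double[][] sim, double threshold) {
        int rows = sim.length;
        int cols = rows == 0 ? 0 : sim[0].length;
        boolean[] usedRow = new boolean[rows];
        boolean[] usedCol = new boolean[cols];
        List<int[]> pairs = new ArrayList<>();
        while (true) {
            int bestR = -1, bestC = -1;
            double best = threshold;
            // find the best remaining pair strictly above the threshold
            for (int r = 0; r < rows; r++) {
                if (usedRow[r]) continue;
                for (int c = 0; c < cols; c++) {
                    if (!usedCol[c] && sim[r][c] > best) {
                        best = sim[r][c];
                        bestR = r;
                        bestC = c;
                    }
                }
            }
            if (bestR < 0) break; // nothing left above the threshold
            usedRow[bestR] = true;
            usedCol[bestC] = true;
            pairs.add(new int[] { bestR, bestC });
        }
        return pairs;
    }

    public static void main(String[] args) {
        double[][] sim = { { 0.9, 0.8 }, { 0.85, 0.1 } };
        for (int[] p : select(sim, 0.2)) {
            System.out.println("candidate " + p[0] + " -> target " + p[1]);
        }
    }
}
```

Greedy selection is not guaranteed to maximize the total weight: for the matrix in main it keeps only candidate 0 -> target 0 (0.9), whereas the mapping 0 -> 1, 1 -> 0 scores 0.8 + 0.85 = 1.65. This is why exact best-matching algorithms are worth their extra cost.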

You may use the SchemaTranslator to translate attributes from the candidate schema to the target schema and vice versa. For instance, to get the translation in the target schema of the candidate schema attribute “candAttribute1”, write the following line:

String translation  = st.translateAttribute(candAttribute1);

You may also get the translation match weight:

double translationWeight = st.getTranslationWeight(candAttribute1);

You may also get the total weight of all the matched attributes:

double totalMatchWeight = st.getTotalMatchWeight();

You may get all the matched attribute pairs:

MatchedAttributePair[] matchedPairs = st.getMatchedPairs();

You may also print the translations to standard output:

st.printTranslations(); 
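Conceptually, a SchemaTranslator is a 1:1 mapping from candidate attributes to target attributes in which each pair carries a match weight. The standalone class below is a hypothetical stand-in (ToyTranslator is not part of OntoBuilder) that mirrors the queries described above so their semantics are easy to see; the real SchemaTranslator methods may differ in signature and behaviour. In particular, returning null for an unmatched attribute, and 0 as its weight, is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the translation queries described above (not the OntoBuilder class).
public class ToyTranslator {

    private final Map<String, String> target = new LinkedHashMap<>();
    private final Map<String, Double> weight = new LinkedHashMap<>();

    public void addPair(String candidateAttr, String targetAttr, double w) {
        target.put(candidateAttr, targetAttr);
        weight.put(candidateAttr, w);
    }

    /** Target attribute matched to the candidate attribute, or null if unmatched. */
    public String translateAttribute(String candidateAttr) {
        return target.get(candidateAttr);
    }

    /** Weight of the matched pair, or 0 if the attribute is unmatched. */
    public double getTranslationWeight(String candidateAttr) {
        return weight.getOrDefault(candidateAttr, 0.0);
    }

    /** Sum of the weights of all matched pairs. */
    public double getTotalMatchWeight() {
        double total = 0;
        for (double w : weight.values()) total += w;
        return total;
    }

    public void printTranslations() {
        for (Map.Entry<String, String> e : target.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue() + " (" + weight.get(e.getKey()) + ")");
        }
    }

    public static void main(String[] args) {
        ToyTranslator st = new ToyTranslator();
        st.addPair("pickupDate", "startDate", 0.75);
        st.addPair("dropoffDate", "endDate", 0.5);
        st.printTranslations();
        System.out.println("total weight: " + st.getTotalMatchWeight());
    }
}
```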

1.2.5. Integrating the Top K framework with OntoBuilder

You may use the Top K framework for general 1:1 matching from your code. To integrate OntoBuilder with the Top K framework, you have to put the algorithms.xml file in the root directory of your project. You may also use this file to tune OntoBuilder's built-in algorithms.

  • The Top K framework can be easily integrated with OntoBuilder for a full schema matching process through your code:
  • You will have to add these import statements to your code:
import com.modica.ontology.*;
import com.modica.ontology.match.*;

import schemamatchings.ontobuilder.*;
import schemamatchings.topk.wrapper.*;
import schemamatchings.util.*;
  • Suppose you want to match two existing ontologies extracted with OntoBuilder:
  • Write these code lines in order to get the best matching:
OntoBuilderWrapper obw = new OntoBuilderWrapper();

try{

    Ontology candidateOntology = obw.readOntologyXMLFile("candOntology.xml");
    Ontology targetOntology  = obw.readOntologyXMLFile("targetOntology.xml");
   
    //supposing you want to use the Term Match Algorithm
    MatchInformation match = obw.matchOntologies(candidateOntology,
                                                        targetOntology, MatchingAlgorithms.TERM);

    //now create a wrapper Object for schema matching
    SchemaMatchingsWrapper smw = new SchemaMatchingsWrapper(match);

    //from now you can start using the Top K framework from the wrapper object

    SchemaTranslator st = null;

    //max_k_parameter, candSchemataName and targetSchemataName are placeholders you define
    for (int i = 1;i <= max_k_parameter;i++){
        
         st = smw.getNextBestMatching();
         //suppose you want to save the i-th best matching into an xml file named "match<i>"
         //use this version of SchemaTranslator.saveMatchToXML
         st.saveMatchToXML(i,candSchemataName,targetSchemataName,  
                                           "match"+i);
    }
}
catch(IOException ioe){ 
    //if any IO error occurs
}
catch(SchemaMatchingsException sme){
    //if any exception occurred in smw
}
catch(Exception e){
    //for any other exceptions
}
  • Suppose now we want to get translations for the candidate ontology terms:
ArrayList terms = match.getOriginalCandidateTerms();

Iterator termsIterator = terms.iterator();

while (termsIterator.hasNext()){

    Term candTerm = (Term) termsIterator.next();
    //use the SchemaTranslator to translate the candidate term
    Term matchedTargetTerm = st.translateTerm(candTerm, smw);
}
  • The SchemaMatchingsWrapper also offers the following Top K matching facilities:
//getting a K-th best matching
SchemaTranslator st = smw.getKthBestMatching(k);//k is an integer number

If you only need the first two best matchings, you are advised to use the following methods,
which compute the first two best matchings in O(V^3):

//getting the first best matching
SchemaTranslator best = smw.getBestMatching();

//getting the second best matching
SchemaTranslator secondBest = smw.getSecondBestMatching(true);//true flags the runner to use a better algorithm for the second best matching

//resetting the Top K framework for a new match
//supposing you got a new MatchInformation object
smw.reset(newMatch);
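To make the notion of the K-th best matching concrete, the following brute-force sketch enumerates every 1:1 assignment of a small square similarity matrix, ranks the assignments by total weight and returns the K-th one. It is exponential in the matrix size and is only meant to illustrate the definition; the Top K framework is designed to produce such a ranking without enumerating all assignments. The class and method names are hypothetical, the matrix is assumed square, and ties keep enumeration order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Brute-force illustration of the "K-th best 1:1 matching" for a small square matrix.
public class BruteForceTopK {

    /** Returns the K-th best assignment (k is 1-based): perm[candidate] = target. */
    public static int[] kthBest(double[][] sim, int k) {
        List<int[]> all = new ArrayList<>();
        permute(new int[sim.length], new boolean[sim.length], 0, all);
        // sort all assignments by total weight, highest first
        all.sort(Comparator.comparingDouble((int[] p) -> total(sim, p)).reversed());
        return all.get(k - 1);
    }

    /** Total weight of an assignment. */
    public static double total(double[][] sim, int[] perm) {
        double sum = 0;
        for (int i = 0; i < perm.length; i++) sum += sim[i][perm[i]];
        return sum;
    }

    // Generates every permutation of 0..n-1 into out.
    private static void permute(int[] cur, boolean[] used, int pos, List<int[]> out) {
        if (pos == cur.length) {
            out.add(cur.clone());
            return;
        }
        for (int t = 0; t < cur.length; t++) {
            if (!used[t]) {
                used[t] = true;
                cur[pos] = t;
                permute(cur, used, pos + 1, out);
                used[t] = false;
            }
        }
    }

    public static void main(String[] args) {
        double[][] sim = { { 0.9, 0.8, 0.1 }, { 0.85, 0.1, 0.3 }, { 0.2, 0.4, 0.7 } };
        for (int k = 1; k <= 2; k++) {
            int[] p = kthBest(sim, k);
            System.out.println(k + ": " + Arrays.toString(p) + " weight " + total(sim, p));
        }
    }
}
```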

1.2.6. Evaluating schema matching results

Two important metrics for evaluating the correctness of a match are Precision and Recall.

Let us define the following terms which will help us understand the two:

1. True positive - a true match which we classified as a match.

2. True negative - a mismatch which we classified as a mismatch.

3. False positive - a mismatch which we classified as a match.

4. False negative - a match which we classified as a mismatch.

Precision is the number of true positives divided by the number of true positives + false positives. Recall is the number of true positives divided by the number of true positives + false negatives.

One can look at Precision as answering: out of the pairs we classified as matches, how many are true matches? And at Recall as answering: out of all of the true matches (those that exist "in the real world"), how many did we find?

The two measures may conflict with one another: raising one often comes at the cost of the other. Consider the two extreme scenarios:

1. Classifying every possible pair as a match would yield a recall of 1 but very low precision.

2. Being very strict about classifying a match, on the other hand, would yield very high precision but very low recall.
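The definitions and the two extremes above can be checked with a few lines of code. In the following self-contained sketch (the class name and attribute names are made up for illustration), a match is represented as a string such as "a=x", the exact mapping is a set of such pairs, and each scenario is scored with the precision and recall formulas given above. Reporting 0 when a denominator is empty is a convention of this sketch.

```java
import java.util.HashSet;
import java.util.Set;

// Precision and recall of a proposed match against an exact (reference) match.
public class PrecisionRecall {

    public static double precision(Set<String> exact, Set<String> proposed) {
        if (proposed.isEmpty()) return 0.0; // nothing classified as a match
        return (double) truePositives(exact, proposed) / proposed.size();
    }

    public static double recall(Set<String> exact, Set<String> proposed) {
        if (exact.isEmpty()) return 0.0;
        return (double) truePositives(exact, proposed) / exact.size();
    }

    // pairs classified as matches that are also true matches
    private static int truePositives(Set<String> exact, Set<String> proposed) {
        Set<String> tp = new HashSet<>(proposed);
        tp.retainAll(exact);
        return tp.size();
    }

    public static void main(String[] args) {
        Set<String> exact = Set.of("a=x", "b=y");
        // Extreme 1: every candidate/target pair (2 x 2 = 4) classified as a match
        Set<String> everything = Set.of("a=x", "a=y", "b=x", "b=y");
        // Extreme 2: a very strict matcher that accepts only the pair it is surest of
        Set<String> strict = Set.of("a=x");
        System.out.println("all pairs: P=" + precision(exact, everything) + " R=" + recall(exact, everything));
        System.out.println("strict:    P=" + precision(exact, strict) + " R=" + recall(exact, strict));
    }
}
```

With n candidate attributes, n target attributes and n true matches, accepting all n² pairs gives a precision of 1/n, which is why the first extreme is so poor on realistic schemata.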

For further reading you may refer to the wiki page: Precision and Recall

The SchemaMatchingsUtilities class allows you to easily calculate both of these metrics. You can use the code below to learn about some of the methods provided by the SchemaMatchingsUtilities class.

Add the following code at the end of the matching process (it relies on the availableMatchers, firstLineST and exactMapping variables used in the preceding examples):

                System.out.println("--------------------------------------------");
                System.out.println("Analysing the results");
                System.out.println("--------------------------------------------");
                
                for (int i=0;i<availableMatchers;i++)
                {               
                	double precision = SchemaMatchingsUtilities.calculatePrecision(exactMapping,firstLineST[i]);
                	double recall = SchemaMatchingsUtilities.calculateRecall(exactMapping,firstLineST[i]);
                	System.out.println(Matchers[i] + " Yielded:\n Precision: " + precision + "\t Recall: " + recall);
               }

The results we got for section 1.2.3 and the two given XML files were:

Precision and Recall

1.2.7. Using the various components of Ontobuilder

Attached below is an example of a project that uses the Ontobuilder capabilities. You can learn from it how to import an .xsd file into an XML format, or how to match two ontologies saved in .xml format.

Ontobuilder Sample project

The sample project references the following projects from the ontobuilder:

  • ontobuilder.core
  • ontobuilder.io
  • ontobuilder.matching

1.3. Command Line invocation

As a command line tool, OntoBuilder can perform two operations: ontology generation and ontology matching. From the command prompt (and assuming you are in the root directory of OntoBuilder) type:

java com.modica.ontobuilder.OntoBuilder -g|-generate -url <URL> -o|-output <file> [-n|-normalize]
java com.modica.ontobuilder.OntoBuilder -m|-match -targetURL <URL> -candidateURL <URL> -o|-output <file> [-n|-normalize]

The first command generates an ontology from a URL and writes the result to a file, optionally normalizing the generated ontology (to normalize an ontology is to build a more consistent version of it by means of hierarchical grouping and domain recognition; for details see the publications for the MSU Thesis, CoopIS 2001 and TDKE 2003). The second command matches the ontologies of a target URL and a candidate URL and writes the result to a file, with the same optional normalization.
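If you want to drive the command line tool from another Java program (for example, to generate ontologies for a batch of URLs), you can assemble the arguments shown above and launch them with the standard java.lang.ProcessBuilder. The sketch below is a hypothetical wrapper: the main class and flag names come from the syntax above, while the class name OntoBuilderCli, its helper methods, and the assumption that the OntoBuilder classes are reachable through a single classpath string are illustrative only.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper that builds and runs the OntoBuilder command line.
public class OntoBuilderCli {

    private static final String MAIN_CLASS = "com.modica.ontobuilder.OntoBuilder";

    /** Arguments for: -generate -url <URL> -output <file> [-normalize] */
    public static List<String> generateCommand(String classpath, String url, String output, boolean normalize) {
        List<String> cmd = new ArrayList<>(List.of("java", "-classpath", classpath, MAIN_CLASS,
                "-generate", "-url", url, "-output", output));
        if (normalize) cmd.add("-normalize");
        return cmd;
    }

    /** Arguments for: -match -targetURL <URL> -candidateURL <URL> -output <file> [-normalize] */
    public static List<String> matchCommand(String classpath, String targetUrl, String candidateUrl,
                                            String output, boolean normalize) {
        List<String> cmd = new ArrayList<>(List.of("java", "-classpath", classpath, MAIN_CLASS,
                "-match", "-targetURL", targetUrl, "-candidateURL", candidateUrl, "-output", output));
        if (normalize) cmd.add("-normalize");
        return cmd;
    }

    /** Runs the command in the current working directory and returns its exit code. */
    public static int run(List<String> cmd) throws IOException, InterruptedException {
        return new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }

    public static void main(String[] args) {
        List<String> cmd = generateCommand(".", "https://www.avis.com/car-rental/reservation/time-place.ac",
                "avis.xml", true);
        System.out.println(String.join(" ", cmd));
        // run(cmd) would launch OntoBuilder; it requires the OntoBuilder classes on the classpath.
    }
}
```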

1.3.1. Using the Top K Framework From Command Line

Use the following command line syntax to run the Top K framework:

java -classpath OntoBuilder.jar schemamatchings.wrapper.TopK -co <Candidate Ontology XML file path> -to <Target Ontology XML file path> -out <match output filename (will be saved as XML file)> -alg <OntoBuilder match algorithm index>

where:

OntoBuilder match algorithm index := 0 (Term Match) | 1 (Value Match) | 2 (Term and Value Match) | 3 (Combined Match) | 4 (Precedence Match) | 5 (Graph Match)

- After you run this command, the first best matching will be created in the file <match output filename> 1.xml

- You will be asked whether to continue to the next best matching or to stop the Top K process (by typing “n” or “N”)

- If you wish to continue (type “y” or “Y”), the next best matching will be created in the file <match output filename> <k+1>.xml

- You can open the match output XML file using “topKmatch.dtd”
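When scripting the Top K command, it helps to keep the algorithm indices and the output file names in one place. The following sketch encodes the index table above and the naming of the k-th output file. The class name TopKCliHelper is hypothetical, and since the page shows the name as "<match output filename> 1.xml", the sketch assumes the space is a rendering artifact and concatenates the name and k directly (match1.xml, match2.xml, ...); adjust outputFile if your files are named differently.

```java
// Hypothetical helper for scripting the Top K command line tool.
public class TopKCliHelper {

    // Index -> algorithm, in the order listed for the -alg option above.
    private static final String[] ALGORITHMS = {
        "Term Match", "Value Match", "Term and Value Match",
        "Combined Match", "Precedence Match", "Graph Match"
    };

    /** Returns the -alg index for an algorithm name, or throws if it is not listed. */
    public static int algorithmIndex(String name) {
        for (int i = 0; i < ALGORITHMS.length; i++) {
            if (ALGORITHMS[i].equalsIgnoreCase(name)) return i;
        }
        throw new IllegalArgumentException("Unknown algorithm: " + name);
    }

    /** Name of the file holding the k-th best matching (assumes no separator before k). */
    public static String outputFile(String outName, int k) {
        return outName + k + ".xml";
    }

    public static void main(String[] args) {
        System.out.println("-alg " + algorithmIndex("Combined Match"));
        for (int k = 1; k <= 3; k++) {
            System.out.println(outputFile("match", k));
        }
    }
}
```

Note that this index order differs from the Matchers array used in the Java examples earlier on this page, so indices should not be reused between the two.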

3. Configuring OntoBuilder

OntoBuilder can be configured by editing the file configuration.xml located in the OntoBuilder’s root directory. The file is an XML document where parameters are specified using a parameter tag containing the name of the parameter, the current value and the default value (in case it has one). Most of the parameters can be set using the graphical interface by using the menu Tools->Options.
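Because configuration.xml is plain XML, you can also inspect it programmatically, for example to list the parameters whose current value differs from their default. The following sketch uses the JDK's built-in DOM parser. It assumes each parameter is written as a <parameter> element with <name>, <value> and <default> child elements, the same layout used by algorithms.xml in section 4.1.3; check the element names against your copy of configuration.xml before relying on it. The class name and the sample XML in main are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Lists parameters whose current value differs from their default value.
public class ConfigInspector {

    public static Map<String, String> changedParameters(InputStream xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        NodeList params = doc.getElementsByTagName("parameter");
        Map<String, String> changed = new LinkedHashMap<>();
        for (int i = 0; i < params.getLength(); i++) {
            Element p = (Element) params.item(i);
            String name = childText(p, "name");
            String value = childText(p, "value");
            String def = childText(p, "default");
            // parameters without a default value are skipped
            if (name != null && def != null && !def.equals(value)) {
                changed.put(name, value);
            }
        }
        return changed;
    }

    // Trimmed text of the first descendant element with the given tag, or null if absent.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String sample = "<configuration><parameter><name>seed</name><value>3</value>"
                + "<default>7</default></parameter></configuration>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println(changedParameters(in));
    }
}
```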

4. Extension

4.1. Implementing Matching Algorithm Plug-ins in OntoBuilder

4.1.1. Implementing New Matching Algorithm

OntoBuilder lets you implement and deploy new matching algorithms.

Adding a New 1st Line Algorithm

  • Add a new class to the ac.technion.iem.ontobuilder.matching.line1 package
      ◦ The new class should extend ac.technion.iem.ontobuilder.matching.line1.misc.AbstractAlgorithm
  • If the algorithm needs to be used via the GUI, add a new class to ac.technion.iem.ontobuilder.gui.tools.algorithms.line1
      ◦ The new class should extend ac.technion.iem.ontobuilder.gui.tools.algorithms.line1.AbstractAlgorithmGui
      ◦ Add the new class to ac.technion.iem.ontobuilder.gui.tools.algorithms.line1.AlgorithmsGuiFactory
  • Add the properties of the new algorithm to ontobuilder.core.matching.algorithms.xml

Adding a New 2nd Line Algorithm

  • Add a new class to the ac.technion.iem.ontobuilder.matching.line2 package
      ◦ The new class should extend ac.technion.iem.ontobuilder.matching.algorithms.line2.meta.AbstractMetaAlgorithm
  • Add an enum value to ac.technion.iem.ontobuilder.matching.algorithms.line2.meta.MetaAlgorithmNamesEnum
  • Add an additional implementation to ac.technion.iem.ontobuilder.matching.algorithms.line2.meta.MetaAlgorithmsFactory

4.1.2. Setting the binaries

Our next step is to generate a jar file for the new algorithm we implemented. Use the following commands to compile the class and package it into a jar (since the class references OntoBuilder types, add the OntoBuilder jars to the javac classpath with -classpath):

javac RandomAlgorithm.java
jar -cf random.jar RandomAlgorithm.class

Put the jar file into the OntoBuilder\lib\ folder and edit the OntoBuilder.bat file to add random.jar to the classpath.

4.1.3. Describing the new matcher

Open the file OntoBuilder\config\algorithms.xml with your text editor. Add the following description entry to the file:

<algorithm name="Random Match">
  <class>RandomAlgorithm</class>
  <parameters>
    <parameter>
      <name>seed</name>
      <value>7</value>
      <default>7</default>
    </parameter>
  </parameters>
</algorithm>
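The RandomAlgorithm class itself is not listed on this page. As a hedged illustration of what its core computation could look like, the standalone sketch below fills a candidate x target similarity matrix with pseudo-random values driven by the seed parameter declared above, so the same seed always yields the same matrix. It deliberately does not extend AbstractAlgorithm, whose abstract methods are not documented here; the class name RandomScores and its method are hypothetical and only show the logic you might place inside your plug-in.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical core of a "Random Match" algorithm: reproducible random similarities.
public class RandomScores {

    /** Returns a candidates x targets matrix of values in [0, 1), fully determined by the seed. */
    public static double[][] score(int candidates, int targets, long seed) {
        Random rnd = new Random(seed); // same seed -> same sequence -> same matrix
        double[][] sim = new double[candidates][targets];
        for (int c = 0; c < candidates; c++) {
            for (int t = 0; t < targets; t++) {
                sim[c][t] = rnd.nextDouble();
            }
        }
        return sim;
    }

    public static void main(String[] args) {
        double[][] sim = score(2, 3, 7); // 7 is the default seed in the entry above
        for (double[] row : sim) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```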

4.1.4. Test the matcher deployment

Run OntoBuilder and test the deployment by running the Ontology Match Wizard.


Figure 1: Matching Algorithm Selection

4.2. Implementing new importers/exporters

Adding a New Exporter/Importer

  • Add a new class to the ac.technion.iem.ontobuilder.io.exports (or ac.technion.iem.ontobuilder.io.imports) package
  • The new class should implement ac.technion.iem.ontobuilder.io.exports.Exporter (or ac.technion.iem.ontobuilder.io.imports.Importer)
  • Add the properties of the new exporter (or importer) to ontobuilder.core.io.exporters.xml (or importers.xml)

Appendix:

If you have any more questions regarding the use of Ontobuilder, you can send an e-mail to: ontobuilder@ie.technion.ac.il

You may also find the following tutorials helpful (most of the information in them can be found on this wiki page).
