Commits

Tim Molendijk committed 8b3ecab

Outline sort of ready for putting on hold

  • Parent commits 352d23c


Files changed (6)

File thesis-v2.log

-This is pdfTeXk, Version 3.141592-1.40.3 (Web2C 7.5.6) (format=latex 2010.5.7)  29 JUN 2010 13:59
+This is pdfTeXk, Version 3.141592-1.40.3 (Web2C 7.5.6) (format=pdflatex 2010.5.7)  8 JUL 2010 12:53
 entering extended mode
  %&-line parsing enabled.
 **thesis-v2.tex
 (/etc/texmf/tex/latex/config/color.cfg
 File: color.cfg 2007/01/18 v1.5 color configuration of teTeX/TeXLive
 )
-Package color Info: Driver file: dvips.def on input line 130.
+Package color Info: Driver file: pdftex.def on input line 130.
 
-(/usr/share/texmf-texlive/tex/latex/graphics/dvips.def
-File: dvips.def 1999/02/16 v3.0i Driver-dependant file (DPC,SPQR)
-)
-(/usr/share/texmf-texlive/tex/latex/graphics/dvipsnam.def
-File: dvipsnam.def 1999/02/16 v3.0i Driver-dependant file (DPC,SPQR)
+(/usr/share/texmf-texlive/tex/latex/pdftex-def/pdftex.def
+File: pdftex.def 2007/01/08 v0.04d Graphics/color for pdfTeX
+\Gread@gobject=\count89
 ))
 (/usr/share/texmf-texlive/tex/latex/base/textcomp.sty
 Package: textcomp 2005/09/27 v1.99g Standard LaTeX package
 Package: keyval 1999/03/16 v1.13 key=value parser (DPC)
 \KV@toks@=\toks19
 )
-\lst@mode=\count89
+\lst@mode=\count90
 \lst@gtempboxa=\box26
 \lst@token=\toks20
-\lst@length=\count90
+\lst@length=\count91
 \lst@currlwidth=\dimen103
-\lst@column=\count91
-\lst@pos=\count92
+\lst@column=\count92
+\lst@pos=\count93
 \lst@lostspace=\dimen104
 \lst@width=\dimen105
-\lst@newlines=\count93
-\lst@lineno=\count94
-\c@lstlisting=\count95
+\lst@newlines=\count94
+\lst@lineno=\count95
+\c@lstlisting=\count96
 \lst@maxwidth=\dimen106
 
 (/usr/share/texmf-texlive/tex/latex/listings/lstpatch.sty
 )
 (/usr/share/texmf-texlive/tex/latex/listings/lstmisc.sty
 File: lstmisc.sty 2004/09/07 1.3 (Carsten Heinz)
-\c@lstnumber=\count96
-\lst@skipnumbers=\count97
+\c@lstnumber=\count97
+\lst@skipnumbers=\count98
 \lst@framebox=\box27
 )
 (/usr/share/texmf-texlive/tex/latex/listings/listings.cfg
 
 (/usr/share/texmf-texlive/tex/latex/float/float.sty
 Package: float 2001/11/08 v1.3d Float enhancements (AL)
-\c@float@type=\count98
+\c@float@type=\count99
 \float@exts=\toks21
 \float@box=\box28
 \@float@everytoks=\toks22
 \@floatcapt=\box29
 )
 \@float@every@inset=\toks23
-\c@inset=\count99
+\c@inset=\count100
  (./thesis-v2.aux)
 \openout1 = `thesis-v2.aux'.
 
 File: t1cmtt.fd 1999/05/25 v2.5h Standard LaTeX font definitions
 ) [1
 
-] [1] (./thesis-v2.toc
+{/var/lib/texmf/fonts/map/pdftex/updmap/pdftex.map}] [1] (./thesis-v2.toc
 LaTeX Font Info:    External font `cmex10' loaded for size
 (Font)              <10.95> on input line 3.
-
-[1
+ [1
 
 ] [2])
 \tf@toc=\write3
 
  ) 
 Here is how much of TeX's memory you used:
- 3147 strings out of 95087
- 44964 string characters out of 1183279
- 94108 words of memory out of 1500000
- 6289 multiletter control sequences out of 10000+50000
+ 3080 strings out of 95086
+ 43504 string characters out of 1183256
+ 96437 words of memory out of 1500000
+ 6222 multiletter control sequences out of 10000+50000
  23054 words of font info for 51 fonts, out of 1200000 for 2000
  28 hyphenation exceptions out of 8191
  26i,8n,32p,337b,339s stack positions out of 5000i,500n,6000p,200000b,5000s
+ </home/tim/.texmf-var/fonts/pk/ljfour/jknappen/ec/ecbi1095.600pk> </home/tim
+/.texmf-var/fonts/pk/ljfour/jknappen/ec/ecbi1200.600pk> </home/tim/.texmf-var/f
+onts/pk/ljfour/jknappen/ec/ecti0900.600pk> </home/tim/.texmf-var/fonts/pk/ljfou
+r/jknappen/ec/ectt0900.600pk> </home/tim/.texmf-var/fonts/pk/ljfour/jknappen/ec
+/ecrm0900.600pk> </home/tim/.texmf-var/fonts/pk/ljfour/jknappen/ec/ecrm0600.600
+pk> </home/tim/.texmf-var/fonts/pk/ljfour/jknappen/ec/ecrm0800.600pk> </home/ti
+m/.texmf-var/fonts/pk/ljfour/jknappen/ec/ectt1095.600pk> </home/tim/.texmf-var/
+fonts/pk/ljfour/jknappen/ec/tcrm1095.600pk> </home/tim/.texmf-var/fonts/pk/ljfo
+ur/jknappen/ec/ecbx1200.600pk> </home/tim/.texmf-var/fonts/pk/ljfour/jknappen/e
+c/ecbx1440.600pk> </home/tim/.texmf-var/fonts/pk/ljfour/jknappen/ec/ecbx2074.60
+0pk> </home/tim/.texmf-var/fonts/pk/ljfour/jknappen/ec/ecti1095.600pk> </home/t
+im/.texmf-var/fonts/pk/ljfour/jknappen/ec/ecbx2488.600pk> </home/tim/.texmf-var
+/fonts/pk/ljfour/jknappen/ec/ecrm1095.600pk> </home/tim/.texmf-var/fonts/pk/ljf
+our/jknappen/ec/ecbx1095.600pk> </home/tim/.texmf-var/fonts/pk/ljfour/jknappen/
+ec/ectt1200.600pk> </home/tim/.texmf-var/fonts/pk/ljfour/jknappen/ec/ecrm1200.6
+00pk> </home/tim/.texmf-var/fonts/pk/ljfour/jknappen/ec/ecrm1728.600pk>
+Output written on thesis-v2.pdf (76 pages, 371113 bytes).
+PDF statistics:
+ 909 PDF objects out of 1000 (max. 8388607)
+ 0 named destinations out of 1000 (max. 131072)
+ 1 words of extra memory for PDF output out of 10000 (max. 10000000)
 
-Output written on thesis-v2.dvi (76 pages, 219268 bytes).

File thesis-v3.aux

 \relax 
-\@writefile{toc}{\contentsline {chapter}{\numberline {1}Atomic findings}{3}}
+\@writefile{toc}{\contentsline {chapter}{\numberline {1}Atomic findings}{4}}
 \@writefile{lof}{\addvspace {10\p@ }}
 \@writefile{lot}{\addvspace {10\p@ }}
-\@writefile{toc}{\contentsline {section}{\numberline {1.1}Our aggregated address book is a Semantic Web\xspace  application}{3}}
-\@writefile{toc}{\contentsline {section}{\numberline {1.2}Availability of RDF data sources is a problem}{3}}
-\newlabel{availability_of_rdf}{{1.2}{3}}
+\@writefile{toc}{\contentsline {section}{\numberline {1.1}Our aggregated address book is a Semantic Web\xspace  application}{4}}
+\@writefile{toc}{\contentsline {section}{\numberline {1.2}Availability of RDF data sources is a problem}{4}}
+\newlabel{availability_of_rdf}{{1.2}{4}}
 \citation{Kinsella08}
 \citation{Hogan07}
-\@writefile{toc}{\contentsline {section}{\numberline {1.3}Modeling in RDF is an expert-task}{4}}
-\newlabel{modeling_expert_task}{{1.3}{4}}
-\@writefile{toc}{\contentsline {section}{\numberline {1.4}Data integration is a problem at three levels}{4}}
-\@writefile{toc}{\contentsline {section}{\numberline {1.5}Structural integration}{5}}
-\newlabel{structural_integration}{{1.5}{5}}
-\@writefile{toc}{\contentsline {section}{\numberline {1.6}Syntactical integration}{5}}
-\newlabel{syntactical_integration}{{1.6}{5}}
-\@writefile{toc}{\contentsline {section}{\numberline {1.7}Semantical integration}{6}}
-\newlabel{semantical_integration}{{1.7}{6}}
-\@writefile{toc}{\contentsline {section}{\numberline {1.8}Automated reasoning depends on expressivity of underlying standards}{8}}
-\newlabel{reasoning_expressivity}{{1.8}{8}}
-\@writefile{toc}{\contentsline {section}{\numberline {1.9}Automated reasoning performance is a tricky topic}{8}}
-\newlabel{reasoning_performance}{{1.9}{8}}
-\@writefile{toc}{\contentsline {section}{\numberline {1.10}SPARQL (potentially) requires a lot of queries to fetch a single data set}{8}}
-\newlabel{many_queries_required}{{1.10}{8}}
-\@writefile{toc}{\contentsline {section}{\numberline {1.11}User interfaces can fuse interface elements and data on server or on client}{9}}
-\newlabel{ui_fusion}{{1.11}{9}}
-\@writefile{toc}{\contentsline {section}{\numberline {1.12}User interfaces can be domain-independent or domain-dependent}{9}}
-\newlabel{ui_domain_dependence}{{1.12}{9}}
+\@writefile{toc}{\contentsline {section}{\numberline {1.3}Modeling in RDF is an expert-task}{5}}
+\newlabel{modeling_expert_task}{{1.3}{5}}
+\@writefile{toc}{\contentsline {section}{\numberline {1.4}Data integration is a problem at three levels}{5}}
+\@writefile{toc}{\contentsline {section}{\numberline {1.5}Structural integration}{6}}
+\newlabel{structural_integration}{{1.5}{6}}
+\@writefile{toc}{\contentsline {section}{\numberline {1.6}Syntactical integration}{6}}
+\newlabel{syntactical_integration}{{1.6}{6}}
+\@writefile{toc}{\contentsline {section}{\numberline {1.7}Semantical integration}{7}}
+\newlabel{semantical_integration}{{1.7}{7}}
+\@writefile{toc}{\contentsline {section}{\numberline {1.8}Automated reasoning depends on expressivity of underlying standards}{9}}
+\newlabel{reasoning_expressivity}{{1.8}{9}}
+\@writefile{toc}{\contentsline {section}{\numberline {1.9}Automated reasoning performance is a tricky topic}{9}}
+\newlabel{reasoning_performance}{{1.9}{9}}
+\@writefile{toc}{\contentsline {section}{\numberline {1.10}SPARQL (potentially) requires a lot of queries to fetch a single data set}{9}}
+\newlabel{many_queries_required}{{1.10}{9}}
+\@writefile{toc}{\contentsline {section}{\numberline {1.11}User interfaces can fuse interface elements and data on server or on client}{10}}
+\newlabel{ui_fusion}{{1.11}{10}}
+\@writefile{toc}{\contentsline {section}{\numberline {1.12}User interfaces can be domain-independent or domain-dependent}{10}}
+\newlabel{ui_domain_dependence}{{1.12}{10}}
 \citation{activerdf}
 \citation{Oren07}
 \citation{activerdf}
-\@writefile{toc}{\contentsline {section}{\numberline {1.13}Object-triple mapping to make RDF data compatible with existing web application frameworks}{10}}
-\newlabel{otm}{{1.13}{10}}
-\@writefile{toc}{\contentsline {section}{\numberline {1.14}Data source properties}{10}}
-\newlabel{source_properties}{{1.14}{10}}
-\@writefile{toc}{\contentsline {section}{\numberline {1.15}Publishing types and their properties}{11}}
-\@writefile{toc}{\contentsline {section}{\numberline {1.16}Linked Data and custom API make assumptions about data model}{11}}
-\newlabel{publisher_assumptions}{{1.16}{11}}
-\@writefile{toc}{\contentsline {section}{\numberline {1.17}Consumer types and their properties}{12}}
-\@writefile{toc}{\contentsline {section}{\numberline {1.18}Publishing type converters}{12}}
-\@writefile{toc}{\contentsline {section}{\numberline {1.19}Publishing for optimal reusability}{12}}
-\@writefile{toc}{\contentsline {section}{\numberline {1.20}Publishing data via SPARQL endpoint}{13}}
-\newlabel{publishing_sparql}{{1.20}{13}}
-\@writefile{toc}{\contentsline {subsubsection}{The good}{13}}
-\@writefile{toc}{\contentsline {subsubsection}{The bad}{14}}
-\@writefile{toc}{\contentsline {section}{\numberline {1.21}Publishing data via Linked Data}{15}}
-\@writefile{toc}{\contentsline {subsubsection}{The good}{15}}
+\@writefile{toc}{\contentsline {section}{\numberline {1.13}Object-triple mapping to make RDF data compatible with existing web application frameworks}{11}}
+\newlabel{otm}{{1.13}{11}}
+\@writefile{toc}{\contentsline {section}{\numberline {1.14}Data source properties}{11}}
+\newlabel{source_properties}{{1.14}{11}}
+\@writefile{toc}{\contentsline {section}{\numberline {1.15}Publishing types and their properties}{12}}
+\@writefile{toc}{\contentsline {section}{\numberline {1.16}Linked Data and custom API make assumptions about data model}{12}}
+\newlabel{publisher_assumptions}{{1.16}{12}}
+\@writefile{toc}{\contentsline {section}{\numberline {1.17}Consumer types and their properties}{13}}
+\@writefile{toc}{\contentsline {section}{\numberline {1.18}Publishing type converters}{13}}
+\@writefile{toc}{\contentsline {section}{\numberline {1.19}Publishing for optimal reusability}{13}}
+\@writefile{toc}{\contentsline {section}{\numberline {1.20}Publishing data via SPARQL endpoint}{14}}
+\newlabel{publishing_sparql}{{1.20}{14}}
+\@writefile{toc}{\contentsline {subsubsection}{The good}{14}}
 \@writefile{toc}{\contentsline {subsubsection}{The bad}{15}}
-\@writefile{toc}{\contentsline {section}{\numberline {1.22}Publishing data via custom API}{16}}
+\@writefile{toc}{\contentsline {section}{\numberline {1.21}Publishing data via Linked Data}{16}}
 \@writefile{toc}{\contentsline {subsubsection}{The good}{16}}
 \@writefile{toc}{\contentsline {subsubsection}{The bad}{16}}
-\@writefile{toc}{\contentsline {section}{\numberline {1.23}Consuming distributed data}{17}}
-\@writefile{toc}{\contentsline {section}{\numberline {1.24}Consuming distributed data via query federation}{17}}
-\newlabel{query_federation}{{1.24}{17}}
-\@writefile{toc}{\contentsline {section}{\numberline {1.25}Consuming distributed data via caching locally}{17}}
+\@writefile{toc}{\contentsline {section}{\numberline {1.22}Publishing data via custom API}{17}}
+\@writefile{toc}{\contentsline {subsubsection}{The good}{17}}
+\@writefile{toc}{\contentsline {subsubsection}{The bad}{17}}
+\@writefile{toc}{\contentsline {section}{\numberline {1.23}Consuming distributed data}{18}}
+\@writefile{toc}{\contentsline {section}{\numberline {1.24}Consuming distributed data via query federation}{18}}
+\newlabel{query_federation}{{1.24}{18}}
+\@writefile{toc}{\contentsline {section}{\numberline {1.25}Consuming distributed data via caching locally}{18}}
 \citation{Heitmann09}
-\@writefile{toc}{\contentsline {section}{\numberline {1.26}Integration providers can reduce complexity}{18}}
-\@writefile{toc}{\contentsline {section}{\numberline {1.27}Data integration can be achieved in an elegant manner}{19}}
-\@writefile{toc}{\contentsline {paragraph}{Follows from}{19}}
-\@writefile{toc}{\contentsline {section}{\numberline {1.28}Separation between application logic and data model}{19}}
-\newlabel{separation_of_concerns}{{1.28}{19}}
-\@writefile{toc}{\contentsline {section}{\numberline {1.29}Internal models are (relatively) flexible}{20}}
-\newlabel{flexible_model}{{1.29}{20}}
+\@writefile{toc}{\contentsline {section}{\numberline {1.26}Integration providers can reduce complexity}{19}}
+\@writefile{toc}{\contentsline {section}{\numberline {1.27}Data integration can be achieved in an elegant manner}{20}}
+\@writefile{toc}{\contentsline {paragraph}{Follows from}{20}}
+\@writefile{toc}{\contentsline {section}{\numberline {1.28}Separation between application logic and data model}{20}}
+\newlabel{separation_of_concerns}{{1.28}{20}}
+\@writefile{toc}{\contentsline {section}{\numberline {1.29}Internal models are (relatively) flexible}{21}}
+\newlabel{flexible_model}{{1.29}{21}}
+\@writefile{toc}{\contentsline {chapter}{\numberline {2}Case study}{22}}
+\@writefile{lof}{\addvspace {10\p@ }}
+\@writefile{lot}{\addvspace {10\p@ }}
+\citation{activerdf}
+\citation{Oren07}
+\citation{activerdf}
+\@writefile{toc}{\contentsline {chapter}{\numberline {3}Architecture}{24}}
+\@writefile{lof}{\addvspace {10\p@ }}
+\@writefile{lot}{\addvspace {10\p@ }}
+\@writefile{toc}{\contentsline {section}{\numberline {3.1}Persistent storage}{24}}
+\@writefile{toc}{\contentsline {section}{\numberline {3.2}Data interface}{24}}
+\@writefile{toc}{\contentsline {subsection}{\numberline {3.2.1}Object-oriented programming interface}{24}}
+\@writefile{toc}{\contentsline {subsection}{\numberline {3.2.2}Limitations of SPARQL}{25}}
+\@writefile{toc}{\contentsline {section}{\numberline {3.3}Integration service}{25}}
+\@writefile{toc}{\contentsline {subsection}{\numberline {3.3.1}Structural integration}{25}}
+\@writefile{toc}{\contentsline {subsection}{\numberline {3.3.2}Syntactical integration}{26}}
+\@writefile{toc}{\contentsline {subsection}{\numberline {3.3.3}Semantical integration}{27}}
+\@writefile{toc}{\contentsline {subsubsection}{Automated inference}{28}}
+\@writefile{toc}{\contentsline {paragraph}{Complexity}{28}}
+\@writefile{toc}{\contentsline {paragraph}{Performance}{28}}
+\@writefile{toc}{\contentsline {section}{\numberline {3.4}User interface}{29}}
+\@writefile{toc}{\contentsline {subsection}{\numberline {3.4.1}Server- or client-side}{29}}
+\@writefile{toc}{\contentsline {subsection}{\numberline {3.4.2}Domain-independent or -dependent}{29}}
+\@writefile{toc}{\contentsline {chapter}{\numberline {4}I/O}{31}}
+\@writefile{lof}{\addvspace {10\p@ }}
+\@writefile{lot}{\addvspace {10\p@ }}
+\@writefile{toc}{\contentsline {section}{\numberline {4.1}Properties}{31}}
+\@writefile{toc}{\contentsline {section}{\numberline {4.2}Input (consume)}{31}}
+\@writefile{toc}{\contentsline {subsection}{\numberline {4.2.1}Distributed data}{32}}
+\@writefile{toc}{\contentsline {subsubsection}{Query federation}{32}}
+\@writefile{toc}{\contentsline {subsubsection}{Local copy}{32}}
+\@writefile{toc}{\contentsline {section}{\numberline {4.3}Output (publish)}{33}}
+\@writefile{toc}{\contentsline {subsection}{\numberline {4.3.1}SPARQL endpoint}{34}}
+\@writefile{toc}{\contentsline {subsubsection}{The good}{34}}
+\@writefile{toc}{\contentsline {subsubsection}{The bad}{34}}
+\@writefile{toc}{\contentsline {subsection}{\numberline {4.3.2}Linked Data}{35}}
+\@writefile{toc}{\contentsline {subsubsection}{The good}{35}}
+\@writefile{toc}{\contentsline {subsubsection}{The bad}{36}}
+\@writefile{toc}{\contentsline {subsection}{\numberline {4.3.3}Custom API}{36}}
+\@writefile{toc}{\contentsline {subsubsection}{The good}{36}}
+\@writefile{toc}{\contentsline {subsubsection}{The bad}{37}}
+\@writefile{toc}{\contentsline {subsection}{\numberline {4.3.4}Interface converters}{37}}
+\@writefile{toc}{\contentsline {subsection}{\numberline {4.3.5}Optimal reusability}{37}}
+\@writefile{toc}{\contentsline {chapter}{\numberline {5}Separation of concerns}{39}}
+\@writefile{lof}{\addvspace {10\p@ }}
+\@writefile{lot}{\addvspace {10\p@ }}
+\@writefile{toc}{\contentsline {section}{\numberline {5.1}Data integration is data modeling}{39}}
+\@writefile{toc}{\contentsline {section}{\numberline {5.2}Data model flexibility}{39}}
+\citation{Kinsella08}
+\citation{Hogan07}
+\@writefile{toc}{\contentsline {chapter}{\numberline {6}Learning curve}{40}}
+\@writefile{lof}{\addvspace {10\p@ }}
+\@writefile{lot}{\addvspace {10\p@ }}
+\@writefile{toc}{\contentsline {section}{\numberline {6.1}Data modeling}{40}}
+\@writefile{toc}{\contentsline {chapter}{\numberline {7}Ecosystem}{41}}
+\@writefile{lof}{\addvspace {10\p@ }}
+\@writefile{lot}{\addvspace {10\p@ }}
+\@writefile{toc}{\contentsline {section}{\numberline {7.1}Availability of RDF data}{41}}
 \bibcite{Kinsella08}{1}
 \bibcite{Hogan07}{2}
 \bibcite{activerdf}{3}

File thesis-v3.dvi

Binary file modified.

File thesis-v3.log

-This is pdfTeXk, Version 3.141592-1.40.3 (Web2C 7.5.6) (format=latex 2010.5.7)  8 JUL 2010 01:11
+This is pdfTeXk, Version 3.141592-1.40.3 (Web2C 7.5.6) (format=latex 2010.5.7)  26 JUL 2010 00:43
 entering extended mode
  %&-line parsing enabled.
 **thesis-v3.tex
 
 [1
 
-])
+] [2])
 \tf@toc=\write3
 \openout3 = `thesis-v3.toc'.
 
- [2]
+ [3]
 Chapter 1.
 
 Overfull \hbox (1.36787pt too wide) in paragraph at lines 42--42
 er-nal sources,
  []
 
-[3
+[4
 
 ]
 LaTeX Font Info:    External font `cmex10' loaded for size
 (Font)              <9> on input line 99.
 LaTeX Font Info:    External font `cmex10' loaded for size
 (Font)              <5> on input line 99.
- [4]
+ [5]
 Overfull \hbox (0.29942pt too wide) in paragraph at lines 149--151
 []\T1/cmr/m/n/10.95 F.e. if the do-main of ev-ery foaf:mbox prop-
  []
 
-[5] [6] [7] [8] [9]
+[6] [7] [8] [9] [10]
 Overfull \hbox (2.78566pt too wide) in paragraph at lines 327--327
 []\T1/cmr/bx/n/14.4 Object-triple map-ping to make RDF data com-
  []
 
 
-LaTeX Warning: Reference `ecosystem' on page 10 undefined on input line 346.
+LaTeX Warning: Reference `ecosystem' on page 11 undefined on input line 346.
 
-[10] [11]
+[11] [12]
 
-LaTeX Warning: Reference `publishing_linkeddata' on page 12 undefined on input 
+LaTeX Warning: Reference `publishing_linkeddata' on page 13 undefined on input 
 line 450.
 
 
-LaTeX Warning: Reference `publishing_customapi' on page 12 undefined on input l
+LaTeX Warning: Reference `publishing_customapi' on page 13 undefined on input l
 ine 450.
 
 
-LaTeX Warning: Reference `publishing_linkeddata' on page 12 undefined on input 
+LaTeX Warning: Reference `publishing_linkeddata' on page 13 undefined on input 
 line 454.
 
 
-LaTeX Warning: Reference `publishing_customapi' on page 12 undefined on input l
+LaTeX Warning: Reference `publishing_customapi' on page 13 undefined on input l
 ine 456.
 
-[12]
+[13]
 
-LaTeX Warning: Reference `publishing_requirements' on page 13 undefined on inpu
+LaTeX Warning: Reference `publishing_requirements' on page 14 undefined on inpu
 t line 480.
 
-[13] [14] [15] [16] [17] [18] [19] [20] [21
+[14] [15] [16] [17] [18] [19] [20] [21]
+Chapter 2.
+[22
+
+] [23]
+Chapter 3.
+
+LaTeX Warning: Reference `ecosystem' on page 24 undefined on input line 788.
+
+[24
+
+] [25]
+Overfull \hbox (0.29942pt too wide) in paragraph at lines 854--856
+[]\T1/cmr/m/n/10.95 F.e. if the do-main of ev-ery foaf:mbox prop-
+ []
+
+[26] [27] [28] [29] [30]
+Chapter 4.
+[31
+
+] [32]
+
+LaTeX Warning: Reference `publishing_requirements' on page 33 undefined on inpu
+t line 1132.
+
+[33] [34] [35] [36]
+
+LaTeX Warning: Reference `publishing_linkeddata' on page 37 undefined on input 
+line 1283.
+
+
+LaTeX Warning: Reference `publishing_customapi' on page 37 undefined on input l
+ine 1283.
+
+
+LaTeX Warning: Reference `publishing_linkeddata' on page 37 undefined on input 
+line 1287.
+
+
+LaTeX Warning: Reference `publishing_customapi' on page 37 undefined on input l
+ine 1289.
+
+[37] [38]
+Chapter 5.
+[39
+
+]
+Chapter 6.
+[40
+
+]
+Chapter 7.
+[41
+
+] [42
 
 ] (./thesis-v3.aux)
 
 
  ) 
 Here is how much of TeX's memory you used:
- 3083 strings out of 95087
- 43717 string characters out of 1183279
- 92865 words of memory out of 1500000
- 6229 multiletter control sequences out of 10000+50000
- 20194 words of font info for 47 fonts, out of 1200000 for 2000
+ 3085 strings out of 95087
+ 43739 string characters out of 1183279
+ 93867 words of memory out of 1500000
+ 6230 multiletter control sequences out of 10000+50000
+ 20970 words of font info for 48 fonts, out of 1200000 for 2000
  28 hyphenation exceptions out of 8191
  27i,8n,32p,337b,383s stack positions out of 5000i,500n,6000p,200000b,5000s
 
-Output written on thesis-v3.dvi (22 pages, 55232 bytes).
+Output written on thesis-v3.dvi (43 pages, 106116 bytes).

File thesis-v3.tex

 
 \todo{Entailed by and defined in terms of section \ref{separation_of_concerns}.}
 
+
+\chapter{Case study}
+
+\begin{itemize}
+  \item We had to employ adapters to get RDF from the data sources.
+  \item No generic adapter components are available, so we had to develop them
+    ourselves.
+  \item Our adapters take care of some syntax validation (compliance to
+    specifications) and data model alignment as well.
+  \item Illustrate implementation of adapters.
+  \item Mention some syntax validation (validation of literals) that we did not
+    implement but could (or should) be added, and illustrate what implementation
+    would look like.
+  \item Illustrate the statements that are capable of consolidating overlapping
+    contact resources.
+  \item Explain how consolidation is powered (reasoning engine: OWLIM).
+  \item Explain why this works (how our consolidation problem falls within the
+    capabilities of automated inferencing).
+  \item Elaborate on performance issues we ran into and how we solved them.
+  \item Illustrate limitations: deduplication is not possible.
+  \item Illustrate how we overcome this problem: filtering in application code.
+  \item Explain the reality of geocoding our contacts, and why it is problematic
+    (no good RDF sources, no good interfaces).
+  \item We used an object-triple mapping (SuRF).
+  \item How we set up our user interface: client-side, domain-dependent, based on
+    Exhibit.
+  \item Illustrate implementation of the parts: templating, custom API.
+\end{itemize}
+
+\chapter{Architecture}
+
+\section{Persistent storage}
+
+\ldots
+
+\section{Data interface}
+
+\subsection{Object-oriented programming interface}
+
+Providing an object-oriented API (such as ActiveRDF \cite{activerdf}) to our
+RDF data separates the application logic from the persistence layer, makes \SW
+data compatible with existing web application frameworks and allows developers
+to work with a familiar paradigm.
+
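+A rough sketch of what this object-oriented style of access looks like in
+practice, using SuRF (the OTM employed in our case study); the endpoint URL is
+hypothetical and the exact configuration options may differ per setup.
+
+\begin{lstlisting}[language=Python]
+import surf
+
+# Connect the OTM to a SPARQL endpoint (URL is hypothetical).
+store = surf.Store(reader='sparql_protocol',
+                   endpoint='http://localhost:8080/openrdf-sesame/repositories/contacts')
+session = surf.Session(store)
+
+# Map foaf:Person onto a Python class and work with plain objects.
+Person = session.get_class(surf.ns.FOAF['Person'])
+for person in Person.all():
+    print(person.foaf_name, person.foaf_mbox)
+\end{lstlisting}
+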
+Problems:
+
+\begin{itemize}
+  \item There is a mismatch between object-oriented and triple-based, which
+    results in some problems (see Table 2 in \cite{Oren07}). \todo{Distill
+    problems (if any?) from \cite{activerdf}.}
+  \item They abstract away interaction with the SPARQL endpoint, which runs the
+    risk of degrading performance, as there is no longer a direct sense of the
+    number of queries that is required. See section \ref{many_queries_required}
+    for an explanation of why many queries are a frequent reality with SPARQL.
+  \item Current software components have their problems (see section
+    \ref{ecosystem}).
+    \begin{itemize}
+      \item They still require quite some understanding of RDF; more abstraction
+        would be desirable. \todo{Give examples.}
+      \item They lack features. \todo{Mention example of lacking support for
+        fetching resources with unique identity.}
+    \end{itemize}
+\end{itemize}
+
+\subsection{Limitations of SPARQL}
+
+SPARQL's design quickly results in many queries being required to fetch a single
+data set.
+
+\begin{itemize}
+  \item E.g. fetching all unique contact resources and their properties
+    requires a very large number of queries (see the sketch after this list).
+  \item Not a problem if constructing and dispatching queries is abstracted away
+    by an OTM (see section \ref{otm}) and a query is not too expensive.
+  \item Results in a bottleneck if a query is too expensive, as is the case in
+    our aggregated address book, which sends queries over HTTP to the Sesame2
+    API. \todo{Why is this too expensive??}
+\end{itemize}
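+
+A sketch of the problem (the endpoint URL is hypothetical and the Python
+requests library is used purely for illustration): one query lists the contact
+resources, and then one additional query per resource fetches its properties.
+
+\begin{lstlisting}[language=Python]
+import requests
+
+ENDPOINT = 'http://localhost:8080/openrdf-sesame/repositories/contacts'
+
+def select(query):
+    # SPARQL protocol: query via HTTP GET, results as SPARQL JSON.
+    r = requests.get(ENDPOINT, params={'query': query},
+                     headers={'Accept': 'application/sparql-results+json'})
+    return r.json()['results']['bindings']
+
+# One query to find all contact resources...
+contacts = select('SELECT ?c WHERE { ?c a <http://xmlns.com/foaf/0.1/Person> }')
+
+# ...and one more query per contact to fetch its properties.
+for row in contacts:
+    properties = select('SELECT ?p ?o WHERE { <%s> ?p ?o }' % row['c']['value'])
+\end{lstlisting}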
+
+\section{Integration service}
+
+Structural, syntactical and semantical.
+
+\subsection{Structural integration}
+
+Structural integration of two RDF data sets comes down to constructing the union
+of the two sets.  The problem is that it assumes availability of RDF data, which
+is not an assumption we can make as we have seen in section
+\ref{availability_of_rdf}.
+
+As a result we have to compensate with adapters:
+
+\begin{itemize}
+  \item Responsible for structural integration: rewriting data as RDF.
+  \item Give example from case study: e-mail addresses (see the sketch after
+    this list).
+  \item No usable generic components available. \todo{Illustrate landscape and
+    explain why it is not usable.}
+  \item Therefore they have to be developed, which means they are on the
+    application developer's plate.
+    \begin{itemize}
+      \item It reduces separation of concerns (see section
+        \ref{separation_of_concerns}).
+      \item It complicates delegating modeling to an expert (see section
+        \ref{modeling_expert_task}).
+      \item It reduces the flexibility of the application's data model (see
+        section \ref{flexible_model}).
+    \end{itemize}
+  \item Can (and probably will) take care of some syntactical and semantical
+    integration as well. \todo{Illustrate on the basis of adapters from
+    aggregated address book.}
+  \item Also known as wrappers, although this term is used mostly for components
+    that act as an interface in front of another data source, rather than
+    hardcoded components that fetch and restructure some data.
+\end{itemize}
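+
+A minimal sketch of such an adapter (using rdflib; the shape of the source
+record is hypothetical): it rewrites one contact record from a non-RDF source
+as FOAF triples, turning e-mail addresses into proper mailto: URIs.
+
+\begin{lstlisting}[language=Python]
+from rdflib import Graph, Namespace, URIRef, Literal, BNode, RDF
+
+FOAF = Namespace('http://xmlns.com/foaf/0.1/')
+
+def adapt_contact(record):
+    """Rewrite one record from a (hypothetical) contacts API as RDF."""
+    g = Graph()
+    person = BNode()
+    g.add((person, RDF.type, FOAF.Person))
+    g.add((person, FOAF.name, Literal(record['name'])))
+    for address in record.get('emails', []):
+        # foaf:mbox expects a mailto: URI, not a plain literal.
+        g.add((person, FOAF.mbox, URIRef('mailto:' + address.strip().lower())))
+    return g
+
+print(adapt_contact({'name': 'Alice', 'emails': ['alice@example.org']})
+      .serialize(format='nt'))
+\end{lstlisting}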
+
+\subsection{Syntactical integration}
+
+Syntactical integration means data validation. Two types of data validation can
+be distinguished:
+
+\begin{description}
+  \item[Compliance to specifications] E.g. whether the subject of every
+    foaf:mbox property is indeed a foaf:Agent, as the domain of foaf:mbox
+    prescribes.
+  \item[Validation of literals] E.g. whether the value (object) of a foaf:mbox
+    represents an actual e-mail address (instead of ``tim at timmolendijk dot
+    nl'' or ``none'' etc.). In this category, validation can be complemented
+    with correction.
+\end{description}
+
+Compliance to specifications is a task that can be carried out by a generic
+component, but:
+
+\begin{itemize}
+  \item It assumes that we accept uncompromising and strict enforcement of the
+    specifications, which can be problematic, as we have seen in section
+    \ref{modeling_expert_task}.
+  \item Currently there are no software components that can do this.
+\end{itemize}
+
+Validation of literals is a domain-specific or application-specific task and
+thus requires a custom component. Compliance to specifications requires a custom
+component because no generic components are available.
+
+\begin{itemize}
+  \item Give example from case study: e-mail addresses (see the sketch after
+    this list).
+  \item This is a development task, which means it is on the application
+    developer's plate.
+    \begin{itemize}
+      \item It reduces separation of concerns (see section
+        \ref{separation_of_concerns}).
+      \item It complicates delegating modeling to an expert (see section
+        \ref{modeling_expert_task}).
+      \item It reduces the flexibility of the application's data model (see
+        section \ref{flexible_model}).
+    \end{itemize}
+  \item Adapters, if used, will often take on a large share (if not all) of
+    the syntactical integration tasks (of both types). \todo{Illustrate how.}
+\end{itemize}
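+
+A sketch of what validation (and correction) of foaf:mbox literals could look
+like; the pattern is deliberately simplistic and only meant to illustrate the
+idea.
+
+\begin{lstlisting}[language=Python]
+import re
+
+MBOX = re.compile(r'^[^@\s]+@[^@\s]+\.[^@\s]+$')
+
+def clean_mbox(value):
+    """Return a mailto: URI, or None if the literal is not an address."""
+    value = value.strip().lower()
+    # Correction: undo a common obfuscation before validating.
+    value = value.replace(' at ', '@').replace(' dot ', '.').replace(' ', '')
+    return 'mailto:' + value if MBOX.match(value) else None
+
+assert clean_mbox('tim at timmolendijk dot nl') == 'mailto:tim@timmolendijk.nl'
+assert clean_mbox('none') is None
+\end{lstlisting}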
+
+\subsection{Semantical integration}
+
+Semantical integration can be dissected into:
+
+\begin{description}
+  \item[Alignment] Alignment of data that is described using different
+    ontologies.
+  \item[Consolidation] Consolidation of overlapping resources.
+\end{description}
+
+Both can be achieved using automated reasoning:
+
+\begin{itemize}
+  \item Give example from case study: multiple e-mail addresses (see the
+    sketch after this list).
+  \item Integration logic can be defined declaratively, in the language of the
+    data, while the actual processing can be delegated to a generic software
+    component (a reasoning engine). \todo{Give examples of both alignment and
+    consolidation, from case study.}
+    \begin{itemize}
+      \item This allows for improved separation of concerns between data model
+        and application logic (see section \ref{separation_of_concerns}).
+      \item It allows the design of the integration statements to be delegated
+        to a data modeling expert (see section \ref{modeling_expert_task}).
+      \item It improves the flexibility of the application's data model (see
+        section \ref{flexible_model}).
+    \end{itemize}
+    \todo{This preceding list is a positive formulation of the same list that is
+    formulated negatively in the sections on structural and syntactical
+    integration. Change this section or the others for the sake of consistency?}
+\end{itemize}
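+
+A sketch of such declarative integration statements, loaded next to the
+instance data. The foaf:mbox axiom is taken from the FOAF specification; the
+vCard alignment statement is purely illustrative and would have to be checked
+against the actual ontologies.
+
+\begin{lstlisting}[language=Python]
+from rdflib import Graph
+
+STATEMENTS = """
+@prefix owl:   <http://www.w3.org/2002/07/owl#> .
+@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
+@prefix foaf:  <http://xmlns.com/foaf/0.1/> .
+@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
+
+# Consolidation: two resources sharing a foaf:mbox denote the same agent.
+foaf:mbox a owl:InverseFunctionalProperty .
+
+# Alignment (illustrative): treat a vCard e-mail as a foaf:mbox.
+vcard:email rdfs:subPropertyOf foaf:mbox .
+"""
+
+integration = Graph()
+integration.parse(data=STATEMENTS, format='turtle')
+\end{lstlisting}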
+
+But:
+
+\begin{itemize}
+  \item If an adapter is employed for every data source the alignment task is
+    usually already taken care of by the adapters, and optionally (part of) the
+    consolidation as well.
+  \item Unfortunately there are limits to what can be achieved through
+    inferences, mainly in the area of consolidation \todo{(is this true? can we
+    think of limitations in the area of ontology alignment?)}. As soon as we
+    have a use case that requires consolidation that goes beyond these limits,
+    we will have to resort to algorithms in application code to get the job
+    done. \todo{Give example about deduplication of contact resources, which is
+    simply not possible without adding some extra data (hash of foaf:name) and
+    having the application carry out part of the filtering process.}
+  \item Both factors entail that at least part of the integration logic needs
+    to be defined in application code, thereby reducing the benefits that we
+    have given earlier.
+\end{itemize}
+
+\subsubsection{Automated inference}
+
+\paragraph{Complexity}
+
+Answering the following questions can be very hard and requires expert
+knowledge, as the description of the semantics of
+OWLIM\footnote{\url{http://www.ontotext.com/owlim/rdfs_rules_owl.html}}
+illustrates.
+
+\begin{itemize}
+  \item Which semantics are supported (understood) by the reasoning engine?
+  \item Which semantics do you need (use) in your integration logic?
+\end{itemize}
+
+\todo{Check chapter 5 of the Working Ontologist book.}
+
+\paragraph{Performance}
+
+\begin{itemize}
+  \item In case of forward reasoning: after every data update (create, update,
+    delete) a full closure needs to be computed. Forward reasoning allows for
+    'offline' closure computation.
+  \item In case of backward reasoning: the required inferences are computed at
+    query time, for every query. Backward reasoning thus requires 'online'
+    computation.
+  \item Depends heavily on the reasoning engine that is used.
+  \item Depends on the computational complexity of the inferences that are being
+    made (which is not the same as the inferences that are needed). See section
+    \ref{reasoning_expressivity}.
+\end{itemize}
+
+\todo{Check chapter 5 of the Working Ontologist book.}
+
+\todo{Investigate literature to find out to what extent optimized components can
+mitigate the performance trickiness.}
+
+\section{User interface}
+
+\subsection{Server- or client-side}
+
+\begin{description}
+  \item[Server] Data does not need to be serialized in order to be transferred
+    over the network. We have direct access to the data interface, and thus its
+    object-oriented interface to our RDF data (see section \ref{otm}).
+  \item[Client] 'Ajax-compatible.' Data needs to be serialized in a format that
+    can be easily parsed by Javascript, i.e. JSON. Our application must do this
+    (diminishes flexibility of our model, see section \ref{flexible_model}), or
+    a general-purpose converter must be used, either self-hosted or external,
+    such as morph\footnote{\url{http://convert.test.talis.com/}} (see the
+    sketch after this list).
+\end{description}
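+
+A sketch of the serialization step for the client-side case, producing the
+kind of flat JSON (all records under a key such as 'items', as Exhibit
+expects) that an in-browser user interface can consume. The property selection
+is hard-coded here purely for illustration.
+
+\begin{lstlisting}[language=Python]
+import json
+from rdflib import Graph, Namespace, RDF
+
+FOAF = Namespace('http://xmlns.com/foaf/0.1/')
+
+def contacts_as_json(graph):
+    """Flatten contact resources into the JSON an in-browser UI expects."""
+    items = []
+    for person in graph.subjects(RDF.type, FOAF.Person):
+        items.append({
+            'id': str(person),
+            'name': str(graph.value(person, FOAF.name)),
+            'mbox': [str(mbox) for mbox in graph.objects(person, FOAF.mbox)],
+        })
+    return json.dumps({'items': items})
+\end{lstlisting}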
+
+\subsection{Domain-independent or -dependent}
+
+Domain-independent:
+
+\begin{itemize}
+  \item Search
+  \item Visualization
+  \item Faceted browsing
+\end{itemize}
+
+Anything with requirements beyond these three options needs a domain-dependent
+user interface:
+
+\begin{itemize}
+  \item makes assumptions about the data model
+  \item these assumptions are hard-coded on the place where interface elements
+    and data are fused (see section \ref{ui_fusion})
+  \item illustrate using example from case study: e-mail addresses
+\end{itemize}
+
+\chapter{I/O}
+
+\section{Properties}
+
+Data sources can have any number of the following features/characteristics:
+
+\begin{enumerate}
+  \item \textbf{RDF data}: standard data structure.
+  \item \textbf{Automatically discoverable}: standard interface.
+  \item \textbf{Control over data structure}: custom data structure; consumers
+    may prefer certain structures, like not too deeply nested or all records
+    under the key 'items' (like Exhibit does).
+  \item \textbf{Retrievable}: custom interface; data must be retrievable in a
+    form that matches the needs of the consumer.
+  \item \textbf{Authentication}: \ldots
+  \item \textbf{Authoring}: create, update, delete.
+  \item \textbf{Data as JSON over HTTP}: \ldots
+  \item \textbf{Cross-domain interaction}: JSONP or CORS.
+\end{enumerate}
+
+\section{Input (consume)}
+
+We can distinguish the following types of consumers:
+
+\begin{enumerate}
+  \item Crawler (integration provider).
+  \item Application/UI:
+    \begin{enumerate}
+      \item in-browser (Javascript);
+      \item on-server.
+    \end{enumerate}
+\end{enumerate}
+
+We can evaluate each of them by which of the properties from section
+\ref{source_properties} they require.
+
+\todo{Illustrate using examples such as geocoding scenario\ldots express
+uselessness of standard interfaces and data structures in case of application/ui
+consumer: we need to know the exact models that are in the data in order to be
+able to query meaningful chunks.}
+
+\subsection{Distributed data}
+
+\subsubsection{Query federation}
+
+Mediator component that federates a query to an arbitrary number of various
+external data sources (SPARQL endpoints).
+
+\begin{itemize}
+  \item Dependency on SPARQL endpoints, which is a problem -- see section
+    \ref{availability_of_rdf}.
+  \item Complex and no mature implementations
+  \item Limited semantical integration possibilities:
+    \begin{itemize}
+      \item the full data set is not available, so we cannot calculate a full
+        closure. E.g. unification of identical resources cannot be done
+        optimally, because there is no guarantee that two sources that contain
+        entities with the same identity always deliver them both (or neither).
+      \item mediator does not do semantic integration, so the query that is sent
+        to it must deal with the various data models. E.g. explicitly call for
+        resources of type foaf:Agent as well as type vcard:VCard.
+    \end{itemize}
+\end{itemize}
+
+\subsubsection{Local copy}
+
+Copy all relevant data from the various sources of interest, then execute
+queries over them locally.
+
+\begin{itemize}
+  \item SPARQL is an MDBQL, but not really useful because:
+    \begin{itemize}
+      \item dependency on SPARQL endpoints;
+      \item potentially huge amounts of network transfers at the time of the
+        query (online caching): inefficient and unacceptable response times.
+    \end{itemize}
+  \item By adopting a custom cache population approach, we can solve both issues
+    from previous point:
+    \begin{itemize}
+      \item no dependency on interfaces via adapters (see section
+        \ref{structural_integration});
+      \item offline caching by calculating the closure in a separated process,
+        only when it is needed (see section \ref{reasoning_performance}).
+    \end{itemize}
+  \item But hard limit: when data sets get too large it gets more and more
+    problematic to copy all data and do reasoning over it. Potential way around
+    this problem: leverage existing third party cache, but feasibility depends
+    on:
+    \begin{itemize}
+      \item availability of caching service
+      \item trust in quality of data.
+    \end{itemize}
+  \item Data invalidation is/can be a hard problem.
+  \item Full semantical integration possibilities, as we can apply reasoning on
+    the full data set (that is; after the copy, before the query).
+\end{itemize}
+
+\section{Output (publish)}
+
+We can distinguish the following types of consumers:
+
+\begin{enumerate}
+  \item Crawler (integration provider).
+  \item Application/UI:
+    \begin{enumerate}
+      \item in-browser (Javascript);
+      \item on-server.
+    \end{enumerate}
+\end{enumerate}
+
+We can evaluate each of them by which of the properties from section
+\ref{source_properties} they require.
+
+\todo{Illustrate using examples such as geocoding scenario\ldots express
+uselessness of standard interfaces and data structures in case of application/ui
+consumer: we need to know the exact models that are in the data in order to be
+able to query meaningful chunks.}
+
+\subsection{SPARQL endpoint}
+
+In section \ref{publishing_requirements} we have seen which factors play a role
+in publishing data. Let us discuss an approach based on a SPARQL endpoint and
+its impact on these factors.
+
+\subsubsection{The good}
+
+\begin{itemize}
+  \item It is capable of returning RDF (via CONSTRUCT). (1)
+  \item All data is discoverable as soon as the endpoint URL is specified or
+    registered with an integration provider. (2)
+  \item Capable of returning JSON in the form of:
+    \begin{itemize}
+      \item application/sparql-results+json SELECT result serialization;
+      \item RDF/JSON CONSTRUCT result serialization.
+    \end{itemize}
+    The SPARQL protocol works over HTTP. (7)
+\end{itemize}
+
+\subsubsection{The bad}
+
+\begin{itemize}
+  \item Data interface (and thus OTM, see section \ref{otm}) is skipped, which
+    means this abstraction needs to take place on the side of the data consumer.
+    This is problematic because:
+    \begin{itemize}
+      \item no in-browser OTM exists;
+      \item latency between OTM and SPARQL endpoint is no longer under our
+        control, which is not acceptable as we have seen in section
+        \ref{many_queries_required}.
+    \end{itemize}
+    (3)
+  \item The structure of the published data is restricted to the capabilities of
+    SPARQL's results model. (4)
+  \item Authentication is not supported, at least not in Sesame2. \todo{Check
+    other implementations.} (5)
+  \item Authoring data is not possible, at least not until SPARQL 1.1 is widely
+    employed. (6)
+  \item Neither JSONP nor CORS is supported. Possible work-arounds:
+    \begin{itemize}
+      \item CORS support could be hacked on top of it, but that can hardly be
+        considered an acceptable solution.
+      \item Pass data through a converter that supports JSONP output (locally
+        installed or external such as Talis' morph Semantic
+        Converter\footnote{\url{http://convert.test.talis.com/}}).
+      \item Install a local server-side component that pipes the JSON (see
+        the sketch after this list).
+    \end{itemize}
+    (8)
+\end{itemize}
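+
+A sketch of the last work-around: a small local server-side component (written
+with Flask purely for illustration; the endpoint URL is hypothetical) that
+pipes the JSON results and adds JSONP and CORS support on top of the endpoint.
+
+\begin{lstlisting}[language=Python]
+from flask import Flask, Response, request
+import requests
+
+app = Flask(__name__)
+ENDPOINT = 'http://localhost:8080/openrdf-sesame/repositories/addressbook'
+
+@app.route('/sparql')
+def pipe():
+    # Forward the query to the endpoint and pipe the JSON results back.
+    r = requests.get(ENDPOINT, params={'query': request.args['query']},
+                     headers={'Accept': 'application/sparql-results+json'})
+    callback = request.args.get('callback')
+    if callback:  # JSONP: wrap the results in the requested callback.
+        return Response('%s(%s);' % (callback, r.text),
+                        mimetype='application/javascript')
+    response = Response(r.text, mimetype='application/sparql-results+json')
+    response.headers['Access-Control-Allow-Origin'] = '*'  # CORS
+    return response
+\end{lstlisting}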
+
+\subsection{Linked Data}
+
+\todo{Rewrite to same setup as \ref{publishing_sparql}.}
+
+\todo{Mention RDFa as a form of publishing Linked Data.}
+
+Yes: 1, 2, 4, 6 (RDF/JSON), 7 (only CORS). No: 3, 5
+
+\subsubsection{The good}
+
+\begin{itemize}
+  \item Linked Data is RDF data by definition.
+  \item Linked Data is discoverable as long as other Linked Data resources link
+    to ours, or if we specify or register one of the resources.
+  \item It would be possible to add an authentication layer on top of the
+    dereferenceable Linked Data URIs (just as you would with traditional
+    document resources). But it is questionable whether that would make much
+    sense, as it effectively renders the incoming links useless. It seems that
+    Linked Data as a concept is fundamentally targeted at public data.
+  \item There is an RDF/JSON format that is supported by some RDF libraries,
+    although it is not a standard (yet). Note that the need for JSON means we
+    cannot use RDFa, which is an (X)HTML format. (Or actually, we can use it,
+    but it would require additional converters to be set up.)
+  \item CORS can be supported, although it is not part of most Linked Data
+    publishing frameworks (such as
+    Pubby\footnote{\url{http://www4.wiwiss.fu-berlin.de/pubby/}}), so we must
+    add it on top ourselves.
+\end{itemize}
+
+\subsubsection{The bad}
+
+\begin{itemize}
+  \item The big problem with Linked Data is that it is not designed to be easily
+    retrievable (in the sense of computationally efficient). The idea is to give
+    every resource its own URI, which can be dereferenced to get a description.
+    Now imagine having an application that needs a list of a hundred of those
+    resources with their descriptions. It would require at least one hundred
+    HTTP request-response interactions in order to complete the list. Now
+    imagine the application running in the browser of some user who could be
+    located anywhere on the network; it would be very inefficient and slow (see
+    the sketch after this list).
+  \item Linked Data is all (and only) about publishing and therefore the data is
+    read-only; no authoring options.
+  \item JSONP cannot be supported, as it is not part of the RDF/JSON
+    specification. Possible ways around this problem would be to install a
+    local server-side component that pipes the JSON, or to run it through a
+    converter that supports JSONP (like morph).
+\end{itemize}
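+
+A sketch of why retrieval is expensive: completing a list of a hundred contact
+resources means at least a hundred dereferencing round trips (the URIs are
+hypothetical; rdflib and the requests library are used purely for
+illustration).
+
+\begin{lstlisting}[language=Python]
+import requests
+from rdflib import Graph
+
+uris = ['http://example.org/contacts/%d' % i for i in range(100)]
+
+g = Graph()
+for uri in uris:
+    # Content negotiation: ask the Linked Data server for RDF/XML.
+    r = requests.get(uri, headers={'Accept': 'application/rdf+xml'})
+    g.parse(data=r.text, format='xml')  # one HTTP round trip per resource
+\end{lstlisting}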
+
+\subsection{Custom API}
+
+\todo{Rewrite to same setup as \ref{publishing_sparql}.}
+
+Yes: 1, 3, 4, 5, 6 (anything), 7. No: 2
+
+\subsubsection{The good}
+
+\begin{itemize}
+  \item We can supply our data in any structure we like and that suits our
+    domain and data models the best, including RDF.
+  \item It can efficiently retrieve data from the persistent storage as it has
+    direct access to the data interface.
+  \item It can implement authentication (local/custom, OAuth, HTTP Auth).
+  \item It can support all the authoring options that are deemed necessary (and
+    similarly it can explicitly not support all the authoring options that are
+    considered harmful or dangerous).
+  \item It can support any format we like (JSON, JSONP) over HTTP.
+  \item It can support CORS if we want to (see the sketch after this list).
+  \item We can design our interface in the way that it makes the most sense for
+    our specific domain and our data models.
+\end{itemize}
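+
+A sketch of such a custom API (written with Flask purely for illustration; the
+resource layout, key names and the API-key check are hypothetical design
+choices): we pick the data structure, the authoring operations and the
+authentication scheme ourselves.
+
+\begin{lstlisting}[language=Python]
+from flask import Flask, abort, jsonify, request
+
+app = Flask(__name__)
+API_KEY = 'secret'   # placeholder; could equally be OAuth or HTTP auth
+CONTACTS = []        # stand-in for the data interface / persistent storage
+
+@app.route('/contacts', methods=['GET'])
+def list_contacts():
+    # We decide the structure, e.g. Exhibit-style records under 'items'.
+    return jsonify(items=CONTACTS)
+
+@app.route('/contacts', methods=['POST'])
+def create_contact():
+    if request.headers.get('X-Api-Key') != API_KEY:  # authoring requires auth
+        abort(401)
+    CONTACTS.append(request.json)
+    return jsonify(request.json), 201
+\end{lstlisting}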
+
+\subsubsection{The bad}
+
+\begin{itemize}
+  \item Our data is not (automatically) discoverable.
+\end{itemize}
+
+\subsection{Interface converters}
+
+\begin{itemize}
+  \item SPARQL wrappers (D2R, Semantic Bridge)
+  \item Linked Data from SPARQL endpoint (Pubby)
+\end{itemize}
+
+Wrappers have the same architectural position as adapters (see
+section \ref{structural_integration}), but they provide a full SPARQL interface
+and need to be capable of translating queries on the fly into corresponding
+requests to the wrapped data source, while mapping the data to the RDF model at
+the same time. Complexity and limitations depend on the nature of the wrapped
+source:
+
+\begin{itemize}
+  \item Relational database: feasible
+  \item Web service: not feasible
+\end{itemize}
+
+\subsection{Optimal reusability}
+
+Published data is optimally reusable if the same publication can be consumed by
+all of the following in a workable fashion:
+
+\todo{Insert comparison table with results from sections
+\ref{publishing_sparql}, \ref{publishing_linkeddata} and
+\ref{publishing_customapi}.}
+
+\begin{itemize}
+  \item We have seen that neither SPARQL (see section \ref{publishing_sparql})
+    nor Linked Data (see section \ref{publishing_linkeddata}) is a feasible
+    approach for publishing data for the purpose of reuse by applications. We
+    need a dedicated API (see section \ref{publishing_customapi}) for that, like
+    the APIs that are commonly implemented in traditional web applications.
+  \item No authentication and no authoring in the case of SPARQL and Linked
+    Data; this makes them suitable only for public (shared) data.
+  \item We have also seen that custom APIs have the disadvantage that their data
+    will not be discovered by automated integration providers, as they will not
+    know how to talk to the API.
+  \item We conclude that in order to facilitate for maximum reusability of our
+    data, we will have to supply both a custom API and a SPARQL endpoint or
+    Linked Data.
+  \item This is unfortunate, because:
+    \begin{itemize}
+      \item it means the developer needs to do all the work of a traditional web
+        application, and then some more. No benefit on the side of the developer
+        here!
+      \item providing a data interface for a (local) user interface can be used
+        as an incentive to make developers contribute to the web of data; this
+        incentive is now lost.
+    \end{itemize}
+\end{itemize}
+
+
+\chapter{Separation of concerns}
+
+\section{Data integration is data modeling}
+
+\begin{itemize}
+  \item integration service (see sections on integration) and user interface
+    (see sections \ref{ui_domain_dependence} and \ref{publisher_assumptions})
+    require descriptive specifications of (part of) the data model, which makes
+    for an asymmetric separation between application logic and data model:
+    application depends on data, data does not depend on application.
+  \item semantic integration is possible without the help of any dedicated
+    integration procedures in the application logic. Instead general-purpose
+    components can be leveraged (reasoning engine), and integration logic can be
+    defined declaratively and in the same language as the data (RDF). As such,
+    integration semantics are part of the data model and therefore the task of
+    the data modeler instead of the application developer. I.e. separation of
+    concerns between application logic (developer) and data model (modeler).
+\end{itemize}
+
+\section{Data model flexibility}
+
+\todo{Entailed by and defined in terms of section \ref{separation_of_concerns}.}
+
+\chapter{Learning curve}
+
+\section{Data modeling}
+
+Making modeling decisions when working with RDF and shared ontologies is hard
+and error-prone, which makes it a task that should be left to experts (instead of
+application developers). This calls for a maximum separation of concerns between
+data model and application logic, which happens to be a promise of \SW
+technology. Unfortunately, as we will see in section
+\ref{separation_of_concerns}, this promise is only partially realized.
+
+\begin{itemize}
+  \item It requires an in-depth understanding of RDF(S) and OWL. Books have been
+    written about their semantics and their implications; a lack of proficiency
+    can lead to unexpected errors and unintended implications with a big impact.
+  \item Specifications of public vocabularies are open to interpretation.
+    \todo{Give example about foaf:img having foaf:Person as its domain, which
+    seems to imply that foaf:Agent cannot have a foaf:img property. Yet this
+    feels absurd and not workable.}
+  \item Specifications of public vocabularies are confusing. \todo{Give example
+    about FOAF's specification document claiming to be version 0.97 while it
+    prescribes the namespace \url{http://xmlns.com/foaf/0.1/}.}
+\end{itemize}
+
+\todo{Get more examples from our case study, the \SW bug
+tracker\footnote{\url{http://bugs.semanticweb.org/}}, \cite{Kinsella08} and
+\cite{Hogan07}.}
+
+\chapter{Ecosystem}
+
+\section{Availability of RDF data}
+
+None of the data sources that are candidates for being used in our aggregated
+address book provide RDF data:
+
+\begin{description}
+  \item[Contacts] Most personal contact data management service providers offer
+    programmatic access, but all provide data in a custom structure via a custom
+    interface.
+  \item[Geocoding] GeoNames provides geoposition data as RDF, but not at address
+    level, and only via a custom interface. Workable geocoding providers such as
+    Google Maps and Yahoo! Maps do offer programmatic access, but as a custom
+    data structure via a custom interface.
+\end{description}
+
+
 \begin{thebibliography}{9}
 
 \bibitem{Kinsella08}

File thesis-v3.toc

-\contentsline {chapter}{\numberline {1}Atomic findings}{3}
-\contentsline {section}{\numberline {1.1}Our aggregated address book is a Semantic Web\xspace application}{3}
-\contentsline {section}{\numberline {1.2}Availability of RDF data sources is a problem}{3}
-\contentsline {section}{\numberline {1.3}Modeling in RDF is an expert-task}{4}
-\contentsline {section}{\numberline {1.4}Data integration is a problem at three levels}{4}
-\contentsline {section}{\numberline {1.5}Structural integration}{5}
-\contentsline {section}{\numberline {1.6}Syntactical integration}{5}
-\contentsline {section}{\numberline {1.7}Semantical integration}{6}
-\contentsline {section}{\numberline {1.8}Automated reasoning depends on expressivity of underlying standards}{8}
-\contentsline {section}{\numberline {1.9}Automated reasoning performance is a tricky topic}{8}
-\contentsline {section}{\numberline {1.10}SPARQL (potentially) requires a lot of queries to fetch a single data set}{8}
-\contentsline {section}{\numberline {1.11}User interfaces can fuse interface elements and data on server or on client}{9}
-\contentsline {section}{\numberline {1.12}User interfaces can be domain-independent or domain-dependent}{9}
-\contentsline {section}{\numberline {1.13}Object-triple mapping to make RDF data compatible with existing web application frameworks}{10}
-\contentsline {section}{\numberline {1.14}Data source properties}{10}
-\contentsline {section}{\numberline {1.15}Publishing types and their properties}{11}
-\contentsline {section}{\numberline {1.16}Linked Data and custom API make assumptions about data model}{11}
-\contentsline {section}{\numberline {1.17}Consumer types and their properties}{12}
-\contentsline {section}{\numberline {1.18}Publishing type converters}{12}
-\contentsline {section}{\numberline {1.19}Publishing for optimal reusability}{12}
-\contentsline {section}{\numberline {1.20}Publishing data via SPARQL endpoint}{13}
-\contentsline {subsubsection}{The good}{13}
-\contentsline {subsubsection}{The bad}{14}
-\contentsline {section}{\numberline {1.21}Publishing data via Linked Data}{15}
-\contentsline {subsubsection}{The good}{15}
+\contentsline {chapter}{\numberline {1}Atomic findings}{4}
+\contentsline {section}{\numberline {1.1}Our aggregated address book is a Semantic Web\xspace application}{4}
+\contentsline {section}{\numberline {1.2}Availability of RDF data sources is a problem}{4}
+\contentsline {section}{\numberline {1.3}Modeling in RDF is an expert-task}{5}
+\contentsline {section}{\numberline {1.4}Data integration is a problem at three levels}{5}
+\contentsline {section}{\numberline {1.5}Structural integration}{6}
+\contentsline {section}{\numberline {1.6}Syntactical integration}{6}
+\contentsline {section}{\numberline {1.7}Semantical integration}{7}
+\contentsline {section}{\numberline {1.8}Automated reasoning depends on expressivity of underlying standards}{9}
+\contentsline {section}{\numberline {1.9}Automated reasoning performance is a tricky topic}{9}
+\contentsline {section}{\numberline {1.10}SPARQL (potentially) requires a lot of queries to fetch a single data set}{9}
+\contentsline {section}{\numberline {1.11}User interfaces can fuse interface elements and data on server or on client}{10}
+\contentsline {section}{\numberline {1.12}User interfaces can be domain-independent or domain-dependent}{10}
+\contentsline {section}{\numberline {1.13}Object-triple mapping to make RDF data compatible with existing web application frameworks}{11}
+\contentsline {section}{\numberline {1.14}Data source properties}{11}
+\contentsline {section}{\numberline {1.15}Publishing types and their properties}{12}
+\contentsline {section}{\numberline {1.16}Linked Data and custom API make assumptions about data model}{12}
+\contentsline {section}{\numberline {1.17}Consumer types and their properties}{13}
+\contentsline {section}{\numberline {1.18}Publishing type converters}{13}
+\contentsline {section}{\numberline {1.19}Publishing for optimal reusability}{13}
+\contentsline {section}{\numberline {1.20}Publishing data via SPARQL endpoint}{14}
+\contentsline {subsubsection}{The good}{14}
 \contentsline {subsubsection}{The bad}{15}
-\contentsline {section}{\numberline {1.22}Publishing data via custom API}{16}
+\contentsline {section}{\numberline {1.21}Publishing data via Linked Data}{16}
 \contentsline {subsubsection}{The good}{16}
 \contentsline {subsubsection}{The bad}{16}
-\contentsline {section}{\numberline {1.23}Consuming distributed data}{17}
-\contentsline {section}{\numberline {1.24}Consuming distributed data via query federation}{17}
-\contentsline {section}{\numberline {1.25}Consuming distributed data via caching locally}{17}
-\contentsline {section}{\numberline {1.26}Integration providers can reduce complexity}{18}
-\contentsline {section}{\numberline {1.27}Data integration can be achieved in an elegant manner}{19}
-\contentsline {paragraph}{Follows from}{19}
-\contentsline {section}{\numberline {1.28}Separation between application logic and data model}{19}
-\contentsline {section}{\numberline {1.29}Internal models are (relatively) flexible}{20}
+\contentsline {section}{\numberline {1.22}Publishing data via custom API}{17}
+\contentsline {subsubsection}{The good}{17}
+\contentsline {subsubsection}{The bad}{17}
+\contentsline {section}{\numberline {1.23}Consuming distributed data}{18}
+\contentsline {section}{\numberline {1.24}Consuming distributed data via query federation}{18}
+\contentsline {section}{\numberline {1.25}Consuming distributed data via caching locally}{18}
+\contentsline {section}{\numberline {1.26}Integration providers can reduce complexity}{19}
+\contentsline {section}{\numberline {1.27}Data integration can be achieved in an elegant manner}{20}
+\contentsline {paragraph}{Follows from}{20}
+\contentsline {section}{\numberline {1.28}Separation between application logic and data model}{20}
+\contentsline {section}{\numberline {1.29}Internal models are (relatively) flexible}{21}
+\contentsline {chapter}{\numberline {2}Case study}{22}
+\contentsline {chapter}{\numberline {3}Architecture}{24}
+\contentsline {section}{\numberline {3.1}Persistent storage}{24}
+\contentsline {section}{\numberline {3.2}Data interface}{24}
+\contentsline {subsection}{\numberline {3.2.1}Object-oriented programming interface}{24}
+\contentsline {subsection}{\numberline {3.2.2}Limitations of SPARQL}{25}
+\contentsline {section}{\numberline {3.3}Integration service}{25}
+\contentsline {subsection}{\numberline {3.3.1}Structural integration}{25}
+\contentsline {subsection}{\numberline {3.3.2}Syntactical integration}{26}
+\contentsline {subsection}{\numberline {3.3.3}Semantical integration}{27}
+\contentsline {subsubsection}{Automated inference}{28}
+\contentsline {paragraph}{Complexity}{28}
+\contentsline {paragraph}{Performance}{28}
+\contentsline {section}{\numberline {3.4}User interface}{29}
+\contentsline {subsection}{\numberline {3.4.1}Server- or client-side}{29}
+\contentsline {subsection}{\numberline {3.4.2}Domain-independent or -dependent}{29}
+\contentsline {chapter}{\numberline {4}I/O}{31}
+\contentsline {section}{\numberline {4.1}Properties}{31}
+\contentsline {section}{\numberline {4.2}Input (consume)}{31}
+\contentsline {subsection}{\numberline {4.2.1}Distributed data}{32}
+\contentsline {subsubsection}{Query federation}{32}
+\contentsline {subsubsection}{Local copy}{32}
+\contentsline {section}{\numberline {4.3}Output (publish)}{33}
+\contentsline {subsection}{\numberline {4.3.1}SPARQL endpoint}{34}
+\contentsline {subsubsection}{The good}{34}
+\contentsline {subsubsection}{The bad}{34}
+\contentsline {subsection}{\numberline {4.3.2}Linked Data}{35}
+\contentsline {subsubsection}{The good}{35}
+\contentsline {subsubsection}{The bad}{36}
+\contentsline {subsection}{\numberline {4.3.3}Custom API}{36}
+\contentsline {subsubsection}{The good}{36}
+\contentsline {subsubsection}{The bad}{37}
+\contentsline {subsection}{\numberline {4.3.4}Interface converters}{37}
+\contentsline {subsection}{\numberline {4.3.5}Optimal reusability}{37}
+\contentsline {chapter}{\numberline {5}Separation of concerns}{39}
+\contentsline {section}{\numberline {5.1}Data integration is data modeling}{39}
+\contentsline {section}{\numberline {5.2}Data model flexibility}{39}
+\contentsline {chapter}{\numberline {6}Learning curve}{40}
+\contentsline {section}{\numberline {6.1}Data modeling}{40}
+\contentsline {chapter}{\numberline {7}Ecosystem}{41}
+\contentsline {section}{\numberline {7.1}Availability of RDF data}{41}