Layered Content Models

Original Content

<p>
  A <strong><em>cockatoo</em></strong> is any of the 21 <a href="wp:BirdSpecies">bird species</a> 
  belonging to the family <em>Cacatuidae</em>. Eleven of the 21 species exist in the wild only in Australia.
</p>

Approaches

Flat text with offsets

<content>A cockatoo is any of the 21 bird species belonging to the family Cacatuidae. Eleven of the 21 species exist in the wild only in Australia.</content>
<layers>
 <!-- Offsets are 0-based character positions in the content text; end-offset is exclusive. -->
 <layer id='codes'>
   <entry start-offset='2' end-offset='2'>
     <skl>&lt;strong&gt;&lt;em&gt;</skl>
   </entry>
   <entry start-offset='10' end-offset='10'>
     <skl>&lt;/em&gt;&lt;/strong&gt;</skl>
   </entry>
   <entry start-offset='28' end-offset='28'>
     <skl>&lt;a href="wp:BirdSpecies"&gt;</skl>
   </entry>
   <entry start-offset='40' end-offset='40'>
     <skl>&lt;/a&gt;</skl>
   </entry>
   <entry start-offset='65' end-offset='65'>
     <skl>&lt;em&gt;</skl>
   </entry>
   <entry start-offset='75' end-offset='75'>
     <skl>&lt;/em&gt;</skl>
   </entry>
 </layer>
 <layer id='segments'>
   <entry start-offset='0' end-offset='75'/>
   <entry start-offset='77' end-offset='137'/>
 </layer>
 <layer id='terms'>
   <entry start-offset='2' end-offset='10'/> <!-- cockatoo -->
   <entry start-offset='28' end-offset='40'/> <!-- bird species -->
 </layer>
</layers>
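
The offset model can be exercised with a small Python sketch that rebuilds the inline markup from the flat text and a zero-width codes layer. The helper names (`span`, `build_codes`, `apply_codes`) are hypothetical and not part of Okapi. Offsets are computed from the text rather than hard-coded, and are assumed to be 0-based and end-exclusive on the text with whitespace normalized to single spaces.

```python
# Sketch only: hypothetical helpers, not an Okapi API.
# Assumption: offsets are 0-based, end-exclusive character positions
# in the whitespace-normalized flat text.

TEXT = ("A cockatoo is any of the 21 bird species belonging to the family "
        "Cacatuidae. Eleven of the 21 species exist in the wild only in Australia.")


def span(text, needle):
    """Return the (start, end) offsets of the first occurrence of needle."""
    start = text.index(needle)
    return start, start + len(needle)


def build_codes(text):
    """Build the codes layer as (offset, skeleton) pairs, one per <entry>."""
    c_start, c_end = span(text, "cockatoo")
    b_start, b_end = span(text, "bird species")
    f_start, f_end = span(text, "Cacatuidae")
    return [
        (c_start, "<strong><em>"),
        (c_end, "</em></strong>"),
        (b_start, '<a href="wp:BirdSpecies">'),
        (b_end, "</a>"),
        (f_start, "<em>"),
        (f_end, "</em>"),
    ]


def apply_codes(text, codes):
    """Insert each skeleton fragment at its offset.

    Entries are processed from the highest offset to the lowest so that
    the offsets still to be processed remain valid as the string grows.
    """
    out = text
    for offset, skl in reversed(sorted(codes, key=lambda c: c[0])):
        out = out[:offset] + skl + out[offset:]
    return out


html = apply_codes(TEXT, build_codes(TEXT))
print(html)
```

Because the text itself never changes, the segments and terms layers can be read with plain slices such as `TEXT[start:end]`. The cost is that any edit to the text invalidates every offset after the edit point.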

Flat with markup elements

<content>
  <m id='a-start'/>A <m id='1'/><m id='t1-start'/>cockatoo<m id='t1-end'/><m id='2'/> 
  is any of the 21 <m id='3'/><m id='t2-start'/>bird species<m id='t2-end'/><m id='4'/>
  belonging to the family <m id='5'/>Cacatuidae<m id='6'/><m id='a-end'/>. 
  <m id='b-start'/>Eleven of the 21 species exist in the wild only in Australia
  <m id='b-end'/>.
</content>
<layers>
 <layer id='codes'>
   <entry start='1' end='1'>
     <skl>&lt;strong&gt;&lt;em&gt;</skl>
   </entry>
   <entry start='2' end='2'>
     <skl>&lt;/em&gt;&lt;/strong&gt;</skl>
   </entry>
   <entry start='3' end='3'>
     <skl>&lt;a href="wp:BirdSpecies"&gt;</skl>
   </entry>
   <entry start='4' end='4'>
     <skl>&lt;/a&gt;</skl>
   </entry>
   <entry start='5' end='5'>
     <skl>&lt;em&gt;</skl>
   </entry>
   <entry start='6' end='6'>
     <skl>&lt;/em&gt;</skl>
   </entry>
 </layer>
 <layer id='segments'>
   <entry start='a-start' end='a-end'/>
   <entry start='b-start' end='b-end'/>
 </layer>
 <layer id='terms'>
   <entry start='t1-start' end='t1-end'/> <!-- cockatoo -->
   <entry start='t2-start' end='t2-end'/> <!-- bird species -->
 </layer>
</layers>
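
As a rough illustration of how layer entries resolve against the markers, the following Python sketch indexes each `<m>` element by its position in the flat text and then slices the text between a start and an end marker. The function names are hypothetical, and the sketch assumes the markers are direct children of `<content>` and that whitespace has been normalized to single spaces.

```python
# Sketch only: hypothetical helpers, not an Okapi API.
import xml.etree.ElementTree as ET

CONTENT = (
    "<content>"
    "<m id='a-start'/>A <m id='1'/><m id='t1-start'/>cockatoo<m id='t1-end'/><m id='2'/>"
    " is any of the 21 <m id='3'/><m id='t2-start'/>bird species<m id='t2-end'/><m id='4'/>"
    " belonging to the family <m id='5'/>Cacatuidae<m id='6'/><m id='a-end'/>. "
    "<m id='b-start'/>Eleven of the 21 species exist in the wild only in Australia"
    "<m id='b-end'/>."
    "</content>"
)


def index_markers(content_xml):
    """Return (flat_text, {marker id: offset in flat_text})."""
    root = ET.fromstring(content_xml)
    parts = [root.text or ""]
    length = len(parts[0])
    offsets = {}
    for marker in root:  # assumes markers are direct children of <content>
        offsets[marker.get("id")] = length
        tail = marker.tail or ""  # text that follows this marker
        parts.append(tail)
        length += len(tail)
    return "".join(parts), offsets


def resolve(text, offsets, start_id, end_id):
    """Return the text covered by a layer entry given its two marker ids."""
    return text[offsets[start_id]:offsets[end_id]]


text, offsets = index_markers(CONTENT)
print(resolve(text, offsets, "t1-start", "t1-end"))
print(resolve(text, offsets, "b-start", "b-end"))
```

Unlike raw offsets, marker ids stay valid when the surrounding text is edited, since the markers travel with the content.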

Tree-based approach

<content>
  <segment>
    <text-run>A </text-run>
    <paired-markup type='html:strong'>
      <paired-markup type='html:em'>
        <term-entry>
          <text-run>cockatoo</text-run>
        </term-entry>
      </paired-markup>
    </paired-markup>
    <text-run> is any of the 21 </text-run>
    <paired-markup type='html:a'>
      <term-entry>
        <text-run>bird species</text-run> 
      </term-entry>
    </paired-markup>
    <text-run> belonging to the family </text-run>
    <paired-markup type='html:em'>
      <text-run>Cacatuidae</text-run>
    </paired-markup>
    <text-run>.</text-run>
  </segment>
  <text-run> </text-run>
  <segment>
    <text-run>Eleven of the 21 species exist in the wild only in Australia.</text-run>
  </segment>
</content>
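
A minimal Python sketch of reading the tree model: it concatenates the `text-run` nodes in document order to recover the flat text, and collects the text of each `segment` and `term-entry`. The helper `run_text` is hypothetical, and the sketch assumes that all translatable text lives in `text-run` elements, so indentation between elements is ignored.

```python
# Sketch only: hypothetical helper, not an Okapi API.
import xml.etree.ElementTree as ET

TREE = """
<content>
  <segment>
    <text-run>A </text-run>
    <paired-markup type='html:strong'>
      <paired-markup type='html:em'>
        <term-entry><text-run>cockatoo</text-run></term-entry>
      </paired-markup>
    </paired-markup>
    <text-run> is any of the 21 </text-run>
    <paired-markup type='html:a'>
      <term-entry><text-run>bird species</text-run></term-entry>
    </paired-markup>
    <text-run> belonging to the family </text-run>
    <paired-markup type='html:em'>
      <text-run>Cacatuidae</text-run>
    </paired-markup>
    <text-run>.</text-run>
  </segment>
  <text-run> </text-run>
  <segment>
    <text-run>Eleven of the 21 species exist in the wild only in Australia.</text-run>
  </segment>
</content>
"""


def run_text(node):
    """Concatenate every text-run at or below node, in document order."""
    return "".join(run.text or "" for run in node.iter("text-run"))


root = ET.fromstring(TREE.strip())
flat = run_text(root)
segments = [run_text(seg) for seg in root.iter("segment")]
terms = [run_text(term) for term in root.iter("term-entry")]

print(flat)
print(segments)
print(terms)
```

Here the layers are implicit in the nesting, so no offsets or marker ids are needed. The trade-off is that annotations which overlap without nesting cannot be expressed directly in a single tree.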
