

ParaVoz 1.0 - a simple web interface for querying parallel corpora 

The package provides a simple yet effective interface for querying a parallel
corpus using OpenCWB, the IMS Open Corpus Workbench (http://cwb.sourceforge.net).
It should work on any Linux machine with only minimal changes to the settings
files to reflect local paths and, possibly, adjustments to language codes. All
settings are found in the settings/ folder.

To see ParaVoz in motion, watch the demo video on the ParaSol website
(http://parasolcorpus.org): http://parasolcorpus.org/ParaSol_demo.mp4

Please also see ParaVoz 2.0, which has more features and is probably preferable
for corpora with fewer than five languages.

1) Use CWB to encode your parallel corpus so that text names are lowercase only
and consist of a common identifier followed by an underscore and a two- or
three-letter language code, e.g. orwell1984_en, orwell1984_ru and orwell1984_de
for the English, Russian and German versions of Orwell's 1984.
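The naming convention in step 1 can be checked before encoding. The following is
a minimal sketch in Python and is not part of ParaVoz; it assumes that the
language code is whatever follows the last underscore and that identifiers use
only lowercase letters, digits and underscores. The helper names are
hypothetical.

```python
import re
from collections import defaultdict

# Assumed pattern: lowercase identifier, an underscore, then a two- or
# three-letter lowercase language code at the very end of the name.
TEXT_NAME = re.compile(r"^([a-z0-9_]+)_([a-z]{2,3})$")


def parse_text_name(name):
    """Split e.g. 'orwell1984_en' into ('orwell1984', 'en')."""
    match = TEXT_NAME.match(name)
    if match is None:
        raise ValueError(f"{name!r} does not follow the <identifier>_<lang> convention")
    return match.group(1), match.group(2)


def group_parallel_texts(names):
    """Group corpus names by their shared identifier, collecting language codes."""
    groups = defaultdict(list)
    for name in names:
        identifier, lang = parse_text_name(name)
        groups[identifier].append(lang)
    return dict(groups)
```

For example, group_parallel_texts(["orwell1984_en", "orwell1984_ru",
"orwell1984_de"]) returns {"orwell1984": ["en", "ru", "de"]}, while a name such
as "Orwell1984_EN" raises a ValueError because it is not lowercase.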

Annotation with "lemma" and "tag" positional attributes is supported out of the
box; for other attributes, see below.
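CWB's cwb-encode reads one token per line, with the word form in the first
column and further positional attributes in the following tab-separated
columns, declared in order with -P. Below is a sketch of a hypothetical Python
helper that assembles the call for one language version carrying lemma and tag
columns. The directory layout and the sentence structural attribute "s" are
assumptions, not ParaVoz requirements.

```python
import shlex


def encode_command(corpus_name, vrt_file, data_root, registry_dir,
                   p_attrs=("lemma", "tag"), s_attrs=("s",)):
    """Build a cwb-encode command line for one language version of a text.

    Assumes input columns word, lemma, tag and <s> sentence regions.
    The registry file is named after the lowercase corpus name.
    """
    args = ["cwb-encode", "-c", "utf8",
            "-d", f"{data_root}/{corpus_name}",
            "-f", vrt_file,
            "-R", f"{registry_dir}/{corpus_name}"]
    for attr in p_attrs:  # extra positional attributes, in column order
        args += ["-P", attr]
    for attr in s_attrs:  # structural attributes, e.g. sentences for alignment
        args += ["-S", attr]
    return shlex.join(args)
```

Each version is then indexed with cwb-makeall, and the versions are aligned with
CWB's alignment tools (cwb-align, cwb-align-encode); see the CWB encoding
tutorial for their exact options.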

2) Then unpack ParaVoz into a directory served by Apache (e.g.,
/var/www/htdocs/ParaVoz) and
a) edit settings/init.php to set the correct paths to CWB and to the corpus and
registry directories, and optionally configure positional attributes other than
tag and lemma;
b) if necessary, edit settings/Languages.ini to map language codes to their full
names and to group languages on the interface.
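The exact layout of Languages.ini is not documented here. Assuming a standard
INI layout in which each [section] is an interface group and each key is a
language code mapped to its full name, a Python sketch for reading it might look
like this (sample data and function names are hypothetical):

```python
import configparser

# Hypothetical layout: each [section] is an interface group,
# each key a language code, each value the full language name.
SAMPLE = """
[Slavic]
ru = Russian
pl = Polish

[Germanic]
en = English
de = German
"""


def load_language_groups(ini_text):
    """Return {group: {code: full name}} parsed from INI text."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return {section: dict(parser[section]) for section in parser.sections()}


def language_name(groups, code):
    """Resolve a language code to its full name, falling back to the code itself."""
    for languages in groups.values():
        if code in languages:
            return languages[code]
    return code
```

The fallback mirrors the "if necessary" in step 2b: a code without an entry is
simply shown as is.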

3) Enjoy your parallel corpus with ParaVoz (http://www.youtube.com/watch?v=7Z9xHzufmOk). 


Two types of corpus result presentation are provided: one using XML and XSLT,
the other a CSS-styled wrapper around CWB's HTML output.
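In the XML-based route, query hits are first serialized as XML and then
transformed by an XSLT sheet (e.g. parallel-kwic.xsl). The following Python
sketch only illustrates that intermediate step; the element and attribute names
(concordance, hit, seg, lang) are hypothetical and do not reproduce the schema
ParaVoz's XSLT sheets actually expect.

```python
import xml.etree.ElementTree as ET


def hits_to_xml(hits):
    """Serialize aligned hits as XML.

    hits: list of dicts mapping a language code to the aligned text segment.
    """
    root = ET.Element("concordance")
    for number, hit in enumerate(hits, start=1):
        item = ET.SubElement(root, "hit", n=str(number))
        for lang, text in hit.items():
            seg = ET.SubElement(item, "seg", lang=lang)
            seg.text = text
    return ET.tostring(root, encoding="unicode")
```

Keeping the data in XML lets one result set feed several stylesheets, which is
how the KWIC view, the XML export and the CSV export can share one query.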

Description of files:
1. folders:
folder css: holds CSS style sheets for the CWB HTML output
folder js: JavaScript functions for the query page
folder settings: all settings, mainly CWB and corpus paths and language / text

2. query page 
index.php: main entrance into the corpus / query page
query_form.php: query page HTML (called by index.php)
query_form_objects.php: query page functions (called by index.php)
query_table_of_texts.php: table of texts on the query page (called by index.php)
results.php: main results page; forks into the XML-based or the CWB-HTML-based concordance

2.1 XML-based concordance files 
results_xml.php: concordance of results
results_context_xml.php: wider context
parallel-csv.xsl: XSLT sheet for CSV result export
parallel-export.xsl: XSLT sheet for XML export
parallel-kwic.xsl: XSLT sheet for concordance

2.2 CWB-HTML-CSS-based concordance files 
results_conc.php: concordance of results
results_context_conc.php: wider context 
results_export_conc.php: export function

2.3 Important files under settings/ (change these to reflect the corpus path etc.)
init.php: CWB and corpus paths, context specs for CWB query 
Languages.ini: list of language codes and their full names in the corpus (optional)
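Most setup problems come from wrong paths in init.php. As a standalone sanity
check (not part of ParaVoz, which is written in PHP), a Python sketch can verify
that the CWB tools are on the PATH and that every corpus has a registry file,
assuming, as is usual in CWB, that registry files are named after the lowercase
corpus name. The function name and its parameters are hypothetical.

```python
import shutil
from pathlib import Path


def check_cwb_setup(registry_dir, corpus_names, tools=("cqp", "cwb-encode")):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for tool in tools:
        if shutil.which(tool) is None:  # tool not found on PATH
            problems.append(f"CWB tool not found on PATH: {tool}")
    registry = Path(registry_dir)
    if not registry.is_dir():
        problems.append(f"registry directory missing: {registry}")
    else:
        for name in corpus_names:
            if not (registry / name).is_file():
                problems.append(f"no registry file for corpus: {name}")
    return problems
```

Running it with the registry path and corpus names from init.php before opening
the web interface separates path errors from query errors.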


This web interface to CWB was initially written by Roland Meyer for use with the
ParaSol corpus (then Regensburg Parallel Corpus) in 2006 and has since been in
development by three authors. The JavaScript-based functionality was mainly
added by Andreas Zeman, XSLT support in the new modular interface mainly by
Ruprecht von Waldenfels, who has supervised the publication as open source. Part
of the architecture is described in Waldenfels (2011) (see http://parasol.unibe.ch). 
We thank the Center for the Study of Language and Society, University of Berne, 
(http://www.csls.unibe.ch) for granting financial support enabling the publication 
of ParaVoz as open source. Support by the Swiss National Science Foundation is 
gratefully acknowledged.

If you use the interface, please cite it as follows:
Roland Meyer, Ruprecht von Waldenfels, Andreas Zeman (2006-2014): ParaVoz - 
a simple web interface for querying parallel corpora. Bern, Regensburg, Berlin.


Copyright (C) 2006-2014 Roland Meyer, Ruprecht von Waldenfels, Andreas Zeman

This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA