Anonymous committed 9a34071 Merge

Merging

  • Parent commits b6c2831, dc45835
  • Tags riak_search-beta1

Files changed (5)

+The following people have contributed to Riak Search:
+
+John Muellerleile
+Rusty Klophaus
+Kevin Smith
+Bryan Fink
+
 The following people have contributed to Riak:
 
 Andy Gross

File doc/basho-search-doc-style.iorg

+#+STYLE: <style>
+#+STYLE: h1.title {
+#+STYLE:     font-family:'HelveticaNeue-Light','Helvetica Neue Light','Helvetica Neue',Arial,Helvetica,sans-serif;
+#+STYLE:     padding: 20px;
+#+STYLE:     font-size: 3em;
+#+STYLE:     line-height: 3em;
+#+STYLE:     font-weight: 300;
+#+STYLE:     border-bottom: solid 1px black;
+#+STYLE: }
+#+STYLE:  
+#+STYLE: h2 {
+#+STYLE:     font-family:'HelveticaNeue-Light','Helvetica Neue Light','Helvetica Neue',Arial,Helvetica,sans-serif;
+#+STYLE:     font-size: 1.8em;
+#+STYLE:     line-height: 1.8em;
+#+STYLE:     font-weight:300;
+#+STYLE:     padding: 20px 0px 0px 0px;
+#+STYLE: }
+#+STYLE:  
+#+STYLE: h3 {
+#+STYLE:     font-family:'HelveticaNeue-Light','Helvetica Neue Light','Helvetica Neue',Arial,Helvetica,sans-serif;
+#+STYLE:     font-size:1.4em;
+#+STYLE:     line-height: 1.4em;
+#+STYLE:     font-weight:300;
+#+STYLE:     color: #306990
+#+STYLE: }
+#+STYLE:  
+#+STYLE: body {
+#+STYLE:     font-family: "Helvetica", sans-serif;
+#+STYLE:     font-size: 90%;
+#+STYLE:     line-height: 160%;
+#+STYLE:     margin: 50px;
+#+STYLE: }
+#+STYLE:  
+#+STYLE: li { padding: 3px 0px 3px 0px; }
+#+STYLE:  
+#+STYLE: a { text-decoration: none; }
+#+STYLE: a:hover { text-decoration: underline; }
+#+STYLE: </style>

File doc/using_search.org

-* riak_search
+#+SETUPFILE: "basho-search-doc-style.iorg"
+#+TITLE: Riak Search Getting Started Guide - DRAFT
+
+* Getting Started
+
+** Requirements
+
+   Raptor requires:
+
+   - Java 1.6.x
+   - Ant
+   - gcc toolchain
+   - BDB 4.8.30
+
+   Riak Search requires:
+   
+   - Erlang R13B04 (or later)
+   
+** Installation
+   
+   1. *Download and build BDB*
+   
+      Download v. 4.8.30: [[http://download.oracle.com/berkeley-db/db-4.8.30.NC.tar.gz]]
+   
+      *IMPORTANT:* If you are installing on a Mac, first follow the instructions under "How do I build Raptor for Mac OSX?" below.
+
+      : cd db-4.8.30.NC
+      : cd build_unix
+      : ../dist/configure --enable-java
+      : make
+      : sudo make install
+
+   2. *Unzip and install Raptor* 
+      
+      Enter the raptor directory and run:
+    
+      : ant clean
+      : ant
+
+   3. *Unzip and install Riak Search*
+
+      Enter the riak\_search directory and run:
+
+      : make
+      : make rel
+
+   4. *Start Raptor*
+
+      Enter the raptor directory and run:
+
+      : ./start.sh
+
+   5. *Start Riak*
+
+      Enter the riak\_search directory, and run:
+
+      : cd rel/riak
+      : bin/riak console
+
+      You can later use =bin/riak start= to start riak\_search in the background.
+       
+** Test Data
+
+   Add data to the system using curl:
+
+   : curl -X POST -H "Content-Type: text/xml" --data-binary @tests/books.xml http://localhost:8098/solr/books/update
+
+   Run a simple select query:
+
+   : curl "http://localhost:8098/solr/books/select?start=0&rows=10000&q=prog*"
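As a rough illustration of what the update command above sends, here is a minimal Python sketch that builds the same POST request without sending it. The helper name =build_update_request= and the =BASE_URL= constant are assumptions made for this example; only the host, port, path, and content type come from the curl commands above.

```python
import urllib.request

# Assumed default, copied from the curl commands above; adjust for your node.
BASE_URL = "http://localhost:8098/solr"


def build_update_request(index, xml_bytes, base_url=BASE_URL):
    """Build (but do not send) a POST to /solr/<index>/update.

    Hypothetical helper for illustration; not part of Riak Search.
    """
    return urllib.request.Request(
        f"{base_url}/{index}/update",
        data=xml_bytes,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


# Build a request with an empty <add> body and inspect it.
req = build_update_request("books", b"<add></add>")
print(req.get_method(), req.full_url)
```

To actually submit the data, the request object could be passed to =urllib.request.urlopen= while Riak Search is running.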
+
+* Riak Search (riak\_search)
+   
 ** Configuring Search Buckets
+
    Riak Search indices are exposed as Riak buckets. To configure additional search
-   indices, edit the search_bucket entry in the riak_search section of app.config:
+   indices, edit the search\_buckets entry in the riak\_search section of app.config:
 
-       {riak_search, [{search_buckets, ["search"]}]}
+   : {riak_search, [{search_buckets, ["search"]}]}
 
    Additional indices should be added to the list of bucket names. All nodes should
    have the same set of search buckets.
 
 ** Configuring Textual Analyzers
+
    Riak Search uses Lucene's excellent text analysis capabilities for indexing and
    querying. The communication between Riak Search and the Lucene text analysis
    code is performed over a TCP socket on localhost. The port number for this
-   connection is controlled by the analysis_port entry in the qilr section of
+   connection is controlled by the analysis\_port entry in the qilr section of
    app.config:
 
-       {qilr, [{analysis_port, 6095}]}
+   : {qilr, [{analysis_port, 6095}]}
 
    The analysis interface actually uses two ports for communication. The configured
    port, in this case 6095, is used for textual analysis processing. An additional
-   port, one above analysis_port, is used to insure Java and Erlang processes are
-   started and stopped in lockstep. In short, make sure you have analysis_port and
-   analysis_port + 1 available when you pick a value for analysis_port.
+   port, one above analysis\_port, is used to ensure Java and Erlang processes are
+   started and stopped in lockstep. In short, make sure you have analysis\_port and
+   analysis\_port + 1 available when you pick a value for analysis\_port.
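Before settling on a value, it can help to confirm that both ports are free. The following is a minimal Python sketch of such a check, assuming the analyzer binds on localhost as described above. The function names are hypothetical, and the check is only a point-in-time probe: another process could still claim a port afterwards.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Typically "address already in use" (or permission denied).
            return False
        return True


def analysis_ports_free(analysis_port):
    """Check both analysis_port and analysis_port + 1."""
    return all(port_is_free(p) for p in (analysis_port, analysis_port + 1))


# 6095 is the example value used in app.config above.
print(analysis_ports_free(6095))
```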
 
-* riak_solr
-** Configuring riak_solr
-   riak_solr's configuration consists of two parts: schema definition and the
+** Creating a Riak Search Cluster
+
+   Riak Search has all of the same operational properties as
+   Riak. Refer to the Riak wiki
+   [[https://wiki.basho.com/display/RIAK/Home]] for more information on
+   running Riak in a clustered environment.
+
+* Riak Search: Solr Interface (riak\_solr)
+
+** Configuring riak\_solr
+
+   riak\_solr's configuration consists of two parts: schema definition and the
    matching search bucket.
 
 *** Schema Definitions
-    riak_solr's schema definition language is similar in spirit to Solr's but
+
+    riak\_solr's schema definition language is similar in spirit to Solr's but
     differs substantially in syntax. Schema definitions are written in Erlang using
-    lists, strings, and tuples. Several sample schemas are included in riak_solr/priv
+    lists, strings, and tuples. Several sample schemas are included in riak\_solr/priv
     for use as reference.
 
 **** Schema headers
+
      Schema definitions consist of two sections: header and fields. The schema header
      section contains information about the schema name, Solr API version, default
      boolean operator for queries, and the default query field. An example header
-     looks like this: [{name, "books"},
-                       {version, "1.1"},
-                       {default_field, "title"},
-                       {default_op, "and"}]
+     looks like this: 
+
+     : [{name, "books"},
+     :  {version, "1.1"},
+     :  {default_field, "title"},
+     :  {default_op, "and"}]
 
      All fields must be present and all values must be strings. Also, the only supported
      Solr API version is 1.1 so the version field must be "1.1".
 
 **** Schema fields
+
      Schema field definitions describe the field's name, data type, and whether or not
      a field is required. A field definition for a rating field of type integer might look
-     like this: {field, [{name, "rating"},
-                         {type, integer},
-                         {required, true}]}
+     like this: 
+
+     : {field, [{name, "rating"},
+     :          {type, integer},
+     :          {required, true}]}
 
      Field names must be strings. Supported field types are integer, boolean, and string.
      Field types must appear without quotes in the field definition. The required attribute
      NOTE: Each schema definition MUST contain a field named "id".
 
 *** Example schema
+
     A complete schema for a Solr index storing book information could look like
-    this: {schema, [{name, "books"},
-                    {version, "1.1"},
-  	            {default_field, "title"},
-                    {default_op, "and"}],
-                    {fields,
-                        [{field, [{name, "id"},
-                                  {type, string},
-                                  {required, true}]},
-                         {field, [{name, "title"},
-                                  {type, string},
-                                  {required, true}]},
-                         {field, [{name, "author_last_name"},
-                                  {type, string},
-                                  {required, true}]},
-                         {field, [{name, "author_first_name"},
-                                  {type, string},
-                                  {required, true}]},
-                         {field, [{name, "rating"},
-                                  {type, integer},
-                                  {required, true}]},
-                         {field, [{name, "summary"},
-                                  {type, string},
-                                  {required, false}]}]}}.
+    this: 
+
+    : {schema, [{name, "books"},
+    :           {version, "1.1"},
+    :           {default_field, "title"},
+    :           {default_op, "and"}],
+    :           {fields,
+    :               [{field, [{name, "id"},
+    :                         {type, string},
+    :                         {required, true}]},
+    :                {field, [{name, "title"},
+    :                         {type, string},
+    :                         {required, true}]},
+    :                {field, [{name, "author_last_name"},
+    :                         {type, string},
+    :                         {required, true}]},
+    :                {field, [{name, "author_first_name"},
+    :                         {type, string},
+    :                         {required, true}]},
+    :                {field, [{name, "rating"},
+    :                         {type, integer},
+    :                         {required, true}]},
+    :                {field, [{name, "summary"},
+    :                         {type, string},
+    :                         {required, false}]}]}}.
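To make the meaning of =required= and =type= concrete, here is a small Python sketch that checks a document against a hand-written mirror of the "books" fields above. This is an illustration only: the =FIELDS= table and =check_document= are hypothetical, and the sketch does not claim to reproduce how riak\_solr itself validates documents.

```python
# Hypothetical Python mirror of the "books" schema fields:
# field name -> (expected Python type, required?)
FIELDS = {
    "id": (str, True),
    "title": (str, True),
    "author_last_name": (str, True),
    "author_first_name": (str, True),
    "rating": (int, True),
    "summary": (str, False),
}


def check_document(doc, fields=FIELDS):
    """Return a list of problems; an empty list means the doc fits."""
    problems = []
    for name, (ftype, required) in fields.items():
        if name not in doc:
            if required:
                problems.append(f"missing required field: {name}")
        elif not isinstance(doc[name], ftype):
            problems.append(
                f"wrong type for {name}: expected {ftype.__name__}"
            )
    return problems


book = {
    "id": "923065",
    "title": "Practical C Programming",
    "author_last_name": "Oualline",
    "author_first_name": "Steve",
    "rating": 8,
}
print(check_document(book))  # summary is optional, so no problems
```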
+
 *** Schemas & Buckets
+
     Each schema should be backed by a corresponding search bucket. In other words,
     each Solr schema should correspond to a search bucket with the same name as
     the schema.
 
-** Using riak_solr
-   Clients will interact with riak_solr over HTTP. The next several sections will
-   describe how to connect to riak_solr, write to a configured schema, and query
+** Using riak\_solr
+
+   Clients will interact with riak\_solr over HTTP. The next several sections will
+   describe how to connect to riak\_solr, write to a configured schema, and query
    a configured schema.
 
 *** Connecting
-    riak_solr shares a webmachine instance with the Riak Key/Value HTTP API. Clients
-    should use the same port number to connect to both APIs. riak_solr uses the base
+
+    riak\_solr shares a webmachine instance with the Riak Key/Value HTTP API. Clients
+    should use the same port number to connect to both APIs. riak\_solr uses the base
     URL '/solr' for all requests.
 
 *** Updating an Index
-    Writing to a schema is handled via POST requests to riak_solr. The URL for a
-    given schema follows this pattern: /solr/<index_name>/update where <index_name>
+
+    Writing to a schema is handled via POST requests to riak\_solr. The URL for a
+    given schema follows this pattern: =/solr/<index_name>/update= where =<index_name>=
     is the name of the schema you wish to update.
 
     The request's Content-Type header must be 'text/xml'. The body of the request must be
     valid XML which follows Solr's add syntax.
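A request body in Solr's add syntax can be generated rather than written by hand. The sketch below uses Python's standard =xml.etree.ElementTree= to produce the same =<add>/<doc>/<field>= layout used in tests/books.xml; the helper name =build_add_xml= is an assumption made for this example.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Serialize a list of dicts as a Solr-style <add> message (bytes)."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)  # all values travel as text in the XML
    return ET.tostring(add, encoding="utf-8")


body = build_add_xml([
    {"id": "923065", "title": "Practical C Programming", "rating": 8},
])
print(body.decode("utf-8"))
```

The resulting bytes would be sent as the body of the POST to =/solr/<index_name>/update= with the 'text/xml' content type.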
 
 *** Deleting Documents
+
     Document deletion is not supported at this time.
 
 *** Querying
-    Clients can submit index queries to riak_solr via GET requests. riak_solr
+
+    Clients can submit index queries to riak\_solr via GET requests. riak\_solr
     understands two URL formats for queries. The first allows the requestor to
     specify the index name in the URL similar to the update URL. The format
-    looks like this: /solr/<index_name>/select.
+    looks like this: =/solr/<index_name>/select=.
 
     Requestors can also specify the index name via the query string parameter
-    'index'. URLs using this format will look like this: /solr/select?index=<index_name>.
+    'index'. URLs using this format will look like this: =/solr/select?index=<index_name>=.
 
     In all other respects, querying riak\_solr behaves like Solr, subject to the
     following restrictions:
+
     - All query output will be served in JSON. This is analogous to specifying
-      wt=json on a regular Solr query.
-    - riak_solr understands the following Solr query parameters only: q, q_op,
-      start, and rows.
+      =wt=json= on a regular Solr query.
+
+    - riak\_solr understands the following Solr query parameters only: =q=, =q_op=,
+      =start=, and =rows=.
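The two URL formats and the supported parameters can be put together as in the following Python sketch. The function =select_url= and its defaults (including the =localhost:8098= base taken from the Test Data example) are assumptions for illustration, not part of riak\_solr.

```python
from urllib.parse import urlencode


def select_url(index, q, start=0, rows=10, q_op=None,
               index_in_path=True, base="http://localhost:8098"):
    """Build a select URL using only the parameters riak_solr understands."""
    params = {"q": q, "start": start, "rows": rows}
    if q_op is not None:
        params["q_op"] = q_op
    if index_in_path:
        # Format 1: /solr/<index_name>/select
        path = f"/solr/{index}/select"
    else:
        # Format 2: /solr/select?index=<index_name>
        path = "/solr/select"
        params = {"index": index, **params}
    return f"{base}{path}?{urlencode(params)}"


url_path_style = select_url("books", "prog*")
url_param_style = select_url("books", "prog*", index_in_path=False)
print(url_path_style)
print(url_param_style)
```

Using =urlencode= keeps characters such as =*= or spaces in the query safely escaped.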
+
+* The Raptor Backend   
+
+** How do I build Raptor for Mac OSX?
+
+   To run Raptor on Mac OS X, you must first build a 64-bit version of BDB.  
+   Run the following commands before building BDB:
+
+   : export CC=gcc
+   : export CFLAGS="-arch x86_64 -m64"
+   : export CXXFLAGS="-arch x86_64 -m64"
+   : export LDFLAGS="-arch x86_64 -m64"
+       
+   If you have already installed it as 32-bit, type "make realclean" in the 
+   build\_unix directory and start the configuration and installation process 
+   again.
+
+   Additionally, the JNI portion of BDB expects to find Java headers
+   in a certain place, so run the following command:
+
+   : sudo cp /System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Headers/*.h /usr/include/.
+
+
+** How do I change Raptor Logging?
+
+   1. You can edit log4j.properties to change the logging characteristics.
+
+   2. From the Riak Search console, you can significantly decrease logging output 
+      (though not turn it off entirely) by using the command:
+       
+      : raptor_index_backend:poke("toggle_debug").
+
+** How do I delete all data from Raptor?
+   
+   If you want to delete the data stored in Raptor, delete the
+   directories "raptor-db" and "raptor-catalog".
+
+** Notes
+
+   1. MMap and cache memory use for both BDB and Lucene are currently
+      hard-coded while a configuration system is being worked on. In the
+      meantime, it is recommended that you run Raptor on a server with a
+      MINIMUM of 2 GB of RAM.
+
+   2. You may use Raptor on 32-bit systems; change the command line option in 
+      start.sh from "-d64" to "-d32".
+
+   3. 32-bit operation is not available on Mac OS X systems (Java 1.6 has no 
+      32-bit mode on OS X).
+
+   4. If you wish to keep the BDB native library somewhere other than its 
+      installed location, change the following command line option in start.sh 
+      to reflect the new location:
+
+      : -Djava.library.path=/usr/local/lib:/usr/local/BerkeleyDB.4.8/lib
+
+   5. By default, start.sh reserves 4G of memory for execution.  You may change 
+      this by editing this command line option in start.sh:
+
+      : -Xmx4096m
+
+   6. Raptor has only been tested on Mac OS X and Linux.
+        

File tests/books.xml

+<add>
+  <doc>
+    <field name="id">923065</field>
+    <field name="title">Practical C Programming</field>
+    <field name="author_last_name">Oualline</field>
+    <field name="author_first_name">Steve</field>
+    <field name="rating">8</field>
+  </doc>
+  <doc>
+    <field name="id">87401</field>
+    <field name="title">File Structures: An Object-Oriented Approach with C++</field>
+    <field name="author_last_name">Folk</field>
+    <field name="author_first_name">Michael</field>
+    <field name="rating">6</field>
+  </doc>
+  <doc>
+    <field name="id">521974</field>
+    <field name="title">Hadoop: The Definitive Guide</field>
+    <field name="author_last_name">White</field>
+    <field name="author_first_name">Tom</field>
+    <field name="rating">8</field>
+    <field name="summary">Good guide on how to get the most out of Hadoop</field>
+  </doc>
+  <doc>
+    <field name="id">605212</field>
+    <field name="title">File Organization and Processing</field>
+    <field name="author_last_name">Tharp</field>
+    <field name="author_first_name">Alan</field>
+    <field name="rating">5</field>
+  </doc>
+  <doc>
+    <field name="id">529321</field>
+    <field name="title">Programming Collective Intelligence</field>
+    <field name="author_last_name">Segaran</field>
+    <field name="author_first_name">Toby</field>
+    <field name="rating">8</field>
+    <field name="summary">Introduction to "social" algorithms. Provides example implementations in Python.</field>
+  </doc>
+  <doc>
+    <field name="id">5414803</field>
+    <field name="title">The Scheme Programming Language</field>
+    <field name="author_last_name">Dybvig</field>
+    <field name="author_first_name">Kent</field>
+    <field name="rating">8</field>
+    <field name="summary">Dense but thorough coverage of Scheme.</field>
+  </doc>
+</add>

File tests/query-tests/books.xml

-<add>
-  <doc>
-    <field name="id">923065</field>
-    <field name="title">Practical C Programming</field>
-    <field name="author_last_name">Oualline</field>
-    <field name="author_first_name">Steve</field>
-    <field name="rating">8</field>
-  </doc>
-  <doc>
-    <field name="id">87401</field>
-    <field name="title">File Structures: An Object-Oriented Approach with C++</field>
-    <field name="author_last_name">Folk</field>
-    <field name="author_first_name">Michael</field>
-    <field name="rating">6</field>
-  </doc>
-  <doc>
-    <field name="id">521974</field>
-    <field name="title">Hadoop: The Definitive Guide</field>
-    <field name="author_last_name">White</field>
-    <field name="author_first_name">Tom</field>
-    <field name="rating">8</field>
-    <field name="summary">Good guide on how to get the most out of Hadoop</field>
-  </doc>
-  <doc>
-    <field name="id">605212</field>
-    <field name="title">File Organization and Processing</field>
-    <field name="author_last_name">Tharp</field>
-    <field name="author_first_name">Alan</field>
-    <field name="rating">5</field>
-  </doc>
-  <doc>
-    <field name="id">529321</field>
-    <field name="title">Programming Collective Intelligence</field>
-    <field name="author_last_name">Segaran</field>
-    <field name="author_first_name">Toby</field>
-    <field name="rating">8</field>
-    <field name="summary">Introduction to "social" algorithms. Provides example implementations in Python.</field>
-  </doc>
-  <doc>
-    <field name="id">5414803</field>
-    <field name="title">The Scheme Programming Language</field>
-    <field name="author_last_name">Dybvig</field>
-    <field name="author_first_name">Kent</field>
-    <field name="rating">8</field>
-    <field name="summary">Dense but thorough coverage of Scheme.</field>
-  </doc>
-</add>