================
Ingest Service
================

Requirements
------------
JDK 1.6 or later
Java servlet container that accepts .war files (e.g. Tomcat, Resin, Jetty)
Maven 2
Zookeeper 3.3.1, running and referenced in "queue.txt" (see Running under Tomcat)
zookeeper-recipes-3.3.1.jar queueing library, available from the 'cdl-zk-queue' repository


Packaging
---------
Start by modifying the property files at:
    ingest-conf/properties/{stage,development,local}

    Defining the location of "ingest home" is critical, for example:
         ingestServicePath=/dpr/ingest_home
         fileLogger.path=/dpr/ingest_home/logs

Create the distribution package for one environment (stage, development, or local), for example:
    mvn -Denvironment=stage package
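Because a missing ingest home setting is easy to overlook, it can help to check the chosen environment before packaging. The following is a minimal sketch, assuming each environment's settings are plain key=value files somewhere under ingest-conf/properties/<environment>/; check_ingest_props is a hypothetical helper, not part of the build.

```shell
#!/bin/sh
# Hypothetical pre-build check: confirm that an environment's property files
# define both ingest home keys. Assumes plain key=value (or key:value) files
# somewhere under ingest-conf/properties/<environment>/.
check_ingest_props() {
  local dir="$1" key missing=0
  for key in ingestServicePath fileLogger.path; do
    # -r recurse into the directory, -q quiet, -E extended regex anchored at
    # line start so commented-out entries ("# key=...") are not counted
    if ! grep -rqE "^[[:space:]]*${key}[[:space:]]*[=:]" "$dir"; then
      echo "missing ${key} in ${dir}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: `check_ingest_props ingest-conf/properties/stage && mvn -Denvironment=stage package`.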


Architecture overview
---------------------
The distribution package contains a war file that defines two servlets: poster and ingest.

poster handles "batch" submissions, that is, multiple objects per request. Batches are defined in a manifest (see the Test section).
Manifests are parsed and then queued in the queueing service (Zookeeper). A daemon process polls the queue and submits "jobs" to the ingest service, which is housed in the ingest servlet.

If there is no need to process batches, the ingest service can be called directly. Both ingest and poster support REST and command-line interfaces.
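The queued batch flow can be illustrated with a toy model. This is a minimal sketch, not the service's implementation: a temporary spool directory stands in for the Zookeeper queue, each non-blank manifest line not starting with # is treated as one object (real manifest parsing is more involved), and post_batch, poll_queue, and ingest_job are hypothetical names.

```shell
#!/bin/sh
# Toy model of the batch flow: "poster" splits a manifest into one job per
# object and enqueues it; a "daemon" pass dequeues jobs in FIFO order and
# hands each to "ingest". A spool directory stands in for Zookeeper.
QUEUE_DIR="$(mktemp -d)"
seq_no=0

# poster (hypothetical): enqueue one job file per object line.
post_batch() {
  local line
  while IFS= read -r line; do
    case "$line" in
      ''|'#'*) continue ;;  # skip blank lines and #-prefixed header/comment lines
    esac
    seq_no=$((seq_no + 1))
    # zero-padded names keep the glob in poll_queue in submission order
    printf '%s\n' "$line" > "$QUEUE_DIR/$(printf '%06d' "$seq_no").job"
  done < "$1"
}

# ingest (hypothetical placeholder for the ingest servlet call).
ingest_job() {
  printf 'ingested: %s\n' "$1"
}

# daemon (hypothetical): one polling pass; submit and remove each queued job.
poll_queue() {
  local job
  for job in "$QUEUE_DIR"/*.job; do
    [ -e "$job" ] || return 0  # glob matched nothing: queue is empty
    ingest_job "$(cat "$job")"
    rm -f "$job"
  done
}
```

Decoupling the two halves through a queue is what lets poster accept a large batch quickly while ingest processes objects at its own pace.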


Running under Tomcat
--------------------
Unpack the distribution and move the ingest_home/ template directory to the location defined in the properties (see Packaging).
Modify the template directory to customize the user environment.

Ingest home structure:
    ingest-info.txt	(ingest configuration)
    logs/		(logging)
    profiles/		(user profiles)
    queue/		(ingest service working directory)
    queue.txt		(defines Zookeeper queue configuration)
    stores.txt		(defines Storage service)
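A customized ingest home can be checked against this layout with a small script. The sketch below is hypothetical (check_ingest_home is not shipped with the service); it only tests that each listed directory and file exists, not that their contents are valid.

```shell
#!/bin/sh
# Hypothetical helper: report entries of the ingest home layout that are
# missing from a directory. Returns non-zero if anything is missing.
check_ingest_home() {
  local home="$1" entry status=0
  for entry in logs profiles queue; do
    if [ ! -d "$home/$entry" ]; then
      echo "missing directory: $entry" >&2
      status=1
    fi
  done
  for entry in ingest-info.txt queue.txt stores.txt; do
    if [ ! -f "$home/$entry" ]; then
      echo "missing file: $entry" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example: `check_ingest_home /dpr/ingest_home`.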

Deploy the war file in Tomcat. Note that the hostname and port of the servlet container must be defined in:
    ingest_home/ingest-info.txt (access-uri)

To allow the ingest service to access data located in ingest home, define a Context element in server.xml, for example:
    <Context path="/ingestqueue" allowLinking="true" docBase="webapps/ingestqueue"/>

Then, in Tomcat's webapps directory, create a symbolic link named ingestqueue that points to the ingest home queue directory:
    ln -s /dpr/ingest_home/queue ingestqueue
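Creating the link can be scripted so that it is safe to rerun. The sketch below assumes GNU or BSD ln (for the -n flag); link_ingest_queue is a hypothetical helper, and the link name ingestqueue must match the docBase in the Context element.

```shell
#!/bin/sh
# Hypothetical helper: create (or repoint) the "ingestqueue" link inside
# Tomcat's webapps directory so that it resolves to the ingest home queue
# directory, matching docBase="webapps/ingestqueue".
link_ingest_queue() {
  local webapps="$1" queue_dir="$2"
  if [ ! -d "$queue_dir" ]; then
    echo "not a directory: $queue_dir" >&2
    return 1
  fi
  # -s symbolic link, -f replace an existing link, -n treat an existing link
  # to a directory as a file instead of creating a new link inside it
  ln -sfn "$queue_dir" "$webapps/ingestqueue"
}
```

For example: `link_ingest_queue "$CATALINA_HOME/webapps" /dpr/ingest_home/queue`.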



Running under Winstone
----------------------
Winstone (http://winstone.sourceforge.net) allows a simpler deployment, but it has received only minimal testing with this service:

    java -jar winstone-0.9.10.jar --warfile=mrt-ingestwar-1.0-SNAPSHOT.war --httpPort=8090


Test
----
See the online help pages for manifest information and sample data: http://merritt.cdlib.org/help

Example manifest submission using the queue (synchronous):
    curl --silent  \
        -F "file=@example.checkm" \
        -F "type=container-batch-manifest" \
        -F "submitter=sample-user" \
        -F "responseForm=xml" \
        -F "profile=sample-profile" \
        http://example.org:8080/poster/submit

Example manifest submission directly to the ingest service (synchronous):
    curl --silent  \
        -F "file=@example.checkm" \
        -F "type=container-batch-manifest" \
        -F "submitter=sample-user" \
        -F "responseForm=xml" \
        -F "profile=sample-profile" \
        http://example.org:8080/ingest/submit-object
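The two curl commands above differ only in their endpoint, so they can be wrapped in one helper. This is a minimal sketch; submit_url and submit_manifest are hypothetical names, and the form fields are copied from the examples above.

```shell
#!/bin/sh
# Hypothetical wrapper around the two curl examples: "queue" posts to
# poster/submit, "direct" posts to ingest/submit-object.
submit_url() {
  local base="$1" mode="$2"
  case "$mode" in
    queue)  printf '%s/poster/submit\n' "${base%/}" ;;   # strip one trailing slash
    direct) printf '%s/ingest/submit-object\n' "${base%/}" ;;
    *) echo "mode must be 'queue' or 'direct'" >&2; return 1 ;;
  esac
}

submit_manifest() {
  local base="$1" mode="$2" manifest="$3" submitter="$4" profile="$5" url
  url="$(submit_url "$base" "$mode")" || return 1
  curl --silent \
    -F "file=@${manifest}" \
    -F "type=container-batch-manifest" \
    -F "submitter=${submitter}" \
    -F "responseForm=xml" \
    -F "profile=${profile}" \
    "$url"
}
```

For example: `submit_manifest http://example.org:8080 queue example.checkm sample-user sample-profile`.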

Logs
----
Check the Tomcat logs, as well as the logs/ directory in ingest home (set by fileLogger.path).
