What is Riff?

Over the last decade, the majority of the designs, analyses and evaluations of early detection (or biosurveillance) systems have been geared towards specific data sources and detection algorithms. Much less effort has been focused on how these systems will "interact" with humans. For example, consider multiple domain experts working at different levels across different organizations in an environment where numerous biosurveillance algorithms may provide contradictory interpretations of ongoing events. Riff enables detection, prediction and response to health-related events (such as disease outbreaks or pandemics) through a collaborative environment that combines data exploration, integration, search and inferencing – providing more complex analysis and deeper insight.

Although Riff's development initially focused on health-related detection scenarios, the underlying system is a general collaboration environment for content creation, social metadata annotation, and automated analysis, with potential applicability in many other areas. Several organizations are exploring the use of Riff in areas as wide-ranging as humanitarian crisis reporting and conflict early warning. One organization, for example, has recently begun training Riff's integrated support vector machine (SVM) learning engine to identify hate speech and other potential indicators of geopolitical deterioration in news reports.


  • Create collaborative workspaces, invite colleagues, and subscribe to the data sources you choose to monitor
  • Interact securely with your team to sift through the data stream for emerging events
  • Annotate items with tags, comments, ratings, links, locations, files, alerts, and other social metadata
  • Autonomous agents perform data fusion, feature extraction, classification, tagging, geo-coding
  • Integrated hypothesis formation, visualization, machine learning

How does Riff work?

Riff consists of several high-level modules, including:

  1. Data aggregation and gathering
  2. Automatic feature extraction, data classification and tagging
  3. Human input, hypotheses generation and review
  4. Predictions and alerts output
  5. Field confirmation and feedback

The data aggregation and gathering module allows users to collect (or extract, transform and load (ETL)) information from several sources: SMS messages (e.g., GeoChat), RSS feeds, email lists (e.g., ProMED, Veratect, HealthMap, Biocaster, EpiSpider), OpenROSA, Map Sync, Epi Info™, documents, web pages, electronic medical records (e.g., OpenMRS), animal disease data (e.g., OIE, AVRI hotline), environmental feeds, NASA remote sensing, etc.
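As a rough illustration of the "transform" step, the sketch below maps two kinds of input (an SMS message and an RSS entry) onto one common record. It is not Riff's actual data model: the FeedItem type, its fields, and the from_sms and from_rss helpers are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class FeedItem:
    """Hypothetical common record; not Riff's real schema."""
    source: str            # e.g. "sms:<sender>" or "rss:<feed name>"
    received_at: datetime
    text: str


def from_sms(sender: str, body: str, received_at: datetime) -> FeedItem:
    # An SMS has no title, so the body becomes the item text.
    return FeedItem(source=f"sms:{sender}", received_at=received_at, text=body.strip())


def from_rss(feed_name: str, title: str, summary: str, published: datetime) -> FeedItem:
    # An RSS entry has a title and a summary; join them into one text field.
    return FeedItem(
        source=f"rss:{feed_name}",
        received_at=published,
        text=f"{title}. {summary}".strip(),
    )


if __name__ == "__main__":
    now = datetime(2010, 1, 17, tzinfo=timezone.utc)
    print(from_sms("+15550100", "  fever cluster reported  ", now))
    print(from_rss("ProMED", "Alert", "Unknown respiratory illness", now))
```

Once every source is reduced to the same shape, downstream modules can treat all items uniformly regardless of where they came from.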

The automatic feature extraction, data classification and tagging module is architecturally extensible and allows the introduction of machine learning algorithms (e.g., Bayesian, SVM). These components extract and augment the features (tags or metadata) from multiple data streams, such as source and target geo-location, time, and route of transmission (e.g., person-to-person, waterborne). In addition, these components help detect relationships between the extracted features within a collaborative space or across different collaborative spaces. Furthermore, with human input, these components can suggest possible events or event types (e.g., at the earliest stages of a disease outbreak: “there is an unknown respiratory event, transmitted person-to-person, detected in location X, and with a certain spatio-temporal pattern”).
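To make the idea of feature extraction concrete, here is a minimal sketch of a tagger that attaches a route-of-transmission tag to an item. It uses plain keyword matching as a stand-in for the learned classifiers (Bayesian, SVM) mentioned above. The Item type, the keyword lists, and the "transmission:" tag prefix are all assumptions made for this example, not part of Riff.

```python
from dataclasses import dataclass, field

# Hypothetical keyword lists. A trained classifier would replace this table.
TRANSMISSION_KEYWORDS = {
    "person-to-person": ("person-to-person", "close contact", "household contact"),
    "waterborne": ("waterborne", "contaminated water", "drinking water"),
}


@dataclass
class Item:
    """Hypothetical item carrying free text and a set of extracted tags."""
    text: str
    tags: set = field(default_factory=set)


def tag_transmission(item: Item) -> Item:
    """Add a 'transmission:<mode>' tag for every mode whose keywords appear."""
    lowered = item.text.lower()
    for mode, keywords in TRANSMISSION_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            item.tags.add(f"transmission:{mode}")
    return item


if __name__ == "__main__":
    report = Item("Cases linked to contaminated water in location X")
    print(tag_transmission(report).tags)
```

Each extractor only adds tags, so several of them (geo-location, time, syndrome) could be run over the same item in turn.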

The human input and review module is exposed as a set of functions that allow users to comment on, tag, and semantically rank items (positive, neutral, or negative). Additionally, users can generate and test multiple hypotheses in parallel, collect and rank sets of related items (evidence), and model against baseline information (for cyclical or known events). The system maintains a list of ongoing possible threats, allowing domain experts to focus their field information and either confirm or reject the hypotheses created. That feedback is then fed back into the system to update (increase or decrease) the reliability of the sources and the credibility of the users in light of their inferences or decisions.
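The feedback loop described above can be sketched with a simple counter-based score. The source does not say how Riff computes reliability or credibility. The Reliability class and its smoothed ratio (confirmed + 1) / (confirmed + rejected + 2) are assumptions chosen only to show a score that rises on confirmation and falls on rejection.

```python
from dataclasses import dataclass


@dataclass
class Reliability:
    """Hypothetical reliability tracker for a data source or a user."""
    confirmed: int = 0
    rejected: int = 0

    def record(self, hypothesis_confirmed: bool) -> None:
        # Field feedback either confirms or rejects a hypothesis
        # that this source or user contributed to.
        if hypothesis_confirmed:
            self.confirmed += 1
        else:
            self.rejected += 1

    @property
    def score(self) -> float:
        # Smoothed ratio: starts at 0.5 with no feedback and moves
        # toward 1.0 or 0.0 as confirmations or rejections accumulate.
        return (self.confirmed + 1) / (self.confirmed + self.rejected + 2)


if __name__ == "__main__":
    source = Reliability()
    for outcome in (True, True, False):
        source.record(outcome)
    print(round(source.score, 2))  # 0.6
```

The smoothing keeps a single early confirmation or rejection from pushing a new source's score to either extreme.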


  • Detect emerging critical events sooner and enable your team to take the right action earlier.
  • Allow human experts and autonomous agent-based analytic services to augment one another’s efforts.
  • Pattern detection algorithms learn from past events – and your team’s characterization of them – to improve performance the next time around.
  • Fully extensible open source solution allows you to incorporate your own data sources, services, and embedded modules.


In the public health and biosurveillance domain, Riff helps synthesize health-related event indicators from a wide variety of information sources (structured and unstructured) into a consolidated picture for analysis, maintenance of “community-wide coherence”, and collaboration. Current automatic classification includes seven syndromes, ten transmission modes, more than 100 infectious diseases, 180 microorganisms, 140 symptoms, and more than 50 chemicals. Presently, Riff is being piloted in the Mekong Basin region of Southeast Asia. On January 17, 2010, the Thomson Reuters Foundation, which had previously adopted Riff in its Emergency Information Service (EIS) for survivors of natural disasters, used it to launch a first-of-its-kind, free disaster-information service for the people of Port-au-Prince, Haiti. This allowed survivors of Haiti's earthquake to receive critical information by text message directly on their phones, free of charge.