saturnalia / Goals


Aggregate, store, and usefully display metrics.

  • Defining Use Case:
    • Cost should never be a concern when adding a new metric.
  • UI:
    • Exploratory - The user can search, compare, and zoom in on interesting anomalies.
    • Responsive - The whole system (UI, storage, and retrieval) is low latency.
    • Interactive - Users can share specific reports, plots, and settings with a simple permanent URL.
  • Scaling: FIXME: Are these roughly in descending priority order?
    • Many Metrics - one million distinct metrics per node in the collection cluster.
    • Complete History - never delete/forget metrics; store all data for all time (in contrast to RRDtool).
    • Horizontally Scalable - if your Saturnalia hosts are falling over, it should be simple to add new ones to properly alleviate load.
    • High Frequency - handle high-frequency metrics, such as once per second.
  • Robustness: FIXME: Are these roughly in descending priority order?
    • Data Integrity - Never record incorrect data.
    • Crash-Only - All components are built to expect their own spontaneous failure. See Crash-Only Design.
    • Defensive Decoupling - A component should continue to operate even when other components have failed.
    • Failover - Even when a component fails, the functioning components should accommodate the resulting changes in their own load for some amount of time.
    • High Availability - Accept new metrics even when subcomponents are unresponsive. FIXME: Is this a goal? Does this conflict with other robustness goals?
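
As an illustration of the "Interactive" goal above, here is a minimal sketch of how report settings could round-trip through a permanent URL. It is not Saturnalia's actual design: the base URL, the function names, and the setting keys (`metric`, `from`, `to`, `step`) are all hypothetical and chosen only for the example.

```python
from urllib.parse import urlencode, urlsplit, parse_qsl

# Hypothetical base address; the real UI endpoint is not specified in the goals.
BASE_URL = "https://saturnalia.example/report"


def report_url(settings):
    """Encode report settings into a shareable URL.

    Keys are sorted so that the same settings always yield the same URL.
    """
    query = urlencode(sorted(settings.items()))
    return f"{BASE_URL}?{query}"


def settings_from_url(url):
    """Recover the settings dictionary from a URL built by report_url."""
    return dict(parse_qsl(urlsplit(url).query))


# Assumed example settings: one metric, a time window, and a one-second step.
settings = {"metric": "cpu.load", "from": "1700000000", "to": "1700003600", "step": "1"}
url = report_url(settings)
assert settings_from_url(url) == settings
```

Keeping the whole view state in the query string means the URL needs no server-side session to remain valid, which is one way to make a link "permanent"; a real implementation might instead store the settings and hand out a short identifier.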