Overview

JTCollector is a Java implementation of tcollector, the utility used to collect metrics for OpenTSDB.

JTCollector is closely based on the design of tcollector (http://opentsdb.net/tcollector.html) and was created to better support cross-platform distributions. Additionally, JTCollector lets you avoid sending data directly to a TSD by adding support for:

  1. retrieving a subject schema from an Avro schema repository,
  2. serializing the collected metrics with Avro, encoded according to that schema, and
  3. producing the encoded metrics to a Kafka topic.

Key Features

   * Runs all of your data collectors and gathers their data
   * Does all of the connection management work of sending data to the TSD (*pending*)
   * You don't have to embed all of this code in every collector you write
   * Does de-duplication of repeated metric values
   * Handles all of the wire protocol work for you, as well as future enhancements
   * Avro encodes the metrics
   * Manages Avro encoding schema versioning
   * Produces Avro encoded metrics to a Kafka topic
   * Provides a Windows platform system metrics collector
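
The de-duplication feature can be pictured roughly as follows. This is a minimal sketch, not JTCollector's actual code: the class name Deduplicator, the method shouldSend, and the rule "drop a value identical to the last one seen for the same series" are assumptions made for illustration. A real implementation would probably also re-send an unchanged value from time to time so that a series does not go silent; that is left out here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of JTCollector: remembers the last value
// seen per series and reports whether a new value differs from it.
final class Deduplicator {
    private final Map<String, String> lastValue = new HashMap<>();

    // seriesKey is assumed to be the metric name plus its tags.
    // Returns true when the value should be forwarded, false when it
    // repeats the previous value for the same series.
    boolean shouldSend(String seriesKey, String value) {
        String previous = lastValue.put(seriesKey, value);
        return !value.equals(previous);
    }
}
```

Keying on metric name plus tags keeps two hosts that report the same value from suppressing each other.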

Configuration

Java properties file(s)

JTCollector uses a Java properties file to manage configurable items in the system such as:

  • TSD connectivity information (pending implementation)
  • Avro connectivity and schema information
  • Kafka connectivity and producing information
  • JTCollector specific settings such as threading configurations, directory locations, timing data, etc.

Additionally, a properties file is required for configuring logging via Log4j. Example properties files are provided for both Windows and Linux. A Log4j properties file has also been included for reference.

A few properties in the avli_jtcollector properties files are worth noting here. The first is number.collector.threads, which specifies the number of worker threads spawned to collect metrics from collectors. Each CollectorWorker runs a single collector for the duration of its lifespan, so this property should be greater than or equal to the number of collectors you expect to run on a particular platform. If the number of threads is less than the number of collectors, the waiting collectors will not run.

Setting this value higher than the number of expected collectors lets you add collectors dynamically. The CollectorLoader thread periodically (according to the loader.wait.time property, default: 5 minutes) attempts to rerun failed collectors. It also discovers any new collectors present and attempts to execute them, up to the number of collector threads remaining.

number.sender.threads specifies how many threads are used to encode and send metrics to Kafka. As always, balance these threading settings against your available system resources.
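
To make the sizing rule concrete, here is a small sketch that reads number.collector.threads from a properties source and works out how many collectors would be left waiting. The class ThreadSettings, its methods, the fallback defaults, and the sample values are all hypothetical; only the property names come from the text above. It is not how JTCollector itself loads its configuration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper for illustration only.
final class ThreadSettings {
    // Reads an integer property, falling back to a caller-supplied
    // default (an assumption, not a JTCollector default) when absent.
    static int intProperty(Properties props, String key, int fallback) {
        String raw = props.getProperty(key);
        return raw == null ? fallback : Integer.parseInt(raw.trim());
    }

    // Number of collectors that would be left waiting, and so never run,
    // when each worker thread is occupied by exactly one collector.
    static int waitingCollectors(int collectorThreads, int collectors) {
        return Math.max(0, collectors - collectorThreads);
    }

    public static void main(String[] args) throws IOException {
        // Sample values, not taken from a shipped properties file.
        Properties props = new Properties();
        props.load(new StringReader(
                "number.collector.threads=4\nnumber.sender.threads=2\n"));
        int threads = intProperty(props, "number.collector.threads", 1);
        System.out.println("collectors left waiting: " + waitingCollectors(threads, 6));
    }
}
```

With 4 collector threads and 6 collectors, two collectors stay idle. Raising the property to 6 or more removes the shortfall and leaves room for collectors added later.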


Running the jar

Before running JTCollector with the base collectors, make sure that your PYTHONPATH environment variable includes the root path of the JTCollector distribution.

For example:
export PYTHONPATH=$PYTHONPATH:/usr/local/opt/jtcollector

Examples of running the jar:

# Windows
java "-Dlog4j.configuration=file:C:\jtcollector\avli_jtcollector.log4j.properties" -cp "C:\jtcollector\out\artifacts\avli_jtcollector_jar\avli_jtcollector.jar" com.tts.jtcollector.JTCollector C:\jtcollector\avli_jtcollector.win.properties

# Linux
java -Dlog4j.configuration=file:avli_jtcollector.log4j.properties -cp avli_jtcollector.jar com.tts.jtcollector.JTCollector avli_jtcollector.lin.properties

Install

The binaries and a script for installing JTCollector as a service on Windows are included in the project base.

# Windows
As an Administrator with permissions to administer services, run install.cmd

# Linux (Install script pending)

Collectors

JTCollector uses the same convention as tcollector for organizing collectors.

As with tcollector, collectors can be written in any language as long as they are executable and write their data to stdout. Support has also been added for executing Windows batch (.bat) scripts and PowerShell (.ps1) scripts. PowerShell is assumed to be on your PATH.
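
As an illustration of what a collector writes to stdout, the sketch below builds one line in the format tcollector collectors print: metric name, Unix timestamp in seconds, value, then tag=value pairs. It assumes JTCollector reads the same format, which the text above implies but does not spell out. The class MetricLine, the metric name, and the tag values are made up for the example.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only.
final class MetricLine {
    // Builds "metric timestamp value tag=value ..." with tags sorted by
    // name so the output is stable from run to run.
    static String format(String metric, long epochSeconds, double value,
                         Map<String, String> tags) {
        StringBuilder sb = new StringBuilder();
        sb.append(metric).append(' ').append(epochSeconds).append(' ').append(value);
        for (Map.Entry<String, String> tag : new TreeMap<>(tags).entrySet()) {
            sb.append(' ').append(tag.getKey()).append('=').append(tag.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis() / 1000L;
        // Example metric and tag; a real collector would measure something.
        System.out.println(format("proc.loadavg.1min", now, 0.42,
                Collections.singletonMap("host", "web01")));
    }
}
```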

Collectors in the 0 directory should be long-running collectors. Collectors under any other directory named as a number are run on the interval, in seconds, given by the directory name (e.g. the contents of the directory collectors/15/ are run every 15 seconds). The base JTCollector distribution ships with tcollector's base collectors. Additionally, there is a PowerShell collector for Windows located in 15/windows_system_metrics.ps1.
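
The directory convention can be expressed as a small lookup from a collector's path to its interval. This is a sketch under assumptions: the class CollectorInterval is hypothetical, and treating a non-numeric parent directory as "no interval" is a guess at reasonable behavior rather than something the JTCollector source is known to do.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.OptionalInt;

// Hypothetical helper for illustration only.
final class CollectorInterval {
    // Returns the interval in seconds encoded by the collector's parent
    // directory name, or empty when that name is not a plain number.
    // An interval of 0 marks a long-running collector.
    static OptionalInt fromPath(Path collector) {
        Path parent = collector.getParent();
        if (parent == null || parent.getFileName() == null) {
            return OptionalInt.empty();
        }
        String name = parent.getFileName().toString();
        // At most 9 digits so Integer.parseInt cannot overflow.
        if (!name.matches("\\d{1,9}")) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(Integer.parseInt(name));
    }

    public static void main(String[] args) {
        Path ps1 = Paths.get("collectors/15/windows_system_metrics.ps1");
        System.out.println(fromPath(ps1).getAsInt()); // 15
    }
}
```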

Exit codes:

JTCollector uses the same convention as tcollector for decommissioning collectors. A collector should return an exit code of 13 to request decommissioning. Otherwise, JTCollector will attempt to launch the collector 3 times, waiting 30 seconds between failed launch attempts.
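
The exit-code rules can be read as a small decision function. The sketch below is one interpretation, not JTCollector's code: the class RestartPolicy and its names are hypothetical, and it assumes "3 times" means three launches in total, so retries stop after the third failure.

```java
// Hypothetical helper for illustration only.
final class RestartPolicy {
    static final int DECOMMISSION_EXIT_CODE = 13;  // stated in the text above
    static final int MAX_LAUNCH_ATTEMPTS = 3;      // stated in the text above
    static final long RETRY_DELAY_SECONDS = 30;    // stated in the text above

    enum Action { DECOMMISSION, RETRY, STOP_RETRYING }

    // Decides what to do after a collector launch has failed.
    // failedAttempts counts the failures so far, including this one.
    static Action afterFailure(int exitCode, int failedAttempts) {
        if (exitCode == DECOMMISSION_EXIT_CODE) {
            return Action.DECOMMISSION;
        }
        return failedAttempts < MAX_LAUNCH_ATTEMPTS
                ? Action.RETRY          // caller waits RETRY_DELAY_SECONDS first
                : Action.STOP_RETRYING;
    }
}
```

A collector that knows it can never work on the current host (for example, a Linux-only collector started on Windows) can exit with 13 and skip the retries entirely.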

Known issues:

Collectors written in Windows PowerShell should include the following code if any metric line is longer than 80 characters:

# Update output buffer size to prevent clipping of metrics in java input stream.
if( $Host -and $Host.UI -and $Host.UI.RawUI ) {
    $rawUI = $Host.UI.RawUI
    $oldSize = $rawUI.BufferSize
    $typeName = $oldSize.GetType( ).FullName
    $newSize = New-Object $typeName (500, $oldSize.Height)
    $rawUI.BufferSize = $newSize
}

Otherwise, the metrics will be clipped, and the output for a single metric will be collected by the CollectorWorkers on multiple lines (CollectorWorkers use the system property line.separator to parse the output). This may be an issue with batch files as well.
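
The effect of a wrapped line on separator-based parsing can be reproduced in a few lines. This sketch only simulates the symptom: the class ClippedOutput and the metric name are hypothetical, and the split shown here stands in for whatever reading loop the CollectorWorkers actually use.

```java
import java.util.regex.Pattern;

// Hypothetical demonstration, not JTCollector code.
final class ClippedOutput {
    // Splits raw collector output into lines on the given separator,
    // mirroring the line.separator-based parsing described above.
    static String[] splitLines(String output, String separator) {
        return output.split(Pattern.quote(separator));
    }

    public static void main(String[] args) {
        String sep = System.lineSeparator();
        String intact = "win.cpu.user 1356998400 12.5 host=web01";
        // Simulates a console buffer that wrapped the same metric mid-line.
        String wrapped = "win.cpu.user 1356998400 12.5 " + sep + "host=web01";
        System.out.println(splitLines(intact, sep).length);   // one complete metric
        System.out.println(splitLines(wrapped, sep).length);  // two fragments
    }
}
```

Neither fragment of the wrapped output is a complete metric on its own, which is why widening the buffer in the collector script is the safer fix.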

Special thanks

These references were immensely helpful:

http://opentsdb.net/tcollector.html

http://kafka.apache.org/documentation.html

http://avro.apache.org/docs/current/

http://stackoverflow.com/questions/978777/powershell-output-column-width (Emperor XLII)

http://blogs.technet.com/b/heyscriptingguy/archive/2011/02/01/use-powershell-to-check-your-server-s-performance.aspx