Splunking With Python

Benjamin W. Smith

  • Husband of one

  • Dad of three

  • Guitarist

  • Employed - YAY!
    • SysAdmin @ AG Interactive
  • Hack

AG Interactive

  • Social Expressions
  • Large installation (>1k machines)
  • LAMPish stack (that's sooo 2002)
  • PyOhio sponsor ;)

AG Interactive (cont)

  • Python everywhere
    • Primary web app server
    • Configuration management
    • Application deployment
    • Content management
    • Touches most mission critical tiers
    • My weekly reports

Backstory

I went into preparing this presentation knowing nothing of the Splunk API, or Python's role in it.

Now that I know some of it, I see lots of things I don't like :(

So instead of lots of detail on Splunk, you get the basics.

You should know that I've begun rewriting the Python implementation of the Splunk API.

I should have done that in the first place.

The moral?

Don't jump into something blind before going to speak about it in front of a bunch of people.

Splunk

Taking the sh out of IT

  • Log aggregation and indexing.

  • Hella reporting capabilities
    • Hello, PCI/SOX!
  • Monitoring and Alerting

  • Extensible and flexible.

Why should you care about Splunk?

Well, here is what grabbed me.

Heavy Python usage in the Splunk code base screamed awesome early on.

Great community backing it.

Neat, practical application for deployment.

Lots of potential for growing the Python (and other-language) extensibility.

Should you be interested in yet another API?

Totally, here's why.

  • You like Python.
  • Data, lots of data! Managers like data.
  • Reporting, lots of reporting! Managers like reporting.
  • Automation of data reporting! Sysadmins like automation.
  • Alerting! Everyone likes stability and visibility.
  • You really don't even need to touch the UI! (sysadmins <3 the cli)

All this and more

A little less talk, a lot more action

I think I've gone over the basics enough, so let's look at some examples.

Basic Usage

Real time log analysis

Tail logs from a whole server farm!

Show example of real time search

Alerting

Parse search results and format for Nagios

Show example of modifying search results
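As a rough sketch of the Nagios side only: Nagios plugins report state through their exit code (0 OK, 1 WARNING, 2 CRITICAL) and a single status line, with optional performance data after a pipe. The function name, the thresholds, the `splunk_errors` label, and the shape of the result rows below are all made up for illustration; how the rows come out of Splunk is not shown.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def nagios_status(results, warn=10, crit=50, label="splunk_errors"):
    """Turn a list of search-result rows into (exit_code, status_line).

    `results` is assumed to be a list with one item per matching event;
    the thresholds and label are hypothetical defaults.
    """
    count = len(results)
    if count >= crit:
        code = CRITICAL
    elif count >= warn:
        code = WARNING
    else:
        code = OK
    # Text before the pipe is for humans; text after it is perfdata
    # in the form label=value;warn;crit.
    line = "%s - %d matching events | %s=%d;%d;%d" % (
        LABELS[code], count, label, count, warn, crit)
    return code, line
```

A wrapper script would print the status line and then call `sys.exit()` with the returned code so Nagios picks up the state.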

Usage (cont)

Using petit with all this data!

As we saw yesterday, petit is pretty cool.

Let's take some of the live data from splunk and run it through petit!

Conclusion

You have lots of data just sitting there, wasting disk.

Splunk is awesome at playing with data.

Python is awesome at playing with Splunk.

Look, everyone gets along, party!

Conclusion Part Deux

I wasn't done yet

I still have like 20 minutes.

So I'm going to tell you a story.

What else do I do with Python?

Well, pretty much everything.

Seriously, we manage ~80% of Linux OS install and configuration with Python.

That's nearly 1,000 servers..

How, you ask?

cfbuild

A homegrown, cfengine-like framework that we use internally.

Somewhat F/OSS, may be more to come.

Show you the goods? OK!

Well, I'll show you some of it.

Here is a basic "service" describing something a machine should do.

This service creates a set of runtime scripts (Perl, eek!) to execute the actions.

These scripts get deployed to the machine and are then executed.

Showing you an example of this code

Want to see more?

Here is a template that uses the service.

The template is obviously not limited to one service; there are _many_.

Showing you an example template and all that.

Cool, but how do those changes get to the box?

At a high level, this is how it goes:

CF, as we call it, builds a list of hosts based on a "group" or "role" designation.

This is tightly coupled to the definitions in the "template" we built.

Each machine is bootstrapped as a part of our kickstart image.

cfbuild (cont)

The bootstrapping process sets up a set of callback cron jobs.

The cron jobs run every 'x' minutes and:

  • Look for a new "release" number on the master via HTTP.

  • If the relnum is higher, do some logic to:
    • Make sure the machine needs the updates
    • Pull down updates via HTTP
    • Apply incremental updates
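The callback logic above could be sketched roughly like this. This is not the cfbuild code: every name here is hypothetical, the HTTP calls are passed in as plain functions so the flow stays visible, and "incremental" is assumed to mean walking each release between the current one and the newest.

```python
def check_for_release(current_relnum, fetch_relnum, needs_update,
                      fetch_updates, apply_update):
    """One cron run; returns the release number the machine ends up on.

    All four callables are hypothetical stand-ins:
      fetch_relnum()        -> newest release number on the master (via HTTP)
      needs_update(relnum)  -> True if that release affects this machine
      fetch_updates(relnum) -> iterable of updates pulled down (via HTTP)
      apply_update(update)  -> applies a single update locally
    """
    latest = fetch_relnum()
    if latest <= current_relnum:
        return current_relnum  # nothing new on the master

    # Walk the releases one at a time so updates are applied incrementally.
    for relnum in range(current_relnum + 1, latest + 1):
        if not needs_update(relnum):
            continue  # this release does not touch this machine
        for update in fetch_updates(relnum):
            apply_update(update)
    return latest
```

Passing the network and apply steps in as functions also makes the flow easy to exercise with fakes before pointing it at a real master.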

cfbuild (cont)

The grand idea is to release a stable version as part of our F/OSS initiatives at AGI.

I would love to do several things right off the top, including:

  • Python Runtime Engine
  • Refactor with a bias toward performance and abstraction.
  • Make markup agnostic
  • Strip out internals specific to AGI

Fabulous application deployment with Fabric

What is Fabric? Well, as the README says:

Fabric is a Python library and command-line tool for streamlining the use of SSH for application deployment or systems administration tasks.

Basically, it gives you a Pythonic way to interact with the command-line tools used to deploy your application.

Automation, FTMFW

Doing system stuff Pythonically with Fabric

I need a copy of my dotfiles on this machine, so let's get them from my remote one.

Showing you an example fab file.
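A minimal sketch of the dotfiles idea, not the fab file from the talk. It assumes the newer `Connection` API from Fabric 2.x, which postdates this presentation, and the dotfile list, function names, and paths are made up for illustration.

```python
import os
import posixpath

# Hypothetical list of dotfiles to copy.
DOTFILES = [".bashrc", ".vimrc", ".gitconfig"]


def transfer_plan(remote_home, local_home, names=DOTFILES):
    """Pair each remote dotfile path with the local path it should land at."""
    return [(posixpath.join(remote_home, name), os.path.join(local_home, name))
            for name in names]


def fetch_dotfiles(host, remote_home, local_home=os.path.expanduser("~")):
    """Download each dotfile from `host` over SSH using Fabric."""
    # Imported here so the planning helper works without Fabric installed.
    from fabric import Connection

    with Connection(host) as conn:
        for remote, local in transfer_plan(remote_home, local_home):
            conn.get(remote, local=local)
```

Keeping the path planning separate from the SSH work means the plan can be checked without touching a remote machine.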

Wow, pretty simple, but why?

You may not have 1 gagillion servers to manage like I do.

Automation is full of win. Why would you want to repeat yourself?

Also, I was searching for filler.

Questions?

The Final Cut

Seriously, I'm done this time..

Benjamin W. Smith

http://just-another.net
