Splunking With Python
Benjamin W. Smith
Husband of one
Dad of three
- Employed - YAY!
- SysAdmin @ AG Interactive
- Social Expressions
- Large installation (>1k machines)
- LAMPish stack (that's sooo 2002)
- PyOhio sponsor ;)
AG Interactive (cont.)
- Python everywhere
- Primary web app server
- Configuration management
- Application deployment
- Content management
- Touches most mission critical tiers
- My weekly reports
I went into preparing this presentation knowing nothing of the Splunk API, or Python's role in it.
Now that I know some of it, I see lots of things I don't like :(
So instead of lots of detail on Splunk, you get the basics.
You should know that I've begun rewriting the Python implementation of the Splunk API.
I should have done that in the first place.
Don't jump into something blind before going to speak about it in front of a bunch of people.
Taking the sh out of IT
Log aggregation and indexing.
- Hella reporting capabilities
- Hello, PCI/SOX!
Monitoring and Alerting
Extensible and flexible.
Why should you care about Splunk?
Well, here is what grabbed me.
Heavy Python usage in the Splunk code base screamed awesome early on.
Great community backing it.
Neat, practical application for deployment.
Huge potential for growth in Python (and other language) extensibility.
Should you be interested in yet another API?
Totally, here's why.
- You like Python.
- Data, lots of data! Managers like data.
- Reporting, lots of reporting! Managers like reporting.
- Automation of data reporting! Sysadmins like automation.
- Alerting! Everyone likes stability and visibility.
- You really don't even need to touch the UI! (sysadmins <3 the cli)
All this and more
A little less talk, a lot more action
I think I've gone over the basics enough, so let's look at some examples.
Real time log analysis
Tail logs from a whole server farm!
Show example of real time search
Parse search results and format for Nagios
Show example of modifying search results
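A minimal sketch of the Nagios half of that idea, assuming the search results have already been parsed into a list of dicts (how you fetch them from Splunk is not shown). The function name, thresholds, and the "errors" label are made up for illustration; the exit codes and the "STATUS - text | perfdata" line follow the standard Nagios plugin conventions.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def nagios_check(results, warn, crit, label="errors"):
    """Turn a list of search-result rows into (exit_code, status_line).

    `results` is assumed to be a list of dicts, one per matching event.
    """
    count = len(results)
    if count >= crit:
        code = CRITICAL
    elif count >= warn:
        code = WARNING
    else:
        code = OK
    # Text after the pipe is performance data: label=value;warn;crit
    line = "%s - %d %s | %s=%d;%d;%d" % (
        LABELS[code], count, label, label, count, warn, crit)
    return code, line
```

A real plugin would print the status line and then call sys.exit() with the code, which is how Nagios reads the result.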
Using petit with all this data!
As we saw yesterday, petit is pretty cool.
Let's take some of the live data from splunk and run it through petit!
You have lots of data just sitting there, wasting disk.
Splunk is awesome at playing with data.
Python is awesome at playing with Splunk.
Look, everyone gets along, party!
Conclusion Part Deux
I wasn't done yet
I still have like 20 minutes.
So I'm going to tell you a story.
What else do I do with Python?
Well, pretty much everything.
Seriously, we manage ~80% of Linux OS install and configuration with Python.
That's nearly 1,000 servers.
How, you ask?
A homegrown, cfengine-like framework that we use internally.
Somewhat F/OSS, may be more to come.
Show you the goods? OK!
Well, I'll show you some of it.
Here is a basic "service" describing something a machine should do.
This service creates a set of runtime scripts (Perl, eek!) to execute the actions.
These scripts get deployed to the machine and are then executed.
Showing you an example of this code
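This is not the AGI framework's code, just a rough sketch of the idea: a "service" holds the commands a machine should run and renders them into a runtime script. The Service class, its method names, and the ntp commands are all hypothetical, and the sketch emits a shell script rather than Perl to keep it short.

```python
class Service:
    """Hypothetical service: a name plus the ordered commands that enforce it."""

    def __init__(self, name, commands):
        self.name = name
        self.commands = list(commands)

    def render_script(self):
        """Return the text of a runtime script that runs each command in order."""
        lines = [
            "#!/bin/sh",
            "# Generated for service: %s" % self.name,
            "set -e  # stop at the first failing command",
        ]
        lines.extend(self.commands)
        return "\n".join(lines) + "\n"


# Example service (commands are illustrative only).
ntp = Service("ntp", ["yum -y install ntp", "chkconfig ntpd on", "service ntpd start"])
script = ntp.render_script()
```

The rendered text is what would get deployed to the machine and executed there.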
Want to see more?
Here is a template that uses the service.
The template is obviously not limited to one service; there are _many_.
Showing you an example template and all that.
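Again a sketch rather than the real thing: one way to picture a template is a role name mapped to the list of services its hosts should run. The role names, service names, hostnames, and helper functions here are all hypothetical.

```python
# Hypothetical templates: role name -> services every host in that role gets.
TEMPLATES = {
    "base": ["ntp", "syslog"],
    "web": ["ntp", "syslog", "httpd"],
}

# Hypothetical host-to-role assignments.
HOSTS = {"web01": "web", "web02": "web", "db01": "base"}


def services_for(host):
    """Return the services the host's template says it should run."""
    return TEMPLATES[HOSTS[host]]


def hosts_for(role):
    """Return the sorted hostnames assigned to a role."""
    return sorted(h for h, r in HOSTS.items() if r == role)
```

For example, services_for("web01") gives the three "web" services, and hosts_for("web") gives web01 and web02.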
Cool, but how do those changes get to the box?
At a high level, this is how it goes:
CF, as we call it, builds a list of hosts based on a "group" or "role" designation.
This is tightly coupled to the definitions in the "template" we built.
Each machine is bootstrapped as a part of our kickstart image.
The bootstrapping process sets up a set of callback cron jobs.
The cron jobs run every 'x' minutes and:
- Look for a new "release" number on the master via HTTP.
- If the relnum is higher, do some logic to:
- Make sure the machine needs the updates
- Pull down updates via HTTP
- Apply incremental updates
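The callback steps above can be sketched roughly like this. It assumes release numbers are integers that go up by one, and it takes the HTTP and apply steps as plain callables so the logic can be read (and tested) without a master server. Every name here is hypothetical; this is not the CF code itself.

```python
def pending_releases(local_relnum, master_relnum):
    """Release numbers still to apply, oldest first; empty if already current."""
    return list(range(local_relnum + 1, master_relnum + 1))


def run_callback(local_relnum, fetch_master_relnum, needs_update,
                 fetch_update, apply_update):
    """One cron run: catch this machine up to the master's release.

    fetch_master_relnum() -> int        (e.g. an HTTP GET against the master)
    needs_update(relnum)  -> bool       (does this release touch this machine?)
    fetch_update(relnum)  -> payload    (e.g. an HTTP download)
    apply_update(payload) -> None
    Returns the release number the machine ends up on.
    """
    master_relnum = fetch_master_relnum()
    for relnum in pending_releases(local_relnum, master_relnum):
        if needs_update(relnum):
            # Apply incrementally, one release at a time.
            apply_update(fetch_update(relnum))
        local_relnum = relnum
    return local_relnum
```

Passing the network calls in as arguments is just a design choice for the sketch; it keeps the "is the relnum higher?" logic separate from how the bits actually travel.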
The grand idea is to release a stable version as part of our F/OSS initiatives at AGI.
Would love to do several things off the top, including:
- Python Runtime Engine
- Refactor with performance and abstraction bias.
- Make markup agnostic
- Strip out internals specific to AGI
Fabulous application deployment with Fabric
What is Fabric? Well, as the README says:
Fabric is a Python library and command-line tool for streamlining the use of SSH for application deployment or systems administration tasks.
Basically, it gives you a Pythonic way to interact with the command line tools used to deploy your application.
Doing system stuff Pythonically with Fabric
I need a copy of my dotfiles on this machine, so let's get them from my remote one.
Showing you an example fab file.
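A minimal sketch of what such a file could look like. It assumes Fabric 2 or newer (the Connection API), which is not necessarily the Fabric version used in the talk, and the dotfile list and function names are made up for illustration.

```python
import os

# Hypothetical list of dotfiles to copy; adjust to taste.
DOTFILES = [".bashrc", ".vimrc", ".gitconfig"]


def transfer_plan(names, local_home):
    """Pair each remote dotfile (relative to the remote home) with a local path."""
    return [(name, os.path.join(local_home, name)) for name in names]


def fetch_dotfiles(host, names=DOTFILES):
    """Copy each dotfile from `host` into the local home directory over SFTP."""
    # Imported lazily so the planning helper works without Fabric installed.
    from fabric import Connection

    with Connection(host) as conn:
        for remote, local in transfer_plan(names, os.path.expanduser("~")):
            conn.get(remote, local=local)
```

Calling fetch_dotfiles("me@myotherbox") would then pull the three files down, using whatever SSH keys or config you already have.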
Wow, pretty simple, but why?
You may not have 1 gagillion servers to manage like I do.
Automation is full of win. Why would you want to repeat yourself?
Also, I was searching for filler.