To Splunk or Not To Splunk: Making sense of your logs

01 May '06 - 16:58 by benr

Splunk went 1.0 in December of 2005. Since then, Splunk employees have been popping up everywhere they might encounter sysadmins, making a case for Splunk and handing out a massive number of t-shirts in the process. Heck, even my daughter, Nova, has a Splunk t-shirt!

So then, the next natural question is: WTF is Splunk? To put it simply, Splunk sucks up every type of log you care to feed it, indexes them, and then makes them easily searchable via a nifty AJAX-enabled web interface. The most common usage would be to aggregate the logs on a centralized syslog server, but you can feed it all sorts of logs, including Apache, Microsoft IIS, JBoss, Windows Event Logs, Sendmail/Postfix/Qmail, OpenLDAP, Active Directory, etc.

The real beauty of Splunk is that it's the first real solution I've seen that makes centralized syslog seem like a good and interesting idea. All too often, centralizing syslog just means that instead of having logs on 100 servers you've got 100 times more junk to dig through on a single system. Syslog-NG can help divide and conquer, but you've still got a lot of stuff to dig through. With Splunk you search your logs as though you were searching with Yahoo! or Google. Enter "full" into the search box and all log messages containing the word "full" show up.

The real magic is in Splunk's ability to find and display patterns. This tends to be the biggest problem for me when it comes to syslog, and logs in general: they tend to be so large that unless something "pops out at you", you'll skip right over something important. It's that ability that Splunk is using to assert its value.
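To make the idea concrete, here is a minimal Python sketch of keyword search plus counting hits per hour. It is not how Splunk works internally; the sample log lines and the function names `search` and `hits_per_hour` are invented for illustration, and it assumes classic syslog timestamps at the start of each line.

```python
import re
from collections import Counter

# Hypothetical sample lines in classic syslog format (invented for illustration).
SAMPLE_LOG = """\
May  1 03:12:45 db01 ufs: [ID 845546 kern.notice] NOTICE: alloc: /var: file system full
May  1 03:12:59 db01 sendmail[412]: queue directory full
May  1 04:01:10 web02 sshd[991]: Accepted publickey for benr
May  1 09:30:02 db01 ufs: [ID 845546 kern.notice] NOTICE: alloc: /var: file system full
"""

def search(lines, term):
    """Return the lines containing term as a whole word, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [line for line in lines if pattern.search(line)]

def hits_per_hour(lines):
    """Count lines per (month, day, hour) using the leading syslog timestamp."""
    stamp = re.compile(r"^(\w{3})\s+(\d{1,2})\s+(\d{2}):")
    counts = Counter()
    for line in lines:
        match = stamp.match(line)
        if match:
            counts[match.groups()] += 1
    return counts

hits = search(SAMPLE_LOG.splitlines(), "full")
# Print a crude text histogram: one '#' per matching log entry in that hour.
for (month, day, hour), count in sorted(hits_per_hour(hits).items()):
    print(f"{month} {day:>2} {hour}:00  {'#' * count}")
```

Even in this toy form, the cluster of hits in one hour stands out in a way that it would not when scrolling through the raw file.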

My favorite feature is the ability to graph search hits (log entries) over a period of time to really show you when things are happening. The following is from the archives of an Oracle alert log:

Put it all together and you've got something pretty handy. Even on a small network, this makes centralizing syslog look appealing. But how do you use the thing?

Splunk is Open Source, powered by an AJAX interface... my first thought was that just getting it installed was going to bury me in dependency hell for 3 weeks. Wrong! The free Splunk server can be downloaded and installed in minutes. Solaris/SPARC is available, but Solaris/x86 isn't quite ready yet. Supposedly "it's coming", but there's no ETA yet. You download a single executable file, run it, step through the standard "yes, yes, yes, yes, accept, yes" install, and then just run the start script. Simple. It provides its own dependencies, so you don't need to have anything else prepped or installed, and it's accessible on port 8000, so you don't need to move your existing webserver.

Once Splunk is installed and running, the next question is "How do I load this thing up with data?" There are several ways, all of them covered in the admin guide, but here are the notable ones:

  • The Tailing Processor: Splunk tails a given file, such as Syslog's /var/adm/messages.
  • The Directory Monitor: Splunk monitors a specific directory for new files. When they appear, it sucks up the logs, indexes them, and moves on. Useful when you want to scp logs into a temporary spool on your server, such as database or web server logs. What Splunk does when it's done indexing the log is up to you. There are several modes, but the one you'd expect is the "sinkhole", where Splunk deletes the input file after indexing.
  • FIFO/Named Pipe: Logs in, Splunk out. A beautiful black hole in which logs mystically become useful. FIFO is useful in situations where you want a central syslog server to just pump directly into Splunk without writing everything out to disk to be tailed.

All three of those methods are free, and they're the ones you'll commonly use. If you buy Splunk Pro you can also use an integrated syslog module that will act as your central syslog server itself, but centralizing syslog isn't difficult at all, so don't buy Pro just for this feature. (I'll discuss this in the next day or two.)
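For a feel of what the first two input methods involve, here is a rough Python sketch of the general techniques. This is not Splunk's code or configuration; `read_new_lines`, `drain_directory`, and the `index` callback are hypothetical names standing in for whatever actually consumes the log lines.

```python
import os

def read_new_lines(path, offset):
    """Tail-style read: return (complete lines appended since offset, new offset)."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
    # Only consume up to the last newline so a half-written line is retried later.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end

def drain_directory(spool_dir, index):
    """Sinkhole-style pass: index every file in spool_dir, then delete it."""
    for name in sorted(os.listdir(spool_dir)):
        path = os.path.join(spool_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, "r", encoding="utf-8", errors="replace") as handle:
            for line in handle:
                index(line.rstrip("\n"))
        # Deleting the input is what makes this a "sinkhole".
        os.remove(path)
```

A caller would keep the returned offset and call `read_new_lines` again every second or so, and run `drain_directory` on a schedule against the spool directory. The sketch ignores log rotation and files that are still being copied in, both of which a real implementation has to handle.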

The free server is pretty handy but lacks certain features. For details on what you get and what it costs, check out the Splunk Pro pricing page. Splunk pricing is based on the amount of data you index, and the free version lets you go up to 500MB per day, so most users should get by on the free version without a problem. The one thing that I do like about the Pro version is that you can create multiple indexes, so if you wanted to keep Apache logs separated from IIS logs, you could just create 2 indexes. If you're too cheap to buy Pro for this reason, I'd suggest Solaris Zones. ;) But honestly, if you can pony up the bucks for it, you might find a lot of value in it. It'll even offer you "Live Splunks", effectively a monitoring solution. You can do this with Syslog-NG, but it's still pretty nifty.
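If you're not sure whether you'd stay under the 500MB per day mentioned above, a quick estimate of how much log data you generate is easy to script. The following Python sketch is only a rough upper bound: it sums the full size of every file modified since a cutoff, even if part of that file is older, and it assumes 1MB means 2**20 bytes. The function names are made up for this example.

```python
import os
import time

# The 500MB/day figure quoted above; treating 1MB as 2**20 bytes is an assumption.
FREE_LIMIT_BYTES = 500 * 1024 * 1024

def bytes_modified_since(log_dir, cutoff):
    """Sum the sizes of files under log_dir modified at or after cutoff (epoch seconds)."""
    total = 0
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable
            if info.st_mtime >= cutoff:
                total += info.st_size
    return total

def fits_free_tier(daily_bytes, limit=FREE_LIMIT_BYTES):
    """True if the estimated daily volume is within the limit."""
    return daily_bytes <= limit
```

For example, `fits_free_tier(bytes_modified_since("/var/adm", time.time() - 86400))` gives a first guess for a single host's logs over the last 24 hours.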

If you want a better look at Splunk, there is a kool Flash overview and live demo. If people are interested, I'll blog more about Splunk in the future.

- - C O M M E N T S - -

Oh. My. Goodness. This is wonderful. Thanks for the lead… INSTALLING NOW.

Bill Bradford (Email) (URL) - 01 May '06 - 19:43

Ben, great writeup for a truly fantastic piece of software. One minor correction: Splunk is not actually open source, although it uses some OSS components like python twistd. However, I would consider Splunk to be an open source company based on their heavy use of OSS and their outstanding community involvement. If anybody is interested, I have a couple of things I’ve written about splunk as well, such as a syslog-ng -> splunk howto and a homegrown FAQ based off my experiences in the forums.

cheers :-)

Joe Reeves (Email) (URL) - 02 May '06 - 14:39

To digress from the topic, I have a request. Could you do some OpenSolaris on Intel with Dell PowerVault DAS MySQL clusters or SAN architecture proof-of-concepts? I’d like (and probably others would too) to see a low-cost comparison vs. a like setup with Sun hardware, i.e. Solaris 10 on an x2100, with StorEdge. And benchmarks would be nice.
You did a nice piece with the Red Hat cluster but an OpenSolaris cluster, I think, would be real nice.

Bob (Email) - 02 May '06 - 14:41

Bob: Yes, off topic. In the future, please email me these types of requests. My email address is at the bottom of the page.

As for doing the setup you asked for… that's sort of a tall order. I’m not a Sun employee, I’m just a normal joe like anyone else. If Dell wants to send me a crapload of systems, I’ll happily accept. I’ve made some informal requests to Sun’s Cluster group for systems on which to do further documentation and testing. Currently my Cluster setup is made up of 2 Ultra2 workstations and an A5100 array. The setup is slow and power hungry, which makes everything much harder to get done. If I ever get my requested gear you’ll see a never-ending stream of Sun Cluster docs from Cuddletech.

But, please, in the future, email me. Comments are pretty useless to other users when they are off-topic, and they just say to me (and others) that what I’m blogging about is boring.

benr - 02 May '06 - 15:13

I have set up Splunk on Solaris 9 in literally 10 mins. But I was never able to get the 'tail' working. I would love to see how you set that up.

axisys (Email) - 03 May '06 - 00:21

An excellent article, and I like Splunk too. The rest of the guys were a bit more lukewarm about it for some reason, but it was easy to set up and started splunking logs right away.

As for the blog comments, you should do more to block the comment spam. Many older blog entries have reams of porn spam and other such ickiness that make the comments a pain to sift through. A foul blot on what has otherwise become a favorite blog for me. ;)

kimmo (Email) - 05 May '06 - 08:20

