Splunk – Log storage, search and reporting

by Jago Maniscalchi  //  October 18, 2010  //  Threat Mitigation

Splunk is a leading software system used to monitor, report and analyse text-based IT data, namely log files, statistics and events. It will analyse live streams as well as index historical data. It uses a distributed architecture to collect information in a series of feeds and then stores that information in a central location. For large installations, Splunk supports load balancing and Splunk ‘forwarders’ can collect data in remote locations. We installed the system on one of our Nepenthes SensorNet probes to put it through its paces.

Installation and Web Interface

Splunk is a dream to install. We tried it on OSX, Linux and Windows and in all cases it installed from a single file, required no configuration and was instantly available via its clean, Flash-enabled web interface.

The system is divided into a series of ‘apps’, each of which adds collection and/or analysis functionality. By default, Splunk ships with the ‘Search’ app, the ‘Getting Started’ app and a specialised app to collect data from the local machine: either the ‘*nix’ app or the ‘Windows’ app.

Feeds

Data is ingested from feeds. These can be locally monitored log files, syslog events from remote systems, or log files on remote computers monitored by Splunk forwarders. We configured a central system on a Linux server and forwarded data from a Windows 7 machine and an OSX machine.

To begin with, we asked Splunk to monitor the /var/log/ folder on our Linux server and it quickly imported and parsed 70 files. The parsing is intelligent – some file formats are known and automatically parsed, whilst the structure of unknown formats is guessed. The user is able to teach Splunk the format of a new log file by giving examples of correctly parsed fields. It will build a regular expression to extract that field and will use that expression to update internally indexed data.

On the Windows machine Splunk automatically indexes the contents of the Windows Event Logs (System and Security). In addition to the standard log parsing feeds, we also enabled the *nix and Windows apps which provide several Unix and Windows specific data feeds.

Once the data feeds are configured and active, Splunk will begin indexing the inbound and historical events. If historical events are not of interest, or there are too many to ingest, you can configure individual feeds to act in ‘tail’ mode – only new events will be processed. By the end of our period of evaluation, our test system contained 13 million records collected over 22 months.
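To make the idea of ‘tail’ mode concrete, here is a minimal Python sketch of the general technique: remember the end of the file, then process only complete lines appended after that point. It is an illustration under our own assumptions, not Splunk's implementation; the function names end_offset and read_new_events are hypothetical.

```python
import os


def end_offset(path):
    """Return the current file size, i.e. where a 'tail' style reader would start."""
    return os.path.getsize(path)


def read_new_events(path, offset):
    """Return (events, new_offset) for complete lines appended after `offset`.

    Hypothetical helper for illustration; this is not a Splunk API.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only up to the last newline so a half-written line waits for the next call.
    last_newline = data.rfind(b"\n")
    if last_newline == -1:
        return [], offset
    complete = data[: last_newline + 1]
    events = complete.decode("utf-8", errors="replace").splitlines()
    return events, offset + len(complete)
```

Calling end_offset once when the feed is set up, then read_new_events on each poll with the offset it last returned, skips everything that was already in the file.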

Configuring data forwarders is refreshingly simple. First, install Splunk on the central system and configure it to receive data by entering a single configuration option – the TCP port that the daemon will listen on. Then install Splunk on the remote system and configure the IP address of the central aggregator and its listening port. Once configured, the web interface on the remote system can be disabled (it becomes a “lite” installation) and, after a service restart, events will begin flowing from the remote system to the central aggregator. The data is instantly available via the central search interface and copies can be kept locally for resilience.
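The two-sided setup described above (a receiver that listens on one TCP port, and a forwarder that is told the receiver's address and port) can be illustrated with a toy Python sketch. This is not Splunk's forwarding protocol or configuration; it only shows the listen-and-forward pattern, and the names start_receiver and forward_events are hypothetical.

```python
import socket
import threading


def start_receiver(host="127.0.0.1", port=0):
    """Listen on a TCP port and collect newline-delimited events from one forwarder.

    Toy stand-in for a central aggregator; port 0 lets the OS pick a free port.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    received = []

    def serve():
        conn, _ = server.accept()
        with conn, conn.makefile("r", encoding="utf-8") as stream:
            for line in stream:
                received.append(line.rstrip("\n"))
        server.close()

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()
    return server.getsockname()[1], received, thread


def forward_events(events, host, port):
    """Send each event as one line to the receiver, as a remote forwarder would."""
    with socket.create_connection((host, port)) as conn:
        for event in events:
            conn.sendall((event + "\n").encode("utf-8"))
```

The only things the two sides need to agree on are the listening port and the aggregator's address, which mirrors the two configuration values mentioned above.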

Search App

Once the feeds are configured, and your data is flowing into the system, the search app is the starting point for analysis. Any indexed search term can act as the starting point for a search, which can then be refined using on screen prompts.

We started with a search for the phrase nepenthes, which we expected to hit as part of the hostname (nepenthes-desktop), a process name, and a system username. The results flow in with the most recent first, and you can pause or stop the search at any time, finalising it for analysis. We settled for 64,408 results.

A Flash timeline across the top of the results pane indicates the number of hits in each of a series of time bands. The bands vary from whole months to single minutes, depending on the number of results returned. You can drag a filter window across the timeline to reduce the number of results in a particular search. We filtered down to 21,597 events across a two-month window in autumn 2010.
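The timeline boils down to two operations: counting hits per time band and keeping only the hits inside a chosen window. The short Python sketch below shows both. It assumes events have already been reduced to timestamps (the six used here come from the Nepenthes log excerpt later in this article); the function names are hypothetical and say nothing about how Splunk does this internally.

```python
from collections import Counter
from datetime import datetime

# Timestamps from the Nepenthes log excerpt later in this article.
TIMESTAMPS = [
    datetime.strptime(s, "%Y-%m-%dT%H:%M:%S")
    for s in (
        "2010-08-22T22:42:18",
        "2010-08-23T05:54:08",
        "2010-08-23T06:21:24",
        "2010-08-23T10:35:01",
        "2010-08-23T11:01:16",
        "2010-08-23T11:25:25",
    )
]


def timeline(timestamps, band="%Y-%m-%d"):
    """Count hits per time band; `band` is a strftime format (day, hour, minute...)."""
    return Counter(ts.strftime(band) for ts in timestamps)


def filter_window(timestamps, start, end):
    """Keep only hits inside the half-open window [start, end)."""
    return [ts for ts in timestamps if start <= ts < end]
```

Switching band to "%Y-%m-%d %H" gives hourly bands, which is the kind of zooming the timeline performs as the result set shrinks.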

The results in the screenshot to the right are all from syslog messages but the events are actually drawn from 28 different sources. The fields menu on the left can be used to drill down further into results. Clicking on, for example, the source field will reveal the different sources present in the 21,597 current results. Selecting the ps source adds another term to the search bar (source="ps") and refines the results selection. In this way, an investigator can easily use fields to drill down into log entries from particular times, from particular sources or relating to particular events, actors or host names.
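Drilling down by field amounts to summarising the values of a field across the current results and then adding a field=value term to narrow them. A rough Python equivalent is sketched below; the three events are invented for illustration and the helper names field_values and refine are hypothetical.

```python
from collections import Counter

# Invented sample events; real results would carry many more fields.
EVENTS = [
    {"host": "nepenthes-desktop", "source": "ps", "sourcetype": "ps"},
    {"host": "nepenthes-desktop", "source": "/var/log/syslog", "sourcetype": "syslog"},
    {"host": "nepenthes-desktop", "source": "ps", "sourcetype": "ps"},
]


def field_values(events, field):
    """Count how often each value of `field` occurs, like opening a field in the menu."""
    return Counter(e[field] for e in events if field in e)


def refine(events, **terms):
    """Keep only events matching every field=value term, e.g. source="ps"."""
    return [e for e in events if all(e.get(k) == v for k, v in terms.items())]
```

Here refine(EVENTS, source="ps") plays the role of adding source="ps" to the search bar.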

The *nix app

We mentioned earlier that Splunk is composed of a series of apps. The *nix app adds functionality to interrogate a Unix host using standard management tools like ls, ps and netstat. The results are indexed as new sources in a separate index.

The *nix specific results are accessible from the search app (as shown above), the *nix search screen and via custom *nix reports. These reports cover CPU utilisation (by process, user, host etc.), memory usage, disk usage and network usage. The data that they are based upon is all indexed and available elsewhere, but these custom reports present it in a usable form.

The Windows App

Much like the *nix app, the Windows app provides Windows specific data feeds and analysis. It adds data from the Windows Event Logs (System and Security) and uses WMI to access CPU info, process info and disk information. The Windows update log is also parsed to provide information on operating system patching, including identification of failed patches.

The search in the screenshot on the right was conducted from within the Windows app on a Windows installation of Splunk, but could be replicated via the standard search app on a central aggregator. The search string is designed to isolate logon events from the Windows Security Event Log for a particular user in a particular domain.

Custom Log Parsing

When a custom log is parsed by Splunk it will intelligently guess the names of fields. If it cannot guess accurately, it can be trained by the user through a graphical interface that builds regular expressions based on samples of correct field parsing.

We imported the Nepenthes malware log file /var/log/nepenthes/logged_downloads, which contains details of the shellcode files captured by Nepenthes. Typical entries look like the following:

[2010-08-22T22:42:18] 188.101.190.174 -> 192.168.1.70 tftp://188.101.190.174/ssms.exe 1f8a826b2ae94daa78f6542ad4ef173b
[2010-08-23T05:54:08] 188.168.44.157 -> 192.168.1.70 tftp://188.168.44.157/ssms.exe 3228f8bc721572422c268f244476dbb8
[2010-08-23T06:21:24] 188.168.44.157 -> 192.168.1.70 tftp://188.168.44.157/ssms.exe 3228f8bc721572422c268f244476dbb8
[2010-08-23T10:35:01] 188.116.74.16 -> 192.168.1.70 tftp://188.116.74.16/ssms.exe 1d419d615dbe5a238bbaa569b3829a23
[2010-08-23T11:01:16] 188.174.62.41 -> 192.168.1.70 tftp://188.174.62.41/ssms.exe 14a09a48ad23fe0ea5a180bee8cb750a
[2010-08-23T11:25:25] 188.120.169.17 -> 192.168.1.70 tftp://188.120.169.17/ssms.exe 1f8a826b2ae94daa78f6542ad4ef173b

Splunk will automatically assign several fields – the hostname that was provided when the data feed was configured is applied to the host field, the source is based on the filepath, and the sourcetype is based on the filename. To configure additional fields, the Extract Fields tool may be used.

When extracting fields, the user is presented with a list of events from the sourcetype in question and must identify multiple instances of a correct parse of a new field.

Splunk will use the examples provided by the user to automatically generate a regex pattern that will extract those examples. It will also conduct a number of additional field extractions using the new regex to allow the user to confirm that the regex will work across a larger data set. It can be refined by hand if required. When correct, the field can be saved, named and then used in searching.

Below you can see us configuring a new field – attackerip – which extracts the source IP address of the attacks that our Nepenthes probe is collecting.
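For readers who want to see what such an extraction looks like as a regular expression, here is a hand-written Python sketch that pulls attackerip out of the logged_downloads entries shown earlier. It is not the pattern Splunk generated for us; only attackerip is a field name from our setup, and the other group names (timestamp, targetip, url, md5) are assumptions made for illustration.

```python
import re

# Hand-written pattern for the logged_downloads format shown above.
# Only "attackerip" matches a field we configured; the other names are illustrative.
LINE_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+"
    r"(?P<attackerip>\d{1,3}(?:\.\d{1,3}){3})\s+->\s+"
    r"(?P<targetip>\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"(?P<url>\S+)\s+"
    r"(?P<md5>[0-9a-f]{32})$"
)


def extract_fields(line):
    """Return a dict of named fields for one log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


sample = (
    "[2010-08-22T22:42:18] 188.101.190.174 -> 192.168.1.70 "
    "tftp://188.101.190.174/ssms.exe 1f8a826b2ae94daa78f6542ad4ef173b"
)
fields = extract_fields(sample)
```

As with the Extract Fields tool, it is worth checking a pattern like this against a larger sample before relying on it, since lines that do not match simply return None.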

Custom Searches

Custom searches can be configured to run at regular intervals. For example, we configured a custom search to return all the logged_downloads from Nepenthes in the last day. Alerts can be set up based on the number or content of the results. We asked for an e-mail alert if the number of attacks resulting in a malware download was greater than 15.
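The logic of that alert is simple enough to express in a few lines of Python: count the downloads logged in the last day and alert if the count exceeds 15. The sketch below assumes the download timestamps have already been parsed; the function names are hypothetical and the e-mail step is left out.

```python
from datetime import datetime, timedelta

ALERT_THRESHOLD = 15  # the threshold we chose for our e-mail alert


def downloads_in_last_day(timestamps, now):
    """Count logged downloads in the 24 hours up to and including `now`."""
    cutoff = now - timedelta(days=1)
    return sum(1 for ts in timestamps if cutoff < ts <= now)


def should_alert(timestamps, now, threshold=ALERT_THRESHOLD):
    """True when the number of downloads in the last day is greater than `threshold`."""
    return downloads_in_last_day(timestamps, now) > threshold
```

Running should_alert on a schedule, with the current time passed in as now, mirrors a custom search configured to run at regular intervals.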

Conclusion

Splunk is an extremely well-designed tool with an easy-to-use and very flexible user interface. It is quick to install, has a small footprint, and adds value almost immediately. The basic version is free, though it will only index up to 500MB/day and doesn’t include monitoring and alerting, PDF reports or distributed search. For those features the Enterprise version must be purchased.

IT managers looking after small or medium-sized corporate networks that don’t yet have a centralised log management system would do well to try out Splunk 4.0.

About the Author

Jago Maniscalchi is a Cyber security consultant, though he tries to avoid the word "Cyber" at all costs. He has spent 15 years working with Information Systems and has experience in website hosting, software engineering, infrastructure management, data analysis and security assessment. Jago lives in London with his family, enough pets to start a small zoological society, and a Samsung NaviBot Robotic Vacuum Cleaner. Despite an aptitude for learning computer languages, his repeated attempts to learn Italian have resulted in spectacular failure.
