Friday, 28 December 2018

Splunk ES performance fundamentals (skipped searches)

When you have a busy Splunk environment with multiple apps, ES and custom correlation searches, you need to optimize your configuration to make the best use of your kit. Scheduling your searches and prioritizing them appropriately is usually step 1.

When you create a correlation search, it's important to configure the following parameters (a savedsearches.conf sketch follows the list):

  • Cron Schedule
    • You can randomize the run times yourself, e.g. 2,22,42 * * * * for a search that runs every 20 minutes
  • Scheduling
    • Continuous is less intensive than Real-time
  • Schedule Window
    • auto is my preferred option here 
  • Schedule Priority
    • The usually preferred option is Higher (which makes the search fifth overall in the priority order, which you can see here)
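
Putting those together, here is a minimal savedsearches.conf sketch (the stanza name is made up; cron_schedule, realtime_schedule, schedule_window and schedule_priority are the actual parameter names):

[My Correlation Search]
cron_schedule = 2,22,42 * * * *
# realtime_schedule = 0 is what the GUI calls Continuous scheduling
realtime_schedule = 0
schedule_window = auto
schedule_priority = higher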

More detailed explanations, as always, in the manual here.

With the above taken care of, what remains are the searches that run in Splunk because of installed apps, be that ES, CIM acceleration or other apps you have installed on your search head.

The scheduler log is a source of very valuable information to illustrate what the situation looked like before and after tuning.

Splunk search:
index=_internal sourcetype=scheduler source="/opt/splunk/var/log/splunk/scheduler.log" savedsearch_name="*" status!="continued" host="<your_search_head_hostname>" | timechart count by status
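
To get per-day totals (like the before/after figures quoted at the end of this post), a variant of the same search works, with the same hostname placeholder:

index=_internal sourcetype=scheduler source="/opt/splunk/var/log/splunk/scheduler.log" status!="continued" host="<your_search_head_hostname>" | timechart span=1d count(eval(status="skipped")) AS skipped, count(eval(status="success")) AS success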



The reason field for the majority of those skipped searches showed:
"The maximum number of concurrent auto-summarization searches on this instance has been reached" or "The maximum number of concurrent historical scheduled searches on this instance has been reached"

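A quick way to see which of those reasons dominates, and from which app, is a search along these lines (same hostname placeholder as before):

index=_internal sourcetype=scheduler source="/opt/splunk/var/log/splunk/scheduler.log" host="<your_search_head_hostname>" status=skipped | stats count by reason, app | sort - count
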
To get more detail on the issue, I used the following:

index=_internal host="<your_search_head_hostname>" sourcetype=scheduler source="/opt/splunk/var/log/splunk/scheduler.log" status!="continued" savedsearch_name="*"
| eval instance=mvindex(split(host,"."),0)
| eval savedsearch_name=instance+":"+savedsearch_name
| eval app=instance+":"+app
| eval user=instance+":"+user
| eval search_type=instance+":"+search_type
| stats count AS executions, count(eval(status="success")) AS "status:success", count(eval(status="skipped")) AS "status:skipped", sum(run_time) AS "total run_time", avg(run_time) AS "average run_time", sum(result_count) AS "total result_count", count(eval(action_time_ms>0)) AS alerts, values(reason) AS reasons, values(savedsearch_name) AS searches by app
The above will produce the following, which highlights that the acceleration searches performed in the background for the highlighted data models are very inefficient. The searches in question succeeded 3373 times in 24h but were skipped 1086913 times over the same period.

To fix the issue there are two options:

  • Disable acceleration for the data models that you are not using (keep in mind that dashboards based on those data models will stop working!); a datamodels.conf sketch follows this list
  • Restrict data models to particular indexes.
    • Under each data model's configuration a macro is used to identify the indexes to be queried for that data model's relevant data. 
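
For the first option, acceleration can be switched off per data model from the GUI (Settings -> Data models -> Edit -> Edit Acceleration) or in datamodels.conf; a minimal sketch, assuming Change Analysis is one of the models you want to stop accelerating:

[Change_Analysis]
# Stops the background auto-summarization searches for this model
acceleration = 0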


To identify the relevant indexes for this case you can run the following query for the past 7 days (in fast mode), or longer if you want to be 100% certain you have all the data:

tag=change | stats values(index) as index

The result will be a small list of indexes you can add to the cim_Change_Analysis_indexes macro, found under Settings -> Advanced search -> Search macros (search with app context All and owner Any to be sure). The result should look something like this:
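
A sketch of the resulting macro definition, with made-up index names:

[cim_Change_Analysis_indexes]
definition = (index=wineventlog OR index=oswinsec OR index=linux_audit)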




When you have completed the above process for all the data models, you will eventually see the following result in skipped searches:




From 1524947/day to 6279/day.
I'd call that a win.
Hope this helps.

Monday, 22 October 2018

Hunting malware with Nexthink

Following the previous post on hunting cryptominers with Nexthink, this time the scope is expanded to attempt to capture a wide variety of malicious activity. In most situations a user will willingly or unwillingly be forwarded to a web server to download a malicious payload after a document has been opened, a macro has been run, or something of that nature.

Assumptions:
1) At least from your endpoints there should be one and only one way to reach the web, that being your proxy.

2) You trust the reputation data to know what is trustworthy (better than trying to know what is bad). From what I can find, it is provided by the third party BrightCloud (yes, they have a URL lookup tool here).

NXQL code:

(select ((web_request (start_time protocol protocol_version incoming_traffic outgoing_traffic)) (user name) (device name) (binary paths) (port port_number) (domain name))
   (from (web_request port domain binary user device)
       (where executable (eq name (pattern "*rundll32.exe*")))
       (where executable (eq name (pattern "*mshta.exe*")))
       (where executable (eq name (pattern "*powershell*")))
       (where executable (eq name (pattern "*ftp.exe*")))
       (where executable (eq name (pattern "*cscript.exe*")))
       (where executable (eq name (pattern "*wscript.exe*")))
       (where executable (eq name (pattern "*bitsadmin*")))
       (where executable (eq name (pattern "*wmic.exe*")))
       (where executable (eq name (pattern "*regsvr*")))
       (where executable (eq name (pattern "*infdefaultinstall*")))
       (where executable (eq name (string "java.exe")))
       (where executable (eq name (string "javaw.exe")))
       (where executable (eq name (string "javaws.exe")))
       (where executable (eq name (string "certutil.exe")))
       (where executable (eq name (string "winword.exe")))
       (where executable (eq name (string "excel.exe")))
       (where executable (eq name (string "powerpnt.exe")))
       (where domain (gt first_seen (datetime "$TWODAYSAGO"))
            (ne threat_level (enum "none detected")))
       (where destination (eq #"Servers" (enum "Proxy")))
       (between now-1h now))
   (limit 1000))

Breakdown:

  • We want the web_request table as the primary since we are looking for web connections.
  • The domain first seen time needs to be within the last two days; the $TWODAYSAGO variable is a bash one you can add at the top of your pull script like so (a fuller script sketch follows this list):
    • TWODAYSAGO=`date --date="2 days ago" "+%Y-%m-%dT%H:%M:%S"`
  • Even though in the GUI you will see Domain -> Reputation, in NXQL things are not laid out like that. There were some data model changes in version 6.10 where the field was introduced, yet the table in the background stayed the same (to avoid breaking clients' scripts). Release document available here
  • Define the destination of the traffic as corporate proxy traffic
  • Run it at short intervals, e.g. every 1h
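
Putting the pieces together, a pull-script sketch (hostname, credentials and the query file name are placeholders; curl's -G/--data-urlencode does the URL encoding for you):

#!/bin/bash
# Build the timestamp for the "first seen in the last two days" filter
TWODAYSAGO=$(date --date="2 days ago" "+%Y-%m-%dT%H:%M:%S")
# Read the NXQL above from a file and substitute the variable
QUERY=$(cat webreq_hunt.nxql)
QUERY=${QUERY//'$TWODAYSAGO'/$TWODAYSAGO}
# Query the engine and get JSON back
curl -u myusername -k -G "https://nxtengine.mydomain.local:1671/2/query" \
     --data-urlencode "query=${QUERY}" \
     --data-urlencode "platform=windows" \
     --data-urlencode "format=json"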

The resulting search following a test:



This is definitely not a wide-scope rule, but it will allow you to identify malicious activity without having to parse all the internal traffic your Java apps make, or the internal PowerShell work your admins do.


Hope you get 0 hits :) 

Thursday, 22 March 2018

NXQL cheatsheet (Nexthink tables)

When tasked to write queries for Nexthink using NXQL and the Web API V2, the first resource you should hit is.. the manual. Not a lot in there to be honest, but it gives you some fundamentals to work with. What would also be useful to have from Nexthink is the tables (which I will list below) and the relationships between them (I have not mapped them out yet!). Hopefully it will serve as a quick reference guide to the available fields when attempting to write a rule.

I have made a FreeMind and PDF export, available here (FreeMind) and here (PDF).






Hope this helps.

Monday, 5 March 2018

Hunting cryptominers with Nexthink

I know, it's been a minute since my last post; nevertheless, cryptominers are where the money is right now, so following yesterday's post by Xavier Mertens (@xme) in the SANS Diary, I thought it would be cool to write something in Nexthink to use the IOCs.. and yes, the next step will be getting the data into Splunk so we can alert on it properly.

Heading into the Nexthink world (documentation here), you can use the NXQL Editor to confirm your logic for the API (if you are using Web API V2, which is kind of recommended). You can usually access it from the portal (newest version) or directly from the engine you want to query, like so:

 https://nxtengine.mydomain.local:1671/2/editor/nxql_editor.html

I will post a breakdown of the tables and their fields in a later post, but for the time being here is the rule based on the above IOCs, with some added tuning on the side too.

(select ((binary (first_seen last_seen executable_name paths hash threat_level)))
   (from binary
       (where binary (eq executable_name (pattern "*AMDDriver64*")))
       (where binary (eq executable_name (pattern "*Silence*")))
       (where binary (eq executable_name (pattern "*Carbon*")))
       (where binary (eq executable_name (pattern "*xmrig32*")))
       (where binary (eq executable_name (pattern "*nscpucnminer64*")))
       (where binary (eq executable_name (pattern "*mrservicehost*")))
       (where binary (eq executable_name (pattern "*servisce*")))
       (where binary (eq executable_name (pattern "*svchosts3*")))
       (where binary (eq executable_name (pattern "*svhosts*")))
       (where binary (eq executable_name (pattern "*system64*")))
       (where binary (eq executable_name (pattern "*systemiissec*")))
       (where binary (eq executable_name (pattern "*winlogo*"))
            (ne paths (path "%System%/winlogon.exe")))
       (where binary (eq executable_name (pattern "*taskhost*"))
            (ne paths (path "%System%/taskhost.exe"))
            (ne paths (path "%System%/backgroundtaskhost.exe"))
            (ne paths (path "%System%/taskhostw.exe")))
       (where binary (eq executable_name (pattern "*vrmserver*")))
       (where binary (eq executable_name (pattern "*vshell*")))
       (where binary (eq executable_name (pattern "*winlogan*")))
       (where binary (eq executable_name (pattern "*logon*"))
            (ne paths (path "%System%/logonui.exe"))
            (ne paths (path "%System%/winlogon.exe")))
       (where binary (eq executable_name (pattern "*win1nit*")))
       (where binary (eq executable_name (pattern "*wininits*")))
       (where binary (eq executable_name (pattern "*winlnlts*")))
       (where binary (eq executable_name (pattern "*taskngr*")))
       (where binary (eq executable_name (pattern "*tasksvr*")))
       (where binary (eq executable_name (pattern "*mscl*")))
       (where binary (eq executable_name (pattern "*cpuminer*")))
       (where binary (eq executable_name (pattern "*sql31*")))
       (where binary (eq executable_name (pattern "*taskhots*")))
       (where binary (eq executable_name (pattern "*svchostx*")))
       (where binary (eq executable_name (pattern "*xmr86*")))
       (where binary (eq executable_name (pattern "*xmrig*")))
       (where binary (eq executable_name (pattern "*xmr*")))
       (where binary (eq executable_name (pattern "*win1ogin*")))
       (where binary (eq executable_name (pattern "*win1ogins*")))
       (where binary (eq executable_name (pattern "*ccsvchst*")))
       (where binary (eq executable_name (pattern "*nscpucnminer64*")))
       (where binary (eq executable_name (pattern "*update_windows*")))
       )
       (limit 1000))

This produces very few false positives in my environment, so I would recommend giving it a try, but tune accordingly.

To turn that into your command-line URL, you need to take the request, put it into CyberChef, URL-encode it, and then run it like so:

curl -u myusername -k "https://nxtengine.mydomain.local:1671/2/query?query=<add_your_output_here>&platform=windows&format=json"

JSON is easier for Splunk to digest, which is why I have chosen it; it's up to you if you prefer CSV. The result should look something like this:
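
If you do pull the JSON into Splunk, a hypothetical props.conf sourcetype for it could be as simple as this (sourcetype name made up; more line-breaking tuning may be needed for the actual output):

[nexthink:json]
# Have Splunk extract each JSON field automatically
INDEXED_EXTRACTIONS = json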




That's all for now. I hope you all get 0 results.