Deeper Than Mainstream: How Big Data Improves Cybersecurity


In this special guest feature, Brandon Louder, Regional Technical Manager at Solutionary, asserts that with the implementation of big data in other prevalent industries, it is only natural that the technology become a major focal point for cybersecurity. Just like other industries, cybersecurity needs to manage risk and produce actionable intelligence from a tremendous amount of data, with speed and accuracy top of mind. Brandon has been in the IT industry since 1999, focusing primarily on information security. He holds four degrees in the field of IT, most notably a Master of Science degree in Information Assurance from Dakota State University, which is recognized as a Center of Academic Excellence in Information Assurance Education (CAE/IAE) by the National Security Agency (NSA) and the Department of Homeland Security (DHS). Additionally, Brandon holds the CISSP as well as the ISSAP security architecture concentration from (ISC)2. In past roles, Brandon worked as a security architect for a federal agency and consulted on information security and big data analytics for several defense contractors, civilian government agencies, healthcare organizations, and private and public companies with varying regulatory and compliance needs.

As tired as professionals are of hearing buzzwords, big data is one of the few technologies that is truly “innovating and revolutionizing” a wide range of industries and markets today. The field has recently become a distinct area of expertise and has reshaped several industry verticals, including finance, healthcare and cybersecurity.

Big data goes mainstream

We are seeing big data go mainstream; it’s even showing up in headlines today. A great example comes from the team over at Splunk4Good, whose goal was to use Splunk (a commercial big-data product) for positive social change. During Hurricane Sandy in 2012, this group collected social-media chatter to identify the fear level of affected people and what essential resources they needed during the hurricane (e.g., food, gas, power).

We are also seeing big data used to perform quantitative analytics in financial markets. The commodity and option markets rely heavily on statistical analytics when making financial decisions. The adoption of big data enables organizations to produce probabilities in real time from streaming market data, and many financial institutions make decisions based on the output of those analytics. Several companies, such as Jump Trading, Belvedere Trading, IVolatility and Intuit, were built purely on the application of big-data technologies to financial data.

The pain of big data when it comes to cybersecurity

With the implementation of big data in other industries, it is only natural that the technology becomes a major focal point for cybersecurity. Just like other verticals, cybersecurity needs to manage risk and produce actionable intelligence from a tremendous amount of data, with speed and accuracy at top of mind.

Traditional security information and event management (SIEM) solutions, used for cybersecurity log monitoring and incident detection, have inherent flaws that stem mainly from their reliance on relational databases with fixed schemas. A relational back end makes a SIEM rigid: the schema restricts both the amount and the categories of data that the SIEM can analyze. That means if you have unstructured data, or data that does not match the database schema, you are out of luck.

Another issue with traditional SIEMs is their inability to retain raw logs. The raw logs go in and metadata comes out. Without the original raw log, you have to know what you are looking for ahead of time; any additional context or enrichment must be predefined as well, before you start processing the data. If you receive a new threat intelligence feed and want to cross-reference that information with old logs to look for indicators of compromise, you simply cannot.

Big data analytics addresses these limitations

The most important benefit of utilizing big data analytics in cybersecurity is that these tools allow raw data to be retained in its original state; it does not have to fit into a fixed schema. This means that enterprises can use unstructured data sources. In addition, since historical data is available, the scope of analysis becomes much wider, giving an enterprise the ability to look back for previously missed indicators of compromise rather than seeing only what is taking place in real time. For example, when security professionals identify a new IP address that is known to be malicious, they can search all of the historical data for any communications that may have occurred to that IP address before it was known to be malicious.
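
The look-back described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `find_historical_contacts`, the sample log lines and the IP addresses (taken from a range reserved for documentation) are all hypothetical, and a real deployment would query a big-data store rather than an in-memory list.

```python
import re
from typing import Iterable, List


def find_historical_contacts(raw_logs: Iterable[str], malicious_ip: str) -> List[str]:
    """Return every retained raw log line that mentions the given IP address."""
    # Guard both ends so 203.0.113.7 does not match 203.0.113.77 or 1.203.0.113.7.
    pattern = re.compile(r"(?<![\d.])" + re.escape(malicious_ip) + r"(?!\d)")
    return [line for line in raw_logs if pattern.search(line)]


# Hypothetical raw logs in mixed formats, kept exactly as they were received.
raw_logs = [
    "2015-03-01T10:00:00Z fw01 ALLOW src=10.0.0.5 dst=203.0.113.7 dport=443",
    "2015-03-01T10:00:05Z proxy GET http://example.com/ from 10.0.0.8",
    "Mar  2 11:12:13 host7 sshd[411]: Accepted password for root from 203.0.113.77",
    "2015-03-03T09:30:00Z fw01 ALLOW src=10.0.0.9 dst=203.0.113.7 dport=80",
]

# The address is learned to be malicious only after the traffic occurred.
hits = find_historical_contacts(raw_logs, "203.0.113.7")
for line in hits:
    print(line)
```

Because the raw lines are retained, the search needs no knowledge of each log's format; it works on firewall, proxy and syslog entries alike.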

Historical data also opens up a completely new realm of capabilities that use statistical and predictive models, as well as machine learning. With historical information, you can determine statistical baselines that traverse multiple device types, allowing you to identify what is “normal” within your network. You can then identify anything that deviates from that baseline and respond accordingly. With big data, the types of observable events based on historical information can now cover irregularities of privileged user account activity, inbound and outbound network traffic flows or abnormalities in geographical communications.
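
As a rough sketch of the baseline idea, the Python below derives a mean and standard deviation from historical values and flags anything that falls too far from them. The metric (daily outbound megabytes for one host), the sample numbers and the three-standard-deviation threshold are assumptions chosen for illustration, not a recommended detection rule.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def build_baseline(history: Sequence[float]) -> Tuple[float, float]:
    """Summarize historical observations as (mean, sample standard deviation)."""
    return mean(history), stdev(history)


def is_anomalous(value: float, baseline: Tuple[float, float], threshold: float = 3.0) -> bool:
    """Flag a value that lies more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        # A perfectly flat history: any change at all is a deviation.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical history: daily outbound traffic (MB) for a single host.
history = [102, 98, 110, 95, 105, 99, 101, 97, 104, 100]
baseline = build_baseline(history)

print(is_anomalous(103, baseline))  # a typical day
print(is_anomalous(450, baseline))  # a sudden spike in outbound traffic
```

The same pattern applies to the other signals mentioned above, such as privileged account activity or traffic by geography, as long as enough history is retained to make the baseline meaningful.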

Big data can also go a step further when you start to dabble with predictive analytics. A combination of historical data and statistical metadata gives enterprises the ability to estimate the probability of an event happening in the future. This is already being applied to botnet activity today: if an activity such as a system “phoning home” to a known botnet command-and-control server is identified, and this activity has been seen in the past, security professionals can predict what the malware’s next actions are likely to be.
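
One simple way to picture this kind of prediction is to count, across past incidents, which action followed which, and turn those counts into probabilities. The sketch below does that with a first-order transition table; the event names and incident sequences are invented for illustration, and this is one possible technique rather than the method any particular product uses.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def transition_probabilities(sequences: List[List[str]]) -> Dict[str, Dict[str, float]]:
    """Estimate P(next event | current event) from historical event sequences."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for seq in sequences:
        # Pair each event with the one that immediately followed it.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for state, c in counts.items()
    }


# Hypothetical sequences reconstructed from historical logs of past infections.
past_incidents = [
    ["phone_home", "download_payload", "lateral_movement"],
    ["phone_home", "download_payload", "exfiltration"],
    ["phone_home", "port_scan", "lateral_movement"],
    ["phone_home", "download_payload", "exfiltration"],
]

probs = transition_probabilities(past_incidents)

# Given a newly observed "phone home", rank the likely next actions.
for action, p in sorted(probs["phone_home"].items(), key=lambda kv: -kv[1]):
    print(f"{action}: {p:.2f}")
```

With more history the estimates become more reliable, which is why retaining raw historical data matters for this kind of analysis.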

Machine learning automates information discovery: a system “learns” as it is exposed to new data rather than following static, predefined instructions programmed by the user. This capability allows for the automated creation of new methods to detect intrusions, anomalies and attacks not previously seen.

Simply put, in the not-so-distant future, we will no longer have to teach SIEM and other security tools what to look for – all thanks to big data.

