Politics and the bungling of big data

Posted by David Pegna on Nov 17, 2016 12:00:00 PM

We live in an age in which big data and data science are used to predict everything from what I might want to buy on Amazon to the outcome of an election.

The results of the Brexit referendum caught many by surprise because pollsters suggested that a “stay” vote would prevail. And we all know how that turned out.

History repeated itself on Nov. 8 when U.S. president-elect Donald Trump won his bid for the White House. Most polls and pundits predicted there would be a Democratic victory, and few questioned their validity.

The Wall Street Journal article “Election Day Forecasts Deal Blow to Data Science” made three very important points about big data and data science:

  • Dark data, data that is unknown, can result in misleading predictions.
  • Asking simplistic questions yields a limited data set that produces ineffective conclusions.
  • “Without comprehensive data, you tend to get non-comprehensive predictions.”

Topics: Data Science, cyber security, machine learning

Cybersecurity and machine learning: The right features can lead to success

Posted by David Pegna on Sep 15, 2015 9:52:24 AM

Big data is all around us. Yet data scientists and researchers doing analytics commonly say that they need more data. How is that possible, and where does this eagerness for more data come from?

Very often, data scientists need lots of data to train sophisticated machine-learning models. The same applies when using machine-learning algorithms for cybersecurity. Lots of data is needed in order to build classifiers that identify, among many different targets, malicious behavior and malware infections. In this context, the eagerness to get vast amounts of data comes from the need to have enough positive samples — such as data from real threats and malware infections — that can be used to train machine-learning classifiers.

Is the need for large amounts of data really justified? It depends on the problem that machine learning is trying to solve. But exactly how much data is needed to train a machine-learning model should always be associated with the choice of features that are used.


Topics: Data Science, cyber security

Automate detection of cyber threats in real time. Why wait?

Posted by Jerish Parapurath on May 15, 2015 10:01:43 AM

Time is a big expense when it comes to detecting cyber threats and malware. The proliferation of new malware variants makes it impossible to detect and prevent zero-day threats in real time. Sandboxing takes at least 30 minutes to analyze a file and deliver a signature – and by then, threats will have spread to many more endpoints.


Topics: Targeted Attacks, Malware Attacks, Data Science, machine learning

Cybersecurity, data science and machine learning: Is all data equal?

Posted by David Pegna on May 9, 2015 9:00:00 AM

In big-data discussions, the value of data sometimes refers to the predictive capability of a given data model and other times to the discovery of hidden insights that appear when rigorous analytical methods are applied to the data itself. From a cybersecurity point of view, I believe the value of data refers first to the "nature" of the data itself. Positive data, i.e. malicious network traffic data from malware and cyberattacks, has much more value here than it does in some other data science problems. To better understand this, let's start by discussing how a wealth of network traffic data can be used to build network security models through the use of machine learning techniques.


Topics: Data Science, cyber security, machine learning

Big Data Sends Cybersecurity Back to the Future

Posted by David Pegna on Apr 1, 2015 12:56:43 PM

The main reason behind the rising popularity of data science is the incredible amount of digital data that gets stored and processed daily. Usually, this abundant data is referred to as "big data" and it's no surprise that data science and big data are often paired in the same discussion and used almost synonymously. While the two are related, the existence of big data prompted the need for a more scientific approach – data science – to the consumption and analysis of this incredible wealth of data.


Topics: Data Science, cyber security

Do you know how to protect your key assets?

Posted by Oliver Brdiczka, Principal Data Scientist, Vectra Networks on Mar 27, 2015 10:26:34 AM

Security breaches have not stopped making headlines in recent months, and while hackers still go after credit card data, the trend is toward richer data records and the exploitation of various key assets inside an organization. As a consequence, organizations need to develop new schemes to identify and track key information assets.

The biggest recent breach in the financial industry occurred at JP Morgan Chase, with an estimated 76 million customer records and another 8 million records belonging to businesses stolen from several internal servers. At Morgan Stanley, an employee of the company’s wealth management group was fired after information from up to 10% of Morgan Stanley’s wealthiest clientele was leaked. Even more sensitive was the largest health-care breach thus far: at Anthem, over 80 million records containing personally identifiable information (PII), including social security numbers, were exposed. Less well-known, but potentially more costly in terms of damage and litigation, is the alleged theft of trade secrets by the former CEO of Chesapeake Energy (NYSE: CHK).


Topics: Insider Threats, Data Science

Creating Cyber Security That Thinks

Posted by David Pegna on Mar 9, 2015 1:50:00 PM

Until recently, using the terms “data science” and “cybersecurity” in the same sentence would have seemed odd. Cybersecurity solutions have traditionally been based on signatures – relying on matches to patterns derived from previously identified malware to capture attacks in real time. In this context, advanced analytical techniques, big data and all the traditional components that have become representative of “data science” have not been at the center of cybersecurity solutions focused on identification and prevention of cyber attacks.

This is not surprising. In a signature-based solution, any given malware or new flavor of it needs to be identified, sometimes reverse-engineered, and matched by a signature deployed in a product update before it becomes “detectable.” For this reason, signature-based solutions are not able to prevent zero-day attacks and provide very limited benefit compared to the predictive power offered by data science.
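
The limitation described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a signature is the SHA-256 digest of a previously identified sample; real products use richer pattern formats, and the names `SIGNATURES`, `is_detected` and `add_signature` are hypothetical.

```python
import hashlib

# Hypothetical signature database: digests of previously identified samples.
KNOWN_SAMPLE = b"example-malware-payload-v1"
SIGNATURES = {hashlib.sha256(KNOWN_SAMPLE).hexdigest()}


def is_detected(payload: bytes) -> bool:
    """Return True only when the payload's digest matches a stored signature."""
    return hashlib.sha256(payload).hexdigest() in SIGNATURES


def add_signature(payload: bytes) -> None:
    """Simulate a product update that ships a signature for a new sample."""
    SIGNATURES.add(hashlib.sha256(payload).hexdigest())


# A new flavor of the same malware differs by a single byte.
variant = b"example-malware-payload-v2"

print(is_detected(KNOWN_SAMPLE))  # True: the sample was seen before
print(is_detected(variant))       # False: no signature exists yet
add_signature(variant)            # the update arrives after the fact
print(is_detected(variant))       # True: detectable only from now on
```

In this sketch the variant goes unnoticed until someone has identified it and shipped an update, which is the window a zero-day attack exploits.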


Topics: Data Science, cyber security

Detecting the Insider Threat – how to find the needle in a haystack?

Posted by Oliver Brdiczka, Principal Data Scientist, Vectra Networks on Jan 10, 2015 10:00:00 AM

In the previous posts, we have examined the insider threat from various angles and we have seen that insider threat prevention involves the information security, legal and human resources (HR) departments of an organization. In this post, we want to examine what information security departments can actually do to detect ongoing insider threats, and even prevent them before they happen.

The literal needle in the haystack

Overall, insider threats represent only a small proportion of employee behavior. And while only the ‘black swan’ incidents become public knowledge, minor incidents such as theft of IP or customer contact lists will add up to major costs for organizations.

In addition, insiders are by default authorized to be inside the network and are both granted access to and make use of key resources of an organization. Given the large pile of access patterns visible in an organization’s network, how is one to know which ones are negligent, harmful or malicious behavior?


Topics: Insider Threats, Data Science