AI AND CYBER SECURITY: A TECHNO-LEGAL APPROACH BY DR. RAJU NARAYANA SWAMY IAS

The digitalisation of almost every area of society has changed the rules of our economy.  Cloudification, IoT and BYOD (Bring Your Own Device to work) are all giving rise to micro-environments that contain a lot of sensitive data.  If these devices fall into the wrong hands, the consequences could be grave.  To put it differently, cyber security - the technology, processes and practices that protect networks, devices, programs and data from attack, damage or unauthorized access - will not only become a crucial issue for the safety of our new digital critical systems, but will also be a prerequisite for creating trust in our digital economy.  The situation is even more alarming when one considers the newer types of attacks, which are mostly machine engineered. Thus cyber security, which was once a war among humans, has become a battle of human versus machine. In fact the old protection mechanism, largely a “seal the borders” approach via firewalls, proxies, antivirus software, access controls and dynamic passwords, is today grossly inadequate.  A new approach is required, one that continuously monitors a large number of factors and detects what constitutes abnormal activity.  This is similar to our body's immune system, where white cells and antibodies continuously scan for and neutralize any organism that does not fit the normal functioning patterns within the body.  This is where AI comes into play.  It bears noting here that AI works in three ways - assisted intelligence, augmented intelligence and autonomous intelligence.

 

The primary targets for AI application in cyber security are network intrusion detection, malware analysis and classification, and the detection of phishing and spam emails.  Machine learning (ML) algorithms can recognize potential security breaches or attacks by continuously observing behavior and learning what is abnormal; given the authority, they can automatically shut down systems under perceived threat. In fact ML can revolutionize the way cyber security has been handled to date, whether in detection, protection, prediction or termination.  There are broadly two categories of possible uses:

  1. Apply supervised learning to the massive amount of historical data to continuously improve prediction capabilities.
  2. Apply unsupervised learning to make some sense out of the massive amount of data through clustering and dimensionality reduction techniques.

 

As regards the former, the most talked about cases in the context of cyber security are malware classification and spam detection.  In Gmail, for example, a supervised machine learning algorithm scans countless variables, such as the originating IP address and phrases in the email content, to determine whether an email conforms to an abnormal pattern; if it does, the email is moved out of the inbox and into the junk folder.
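The supervised approach can be illustrated with a minimal sketch. It assumes a multinomial Naive Bayes classifier with add-one smoothing trained on a tiny, made-up set of labelled emails; the class name `NaiveBayesSpamFilter`, the training sentences and the word-only features are illustrative assumptions, not a description of how Gmail or any production filter actually works.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


class NaiveBayesSpamFilter:
    """Toy multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()
        self.vocabulary = set()

    def train(self, labelled_emails):
        """labelled_emails: iterable of (text, label), label is 'spam' or 'ham'."""
        for text, label in labelled_emails:
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)

    def _log_score(self, words, label):
        # log P(label) + sum of log P(word | label), smoothed so that
        # unseen words do not drive the probability to zero.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        vocab_size = len(self.vocabulary)
        for word in words:
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (total_words + vocab_size))
        return score

    def classify(self, text):
        words = tokenize(text)
        return max(("spam", "ham"), key=lambda label: self._log_score(words, label))


# Hypothetical historical data: emails already labelled by users.
TRAINING = [
    ("win a free prize now", "spam"),
    ("claim your free lottery prize", "spam"),
    ("urgent verify your account password", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the attached report", "ham"),
    ("lunch on friday with the team", "ham"),
]

spam_filter = NaiveBayesSpamFilter()
spam_filter.train(TRAINING)
```

Calling `spam_filter.classify("free prize inside")` on this toy data routes the message to spam, because "free" and "prize" were seen only in spam examples. A real system would add many more signals, such as the originating IP address mentioned above, and far more training data.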

           

As regards the latter, context and an expert knowledge base are two critical ingredients for making sense of raw data.  For instance, rather than looking at network traffic logs in isolation, we need to add context, such as whether a device is supposed to respond to DNS queries.  If the device is a DNS server, responding to such queries is perfectly normal behavior; if it is not, the same behavior could be the sign of an attack.
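The DNS example can be sketched in a few lines. The code below assumes a hypothetical asset inventory (`KNOWN_DNS_SERVERS`) and a simplified log record (`LogEvent`); both are invented for illustration and stand in for whatever context an organisation's own knowledge base provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEvent:
    source_ip: str  # device that sent the packet
    kind: str       # simplified event type, e.g. "dns_response"


# Hypothetical context: devices that are expected to answer DNS queries.
KNOWN_DNS_SERVERS = {"10.0.0.53"}


def suspicious_dns_responders(events, dns_servers=KNOWN_DNS_SERVERS):
    """Return sorted IPs that sent DNS responses without being known DNS servers."""
    return sorted({
        event.source_ip
        for event in events
        if event.kind == "dns_response" and event.source_ip not in dns_servers
    })


# The same log line is normal for one device and suspicious for another.
events = [
    LogEvent("10.0.0.53", "dns_response"),  # registered DNS server: expected
    LogEvent("10.0.0.17", "http_request"),  # unrelated traffic
    LogEvent("10.0.0.99", "dns_response"),  # ordinary host answering DNS queries
]
flagged = suspicious_dns_responders(events)
```

Here only `10.0.0.99` is flagged. The raw log entries for the two responders are identical in kind; it is the added context that separates expected behavior from a possible attack.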

 

However, there is a hitch: machine learning lacks the general knowledge required to distinguish real threats from harmless anomalies, which leads to too many false alarms.  A potential solution is a hybrid human-machine collaborative approach, such as the AI2 cyber security platform from MIT's Computer Science and Artificial Intelligence Lab.  Here human experts handle the judgment-related tasks of validating and classifying the threats and assigning severity tags.
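A human-machine loop of this kind might look roughly like the sketch below. It is not the AI2 implementation: the function names, the review budget and the `example_analyst` stand-in for a human expert are all assumptions made for illustration. The machine ranks events by an anomaly score, and the analyst validates the top few and attaches a severity tag.

```python
def select_for_review(scored_events, budget):
    """Return the `budget` events with the highest anomaly scores."""
    ranked = sorted(scored_events, key=lambda item: item[1], reverse=True)
    return [event for event, _score in ranked[:budget]]


def review_round(scored_events, analyst, budget, labelled):
    """Send the top-scoring events to the analyst and record the verdicts."""
    for event in select_for_review(scored_events, budget):
        verdict, severity = analyst(event)
        labelled.append((event, verdict, severity))
    return labelled


def example_analyst(event):
    """Hypothetical stand-in for a human expert's judgment."""
    if "admin login" in event:
        return ("threat", "high")
    return ("benign", "none")


# Hypothetical machine output: (event description, anomaly score).
scored = [
    ("user opened spreadsheet", 0.12),
    ("admin login from new country", 0.97),
    ("nightly backup job", 0.91),
]
labels = review_round(scored, example_analyst, budget=2, labelled=[])
```

The labelled verdicts can then be used as training data for a supervised model, so that a false alarm such as the backup job is less likely to be raised again.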

 

A major challenge here lies in defining what is not an anomaly.  For example, from reading the morning news online to shopping, booking travel and carrying out our work-related activities, we use our laptops in many different ways.  There could also be infrequent but harmless patterns, such as downloading a game or organising pictures from a vacation. In essence, not every statistical outlier is a threat, and the most potent security threats are not just statistical outliers.
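A small numerical sketch makes the point concrete. It assumes a simple z-score detector over daily network transfer volumes; the figures, the threshold of 3 and the two scenarios are made up for illustration.

```python
import statistics


def zscore_outliers(baseline, observations, threshold=3.0):
    """Flag observations whose z-score against the baseline exceeds the threshold."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    return {
        name: abs(value - mean) / stdev > threshold
        for name, value in observations.items()
    }


# Hypothetical daily transfer volumes (MB) for one laptop over ten working days.
baseline_mb = [210, 190, 205, 220, 198, 202, 215, 195, 208, 200]

# Hypothetical new days.
observations_mb = {
    "game_download": 4500,     # benign but rare
    "slow_exfiltration": 212,  # malicious but statistically ordinary
}

flags = zscore_outliers(baseline_mb, observations_mb)
```

On these numbers the detector flags the harmless game download and misses the slow data leak, which is exactly the gap that pure outlier statistics leave open.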

 

Most cyber attacks follow certain attack phases that can be described as a cyber kill chain.  Every attack sequence starts with a reconnaissance phase, in which an attacker tries to locate gaps and vulnerabilities in a target system. The weaponizing phase follows.  Next comes the delivery phase, when the malware is transferred to the potential target.  After the malware is delivered successfully, the exploit phase occurs, during which the malware triggers the installation of the intruder's code. The aim of an ISA (Integrated Security Approach) is to generate early warnings before the exploit phase.
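The ordering of the phases can be expressed directly in code. The sketch below models only the four phases named above; the enum, the helper functions and the sample alerts are illustrative assumptions, not part of any particular ISA product.

```python
from enum import IntEnum


class KillChainPhase(IntEnum):
    """The four phases named in the text, in attack order."""
    RECONNAISSANCE = 1
    WEAPONIZATION = 2
    DELIVERY = 3
    EXPLOIT = 4


def is_early_warning(phase):
    """An alert is 'early' if it fires before the exploit phase."""
    return phase < KillChainPhase.EXPLOIT


def early_warnings(alerts):
    """alerts: list of (description, phase); keep those raised before exploit."""
    return [desc for desc, phase in alerts if is_early_warning(phase)]


# Hypothetical alerts tagged with the phase they were detected in.
alerts = [
    ("port scan from unknown host", KillChainPhase.RECONNAISSANCE),
    ("email with suspicious attachment", KillChainPhase.DELIVERY),
    ("macro executed payload", KillChainPhase.EXPLOIT),
]
early = early_warnings(alerts)
```

Only the port scan and the suspicious email count as early warnings here; by the time the payload executes, the window that an ISA aims for has already closed.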

 

ANNs (Artificial Neural Networks: statistical learning models imitating the structure and function of the human brain) have been used successfully within all stages of ISAs.  ANNs can be used to learn from past network activities and attacks in order to prevent future attacks from actually transpiring. DNNs (Deep Neural Networks: a more elaborate and computationally expensive form of ANNs) have been used not only to protect organizations from cyber attacks, but also to predict these attacks.

 

It must not be forgotten that machine learning is no silver bullet.  Just as businesses are beginning to adopt AI systems, attackers are also finding ways to manipulate the same AI systems.  They are focusing on finding ways to turn AI against its owners, from hacking chatbots to deliberately misleading pattern recognition algorithms.  A classic example is Tay, a chatbot introduced by Microsoft to engage people through casual and playful conversation.  Within 24 hours, a coordinated attack on Tay resulted in the bot spouting all sorts of misogynistic and racist comments.

 

To summarize, though AI-powered cyber security is no panacea, AI will become a standard element of cyber security in the short term.  But AI needs to deliver greater accuracy in detection and fewer false positives to earn trust. A drawback of using AI within cyber security is the concern over data privacy. Moreover, due to the unique and unforeseeable nature of AI, existing legal frameworks do not necessarily apply to this discipline. Since cyber security is a back-and-forth game between attackers and defenders that will constantly evolve as technology grows, AI needs to be trained for all these varied scenarios.  Attackers are already using AI to power their attacks, spear phishing tweets being classic instances, and we must deploy AI-driven defenses to keep up.  As an example, the AI-driven defenses of tomorrow must be geared up to deal with the upcoming challenge of generative adversarial networks (GANs), a class of machine learning frameworks that can be used to generate deepfakes by swapping or manipulating faces or voices in an image or a video.  In fact, attacks on AI systems often fall into three areas: adversarial inputs, poisoning of training data and model extraction attacks.

 

The reality is that, until now, AI alone has not proven fully successful in cyber protection.  Despite the great improvements that AI has brought to the realm of cyber security, the related systems are not yet able to adjust fully and automatically to changes in their environment, learn all threats and attack types, and autonomously choose and apply dedicated countermeasures to protect against these attacks.

 

AI methods in cyber security

Security function        | DT | SVM | NB | K | HMM | GA | ANN | CNN | RNN | SNN
Intrusion detection      | X  | X   | X  | X | X   | X  | X   | X   | X   | X
Malware detection        | X  | X   | X  | X | -   | -  | -   | X   | X   | -
Vulnerability assessment | X  | -   | -  | - | -   | -  | -   | -   | -   | -
Spam filtering           | -  | -   | X  | - | -   | -  | -   | -   | -   | -
Malware classification   | -  | -   | -  | - | -   | X  | X   | -   | -   | X
Phishing detection       | -  | -   | -  | - | -   | -  | X   | -   | -   | -
Traffic analysis         | -  | -   | -  | - | -   | -  | -   | X   | X   | -

 

 

DT = Decision Tree
SVM = Support Vector Machine
NB = Naive Bayes classifier
K = K-means clustering
HMM = Hidden Markov Model
GA = Genetic Algorithm (heuristic search algorithm employing the concepts of genetics and natural selection)
ANN = Artificial Neural Network
CNN = Convolutional Neural Network
RNN = Recurrent Neural Network
SNN = Siamese Neural Network

 

In the Indian context, the challenges are even greater:

  1. Large digital divide: lack of digital literacy makes many users vulnerable to phishing attacks and online scams.
  2. Fragmented cyber security infrastructure: responsibility for cyber security is distributed across various government agencies and private entities, leading to a lack of coordination.
  3. Shortage of qualified cyber security professionals.

 

Needless to say, cyber security is not only a technological issue; it is also about regulation and the way that security risks are dealt with.  At the end of the day it is still the human factor that matters, not only the tools.
