Enhancing Big Data Security with Collaborative Intrusion Detection

IEEE CLOUD COMPUTING MAGAZINE, MANUSCRIPT ID
Zhiyuan Tan, Member, IEEE, Upasana T. Nagar, Xiangjian He, Senior Member, IEEE,
Priyadarsi Nanda, Senior Member, IEEE, Ren Ping Liu, Senior Member, IEEE,
Song Wang, Senior Member, IEEE and Jiankun Hu, Senior Member, IEEE
Abstract— As an asset of Cloud computing, big data is now changing our business models and applications. Rich information
residing in big data is driving business decision making to be a data-driven process. Its security and privacy, however, have
always been a concern of the owners of the data. The security and privacy could be strengthened via securing Cloud computing
environments. This requires a comprehensive security solution from attack prevention to attack detection. Intrusion Detection
Systems (IDSs) are playing an increasingly important role among network security schemes. In this article,
we study the vulnerabilities in Cloud computing and propose a collaborative IDS framework to enhance the security and privacy
of big data.
Index Terms—Big data, Cloud computing, Collaborative intrusion detection, Security and privacy
1 INTRODUCTION
Cloud computing has gained popularity since the late
2000s. It delivers a flexible network computing model
for organizations to adjust their IT capabilities on the fly
with the least investment in IT infrastructure and maintenance. Cloud computing allows an organization to pay
only for the services it uses and to focus on its core business instead of handling technical issues.
In the context of Cloud computing, network-accessible
resources are defined as services. These services are delivered via the typical Cloud computing service models
including: (1) Infrastructure-as-a-Service (IaaS) that offers
storage, computation and network capabilities to service
subscribers through Virtual Machines (VMs); (2) Platform-as-a-Service (PaaS) that provides an environment
for software application development and hosts a client’s
applications in a PaaS provider’s computing infrastructure; and (3) Software-as-a-Service (SaaS) that delivers on-demand software services over a computer network and
eliminates the high cost of software purchasing and
in-house maintenance.
These technical and business advantages, however, do
not come without cost. The security vulnerabilities inherited from the underlying technologies (i.e., virtualization,
Internet protocol suite, Application Programming Interfaces (APIs) and data center) have been a major obstacle
preventing Cloud computing from being widely adopted in many
critical business applications [1]. Generally speaking,
Cloud computing is a Service-Oriented Architecture
(SOA). A comprehensive dependability and security taxonomy framework revealing the complex security cause-implication relations in this architecture is provided in
[2]. Table 1 summarizes the Cloud computing vulnerabilities in relation to the underlying technologies.
• Z. Tan, U. Nagar, X. He and P. Nanda are with the School of Computing and Communications, University of Technology, Sydney, NSW 2007, Australia.
• R.P. Liu is with Computational Informatics, Commonwealth Scientific and Industrial Research Organisation, NSW 1710, Australia.
• S. Wang is with the Department of Electronic Engineering, La Trobe University, Melbourne, VIC 3083, Australia.
• J. Hu is with the School of Engineering and Information Technology, University of New South Wales, Canberra, ACT 2600, Australia.
These vulnerabilities leave loopholes for cyber intruders to exploit Cloud computing services and pose threats
to the security and privacy of big data. To address these
security issues, a variety of security schemes has been
proposed over the past years. These schemes include encryption, authentication, access control, firewall, IDS and
Data Leak Prevention System (DLPS). In this complex
computing environment, however, one can hardly find a
single scheme that fits all cases. These schemes should be
integrated and cooperate as a comprehensive line of defense.
2 HOW INTRUSION DETECTION CAN HELP
SECURE CLOUD COMPUTING
IDSs are playing an increasingly important role among
the available security schemes. Their goal is to
provide a layer of defense against malicious uses of computing systems by sensing attacks and raising alerts on them. Such attacks
exploit the vulnerabilities in virtualization, the Internet protocol suite and APIs. As it is impossible to prevent
all cyber-attacks, IDSs have become essential to secure
Cloud computing environments.
Published by the IEEE Computer Society
TABLE 1
VULNERABILITIES IN UNDERLYING TECHNOLOGIES AND THEIR CONSEQUENCES

Underlying technology: Virtualization
Descriptions:
• Facilitating multi-tenancy.
• Enabling maximum utilization of available resources.
• Sharing resources including physical machines and networks.
• Categories including Full, OS-layer and Hardware-layer virtualizations.
Introduced vulnerabilities and consequences:
• Full access to the resources of a host is obtained by VMs if isolation between the host and the VMs is not properly configured and maintained. (In this case, the VMs escape to the host and seize root privileges.)
• The security of VMs is not guaranteed if their host is compromised.
• Networks are shared between a host and its VMs via a virtual switch. (This offers the VMs a channel to capture the packets transiting over the networks or even to launch ARP poisoning attacks.)
• Computing resources of a host are shared with its VMs. (A guest can launch a DoS attack via a VM by taking up all the possible resources of the host.)

Underlying technology: Internet protocol suite
Descriptions:
• The core component of the Internet.
• Assuring the functioning of internetworking.
• Enabling access to remote computing resources.
Introduced vulnerabilities and consequences:
• Defects in the implementation of the TCP/IP protocol suite have made it the victim of many attacks, including IP spoofing, ARP spoofing, DNS poisoning, RIP attacks, flooding, HTTP session-riding and session hijacking.

Underlying technology: APIs
Descriptions:
• Offering interfaces to manage Cloud services, including service provisioning, orchestration and monitoring.
Introduced vulnerabilities and consequences:
• Weak credentials, authorization checks and input-data validation are employed. (These result in seizing of root privileges.)
• Defects are introduced in the design and implementation of Cloud APIs.
• New security vulnerabilities may be introduced during bug fixing.

Underlying technology: Data center
Descriptions:
• Enabling management and storage of data.
Introduced vulnerabilities and consequences:
• Data are often stored, processed, and transferred in plain text. (This may result in compromise of the confidentiality of data.)
• Residual data may be found after deletion.
• Data of different users (ordinary users and intruders) are mixed together under a weak separation. (This leaves opportunities for an intruder to access the data of the ordinary users.)

With respect to the type of data source involved in detection, IDSs are categorized as Host-based IDSs (HIDSs)
and Network-based IDSs (NIDSs). The former is intended
to detect malicious events on host machines. It is capable
of handling (1) insider attacks, which attempt to gain unauthorized privileges, and (2) user-to-root attacks, which
attempt to gain root privileges on VMs or hosts. In contrast, the latter monitors and flags traffic carrying malicious contents or presenting malicious patterns. This type
of IDS is suited to detecting (1) direct and indirect flooding
attacks, and (2) port scanning attacks.
Although to a certain extent DLPSs can be considered
a type of IDS, they are more tailored for data security.
It is well known that the security of data cannot be completely guaranteed with DLPSs alone. Attackers who gain
control of the host machines can modify the settings of
the DLPSs. The data are then completely disclosed to the
attackers. Moreover, although firewalls can block unwanted network packets according to a pre-defined
rule set, they are unable to detect sophisticated intrusive attempts such as flooding attacks and insider attacks.
IDSs, DLPSs and firewalls are, therefore, not interchangeable security schemes but collaborative ones.
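The limitation of a static rule set can be illustrated with a minimal sketch. The rule set, the rate limit and the addresses below are hypothetical values chosen for illustration only; the point is that every packet of a flood may be individually permitted by the firewall, while a detector that counts packets per source over a time window can still flag the flood.

```python
from collections import Counter

ALLOWED_PORTS = {80, 443}   # hypothetical firewall rule set: permit web traffic only
RATE_LIMIT = 100            # packets per source per window treated as flooding (assumed)


def firewall_permits(packet: dict) -> bool:
    """Static rule check: each packet is judged in isolation."""
    return packet["dst_port"] in ALLOWED_PORTS


def flooding_sources(packets: list[dict]) -> set[str]:
    """IDS-style check: count permitted packets per source over the window."""
    counts = Counter(p["src"] for p in packets if firewall_permits(p))
    return {src for src, n in counts.items() if n > RATE_LIMIT}


window = (
    [{"src": "198.51.100.7", "dst_port": 80}] * 500    # flood of individually valid requests
    + [{"src": "203.0.113.4", "dst_port": 443}] * 20   # ordinary client
    + [{"src": "203.0.113.9", "dst_port": 23}] * 5     # blocked by the rule set
)

flagged = flooding_sources(window)
```

Here the firewall passes all 500 packets of the flood because each one matches an allowed port, whereas the rate-based check reports the flooding source.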
3 CONVENTIONAL INTRUSION DETECTION
SYSTEMS
Conventional IDSs are mostly standalone systems residing on computer networks or host machines. Unlike the
taxonomy used in the previous section, these IDSs can
alternatively be categorized into misuse-based IDSs and
anomaly-based IDSs with respect to the detection mechanism applied.
Misuse-based IDSs enjoy high detection accuracy but
are vulnerable to all zero-day intrusions (i.e., attacks
exploiting previously unknown vulnerabilities of a system) [3].
This is due to the underlying detection mechanism, which
checks for a match with existing attack signatures. It
is common sense that one cannot generate signatures for
unknown attacks. In contrast, anomaly-based IDSs show
promising performance in the detection of zero-day
intrusions [4, 5]. However, they are prone to high false
positive rates.
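The contrast between the two detection mechanisms can be sketched as follows. This is a toy illustration under stated assumptions, not the detectors of [3], [4] or [5]: the signature database, the payloads, the baseline request rates and the three-standard-deviation rule are all hypothetical.

```python
from statistics import mean, stdev

# Hypothetical signature database: byte patterns of already-known attacks.
SIGNATURES = {"sql_injection": b"' OR 1=1", "path_traversal": b"../../"}


def misuse_detect(payload: bytes) -> list[str]:
    """Return the names of all known signatures found in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]


def anomaly_detect(baseline: list[float], observed: float, k: float = 3.0) -> bool:
    """Flag a value lying more than k standard deviations from the baseline mean."""
    mu, sigma = mean(baseline), stdev(baseline)
    return abs(observed - mu) > k * sigma


known = b"GET /login?user=admin' OR 1=1 HTTP/1.1"
novel = b"GET /index.html HTTP/1.1"  # stands in for a zero-day payload: no signature matches

normal_rates = [98.0, 101.0, 99.0, 102.0, 100.0]  # requests per second under normal load
```

With these values, `misuse_detect(known)` reports the known attack while `misuse_detect(novel)` returns nothing, whatever the payload's intent. `anomaly_detect(normal_rates, 450.0)` flags the unusual rate without any signature, but `anomaly_detect(normal_rates, 106.0)` also fires on what may be a harmless burst, which is how false positives arise.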
Moreover, current enterprise networks (such as a
Cloud computing environment) commonly have multiple entry points. This topology is intended to enhance the
accessibility and availability of a network, but it also
leaves vulnerabilities for sophisticated attackers to
exploit using advanced attack techniques, such as cooperative intrusions.
Unlike traditional attack mechanisms, cooperative ones are launched simultaneously through collaboration among the slave machines within a botnet. The
instances of this type of attack are organized in a sophisticated manner to penetrate an enterprise network
through all the entry points. By evenly distributing the
attack traffic volume across the different entry points, cooperative intrusions can evade the detection of traditional standalone IDSs placed right in front of the entry
points. This is because the network traffic behavior at each
entry point does not present a significant deviation from
the normal one. After travelling through the entry points,
the attack instances are directed to one single targeted
service within the enterprise network.
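A short numerical sketch makes this evasion concrete. All figures (alert threshold, normal load, attack volume and number of entry points) are assumed values for illustration.

```python
PER_ENTRY_THRESHOLD = 1000  # packets/s above which one standalone IDS alerts (assumed)
NORMAL_LOAD = 400           # legitimate packets/s seen at each entry point (assumed)
ATTACK_VOLUME = 2400        # total attack packets/s aimed at one target service (assumed)
ENTRY_POINTS = 6


def split_evenly(total: int, entry_points: int) -> list[int]:
    """Spread the attack volume evenly over the entry points."""
    base, extra = divmod(total, entry_points)
    return [base + (1 if i < extra else 0) for i in range(entry_points)]


def standalone_alerts(loads: list[int]) -> list[bool]:
    """Each standalone IDS only sees the traffic at its own entry point."""
    return [load > PER_ENTRY_THRESHOLD for load in loads]


attack_shares = split_evenly(ATTACK_VOLUME, ENTRY_POINTS)
loads = [NORMAL_LOAD + share for share in attack_shares]

# Traffic converging on the targeted service once it has passed the entry points.
attack_at_target = sum(attack_shares)
```

Each entry point carries 800 packets/s, below the 1000 packets/s threshold, so no standalone IDS raises an alert. The same attack sent through a single entry point would produce 2800 packets/s there and be flagged, and the 2400 packets/s that converge on the target are visible only to a party that combines the observations from all entry points.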
Moreover, many of the existing intrusions can occur
collaboratively and simultaneously on nodes throughout
a network. Attackers can initiate automated attacks targeting all vulnerable services within a network simultaneously [6], rather than just focusing on a specific service.
4 THE NEED FOR COLLABORATIVE INTRUSION
DETECTION
Conventional standalone IDSs are susceptible to cooperative attacks and hence unsuitable for collaborative
environments (e.g., a Cloud computing environment). To
defend against this type of attack, Collaborative Intrusion
Detection Systems (CIDSs) have been proposed to correlate suspicious evidence between different IDSs to improve the efficiency of intrusion detection. Unlike
conventional standalone IDSs, a CIDS shares
traffic information among its fellow IDSs located at the
entry points to a local network.
In practice, fellow IDSs within a CIDS can be organized either in a decentralized manner or in a hierarchical manner over a large network. Depending on the mode of organization, these fellow IDSs
communicate either directly with each other or with a central
coordinator.
In decentralized CIDSs, each of the fellow
IDSs can generate a complete attack diagram of the network by aggregating network information received from
the other fellow IDSs. Detection of malicious attempts is undertaken locally at each of the fellow IDSs. An instance of
this type of CIDS can be found in [7]. In hierarchical CIDSs, a coordinator is the central point responsible
for information aggregation. A complete attack diagram
of the network is generated by the central coordinator,
which performs analysis on the aggregated information. A
hierarchical CIDS is presented in [8].
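The two modes of organization can be compared with a small sketch. It is not the design of [7] or [8]; the report format (suspicious source address mapped to an event count) is hypothetical, and the message counts assume a full mesh for the decentralized mode and a star topology for the hierarchical mode.

```python
from collections import Counter

# Hypothetical local observations: suspicious source address -> event count.
local_reports = {
    "ids_a": Counter({"10.0.0.5": 3, "10.0.0.9": 1}),
    "ids_b": Counter({"10.0.0.5": 4}),
    "ids_c": Counter({"10.0.0.5": 2, "10.0.0.7": 6}),
}


def merge(reports) -> Counter:
    """Combine per-IDS counters into one network-wide view."""
    total = Counter()
    for report in reports:
        total.update(report)
    return total


def decentralized_views(reports: dict) -> dict:
    """Every IDS receives its peers' reports and builds the full view locally."""
    return {name: merge(reports.values()) for name in reports}


def hierarchical_view(reports: dict) -> Counter:
    """Only the central coordinator receives the reports and builds the view."""
    return merge(reports.values())


def messages_per_round(n: int, mode: str) -> int:
    """Reports exchanged per round: full mesh (assumed) versus star topology."""
    return n * (n - 1) if mode == "decentralized" else n


views = decentralized_views(local_reports)
central = hierarchical_view(local_reports)
```

Both modes arrive at the same network-wide view. Under these assumptions the difference lies in where the view is held and analyzed, and in how many reports must be exchanged to build it.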
5 IMPERFECTION OF CURRENT COLLABORATIVE
INTRUSION DETECTION SCHEMES
Collaborative intrusion detection systems seem promising
in detecting cooperative intrusions. However, existing
system architectures are not without criticism. In CIDSs,
network data summarization is an important precursor
[9] to reliable intrusion detection. Traditionally, however, network information is collected and processed
by IDS-like software built on a single network device
that deals only with the traffic flowing into and out of that
device. The traffic information that it is capable of
knowing is hence limited. In addition, the computational cost
of network data summarization is proportional to the
amount of traffic that the device experiences. The drawbacks
of such an approach lie in both its accuracy and its efficiency.
In terms of accuracy, without knowledge of the network data from other nodes, any summarization is built
on only a partial and insignificant portion of all
the data available over the entire network. Exchanging and combining these summarizations at a
later stage, without the data itself, naturally yields only a
minimal gain in information.
In terms of efficiency, more computation is
required for a node with denser traffic to process summarization. As summarization is a pure overhead operation, in an ideal environment one would prefer
a node with less traffic to perform the summarization tasks.
Security is another concern with existing CIDSs.
If a CIDS is compromised, the entire Cloud computing environment is in danger. Conventional IDS-like
software on a single network device analyzes and maintains network information on the device, and is not built
with security properties ensuring confidentiality, authentication and integrity. Thus, any CIDS designed by simply
integrating this conventional IDS-like software without
employing proper security enhancements is vulnerable to attacks.
6 A NEW COLLABORATIVE INTRUSION DETECTION
FRAMEWORK
Due to the defects in existing CIDSs, a new sophisticated
CIDS framework is needed to strengthen the
security of Cloud computing systems. However, Cloud
computing presents us with special characteristics. First, with
the large, dense network of nodes forming a cloud environment, it offers unprecedented opportunities, as network data from all nodes can be
made readily available. At the same time, the challenge is
also unprecedented in the sense that one must perform
summarization and combine the results in a distributed
and parallel manner. In addition, as we are now dealing
with all the network data of the entire cloud, where an unknown number of categories can possibly exist, the summarization algorithms will need to expand their categories
in an “on-demand” fashion. That is, they must automatically create new clusters once a new type of traffic is found to be
emerging.
[Figure 1: diagram showing the Internet, firewalls, NIDSs, gateways, a central coordinator, a backup central coordinator, and host machines with HIDSs inside the Cloud computing environment.]
Fig. 1. Framework of a New Collaborative Intrusion Detection System.
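The on-demand expansion of categories described in this section can be illustrated with a deliberately simple stand-in: an online clustering rule that opens a new cluster whenever an observation is farther than a fixed radius from every existing centroid. This is only a sketch of the idea, not a non-parametric Bayesian model; the class name, the radius and the two-dimensional flow features are hypothetical.

```python
import math


class OnDemandSummarizer:
    """Online clustering that opens a new cluster when no existing one is close enough."""

    def __init__(self, radius: float):
        self.radius = radius  # maximum distance to join an existing cluster (assumed)
        self.centroids: list[list[float]] = []
        self.counts: list[int] = []

    def add(self, point: list[float]) -> int:
        """Assign the point to a cluster, creating one on demand, and return its index."""
        if self.centroids:
            distances = [math.dist(point, c) for c in self.centroids]
            nearest = min(range(len(distances)), key=distances.__getitem__)
            if distances[nearest] <= self.radius:
                self.counts[nearest] += 1
                n = self.counts[nearest]
                # Incremental mean update of the matched centroid.
                self.centroids[nearest] = [
                    c + (p - c) / n for c, p in zip(self.centroids[nearest], point)
                ]
                return nearest
        # No cluster is close enough: a new type of traffic gets its own cluster.
        self.centroids.append(list(point))
        self.counts.append(1)
        return len(self.centroids) - 1


# Hypothetical, rescaled flow features: (packet rate, mean packet size).
flows = [[1.0, 1.0], [1.2, 0.9], [5.0, 5.0], [1.1, 1.1], [5.2, 4.9]]
summarizer = OnDemandSummarizer(radius=1.0)
labels = [summarizer.add(flow) for flow in flows]
```

The number of clusters is not fixed in advance: the third flow is far from the first cluster, so a second one is created for it, and later flows of the same kind join that cluster.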
Taking the characteristics of Cloud computing into account, several desirable properties need to be considered
in the design of a new CIDS framework. These properties
include (1) fast detection of various attacks with minimum false positive rates, (2) scalability with the expansion of the Cloud computing system, (3) self-adaption to
the changes of Cloud computing environment and (4)
resistance to compromise [10]. Complying with these requirements, a new CIDS framework is proposed in this
article and illustrated in Fig. 1.
As shown in Fig. 1, HIDSs and NIDSs cooperate to perform intrusion detection at the host and network levels, and
each fellow IDS is armed with signature-based and anomaly-based
detectors [11]. This tactic improves detection
accuracy for both known and unknown attacks. In addition, there are two categories of nodes in this framework,
namely cooperative agents and central coordinators. These
nodes form a collaborative system, whose security is assured via the implementation of the necessary security mechanisms. The functionality of these nodes and the security
mechanisms involved in this CIDS framework are detailed as follows.
6.1 Cooperative Agents
Cooperative agents are the nodes standing at the front line and detecting misuse on host machines or malicious behaviors on
networks. These cooperative agents are equipped with
HIDSs or NIDSs depending on where an
agent sits. If an agent lies on a host machine to detect suspicious events, a HIDS is in place. Otherwise, a NIDS is
employed on a network and monitors network traffic.
In our framework, on the one hand, the cooperative agents
located on host machines are a new type of HIDS, which
requires no instrumentation within VMs and models processes at the VM granularity level (i.e., treating VMs as
individual processes and modeling the behaviors of the processes accordingly). This scheme ensures that our detection
system complies with service level agreements and legal
restrictions, which may not allow an IaaS provider to
modify the client VMs or to perform intensive monitoring and surveillance on them. In addition, it
alleviates the ineffectiveness of NIDSs on encrypted traffic. The host-based cooperative agents inform a central
coordinator once an intrusive behavior or activity is detected.
On the other hand, the cooperative agents residing at the
network level conduct first-tier detection, which provides
defense against generic attacks that present abnormality within
traffic contents and do not involve sophisticated cooperation. The network-based cooperative agents alert a central
coordinator to any suspicious packets detected.
Meanwhile, these agents summarize the network traffic flowing through them in a distributed and parallel manner. In summarization, non-parametric Bayesian methods [12] could be a suitable
machine learning approach to address the challenges (mentioned in the previous section) caused by the nature of Cloud
computing. Network summarization is particularly important for detecting cooperative intrusions, such as
DDoS attacks. The summarizations are periodically sent
to a central coordinator, which is discussed in the next
section.
Taking a leap forward, this parallel summarization is
empowered by Cloud computing itself and based on the
MapReduce framework [13]. The MapReduce framework
allows our CIDS
framework to be integrated into a distributed and parallel architecture by
taking the network-based cooperative agents as worker
nodes and treating the central coordinator of our CIDS as
the master node in the MapReduce framework. The MapReduce framework manages all the details,
ranging from scheduling to information aggregation.
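The division of labor between agents and coordinator can be sketched in the map/shuffle/reduce style. The code below is a plain-Python illustration of the programming model only; it does not use any MapReduce implementation's API, and the flow records, agent names and the choice of "bytes per destination" as the summary are hypothetical.

```python
from collections import defaultdict

# Hypothetical flow records seen by three network-based agents: (source, destination, bytes).
agent_traffic = {
    "agent_1": [("1.1.1.1", "svc", 500), ("2.2.2.2", "svc", 300)],
    "agent_2": [("3.3.3.3", "svc", 450), ("1.1.1.1", "db", 100)],
    "agent_3": [("4.4.4.4", "svc", 480)],
}


def map_phase(records):
    """Run on each agent: emit (destination, bytes) pairs from local traffic."""
    return [(dst, size) for _src, dst, size in records]


def shuffle(mapped_outputs):
    """Group the emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Aggregate for the coordinator: total bytes per destination across all agents."""
    return {key: sum(values) for key, values in groups.items()}


summary = reduce_phase(shuffle(map_phase(records) for records in agent_traffic.values()))
```

No single agent sees more than 800 bytes destined for `svc`, yet the reduced summary shows 1730 bytes converging on it, which is the kind of network-wide picture the coordinator needs.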
6.2 Central Coordinator
Finally, the network traffic aggregation is performed on
the central coordinator, which generates a complete attack diagram of the entire network (i.e., the Cloud computing system). Based on this aggregation, the central
coordinator is capable of capturing those sophisticated cooperative intrusions that are missed by the
individual network-based cooperative agents. Once any
intrusive behaviors (including those identified on the cooperative agents and the central coordinator) are detected, the central coordinator raises alerts to a system administrator.
It is worth noting that a hybrid detector combining misuse-based and anomaly-based detection mechanisms can help reduce the time cost of detection and
enhance the detection accuracy for known and unknown
attacks.
6.3 Security Mechanisms Involved
To ensure the CIDS is resistant to compromise, authentication and encryption as well as integrity checks are applied. As the CIDS works in a 24/7 mode, energy-efficient
group key distribution schemes are preferred for
secure key distribution and node authentication [14, 15].
These schemes provide a strong security mechanism for
updating the group key when a new node joins, a node leaves or a node is compromised. They are also resilient to
collusion attacks, in which multiple nodes are compromised
and coordinated for attacks. Last but not least, a backup
central coordinator runs alongside the main coordinator to prevent a single point of failure. The roles of the
coordinators can be exchanged according to actual requirements and conditions of the network.
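A minimal sketch of authenticated, integrity-protected reporting between an agent and a coordinator is given below. It assumes that both parties already hold a shared group key; the key value, report fields and function names are placeholders, and the key distribution schemes of [14, 15] are not implemented here. Encryption of the report body is likewise omitted.

```python
import hashlib
import hmac
import json

# Placeholder key: a deployment would obtain it from the group key distribution scheme.
GROUP_KEY = b"example-group-key"


def sign_report(report: dict, key: bytes) -> dict:
    """Serialize a report and attach an HMAC-SHA256 tag."""
    body = json.dumps(report, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, body, hashlib.sha256).hexdigest()
    return {"body": body.decode("utf-8"), "tag": tag}


def verify_report(message: dict, key: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = hmac.new(key, message["body"].encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["tag"])


message = sign_report({"agent": "nids-1", "alert": "suspicious packets", "count": 12}, GROUP_KEY)

# An attacker who alters the report in transit cannot produce a matching tag.
tampered = dict(message, body=message["body"].replace("12", "0"))
```

The coordinator accepts `message`, rejects `tampered`, and also rejects any report signed with a key other than the current group key, which is why updating the group key when a node leaves or is compromised matters.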
7 CONCLUSION
Following the framework proposed in this article, a CIDS
will be equipped with the highlighted desirable properties, including (1) fast detection of various attacks with
minimum false positive rates, (2) scalability with the expansion of the Cloud computing system, (3) self-adaption
to the changes of the Cloud computing environment and (4)
resistance to compromise. The application of the CIDS
alongside other security schemes offers more comprehensive
protection to a Cloud computing environment. This in
turn strengthens the security and privacy of big data. The
detailed implementation and application on different
Cloud computing systems are interesting topics to explore in future studies.
REFERENCES
[1] Modi, C., Patel, D., Borisaniya, B., Patel, A., and Rajarajan, M.: ‘A survey on security issues and solutions at different layers of Cloud computing’, The Journal of Supercomputing, 2013, 63, (2), pp. 561-592.
[2] Hu, J., Khalil, I., Han, S., and Mahmood, A.: ‘Seamless integration of dependability and security concepts in SOA: A feedback control system based framework and taxonomy’, Journal of Network and Computer Applications, 2011, 34, (4), pp. 1150-1159.
[3] Meng, Y., Li, W., and Kwok, L.-F.: ‘Towards adaptive character frequency-based exclusive signature matching scheme and its applications in distributed intrusion detection’, Computer Networks, 2013, 57, (17), pp. 3630-3640.
[4] Creech, G., and Jiankun, H.: ‘A Semantic Approach to Host-Based Intrusion Detection Systems Using Contiguous and Discontiguous System Call Patterns’, Computers, IEEE Transactions on, 2014, 63, (4), pp. 807-819.
[5] Zhiyuan, T., Jamdagni, A., Xiangjian, H., Nanda, P., and Ren Ping, L.: ‘A System for Denial-of-Service Attack Detection Based on Multivariate Correlation Analysis’, Parallel and Distributed Systems, IEEE Transactions on, 2014, 25, (2), pp. 447-456.
[6] Savage, S.: ‘Internet outbreaks: epidemiology and defenses’, in Keynote Address, Internet Society Symp. Network and Distributed System Security (NDSS 05), 2005.
[7] Ram, S.: ‘Secure cloud computing based on mutual intrusion detection system’, International journal of computer application, 2012, 2, (1), pp. 57-67.
[8] Dhage, S.N., and Meshram, B.: ‘Intrusion detection system in cloud computing environment’, International Journal of Cloud Computing, 2012, 1, (2), pp. 261-282.
[9] Hoplaros, D., Tari, Z., and Khalil, I.: ‘Data summarization for network traffic monitoring’, Journal of Network and Computer Applications, 2014, 37, pp. 194-205.
[10] Patel, A., Taghavi, M., Bakhtiyari, K., and Celestino Júnior, J.: ‘An intrusion detection and prevention system in cloud computing: A systematic review’, Journal of Network and Computer Applications, 2013, 36, (1), pp. 25-41.
[11] Jones, A.K., and Sielken, R.S.: ‘Computer system intrusion detection: a survey’, http://www.cs.virginia.edu/_jones/IDSresearch/Documents/jones-sielken-survey-v11.pdf, 2000.
[12] Hjort, N.L., et al. (eds.): ‘Bayesian nonparametrics’, Vol. 28, Cambridge University Press, 2010.
[13] Dean, J., and Ghemawat, S.: ‘MapReduce: simplified data processing on large clusters’, Communications of the ACM, 2008, 51, (1), pp. 107-113.
[14] Tian, B., Han, S., Hu, J., and Dillon, T.: ‘A mutual-healing key distribution scheme in wireless sensor networks’, Journal of Network and Computer Applications, 2011, 34, (1), pp. 80-88.
[15] Tian, B., Han, S., Parvin, S., Hu, J., and Das, S.: ‘Self-Healing Key Distribution Schemes for Wireless Networks: A Survey’, The Computer Journal, 2011, 54, (4), pp. 549-569.