IEEE CLOUD COMPUTING MAGAZINE, MANUSCRIPT ID

Enhancing Big Data Security with Collaborative Intrusion Detection

Zhiyuan Tan, Member, IEEE, Upasana T. Nagar, Xiangjian He, Senior Member, IEEE, Priyadarsi Nanda, Senior Member, IEEE, Ren Ping Liu, Senior Member, IEEE, Song Wang, Senior Member, IEEE, and Jiankun Hu, Senior Member, IEEE

Z. Tan, U. Nagar, X. He and P. Nanda are with the School of Computing and Communications, University of Technology, Sydney, NSW, 2007, Australia.
R.P. Liu is with the Computational Informatics, Commonwealth Scientific and Industrial Research Organisation, NSW, 1710, Australia.
S. Wang is with the Department of Electronic Engineering, La Trobe University, Melbourne, VIC 3083, Australia.
J. Hu is with the School of Engineering and Information Technology, University of New South Wales, Canberra, ACT 2600, Australia.

Abstract: As an asset of Cloud computing, big data is now changing our business models and applications. Rich information residing in big data is turning business decision making into a data-driven process. The security and privacy of big data, however, have always been a concern for the owners of the data. They can be strengthened by securing Cloud computing environments, which requires a comprehensive security solution spanning attack prevention to attack detection. Intrusion Detection Systems (IDSs) are playing an increasingly important role among network security schemes. In this article, we study the vulnerabilities in Cloud computing and propose a collaborative IDS framework to enhance the security and privacy of big data.

Index Terms: Big data, Cloud computing, Collaborative intrusion detection, Security and privacy

1 INTRODUCTION

Cloud computing has gained popularity since the late 2000s. It delivers a flexible network computing model that lets organizations adjust their IT capabilities on the fly with minimal investment in IT infrastructure and maintenance. Cloud computing allows an organization to pay for only the services it uses and to focus on its core business instead of handling technical issues. In the context of Cloud computing, network-accessible resources are defined as services.
These services are delivered via the typical Cloud computing service models: (1) Infrastructure-as-a-Service (IaaS), which offers storage, computation and network capabilities to service subscribers through Virtual Machines (VMs); (2) Platform-as-a-Service (PaaS), which provides an environment for software application development and hosts a client’s applications in the PaaS provider’s computing infrastructure; and (3) Software-as-a-Service (SaaS), which delivers on-demand software services over a computer network and eliminates the cost of software purchasing and in-house maintenance.

These technical and business advantages, however, do not come without cost. The security vulnerabilities inherited from the underlying technologies (i.e., virtualization, the Internet protocol suite, Application Programming Interfaces (APIs) and data centers) have been a major obstacle preventing Cloud computing from being widely adopted in many critical business applications [1]. Generally speaking, Cloud computing is a Service-Oriented Architecture (SOA). A comprehensive dependability and security taxonomy framework revealing the complex security cause-implication relations in this architecture is provided in [2].
Table 1, shown below, summarizes the Cloud computing vulnerabilities in relation to the underlying technologies. These vulnerabilities leave loopholes for cyber intruders to exploit Cloud computing services and pose threats to the security and privacy of big data. To address these security issues, a variety of security schemes have been proposed over the past years. These schemes include encryption, authentication, access control, firewalls, IDSs and Data Leak Prevention Systems (DLPSs). In this complex computing environment, however, one can hardly find a single scheme that fits all cases. These schemes should be integrated and cooperate as a comprehensive line of defense.

TABLE 1
VULNERABILITIES IN UNDERLYING TECHNOLOGIES AND THEIR CONSEQUENCES

Underlying technology: Virtualization
Descriptions: Facilitating multi-tenancy. Enabling maximum utilization of available resources. Sharing resources including physical machines and networks. Categories including Full, OS-layer and Hardware-layer virtualizations.
Introduced vulnerabilities and consequences: Full access to the resources of a host is obtained by VMs if isolation between the host and the VMs is not properly configured and maintained. (In this case, the VMs escape to the host and seize root privileges.) The security of VMs is not guaranteed if their host is compromised. Networks are shared between a host and its VMs via a virtual switch. (This offers the VMs a channel to capture the packets transiting over the networks or even to launch ARP poisoning attacks.) Computing resources of a host are shared with its VMs. (A guest can launch a DoS attack via a VM by taking up all the possible resources of the host.)

Underlying technology: Internet protocol suite
Descriptions: The core component of the Internet. Assuring the functioning of internetworking. Enabling access to remote computing resources.
Introduced vulnerabilities and consequences: Defects in the implementation of the TCP/IP protocol suite have made it the victim of many attacks, including IP spoofing, ARP spoofing, DNS poisoning, RIP attacks, flooding, HTTP session riding and session hijacking.

Underlying technology: APIs
Descriptions: Offering interfaces to manage Cloud services, including service provisioning, orchestration and monitoring.
Introduced vulnerabilities and consequences: Weak credentials, weak authorization checks and weak input-data validation are employed. (These result in the seizing of root privileges.) Defects are introduced in the design and implementation of Cloud APIs. New security vulnerabilities may be introduced during bug fixing.

Underlying technology: Data center
Descriptions: Enabling management and storage of data.
Introduced vulnerabilities and consequences: Data are often stored, processed and transferred in plain text. (This may compromise the confidentiality of the data.) Residual data may be found after deletion. Data of different users (ordinary users and intruders) are mixed together under weak separation. (This leaves opportunities for an intruder to access the data of ordinary users.)

Published by the IEEE Computer Society

2 HOW INTRUSION DETECTION CAN HELP SECURE CLOUD COMPUTING

IDSs are playing an increasingly important role among security schemes. Their goal is to provide a layer of defense against malicious uses of computing systems by sensing attacks and raising alerts on them, in particular attacks that exploit the vulnerabilities in virtualization, the Internet protocol suite and APIs. As it is impossible to prevent all cyber-attacks, IDSs have become essential to securing Cloud computing environments.

With respect to the type of data source involved in detection, IDSs are categorized as Host-based IDSs (HIDSs) and Network-based IDSs (NIDSs). The former are intended to detect malicious events on host machines. They are capable of handling (1) insider attacks, which attempt to gain unauthorized privileges, and (2) user-to-root attacks, which attempt to gain root privileges on VMs or hosts. In contrast, the latter monitor network traffic and flag traffic carrying malicious contents or presenting malicious patterns.
This type of IDS is suited to detecting (1) direct and indirect flooding attacks and (2) port scanning attacks, among others.

Although DLPSs can to a certain extent be considered a type of IDS, they are more tailored to data security. It is well known that the security of data cannot be completely guaranteed with DLPSs alone. Attackers who gain control of the host machines can modify the settings of the DLPSs, after which the data are completely disclosed to the attackers. Moreover, although firewalls can block unwanted network packets according to a pre-defined rule set, they are unable to detect sophisticated intrusive attempts such as flooding attacks and insider attacks. IDSs, DLPSs and firewalls are, therefore, not interchangeable security schemes but collaborative ones.

3 CONVENTIONAL INTRUSION DETECTION SYSTEMS

Conventional IDSs are mostly standalone systems residing on computer networks or host machines. As an alternative to the taxonomy used in the previous section, these IDSs can be categorized into misuse-based IDSs and anomaly-based IDSs with respect to the detection mechanism applied. Misuse-based IDSs enjoy high detection accuracy but are vulnerable to all zero-day intrusions (i.e., attacks on previously unknown system vulnerabilities) [3]. This is due to the underlying detection mechanism, which checks for a match with existing attack signatures. It is common sense that one cannot generate signatures for unknown attacks. In contrast, anomaly-based IDSs show promising performance in the detection of zero-day intrusions [4, 5]. However, they are prone to high false positive rates.

Meanwhile, current enterprise networks (such as a Cloud computing environment) commonly have multiple entry points.
This topology is intended to enhance the accessibility and availability of a network, but it leaves security vulnerabilities for sophisticated attackers to exploit using advanced attack techniques such as cooperative intrusions. Unlike traditional attacks, cooperative intrusions are launched simultaneously through collaboration among the slave machines within a botnet. The instances of this type of attack are organized in a sophisticated manner to penetrate an enterprise network through all the entry points. By evenly distributing the attack traffic volume across the different entry points, cooperative intrusions can evade detection by traditional standalone IDSs placed right in front of the entry points. This is because the network traffic behavior at each entry point does not present a significant deviation from the normal one. After travelling through the entry points, the attack instances are directed to a single targeted service within the enterprise network. Moreover, many existing intrusions can occur collaboratively and simultaneously on nodes throughout a network. Attackers can initiate automated attacks targeting all vulnerable services within a network simultaneously [6], rather than just focusing on a specific service.

4 THE NEED FOR COLLABORATIVE INTRUSION DETECTION

Conventional standalone IDSs are susceptible to cooperative attacks and hence unsuitable for collaborative environments (e.g., a Cloud computing environment). To defend against this type of attack, Collaborative Intrusion Detection Systems (CIDSs) have been proposed to correlate suspicious evidence between different IDSs and thereby improve the efficiency of intrusion detection. Unlike conventional standalone IDSs, a CIDS shares traffic information among its fellow IDSs located at the entry points to a local network.
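
To make the evasion argument concrete, the following Python sketch contrasts a standalone check at each entry point with a correlated check over the shared reports. It is only an illustration under stated assumptions: the gateway names, service names, traffic rates and limits are all hypothetical, and a simple rate threshold stands in for whatever detector a real IDS would use.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical report from one entry point: packets per second by destination service.
Report = Dict[str, int]


def local_alert(report: Report, local_limit: int) -> bool:
    """Standalone view: flag only if this entry point's total rate exceeds its limit."""
    return sum(report.values()) > local_limit


def correlated_alerts(reports: Dict[str, Report], service_limits: Dict[str, int]) -> List[str]:
    """Collaborative view: sum the rate per destination service over all entry points."""
    totals: Counter = Counter()
    for report in reports.values():
        totals.update(report)  # adds the per-service rates of this entry point
    return sorted(s for s, rate in totals.items() if rate > service_limits[s])


# Assumed scenario: the "db" service normally sees little traffic, and the attack
# adds the same modest amount of "db" traffic at every entry point.
reports = {
    "gateway-a": {"web": 600, "mail": 300, "db": 450},
    "gateway-b": {"web": 550, "mail": 350, "db": 450},
    "gateway-c": {"web": 500, "mail": 400, "db": 450},
    "gateway-d": {"web": 650, "mail": 250, "db": 450},
}
LOCAL_LIMIT = 1500  # hypothetical per-entry-point limit
SERVICE_LIMITS = {"web": 3000, "mail": 2000, "db": 400}  # hypothetical network-wide limits

local_flags = [name for name, r in reports.items() if local_alert(r, LOCAL_LIMIT)]
network_flags = correlated_alerts(reports, SERVICE_LIMITS)
print("standalone alerts:", local_flags)
print("correlated alerts:", network_flags)
```

In this assumed scenario no single entry point exceeds its own limit, yet the summed traffic towards one service stands out once the reports are combined, which is the kind of evidence a CIDS is meant to correlate.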
In practice, fellow IDSs within a CIDS can be organized over a large network in either a decentralized or a hierarchical manner. Depending on the mode of organization, these fellow IDSs communicate directly with each other or with a central coordinator. In a decentralized CIDS, each fellow IDS can generate a complete attack diagram of the network by aggregating network information received from the other fellow IDSs. Detection of malicious attempts is undertaken locally at each of the fellow IDSs. An instance of this type of CIDS can be found in [7]. In a hierarchical CIDS, a coordinator is the central point responsible for information aggregation. A complete attack diagram of the network is generated by the central coordinator, which performs analysis on the aggregated information. A hierarchical CIDS is presented in [8].

5 IMPERFECTION OF CURRENT COLLABORATIVE INTRUSION DETECTION SCHEMES

Collaborative intrusion detection systems seem promising in detecting cooperative intrusions. However, existing system architectures are not without criticism. In CIDSs, network data summarization is an important precursor [9] to reliable intrusion detection. Traditionally, however, network information is collected and processed by IDS-like software built on a single network device that deals only with the traffic flowing in and out of that device. The traffic information available to it is hence limited. In addition, the computational cost of network data summarization is proportional to the amount of traffic that the device experiences. The drawbacks of such an approach lie in both its accuracy and its efficiency.

In terms of accuracy, without knowledge of the network data from other nodes, any summarization is specific to a partial and insignificant portion of all the data available over the entire network.
Exchanging and combining these summarizations at a later stage, without the data itself, yields only a minimal gain in information. In terms of efficiency, a node with denser traffic requires more computation to perform summarization. As summarization is a pure overhead operation, in an ideal environment one would prefer summarization tasks to be performed by nodes that have less traffic to process.

Security is another concern with existing CIDSs. If a CIDS is compromised, the entire Cloud computing environment is in danger. Conventional IDS-like software on a single network device analyzes and maintains network information on the device, and is not built with security properties ensuring confidentiality, authentication and integrity. Thus, any CIDS designed by simply integrating this conventional IDS-like software without proper security enhancements is vulnerable to attacks.

6 A NEW COLLABORATIVE INTRUSION DETECTION FRAMEWORK

Due to the defects in existing CIDSs, a new CIDS framework is needed to strengthen the security of Cloud computing systems. Cloud computing, however, presents special characteristics. First, with a large, dense network of nodes forming a Cloud environment, it offers an unprecedented opportunity: network data from all nodes can be made readily available. At the same time, the challenge is also unprecedented in the sense that one must perform summarization and combine the results in a distributed and parallel manner. In addition, as we are now dealing with all the network data of the entire Cloud, where an unknown number of traffic categories can exist, the summarization algorithm needs to expand its categories in an “on-demand” fashion, that is, to automatically create a new cluster once it discovers that a new type of traffic is emerging.

Fig. 1. Framework of a New Collaborative Intrusion Detection System. (Figure labels: Internet, Firewall, NIDS, Gateway, Central Coordinator, Backup Central Coordinator, Host Machines, HIDS, Cloud Computing Environment.)

Taking the characteristics of Cloud computing into account, several desirable properties need to be considered in the design of a new CIDS framework. These properties include (1) fast detection of various attacks with minimum false positive rates, (2) scalability with the expansion of the Cloud computing system, (3) self-adaptation to changes in the Cloud computing environment and (4) resistance to compromise [10]. Complying with these requirements, a new CIDS framework is proposed in this article and illustrated in Fig. 1.

As shown in Fig. 1, HIDSs and NIDSs cooperate to perform intrusion detection at the host and network levels, and each fellow IDS is armed with signature-based and anomaly-based detectors [11]. This tactic improves detection accuracy for both known and unknown attacks. In addition, there are two categories of nodes in this framework, namely the cooperative agent and the central coordinator. These nodes form a collaborative system, whose security is assured via the implementation of necessary security mechanisms. The functionality of these nodes and the security mechanisms involved in this CIDS framework are detailed as follows.

6.1 Cooperative Agents

Cooperative agents are the nodes standing at the front line, detecting misuse on host machines or malicious behavior on networks. They are equipped with HIDSs or NIDSs depending on where an agent sits. If an agent resides on a host machine to detect suspicious events, a HIDS is in place. Otherwise, a NIDS is deployed on a network to monitor network traffic.
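
As a rough illustration of how a cooperative agent might combine its signature-based and anomaly-based detectors, the following Python sketch checks a payload against known patterns first and falls back to a statistical test only when no signature matches. Everything concrete here is an assumption made for the example: the two signatures, the use of payload length as the only feature, the z-score test and its limit of 3.0 are hypothetical choices, not the detectors proposed in this article.

```python
from statistics import mean, pstdev
from typing import Dict, Optional, Sequence

# Hypothetical byte patterns standing in for a real signature database.
SIGNATURES: Dict[str, bytes] = {
    "sql-injection": b"' OR '1'='1",
    "path-traversal": b"../../",
}


def match_signature(payload: bytes) -> Optional[str]:
    """Misuse-based step: return the name of the first known pattern found, if any."""
    for name, pattern in SIGNATURES.items():
        if pattern in payload:
            return name
    return None


def is_anomalous(value: float, history: Sequence[float], z_limit: float = 3.0) -> bool:
    """Anomaly-based step: flag values far from the historical mean (z-score test)."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > z_limit


def classify(payload: bytes, size_history: Sequence[float]) -> str:
    """Hybrid detector: known attacks are labelled by signature, the rest by deviation."""
    name = match_signature(payload)
    if name is not None:
        return f"misuse:{name}"
    if is_anomalous(len(payload), size_history):
        return "anomaly"
    return "normal"


# Assumed history of normal payload sizes in bytes.
history = [100, 110, 90, 105, 95]
print(classify(b"GET /item?id=1' OR '1'='1", history))
print(classify(b"A" * 500, history))
print(classify(b"B" * 102, history))
```

Running the signature check first keeps the cheap, precise test on the common path, while the anomaly check gives some coverage for payloads that match no known pattern.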
In our framework, on the one hand, the cooperative agents located on host machines are a new type of HIDS, which requires no instrumentation within VMs and models processes at the VM granularity level (i.e., treating VMs as individual processes and modeling the behaviors of those processes accordingly). This scheme ensures that our detection system complies with service level agreements and legal restrictions, which may not allow an IaaS provider to make any amendment to the client VMs or to perform intensive monitoring and surveillance on them. In addition, it alleviates the ineffectiveness of NIDSs on encrypted traffic. The host-based cooperative agents inform a central coordinator once an intrusive behavior or activity is detected.

On the other hand, the cooperative agents residing at the network level conduct first-tier detection, which provides defense against generic attacks that present abnormality within traffic contents and do not involve sophisticated cooperation. The network-based cooperative agents alert a central coordinator to any suspicious packets detected. Meanwhile, these agents summarize the network traffic flowing through them in a distributed and parallel manner. For summarization, non-parametric Bayesian methods [12] could be a suitable machine learning approach to address the challenges (mentioned above) posed by the nature of Cloud computing. Network summarization is particularly important for detecting cooperative intrusions, such as DDoS attacks. The summarizations are periodically sent to a central coordinator, which is discussed in the next section. Taking a leap forward, this parallel summarization is empowered by Cloud computing itself and based on the MapReduce framework [13].
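
The division of labor between agents and coordinator can be sketched in the map and reduce style as follows. This is a minimal single-process Python illustration, not a deployment on an actual MapReduce cluster: the packet record layout, the addresses and the choice of summary (bytes per destination service) are hypothetical, and a plain byte count stands in for the non-parametric Bayesian summarization suggested above.

```python
from collections import Counter
from functools import reduce
from typing import Iterable, List, Tuple

# Hypothetical packet record: (source IP, destination service, size in bytes).
Packet = Tuple[str, str, int]


def map_summarize(packets: Iterable[Packet]) -> Counter:
    """Map step, run by each network-based agent: bytes per destination service."""
    summary: Counter = Counter()
    for _src, dst, size in packets:
        summary[dst] += size
    return summary


def reduce_merge(left: Counter, right: Counter) -> Counter:
    """Reduce step, run by the central coordinator: merge two partial summaries."""
    return left + right


# Assumed traffic observed by three agents during one reporting period.
agent_traffic: List[List[Packet]] = [
    [("10.0.0.1", "web", 500), ("10.0.0.2", "db", 200)],
    [("10.0.0.3", "db", 300)],
    [("10.0.0.4", "web", 100), ("10.0.0.5", "db", 400)],
]

partials = list(map(map_summarize, agent_traffic))       # one summary per agent
network_view = reduce(reduce_merge, partials, Counter())  # network-wide summary
print(dict(network_view))
```

Because the merge is associative and commutative, the partial summaries can be combined in any order, which is what allows the summarization to be spread over many agents.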
The MapReduce framework allows our CIDS framework to be integrated seamlessly into a distributed and parallel architecture by taking the network-based cooperative agents as slave nodes and treating the central coordinator of our CIDS as the master node of the MapReduce framework. The MapReduce framework manages all the details, ranging from scheduling to information aggregation.

6.2 Central Coordinator

Finally, network traffic aggregation is performed on the central coordinator, which generates a complete attack diagram of the entire network (i.e., the Cloud computing system). Based on this aggregation, the central coordinator is capable of capturing the sophisticated cooperative intrusions that are missed at the individual network-based cooperative agents. Once any intrusive behaviors (including those identified on the cooperative agents and on the central coordinator) are detected, the central coordinator raises alerts to a system administrator. It is worth noting that a hybrid detector combining misuse-based and anomaly-based detection mechanisms can help reduce the time cost of detection and enhance the detection accuracy for known and unknown attacks.

6.3 Security Mechanisms Involved

To ensure that the CIDS is resistant to compromise, authentication, encryption and integrity checks are applied. As the CIDS works in a 24/7 mode, energy-efficient group key distribution schemes are preferred for secure key distribution and node authentication [14, 15]. These schemes provide a strong security mechanism for updating the group key when a new node joins, a node leaves or a node is compromised. They are also resilient to collusion attacks, in which multiple nodes are compromised and coordinated for attacks. Last but not least, a backup central coordinator runs alongside the main coordinator to prevent a single point of failure.
The roles of the coordinators can be exchanged according to the actual requirements and conditions of the network.

7 CONCLUSION

Following the framework proposed in this article, a CIDS will be equipped with the highlighted desirable properties, including (1) fast detection of various attacks with minimum false positive rates, (2) scalability with the expansion of the Cloud computing system, (3) self-adaptation to changes in the Cloud computing environment and (4) resistance to compromise. The application of the CIDS alongside other security schemes offers more practical protection to a Cloud computing environment. This in turn strengthens the security and privacy of big data. The detailed implementation and application on different Cloud computing systems are interesting topics to explore in future studies.

REFERENCES

[1] Modi, C., Patel, D., Borisaniya, B., Patel, A., and Rajarajan, M.: ‘A survey on security issues and solutions at different layers of Cloud computing’, The Journal of Supercomputing, 2013, 63, (2), pp. 561-592.
[2] Hu, J., Khalil, I., Han, S., and Mahmood, A.: ‘Seamless integration of dependability and security concepts in SOA: A feedback control system based framework and taxonomy’, Journal of Network and Computer Applications, 2011, 34, (4), pp. 1150-1159.
[3] Meng, Y., Li, W., and Kwok, L.-F.: ‘Towards adaptive character frequency-based exclusive signature matching scheme and its applications in distributed intrusion detection’, Computer Networks, 2013, 57, (17), pp. 3630-3640.
[4] Creech, G., and Jiankun, H.: ‘A Semantic Approach to Host-Based Intrusion Detection Systems Using Contiguous and Discontiguous System Call Patterns’, Computers, IEEE Transactions on, 2014, 63, (4), pp. 807-819.
[5] Zhiyuan, T., Jamdagni, A., Xiangjian, H., Nanda, P., and Ren Ping, L.: ‘A System for Denial-of-Service Attack Detection Based on Multivariate Correlation Analysis’, Parallel and Distributed Systems, IEEE Transactions on, 2014, 25, (2), pp. 447-456.
[6] Savage, S.: ‘Internet outbreaks: epidemiology and defenses’, in Keynote Address, Internet Society Symp. Network and Distributed System Security (NDSS 05), 2005.
[7] Ram, S.: ‘Secure cloud computing based on mutual intrusion detection system’, International journal of computer application, 2012, 2, (1), pp. 57-67.
[8] Dhage, S.N., and Meshram, B.: ‘Intrusion detection system in cloud computing environment’, International Journal of Cloud Computing, 2012, 1, (2), pp. 261-282.
[9] Hoplaros, D., Tari, Z., and Khalil, I.: ‘Data summarization for network traffic monitoring’, Journal of Network and Computer Applications, 2014, 37, pp. 194-205.
[10] Patel, A., Taghavi, M., Bakhtiyari, K., and Celestino Júnior, J.: ‘An intrusion detection and prevention system in cloud computing: A systematic review’, Journal of Network and Computer Applications, 2013, 36, (1), pp. 25-41.
[11] Jones, A.K., and Sielken, R.S.: ‘Computer system intrusion detection: a survey’, http://www.cs.virginia.edu/_jones/IDSresearch/Documents/jones-sielken-survey-v11.pdf, 2000.
[12] Hjort, N.L., et al., eds.: ‘Bayesian nonparametrics’, Vol. 28, Cambridge University Press, 2010.
[13] Dean, J., and Ghemawat, S.: ‘MapReduce: simplified data processing on large clusters’, Communications of the ACM, 2008, 51, (1), pp. 107-113.
[14] Tian, B., Han, S., Hu, J., and Dillon, T.: ‘A mutual-healing key distribution scheme in wireless sensor networks’, Journal of Network and Computer Applications, 2011, 34, (1), pp. 80-88.
[15] Tian, B., Han, S., Parvin, S., Hu, J., and Das, S.: ‘Self-Healing Key Distribution Schemes for Wireless Networks: A Survey’, The Computer Journal, 2011, 54, (4), pp. 549-569.