Hadoop Solution Brief
Nutanix 2U High-Density Block

What is Hadoop?

The Apache™ Hadoop™ project develops open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

Why Virtualize Hadoop Nodes?

Boost Hardware Utilization
Bare-metal Hadoop deployments average 10-20% CPU utilization, a major waste of hardware resources and datacenter space. Virtualizing Hadoop allows for better hardware utilization and flexibility.

Dynamically Grow and Shrink Capacity
Expand and contract MapReduce data-crunching capacity by adding and removing nodes based on load. Avoid the perils of physical server inflexibility.

Allow DevOps & IT Ops to Live in Harmony
Big Data scientists demand performance, reliability, and a flexible scale model. IT Ops relies on virtualization to tame server sprawl, increase utilization, encapsulate workloads, manage capacity growth, and mitigate disruptive hardware downtime. By virtualizing Hadoop, data scientists and IT Ops both achieve their objectives while preserving autonomy and independence for their respective charters.

Isolate Jobs: Make Hadoop and Enterprise Apps Play Nice
Buggy MapReduce jobs can quickly saturate hardware resources, creating havoc for the remaining jobs in the queue. Virtualizing Hadoop clusters encapsulates and isolates MapReduce jobs from other important sorting runs.
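
For readers new to the MapReduce jobs mentioned throughout this brief, the following is a minimal single-process sketch of the programming model, assuming a word-count task. It is not Hadoop code and uses no Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for illustration. In a real cluster the map and reduce steps would run in parallel on many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence (hypothetical driver)."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big clusters"]))
```

Because each map call touches only one line and each reduce call touches only one key, the work can be split across machines, which is what lets capacity grow or shrink by adding or removing nodes.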

How will Hadoop help my business?

[Figure: data sources, including contracts, enterprise "dark data", partner, employee, customer, and supplier records, credit, weather, transactions, monitoring, sensor, commercial, public, population, economic, industry, sentiment, social media, and network data.]

Everyone knows that data is growing exponentially. What's not so clear is how to unlock the value it holds. Hadoop is the answer. Developed by architect Doug Cutting, Hadoop is open-source software that enables distributed parallel processing of huge amounts of data across inexpensive commodity servers. With Hadoop, no data is too big. And in today's hyper-connected world, where people and businesses are creating more and more data every day, Hadoop's ability to grow virtually without limits means businesses and organizations can now unlock potential value from all their data.

Why Virtualize Hadoop Nodes? (continued)

Batch Scheduling & Stacked Workloads
Allow all workloads and applications to co-exist, e.g. Hadoop, virtual desktops, and servers. Schedule MapReduce job runs during off-peak hours to take advantage of idle night-time and weekend capacity that would otherwise go to waste.

New Hadoop Economics
Bare-metal implementations are expensive, and their size and scale are difficult to predict. The downtime and underutilized CPU that come with physical servers can jeopardize project viability. Virtualizing Hadoop reduces complexity and supports sophisticated projects with a scale-out, grow-as-you-go model that is a good fit for Big Data projects.

Nutanix & Hadoop = Enterprise-Grade Big Data

Business Continuity and Data Protection: As Hadoop clusters gradually become mission critical, enterprise-grade data management features become essential to successful big data projects. Nutanix has built-in thin provisioning, snapshots, clones, compression, and array-level replication that complement Hadoop projects as they mature from PoC to production.

High Availability: Distributed file system name nodes are single points of failure. Nutanix has built-in network RAID and high-availability features to secure all pieces of Hadoop data, including the file system name node.

Change Management: Traditional deployment models make it very difficult to run multiple MapReduce jobs side by side. Nutanix allows you to create parallel MapReduce branches that can execute simultaneously on the same production data using Nutanix Snapshots and FastClone capabilities.

Combine Hadoop with Your Data Sources

[Figure: raw data streams (sensor data, blogs, emails, web data, docs, PDFs, images, videos) and operational systems (CRM, SCM, ERP, legacy, 3rd party); file copy; extract, transform; extract, transform, load; integrated data warehouse; developer environments and business intelligence tools; programmers, data scientists, power users, business users.]

[Figure: sort throughput (MB/s) plotted against rack space (U); labeled series include HP, Oracle, and SGI.]

Blazing Fast Performance
Start with 2,000 MB/s of sequential throughput in a compact 2U 4-node cluster. A TeraSort benchmark yields 250 MB/s in the same 2U cluster.

Linear Scale-out Performance
Throughput scales linearly as more blocks are stacked, and is 2x that of the closest bare-metal equivalents.

Nutanix Complete Cluster at a Glance

Nutanix SAN-Free architecture: Delivers the best of both worlds: local-bus-speed performance for virtualized Hadoop workloads with all the benefits of shared storage.

Scale with Your Business, One Node at a Time

Time-sliced clusters: Like Amazon EC2, Nutanix clusters coupled with VMware vCloud Director can run Hadoop and other virtualized workloads on the same shared physical hardware.

High-density Hadoop: Nutanix uses a hyperscale server architecture in which 8 Intel sockets fit in a single 2U block spread over 4 motherboards.
Coupled with data archiving and compression, Nutanix can shrink Hadoop hardware footprints by up to 4x and reduce completion times by 5x. Large clusters are set up in hours, not months.

Software Defined Data Tiering and Compression: Place the most frequently accessed data on the best-performing Flash tier. Compress cold data on conventional high-capacity spinning disks. Maximize the performance/cost advantages of every storage medium (PCIe Flash, SSDs, HDDs).

Flash SSDs for fast NoSQL: Business reports created from roll-up summaries posted to NoSQL databases like HBase are typically memory and I/O constrained. Nutanix direct-attached Flash and local SATA SSD storage help accelerate data sorting through heat-optimized tiering. Data is migrated up and down tiers transparently to assist I/O-thirsty workloads.

Intuitive Cluster Management: Nutanix employs an Apple-like approach to managing large clusters, including a modern dashboard that provides a single pane of glass for servers and storage. An open-source Bonjour mechanism auto-discovers and configures new nodes as they enter and leave the cluster.

Just-in-time turnkey infrastructure: Incremental capacity augmentation reduces unnecessary up-front planning and capacity waste, enabling a more predictable and manageable sizing approach. You can grow or shrink the cluster dynamically with a single click and zero downtime for end users or applications.

Visit www.nutanix.com for more information. Follow us @nutanix. Email [email protected]
© Copyright 2024