High Performance Commercial Computing in 2015
1/30/2015
Rob Klopp
Technology Consultant
© 2015 Cognilytics. All rights are reserved.

Part 1 - The Case for High Performance Software

The slow growth of ever-more-subtle features without hardware optimization creates the phenomenon known as software bloat. Most of us have seen this on our personal computers, but it extends to enterprise software as well.

Since we are suggesting a 20X or greater improvement, the ROI for high performance software is not hard to find. There are several scenarios worth considering that offer a return.

A more interesting benefit comes from using extreme performance to dramatically reduce the cost of developing new business applications. We have all observed and experienced the effort that goes into making business applications perform. We build redundant copies of data into warehouses and marts and then pre-process data into cubes and other aggregated forms, or we completely preprocess data so that result rows may be retrieved directly by key. Even then we tune query syntax, add indexes, and struggle to find the sweet spot between insert/update performance and read performance, only to find that adding the next business query forces yet another tuning effort or redundant copy of the data.

By eliminating all of the performance hoops that developers must jump through after they have developed the required business functionality, we can significantly reduce the time and cost associated with adding business capability. In other words, we can use the 20X+ performance boost to satisfy performance requirements without tuning. We can deploy data once, or at most twice

The most conventional benefit comes from making current business applications faster. This is a tried and true case: if performance improves, work can proceed within the user’s attention span and staff becomes more productive.
Further, improved performance is delightful and, unfortunately, IT departments are rarely accused of being delightful.

While there is a cost to refactoring distributed, bloated software into high performance software, the costs may vary wildly. If you are buying high performance commercial software, then the costs may be limited to the acquisition of modern hardware, but if you are trying to trim down your business application software, there will be some development costs.

Executive Overview

This paper makes the case for building or buying high performance business software. We make the case by first suggesting in Part 1 how high performance software might positively impact your business. Then, in Part 2, we discuss the high performance commercial software available today and the hardware required to make it hum. Finally, in Part 3, we will dive a little deeper and discuss some of the high performance capabilities offered by Intel, which make these advancements possible. Executives will benefit from the first two Parts. The third Part provides the technical foundation for the case, to help you understand that the points being made are about architecture, not marketing.

While Moore’s Law suggests that processor performance might improve by n% every m months, it is normal to find that the actual performance gains you see in your business applications are a fraction of this. This is because, until recently, both commercial and home-grown software have used the performance enhancements built into each subsequent hardware generation to add ever-more-nuanced features to products and applications. Today some vendors are reversing this trend and beginning to build high performance business applications optimized for the hardware offered by Intel. And in a world where a 30% (1.3X) performance boost is considered significant, modern software optimized “to the bare metal” offers 20X, and sometimes 200X, performance improvements.
Part 2 - High Performance Commercial Software

The usual case for commercial enterprises will be to buy, rather than build, high performance software. Today we can find two categories of high performance commercial software emerging: high performance database computing and high performance business software.

Introduction to High Performance Technology

High performance computing is built upon a set of hardware technologies and features that include the use of high-performance peripheral devices, the parallel utilization of multiple cores, the effective use of processor cache, and several super-computing features built into modern processors.

So deploying modern software highly optimized for the processor products from Intel and others can provide benefits that far outweigh the cost of an upgrade. High performance software on high performance hardware can provide reduced response times for existing applications, can allow for a simplified system landscape and introduce agility into the software development process, and can enable totally new application functionality. In each case the benefits provide a competitive edge and a large return on any investment.

The final benefit of this extreme performance comes from using speed to implement extreme business applications. This use of high performance can introduce business functionality that is currently unheard of. Real-time applications become possible where analysts can see what is happening as it happens. More importantly, deep analytics can be performed instantly to automate processes that currently require human intervention. Today human actors use business intelligence products to scan thousands of events and facts per day, each representing a single scenario, hoping to find something “interesting”.
Extreme performance can be used to scan millions of scenarios per minute, and the results of each scan can be compared to historical trends, exposing only the interesting or anomalous scenarios to staff. This approach, using extreme performance to glean signals from the noise, represents the next generation of business intelligence, and these analytic applications are in the near-term future for many competitive businesses.

We can deploy data once, or at most twice (OLTP + EDW), in a general form and let all of the different user queries use the high performance software and hardware to meet SLAs. Imagine the cost savings of eliminating redundant data in your enterprise. This approach, based on simplifying the systems landscape, delights not only the end users in an enterprise but often the executives and the shareholders as well, as net new business functionality and reduced IT infrastructure costs both positively affect the bottom line.

Figure 1. Data Access Latency

Intel® SSD Devices
The Intel SSD roadmap has now introduced SSD devices that connect using PCIe without a controller in between. This reduces the latency to start an I/O to 100,000ns. For more information, see http://www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-ssd.html.

While Figure 1 shows the gains from moving data into DRAM, there are also significant gains to be had from accessing more data in the processor caches. High performance software will pre-load data into the cache to gain a 15X performance improvement over software that accesses data only from DRAM.

Finally, since memory and cache are expensive resources, it is important to compress data in-memory and, whenever possible, to process the data directly in a compressed form. High performance software stores and processes data as compressed vectors and utilizes vector instructions to process the data without decompression.
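To make the idea of processing data without decompression concrete, here is a minimal sketch in Python. It assumes dictionary encoding, which is one common form of columnar compression; the paper does not say which scheme any particular product uses. The function names and the sample column are hypothetical, and plain Python does not use vector instructions, so the sketch shows only the logic: the predicate is translated into an integer code once, and the scan then compares small integers rather than the original values.

```python
from bisect import bisect_left


def dictionary_encode(values):
    """Return (sorted dictionary, list of integer codes) for a column."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in values]


def count_equal(dictionary, codes, target):
    """Count rows equal to target by comparing integer codes only."""
    # Translate the predicate value into its code with one binary search.
    pos = bisect_left(dictionary, target)
    if pos == len(dictionary) or dictionary[pos] != target:
        return 0  # the value never occurs in the column
    # The scan touches only the compact codes, never the original strings.
    return sum(1 for code in codes if code == pos)


# Hypothetical sample column.
regions = ["EMEA", "APAC", "EMEA", "AMER", "APAC", "EMEA"]
dictionary, codes = dictionary_encode(regions)
emea_rows = count_equal(dictionary, codes, "EMEA")
```

In a real column store the codes would be bit-packed and the comparison loop would be executed with vector instructions over many codes at once; the sketch keeps only the part that is independent of the hardware.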
The point is that, by combining these features, extreme performance or extreme savings are possible. With this as background, let’s review the state of the market for high performance commercial computing.

The roadmap continues with more performance gains, price reductions, and interconnect improvements coming in the near term.

When multiple cores are all accessing the same memory, it is critical to use high performance techniques to lock data into transactions. Modern processors provide hardware features to support this locking, and high performance software will use these features.

Intel is focusing on next-gen nonvolatile memory (NVM) to make the bridge to ever larger server memory footprints possible, as NVM is always cheaper than DRAM and also denser. In addition, Intel is readying 3D NAND and has announced that it will bring ever larger and more cost-effective PCIe SSDs to the marketplace and to architectures for Big Data and ever larger Oracle databases.

Modern microprocessors are packing more and more cores per chip. Intel has announced 18-core processors based on an architecture that will scale to 40 cores per processor. Further, Intel’s partners are building servers with multiple processors per server, all sharing the same memory, so 32-core, 64-core, and 128-core products will be available in the short term. While we might run 128 applications to keep 128 cores busy, high performance commercial computing will deploy a single unit of work, a SQL query, across all these cores.

SSD devices today hook to a server as a fast peripheral device. They execute through a controller like a disk drive. Further, SSD devices today are built on NAND solid-state technology that is prone to wear.

Data in-memory, in DRAM, provides another 1000X-10000X performance boost over data in SSD. If all data, or the most popular data, were in-memory, a giant performance gain results.
This explains the emergence of in-memory database products and in-memory caches.

Solid State Devices are more or less memory in a peripheral device. By eliminating the mechanics associated with spinning disk, we can build a high-performance product without rewriting the software. This is the low-hanging fruit picked by most commercial vendors, and it is the easiest way to improve the performance of your systems. As the cost per GB of SSD continues to shrink and the performance continues to improve, the price/performance of SSD-based solutions will become more and more attractive.

Let’s start with a quick overview of the potential for improvement. Figure 1 presents an overview of the latency required to start fetching data into the core for processing. You can sense the opportunities. If data were fetched from an SSD device instead of from disk, a 50X performance advantage is yours. A 2000X advantage accrues if you eliminate I/O and fetch data from DRAM, and so on.

Up to 12 cores
Intel® Advanced Vector Extensions (Intel® AVX):
• 256-bit vector registers and 256-bit vector instructions
More memory, faster memory
Supports PCIe* 3.0 specification, which improves I/O bandwidth up to 2x (footnotes 1, 2, 3)
Intel® AVX2:
• Up to 1.9x increase in performance (footnote 4)
• New GATHER instruction increases usability of vectorization
• Four new AVX instructions extend applications
Other

IBM offers DB2 BLU*. While BLU is designed to support analytics only, it includes many of the same capabilities as HANA, with in-memory columnar options, vector compression, and super-computing processing. BLU does not scale out at this time, but on a single, large, multi-core node it should compete against HANA.

1. Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations, and functions.
Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

2. Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.

3. 8 GT/s and 128b/130b encoding in the PCIe* 3.0 specification enable double the interconnect bandwidth of the PCIe 2.0 specification. Source: http://www.pcisig.com/news_room/November_18_2010_Press_Release/.

To complete this section we will cover several of the major database products and vendors, describing how they use the capabilities listed above to deliver performance.

The category leader for the time being is SAP with their HANA* database. HANA offers a DBMS that is designed to support both high-performance OLTP workloads and high-performance analytics against a single table. They accomplish this with a unique design that stores incoming data in an OLTP-optimized structure and then moves the data to an analytics-optimized structure behind the scenes. Both structures are stored in-memory in a columnar format, compressed into vectors, and processed with super-computing instruction sets. HANA also includes in-memory parallelization that uses all of the cores on each query. Finally, HANA scales out using a shared-nothing approach that lets you add nodes to grow.

• New bit-manipulation instructions to build and maintain vectors
• Reduced overhead for virtual machines
• Up to 3x increase in memory bandwidth with DDR4 as compared to a typical 4-year-old server (footnotes 1, 5)
• Up to 3x performance boost over previous Intel Xeon processor generation (footnotes 1, 6, 7)
Intel® Xeon® Processor E5-2600 v2
Intel® Xeon® Processor E5-2600 v3: up to 18 cores

High Performance Database Computing

High performance database computing has been around for some time. In the 1990s the Gartner Group even had a High Performance Commercial Computing team. At that time the category was dominated by Tandem, which offered a high performance OLTP product, and by Teradata, with a high performance analytics offering. These companies used massively parallel processing (MPP) to gain high performance.

It is worth noting that high performance database computing is not the same as Big Data computing. The Big Data category focuses on scale rather than raw performance. In other words, the Hadoop* ecosystem is not currently a high performance ecosystem, even though, as we will see, there are some high-performance initiatives in the Hadoop portfolio.

Microsoft SQL Server* offers Hekaton*, an in-memory OLTP DBMS that uses hardware-supported locking.

Finally, the Apache Open Source community offers Spark*. Spark provides in-memory support for analytic workloads with options to store data in a columnar form.

The bottom line here is clear … all of the major database vendors are working to develop high performance capabilities that utilize the latest features of modern hardware: larger memory footprints, vector processing, and the latest SSD devices. This combination of the latest and greatest hardware with high performance commercial database software can provide the extreme performance that is the subject of this paper. Capabilities here will change with each new product release … but the hardware/software combination will be a constant going forward.

Oracle Database 12c* includes an in-memory capability as well. This is a new offering with some limitations, but it does include an in-memory column store with vector compression and the use of high performance vector instructions.
Over time, the R2 limitations should ease and Oracle too will be competitive. Oracle also offers database appliances that utilize the latest SSD technology.

High Performance Business Software

High Performance Business Software describes cases where the business logic you use, the application layer, utilizes an extreme performance boost. We will consider two specific cases where high performance business software is providing competitive advantage, introduce a very important third case that is emerging, and briefly discuss SAP’s high performance platform strategy. Before going into these cases, let’s reflect on what comprises a high-performance business platform.

A High-Performance Business Platform

4. Source as of August 2014 TR#3034 on Linpack*. Baseline configuration: Intel® Server Board S2600CP with two Intel® Xeon® Processor E5-2697 v2, Intel® HT Technology disabled, Intel® Turbo Boost Technology enabled, 8x8GB DDR3-1866, RHEL* 6.3, Intel® MKL 11.0.5, score: 528 GFlops. New configuration: Intel® Server System R2208WTTYS with two Intel® Xeon® Processor E5-2699 v3, Intel® HT Technology disabled, Intel® Turbo Boost Technology enabled, 8x16GB DDR4-2133, RHEL* 6.4, Intel® MKL 11.1.1, score: 1,012 GFlops.

5. Source as of August 2014 TR#3044 on STREAM (triad): Supermicro X8DTN+ platform with two Intel® Xeon® Processor X5680, 18x8GB DDR3-800, score: 26.5 GB/sec. New configuration: Intel® Server System R2208WTTYS with two Intel® Xeon® Processor E5-2699 v3, 24x16GB DDR4-2133 @ 1600MHz DR-RDIMM, score: 85.2 GB/sec.

6. Intel does not control or audit the design or implementation of third party benchmark data or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmark data are reported and confirm whether the referenced benchmark data are accurate and reflect performance of systems available for purchase.
7. Source as of September 8, 2014. New configuration: Hewlett-Packard Company HP ProLiant ML350 Gen9 platform with two Intel® Xeon® Processor E5-2699 v3, Oracle Java Standard Edition 8 update 11, 190,674 SPECjbb2013-MultiJVM max-jOPS, 47,139 SPECjbb2013-MultiJVM critical-jOPS. Source. Baseline: Cisco Systems Cisco UCS C240 M3 platform with two Intel® Xeon® Processor E5-2697 v2, Oracle Java Standard Edition 7 update 45, 63,079 SPECjbb2013-MultiJVM max-jOPS, 23,797 SPECjbb2013-MultiJVM critical-jOPS. Source.

Importantly, the first step will almost always be to implement a high-performance database platform as a foundation.

Squeezing Out the Fat

The next step will be to squeeze out any unnecessary latency in the software stack. This latency most often comes from using a legacy, distributed architecture that deploys separate virtual or physical servers talking over a virtual or physical network layer. If your application will fit on a large single server, and they are becoming quite large, then as a first step you might virtualize the physically distributed architecture onto a single server with several virtual machines. Better yet, deploy as many distributed components as you can as processes in a single virtual machine and allow the components to communicate using inter-process communications instead of a virtual network. To squeeze out a little more performance, you might deploy a micro-OS in a container to remove some of the overhead of running multiple distributed components. Better still, you could deploy the components onto a physical server and remove the overhead of virtualization. The gains will be significant.

Figure 2. Squeezing Latency Out of Your Stack

Mobile BI

The conventional way to solve these performance issues is to tune your way out. But it is usually the case that database tuning has already been applied to hit the top-notch 20-second average.
So application-level tuning is required. Application-level tuning usually requires the development of a replica of the data that contains very specific pre-computed values that can quickly be fetched by key. Each fetch returns a single row containing the pre-computed answer.

This, of course, is not a sustainable approach to BI, as it involves a specialized data structure, a time-consuming project, and ongoing maintenance for each and every query template.

Now back to business use cases.

An obvious first use case is mobile business intelligence (BI). BI is a reporting function tied to your data warehouse that usually plows through fairly large volumes of data with each query. A BI capability that provides an average response of 20 seconds for queries would be considered top-notch. But a 20-second response is 10X too slow for mobile devices, where 2-3 seconds is a hard requirement. And a 20-second average means that there will be a significant percentage of queries that require 2-3 minutes or more.

A more cost-effective, agile, and sustainable approach might be to build a high-performance BI platform that can crunch data so fast that nearly any query can be aggregated from the raw data without the need for pre-computation. Modern in-memory databases can aggregate data at super-computer speeds. SAP HANA, for example, will aggregate up to 12M records per second per core … and modern hardware provides 32, 64, or up to 128 cores per node.

A second use case applies predictive analytics to your analytic data. In general we may think of analytics as a process that models historical data over time and then uses the model to predict the next data set. An interesting variation on this approach compares each day’s result to what was predicted in order to identify anomalies.
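The variation described above, comparing each day’s result to what was predicted, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "model" is simply the mean of a trailing window, the anomaly rule is a deviation of more than a chosen number of standard deviations, and the function name, parameters, and sample figures are all hypothetical. A production system would use a far richer model and would run this comparison over millions of scenarios.

```python
from statistics import mean, stdev


def find_anomalies(history, window=7, threshold=3.0):
    """Flag days whose actual value is far from a trailing-mean prediction.

    Returns a list of (day_index, actual, predicted) tuples.
    """
    anomalies = []
    for day in range(window, len(history)):
        past = history[day - window:day]
        predicted = mean(past)   # assumed model: trailing average
        spread = stdev(past)     # typical variation within the window
        actual = history[day]
        if spread == 0:
            # Flat window: any departure from the prediction is unusual.
            is_anomaly = actual != predicted
        else:
            is_anomaly = abs(actual - predicted) > threshold * spread
        if is_anomaly:
            anomalies.append((day, actual, predicted))
    return anomalies


# Hypothetical daily sales figures with one obvious spike.
daily_sales = [100, 102, 98, 101, 99, 103, 97, 100, 180, 101]
flagged = find_anomalies(daily_sales)
```

Only the spike on day 8 is surfaced; the ordinary days are filtered out, which is the point of exposing only the interesting scenarios to staff.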
Real-time Business Processes

When all of the fat is squeezed out of the application stack, and in-memory computing provides a basis for high volume OLTP and for low volume analytics (like you may currently process on an Operational Data Store replica), and the analytics are boosted by the high performance hardware and software we have discussed, then it becomes possible to register a business transaction and instantly query it for analytics. It is possible to analyze your business in real time. Consider:

• Healthcare: Real-time feeds from sensors and from automated lab equipment can identify the most effective treatments based on the real-time state of a patient.
• Manufacturing: Real-time feedback can automatically adjust tolerances on the shop floor based on a holistic sense of quality rather than adjusting each machine in isolation.
• Retail: Your websites can instantly respond to browsing or buying behavior based on personalized rules for your customer rather than general rules. Further, your general rules can flex, reacting to changes in buying behavior in real time. You can see some of this in very advanced applications from companies like Netflix, Amazon, and Apple, but you now have the ability to run these advanced applications in real time on your website.

Real-time computing is the next big thing, the initiative that, when combined with deep analytics, will provide the competitive edge that differentiates the early adopters from the pack and then draws everybody in. It will be the next phase of the analytics advance that started with data warehousing and decision support systems.

The implications of this are huge. Today only very large financial institutions can assemble enough super-computing to evaluate a financial market and automatically initiate trades.
With the high-performance commercial computing capabilities suggested in this paper, it is possible to automate more of the decisions that drive your business.

Predictive Analytics

It is important to see that these predictive analytic capabilities are just now feasible with modern hardware. Despite the aura of mathematics, predictive analytics is for the most part a brute force computational problem. It is super-computing brought to the commercial space. While the latest Intel hardware will allow you to troll through a million cuts at your data in a day, there are tens of millions more that will become possible with each upcoming tick and tock of the Intel roadmap.

In order to accomplish this effectively we must read through a complete history of your data several million times a day. Frankly, the compute resources required to perform this task were too expensive until the recent Intel product line became available. A combination of the latest processors and high-performance software architecture makes this feasible and extremely important. The ability to predict market behavior or to quickly perceive market anomalies will provide a distinct competitive advantage to early adopters.

Finally, it is worth pointing out how SAP is leading this drive by building a platform upon which they can extend their business applications into all of these use cases. The market thinks of HANA as an in-memory database management system, but it is actually an application layer platform with an in-memory DBMS built in.

The Intel hardware roadmap has affected, and will continue to affect, the way we architect software. Today we are beginning to see the fully distributed software stack collapse from separate physical servers into co-located virtual machines, into converged virtual machines with multiple components of the stack in separate address spaces, into containers with micro-operating systems.
The HANA platform collapses this further, with the stack compressed into lightweight threads communicating via shared memory. It is the HANA Platform that provides the basis for effectively using the ever-larger servers with 128 cores and then 256 cores that are coming. Other approaches are trying to retrofit a distributed architecture onto a hardware platform that no longer requires distribution.

SAP HANA as a Platform

It is only software, so you can expect to see other attempts to collapse the stack into threads; but as the HANA platform matures, and while the other large software vendors catch up, you will see HANA utilize modern hardware to the fullest effect.

Conclusion

This paper has laid out a straightforward case: that there is an opportunity to better utilize the compute capabilities companies like Intel are building into their microprocessor products. Taking advantage of this opportunity provides the ability to reduce the number of servers you support, reduce the floor space and energy required, reduce the effort required to tune applications, and deliver state-of-the-art capabilities to your business, faster. The current distributed software architecture was designed for much older, less capable hardware. Intel is providing significantly more capability … but we have to take advantage of it.

You may wonder why Intel has sponsored this paper, which, not counting the sidebars, only lightly promotes Intel technology. The reason is that they need you to understand that with each new product line they are delivering new capability that supports high performance commercial computing. They need you to understand that with each new product line there are benefits beyond just core speed and core count. They need you to look for software products coming from vendors like SAP, Oracle, and IBM and see how those products improve with each hardware revision.

Cognilytics Inc.
1875 Lawrence Street STE 610
Denver, CO 80202

Intel, the Intel logo, the Intel Inside logo, and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries.
* Other names and brands may be claimed as the property of others.
This paper was commissioned by Intel.