
High Performance Commercial Computing in 2015
1/30/2015
Rob Klopp
Technology Consultant
©2015—Cognilytics. All rights are reserved.
Executive Overview

This paper makes the case for building or buying high performance business software. We make the case by first suggesting in Part 1 how high performance software might positively impact your business. Then, in Part 2, we discuss the high performance commercial software available today and the hardware requirements to make it hum. Finally, in Part 3, we dive a little deeper and discuss some of the high performance capabilities offered by Intel that make these advancements possible. Executives will benefit from the first two parts. The third part provides the technical foundation for the case, to help you understand that the points being made are about architecture, not marketing.

Part 1 - The Case for High Performance Software
While Moore’s Law suggests that processor performance might improve by n% every m months, it is normal to find that the actual performance gains you see in your business applications are a fraction of this. This is because, until recently, both commercial and home-grown software have used the performance enhancements built into each subsequent hardware generation to add ever-more-nuanced features into products and applications. Today some vendors are reversing this trend and beginning to build high performance business applications optimized to the hardware offered by Intel. And in a world where a 30% (1.3X) performance boost is considered significant, modern software optimized “to the bare metal” offers 20X, and sometimes 200X, performance improvements.

The slow growth of ever-more-subtle features without hardware optimization creates the phenomenon known as software bloat. Most of us have seen this on our personal computers, but it extends to enterprise software as well.

While there is a cost to re-factoring bloated, distributed software into high performance software, the costs vary widely. If you are buying high performance commercial software then the costs may be limited to the acquisition of modern hardware, but if you are trying to trim down your own business application software there will be some development costs.

Since we are suggesting an improvement of 20X or more, the ROI for high performance software is not too hard to find. There are several scenarios worth considering that offer a return.
The most conventional benefit comes from making current business applications faster. This is a tried and true case: if performance improves, work can proceed within the user’s attention span and staff becomes more productive. Further, improved performance is delightful and, unfortunately, IT departments are rarely accused of being delightful.

A more interesting benefit comes from using extreme performance to dramatically reduce the cost of developing new business applications. We have all observed and experienced the effort that goes into making business applications perform. We build redundant copies of data into warehouses and marts and then pre-process data into cubes and other aggregated forms, or we completely pre-process data so that result rows may be retrieved directly by key. Even then we tune query syntax and add indexes and struggle to find the sweet spot between insert/update performance and read performance, only to find that adding the next business query forces yet another tuning effort or another redundant copy of the data.
By eliminating all of these performance hoops that developers must jump through after they have developed the required business functionality, we can significantly reduce the time and cost associated with adding business capability. In other words, we can use the 20X+ performance boost to satisfy performance requirements without tuning. We can deploy data once, or at most twice (OLTP + EDW), in a general form, and let all of the different user queries use the high performance software and hardware to meet SLAs. Imagine the cost savings of eliminating redundant data in your enterprise. This approach, based on simplifying the systems landscape, delights not only the end-users in an enterprise, but often the executives and the shareholders as well, as net new business functionality and reduced IT infrastructure costs both positively affect the bottom line.

The final benefit of this extreme performance comes from using speed to implement extreme business applications. This use of high performance can introduce business functionality that is currently unheard of. Real-time applications become possible where analysts can see what is happening as it happens. More importantly, deep analytics can be performed instantly to automate processes that currently require human intervention. Today human actors use business intelligence products to scan thousands of events and facts per day, each representing a single scenario, hoping to find something “interesting”. Extreme performance can be used to scan millions of scenarios per minute, and the results of each scan can be compared to historical trends, exposing only the interesting or anomalous scenarios to staff. The results of this approach, using extreme performance to glean signals from the noise, represent the next generation of business intelligence, and these analytic applications are in the near-term future for many competitive businesses.

Deploying modern software highly optimized for the processor products from Intel and others can therefore provide benefits that far outweigh the cost of an upgrade. High performance software on high performance hardware can provide reduced response times for existing applications, can allow for a simplified system landscape and introduce agility into the software development process, and can enable totally new application functionality. In each case the benefits provide a competitive edge and a large return on any investment.
The usual case for commercial enterprises will be to buy, rather than build, high performance
software. Today we can find two categories of high performance commercial software emerging:
high performance database computing and high performance business software.
Introduction to High Performance Technology
High performance computing is built upon a set of hardware technologies and features that include the use of high-performance peripheral devices, the parallel utilization of multiple cores, the effective use of processor cache, and several super-computing features built into modern processors.
Let’s start with a quick overview of the potential for improvement. Figure 1 presents an overview of the latency required to start fetching data into the core for processing. You can sense the opportunities. If data were fetched from an SSD device instead of from disk, a 50X performance advantage is yours. A 2000X advantage accrues if you eliminate I/O and fetch data from DRAM, and so on.

Figure 1. Data Access Latency

Intel® SSD Devices

The Intel SSD roadmap has now introduced SSD devices that connect using PCIe without a controller in between. This reduces the latency to start an I/O to 100,000ns. For more information, see http://www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-ssd.html.
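To make those ratios concrete with a worked example (assuming, for illustration, a typical ~5 ms, that is 5,000,000 ns, random access on spinning disk, which is consistent with the 100,000 ns SSD latency cited in the sidebar): 5,000,000 / 100,000 gives the 50X SSD advantage, and the 2000X DRAM advantage implies a fetch cost of roughly 5,000,000 / 2,000 = 2,500 ns once I/O is eliminated entirely.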
Solid state devices are, more or less, memory in a peripheral device. By eliminating the mechanics associated with spinning disk we can build a high-performance product without re-writing the software. This is the low-hanging fruit picked by most commercial vendors, and it is the easiest way to improve the performance of your systems. As the cost per GB of SSD continues to shrink and the performance continues to improve, the price/performance of SSD-based solutions will become more and more attractive.

SSD devices today attach to a server as fast peripheral devices. They execute I/O through a controller, like a disk drive. Further, SSD devices today are built on NAND solid-state technology that is prone to wear.

Data in-memory, in-DRAM, provides another 1000X-10000X performance boost over data-in-SSD. If all data, or the most popular data, were in-memory, a giant performance gain results. This explains the emergence of in-memory database products and in-memory caches.

Intel is focusing on next-generation non-volatile memory (NVM) to make the bridge to ever-larger server memory footprints possible, as NVM is cheaper and denser than DRAM. In addition, Intel is readying 3D NAND and has announced that it will bring ever-larger, cost-effective PCIe SSDs to the marketplace, along with architectures for Big Data and ever-larger Oracle databases. The roadmap continues with more performance gains, price reductions, and interconnect improvements coming in the near term.
While Figure 1 shows the gains from moving data into DRAM, there are also significant gains to be had from accessing more data in the processor caches. High performance software will pre-load data into the cache to gain a 15X performance improvement over software that accesses data only from DRAM.

Finally, since memory and cache are expensive resources, it is important to compress data in-memory and, whenever possible, to process the data directly in compressed form. High performance software stores and processes data as compressed vectors and utilizes vector instructions to process the data without decompression.
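To make the compressed-data idea concrete, here is a minimal sketch (a hypothetical dictionary-encoded column, not any particular vendor’s format) in which a predicate is evaluated directly on the small integer codes, so the scan never decompresses the underlying strings and the tight, branch-free loop is easy for a compiler to auto-vectorize:

    #include <cstdint>
    #include <iostream>
    #include <string>
    #include <vector>

    // Hypothetical illustration: a column of country codes stored
    // dictionary-encoded. The predicate "country == 'US'" is evaluated on
    // the one-byte codes, so the scan never decompresses the strings,
    // touches far less memory, and vectorizes well.
    int main() {
        std::vector<std::string> dictionary = {"DE", "JP", "US"};  // distinct values
        std::vector<uint8_t> codes = {2, 0, 2, 1, 2, 0};           // one byte per row

        uint8_t target = 2;      // dictionary position of "US", looked up once
        size_t matches = 0;
        for (uint8_t c : codes)  // tight scan over compressed codes only
            matches += (c == target);

        std::cout << "rows matching " << dictionary[target]
                  << ": " << matches << "\n";  // prints 3
    }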
When multiple cores are all accessing the same memory, it is critical to use high performance techniques to lock data within transactions. Modern processors provide hardware features to support this locking, and high performance software will use these features.
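Intel’s Transactional Synchronization Extensions (TSX) are one such hardware feature. The sketch below is a minimal illustration, assuming a TSX-capable CPU and a compiler flag such as -mrtm: it attempts the critical section as a hardware transaction and retries under a plain software lock when the transaction aborts.

    #include <immintrin.h>  // _xbegin/_xend/_xabort: Intel TSX RTM intrinsics
    #include <atomic>

    // Minimal lock-elision sketch (assumes a TSX-capable CPU; build with -mrtm).
    std::atomic<bool> locked{false};  // software fallback lock
    long balance = 0;

    void credit(long amount) {
        unsigned status = _xbegin();
        if (status == _XBEGIN_STARTED) {
            // Reading 'locked' puts it in our read-set: if another thread
            // takes the software lock, this transaction aborts, so the two
            // paths cannot race.
            if (locked.load(std::memory_order_relaxed)) _xabort(0xff);
            balance += amount;  // buffered by the CPU; commits atomically
            _xend();
        } else {
            while (locked.exchange(true, std::memory_order_acquire)) { /* spin */ }
            balance += amount;
            locked.store(false, std::memory_order_release);
        }
    }

    int main() { credit(100); }  // requires TSX hardware to take the fast path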
Modern microprocessors are packing more and more cores per chip. Intel has announced 18-core processors based on an architecture that will scale to 40 cores per processor. Further, Intel’s partners are building servers with multiple processors per server, all sharing the same memory, so 32-core, 64-core, and 128-core products will be available in the short term. While we might run 128 applications to keep 128 cores busy, high performance commercial computing will instead deploy a single unit of work, a SQL query, across all of these cores.
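As a miniature illustration of one unit of work spread across every core (plain C++ threads standing in for a database’s parallel query engine), the sketch below splits a SUM over a column into per-core partial aggregates and then combines them, much as a query coordinator would:

    #include <algorithm>
    #include <cstdint>
    #include <iostream>
    #include <numeric>
    #include <thread>
    #include <vector>

    // Sketch: one unit of work (a SUM over a column) split across all cores.
    int64_t parallel_sum(const std::vector<int64_t>& column) {
        unsigned cores = std::max(1u, std::thread::hardware_concurrency());
        std::vector<int64_t> partial(cores, 0);
        std::vector<std::thread> workers;
        size_t chunk = column.size() / cores + 1;

        for (unsigned t = 0; t < cores; ++t)
            workers.emplace_back([&, t] {
                size_t begin = t * chunk;
                size_t end = std::min(begin + chunk, column.size());
                for (size_t i = begin; i < end; ++i)  // each core scans its slice
                    partial[t] += column[i];
            });
        for (auto& w : workers) w.join();

        // Combine the per-core partial results, as a coordinator would.
        return std::accumulate(partial.begin(), partial.end(), int64_t{0});
    }

    int main() {
        std::vector<int64_t> col(10'000'000, 1);
        std::cout << parallel_sum(col) << "\n";  // prints 10000000
    }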
The point is that, by combining these features, extreme performance or extreme savings are possible. With this as background, let’s review the state of the market for high performance commercial computing.

Part 2 - High Performance Commercial Software
High Performance Database Computing

High performance database computing has been around for some time. In the 1990s the Gartner Group even had a High Performance Commercial Computing team. At that time the category was dominated by Tandem, who offered a high performance OLTP product, and by Teradata, with a high performance analytics offering. These companies used massively parallel processing (MPP) to gain high performance.

It is worth noting that high performance database computing is not the same as Big Data computing. The Big Data category focuses on scale rather than raw performance. In other words, the Hadoop* ecosystem is not currently a high performance ecosystem, even though, as we will see, there are some high-performance initiatives in the Hadoop portfolio.

To complete this section we will cover several of the major database products and vendors, describing how they use the capabilities listed above to deliver performance.
The category leader for the time being is SAP with their HANA* database. HANA offers a DBMS that is designed to support both high-performance OLTP workloads and high-performance analytics against a single table. They accomplish this with a unique design that stores incoming data in an OLTP-optimized structure and then moves the data to an analytics-optimized structure behind the scenes. Both structures are stored in-memory in a columnar format, compressed into vectors, and processed with super-computing instruction sets. HANA also includes in-memory parallelization that uses all of the cores on each query. Finally, HANA scales out using a shared-nothing approach that lets you add nodes to grow.

IBM offers DB2 BLU*. While BLU is designed to support analytics only, it includes many of the same capabilities as HANA, with in-memory columnar options, vector compression, and super-computing processing. BLU does not scale out at this time, but on a single, large, multi-core node it should compete against HANA.
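As an illustration of the two-structure pattern described above, here is a toy sketch (my own illustration, not SAP’s or IBM’s actual implementation): inserts land in a small write-optimized delta structure, and a periodic merge folds them into a sorted, scan-optimized main store.

    #include <algorithm>
    #include <iostream>
    #include <iterator>
    #include <utility>
    #include <vector>

    // Toy sketch of the delta-plus-main pattern (not any vendor's code):
    // inserts append to a write-optimized "delta"; a background merge folds
    // the delta into a sorted, scan-optimized "main" store.
    struct Column {
        std::vector<std::pair<int, int>> delta;  // (key, value), append-only: cheap writes
        std::vector<std::pair<int, int>> main_;  // sorted by key: fast analytic scans

        void insert(int key, int value) { delta.emplace_back(key, value); }

        void merge() {  // the behind-the-scenes reorganization step
            std::sort(delta.begin(), delta.end());
            std::vector<std::pair<int, int>> merged;
            merged.reserve(main_.size() + delta.size());
            std::merge(main_.begin(), main_.end(), delta.begin(), delta.end(),
                       std::back_inserter(merged));
            main_ = std::move(merged);
            delta.clear();
        }
    };

    int main() {
        Column c;
        c.insert(42, 7); c.insert(7, 1); c.insert(19, 3);
        c.merge();                                         // delta drains into main
        std::cout << c.main_.size() << " rows in main\n";  // prints 3
    }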
Oracle Database 12c* includes an in-memory capability as well. This is a new offering with some limitations, but it does include an in-memory column store with vector compression and the use of high performance vector instructions. Over time, the R2 limitations should ease and Oracle too will be competitive. Oracle also offers database appliances that utilize the latest SSD technology.

Microsoft SQL Server* offers Hekaton*, an in-memory OLTP DBMS that uses hardware-supported locking.

Finally, the Apache open source community offers Spark*. Spark provides in-memory support for analytic workloads with options to store data in a columnar form.

The bottom line here is clear … all of the major database vendors are working to develop high performance capabilities that utilize the latest features of modern hardware: larger memory footprints, vector processing, and the latest SSD devices. This combination of the latest and greatest hardware with high performance commercial database software can provide the extreme performance that is the subject of this paper. Capabilities here will change with each new product release … but the hardware/software combination will be a constant going forward.
Intel® Xeon® Processor E5-2600 v2
• Up to 12 cores
• Intel® Advanced Vector Extensions (Intel® AVX): 256-bit vector registers and 256-bit vector instructions
• Supports the PCIe* 3.0 specification, which improves I/O bandwidth up to 2x [1,2,3]

Intel® Xeon® Processor E5-2600 v3
• Up to 18 cores
• More memory, faster memory: up to 3x increase in memory bandwidth with DDR4 as compared to a typical 4-year-old server [1,5]
• Intel® AVX2: up to 1.9x increase in performance [4]; new GATHER instruction increases usability of vectorization; four new AVX instructions extend applications; new bit-manipulation instructions to build and maintain vectors
• Other: reduced overhead for virtual machines; up to 3x performance boost over the previous Intel Xeon processor generation [1,6,7]

1. Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
2. Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.
3. 8 GT/s and 128b/130b encoding in the PCIe* 3.0 specification enables double the interconnect bandwidth over the PCIe 2.0 specification. Source: http://www.pcisig.com/news_room/November_18_2010_Press_Release/.
4. Source as of August 2014 TR#3034 on Linpack*. Baseline configuration: Intel® Server Board S2600CP with two Intel® Xeon® Processor E5-2697 v2, Intel® HT Technology disabled, Intel® Turbo Boost Technology enabled, 8x8GB DDR3-1866, RHEL* 6.3, Intel® MKL 11.0.5, score: 528 GFlops. New configuration: Intel® Server System R2208WTTYS with two Intel® Xeon® Processor E5-2699 v3, Intel® HT Technology disabled, Intel® Turbo Boost Technology enabled, 8x16GB DDR4-2133, RHEL* 6.4, Intel® MKL 11.1.1, score: 1,012 GFlops.
5. Source as of August 2014 TR#3044 on STREAM (triad). Baseline configuration: Supermicro X8DTN+ platform with two Intel® Xeon® Processor X5680, 18x8GB DDR3-800, score: 26.5 GB/sec. New configuration: Intel® Server System R2208WTTYS with two Intel® Xeon® Processor E5-2699 v3, 24x16GB DDR4-2133 @ 1600MHz DR-RDIMM, score: 85.2 GB/sec.
6. Source as of September 8, 2014. New configuration: Hewlett-Packard Company HP ProLiant ML350 Gen9 platform with two Intel® Xeon® Processor E5-2699 v3, Oracle Java Standard Edition 8 update 11, 190,674 SPECjbb2013-MultiJVM max-jOPS, 47,139 SPECjbb2013-MultiJVM critical-jOPS. Baseline: Cisco Systems Cisco UCS C240 M3 platform with two Intel® Xeon® Processor E5-2697 v2, Oracle Java Standard Edition 7 update 45, 63,079 SPECjbb2013-MultiJVM max-jOPS, 23,797 SPECjbb2013-MultiJVM critical-jOPS.
7. Intel does not control or audit the design or implementation of third party benchmark data or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmark data are reported and confirm whether the referenced benchmark data are accurate and reflect performance of systems available for purchase.
High Performance Business Software
High Performance Business Software describes cases where the business logic itself, the application layer, utilizes an extreme performance boost. We will consider two specific cases where high performance business software is providing competitive advantage, introduce a very important third case that is emerging, and briefly discuss SAP’s high performance platform strategy. Before going into these cases, let’s reflect on what comprises a high-performance business platform.
A High-Performance Business Platform
Importantly, the first step will almost always be to implement a high-performance database platform as a foundation.

Squeezing Out the Fat

Next, squeeze out any unnecessary latency in the software stack. This latency most often comes from using a legacy, distributed architecture that deploys separate virtual or physical servers talking over a virtual or physical network layer. If your application will fit on a single large server, and they are becoming quite large, then as a first step you might virtualize the physically distributed architecture onto a single server with several virtual machines. Better yet, deploy as many distributed components as you can as processes in a single virtual machine and allow the components to communicate using inter-process communication instead of a virtual network. To squeeze a little more performance you might deploy a micro-OS in a container to remove some of the overhead of running multiple distributed components. Better still, you could deploy the components onto a physical server and remove the overhead of virtualization. The gains will be significant (see the sketch below).

Figure 2. Squeezing Latency Out of Your Stack
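As a small sketch of what collapsing a network hop looks like (two hypothetical components, with in-process threads standing in for formerly separate servers), the pair below exchanges requests through a shared in-memory queue instead of a virtual network:

    #include <condition_variable>
    #include <iostream>
    #include <mutex>
    #include <queue>
    #include <string>
    #include <thread>

    // Sketch: two stack components co-located in one process, exchanging
    // requests through shared memory instead of a (virtual) network hop.
    std::queue<std::string> inbox;
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    void app_server() {  // "client" component
        for (int i = 0; i < 3; ++i) {
            std::lock_guard<std::mutex> g(m);
            inbox.push("request " + std::to_string(i));  // no network round trip
            cv.notify_one();
        }
        { std::lock_guard<std::mutex> g(m); done = true; }
        cv.notify_one();
    }

    void data_service() {  // "server" component, same address space
        std::unique_lock<std::mutex> lk(m);
        while (true) {
            cv.wait(lk, [] { return !inbox.empty() || done; });
            while (!inbox.empty()) {
                std::cout << "served " << inbox.front() << "\n";
                inbox.pop();
            }
            if (done) return;
        }
    }

    int main() {
        std::thread consumer(data_service), producer(app_server);
        producer.join(); consumer.join();
    }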
Now back to business use cases.
Mobile BI
An obvious first use case is mobile business intelligence (BI). BI is a reporting function tied to your data warehouse that usually plows through fairly large volumes of data with each query. A BI capability that provides an average response of 20 seconds for queries would be considered top-notch. But a 20-second response is 10X too slow for mobile devices, where 2-3 seconds is a hard requirement. And a 20-second average means that there will be a significant percentage of queries that require 2-3 minutes or more.
The conventional way to solve these performance issues is to tune your way out. But it is usually the case that database tuning has already been applied to hit the top-notch 20-second average, so application-level tuning is required. Application-level tuning usually requires the development of a replica of the data that contains very specific pre-computed values that can quickly be fetched by key. Each fetch returns a single row containing the pre-computed answer.
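A minimal sketch of that workaround (hypothetical keys and values): a batch job pre-computes each answer and stores it by key, so the application only ever performs a single-row fetch.

    #include <iostream>
    #include <string>
    #include <unordered_map>

    // Sketch of the conventional workaround: an overnight ETL job
    // pre-computes every answer and stores it by key (hypothetical keys
    // and values), reducing each BI query to a key-value fetch.
    int main() {
        std::unordered_map<std::string, double> precomputed = {
            {"revenue:2014:west", 1.2e6},
            {"revenue:2014:east", 2.3e6},
        };

        // The mobile BI query becomes a single-row lookup.
        std::cout << precomputed.at("revenue:2014:west") << "\n";
    }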
This, of course, is not a sustainable approach to BI as it involves a specialized data structure and
a time-consuming project and ongoing maintenance for each and every query template.
A more cost-effective, agile, and sustainable approach might be to build a high-performance BI platform that can crunch data so fast that nearly any query can be aggregated from the raw data without the need for pre-computation. Modern in-memory databases can aggregate data at super-computer speeds. SAP HANA, for example, will aggregate up to 12M records per second per core … and modern hardware provides 32, 64, or up to 128 cores per node.
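For contrast with the pre-computed replica sketched earlier, the same kind of answer can be aggregated from raw rows at query time. At the cited ~12M records per second per core, 100M rows spread across 32 cores would take well under half a second:

    #include <cstdint>
    #include <iostream>
    #include <vector>

    // Contrast with the pre-computed lookup: with enough scan speed, the
    // answer is aggregated from raw rows at query time, with no cube or
    // replica to build and maintain.
    int main() {
        struct Row { uint8_t region; double revenue; };
        std::vector<Row> fact(1'000'000, Row{0, 1.0});  // toy fact table

        double total = 0;
        for (const Row& r : fact)  // full scan; vectorizes well
            if (r.region == 0) total += r.revenue;

        std::cout << "west revenue: " << total << "\n";
    }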
Predictive Analytics

A second use case applies predictive analytics to your analytic data. In general we may think of analytics as a process that models historical data over time and then uses the model to predict the next data set. An interesting variation on this approach compares each day’s result to what was predicted in order to identify anomalies.

In order to accomplish this effectively we must read through a complete history of your data several million times a day. Frankly, the compute resources required to perform this task were too expensive until the recent Intel product line became available. A combination of the latest processors and high-performance software architecture makes this feasible and extremely important. The ability to predict market behavior or to quickly perceive market anomalies will provide a distinct competitive advantage to early adopters.

It is important to see that these predictive analytic capabilities are only now feasible with modern hardware. Despite the aura of mathematics, predictive analytics is for the most part a brute-force computational problem. It is super-computing brought to the commercial space. While the latest Intel hardware will allow you to troll through a million cuts at your data in a day, there are tens of millions more that will become possible with each upcoming tick and tock of the Intel roadmap.

The implications of this are huge. Today only very large financial institutions can assemble enough super-computing to evaluate a financial market and automatically initiate trades. With the high-performance commercial computing capabilities suggested in this paper it is possible to automate more of the decisions that drive your business.
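Returning to the predict-and-compare pattern described above, here is a minimal sketch (a moving-average model standing in for a real predictive model) that flags a day as anomalous when its value falls too far from what history predicts:

    #include <cmath>
    #include <iostream>
    #include <vector>

    // Minimal predict-and-compare sketch: model history with a moving
    // average, "predict" today, and flag an anomaly when the actual value
    // falls more than z standard deviations from the prediction.
    bool is_anomaly(const std::vector<double>& history, double today, double z = 3.0) {
        double mean = 0, var = 0;
        for (double v : history) mean += v;
        mean /= history.size();
        for (double v : history) var += (v - mean) * (v - mean);
        double stddev = std::sqrt(var / history.size());
        return std::fabs(today - mean) > z * stddev;  // deviation from predicted
    }

    int main() {
        std::vector<double> daily_sales = {100, 104, 98, 101, 99, 103, 100};
        std::cout << is_anomaly(daily_sales, 102) << "\n";  // 0: as expected
        std::cout << is_anomaly(daily_sales, 160) << "\n";  // 1: anomalous
    }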
Real-time Business Processes
When all of the fat is squeezed out of the application stack, and in-memory computing provides a basis for high volume OLTP and for low volume analytics (like you may currently process on an Operational Data Store replica), and the analytics are boosted by the high performance hardware and software we have discussed, then it becomes possible to register a business transaction and instantly query it for analytics. It is possible to analyze your business in real time.
Consider:
• Healthcare: Real time feeds from sensors and from automated lab equipment can identify the most effective treatments based on the real time state of a patient.
• Manufacturing: Real time feedback can automatically adjust tolerances on the shop floor based on a holistic sense of quality rather than adjusting each machine in isolation.
• Retail: Your websites can instantly react to browsing or buying behavior, responding based on personalized rules for each customer rather than general rules. Further, your general rules can flex, reacting to changes in buying behavior in real time. You can see some of this in very advanced applications from companies like Netflix, Amazon, and Apple, but you now have the ability to run these advanced applications in real time on your own website.
Real time computing is the next big thing, the initiative that, when combined with deep analytics,
will provide the competitive edge that differentiates the early adopters from the pack and then
draws everybody in. It will be the next phase of the analytics advance that started with data
warehousing and decision support systems.
SAP HANA as a Platform

Finally, it is worth pointing out how SAP is leading this drive by building a platform upon which they can extend their business applications into all of these use cases. The market thinks of HANA as an in-memory database management system, but it is actually an application layer platform with an in-memory DBMS built in.
The Intel hardware roadmap has affected, and will continue to affect, the way we architect software. Today we are beginning to see the fully distributed software stack collapse: from separate physical servers into co-located virtual machines, into converged virtual machines with multiple components of the stack in separate address spaces, into containers with micro-operating systems. The HANA platform collapses this further, with the stack compressed into lightweight threads communicating via shared memory. It is the HANA platform that provides the basis for effectively using the ever-larger servers, with 128 and then 256 cores, that are coming. Other approaches are trying to retrofit a distributed architecture onto a hardware platform that no longer requires distribution.
It is only software, so you can expect to see other attempts to collapse the stack into threads; but as the HANA platform matures, and while the other large software vendors catch up, you will see HANA utilize modern hardware to the fullest effect.
Conclusion
This paper has laid out a straightforward case: there is an opportunity to better utilize the compute capabilities companies like Intel are building into their microprocessor products. Taking advantage of this opportunity provides the ability to reduce the number of servers you support, reduce the floor space and energy required, reduce the effort required to tune applications, and deliver state-of-the-art capabilities to your business, faster.

The current distributed software architecture was designed for much older, less capable hardware. Intel is providing significantly more capability … but we have to take advantage of it.
You may wonder why Intel has sponsored this paper, which, not counting the sidebars, only lightly promotes Intel technology. The reason is that they need you to understand that with each new product line they are delivering new capability that supports high performance commercial computing. They need you to understand that with each new product line there are benefits beyond just core speed and core count. They need you to look for software products coming from vendors like SAP, Oracle, and IBM and see how those products improve with each hardware revision.
©2015—Cognilytics. All rights are reserved.
Cognilytics Inc.
1875 Lawrence Street
STE 610
Denver, CO 80202
Intel, the Intel logo, the Intel Inside logo, and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries.
* Other names and brands may be claimed as the property of others.
This paper was commissioned by Intel.