Memory performance on HP Z840/Z640/Z440 Workstations

Technical white paper
Performance impact of moving to larger DIMM sizes and
populating fewer memory channels
Table of contents
Introduction
Memory performance in the HP Z840/Z640/Z440 Workstation
Application/Benchmark performance comparisons
HP Z440 comparison results
HP Z840 comparison results
Results discussion
Conclusion
Technical white paper | Memory performance on HP Z840/Z640/Z440 Workstations
Introduction
This paper looks at the performance impact of moving to larger DIMM
sizes in conjunction with populating fewer memory channels, for specific
configurations in HP Z840/Z640/Z440 Workstations.
As a general rule, memory performance is optimized when all memory
channels are populated in a system. Moving from a balanced memory
configuration (all memory channels populated) to an unbalanced memory
configuration (not all memory channels populated) will result in sub-optimal
memory performance. This performance impact will depend upon the
application, workload and the specific configuration being considered.
Memory performance in the HP Z840/Z640/Z440 Workstation
Each processor in the HP Z840, HP Z640 and HP Z440 Workstation supports four DDR4 memory channels. Each memory channel provides a fixed bandwidth capability, so the number of memory channels populated directly affects the achievable memory bandwidth in a system. In addition to providing the greatest memory bandwidth capability, populating all memory channels (a balanced memory configuration) also allows the greatest interleaving of memory accesses among the channels.
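As a rough illustration of how channel count bounds bandwidth, the sketch below computes the theoretical peak bandwidth of DDR4-2133 for 1, 2 and 4 populated channels. It assumes the standard 64-bit (8-byte) data bus per DDR4 channel and a nominal 2133 million transfers per second; the function name is hypothetical, and the results are theoretical ceilings, not measurements from this paper.

```python
def peak_bandwidth_gbs(channels, mega_transfers_per_s=2133, bus_bytes=8):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    Assumes each populated channel contributes one 8-byte-wide bus
    running at the nominal DDR4-2133 transfer rate.
    """
    return channels * mega_transfers_per_s * 1e6 * bus_bytes / 1e9


for channels in (1, 2, 4):
    print(f"{channels} channel(s): {peak_bandwidth_gbs(channels):.1f} GB/s peak")
```

Sustained bandwidth, as measured by a benchmark, will be lower than these ceilings.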
Figure 1. Memory channels
[Diagram: CPU0 and CPU1, each with four memory channels (Channel 1 through Channel 4).]
Note:
In a dual processor NUMA configuration (default for Z840/Z640), memory accesses are interleaved among the channels on a per-processor basis. With NUMA disabled, memory accesses would be interleaved among the channels across both processors.
Figure 2. Memory bandwidth comparison: channels populated vs. cores accessing memory
[Chart: relative bandwidth (STREAM Triad, axis 0.00 to 1.20) versus number of cores accessing memory (1 to 8), with one series each for 4 channels, 2 channels and 1 channel populated. Data labels shown on the chart: 1.00, 0.56, 0.48, 0.42, 0.28, 0.26.]
Figure 2 uses a STREAM subtest (Triad) to show how system memory bandwidth is affected by the number of memory channels loaded. STREAM is a synthetic benchmark that measures the sustainable memory bandwidth in a system. Figure 2 also shows how the maximum memory bandwidth for a particular memory channel loading is affected by the number of cores accessing memory. With all memory channels loaded, the available memory bandwidth is not saturated until multiple cores are accessing memory simultaneously.
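For readers unfamiliar with the Triad subtest, the sketch below shows the kernel (a[i] = b[i] + q * c[i]) and the byte accounting STREAM uses for it: two reads and one write of an 8-byte value per element. The function names and array size are hypothetical choices for illustration, and this is not the STREAM benchmark itself.

```python
import time
from array import array


def triad(a, b, c, q):
    """STREAM Triad kernel: a[i] = b[i] + q * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]


def triad_bytes(n, word_size=8):
    """Bytes counted per Triad pass: read b, read c, write a."""
    return 3 * word_size * n


def measure_triad(n=200_000, q=3.0):
    """Run one Triad pass and return (GB/s, result array)."""
    a = array("d", [0.0]) * n
    b = array("d", [1.0]) * n
    c = array("d", [2.0]) * n
    start = time.perf_counter()
    triad(a, b, c, q)
    elapsed = time.perf_counter() - start
    return triad_bytes(n) / elapsed / 1e9, a


gbs, _ = measure_triad()
print(f"Triad (interpreted loop): {gbs:.3f} GB/s")
```

STREAM is a compiled benchmark; an interpreted loop like this is dominated by interpreter overhead and reports far less than the hardware can deliver. It is shown only to make the kernel and the bandwidth calculation concrete.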
From this relationship, a general rule can be inferred. Single or lightly threaded applications that do not require or use significant memory bandwidth will experience a smaller performance impact when moving to larger memory DIMMs in conjunction with populating fewer memory channels. Threaded applications that do require or use significant memory bandwidth will see a more significant performance impact from the same change.
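The general rule above can be sketched with a deliberately simple saturation model: achieved bandwidth is the smaller of what the active cores demand and what the populated channels can supply. The per-core and per-channel figures below are hypothetical placeholders, not values from this paper, and the model ignores interleaving and latency effects.

```python
def achievable_bandwidth(cores, channels, per_core_gbs, per_channel_gbs):
    """Simplified model: bandwidth is capped by demand or by supply."""
    demand = cores * per_core_gbs        # what the active cores could consume
    supply = channels * per_channel_gbs  # what the populated channels can deliver
    return min(demand, supply)


# Hypothetical illustrative inputs (assumptions, not measured values).
PER_CORE_GBS = 10.0
PER_CHANNEL_GBS = 17.0

for cores in (1, 8):
    for channels in (4, 2, 1):
        bw = achievable_bandwidth(cores, channels, PER_CORE_GBS, PER_CHANNEL_GBS)
        print(f"{cores} core(s), {channels} channel(s): {bw:.0f} GB/s")
```

In this model a single core never reaches the channel limit, while eight cores are limited by the channels in every configuration, which mirrors the qualitative difference between lightly and highly threaded workloads.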
Application/Benchmark performance comparisons
Application/benchmark performance comparisons were done on specific HP Z840/Z440 memory configurations,
evaluating the performance impact of moving to larger DIMM sizes in conjunction with populating fewer memory channels.
The comparisons look at moving from 4 GB DIMM configurations to 8 GB DIMM configurations, keeping the total memory
capacity equivalent. Similar performance results would be anticipated moving from 8 GB DIMM configurations to 16 GB
DIMM configurations, again keeping the total memory capacity equivalent.
The HP Z840 comparison results are representative of a dual processor HP Z640 configuration, while the HP Z440
comparison results are representative of a single processor configuration in either an HP Z640 or HP Z840 system.
Table 1. Memory configurations evaluated on the HP Z440 and Z840
System | Processor | Total memory | Baseline configuration | Comparison configuration
HP Z440 | Intel® Xeon® E5-1680v3 | 16 GB | 4 x 4 GB DDR4-2133 MHz (4 channels loaded, balanced configuration) | 2 x 8 GB DDR4-2133 MHz (2 channels loaded, unbalanced configuration)
HP Z440 | Intel® Xeon® E5-1680v3 | 8 GB | 2 x 4 GB DDR4-2133 MHz (2 channels loaded, unbalanced configuration) | 1 x 8 GB DDR4-2133 MHz (1 channel loaded, unbalanced configuration)
HP Z840 | Intel® Xeon® E5-2687W | 32 GB | 8 x 4 GB DDR4-2133 MHz (4 channels/proc loaded, balanced configuration) | 4 x 8 GB DDR4-2133 MHz (2 channels/proc loaded, unbalanced configuration)
HP Z840 | Intel® Xeon® E5-2687W | 16 GB | 4 x 4 GB DDR4-2133 MHz (2 channels/proc loaded, unbalanced configuration) | 2 x 8 GB DDR4-2133 MHz (1 channel/proc loaded, unbalanced configuration)
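The arithmetic behind Table 1 can be checked with a short sketch: each comparison doubles the DIMM size and halves the DIMM count, so capacity stays the same while the number of loaded channels per processor halves. The class and field names are hypothetical; the sketch assumes one DIMM per loaded channel with DIMMs split evenly across processors, which matches the channel counts listed in the table.

```python
from dataclasses import dataclass

CHANNELS_PER_PROCESSOR = 4  # each processor supports four DDR4 channels


@dataclass(frozen=True)
class MemoryConfig:
    dimms: int
    dimm_gb: int
    processors: int

    @property
    def total_gb(self):
        return self.dimms * self.dimm_gb

    @property
    def channels_per_processor(self):
        # Assumes one DIMM per channel, spread evenly across processors.
        return self.dimms // self.processors

    @property
    def balanced(self):
        return self.channels_per_processor == CHANNELS_PER_PROCESSOR


# (system, baseline, comparison) rows taken from Table 1.
COMPARISONS = [
    ("HP Z440", MemoryConfig(4, 4, 1), MemoryConfig(2, 8, 1)),
    ("HP Z440", MemoryConfig(2, 4, 1), MemoryConfig(1, 8, 1)),
    ("HP Z840", MemoryConfig(8, 4, 2), MemoryConfig(4, 8, 2)),
    ("HP Z840", MemoryConfig(4, 4, 2), MemoryConfig(2, 8, 2)),
]

for system, base, comp in COMPARISONS:
    print(f"{system}: {base.total_gb} GB, "
          f"{base.channels_per_processor} -> {comp.channels_per_processor} "
          f"channels per processor")
```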
As a reference point, Figure 3 shows the memory bandwidth relationship between a balanced configuration (all memory channels loaded) and an unbalanced configuration (not all memory channels loaded), in both the lightly threaded case and the highly threaded case.
The results across the applications and workloads tested show some cases where the performance impact of memory channel loading is small and other cases where the performance impact is similar to that shown in Figure 3.
Figure 3. HP Z440 memory performance: STREAM (Triad)
[Chart: relative bandwidth (axis 0.00 to 1.20) for the highly threaded case (all cores) and the lightly threaded case (1 core), each with four configurations: 4 x 4 GB DIMMs (4 ch), 2 x 8 GB DIMMs (2 ch), 2 x 4 GB DIMMs (2 ch) and 1 x 8 GB DIMM (1 ch). Data labels shown on the chart: 1.00, 0.56, 0.50, 0.42, 0.28, 0.24.]
HP Z440 comparison results
The Comparison 1 results shown in figures 4, 5, and 6 below are for 16 GB configurations with 4 channels versus 2 channels
loaded. The Comparison 2 results are for 8 GB configurations with 2 channels versus 1 channel loaded. Results are normalized
to the 16 GB, 4 channels loaded configuration. This is done to highlight the performance impact due to channel loading.
In the below comparisons, result variation from 2-3% is considered run-to-run variation and is not considered a true
performance variation.
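The normalization and the run-to-run threshold described above can be sketched as follows. The raw scores are hypothetical placeholders rather than results from this paper, the function names are illustrative, and the 3% threshold is taken from the upper end of the stated 2-3% range.

```python
def normalize(scores, baseline_key):
    """Express every score relative to the baseline configuration."""
    base = scores[baseline_key]
    return {name: value / base for name, value in scores.items()}


def is_true_difference(rel_a, rel_b, noise=0.03):
    """Treat gaps within the noise band as run-to-run variation."""
    return abs(rel_a - rel_b) > noise


# Hypothetical raw benchmark scores (higher is better).
raw = {
    "4 x 4 GB (4 ch)": 200.0,
    "2 x 8 GB (2 ch)": 196.0,
    "1 x 8 GB (1 ch)": 150.0,
}

relative = normalize(raw, "4 x 4 GB (4 ch)")
baseline = relative["4 x 4 GB (4 ch)"]
for name, value in relative.items():
    verdict = "true difference" if is_true_difference(baseline, value) else "within noise"
    print(f"{name}: {value:.2f} ({verdict})")
```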
Figure 4. HP Z440 Product Development
[Chart: relative performance (axis 0.30 to 1.10) for AutoCAD 2014 (total index), SPECapc Solidworks (graphics composite, CPU composite), SPECapc NX8.5 (graphics composite, CPU composite) and SPECwpc (WPCcfd). Comparison 1: 4 x 4 GB DIMMs (16 GB - 4 ch loaded) versus 2 x 8 GB DIMMs (16 GB - 2 ch loaded). Comparison 2: 2 x 4 GB DIMMs (8 GB - 2 ch loaded) versus 1 x 8 GB DIMM (8 GB - 1 ch loaded). Data labels shown on the chart: 1.00, 0.71, 0.40.]
Figure 5. HP Z440 Media and Entertainment
[Chart: relative performance (axis 0.30 to 1.10) for SPECapc Maya 2012 (GFX, CPU) and SPECwpc (Blender, HandBrake). Comparison 1: 4 x 4 GB DIMMs (16 GB - 4 ch loaded) versus 2 x 8 GB DIMMs (16 GB - 2 ch loaded). Comparison 2: 2 x 4 GB DIMMs (8 GB - 2 ch loaded) versus 1 x 8 GB DIMM (8 GB - 1 ch loaded). Data label shown on the chart: 0.95.]
Figure 6. HP Z440 Energy and Life Sciences
[Chart: relative performance (axis 0.30 to 1.10) for the SPECwpc Energy and Life Sciences benchmarks FFTW, SRMP, LAMMPS, NAMD and Rodinia. Comparison 1: 4 x 4 GB DIMMs (16 GB - 4 ch loaded) versus 2 x 8 GB DIMMs (16 GB - 2 ch loaded). Comparison 2: 2 x 4 GB DIMMs (8 GB - 2 ch loaded) versus 1 x 8 GB DIMM (8 GB - 1 ch loaded). Data labels shown on the chart: 1.00, 0.91, 0.85, 0.75, 0.69, 0.63, 0.40.]
HP Z840 comparison results
The Comparison 1 results in Figures 7, 8, and 9 below are for 32 GB configurations with 4 channels versus 2 channels loaded per processor. The Comparison 2 results are for 16 GB configurations with 2 channels versus 1 channel loaded per processor. All results are normalized to the 32 GB, 4 channels loaded per processor configuration to highlight the performance impact of channel loading.
As before, differences of up to 2-3% in the comparisons below are considered run-to-run variation rather than true performance differences.
Figure 7. HP Z840 Product Development
[Chart: relative performance (axis 0.30 to 1.10) for AutoCAD 2014 (total index), SPECapc Solidworks (graphics composite, CPU composite), SPECapc NX8.5 (graphics composite, CPU composite) and SPECwpc (WPCcfd). Comparison 1: 8 x 4 GB DIMMs (32 GB - 4 ch loaded/CPU) versus 4 x 8 GB DIMMs (32 GB - 2 ch loaded/CPU). Comparison 2: 4 x 4 GB DIMMs (16 GB - 2 ch loaded/CPU) versus 2 x 8 GB DIMMs (16 GB - 1 ch loaded/CPU). Data labels shown on the chart: 1.00, 0.78, 0.49.]
Figure 8. HP Z840 Media and Entertainment
[Chart: relative performance (axis 0.30 to 1.10) for SPECapc Maya 2012 (GFX, CPU) and SPECwpc (Blender, HandBrake). Comparison 1: 8 x 4 GB DIMMs (32 GB - 4 ch loaded/CPU) versus 4 x 8 GB DIMMs (32 GB - 2 ch loaded/CPU). Comparison 2: 4 x 4 GB DIMMs (16 GB - 2 ch loaded/CPU) versus 2 x 8 GB DIMMs (16 GB - 1 ch loaded/CPU). Data label shown on the chart: 0.95.]
Figure 9. HP Z840 Energy and Life Sciences
[Chart: relative performance (axis 0.30 to 1.10) for the SPECwpc Energy and Life Sciences benchmarks FFTW, SRMP, LAMMPS, NAMD and Rodinia. Comparison 1: 8 x 4 GB DIMMs (32 GB - 4 ch loaded/CPU) versus 4 x 8 GB DIMMs (32 GB - 2 ch loaded/CPU). Comparison 2: 4 x 4 GB DIMMs (16 GB - 2 ch loaded/CPU) versus 2 x 8 GB DIMMs (16 GB - 1 ch loaded/CPU). Data labels shown on the chart: 1.00, 0.91, 0.88, 0.84, 0.82, 0.74, 0.73, 0.61, 0.58, 0.40.]
Results discussion
The comparison results shown above hold to the general rule: single or lightly threaded applications and workloads that do not require or use significant memory bandwidth experience a smaller performance impact when moving to larger memory DIMMs in conjunction with populating fewer memory channels. This can be seen in some of the Product Development results, namely Autodesk® AutoCAD, Solidworks and NX. These applications are lightly threaded, and the particular workloads used for these runs use very little of the available memory bandwidth, so any performance difference due to memory channel loading would be expected to be small.
Of the Media and Entertainment applications and workloads evaluated, HandBrake is one that showed sustained use of all the available cores and consistent, although not high, use of memory bandwidth. HandBrake performance begins to be noticeably affected by memory channel loading when only a single memory channel is loaded per processor.
In contrast, the WPCcfd results shown under Product Development show a significant performance impact when moving to larger memory DIMMs while populating fewer memory channels. This workload makes sustained use of all the available cores and uses significant memory bandwidth. The same pattern can be seen in the FFTW, SRMP, LAMMPS and Rodinia results shown under the Energy and Life Sciences benchmarks.
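One way to reason about why the impact varies so widely is an Amdahl-style estimate: only the portion of run time that is limited by memory bandwidth slows down when bandwidth drops. This model is an assumption introduced here for illustration, not a method used in this paper, and the fractions and bandwidth ratio below are hypothetical.

```python
def relative_performance(bandwidth_bound_fraction, bandwidth_ratio):
    """Estimate performance relative to the baseline configuration.

    bandwidth_bound_fraction: share of baseline run time limited by
        memory bandwidth (0.0 to 1.0).
    bandwidth_ratio: available bandwidth relative to baseline
        (for example 0.5 if half the bandwidth remains).
    """
    f = bandwidth_bound_fraction
    return 1.0 / ((1.0 - f) + f / bandwidth_ratio)


# Hypothetical workloads at half the baseline bandwidth.
for label, fraction in (("lightly bandwidth-bound", 0.1),
                        ("heavily bandwidth-bound", 0.9)):
    print(f"{label}: {relative_performance(fraction, 0.5):.2f}")
```

Under these assumptions, a workload that spends little time waiting on memory bandwidth loses only a few percent, while one that is mostly bandwidth-bound approaches the bandwidth ratio itself.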
Conclusion
For the best memory performance, it is necessary to populate all memory channels on the processors. However,
there are applications and workloads where the performance impact from populating fewer memory channels will be
less apparent.
As a general rule, single or lightly threaded applications that do not require or use significant memory bandwidth will
experience a smaller performance impact moving to larger memory DIMMs, in conjunction with populating fewer
memory channels. Threaded applications that do require or use significant memory bandwidth will see a more
significant impact moving to larger memory DIMMs, in conjunction with populating fewer memory channels.
Resources, contacts, or additional links
hp.com/go/whitepapers
Learn more at
hp.com/zworkstations
Sign up for updates
hp.com/go/getupdated
© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only
warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein
should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.
Intel and Xeon are trademarks of Intel Corporation in the U.S. and other countries. Autodesk and AutoCAD are registered trademarks or
trademarks of Autodesk, Inc., and/or its subsidiaries and/or affiliates in the USA and other countries.
4AA5-6052EEW, January 2015