
How to Create a Robust Platform for SAS Grid Computing with the Sun Blade
6000 Modular System and the Sun ZFS Storage 7420 Appliance
by Maureen Chew
September 2011
Contents:
Abstract
Introduction
Solution Architecture
Test Configuration and Optimization
Test Workload #1: SAS Grid Mixed Analytic Workload
Performance Results: SAS Grid Mixed Analytic Workload
Test Workload #2: Concurrent SAS Data Step Microbenchmark
Performance Results: Concurrent SAS Data Step Microbenchmark
Comparing Network Fabrics
I/O Considerations
Conclusion
Resources
Introduction
Many SAS solutions are moving towards cloud and/or grid-enabled deployment architectures
that require horizontally scaled servers and a high-performance shared file system. As SAS
Grid Computing environments grow, the bottleneck is often I/O throughput from the shared file
system. This article describes a flexible architecture that is designed and proven to meet the
most demanding SAS Grid Computing workloads with a cost-effective solution. It provides
performance characterizations for two SAS applications that compare three different network
configurations with Oracle’s Sun Blade 6000 modular system and Oracle’s Sun ZFS Storage
7420 appliance.
Blade servers are a natural fit for SAS Grid Computing deployments because the
environment lends itself toward a horizontally scaled server architecture where server density,
manageability, and cost efficiencies are top IT considerations for running batch SAS jobs.
As the SAS Grid Computing environment grows, however, one of the key challenges is I/O
throughput for the shared file system. SAS Grid Computing requires a shared file system
rather than block-based storage as in a Storage Area Network (SAN). For small SAS
installations, a simple NFS file server works well, but these file servers typically cannot scale
to deliver the I/O performance needed for SAS Grid Computing.
Many enterprises with large deployments have, therefore, been compelled to move to cluster
file systems, which add cost and complexity to the solution architecture. In a typical cluster file
system, for example, every client node (SAS compute node) must have a host bus adapter
(HBA) card installed to connect it to the shared Fibre Channel storage. Maintaining the cluster
file system and tuning it for high performance also requires a highly trained storage
administrator, adding to the total cost of ownership.
SAS and Oracle have designed a cost-effective architecture that delivers enough storage I/O
throughput and blade server processing power to meet even the most demanding SAS Grid
Computing needs. The architecture is based on blade servers and unified storage platforms
from Oracle that are simple to deploy and manage.
Solution Architecture
The solution architecture takes advantage of the Sun Blade 6000 modular system and the
Sun ZFS Storage Appliance for high scalability and simplicity of deployment and
management. As shown in Figure 1, a variety of network configurations can be deployed
depending on the application needs.
Figure 1. SAS Grid Computing Solution Architecture
The Sun Blade 6000 modular system was chosen for its extreme flexibility and maximum
performance in a small footprint with easy manageability. It supports a wide choice of
compute and I/O modules, enabling SAS customers to scale the capacity of both processing
power and I/O throughput with fine or coarse granularity.
SAS Grid Computing requires a shared file system and the Sun ZFS Storage Appliance is an
excellent fit, because it offers the performance of a complex cluster file system while providing
the simplicity of an NFS file server. The Sun ZFS Storage Appliance presents a common
shared file system to all SAS Grid Computing nodes through NFS. (Other protocols, such as
CIFS, HTTP, FTP, WebDAV, and iSCSI, are also available and are included in the appliance at
no additional cost.) Sun ZFS Storage Appliances come preconfigured and ready to run, so
they can often be deployed in minutes. Their excellent storage monitoring capabilities also
offer an important advantage for SAS Grid Computing environments.
DTrace Analytics in Sun ZFS Storage Appliances provides the ability to drill down into a
detailed view of I/O traffic, as illustrated by the screenshots in the performance results
sections below. It can provide valuable insight at a comprehensive level about I/O resources
per node, per job, and so on. This real-time access to detailed analytics helps SAS
administrators stay on top of usage patterns and system performance in their SAS
environment as workloads change, and it also enables them to quickly identify and fix
performance issues. In addition, knowledge about I/O traffic patterns enables accurate
planning for future storage capacity and I/O throughput requirements.
Implementing the Network Fabric
There are four options for networking between the Sun Blade 6000 modular system and the
Sun ZFS Storage Appliance:
• Single Gigabit Ethernet (GbE): Provides limited bandwidth for small SAS Grid Computing
environments. (For reference, a single GbE link provides approximately 100 MB/sec
throughput.)
• 10 GbE: Provides adequate bandwidth for SAS Grid Computing environments that have a
medium amount of I/O traffic.
• Quad 10 GbE: Provides maximum bandwidth and enables customers to keep their
architecture consistent with other in-house deployments that might be based on Ethernet
technology.
• Quad Data Rate (QDR) InfiniBand: Provides a switched fabric communications link often
used in high-performance computing and enterprise datacenters. Its features include high
throughput, low latency, quality of service, failover, and scalability.
Figure 2 illustrates an aggregation of four Ethernet ports into a single logical datalink. In this
example, the I/O devices with the names ixgbe0, ixgbe1, ixgbe2, and ixgbe3 have been
aggregated into a logical interface called aggr1.
As annotated on Figure 2, when a link is selected, the Sun ZFS Storage Appliance
management interface automatically highlights all the components that comprise that link,
making it easy for administrators to understand the network configuration. When data is sent
over the aggregated link, the Sun ZFS Storage Appliance works with the network switch to
deliver the I/O packets across all four Ethernet links.
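For readers who manage link aggregations from a plain Oracle Solaris host rather than through the appliance's browser-based management interface, the following sketch shows how a comparable four-port aggregation might be expressed with dladm. This is illustrative only: the interface and aggregation names simply mirror Figure 2, the appliance itself builds aggr1 through its management interface, and the exact dladm syntax varies by Solaris release.
# Illustrative sketch (not how the appliance is configured): create a
# four-port aggregation with an L3 (IP-based) hashing policy on an
# Oracle Solaris 11 host, then list the member ports and their state.
dladm create-aggr -P L3 -l ixgbe0 -l ixgbe1 -l ixgbe2 -l ixgbe3 aggr1
dladm show-aggr -x aggr1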
Figure 2. Network Configuration and Link Aggregation
Implementing the Ethernet connections on the Sun Blade 6000 modular system is
accomplished by using the Sun Blade 6000 Ethernet Switched Network Express Module
(NEM) 24p 10GbE to provide a 10 GbE non-blocking concurrent switching network fabric. The
Sun Blade 6000 Ethernet Switched NEM 24p 10GbE is integrated into the Sun Blade 6000
chassis, enabling all server blades in the chassis to have access to the Ethernet interface.
For more information, see the “Sun Blade 6000 I/O and Management Architecture” white
paper, which is available on the Oracle Technology Network.
The InfiniBand network fabric consists of the Sun Datacenter InfiniBand Switch 36, 10 Sun
dual-port 4x QDR PCIe modules (one for each Sun Blade X6270 M2 server module), a 4x
QDR PCIe card for the Sun ZFS Storage 7420 appliance (dual-ported to the switch), and
associated cabling.
Test Configuration and Optimization
In the test configuration, the Sun Blade 6000 modular system was deployed with the following
hardware and software components:
• Ten Sun Blade X6270 M2 server modules, each configured with two 6-core, 3.33 GHz Intel
Xeon X5680 processors and 144 GB of memory
• Sun Blade 6000 Ethernet Switched Network Express Module (NEM) 24p 10 GbE for the 10
GbE network fabric
• Oracle Solaris 10 9/10 on each Sun Blade X6270 M2 server module (note that Oracle
Linux and Microsoft Windows are supported as well; see Appendix B, “Supported
Operating Systems,” in the Sun Blade X6270 M2 Server Module Installation Guide for
Windows Operating Systems)
• SAS Grid 9.2M3
• Sun ZFS Storage 7420 appliance
The Sun ZFS Storage 7420 appliance was configured as follows:
• Head node: 256 GB RAM, two internal disk drives, and four 477 GB flash drives mirrored to
yield almost 1 TB of usable flash-enabled read cache
• Disk shelves: Two disk shelves, each with 20 hard disk drives (HDDs) of 1.8 TB each
(SAS-2, 7200 rpm) and four 18 GB flash drives providing approximately 68 GB of write cache.
A single mirrored pool was created using 19 HDDs from each shelf (one HDD per shelf was kept
as a spare), with each drive mirrored to its twin in the other shelf and the flash write-cache
devices mirrored in the same way. This yields a pool of 33 TB of usable space, as
illustrated by Figure 3. (A conceptual command-line sketch of this layout follows this list.)
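The appliance configures its storage pool entirely through its management interface, but for readers more familiar with command-line ZFS, a rough equivalent of the layout described above might look like the following. This is a conceptual sketch only: the pool and device names are illustrative, and only the first two of the 19 HDD mirror pairs are shown.
# Conceptual sketch only; the appliance does this through its BUI.
# Each HDD is mirrored against its twin in the other shelf, the flash
# write-cache (log) devices are mirrored the same way, and one HDD per
# shelf is kept as a spare. Device names are illustrative.
zpool create saspool \
    mirror c1t0d0 c2t0d0 \
    mirror c1t1d0 c2t1d0 \
    log mirror c1t20d0 c2t20d0 \
    spare c1t22d0 c2t22d0
zfs create saspool/data
zfs create saspool/apps
zfs create saspool/work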
Disk mirroring is a desirable configuration for SAS environments that require extra data
protection. The test results described later in this document illustrate that excellent I/O
throughput can be achieved with mirroring on the Sun ZFS Storage 7420 appliance, making
mirroring a viable option for SAS users.
Figure 3. Storage Pool Configuration
Three file systems were allocated from the single ZFS mirrored pool:
• /data
• /apps
• /work
The SAS work file system (/work) contains the SASWORK directory of temporary runtime
files. These files are heavily used by SAS applications and are typically deployed on a local
disk on each SAS Grid node to avoid network and storage I/O contention.
For this test, the SASWORK directory was purposely placed on the Sun ZFS Storage 7420
appliance in order to drive the appliance's I/O throughput as high as possible. The test
results below show that the appliance can be a viable and practical location for the SASWORK
directory, even over NFS-based storage. This is especially useful for large and volatile
SASWORK directories that need to be available to the general population of SAS users.
Multiple Network Configurations
Both of the workloads were tested with both 10 GbE networking and an InfiniBand network
fabric.
Each of the 10 Sun Blade X6270 M2 server modules contained both a single 10 GbE
interface and an InfiniBand interface. The network configuration was selected by simply
remounting the client NFSv3 file systems against the Sun ZFS Storage 7420 appliance
mount points assigned to the desired network interface.
All other variables (application, workload, servers, and so on) remained unchanged for all the
network configurations tested.
Tuning and Optimization
To optimize performance, the following tuning modifications were done on each server node.
The following changes were made to the /kernel/drv/ixgbe.conf file for the 10 GbE
driver:
default_mtu=9000;
intr_throttling=1;
Jumbo Frames were enabled through the default_mtu setting. (See “Configure Jumbo
Frames in Solaris OS” in the “Configuring the Driver Parameters” section of the Sun Dual
10GbE SPF+ PCIe 2.0 Low Profile Adapter User's Guide.) A value of 9000 was used because
that matches the Jumbo Frame size on the Sun ZFS Storage 7420 appliance.
The intr_throttling setting is related to disabling interrupt blanking. With interrupt
blanking enabled, interrupts are grouped or batched together. (See “ixgbe Parameters” in the
“Network Driver Parameters” section of Chapter 2, “Oracle Solaris Kernel Tunable
Parameters” in the Oracle Solaris Tunable Parameters Reference Manual.)
The following change was made to the InfiniBand /kernel/drv/ibd.conf file. (See the ibd(7D)
man page in man pages section 7: Device and Network Interfaces.)
enable_rc=1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0;
The "1" entries correspond to the two InfiniBand ports on each HBA card and specify the use
of connected mode. Connected mode for IP over InfiniBand (IPoIB) can provide better
performance than Datagram IPoIB.
Additionally, the following NFS changes were made in /etc/system, which set the NFS client
logical buffer size and read-ahead operations. (See Chapter 3, “NFS Module Parameters,” of
the Oracle Solaris Tunable Parameters Reference Manual.)
set nfs:nfs3_bsize=131072
set nfs:nfs3_nra=32
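As a quick sanity check, once the driver and /etc/system changes have taken effect (after a reboot), the settings can be confirmed on a grid node. These commands are our suggestion rather than part of the documented test procedure, and the interface name and mount point are the ones used elsewhere in this article.
# Confirm the interface is running with the 9000-byte jumbo-frame MTU.
ifconfig ixgbe0 | grep mtu
# Show the mount options actually negotiated for an NFS file system,
# including the read/write transfer sizes and protocol version.
nfsstat -m /data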
In order to switch testing between 10 GbE and InfiniBand, we unmounted and remounted the
file systems with the correct NFS server assignments. For instance, /etc/vfstab would have
the following entries for 10 GbE:
s7420-011ge2:/export/sasgeno_data  -  /data   nfs  -  yes  vers=3,noxattr
s7420-011ge2:/export/sasgeno_swsg  -  /apps   nfs  -  yes  vers=3,noxattr
s7420-011ge2:/export/sasgeno_work  -  /work1  nfs  -  yes  vers=3,noxattr
Or it would have entries similar to the following for InfiniBand:
s7420-011ib0:/export/sasgeno_data  -  /data  nfs  -  yes  vers=3,noxattr,proto=tcp
s7420-011ib0:/export/sasgeno_swsg  -  /apps  nfs  -  yes  vers=3,noxattr,proto=tcp
s7420-011ib0:/export/sasgeno_work  -  /work  nfs  -  yes  vers=3,noxattr,proto=tcp
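The switch between fabrics therefore reduces to an unmount, an edit of /etc/vfstab, and a remount on each node. A minimal sketch, using the mount points and appliance host names from the entries above:
# Run on each grid node while no SAS jobs are active.
umount /data
umount /apps
umount /work
# ... edit /etc/vfstab so the three entries point at s7420-011ib0
#     (InfiniBand) instead of s7420-011ge2 (10 GbE), or vice versa ...
mount /data
mount /apps
mount /work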
Test Workload #1: SAS Grid Mixed Analytic Workload
The SAS Grid Mixed Analytic workload represents the type of SAS processing commonly
seen in grid-based SAS applications today. More than 130 resource-intensive jobs are
submitted to the grid. The number of concurrent jobs per node is configurable by SAS Grid
Manager. For this workload test, 40 concurrent jobs were selected. Many of the I/O intensive
tests in the SAS Grid Mixed Analytic workload involve multiple passes through extremely
large data sets.
This workload heavily utilizes the SAS data step and many commonly used procedures in
Base SAS and SAS/STAT. Common characteristics of this type of workload include the
following:
• Many simultaneous SAS jobs/users (both batch and interactive)
• Many passes through the data
• Predominately sequential file access
• Data stored in text, binary, or SAS data set formats
• Wide range of data volumes (1 MB to 50+ GB input/output files)
• Mixed computational and data processing workload
The test simulates roughly 30 CPU hours that represent a typical workday for a team of
analytic users. The jobs represent the work of the following user personas:
• Data integration tasks
• Advanced analytics
• Business analyst
Table 1. Workload Details

Workload Characteristic    Description
SAS PROCs used             MIXED, RANK, LOGISTIC, REG, GLM, SORT, SUMMARY, FREQ, MEANS, SQL
Data input                 Approximately 50 distinct files totaling almost 600 GB; 24 of the files are between 12 GB and 50 GB.
Data output                Since total I/O is roughly divided evenly between input and output, the output is written to both SASWORK and designated output directories.
Performance Results: SAS Grid Mixed Analytic Workload
The SAS Grid Mixed Analytic workload comprises more than 130 SAS applications that are
submitted to the grid with predefined timing delays to represent a typical period of heavy
usage. The cumulative total of all the job runtimes represents 30 CPU hours, but with the
power of the grid, the test is completed in roughly 45 minutes.
For the SAS Grid Mixed Analytic workload, the highest overall throughput was delivered with
the InfiniBand network fabric. As Figure 4 illustrates, peak throughput results for the various
network fabrics were as follows:
• InfiniBand: 3.05 GB/sec
• Quad 10GbE link aggregation: 2.43 GB/sec
• Single 10 GbE interface: 2.01 GB/sec
Figure 4. NFS Network Fabric I/O Performance Comparison for the SAS Grid Mixed Analytic Workload
This performance data was gathered using DTrace Analytics, the industry’s only
comprehensive and intuitive storage analytics environment. The screenshots in Figures 5 and
6 below were taken with DTrace Analytics on the Sun ZFS Storage 7420 appliance, along
with other data, to build a complete I/O profile for the SAS Grid Mixed Analytic workload.
Figure 5 shows the I/O throughput for the InfiniBand configuration, and Figure 6 shows the
I/O throughput for the quad 10 GbE test configuration.
Note that DTrace Analytics plotted the throughput broken down by direction ("in" = write, "out"
= read), enabling administrators to see the balance between reads and writes. The I/O
throughput is plotted over time, giving administrators the ability to see when bursts of I/O
operations happen and whether the peak throughput is a sharp spike or a gradual build-up.
While there are significant peaks in both cases, we see nice sustained throughput for both the
InfiniBand and the quad 10 GbE configurations.
Figure 5. Peak I/O Throughput for InfiniBand (Plotted with DTrace Analytics)
Figure 6. Peak I/O Throughput for Quad 10 GbE (Plotted with DTrace Analytics)
DTrace Analytics also enables I/O to be broken down by a wide range of additional variables,
such as client, file name, packet size, and more. A breakdown of activity by client, for
example, could help an administrator determine whether a particular node behaved
unexpectedly or whether it was scheduled with jobs that had unbalanced I/O requirements.
Test Workload #2: Concurrent SAS Data Step Microbenchmark
In order to demonstrate high throughput and scalable I/O, a Concurrent SAS Data Step
microbenchmark was used in addition to the SAS Grid Mixed Analytic workload. This test
consists of running multiple instances of a single, write-intensive SAS data step that outputs
10 million observations. Each observation is 824 bytes (consisting of 103 variables), and 10
million observations result in an 8.3 GB data set. Throughput is increased by running 10
instances concurrently on 10 nodes for a total of 100 concurrent jobs. This microbenchmark is
almost exclusively doing write I/O.
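The following shell sketch illustrates how the 10 concurrent jobs on a single node might be launched. The SAS program name (write10m.sas) and the directory layout are hypothetical, and the actual benchmark was driven by the test harness rather than this hand-rolled script; the sas command-line options shown (-sysin, -work, -log) are standard SAS options on UNIX systems.
#!/bin/sh
# Hypothetical sketch: start 10 copies of the write-intensive data step
# program (write10m.sas is an assumed file name) on this node, each with
# its own SASWORK directory under the shared /work file system.
node=`hostname`
i=1
while [ $i -le 10 ]; do
    mkdir -p /work/$node/job$i
    sas -sysin /apps/bench/write10m.sas \
        -work /work/$node/job$i \
        -log /work/$node/job$i/run.log &
    i=`expr $i + 1`
done
wait    # return only after all 10 jobs on this node have completed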
The microbenchmark is realistic in the sense that it represents typical SAS data step code.
Yet it is contrived in the sense that all jobs are launched simultaneously under controlled
circumstances. The benchmark’s usefulness is in generating anecdotal performance proof
points for a simple and known workload that is uniformly distributed in a predictable pattern.
Performance Results: Concurrent SAS Data Step Microbenchmark
The aim of the Concurrent SAS Data Step microbenchmark was to compare NFS over
different network fabrics and to demonstrate I/O scalability.
Figure 7 shows the average time for completion of 100 identical SAS jobs when using
SASWORK over NFS via the single 10 GbE interface, the quad 10 GbE link aggregation
interface, and the InfiniBand interface.
Performance was roughly 10% better for the Sun ZFS Storage 7420 appliance’s quad 10 GbE
interface. Although performance averages were better for the quad 10 GbE interface, we did
see higher peak throughputs for InfiniBand, as shown in Figure 8.
Figure 7. Average Job Completion Times Across the Three Different NFS Network Fabrics
Figure 8. Peak I/O Throughput for the Three Different NFS Network Fabrics
DTrace Analytics provides additional insight into the I/O performance differences between 10
GbE and InfiniBand. Figure 9 shows an I/O profile for a quad 10 GbE test followed by a run
over InfiniBand. While the I/O profile runtimes are close in length, notice that the InfiniBand
profile on the right is denser and exhibits a higher peak throughput.
Figure 9. I/O Profile for Quad 10 GbE Test Versus InfiniBand Test
Comparing Network Fabrics
For a contextual baseline, maximum throughput performance expectations for different
network fabrics are on the order of the following:
• 1 GbE: Approximately 100 MB/sec
• 10 GbE: Approximately 1 GB/sec
• InfiniBand: Approximately 3 to 3.5 GB/sec
Many factors affect InfiniBand throughput, such as cable type, switch, host adapter, and so
on. In the tested configuration, 3 GB/sec is a reasonable estimate of the practical upper limit
due to the unidirectional limit of the InfiniBand network card that was used.
Under perfect conditions, the quad 10 GbE link aggregation (4 x ~1 GB/sec = 4 GB/sec) might
be expected to outperform the InfiniBand configuration by roughly
20%. The results for this microbenchmark do show better performance for the quad 10 GbE
link aggregation. However, the difference is on the order of 5% to 10% rather than 20%. As
described in the upcoming "I/O Considerations" section, the number of disk spindles in the
tested configuration appears to have been a key limiting factor in scaling I/O throughput. This
factor would tend to equalize performance across the two network fabrics.
Link Aggregation
Figure 10 confirms the effectiveness of the 10 GbE link aggregation. For background, in the
10 GbE link aggregation, connections are multiplexed using an administrator-designated
strategy (for example, hash based on MAC, IP, MAC+IP, and so on). Since the test
configuration has 10 blades and four potential 10 GbE interfaces, a strategy was devised to
target an even distribution.
Since there were 10 nodes and network interface assignments are made on a per-node basis, the
best-case scenario would be for each of the four network interfaces in the link aggregation to be
assigned either two or three nodes. The actual distribution was as follows:
• ixgbe0: Two nodes
• ixgbe1: Four nodes
• ixgbe2: Two nodes
• ixgbe3: Two nodes
Figure 10 shows two views from DTrace Analytics for I/O traffic when 10 jobs were run on
each of the 10 different nodes in the SAS Grid Computing environment.
The top chart shows the aggregated logical interface broken down by the actual NICs
assigned. (NIC assignments were added on top of the screenshot for clarity.) In the naming
scheme (for example, n2/be1), n2 represents traffic to or from node 2 and be1 represents
interface ixgbe1 on the Sun ZFS Storage 7420 appliance.
The bottom graph shows a color-coding of each of the 10 GbE NICs that comprise the link
aggregation.
The hashing scheme used was L3, which is based on the IP addresses of the nodes and
resulted in an almost perfectly balanced workload. If one of the nodes assigned to ixgbe1 had
been assigned to any of the other interfaces, the distribution would have been perfect. DTrace
Analytics provided a simple and easy way to visualize the link aggregation distribution and
determine its efficiency.
Figure 10. Two Views of I/O Traffic from DTrace Analytics
Network Bandwidth
As mentioned previously, the stated maximum bandwidth is approximately 4 GB/sec for the
quad 10 GbE link aggregation and 3 GB/sec to 3.5 GB/sec for a single InfiniBand PCIe card.
Using the link aggregation interface, we can see the DTrace Analytics traces for a pure
network bandwidth test.
For the traffic flowing "out" (green) from the Sun ZFS Storage 7420 appliance, read-only
applications were run from each of the blades to ensure that all reads were served from the
appliance's read cache. For the traffic flowing "in" to the Sun ZFS Storage 7420 appliance, a
public domain application, netperf, was used to write over the network interface. Thus, none
of the traffic flowing over the network interface resulted from disk I/O.
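A minimal sketch of the two directions follows. The file name is hypothetical, the read side simply re-reads a file that is already resident in the appliance's read cache, and the netperf invocation assumes a netperf server endpoint is reachable on the appliance's data address; the actual test scripts may have differed.
# "Out" of the appliance: stream a cached file so the reads are served
# from the read cache rather than from disk (file name is illustrative).
dd if=/data/cached_input.sas7bdat of=/dev/null bs=1024k
# "In" to the appliance: generate a pure TCP write stream for 5 minutes.
netperf -H s7420-011ge2 -t TCP_STREAM -l 300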
Figure 11 shows that both the read-only and write-only tests can sustain 4+ GB/sec,
demonstrating excellent network performance. This matches our expected maximum network
throughput.
Figure 11. Read-Only and Write-Only Tests Sustain 4+ GB/sec
I/O Considerations
Recall that we had two disk shelves in a mirrored configuration, which is a typical enterprise
deployment configuration. Each shelf has 24 drives. In our test configuration, one drive was
reserved for a spare, and four drives were flash-based and were used as write cache devices.
Assuming that a 7200 rpm drive can write approximately 50 MB/sec, we would expect the
baseline write performance for a single shelf to be around 1 GB/sec (19 drives * 50 MB/sec =
950 MB/sec). In actuality, with the unique hybrid storage features of the Sun ZFS Storage
7420 appliance, we were able to realize 2 to 3 times that in peak throughput.
Scaling I/O Performance
Despite the excellent throughput (the Concurrent SAS Data Step microbenchmark wrote a
total of 800 GB in 10 minutes) and consistent performance (each of the 100 jobs completed in
very nearly the same time), having only 38 spindles available for the 100 SAS I/O jobs was
deemed to be a limiting factor in the mirrored configuration.
Since the storage configuration was fixed, the only way to test the theory about spindles
limiting the throughput was to break the ZFS mirror and surface the mirror as an alternate
pool. While it is never recommended that you run your storage without any redundancy, we
were able to validate and affirm the I/O scalability of the Sun ZFS Storage 7420 appliance by
sending half the jobs (50 total from 5 nodes) to one pool/mirror and the other half to the other
pool.
By splitting the ZFS pool and effectively doubling the storage configuration, performance was
nearly doubled, as shown in Figure 12. The average time for each of 10 jobs running
simultaneously on 10 nodes went from approximately 9 minutes to approximately 5 minutes.
Figure 12. Completion Times for 100 SAS Jobs over InfiniBand with One Disk Shelf Versus Two Disk
Shelves
Additionally, the I/O profile in Figure 13 shows that the runtimes are much shorter with two
disk shelves and that throughput is significantly higher with less wait time (the graph is denser).
Figure 13. I/O Profile for 100 SAS Jobs over InfiniBand with One Disk Shelf (Left) Versus Two Disk
Shelves (Right)
This large difference in performance when using one versus two disk shelves was a strong
indication that the storage configuration was the limiting factor on overall throughput. To
test this hypothesis and to confirm that there was excess CPU capacity, the 100-job
Concurrent SAS Data Step microbenchmark was run again with just two nodes.
For this test, rather than running 10 jobs on each of 10 nodes, we ran 50 jobs on each of two
nodes. Average performance time for this configuration was only slightly worse than when the
100 jobs were distributed over 10 nodes. Since two nodes could almost keep up with the
storage I/O, this confirms that there was excess CPU capacity and that storage I/O was likely
the limiting factor. This also explains why only four SAS Grid Manager job slots were
allocated per node in the SAS Grid Mixed Analytic workload: because the data sets were so
large, limiting per-node concurrency delivered better peak performance and overall throughput.
From this series of tests we can conclude that additional disk shelves would have created a
more balanced configuration that would have scaled I/O throughput even higher.
Conclusion
The combination of Oracle’s Sun Blade 6000 modular system with Sun Blade X6270 M2
server modules from Oracle, Oracle’s Sun ZFS Storage 7420 appliance, and the Oracle
Solaris 10 operating system provides an ideal environment for SAS Grid Computing
applications. It offers robust, enterprise reliability with scalable I/O and compute performance.
The SAS workloads tested in our labs provided an extremely strenuous test for SAS Grid
Computing, stressing both the compute and I/O aspects of the architecture. The results
showed the following:
• The blades provided excellent performance. Thirty compute hours in the SAS Grid Mixed
Analytic workload finished in approximately 45 minutes.
• A mirrored pool across two disk shelves is capable of delivering more than 3 GB/sec of I/O
throughput.
• High performance network attached storage can be achieved with several network fabric
backbones, enabling flexibility for SAS Grid Computing applications that require a shared
file system. Tests using a single PCIe, dual-ported InfiniBand connection from the Sun ZFS
Storage 7420 appliance performed roughly on par with a quad 10 GbE link aggregation,
despite the expectation that the quad 10 GbE might perform 20% to 25% better.
• The configuration demonstrated scalable I/O. Effectively doubling the storage configuration
almost doubled the performance in the Concurrent SAS Data Step test. The average job
completion time for the 100 concurrent jobs was almost halved (from approximately 9
minutes to about 5 minutes).
A look at system resource utilization during the tests showed that there was excess capacity
from a CPU and network perspective. The network-only throughput test showed excess
capacity and the link aggregation was shown to be capable of approximately 4 GB/sec of
throughput.
The demonstration of I/O scalability is clear and simple. The Concurrent SAS Data Step
microbenchmark scaled almost linearly when the storage resources were doubled.
The choice of an appropriate configuration for storage is enterprise-specific and all factors,
such as the total of the grid node compute resources, the number of concurrent users, the I/O
utilization and heuristics, and the profile of the applications themselves, have to be taken into
consideration. Choosing the right storage configuration for your environment is often
difficult, because business demands are dynamic and workloads are
unpredictable.
The Sun ZFS Storage 7420 appliance provides great flexibility in terms of configuration,
enabling you to start with a relatively modest configuration and grow the storage system as
needed.
To balance CPU performance with storage I/O performance, additional disk shelves would
have been helpful. In many cases, this would mean excess storage capacity, but the
additional disk spindles would provide greater I/O throughput. From our review of the test
data, five or six disk shelves (or 10 to 12, if mirrored for availability) would have provided a
more balanced configuration.
Every deployment is unique. While storage is one of the more complicated components of the
overall system configuration, a generous storage configuration can also provide the best
"insurance" against sizing and capacity planning uncertainty.
If both compute node memory and storage I/O throughput are generously configured, this
provides an insurance mechanism that enables the configuration to adequately handle spikes
in usage patterns. Since over-sizing these components does not generally affect software
licensing costs for SAS or for Sun ZFS Storage Appliances, this risk-averse approach is still
cost-effective.
The fact that this solution architecture is based on NFS also provides tremendous IT flexibility
and eases the administrative burden. A well-thought-out architecture can support major
changes under the covers with little to no downtime. For example, moving a client to a
different network interface could be as simple as remapping the NFS server IP address and
host name and re-mounting the file system.
Providing the right balance of configuration, sizing, and cost tradeoffs requires a multi-faceted
knowledge base. The performance characterizations discussed in this article are intended to
provide insight to make it easier for you to formulate the appropriate criteria for sizing and
configuration exercises.
Resources
Here are resources referenced earlier in this document:
• Sun Blade 6000 modular system:
http://www.oracle.com/us/products/servers-storage/servers/blades/030803.htm
• Sun ZFS Storage Appliance:
http://www.oracle.com/us/products/servers-storage/storage/unified-storage/index.html
• Sun Blade 6000 Ethernet Switched Network Express Module (NEM) 24p 10GbE:
http://www.oracle.com/us/products/servers-storage/networking/ethernet/058289.html
• “Sun Blade 6000 I/O and Management Architecture” white paper:
http://www.oracle.com/technetwork/articles/systems-hardware-architecture/sb6000-io-mgmt-400403.pdf
• Oracle Technology Network: http://www.oracle.com/technetwork/index.html
• Sun Blade X6270 M2 server module: http://www.oracle.com/us/products/servers-storage/servers/blades/sun-blade-x6270-m2-080061.html
• Sun Blade X6270 M2 Server Module Installation Guide for Windows Operating Systems:
http://download.oracle.com/docs/cd/E19474-01/
• SAS Grid 9.2M3: http://www.sas.com/technologies/architecture/grid/
• Sun ZFS Storage 7420 appliance: http://www.oracle.com/us/products/servers-storage/storage/unified-storage/ocom-sun-zfs-storage-7420-appliance-171635.html
• Sun Dual 10GbE SPF+ PCIe 2.0 Low Profile Adapter User's Guide:
http://download.oracle.com/docs/cd/E19407-01/
• “ixgbe Parameters” in the “Network Driver Parameters” section of Chapter 2, “Oracle
Solaris Kernel Tunable Parameters” in the Oracle Solaris Tunable Parameters Reference
Manual: http://download.oracle.com/docs/cd/E19963-01/index.html
• ibd(7d) in the “Devices and Network Interfaces” section of man pages section 7: Device
and Network Interfaces: http://download.oracle.com/docs/cd/E19253-01/index.html
• Chapter 3, “NFS Module Parameters,” of the Oracle Solaris Tunable Parameters
Reference Manual: http://download.oracle.com/docs/cd/E19253-01/index.html
For more information on Oracle and SAS, visit oracle.com/sas.
Revision 1.1, 09/07/2011