Sample Hardware Configuration for Structural Analysis (ANSYS):
Penguin Computing 64-Core CAE Cluster

Operating Environment
  Scyld ClusterWare 4: Manage a cluster like a single system, minimal
  overhead on compute nodes

Head Node
  Single point of control for Scyld ClusterWare / NFS server
  Options:
    1 x Relion Intel 2612: Dual processor, Dual/Quad-Core Xeon CPU
    52XX/54XX, redundant power supply, up to 12 hot-swappable SATA or
    SAS hard drives
    1 x Altus AMD 2650: Dual processor, Dual/Quad-Core AMD Opteron
    2200/2300 Series, redundant power supply, up to 6 hot-swappable
    SATA or SAS hard drives

Compute Nodes
  Compute engines
  Options:
    4 x Relion Intel 1672: Dual processor, Quad-Core Xeon CPU 54XX
    (Harpertown), Intel chipset (Seaburg), 'Twin' system integrating
    two nodes in one 1U unit → high density, single power supply →
    power efficiency
    8 x Altus AMD 650 Linux Server: Dual processor, Quad-Core Opteron
    CPU 235X (Barcelona), NVIDIA 3600 chipset, 16 DIMM slots → up to
    128GB RAM capacity

Storage
  Compute nodes: 2 x 160GB SATA drives, 7200RPM, RAID0 configuration

Memory
  16GB – 32GB (2 – 4GB per core): Depends on model complexity;
  ANSYS recommends 1GB per million DOFs

Interconnect
  Gigabit Ethernet
Memory Configuration
The recommended amount of memory is highly model and solver dependent. Figure 1
shows the runtimes for ANSYS benchmarks bm-1 through bm-8 for three memory
configurations: 8GB, 16GB, and 32GB. The results were obtained on a cluster of
Penguin Computing Relion 1600 servers, each equipped with two dual-core Intel
Xeon 5160 CPUs (3GHz clock speed). The benchmarks are described at
http://www.ansys.com/services/hardware-support-db.htm.
Figure 1: Performance Impact of Memory Configuration
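As a rough illustration of the sizing guidelines quoted in the configuration above
(about 1GB per million DOFs, and 2 – 4GB per core), the following sketch estimates
memory requirements for a given model size. The function name and example values
are illustrative only and are not part of ANSYS.

    # Rough memory-sizing sketch based on the rules of thumb quoted above:
    # ~1GB per million DOFs for the solver, and 2-4GB per core on the node.
    # Names and values are illustrative, not part of ANSYS.

    def recommended_memory_gb(dofs, cores_per_node, gb_per_core=(2, 4)):
        """Return (solver_estimate, node_low, node_high) in GB."""
        solver_estimate = dofs / 1e6                 # ~1GB per million DOFs
        node_low = cores_per_node * gb_per_core[0]   # conservative per-node figure
        node_high = cores_per_node * gb_per_core[1]  # generous per-node figure
        return solver_estimate, node_low, node_high

    if __name__ == "__main__":
        # Example: a 10-million-DOF model on an 8-core compute node
        solver, low, high = recommended_memory_gb(10_000_000, 8)
        print(f"Solver rule of thumb: ~{solver:.0f}GB")
        print(f"Per-node guideline:   {low}-{high}GB")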
DANSYS Scalability
Distributed ANSYS spreads the computational workload of a single solver run across
multiple systems. Figure 2 illustrates solver scalability using ANSYS benchmark
bmd-4. The cores used for this set of benchmark runs were allocated round-robin:
each process was launched on one core of a different system. After four cores on
four systems had been allocated, the allocation wrapped around and placed the next
core on the first node in the set, and so on. Each node had 8GB of RAM installed.
Figure 2: Scalability of ANSYS’ Distributed Solver
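The round-robin placement described above can be summarized with a short sketch
that maps process ranks to nodes. The host names are hypothetical, and the snippet
only illustrates the ordering, not the actual Scyld ClusterWare or MPI scheduler.

    # Illustration of the round-robin placement used for the benchmark runs:
    # ranks are assigned one per node until every node holds one process,
    # then the assignment wraps and each node receives its second core, etc.
    # Host names are hypothetical; this is not the actual scheduler.

    def round_robin_hosts(nodes, nprocs):
        """Return the node assigned to each process rank, round-robin."""
        return [nodes[rank % len(nodes)] for rank in range(nprocs)]

    if __name__ == "__main__":
        nodes = ["node0", "node1", "node2", "node3"]   # four compute nodes
        for rank, host in enumerate(round_robin_hosts(nodes, 8)):
            print(f"rank {rank} -> {host}")
        # ranks 0-3 land on node0..node3; ranks 4-7 wrap around to the same set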