RAI HPC Software
Open-Source & Commercial Code Used in RAI HPC Stacks
RAI builds each cluster with a software stack that is optimal for each customer and
application. That stack is then run through rigorous testing on the target
system with both industry-standard benchmarks and the customer's applications.
The set of open-source and commercial software we choose from when configuring a
system includes, but is not limited to:
Cluster batch control systems are used to manage jobs on the system. They are
essential for sequentially queuing jobs, assigning priorities, distributing,
parallelizing, suspending, killing or otherwise controlling jobs cluster-wide.
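The queuing and prioritizing described above can be illustrated with a minimal Python sketch. It is not the implementation of any product listed here; the names `BatchQueue`, `submit`, `dispatch`, and `finish` are hypothetical, and the policy (highest priority first, submission order within a priority, stop when the next job does not fit) is an assumption chosen for brevity.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class QueuedJob:
    # Only sort_key takes part in ordering; a smaller tuple is served first.
    sort_key: tuple
    name: str = field(compare=False)
    cpus: int = field(compare=False)


class BatchQueue:
    """Toy batch queue: highest priority first, FIFO within a priority."""

    def __init__(self, total_cpus):
        self.free_cpus = total_cpus
        self._heap = []
        self._counter = itertools.count()  # submission-order tie-breaker

    def submit(self, name, cpus, priority=0):
        # Negate the priority so that larger values are popped earlier.
        key = (-priority, next(self._counter))
        heapq.heappush(self._heap, QueuedJob(key, name, cpus))

    def dispatch(self):
        """Start queued jobs in order until the next one does not fit."""
        started = []
        while self._heap and self._heap[0].cpus <= self.free_cpus:
            job = heapq.heappop(self._heap)
            self.free_cpus -= job.cpus
            started.append(job.name)
        return started

    def finish(self, cpus):
        """Return CPUs to the pool when a job ends."""
        self.free_cpus += cpus
```

A production scheduler adds features this sketch leaves out, such as backfilling smaller jobs around a large one, reservations, and suspension.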
Maui is a highly optimized and configurable advanced job scheduler for use
on clusters and supercomputers. It is capable of supporting a large array of
scheduling policies, dynamic priorities, extensive reservations, and
fairshare and also interfaces with numerous resource management systems.
Maui improves the manageability and efficiency of machines ranging from
clusters of a few processors to multi-teraflop supercomputers.
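As a rough illustration of what a fairshare policy does, the sketch below raises the priority of users who have consumed less than their target share of the machine and lowers it for those who have consumed more. The formula, the `weight` parameter, and the function names are assumptions made for this example; they are not Maui's actual priority calculation.

```python
def fairshare_priority(base_priority, used_share, target_share, weight=100.0):
    """Adjust a base priority by how far a user is from the target share."""
    return base_priority + weight * (target_share - used_share)


def rank_jobs(jobs, usage, targets):
    """Order jobs by fairshare-adjusted priority, highest first.

    jobs    -- list of (job_name, user, base_priority)
    usage   -- CPU-hours recently consumed, per user
    targets -- fraction of the machine each user is entitled to
    """
    total = sum(usage.values()) or 1.0
    scored = []
    for name, user, base in jobs:
        used_share = usage.get(user, 0.0) / total
        score = fairshare_priority(base, used_share, targets[user])
        scored.append((score, name))
    # Sort by descending score; jobs with equal scores keep submission order.
    scored.sort(key=lambda item: -item[0])
    return [name for _, name in scored]
```

With equal base priorities, a job from a user who has used 10% of recent cycles against a 50% target is ranked ahead of a job from a user who has used 90%.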
Moab Cluster Suite from Cluster Resources, Inc.®
is a policy-based intelligence engine that integrates scheduling, managing,
monitoring, and reporting of cluster workloads. It guarantees service levels
are met while maximizing job throughput. Moab integrates with existing middleware
for consolidated administrative control and holistic cluster reporting. Its
graphical management interfaces and flexible policy capabilities offer
improved ease of use, decreased costs and increased ROI of cluster
The Portable Batch System, PBS, is the leading workload management solution
for HPC systems and Linux clusters. PBS was originally designed for NASA
because existing resource management systems were inadequate for modern
parallel/distributed computers and clusters. From the initial design
forward, PBS has included innovative new approaches to resource management
and job scheduling, such as the extraction of scheduling policy into a
single separable, completely customizable module. PBS Professional operates
in networked multi-platform UNIX environments, and supports heterogeneous
clusters of workstations, supercomputers, and massively parallel systems.
PBS Professional: the trusted solution for workload management.
When you move from network computing to grid computing, you will notice
reduced costs, shorter time to market, increased quality and innovation and
you will develop products you couldn't before. Sun Grid Computing solutions
are ideal for compute-intensive industries such as scientific research, EDA,
life sciences, MCAE, geosciences, financial services, and others.
TORQUE (Tera-scale Open-source Resource and QUEue manager) is a resource
manager providing control over batch jobs and distributed compute nodes. It
is a community effort based on the original PBS project and has
incorporated significant advances in the areas of scalability, fault
tolerance, and feature extensions contributed by NCSA, OSC, USC, the U.S.
Dept of Energy, Sandia, PNNL, U of Buffalo, TeraGrid, and many other leading
edge HPC organizations. TORQUE is fully supported by Moab Workload Manager
and Maui Scheduler.
Compilers are crucial for parallelizing your code to run on a cluster,
or for compiling the source code of distributed software packages or libraries for use in a cluster environment.
While freely available compilers like GNU's compilers for C/C++ or Fortran are often quite useful and robust,
they are not optimized for specific hardware and their binary output may require more execution time.
Commercial compilers optimized for your cluster hardware setup can save costs by delivering higher performance
with less overall hardware investment.
RAI has the experience needed to recommend and install the compilers appropriate to your application.
The GNU Compiler Collection is a full-featured ANSI C compiler with support for K&R C, as well as C++, Objective C, Java, and Fortran.
GCC provides many levels of source code error checking traditionally provided by other tools (such as lint), produces debugging information, and can perform many different optimizations to the resulting object code.
GCC includes front ends for C, C++, Objective-C, Fortran, Java, and Ada, as well as libraries for these languages.
Create high-performance software optimized for Intel processors with Intel compilers.
Compatible with the tools developers use, Intel compilers plug into popular development environments and feature source and binary compatibility with widely used compilers.
Every compiler purchase includes one year of Intel Premier Support, providing updates, technical support and expertise for the Intel architecture.
NAG's mathematical and statistical components underpin thousands of programs and applications spanning the globe in industries as diverse as financial analysis, science and engineering, and in academia and research.
They are so widely used and trusted because of their outstanding and unrivaled quality, reliability and portability. Whether you are using a single PC or a cluster of the world's largest supercomputers, NAG has the numerical software capabilities to suit your model.
The PathScale EKOPath Compiler Suite represents the highest-performance 64-bit C, C++ and Fortran compilers for Linux-based environments. This advanced compiler suite takes advantage of the unique high-performance
64-bit features of both the Advanced Micro Devices AMD64 and the Intel Xeon EM64T architectures. EKO stands for "Every Known Optimization" and refers to a compiler framework purpose-built for inserting new optimization techniques to improve performance. At PathScale, poor compiler optimization is considered a bug and a challenge to our development team.
PGI High-Performance Compilers: Optimizing Fortran, C and C++ Compilers for 32-bit x86, 64-bit AMD64 and 64-bit Intel EM64T processor-based Linux and Windows computer systems. Outstanding single-processor performance, uncommon reliability, support for most common extensions, and automatic or user-directed parallelization for shared-memory parallel systems add up to compilers that "just work" for users migrating from RISC/UNIX to 32-bit x86 or 64-bit AMD64 or IA32/EM64T processor-based systems.
As deployments grow to hundreds or even thousands of nodes, the choice of which networked
file system to use becomes an increasingly important factor in building parallel cluster systems.
File system performance is particularly critical for applications that process large amounts of data
or those which rely on out-of-core computation.
ext3 is a robust, journaling file system with improved data integrity.
Time is saved after unclean system shutdowns because extensive
file system checking is no longer necessary after such an event.
Speed can be improved because journaling optimizes hard drive head motion.
The whitepaper "Red Hat's New Journaling File System: ext3" covers these points in more detail.
The Journaled File System (JFS) provides fast file system restart in the event of a system crash.
Using database journaling techniques, JFS can restore a file system to a consistent state in a matter of seconds or minutes, versus hours or days with non-journaled file systems.
IBM has contributed this technology to the Linux open source community with the hope that some or all of it will be useful in bringing the best of journaling capabilities to the Linux operating system.
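The fast-restart property of journaling file systems such as ext3 and JFS can be sketched in a few lines of Python. This is a toy model, not the on-disk format or recovery code of either file system; `JournaledStore` and its methods are hypothetical names, and the journal is simply assumed to survive a crash.

```python
class JournaledStore:
    """Toy key-value 'disk' with a write-ahead journal."""

    def __init__(self):
        self.disk = {}      # main on-disk structures
        self.journal = []   # append-only log, assumed to survive a crash

    def begin(self, txid):
        self.journal.append(("begin", txid))

    def write(self, txid, key, value):
        # Record the intended change in the journal before touching the disk.
        self.journal.append(("write", txid, key, value))

    def commit(self, txid):
        self.journal.append(("commit", txid))

    def recover(self):
        """Replay committed transactions and discard incomplete ones."""
        committed = {rec[1] for rec in self.journal if rec[0] == "commit"}
        for rec in self.journal:
            if rec[0] == "write" and rec[1] in committed:
                _, _, key, value = rec
                self.disk[key] = value
        self.journal.clear()
```

Recovery only walks the journal, so its cost depends on the amount of recent activity rather than on the size of the file system, which is why a full consistency check is not needed after an unclean shutdown.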
The Open Global File System (OpenGFS, or OGFS) is a journaled file system that
supports simultaneous sharing of a common storage device by multiple computer
nodes. It provides direct access to shared storage media. This is one way to implement a "clustered file system".
Parallel Virtual File System is a user-space parallel file system for use on clusters of computers (and Beowulf parallel clusters in particular). It provides transparent file striping across multiple machines and includes a shared library for use with existing binaries.
Parallel applications need a fast I/O subsystem. Clusters need a parallel file system that can scale as the number of nodes increases to the thousands and tens of thousands. PVFS2 is the answer.
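File striping can be pictured as a simple mapping from a byte offset in a file to a storage server. The sketch below assumes round-robin striping with a fixed stripe size; the function names and parameters are hypothetical and do not reflect the PVFS API or its actual distribution code.

```python
def locate(offset, stripe_size, num_servers):
    """Map a file offset to (server index, offset within that server's data)."""
    stripe_index = offset // stripe_size
    server = stripe_index % num_servers
    local_stripe = stripe_index // num_servers
    return server, local_stripe * stripe_size + offset % stripe_size


def split_request(offset, length, stripe_size, num_servers):
    """Break one contiguous read into (server, local_offset, nbytes) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        server, local = locate(offset, stripe_size, num_servers)
        # Never read past the end of the current stripe unit.
        chunk = min(end - offset, stripe_size - offset % stripe_size)
        pieces.append((server, local, chunk))
        offset += chunk
    return pieces
```

Because consecutive stripe units land on different servers, a large sequential read is served by several machines at once, which is where the aggregate bandwidth comes from.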
XFS combines advanced journaling technology with full 64-bit addressing and scalable structures and algorithms. This combination delivers the most scalable and high-performance filesystem in the world.
- Journaling: Quick Recovery
- Fast Transactions
- High Scalability
- Excellent Bandwidth
Middleware is an important component to help integrate the cluster environment into a single entity.
It aids distributed applications, acting as an intermediary between components.
Our experience at RAI means we can help with selecting the appropriate middleware for use
on your cluster.
MOSIX (Multicomputer Operating System for UnIX) is a management system that uses process migration to allow an x86-based Linux cluster or an organizational grid of such clusters to perform like a single computer with multiple processors.
Users can run parallel and sequential applications by creating multiple processes and letting MOSIX seek resources and automatically migrate processes among nodes to improve the overall performance, without changing the run-time environment of migrated processes.
LAM (Local Area Multicomputer) is a high-quality open source implementation of the Message Passing Interface (MPI) standard, including all of MPI-1.2 and much of MPI-2.
From its beginnings, LAM/MPI was designed to operate on heterogeneous clusters.
LAM/MPI provides users not only with the standard MPI API, but also with several debugging and monitoring tools.
MPI/Pro is the leading commercial MPI middleware based on the MPI-1.2 standard.
MPI/Pro optimizes time to solution for parallel processing applications in hundreds of production sites.
MPI/Pro supports overlapping of communication and computation to increase overall performance.
In fact, MPI/Pro outperforms freeware MPI versions by 10-20% on average, and by as much as 50% on certain
MPICH is a freely available, portable implementation of MPI - a library specification for message passing, proposed as a standard by a broadly based committee of vendors, implementers, and users.
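The message-passing model behind MPI can be illustrated without an MPI installation. The sketch below emulates ranks with Python threads and per-rank queues; `run_ranks`, `send`, and `recv` are hypothetical helpers invented for this example and are not part of the MPI standard or of any implementation listed here.

```python
import queue
import threading


def run_ranks(size, target):
    """Run target(rank, size, send, recv) once per rank, each in a thread."""
    mailboxes = [queue.Queue() for _ in range(size)]
    results = [None] * size

    def send(dest, payload):
        mailboxes[dest].put(payload)

    def worker(rank):
        def recv():
            # Block until a message addressed to this rank arrives.
            return mailboxes[rank].get(timeout=5)
        results[rank] = target(rank, size, send, recv)

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def partial_sum(rank, size, send, recv):
    """Each rank sums its slice of 0..99; rank 0 collects the partial sums."""
    local = sum(range(rank, 100, size))
    if rank != 0:
        send(0, local)
        return None
    return local + sum(recv() for _ in range(size - 1))
```

Calling `run_ranks(4, partial_sum)` leaves the combined sum in the first element of the returned list. In a real MPI program the ranks are separate processes, possibly on different nodes, and the library handles the transport.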
The Open MPI Project is an open source MPI-2 implementation that is developed and maintained by a consortium of academic, research, and industry partners. Open MPI is therefore able to combine the expertise, technologies, and resources from all across the High Performance Computing community in order to build the best MPI library available. Open MPI offers advantages for system and software vendors, application developers and computer science researchers.
PVM (Parallel Virtual Machine) is a software package that permits a heterogeneous collection of systems hooked together by a network to be used as a single large parallel computer.
Large computational problems can thus be solved more cost effectively by using the aggregate power and memory of many computers.
PVM enables users to exploit their existing computer hardware to solve much larger problems at minimal additional cost.
With tens of thousands of users, PVM has become the de facto standard for distributed computing worldwide.
The choice of operating system for your high-performance system
depends on your specific application needs and will affect
many aspects of cluster operation well into the future, which makes it an
important decision in the overall system design.
RAI's experienced engineers can work with you to determine the best
operating system distribution for your performance needs and requirements.
Operating systems other than those listed below can be installed,
but require coordination with your sales engineer.
CentOS is an Enterprise-class Linux Distribution derived from sources freely provided to the public by a prominent North American Enterprise Linux vendor. CentOS conforms fully with the upstream vendor's redistribution policy and aims to be 100% binary compatible.
Debian GNU/Linux is popular for high-performance computing due to its multi-platform support and ease of package management and administration. It provides a wide range of services, with over 8,700 packages to select from with the current distribution. Debian GNU/Linux is full featured, dynamic and also free to use and redistribute.
Red Hat Enterprise Linux technology is derived from the Fedora Project.
Fedora is a Red Hat-sponsored and community-supported open source project.
It provides a public development platform and proving ground for new open source technologies.
The Gentoo Linux distribution includes a rich set of popular software packages and is relatively easy to install and upgrade. Because it is a source distribution, it is useful for organizations that wish to customize a Linux distribution for their high-performance systems yet maintain an easy install and upgrade ability.
The Microsoft Windows Server 2003 Compute Cluster Edition will enable enterprises to tackle and solve some of the toughest computing challenges in the world using high performance computing (HPC) clusters.
The Compute Cluster Edition of Windows Server 2003 will be focused on scalability, security, ease of deployment, ease of use and ease of management.
It will support high-performance computing standards such as MPI 2 and RDMA over Ethernet and InfiniBand, with integrated job scheduling and cluster resource management tools.
Red Hat's enterprise-class Linux operating system distribution is backed by the
accountability of a major commercial software company and is stable,
reliable and broadly supported.
Easy to deploy and manage, Red Hat Enterprise Linux also provides various training & support options along with many industry and hardware certifications and certified applications.
High-performance computing customers realize increased manageability with Red Hat Network and improved scalability with Red Hat Global File System.
The SUSE LINUX family is characterized by flexibility and versatility
while delivering a scalable, high-performance foundation for secure enterprise computing.
It delivers a degree of security, reliability, availability, scalability, and automated administration that only Linux-based systems can offer.
SUSE supports its enterprise customers with a comprehensive range of qualified consulting, training and support services.
Other Software Packages
Arkeia provides established backup
solutions for departments and mid-size businesses utilizing Linux. It is a
trusted industry solution for speedy, automated backup and recovery that
eases the life of system administrators while improving efficiency.
Big Brother is designed to let anyone see
how their network is doing in near real-time, from any web browser,
anywhere. System administrators can visually assess the health of a network
through a color-coded web interface. Administrators are immediately notified
when defined events occur, facilitating proactive problem resolution and
preventing critical outages.
The Environment Modules package provides
for the dynamic modification of a user's environment via modulefiles. Each
modulefile contains the information needed to configure the shell for an
application. Once the Modules package is initialized, the environment can be
modified on a per-module basis using the module command which interprets
modulefiles. Typically modulefiles instruct the module command to alter or
set shell environment variables such as PATH, MANPATH, etc. Modulefiles may
be shared by many users on a system and users may have their own collection
to supplement or replace the shared modulefiles.
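The effect of loading a modulefile can be approximated in Python as a series of prepend operations on PATH-like variables. Real modulefiles are not written in Python, and the helper names and the `/opt/hypothetical` directories used in the usage note are assumptions for illustration only.

```python
import os


def prepend_path(env, variable, directory):
    """Return a copy of env with directory first in a PATH-like variable."""
    updated = dict(env)
    parts = [p for p in updated.get(variable, "").split(os.pathsep) if p]
    # Drop an existing entry so that reloading does not create duplicates.
    parts = [p for p in parts if p != directory]
    updated[variable] = os.pathsep.join([directory] + parts)
    return updated


def load_module(env, modulefile):
    """Apply a list of (variable, directory) entries, as a modulefile might."""
    for variable, directory in modulefile:
        env = prepend_path(env, variable, directory)
    return env
```

For example, `load_module(dict(os.environ), [("PATH", "/opt/hypothetical/bin"), ("MANPATH", "/opt/hypothetical/man")])` returns a new environment in which the application's directories are searched first, while the original environment is left untouched.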
Etnus TotalView is the debugger for
complex code. It helps eliminate the head-banging frustration, delays, and
pain inherent in developing complex and parallel code, with unrivaled thread
debugging support. TotalView supports all types of parallel programming models,
including MPI and OpenMP, so if you employ parallelism, TotalView is also
the debugger for you.
Ganglia monitors distributed
high-performance systems such as clusters and Grids. It requires little
configuration and scales well. It is based on a hierarchical design and uses
carefully engineered data structures and algorithms to achieve very low
per-node overheads and high concurrency.
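One way to picture a hierarchical monitoring design is as a tree in which each level passes only a summary upward. The sketch below assumes a simple nested-dictionary layout and a single load metric; it is an illustration of the idea, not Ganglia's data model or wire format.

```python
def summarize(node):
    """Return (host_count, total_load) for a tree of clusters and hosts.

    A host is {"load": float}; a cluster or grid is {"children": [...]}.
    """
    if "children" not in node:
        return 1, node["load"]
    hosts, load = 0, 0.0
    for child in node["children"]:
        child_hosts, child_load = summarize(child)
        hosts += child_hosts
        load += child_load
    return hosts, load
```

Because a parent only needs one summary per child, the amount of data handled at the top of the tree grows with the number of clusters rather than with the number of hosts.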
The PathScale OptiPath MPI Acceleration
Tools represent the simplest way to identify the root causes preventing an
application from scaling on a cluster. Using OptiPath, a novice or a
seasoned MPI expert is guided to the most important problems affecting
scalability in their applications. The tools present these problems in
ranked order, with supporting information. Most importantly, OptiPath makes
specific recommendations on how to improve the application performance and
In the event of a disaster, VERITAS
solutions can quickly recover your critical data and applications. They
provide a comprehensive range of software solutions that ensure businesses
can continue to operate with minimal interruption. VERITAS disaster recovery
solutions support a wide spectrum of operating systems, applications,
databases, and hardware platforms and devices to fully protect your
heterogeneous IT environments.