- Journaling the Linux ext2fs filesystem (postscript: ps.gz) - by Stephen C. Tweedie
- EXT3 - Stephen Tweedie's journaling version of the ext2 filesystem
- Reiser FS - [Howto]
- SGI XFS - Note: SGI also released CXFS (Clustered XFS), a commercial product.
- IBM JFS:
- StegFS: Steganographic File System for Linux - encrypts data and hides it such that its existence cannot be proven.
- TCFS: Transparent Cryptographic File System project
- VERITAS: VxFS
- Legato - Commercial Storage Area Network (SAN) Software/Hardware
- LinLogFS -- A Log-Structured Filesystem For Linux
- SGI CXFS - shared SAN (fibre channel) clustered file system. Multi OS (Linux, Solaris, IRIX and MS/Windows NT can all be connected to the same SAN at the same time). 2 Gb/s Brocade SAN fabric storage.
- Tivoli SANergy - Multi OS (Linux, Windows, Solaris, IRIX, AIX) fibre channel SAN clustered file system. [Readme]
- Red Hat: Global File System (GFS)
- Ibrix.com: Fusion - Commercial software based global scalable file system
- IBM: GPFS
- HDFS: Hadoop Distributed File System - distributed, fault tolerant storage for large datasets. Write and append distributed storage. Written in Java for MapReduce distributed computing clusters.
- Ceph - POSIX compliant networked file system. Striped and replicated.
- Coda File System - Advanced networked filesystem.
- Clemson.edu: Parallel Virtual File System (PVFS) - This unique file system clusters the drives of many nodes as opposed to the typical many nodes connected to a single SAN.
Logical Volume Manager:
- The Beowulf Project
- SGI ACE: Advanced Cluster Environment - Includes CXFS - Commercial product
- Paralogic: HPCC cluster system software - Beowulf distribution
- MOSIX.org - Scalable Cluster Computing
- QPS - Process monitoring tool for Linux. Handles MOSIX clusters. I love this tool!!
- OpenAFS.org - Share files and resources across local and wide area networks
- Myricom: Myrinet - scalable cluster interconnect (Commercial) (Software: GM)
- Portland Group: Cluster development kit - compilers and tools
- U of Wisconsin: Condor - High Throughput Computing (HTC) - distributed computing
- Compaq: Single System Image Clusters for Linux
- BPROC - Beowulf Distributed Process Space
- Platform.com: LSF - Workload management and load sharing.
- OpenPBS.org: Portable Batch System (PBS)
- Maui Scheduler - [alt]
Message Passing Interface (MPI): Programmer's API for software to coordinate tasks across multiple nodes.
- Open-MPI.org - MPI-2 compliant. (Combines FT-MPI, LA-MPI, LAM/MPI, and PACX-MPI)
- MPI Standard
- MPICH Portable Implementation of MPI - Standard for message-passing libraries
- LAM/MPI - TCP/IP only
- MPI-IO: ROMIO
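The message-passing model shared by the MPI implementations listed above can be illustrated without MPI itself. The following is a minimal sketch in Python that uses threads and a queue as stand-ins for MPI ranks and for send/receive. It is not MPI code, and the names `worker`, `run_spmd` and `inbox` are hypothetical, chosen only for this illustration. Every simulated rank runs the same function on its own slice of the data and sends a partial result to a collector, which plays the role of rank 0.

```python
import queue
import threading


def worker(rank, size, data, inbox):
    """Every rank runs this same function on its own slice of the data (SPMD)."""
    chunk = data[rank::size]          # rank r takes elements r, r+size, r+2*size, ...
    partial = sum(chunk)
    inbox.put((rank, partial))        # stand-in for a send to rank 0


def run_spmd(data, size=4):
    """Start `size` simulated ranks and combine their partial sums."""
    inbox = queue.Queue()             # stand-in for rank 0's receive buffer
    threads = [
        threading.Thread(target=worker, args=(rank, size, data, inbox))
        for rank in range(size)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Stand-in for rank 0 receiving one message per rank and reducing them.
    partials = dict(inbox.get() for _ in range(size))
    return sum(partials.values()), partials


if __name__ == "__main__":
    total, partials = run_spmd(list(range(1, 101)), size=4)
    print("partial sums by rank:", partials)
    print("total:", total)
```

In a real MPI program the ranks would be separate processes, possibly on different nodes, and the queue would be replaced by the library's send, receive and reduce calls.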
IPC Communication Libraries:
- PVM (Parallel Virtual Machine):
- TIPC: Transparent Inter Process Communication: intra cluster IPC. High speed fault tolerant and redundant synchronous sockets and asynchronous network communication.
- UPC: Berkeley Unified Parallel C - C language extension. Uniform programming model for both shared and distributed memory hardware. Programmer is presented with a single shared Partitioned Global Address Space (PGAS) where variables may be read and written by any processor (SPMD: Single Program, Multiple Data).
- Titanium: UC Berkeley developed Java "dialect" to support massively parallel supercomputers and distributed memory clusters.
- OpenMP.org: Older cross platform API which supports shared memory and parallel programming in C/C++ and FORTRAN for NUMA and SMP Multi-Processing (MP) systems.
Linux Journal: A High-Availability Cluster for Linux - by Phil Lewis June 29, 1999
- SGI: Linux Failsafe High-Availability (HA)
- Linux-HA.org: High-Availability Linux Project
- LinuxVirtualServer.org - cluster, load balancing
- Red Hat Piranha: IP HA Load-balanced Web and FTP Clusters
- FAKE: IP Address Takeover Tool - HA Switch to backup servers on a LAN for both unscheduled and scheduled down time. Fake allows you to take over the IP address of another machine using ARP spoofing.
- Compaq: Cluster Infrastructure for Linux
- SteelEye.com - enterprise-grade, low cost, high availability clustering, data replication and disaster recovery software.
- MetiLinx - Linux Failover
- UltraMonkey.org - Load balancing, high availability
Grid computing is a form of distributed computing that is generally more loosely coupled than a cluster. A grid may be heterogeneous and geographically dispersed.
Cloud computing infrastructure provides the ability to provision computing infrastructure, software, storage, security and data management as a service. Typically this is provided using distributed virtual systems preconfigured to perform these services. When greater throughput is required, more virtual machines are provisioned to support the load.
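The scale-out idea described above, provisioning more virtual machines as load grows, reduces to simple arithmetic. The sketch below is only an illustration under stated assumptions: it assumes identical VMs with a fixed per-VM capacity, and the function name `vms_needed` and its parameters are hypothetical rather than part of any cloud provider's API.

```python
import math


def vms_needed(requests_per_sec, capacity_per_vm, min_vms=1):
    """Return how many identical VMs are needed to carry the offered load.

    Assumes (for illustration only) that each VM handles a fixed number of
    requests per second and that load is spread evenly across VMs.
    """
    if capacity_per_vm <= 0:
        raise ValueError("capacity_per_vm must be positive")
    return max(min_vms, math.ceil(requests_per_sec / capacity_per_vm))


if __name__ == "__main__":
    for load in (0, 150, 950, 1000):
        print(load, "req/s ->", vms_needed(load, capacity_per_vm=200), "VM(s)")
```

A real provisioning policy would also account for startup delay, headroom and scale-down thresholds, none of which are modeled here.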
Commercial cloud computing is available as a service purchased through vendors such as Amazon Web Services (EC2: Elastic Compute Cloud) or Google App Engine. Open source cloud frameworks are also available:
- OpenStack.org - control large pools of compute, storage, and networking resources using a web dashboard
- Eucalyptus.com - mimics the Amazon Web Services cloud and API
- AppScale - provides compatibility with Google App Engine Applications
- CloudFoundry.com - EMC VMware supported project
- CloudStack.org - Apache foundation sponsored. Java based. VM Hypervisors supported: VMware, Oracle VM, KVM, Xen.
- Nimbus - Amazon EC2/S3-compatible. Targeted to the scientific community.
- OpenQRM - works with Debian, Ubuntu and CentOS Linux and VMware, Xen, KVM and Citrix XenServer virtual machines. Supported "Enterprise" edition available. N to 1 HA failover.
- FOSS-cloud.org - Linux or MS/Windows - desktop support focus
- Myrinet - High speed low latency interconnection. Switch interconnect.
- Dolphin Interconnect - Ring topology, also 2D and 3D topologies. Traffic is shared on the ring. If one node fails, the ring is broken and communications stop.
High Performance / Low latency Gigabit Ethernet (GE):
|Interconnect||Latency (microseconds)||Bandwidth|
|Quadrics||3||800 Mb/s (Elan 4)|
|Myrinet||6||800 Mb/s (Rev E)|
|Infiniband 4X (MPI driver)||6||1.8 Gb/s (10 Gb/s Voltaire)|
|Level 5 Networks (Low latency GE NIC/driver)||7 end to end MPI; 13 with switch; 9 TCP end to end|| |
|Ammasso (RDMA)||16 end to end|| |
|Standard Ethernet (GE) TCP/IP||65 MPICH over TCP/IP||60 - 80 Mb/s|
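To see why latency matters as much as bandwidth for small messages, the figures in the table above can be fed into a simple first-order model: transfer time = latency + message size / bandwidth. The sketch below assumes the latency figures are in microseconds and the bandwidth figures are megabits per second as printed. The function name `transfer_time_us` is hypothetical, and the model ignores protocol overhead, switch hops and contention.

```python
def transfer_time_us(message_bytes, latency_us, bandwidth_mbps):
    """Estimate one-way transfer time in microseconds: latency + size / bandwidth.

    Assumes latency in microseconds and bandwidth in megabits per second.
    """
    bits = message_bytes * 8
    # 1 Mb/s equals 1 bit per microsecond, so bits / (Mb/s) is already in microseconds.
    return latency_us + bits / bandwidth_mbps


if __name__ == "__main__":
    # Values taken from the table: Quadrics (3, 800 Mb/s) and
    # standard GE over TCP/IP (65, upper figure of 80 Mb/s).
    for size in (100, 1000, 100000):
        quadrics = transfer_time_us(size, latency_us=3, bandwidth_mbps=800)
        gige = transfer_time_us(size, latency_us=65, bandwidth_mbps=80)
        print(f"{size:>7} bytes: Quadrics {quadrics:.1f} us, GE TCP/IP {gige:.1f} us")
```

Under these assumptions a 1000-byte message is dominated by latency on standard Ethernet, while for very large messages the bandwidth term takes over on both interconnects.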
- IBM Beowulf presentation - (pdf) - Dan Owensly
- SuperComp.org - Supercomputing technical conference papers
- Linux Filesystem Hierarchy Standard
- IBM Redbook on Linux HPC Cluster Installation
- Top500.org - Top 500 supercomputer sites (typically clusters)
- Clemson: PARL - Parallel Architecture Research Lab