NOTICE: As of 10/1/2012, Forge has been decommissioned and is no longer in production.
Forge consists of 44 Dell PowerEdge C6145 nodes, each with dual 8-core AMD Magny-Cours 6136 processors (16 cores per node) and 64 GB of memory. Each node supports 6 or 8 NVIDIA Fermi M2070 GPUs.
|Architecture||Heterogeneous|
|Number of Servers||36|
|Number of CPUs (cores)||576|
|Number of Accelerator Units||288|
|Accelerator Units||NVIDIA Fermi M2070|
|Network Interconnect||InfiniBand QDR|
|Parallel Filesystem||GPFS (600 TB total)|
Your home directory is the default directory you are placed in when you log on. You should use this space for storing files you want to keep long term, such as source code, scripts, and input data sets. NCSA HPC systems have a 50 GB home directory quota. The command to see your disk usage and limits is quota. Home directories are backed up daily.
Scratch file systems are intended for short term use and should be considered volatile. The size of scratch file systems varies with the system.
Please note that backups are not performed on the scratch directories. In the event of a disk crash or file purge, files on the scratch directories cannot be recovered. Therefore, you should make sure to back up your files to permanent storage as significant changes are made (at least daily).
The common scratch filesystem /scratch/users is available for all interactive work. The scratch-global soft link in your home directory points to your scratch directory. For batch jobs, see the section Disk Space for Batch Jobs.
Files in the common scratch filesystem (/scratch/users) are purged on the basis of size and time since the last access:
|File Size||Removed after|
|>= 10 GB||4 days|
|< 10 GB||14 days|
Please do not attempt to circumvent this removal scheme (e.g., with touch). Such attempts may result in the loss of access to the scratch file systems.
Files in the batch scratch filesystem (/scratch/batch) on Forge may be purged as soon as the batch job they are associated with completes. Users should use the saveafterjob utility for automated, guaranteed saving of files from batch jobs.
To opt out of receiving email notifications on purged files, add a file named .nopurgemail in your home directory:
forge$ touch $HOME/.nopurgemail
Reserved Project Space
NCSA has reserved project space available on the HPC systems, intended for users who need, for short durations, larger amounts of scratch space than the normal scratch file systems provide. These directories are intended for short term use and are not backed up. Contact NCSA Consulting Services to request space. Please include your name, project (PSN), preferred start date, duration of need, approximate disk space needed, and a short description of the project, including reasons why regular scratch space is insufficient for your needs.
Permanent File Storage
Permanent storage of your files is available using the NCSA archival storage system (MSS). You have read-only access to your MSS account for 4 months after your last NCSA HPC account has been deleted. The NCSA MSS Page has additional information on using MSS. See the Overview section in the MSS User Guide for user and project quota information.
Methods of Access
There are four methods of access to NCSA resources:
- GSI-SSH with grid credential authentication (default): XSEDE users receive an XSEDE-wide LoginID and Password to access XSEDE resources via the XSEDE User Portal (XUP).
- SSH with Key Pairs and Passphrase: initial access via XUP or Site Password is required. After generating a public ssh key on a local machine, the user must copy the key to the target resource (machine) using the XUP or Site Password.
- SSH with NCSA Kerberos password: XSEDE users may request an NCSA Kerberos password by emailing the XSEDE Helpdesk or submitting a consulting ticket on the XSEDE Helpdesk webpage.
- Secure FTP client (GridFTP): provides file management (listing, moving, transfer, deletion, etc.) using the same login information provided to you as an XSEDE user.
Logging into Forge
Users are provided access to XSEDE Resources via the XSEDE User Portal (XUP). Your Allocation Information Packet contains your XUP username and password that you will need to access the XSEDE User Portal.
For information on NCSA local passwords, see here.
Managing your allocation
Once you have an allocation, you will be charged for the amount of service units (SUs) used. SUs for the NCSA HPC systems are calculated based on wall clock time of jobs. The charging algorithm for the Dell NVIDIA Cluster is as follows:
# SUs = 16 * #nodes * WallTime
where WallTime = Total Wall Clock Hours. Note: The minimum resource allocation for a batch job is 1 node.
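For example (with hypothetical job values), a 4-node job that runs for 2.5 wall-clock hours is charged 16 × 4 × 2.5 = 160 SUs. This can be computed at the shell:

```shell
# Charging formula for the Dell NVIDIA Cluster: SUs = 16 * #nodes * wall-clock hours
nodes=4       # hypothetical job size
hours=2.5     # hypothetical wall-clock time
awk -v n="$nodes" -v t="$hours" 'BEGIN { print 16 * n * t " SUs" }'
```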
Verifying Your Account Balance
You can monitor usage of your allocation by using the XSEDE tgusage utility. The data displayed by tgusage is normally updated once each day, so SUs accrued by jobs on a given day will be reflected in tgusage the next day. Entering tgusage with no options, or with -h/--help, displays the usage/help information.
Online Usage Information
Usage information can also be obtained online through the XSEDE User Portal on the "Allocations/Usage" page under the "My XSEDE" tab.
Setting Projects for Charging
If you have more than one project (PSN), you can charge to various projects within a login session. Most users have only one project. You do not need to define a default project unless you have multiple projects.
Setting a Default Project
You can define (or change) a default project with the defacct command. After a default project is set, you will no longer be prompted to choose one of your projects during the login process. At the defacct prompt, enter a project to set your default, enter none to unset a default project, or enter a carriage return to leave your default project unchanged. See the defacct man page for more information. The batch_accts command lists all your accounts on the current system.
IMPORTANT: If you have a default project set, batch jobs will automatically be charged to the default project at the time that the job is submitted unless you charge the batch job to a specific project (see below).
Charging to Projects in a Batch Job
You can also charge batch jobs to a specific project (PSN) with the -A option in the PBS batch script or on the qsub command line.
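As an embedded directive, this takes the following form (the project name abc123 here is hypothetical; substitute one of the projects listed by batch_accts):

```shell
#PBS -A abc123   # hypothetical project (PSN) to charge this job to
```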
When your account is first activated, your default shell is set to tcsh, an enhanced version of the Berkeley UNIX C shell (csh). The tcsh shell is completely compatible with the standard csh, and all csh commands and scripts work unedited with tcsh. Enter "man tcsh" at the prompt for details about tcsh.
The other shell available on NCSA HPC systems is the GNU Bourne-Again SHell (bash), which is completely compatible with the Bourne shell (sh). To change your shell, send email to NCSA Helpdesk with a request.
Managing your "dot" files
When your account on an NCSA system is created, default "dot" files are placed in your home directory.
Forge uses the module system to set up the user environment. See the section Managing Your Environment (Modules) for details.
For other "dot" files, you can find copies of the default files in /etc/skel. Copy the appropriate file to your home directory and customize as needed.
Transferring your files to Forge
A variety of methods are available for transferring files between computer systems. See NCSA Data Transfers for details.
Compilers and languages
NCSA supports the NVIDIA and PGI compilers for the Tesla Fermi GPUs. As these compilers each provide multiple and differing capabilities, they are summarized below in terms of the language or API provided.
CUDA C is the computing architecture developed by NVIDIA for its GPUs; it defines extensions to the C language for negotiating execution on the GPU and communication between host and GPU.
CUDA Fortran is an analogous extension to the Fortran language; it was developed as a collaboration between NVIDIA and the Portland Group.
CUDA-x86 is the PGI CUDA C/C++ compiler for x86; it provides a unified programming model for both multi-core and many-core architectures. Executables may be run either on the GPU, or on a non-GPU multi-core x86 architecture.
Environment: The module for NVIDIA's CUDA C is loaded by default upon login. The NVIDIA compiler is nvcc. One can compile on the head node, but execution on the Tesla GPUs is available only via PBS batch jobs.
SDK: Example code, documentation and several utilities can be found in the NVIDIA SDK. When porting code to CUDA C, the examples in the SDK can be quite useful, both as illustration and as templates for certain algorithms (e.g., marching cubes, Monte Carlo). In addition, there are further examples and tutorials available on the NVIDIA site at the link below.
Examples: To use the examples, one should copy the installer to one's home directory, and run with the prompted-for defaults:
forge$ cp /uf/ncsa/consult/nvidia_sdk/gpucomputingsdk_*_linux.run $HOME
forge$ cd $HOME
forge$ sh ./gpucomputingsdk_*_linux.run
To build, cd into the "C" subdirectory and run make. In all cases, the resulting executables should be run on a compute node, accessed through the batch system. deviceQuery will list characteristics of the Tesla devices.
Environment: To use CUDA Fortran, one need only load the module for the PGI compilers:
forge$ module load pgi/2011
SDK: Examples and a makefile may be found in /usr/local/pgi/linux86-64/2011/cuda/cudaFortranSDK. In all cases, the resulting executables should be run on a compute node, accessed through the batch system.
Environment: To use CUDA-x86, one need only load the module for the PGI compilers:
forge$ module load pgi/2011
SDK: Examples and a makefile may be found in /usr/local/pgi/linux86-64/2011/cuda/cudaX86SDK. In all cases, the resulting executables should be run on a compute node, out of deference to other users on the login node; as noted in the introduction, however, any multi-core x86 architecture is supported.
In addition to CUDA Fortran, the PGI compilers support an API referred to as the "Accelerator Programming Model", which is similar in practice to OpenMP. In this model, user directives may be added to existing C or Fortran code that will automatically "accelerate" regions of code, by executing on the GPU.
Examples of use may be found here:
The makefile therein will build examples of accelerating C and Fortran code. A summary and references may be found here: PGI Accelerator Compilers
OpenCL is supported by the NVIDIA CUDA distribution; cf. examples in the NVIDIA_GPU_Computing_SDK mentioned above, and the discussion here: OpenCL
In addition to the compilers mentioned above, the GNU and Intel compilers are available on Forge, and loaded by default.
Several implementations of MPI built with these compilers are available; MVAPICH2 is loaded by default, and available versions of OpenMPI can be seen with the module avail command.
In all of these cases, the compiler wrappers have standard names: mpicc (C), mpicxx (C++), mpif77 (Fortran 77), and mpif90 (Fortran 90).
An article on debugging CUDA-x86 applications may be found here: Debugging CUDA-x86 Applications
For host-based code, the Intel Math Kernel Library (MKL) contains the complete set of functions from the basic linear algebra subprograms (BLAS), the extended BLAS (sparse), the complete set of LAPACK routines, and a set of fast Fourier transforms; it is loaded by default with the Intel compilers.
NVIDIA provides GPU-accelerated versions of certain of the above routines, namely CUBLAS (BLAS) and CUFFT (FFT).
The SDK contains examples of use for each of these libraries.
The NVIDIA Performance Primitives library (NPP) is a collection of basic algorithms accelerated for the GPU (arithmetic, filter, image, geometric...): NPP
Running your applications
The login node is available for interactive use. It has 16 cores and 6 GPU devices. In general, interactive use should be limited to compiling and other development tasks, such as editing source and debugging; access to the compute nodes with GPU devices is available only via PBS batch jobs. The batch system is available for all other jobs. See the qsub -I section for instructions on how to run an interactive job on the compute nodes.
The batch system requires no additional information for running on the GPUs. Execution of GPU kernels is controlled from within the host code, with 6 or 8 GPUs available from each node. Simply set your batch script for deployment on the host node(s), as in the sample batch scripts.
The MPI implementations on Forge have the mpirun script for running an MPI program. See the sample batch scripts for syntax details.
Before you run an OpenMP program, set the environment variable
OMP_NUM_THREADS to the number of threads you want. For example, to run a program with two threads:
forge$ setenv OMP_NUM_THREADS 2
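The setenv syntax above is for the default tcsh shell; if your shell is bash (also available on NCSA HPC systems), the equivalent is:

```shell
# bash equivalent of the csh "setenv OMP_NUM_THREADS 2" command above
export OMP_NUM_THREADS=2
echo $OMP_NUM_THREADS
```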
The following environment variables may also be useful in running your OpenMP programs:
|OMP_SCHEDULE||Sets the schedule type and (optionally) the chunk size for DO and PARALLEL DO loops declared with a schedule of RUNTIME. The default is STATIC.|
|OMP_DYNAMIC||Sets the run-time execution mode.|
|OMP_STACKSIZE||Sets the number of bytes to allocate for the stack of each parallel thread. You can use a suffix k, m, or g to specify kilobytes, megabytes, or gigabytes. The default is 4m.|
To run an MPI/OpenMP hybrid program, you need to set the environment variable OMP_NUM_THREADS to the number of threads you want, and change the number of CPUs per node for MPI accordingly. For example, to run a program with 10 MPI ranks and 16 threads for each rank, do the following in your batch script:
#PBS -l nodes=10:ppn=1
setenv OMP_NUM_THREADS 16
(See the qsub section for information on PBS directives.)
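Putting these pieces together, a hybrid job script might look like the following minimal sketch (the executable name hybrid.x and the resource values are hypothetical):

```shell
#!/bin/csh
#PBS -l walltime=01:00:00,nodes=10:ppn=1
#PBS -q normal
setenv OMP_NUM_THREADS 16      # one MPI rank per node, 16 OpenMP threads each
cd $SCR                        # per-job scratch directory (see Disk Space for Batch Jobs)
mpirun -np 10 ./hybrid.x       # hypothetical executable: 10 MPI ranks
```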
The following queues are currently available for users:
|Queue||GPU configuration||Walltime||Max # Nodes|
|debug||6 or 8 GPU nodes||30 mins||4|
|normal||6 GPU nodes||48 hours||18|
|eight||8 GPU nodes||48 hours||8|
NOTE: while the total number of nodes in the Forge cluster is 44, all nodes may not be available in practice due to offline nodes, etc.
For special queue requests please email email@example.com.
The qsub command is used to submit a batch job to a queue. All options to qsub can be specified either on the command line or as a line in a script (known as an embedded option). Command line options take precedence over embedded options. Scripts can be submitted using:
qsub [list of qsub options] script_name
The following options illustrate common qsub usage.
-l resource-list: specifies resource limits. The resource_list argument is of the form resource_name[=value][,resource_name[=value],...]. The resource_names are:
walltime: maximum wall clock time (hh:mm:ss) [default: 10 mins]
nodes: number of 16-core nodes [default: 1 node]
ppn: how many cores per node to use (1 through 16)
#PBS -l walltime=00:30:00,nodes=2:ppn=16
-q queue_name: specifies the queue name (required).
-N jobname: specifies the job name.
-o out_file: store the standard output of the job in out_file.
-e err_file: store the standard error of the job in err_file.
-j oe: merge standard output and standard error into standard output file.
-V: export all your environment variables to the batch job.
-m be: send mail at the beginning and end of a job.
-M firstname.lastname@example.org: send any email to the given email address.
-A project: charge your job to a specific project (XSEDE project or NCSA PSN). (for users in more than one project)
-X: enables X11 forwarding.
- Using the -N option without the -o and -e options will generate stdout and stderr output files with filenames of the form <jobname>.o<jobid> and <jobname>.e<jobid>, respectively, in the directory from which the batch job was submitted.
- Temporary stdout/stderr files are located in the home directory while the job is running.
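The options above can be combined in a batch script. A minimal sketch, assuming hypothetical names (myjob, a.out) and resource values:

```shell
#!/bin/csh
#PBS -q normal
#PBS -N myjob
#PBS -l walltime=01:00:00,nodes=2:ppn=16
#PBS -j oe                # merge stderr into the stdout file
#PBS -V                   # export the submission environment to the job
cd $SCR                   # per-job scratch directory (see Disk Space for Batch Jobs)
mpirun -np 32 ./a.out     # hypothetical executable; 2 nodes x 16 cores = 32 ranks
```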
The -I option tells qsub you want to run an interactive job. You can also use other qsub options such as those documented in the batch sample scripts. For example, the following command:
forge$ qsub -I -V -q debug -l walltime=00:30:00,nodes=1:ppn=16
will run an interactive job with a wall clock limit of 30 minutes, using one node and sixteen cores per node.
After you enter the command, you will have to wait for Torque to start the job. As with any job, your interactive job will wait in the queue until the specified number of nodes is available. If you specify a small number of nodes for smaller amounts of time, the wait should be shorter because your job will backfill among larger jobs. Once the job starts, you will see something like this:
qsub: waiting for job 914.fsched to start
qsub: job 914.fsched ready
Now you are logged into the launch node. At this point, you can use the appropriate command to start your program. When you are done with your runs, you can use the exit command to end the job.
The qstat command displays the status of batch jobs.
qstat -a gives the status of all jobs on the system.
qstat -n lists nodes allocated to a running job in addition to basic information. The first host on the list is the launch node.
qstat -f PBS_JOBID gives detailed information on a particular job. Note that currently PBS_JOBID needs to be the full extension (e.g., 914.fsched, not just 914).
qstat -q provides summary information on all the queues.
qhist, a locally written tool, summarizes the raw accounting record(s) for one or more jobs. SU charges for a job are available the day after the job completes. To display information about a specific job, the syntax is qhist PBS_JOBID. See qhist --help for details.
The qdel command deletes a queued job or kills a running job. The syntax is qdel PBS_JOBID. Note: You only need to use the numeric part of the Job ID.
Sample Batch Scripts
Sample batch scripts are available in the directory /usr/local/doc/batch_scripts for use as templates.
Disk Space for Batch Jobs
Scratch space for batch jobs is provided via a per-job scratch directory that is created at the beginning of the job. This directory is created under /scratch/batch, and is based on the JobID. If the batch script uses one of the sample scripts as a template, the name of this scratch directory is available to job scripts via the $SCR environment variable.
Your job scratch directory may be deleted soon [possibly immediately] after your job completes, so you should take care to transfer results to the mass storage system. (see the section Automated Saving of Files from Batch Jobs below).
The cdjob command can be used to change the working directory to the scratch directory of a running batch job. The syntax is
forge$ cdjob PBS_JOBID
Automated Saving of Files from Batch Jobs
The saveafterjob utility is available for automated, guaranteed saving of output files from batch jobs to the mass storage system. For details on its use, see the saveafterjob page and the sample PBS batch scripts.
Debuggers and Profilers
NVIDIA provides cuda-gdb for debugging CUDA C; it is essentially a port of gdb, with appropriate extensions, and consequently will be familiar to users of gdb.
NVIDIA provides a "Visual Profiler", computeprof, for CUDA C and OpenCL. To invoke the profiler, one must enable X forwarding by first logging into Forge with "ssh -X forge" and then launching a batch job with the "-X" option; cf. the man pages for ssh and qsub, respectively.
The pgprof utility enables profiling of CUDA Fortran and the PGI Accelerator directives; as with NVIDIA's visual profiler, one should enable X forwarding as described above.
Last updated: April 4, 2012