Bridges User Guide
Last update: March 28, 2017

System Overview

Compute Nodes

Bridges comprises login nodes and three types of computational nodes:

  • Regular Shared Memory (RSM) with 128GB RAM each. This includes GPU nodes.
  • Large Shared Memory (LSM) with 3TB RAM each
  • Extreme Shared Memory (ESM) with 12TB RAM each

Currently, Bridges computational nodes supply 0.8946 Pf/s and 144 TiB RAM.

Auxiliary Nodes

In addition, Bridges contains database, web server, data transfer and login nodes. Current Bridges resources include:

  • 752 RSM nodes: HPE Apollo 2000s, with 2 Intel Xeon E5-2695 v3 CPUs (14 cores per CPU), 128GB RAM and 8TB on-node storage
  • 16 RSM GPU nodes: HPE Apollo 2000s, each with 2 NVIDIA K80 GPUs, 2 Intel Xeon E5-2695 v3 CPUs (14 cores per CPU), 128GB RAM, and 8TB on-node storage
  • 32 RSM GPU nodes: HPE Apollo 2000s, each with 2 NVIDIA P100 GPUs, 2 Intel Xeon E5-2683 v4 CPUs (16 cores per CPU), 128GB RAM, and 8TB on-node storage
  • 8 LSM nodes: HPE ProLiant DL580s, each with 4 Intel Xeon E7-8860 v3 CPUs (16 cores per CPU), 3TB RAM, and 16TB on-node storage
  • 34 LSM nodes: HPE ProLiant DL580s, each with 4 Intel Xeon E7-8870 v4 CPUs (20 cores per CPU), 3TB RAM, and 16TB on-node storage
  • 2 ESM nodes: HPE Integrity Superdome Xs, each with 16 Intel Xeon E7-8880 v3 CPUs (18 cores per CPU), 12TB RAM, and 64TB on-node storage
  • 2 ESM nodes: HPE Integrity Superdome Xs, each with 16 Intel Xeon E7-8880 v4 CPUs (22 cores per CPU), 12TB RAM, and 64TB on-node storage
  • Database, web server, data transfer, and login nodes: HPE ProLiant DL360s and HPE ProLiant DL380s, each with 2 Intel Xeon E5-2695 v3 CPUs (14 cores per CPU, 2.3 GHz base frequency, 3.3 GHz max turbo frequency, 35MB LLC) and 128GB DDR4-2133 RAM. Database nodes also have SSDs or additional HDDs.

System Access

Before the first time you connect to Bridges, you must create your PSC password. Depending on your preferences, you may want to change your login shell once you are logged in.

If you have questions at any time, you can send email to bridges@psc.edu.

Your PSC password

If you do not already have an active PSC account, you must create a PSC password (also called a PSC Kerberos password) before you can connect to Bridges. Your PSC password is the same on all PSC systems, so if you have an active account on another PSC system, you do not need to reset it before connecting to Bridges.

Your PSC password is separate from your XSEDE Portal password. Resetting one password does not change the other password.

Changing your password

There are two ways to change or reset your PSC password:

  • use the online PSC password change utility
  • use the kpasswd command while logged in to a PSC system

When you change your PSC password, whether you do it via the online utility or via the kpasswd command on one PSC system, you change it on all PSC systems.

Connect to Bridges

When you connect to Bridges, you are connecting to one of its login nodes. The login nodes are used for managing files, submitting batch jobs and launching interactive sessions. They are not suited for production computing.

See the Running Jobs section of this User Guide for information on production computing on Bridges.

There are several methods you can use to connect to Bridges.

  • ssh, using either XSEDE or PSC credentials. If you are registered with XSEDE for DUO Multi-Factor Authentication (MFA), you can use this security feature in connecting to Bridges.
    See the XSEDE instructions to set up DUO for MFA.
  • gsissh, if you have the Globus toolkit installed
  • XSEDE Single Sign On Hub, including using Multi-Factor authentication if you are an XSEDE user

SSH

You can use an SSH client from your local machine to connect to Bridges using either your PSC or XSEDE credentials.

SSH is a program that enables secure logins over an unsecured network. It encrypts the data passing both ways so that if it is intercepted it cannot be read.

SSH is client-server software, which means that both the user's local computer and the remote computer must have it installed. SSH server software is installed on all the PSC machines. You must install SSH client software on your local machine.

Read more about SSH in the Getting Started with HPC document.

Once you have an SSH client installed, you can use either your PSC credentials or XSEDE credentials (optionally with DUO MFA) to connect to Bridges. Note that you must have created your PSC password before you can use SSH to connect to Bridges.

Use ssh to connect to Bridges using XSEDE credentials and (optionally) DUO MFA

  1. Using your SSH client, connect to hostname "bridges.psc.xsede.org" or "bridges.psc.edu" using port 2222.
    Either hostname will connect you to Bridges, but you must specify port 2222.
  2. Enter your XSEDE username and password when prompted.
  3. (Optional) If you are registered with XSEDE DUO, you will receive a prompt on your phone. Once you have answered it, you will be logged in.
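
For example, from a command-line SSH client (the username shown is illustrative):

ssh -p 2222 myxsedeuser@bridges.psc.edu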

Use ssh to connect to Bridges using PSC credentials

  1. Using your SSH client, connect to hostname "bridges.psc.xsede.org" or "bridges.psc.edu" using the default port (22).
    Either hostname will connect you to Bridges. You do not have to specify the port.
  2. Enter your PSC username and password when prompted.
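
For example (the username shown is illustrative):

ssh mypscuser@bridges.psc.edu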

Read more about using SSH to connect to PSC systems.

gsissh

If you have installed the Globus toolkit you can use gsissh to connect to Bridges. GsiSSH is a version of SSH which uses certificate authentication. Use the "myproxy-logon" command to get a suitable certificate. The Globus toolkit includes a man page for myproxy-logon. See more information on the Globus Toolkit website.

Public-private keys

You can also use the public-private key pair method to connect to Bridges. To do so you must first fill out our form for this type of access.

XSEDE Single Sign On Hub

XSEDE users can use their XSEDE usernames and passwords in the XSEDE User Portal Single Sign On Login Hub (SSO Hub) to access bridges.psc.xsede.org or bridges.psc.edu.

You can use DUO MFA in the SSO Hub if you choose.

See the XSEDE instructions to set up DUO for Multi-Factor Authentication.

Change your default shell

The "change_shell" command allows you to change your default shell. This command is only available on the login nodes.

To see which shells are available, type:

login1$ change_shell -l

To change your default shell, type:

login1$ change_shell newshell

where newshell is one of the choices output by the "change_shell -l" command. You must use the entire path output by change_shell -l, e.g. /usr/psc/shells/bash. You must log out and back in again for the new shell to take effect.

Account Administration

Charging

Allocations for Bridges are given for "Bridges regular" or "Bridges large" and the two are charged differently.

  • Bridges regular and GPU nodes: The RSM nodes are allocated as "Bridges regular". Service Units are defined in terms of compute resources:

    1 SU = 1 core-hour

GPU nodes are also charged at this rate. There is no extra charge for the use of the GPU units.

  • Bridges large: The LSM and ESM nodes are allocated as "Bridges large". Service Units are defined in terms of memory requested:

    1 SU = 1 TB-hour

Managing multiple grants

If you have more than one grant, be sure to charge your usage to the correct one. Usage is tracked by group name.

  • Find your group names

    To find your group names, use the "id" command to list all the groups you belong to.

    id -Gn
  • Find your current group

    id -gn

    will list the group associated with your current session.

  • Change your default group

    Your primary group is charged with all usage by default. To change your primary group, the group to which your SLURM jobs are charged by default, use the "change_primary_group" command. To see all your groups, enter

    change_primary_group -l

    To set groupname as your default group, enter

    change_primary_group groupname

Charging to a different group

Batch jobs and interactive sessions are charged to your primary group by default. To charge your usage to a different group, you must specify the appropriate group with the -A groupname option to the SLURM "sbatch" command. See the Running Jobs section of this Guide for more information on batch jobs, interactive sessions and SLURM.
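
For example, to charge a batch job to a group other than your primary group (the group name is illustrative):

sbatch -A othergroup myscript.job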

Tracking your usage

The "projects" command will help you keep tabs on your usage. It shows grant information, including usage and the scratch directory associated with the grant.

Type:

projects

For more detailed accounting data you can use the Grant Management System (authorization required).

File Spaces

There are several distinct file spaces available on Bridges, each serving a different function.

  • Home ($HOME), your home directory on Bridges
  • pylon2, persistent file storage
  • pylon5, temporary file storage. Pylon5 is replacing pylon1.
  • Node-local storage ($LOCAL), scratch storage on the local disk associated with a running job
  • Memory storage ($RAMDISK), scratch storage in the memory associated with a running job

File expiration

Three months after your grant expires, all of your Bridges files associated with that grant will be deleted, no matter which file space they are in. You will be able to log in during this 3-month period to transfer files, but you will not be able to run jobs or create new files.

Home ($HOME)

This is your Bridges home directory. It is the usual location for your batch scripts, source code and parameter files. Its path is /home/username, where username is your PSC userid. You can refer to your home directory with the environment variable $HOME. Your home directory is visible to all of the Bridges nodes.

Your home directory is backed up daily, although it is still a good idea to store copies of your important files in another location, such as the pylon2 file system or on a local file system at your site.

$HOME quota

Your home directory has a 10GB quota. You can check your home directory usage using the "quota" command or the command "du -sh".

Grant expiration

Three months after a grant expires, the files in your home directory associated with that grant will be deleted.

pylon2

The pylon2 file system is a persistent file system. You should use it for long-term storage of your files and not for working space for running jobs. The pylon2 file system is shared across all of Bridges nodes.

pylon2 directories

The path of your pylon2 home directory is /pylon2/groupname/username, where groupname is the name for the PSC group associated with your grant. The id command can be used to find your group name.

The command "id -Gn" will list all the groups you belong to.

The command "id -gn" will list the group associated with your current session.

If you have more than one grant, you will have a pylon2 home directory for each grant. Be sure to use the appropriate directory when working with multiple grants.

The pylon2 file system is not backed up. You should therefore store copies of your important pylon2 files in another location.

pylon2 quota

Your usage quota for each of your grants is the Pylon storage allocation you received when your proposal was approved. Files stored under a grant in both pylon5 and pylon2 count towards this storage allocation. If your total use in pylon5 and pylon2 exceeds this quota your access to the partitions on Bridges will be shut off until you are under quota.

Use the "du -sh" or "projects" command to check your total pylon usage (pylon5 plus pylon2). You can also check your usage on the XSEDE User Portal.

If you have multiple grants, it is very important that you store your files in the correct pylon2 directory.

Grant expiration

Three months after a grant expires, the files in the pylon2 directories associated with that grant will be deleted.

pylon1 and pylon5

Starting March 7, 2017, an upgraded scratch file system named pylon5 is available on Bridges to replace pylon1.

Be aware that any job scripts which specifically reference /pylon1 must be edited to reference /pylon5 instead.

You are responsible for moving your files from pylon1 to pylon5. You will have from March 7 to March 30 to do this. The pylon1 file system will be decommissioned on April 4, 2017, and any files remaining there will be lost.

Until April 4, 2017, the text below also applies to the pylon1 file system.

The pylon5 file system is temporary storage, to be used as working space for your running jobs. It provides fast, temporary file space for data read or written by running jobs. I/O to pylon5 is much faster than I/O to your home directory. The pylon5 file system is shared across all of the Bridges nodes.

The pylon1 file system is not currently available on the Bridges 12-TB nodes.

pylon5 directories

The path of your pylon5 home directory is /pylon5/groupname/username, where groupname is the name for the PSC group associated with your grant. Use the "id" command to find your group name.

The command "id -Gn" will list all the groups you belong to.

The command "id -gn" will list the group associated with your current session.

If you have more than one grant, you will have a pylon5 directory for each grant. Be sure to use the appropriate directory when working with multiple grants.

pylon5 wiper

The pylon5 file system is not a persistent file system. Files are wiped after 30 days. It is also not backed up. Be sure to move copies of your important files to another location as soon as you can after you create them.

If you have a compelling reason that your files should not be wiped after 30 days, email bridges@psc.edu and request an exemption. Be sure to explain the need for the exemption and supply a deadline after which the files can be safely wiped.

pylon5 quota

Your usage quota for each of your grants is the Pylon storage allocation you received when your proposal was approved. Files stored under a grant in both pylon5 and pylon2 count together towards this storage allocation. If your total use in pylon5 and pylon2 exceeds this quota your access to pylon5 and pylon2 will be shut off until you are under quota.

Use the "du -sh" or "projects" command to check your total pylon usage (pylon5 plus pylon2). You can also check your usage on the XSEDE User Portal.

If you have multiple grants, it is very important that you store your files in the correct pylon5 directory.

Grant expiration

Three months after a grant expires, the files in any pylon5 directories associated with that grant will be deleted, although the wiper will probably have deleted them already.

Node-local ($LOCAL)

Each Bridges node has a local file system attached to it. This local file system is only visible to the node to which it is attached. The local file system provides fast access to local storage.

This file space is available on all nodes as $LOCAL.

$LOCAL is only available when your job is running, and can only be used as working space for a running job. Once your job finishes your local files are inaccessible and deleted. To use local space, copy files to $LOCAL at the beginning of your job and back out to a persistent file space before your job ends.

If a node crashes all the $LOCAL files are lost. Therefore, you should checkpoint your $LOCAL files by copying them to pylon5 during long runs.
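
A minimal sketch of this staging pattern inside a job script (paths and filenames are illustrative):

cp /pylon5/groupname/username/input.data $LOCAL    # stage input onto node-local disk
cd $LOCAL
./myprogram                                        # compute against fast local storage
cp $LOCAL/output.data /pylon5/groupname/username   # save results before the job ends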

Multi-node jobs

If you are running a multi-node job the variable "$LOCAL" points to the local file space on your node that is running your rank 0 process.

You can use the "srun" command to copy files between $LOCAL on the nodes in a multi-node job. See the MPI job script in the Running Jobs section of this User Guide for details.

$LOCAL size

The maximum amount of local space varies by node type. The RSM (128GB) nodes have a maximum of 3.7TB. The LSM (3TB) nodes have a maximum of 15TB and the ESM (12TB) nodes have a maximum of 64TB.

To check on your local file space usage type:

du -sh

There is no charge for the use of $LOCAL.

Memory files ($RAMDISK)

You can also use the memory allocated for your job for IO rather than using disk space. This will offer the fastest IO on Bridges.

In a running job the environment variable $RAMDISK will refer to the memory associated with the nodes in use.

The amount of memory space available to you depends on the size of the memory on the nodes and the number of nodes you are using. You can only perform IO to the memory of nodes assigned to your job.

If you do not use all of the cores on a node, you are allocated memory in proportion to the number of cores you are using. Note that you cannot use 100% of a node's memory for IO; some is needed for program and data usage.

$RAMDISK is only available to you while your job is running, and can only be used as working space for a running job. Once your job ends this space is inaccessible. To use memory files, copy files to $RAMDISK at the beginning of your job and back out to a permanent space before your job ends. If your job terminates abnormally your memory files are lost.

Within your job you can cd to $RAMDISK, copy files to and from it, and use it to open files. Use the command du -sh to see how much space you are using.
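
For example, within a job (filenames are illustrative):

cd $RAMDISK
cp /pylon5/groupname/username/input.data .         # stage input into memory
./myprogram                                        # read and write files in $RAMDISK
cp output.data /pylon5/groupname/username          # save results before the job ends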

If you are running a multi-node job the $RAMDISK variable points to the memory space on your node that is running your rank 0 process.

Transferring Files

Overview

There are a variety of file transfer methods available for Bridges:

  • Globus - recommended for all file transfers of over 10 Gbytes
  • globus-url-copy - recommended for file transfers of over 10 Gbytes, if you have access to Globus client software but cannot use Globus
  • scp or sftp - recommended for external transfers under 10 Gbytes or for external transfers if you do not have access to Globus or Globus client software
  • cp - recommended for use between Bridges filesystems if the transfer is under 10 Gbytes

Transfers to your home directory

Since your Bridges home directory quota is 10 Gbytes, you should use scp or sftp for all file transfers involving your home directory. Large files cannot be stored there; they should be copied into one of your pylon file spaces instead.

Exceeding your home directory quota will prevent you from writing more data into your home directory and will adversely impact other operations you might want to perform.

Paths for Bridges file spaces

No matter which file transfer method you use (other than cp), the default directory for file transfers into a Bridges file system is non-writable. You will not be able to store any files in the default directory. This means that for any transfer involving a Bridges file system that does not use cp, you must always use the full path for your Bridges files. The full paths for your Bridges directories are:

Home directory /home/username
Pylon2 directory /pylon2/groupname/username
Pylon5 directory /pylon5/groupname/username

The command id -Gn will tell you your groupnames. You have a pylon2 and pylon5 directory for each grant you have.

Transfer rates

PSC maintains a Web page at http://speedpage.psc.edu that lists average data transfer rates between all XSEDE resources. If your data transfer rates are lower than these average rates or you believe that your file transfer performance is subpar, send email to bridges@psc.edu. We will examine approaches for improving your file transfer performance.

Globus

To use Globus to transfer files you must obtain proper authentication either by setting up a Globus account or by using your InCommon credentials if you have them. You can set up a Globus account at the Globus site.

Once you have the proper authentication you can initiate your file transfers using the Globus site. A Globus transfer requires a Globus endpoint, a file path and a file name for both your source and destination.

If you are using your Globus account for authentication, the Bridges endpoint is psc#bridges-xsede.

If you are using InCommon for your authentication the Bridges endpoint is psc#bridges-cilogon.

You must always specify a full path for the Bridges file systems.

Globus is recommended for file transfers internal to the PSC that are over 10 Gbytes. This includes transfers between the Data Supercell and Bridges pylon5 and pylon2 file systems and between pylon5 and pylon2.

Globus-url-copy

The globus-url-copy command can be used for file transfer if you cannot use Globus, but do have access to Globus client software.

To use globus-url-copy you must have a current user proxy certificate. The command "grid-proxy-info" will tell you if you have a current user proxy certificate and, if so, the remaining life of your certificate.

If any of the following apply, use the myproxy-logon command to get a valid user proxy certificate:

  • you get an error from the grid-proxy-info command
  • you do not have a current user proxy certificate
  • the remaining life of your certificate is not sufficient for your planned file transfer

When prompted for your MyProxy passphrase enter your XSEDE Portal password.

To use globus-url-copy for transfers to a machine, you must know the Grid FTP server address. The Grid FTP server address for Bridges is

gsiftp://gridftp.bridges.psc.edu

The use of globus-url-copy always requires full paths.

If you have file transfers internal to the PSC that are over 10 Gbytes, but you cannot use Globus, you can use globus-url-copy. The globus-url-copy and myproxy-logon commands are available on Bridges.
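
A sketch of a globus-url-copy transfer from your local machine to your pylon5 directory (the local path, username and groupname are illustrative):

globus-url-copy file:///local/path/mydata.tar gsiftp://gridftp.bridges.psc.edu/pylon5/groupname/username/mydata.tar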

SCP

To use scp for a file transfer you must specify a source and destination for your transfer. The format for either source or destination is

username@machine-name:path/filename

For transfers involving Bridges, username is your PSC username. The machine-name should be specified as follows:

If your transfer is                                               Use machine
10 Gbytes or less                                                 bridges.psc.edu
Over 10 Gbytes, between Bridges and another XSEDE platform        data.bridges.psc.edu
Over 10 Gbytes, not between Bridges and another XSEDE platform    data.bridges.psc.edu

data.bridges.psc.edu is the name for a high-speed data connector located at the PSC. All large file transfers should use this data connector.

File transfers using scp must specify full paths for Bridges file systems.
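
For example, to copy a large file from your local machine into your pylon5 directory (the username, groupname and filename are illustrative):

scp mydata.tar username@data.bridges.psc.edu:/pylon5/groupname/username/mydata.tar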

SFTP

To use sftp, first connect to your destination machine:

sftp username@machine-name

When Bridges is your destination, use your PSC userid as username. The Bridges machine-name should be specified as follows:

If your transfer is                                               Use machine-name
10 Gbytes or less                                                 bridges.psc.edu
Over 10 Gbytes, between Bridges and another XSEDE platform        data.bridges.psc.edu
Over 10 Gbytes, not between Bridges and another XSEDE platform    data.bridges.psc.edu

data.bridges.psc.edu is the name for a high-speed data connector located at the PSC. All large file transfers should use this data connector.

You will be prompted for your password on the destination machine. If Bridges is your destination machine enter your PSC password.

You can then enter sftp subcommands, like put to copy a file from your local system to the destination system, or get to copy a file from the destination system to your local system.

To copy files into Bridges you must either cd to the proper directory or use full pathnames in your file transfer commands.
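
For example (the username, groupname and filenames are illustrative):

sftp username@bridges.psc.edu
sftp> cd /pylon5/groupname/username
sftp> put input.data
sftp> get output.data
sftp> quit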

Computing Environment

Bridges provides a rich programming environment for the development of applications.

C, C++ and Fortran

Intel, Gnu and PGI compilers for C, C++ and Fortran are available on Bridges. The compilers are:

  C C++ Fortran
Intel icc icpc ifort
Gnu gcc g++ gfortran
PGI pgcc pgc++ pgfortran

The Intel and Gnu compilers are loaded for you automatically.

To run the PGI compilers you must first issue the command

module load pgi

There are man pages for each of the compilers.

OpenMP programming

To compile OpenMP programs you must add an option to your compile command:

Intel -qopenmp for example: icc -qopenmp myprog.c
Gnu -fopenmp for example: gcc -fopenmp myprog.c
PGI -mp for example: pgcc -mp myprog.c

MPI programming

Three types of MPI are supported on Bridges: MVAPICH2, OpenMPI and Intel MPI.

There are two steps to compile an MPI program:

  1. Load the correct module for the compiler and MPI type you want to use, unless you are using Intel MPI. The Intel MPI module is loaded for you on login.
  2. Issue the appropriate MPI wrapper command to compile your program

INTEL COMPILERS

To use the Intel compilers with Load this module Compile C with this command Compile C++ with this command Compile Fortran with this command
Intel MPI none, this is loaded by default mpiicc mpiicpc mpiifort
OpenMPI mpi/intel_openmpi mpicc mpicxx mpifort
MVAPICH2 mpi/intel_mvapich mpicc code.c -lifcore mpicxx code.cpp -lifcore mpifort code.f90 -lifcore

GNU COMPILERS

To use the Gnu compilers with Load this module Compile C with this command Compile C++ with this command Compile Fortran with this command
Intel MPI none, this is loaded by default mpicc mpicxx mpifort
OpenMPI mpi/gcc_openmpi mpicc mpicxx mpifort
MVAPICH2 mpi/gcc_mvapich mpicc mpicxx mpifort

PGI COMPILERS

To use the PGI compilers with Load this module Compile C with this command Compile C++ with this command Compile Fortran with this command
OpenMPI mpi/pgi_openmpi mpicc mpicxx mpifort
MVAPICH2 mpi/pgi_mvapich mpicc mpicxx mpifort
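
For example, a sketch of compiling an MPI program with the Gnu compilers and OpenMPI (the source filename is illustrative):

module load mpi/gcc_openmpi
mpicc -o mympi mympi.c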

Other languages

Other languages, including Java, Python, R, and MATLAB, are available. See the software section for information.

Debugging and performance

  • DDT: DDT is a debugging tool for C, C++ and Fortran 90 threaded and parallel codes. It is client-server software. Install the client on your local machine and then you can access the GUI on Bridges to debug your code. See the DDT page for more information.

  • VTune: VTune is a performance analysis tool from Intel for serial, multithreaded and MPI applications. Install the client on your local machine and then you can access the GUI on Bridges. See the VTune page for more information.

Software available on Bridges

The Module package

The environment management package Module is essential for running software on PSC systems. Be sure to check if there is a module for the software you want to use by typing module avail software-name

The module help command lists any additional modules that must also be loaded. Note that in some cases the order in which these additional modules are loaded matters.
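
For example (the package name is illustrative):

module avail gcc       # list matching modules
module help gcc        # show any additional modules that must be loaded
module load gcc        # load the module into your environment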

View the module package documentation.

You may request that additional software be installed by sending mail to remarks@psc.edu.

PSC provides an up-to-date list of available software.

Running Jobs

You can manage your work on Bridges in either interactive or batch mode.

In an interactive session, you type commands and receive output back to your screen as the commands complete. See Interactive sessions below for more information.

To work in batch mode, you must first create a batch (or job) script which contains the commands to be run, then submit the job to a SLURM partition. It will be scheduled and run as soon as possible. See Batch jobs below for more information.

Whether you use an interactive session or a batch job, all of your computing must be done on Bridges compute nodes. You cannot do your work on login nodes.

In both interactive and batch modes, the SLURM scheduler controls access to all of the compute nodes. SLURM manages five partitions, which are defined by the type of compute node that they control. The Partitions section below contains more information.

Partitions

There are five partitions to which batch jobs and interactive sessions can be directed:

  • RM, for jobs that will run on Bridges RSM (128GB) nodes.
  • RM-shared, for jobs that will run on the RSM nodes, but share nodes with other jobs.
  • LM, for jobs that will run on Bridges LSM and ESM (3TB and 12TB) nodes.
  • GPU, for jobs that will run on Bridges GPU nodes.
  • GPU-shared, for jobs that will run on Bridges GPU nodes, but share nodes with other jobs

All the partitions use FIFO scheduling, although if the top job in the partition will not fit on the machine, SLURM will skip that job and try to schedule the next job in the partition.

RM partition

When submitting a job to the RM partition, you must specify the number of nodes, the number of cores and the walltime that it needs.

Jobs in the RM partition do not share nodes, so jobs are allocated all the cores associated with the nodes assigned to them. Your job will be charged for all the cores associated with your assigned nodes. However, although RM jobs can use more than one node, the memory space of all the nodes is not an integrated memory space. The cores within a node access a shared memory space. Cores in different nodes do not.

Internode communication performance of jobs in the RM partition is best when using 42 or fewer nodes.

RM Parameters
Default walltime 30 minutes
Max walltime 48 hours
Default # nodes 1

RM-shared partition

Jobs in the RM-shared partition will share nodes, but not cores. By using fewer than 28 cores, your job will be charged less. It could also start running sooner.

When submitting a job to the RM-shared partition, you must specify the number of cores, the number of nodes, and the walltime it needs. The number of nodes requested must always be 1.

Jobs are assigned memory in proportion to their number of requested cores. You get the fraction of the node's total memory in proportion to the fraction of cores you requested. If your job exceeds this amount of memory it will be killed.

RM-shared parameters
Default walltime 30 minutes
Max walltime 48 hours
Number of nodes Always 1
Default # cores 1
Max # cores 28

LM partition

Jobs in the LM partition always share nodes. They never span nodes.

When submitting a job to the LM partition, you must specify the amount of memory in GB and the walltime that it needs.

Any value up to 12000GB can be requested.

SLURM will place a job on a 3TB or a 12TB node based on the memory request.

Jobs asking for 3000GB or less will run on a 3TB node, unless there are none available and there is a 12TB node available; then the job will run on a 12TB node.

The number of cores assigned to jobs in the LM partition is proportional to the amount of memory requested. For every 48 GB of memory you get 1 core.

LM parameters
Default walltime 30 minutes
Max walltime 96 hours
Default memory There is no default value
Max memory 12000 GB

GPU Partition

Jobs in the GPU partition use Bridges' GPU nodes. Note that Bridges has 2 types of GPU nodes: K80s and P100s. See the System Configuration section of this User Guide for the details of each type.

Jobs in the GPU partition do not share nodes, so jobs are allocated all the cores associated with the nodes assigned to them and all of the GPUs. Your job will be charged for all the cores associated with your assigned nodes.

However, the memory space across nodes is not integrated. The cores within a node access a shared memory space, but cores in different nodes do not.

When submitting a job to the GPU partition, you must specify the number of GPUs.

You should also specify:

  • the type of node you want, K80 or P100, with the --gres=type option to the interact or sbatch commands. K80 is the default if no type is specified. See the sbatch command options below for more details.
  • the number of nodes
  • the walltime limit

For information on requesting resources and submitting a job to the GPU partition see the section below on the interact or the sbatch commands.

GPU parameters
Default walltime 30 minutes
Max walltime 48 hours
Default # nodes 1
Max # nodes 16

GPU-Shared Partition

Jobs in the GPU-shared partition run on Bridges GPU nodes. Note that Bridges has 2 types of GPU nodes: K80s and P100s. See the System Configuration section of this User Guide for the details of each type.

Jobs in the GPU-shared partition share nodes, but not cores. By sharing nodes your job will be charged less. It could also start running sooner.

You will always run on (part of) one node in the GPU-shared partition.

Your jobs will be allocated memory in proportion to the number of requested GPUs. You get the fraction of the node's total memory in proportion to the fraction of GPUs you requested. If your job exceeds this amount of memory it will be killed.

When submitting a job to the GPU-shared partition, you must specify the number of GPUs.

You should also specify:

  • the type of node you want, K80 or P100, with the --gres=type option to the interact or sbatch commands. K80 is the default if no type is specified. See the sbatch command options below for more details.
  • the walltime limit

For information on requesting resources and submitting a job to the GPU-shared partition see the section below on the interact or the sbatch commands.

GPU-shared parameters
Default walltime 30 minutes
Max walltime 48 hours
Number of nodes Always 1
Default # of GPUs No default
Max # of GPUs 4

Interactive sessions

You can work on Bridges in an interactive session. You must still be allocated the use of one or more compute nodes by SLURM. You cannot use the Bridges login nodes for your work.

You can run an interactive session in all five SLURM partitions. You will need to specify which partition you want, so that the proper resources are allocated for your use.

Resources are set aside for interactive use. If those resources are all in use, your request will wait until it can be fulfilled. An interactive session directed at a partition that shares nodes will probably start sooner than an interactive session directed at a partition that does not share nodes.

To start an interactive session, use the command interact.

The simplest interact command is

login1$ interact

This command will start an interactive job in the RM-shared partition that will run on 1 core for 60 minutes.

If you want to run using different parameters you will need to use options to the interact command. You must specify all options on the command line.

The available options are:

Option              Description                                                  Default value
-p                  Partition requested                                          RM-shared
-t                  Walltime requested in HH:MM:SS                               60:00 (1 hour)
-N                  Number of nodes requested                                    1
-A                  Group to charge the job to                                   Your default group
-R                  Reservation name for the job, if you have one                No default
--mem               Memory requested (note the "--" for this option)             No default
--gres              Special resources requested, such as GPUs (note the "--")    No default
--ntasks-per-node   Number of cores to allocate per node (note the "--")         1
-h                  Help; lists all the available command options

Sample interact commands

Run in the RM-shared partition using 4 cores:

interact --ntasks-per-node=4

Run in the LM partition:

interact -p LM --mem=2000GB

Run in the GPU-shared partition, requesting 2 GPUs (the job will be allocated 14 cores, 7 per GPU):

interact -p GPU-shared --gres=gpu:2

Once the interact command returns with a command prompt, you can enter your commands. The shell will be your default shell. When you are finished with your job type CTRL-D.

Use of the "-R" option does not automatically set any other interact options. You need to specify your other options as you would for an interact command without the "-R" option.

You will be charged for your resource usage from the time your job starts until you type CTRL-D, so be sure to type CTRL-D as soon as you are done. Interactive jobs that are inactive for 30 minutes will be logged out by the system.

If you want more complex control over your interactive job you can use the srun command instead of the interact command.

See the srun man page.

Batch jobs

To run a batch job, you must first create a batch (or job) script, and then submit the script using the sbatch command.

A batch script is a file that consists of SBATCH directives, executable commands and comments. SBATCH directives are an alternative to specifying your resource requests and other job options on the sbatch command line.

Sample OpenMP batch script

#!/bin/bash
#SBATCH -N 1
#SBATCH -p RM
#SBATCH --ntasks-per-node 28
#SBATCH -t 5:00:00
# echo commands to stdout 
set -x

# move to working directory
cd /pylon5/groupname/username
# copy input file to working directory
cp /pylon2/groupname/username/input.data .

# run OpenMP program
export OMP_NUM_THREADS=28
./myopenmp

# copy output file to persistent space
cp output.data /pylon2/groupname/username

Notes:

  • The first line of any batch script must indicate the shell to use for your batch job. This example uses bash.
  • The SBATCH directives must be preceded by a '#' character and start in column 1. The other lines with a '#' character are comments.
  • For username and groupname you must substitute your username and your appropriate group.
  • The --ntasks-per-node option indicates that you will use all 28 cores.

Sample MPI batch script

#!/bin/bash
#SBATCH -p RM
#SBATCH -t 5:00:00
#SBATCH -N 2
#SBATCH --ntasks-per-node 28

#echo commands to stdout
set -x

#move to working directory
cd /pylon5/groupname/username

#copy input files to LOCAL file storage
srun -N $SLURM_NNODES --ntasks-per-node=1 \
    sh -c 'cp /pylon2/groupname/username/input.${SLURM_PROCID} $LOCAL'

#run MPI program
mpirun -np $SLURM_NTASKS ./mympi

#copy output files to persistent space
srun -N $SLURM_NNODES --ntasks-per-node=1 \
    sh -c 'cp $LOCAL/output.* /pylon2/groupname/username'

The variable SLURM_NTASKS gives the total number of cores requested in a job. In this example the value of the variable will be 56, since you asked for 2 nodes with the "-N" option and all 28 cores on each node with the "--ntasks-per-node" option.

The srun command is used to copy files between pylon2 and the LOCAL file systems on each of your nodes. The first srun command assumes you have two files named input.0 and input.1 in your pylon2 file space. It will copy input.0 and input.1 to, respectively, the LOCAL file systems on the first and second nodes allocated to your job. The second srun command will copy files named output.* back from your LOCAL file systems to your pylon2 file space before your job ends. In this command * functions as the usual Unix wildcard.

Sample RM-shared batch script

#!/bin/bash
#SBATCH -N 1
#SBATCH -p RM-shared
#SBATCH -t 5:00:00
#SBATCH --ntasks-per-node 2

#echo commands to stdout
set -x

#move to working directory
cd /pylon5/groupname/username

#copy input file to working directory
cp /pylon2/groupname/username/input.data .

#run OpenMP program
export OMP_NUM_THREADS=2
./myopenmp

#copy output file to persistent space
cp output.data /pylon2/groupname/username

When using the RM-shared partition the number of nodes requested with the "-N" option must always be 1. The --ntasks-per-node option indicates how many cores you want.

Sample GPU batch script

#!/bin/bash
#SBATCH -N 2
#SBATCH -p GPU
#SBATCH --ntasks-per-node 28
#SBATCH -t 5:00:00
#SBATCH --gres=gpu:4

#echo commands to stdout
set -x

#move to working directory
cd /pylon5/groupname/username

#copy input to working directory
cp /pylon2/groupname/username/input.data .

#run GPU program
./mygpu

#copy output file to persistent storage
cp output.data /pylon2/groupname/username

The value of the --gres=gpu option indicates the number of GPUs you want. In the GPU partition the value must always be 4.

Sample GPU-shared batch script

#!/bin/bash
#SBATCH -N 1
#SBATCH -p GPU-shared
#SBATCH --ntasks-per-node 7
#SBATCH --gres=gpu:1
#SBATCH -t 5:00:00

#echo commands to stdout
set -x

#move to working directory
cd /pylon5/groupname/username

#copy input file to working directory
cp /pylon2/groupname/username/input.data .

#run GPU program
./mygpu

#copy output file to persistent storage
cp output.data /pylon2/groupname/username

The --gres=gpu option indicates the number of GPUs you want. The --ntasks-per-node option indicates the number of cores you want; it must be greater than or equal to 7 in the GPU-shared partition.

sbatch command

To submit a batch job, use the sbatch command.

RM partition

An example of an sbatch command to submit a job to the RM partition is:

sbatch -p RM -t 5:00:00 -N 1 myscript.job

where:

  • -p indicates the intended partition
  • -t is the walltime requested in the format HH:MM:SS
  • -N is the number of nodes requested
  • myscript.job is the name of your batch script

The options to sbatch can either be in your batch script or on your sbatch command line. The latter override the former.

LM partition

Jobs submitted to the LM partition must request the amount of memory they need rather than the number of cores. Each core on the 3TB and 12TB nodes is associated with a fixed amount of memory, so the amount of memory you request determines the number of cores assigned to your job. The environment variable SLURM_NTASKS tells you the number of cores assigned to your job. Since there is no default memory value you must always include the --mem option for the LM partition.

A sample sbatch command for the LM partition is:

sbatch -p LM -t 10:00:00 --mem 2000GB myscript.job

where:

  • -p indicates the intended partition (LM)
  • -t is the walltime requested in the format HH:MM:SS
  • --mem is the amount of memory requested
  • myscript.job is the name of your batch script

Jobs in the LM partition do share nodes. They cannot span nodes. Your memory space for an LM job is an integrated, shared memory space.

There are many useful options to the sbatch command.

The -d option can be used to set up dependencies between your jobs.
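
For example, a sketch of a simple dependency (the job ID is illustrative):

sbatch -d afterok:123456 step2.job

Here step2.job will not start until job 123456 has completed successfully.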

The --res option is used to specify the name of your job's reservation if you have a reservation. Use of the --res option does not automatically set any other sbatch options. You need to specify your other options as you would for an sbatch command without the --res option.

The --mail-type and --mail-user options can be used to have email sent to you when your job changes state.

The --no-requeue option specifies that your job will be not be requeued under any circumstances. If your job is running on a node that fails it will not be restarted.

The --time-min option specifies a minimum walltime for your job. When the scheduler is considering which job to put on the machine next, the walltime requests of jobs are a factor in its decision. Free slots on the machine are defined by the number of nodes and the length of time those nodes will be free before they will be used by another job. By specifying a minimum walltime you allow the scheduler to reduce your walltime request to as little as your specified minimum time when it decides whether to schedule your job. This could improve your job's turnaround. If you use this option, your actual walltime assignment can vary between your minimum time and the time you specified with the -t option. If your job hits its actual walltime limit, whatever it is, it will be killed, so when you use this option you should checkpoint your job frequently to waste as little processing time as possible. The value for --time-min is specified in the same manner as for the -t option.

If you are asking for more than 1 and fewer than 42 nodes, your job will run more efficiently if it runs on one switch. A switch is a hardware grouping of 42 nodes. You can request that your job run on one switch with the option

--switches=1[@max-time]

If you use the switches option your job will probably wait longer in the queue until a switch is free, because normally switches are shared across jobs. The optional max-time parameter can be used to indicate a maximum time that you will let your job wait for a switch. If it has waited this maximum time, your request for your job to be run on a switch will be cancelled. The format of the value for max-time is the same as for the -t option.
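
For example, to ask for one switch but give up that request after waiting 12 hours (the values shown are illustrative):

sbatch --switches=1@12:00:00 -p RM -N 4 -t 5:00:00 myscript.job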

For more information about these options and other useful sbatch options see the sbatch man page.

sinfo command

The sinfo command displays information about the state of Bridges nodes. The nodes can have several states:

alloc Allocated to a job
down Down
drain Not available for scheduling
idle Free
resv Reserved

squeue command

The squeue command displays information about the jobs in the partitions. Some useful options are:

-j jobid Displays the information for the specified jobid
-u username Restricts information to jobs belonging to the specified username
-p partition Restricts information to the specified partition
-l (long) Displays information including: time requested, time used, number of requested nodes, the nodes on which a job is running, job state and the reason why a job is waiting to run.

See the squeue man page for more options, for a discussion of the codes for job state and for why a job is waiting to run.

scancel command

The scancel command is used to kill a job in a partition, whether it is running or still waiting to run. Specify the jobid for the job you want to kill. For example,

scancel 12345

kills job # 12345.

sacct command

The sacct command can be used to display detailed information about jobs. It is especially useful in investigating why one of your jobs failed. The general format of the command is sacct -X -j XXXXXX -S MMDDYY --format parameter1,parameter2, ...

For 'XXXXXX' substitute the jobid of the job you are investigating. The date given for the -S option is the date at which sacct begins searching for information about your job.

The --format option determines what information to display about a job. Useful parameters are JobID, Partition, Account, ExitCode, State, Start, End, Elapsed, NodeList, NNodes, MaxRSS and AllocCPUs. The ExitCode and State parameters are especially useful in determining why a job failed. NNodes displays how many nodes your job used, while AllocCPUs displays how many cores your job used. MaxRSS displays how much memory your job used. The commas between the parameters in the --format option cannot be followed by spaces.
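
For example (the job ID and start date are illustrative):

sacct -X -j 123456 -S 031517 --format JobID,Partition,State,ExitCode,Elapsed,NNodes,AllocCPUs,MaxRSS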

See the man page for sacct for more information.

srun command

The srun command gives finer control over job steps than interact or sbatch alone; see the srun man page and the MPI batch script above for examples of its use. It can also be used to monitor the memory usage of a running job, as described below.

Monitoring memory usage

It can be useful to find the memory usage of your jobs. For example, you may want to find out if memory usage was a reason a job failed or if you need to move your job from the 3-TB nodes to the 12-TB nodes.

There are two cases: you can determine a job's memory usage while it is still running or after it has finished.

If your job is still running, which you can determine with the squeue command, you can issue the command:

srun --jobid=XXXXXX top -b -n 1 | grep userid

For 'XXXXXX' substitute the jobid of your job. For 'userid' substitute your userid. The RES field in the top output shows the actual amount of memory used by a process. The top man page can be used to identify the fields of top output.

The other method to use for a running job is to issue the command

sstat -j XXXXXX.batch --format=JobID,MaxRss

For 'XXXXXX' substitute your jobid.

If your job has finished there are two methods to use to find out its memory usage. If you are checking within a day or two after your job has finished you can issue the command:

sacct -j XXXXXX --format=JobID,MaxRss

If this command no longer shows a value for MaxRss then you must use the command: job_info XXXXXX | grep max_rss

Again, substitute your jobid for 'XXXXXX' in both of these commands.

There are man pages for top, srun, sstat and sacct if you need more information.

More help with SLURM

There are man pages for all the SLURM commands. SLURM also has extensive online documentation.

Using Bridges GPU nodes

A standard NVIDIA accelerator environment is installed on Bridges GPU nodes. If you have programmed using GPUs before, you should find this familiar.

GPU Nodes: There are 16 GPU nodes in Phase 1 of Bridges. They are HPE Apollo 2000s, each with 2 NVIDIA K80 GPUs, 2 Intel Xeon E5-2695 v3 CPUs (14 cores per CPU) and 128GB RAM.

Each Bridges Phase 1 RSM GPU node contains

  • 2 NVIDIA K80 GPUs
  • 2 Intel Xeon E5-2695 v3 CPUs, each with 14 cores, 2.3 GHz base frequency, 3.3 GHz max turbo frequency and 35MB LLC
  • 128GB RAM, DDR4 @ 2133 MHz

File Systems

The /home file system, Pylon1 and Pylon2 are available on all of these nodes. See the File Spaces section of the User Guide for more information on these file systems.

Compiling and Running Jobs

Use the GPU queue, either in batch or interactively, to compile your code and run your jobs. See the Running Jobs section of the User Guide for more information on Bridges queues and how to run jobs.

OpenACC

Our primary GPU programming environment is OpenACC.

The PGI compilers are available on all GPU nodes. To set up the appropriate environment for the PGI compilers, use the "module" command: module load pgi

Read more about the module command at PSC

If you will be using these compilers often, it will be useful to add this command to your shell initialization script.

There are many options available with these compilers. See the online man pages

  • man pgf90
  • man pgcc
  • man pgCC

for detailed information.

You may find these basic OpenACC options a good place to start:

pgcc -acc yourcode.c
pgf90 -acc yourcode.f90

Adding the -Minfo=accel flag to the compile command (whether pgf90, pgcc or pgCC) will provide useful feedback regarding compiler errors or success with your OpenACC commands. For example:

pgf90 -acc yourcode.f90 -Minfo=accel

Profiling

Enable runtime GPU performance profiling by setting the PGI_ACC_TIME environment variable. The command to do that depends on which shell you are using (the default shell on Bridges is bash, a Bourne-type shell; C-type shells are also available):

bash: export PGI_ACC_TIME=1
csh:  setenv PGI_ACC_TIME 1

We may install additional profiling tools in the future or at user request.

Debugging

Basic debugging can be accomplished by setting the $PGI_ACC_NOTIFY environment variable. The command to do that depends on which shell you are using:

bash: export PGI_ACC_NOTIFY=1
csh:  setenv PGI_ACC_NOTIFY 1

If you want more detail, set $PGI_ACC_NOTIFY to 3.

A further level of debugging is available with the $PGI_ACC_DEBUG environment variable. The command to set that variable depends on which shell you are using:

bash: export PGI_ACC_DEBUG=1
csh:  setenv PGI_ACC_DEBUG 1

Send email to bridges@psc.edu to request that additional CUDA-oriented debugging tools be installed.

Hadoop and Spark

If you want to run Hadoop or Spark on Bridges, you should note that when you apply for your account.

Accessing the cluster

  1. Request resources: Request access to the Bridges Hadoop cluster by completing the Hadoop/Spark request form. You will be contacted within one business day about your request. When your reservation is ready, you will be emailed a list of nodes in your reservation. The lowest node is the namenode.
  2. Connect to the Hadoop cluster: Log into bridges.psc.edu. From there, ssh to your namenode. You can now run or submit Hadoop jobs.
  3. Monitor your jobs: If you like, you can monitor your Hadoop/Spark jobs while they run, including browsing the filesystem, by setting up a web proxy. See the Hadoop Proxy Set-up Guide for instructions.

/home

The /home file system, which contains your home directory, is available on all of these nodes.

HDFS

The Hadoop filesystem, HDFS, is available from all Hadoop nodes. There is no explicit quota for the HDFS, but it uses the disk space on your reserved node. Files must reside in HDFS to be used in Hadoop jobs. Putting files into HDFS requires these steps:

  • Transfer the files to the namenode with scp or sftp
  • Format them for ingestion into HDFS
  • Use the "hadoop fs -put" command to copy the files into HDFS. This command distributes your data files across the cluster's datanodes.

The "hadoop fs" command should be in your command path by default.

Documentation for the "hadoop fs" command lists other options. These options can be used to list your files in HDFS, delete HDFS files, copy files out of HDFS and other file operations.

To request the installation of data ingestion tools on the Hadoop cluster send email to bridges@psc.edu.

A Simple Hadoop Example

This section demonstrates how to run a MapReduce Java program on the Hadoop cluster. This is the standard paradigm for Hadoop jobs. If you want to run jobs using another framework or in other languages besides Java send email to bridges@psc.edu for assistance.

Follow these steps to run a job on the Hadoop cluster. All the commands listed below should be in your command path by default. The variable HADOOP_HOME should be set for you also.

  1. Compile your Java MapReduce program with a command similar to:
    javac -cp $HADOOP_HOME/hadoop-core*.jar -d WordCount WordCount.java

    where:

    • WordCount is the name of the output directory where you want your class file to be put
    • WordCount.java is the name of your source file
  2. Create a jar file out of your class file with a command similar to:
    jar -cvf WordCount.jar -C WordCount/ .

    where:

    • WordCount.jar is the name of your output jar file
    • WordCount is the name of the directory which contains your class file
  3. Launch your Hadoop job with the "hadoop" command

    Once you have your jar file you can run the hadoop command to launch your Hadoop job. Your hadoop command will be similar to

    hadoop jar WordCount.jar org.myorg.WordCount /datasets/compleat.txt $MYOUTPUT

    where:

    • WordCount.jar is the name of your jar file
    • org.myorg.WordCount specifies the folder hierarchy inside your jar file. Substitute the appropriate hierarchy for your jar file.
    • /datasets/compleat.txt is the path to your input file in the HDFS file system. This file must already exist in HDFS.
    • $MYOUTPUT is the path to your output file, which will be saved in the HDFS file system. You must set this variable to the output file path before you issue the hadoop command.
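
For example, you might set the output path before launching the job (the HDFS path shown is illustrative):

export MYOUTPUT=/user/username/wordcount-output
hadoop jar WordCount.jar org.myorg.WordCount /datasets/compleat.txt $MYOUTPUT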

After you issue the hadoop command your job is controlled by the Hadoop scheduler to run on the datanodes. The scheduler is currently a strictly FIFO scheduler. If your job turnaround is not meeting your needs send email to bridges@psc.edu.

When your job finishes, the hadoop command will end and you will be returned to the system prompt.

Spark

The Spark data framework is available on Bridges. Spark, built on the HDFS filesystem, extends the Hadoop MapReduce paradigm in several directions. It supports a wider variety of workflows than MapReduce. Most importantly, it allows you to process some or all of your data in memory if you choose. This enables very fast parallel processing of your data.

Python, Java and Scala are available for Spark applications. The pyspark interpreter is especially effective for interactive, exploratory tasks in Spark. To use Spark you must first load your data into Spark's highly efficient file structure called Resilient Distributed Dataset (RDD).
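
A minimal interactive sketch using pyspark (the dataset path follows the Hadoop example above and is illustrative):

pyspark
>>> lines = sc.textFile("/datasets/compleat.txt")     # load an HDFS file into an RDD
>>> lines.count()                                     # count the lines in parallel
>>> lines.filter(lambda l: "the" in l).count()        # count lines containing the word "the"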

Extensive online documentation is available at the Spark website. If you have questions about or encounter problems using Spark, send email to bridges@psc.edu.

Other Hadoop Technologies

An entire ecosystem of technologies has grown up around Hadoop, such as HBase and Hive. To request the installation of a different package send email to bridges@psc.edu.

Virtual Machines

A Virtual Machine (VM) is a portion of a physical machine that is partitioned off through software so that it acts as an independent physical machine. You should indicate that you want a VM when you apply for time on Bridges. When you have an active Bridges grant, use the VM Request form to request a VM. You will be contacted in one business day about your request.

Why use a VM?

A VM will look to you as if you are logging into and using your laptop, but you will have access to the computing power, memory capacity and file spaces of Bridges.

Common uses of VMs include hosting database and web servers. These servers can be restricted just to you or you can open them up to outside user communities to share your work. You can also connect your database and web servers and other processing components in a complex workflow.

VMs provide several other benefits. Since the computing power behind the VM is a supercomputer, sufficient resources are available to support multiple users. Since each VM acts like an independent machine, user security is heightened. No outside users can violate the security of your independent VM. However, you can allow other users to access your VM if you choose.

A VM can be customized to meet your requirements. PSC will set up the VM and give you access to your database and web server at a level that matches your requirements.

To discuss whether a VM would be appropriate for your research project send email to bridges@psc.edu.

Data Collections

Bridges hosts both public and private datasets, providing rapid access for individuals, collaborations and communities with appropriate protections.

Data collections are stored on pylon2, Bridges' persistent file system. The space they use counts toward the Bridges storage allocation for the grant hosting them.

If you would like to store a large data collection on Bridges, submit the Community Dataset Request form.

Publicly available datasets

Some data collections are available to anyone with a Bridges account. They include:

Natural Language Toolkit (NLTK) Data

NLTK comes with many corpora, toy grammars, trained models, etc. A complete list of the available data is posted at: http://www.nltk.org/nltk_data/

NLTK is available on Bridges at /pylon2/datasets/community/nltk.

Help

To report a problem on Bridges, please email bridges@psc.edu. Please report only one problem per email; it will help us to track and solve any issues more quickly and efficiently.

Be sure to include

  • the JobID
  • the error message you received
  • the date and time the job ran
  • any other pertinent information
  • a screen shot of the error or the output file showing the error, if possible