Last update: June 19, 2020
Connecting to Bridges
Before the first time you connect to Bridges, you must create your PSC password. Depending on your preferences, you may want to change your login shell once you are logged in.
We take security very seriously! Be sure to read and comply with PSC policies on passwords, security guidelines, resource use, and privacy.
If you have questions at any time, you can send email to bridges@psc.edu.
Create or change your PSC password
If you do not already have an active PSC account, you must create a PSC password (also called a PSC Kerberos password) before you can connect to Bridges. Your PSC password is the same on all PSC systems, so if you have an active account on another PSC system, you do not need to reset it before connecting to Bridges.
Your PSC password is separate from your XSEDE User Portal password. Resetting one password does not change the other password.
Setting your initial PSC password
To set your initial PSC password, use the web-based PSC password change utility.
Changing your PSC password
There are two ways to change or reset your PSC password:
- Use the web-based PSC password change utility.
- Use the kpasswd command when logged into a PSC system. Do not use the passwd command.
When you change your PSC password, whether you do it via the online utility or via the kpasswd command on one PSC system, you change it on all PSC systems.
Connect to Bridges
When you connect to Bridges, you are connecting to one of its login nodes. The login nodes are used for managing files, submitting batch jobs and launching interactive sessions. They are not suited for production computing.
See the Running Jobs section of this User Guide for information on production computing on Bridges.
There are several methods you can use to connect to Bridges.
You can access Bridges through a web browser by using the OnDemand software. You will still need to understand Bridges' partition structure and the options which specify job limits like time and memory use, but OnDemand provides a more modern, graphical interface to Bridges.
See the OnDemand section of this User Guide for more information.
You can connect to a traditional command line interface by logging in via one of these:
- ssh, using either XSEDE or PSC credentials. If you are registered with XSEDE for DUO Multi-Factor Authentication (MFA), you can use this security feature in connecting to Bridges. See the XSEDE instructions to set up DUO for MFA.
- The XSEDE Single Sign On Hub, including Multi-Factor Authentication if you are an XSEDE user.
This section explains how to use ssh or XSEDE Single Sign On to access Bridges.
SSH
You can use an SSH client from your local machine to connect to Bridges using either your PSC or XSEDE credentials.
SSH is a program that enables secure logins over an insecure network. It encrypts the data passing both ways so that if it is intercepted it cannot be read.
SSH is client-server software, which means that both the user's local computer and the remote computer must have it installed. SSH server software is installed on all the PSC machines. You must install SSH client software on your local machine.
Free ssh clients for Macs, Windows machines and many versions of Unix are available. Popular GUI ssh clients include PuTTY for Windows and Cyberduck for Macs. A command line version of ssh is installed on Macs by default; if you prefer it, you can use it in the Terminal application. You can also check with your university to see if there is an ssh client that they recommend.
Once you have an SSH client installed, you can use either your PSC credentials or XSEDE credentials (optionally with DUO MFA) to connect to Bridges. Note that you must have created your PSC password before you can use SSH to connect to Bridges.
Use ssh to connect to Bridges using XSEDE credentials and (optionally) DUO MFA
- Using your SSH client, connect to hostname bridges.psc.xsede.org or bridges.psc.edu using port 2222. Either hostname will connect you to Bridges, but you must specify port 2222.
- Enter your XSEDE username and password when prompted.
- (Optional) If you are registered with XSEDE DUO, you will receive a prompt on your phone. Once you have answered it, you will be logged in.
Use ssh to connect to Bridges using PSC credentials
- Using your SSH client, connect to hostname bridges.psc.xsede.org or bridges.psc.edu using the default port (22). Either hostname will connect you to Bridges; you do not have to specify the port.
- Enter your PSC username and password when prompted.
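For the command line ssh client, the two logins differ only in the port. A minimal sketch of the two invocations (joeuser is a placeholder username; substitute your own):

```shell
# Placeholder username and host; substitute your own username.
user="joeuser"
host="bridges.psc.edu"

# With XSEDE credentials, port 2222 must be given explicitly:
xsede_login="ssh -p 2222 ${user}@${host}"
# With PSC credentials, the default port (22) is used, so no -p option is needed:
psc_login="ssh ${user}@${host}"

echo "$xsede_login"
echo "$psc_login"
```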
Read more about using SSH to connect to PSC systems.
Public-private keys
You can also use public-private key pairs to connect to Bridges. To do so, you must first fill out this form to register your keys with PSC.
XSEDE Single Sign On
XSEDE users can use their XSEDE usernames and passwords in the XSEDE User Portal Single Sign On Login Hub (SSO Hub) to access bridges.psc.xsede.org or bridges.psc.edu.
You must use DUO Multi-Factor Authentication in the SSO Hub.
See the XSEDE instructions to set up DUO for Multi-Factor Authentication.
System Configuration
Bridges comprises over 850 computational nodes:
- 752 Regular Shared Memory (RSM) nodes with 128GB RAM each
- 42 Large Shared Memory (LSM) nodes with 3TB RAM each
- 4 Extreme Shared Memory (ESM) nodes with 12TB RAM each
- 48 Regular Shared Memory GPU (RSM-GPU) nodes of two types:
- 16 nodes with 2 Tesla K80 GPUs and 128GB RAM each
- 32 nodes with 2 Tesla P100 GPUs and 128GB RAM each
- 10 AI-GPU nodes of two types:
- Nine with 8 Volta V100 GPUs, 16GB GPU memory and 192GB RAM each ("Volta 16")
- An NVIDIA DGX-2, with 16 Volta V100 GPUs, 1.5TB RAM and an NVSwitch, which tightly couples the GPUs for capability and scaling ("Volta 32")
Bridges computational nodes supply 1.3018 Pf/s and 274 TiB RAM. The Bridges system also includes more than 6PB of node-local storage and 10PB of shared storage in the Pylon file system.
In addition to its computational nodes, Bridges contains a number of login, database, web server and data transfer nodes.
| RSM nodes | |
| --- | --- |
| Number | 752 |
| CPUs | 2 Intel Haswell (E5-2695 v3) CPUs; 14 cores/CPU; 2.3 - 3.3 GHz |
| RAM | 128GB, DDR4-2133 |
| Cache | 35MB LLC |
| Node-local Storage | 2 HDDs, 4TB each |
| Server | HPE Apollo 2000 |
| Node names | r001 – r752 |
| LSM nodes | | |
| --- | --- | --- |
| Number | 8 | 34 |
| CPUs | 4 Intel Xeon E7-8860 v3 CPUs; 16 cores/CPU; 2.2 - 3.2 GHz | 4 Intel Xeon E7-8870 v4 CPUs; 20 cores/CPU; 2.1 - 3.0 GHz |
| RAM | 3TB, DDR4-2133 | 3TB, DDR4-2400 |
| Cache | 40MB LLC | 50MB LLC |
| Node-local Storage | 4 HDDs, 4TB each | 4 HDDs, 4TB each |
| Server | HPE ProLiant DL580 | HPE ProLiant DL580 |
| Node names | l001 – l008 | l009 – l042 |
| ESM nodes | | |
| --- | --- | --- |
| Number | 2 | 2 |
| CPUs | 16 Intel Xeon E7-8880 v3 CPUs; 18 cores/CPU; 2.3 - 3.1 GHz | 16 Intel Xeon E7-8880 v4 CPUs; 22 cores/CPU; 2.2 - 3.3 GHz |
| RAM | 12TB, DDR4-2133 | 12TB, DDR4-2400 |
| Cache | 45MB LLC | 55MB LLC |
| Node-local Storage | 16 HDDs, 4TB each | 16 HDDs, 4TB each |
| Server | HPE Integrity Superdome X | HPE Integrity Superdome X |
| Node names | xl001 – xl002 | xl003 – xl004 |
| RSM-GPU nodes | | |
| --- | --- | --- |
| Number | 16 | 32 |
| GPUs | 2 NVIDIA Tesla K80 (Kepler architecture) | 2 NVIDIA Tesla P100 (Pascal architecture) |
| GPU memory | 12GB/GPU; 48GB total/node | 16GB/GPU; 32GB total/node |
| CPUs | 2 Intel Haswell (E5-2695 v3) CPUs; 14 cores/CPU; 2.3 - 3.3 GHz | 2 Intel Broadwell (E5-2683 v4) CPUs; 16 cores/CPU; 2.1 - 3.0 GHz |
| RAM | 128GB, DDR4-2133 | 128GB, DDR4-2400 |
| Cache | 35MB LLC | 40MB LLC |
| Node-local Storage | 2 HDDs, 4TB each | 2 HDDs, 4TB each |
| Server | HPE Apollo 2000 | HPE Apollo 2000 |
| Node names | gpu001 – gpu016 | gpu017 – gpu048 |
| AI-GPU nodes | | |
| --- | --- | --- |
| Node Type | Volta 16 | DGX-2 (Volta 32) |
| Number | 9 | 1 |
| GPUs | 8 NVIDIA Volta V100 | 16 NVIDIA Volta V100 |
| GPU memory | 16GB/GPU; 128GB total/node | 32GB/GPU; 512GB total |
| CPUs | 2 Intel Xeon Gold 6148 CPUs; 20 cores/CPU (40 cores total); 2.4 - 3.7 GHz | 2 Intel Xeon Platinum 8168 CPUs; 24 cores/CPU (48 cores total); 2.7 - 3.7 GHz |
| RAM | 192GB, DDR4-2666 | 1.5TB, DDR4-2666 |
| Cache | | 33MB |
| Node-local storage | 4 NVMe SSDs, 2TB each (8TB total) | 8 NVMe SSDs, 3.84TB each (~30TB total) |
| Server | HPE Apollo 6500 | - |
| Node names | gpu049 – gpu057 | gpu058 |
| Database, web server, data transfer, login nodes | |
| --- | --- |
| CPUs | 2 Intel Xeon E5 series CPUs; 14 cores/CPU; 2.3 - 3.3 GHz |
| RAM | 128GB |
| Cache | 35MB LLC |
| Node-local Storage | Database nodes have additional SSDs or HDDs |
| Server | HPE ProLiant DL360s or HPE ProLiant DL380s |
Account Administration
The projects command
The projects command will help you monitor your allocation on Bridges. You can determine which Bridges resources you have been allocated, your remaining balance, your account id (used to track usage), and more. The output below shows that this user has an allocation on the Bridges AI and Bridges Large computing resources and on Bridges Pylon file storage.
[userid@login018 ~]$ projects
Your default charging project charge id is ABC0123456. If you would like to change the default charging project
use the command change_primary_group charge_id. Use the charge id listed below for the project you would like
to make the default in place of charge_id.
Project: XYZ654321D
PI: My Principal Investigator
Title: Important Research
Resource: BRIDGES AI
Allocation: 10,000.00
Balance: 680.49
End Date: 2030-07-15
Award Active: Yes
User Active: Yes
Charge ID: ABC0123456
*** Default charging project ***
Directories:
HOME /home/username
Resource: BRIDGES LARGE MEMORY
Allocation: 200,000.00
Balance: 84,597.05
End Date: 2030-07-15
Award Active: Yes
User Active: Yes
Charge ID: ABC0123456
*** Default charging project ***
Directories:
HOME /home/username
Resource: BRIDGES PYLON STORAGE
Allocation: 100,000.00
Balance: 21,937.62
End Date: 2030-07-15
Award Active: Yes
User Active: Yes
Charge ID: ABC0123456
Directories:
Lustre Project Storage /pylon5/ABC0123456
Lustre Storage /pylon5/ABC0123456/username
Accounting for Bridges use
Accounting for Bridges use varies with the type of node used, which is determined by the type of allocation you have: "Bridges Regular", for Bridges' RSM (128GB) nodes; "Bridges Large", for Bridges' LSM and ESM (3TB and 12TB) nodes; "Bridges GPU", for Bridges' K80 and P100 GPU nodes; and "Bridges AI", for Bridges' Volta GPU nodes and the DGX-2 system.
Usage is defined in terms of "Service Units" or SUs. The definition of an SU varies with the type of node being used.
Bridges regular
The RSM nodes are allocated as "Bridges regular". This does not include Bridges' GPU nodes. Each RSM node holds 28 cores, each of which can be allocated separately. Service Units are defined in terms of "core-hours": the use of one core for one hour.
1 core-hour = 1 SU
Because the RSM nodes each hold 28 cores, if you use one entire RSM node for one hour, 28 SUs will be deducted from your allocation.
28 cores x 1 hour = 28 core-hours = 28 SUs
If you use 2 cores on a node for 30 minutes, 1 SU will be deducted from your allocation.
2 cores x 0.5 hours = 1 core-hour = 1 SU
Bridges large
The LSM and ESM nodes are allocated as "Bridges large". Accounting for the LSM and ESM nodes is done by the memory requested for the job. Service Units (SUs) are defined in terms of "TB-hours": the use of 1TB of memory for one hour. Note that because the memory requested for a job is set aside for your use when the job begins, SU usage is calculated based on memory requested, not on how much memory is actually used.
1 SU = 1 TB-hour
If your job requests 3TB of memory and runs for 1 hour, 3 SUs will be deducted from your allocation.
3TB x 1 hour = 3TB-hours = 3 SUs
If your job requests 8TB and runs for 6 hours, 48 SUs will be deducted from your allocation.
8TB x 6 hours = 48 TB-hours = 48 SUs
Bridges GPU
Bridges contains two kinds of GPU nodes: NVIDIA Tesla K80s and NVIDIA Tesla P100s. Service Units (SUs) for GPU nodes are defined in terms of "gpu-hours": the use of one GPU Unit for one hour.
Because of the difference in the performance of the nodes, SUs are calculated differently for the two types of nodes.
K80 nodes
The K80 nodes hold 4 GPU units each, which can be allocated separately. Service units (SUs) are defined in terms of gpu-hours:
For K80 GPU nodes, 1 gpu-hour = 1 SU
If you use 2 entire K80 nodes for 1 hour, 8 SUs will be deducted from your allocation.
4 GPU units/node x 2 nodes x 1 hour = 8 gpu-hours = 8 SUs
If you use 2 GPU units for 3 hours, 6 SUs will be deducted from your allocation.
2 GPU units x 3 hours = 6 gpu-hours = 6 SUs
P100 nodes
The P100 nodes hold 2 GPU units each, which can be allocated separately. Service Units (SUs) are defined in terms of gpu-hours. Because the P100s are more powerful than the K80 nodes, the SU definition is different.
For P100 GPU nodes, 1 gpu-hour = 2.5 SUs
If you use an entire P100 node for one hour, 5 SUs will be deducted from your allocation.
2 GPU units/node x 1 node x 1 hour = 2 gpu-hours
2 gpu-hours x 2.5 SUs/gpu-hour = 5 SUs
If you use 1 GPU unit on a P100 for 8 hours, 20 SUs will be deducted from your allocation.
1 GPU unit x 8 hours = 8 gpu-hours
8 gpu-hours x 2.5 SUs/gpu-hour = 20 SUs
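All of the worked GPU examples above follow one formula: SUs charged = GPU units x hours x rate, where the rate is 1 SU per gpu-hour on K80 nodes and 2.5 SUs per gpu-hour on P100 nodes. A small sketch of that calculation (su_charge is our illustrative helper, not a Bridges command):

```shell
# Illustrative helper, not a Bridges command: SUs = units * hours * rate.
su_charge() {
  awk -v u="$1" -v h="$2" -v r="$3" 'BEGIN { printf "%g\n", u * h * r }'
}

su_charge 8 1 1     # 2 K80 nodes (4 GPU units each) for 1 hour: 8 SUs
su_charge 2 1 2.5   # 1 full P100 node (2 GPU units) for 1 hour: 5 SUs
su_charge 1 8 2.5   # 1 P100 GPU unit for 8 hours: 20 SUs
```

The same formula, with a rate of 1, reproduces the core-hour and TB-hour examples in the "Bridges regular" and "Bridges large" sections above.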
Bridges AI
Bridges AI comprises two kinds of GPU nodes: an NVIDIA DGX-2 enterprise research AI system ("Volta 32"), and 9 HPE Apollo 6500 servers ("Volta 16"). The DGX-2 tightly couples 16 NVIDIA Tesla V100 (Volta) GPUs, each with 32 GB of GPU memory, connected by NVLink and NVSwitch. Each Volta 16 node has 8 NVIDIA Tesla V100 GPUs, each with 16 GB of GPU memory, and the GPUs are connected by NVLink 2.0.
Service Units (SUs) for AI nodes are defined in terms of "gpu-hours": the use of one GPU Unit for one hour.
DGX-2 node
The DGX-2 node holds 16 GPU units, each of which can be allocated separately. Service Units (SUs) are defined in terms of gpu-hours:
For the DGX-2 node, 1 GPU-hour = 1 SU
If you use 2 GPUs on the DGX-2 node for 1 hour, 2 SUs will be deducted from your allocation.
2 GPU units x 1 hour = 2 gpu-hours = 2 SUs
If you use the entire DGX-2 for 3 hours, 48 SUs will be deducted from your allocation.
16 GPU units x 3 hours = 48 gpu-hours = 48 SUs
Volta 16 nodes
The Volta 16 nodes hold 8 GPU units each, each of which can be allocated separately. Service Units (SUs) are defined in terms of gpu-hours.
For Volta 16 GPU nodes, 1 gpu-hour = 1 SU
If you use an entire Volta 16 node for one hour, 8 SUs will be deducted from your allocation.
8 GPU units/node x 1 node x 1 hour = 8 gpu-hours = 8 SUs
If you use 4 GPU units on a Volta 16 node for 48 hours, 192 SUs will be deducted from your allocation.
4 GPU units x 48 hours = 192 gpu-hours = 192 SUs
Accounting for file space
Every Bridges grant has a pylon storage allocation associated with it. There are no SUs deducted from your allocation for the space you use, but if you exceed your storage quota, you will not be able to submit jobs to Bridges.
Each grant has a Unix group associated with it. Every file is "owned" by a Unix group, and that file ownership determines which grant is charged for the file space. See Managing multiple grants for a further explanation of Unix groups, and how to manage file ownership if you have more than one grant.
You can check your pylon usage with the projects command.
[username@br018]$ projects
.
.
.
Resource: BRIDGES PYLON STORAGE
Allocation: 500.00
Balance: 412.75
End Date: 2019-02-15
Award Active: Yes
User Active: Yes
Charge ID: account-1
Directories:
Lustre Project Storage /pylon5/account-1
Lustre Storage /pylon5/account-1/username
Managing multiple grants
If you have multiple grants on Bridges, you should ensure that the work you do under each grant is assigned correctly to that grant. The files created under or associated with that grant should belong to it, to make them easily available to others on the same grant.
There are two fields associated with each grant for these purposes: a SLURM account id and a Unix group.
SLURM account ids determine which grant your Bridges use is deducted from.
Unix groups determine which pylon5 allocation the storage space for files is deducted from, and who owns and can access a file or directory.
For a given grant, the SLURM account id and the Unix group are identical strings.
One of your grants has been designated as your default grant, and the account id and Unix group associated with the grant are your default account id and default Unix group.
When a Bridges job runs, any SUs it uses are deducted from the default grant. Any files created by that job are owned by the default Unix group.
Find your default account id and Unix group
To find your SLURM account ids, use the projects command. It will display all the grants you belong to. It will also list your default account id (called charge id in the projects output) at the top. Your default Unix group is the same string.
In this example, the user has two grants with account ids account-1 and account-2. The default account id is account-2.
[myusername@br006]$ projects
Your default charging project charge id is account-2. If you would like to change the default charging project use the command change_primary_group charge_id. Use the charge id listed below for the project you would like to make the default in place of charge_id.
Project: AAA000000A
PI: My Principal Investigator
Title: Important Research
Resource: BRIDGES GPU
Allocation: 37,830.00
Balance: 17,457.19
End Date: 2030-07-15
Award Active: Yes
User Active: Yes
Charge ID: account-1
Directories:
HOME /home/myusername
.
.
.
Project: AAA111111A
PI: My Other PI
Title: More Important Research
Resource: BRIDGES PYLON STORAGE
Allocation: 57,500.00
Balance: 12,474.99
End Date: 2019-06-15
Award Active: Yes
User Active: Yes
Charge ID: account-2
*** Default charging project ***
Directories:
Lustre Project Storage /pylon5/account-2
Lustre Storage /pylon5/account-2/myusername
Use a secondary (non-default) grant
To use a grant other than your default grant on Bridges, you must specify the appropriate account id with the -A option to the SLURM sbatch command. See the Running Jobs section of this Guide for more information on batch jobs, interactive sessions and SLURM.
Note that using the -A option does not change your default Unix group. Any files created during a job are owned by your default Unix group, no matter which account id is used for the job, and the space they use will be deducted from the pylon allocation for the default Unix group.
Change your Unix group for a login session
To temporarily change your Unix group, use the newgrp command. Any files created subsequently during this login session will be owned by the new group you have specified, and their storage will be deducted from the pylon allocation of the new group. After you log out of the session, your default Unix group will be in effect again.
newgrp unix_group
Note that the newgrp command has no effect on the account id in effect. Any Bridges usage will be deducted from the default account id or from the one specified with the -A option to sbatch.
Change your default account id and Unix group permanently
You can permanently change your default account id and your default Unix group with the change_primary_group command. Type:
change_primary_group -l
to see all your groups. Then type
change_primary_group account-id
to set account-id as your default.
Your default account id changes immediately. Bridges use by any batch jobs or interactive sessions following this command is deducted from the new account by default.
Your default Unix group does not change immediately. It takes about an hour for the change to take effect. You must log out and log back in after that window for the new Unix group to be the default.
Tracking your usage
There are several ways to track your Bridges usage: the xdusage command, the projects command, and the Grant Management System.
The xdusage command displays project and user account usage information for XSEDE projects. Type man xdusage on Bridges for information.
The projects command shows information on all Bridges grants, including usage and the pylon directories associated with each grant.
The xdusage and projects commands and the XSEDE User Portal accurately reflect the impact of a grant renewal; the Grant Management System currently does not.
Managing your XSEDE allocation
Most account management functions for your XSEDE grant are handled through the XSEDE User Portal.
Change your default shell
The change_shell command allows you to change your default shell. This command is only available on the login nodes.
To see which shells are available, type:
change_shell -l
To change your default shell, type:
change_shell newshell
where newshell is one of the choices output by the change_shell -l command. You must use the entire path output by change_shell -l, e.g. /usr/psc/shells/bash. You must log out and back in again for the new shell to take effect.
File Spaces
There are several distinct file spaces available on Bridges, each serving a different function.
- Home ($HOME), your home directory on Bridges
- pylon5 ($SCRATCH), a Lustre system for persistent file storage. Pylon5 has replaced pylon1.
- Node-local storage ($LOCAL), scratch storage on the local disk associated with a running job
- Memory storage ($RAMDISK), scratch storage in the local memory associated with a running job
Note that pylon2 was decommissioned on June 19, 2018.
File expiration
Three months after your grant expires, all of your Bridges files associated with that grant will be deleted, no matter which file space they are in. You will be able to log in during this 3-month period to transfer files, but you will not be able to run jobs or create new files.
File permissions
Access to files in any Bridges space is governed by Unix file permissions. If your data has additional security or compliance requirements, please contact compliance@psc.edu.
Unix file permissions
For detailed information on Unix file protections, see the man page for the chmod command.
To share files with your group, give the group read and execute access for each directory from your top-level directory down to the directory that contains the files you want to share:
chmod g+rx directory-name
Then give the group read and execute access to each file you want to share:
chmod g+rx filename
To give the group the ability to edit or change a file, add write access to the group:
chmod g+rwx filename
Access Control Lists
If you want more fine-grained control than Unix file permissions allow (for example, to give only certain members of a group access to a file, but not all members), you need to use Access Control Lists (ACLs). Suppose, for example, that you want to give janeuser access to a file in a directory, but no one else in the group.
Use the setfacl (set file ACL) command:
setfacl -m user:janeuser:rx directory-name
for each directory from your top-level directory down to the directory that contains the file you want to share. Then give janeuser access to the specific file with:
setfacl -m user:janeuser:r filename
User janeuser will now be able to read this file, but no one else in the group will have access to it.
To see what ACLs are set on a file, use the getfacl command.
There are man pages for chmod, setfacl and getfacl.
Home ($HOME)
This is your Bridges home directory. It is the usual location for your batch scripts, source code and parameter files. Its path is /home/username, where username is your PSC userid. You can refer to your home directory with the environment variable $HOME. Your home directory is visible to all of the Bridges nodes.
Your home directory is backed up daily, although it is still a good idea to store copies of your important files in another location, such as the pylon5 file system or on a local file system at your site. If you need to recover a home directory file from backup send email to remarks@psc.edu. The process of recovery will take 3 to 4 days.
$HOME quota
Your home directory has a 10GB quota. You can check your home directory usage with the quota command or the command du -sh. To improve the access speed to your home directory files, stay as far below your home directory quota as you can.
Grant expiration
Three months after a grant expires, the files in your home directory associated with that grant will be deleted.
pylon5 ($SCRATCH)
Pylon5 is a Lustre file system shared across all of Bridges' nodes. It is available on Bridges compute nodes as $SCRATCH.
The pylon5 file system is persistent storage, and can be used as working space for your running jobs. It provides fast access for data read or written by running jobs. I/O to pylon5 is much faster than to your home directory.
Files on pylon5 are not backed up, so you should store copies of important pylon5 files in another location.
pylon5 directories
The path of your pylon5 home directory is /pylon5/groupname/username, where groupname is the Unix group associated with your grant. Use the id command to find your group name:
- id -Gn lists all the groups you belong to.
- id -gn lists the group associated with your current session.
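Putting those pieces together, a quick sketch of locating your pylon5 directory for the current session's group (the output varies per user):

```shell
# Build the pylon5 path from the current session's Unix group and userid.
group=$(id -gn)
user=$(id -un)
pylon5_dir="/pylon5/${group}/${user}"
echo "$pylon5_dir"
```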
If you have more than one grant, you will have a pylon5 directory for each grant. Be sure to use the appropriate directory when working with multiple grants.
pylon5 quota
Your usage quota for each of your grants is the Pylon storage allocation you received when your proposal was approved. If your total use in pylon5 exceeds this quota, your access to the partitions on Bridges will be shut off until you are under quota.
Use the du -sh or projects command to check your pylon5 usage. You can also check your usage on the XSEDE User Portal.
If you have multiple grants, it is very important that you store your files in the correct pylon5 directory.
Grant expiration
Three months after a grant expires, the files in any pylon5 directories associated with that grant will be deleted.
Node-local ($LOCAL)
Each Bridges node has a local file system attached to it. This local file system is only visible to the node to which it is attached. The local file system provides fast access to local storage.
This file space is available on all nodes as $LOCAL.
If your application performs a lot of small reads and writes, you could benefit from using $LOCAL.
$LOCAL is only available when your job is running, and can only be used as working space for a running job. Once your job finishes, your local files are inaccessible and deleted. To use local space, copy files to $LOCAL at the beginning of your job and back out to a persistent file space before your job ends.
If a node crashes, all the $LOCAL files are lost. Therefore, you should checkpoint your $LOCAL files by copying them to pylon5 during long runs.
Multi-node jobs
If you are running a multi-node job, the variable $LOCAL points to the local file space on the node that is running your rank 0 process. You can use the srun command to copy files between $LOCAL on the nodes in a multi-node job. See the sample script for MPI in the Sample Batch Scripts section of this User Guide for details.
$LOCAL size
The maximum amount of local space varies by node type. The RSM (128GB) and GPU nodes have a maximum of 3.7TB. The LSM (3TB) nodes have a maximum of 14TB and the ESM (12TB) nodes have a maximum of 49TB.
To check on your local file space usage:
du -sh
No Service Units accrue for the use of $LOCAL.
Using $LOCAL
To use $LOCAL, you must first copy your files to $LOCAL at the beginning of your script, before your executable runs. The following script is an example of how to do this:
RC=1
n=0
while [[ $RC -ne 0 && $n -lt 20 ]]; do
    rsync -aP $sourcedir $LOCAL/
    RC=$?
    let n=n+1
    sleep 10
done
Set $sourcedir to point to the directory that contains the files to be copied, before you execute your program. This code will try at most 20 times to copy your files. If it succeeds, the loop will exit. If an invocation of rsync was unsuccessful, the loop will try again and pick up where it left off.
At the end of your job you must copy your results back from $LOCAL or they will be lost. The following script will do this:
mkdir $SCRATCH/results
RC=1
n=0
while [[ $RC -ne 0 && $n -lt 20 ]]; do
    rsync -aP $LOCAL/ $SCRATCH/results
    RC=$?
    let n=n+1
    sleep 10
done
This code fragment copies your files to a directory named results in your pylon5 file space, created by the mkdir command at the start of the script. Again, it will loop at most 20 times and stop if it is successful.
Memory files ($RAMDISK)
You can also use the memory allocated for your job for I/O rather than using disk space. This will offer the fastest I/O on Bridges.
In a running job, the environment variable $RAMDISK will refer to the memory associated with the nodes in use.
The amount of memory space available to you depends on the size of the memory on the nodes and the number of nodes you are using. You can only perform I/O to the memory of nodes assigned to your job.
If you do not use all of the cores on a node, you are allocated memory in proportion to the number of cores you are using. Note that you cannot use 100% of a node's memory for I/O; some is needed for program and data usage.
$RAMDISK is only available to you while your job is running, and can only be used as working space for a running job. Once your job ends this space is inaccessible. To use memory files, copy files to $RAMDISK at the beginning of your job and back out to a permanent space before your job ends. If your job terminates abnormally, your memory files are lost.
Within your job you can cd to $RAMDISK, copy files to and from it, and use it to open files. Use the command du -sh to see how much space you are using.
If you are running a multi-node job, the $RAMDISK variable points to the memory space on the node that is running your rank 0 process.
Transferring Files
Paths for Bridges file spaces
For all file transfer methods other than cp, you must always use the full path for your Bridges files. The start of the full paths for your Bridges directories are:
Home directory /home/username
Pylon5 directory /pylon5/Unix-group/username
The command id -Gn will show all of your valid Unix groups. You have a pylon5 directory for each grant you have.
Transfers into your Bridges home directory
Your home directory quota is 10GB, so large files cannot be stored there; they should be copied into one of your pylon file spaces instead. Exceeding your home directory quota will prevent you from writing more data into your home directory and will adversely impact other operations you might want to perform.
rsync
You can use the rsync command to copy files to and from Bridges. A sample rsync command to copy to a Bridges directory is
rsync -rltDvp -e 'ssh -l joeuser' source_directory data.bridges.psc.edu:target_directory
Substitute your userid for 'joeuser'. Make sure you use the correct group name in your target directory. Note that with these options rsync will overwrite any differing file in the target directory with the source copy; add the -u (--update) option if you want rsync to skip files that are newer in the target.
We recommend the rsync options -rltDvp. See the rsync man page for information on these options and others you might want to use. We also recommend adding the ssh option -oMACs=umac-64@openssh.com inside the quoted -e string; it selects a faster data validation (MAC) algorithm for the transfer.
You may want to put your rsync command in a loop to ensure that it completes. A sample loop is
RC=1
n=0
while [[ $RC -ne 0 && $n -lt 20 ]]; do
    rsync ...
    RC=$?
    let n=n+1
    sleep 10
done
This loop will try your rsync command at most 20 times. If it succeeds, the loop will exit. If an rsync invocation is unsuccessful, the system will try again and pick up where it left off, copying only those files that have not already been transferred. You can put this loop, with your rsync command, into a batch script and run it with sbatch.
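A minimal sketch of such a batch script; the partition name, node count, time limit and rsync paths below are placeholders to adapt for your own transfer:

```shell
# Write a hypothetical batch script; partition, time limit and paths are placeholders.
cd "$(mktemp -d)"
cat > transfer.job <<'EOF'
#!/bin/bash
#SBATCH -p RM-shared
#SBATCH -N 1
#SBATCH -t 01:00:00

RC=1
n=0
while [[ $RC -ne 0 && $n -lt 20 ]]; do
    rsync -rltDvp source_directory data.bridges.psc.edu:target_directory
    RC=$?
    let n=n+1
    sleep 10
done
EOF

# Submit it with: sbatch transfer.job
cat transfer.job
```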
scp
To use scp for a file transfer, you must specify a source and destination for your transfer. The format for either source or destination is
username@machine-name:path/filename
For transfers involving Bridges, username is your PSC username. The machine name should be given as data.bridges.psc.edu, the name of a high-speed data connector at PSC. We recommend using it for all scp file transfers involving Bridges; doing so prevents file transfers from disrupting interactive use on Bridges' login nodes.
File transfers using scp
must specify full paths for Bridges file systems. See Paths for Bridges file spaces for details.
sftp
To use `sftp`, first connect to the remote machine:
sftp username@machine-name
When Bridges is the remote machine, use your PSC userid as username. The Bridges machine-name should be specified as data.bridges.psc.edu. This is the name for a high-speed data connector at PSC. We recommend using it for all `sftp` file transfers involving Bridges; doing so prevents file transfers from disrupting interactive use on Bridges' login nodes.
You will be prompted for your password on the remote machine. If Bridges is the remote machine, enter your PSC password.
You can then enter `sftp` subcommands, like `put` to copy a file from the local system to the remote system, or `get` to copy a file from the remote system to the local system.
To copy files into Bridges you must either `cd` to the proper directory or use full pathnames in your file transfer commands. See Paths for Bridges file spaces for details.
Two-factor Authentication
If you are required to use two-factor authentication (TFA) to access Bridges filesystems, you must enroll in XSEDE DUO. Once that is complete, use `scp` or `sftp` to transfer files to/from Bridges.
TFA users must use port 2222 and their XSEDE User Portal usernames and passwords. The machine name for these transfers is data.bridges.psc.edu.
In the examples below, myfile is the local filename, XSEDE-username is your XSEDE User Portal username and /path/to/file is the full path to the file on a Bridges filesystem. Note that -P (capital P) is necessary.
scp
Transfer a file from a local machine to Bridges:
scp -P 2222 myfile XSEDE-username@data.bridges.psc.edu:/path/to/file
Transfer a file from Bridges to a local machine:
scp -P 2222 XSEDE-username@data.bridges.psc.edu:/path/to/file myfile
sftp
Use sftp interactively:
sftp -P 2222 XSEDE-username@data.bridges.psc.edu
Then use the `put` command to copy a file from the local machine to Bridges, or the `get` command to transfer a file from Bridges to the local machine.
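If you want to script such a transfer rather than type the subcommands interactively, sftp will also read its subcommands from standard input. A sketch using the placeholder names above:

```shell
# Scripted sftp over port 2222 (TFA); myfile and /path/to/file are placeholders.
# You will still be prompted for your XSEDE User Portal password.
sftp -P 2222 XSEDE-username@data.bridges.psc.edu <<'EOF'
put myfile /path/to/file
get /path/to/file myfile.copy
EOF
```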
Graphical SSH client
If you are using a graphical SSH client, configure it to connect to data.bridges.psc.edu on port 2222/TCP. Login using your XSEDE User Portal username and password.
Globus
Globus can be used for any file transfer to Bridges. It tracks the progress of the transfer and retries when there is a failure; this makes it especially useful for transfers involving large files or many files.
To use Globus to transfer files you must authenticate either via a Globus account or with InCommon credentials.
Globus account
You can set up a Globus account at the Globus site.
InCommon credentials
If you wish to use InCommon credentials to transfer files to/from Bridges, you must first provide your CILogon Certificate Subject information to PSC. Follow these steps:
Find your Certificate Subject string
- Navigate your web browser to https://cilogon.org.
- Select your institution from the 'Select an Identity Provider' list.
- Click the 'Log On' button. You will be taken to the web login page for your institution.
- Login with your username and password for your institution.
- If your institution has an additional login requirement (e.g., Duo), authenticate to that as well. After successfully authenticating to your institution's web login interface, you will be returned to the CILogon webpage. Note the boxed section near the top that lists a field named 'Certificate Subject'.
Send your Certificate Subject string to PSC
- In the CILogon webpage, select and copy the Certificate Subject text. Take care to get the entire text string if it is broken up onto multiple lines.
- Send email to support@psc.edu. Paste your Certificate Subject field into the message, asking that it be mapped to your PSC username.
Your CILogon Certificate Subject information will be added within one business day, and you will be able to begin transferring files to and from Bridges.
Globus endpoints
Once you have the proper authentication you can initiate file transfers from the Globus site. A Globus transfer requires a Globus endpoint, a file path and a file name for both the source and destination. The endpoints for Bridges are:
- psc#bridges-xsede if you are using an XSEDE User Portal account for authentication
- psc#bridges-cilogon if you are using InCommon for authentication
These endpoints are owned by psc@globusid.org. If you use DUO MFA for your XSEDE authentication, you do not need it here: Duo cannot be used with Globus. You must always specify a full path for the Bridges file systems. See Paths for Bridges file spaces for details.
Globus-url-copy
The `globus-url-copy` command can be used if you have access to Globus client software. Both the `globus-url-copy` and `myproxy-logon` commands are available on Bridges, and can be used for file transfers internal to the PSC.
To use `globus-url-copy` you must have a current user proxy certificate. The `grid-proxy-info` command will tell you whether you have a current user proxy certificate and, if so, the remaining life of your certificate.
Use the `myproxy-logon` command to get a valid user proxy certificate if any one of these applies:
- you get an error from the `grid-proxy-info` command
- you do not have a current user proxy certificate
- the remaining life of your certificate is not sufficient for your planned file transfer
When prompted for your MyProxy passphrase enter your XSEDE User Portal password.
To use `globus-url-copy` for transfers to a machine, you must know the GridFTP server address. The GridFTP server address for Bridges is
gsiftp://gridftp.bridges.psc.edu
The use of `globus-url-copy` always requires full paths. See Paths for Bridges file spaces for details.
Transfer rates
PSC maintains a Web page at http://speedpage.psc.edu that lists average data transfer rates between all XSEDE resources. If your data transfer rates are lower than these average rates or you believe that your file transfer performance is subpar, send email to bridges@psc.edu. We will examine approaches for improving your file transfer performance.
Programming Environment
Bridges provides a rich programming environment for the development of applications.
C, C++ and Fortran
Intel, Gnu and PGI compilers for C, C++ and Fortran are available on Bridges. The compilers are:
 | C | C++ | Fortran |
---|---|---|---|
Intel | icc | icpc | ifort |
Gnu | gcc | g++ | gfortran |
PGI | pgcc | pgc++ | pgfortran |
The Intel and Gnu compilers are loaded for you automatically.
To run the PGI compilers you must first issue the command
module load pgi
There are man pages for each of the compilers. Load the pgi module first for access to the pgi man pages.
See also:
- PGI web site
- GNU compilers web site
- Module documentation for information on what modules are available and how to use them.
OpenMP programming
To compile OpenMP programs you must add an option to your compile command:
Intel | -qopenmp | for example: icc -qopenmp myprog.c |
Gnu | -fopenmp | for example: gcc -fopenmp myprog.c |
PGI | -mp | for example: pgcc -mp myprog.c |
See also:
MPI programming
Three types of MPI are supported on Bridges: MVAPICH2, OpenMPI and Intel MPI.
There are two steps to compile an MPI program:
- If you are using MVAPICH2 or OpenMPI, load the correct module for the compiler and MPI type you want to use. The Intel MPI module is loaded for you on login.
- Issue the appropriate MPI wrapper command to compile your program.
The three MPI types may perform differently on different problems or in different programming environments. If you are having trouble with one type of MPI, please try using another type. Contact bridges@psc.edu for more help.
INTEL COMPILERS
To use the Intel compilers with | Load this module | Compile C with this command | Compile C++ with this command | Compile Fortran with this command |
---|---|---|---|---|
Intel MPI | none, this is loaded by default | mpiicc | mpiicpc | mpiifort |
OpenMPI | mpi/intel_openmpi | mpicc | mpicxx | mpifort |
MVAPICH2 | mpi/intel_mvapich | mpicc | mpicxx | mpifort |
For proper Intel MPI behavior, you must set the environment variable I_MPI_JOB_RESPECT_PROCESS_PLACEMENT to 0. Otherwise the mpirun task placement settings you give will be ignored.
BASH
export I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=0
CSH:
setenv I_MPI_JOB_RESPECT_PROCESS_PLACEMENT 0
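Putting the note above into practice, a bash session to compile and run with Intel MPI might look like the following sketch (myprog.c and the rank count are placeholders):

```shell
# Intel MPI is loaded by default; set this so mpirun placement settings are honored
export I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=0

# Compile with the Intel MPI C wrapper
mpiicc -O2 myprog.c -o myprog

# Launch 28 ranks (one per core on an RM node)
mpirun -np 28 ./myprog
```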
GNU COMPILERS
To use the Gnu compilers with | Load this module | Compile C with this command | Compile C++ with this command | Compile Fortran with this command |
---|---|---|---|---|
Intel MPI | none, this is loaded by default | mpicc | mpicxx | mpifort |
OpenMPI | mpi/gcc_openmpi | mpicc | mpicxx | mpifort |
MVAPICH2 | mpi/gcc_mvapich | mpicc | mpicxx | mpifort |
PGI COMPILERS
To use the PGI compilers with | Load this module | Compile C with this command | Compile C++ with this command | Compile Fortran with this command |
---|---|---|---|---|
OpenMPI | mpi/pgi_openmpi | mpicc | mpicxx | mpifort |
MVAPICH2 | mpi/pgi_mvapich | mpicc | mpicxx | mpifort |
See also:
- Intel MPI website
- MVAPICH2 website
- OpenMPI website
- Module documentation for information on what modules are available and how to use them.
Other languages
Other languages, including Java, Python, R, and MATLAB, are available. See the software section for information.
DDT
DDT is a debugging tool for C, C++ and Fortran 90 threaded and parallel codes. It is client-server software. Install the client on your local machine and then you can access the GUI on Bridges to debug your code.
See the PSC DDT and MAP page for more information.
MAP
MAP is a profiling tool for C, C++ and Fortran applications. It is client-server software. Install the client on your local machine and then you can access the GUI on Bridges to profile your code.
See the PSC DDT and MAP page for more information.
Software installed on PSC systems
PSC provides an up-to-date list of available software. The list includes software installed on most PSC computing resources. Anton runs specific software written for its specialized hardware and is not included there. See the Anton document for specifics on Anton.
The Module package
The environment management package Module is essential for running software on most PSC systems. Be sure to check if there is a module for the software you want to use by typing `module avail software-name`.
The `module help software-name` command lists any additional modules that must also be loaded. Note that in some cases the order in which these additional modules are loaded matters. See documentation on the module command for more information.
Running Jobs
All production computing must be done on Bridges' compute nodes, NOT on Bridges' login nodes. The SLURM scheduler (Simple Linux Utility for Resource Management) manages and allocates all of Bridges' compute nodes. Several partitions, or job queues, have been set up in SLURM to allocate resources efficiently.
To run a job on Bridges, you need to decide how you want to run: interactively, in batch, or through OnDemand; and where to run - that is, which partitions you are allowed to use.
What are the different ways to run a job?
You can run jobs in Bridges in several ways:
- interactive mode - where you type commands and receive output back to your screen as the commands complete
- batch mode - where you first create a batch (or job) script which contains the commands to be run, then submit the job to be run as soon as resources are available
- through OnDemand - a browser interface that allows you to run interactively, or create, edit and submit batch jobs and also provides a graphical interface to tools like RStudio, Jupyter notebooks, and IJulia. See the OnDemand section for more information.
Regardless of which way you choose to run your jobs, you will always need to choose a partition to run them in.
Which partitions can I use?
Different partitions control different types of Bridges' resources; they are configured by the type of node they control along with other job requirements like how many nodes or how much time or memory is needed. Your access to the partitions is based on the type of Bridges allocation that you have ("Bridges regular memory", "Bridges large memory", "Bridges GPU", or "Bridges-AI"). You may have more than one type of allocation; in that case, you will have access to more than one set of partitions.
You can see which Bridges resources you have been allocated with the `projects` command. See The projects command in the Account Administration section of this guide for more information.
Interactive sessions
You can do your production work interactively on Bridges, typing commands on the command line, and getting responses back in real time. But you must be allocated the use of one or more Bridges' compute nodes by SLURM to work interactively on Bridges. You cannot use the Bridges login nodes for your work.
You can run an interactive session in any of the SLURM partitions. You will need to specify which partition you want, so that the proper resources are allocated for your use.
If all of the resources set aside for interactive use are in use, your request will wait until the resources you need are available. Using a shared partition (RM-shared, GPU-shared) will probably allow your job to start sooner.
The `interact` command
To start an interactive session, use the command `interact`. The format is:
interact -options
The simplest `interact` command is:
$ interact
This command will start an interactive job using the defaults for `interact`, which are:
Partition: RM-small
Cores: 1
Time limit: 60 minutes
Once the `interact` command returns with a command prompt you can enter your commands. The shell will be your default shell. When you are finished with your job, type CTRL-D.
[bridgesuser@br006 ~]$ interact
A command prompt will appear when your session begins "Ctrl+d" or "exit" will end your session
[bridgesuser@r004 ~]$
Notes:
- Be sure to use the correct account id for your job if you have more than one grant. See Charging with multiple grants.
- Service Units (SUs) accrue for your resource usage from the time the prompt appears until you type CTRL-D, so be sure to type CTRL-D as soon as you are done.
- The maximum time you can request is 8 hours. Inactive `interact` jobs are logged out after 30 minutes of idle time.
- By default, `interact` uses the RM-small partition. Use the -p option for `interact` to use a different partition.
Options for `interact`
If you want to run in a different partition, use more than one core, or set a different time limit, you will need to use options to the `interact` command. Available options are given below.
Option | Description | Default value |
---|---|---|
-p partition | Partition requested | RM-small |
-t HH:MM:SS | Walltime requested The maximum time you can request is 8 hours. | 60:00 (1 hour) |
-N n | Number of nodes requested | 1 |
--egress | Allows your compute nodes to communicate with sites external to Bridges. | N/A |
-A account id | SLURM account id to charge the job to. See how to find or change your default account id in the Account Administration section. Note: Files created during a job will be owned by the Unix group in effect when the job is submitted. This may be different than the account id for the job. See the discussion of the newgrp command. | Your default account id |
-R reservation-name | Reservation name, if you have one. Use of -R does not automatically set any other interact options. You still need to specify the other options (partition, walltime, number of nodes) to override the defaults for the interact command. If your reservation is not assigned to your default account, then you will need to use the "-A" option to specify the account. | No default |
--mem=nGB | Amount of memory requested in GB. This option should only be used for the LM partition. | No default |
--gres=gpu:type:n | Specifies the type and number of GPUs requested. 'type' is one of: volta32, volta16, p100 or k80. For the GPU-shared and GPU-small partitions, type is either k80 or p100. The default is k80. For the GPU-AI partition, type is either volta16 or volta32. 'n' is the number of GPUs. Valid choices are:
| No default |
-gpu | Runs your job on one P100 GPU in the GPU-small partition | N/A |
--ntasks-per-node=n | Number of cores to allocate per node | 1 |
-h | Help, lists all the available command options | N/A |
See also:
- Bridges partitions
- How to determine your valid SLURM account ids and Unix groups and change your default in the Account administration section
- Charging with multiple grants
- The `srun` command, for more complex control over your interactive job
Batch jobs
To run a batch job, you must first create a batch (or job) script, and then submit the script using the `sbatch` command.
A batch script is a file that consists of SBATCH directives, executable commands, and comments.
SBATCH directives specify your resource requests and other job options in your batch script. You can also specify resource requests and options on the `sbatch` command line. Any options on the command line take precedence over those given in the batch script. The SBATCH directives must start with "#SBATCH" as the first text on a line, with no leading spaces.
Comments begin with a '#' character.
The first line of any batch script must indicate the shell to use for your batch job.
Sample Batch Scripts
Both sample batch scripts for some popular software packages and sample batch scripts for general use on Bridges are available.
Sample batch scripts for popular software packages
Sample scripts for some popular software packages are available on Bridges in the directory /opt/packages/examples
. There is a subdirectory for each package, which includes the script along with input data that is required and typical output.
See the documentation for a particular package for more information on using it and how to test any sample scripts that may be available.
Sample batch scripts for common types of jobs
Sample Bridges batch scripts for common job types are given in this document.
Note that in each sample script:
- the bash shell is used, indicated by the first line "#!/bin/bash". If you use a different shell, some Unix commands will be different.
- For username and groupname you must substitute your username and your appropriate Unix group.
The `sbatch` command
To submit a batch job, use the `sbatch` command. The format is
sbatch -options batch-script
The options to `sbatch` can either be in your batch script or on the `sbatch` command line. Options on the command line override those in the batch script.
Note:
- Be sure to use the correct account id for your job if you have more than one grant. See the -A option for `sbatch` to change the SLURM account id for a job. Information on how to determine your valid account ids and change your default account id is in the Account administration section.
- In some cases, the options for `sbatch` differ from the options for `interact` or `srun`.
- By default, `sbatch` submits jobs to the RM partition. Use the -p option for `sbatch` to direct your job to a different partition.
Options to the `sbatch` command
For more information about these options and other useful `sbatch` options, see the sbatch man page.
Table 2. Options to the `sbatch` command
Option | Description | Default |
---|---|---|
-p partition | Partition requested | RM |
-t HH:MM:SS | Walltime requested in HH:MM:SS | 30 minutes |
-N n | Number of nodes requested. | 1 |
-A groupname | Group to charge the job to. If not specified, your default group is charged (see how to find your default group). Note: Files created during a job will be owned by the group in effect when the job is submitted. This may be different than the group the job is charged to. See the discussion of the newgrp command. | Your default group |
--res reservation-name Note the "--" for this option | Use the reservation that has been set up for you. Use of "--res " does not automatically set any other options. You still need to specify the other options (partition, walltime, number of nodes) that you would in any sbatch command. If your reservation is not assigned to your default account then you will need to use the "-A " option to sbatch to specify the account. | NA |
--mem=nGB Note the "--" for this option | Memory in GB. This option is only valid for the LM partition. | None |
-C constraints | Specifies constraints which the nodes allocated to this job must satisfy. See the discussion of the -C option in the sbatch man page for the valid constraints and more information. | None |
--gres=gpu:type:n Note the "--" for this option | Specifies the type and number of GPUs requested. 'type' is either p100 or k80. The default is k80. 'n' is the number of requested GPUs. Valid choices are 1-4, when type is k80 and 1-2 when type is p100. | None |
--ntasks-per-node=n Note the "--" for this option | Request n cores be allocated per node. | 1 |
--mail-type=type Note the "--" for this option | Send email when job events occur, where type can be BEGIN, END, FAIL or ALL. | None |
--mail-user=user Note the "--" for this option | User to send email to as specified by --mail-type. Default is the user who submits the job. | None |
-d=dependency-list | Set up dependencies between jobs, where dependency-list can be:
| None |
--no-requeue Note the "--" for this option | Specifies that your job will be not be requeued under any circumstances. If your job is running on a node that fails it will not be restarted. Note the "--" for this option. | NA |
--time-min=HH:MM:SS Note the "--" for this option. | Specifies a minimum walltime for your job in HH:MM:SS format. SLURM considers the walltime requested when deciding which job to start next. Free slots on the machine are defined by the number of nodes and how long those nodes are free until they will be needed by another job. By specifying a minimum walltime you allow the scheduler to reduce your walltime request to your specified minimum time when deciding whether to schedule your job. This could allow your job to start sooner. If you use this option your actual walltime assignment can vary between your minimum time and the time you specified with the "-t" option. | None |
--switches=1 --switches=1@HH:MM:SS Note the "--" for this option | Requests that the nodes your job runs on all be on one switch, which is a hardware grouping of 42 nodes. If you are asking for more than 1 and fewer than 42 nodes, your job will run more efficiently if it runs on one switch. Normally switches are shared across jobs, so using the switches option means your job may wait longer in the queue before it starts. The optional time parameter gives a maximum time that your job will wait for a switch to be available. If it has waited this maximum time, the request for your job to be run on a switch will be cancelled. | NA |
-h | Help, lists all the available command options |
See also:
- Bridges partitions
- How to determine your valid groups and change your default group in the Account administration section
- Charging with multiple grants below
Charging with multiple grants
If you have more than one grant, be sure to use the correct SLURM account id and Unix group when running jobs.
See Managing multiple grants in the Account Administration section to see how to find your account ids and Unix groups and determine or change your defaults.
Permanently change your default SLURM account id and Unix group
See the change_primary_group
command in the Account Administration section to permanently change your default SLURM account id and Unix group.
Temporarily change your SLURM account id or Unix group
See the -A option to the `sbatch` or `interact` commands to set the SLURM account id for a specific job.
The `newgrp` command will change your Unix group for that login session only. Note that any files created by a job are owned by the Unix group in effect when the job is submitted, which is not necessarily the same as the account id used for the job. See the `newgrp` command in the Account Administration section to see how to change the Unix group currently in effect.
Bridges partitions
Each SLURM partition manages a subset of Bridges' resources. Each partition allocates resources to interactive sessions, batch jobs, and OnDemand sessions that request resources from it.
Know which partitions are open to you: Your Bridges allocations determine which partitions you can submit jobs to.
A "Bridges regular memory" allocation allows you to use Bridges' RSM (128GB) nodes. Partitions available to "Bridges regular memory" allocations are
- RM, for jobs that will run on Bridges' RSM (128GB) nodes, and use one or more full nodes
- RM-shared, for jobs that will run on Bridges' RSM (128GB) nodes, but share a node with other jobs
- RM-small, for short jobs needing 2 full nodes or less, that will run on Bridges RSM (128GB) nodes
A "Bridges large memory" allocation allows you to use Bridges LSM and ESM (3TB and 12TB) nodes. There is one partition available to "Bridges large memory" allocations:
- LM, for jobs that will run on Bridges' LSM and ESM (3TB and 12TB) nodes
A "Bridges GPU" allocation allows you to use Bridges' GPU nodes. Partitions available to "Bridges GPU" allocations are:
- GPU, for jobs that will run on Bridges' GPU nodes, and use one or more full nodes
- GPU-shared, for jobs that will run on Bridges' GPU nodes, but share a node with other jobs
- GPU-small, for jobs that will use only one of Bridges' GPU nodes and 8 hours or less of wall time.
A "Bridges-AI" allocation allows you to you Bridges' Volta GPU nodes. There is one partition available to "Bridges-AI" allocations:
- GPU-AI, for jobs that will run on Bridges' Volta 16 nodes or the DGX-2.
All the partitions use FIFO scheduling. If the top job in the partition will not fit, SLURM will try to schedule the next job in the partition. The scheduler follows policies to ensure that one user does not dominate the machine. There are also limits to the number of nodes and cores a user can simultaneously use. Scheduling policies are always under review to ensure best turnaround for users.
Partitions for "Bridges regular memory" allocations
There are three partitions available for "Bridges regular memory allocations": RM, RM-shared and RM-small.
Use your allocation wisely: To make the most of your allocation, use the shared partitions whenever possible. Jobs in the RM partition are charged for the use of all cores on a node, and incur Service Units (SUs) for all 28 cores. Jobs in the RM-shared partition share nodes, and SUs accrue only for the number of cores they are allocated. The RM partition is the default for the `sbatch` command, while RM-small is the default for the `interact` command. See the discussion of the `interact` and `sbatch` commands in this document for more information.
Use the appropriate account id for your jobs: If you have more than one Bridges grant, be sure to use the correct SLURM account id for each job. See Charging with multiple grants.
For information on requesting resources and submitting jobs, see the discussion of the `interact` and `sbatch` commands.
RM partition
Jobs in the RM partition run on Bridges' RSM (128GB) nodes. Jobs do not share nodes, and are allocated all 28 of the cores on each of the nodes assigned to them. A job in the RM partition incurs SUs for all 28 cores per node on its assigned nodes.
RM jobs can use more than one node. However, the memory space of all the nodes is not integrated. The cores within a node access a shared memory space, but cores in different nodes do not.
The internode communication performance for jobs in the RM partition is best when using 42 or fewer nodes.
When submitting a job to the RM partition, you can request:
- the number of nodes
- the walltime limit
If you do not specify the number of nodes or time limit, you will get the defaults. See the summary table for Bridges' regular memory nodes below for the defaults.
You cannot specify:
- a specific memory allocation
Asking explicitly for memory for a job in the RM partition will cause the job to fail.
Sample `interact` command for the RM partition
An example of an `interact` command for the RM partition, requesting the use of 2 nodes for 30 minutes, is
interact -p RM -N 2 -t 30:00
where:
- -p indicates the intended partition
- -N is the number of nodes requested
- -t is the walltime requested in the format HH:MM:SS
Sample `sbatch` command for the RM partition
An example of an `sbatch` command to submit a job to the RM partition, requesting one node for 5 hours, is
sbatch -p RM -t 5:00:00 -N 1 myscript.job
where:
- -p indicates the intended partition
- -t is the walltime requested in the format HH:MM:SS
- -N is the number of nodes requested
- myscript.job is the name of your batch script
RM-small
Jobs in the RM-small partition run on Bridges' RSM (128GB) nodes, but are limited to at most 2 full nodes and 8 hours. Jobs can share nodes. Note that the memory space of all the nodes is not integrated. The cores within a node access a shared memory space, but cores in different nodes do not. When submitting a job to the RM-small partition, you should specify:
- the number of nodes
- the number of cores
- the walltime limit
If you do not specify the number of nodes or time limit, you will get the defaults. See the summary table for Bridges' regular memory nodes below for the defaults.
You cannot specify:
- a specific memory allocation
Asking explicitly for memory for a job in the RM-small partition will cause the job to fail.
Sample `interact` command for the RM-small partition
Run in the RM-small partition using one node, 8 cores and 45 minutes of walltime:
interact -p RM-small -N 1 --ntasks-per-node=8 -t 45:00
where:
- -p indicates the intended partition
- -N requests one node
- --ntasks-per-node requests the use of 8 cores
- -t is the walltime requested in the format HH:MM:SS
Sample `sbatch` command for the RM-small partition
Submit a job to RM-small asking for 2 nodes and 6 hours of walltime:
sbatch -p RM-small -N 2 -t 6:00:00 myscript.job
where:
- -p indicates the intended partition
- -N requests the use of 2 nodes
- -t is the walltime requested in the format HH:MM:SS
- myscript.job is the name of your batch script
Summary of partitions for Bridges regular memory nodes
Partition name | RM | RM-shared | RM-small |
---|---|---|---|
Node type | 128GB 28 cores 8TB on-node storage | 128GB 28 cores 8TB on-node storage | 128GB 28 cores 8TB on-node storage |
Nodes shared? | No | Yes | Yes |
Node default | 1 | 1 | 1 |
Node max | 168 If you need more than 168, contact bridges@psc.edu to make special arrangements. | 1 | 2 |
Core default | 28/node | 1 | 1 |
Core max | 28/node | 28 | 28/node |
Walltime default | 30 mins | 30 mins | 30 mins |
Walltime max | 48 hrs | 48 hrs | 8 hrs |
Memory | 128GB/node | 4.5GB/core | 4.5GB/core |
See also:
Partitions for "Bridges large memory" allocations
There is one partition available for Bridges large memory allocations: LM.
Charge your jobs to the correct group: If you have more than one Bridges grant, be sure to charge your usage to the correct one. See Charging with multiple grants.
For information on requesting resources and submitting jobs, see the `interact` or `sbatch` commands.
LM partition
Jobs in the LM partition share nodes. Your memory space for an LM job is an integrated, shared memory space.
When submitting a job to the LM partition, you must:
- use the --mem option to request the amount of memory you need, in GB. Any value up to 12000GB can be requested. There is no default memory value. Each core on the 3TB and 12TB nodes is associated with a fixed amount of memory, so the amount of memory you request determines the number of cores assigned to your job.
- specify the walltime limit
You cannot:
- specifically request a number of cores
SLURM will place jobs on either a 3TB or a 12TB node based on the memory request. Jobs asking for 3000GB or less will run on a 3TB node. If no 3TB nodes are available but a 12TB node is available, those jobs will run on a 12TB node.
Once your job is running, the environment variable SLURM_NTASKS_PER_NODE tells you the number of cores assigned to your job.
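For example, a line like the following in your job script reports the assigned core count. This is a trivial sketch; outside a running SLURM job the variable is unset, so a fallback value is shown for illustration:

```shell
# SLURM sets SLURM_NTASKS_PER_NODE inside a running job;
# 'unset' appears only when run outside SLURM.
echo "Cores assigned: ${SLURM_NTASKS_PER_NODE:-unset}"
```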
Sample `interact` command for the LM partition
Run in the LM partition and request 2TB of memory. Use the walltime default of 30 minutes:
interact -p LM --mem=2000GB
where:
- -p indicates the intended partition (LM)
- --mem is the amount of memory requested
Sample sbatch command for the LM partition
A sample sbatch command for the LM partition requesting 10 hours of wall time and 6TB of memory is:
sbatch -p LM -t 10:00:00 --mem 6000GB myscript.job
where:
- "
-p
" indicates the intended partition (LM) - "
-t
" is the walltime requested in the format HH:MM:SS - "
--mem
" is the amount of memory requested - "
myscript.job
" is the name of your batch script
Summary of partitions for Bridges large memory nodes
Partition name | LM | |
---|---|---|
LSM nodes | ESM nodes | |
Node type | 3TB RAM 16TB on-node storage | 12TB RAM 64TB on-node storage |
Nodes shared? | Yes | Yes |
Node default | 1 | 1 |
Node max | 8 | 4 |
Cores | Jobs are allocated 1 core/48GB of memory requested. | Jobs are allocated 1 core/48GB of memory requested. |
Walltime default | 30 mins | 30 mins |
Walltime max | 14 days | 14 days |
Memory | Up to 3000GB | Up to 12,000GB |
Partitions for "Bridges GPU" allocations
There are three partitions available for "Bridges GPU" allocations: GPU, GPU-shared and GPU-small.
Use your allocation wisely: To make the most of your allocation, use the shared partitions whenever possible. Jobs in the GPU partition use all of the cores on a node and accrue SU costs for every core. Jobs in the GPU-shared partition share nodes, and are only charged for the cores they are allocated.
Charge your jobs to the correct group: If you have more than one Bridges grant, be sure to charge your usage to the correct one. See Charging with multiple grants.
For information on requesting resources and submitting jobs see the interact or sbatch commands.
GPU partition
Jobs in the GPU partition use Bridges' GPU nodes. Note that Bridges has 2 types of GPU nodes: K80s and P100s. See the System Configuration section of this User Guide for the details of each type.
Jobs in the GPU partition do not share nodes, so jobs are allocated all the cores and all of the GPUs associated with the nodes assigned to them. Your job will be charged for all the cores associated with your assigned nodes.
However, the memory space across nodes is not integrated. The cores within a node access a shared memory space, but cores in different nodes do not.
When submitting a job to the GPU partition, you must use the --gres option to specify:
- the type of node you want, K80 or P100. K80 is the default if no type is specified.
- the number of GPUs you want
See the sbatch command options for more details on the --gres option.
You should also specify:
- the number of nodes
- the walltime limit
Sample interact command for GPU
An interact command to start a GPU job on 4 P100 nodes for 30 minutes is
interact -p GPU --gres=gpu:p100:2 -N 4 -t 30:00
where:
- "
-p
" indicates the intended partition - "
--gres=gpu:p100:2
" requests the use of 2 P100 GPUs - "
-N
" requests 4 nodes - "
-t
" is the walltime requested in the format HH:MM:SS
Sample sbatch command for GPU
This command requests the use of one K80 GPU node for 45 minutes;
sbatch -p GPU --gres=gpu:k80:4 -N 1 -t 45:00 myscript.job
where:
- "
-p
" indicates the intended partition - "
--gres=gpu:k80:4
" requests the use of 4 K80 GPUs - "
-N
" requests one node - "
-t
" is the walltime requested in the format HH:MM:SS - "
myscript.job
" is the name of your batch script
Sample batch script for GPU partition
#!/bin/bash
#SBATCH -N 2
#SBATCH -p GPU
#SBATCH --ntasks-per-node 28
#SBATCH -t 5:00:00
#SBATCH --gres=gpu:p100:2

# echo commands to stdout
set -x

# move to working directory
# this job assumes:
# - all input data is stored in this directory
# - all output should be stored in this directory
cd /pylon5/groupname/username/path-to-directory

# run GPU program
./mygpu
Notes: The value of the --gres=gpu option indicates the type and number of GPUs you want. For groupname, username and path-to-directory, substitute your group, username and the appropriate directory path.
GPU-small
Jobs in the GPU-small partition run on one of Bridges' P100 GPU nodes. Your job is allocated a share of the node's total memory proportional to the fraction of its GPUs that you request. If your job exceeds this amount of memory, it will be killed.
When submitting a job to the GPU-small partition, you must specify the number of GPUs with the --gres=gpu:p100:n option to the interact or sbatch command. In this partition, n can be 1 or 2. You should also specify the walltime limit.
Sample interact command for GPU-small
Run in the GPU-small partition and ask for 2 P100 GPUs and 2 hours of wall time.
interact -p GPU-small --gres=gpu:p100:2 -t 2:00:00
where:
- "
-p
" indicates the intended partition - "
--gres=gpu:p100:2
" requests the use of 2 P100 GPUs - "
-t
" is the walltime requested in the format HH:MM:SS
Sample sbatch command for GPU-small
Submit a job to the GPU-small partition using 2 P100 GPUs and 1 hour of wall time.
sbatch -p GPU-small --gres=gpu:p100:2 -t 1:00:00 myscript.job
where:
- "
-p
" indicates the intended partition - "
--gres=gpu:p100:2
" requests the use of 2 P100 GPUs - "
-t
" is the walltime requested in the format HH:MM:SS - "
myscript.job
" is the name of your batch script
Summary of partitions for Bridges GPU nodes
Partition name | GPU | GPU-shared | GPU-small | ||
---|---|---|---|---|---|
P100 nodes | K80 nodes | P100 nodes | K80 nodes | P100 nodes | |
Node type | 2 GPUs 2 16-core CPUs 8TB on-node storage | 4 GPUs 2 14-core CPUS 8TB on-node storage | 2 GPUs 2 16-core CPUs 8TB on-node storage | 4 GPUs 2 14-core CPUs 8TB on-node storage | 2 GPUs 2 16-core CPUs 8TB on-node storage |
Nodes shared? | No | No | Yes | Yes | No |
Node default | 1 | 1 | 1 | 1 | 1 |
Node max | 4 Limit of 8 GPUs/job. Because there are 2 GPUs on each P100 node, you can request at most 4 nodes. | 2 Limit of 8 GPUs/job. Because there are 4 GPUs on each K80 node, you can request at most 2 nodes. | 1 | 1 | 1 |
Core default | 32/node | 28/node | 16/GPU | 7/GPU | 32/node |
Core max | 32/node | 28/node | 16/GPU | 7/GPU | 32/node |
GPU default | 2/node | 4/node | No default | No default | No default |
GPU max | 2/node | 4/node | 2 | 4 | 2 |
Walltime default | 30 mins | 30 mins | 30 mins | 30 mins | 30 mins |
Walltime max | 48 hrs | 48 hrs | 48 hrs | 48 hrs | 8 hrs |
Memory | 128GB/node | 128GB/node | 7GB/GPU | 7GB/GPU | 128GB/node |
Partition for "Bridges-AI" allocations
There is one partition available for "Bridges-AI" allocations: GPU-AI. There are two node types available:
- "Volta 16" - nine HPE Apollo 6500 servers, each with 8 NVIDIA Tesla V100 GPUs with 16 GB of GPU memory each, connected by NVLink 2.0
- "Volta 32" - NVIDIA DGX-2 enterprise research AI system tightly coupling 16 NVIDIA Tesla V100 (Volta) GPUs with 32 GB of GPU memory each, connected by NVLink and NVSwitch
We strongly recommend the use of Singularity containers on the AI nodes, especially on the DGX-2. We have installed containers for many popular AI packages on Bridges for you to use, but you can create your own if you like.
For more information on Singularity and the containers available on Bridges:
- The PSC documentation on Singularity
- The Containers section of this guide
Use the appropriate account id for your jobs: If you have more than one Bridges grant, be sure to use the correct SLURM account id for each job. See Charging with multiple grants.
For information on requesting resources and submitting jobs see the interact or sbatch commands.
Using module files on Bridges-AI
The Module package provides for the dynamic modification of a user's environment via module files. Module files manage necessary changes to the environment, such as adding to the default path or defining environment variables, so that you do not have to manage those definitions and paths manually. Before you can use module files in a batch job on Bridges-AI, you must issue the following command:
If you are using the bash or ksh:
source /etc/profile.d/modules.sh
If you are using csh or tcsh:
source /etc/profile.d/modules.csh
See the PSC Module documentation for information on the module command.
GPU-AI partition
When submitting a job to the GPU-AI partition, you must use the --gres=gpu:type:n parameter to specify the type and number of Volta GPUs you will use. Valid options are
- For the Volta 16 nodes, with 16GB of GPU memory, type is volta16; n can be 1-8.
- For the DGX-2, with 32GB of GPU memory, type is volta32; n can be 1-16.
See the sbatch command for an explanation of the --gres option.
Sample interact command for GPU-AI
To run in an interactive session on Bridges-AI, use the interact command and specify the GPU-AI partition. An example interact command to request 1 GPU on a Volta 16 node is:
interact -p GPU-AI --gres=gpu:volta16:1
Where:
- "-p" indicates the intended partition
- "--gres=gpu:volta16:1" requests the use of 1 V100 GPU on an Apollo node
Sample sbatch command for the GPU-AI partition
A sample sbatch command to submit a job to run on one of the Volta 16 nodes and use all eight GPUs would be
sbatch -p GPU-AI -N 1 --gres=gpu:volta16:8 -t 1:00:00 myscript.job
where
- "-p GPU-AI" requests the GPU-AI partition
- "-N 1" requests one node
- "--gres=gpu:volta16:8" requests an Apollo server with Tesla V100 GPUs, and specifies that you will use all 8 GPUs on that node
- "-t 1:00:00" requests one hour of running time
- "myscript.job" is the name of your batch script
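Putting the pieces together, a sketch of a Bridges-AI batch script that combines the module setup described above with the GPU-AI options. The module and script names are placeholders; check module avail for what is actually installed:

```shell
#!/bin/bash
#SBATCH -p GPU-AI
#SBATCH -N 1
#SBATCH --gres=gpu:volta16:8   # all 8 V100 GPUs on a Volta 16 node
#SBATCH -t 1:00:00

# required before module commands work in a Bridges-AI batch job (bash/ksh)
source /etc/profile.d/modules.sh

module load singularity        # placeholder module name

./my_ai_job.sh                 # placeholder for your own script
```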
Summary of the partition for Bridges' "AI" GPU nodes
Partition name | GPU-AI | |
---|---|---|
Node type | Volta 16 | DGX-2 |
8 Tesla V100 (Volta) GPUs with 16 GB of GPU memory each 2 20-core CPUs | 16 Tesla V100 GPUs with 32 GB of GPU memory each 2 24-core CPUs | |
Node default | 1 | 1 |
Node max | 4 | 1 |
Min GPUs per job | 1 | 1 |
Max GPUs per job | 32 | 16 |
Max GPUs in use per user | 32 | 16 |
Walltime default | 1 hour | 1 hour |
Walltime max | 48 hours | 48 hours |
Node, partition, and job status information
sinfo
The sinfo command displays information about the state of Bridges' nodes. The nodes can have several states:
alloc | Allocated to a job |
down | Down |
drain | Not available for scheduling |
idle | Free |
resv | Reserved |
See also: sinfo man page
squeue
The squeue command displays information about the jobs in the partitions. Some useful options are:
-j jobid | Displays the information for the specified jobid |
-u username | Restricts information to jobs belonging to the specified username |
-p partition | Restricts information to the specified partition |
-l | (long) Displays information including: time requested, time used, number of requested nodes, the nodes on which a job is running, job state and the reason why a job is waiting to run. |
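These options can be combined. For example (the jobid and username below are placeholders):

```shell
# long listing of one user's jobs in the RM partition
squeue -u myusername -p RM -l

# full details for a single job
squeue -j 123456 -l
```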
See also: squeue man page for a discussion of the codes for job state, for why a job is waiting to run, and more options.
scancel
The scancel command is used to kill a job in a partition, whether it is running or still waiting to run. Specify the jobid of the job you want to kill. For example,
scancel 12345
kills job # 12345.
See also: scancel man page
sacct
The sacct command can be used to display detailed information about jobs. It is especially useful in investigating why one of your jobs failed. The general format of the command is
sacct -X -j nnnnnn -S MMDDYY --format parameter1,parameter2, ...
- For 'nnnnnn' substitute the jobid of the job you are investigating.
- The date given for the -S option is the date at which sacct begins searching for information about your job.
- The commas between the parameters in the --format option cannot be followed by spaces.
The --format option determines what information to display about a job. Useful parameters are
- JobID
- Partition
- Account - the account charged
- ExitCode - useful in determining why a job failed
- State - useful in determining why a job failed
- Start, End, Elapsed - start, end and elapsed time of the job
- NodeList - list of nodes used in the job
- NNodes - how many nodes the job was allocated
- MaxRSS - how much memory the job used
- AllocCPUs - how many cores the job was allocated
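For example, a sketch of an sacct command to investigate a failed job (the jobid and start date are placeholders):

```shell
# -X summarizes the whole job rather than listing every job step
sacct -X -j 123456 -S 060120 --format=JobID,Partition,State,ExitCode,Elapsed,NNodes,MaxRSS
```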
See also: sacct man page
Monitoring memory usage
It can be useful to find the memory usage of your jobs. For example, you may want to find out if memory usage was a reason a job failed.
You can determine a job's memory usage whether it is still running or has finished. To determine if your job is still running, use the squeue command:
squeue -j nnnnnn -O state
where nnnnnn is the jobid.
For running jobs: srun and top, or sstat
You can use the srun and top commands to determine the amount of memory being used.
srun --jobid=nnnnnn top -b -n 1 | grep userid
For nnnnnn substitute the jobid of your job. For 'userid' substitute your userid. The RES field in the output from top shows the actual amount of memory used by a process. The top man page identifies the fields in the output of the top command.
You can also use the sstat command to determine the amount of memory being used in a running job:
sstat -j nnnnnn.batch --format=JobID,MaxRss
where nnnnnn is your jobid.
- See the man page for sstat for more information.
For jobs that are finished: sacct or job_info
If you are checking within a day or two after your job has finished you can issue the command
sacct -j nnnnnn --format=JobID,MaxRss
If this command no longer shows a value for MaxRss, use the job_info command:
job_info nnnnnn | grep max_rss
Substitute your jobid for nnnnnn in both of these commands.
- See the man page for sacct for more information.
See also: documentation for SLURM, including man pages for all the SLURM commands
OnDemand
The OnDemand interface allows you to conduct your research on Bridges through a web browser. You can manage files - create, edit and move them - submit and track jobs, see job output, check the status of the queues, run a Jupyter notebook through JupyterHub and more, without logging in to Bridges via traditional interfaces.
OnDemand was created by the Ohio Supercomputer Center (OSC). This document provides an outline of how to use OnDemand on Bridges. For more help, check the extensive documentation for OnDemand created by OSC, including many video tutorials, or email bridges@psc.edu.
This document covers these topics:
- Start OnDemand
- Manage Files
- Create and edit jobs
- Submit jobs to Bridges
- JupyterHub, IJulia
- RStudio
- Shell Access
- Miscellaneous: Accessing Bridges documentation, changing your PSC password
Start OnDemand
To connect via OnDemand, point your browser to https://ondemand.bridges.psc.edu.
- You will be prompted for a username and password. Enter your PSC username and password.
- The OnDemand Dashboard will open. From this page, you can use the menus across the top of the page to manage files and submit jobs to Bridges.
To end your OnDemand session, choose Log Out at the top right of the Dashboard window and close your browser.
Manage files
To create, edit or move files, click on the Files menu from the Dashboard window. A dropdown menu will appear, listing all your file spaces on Bridges: your home directory and the pylon directories for each of your Bridges' grants.
Choosing one of the file spaces opens the File Explorer in a new browser tab. The files in the selected directory are listed. No matter which directory you are in, your home directory is displayed in a panel on the left.
There are two sets of buttons in the File Explorer.
Buttons on the top left just below the name of the current directory allow you to View, Edit, Rename, Download, Copy or Paste (after you have moved to a different directory) a file, or you can toggle the file selection with (Un)Select All.
Buttons on the top of the window on the right perform these functions:
- Go To Navigate to another directory or file system
- Open in Terminal Open a terminal window on Bridges in a new browser tab
- New File Creates a new empty file
- New Dir Create a new subdirectory
- Upload Copies a file from your local machine to Bridges
- Show Dotfiles Toggles the display of dotfiles
- Show Owner/Mode Toggles the display of owner and permission settings
Create and edit jobs
You can create new job scripts and edit existing scripts, and submit those scripts to Bridges through OnDemand.
From the top menus in the Dashboard window, choose Jobs > Job Composer. A Job Composer window will open.
There are two tabs at the top: Jobs and Templates.
In the Jobs tab, a listing of your jobs is given.
Create a new job script
To create a new job script:
- Select a template to begin with
- Edit the job script
- Edit the job options
Select a template
- Go to the Jobs tab in the Jobs Composer window. You have been given a default template, named Simple Sequential Job.
- To create a new job script, click the blue New Job > From Default Template button in the upper left. You will see a green message at the top of the window, "Job was successfully created".
At the right of the Jobs window, you will see the Job Details, including the location of the script and the script name (by default, main_job.sh). Under that, you will see the contents of the job script in a section titled Submit Script.
Edit the job script
Edit the job script so that it has the commands and workflow that you need.
If you do not want the default settings for a job, you must include options to change them in the job script. For example, you may need more time or more than one node. For the GPU partitions, you must specify the type and number of GPUs you want. For the LM partition, you must specify how much memory you need. Use an SBATCH directive in the job script to set these options.
You can edit the script in several ways.
- Click the blue Edit Files button at the top of the Jobs tab in the Jobs Composer window
- In the Jobs tab in the Jobs Composer window, find the Submit Script section at the bottom right. Click the blue Open Editor button.
After you save the file, the editor window remains open, but if you return to the Jobs Composer window, you will see that the content of your script has changed.
Edit the job options
In the Jobs tab in the Jobs Composer window, click the blue Job Options button. The options for the selected job such as name, the job script to run, and the account it will run under are displayed and can be edited. Click Save or Cancel to return to the job listing.
Submit jobs to Bridges
Select a job in the Jobs tab in the Jobs Composer window. Click the green Submit button to submit the selected job. A message at the top of the window shows whether the job submission was successful or not. If it is not, you can edit the job script or options and resubmit. When the job submits successfully, the status of the job in the Jobs Composer window will change to Queued or Running. When the job completes, the status will change to Completed.
JupyterHub and IJulia
You can run JupyterHub, and IJulia notebooks, through OnDemand. You must do some setup before the first time you run IJulia through OnDemand.
Setup IJulia for OnDemand use
Note: You only need to do this once.
While logged in to Bridges, request an interactive session with access to sites external to Bridges by typing:
interact --egress
Once the session starts, type these commands:
module load anaconda3
module load julia
julia
When Julia starts, type
Pkg.add("IJulia")
When you see the message that IJulia has been installed, you can end your interactive session.
Select Interactive Apps > Jupyter Notebooks from the top menu in the Dashboard window.
In the screen that opens, specify the time limit, number of nodes, and partition to use. If you have more than one grant on Bridges, you can also designate the account to deduct this usage from. If you will use the LM or one of the GPU partitions, you must add a flag in the Extra Args field for the amount of memory or the number and type of GPUs you want:
--mem=numberGB
--gres=gpu:type:number
See the Running jobs section of this User Guide for more information on Bridges' partitions and the options available.
Click the blue Launch button to start your JupyterHub session. You may have to wait in the queue for resources to be available.
When your session starts, click the blue Connect to Jupyter button. The Dashboard window now displays information about your JupyterHub session including which node it is running on, when it began, and how much time remains.
A new window running JupyterHub also opens. Note the three tabs: Files, Running and Clusters.
Files
By default you are in the Files tab, and it displays the contents of your Bridges home directory. You can navigate through your home directory tree.
Running
Under the Running tab, you will see listed any notebooks or terminal sessions that you are currently running.
Now you can start a Jupyter or IJulia notebook:
a. To start a Jupyter notebook which is stored in your home directory space, in the Files tab, click on its name. A new window running the notebook opens.
b. To start a Jupyter notebook which is stored in your pylon5 directory, you must first create a symbolic link to it from your home directory. While in your home directory, use a command like:
ln -s /pylon5/yourgroup/youruserid PYLONDIR
When you enter JupyterHub, you will see the entry PYLONDIR in your list of files under the Files tab. Click on this to be moved to your pylon5 directory.
c. To start IJulia, in the Files tab, click on the New button at the top right of the file listing. Choose IJulia from the drop down.
Errors
If you get an "Internal Server Error" when starting a JupyterHub session, you may be over your home directory quota. Check the Details section of the error for a line like:
ActionView::Template::Error: Disk quota exceeded @ dir_s_mkdir ...
You can confirm that you are over quota by opening a Bridges shell access window and typing:
du -sh
This command shows the amount of storage in your home directory. Home directory quotas are 10GB. If du -sh shows you are near 10GB, you should delete or move some files out of your home directory. You can do this in OnDemand in the File Explorer window or in a shell access window.
When you are under quota, you can try starting a JupyterHub session again.
Stopping your JupyterHub session
In the Dashboard window, click the red Delete button.
RStudio
You can run RStudio through OnDemand.
- Select Interactive Apps > RStudio Server from the top menu in the Dashboard window.
- In the screen that opens, specify the time limit, number of nodes, and partition to use. You can also designate the account to apply this usage to if you have more than one grant on Bridges.
Use the Extra Args field to ask for specific resources. If you will use the LM partition, you must add a flag in the Extra Args field for the amount of memory you need:
--mem=numberGB
If you will use one of the GPU partitions, you must add a flag in the Extra Args field for the number and type of GPUs you want:--gres=gpu:type:number
If you want to add additional external packages to your User Library, you must add the -C EGRESS flag in the Extra Args field to allow access to external sites.
See the Running jobs section of this User Guide for more information on Bridges' partitions and the options available. - Click the blue Launch button to start your RStudio session. You may have to wait in the queue for resources to be available.
- When your session starts, click the blue Connect to RStudio Server button. A new window opens with the RStudio interface.
Installed Packages
The Packages tab in the lower right pane of the RStudio interface lists all the packages currently installed in your User Library and in the System Library. To install additional packages into your user library location, click the Install link under the Packages tab. A pop-up window will open asking where to find the package (either the CRAN repository or a Package Archive file).
If you choose CRAN repository, you can type the names of the package(s) you want in the Packages field.
If you choose Package Archive File, you can browse for the file you want. By default your home directory space on Bridges will be shown. To browse through your pylon5 space, click on "..." at the right end of the Home row and enter pylon5/yourgroup in the Path to Folder field in the pop-up window.
Errors
If you exceed the time limit you requested when setting up your RStudio session, you will see this error:
Error: Status code 503 returned
To continue using RStudio, go to Interactive Apps > RStudio from the top menu in the Dashboard window and start a new session.
Stopping your RStudio session
To end your RStudio session, either select File > Quit Session or click the red icon in the upper right of your RStudio window. NOTE that this only closes your RStudio session; it does not close your interactive Bridges session. You are still accruing Service Units. If you like, you can start another RStudio session.
To end your interactive Bridges session so that you are no longer using Service Units, return to the Dashboard window and click the red Delete button.
Shell access
You can get shell access to Bridges by choosing Clusters >> Bridges Shell Access from the top menus in the Dashboard window. In the window that opens, you are logged in to one of Bridges' login nodes as if you used ssh to connect to Bridges.
Accessing Bridges documentation
In the Dashboard window, under the Help menu, choose Online Documentation to be taken to the Bridges User Guide.
Change your PSC password
In the Dashboard window, under the Help menu, choose Change HPC Password to be taken to the PSC password change utility.
Using Bridges GPU nodes
Two GPU resources are part of Bridges: "Bridges AI" and "Bridges GPU". When you receive a Bridges allocation, the resource you are allocated determines which set of GPU nodes you have access to.
Bridges' GPU nodes have either NVIDIA Tesla K80, P100 or V100 GPUs, providing substantial, complementary computational power for deep learning, simulations and other applications.
A standard NVIDIA accelerator environment is installed on Bridges' GPU nodes. If you have programmed using GPUs before, you should find this familiar. Please contact bridges@psc.edu for more help.
GPU Nodes
Bridges AI
The Bridges AI resource consists of ten nodes.
An NVIDIA DGX-2 enterprise research AI system which tightly couples 16 NVIDIA Tesla V100 (Volta) GPUs with 32GB of GPU memory each (512GB/node). The DGX-2 also holds two Intel Xeon Platinum 8168 CPUs with 24 cores/CPU (48 cores total) and 1.5TB RAM. The GPUs are connected by NVLink and NVSwitch, to provide maximum capability for the most demanding of AI challenges.
9 NVIDIA Tesla V100 GPU nodes, each with 8 GPUs with 16GB of GPU memory each (128GB/node), on HPE Apollo 6500 servers with 2 Intel Xeon Gold 6148, 20 cores/CPU (40 cores total) and 192GB RAM. The GPUs are connected by NVLink 2.0, to balance great AI capability and capacity.
Bridges GPU
The Bridges GPU resource consists of 48 nodes.
- 16 NVIDIA Tesla K80 GPU nodes on HPE Apollo servers. Each holds two NVIDIA K80 GPU cards with 24GB of GPU memory (48GB/node), and each card contains two GPUs that can be individually scheduled. Each node also has 2 Intel Xeon E5-2695 v3 CPUs (14 cores per CPU) and 128GB RAM.
Ideally, the GPUs are shared in a single application to ensure that the expected amount of on-board memory is available and that the GPUs are used to their maximum capacity. This makes the K80 GPU nodes optimal for applications that scale effectively to 2, 4 or more GPUs. Some examples are GROMACS, NAMD and VASP. Applications using a multiple of 4 K80 GPUs will maximize system throughput.
- 32 NVIDIA Tesla P100 GPU nodes on HPE Apollo 2000 servers. Each node contains 2 NVIDIA P100 GPU cards, and each card holds one very powerful GPU and 16GB of GPU memory (32GB/node). Each node also has 2 Intel Xeon E5-2683 v4 CPUs (16 cores per CPU) and 128GB RAM.
These nodes are optimally suited for single-GPU applications that require maximum acceleration. The most prominent example of this is deep learning training using frameworks that do not use multiple GPUs.
See the System configuration section for hardware details for all GPU node types.
File Systems
The /home and /pylon5 file systems are available on all of these nodes. See the File Spaces section for more information on these file systems.
Compiling and Running Jobs
Use the GPU partition, either in batch or interactively, to compile your code and run your jobs. See the Running Jobs section for more information on Bridges partitions and how to run jobs.
CUDA
More information on using CUDA on Bridges can be found in the PSC CUDA document.
To use CUDA, first you must load the CUDA module. To see all versions of CUDA that are available, type:
module avail cuda
Then choose the version that you need and load the module for it.
module load cuda
loads the default CUDA. To load a different version, use the full module name.
module load cuda/8.0
CUDA 8 codes should run on both types of Bridges GPU nodes with no issues. CUDA 7 should only be used on the K80 GPUs (Phase 1). Performance may suffer with CUDA 7 on the P100 nodes (Phase 2).
OpenACC
Our primary GPU programming environment is OpenACC.
The PGI compilers are available on all GPU nodes. To set up the appropriate environment for the PGI compilers, use the module command:
module load pgi
Read more about the module command at PSC.
If you will be using these compilers often, it will be useful to add this command to your shell initialization script.
There are many options available with these compilers. See the online man pages
man pgf90
man pgcc
man pgCC
for detailed information.
You may find these basic OpenACC options a good place to start:
pgcc -acc yourcode.c
pgf90 -acc yourcode.f90
P100 node users should add the -ta=tesla,cuda8.0 option to the compile command, for example:
pgcc -acc -ta=tesla,cuda8.0 yourcode.c
Adding the -Minfo=accel flag to the compile command (whether pgf90, pgcc or pgCC) will provide useful feedback regarding compiler errors or success with your OpenACC commands.
pgf90 -acc -Minfo=accel yourcode.f90
Hybrid MPI/GPU Jobs
To run a hybrid MPI/GPU job, use the following commands for compiling your program:
module load cuda
module load mpi/pgi_openmpi
mpicc -acc yourcode.c
When you execute your program, you must first issue the two module load commands above.
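A sketch of a batch script for such a hybrid job; the partition, node count, task count and executable name are assumptions for illustration:

```shell
#!/bin/bash
#SBATCH -p GPU
#SBATCH -N 2
#SBATCH --gres=gpu:p100:2
#SBATCH -t 1:00:00

# the modules used at compile time must also be loaded at run time
module load cuda
module load mpi/pgi_openmpi

mpirun -n 4 ./a.out   # placeholder executable and task count
```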
Profiling and Debugging
For CUDA codes, use the command line profiler nvprof. See the CUDA document for more information.
For OpenACC codes, the environment variables PGI_ACC_TIME, PGI_ACC_NOTIFY and PGI_ACC_DEBUG can provide profiling and debugging information for your job. Specific commands depend on the shell you are using.
Unix shells
A Unix shell is a command-line interpreter that provides a traditional user interface for Unix and Unix-like systems.
The two major shell types are the Bourne shell and the C shell. Each type has its own commands and syntax.
The default shell on Bridges is bash, a Bourne-type shell. Other shells, including some C-type shells, are available for you to use if you prefer.
Performance profiling
Enable runtime GPU performance profiling:
export PGI_ACC_TIME=1 [bash]
setenv PGI_ACC_TIME 1 [csh]
Debugging
Basic debugging. For data transfer information, set PGI_ACC_NOTIFY to 3 instead of 1:
export PGI_ACC_NOTIFY=1 [bash]
setenv PGI_ACC_NOTIFY 1 [csh]
More detailed debugging:
export PGI_ACC_DEBUG=1 [bash]
setenv PGI_ACC_DEBUG 1 [csh]
Hadoop and Spark
If you want to run Hadoop or Spark on Bridges, you should note that when you apply for your account.
/home
The /home file system, which contains your home directory, is available on all Bridges Hadoop nodes.
HDFS
The Hadoop filesystem, HDFS, is available from all Hadoop nodes. There is no explicit quota for the HDFS, but it uses your $SCRATCH disk space. Please delete any files you don't need when your job has ended.
Files must reside in HDFS to be used in Hadoop jobs. Putting files into HDFS requires these steps:
- Transfer the files to the namenode with scp or sftp
- Format them for ingestion into HDFS
- Use the hadoop fs -put command to copy the files into HDFS. This command distributes your data files across the cluster's datanodes.
The hadoop fs
command should be in your command path by default.
Documentation for the hadoop fs
command lists other options. These options can be used to list your files in HDFS, delete HDFS files, copy files out of HDFS and other file operations.
To request the installation of data ingestion tools on the Hadoop cluster send email to bridges@psc.edu.
Accessing the Hadoop/Spark cluster
To start using Hadoop and Spark with Yarn and HDFS on Bridges, connect to the login node and issue the following commands:
interact -N 3 # you will need to wait until resources are allocated to you before continuing
module load hadoop
start-hadoop.sh
Your cluster will be set up and you'll be able to run Hadoop and Spark jobs. The cluster requires a minimum of three nodes (-N 3). Larger jobs may require a reservation. Please contact bridges@psc.edu if you would like to use more than 8 nodes or run for longer than 8 hours.
Please note that when your job ends, your HDFS will be unavailable so be sure to retrieve any data you need before your job finishes.
Web interfaces are currently not available for interactive jobs but can be made available for reservations.
Spark
The Spark data framework is available on Bridges. Spark, built on the HDFS filesystem, extends the Hadoop MapReduce paradigm in several directions. It supports a wider variety of workflows than MapReduce. Most importantly, it allows you to process some or all of your data in memory if you choose. This enables very fast parallel processing of your data.
Python, Java and Scala are available for Spark applications. The pyspark interpreter is especially effective for interactive, exploratory tasks in Spark. To use Spark, you must first load your data into Spark's highly efficient data structure, the Resilient Distributed Dataset (RDD).
Extensive online documentation is available at the Spark website. If you have questions about or encounter problems using Spark, send email to bridges@psc.edu.
Spark example using Yarn
Here is an example command to run a Spark job using yarn. This example calculates pi using 10 iterations.
spark-submit --class org.apache.spark.examples.SparkPi --master yarn \
--deploy-mode cluster $SPARK_HOME/examples/jars/spark-examples_2.11-2.1.0.jar 10
To view the full output:
yarn logs -applicationId <applicationId>
where <applicationId> is the Yarn application ID assigned by the cluster.
A Simple Hadoop Example
This section demonstrates how to run a MapReduce Java program on the Hadoop cluster. This is the standard paradigm for Hadoop jobs. If you want to run jobs using another framework or in other languages besides Java send email to bridges@psc.edu for assistance.
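The MapReduce word-count paradigm can be mimicked in miniature with an ordinary shell pipeline: "map" the input to one word per line, "shuffle" by sorting so equal words are adjacent, then "reduce" by counting runs of equal words. This is only an illustration of the idea, not a Hadoop command:

```shell
# map: one word per line; shuffle: sort; reduce: count duplicates;
# final sort orders the counts from most to least frequent
printf 'to be or not to be\n' | tr ' ' '\n' | sort | uniq -c | sort -rn
```

Hadoop performs the same three phases, but distributed across the cluster's datanodes and at far larger scale.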
Follow these steps to run a job on the Hadoop cluster. All the commands listed below should be in your command path by default. The variable HADOOP_HOME should be set for you also.
- Compile your Java MapReduce program with a command similar to:
hadoop com.sun.tools.javac.Main -d WordCount WordCount.java
where:
- WordCount is the name of the output directory where you want your class files to be put
- WordCount.java is the name of your source file
- Create a jar file out of your class file with a command similar to:
jar -cvf WordCount.jar -C WordCount/ .
where:
- WordCount.jar is the name of your output jar file
- WordCount is the name of the directory which contains your class file
- Make an input directory in the HDFS, if it doesn't already exist:
hdfs dfs -mkdir -p /datasets
- Transfer an input file to the /datasets directory in HDFS
hdfs dfs -put /home/training/hadoop/datasets/compleat.txt /datasets
- Launch your Hadoop job with the hadoop command. Once you have your jar file, you can run the hadoop command to launch your Hadoop job. Your hadoop command will be similar to:
hadoop jar WordCount.jar org.myorg.WordCount \
/datasets/compleat.txt $MYOUTPUT
where:
- WordCount.jar is the name of your jar file
- org.myorg.WordCount is the fully qualified name of your main class; it reflects the package hierarchy inside your jar file. Substitute the appropriate name for your jar file.
- /datasets/compleat.txt is the path to your input file in the HDFS file system. This file must already exist in HDFS.
- $MYOUTPUT is the path to your output file, which will be saved in the HDFS file system. You must set this variable to the output file path before you issue the hadoop command.
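For example, MYOUTPUT might be set like this before launching the job. The path shown is only a hypothetical illustration; use any HDFS path you can write to:

```shell
# Hypothetical HDFS output path; export it before running 'hadoop jar'.
export MYOUTPUT=/user/example/wordcount-output
echo "$MYOUTPUT"
```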
After you issue the hadoop command, your job is controlled by the Hadoop scheduler and runs on the datanodes. The scheduler is currently a strictly FIFO scheduler. If your job turnaround is not meeting your needs, send email to bridges@psc.edu.
When your job finishes, the hadoop command will end and you will be returned to the system prompt.
Other Hadoop Technologies
An entire ecosystem of technologies has grown up around Hadoop, such as HBase and Hive. To request the installation of a different package send email to bridges@psc.edu.
Containers
Containers are stand-alone packages holding the software needed to create a very specific computing environment. If you need a very specialized environment, you can create your own container or use one that is already installed on Bridges. Singularity is the only type of container supported on Bridges.
However, in many cases, Bridges has all the software you will need. Before creating a container for your work, check the extensive list of software that has been installed on Bridges. While logged in to Bridges, you can also get a list of installed packages by typing
module avail
If you need a package that is not available on Bridges, you can request that it be installed by emailing bridges@psc.edu. You can also install software packages in your own file spaces and, in some cases, we can provide assistance if you encounter difficulties.
Containers available on Bridges
We have installed many containers from the NVIDIA GPU Cloud (NGC) on Bridges. These containers are fully optimized, GPU-accelerated environments for AI, machine learning and HPC. They can be used on the Bridges-AI (Volta 16 and DGX-2) nodes and on some RM-GPU nodes (P100 GPU only).
See the PSC documentation on Singularity for more details on Singularity use on Bridges.
Creating a container
Singularity is the only container software supported on Bridges. You can create a Singularity container, copy it to Bridges and then execute it there, where it can use Bridges' compute nodes and filesystems. In your container you can use any software required by your application: a different version of CentOS, a different Unix operating system, or any package at any specific version you need. You can install your Singularity container without any intervention from PSC staff.
See the PSC documentation on Singularity for more details on producing your own container and Singularity use on Bridges.
Singularity images
We have installed many of the NVIDIA GPU Cloud (NGC) containers as Singularity images on Bridges. These containers have been optimized by NVIDIA for the Volta and Pascal architectures and have undergone rigorous quality assurance.
These containers can all be found on Bridges in the directory /pylon5/containers/ngc/package-name, e.g., /pylon5/containers/ngc/caffe. A link to further documentation is included below the table for each package. The naming convention for these containers includes the year of the release as the first two digits and month as the next two, so release 18.10 was created in October 2018.
See also:
- Additional information about the NVIDIA NGC Registry
- Much more detail on the NGC containers
Caffe version 0.17.1, available in /pylon5/containers/ngc/caffe

| Image name | Python version | Other supported software |
|---|---|---|
| 18.10-py2.simg | 2.7 | |
| 18.09-py2.simg | 2.7 | |
| 18.08-py2.simg | 2.7 | |
| 18.07-py2.simg | 2.7 | |
Caffe2 version 0.81, available in /pylon5/containers/ngc/caffe2

| Image name | Python version | Other supported software |
|---|---|---|
| 18.08-py3.simg | 3.5 | |
| 18.08-py2.simg | 2.7 | |
| 18.07-py3.simg | 3.5 | |
| 18.07-py2.simg | 2.7 | |
Microsoft Cognitive Toolkit (formerly CNTK) version 2.5, available in /pylon5/containers/ngc/cntk

| Image name | Python version | Other supported software |
|---|---|---|
| 18.08.py3.simg | 3.6 | |
| 18.07.py3.simg | | |
DIGITS version 6.1.1, available in /pylon5/containers/ngc/digits

| Image name | Python version | Other supported software |
|---|---|---|
| 18.10.simg | 2.7 | |
| 18.09.simg | | |
Inference Server, available in /pylon5/containers/ngc/inferenceserver

| Image name | Python version | Other supported software |
|---|---|---|
| 18.08.1-py3.simg | 3.5 | |
| 18.08.1-py2.simg | 2.7 | |
| 18.08-py3.simg | 3.5 | |
| 18.08-py2.simg | 2.7 | |
MATLAB, available in /pylon5/containers/mdl

| Image name |
|---|
| matlab_r2019a.sif |
MXNet, available in /pylon5/containers/ngc/mxnet

| Image name | MXNet version | Python version | Other supported software |
|---|---|---|---|
| 18.10-py3.simg | 1.3.0 | 3.5 | |
| 18.09-py3.simg | 1.3.0 | 3.5 | |
| 18-08.py2.simg | 1.2.0 | 2.7 | |
| 18.07-py2.simg | 1.2.0 | 2.7 | |
PyTorch version 0.41+, available in /pylon5/containers/ngc/pytorch

| Image name | Python version | Other supported software |
|---|---|---|
| 18.10-py3.simg | 3.6 | |
| 18.09-py3.simg | | |
Tensorflow, available in /pylon5/containers/ngc/tensorflow

| Image name | Tensorflow version | Python version | Other supported software |
|---|---|---|---|
| 19.11-tf2-py3.simg | 1.13.1 | 3.6 | |
| tensorflow-19.05-py3.simg | | 3.5 | |
| tensorflow-19.05-py2.simg | | 2.7 | |
| 18.10-py3.simg | 1.10.0 | 3.5 | |
| 18.10-py2.simg | | 2.7 | |
| 18.09-py3.simg | | 3.5 | |
| 18.09-py2.simg | | 2.7 | |
TensorRT version 5.0.0 RC, available in /pylon5/containers/ngc/tensorrt

| Image name | Python version | Other supported software |
|---|---|---|
| 18.10-py3.simg | 3.6 | — |
| 18.10-py2.simg | 2.7 | — |
| 18.09-py3.simg | 3.6 | — |
| 18.09-p2.simg | 2.7 | — |
TensorRT Inference Server, available in /pylon5/containers/ngc/tensorrtserver

| Image name | TensorRT Inference Server version | Python version | Other supported software |
|---|---|---|---|
| 18.10-py3.simg | 0.7 Beta | 3.5 | |
| 18.09-py3.simg | 0.6 Beta | | |
Theano version 1.02, available in /pylon5/containers/ngc/theano

| Image name | Python version | Other supported software |
|---|---|---|
| 18.08.simg | 2.7 | — |
| 18.07.simg | | |
Torch version 7, available in /pylon5/containers/ngc/torch

| Image name | Python version | Other supported software |
|---|---|---|
| 18.08-py2.simg | 2.7 | — |
| 18.07-py2.simg | | |
Virtual Machines
A Virtual Machine (VM) is a portion of a physical machine that is partitioned off through software so that it acts as an independent physical machine.
You should indicate that you want a VM when you apply for time on Bridges.
When you have an active Bridges grant, use the VM Request form to request a VM. This form collects information about the software and hardware resources you need for your VM and your reason for requesting one. Your request will be evaluated by PSC staff for its suitability, and you will be contacted within one business day.
Why use a VM?
If you need a persistent environment, you should use a virtual machine (VM). Examples of the need for a persistent environment include a web server with a database backend, or simply a persistent database.
If you can use a Singularity container rather than a VM, you should use the container. You can set up your Singularity container yourself, without any intervention by PSC staff. In addition, a VM accrues charges for the entire time it is provisioned, whether or not it is being actively used. Containers are only charged for the time during which they are executing, because they are not persistent computing environments; a Singularity environment exists only while you are executing it. Thus, VMs are much more expensive in terms of SUs than Singularity containers.
A VM provides you with control over your environment while still giving you access to the computing power, memory capacity and file spaces of Bridges.
Common uses of VMs include hosting database and web servers. These servers can be restricted just to you or you can open them up to outside user communities to share your work. You can also connect your database and web servers and other processing components in a complex workflow.
VMs provide several other benefits. Since the computing power behind the VM is a supercomputer, sufficient resources are available to support multiple users. Since each VM acts like an independent machine, user security is heightened. No outside users can violate the security of your independent VM. However, you can allow other users to access your VM if you choose.
A VM can be customized to meet your requirements. PSC will set up the VM and give you access to your database and web server at a level that matches your requirements.
To discuss whether a VM would be appropriate for your research project, send email to bridges@psc.edu.
Downtime
VMs are affected by system downtime, and will not be available during an outage. Scheduled downtimes are announced in advance.
Data backups
It is your responsibility to back up any important data to another location outside of the VM. PSC will make infrequent snapshots of VMs for recovery from system failure, but cannot be responsible for managing your data.
Grant expiration
When your grant expires, your VM will be suspended. You have a 3-month grace period to request via email to bridges@psc.edu that it be reactivated so that you can move data from the VM. Three months after your grant expires, the VM will be removed. Please notify bridges@psc.edu if you need help moving your data during the grace period.
You can request a VM by submitting the VM Request form.
Data Collections
A community dataset space allows Bridges users from different grants to share data in a common space. Bridges hosts both public and private datasets, providing rapid access for individuals, collaborations and communities with appropriate protections.
Community datasets are appropriate when data will be shared amongst Bridges' groups. Any data that should only be accessed by one group should be stored in that group's pylon5 space.
If you have a dataset for use by multiple groups on Bridges, request that it be stored in the community dataset space by completing the Community Dataset Request form. If your data collection has security or compliance requirements, you should indicate so on the form, or you can contact compliance@psc.edu.
Public datasets
Some data collections are available to anyone with a Bridges account. They include:
ImageNet
ImageNet is an image dataset organized according to WordNet hierarchy. See the ImageNet website for complete information.
Available on Bridges at /pylon5/datasets/community/imagenet
Natural Language Toolkit (NLTK) Data
NLTK comes with many corpora, toy grammars, trained models, etc. A complete list of the available data is posted at: https://www.nltk.org/nltk_data/
NLTK is available on Bridges at /pylon2/datasets/community/nltk.
MNIST
Dataset of handwritten digits used to train image processing systems.
Available on Bridges at /pylon5/datasets/community/mnist
Genomics Data
Several genomics datasets are publicly available.
- BLAST: The BLAST databases can be accessed through the environment variable $BLASTDB after loading the BLAST module.
- CAMI: CAMI (Critical Assessment of Metagenome Interpretation) is a community-led initiative designed to help tackle challenges in metagenome assembly and analysis by aiming for an independent, comprehensive and bias-free evaluation of methods. Data from the first CAMI challenge is available at /pylon5/datasets/community/genomics/cami.
- RepBase: RepBase is the most commonly used database of repetitive DNA elements. You must register with RepBase and send proof of registration in order to use the RepBase database.
- UCSC: The University of California Santa Cruz reference genomes are available at /pylon5/datasets/community/genomics/UCSC. The collection includes human, mouse and drosophila genomes.
- Other genomics datasets: Other available datasets are typically used with a particular genomics package. These include:
| Package | Location |
|---|---|
| Barrnap | /pylon5/datasets/community/genomics/barrnap |
| CheckM | /pylon5/datasets/community/genomics/checkm |
| Dammit | /pylon5/datasets/community/genomics/dammit |
| Dammit uniref90 | /pylon5/datasets/community/genomics/dammit_uniref90 |
| Homer | /pylon5/datasets/community/genomics/homer |
| Kraken | /pylon5/datasets/community/genomics/kraken |
| Long Ranger | /pylon5/datasets/community/genomics/longranger |
| MetaPhlAn2 | /pylon5/datasets/community/genomics/metaphlan2 |
| Phylosift | /pylon5/datasets/community/genomics/phylosift |
| Prokka | /pylon5/datasets/community/genomics/prokka |
Other useful datasets
The following datasets may also be useful. They are not currently installed on Bridges, but they can be copied to your pylon5 space; if you think one would be useful to many Bridges users, you can request that it be installed in a public space.
Keras Datasets for Import
Keras datasets are available from https://keras.io/datasets.
- CIFAR10 small image classification
- CIFAR100 small image classification
- IMDB Movie reviews sentiment classification
- Reuters newswire topics classification
- MNIST database of handwritten digits
- Fashion-MNIST database of clothing
- Boston housing price regression dataset (from CMU)
Gateways
Bridges hosts a number of gateways - web-based, domain-specific user interfaces to applications, functionality and resources that allow users to focus on their research rather than programming and submitting jobs. Gateways provide intuitive, easy-to-use interfaces to complex functionality and data-intensive workflows.
Gateways can manage large numbers of jobs and provide collaborative features, security constraints and provenance tracking, so that you can concentrate on your analyses instead of on the mechanics of accomplishing them.
Among the gateways implemented on Bridges are:
Galaxy, an open source, web-based platform for data intensive biomedical research.
Researchers preparing de novo transcriptome assemblies via the popular Galaxy platform for data-intensive analysis have transparent access to Bridges, without the need to obtain their own XSEDE allocation. Bridges is ideal for rapid assembly of massive RNA sequence data.
A high-performance Trinity tool has been installed on the public Galaxy Main instance at usegalaxy.org. All Trinity jobs in workflows run from usegalaxy.org will execute transparently on large memory nodes on Bridges. These tools are free to use for open scientific research.
Run Trinity jobs on Bridges: https://usegalaxy.org.
General information on Galaxy: https://galaxyproject.org.
SEAGrid, the Science and Engineering Applications Grid, provides access for researchers to scientific applications across a wide variety of computing resources. SEAGrid also helps with creating input data, producing visualizations and archiving simulation data.
Read more about SEAGrid: https://seagrid.org/home.
The Causal Web Portal, from the Center for Causal Discovery, offers easy-to-use software for causal discovery from large and complex biomedical datasets, applying Bayesian and constraint-based algorithms. It includes a web application as well as APIs and a command-line version.
Read more about the Causal Web Portal: http://www.ccd.pitt.edu/tools.
Access the Causal Web Portal on Bridges: https://ccd2.vm.bridges.psc.edu/ccd/login.
Acknowledgement in Publications
All publications, copyrighted or not, resulting from an allocation of computing time on Bridges should include an acknowledgement. Please acknowledge both the funding source that supported your access to PSC and the specific PSC resources that you used.
Please also acknowledge support provided by XSEDE's ECSS program and/or PSC staff when appropriate.
Proper acknowledgment is critical for our ability to solicit continued funding to support these projects and next generation hardware.
For suggested text and citations, see:
- XSEDE-supported research on Bridges
- Other (non-XSEDE) supported research on Bridges
- Support provided by ECSS
- Support provided by PSC staff
XSEDE supported research on Bridges
We ask that you use the following text:
This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562. Specifically, it used the Bridges system, which is supported by NSF award number ACI-1445606, at the Pittsburgh Supercomputing Center (PSC).
Please include these citations:
Towns, J., Cockerill, T., Dahan, M., Foster, I., Gaither, K., Grimshaw, A., Hazlewood, V., Lathrop, S., Lifka, D., Peterson, G.D., Roskies, R., Scott, J.R. and Wilkens-Diehr, N. 2014. XSEDE: Accelerating Scientific Discovery. Computing in Science & Engineering. 16(5):62-74. IEEE Computer Society.
Nystrom, N. A., Levine, M. J., Roskies, R. Z., and Scott, J. R. 2015. Bridges: A Uniquely Flexible HPC Resource for New Communities and Data Analytics. In Proceedings of the 2015 Annual Conference on Extreme Science and Engineering Discovery Environment (St. Louis, MO, July 26-30, 2015). XSEDE15. ACM, New York, NY, USA. ACM Digital Library.
Additional Support
Please also acknowledge support provided through XSEDE's Extended Collaborative Support Services (ECSS) and/or by PSC staff.
Other research on Bridges
For research on Bridges supported by programs other than XSEDE, such as PRCI, we ask that you use the following text:
This work used the Bridges system, which is supported by NSF award number ACI-1445606, at the Pittsburgh Supercomputing Center (PSC).
Please include this citation:
Nystrom, N. A., Levine, M. J., Roskies, R. Z., and Scott, J. R. 2015. Bridges: A Uniquely Flexible HPC Resource for New Communities and Data Analytics. In Proceedings of the 2015 Annual Conference on Extreme Science and Engineering Discovery Environment (St. Louis, MO, July 26-30, 2015). XSEDE15. ACM, New York, NY, USA. ACM Digital Library.
Additional support
Please also acknowledge any support provided by PSC staff.
ECSS Support
To acknowledge support provided through XSEDE's Extended Collaborative Support Services (ECSS), please use this text:
We thank [consultant name(s)] for [his/her/their] assistance with [describe tasks such as porting, optimization, visualization, etc.], which was made possible through the XSEDE Extended Collaborative Support Service (ECSS) program.
Please include this citation:
Wilkins-Diehr, N and S Sanielevici, J Alameda, J Cazes, L Crosby, M Pierce, R Roskies. High Performance Computer Applications 6th International Conference, ISUM 2015, Mexico City, Mexico, March 9-13, 2015, Revised Selected Papers Gitler, Isidoro, Klapp, Jaime (Eds.) Springer International Publishing. ISBN 978-3-319-32243-8, 3-13, 2016. 10.1007/978-3-319-32243-8.
PSC Support
If PSC staff contributed substantially to software development, optimization, or other aspects of the research, they should be considered as coauthors.
When PSC staff contributions do not warrant coauthorship, please acknowledge their support with the following text:
We thank [consultant name(s)] for [his/her/their] assistance with [describe tasks such as porting, optimization, visualization, etc.].
Security guidelines and policies
PSC policies regarding privacy, security and the abuse of PSC resources are documented here. Questions about any of these policies should be directed to PSC User Services.
Security Measures
Security is very important to PSC. These policies are intended to ensure that our machines are not misused and that your data is secure.
What You Can Do
You play a significant role in security! To keep your account and PSC resources secure, please:
- Be aware of and comply with PSC policies on security, use and privacy found in this document
- Choose strong passwords and don't share them between accounts or with others. More information can be found in the PSC password policies.
- Utilize your local security team for advice and assistance
- Take the online XSEDE Cybersecurity Tutorial. Go to Online Training and click on "XSEDE Cybersecurity (CI-Tutor)"
- Keep your computer properly patched and protected
- Report any security concerns to the PSC help desk as soon as possible by calling the PSC hotline at 412-268-6350 or emailing remarks@psc.edu
What We Will Never Do
- PSC will never send you unsolicited emails requesting confidential information.
- We will also never ask you for your password via an unsolicited email or phone call.
Remember that the PSC help desk is always a phone call away at 412-268-6350 to confirm any correspondence.
If you have replied to an email appearing to be from PSC and supplied your password or other sensitive information, please contact the help desk immediately.
What You Can Expect
- We will send you email when we need to communicate with you about service outages, new HPC resources, and the like.
- We will send you email when your password is about to expire and ask you to change it by using the web-based PSC password change utility.
Other Security Policies
- PSC password policies
- Users must connect to PSC machines using ssh in order to avoid remote logins with clear text passwords
- We vigilantly monitor our computer systems and network connections for security violations
- We are in close contact with the CERT Coordination Project with regard to possible Internet security violations
Reporting Security Incidents
To report a security incident you should contact our Hotline at 412-268-6350. To report non-emergency security incidents you can send email to remarks@psc.edu.
PSC resource abuse policy
The Pittsburgh Supercomputing Center's computing resources are vital to the scientific community, and we have a responsibility to ensure that these resources are utilized in a responsible manner. The term computing resources in this case refers to all computers owned or operated by PSC and all hardware, data, software, storage systems and communications networks associated with these computers. Improper use of PSC systems is generally referred to as resource abuse.
Users of PSC's systems are subject to applicable Commonwealth of Pennsylvania and federal laws. Abuse of resources will be referred to the PSC User Services manager and/or the appropriate local, state and federal authorities, at PSC's discretion. Furthermore, PSC may terminate or restrict any user's access to its systems, without prior notice, if such action is necessary to maintain computing availability and security for other users of the systems.
Resource abuse includes, but is not limited to:
- using, or attempting to use, the center's computing resources without prior authorization or for unauthorized purposes
- tampering with or obstructing the operation of the computing resources, or attempting to do so
- inspecting, modifying, distributing, or copying privileged data or software without proper authorization, or attempting to do so
- supplying, or attempting to supply, false or misleading information or identification in order to access PSC's computing resources.
If there is any doubt regarding the legitimacy or authorization of any action, contact PSC User Services.
Privacy
Pittsburgh Supercomputing Center is committed to preserving your privacy. This privacy policy explains exactly what information is collected when you visit our site and how it is used.
This policy may be modified as new features are added to the site. Any changes to the policy will be posted on this page.
- Any data automatically collected from our site visitors - domain name, browser types, etc. - are used only in aggregate to help us better meet site visitors' needs.
- There is no identification of individuals from our aggregate data. Therefore, unless you choose otherwise, you are totally anonymous when visiting our site.
- We do not share data with anyone for commercial purposes.
- If you choose to submit personally identifiable information to us electronically via the PSC feedback page, email, etc., we will treat it with the same respect for privacy afforded to mailed submissions. Submission of such information is always optional.
PSC respects individual privacy and makes every effort to support the website privacy policy outlined above. Please be aware, however, that we publish URLs of other sites on our web site that may not adhere to the same policy.
Bridges FAQ
Applying for a Bridges account

Reporting a Problem
To report a problem on Bridges, please email bridges@psc.edu. Please report only one problem per email; it will help us to track and solve any issues more quickly and efficiently.
Be sure to include:
- the JobID
- the error message you received
- the date and time the job ran
- any other pertinent information
- a screen shot of the error or the output file showing the error, if possible