XStream User Guide
Last update: September 10, 2018
As of October 1 2018, XStream is decommissioned from XSEDE and is no longer available to users.
XStream is a GPU cluster hosted at the Stanford Research Computing Center and funded by the National Science Foundation's (NSF) Major Research Instrumentation (MRI) Program. Twenty percent of its computational resources are reserved to XSEDE awards.
XStream is a compute cluster specifically designed by Cray for GPU computing, or more precisely, heterogeneous parallel computing with CPUs and GPUs. It differs from traditional CPU-only HPC systems in that it has almost a Petaflop (PF) of GPU compute power. Each of the 65 nodes has 8 NVIDIA K80 cards (16 NVIDIA Kepler GPUs), interconnected through PCI-Express PLX-based switches. Each GPU has 12GB of GDDR5 memory. Compute nodes also feature 2 Intel Ivy-Bridge 10-core CPUs, 256 GB of DRAM and 450 GB of local SSD storage. The system features 1.4 petabytes of 22 GB/s Lustre storage (Cray Sonexion 1600).
The system also includes two login nodes. Each of them has a 10 GigE connection to Stanford's campus backbone, which has 100 Gbps connectivity to various national research and education networks.
XStream was ranked #87 on the June 2015 Top500 list of fastest supercomputers (using LINPACK benchmark). Even with the extreme GPU computing, near Petaflop density, this system was #6 in the June 2015 Green 500 list and moved to #5 in the November 2015 list.
All XStream nodes run RHEL 6.9 and are managed with batch services through SLURM. The global $WORK storage area is supported by a Lustre parallel distributed file system with 6 I/O servers. Inter-node communication (MPI/Lustre) is through an FDR Mellanox InfiniBand network.
2 Login Nodes, each with:
- Two 2.6GHz 6-Core Ivy Bridge-EP E5-2630 v2 Xeon 64-bit Processors
- Two NVIDIA Tesla K80 GPU cards (4 Kepler GPUs)
- 64GB DDR3 1600MT/s DRAM
- 56Gbps Infiniband FDR network interface
- 10 Gigabit Ethernet network interface
- 1 Gigabit Ethernet network interface
- Red Hat Enterprise Linux Server 6.7
65 Compute Nodes, each with
- Cray CS-Storm 2626X8N Compute Node
- Two 2.8GHz 10-Core Ivy Bridge-EP E5-2680 v2 Xeon 64-bit Processors
- Eight NVIDIA Tesla K80 24GB GPU cards (16 x GK210 12GB GPU)
- 256GB DDR3 1600MT/s DRAM
- Three Intel SSD (MLC) in RAID-0 (striped volume) totalling 480GB
- 56Gbps Infiniband FDR network interface
- 1 Gigabit Ethernet network interface
- Red Hat Enterprise Linux Server 6.9
The hardware architecture of the compute nodes is that each CPU socket (PCI root) is connected to PLX switches that connect four K80 cards (eight GPUs) together.
2626X8N Compute Node Diagram
You basically have 2 domains, one for each CPU. You can use the "lstopo" command on a compute node (e.g. xs-0001) for full details of the PCI bus.
If you plan on doing GPU peer-to-peer communication, the "nvidia-smi" command on a compute node will show you the GPUDirect topology matrix for the system:
xs-0001$ nvidia-smi topo -m
PIX gives you the lowest latency, SOC the highest.
Please note that you need to have a job actually running on a compute node with all 16 GPUs allocated in order to SSH to it and see the full GPUDirect topology matrix for the system.
GPUs on the compute nodes have the following static settings:
- Auto Boost ON
- Persistent mode
- Compute Mode: Exclusive Process
To switch your job's GPUs to the Default Compute Mode (shared), use the following Slurm constraint:
#SBATCH -C gpu_shared
While this is usually not recommended, in order to prevent wrong process/GPU assignment, it is required for some applications, like LAMMPS or AMBER in P2P mode.
Please note that GPUs on the login nodes are configured in Default Compute Mode, meaning that multiple contexts are allowed per device.
As of July 2017, the version of the Nvidia driver is 375.66. Stanford continually updates the driver as it is made available by Nvidia.
Login and compute nodes are connected to a private Lustre storage system, a Cray Sonexion appliance, with fast I/O performance. This system is capable of providing more than 22 GB/s of sustained Lustre bandwidth over the Infiniband fabric and about 1.4 PB of usable space.
- 492 x 4 TB SAS hard drives
- 48 x Lustre Object Storage Target (OST) - each 32 TB
- 6 x Embedded Lustre Object Storage Server (OSS) Infiniband FDR 56Gbps
A low-performance 3.3TB NFS-mounted /home disk storage is also available.
Each user on XStream has a home directory referenced by $HOME with a quota limit of 5GB (not purged). It is a small, low-performance NFS storage space used to keep scripts, binaries, source files, small log files, etc. The $HOME filesystem is accessible from any node in the system.
The $HOME directory is not intended to be used for computation. The Lustre parallel file system $WORK is much larger and faster, and thus much better suited for computation.
Each project has a shared home directory referenced by $GROUP_HOME. It is an NFS storage space used to store small files shared by all members of your primary POSIX group (usually your primary project). In $GROUP_HOME, only the owner of the files can delete them.
Important note on home backups: The system doesn't come with any backup system, however user and group home directories are backed up every night by Stanford Research Computing. Contact the XSEDE helpdesk in order to recover any lost files. We also recommend that you periodically back up your files outside of XStream.
Work is a Lustre file system mounted on /cstor on any node in the system. This parallel file system has multiple purposes:
- perform fast large I/Os
- store large computational data files
- allow multi-node jobs to write coherent files
Each user has a work directory referenced by $WORK with a quota limit of 1TB, which is not purged. Each project has a shared work directory referenced by $GROUP_WORK on the same file system with a group quota limit of 50TB. This space is shared by all members of the project. In $GROUP_WORK, only the owner of the files can delete them.
User and group quota values are not cumulative, i.e. the first limit reached takes precedence.
Important note on work backup: The parallel file system work is not replicated nor backed up.
A local SSD-based scratch space is available on each compute node (NOT on login nodes). It is made of 3 x Intel SSD (MLC) aggregated using Linux dm-raid for a total of 480 GB per node (447 GB usable) and intended for high IOPS local workload.
To access this local scratch space, please use the $TMPDIR environment variable. This space will be purged when the compute node reboots or when this space becomes full.
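Because local scratch is purged, results written there need to be copied back to $WORK before a job ends. The following Bash sketch illustrates one possible staging pattern. The function name run_on_scratch, the file names and the "results" directory are hypothetical, the computation is replaced by a trivial stand-in, and a fallback to /tmp is used when $TMPDIR is not set so that the sketch runs anywhere.

```shell
#!/bin/bash
# run_on_scratch is a hypothetical helper: it stages work through node-local
# scratch and copies the result to the directory given as first argument.
run_on_scratch() {
    local results_dir=$1 scratch
    # Inside a job, TMPDIR points to the local SSDs; fall back to /tmp
    # elsewhere so that the sketch stays runnable.
    scratch=$(mktemp -d "${TMPDIR:-/tmp}/stage.XXXXXX") || return 1

    # Stand-in for copying real input data from $WORK to local scratch.
    printf 'sample input\n' > "$scratch/input.dat"

    # Stand-in for the real computation, reading and writing on local scratch.
    tr '[:lower:]' '[:upper:]' < "$scratch/input.dat" > "$scratch/output.dat"

    # Copy the results back before the job ends, then free the scratch space.
    mkdir -p "$results_dir" && cp "$scratch/output.dat" "$results_dir/"
    rm -rf "$scratch"
    echo "$results_dir/output.dat"
}

# Example call; ${WORK:-$PWD} falls back to the current directory off-cluster.
run_on_scratch "${WORK:-$PWD}/results"
```

In a real job, the two stand-in steps would be replaced by a copy of the input data from $WORK and by the actual application run.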
The recommended way to access XStream is through XSEDE Single Sign-On Hub:
ssohub$ gsissh xstream
Assuming you have the proper XSEDE CA certificates installed on your local machine, the following commands authenticate using the XSEDE myproxy server, then connect to the gsissh port 2222 on XStream:
localhost$ myproxy-logon -l userid -s myproxy.xsede.org
localhost$ gsissh -p 2222 xstream.stanford.xsede.org
When you log in to xstream.stanford.xsede.org, you will be assigned one of the two login nodes: xstream-ln[01-02].stanford.edu. These nodes are identical in both architecture and software environment. Users should normally log in through xstream.stanford.xsede.org, but may specify one node directly if they see poor performance.
Please, do NOT use the login nodes for computationally intensive processes. These nodes are meant for compilation, file editing, simple data analysis, and other tasks that use minimal compute resources. All computationally demanding jobs should be submitted and run through the batch queuing system. You may however use the few GPUs available on the login nodes to perform simple and short tests. Please note that GPUs on the login nodes are configured in Default Compute Mode, meaning that multiple contexts are allowed per device. They are not suitable for performance evaluation.
XStream's default and supported shell is bash. Users may still request a shell change (e.g. tcsh) by contacting the XSEDE Help Desk.
Modules provide a convenient way to dynamically change the users' environment through modulefiles. This includes easily adding or removing directories to the "$PATH" environment variable.
Lmod is used as a replacement to the original module command. For more information, please take a look at the Lmod user guide.
On XStream, modules follow a hierarchical module naming scheme, so only packages that can be directly loaded are displayed by "module avail". You can list all available modules using the "module spider" command:
login1$ module spider
Using the full name will give you details on how to load the module by listing any required dependencies:
login1$ module spider FFTW/3.3.4
Also, you can use "module list" to see currently loaded packages.
XStream supports Globus services such as Globus Connect and the Globus "globus-url-copy" utility to transfer files to XStream or between XSEDE sites. Please note that common command line utilities such as rsync are also available to transfer files from XStream to a remote host.
Globus Connect (formerly Globus Online) is recommended for transferring data between XSEDE sites. Globus Connect provides fast, secure transport via an easy-to-use web interface using pre-defined and user-created "endpoints". XSEDE users automatically have access to Globus Connect via their XUP username/password. Other users may sign up for a free Globus Connect Personal account.
XSEDE users may also use Globus's globus-url-copy command-line utility to transfer data between XSEDE sites. globus-url-copy, like Globus Connect described above, is an implementation of the GridFTP protocol, providing high speed transport between GridFTP servers at XSEDE sites. The GridFTP servers mount the specific file systems of the target machine, thereby providing access to your files or directories.
This command requires the use of an XSEDE certificate to create a proxy for passwordless transfers. To obtain a proxy certificate, use the "myproxy-logon" command with your XSEDE User Portal (XUP) username and password. The proxy is valid for 12 hours for all logins on the local machine. On XStream, the myproxy-logon command is available on the login nodes.
xstream-ln01$ myproxy-logon -T -l XUP_username
globus-url-copy invocation must include the name of the server and a full path to the file. The general syntax looks like:
globus-url-copy [options] source_url destination_url
where each XSEDE URL will generally be formatted:
The following command copies "directory1" from Stanford's XStream to TACC's Stampede system, renaming it to "directory2". Note that when transferring directories, the directory path must end with a slash ('/'):
login1$ globus-url-copy -r -vb \
    gsiftp://xstream.stanford.xsede.org:2811/`pwd`/directory1/ \
    gsiftp://gridftp.stampede.tacc.xsede.org:2811/home/0000/johndoe/directory2/
Command-line transfer utilities supporting standard SSH and grid authentication are offered by the Globus GSI-OpenSSH implementation of OpenSSH. The gsiftp commands are analogous to the OpenSSH sftp commands. Grid authentication is provided to XSEDE users by first executing the "myproxy-logon" command (see above).
You must explicitly connect to port 2222 on XStream. The following command copies "file1" on your local machine to XStream, renaming it to "file2":
localhost$ gsiscp -oTcpRcvBufPoll=yes -oNoneEnabled=yes \
    -oNoneSwitch=yes -P2222 file1 xstream.stanford.xsede.org:file2
Please consult Globus' GSI-OpenSSH User's Guide for further info.
This section contains software or libraries available on XStream as of July 2017.
The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks.
login1$ module load CUDA/7.5.18 cuDNN/5.1-CUDA-7.5.18
login1$ module load CUDA/8.0.44 cuDNN/5.1-CUDA-8.0.44
GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles.
login1$ module load intel/2015.5.223 CUDA/7.5.18 GROMACS/5.1-hybrid
LAMMPS is a classical molecular dynamics simulation code designed to run efficiently on parallel computers. Many of its models have versions that provide accelerated performance on GPUs.
login1$ module load foss/2015.05 LAMMPS/17Nov2016-CUDA-8.0.44-K80
OpenMM is a high performance toolkit for molecular simulation. On XStream, it is compiled against the foss toolchain (GCC 4.9.2) and CUDA 7.0.28.
login1$ module load foss/2015.05 OpenMM/6.3.1
PostgreSQL 9.5.2 with PG-Strom
PostgreSQL is an open source relational database management system (DBMS) developed by a worldwide team of volunteers.
PG-Strom is an extension of PostgreSQL designed to off-load several CPU intensive workloads to GPU devices, to utilize its massive parallel execution capability.
login1$ module load foss/2015.05 CUDA/7.5.18 PostgreSQL/9.5.2-Python-2.7.9
login1$ pg_config --version
PostgreSQL 9.5.2
login1$ initdb -D $WORK/postgres/data
Edit "$WORK/postgres/data/postgresql.conf" and add the following line to load the PG-Strom extension:
shared_preload_libraries = '$libdir/pg_strom'
Start the PostgreSQL server:
login1$ pg_ctl -D $WORK/postgres/data -l logfile start
Connect using your login name and the password postgres and create the pg_strom extension:
login1$ psql -U $USER postgres
psql (9.5.2)
Type 'help' for help.
postgres=# CREATE EXTENSION pg_strom;
CREATE EXTENSION
Stop the PostgreSQL server:
login1$ pg_ctl -D $WORK/postgres/data stop
Please always use PostgreSQL within a job, not on the login nodes.
R is a free software environment for statistical computing and graphics.
login1$ ml foss/2015.05 R/3.2.4-libX11-1.6.3
RStudio IDE is a powerful and productive user interface for R.
login1$ module load foss/2015.05 git/2.4.1 RStudio/0.99.893
TensorFlow is an Open Source Software Library for Machine Intelligence originally developed by Google with CUDA support. To load TensorFlow 1.1 with Python 2.7, use:
login1$ ml tensorflow/1.1.0
To load TensorFlow 1.1 with Python 3.6, use:
login1$ ml tensorflow/1.1.0-cp36
Note: TensorFlow is a special module that will load the foss toolchain automatically.
Newer versions of TensorFlow should be run in Singularity containers.
Usage example without MPI support:
login1$ module load foss/2015.05 Theano/0.9.0-Python-2.7.9-noMPI
Usage example with MPI support:
login1$ module load foss/2015.05 Theano/0.9.0-Python-2.7.9
Torch is a scientific computing framework with wide support for machine learning algorithms that puts GPUs first. It has support for cuDNN 4.0 and 5.0 and is compiled against the foss toolchain (GCC based).
login1$ module load torch/20160414-cbb5161
Note: Torch is a special module that will load the foss toolchain automatically.
VMD is a molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting.
VMD on XStream is built against CUDA 7.0.28 and Nvidia OptiX 3.8.0, enabling the TachyonL-OptiX GPU-accelerated ray tracing renderer available in VMD 1.9.2. At least the following features should also be available: ACTC library support, collective variables, Python support, Pthreads, NetCDF, ImageMagick, ffmpeg and NetPBM.
Prerequisite: use ssh X11 forwarding by adding "-X" to your ssh command when connecting to XStream.
For non-computationally expensive tasks with VMD, you may launch VMD from a login node:
login1$ module load foss/2015.05 VMD/1.9.2-Python-2.7.9
login1$ vmd
To perform computationally expensive tasks with VMD, please launch VMD using srun with the X11 option as shown here (example with 1 task, 4 CPUs and 4 GPUs):
login1$ module load foss/2015.05 VMD/1.9.2-Python-2.7.9
login1$ srun --x11=first -n1 -c4 --gres gpu:4 vmd
Please refer to the Modules section to learn how to search for software on XStream.
Compiler toolchains are basically a set of compilers together with libraries that provide additional support that is commonly required to build software. In the HPC world, this usually consists of a library for MPI (inter-process communication over a network), BLAS/LAPACK (linear algebra routines) and FFT (Fast Fourier Transforms).
Compiler toolchains on XStream:
- foss/2015.05: the FOSS toolchain is a GNU Compiler Collection (GCC) based compiler toolchain, including OpenMPI for MPI support, OpenBLAS (BLAS and LAPACK support), FFTW3 and ScaLAPACK. This version is based on GCC 4.9.2.
- intel/2015.5.223: Intel Cluster Toolkit Compiler Edition provides Intel C/C++ and Fortran compilers, Intel MPI and Intel MKL. Based on Intel Parallel Studio XE 2015 update 5.
To load the FOSS toolchain:
login1$ module load foss/2015.05
To load the Intel compiler toolchain:
login1$ module load intel/2015.5.223
Note: loading a toolchain will offer you additional software to load through module. Check module avail once your preferred compiler toolchain is loaded.
CUDA is essential software for XStream. It is not part of the above compiler toolchains and must be loaded separately. The following versions of CUDA are available on XStream: 6.5.14, 7.0.28, 7.5.18 (default), 8.0.44 and 8.0.61.
For example, to load CUDA 8.0.61, please use:
login1$ module load CUDA/8.0.61
We recommend using "-arch=sm_37" to select the architecture specification for the Nvidia Tesla K80 GPU.
Computing services for XSEDE users are allocated and charged in Service Units (SUs). On XStream, the rule is simple: 1 SU = 1 GPU hour (GK210 architecture).
XStream SUs charged (GPU hours) = # GPUs * wallclock time
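As a quick illustration of the charging rule above, the following Bash sketch estimates the SUs consumed by a job. The function name xstream_sus is hypothetical and is not an XStream or SLURM command; it only multiplies the number of GPUs by the wallclock time in hours.

```shell
#!/bin/bash
# Hypothetical helper (not an XStream or SLURM command): estimate the SUs
# charged for a job as number of GPUs * wallclock time in hours.
xstream_sus() {
    local gpus=$1 hours=$2
    # awk is used because Bash arithmetic is integer-only.
    awk -v g="$gpus" -v h="$hours" 'BEGIN { printf "%.2f\n", g * h }'
}

# Example: 4 GPUs for 12 hours, then a full node (16 GPUs) for 90 minutes.
xstream_sus 4 12
xstream_sus 16 1.5
```

With this rule, a short job on a full node can cost as much as a much longer job on a few GPUs, so it is worth requesting only the GPUs your code actually uses.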
The following job submission rules apply:
- A job must request at least one GPU, as CPU-only jobs are not allowed on XStream.
- A maximum of 1 CPU per GPU is allowed (or 12,800MB node memory/GPU), with the following exceptions:
  - Half node exception: If the number of requested GPUs per node is 8 and if the "--gres-flags=enforce-binding" option is specified, up to 10 CPUs are allowed (or 128,000MB of memory).
  - Exclusive node exception: If the number of requested GPUs is 16 per node and if "--gres-flags=enforce-binding" is specified, up to 20 CPUs are allowed (or 256,000MB of memory).
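The CPU limits in the rules above can be summarized in a small Bash sketch. The function max_cpus_per_node is a hypothetical helper written for this guide, not a SLURM command; it only encodes the three cases listed (default, half node and exclusive node) and does not query the scheduler.

```shell
#!/bin/bash
# Hypothetical helper: maximum number of CPUs per node allowed for a request,
# following the rules listed above.
#   $1: number of GPUs requested per node
#   $2: "yes" if --gres-flags=enforce-binding is specified (default: no)
max_cpus_per_node() {
    local gpus=$1 binding=${2:-no}
    if [ "$binding" = "yes" ] && [ "$gpus" -eq 16 ]; then
        echo 20          # exclusive node exception
    elif [ "$binding" = "yes" ] && [ "$gpus" -eq 8 ]; then
        echo 10          # half node exception
    else
        echo "$gpus"     # default rule: 1 CPU per GPU
    fi
}

max_cpus_per_node 4        # default rule
max_cpus_per_node 8 yes    # half node
max_cpus_per_node 16 yes   # exclusive node
```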
XStream is a GPU cluster. Before using it, please ensure that your codes make heavy use of GPUs rather than CPUs.
Service Units are allocated to XSEDE projects. Each project has a corresponding "account" in SLURM of the form p-grant_number_lowercase. For example, the SLURM account for the XSEDE grant CIE160024 will be p-cie160024. The same naming convention is used for POSIX groups. Users in several projects can select the SLURM account to be charged by using the following job parameter (example):
#SBATCH -A p-cie160024
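The naming convention above can be expressed as a short Bash sketch. The function slurm_account is a hypothetical helper, not a provided command; it lowercases the grant number and adds the "p-" prefix.

```shell
#!/bin/bash
# Hypothetical helper: derive the SLURM account (and POSIX group) name from an
# XSEDE grant number by lowercasing it and prefixing it with "p-".
slurm_account() {
    printf 'p-%s\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}

# Example with the grant number used above.
slurm_account CIE160024
```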
A single default partition, 'normal', is configured and represents all compute nodes. XStream uses SLURM QoS (Quality of Service) to enforce resource usage limits. The table below shows the current job limits per SLURM QoS:
|SLURM QoS||Max CPUs||Max GPUs||Max Jobs||Max Nodes||Job time limits|
| ||320/user, 400/group||256/user, 320/group||512/user||16/user, 20/group||Default: 2 hours, Max: 2 days|
| ||20/user, 80/group, 200 max total||16/user, 16/group, 160 max total||4/user||4/user, 64 max total||Default: 2 hours, Max: 7 days|
XStream runs the Simple Linux Utility for Resource Management (SLURM) batch environment and does not currently provide any wrapper commands. Please refer to the official SLURM Documentation for more details.
SLURM supports a variety of job submission techniques. By accurately requesting the resources you need, you'll be able to get your work done.
A job consists of two parts: resource requests and job steps. Resource requests consist of a number of CPUs and GPUs, the expected duration of the computation, the amount of memory, etc. Job steps describe the tasks that must be done and the software that must be run.
The typical way of creating a job is to write a submission script. A submission script is a shell script, e.g. a Bash script, whose first comments, if they are prefixed with "SBATCH", are understood by SLURM as parameters describing resource requests and other submission options. You can get the complete list of parameters from the sbatch manpage ("man sbatch").
SLURM will ignore all lines after the first blank line, even the ones containing SBATCH. Always put your SBATCH parameters at the top of your batch script.
The script itself is a job step. Other job steps are created with the "srun" command. For instance, the following script, hypothetically named "submit.sh":
#!/bin/bash
#
#SBATCH --job-name=test
#SBATCH --output=res.txt
#
#SBATCH --time=10:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=500
#SBATCH --gres gpu:1
#SBATCH --gres-flags=enforce-binding

srun hostname
srun sleep 60
This script requests one task with one CPU and one GPU for 10 minutes, along with 500 MB of RAM, in the default partition. The "--gres-flags=enforce-binding" option will ensure the allocated GPU is locally bound to the allocated CPU, to avoid any slow QPI traffic. When started, the job would run a first job step srun hostname, which will launch the command hostname on the node on which the requested CPU was allocated. Then, a second job step will start the "sleep" command.
Once the submission script is written properly, you need to submit it to SLURM through the "sbatch" command, which, upon success, responds with the jobid attributed to the job.
login1$ sbatch submit.sh
Submitted batch job 4011
The job then enters the queue in the PENDING state. Once resources become available and the job has highest priority, an allocation is created for it and it goes to the RUNNING state. If the job completes correctly, it goes to the COMPLETED state; otherwise, it is set to the FAILED state.
Upon completion, the output file contains the result of the commands run in the script file. In the above example, you can see it with "cat res.txt".
Note that you can create an interactive job with the "salloc" command or by issuing an "srun" command directly.
When requesting GPUs with the option --gres gpu:N of sbatch (or srun and salloc), SLURM will set the $CUDA_VISIBLE_DEVICES environment variable to store the GPU ids that have been allocated to the job. So for instance, with --gres gpu:2, depending on the current state of the node GPUs, $CUDA_VISIBLE_DEVICES could be set to '0,1', meaning that you will be able to use GPU 0 and GPU 1. Most applications automatically detect the existence of $CUDA_VISIBLE_DEVICES and run on the allocated GPUs. Some do not, and instead expect GPU ids to be passed explicitly, in which case you need to set them manually.
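For applications that need the GPU ids or the GPU count passed explicitly, a job script can read them from $CUDA_VISIBLE_DEVICES. The following Bash sketch shows one way to do so; count_gpus is a hypothetical helper name, and the sketch assumes the variable holds a comma-separated list of ids as in the example above.

```shell
#!/bin/bash
# Hypothetical helper: count the ids in a comma-separated
# CUDA_VISIBLE_DEVICES value such as "0,1".
count_gpus() {
    local -a gpu_ids
    IFS=, read -r -a gpu_ids <<< "$1"
    echo "${#gpu_ids[@]}"
}

# Outside of a job the variable is normally unset, so fall back to an empty
# value here to keep the sketch runnable anywhere.
echo "Allocated GPU ids: ${CUDA_VISIBLE_DEVICES:-none}"
echo "Number of allocated GPUs: $(count_gpus "${CUDA_VISIBLE_DEVICES:-}")"
```

The resulting count or id list can then be passed to the application through whatever option it provides for GPU selection.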
To cancel a job, use the scancel jobid command with the jobid of the job you want canceled. If you want to cancel all your jobs, type scancel -u $USER. You can also cancel only your pending jobs, for instance with scancel -t PD.
The "squeue" command shows the list of jobs which are currently running (they are in the RUNNING state, noted as "R") or waiting for resources (noted as "PD"):
login1$ squeue
JOBID PARTITION NAME USER ST       TIME NODES NODELIST(REASON)
 2252    normal job2 mike PD       0:00     1 (Dependency)
 2251    normal job1 mike  R 1-16:18:47     1 xs-0022
The above output shows that one job is running, whose name is job1 and whose jobid is 2251. The jobid is a unique identifier that is used by many SLURM commands when actions must be taken about one particular job. For instance, to cancel job job1, you would use scancel 2251. Time is the time the job has been running until now. Nodes is the number of nodes which are allocated to the job, while the Nodelist column lists the nodes which have been allocated for running jobs. For pending jobs, that column gives the reason why the job is pending.
As with the "sinfo" command (below), you can choose what you want squeue to output with the "--format" option.
login1$ scontrol show job
To get full details of a pending or running job:
login1$ scontrol show job jobid
You can get near-realtime information about your program (memory consumption, etc.) with the sstat command.
You can get the state of your finished jobs with the "sacct" command:
login1$ sacct
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
4011               test     normal       srcc          1  COMPLETED      0:0
4011.batch        batch                  srcc          1  COMPLETED      0:0
4011.0         hostname                  srcc          1  COMPLETED      0:0
4011.1            sleep                  srcc          1  COMPLETED      0:0
Use the "sacct" command with its many options to interface to the SLURM accounting database. Here is an example of getting memory information of your recent past jobs:
login1$ sacct --format JobID,jobname,NTasks,nodelist,MaxRSS,MaxVMSize,AveRSS,AveVMSize
       JobID    JobName   NTasks  NodeList     MaxRSS  MaxVMSize     AveRSS  AveVMSize
------------ ---------- -------- --------- ---------- ---------- ---------- ----------
4011               test            xs-0024        16?        16?
4011.batch        batch        1   xs-0024      1496K    150360K      1496K    106072K
4011.0         hostname        1   xs-0024          0    292768K          0    292768K
4011.1            sleep        1   xs-0024       624K    292764K       624K    100912K
SLURM offers a few commands with many options you can use to interact with the system. For instance, the sinfo command gives an overview of the resources offered by the cluster, while the "squeue" command shows to which jobs those resources are currently allocated.
sinfo lists the partitions that are available. A partition is a set of compute nodes grouped logically. Typical examples include partitions dedicated to batch processing, debugging, post processing, or visualization.
login1$ sinfo
PARTITION AVAIL  TIMELIMIT NODES STATE NODELIST
normal*      up 2-00:00:00     2 drain xs-[0054,0057]
normal*      up 2-00:00:00     4   mix xs-[0007,0051-53]
normal*      up 2-00:00:00    48 alloc xs-[0001-0006,0008-0009,0011-0050]
normal*      up 2-00:00:00    11  idle xs-[0010,0055-0056,0058-0065]
In the above example, we see one partition normal. This is the default partition as it is marked with an asterisk. In this example, 48 nodes of the normal partition are being used, 4 are in the mix state (partially allocated), 11 are idle (available) and 2 are drained which means some maintenance operation is taking place.
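The per-state summary described above can also be computed from the sinfo output itself. The following Bash sketch sums the NODES column for each STATE; nodes_by_state is a hypothetical helper name, and it assumes the default sinfo column layout shown above. The sample output from this guide is fed to it so that the sketch runs without SLURM. On XStream you would instead run "sinfo | nodes_by_state".

```shell
#!/bin/bash
# Hypothetical helper: sum the NODES column (field 4) per STATE (field 5) of
# default "sinfo" output read from standard input, skipping the header line.
nodes_by_state() {
    awk 'NR > 1 { count[$5] += $4 }
         END { for (s in count) print s, count[s] }' | sort
}

# Sample output copied from the example above.
nodes_by_state <<'EOF'
PARTITION AVAIL  TIMELIMIT NODES STATE NODELIST
normal*      up 2-00:00:00     2 drain xs-[0054,0057]
normal*      up 2-00:00:00     4   mix xs-[0007,0051-53]
normal*      up 2-00:00:00    48 alloc xs-[0001-0006,0008-0009,0011-0050]
normal*      up 2-00:00:00    11  idle xs-[0010,0055-0056,0058-0065]
EOF
```

Note that if several partitions contain the same nodes, this simple sum counts those nodes once per partition.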
The "sinfo" command can also output the information in a node-oriented fashion, with the "-N" argument. Along with the "-l" option, it will display more information about the nodes: number of CPUs, memory, temporary disk (also called local scratch space), features of the nodes (such as processor type for instance) and the reason, if any, for which a node is down.
Node characteristics and Generic Resources (GRES): SLURM associates with each node a set of Features and a set of Generic Resources. Features are immutable characteristics of the node (e.g. CPU model, CPU frequency) while generic resources are consumable resources, meaning that as users reserve them, they become unavailable for the others (e.g. GPUs).
To list all node characteristics including GRES, you can use the following command:
login1$ scontrol show nodes
However, the output of this command is quite verbose. So you can also use sinfo to list GRES of each node using specific output parameters, for example:
login1$ sinfo -o "%10P %8c %8m %11G %5D %N"
PARTITION  CPUS     MEMORY   GRES        NODES NODELIST
test       20       258374   gpu:k80:16  3     xs-[0007,0051,0058]
normal*    20       258374   gpu:k80:16  62    xs-[0001-0006,0008-0050,0052-0057,0059-0065]
On XStream, all compute nodes are identical, so no Features are set, only GRES are interesting for jobs allocation as GPUs are handled there. GRES appear under the form resource:type:count. On XStream, resource is always gpu and type is k80, and count is the number of logical K80 GPUs per node (16).
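To make the resource:type:count form concrete, the following Bash sketch splits a GRES string into its three fields. The function parse_gres is a hypothetical helper, not a SLURM command, and it assumes the GRES string has exactly the three-field form described above.

```shell
#!/bin/bash
# Hypothetical helper: split a GRES string of the form resource:type:count
# into its three fields.
parse_gres() {
    local resource type count
    IFS=: read -r resource type count <<< "$1"
    echo "resource=$resource type=$type count=$count"
}

# Example with the GRES value reported for XStream compute nodes.
parse_gres "gpu:k80:16"
```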
Connecting to the compute nodes using ssh is allowed when at least one job of yours is running. You have access to some system tools there to debug your program, like "htop" or "strace". Debuggers from the compiler toolchains are also available. Don't forget to load the proper compiler toolchain first.
XStream is for authorized users only and all users are expected to comply with all Stanford computing, network and research policies. For more info, see http://acomp.stanford.edu/about/policy and http://doresearch.stanford.edu/policies/research-policy-handbook.
Also please note that XStream is not HIPAA compliant and should not be used to process PHI. See https://privacy.stanford.edu/faqs/hipaa-faqs for more information.