Georgia Tech Keeneland User Guide
NOTICE: Keeneland is decommissioned from XSEDE as of December 31, 2014. Keeneland's login nodes will remain online until February 1, 2015.
Keeneland is a hybrid CPU/GPGPU system for use with codes that can take advantage of GPU accelerators. Keeneland is a Georgia Tech machine administered by The National Institute for Computational Sciences (NICS).
The Keeneland Full Scale (KFS) system was approved by the NSF and deployed in late October 2012 as a production XSEDE resource.
The KFS system consists of 264 HP SL250G8 compute nodes, each with two 8-core Intel Sandy Bridge (Xeon E5) processors, three NVIDIA M2090 GPU accelerators, and a Mellanox FDR InfiniBand interconnect, for a total of 264 nodes, 528 CPUs and 792 GPUs. Each node has 32 GB memory.
Compute jobs are charged according to the following equivalencies:
1 node-hr = 16 (KFS) CPU-hrs = 3 GPU-hrs = 3 SUs
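As a rough illustration of the equivalency above, the following sketch estimates the charge for a job. The helper name `su_charge` is hypothetical (it is not a Keeneland or NICS tool), and it assumes whole nodes and whole hours.

```shell
# Hypothetical helper: estimate the SUs charged for a KFS job.
# Uses the equivalency above: 1 node-hr = 3 SUs.
# Arguments: number of nodes, wall-clock hours (whole numbers).
su_charge() {
    echo $(( $1 * $2 * 3 ))
}

su_charge 4 2    # 4 nodes for 2 hours: prints 24
```

Since 1 node-hr also equals 16 CPU-hrs, the same job corresponds to 128 CPU-hrs.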
By default each user has an NFS home directory with a 2 GB quota. The path to this directory is
/nics/[a-e]/home/$USER. The environment variable,
$HOME, is set to each user's home directory. This directory is generally available by logging in to
login.nics.xsede.org even if Keeneland is not available.
Project directories are available by request for storing source code and other files that need to be shared among a group. These project directories may have larger quotas. Large input and output files should be stored on the Lustre filesystem. For more information, see NICS' Project Directories page.
Each user has a Lustre scratch directory in
/lustre/medusa/$USER. Lustre is a highly scalable cluster file system in which storage of a given file is distributed (or striped) across several hardware locations. This allows files larger than any single location could hold, and allows much faster transfers when a file is accessed in parallel. Users may increase striping width to improve I/O performance for large files. For more information, see NICS' Lustre page.
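One way to increase the striping width is the standard Lustre `lfs setstripe` command. The sketch below wraps it in a hypothetical helper, `set_stripe_count`, which only reports the command when `lfs` is not installed. The stripe count of 8 and the directory name are illustrative assumptions, not recommended values.

```shell
# Hypothetical helper: set the Lustre stripe count for a directory.
# New files created in that directory inherit the setting.
set_stripe_count() {
    dir=$1
    count=$2
    if command -v lfs >/dev/null 2>&1; then
        lfs setstripe -c "$count" "$dir"
    else
        # Not on a Lustre client: report the command instead of running it.
        echo "lfs not found; would run: lfs setstripe -c $count $dir"
    fi
}

# Example (directory name and count are assumptions):
# set_stripe_count /lustre/medusa/$USER/big_output 8
# lfs getstripe /lustre/medusa/$USER/big_output    # inspect the layout
```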
There is no quota limit placed on Lustre storage. However, files older than 30 days are eligible to be purged. Any attempt to circumvent the purge policy may lead to account deactivation.
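To find files that should be backed up before they become eligible for purging, something like the following sketch may help. The helper name `purge_candidates` is hypothetical, and the sketch assumes file age is measured by modification time; the guide does not say which timestamp the purge uses.

```shell
# Hypothetical helper: list regular files under a directory whose
# modification time is more than 30 days in the past.
purge_candidates() {
    find "$1" -type f -mtime +30 -print
}

# Example:
# purge_candidates /lustre/medusa/$USER
```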
Keeneland may be accessed via ssh or gsissh.
In order to
ssh to Keeneland you must use a one time password (OTP) token. Tokens are mailed to users when accounts are enabled and are only accessible if the user has returned a notarized NICS Token Activation form (emailed to new users). Consult NICS' OTP information here.
login1$ ssh email@example.com
The alternative access method is
gsissh. To access Keeneland in this manner, use the GSI authentication to log in to
keenelandgsi.nics.xsede.org. Here, you will need to have an XSEDE password to authenticate with a myproxy certificate. This is automatically done through the XSEDE user portal.
The table below contains the IP addresses for each of the above protocols.
The first time you log in using OTP authentication, you are required to choose a Personal Identification Number (PIN). You'll be prompted to enter your PIN followed by the numbers on your OTP token. The numbers on your token change about every 30 seconds, and the bar on the left-hand side of the viewing window shows the time left for the current passcode. For further information, please see https://www.nics.tennessee.edu/getting-started/access#OTPAuthentication.
Tokens may occasionally become disabled for a variety of reasons. When this happens or you have forgotten your PIN, email firstname.lastname@example.org with "Keeneland" in the subject line.
Note: The OTP token is for the specified user only. Sharing with anyone will lead to immediate account deactivation.
Keeneland is a Georgia Tech resource supported by NICS. When using Keeneland or NICS resources, you agree to the following user responsibilities:
- You have the responsibility to protect your account from unauthorized use. Never share login information. If you believe your account has been compromised, immediately notify the XSEDE Help Desk at 866-907-2383.
- You have responsibility for the security of your programs and data.
- You may not copy and/or distribute proprietary software or documentation without the permission of the software owner. Possession or use of illegally copied software is prohibited; all software must be appropriately acquired and used according to the specific licensing.
- Keeneland resources may only be used by authorized users, and their use is limited to the purpose prescribed in the project award. Use of these resources for processing proprietary information, source code, or executable code must be disclosed in the award process and is prohibited unless authorized by the project award. Use of Keeneland resources for export controlled information, source code, or executable code is prohibited.
- To ensure protection of data and resources, user activity and files may be monitored, intercepted, recorded, copied, audited, inspected, and disclosed to authorities. By using Keeneland or any NICS system, the user consents to such at the discretion of authorized site personnel.
- Activities in violation of any laws may be reported to the proper authorities for investigation and prosecution. Abusive activity may be reported to your home institution for review and action.
- Keeneland uses the NICS file systems. NICS file systems are generally very reliable; however, data may still be lost or corrupted. Users are responsible for backing up critical data.
- Violations of Keeneland or NICS policy can result in loss of access to Keeneland and NICS resources and possible prosecution. If you have questions, you may contact NICS User Support during normal working hours, 9:00 am - 6:00 pm ET, at 865-241-1504, or contact the XSEDE Help Desk 24/7 at 866-907-2383 or email@example.com.
The default environment for each user is: home directory, Lustre scratch space, and Unix group name associated with their project number (assigned by XSEDE).
Keeneland's default shell is
bash. Other shells are available:
zsh. Users may change their default shell in the NICS User Portal, https://portal.nics.tennessee.edu/. You'll need your OTP token to log into the NICS portal.
Display the pre-set environment variables using the Unix
env command. In addition, there are pre-set modules loaded when one logs in. This includes the default intel compiler, Moab, Torque, MKL, CUDA, and MPI libraries. Pre-set environment variables include:
Each time you log in to a resource, a number of scripts run to set up your environment. System startup scripts (which are universal for all users) define the module command and set a number of environment variables. Note that system startup scripts run only for login shells, that is, when you log in with ssh or start a job. If you want them to run when you start a new shell, make that shell a login shell, for instance with bash -l or newgrp -. For more information, check the Unix man pages.
Additionally, each user can define their own startup scripts, depending on which shell they use. Bash users will use .bashrc for non-login shells, and .bash_profile for login shells (generally, users edit .bashrc and ensure that .bash_profile sources that file).
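A common way to have .bash_profile source .bashrc is a fragment like the one below. This is a minimal sketch of the usual Bash idiom, not a copy of any NICS-provided file.

```shell
# ~/.bash_profile (read by login shells)
# Source ~/.bashrc so that login and non-login shells share one configuration.
if [ -f "$HOME/.bashrc" ]; then
    . "$HOME/.bashrc"
fi
```

With this in place, settings such as module commands only need to be maintained in .bashrc.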
Csh shell users (including
tcsh) will use
The modules software package allows you to dynamically modify your user environment by using modulefiles. Modules are useful for building your applications with a specific compiler and set of libraries on Keeneland. For instance, the default modules include the programming environment, PE-intel, which specifies to other modules that you are using the Intel compiler, as well as intel, which adds the actual Intel compiler binaries to your path. It is recommended to switch to the programming environment you want first, and then load the compiler version and other modules. Here is a short list and description of commonly used module commands. Note, if no version number is given after the package name, it will use the default package.
List loaded modules:
login1$ module list
login1$ module swap packageA packageB
This unloads packageA and loads packageB in its place. It is useful for changing PE- modules to switch compilers, and for changing versions of other modules.
List available modules:
login1$ module avail package
If no package is given, it will list all available modules. This command is useful to see which versions of particular software are installed. Try: module avail namd
Display module information:
login1$ module show package
This shows information about the installed software, including the setenv commands that will modify your environment if you decide to load that module. This is useful for two major reasons. First, you can confirm which executable you want to run, since the executable name on Keeneland might differ slightly from the name on another machine. To do so, run ls on the bin directory shown in the output. Or, if you are using Python, for example, use which python. This confirms that python is in your path. On Keeneland, python is always in the path, so this simply confirms that you are getting the version you want. Second, a module may introduce environment variables. For instance, the FFTW module provides an environment variable that points to the library and include directories; use this variable in your makefile rather than the full path.
NICS currently maintains the following options for file transfer:
Before using GridFTP and
globus-url-copy, check out Getting Started with Globus. A valid myproxy certificate and a loaded Globus module are required. Please see https://www.nics.tennessee.edu/computing-resources/data-transfer/gridftp for more instructions.
Standard UNIX transfer utilities such as rsync can be used to transfer files to and from NICS systems. These utilities are usually already installed on Linux/Unix machines, and there are many command-line and graphical clients available. Because they are familiar and easy to use, they may be the best choice for transferring scripts and small files. However, they can be comparatively slow and may be ill suited to transferring large amounts of data. More information on these utilities can be found on the XSEDE Data Management & Transfers page as well as in the NICS Kraken user guide.
Users can also use the File Manager from the XSEDE Portal for data transfers.
NICS users can use the Globus Online tool to perform large file transfers, for "drag and drop archiving" to move data between its long-time archival storage and compute systems, making it quite easy to move, back up or restore relevant data using a visual interface. To get started visit http://www.globusonline.org.
Standard Unix tools for copying data, such as sftp, are recommended for small transfers. For larger transfers, gridftp is often a better choice. However, rsync processes can be very resource intensive for the login nodes and file system. Please avoid using rsync on directories with many files (it may be killed to prevent a node from failing).
Depending on how the source code for the application you want to use is hosted, various version control programs are available to download the source: commonly Subversion, Git, or Mercurial (see the modules for each).
If you have many files to transfer, pack them into a tar file and then transfer that file. If you have a lot of data on Lustre, you may want to ensure that the tar file has a larger stripe count; see http://keeneland.gatech.edu/support/lustre. Please do not run many simultaneous tar operations, as this can make the node and/or file system unresponsive for other users.
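Packing a directory into a single tar file before transfer could look like the following sketch. The helper name `pack_dir` and the example paths are hypothetical.

```shell
# Hypothetical helper: pack one directory into a single tar file.
# Arguments: directory to pack, path of the tar file to create.
# -C switches to the parent directory so the archive holds relative paths.
pack_dir() {
    src=$1
    archive=$2
    tar -cf "$archive" -C "$(dirname "$src")" "$(basename "$src")"
}

# Example (paths are assumptions):
# pack_dir "$HOME/results" "$HOME/results.tar"
# tar -tf "$HOME/results.tar"    # list the archive contents to check it
```

Run one such operation at a time, in keeping with the request above to avoid many simultaneous tar operations.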
The following compilers are available on the Keeneland system:
Each compiler vendor has a "Programming Environment" module, for example,
PE-intel. This module may be checked by library modules to ensure the correct library build. There are also the compiler modules themselves, for example, intel. If you wish to use something other than the defaults, it is necessary to change the Programming Environment module first, then any library modules (MPI) or compiler versions (gcc/4.4.0).
The GNU compilers are installed in system default locations, and thus are always in the user's PATH, though the PE-gnu module is required in order for mpicc to use gcc.
New compilers may be installed as they are released; check module avail <intel|pgi> for new versions.
CUDA is installed as a module; check "module avail cuda" for available versions. As with compilers, we will install new versions of CUDA as they are released. However, there may be a lag because new CUDA versions often require driver updates. The CUDA wrapper is called "nvcc".
OpenMPI and MVAPICH2 are available on Keeneland, and available via modules. As with the compilers, check: module avail
Select one of these MPI implementations using a command like:
login1$ module swap openmpi openmpi/1.6-intel
The MPI wrappers used to compile one's code are called mpicc, mpiCC, and mpif90 for C, C++, and Fortran programs, respectively.
The common libraries available to users on Keeneland include LAPACK & MAGMA, ScaLAPACK, CUBLAS & BLAS, ACML, CUFFT & FFTW, HDF5, and netCDF. If you would like other libraries not installed, please submit a ticket to firstname.lastname@example.org.
The GDB, DDT, and valgrind debuggers are available on Keeneland. If one's job does not require many nodes, a good practice is to run an interactive queue session to debug one's software problems.
Use TAU to performance tune your code on Keeneland. Georgia Tech has detailed instructions on using TAU on Keeneland.
Once logged in to Keeneland, you are placed on a login node. This should be used for basic tasks such as file editing, code compilation, data backup, and job submission. A job is a simulation (program executable command with proper input and output files) that requests resources (number of nodes and a length of time). The login nodes should not be used to run production jobs. Production work should be performed on the compute nodes.
Keeneland uses Torque (an open source PBS derivative) as its batch queue software, with the Moab scheduler, similar to other systems at NICS.
Jobs can be submitted to the queue via the
qsub command. Both batch and interactive sessions are available. Batch mode is the typical method for submitting production simulations. If you are not certain how to construct a proper job executable, it is beneficial to use the interactive queue.
The scheduling policy on Keeneland is designed to facilitate jobs that take advantage of a high number of GPUs and the FDR interconnect between nodes. On Tuesdays, Keeneland is taken down for preventative maintenance (PM) if necessary, after which capability (full-machine) jobs are run. If there is demand for it, capability jobs may be run on Tuesdays even if there is no maintenance. Note that if there is a PM or capability period, the queue will be drained on Monday evening, so only jobs with short walltimes will run during that time. Regular production jobs enter either the serial or parallel queues. Since these queues are differentiated by job size, the scheduler will automatically determine the queue for a user-submitted job. The table below outlines the queue properties:
|NAME||TIME FRAME||NODES AVAILABLE TO THIS QUEUE||MAX JOB TIME||MAX JOB SIZE (NODES)|
|Capability||Tuesdays (following PM)||Exclusive access to compute nodes, 133 node minimum||48 Hours||All available nodes|
|Serial/Parallel||Always except during PM/Capability||Available nodes not part of a current reservation.||48 Hours||132|
|Preventative Maintenance (PM)||Tuesdays beginning at 8 AM||N/A||N/A||N/A|
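Reading the node limits from the table, the split by job size can be sketched as follows. The helper name `queue_for_nodes` is hypothetical; the scheduler makes this decision automatically, so this is only an illustration of the thresholds (133 nodes or more for capability, 132 or fewer otherwise).

```shell
# Hypothetical helper: report which queue class a job of a given size
# falls into, based on the node limits in the table above.
queue_for_nodes() {
    if [ "$1" -ge 133 ]; then
        echo "Capability"
    else
        echo "Serial/Parallel"
    fi
}

queue_for_nodes 4      # prints Serial/Parallel
queue_for_nodes 200    # prints Capability
```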
Jobs are prioritized by (in descending order of effect):
- Penalty for projects that have used their whole allocation
- Number of nodes requested
- Length of time job has been waiting in queue
- Per-project fairshare (currently a penalty for projects that have used more than 10% of the available cycles in the last week)
- Only a user's five highest-priority queued jobs (and 10 per project) are considered for scheduling at any given time.
- FIRSTFIT backfill is enabled; the way this works is that first the scheduler starts the highest priority job(s) until it finds one that cannot start immediately, sets a reservation for that highest priority job, and then runs any remaining jobs that would not cause the start time of the highest priority job to slip further into the future
- Maximum number of jobs per user or project
- Changing either the fairshare targets or the relative effect of fairshare on priority
- Changing the threshold for when a job is considered capability or the (human) policy for how/when capability jobs are run
For interactive jobs, PBS options are passed through
qsub on the command line.
login1$ qsub -I -A XXXYYY -l walltime=01:10:00,nodes=4:ppn=16:gpus=3:shared
-I: Start an interactive session
-A: Charge to the "XXXYYY" project
Putting it together, the qsub command above will request 4 compute nodes, using 16 processors and 3 GPU accelerators in shared mode on each node, for one hour and ten minutes.
After running this command, you will have to wait until enough compute nodes are available, just as for any other batch job. However, once the job starts, the standard input and standard output of this terminal will be linked directly to the head node of your allocated resource. From there, commands may be executed directly instead of through a batch script. Issuing the exit command (or Control-d) will end the interactive job.
Here's an example batch queue script (see the notes afterward for some explanation). This assumes that you have set up the modules in your .bash_profile as described in the Modules section of this document.
#!/bin/sh
#PBS -N my-job
#PBS -j oe
#PBS -A UT-TENN0037
### Unused PBS options ###
## If left commented, must be specified when the job is submitted:
## 'qsub -l walltime=hh:mm:ss,nodes=12:ppn=4:gpus=3:shared'
##
##PBS -l walltime=00:30:00
##PBS -l nodes=12:ppn=4:gpus=3:shared
### End of PBS options ###
date
cd $PBS_O_WORKDIR
echo "nodefile="
cat $PBS_NODEFILE
echo "=end nodefile"
# run the program
which mpirun
mpirun /bin/hostname
date
# eof
With the PBS options in the example batch script above, the output of the job with ID ##### will go into a single file named "my-job.o#####" after the run completes. The "-N" option specifies the name (my-job), and the "-j" option combines stderr and stdout; otherwise there would be a "my-job.e#####" file as well.
- If you have your environment set up correctly, and are using the OpenMPI from /sw/keeneland/openmpi/1.5.1-intel (check the output of the "which mpirun" command from running this script), you should not need to pass either "-np 2" or "-hostfile $PBS_NODEFILE" to the mpirun command. If your mpirun commands don't work, it may be that your environment is trying to use the wrong mpirun, one that was not built with Torque integration.
- The scheduler is set up to give exclusive access to nodes, so there should be no need to add a flag (like "-l naccesspolicy=singletask") to ensure each job gets a node to itself.
- A "-S" parameter to PBS is required if you want to use a shell other than bash. Adding something like #!/bin/ksh in the first line is not enough to choose a different shell.
- If you write batch scripts for a shell other than bash, you must be sure that the module setup has been done as described in Modules.
- If you are sharing your script with anyone else you must be sure that everyone who uses your script has done this setup. Since this is a burden and error prone, you might want to do the module setup explicitly in the batch script if you are using a non-bash shell for your batch scripts.
- The account number is required. The account number is the same as the number of the project(s) to which your NICS account is tied.
- OpenMPI is integrated with TORQUE such that it will default to using all the resources in your PBS request (e.g., a nodes=2:ppn=3 directive will run 6 MPI ranks). One does not have to pass either the "-np" or "-hostfile $PBS_NODEFILE" options to the mpirun command.
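To check how many MPI ranks a request implies, one can count the entries in the node file from inside a job. The sketch below assumes the usual Torque behavior in which $PBS_NODEFILE contains one line per requested processor slot (nodes times ppn). The helper name `rank_count` is hypothetical.

```shell
# Hypothetical helper: count the entries in a Torque node file.
# Assumes one line per processor slot, so nodes=2:ppn=3 gives 6 lines.
rank_count() {
    awk 'END { print NR }' "$1"
}

# Example, inside a batch script:
# echo "mpirun will start $(rank_count "$PBS_NODEFILE") ranks by default"
```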
Jobs are submitted using the qsub command:
login1$ qsub myscript.pbs
To check the status of one's queued jobs, the
qstat command is available.
login1$ qstat -u username
To see all running jobs on Keeneland, you can pass the -r flag (qstat -r). An important column in the output of these commands is the job state column, marked by "S". The job state (a.k.a. status) can be H (Held), Q (Queued), R (Running), W (Waiting), or C (recently Completed).
To delete a job from the queue.
login1$ qdel jobid
To hold a job and prevent it from running. For instance, if you submit a job and then realize that your input file is corrupted, you can hold the job until you have a chance to fix the file.
login1$ qhold jobid
To release a held job so that it can run.
login1$ qrls jobid
To change PBS request for a queued or held job. Options take the same format as qsub and overwrite previous options.
login1$ qalter jobid
The showq command gives a different view of jobs in the queue. The utility will show jobs in the following states:
- Active : These jobs are currently running.
- Eligible : These jobs are currently queued awaiting resources. A user is allowed five jobs in the eligible state.
- Blocked: These jobs are currently queued but are not eligible to run. Common reasons for jobs in this state would be jobs on hold, or the owning user currently has five jobs in the eligible state.
To view details of a job in the queue:
For example, if job 736 is currently in a blocked state, the following can be used to view the reason:
login1$ checkjob 736
The return may contain a line similar to the following:
BlockMsg: job 736 violates idle HARD MAXJOB limit of 2 for user (Req: 1 InUse: 2)
This line indicates that the job is blocked because the owning user has reached the limit on jobs in the eligible state. (The sample message shows a limit of 2; on Keeneland the limit is five.)
To get an estimate of when a job will start:
login1$ showstart 100315
The return may contain a line similar to the following:
job 100315 requires 16384 procs for 00:40:00
Estimated Rsv based start in 15:26:41 on Fri Sep 26 23:41:12
Estimated Rsv based completion in 16:06:41 on Sat Sep 27 00:21:12.
The start time may change dramatically as new jobs with higher priority are submitted. It is a very rough estimate based on the current job mix.
To see currently free resources, use the Moab showbf command:
This can help you create a job that can be backfilled immediately. As such, it is primarily useful for short jobs.
Benchmarking and profiling one's calculation is important on any resource. TAU is a full-featured profiler, and mpiP is a lightweight profiling library. Low-level interfaces to hardware counters are also available via CUPTI and PAPI. CUPTI is installed with CUDA. TAU, mpiP, and PAPI are available via modules.
All NICS policies concerning user responsibilities, directory spaces, grid services, accounting and allocation status, job scheduling, and file system purges can be found at: http://www.nics.tennessee.edu/policies.
Last update: January 5, 2015