Last update: June 21, 2019
System Overview
SDSC Data Oasis is an on-line, high-performance, Lustre-based storage resource with a 5 PB capacity that is available to all users of SDSC Comet. It was designed to meet the needs of data-intensive research by providing easy-to-use, high capacity, short- to medium-term storage with useable bandwidth on the order of 100 GB/s and latencies that are far lower than near-line and tape-based storage systems. However, it is not an archival system and stored data is single-copy and not backed up.
Data Oasis is divided into several file systems, including local scratch spaces for Comet and a shared, persistent 2.5 PB Project space that is available to users with an allocation. All projects on Comet receive a default allocation of 500 GB.
System Access
Allocations
The default allocation is 500 GB of Project Storage to be shared among all users of a project. Projects that require more than 500 GB of Project Storage must request additional space by sending an email to help@xsede.org. This email from the project PI should provide a justification of 500 words or less that includes:
- Amount of storage being requested
- How the storage will be used
- How this augments other non-SDSC storage resources available to the project
Project storage requests will be reviewed by SDSC staff, and a decision will be made within 5 business days.
Methods of Access
The Data Oasis Project Storage space is mounted on Comet and can be accessed as a filesystem on all login and compute nodes. Each user's personal space can be found in /oasis/projects/nsf/allocationname/username, where allocationname is the project's six-character allocation name (found by running the show_accounts command) and username is the user's local login name.
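For example, a user could confirm the allocation name and then change into their personal project directory as follows (abc123 and jdoe are placeholder values for the allocation name and username):

$ show_accounts                          # lists your allocations, including the six-character name
$ cd /oasis/projects/nsf/abc123/jdoe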
Since Data Oasis is mounted as a standard filesystem, UNIX file transfer utilities such as scp, sftp, and rsync can be used for transfers of modest size or scale. To enable the efficient transfer of larger amounts of data, Data Oasis is also mounted on the SDSC data mover servers:
- oasis-dm.sdsc.edu (for users with accounts on Comet)
These data movers can be used in conjunction with Globus Online and GridFTP, which are discussed in the Data Transfer Methods section below and on the XSEDE Data Transfers & Management page.
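For a modest transfer from a local workstation into project space, a command along the following lines should work (the allocation name, username, and local path are placeholders, and this sketch assumes standard SSH access to the data mover):

$ rsync -avP ./results/ jdoe@oasis-dm.sdsc.edu:/oasis/projects/nsf/abc123/jdoe/results/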
Checking your Quota
You can review your group's project storage utilization with the following command, where allocationname is the project's allocation (group) name:
$ lfs quota -g allocationname /oasis/projects/nsf
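If you are unsure which group name to use, the id command lists the groups your account belongs to; a minimal sketch, assuming the project's Unix group matches the allocation name (abc123 is illustrative):

$ id -Gn                                    # list the groups this account belongs to
$ lfs quota -g abc123 /oasis/projects/nsf   # query usage for the illustrative group abc123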
Data Transfer on Data Oasis
Data Transfer Methods
While the standard UNIX file transfer tools (scp, sftp, rsync) are acceptable for simple and small file transfers (< 1 GB) to and from Data Oasis, they cannot realize the maximum performance of the Data Oasis storage resource because of their limited internal buffers and inability to stripe transfers across multiple data mover servers. The preferred method for transferring big data (both large file sizes and large numbers of files) is GridFTP (a part of the Globus Toolkit). Keep in mind that attempting to transfer large numbers of small files will result in poor performance. Whenever possible, create archives of directories with large file counts before initiating the data transfers, as in the example below.
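As a minimal sketch (the directory and archive names are placeholders), a directory containing many small files can be bundled into a single compressed archive before transfer:

$ tar -czf run_output.tar.gz run_output/    # one large archive transfers far better than thousands of small files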
The XSEDE Data Transfers & Management page provides a detailed explanation of how to use GridFTP and its associated GUI- and terminal-based tools in XSEDE. To facilitate GridFTP with SDSC Data Oasis, the following data mover has Data Oasis mounted under /oasis/projects/nsf:
- Comet: gsiftp://oasis-dm.sdsc.edu:2811/ (XUP File Manager/globus-url-copy) or xsede#comet (Globus)
The data movers are load-balanced in a round-robin fashion, but advanced users may wish to access the individual data movers explicitly via oasis-dm1, oasis-dm2, oasis-dm3 and oasis-dm4.
Examples
globus-url-copy provides the greatest flexibility for optimizing transfers between XSEDE resources. To transfer a file from another XSEDE resource (e.g., TACC Stampede) to SDSC Comet,
$ module load globus
$ myproxy-login -l xsedeusername
This will load the commands to use GridFTP and generate the GSI credential needed to access xsedeusername's accounts across XSEDE Resources. Then,
$ globus-url-copy -vb -stripe -tcp-bs 8m -p 4 \
    gsiftp://data1.stampede.tacc.utexas.edu:2811///home1/02255/username/somefile.bin \
    gsiftp://oasis-dm.sdsc.xsede.org:2811///oasis/projects/nsf/allocation/username/somefile.bin
where
- "-vb" enables verbose output (reports the transfer rate, among other things)
- "-stripe" enables striped transfers
- "-tcp-bs 8m" specifies an 8-megabyte TCP buffer. The optimal value for this will vary; Globus provides a way to estimate the optimal tcp-bs value in its documentation
- "-p 4" indicates that four parallel data connections should be used
By comparison, the equivalent transfer using scp would be:
$ scp login1.stampede.tacc.utexas.edu:/home1/02255/username/somefile.bin \
    /oasis/projects/nsf/allocation/username/
In a test transfer of a 341 MB file, GridFTP achieved an average of 171 MB/s while scp achieved only 34.1 MB/s. When transferring terabytes of data, GridFTP is clearly preferable.
Caveats to Users
This resource is based on a Lustre filesystem which has some limitations. A comprehensive list of Lustre best-practices is beyond the scope of this guide, but it is important to minimize unnecessary access of file metadata. For example,
- avoid performing many small file operations: opens/closes, random reads/writes
- avoid putting too many (e.g., more than several hundred) files in one directory
- avoid using "
ls -l
" unnecessarily, and consider using "ls --color=no -U
" when navigating Data Oasis - limit unnecessary use of wildcards on the command line
- avoid using the "
find
" and "du
" commands. Use "lfs find
" and "lfs du
" instead
The "lfs
" command can be enabled by running "module load lustre
" on Comet.
Troubleshooting / Common Errors
- Problem: Any attempts to access files on Data Oasis just hang OR access is extremely sluggish/unresponsive
  Solution: This can occur on both login nodes and compute nodes and typically results from Data Oasis being overloaded. These conditions typically "un-hang" within a few minutes; if they persist for longer, contact help@xsede.org with the system (or specific compute nodes) on which this is occurring.
- Problem: /oasis/projects/nsf exists but is empty
  Solution: This problem is infrequent and should be reported to the XSEDE helpdesk with the system (or specific compute nodes) on which this is occurring.
Policies
SDSC Data Oasis Projects Storage is provided on a per-project basis and is available for the duration of the associated compute allocation period. Data will be retained for three months beyond the end of the project, by which time the data must be migrated elsewhere.
Data Oasis Projects Storage is not subject to automatic purges, but be aware that the data stored there is single-copy and not backed up! Users are responsible for ensuring that critical data are duplicated elsewhere. Data accidentally deleted from Data Oasis cannot be recovered.