Data Transfer Methods
    XUP File Manager
    Globus Online
    Globus Online Command Line Interface
    globus-url-copy & uberftp
    scp & sftp
Data Integrity
What Impacts Data Transfer Performance?

Transferring data includes moving files from local machines to XSEDE, as well as transfers between XSEDE resources. This section gives a high-level overview of the recommended XSEDE data transfer methods.

Data Transfer Methods

Depending on your data transfer requirements, there are a variety of methods for transferring files across XSEDE. You may choose among the XUP File Manager, Globus Online, the Globus Online Command Line Interface, globus-url-copy and uberftp, or scp and sftp. The pros and cons of each method are summarized in Table 1.1 below.

Table 1.1 Recommended transfer methods per usage mode with pros and cons

  • Graphical User Interface
    • XUP File Manager
      Pros: easy to use; single sign on via the user portal; desktop download available
      Cons: geared for basic/beginner usage
    • Globus Online
      Pros: easy-to-use web interface; can use XUP login (SSO); desktop download available
      Cons: geared for basic/beginner usage
  • Command Line Interface
    • Globus Online Command Line Interface (CLI)
      Pros: managed, reliable, and auto-tuned transfers; advanced syntax for scripting; can use XSEDE single sign on
      Cons: need to set up an SSH key in your Globus Online profile; advanced knowledge required for authentication and scripting
    • globus-url-copy & uberftp
      Pros: high-performance transfers with tuning options; command line interface
      Cons: advanced knowledge required for authentication, performance tuning, and improving reliability
    • scp & sftp
      Pros: easy command line interface
      Cons: must use your local (resource-specific) username and password

XUP File Manager


Use the built-in XSEDE User Portal (XUP) File Manager to transfer both small and large files. When logged into the portal, go to Resources -> File Manager and wait for the Java applet to load. The file manager lists all available machines, including your local machine and XSEDE $Share, a 2 GB space for sharing files with collaborators. You can transfer files from your desktop to XSEDE resources, and between XSEDE resources, with simple drag and drop.

XUP File Transfer Scenarios

  • Moving data between locations on XSEDE: Easily connect to XSEDE resources via your web browser and drag and drop files to transfer between resources.
  • Moving data to XSEDE from a local machine: Staging local data for your computation, or transferring results back to your local machine, is easy with XUP File Transfer. Connect to the file manager and drag and drop files between your desktop and XSEDE resources; no software installation is required.
  • Large file transfers with striping: When staging large file transfers, some additional parameters need to be set:
    1. Before selecting the destination resource, right-click the resource you are transferring data from and select Edit.
    2. Check the box next to "Stripe Transfers" and click OK.
    3. Repeat steps 1 and 2 in the other panel for the destination resource.
    4. Drag and drop the file from source to destination to start the transfer.
    These settings must be repeated every time you change resources.

XUP Desktop Client

If you prefer the convenience of running on your desktop while still using your XUP single sign on credentials, you can download the XUP File Manager executable jar.

Globus Online

Globus Online is a fast, reliable service for high-performance file transfer. This hosted service addresses data movement challenges with a robust, secure, and highly monitored environment for file transfers, and offers powerful yet easy-to-use interfaces. Researchers with no IT background can easily move large quantities of files, or very large files, using the web GUI, while developers who want to automate workflows can use the command line interface.

Create a free account at www.globusonline.org and you're ready to start moving files. Contact support@globusonline.org for any help needed.

Globus Online features include:

  • High performance: move terabytes of data in thousands of files
  • Automatic fault recovery across multiple security domains
  • Designed for researchers: easy "fire and forget" file transfers
  • No client software installation
  • New features automatically available
  • Consolidated support and troubleshooting
  • Works with existing GridFTP servers
  • Ability to move files to any machine (even your laptop) with ease

Globus Online Transfer Scenarios

  • Moving data between locations on XSEDE: The Globus Online File Transfer service can be used to move data between all XSEDE sites, which are accessible as transfer endpoints in the service. Globus Online simplifies and automates file transfer without the need to install or interact with GridFTP.
  • Moving data to XSEDE from a local machine: This might include observational data from sensors, surveys, etc., that will be analyzed on XSEDE computing resources. Globus Connect is a feature of Globus Online that makes it possible to create a transfer endpoint on any machine (including campus servers and home laptops) with just a few clicks and without the typical difficulties of a GridFTP install.

For more information view the XSEDE Globus Online User Guide.

Globus Online Command Line Interface

Globus Online provides a command line interface that may be accessed using any standard ssh terminal client. Prior to running shell commands, you must upload your public SSH key to your Globus Online account. For more information please see the pertinent section in the Globus Online File Transfer User Guide.
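If you do not yet have an SSH key pair to upload, the sketch below generates one with the standard OpenSSH ssh-keygen tool and prints the public half so you can paste it into your Globus Online profile. The function name make_globus_key and the default key location are hypothetical choices for illustration; any existing key pair works equally well. The sketch creates the key without a passphrase, which you may prefer to change on shared systems.

```shell
#!/bin/sh
# Hypothetical helper: create (if needed) an SSH key pair to register with
# Globus Online, then print the public key to paste into your profile.
# The default key file path below is an assumption, not a required name.
make_globus_key() {
    key_file="${1:-$HOME/.ssh/globus_online_rsa}"

    # ssh-keygen ships with OpenSSH; stop early if it is not installed.
    if ! command -v ssh-keygen >/dev/null 2>&1; then
        echo "ssh-keygen not found; install an OpenSSH client first" >&2
        return 1
    fi

    # Reuse an existing key instead of overwriting it.
    if [ ! -f "$key_file" ]; then
        mkdir -p -m 700 "$(dirname "$key_file")"
        # -N "" means no passphrase; remove it to be prompted for one.
        ssh-keygen -q -t rsa -b 4096 -N "" -f "$key_file" || return 1
    fi

    # Only the public key (.pub) is uploaded; the private key stays local.
    cat "$key_file.pub"
}
```

For example, running make_globus_key with no argument prints a single line beginning with "ssh-rsa", which is the text to add to your Globus Online profile.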

globus-url-copy and uberftp

globus-url-copy and uberftp are command-line clients for GridFTP, the high-performance transfer protocol that underlies most XSEDE transfer mechanisms, including Globus Online. Use these commands to transfer large files.

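Here's a sample transfer from PSC's Blacklight to TACC's Stampede, optimized for large files. The -stripe option requests a striped transfer, and -tcp-bs sets the TCP buffer size in bytes (8388608 bytes = 8 × 1024 × 1024, i.e. 8 MB). The trailing slash marks the destination URL as a directory:

login1$ globus-url-copy -stripe -tcp-bs 8388608 \
    gsiftp://gridftp.psc.xsede.org:2811/scratcha/joeuser/file \
    gsiftp://gridftp.stampede.tacc.xsede.org:2811/scratch/joeuser/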

The following table lists the GridFTP endpoint and server type(s) for each XSEDE system.

GridFTP Endpoints

Table 1.2 GridFTP Endpoints
Resource                    GridFTP Endpoint                                   Server Type
Blacklight (PSC)            gsiftp://gridftp.psc.xsede.org:2811/               Striped, Non-striped
Kraken (NICS)               gsiftp://gridftp.kraken.nics.xsede.org:2811/       Striped, Non-striped
Lonestar (TACC)             gsiftp://gridftp.lonestar.tacc.xsede.org:2811/     Non-striped
Nautilus (NICS)             gsiftp://gridftp.nautilus.nics.xsede.org:2811/     Non-striped
Ranch (TACC)                gsiftp://gridftp.ranch.tacc.xsede.org:2811/        Non-striped
Trestles (SDSC)             gsiftp://trestles-dm.sdsc.xsede.org:2811/          Striped, Non-striped
glade.ncar.xsede.org        gsiftp://gridftp.ucar.edu:2811/                    Striped, Non-striped
gordon.sdsc.teragrid.org    gsiftp://oasis-dm.sdsc.xsede.org:2811/             Striped, Non-striped
grid1.osg.xsede.org         gsiftp://submit-1.osg.xsede.org:2811/              Non-striped
keeneland.gatech.xsede.org  gsiftp://gridftp.keeneland.gatech.xsede.org:2811/  Striped, Non-striped
mason.iu.xsede.org          gsiftp://gridftp.mason.iu.xsede.org:2811/          Non-striped
stampede.tacc.xsede.org     gsiftp://gridftp.stampede.tacc.xsede.org:2811/     Non-striped
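The endpoint URLs in Table 1.2 are easy to mistype. The sketch below wraps the table in a small lookup function and prints the corresponding globus-url-copy command, reusing the -stripe and -tcp-bs 8388608 options from the sample transfer. The short resource keys (blacklight, gordon, osg, and so on) and the function names are hypothetical. The script only prints the command as a dry run and does not handle authentication.

```shell
#!/bin/sh
# Sketch: map a short (hypothetical) resource key to its GridFTP endpoint
# from Table 1.2 and print a globus-url-copy command. Dry run only.

# Print the endpoint URL for a resource key; fail for unknown keys.
gridftp_endpoint() {
    case "$1" in
        blacklight) echo "gsiftp://gridftp.psc.xsede.org:2811/" ;;
        kraken)     echo "gsiftp://gridftp.kraken.nics.xsede.org:2811/" ;;
        lonestar)   echo "gsiftp://gridftp.lonestar.tacc.xsede.org:2811/" ;;
        nautilus)   echo "gsiftp://gridftp.nautilus.nics.xsede.org:2811/" ;;
        ranch)      echo "gsiftp://gridftp.ranch.tacc.xsede.org:2811/" ;;
        trestles)   echo "gsiftp://trestles-dm.sdsc.xsede.org:2811/" ;;
        glade)      echo "gsiftp://gridftp.ucar.edu:2811/" ;;
        gordon)     echo "gsiftp://oasis-dm.sdsc.xsede.org:2811/" ;;
        osg)        echo "gsiftp://submit-1.osg.xsede.org:2811/" ;;
        keeneland)  echo "gsiftp://gridftp.keeneland.gatech.xsede.org:2811/" ;;
        mason)      echo "gsiftp://gridftp.mason.iu.xsede.org:2811/" ;;
        stampede)   echo "gsiftp://gridftp.stampede.tacc.xsede.org:2811/" ;;
        *) echo "unknown resource: $1" >&2; return 1 ;;
    esac
}

# gridftp_copy_cmd SRC_KEY SRC_FILE DST_KEY DST_DIR
# Each endpoint already ends in "/", so the leading "/" of each absolute
# path is stripped; DST_DIR gets a trailing "/" so it is treated as a directory.
gridftp_copy_cmd() {
    src=$(gridftp_endpoint "$1") || return 1
    dst=$(gridftp_endpoint "$3") || return 1
    case "$4" in
        */) dir=$4 ;;
        *)  dir="$4/" ;;
    esac
    echo "globus-url-copy -stripe -tcp-bs 8388608 ${src}${2#/} ${dst}${dir#/}"
}
```

For example, gridftp_copy_cmd blacklight /scratcha/joeuser/file stampede /scratch/joeuser prints a one-line equivalent of the sample transfer, with a trailing slash on the destination directory.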

For advanced users, speedpage.psc.edu reports the transfer speeds you can expect from globus-url-copy using the optimized parameters above.

scp & sftp

You may also use one of these command-line tools to transfer small files (under 2 GB) between XSEDE resources and/or your local machine. On Linux or Mac, you can run these commands directly from the terminal; on Windows, use an SSH client that supports scp or sftp. Both scp and sftp are easy to use and secure, but they perform poorly for large files.
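The 2 GB guideline can be enforced before each copy. The sketch below is a hypothetical wrapper around scp that refuses files at or above the limit and points you to a GridFTP-based method instead. The function name, the assumption that "2 GB" means 2^31 bytes, and the login.example.org host are placeholders for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: run scp only for files below the recommended size
# limit. MAX_SCP_BYTES assumes "2 GB" means 2^31 bytes; override if needed.
MAX_SCP_BYTES=${MAX_SCP_BYTES:-2147483648}

# scp_small_file FILE DEST
# DEST is an scp target such as username@login.example.org:/path/
# (placeholder host). Returns 1 if FILE is missing, 2 if it is too large.
scp_small_file() {
    file=$1
    dest=$2

    if [ ! -f "$file" ]; then
        echo "no such file: $file" >&2
        return 1
    fi

    # wc -c gives the byte count on both Linux and macOS; tr strips padding.
    size=$(wc -c < "$file" | tr -d ' ')
    if [ "$size" -ge "$MAX_SCP_BYTES" ]; then
        echo "$file is $size bytes; use Globus Online or globus-url-copy instead" >&2
        return 2
    fi

    scp "$file" "$dest"
}
```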

Data Integrity and Validation

Data protection mechanisms are incorporated into much of the infrastructure and software used in XSEDE, and in most cases users are not required to take any special steps to ensure the integrity of their data. However, there are situations in which a user may wish to check that a transferred file has been copied correctly to the new system, or check that a file has not been changed since it was originally created. In these situations, checksum tools can be used to generate a cryptographic hash of one or more files. Cryptographic hashes always produce the same value for the same input data, so they can be saved and later compared against a recomputed hash to verify that a file is exactly the same as it was when the original checksum was generated. Even a single-bit change in a multi-terabyte file will, in practice, produce a different checksum value, so a successful checksum comparison provides a strong guarantee that the data has not been altered in any way.

We recommend using the "sha256sum" command to create and check cryptographic hashes. This command is available on most Linux systems, including most XSEDE resources; on macOS, "shasum -a 256" provides the same output format and -c check option. To generate a checksum for a given file, run sha256sum with the name of the file (or files) you wish to check; the command reports the checksum of each file on a separate line, followed by the filename:

login1$ sha256sum filename1 filename2
9db55391e52a4a84944c6c9817ab8d0445547e8934d88d26032cc4747e196039  filename1
a6483e57971627e4e2403c6d3e38b205c70db2221f0b9fe46781e0af76192ef5  filename2

To save the generated checksums for comparison, redirect the output to a file:

login1$ sha256sum filename1 filename2 > checksums.out

You can then use the contents of this file to verify that the files are exactly the same on any given system, or on the same system at a later date, using the -c flag to sha256sum:

login1$ sha256sum -c checksums.out
filename1: OK
filename2: OK

Wildcards can also be used with the sha256sum command. For example, a user could generate checksums for all the files in a directory using the command:

login1$ sha256sum * > checksums.out

After transferring these files to another XSEDE resource, the user could verify that the data was transferred completely and correctly using the saved output file. If any of the files has been corrupted or incompletely transferred, the check will produce output like the following:

login1$ sha256sum -c checksums.out
filename1: FAILED
filename2: OK
sha256sum: WARNING: 1 of 2 computed checksums did NOT match

In this situation, the user should retransfer the files in question or restore from a backup copy of the data.
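When many files are checked at once, it helps to extract just the names that need retransferring. The sketch below filters the output of GNU sha256sum -c, which marks mismatches as "NAME: FAILED" and missing or unreadable files as "NAME: FAILED open or read". The function name failed_files is hypothetical, and the sketch assumes file names contain no newlines or backslashes.

```shell
#!/bin/sh
# Sketch: print the names of files that fail verification, one per line.
# Assumes GNU sha256sum and file names without newlines or backslashes.
failed_files() {
    # --quiet suppresses the "OK" lines; warnings on stderr are discarded,
    # and sed keeps only the name in front of ": FAILED".
    sha256sum -c --quiet "$1" 2>/dev/null | sed -n 's/: FAILED.*$//p'
}
```

For example, failed_files checksums.out > retransfer.txt leaves a list of files to copy again.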

In order to verify data integrity at a later date, you must have a record of the original checksum values to compare to the present value. Therefore, generate and save checksums when data is first created or before it is transferred into XSEDE, even if you do not immediately intend to perform verification against those checksums.
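The "sha256sum *" form shown earlier covers only the files in the current directory; subdirectories produce "Is a directory" errors. For a whole directory tree, one option is to combine find with sha256sum, as in the sketch below. The function names are illustrative, and the sketch assumes the checksum list is stored as checksums.out at the top of the tree. Because paths are recorded relative to that directory, the same checksums.out can be verified on the destination system after the tree is copied.

```shell
#!/bin/sh
# Sketch: record SHA-256 checksums for every regular file below a directory,
# using paths relative to it, and verify them later (possibly elsewhere).

# make_tree_checksums DIR: write DIR/checksums.out
make_tree_checksums() {
    (
        cd "$1" || exit 1
        # The redirection creates checksums.out before find runs,
        # so exclude that one file from the list being hashed.
        find . -type f ! -path ./checksums.out -exec sha256sum {} + > checksums.out
    )
}

# verify_tree_checksums DIR: re-hash the files and compare with checksums.out
verify_tree_checksums() {
    (
        cd "$1" || exit 1
        sha256sum -c checksums.out
    )
}
```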

Data Integrity and Data Transfer Mechanisms

Some data transfer mechanisms, including GridFTP, provide options to generate and compare checksums as part of the transfer operation. When using Globus Online to manage GridFTP transfers, include the "--verify-checksum" option in command-line invocations, or select the "Verify Checksum" option in the web interface. Secure copy (scp) provides some protection for data integrity during the transfer because data is encrypted in transit, but it does not perform end-to-end validation of data integrity. Users should therefore perform additional verification, such as the sha256sum check described above, if data integrity is important.

What Impacts Data Transfer Performance?

  • Disk speed
  • Connectivity of disk to node
  • Node characteristics & load
  • Connectivity of node to WAN
  • For all networks:
    • Bandwidth
    • Latency
    • Buffer Size
    • Protocol
    • Load
    • Encryption

Many of these factors are intrinsic to the system and not under the user's control. However, by following the recommendations above, users can significantly influence the speed of data transfers, in some cases achieving 50 times the speed obtained with default settings and non-recommended commands. Because of the factors listed above, do not expect 40 Gb/s even with the best optimization; performance is usually limited by end-node connectivity, not WAN bandwidth.
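Bandwidth, latency, and buffer size interact through a standard networking rule of thumb called the bandwidth-delay product. To keep a single TCP stream busy, its buffer should hold roughly bandwidth × round-trip time worth of data. The sketch below only illustrates that arithmetic and is not an XSEDE tool. The function names are hypothetical, and the link speeds and round-trip times are example values.

```shell
#!/bin/sh
# Bandwidth-delay product: bytes that must be in flight to fill a path.
# bytes = (Mb/s * 10^6 bits/s) * (RTT_ms / 1000 s) / 8 bits per byte
#       = Mb/s * RTT_ms * 125
bdp_bytes() {
    echo $(( $1 * $2 * 125 ))
}

# Throughput ceiling (integer Mb/s) of one TCP stream limited by its buffer:
# Mb/s = bytes * 8 / (RTT_ms / 1000) / 10^6 = bytes * 8 / (RTT_ms * 1000)
max_mbps_for_buffer() {
    echo $(( $1 * 8 / ($2 * 1000) ))
}

# Example: a 10 Gb/s path with a 50 ms round trip needs 62.5 MB in flight,
bdp_bytes 10000 50              # prints 62500000
# while an 8388608-byte buffer (as in the globus-url-copy sample) caps one
# stream at about 1.3 Gb/s on that same path.
max_mbps_for_buffer 8388608 50  # prints 1342
```

This is one reason tuning options such as -tcp-bs matter on long, fast paths. Even so, the end-node factors listed above often set the real limit.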

Last updated: July 24, 2013