Ranch User Guide
Last update: December 10, 2020

Notices

12/01/20 While the deadline for data retrieval from the old Ranch archive has passed, the data on the old system has not been deleted. Users no longer have access to the data on old Ranch; however, TACC staff will continue to migrate data for users who have requested it, on a best-effort basis.

If you would like to be placed on the list for continued migrations, submit a support ticket. An admin will handle these migrations one at a time, in the order the requests were received, and TACC staff will contact you once your data has been migrated. Due to the number and scale of additional requests, we do not currently have a timeline for when these transfers will occur, but we hope to address all requests over the course of 2021.

Introduction

TACC's High Performance Computing (HPC) systems are used primarily for scientific computing. Although their disk systems are large, they are not large enough to hold the data generated on these systems over the long term. The Ranch system fills this need for high-capacity long-term storage by providing a massive, high-performance file system for archival purposes.

Ranch (ranch.tacc.utexas.edu) is a Quantum StorNext-based system with a DDN-provided front-end disk system (30PB raw) and a 5000-slot Quantum Scalar i6000 tape library.

Ranch is an allocated resource, meaning that Ranch is available only to users with an allocation on one of TACC's computational resources such as Frontera, Stampede2, or Maverick2. XSEDE PIs will be prompted automatically for the companion storage allocation amount as part of the XRAS submission request. UT and UT System PIs should also make a request and justify the amount of storage requested when applying for a Ranch allocation. The default allocation on Ranch for XSEDE, UT, and UT affiliate users is 2TB. To request additional Ranch storage for your allocation, please submit a TACC user portal ticket.

Intended Use

Ranch is fundamentally based upon long-term tape storage and as such is designed for archiving data that is in a state wherein it will likely not change, and will likely not need to be accessed very often. Obviously, Ranch is to be used only for work-related data. Ranch is not meant for active data, nor is it intended to be a replication solution for your "/scratch" directory. Ranch is also not suitable for system backups, due to the large number of small files this inevitably generates and the nature of a tape-based archive. Also, at this time Ranch provides only a single instance of any data within it. Erroneously edit that data, or delete that data, and it is unrecoverable from within Ranch.

Ranch is an archival system. Ranch user data is not backed up or replicated. This means that Ranch contains only a single, active, instance of user data. While lost data due to tape damage or other system failure is rare, please keep this possibility in mind when formulating your data management plans. If you have irreplaceable data and would like a different level of service, please let us know via the ticketing system, and we can help you with a solution.

System Configuration

Ranch's primary storage system is a DDN SFA14K DCR (Declustered RAID) based system which is managed by Quantum's StorNext filesystem. The raw capacity is around 30PB, with about 17PB usable space for user data. Metadata is stored on a Quantum SSD-based appliance. The backend tape library, which is where files migrate after they have been untouched on disk for a period of time (this will be tuned, but it is currently a few weeks), is a Quantum Scalar i6000, with LTO-8 tapes, each with an uncompressed capacity of 12.5 TB. Compressed capacity of an LTO-8 tape is around 30 TB, but that assumes highly compressible data.

Formerly, the Ranch system was based on Oracle's HSM system, with two SL8500 libraries, each with 20,000 tape slots. This system will remain as a backend system while we transition data from the old libraries to the new one.

System Access

Direct login via Secure Shell's ssh command to Ranch is allowed so you can create directories and manage files. The Ranch archive system cannot be mounted on a remote system.

stampede2$ ssh taccusername@ranch.tacc.utexas.edu

Ranch Environment Variables

The preferred way of accessing Ranch, especially from scripts, is by using the TACC-defined environment variables $ARCHIVER and $ARCHIVE. These variables, defined on all TACC resources, define the hostname of the current TACC archival system, $ARCHIVER, and each account's personal archival space, $ARCHIVE. These environment variables help ensure that scripts will continue to work, even if the underlying system configuration changes in the future.
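A script that depends on these variables can check that both are present before attempting a transfer. The following is a minimal sketch; the function name archive_env_ok is hypothetical and is not a TACC-provided tool.

```shell
# Return success only if both TACC archive variables are set and non-empty.
# archive_env_ok is a hypothetical helper name used for illustration.
archive_env_ok() {
    [ -n "${ARCHIVER:-}" ] && [ -n "${ARCHIVE:-}" ]
}

if archive_env_ok; then
    echo "archive host: ${ARCHIVER}, archive path: ${ARCHIVE}"
else
    echo "ARCHIVER/ARCHIVE are not set; is this a TACC system?" >&2
fi
```

Checking up front avoids building a malformed destination such as ":/myfilepath" when a script is run somewhere the variables are not defined.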

If you are trying to access data that is on the old part of Ranch, and you haven't yet transitioned that data to the new Quantum StorNext-based portion of Ranch, you can add the old_HSM directory into the paths defined in your scripts, and still be able to read from Ranch that way. Since the filesystem is mounted as read only, you won't be able to send data into the old_HSM directory structure.

Accessing Files from Within Running Programs

Ranch access is not allowed from within running jobs on other TACC resources. Data must be first transferred from Ranch to your compute resource in order to be available to running jobs.

Organizing Your Data

After more than a decade of operation, serving more than 49,000 user accounts, experience has shown that limiting total file count and enforcing explicit data retention periods are the keys to sustainable Ranch operation over the long term.

When organizing your data, keep in mind that reducing file count is at least as important as reducing file space. Ranch performs best with large files, and performance will suffer severely if Ranch is kept busy archiving many small files rather than fewer large ones. For this reason, users must bundle up their small-file-filled directories into single large files. The best way to bundle files is to use the UNIX "tar" command to create single large files called "tarballs". We include several examples below; consult the tar man page for detailed information on this command.

Based on past performance (predominantly the total time to retrieve a given data set to completion), we recommend an average file size of 300GB to 1TB. Smaller files drastically slow retrieval when multiple files are recalled from tape. For example, retrieving a 100TB data collection stored as files averaging 100GB will be an order of magnitude faster than retrieving the same collection stored as files averaging 1GB or less. The new environment is designed to make ~100TB data sets available in a few days or less, instead of weeks, which is possible only when the average file size is large enough.
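To make the file-count side of this comparison concrete, the sketch below counts how many files a 100TB collection contains at each average size. It is illustrative arithmetic only: it assumes decimal units (1TB = 1000GB) and does not model retrieval time.

```shell
# File counts for a 100TB collection at two average file sizes.
# Assumes decimal units (1TB = 1000GB); illustrative arithmetic only.
collection_gb=$((100 * 1000))

files_at_100gb=$((collection_gb / 100))   # 100GB average file size
files_at_1gb=$((collection_gb / 1))       # 1GB average file size

echo "100GB average: ${files_at_100gb} files"
echo "1GB average:   ${files_at_1gb} files"
```

The same collection is 1,000 files in one case and 100,000 in the other, and each file must be located and read from tape individually.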

Ranch Quotas

File Count Quota: Users are limited to 50,000 files in their $HOME directories.
File Space Quota: Users are limited to 2 Terabytes (TB) of disk space in their $HOME directories.

You can display your current Ranch file space usage by executing either of the following UNIX commands:

ranch$ ls -lh
or
ranch$ du -sh

Keep in mind the above commands only display file space used, not a total file count. It is the user's responsibility to keep the file count below the 50,000 quota by using the UNIX "tar" command or some other methodology to bundle files. Both the file space and file count quotas apply to all data copied from the Oracle archive and all new incoming data.
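Since neither command reports a file count, one common approach is to count regular files with find and wc. The sketch below is a suggestion rather than a TACC-provided tool; the function name count_files is hypothetical, and it runs against a throwaway directory so it can be tried anywhere. File names containing newline characters would be overcounted.

```shell
# Count regular files under a directory (defaults to the current directory).
# count_files is a hypothetical helper name used for illustration.
count_files() {
    find "${1:-.}" -type f | wc -l | tr -d ' '
}

# Demonstration on a throwaway directory holding three files.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub"
touch "$demo_dir/a" "$demo_dir/b" "$demo_dir/sub/c"

file_count=$(count_files "$demo_dir")
echo "files: $file_count"

rm -rf "$demo_dir"
```

Run from a Ranch home directory, "find . -type f | wc -l" gives the equivalent count to compare against the 50,000 file quota.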

Monitor your Disk Usage and File Counts

Users can check their current and historical Ranch usage by looking at the contents of the "HSM_usage" file in their Ranch home directory. Note that this file reflects DISK usage versus disk quota, for both total file size as well as total file count.

ranch$ tail ~/HSM_usage

This file is updated nightly as a convenience to the user. The data fields within this file show the files and storage in use both on-line and in the Ranch tape archives, as well as the quotas for each currently in effect. Each entry also shows the date and time of its update. Do not delete or edit this file.

Ranch Project Spaces

Ranch introduces new Project Spaces, a special directory structure designed to support both shared and oversized data directories for users or projects whose storage needs exceed the standard 2TB quota. Submit a support ticket to request a customized project space on Ranch.

Transferring Data

To maximize the efficiency of data archiving and retrieval for users, data should be transferred using large files. Small files don't do well on tapes, so they should be combined with others in a "tar" file wherever possible. The term "tar" is derived from (t)ape (ar)chive. Files that are very large (5 TB+) can also be a problem, since their contents can be split across multiple tapes, thereby increasing the chances that there will be problems retrieving the data. Use the UNIX split utility on very large files (1 TB+), and tar up small files into chunks between 10 GB and 300 GB in size. This will allow the archiver to work optimally.

Retrieving Files from Ranch

Since Ranch is an archive system, any files which have not been accessed recently will be stored on tape. To access files stored offline, they must be "staged" from tape, which is done automatically with tools like rsync and scp. We ask that you use the Unix tar command or another utility to bundle large numbers of small files together, before transferring to Ranch, for more efficient storage and retrieval on Ranch.

Ranch performs best on large files (10GB to 250GB). If you need a single file from a large tarball, it can easily be extracted without extracting the whole tarball. Due to the nature of the tapes that Ranch uses, it is quicker to read a single large file than it is to read multiple small files.
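Extracting one member from a tarball can be sketched as follows. The file and directory names are hypothetical, and the example builds a small tarball in a scratch directory so it can be run anywhere.

```shell
# Build a small demonstration tarball in a scratch directory.
work=$(mktemp -d)
mkdir -p "$work/results" "$work/restore"
echo "run 1" > "$work/results/run1.txt"
echo "run 2" > "$work/results/run2.txt"
tar -C "$work" -cf "$work/results.tar" results

# List the members to find the exact path stored in the archive.
tar -tf "$work/results.tar"

# Extract only one member; nothing else is written to the restore directory.
tar -C "$work/restore" -xf "$work/results.tar" results/run2.txt
```

The member name given to tar must match the path shown in the listing exactly, including any leading directory.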

Large numbers of small files are hard for our tape drives to read back from tape, since the drives need to start and stop for every file. Instead of reading steadily at 252MB/sec, a drive reading many tiny files slows to a crawl and may take a week to stage them back to disk, which occupies the drive and prevents other users from accessing their data.

Limit your scp processes to no more than four at a time.

Data Transfer Methods

TACC supports two transfer mechanisms: scp (recommended) and rsync (avoid if possible).

scp

The simplest way to transfer files to and from Ranch is to use the Secure Shell "scp" command:

stampede2$ scp myfile ${ARCHIVER}:${ARCHIVE}/myfilepath

where "myfile" is the name of the file to copy and "myfilepath" is the path to the archive on Ranch. For large numbers of files, we strongly recommend you employ the Unix "tar" command to create an archive of one or more directories before transferring the data to Ranch, or as part of the transfer process.

Alternatively, you can use ssh to create a "tar" archive of a directory directly on Ranch as part of the transfer:

stampede2$ tar cvf - dirname | ssh ${ARCHIVER} "cat > ${ARCHIVE}/mytarfile.tar"

where "dirname" is the path to the directory you want to archive, and "mytarfile.tar" is the name of the archive to be created on Ranch.

Note that when transferring to Ranch, the destination directory/ies must already exist. If not, scp will respond with:

No such file or directory
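One way to avoid this error is to create the destination directory over ssh before copying. The sketch below is a dry run that prints the commands instead of executing them. The fallback values, the directory project_a/run_01, the file results.tar, and the run helper are all hypothetical and for illustration only.

```shell
# Fallbacks are assumptions so the sketch also runs off TACC systems;
# on TACC resources ARCHIVER and ARCHIVE are already defined.
ARCHIVER=${ARCHIVER:-ranch.tacc.utexas.edu}
ARCHIVE=${ARCHIVE:-/path/to/your/archive}

# Hypothetical destination directory and file name.
dest_dir="${ARCHIVE}/project_a/run_01"
local_file="results.tar"

# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# 1. Create the destination directory, and any missing parents, on Ranch.
run ssh "$ARCHIVER" "mkdir -p ${dest_dir}"

# 2. Copy the file into the directory that now exists.
run scp "$local_file" "${ARCHIVER}:${dest_dir}/"
```

Setting DRY_RUN=0 would execute the two commands for real, which requires a valid login on Ranch.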

The following command-line examples demonstrate how to transfer files to and from Ranch using scp.

  • copy "mynewfile" from Stampede2 to Ranch:

    stampede2$ scp mynewfile ${ARCHIVER}:${ARCHIVE}/mynewfilename

  • copy "myoldfile" from Ranch to the current directory on Stampede2:

    stampede2$ scp ${ARCHIVER}:${ARCHIVE}/myoldfile .

rsync

The UNIX rsync command is another way to keep archives up-to-date. Rather than transferring entire files, rsync transfers only the actual changed parts of a file. This method has the advantage over the scp command in that it can recover if there is an error in the transfer. Enter "rsync -h" for detailed help on this command.

A huge downside to rsync, however, is that it must stage data before it can start the sync, which can lead to a lot of unnecessary staging calls and waste resources. In general, it is a bad idea to rsync a whole directory, and it is horrible for archiving data with a tape-based archive system like ours.

On the new Quantum StorNext filesystem, data will stay on the front end disk for significantly longer than it did with the previous system, due to a much larger front end disk system, which means that data that has recently been sent to Ranch can safely be rsync'ed. If the data has been on the system for a significant time (around a month, but we will tune that variable over time), it may have migrated to tape, and will still cause the same problems as it did on the old Oracle HSM system.

Large Data Transfers

If you are moving a very large amount of data to Ranch and you encounter a quota limit error, then you are bumping into the limit of data you can have on Ranch's cache system. There are limits on cache inode usage and disk block usage, but these limits should only affect a few very heavy users and do not affect a user's total allocation on the Ranch archival system. If you encounter a quota error, please submit a ticket to the TACC user portal, and we will work with you to make sure your data is transferred as efficiently as possible. The limits exist merely to prevent the system from becoming unexpectedly overloaded, and thus to maintain good service for all users.

Use the "du -h" command to see how much data you have on the disk.

Archive a large directory with tar, and move it to Ranch, while splitting it into smaller parts. e.g.:

stampede2$ tar -cvf - /directory/ | ssh ranch.tacc.utexas.edu \
    'cd /your_ranch_path/ && split -b 1024m - files.tar.'
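A tar stream that was split this way is restored by concatenating the pieces in order and feeding the result back to tar. The sketch below is a local stand-in for the pipeline above: it uses a scratch directory, tiny files, and a 4096-byte piece size purely so it can be run anywhere, and all names are hypothetical.

```shell
# Scratch layout: src holds the data, parts the pieces, restore the result.
work=$(mktemp -d)
mkdir -p "$work/src/data" "$work/parts" "$work/restore"
for i in 1 2 3; do
    head -c 2000 /dev/urandom > "$work/src/data/file$i.bin"
done

# Stream a tar archive into split, as in the Ranch pipeline but locally.
tar -C "$work/src" -cf - data | (cd "$work/parts" && split -b 4096 - files.tar.)

# Reassemble: the shell expands files.tar.* in sorted order (aa, ab, ...).
cat "$work"/parts/files.tar.* | tar -C "$work/restore" -xf -

# Confirm the restored tree matches the original.
stream_ok=no
diff -r "$work/src/data" "$work/restore/data" && stream_ok=yes
echo "restored correctly: $stream_ok"
```

Every piece is needed to rebuild the archive, so keep all pieces of one archive together in the same directory on Ranch.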

Alternatively, you can split large output files, or tar files, on the Stampede2 side, then move them to Ranch.

Large files, more than a few TB in size, should be split into chunks, preferably between 10GB and 500GB in size. Use the split command on Stampede2 to accomplish this:

stampede2$ split -b 300G myverybigfile.tar my_file_part_

The above example will create several 300GB files, with the filenames: my_file_part_aa, my_file_part_ab, my_file_part_ac, etc.

The split parts of a file can be joined together again with the "cat" command.

stampede2$ cat my_file_part_?? > myverybigfile.tar

See "man split" for more options.
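Before deleting the original of a split file, it can be worth confirming that the parts rejoin to identical content. A minimal sketch using the POSIX cksum utility follows; the 2500-byte file and 1000-byte part size are stand-ins chosen so the example runs quickly, and the file names mirror the examples above.

```shell
# Create a small stand-in for a very large file.
work=$(mktemp -d)
head -c 2500 /dev/urandom > "$work/myverybigfile.tar"

# Split into fixed-size parts: my_file_part_aa, my_file_part_ab, ...
split -b 1000 "$work/myverybigfile.tar" "$work/my_file_part_"
part_count=$(ls "$work"/my_file_part_?? | wc -l | tr -d ' ')

# Rejoin the parts and compare checksums of the original and the copy.
cat "$work"/my_file_part_?? > "$work/rejoined.tar"
sum_before=$(cksum < "$work/myverybigfile.tar")
sum_after=$(cksum < "$work/rejoined.tar")

if [ "$sum_before" = "$sum_after" ]; then
    echo "parts rejoin cleanly ($part_count parts)"
else
    echo "checksum mismatch" >&2
fi
```

Recording the checksum of the original alongside the parts also allows the same comparison to be made after the data is later retrieved from Ranch.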

Large collections of small files must be bundled into a tar archive (a "tarball") before being sent to Ranch. Even better, create the tar file en route to Ranch, so that there is no temporary tar file on the source filesystem.

The following example will create an archive of the my_small_files_directory in the current working directory:

stampede2$ tar -cvf my_data.tar my_small_files_directory/

Citizenship on Ranch

  • Limit rsync and scp processes to no more than two processes.
  • Follow the procedures for archiving data
  • Store only data that was processed, or generated, on TACC's systems
  • Delete all unneeded data under your account
  • No workstation or other system backups

Help Desk

TACC Consulting operates from 8am to 5pm CST, Monday through Friday, except for holidays. You can submit a help desk ticket at any time via the TACC User Portal with "Ranch" in the Resource field. Help the consulting staff help you by following these best practices when submitting tickets.

  • Do your homework before submitting a help desk ticket. What do the user guide and other documentation say? Search the internet for key phrases in your error logs; that's probably what the consultants answering your ticket are going to do. What have you changed since the last time your job succeeded?

  • Describe your issue as precisely and completely as you can: what you did, what happened, verbatim error messages, other meaningful output. When appropriate, include the information a consultant would need to find your artifacts and understand your workflow: e.g. the directory containing your build and/or job script; the modules you were using; relevant job numbers; and recent changes in your workflow that could affect or explain the behavior you're observing.

  • Subscribe to Ranch User News. This is the best way to keep abreast of maintenance schedules, system outages, and other general interest items.

  • Have realistic expectations. Consultants can address system issues and answer questions about Ranch. But they can't teach parallel programming in a ticket, and may know nothing about the package you downloaded. They may offer general advice that will help you build, debug, optimize, or modify your code, but you shouldn't expect them to do these things for you.

  • Be patient. It may take a business day for a consultant to get back to you, especially if your issue is complex. It might take an exchange or two before you and the consultant are on the same page. If the admins disable your account, it's not punitive. When the file system is in danger of crashing, or a login node hangs, they don't have time to notify you before taking action.