GenesisII GFFS

Genesis II GFFS, the Global Federated File System, allows XSEDE users to securely and transparently share data and other resources using the familiar Unix-like file system paradigm. Assets stored in the GFFS can be accessed from anywhere with a GFFS client, using either a command-line interface (CLI) or a graphical user interface (GUI). Through GFFS, researchers can access data at an NSF center from home or campus, access data on a campus machine from an NSF center, and directly share data with a collaborator at another institution.

User applications running on campus and research-group resources can directly access (create, read, update, delete) files and other resources at NSF-funded service provider (SP) sites and collaborator sites as if they were part of the local file system. Similarly, applications running at the service providers can directly access files on campus or research-group resources as if they were located at the center. Existing applications, whether they are statically linked binaries, dynamically linked binaries, or scripts (shell, Perl, Python), can access resources anywhere in the GFFS without modification (subject to access control).

Q: That's nice, but can't I just move files using ssh/sftp/scp or Globus Online?

A: Yes, but those are simply file transfer tools. With GFFS, you copy the file into (or out of) the GFFS itself: the file is present in the GFFS and persists there (it is automatically backed up and can be replicated if desired). You can think of GFFS as more like an archival service where you can store things to be retrieved later or shared with colleagues in XSEDE. Unlike a traditional archive system that operates with proprietary software installed at a single site, you can access GFFS from anywhere you install the client (MacOS, Windows, Linux).

Q: Well, I like Google Drive / Dropbox / Apple's iCloud

A: But can your XSEDE collaborators access your files securely with their XSEDE authentication credentials, or are you just sharing links and trusting the vendors to get their security right? Recall the recent hacks in the news. Those services also impose file quotas; with GFFS, if you need more storage we can accommodate that. XSEDE Service Providers will also be linking some of their local site storage into the GFFS, so that you will have (for example) another path to a site's parallel scratch file system via GFFS.

Please email help@xsede.org to request access to GFFS, and specify whether you want replication enabled on your home directory.

GFFS Software

The GFFS consists of two pieces of software: the GFFS Client and the GFFS Container.

  • The GFFS Client allows XSEDE researchers to authenticate; access and share remote resources; transfer files; start and manage jobs; and create and maintain group permissions. Three access methods are provided:

    1. Client-UI, the Graphical User Interface (GUI) component of the GFFS client
    2. the Grid Shell, a Command-Line Interface (CLI) component
    3. the FUSE file system driver for Linux, which presents remote resources as if they were directly attached to the Linux machine (a minimal mount sketch follows this list)

    The GUI clients for most desktop operating systems are available for download from the GenesisII project site.

  • The GFFS Container executes on servers called Grid Interface Units (GIUs). GFFS containers provide remote clients access to local resources accessible from the GIU. These resources typically include file system resources such as a directory tree rooted in a user's local home directory, storage resources for file storage, and compute resources such as locally managed queuing systems or the CPUs on the GIU itself. GFFS containers are configured to work at all XSEDE Service Providers (SPs), allowing users who have installed the GFFS client to access, manipulate, and move data to and from their clients as well as execute and manage jobs.
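
As a minimal sketch of the FUSE method (assuming a Linux client with FUSE installed; the mount-point path and the backgrounded one-shot invocation are illustrative, not exact syntax):

login1$ mkdir ~/gffs                       # arbitrary local mount point (assumption)
login1$ grid fuse --mount local:~/gffs &   # run the mount in the background
login1$ ls ~/gffs/home/xsede.org/          # GFFS paths now look like local files

Once mounted, ordinary programs and scripts can read and write GFFS paths as if they were local files, which is what makes unmodified binaries work.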

Using GFFS

GFFS at XSEDE is currently available on the Mason cluster and can be loaded using the standard Modules environment manager commands:

login1$ module load genesis-ii

Use your XSEDE User Portal (XUP) username/password to authenticate with the GFFS GIU servers and the XSEDE grid. Here's a command line example on the Mason cluster:

login1$ grid
grid:$> xsedeLogin
************
* Username *
************
Hint: You may enter "Cancel" to cancel this selection.
Please enter username: arnoldg
Password for arnoldg:
Replacing client tool identity with credentials for "CN=Galen Arnold, O=National Center for Supercomputing Applications, C=US".
grid:$> whoami
Client Tool Identity:
   (CONNECTION) "Galen Arnold"
Additional Credentials:
   (USER) "arnoldg" -> (CONNECTION) "Galen Arnold"
   (GROUP) "gffs-users" -> (CONNECTION) "Galen Arnold"
grid:$>

After installing the client, you can start the GUI in one of two ways:

  • From a command line, start the grid client, and then start client-ui

    grid:$> client-ui
  • Double-click the client-ui application in the folder where Genesis II GFFS was installed.

Examples

1. Saving and retrieving files to and from the GFFS

This example demonstrates storing a folder, "gen2documentation", in the GFFS via the GUI, and then retrieving a file from it via the CLI client on IU's Mason cluster.


Figure 1. Copying a file from the local "`Downloads/`" folder to GFFS space

After the downloaded document is dragged from the local "Downloads" folder into the GFFS home directory under "/home/xsede.org/XUPusername/", a transfer progress monitor appears in the GFFS GUI. When the transfer completes, the file is in the GFFS XSEDE grid, where it is accessible to any GFFS clients the user may be running on other systems. The data can also be replicated once it is in the GFFS, so the user can still retrieve the file even if a server is offline.

Later, on Mason, if the user decides to retrieve the document to a local folder on the system, it's a simple cp command from the GFFS CLI:

login1$ grid
grid:$> cd gen2documentation
grid:$> ls
gen2documentation:
GenesisII.rtf
download-url
gen2-convert-ex
grid:$> cp grid:./GenesisII.rtf local:./GenesisII.rtf
grid:$> quit
login1$ ls -l
total 54720  
-rw-r--r-- 1 tg-arno teragrid 27998675 Dec 19 12:07 GenesisII.rtf  
login1$

2. Process a folder of files

Here we demonstrate how to convert a folder of color JPG files to black and white negatives in PNG format. The steps are as follows:

  1. Copy folder into the GFFS

    Copying files into the GFFS grid is a simple drag-and-drop affair. In the example below, a folder was dragged from the desktop into the GFFS Browser, targeting the user's home directory there (browse to "/home/xsede.org/"). To copy out to your local machine, right-click items in the browser and select "Copy To Local File System From GFFS".


    Figure 2. Copying a folder into the GFFS GUI

    Once the data is stored in the GFFS, the user may access it from anywhere the client is installed; it is similar to cloud storage in that respect. XSEDE GFFS servers are redundant, support full auto-failover via replication, and are backed up periodically, so you can get to your replicated data even if one of them is down for maintenance.
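
    The same copy can be done from the grid shell on any machine where the client is installed. A minimal sketch, assuming a local folder named "photos" and your own XUP username in place of XUPusername (the -r recursive flag follows Unix cp and is an assumption here):

    login1$ grid
    grid:$> cp -r local:./photos grid:/home/xsede.org/XUPusername/photos
    grid:$> quit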

  2. Determine resources available to run jobs

    Users can also run jobs on XSEDE resources that have been defined in the GFFS. Navigate to the available systems in the browser ("/resources/xsede.org") and look for the queues associated with each system. Any system with a defined Basic Execution Service (BES) queue is available for running jobs with the GFFS client, provided the user has an account on the target system.


    Figure 3. Browsing the resources under xsede.org
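
    The same resources can be listed from the grid shell; the entries returned will depend on the current grid namespace:

    login1$ grid
    grid:$> ls /resources/xsede.org
    grid:$> quit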

  3. Create a job

    Select the Jobs menu or right-click the resource and choose "Create job", and the Grid Job Tool pane appears.


    Figure 4. Grid Job Tool initial view, Basic Job Info. tab

    For this task we've chosen to time each execution, so the Executable field contains "/usr/bin/time". The "/usr/bin/convert" executable and its arguments are specified in the Arguments area. The final arguments to convert are the input file and the output file; the two "-" entries (Arguments 4 and 5) indicate that the program will read from stdin and write to stdout.
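
    For reference, one plausible standalone equivalent of this job's command line is sketched below; the -negate and -colorspace Gray flags are an illustrative guess at the actual convert arguments, and ImageMagick may need an explicit png:- output argument to force PNG on stdout:

    login1$ # flags below are a guess, not the job's exact arguments
    login1$ /usr/bin/time /usr/bin/convert -negate -colorspace Gray - png:- < photo0001.jpg > photo0001.png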


    Figure 5. Grid Job Tool, Data tab

    Fill in the Data tab so that all of the .jpg files from the input directory are processed and returned to our output directory (both of which are in the GFFS namespace). Each of the names in the top portion, under Stream Redirection, needs to match a defined Input Stage or Output Stage in the Data Staging area below. Specifying grid as the Transfer Protocol means the files are staged within the GFFS.

    Note the use of "${photonum}" as an argument, indicating a parameter sweep job. The "Grid Job Variables" section defines the behavior of our variables. For this example, "${photonum}" numbers the input, output, and error streams to match the original camera numbering. Using a grid variable in this way, to submit multiple jobs that stage from and to different files based on the same JSDL file, is called a "parameter sweep". Each "errtime${photonum}.txt" file will contain the output from the time command, logging the processing time in case the information is needed later. The Standard Input and Standard Output streams will be processed by "/usr/bin/convert" because of the trailing "-" arguments we previously placed in the Basic Job Info tab.
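
    As an illustration of how the sweep expands (the four-digit numbering is an assumption based on typical camera file names), a sweep of "${photonum}" over 0001-0003 would produce three independent jobs with stagings like:

    stdin: photo0001.jpg  ->  stdout: photo0001.png   stderr: errtime0001.txt
    stdin: photo0002.jpg  ->  stdout: photo0002.png   stderr: errtime0002.txt
    stdin: photo0003.jpg  ->  stdout: photo0003.png   stderr: errtime0003.txt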

  4. Save the job for later re-use.

    After the Data tab is filled out, you may save the job project so that you can run the same job later, or re-load and modify it for further work. When you're satisfied with the job details and no warnings are shown, select "Submit job" under the Job Tool's File menu. To see the status of a queue (and your jobs in it), select the queue and choose "Queue manager" from the File menu, or right-click the queue and choose that option.


    Figure 6. Queue Manager view.

    Jobs in a parameter sweep appear as independent jobs in the batch system of the target machine. Once they complete, you can find the output in the GFFS namespace.
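
    The output can also be retrieved with the CLI instead of the GUI; a sketch, assuming the output directory and file names used above:

    login1$ grid
    grid:$> cd /home/xsede.org/XUPusername/output
    grid:$> cp grid:./photo0001.png local:./photo0001.png
    grid:$> quit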


    Figure 7. GFFS GUI browsing user's home directory under "`/home/xsede.org/xupusername`"

    Double-clicking a file will open it locally (you'll be prompted to choose an application if the GenesisII client does not find anything appropriate).


    Figure 8. Before and after processing

Last update: February 27, 2015