XSEDE ALLOCATION REQUESTS Open Submission, Guidelines, Resource and Policy Changes

Posted by Ken Hackworth on 10/06/2020 00:59 UTC

XSEDE ALLOCATION REQUESTS Open Submission available until October 15, 2020 for awards starting January 1, 2021

XSEDE is now accepting Research Allocation Requests for the allocation period January 1, 2021 to December 31, 2021. The submission period runs from September 15, 2020 through October 15, 2020. The XRAC panel will convene December 7, 2020, with notifications sent by December 15, 2020. Please review the new XSEDE systems and important policy changes (see below) before you submit your allocation request through the XSEDE User Portal.
More information on submitting an XSEDE Research request can be found at https://portal.xsede.org/allocations/research, with links to details of the required documents, examples of well-written Research Requests, review criteria and guidelines, and much other useful information on submitting a successful Research request.

New Resources Available: PSC’s Bridges-2 and SDSC’s Expanse

PSC’s Bridges-2 platform will address the needs of rapidly evolving research by combining high-performance computing (HPC), high-performance artificial intelligence (HPAI), and high-performance data analytics (HPDA) with a user environment that prioritizes researcher productivity and ease of use.

  • Hardware highlights of Bridges-2 include HPC nodes with 128 cores and 256 to 512GB of RAM, scalable AI with 8 NVIDIA Tesla V100-32GB SXM2 GPUs per accelerated node and dual-rail HDR-200 InfiniBand between GPU nodes, a high-bandwidth, tiered data management system to support data-driven discovery and community data, and dedicated database and web servers to support persistent databases and domain-specific portals (science gateways).
  • User environment highlights include interactive access to all node types for development and data analytics; Anaconda support and optimized containers for TensorFlow, PyTorch, and other popular frameworks; and support for high-productivity languages such as Jupyter notebooks, Python, R, and MATLAB including browser-based (OnDemand) use of Jupyter, Python, and RStudio. A large collection of applications, libraries, and tools will make it often unnecessary for users to install software, and when users would like to install other applications, they can do so independently or with PSC assistance. Novices and experts alike can access compute resources ranging from 1 to 64,512 cores, up to 192 V100-32GB GPUs, and up to 4TB of shared memory.
  • Bridges-2 will support community datasets and associated tools, or Big Data as a Service (BDaaS), recognizing that democratizing access to data opens the door to unbiased participation in research. Similarly, Bridges-2 is available to support courses at the graduate, undergraduate, and even high school levels. It is also well-suited to interfacing to other data-intensive projects, instruments, and infrastructure.
  • Bridges-2 will contain three types of nodes: Regular Memory (RM), Extreme Memory (EM), and GPU. These are described in turn below.
  • Bridges-2 Regular Memory (RM) nodes will provide extremely powerful general-purpose computing, machine learning and data analytics, AI inferencing, and pre- and post-processing. Each of Bridges-2’s 504 RM nodes will consist of two AMD 7742 “Rome” CPUs (64 cores, 2.25-3.4 GHz, 3.48 Tf/s peak), 256-512 GB of RAM, 3.84 TB NVMe SSD, and one HDR-200 InfiniBand adaptor. 488 Bridges-2 RM nodes have 256 GB RAM, and 16 have 512 GB RAM for more memory-intensive applications. Bridges-2 RM nodes will be HPE Apollo 2000 Gen11 servers.
  • Bridges-2 Extreme Memory (EM) nodes will provide 4 TB of shared memory for genome sequence assembly, graph analytics, statistics, and other applications that need a large amount of memory and for which distributed-memory implementations are not available. Each of Bridges-2’s 4 EM nodes will consist of four Intel Xeon Platinum 8260M CPUs, 4 TB of DDR4-2933 RAM, 7.68 TB NVMe SSD, and one HDR-200 InfiniBand adaptor. Bridges-2 EM nodes will be HPE ProLiant DL385 Gen10+ servers.
  • Bridges-2 GPU (GPU) nodes will be optimized for scalable artificial intelligence (AI). Each of Bridges-2’s 24 GPU nodes will contain 8 NVIDIA Tesla V100-32GB SXM2 GPUs, providing 40,960 CUDA cores and 5,120 tensor cores. In addition, each GPU node will contain two Intel Xeon Gold 6248 CPUs, 512 GB of DDR4-2933 RAM, 7.68 TB NVMe SSD, and two HDR-200 adaptors. Their 400 Gbps connection will enhance scalability of deep learning training across up to 192 GPUs. The GPU nodes can also be used for other applications that make effective use of the V100 GPUs’ tensor cores. Bridges-2 GPU nodes will be HPE Apollo 6500 Gen10 servers.
  • The Bridges-2 Ocean data management system will provide a unified, high-performance filesystem for active project data, archive, and resilience. Ocean will consist of two tiers – disk and tape – transparently managed by HPE DMF (Data Management Framework) as a single, highly usable namespace, and a third all-flash tier will accelerate AI and genomics. Ocean’s disk subsystem, for active project data, is a high-performance, internally resilient Lustre parallel filesystem with 15 PB of usable capacity, configured to deliver up to 129 GB/s and 142 GB/s of read and write bandwidth, respectively. Its flash tier will provide 9M IOPS and an additional 100 GB/s. The disk and flash tiers will be implemented as HPE ClusterStor E1000 systems. Ocean’s tape subsystem, for archive and additional resilience, is a high-performance tape library with 7.2 PB of uncompressed capacity (estimated 8.6 PB compressed, with compression done transparently in hardware with no performance overhead), configured to deliver 50 TB/hour. The tape subsystem will be an HPE StoreEver MSL6480 tape library, using LTO-8 Type M cartridges. (The tape library is modular and can be expanded, if necessary, for specific projects.)
  • Bridges-2, including both its compute nodes and its Ocean data management system, is internally interconnected by HDR-200 InfiniBand in a fat-tree Clos topology. Bridges-2 RM and EM nodes each have one HDR-200 link (200 Gbps), and Bridges-2 GPU nodes each have two HDR-200 links (400 Gbps) to support acceleration of deep learning training across multiple GPU nodes.
  • Bridges-2 will be federated with Neocortex, an innovative system also at PSC that will provide revolutionary deep learning capability, accelerating training by orders of magnitude. This will complement the GPU-enabled scalable AI available on Bridges-2 and provide transformative AI capability for data analysis and to augment simulation and modeling.
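The per-node figures in the bullets above are consistent with the system-wide totals quoted earlier (64,512 cores and up to 192 V100-32GB GPUs). A quick back-of-the-envelope check, using only numbers stated in this announcement:

```python
# Bridges-2 totals derived from the per-node figures above.
RM_NODES, CORES_PER_RM = 504, 2 * 64      # 504 RM nodes, two 64-core AMD 7742 CPUs each
GPU_NODES, GPUS_PER_NODE = 24, 8          # 24 GPU nodes, eight V100-32GB GPUs each
CUDA_PER_V100, TENSOR_PER_V100 = 5120, 640  # per-GPU core counts for the V100

print(RM_NODES * CORES_PER_RM)            # 64512 cores: the stated maximum job size
print(GPU_NODES * GPUS_PER_NODE)          # 192 V100 GPUs system-wide
print(GPUS_PER_NODE * CUDA_PER_V100)      # 40960 CUDA cores per GPU node
print(GPUS_PER_NODE * TENSOR_PER_V100)    # 5120 tensor cores per GPU node
```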

More information about the Bridges-2 resource can be found at https://www.psc.edu/bridges-2.

SDSC is pleased to announce its newest supercomputer, Expanse. Expanse will be a Dell integrated cluster composed of compute nodes with AMD Rome processors and GPU nodes with NVIDIA V100 GPUs (with NVLINK), interconnected with Mellanox HDR InfiniBand in a hybrid fat-tree topology. The Expanse supercomputer will provide three new resources for allocation. Limits noted below are subject to change, so consult the Expanse website (https://expanse.sdsc.edu) for the most up-to-date information.

  • (1) Expanse Compute: The compute portion of Expanse features AMD Rome processors, interconnected with Mellanox HDR InfiniBand in a hybrid fat-tree topology. There are 728 compute nodes, each with two 64-core AMD EPYC 7742 (Rome) processors, for a total of 93,184 cores in the full system. Each compute node features 1 TB of NVMe storage, 256 GB of DRAM, and PCIe Gen4 interfaces. Full bisection bandwidth will be available at the rack level (56 nodes) with HDR100 connectivity to each node. HDR200 switches are used at the rack level and are configured for a 3:1 over-subscription between racks. In addition, Expanse has four 2 TB large-memory nodes. There are two allocation request limits for the Expanse Compute resource: 1) a maximum request limit of 15M SUs, except for Science Gateway requests, which may request larger amounts (up to 30M SUs); and 2) a maximum job size of 4,096 cores, with higher core counts possible by special request.
  • (2) Expanse GPU: The GPU component of Expanse has 52 GPU nodes each containing four NVIDIA V100s (32 GB SMX2), connected via NVLINK, and dual 20-core Intel Xeon 6248 CPUs. Each GPU node has 1.6TB of NVMe storage and 256GB of DRAM per node, and HDR100 connectivity.
  • (3) Expanse Projects Storage: Lustre-based allocated storage will be available as part of an allocation request. The filesystem will be available on both the Expanse Compute and GPU resources. Storage resources, as with compute resources, must be requested and justified, both in the XRAS application and the proposal’s main document.
  • Expanse will feature two new innovations: 1) scheduler-based integration with public cloud resources; and 2) composable systems, which supports workflows that combine Expanse with external resources such as edge devices, data sources, and high-performance networks.
  • Since the Expanse AMD Rome CPUs are not yet available for benchmarking, PIs are requested to use Comet (or a comparable system) performance/scaling information in their benchmarking and scaling section. For the Expanse GPU nodes, PIs can use performance information for V100 GPUs (if available) or assume a 1.3x speedup over Comet P100 GPU (or comparable GPU) performance as a conservative estimate. The time requested must be in V100 GPU hours.
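The V100 conversion above amounts to dividing measured P100 GPU-hours by the assumed 1.3x speedup. A minimal sketch of that arithmetic (the workload numbers here are hypothetical placeholders, not from this announcement):

```python
# Converting Comet P100 benchmark results into an Expanse V100 GPU-hour request,
# using the conservative 1.3x V100-over-P100 speedup suggested above.
P100_SPEEDUP = 1.3            # assumed V100 speedup over Comet P100
p100_hours_per_run = 120.0    # hypothetical: P100 GPU-hours measured per run on Comet
runs_planned = 200            # hypothetical: production runs planned for the year

v100_hours = runs_planned * p100_hours_per_run / P100_SPEEDUP
print(round(v100_hours))      # 18462 V100 GPU-hours to request
```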

PIs requesting allocations should consult the Expanse website (https://expanse.sdsc.edu) for additional details and the most current information.

Estimated Available Service Units/GB for upcoming meeting:
Indiana University/TACC (Jetstream) 16,000,000
Open Science Grid (OSG) 2,000,000
PSC Bridges-2 Regular Memory (Bridges-2 RM) 127,000,000
PSC Bridges-2 Extreme Memory (Bridges-2 EM) 757,000
PSC Bridges-2 GPU (Bridges-2 GPU) 189,000
PSC Bridges-2 AI (Bridges GPU-AI) 363,000
PSC Bridges-2 Storage (Bridges-2 Ocean) TBD
SDSC Dell Cluster with Intel Haswell Processors (Comet) 20,000,000
SDSC Dell Cluster with Intel Haswell Processors (Comet GPU) 100,000
SDSC Medium-term disk storage (Data Oasis) 300,000
SDSC Dell Cluster with AMD Rome HDR IB (Expanse) 175,000,000
SDSC Dell Cluster with NVIDIA V100 GPUs NVLINK and HDR IB (Expanse GPU) 300,000
SDSC Expanse Projects Storage TBD
TACC Dell/Intel Knights Landing System (Stampede2) 10,000,000 node hours
TACC Long-term tape Archival Storage (Ranch) 2,000,000

  • Publications that have resulted from the use of XSEDE resources should be entered into your XSEDE portal profile which you will be able to attach to your Research submission. Please cite XSEDE in all publications that utilized XSEDE resources. See https://www.xsede.org/for-users/acknowledgement
  • After the Panel Discussion at the XRAC meeting, the total Recommended Allocation is determined and compared to the total Available Allocation across all resources. Transfers of allocations may be made for projects that are more suitable for execution on other resources; transfers may also be made for projects that can take advantage of other resources, thereby balancing the load. When the total Recommended Allocation considerably exceeds the total Available Allocation, a reconciliation process adjusts all Recommended Allocations to remove over-subscription. This adjustment process reduces large allocations more than small ones and gives preference to NSF-funded projects or project portions. Under the direction of NSF, additional adjustments may be made to achieve a balanced portfolio of awards across diverse communities, geographic areas, and scientific domains.

If you would like to discuss your plans for submitting a research request, please send email to the XSEDE Help Desk at help@xsede.org. Your questions will be forwarded to the appropriate XSEDE staff for assistance.

Ken Hackworth
XSEDE Resource Allocations Manager