Course Calendar

XSEDE Training: Joint HPCToolkit and PerfExpert Tutorials

Host Site:

Texas Advanced Computing Center

Host site URL:

Joint HPCToolkit and PerfExpert Tutorials

May 17, 2012
9 a.m. to 4 p.m. (CT)
J.J. Pickle Research Campus
ROC 1.603
10100 Burnet Rd.
Austin, TX 78758

These tutorials will be webcast.

Webcast participants should expect limited remote support. To participate in the hands-on exercises, all participants (in-person and webcast) MUST have an ACTIVE ACCOUNT and ALLOCATION on either Ranger or Lonestar. Participants are strongly encouraged to use the tools on their OWN CODE; for this to work, the code MUST COMPILE AND RUN successfully on Ranger or Lonestar. All participants (in-person and webcast) will also have access to a simple demonstration code on which to try the tools. Participants who intend to use the demonstration code still need an active account and allocation on either Ranger or Lonestar.

1st tutorial: (9 a.m. – noon CT)

Gaining Insight into Parallel Program Performance using HPCToolkit

John Mellor-Crummey, Rice University

HPCToolkit is an integrated suite of multi-platform tools that supports measurement, analysis, attribution, and presentation of application performance for scalable parallel programs. HPCToolkit employs asynchronous sampling, which can pinpoint performance bottlenecks without any preconceived hypothesis about their nature or location. Somewhat surprisingly, asynchronous sampling can provide deep insight into a wide range of performance losses in parallel programs. This talk will show how HPCToolkit can be used to pinpoint and quantify scalability bottlenecks within and across nodes in parallel systems. It will demonstrate how to use HPCToolkit’s code-centric views to attribute costs to application source code, and HPCToolkit’s time-centric user interface to understand how a parallel program execution unfolds over time.

For more information on HPCToolkit, please visit:

2nd tutorial: (1 p.m. – 4 p.m. CT)

PerfExpert: REALLY SIMPLE Performance Optimization for Multicore Chips

Martin Burtscher, Texas State University
Ashay Rane and James Browne, The University of Texas at Austin

The environment for almost all parallel computations includes multicore chips, but performance optimization for multicore chips is a notoriously complex and knowledge-intensive task. Performance optimization has four stages: measurement, diagnosis of bottlenecks, determination of optimizations, and rewriting of the source code. Each stage must be completed successfully before the next stage can begin. The PerfExpert tool automates most of the first three stages of performance optimization for execution on multicore chips.

This workshop will introduce and apply PerfExpert. PerfExpert requires no expertise in performance analysis or measurement. The analysis process works directly on the production program without annotations or modifications. PerfExpert also recommends optimizations for the user’s program based on the analysis output. The goal is that, at the end of the workshop, each participant will leave with a version of her/his favorite application that is optimized for execution on Ranger or Lonestar and the ability to apply PerfExpert independently to other programs.

For more information on PerfExpert, please visit:



05/17/2012 09:00 - 05/17/2012 16:00 CDT (SESSION HAS ENDED)
Registration CLOSED
Registration open date: 04/30/2012 15:52 CDT
Registration close date: 05/09/2012 16:00 CDT
Class size restriction: 22 registrants (0 spots left)



Contact Information
Bob Garza
Texas Advanced Computing Center
J.J. Pickle Research Campus
10100 Burnet Rd., ROC 1.603
Austin, TX 78758
Posted: 04/30/2012 19:02 UTC