VSCSE Data Intensive Summer School

06/30/2014 10:00 - 07/02/2014 16:00 CDT
in person (University of Illinois at Urbana-Champaign)
Registration
Registration open date
04/07/2014 09:00 CDT
Registration close date
06/27/2014 09:00 CDT
Class size restriction
45 registrants

(0 spots left)

Waitlist

3 registrants

Contact Information
Contact
Scott Lathrop
Contact phone
217-714-2517
Contact email
lathrop@illinois.edu
Location
Name
University of Illinois at Urbana-Champaign
Address
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign, 1205 W. Clark St., Room 1030
Urbana, IL 61801
Phone
217-714-2517
URL
http://www.vscse.org

Data Intensive Summer School
June 30 – July 2, 2014
11:00 a.m. - 5:00 p.m. EDT

The Data Intensive Summer School focuses on the skills needed to manage, process and gain insight from large amounts of data. It is aimed at researchers from the physical, biological, economic and social sciences who are beginning to drown in data. We will cover the nuts and bolts of data intensive computing, common tools and software, predictive analytics algorithms, data management and visualization. Given the short duration of the summer school, the emphasis will be on providing a solid foundation that attendees can use as a starting point for advanced topics of particular relevance to their work.

Prerequisites

  • Experience working in a Linux environment
  • Familiarity with the relational database model
  • Examples and assignments will most likely use R, MATLAB and Weka. We do not require experience in these languages or tools, but you should already have an understanding of basic programming concepts (loops, conditionals, functions, arrays, variables, scoping, etc.)

Organizer

  • Robert Sinkovits, San Diego Supercomputer Center

Topics (tentative)

  • Nuts and bolts of data intensive computing
    • Computer hardware, storage devices and file systems
    • Cloud storage
    • Data compression
    • Networking and data movement
  • Data Management
    • Digital libraries and archives
    • Data management plans
    • Access control, integrity and provenance
  • Introduction to R programming
  • Introduction to Weka
  • Predictive analytics
    • Standard algorithms: k-means clustering, decision trees, SVM
    • Over-fitting and trusting results
  • Dealing with missing data
  • ETL (Extract, transform and load)
    • The ETL life cycle
    • ETL tools – from scripts to commercial solutions
  • Non-relational databases
    • Brief refresher on the relational model
    • Survey of non-relational models and technologies
  • Visualization
    • Presentation of data for maximum insight
    • R and the ggplot package

Virtual Summer School courses are delivered simultaneously at multiple locations across the country using high-definition videoconferencing technology. Please check below for available site locations.

This session has ended.
Posted: 02/25/2014 20:01 UTC