<!DOCTYPE html>

<!--
  Google HTML5 slide template

  Authors: Luke Mahé (code)
           Marcin Wichary (code and design)
           
           Dominic Mazzoni (browser compatibility)
           Charles Chen (ChromeVox support)

  URL: http://code.google.com/p/html5slides/
-->

<html>
  <head>
    <title>Presentation</title>

    <meta charset='utf-8'>
    <!--<script src='http://html5slides.googlecode.com/svn/trunk/slides.js'></script>-->
    <script src='slides.js'></script>
  </head>
  
  <style>
    /* Your individual styles here, or just use inline styles if that’s
       what you want. */
    
    
  </style>

  <body style='display: none'>

    <section class='slides layout-regular template-default'>
      
      <!-- Your slides (<article>s) go here. Delete or comment out the
           slides below. -->
        
        
      
      <article>
        <h1>
          Large Data: Analysis &amp; Viz
        </h1>
        <h2>
          (a starting point for discussion)
        </h2>
        <p>
          Matthew Turk
          <br>
          August 8, 2011
        </p>
      </article>
      <article>
        <p>Who am I?</p>
        <p>NSF OCI Postdoctoral Fellow, working on first stars and galaxies.
        Developer of the <span
          style='font-family:"Inconsolata",monospace;'>yt</span> viz &amp; analysis tool (<a class="url"
          href="http://yt.enzotools.org/">yt.enzotools.org</a>) and of the enzo
        simulation code (<a
        class="url"
        href="http://enzo.googlecode.com">enzo.googlecode.com</a>).</p>
      </article>
      <article>
        <qnq>
          What are the essentials for large data analysis <i>today</i>?
        </qnq>
      </article>
      <article>
        <h2>Meaningful sub-selection of data</h2>
        <div class="build">
          <p>Selecting regions for subsequent analysis based on their
          characteristics or their location in context of the broader
          simulation.</p>
          <p>This can be halos, geometric regions, overdensity isocontours, and
          so on.</p>
        </div>
      </article> 
      <article>
        <h2>Efficient parallelization of data analysis</h2>
        <div class="build">
          <p>Keeping all nodes busy during a memory-intensive calculation.</p>
          <p>Can be accomplished through more complex analysis tasks, directed
          acyclic graph (DAG) dependency analysis, or over-handling of individual
          data regions.</p>
        </div>
      </article>
      <article>
        <h2>Time-domain correlation</h2>
        <div class="build">
          <p>Examining present behavior in light of the past: origins,
          complications, growth of phenomena.</p>
          <p>Particularly challenging for Eulerian hydrodynamics
          calculations.</p>
        </div>
      </article>
      <article>
        <h2>In Situ analysis</h2>
        <div class="build">
          <p>Analyzing far more frequently, but disintermediating the disk;
          often tied to a specific code or specific problem.</p>
          <p>Sometimes requires directing the movie without reading the script.</p>
        </div>
      </article>
      <article>
        <qnq>
          What are the weaknesses in the current state of large data analysis?
        </qnq>
      </article>
      <article>
        <h2>Moving data around is <i>terrible</i>.</h2>
        <div class="build">
          <p>Transfer speeds are slow, and transfers are unreliable.</p>
          <p>Storage capabilities are often inadequate.</p>
        </div>
      </article>
      <article>
        <h2>There is no common language for visualization.</h2>
        <div class="build">
          <p><i>De facto</i> standards for describing analysis products have
          arisen, but subtleties are often lost.</p>
          <p>The process from data to plot is often considered unimportant;
          common implementations are few and far between.</p>
        </div>
      </article>
      <article>
        <h2>Remote data access is slow and cumbersome.</h2>
        <div class="build">
          <p>There is no federated system of data repositories.</p>
          <p>Ideal: data-format neutral RPC to unprocessed data, with parallel
          backend.</p>
        </div>
      </article>
      <article>
        <h2>There is no data format for simulation data.</h2>
        <div class="build">
          <p>Handling particles (collisionless or dissipative), fluids and
          arbitrary geometries is challenging.</p>
          <p>Even a leaky format would be an enormous improvement.</p>
        </div>
      </article>
      <article>
        <h2>
        Large data analysis, while informed by details and specifics, should
        focus on underlying <i>physical quantities</i>.
        </h2>
      </article>
      <article>
        <qnq>A brief word about <b>Open Source</b>.</qnq>
      </article>
      <article>
        <h2>Common grounds should be identified and expanded.</h2>
        <div class="build">
          <p>Analysis and visualization platforms.</p>
          <p>All the bits that aren't <i>secret sauce</i>.</p>
          <p>Many eyes, many hands, many cooks.</p>
          <p>This requires social <i>and</i> technical infrastructure, as well
          as a sustained commitment to developing and expanding development
          communities.</p>
        </div>
      </article>
      <article>
        <h2>Thank you.</h2>
      </article>
    </section>
  </body>
</html>