eScience Seminar with Valerie Daggett (UW); Thursday, October 24th, 4:00 PM, CSE-305

Please join the eScience Institute on Thursday, October 24, at 4:00 PM in
CSE-305. Refreshments will be provided.

*Valerie Daggett (UW):*

Valerie Daggett obtained her BA from Reed College in Portland, Oregon, in
1983. A couple of years later she went to the University of California, San
Francisco for her PhD (awarded in 1990) and was then a postdoctoral fellow
at Stanford University. She joined the faculty at the University of
Washington in 1993 and has been there ever since. She is a professor in the
Department of Bioengineering, College of Engineering and School of
Medicine. She also holds adjunct positions in the Biochemistry Department
and the Biomedical and Health Informatics Program. She has published over
220 papers and maintains active research programs in protein dynamics and
folding, simulation, protein misfolding diseases, and the general area of
bioinformatics. She is a Senior Editor of Protein Engineering, Design
and Selection as well as a board member for numerous other scientific

*DIVE: Data Intensive Visual Engine for molecular simulation data*

Data-driven research is a rapidly emerging commonality throughout
scientific disciplines. Recently, with the proliferation of inexpensive
commodity computing clusters, synthetic data sources such as modeling and
simulation are capable of producing a continuous stream of terascale data.
Confronted with this data deluge, domain scientists are in need of
data-intensive analytic environments. Dynameomics is a terascale
simulation-driven research effort designed to enhance our understanding of
protein folding and dynamics through molecular dynamics simulation and
modeling. The project routinely involves exploratory analysis of 100+
terabyte datasets using an array of heterogeneous structural
biology-specific tools. In order to accelerate the pace of discovery for
the Dynameomics project, we have developed DIVE, a framework that allows
for rapid prototyping and dissemination of domain-independent (e.g.,
clustering) and domain-specific analyses in an implicitly iterative
workflow environment.

The information in the data warehouse is classified into three categories:
raw data, derived data, and state data. Raw data are generated from
simulations and models, derived data are produced through tools operating
on the raw data, and state data constitute the record of the exploratory
workflow, which has the added benefit of capturing the provenance of
derived data.

DIVE empowers researchers by reducing the overhead
associated with shared tool use and heterogeneous datasets. Furthermore,
the workflow provides a simple, interactive, and iterative data-oriented
investigation paradigm that tightens the hypothesis generation loop. The
result is an expressive, flexible laboratory informatics framework that
allows researchers to focus on analysis and discovery instead of tool

Upcoming Seminars:

* November 6, 4 PM (233 Sieg Hall)

Clark Gaylord (Virginia Tech)

Data Science Meets Infrastructure: Strategic Highway Research
Program (SHRP 2)