Computer Sciences Colloquium - Interactive Data Science
Tim Kraska, MIT
Unleashing the full potential of Big Data requires a paradigm shift in the algorithms and tools used to analyze data towards more interactive systems with highly collaborative and visual interfaces. Ideally, a data scientist and a domain expert should be able to make discoveries together by directly manipulating, analyzing and visualizing data on the spot, instead of having week-long forth-and-back interactions between them. Current systems, such as traditional databases or more recent analytical frameworks like Hadoop or Spark, are ill-suited for this purpose. They were not designed to be interactive nor to support the special requirements of visual data exploration. Similarly, most machine learning algorithms are not able to provide initial answers at "human speed" (i.e., sub-seconds), nor are existing methods sufficient to convey the impact of the various risk factors, such as multi hypothesis problem.
In this talk, I will present our vision of a new approach for conducting interactive exploratory analytics and explain why integrating the aforementioned features requires a complete rethinking of the full analytics stack, from the interface to the ``guts’’. I will present recent results towards this vision including our novel interface, analytical engine and automatic error detection, and outline what challenges are still ahead of us.
Tim Kraska is an Associate Professor of Electrical Engineering and Computer Science in MIT's Computer Science and Artificial Intelligence Laboratory. Currently, his research focuses on building systems for machine learning, and using machine learning for systems. Before joining MIT, Tim was an Assistant Professor at Brown, spent time at Google Research, and was a PostDoc in the AMPLab at UC Berkeley after he got his PhD from ETH Zurich. Tim is a 2017 Alfred P. Sloan Research Fellow in computer science, received the VMware Systems Research Award, an NSF CAREER Award, an Air Force Young Investigator award, two Very Large Data Bases (VLDB) conference best-demo awards, and a best-paper award from the IEEE International Conference on Data Engineering (ICDE).