Computer Sciences Colloquium - The Everlasting Database: Statistical Validity at a Fair Price

Nathan Srebro (Toyota Technological Institute at Chicago)

25 March 2018, 11:00 
Schreiber Building, Room 006 

Abstract:

The problem of handling adaptivity in data analysis, intentional or not, permeates a variety of fields, and shows up in test-set overfitting in ML challenges and in the accumulation of invalid scientific discoveries. We propose a mechanism for running a validation service that can answer an arbitrarily long sequence of (potentially adaptive) queries, charging a price for each query and using the proceeds to collect additional samples. Without relying on any declared notion of "users", accounts, or adaptivity structure, our pricing mechanism nevertheless ensures that analysts making only non-adaptive queries pay only a very low cost, comparable to the minimal cost needed to answer these queries without worrying about adaptivity, while adaptive users bear the cost of answering adaptive queries.

Joint work with Blake Woodworth, Vitaly Feldman and Saharon Rosset.

Tel Aviv University, P.O. Box 39040, Tel Aviv 6997801, Israel