Computer Sciences Colloquium - The Everlasting Database: Statistical Validity at a Fair Price
Nathan Srebro (Toyota Technological Institute at Chicago)
Abstract:
The problem of handling adaptivity in data analysis, intentional or not, permeates a variety of fields, appearing for example as test-set overfitting in ML challenges and as the accumulation of invalid scientific discoveries. We propose a mechanism for running a validation service that can answer an arbitrarily long sequence of (potentially adaptive) queries, charging a price for each query and using the proceeds to collect additional samples. Without relying on any declared notion of "users", accounts or adaptivity structure, our pricing mechanism nevertheless ensures that analysts making only non-adaptive queries pay a very low cost, comparable to the minimal cost needed to answer these queries without worrying about adaptivity, while adaptive users bear the cost of answering adaptive queries.
Joint work with Blake Woodworth, Vitaly Feldman and Saharon Rosset.