Talk abstract: | Subsampling methods reduce the computational burden of big-data analysis while maintaining the efficiency of statistical inference. In this talk, we review subsampling techniques for a range of models, from linear models to generalized linear models to estimating equations. If the data volume is so large that nonuniform subsampling probabilities cannot be calculated all at once, subsampling with replacement is infeasible. This problem is solved with a new method of subsampling without replacement, called Poisson subsampling. To handle full data that are stored in different blocks or at multiple locations, a distributed subsampling framework is developed, in which statistics are computed simultaneously on smaller partitions of the full data. Finally, the proposed strategies are illustrated and evaluated through numerical experiments on both simulated and real data sets. |
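
To make the idea of Poisson subsampling concrete, the following is a minimal sketch for a linear model, not the speaker's implementation. Each observation is kept or dropped independently, so the data can be scanned chunk by chunk without computing all probabilities at once. The function name `poisson_subsample_ols`, the residual-based score, the pilot-estimated normalizer, and the mixing with a uniform component (`uniform_mix`) are all illustrative assumptions, not details taken from the talk.

```python
import numpy as np


def poisson_subsample_ols(X, y, expected_size, pilot_size=200,
                          chunk_size=1000, uniform_mix=0.2, seed=0):
    """Weighted least squares on a Poisson subsample (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Pilot step (assumption): a small uniform sample gives a rough estimate
    # and an approximate normalizer, so the full data need not be scanned
    # twice to normalize the probabilities.
    pilot_idx = rng.choice(n, size=min(pilot_size, n), replace=False)
    beta_pilot, *_ = np.linalg.lstsq(X[pilot_idx], y[pilot_idx], rcond=None)

    def score(Xc, yc):
        # Hypothetical importance score: |pilot residual| * ||x||.
        return np.abs(yc - Xc @ beta_pilot) * np.linalg.norm(Xc, axis=1)

    mean_score = max(score(X[pilot_idx], y[pilot_idx]).mean(), 1e-12)

    kept_X, kept_y, kept_w = [], [], []
    for start in range(0, n, chunk_size):
        Xc = X[start:start + chunk_size]
        yc = y[start:start + chunk_size]
        # Inclusion probability for each point, computed from that point
        # alone. Mixing with a uniform component keeps the probabilities
        # away from zero, which bounds the inverse-probability weights.
        rel = (1.0 - uniform_mix) * score(Xc, yc) / mean_score + uniform_mix
        p = np.minimum(1.0, expected_size * rel / n)
        # Independent Bernoulli decision per point: no replacement, and the
        # realized subsample size is random.
        keep = rng.random(len(p)) < p
        kept_X.append(Xc[keep])
        kept_y.append(yc[keep])
        kept_w.append(1.0 / p[keep])

    Xs = np.vstack(kept_X)
    ys = np.concatenate(kept_y)
    w = np.concatenate(kept_w)

    # Inverse-probability-weighted least squares on the subsample.
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(Xs * sw[:, None], ys * sw, rcond=None)
    return beta, len(ys)
```

Calling `poisson_subsample_ols(X, y, expected_size=2000)` returns the weighted estimate together with the realized subsample size, which varies around `expected_size` because inclusion is decided independently for each observation.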