Existing model selection methods either assume homogeneous observations that follow one common model or search a restricted space of heterogeneous models for exogenously given subsets of observations. Panel data in economics or finance may require heterogeneous model selection for each of the (potentially unknown) clusters that the observations naturally form. We propose a novel approach to the joint problem of observation clustering and model selection. Our Clustered Bayesian Model (CBM) combines tree-based supervised clustering algorithms with Bayesian modeling under spike-and-slab prior distributions. First, cross-sectional observations are clustered recursively into leaves by a tree that grows according to the joint marginal likelihood of all selected leaf models. Second, a separate model is fit to the observations in each leaf, with variables that need not be shared across leaves, using data from all periods. Third, the Bayesian model allows time-varying coefficients driven by subject characteristics at modest computational cost. We apply CBM to the (unbalanced) panel of individual stock returns to estimate and select observable factor models. CBM splits the cross-section of stock returns by firm characteristics and selects a potentially distinct factor model for each leaf cluster. Empirically, we find that most asset clusters can be explained by the list of published factors, but some have significant alphas. CBM provides a graphical tree-leaf path of firm characteristics for analyzing these mispriced stocks. Finally, we provide Bayesian inference on factor usefulness and on the fundamental and macroeconomic sources of mispricing in these clusters.
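To make the split criterion concrete, the following is a minimal sketch of a single tree split chosen by the summed marginal likelihood of the leaf models. It is an illustration under stated assumptions, not the CBM implementation. It assumes a balanced panel, a known noise variance, no intercept (alpha), constant coefficients, and a small number of factors so that the spike-and-slab prior can be handled by exact enumeration of factor subsets rather than by sampling. All names (`log_marginal`, `leaf_evidence`, `best_split`, `simulate_panel`) and the prior settings `sigma2`, `tau2`, and `p_incl` are hypothetical.

```python
import itertools
import numpy as np


def log_marginal(y, X, sigma2=1.0, tau2=1.0):
    """Log evidence of y when beta ~ N(0, tau2*I) and noise ~ N(0, sigma2*I).

    Integrating beta out gives y ~ N(0, sigma2*I + tau2 * X X^T).
    Assumption: sigma2 and tau2 are fixed and known.
    """
    n = len(y)
    cov = sigma2 * np.eye(n) + tau2 * (X @ X.T)
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)


def leaf_evidence(R, F, p_incl=0.5):
    """Pool a leaf's stocks over all periods and sum over factor subsets.

    Each factor is included independently with prior probability p_incl
    (the "slab") and has a coefficient of exactly zero otherwise (the "spike").
    """
    m, _ = R.shape
    y = R.ravel()              # stock 1's returns over time, then stock 2's, ...
    X = np.tile(F, (m, 1))     # the same factor realizations for every stock
    k = F.shape[1]
    scores = {}
    for gamma in itertools.product([0, 1], repeat=k):
        cols = np.array([j for j in range(k) if gamma[j]], dtype=int)
        log_prior = (len(cols) * np.log(p_incl)
                     + (k - len(cols)) * np.log(1.0 - p_incl))
        scores[gamma] = log_prior + log_marginal(y, X[:, cols])
    log_ev = np.logaddexp.reduce(np.array(list(scores.values())))
    best_gamma = max(scores, key=scores.get)   # highest-posterior factor subset
    return log_ev, best_gamma


def best_split(R, F, z, candidates):
    """Pick the threshold on characteristic z that maximizes joint leaf evidence."""
    best = None
    for c in candidates:
        left = z <= c
        if left.sum() == 0 or left.sum() == len(z):
            continue           # skip splits that leave one leaf empty
        ev_l, g_l = leaf_evidence(R[left], F)
        ev_r, g_r = leaf_evidence(R[~left], F)
        total = ev_l + ev_r    # leaves are independent given the split
        if best is None or total > best[0]:
            best = (total, c, g_l, g_r)
    return best


def simulate_panel(n_stocks=24, n_periods=24, seed=0):
    """Toy panel: stocks with z < 0.5 load on factor 0, the rest on factor 1."""
    rng = np.random.default_rng(seed)
    z = np.linspace(0.0, 1.0, n_stocks)
    F = rng.normal(size=(n_periods, 2))
    betas = np.where(z[:, None] < 0.5, [1.5, 0.0], [0.0, 1.5])
    R = betas @ F.T + rng.normal(size=(n_stocks, n_periods))
    return R, F, z


R, F, z = simulate_panel()
total, threshold, gamma_left, gamma_right = best_split(R, F, z, [0.25, 0.5, 0.75])
print(threshold, gamma_left, gamma_right)
```

In this toy panel the two groups of stocks are separated at a characteristic value of 0.5 by construction, so a split there avoids pooling stocks with different factor loadings in one leaf. A recursive version would apply `best_split` again within each resulting leaf and stop when no candidate split raises the joint evidence.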