This talk focuses on how statistical machine learning techniques and big data can be used to solve problems in finance and economics. It begins with an overview of the genesis of machine learning and AI, and of how statistical and computational methods have evolved with growing dimensionality and sample sizes to become the foundation of modern machine learning and AI. It introduces simple yet powerful techniques for dealing with the heavy-tailedness and dependence that characterize financial data. We showcase applications in high-frequency trading and in sentiment learning from Chinese financial text data.
We present the predictability in ultra-high-frequency finance, with a focus on returns and durations. Based on 101 stocks in the S\&P 100 index over 505 days, we quantify and document the predictability and confirm that it exists universally. We unveil important predictors and show how the predictability depends on market environments, stock characteristics, and the timeliness of data.
For Chinese text analysis, we introduce FarmPredict to let machines learn financial returns directly. Based on approximately 2 million pieces of news, we show that news with positive sentiment, as scored by FarmPredict, generates an average daily excess return of 83 bps, while negative news has an adverse impact of 26 bps on the days of news announcements. This asymmetric effect aligns well with the short-sale constraints in the Chinese equity market, and it lends further support to the claim that FarmPredict can learn the sentiments embedded in financial news.