Speaker: Prof. Yaqi Duan, Stern School of Business, New York University, United States
Inviter: Prof. Ya-xiang Yuan
Title: Optimal Policy Evaluation Using Kernel-based Temporal Difference Methods
Time: 15:30-16:30, August 29 (Tuesday)
Venue: Z311

Abstract: We study methods based on reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process (MRP). In particular, we consider a regularized form of the kernel least-squares temporal difference (LSTD) estimate and analyze its error in the L2(mu)-norm, where mu denotes the stationary distribution of the underlying Markov chain. Using techniques from empirical process theory, we derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator, as well as the instance-dependent variance of the Bellman residual error. In addition, we prove minimax lower bounds over sub-classes of MRPs, which show that our rate is optimal in terms of the sample size n and the effective horizon H = 1/(1-gamma). Whereas existing worst-case theory predicts cubic scaling (H^3) in the effective horizon, our theory reveals that there is in fact a much wider range of scalings, depending on the kernel, the stationary distribution, and the variance of the Bellman residual error. Notably, it is only parametric and near-parametric problems that can ever achieve the worst-case cubic scaling.
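
To make the object of study concrete, the regularized kernel LSTD estimate described in the abstract can be sketched in a few lines. This is a minimal illustration, not the talk's exact formulation: the RBF kernel, the toy 1-D chain, the reward function, and the regularization level are all illustrative choices, and the estimator is written in the standard representer form V(s) = sum_j alpha_j k(s, s_j), where the coefficients solve a ridge-regularized TD fixed-point equation built from the kernel matrices K_ij = k(s_i, s_j) and K'_ij = k(s'_i, s_j).

```python
import numpy as np

# Minimal sketch of a regularized kernel LSTD estimate for an MRP.
# Assumptions (illustrative, not from the talk): an RBF kernel, a toy
# AR(1)-style chain as the MRP, and the representer-theorem form
#   V(s) = sum_j alpha_j k(s, s_j).

rng = np.random.default_rng(0)

def rbf_kernel(X, Y, bandwidth=0.5):
    """Gaussian (RBF) kernel matrix between two sets of 1-D states."""
    d2 = (X[:, None] - Y[None, :]) ** 2
    return np.exp(-d2 / (2 * bandwidth ** 2))

# Simulate n transitions (s_i, r_i, s'_i); gamma gives effective
# horizon H = 1/(1 - gamma) = 10 here.
n, gamma, lam = 200, 0.9, 1e-3
s = rng.uniform(-1.0, 1.0, size=n)                # states, roughly from mu
s_next = 0.8 * s + 0.1 * rng.standard_normal(n)   # next states
r = np.cos(np.pi * s)                             # deterministic reward r(s)

# Regularized kernel LSTD: with K_ij = k(s_i, s_j) and K'_ij = k(s'_i, s_j),
# the coefficients solve ((K - gamma K') + n * lam * I) alpha = r.
K = rbf_kernel(s, s)
Kp = rbf_kernel(s_next, s)
alpha = np.linalg.solve(K - gamma * Kp + n * lam * np.eye(n), r)

def value_estimate(query):
    """Evaluate the fitted value function at new states."""
    return rbf_kernel(np.atleast_1d(query), s) @ alpha

V_hat = value_estimate(np.linspace(-1.0, 1.0, 5))
print(V_hat.shape)  # (5,)
```

The L2(mu) error analyzed in the talk would measure the gap between this fitted function and the true value function under the chain's stationary distribution; the choice of kernel (through its eigenvalue decay) and of the regularization parameter lam are exactly the quantities the non-asymptotic bound makes explicit.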