Large-scale Open Dataset, Pipeline, and Benchmark for Off-Policy Evaluation
|
Yuta Saito, Shunsuke Aihara, Megumi Matsutani, and Yusuke Narita
|
Video
|
On the Convergence Rate of Density Ratio Learning Based Off-Policy Policy Gradient Methods
|
Jiawei Huang*, Nan Jiang
|
Video
|
The Importance of Pessimism in Fixed-Dataset Policy Optimization
|
Jacob Buckman, Carles Gelada, Marc G. Bellemare
|
Video
|
EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL
|
Seyed Kamyar Seyed Ghasemipour, Dale Schuurmans, Shixiang Gu
|
|
Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning
|
Shangtong Zhang, Bo Liu, Shimon Whiteson
|
Video
|
Batch Reinforcement Learning Through Continuation Method
|
Yijie Guo, Shengyu Feng, Nicolas Le Roux, Ed Chi, Honglak Lee, Minmin Chen
|
Video
|
M$^3$Rec: An Offline Meta-level Model-based Reinforcement Learning Approach for Cold-Start Recommendation
|
Yanan Wang, Yong Ge, Li Li, Rui Chen, Tong Xu
|
Video
|
Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization
|
Tatsuya Matsushima, Hiroki Furuta, Yutaka Matsuo, Ofir Nachum, Shixiang Gu
|
Video
|
On Sampling Error in Batch Action-Value Prediction Algorithms
|
Brahma S. Pavse, Josiah P. Hanna, Ishan Durugkar, Peter Stone
|
Video
|
Offline Meta-Reinforcement Learning with Advantage Weighting
|
Eric Mitchell, Rafael Rafailov, Xue Bin Peng, Sergey Levine, Chelsea Finn
|
Video
|
Offline Learning from Demonstrations and Unlabeled Experience
|
Konrad Zolna, Alexander Novikov, Ksenia Konyushkova, Caglar Gulcehre, Ziyu Wang, Yusuf Aytar, Misha Denil, Nando de Freitas, Scott Reed
|
Video
|
Parameter-based Value Functions
|
Francesco Faccio, Louis Kirsch, Juergen Schmidhuber
|
Video
|
Reset-Free Lifelong Learning with Skill-Space Planning
|
Kevin Lu, Aditya Grover, Pieter Abbeel, Igor Mordatch
|
Video
|
MARS-Gym: Offline Reinforcement Learning for Recommender Systems in Marketplaces
|
Marlesson Santana*, Luckeciano Melo*, Fernando Camargo*, Bruno Brandão, Anderson Soares, Renan Oliveira, and Sandor Caetano
|
Video
|
Q-Value Weighted Regression: Reinforcement Learning with Limited Data
|
Piotr Kozakowski, Łukasz Kaiser, Henryk Michalewski, Afroz Mohiuddin, Katarzyna Kańska
|
Video
|
OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning
|
Anurag Ajay, Aviral Kumar, Pulkit Agrawal, Sergey Levine, Ofir Nachum
|
Video
|
Model-Based Visual Planning with Self-Supervised Functional Distances
|
Stephen Tian, Suraj Nair, Frederik Ebert, Sudeep Dasari, Benjamin Eysenbach, Chelsea Finn, Sergey Levine
|
Video
|
Optimal Mixture Weights for Off-Policy Evaluation with Multiple Behavior Policies
|
Jinlin Lai, Lixin Zou, Jiaxing Song
|
Video
|
Uncertainty Weighted Offline Reinforcement Learning
|
Yue Wu, Shuangfei Zhai, Nitish Srivastava, Joshua M. Susskind, Jian Zhang, Ruslan Salakhutdinov, Hanlin Goh
|
Video
|
Offline Policy Optimization with Variance Regularization
|
Riashat Islam, Samarth Sinha, Homanga Bharadhwaj, Samin Yeasar Arnob, Zhuoran Yang, Animesh Garg, Zhaoran Wang, Lihong Li, Doina Precup
|
|
Bridging the Imitation Gap by Adaptive Insubordination
|
Luca Weihs*, Unnat Jain*, Jordi Salvador, Svetlana Lazebnik, Aniruddha Kembhavi, Alexander Schwing
|
Video
|
Variance-Reduced Off-Policy Memory-Efficient Policy Search
|
Daoming Lyu, Qi Qi, Mohammad Ghavamzadeh, Hengshuai Yao, Tianbao Yang, Bo Liu
|
Video
|
Semi-supervised reward learning for offline reinforcement learning
|
Ksenia Konyushkova, Konrad Zolna, Yusuf Aytar, Alexander Novikov, Scott Reed, Serkan Cabi, Nando de Freitas
|
Video
|
Sample-Efficient Reinforcement Learning via Counterfactual-Based Data Augmentation
|
Chaochao Lu, Biwei Huang, Ke Wang, José Miguel Hernández-Lobato, Kun Zhang, Bernhard Schölkopf
|
Video
|
Risk-Averse Offline Reinforcement Learning
|
Núria Armengol-Urpí, Sebastian Curi, Andreas Krause
|
Video
|
POPO: Pessimistic Offline Policy Optimization
|
Qiang He, Xinwen Hou, Yu Liu
|
Video
|
Offline Policy Evaluation with New Arms
|
Ben London, Thorsten Joachims
|
|
Batch Reinforcement Learning in the Real World: A Survey
|
Yuwei Fu, Wu Di, Benoit Boulet
|
Video
|
Model-based micro-data reinforcement learning: what are the crucial model properties and which model to choose?
|
Balazs Kegl, Gabriel Hurtado, Albert Thomas
|
Video
|
Offline Hyperparameter Selection for Offline Reinforcement Learning
|
Tom Le Paine*, Cosmin Paduraru*, Andrea Michi, Caglar Gulcehre, Konrad Zolna, Alexander Novikov, Ziyu Wang, Nando de Freitas
|
Video
|
Double Explore-then-Commit: Asymptotic Optimality and Beyond
|
Tianyuan Jin, Pan Xu, Xiaokui Xiao, Quanquan Gu
|
Video
|
Near-Optimal Provable Uniform Convergence in Offline Policy Evaluation for Reinforcement Learning
|
Ming Yin, Yu Bai, and Yu-Xiang Wang
|
Video
|
Gradient Analysis and Approximations for Off-policy Optimization
|
Ramki Gummadi, Dale Schuurmans
|
Video
|
Offline Reinforcement Learning Hands-On
|
Jakub Kmec, Louis Monier, Alexandre Laterre, Thomas Pierrot, Valentin Courgeau, Olivier Sigaud, Karim Beguir
|
Video
|
Batch Exploration with Examples for Scalable Robotic Reinforcement Learning
|
Annie S Chen*, HyunJi Nam*, Suraj Nair*, Chelsea Finn
|
Video
|
Recurrent Open-loop Control in Offline Reinforcement Learning
|
Alex Lewandowski, Vincent Zhang, Dale Schuurmans
|
Video
|
Abstraction-Guided Policy Recovery from Expert Demonstrations
|
Canmanie T. Ponnambalam, Frans A. Oliehoek, Matthijs T. J. Spaan
|
Video
|
Shaping Control Variates for Off-Policy Evaluation
|
Sonali Parbhoo, Omer Gottesman, Finale Doshi-Velez
|
Video
|
Counterfactual Policy Evaluation and the Conditional Monte Carlo Method
|
Michel Ma, Pierre-Luc Bacon
|
Video
|
Semi-Supervised Learning for Doubly Robust Offline Policy Evaluation
|
Aaron Sonabend, Nilanjana Laha, Rajarshi Mukherjee, Tianxi Cai
|
Video
|
You Only Evaluate Once -- a Simple Baseline Algorithm for Offline RL
|
Wonjoon Goo, Scott Niekum
|
Video
|
Offline Reinforcement Learning From Images with Latent Space Models
|
Rafael Rafailov*, Tianhe Yu*, Aravind Rajeswaran, Chelsea Finn
|
Video
|
Fine-Tuning Offline Reinforcement Learning with Model-Based Policy Optimization
|
Adam Villaflor, John Dolan, Jeff Schneider
|
Video
|
Towards Exploiting Geometry and Time for Fast Off-Distribution Adaptation in Multi-Task Robot Learning
|
K.R. Zentner, Ryan Julian, Ujjwal Puri, Yulun Zhang, Gaurav Sukhatme
|
Video
|
Conservative Objective Models: A Simple Approach to Effective Model-Based Optimization
|
Brandon Trabucco, Aviral Kumar, Xinyang Geng, Sergey Levine
|
Video
|