Disclaimer: this summary inevitably contains omissions and errors; corrections and additions are welcome.
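NeurIPS 2021 acceptance statistics
A total of 9,122 papers were submitted this year, of which 2,344 were accepted, an acceptance rate of about 26%. Compared with 2020, submissions fell by 332 while the number of accepted papers rose by 444.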
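The arithmetic behind these figures can be checked with a short sketch. It uses only the four numbers stated above; the 2020 totals are not quoted in this post and are derived here from the stated differences, so treat them as implied values rather than official figures. The variable names are illustrative.

```python
# Figures stated in the text for NeurIPS 2021
submitted_2021 = 9122
accepted_2021 = 2344

# Year-over-year changes stated in the text (2021 minus 2020)
delta_submitted = -332   # submissions fell by 332
delta_accepted = 444     # acceptances rose by 444

rate_2021 = accepted_2021 / submitted_2021

# 2020 totals implied by the stated differences (an inference, not quoted data)
submitted_2020 = submitted_2021 - delta_submitted
accepted_2020 = accepted_2021 - delta_accepted
rate_2020 = accepted_2020 / submitted_2020

print(f"2021: {accepted_2021}/{submitted_2021} = {rate_2021:.1%}")
print(f"2020 (implied): {accepted_2020}/{submitted_2020} = {rate_2020:.1%}")
```

Under these assumptions the 2021 rate comes out at 25.7%, which rounds to the 26% quoted above, and the implied 2020 rate is 20.1%.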
List of reinforcement learning papers:
[1]. PlayVirtual: Augmenting Cycle-Consistent Virtual Trajectories for Reinforcement Learning
Tao Yu (USTC) · Cuiling Lan (Microsoft) · Wenjun Zeng (Microsoft) · Mingxiao Feng (University of Science and Technology of China) · Zhizheng Zhang (University of Science and Technology of China) · Zhibo Chen (University of Science and Technology of China)
[2]. FACMAC: Factored Multi-Agent Centralised Policy Gradients
Bei Peng (University of Oxford) · Tabish Rashid (University of Oxford) · Christian Schroeder de Witt (University of Oxford) · Pierre-Alexandre Kamienny (Facebook AI Research) · Philip Torr (University of Oxford) · Wendelin Boehmer (University of Oxford) · Shimon Whiteson (University of Oxford)
[3]. Fast Algorithms for L_\infty-constrained S-rectangular Robust MDPs
Bahram Behzadian (University of New Hampshire) · Marek Petrik (University of New Hampshire) · Chin Pang Ho (City University of Hong Kong)
[4]. Outcome-Driven Reinforcement Learning via Variational Inference
Tim G. J. Rudner (University of Oxford) · Vitchyr Pong (UC Berkeley) · Rowan McAllister (UC Berkeley) · Yarin Gal (University of Oxford) · Sergey Levine (University of Washington)
[5]. On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations
Tim G. J. Rudner (University of Oxford) · Cong Lu (University of Oxford) · Michael A Osborne (U Oxford) · Yarin Gal (University of Oxford) · Yee Teh (DeepMind)
[6]. Model-Based Episodic Memory Induces Dynamic Hybrid Controls
Hung Le (Deakin University) · Thommen Karimpanal George (Deakin University) · Majid Abdolshah (Deakin University) · Truyen Tran (Deakin University) · Svetha Venkatesh (Deakin University)
[7]. Deep Jump Learning for Off-Policy Evaluation in Continuous Treatment Settings
Hengrui Cai (North Carolina State University) · (None) · Rui Song (North Carolina State University) · Wenbin Lu (North Carolina State University)
[8]. Stabilizing Dynamical Systems via Policy Gradient Methods
Juan Perdomo (University of California, Berkeley) · Jack Umenberger (Uppsala University) · Max Simchowitz (MIT)
[9]. Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning
Yiqin Yang (Tsinghua University) · Xiaoteng Ma (Department of Automation, Tsinghua University) · Li Chenghao (Tsinghua University) · Zewu Zheng (Johns Hopkins University) · Qiyuan Zhang (None) · Gao Huang (Tsinghua) · Jun Yang (Tsinghua University) · Qianchuan Zhao (Tsinghua University)
[10]. Finite-Sample Analysis of Off-Policy TD-Learning via Generalized Bellman Operators
Zaiwei Chen (Georgia Institute of Technology) · Siva Theja Maguluri (Georgia Institute of Technology) · Sanjay Shakkottai (University of Texas at Austin) · Karthikeyan Shanmugam (IBM Research, NY)
[11]. Cross-modal Domain Adaptation for Cost-Efficient Visual Reinforcement Learning
Xiong-Hui Chen (Nanjing University) · Shengyi Jiang (The University of Hong Kong) · Feng Xu (Nanjing University) · Zongzhang Zhang (Nanjing University) · Yang Yu (Nanjing University)
[12]. Reward-Free Model-Based Reinforcement Learning with Linear Function Approximation
Weitong ZHANG (University of California, Los Angeles) · Dongruo Zhou (UCLA) · Quanquan Gu (UCLA)
[13]. Offline Model-based Adaptable Policy Learning
Xiong-Hui Chen (Nanjing University) · Yang Yu (Nanjing University) · Qingyang Li (Didi AI Labs) · Fan-Ming Luo (Nanjing University) · Zhiwei Qin (Didi Research America) · Wenjie Shang (Nanjing University) · Jieping Ye (University of Michigan)
[14]. RMIX: Learning Risk-Sensitive Policies for Cooperative Reinforcement Learning Agents
Wei Qiu (Nanyang Technological University) · Xinrun Wang (NTU) · Runsheng Yu (Xiaomi Intelligent Technology Co., Ltd) · Rundong Wang (Nanyang Technological University) · Xu He (Nanyang Technological University) · Bo An (Nanyang Technological University) · Svetlana Obraztsova (Nanyang Technological University) · Zinovi Rabinovich (Nanyang Technological University)
[15]. Regularized Softmax Deep Multi-Agent Q-Learning
Ling Pan (Tsinghua University) · Tabish Rashid (University of Oxford) · Bei Peng (University of Oxford) · Longbo Huang (IIIS, Tsinghua University) · Shimon Whiteson (University of Oxford)
[16]. Celebrating Diversity in Shared Multi-Agent Reinforcement Learning
Li Chenghao (Tsinghua University) · Tonghan Wang (Tsinghua University) · Chengjie Wu (Tsinghua University) · Qianchuan Zhao (Tsinghua University) · Jun Yang (Tsinghua University) · Chongjie Zhang (Tsinghua University)
[17]. Oracle-Efficient Regret Minimization in Factored MDPs with Unknown Structure
Aviv Rosenberg (Tel Aviv University) · Yishay Mansour (Tel Aviv University)
[18]. Accelerating Quadratic Optimization with Reinforcement Learning
Jeffrey Ichnowski (University of California Berkeley) · Paras Jain (University of California Berkeley) · Bartolomeo Stellato (Massachusetts Institute of Technology) · Goran Banjac (Swiss Federal Institute of Technology) · Michael Luo (University of California Berkeley) · Francesco Borrelli (University of California Berkeley) · Joseph Gonzalez (UC Berkeley) · Ion Stoica (University of California-Berkeley) · Ken Goldberg (UC Berkeley)
[19]. Iterative Amortized Policy Optimization
Joseph Marino (DeepMind) · Alexandre Piche (Mila) · Alessandro Davide Ialongo (University of Cambridge) · Yisong Yue (Caltech)
[20]. Faster Non-asymptotic Convergence for Double Q-learning
Lin Zhao (National University of Singapore) · Huaqing Xiong (Ohio State University) · Yingbin Liang (The Ohio State University)
[21]. MICo: Improved representations via sampling-based state similarity for Markov decision processes
Pablo Samuel Castro (Google) · Tyler Kastner (McGill University) · Prakash Panangaden (McGill University, Montreal) · Mark Rowland (DeepMind)
[22]. The Difficulty of Passive Learning in Deep Reinforcement Learning
Georg Ostrovski (DeepMind) · Pablo Samuel Castro (Google) · Will Dabney (DeepMind)
[23]. Towards Deeper Deep Reinforcement Learning with Spectral Normalization
Nils Bjorck (Cornell University) · Carla Gomes (Cornell University) · Kilian Weinberger (Cornell University)
[24]. Automatic Data Augmentation for Generalization in Reinforcement Learning
Roberta Raileanu (NYU) · Maxwell Goldstein (New York University) · Denis Yarats (New York University) · Ilya Kostrikov (New York University) · Rob Fergus (DeepMind / NYU)
[25]. Factored Policy Gradients: Leveraging Structure for Efficient Learning in MOMDPs
Thomas Spooner (J.P. Morgan AI Research) · Nelson Vadori (J.P. Morgan AI Research) · Sumitra Ganesh (JPMorgan - AI Research)
[26]. Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning
Alberto Maria Metelli (Politecnico di Milano) · Alessio Russo (Politecnico di Milano) · Marcello Restelli (Politecnico di Milano)
[27]. Unsupervised Domain Adaptation with Dynamics-Aware Rewards in Reinforcement Learning
Jinxin Liu (Westlake University) · Hao Shen (University of California Berkeley) · Donglin Wang (Westlake University) · Yachen Kang (Westlake University) · Qiangxing Tian (Zhejiang University)
[28]. Derivative-Free Policy Optimization for Linear Risk-Sensitive and Robust Control Design: Implicit Regularization and Sample Complexity
Kaiqing Zhang (University of Illinois at Urbana-Champaign (UIUC)) · Xiangyuan Zhang (University of Illinois at Urbana-Champaign) · Bin Hu (University of Illinois at Urbana-Champaign) · Tamer Basar (University of Illinois at Urbana-Champaign)
[29]. TAAC: Temporally Abstract Actor-Critic for Continuous Control
Haonan Yu (Horizon Robotics) · Wei Xu (Horizon Robotics) · Haichao Zhang (Horizon Robotics)
[30]. On the Equivalence between Neural Network and Support Vector Machine
Yilan Chen (University of California, San Diego) · Wei Huang (University of Technology Sydney) · Lam Nguyen (IBM Research, Thomas J. Watson Research Center) · Tsui-Wei Weng (MIT)
[31]. The Sensory Neuron as a Transformer: Permutation-Invariant Neural Networks for Reinforcement Learning
Yujin Tang (Google) · David Ha (Google Brain)
[32]. Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting
Gen Li (Tsinghua University) · Yuxin Chen (Princeton University) · Yuejie Chi (Carnegie Mellon University) · Yuantao Gu (Tsinghua University) · Yuting Wei (Carnegie Mellon University)
[33]. Challenges and Opportunities in High Dimensional Variational Inference
Akash Kumar Dhaka (Aalto University) · Alejandro Catalina (Aalto University) · Manushi Welandawe (Boston University) · Michael Andersen (Technical University of Denmark) · Jonathan Huggins (Boston University) · Aki Vehtari (Aalto University)
[34]. On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method
Junyu Zhang (Princeton University) · (None) · Zheng Yu (Princeton University) · Csaba Szepesvari (DeepMind / University of Alberta) · Mengdi Wang (Princeton University)
[35]. Towards mental time travel: a hierarchical memory for reinforcement learning agents
Andrew Lampinen (DeepMind) · Stephanie Chan (DeepMind) · Andrea Banino (DeepMind) · Felix Hill (DeepMind)
[36]. Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature
Kefan Dong (Tsinghua University) · Jiaqi Yang (Tsinghua University) · Tengyu Ma (Stanford University)
[37]. Bandits with Knapsacks beyond the Worst Case
Karthik Abinav Sankararaman (University of Maryland) · Aleksandrs Slivkins (Microsoft Research)
[38]. Coordinated Proximal Policy Optimization
Zifan Wu (Sun Yat-sen University) · Chao Yu (Sun Yat-sen University) · Deheng Ye (Tencent) · Junge Zhang (CASIA) · Haiyin Piao (Northwestern Polytechnical University) · Hankz Hankui Zhuo (Sun Yat-sen University)
[39]. On Effective Scheduling of Model-based Reinforcement Learning
Hang Lai (Shanghai Jiao Tong University) · Jian Shen (Shanghai Jiao Tong University) · Weinan Zhang (Shanghai Jiao Tong University) · Yimin Huang (Huawei Technologies Co., Ltd.) · Xing Zhang (Huawei Technologies Ltd.) · Ruiming Tang (Huawei) · Yong Yu (Shanghai Jiao Tong University) · Zhenguo Li (Noah's Ark Lab, Huawei Tech Investment Co Ltd)
[40]. Risk-Averse Bayes-Adaptive Reinforcement Learning
Marc Rigter (University of Oxford) · Bruno Lacerda (University of Oxford) · Nick Hawes (University of Oxford)
[41]. Continual World: A Robotic Benchmark For Continual Reinforcement Learning
Maciej Wołczyk (Jagiellonian University) · Michał Zając (Jagiellonian University) · Razvan Pascanu (Google DeepMind) · Lukasz Kucinski (Polish Academy of Sciences) · Piotr Miłoś (University of Warsaw)
[42]. Regret Minimization Experience Replay in Off-Policy Reinforcement Learning
Xu-Hui Liu (Nanjing University) · Zhenghai Xue (Nanjing University) · Jingcheng Pang (Nanjing University) · Shengyi Jiang (The University of Hong Kong) · Feng Xu (Nanjing University) · Yang Yu (Nanjing University)
[43]. Adaptive Online Packing-guided Search for POMDPs
Chenyang Wu (Nanjing University) · Guoyu Yang (Nanjing University) · Zongzhang Zhang (Nanjing University) · Yang Yu (Nanjing University) · Dong Li (Huawei Noah's Ark Lab) · Wulong Liu (Huawei Noah's Ark Lab) · Jianye Hao (Tianjin University)
[44]. Variance-Aware Off-Policy Evaluation with Linear Function Approximation
Yifei Min (Yale University) · Tianhao Wang (Yale University) · Dongruo Zhou (UCLA) · Quanquan Gu (UCLA)
[45]. Provably Efficient Causal Reinforcement Learning with Confounded Observational Data
Lingxiao Wang (Northwestern University) · Zhuoran Yang (Princeton) · Zhaoran Wang (Princeton University)
[46]. Improved Variance-Aware Confidence Sets for Linear Bandits and Linear Mixture MDP
Zihan Zhang (Tsinghua University) · Jiaqi Yang (Tsinghua University) · Xiangyang Ji (Tsinghua University) · Simon Du (University of Washington)
[47]. Program Synthesis Guided Reinforcement Learning for Partially Observed Environments
Yichen Yang (MIT) · Jeevana Priya Inala (MIT) · Osbert Bastani (University of Pennsylvania) · Yewen Pu (Autodesk) · Armando Solar-Lezama (MIT) · Martin Rinard (MIT)
[48]. Qimera: Data-free Quantization with Synthetic Boundary Supporting Samples
Kanghyun Choi (Yonsei University) · Deokki Hong (Yonsei University) · Noseong Park (George Mason University) · Youngsok Kim (Yonsei University) · Jinho Lee (Yonsei University)
[49]. Universal Off-Policy Evaluation
Yash Chandak (University of Massachusetts Amherst) · Scott Niekum (UT Austin) · Bruno da Silva (Federal University of Rio Grande do Sul) · Erik Learned-Miller (UMass Amherst) · Emma Brunskill (Stanford University) · Philip S. Thomas (CMU)
[50]. Model-Based Reinforcement Learning via Imagination with Derived Memory
(None) · Yuzheng Zhuang (Huawei Technologies Co. Ltd.) · Bin Wang (Huawei Noah's Ark Lab) · Guangxiang Zhu (Tsinghua University) · Wulong Liu (Huawei Noah's Ark Lab) · Jianyu Chen (Tsinghua University) · Ping Luo (The Chinese University of Hong Kong) · Shengbo Li (Tsinghua University) · Chongjie Zhang (Tsinghua University) · Jianye Hao (Tianjin University)
[51]. A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning
Mingde Zhao (McGill University) · Zhen Liu (University of Montreal, MILA) · Sitao Luan (McGill University, Mila) · Shuyuan Zhang (McGill University / Mila) · Doina Precup (DeepMind) · Yoshua Bengio (University of Montreal)
[52]. BCORLE(\lambda): An Offline Reinforcement Learning and Evaluation Framework for Coupons Allocation in E-commerce Market
Yang Zhang (Xi'an Jiaotong University) · Bo Tang (Institute of Software Chinese Academy of Sciences) · Qingyu Yang (Xi'an Jiaotong University) · Dou An (Xi'an Jiaotong University) · Hongyin Tang (Chinese Academy of Sciences) · Chenyang Xi (Beijing Institute of Technology) · Xueying LI (University of Science and Technology of China) · Feiyu Xiong (Drexel University)
[53]. Compositional Reinforcement Learning from Logical Specifications
Kishor Jothimurugan (University of Pennsylvania) · Suguman Bansal (University of Pennsylvania) · Osbert Bastani (University of Pennsylvania) · Rajeev Alur (University of Pennsylvania)
[54]. Machine versus Human Attention in Deep Reinforcement Learning Tasks
Sihang Guo (University of Texas at Austin) · Ruohan Zhang (Stanford University) · Bo Liu (Stanford University) · Yifeng Zhu (The University of Texas at Austin) · Dana Ballard (University of Texas, Austin) · Peter Stone (The University of Texas at Austin, Sony AI)
[55]. Learning Barrier Certificates: Towards Safe Reinforcement Learning with Zero Training-time Violations
Yuping Luo (Princeton University) · Tengyu Ma (Stanford University)
[56]. Multi-Agent Reinforcement Learning for Active Voltage Control on Power Distribution Networks
Jianhong Wang (Imperial College London) · Wangkun Xu (Imperial College London) · Yunjie Gu (University of Bath) · Wenbin Song (Shanghaitech University) · Tim C Green (Imperial College London)
[57]. RoMA: Robust Model Adaptation for Offline Model-based Optimization
Sihyun Yu (Korea Advanced Institute of Science and Technology) · Sungsoo Ahn (MBZUAI) · Le Song (Georgia Institute of Technology) · Jinwoo Shin (KAIST)
[58]. Offline Meta Reinforcement Learning -- Identifiability Challenges and Effective Data Collection Strategies
Ron Dorfman (Technion - Israel Institute of Technology) · Idan Shenfeld (Technion) · Aviv Tamar (UC Berkeley)
[59]. Improve Agents without Retraining: Parallel Tree Search with Off-Policy Correction
Gal Dalal (NVIDIA) · Assaf Hallak (The Technion) · Steven Dalton (NVIDIA) · Iuri Frosio (NVIDIA) · Shie Mannor (Technion) · Gal Chechik (NVIDIA, BIU)
[60]. Continuous Doubly Constrained Batch Reinforcement Learning
Rasool Fakoor (University of Texas At Arlington) · Jonas Mueller (Amazon Web Services) · Kavosh Asadi (Brown University) · Pratik Chaudhari (University of Pennsylvania) · Alexander J Smola (NICTA)
[61]. A Hierarchical Reinforcement Learning Based Optimization Framework for Large-scale Dynamic Pickup and Delivery Problems
Yi Ma (Tianjin University) · (None) · Jianye Hao (Tianjin University) · Jiawen Lu (Tsinghua University) · Mingxuan Yuan (Huawei Noah's Ark Lab) · Jie Tang (Tsinghua University) · Zhaopeng Meng (School of Computer Software, Tianjin University)
[62]. Online and Offline Reinforcement Learning by Planning with a Learned Model
Julian Schrittwieser (DeepMind) · Thomas Hubert (Google) · Amol Mandhane (DeepMind) · Mohammadamin Barekatain (DeepMind) · Ioannis Antonoglou (DeepMind) · David Silver (DeepMind)
[63]. Navigating to the Best Policy in Markov Decision Processes
Aymen Al Marjani (ENS Lyon) · Aurélien Garivier (ENS Lyon) · Alexandre Proutiere (KTH)
[64]. Goal-Aware Cross-Entropy for Multi-Target Reinforcement Learning
Kibeom Kim (Seoul National University) · Min Whoo Lee (Seoul National University) · Yoonsung Kim (Seoul National University Biointelligence Lab) · JeHwan Ryu (Seoul National University) · Minsu Lee (Seoul National University) · Byoung-Tak Zhang (Seoul National University & Surromind Robotics)
[65]. Collaborative Uncertainty in Multi-Agent Trajectory Forecasting
Bohan Tang (University of Oxford) · Yiqi Zhong (University of Southern California) · Ulrich Neumann (USC) · Gang Wang (Beijing Institute of Technology) · Siheng Chen (MERL) · Ya Zhang (Cooperative Medianet Innovation Center, Shanghai Jiao Tong University)
[66]. Robust Deep Reinforcement Learning through Adversarial Loss
Tuomas Oikarinen (MIT) · Wang Zhang (MIT) · Alexandre Megretski (Massachusetts Institute of Technology) · Luca Daniel (MIT) · Tsui-Wei Weng (MIT)
[67]. Inverse Reinforcement Learning in a Continuous State Space with Formal Guarantees
Gregory Dexter (Purdue University) · Kevin Bello (Purdue University) · Jean Honorio (Purdue University)
[68]. Fast Approximate Dynamic Programming for Infinite-Horizon Markov Decision Processes
Mohamad Amin Sharifi Kolarijani (Delft University of Technology) · Gyula F. Max (Delft University of Technology) · Peyman Mohajerin Esfahani (TU Delft)
[69]. Safe Reinforcement Learning with Natural Language Constraints
Tsung-Yen Yang (Princeton University) · Michael Y Hu (Princeton University) · Yinlam Chow (Google Research) · Peter J Ramadge (Princeton) · Karthik Narasimhan (Princeton University)
[70]. A Minimalist Approach to Offline Reinforcement Learning
Scott Fujimoto (McGill University) · Shixiang (Shane) Gu (Google Brain, University of Cambridge)
[71]. Pretraining Representations for Data-Efficient Reinforcement Learning
Max Schwarzer (Mila, Université de Montréal) · Nitarshan Rajkumar (Mila, Université de Montréal) · Michael Noukhovitch (Mila (Université de Montréal)) · Ankesh Anand (Mila, University of Montreal) · Laurent Charlin (MILA / U.Montreal) · R Devon Hjelm (Microsoft Research) · Philip Bachman (Microsoft Research) · Aaron Courville (U. Montreal)
[72]. Conservative Offline Distributional Reinforcement Learning
Yecheng Ma (University of Pennsylvania) · Dinesh Jayaraman (UC Berkeley) · Osbert Bastani (University of Pennsylvania)
[73]. Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning
Siyuan Zhang (University of Illinois at Urbana-Champaign) · Nan Jiang (University of Illinois at Urbana-Champaign)
[74]. An Efficient Transfer Learning Framework for Multiagent Reinforcement Learning
Tianpei Yang (Tianjin University, University of Alberta) · Weixun Wang (Tianjin University) · Hongyao Tang (Tianjin University) · Jianye Hao (Tianjin University) · Zhaopeng Meng (School of Computer Software, Tianjin University) · Hangyu Mao (Peking University) · Dong Li (Huawei Noah's Ark Lab) · Wulong Liu (Huawei Noah's Ark Lab) · Yingfeng Chen (None) · Yujing Hu (NetEase Fuxi AI Lab) · Changjie Fan (NetEase Fuxi AI Lab) · Chengwei Zhang (Dalian Maritime University)
[75]. Width-based Lookaheads with Learnt Base Policies and Heuristics Over the Atari-2600 Benchmark
Stefan O'Toole (The University of Melbourne) · Nir Lipovetzky (The University of Melbourne) · Miquel Ramirez (The University of Melbourne) · Adrian Pearce (The University of Melbourne)
[76]. Provably Efficient Black-Box Action Poisoning Attacks Against Reinforcement Learning
Guanlin Liu (University of California, Davis) · Lifeng LAI (University of California, Davis)
[77]. Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability
Dibya Ghosh (UC Berkeley) · Jad Rahme (Princeton University) · Aviral Kumar (UC Berkeley) · Amy Zhang (FAIR, McGill) · Ryan Adams (Princeton University) · Sergey Levine (University of Washington)
[78]. CO-PILOT: COllaborative Planning and reInforcement Learning On sub-Task curriculum
Shuang Ao (University of Technology Sydney) · Tianyi Zhou (University of Washington, Seattle) · Guodong Long (University of Technology Sydney (UTS)) · Qinghua Lu (Data61, CSIRO) · Liming Zhu (CSIRO) · Jing Jiang (University of Technology Sydney)
[79]. Tactical Optimism and Pessimism for Deep Reinforcement Learning
Ted Moskovitz (Gatsby Unit, UCL) · Jack Parker-Holder (University of Oxford) · Aldo Pacchiano (Microsoft Research) · Michael Arbel (UCL) · Michael Jordan (UC Berkeley)
[80]. Episodic Multi-agent Reinforcement Learning with Curiosity-driven Exploration
Lulu Zheng (Tsinghua University) · Jiarui Chen (Nanjing University) · Jianhao Wang (Tsinghua University) · Jiamin He (University of Alberta) · Yujing Hu (NetEase Fuxi AI Lab) · Yingfeng Chen (NetEase Fuxi AI Lab) · Changjie Fan (NetEase Fuxi AI Lab) · Yang Gao (Nanjing University) · Chongjie Zhang (Tsinghua University)
[81]. Explicable Reward Design for Reinforcement Learning Agents
Rati Devidze (MPI-SWS) · Goran Radanovic (Harvard) · Parameswaran Kamalaruban (EPFL) · Adish Singla (MPI-SWS)
[82]. RLlib Flow: Distributed Reinforcement Learning is a Dataflow Problem
Eric Liang (University of California Berkeley) · Zhanghao Wu (University of California Berkeley) · Michael Luo (University of California Berkeley) · Joseph Gonzalez (UC Berkeley) · Ion Stoica (University of California-Berkeley)
[83]. A Max-Min Entropy Framework for Reinforcement Learning
Seungyul Han (KAIST) · Youngchul Sung (Korea Advanced Institute of Science and Technology)
[84]. Dr Jekyll & Mr Hyde: the strange case of off-policy policy updates
Romain Laroche (Microsoft Research) · Remi Tachet des Combes (MSR Montreal)
[85]. Exponential Bellman Equation and Improved Regret Bounds for Risk-Sensitive Reinforcement Learning
Yingjie Fei (Cornell University) · Zhuoran Yang (Princeton) · Yudong Chen (Cornell University) · Zhaoran Wang (Princeton University)
[86]. Generalized Proximal Policy Optimization with Sample Reuse
James Queeney (Boston University) · Ioannis Paschalidis (Boston University) · Christos G Cassandras (Boston University)
[87]. Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs
Harsh Satija (McGill University) · Philip S. Thomas (CMU) · Joelle Pineau (McGill University) · Romain Laroche (Microsoft Research)
[88]. On the Theory of Reinforcement Learning with Once-per-Episode Feedback
Niladri Chatterji (UC Berkeley) · Aldo Pacchiano (Microsoft Research) · Peter Bartlett (UC Berkeley) · Michael Jordan (UC Berkeley)
[89]. Understanding End-to-End Model-Based Reinforcement Learning Methods as Implicit Parameterization
Clement Gehring (Massachusetts Institute of Technology) · Kenji Kawaguchi (MIT) · Leslie Kaelbling (MIT)
[90]. Model-Based Domain Generalization
Alexander Robey (University of Pennsylvania) · George J. Pappas (University of Pennsylvania) · Hamed Hassani (ETH Zurich)
[91]. Fair Algorithms for Multi-Agent Multi-Armed Bandits
Safwan Hossain (University of Toronto) · Evi Micha (University of Toronto) · Nisarg Shah (University of Toronto)
[92]. Offline RL Without Off-Policy Evaluation
David Brandfonbrener (New York University) · William Whitney (NYU) · Rajesh Ranganath (New York University) · Joan Bruna (NYU)
[93]. Offline Reinforcement Learning with Reverse Model-based Imagination
Jianhao Wang (Tsinghua University) · Wenzhe Li (Tsinghua University) · Haozhe Jiang (IIIS, Tsinghua University) · Guangxiang Zhu (Tsinghua University) · Siyuan Li (Tsinghua University) · Chongjie Zhang (Tsinghua University)
[94]. Online Robust Reinforcement Learning with Model Uncertainty
Yue Wang (State University of New York, Buffalo) · Shaofeng Zou (University at Buffalo, the State University of New York)
[95]. Landmark-Guided Subgoal Generation in Hierarchical Reinforcement Learning
Junsu Kim (KAIST) · Younggyo Seo (KAIST) · Jinwoo Shin (KAIST)
[96]. PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators
Anish Agarwal (MIT) · Abdullah Alomar (Massachusetts Institute of Technology) · Varkey Alumootil (Massachusetts Institute of Technology) · Devavrat Shah (Massachusetts Institute of Technology) · Dennis Shen (MIT) · Zhi Xu (MIT) · Cindy Yang (Massachusetts Institute of Technology)
[97]. Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism
Paria Rashidinejad (University of California, Berkeley) · Banghua Zhu (University of California Berkeley) · Cong Ma (University of California Berkeley) · Jiantao Jiao (University of California, Berkeley) · Stuart Russell (UC Berkeley)
[98]. Nearly Horizon-Free Offline Reinforcement Learning
Tongzheng Ren (UT Austin) · Jialian Li (Tsinghua University) · Bo Dai (Google Brain) · Simon Du (University of Washington) · Sujay Sanghavi (UT-Austin)
[99]. Learning Tree Interpretation from Object Representation for Deep Reinforcement Learning
Guiliang Liu (University of Waterloo) · Xiangyu Sun (Simon Fraser University) · Oliver Schulte (Simon Fraser University) · Pascal Poupart (University of Waterloo & Vector Institute)
[100]. Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings
Lili Chen (UC Berkeley) · Kimin Lee (UC Berkeley) · Aravind Srinivas (UC Berkeley) · Pieter Abbeel (UC Berkeley & Covariant)
[101]. Off-Policy Risk Assessment in Contextual Bandits
Audrey Huang (Carnegie Mellon University) · Leqi Liu (Carnegie Mellon University) · Zachary Lipton (Carnegie Mellon University) · Kamyar Azizzadenesheli (Purdue University)
[102]. Widening the Pipeline in Human-Guided Reinforcement Learning with Explanation and Context-Aware Data Augmentation
Lin Guan (Arizona State University) · Mudit Verma (Arizona State University) · Sihang Guo (University of Texas at Austin) · Ruohan Zhang (Stanford University) · Subbarao Kambhampati (Arizona State University)
[103]. Fault-Tolerant Federated Reinforcement Learning with Theoretical Guarantee
Xiaofeng Fan (National University of Singapore) · Yining Ma (National University of Singapore) · Zhongxiang Dai (National University of Singapore) · Wei Jing (Alibaba Group) · Cheston Tan (Institute for Infocomm Research, Singapore) · Bryan Kian Hsiang Low (National University of Singapore)
[104]. Distributional Reinforcement Learning for Multi-Dimensional Reward Functions
Pushi Zhang (Tsinghua University) · Xiaoyu Chen (Tsinghua University) · Li Zhao (Microsoft Research) · Wei Xiong (Hong Kong University of Science and Technology) · Tao Qin (Microsoft Research) · Tie-Yan Liu (Microsoft Research)
[105]. Risk-Aware Transfer in Reinforcement Learning using Successor Features
Michael Gimelfarb (University of Toronto) · Andre Barreto (DeepMind) · Scott Sanner (NICTA--ANU) · Chi-Guhn Lee (University of Toronto)
[106]. Twice regularized MDPs and the equivalence between robustness and regularization
Esther Derman (None) · Matthieu Geist (Université de Lorraine) · Shie Mannor (Technion)
[107]. Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble
Gaon An (Seoul National University) · Seungyong Moon (Seoul National University) · Jang-Hyun Kim (Seoul National University) · Hyun Oh Song (Seoul National University)
[108]. TacticZero: Learning to Prove Theorems from Scratch with Deep Reinforcement Learning
Minchao Wu (Australian National University) · Michael Norrish (CSIRO) · Christian Walder (DATA61) · Amir Dezfouli (Data61, CSIRO)
[109]. Co-Adaptation of Algorithmic and Implementational Innovations in Inference-based Deep Reinforcement Learning
Hiroki Furuta (The University of Tokyo) · Tadashi Kozuno (University of Alberta) · Tatsuya Matsushima (The University of Tokyo) · Yutaka Matsuo (University of Tokyo) · Shixiang (Shane) Gu (Google Brain, University of Cambridge)
[110]. On Blame Attribution for Accountable Multi-Agent Sequential Decision Making
Stelios Triantafyllou (Max Planck Institute for Software Systems) · Adish Singla (MPI-SWS) · Goran Radanovic (Harvard)
[111]. Multi-Agent Reinforcement Learning in Stochastic Networked Systems
Yiheng Lin (California Institute of Technology) · Guannan Qu (California Institute of Technology) · Longbo Huang (IIIS, Tsinghua University) · Adam Wierman (Caltech)
[112]. Towards Understanding Cooperative Multi-Agent Q-Learning with Value Factorization
Jianhao Wang (Tsinghua University) · Zhizhou Ren (Tsinghua University) · Beining Han (Tsinghua University) · Jianing Ye (Tsinghua University) · Chongjie Zhang (Tsinghua University)
[113]. On the Estimation Bias in Double Q-Learning
Zhizhou Ren (Tsinghua University) · Guangxiang Zhu (Tsinghua University) · Hao Hu (Tsinghua University) · Beining Han (Tsinghua University) · Jianglun Chen (Tsinghua University) · Chongjie Zhang (Tsinghua University)
[114]. Settling the Variance of Multi-Agent Policy Gradients
Jakub Kuba (Huawei Technologies Ltd.) · Muning Wen (Shanghai Jiao Tong University) · Linghui Meng (Institute of Automation, Chinese Academy of Sciences) · Shangding Gu (Technical University of Munich) · Haifeng Zhang (Institute of Automation, Chinese Academy of Sciences) · David Mguni (PROWLER.io) · Jun Wang (University College London) · Yaodong Yang (University College London)
[115]. Reward is enough for convex MDPs
Tom Zahavy (DeepMind) · Brendan O'Donoghue (DeepMind) · Guillaume Desjardins (DeepMind) · Satinder Singh (DeepMind)
[116]. Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods
Seohong Park (Seoul National University) · Jaekyeom Kim (Seoul National University) · Gunhee Kim (Seoul National University / RippleAI)
[117]. There Is No Turning Back: A Self-Supervised Approach for Reversibility-Aware Reinforcement Learning
Nathan Grinsztajn (Inria) · Johan Ferret (Google Brain / Inria Scool) · Olivier Pietquin (Google Brain) · Philippe Preux (Inria) · Matthieu Geist (Université de Lorraine)
[118]. Mirror Langevin Monte Carlo: the Case Under Isoperimetry
Qijia Jiang (Stanford University)
[119]. Closing the loop in medical decision support by understanding clinical decision-making: A case study on organ transplantation
Yuchao Qin (Cambridge University) · Fergus Imrie (University of California, Los Angeles) · Alihan Hüyük (University of Cambridge) · Daniel Jarrett (University of Cambridge) · Alexander Gimson (Cambridge University Hospitals) · Mihaela van der Schaar (University of Cambridge)
[120]. PettingZoo: Gym for Multi-Agent Reinforcement Learning
Justin K Terry (University of Maryland College Park (SSO)) · Benjamin Black (University of Maryland) · Nathaniel Grammel (University of Maryland, College Park) · Mario Jayakumar (None) · Ananth Hari (None) · Ryan Sullivan (University of Maryland) · Luis S Santos (University of Maryland, College Park) · Clemens Dieffendahl (Technical University Berlin) · Caroline Horsch (University of Maryland) · Rodrigo Perez-Vicente (University of Maryland, College Park) · Niall Williams (None) · Yashas Lokesh (None) · Praveen Ravi (None)
[121]. Provably Efficient Reinforcement Learning with Linear Function Approximation under Adaptivity Constraints
Tianhao Wang (Yale University) · Dongruo Zhou (UCLA) · Quanquan Gu (UCLA)
[122]. RL for Latent MDPs: Regret Guarantees and a Lower Bound
Jeongyeol Kwon (University of Texas, Austin) · Yonathan Efroni (Microsoft Research, New York) · Constantine Caramanis (UT Austin) · Shie Mannor (Technion)
[123]. Mastering Atari Games with Limited Data
Weirui Ye (Tsinghua University) · Shaohuai Liu (Tsinghua University) · Thanard Kurutach (University of California Berkeley) · Pieter Abbeel (UC Berkeley & Covariant) · Yang Gao (Tsinghua University)
[124]. Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning
Jingfeng Wu (Johns Hopkins University) · Vladimir Braverman (Johns Hopkins University) · Lin Yang (UCLA)
[125]. The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition
Tiancheng Jin (University of Southern California) · Longbo Huang (IIIS, Tsinghua University) · Haipeng Luo (University of Southern California)
[126]. Variational Bayesian Reinforcement Learning with Regret Bounds
Brendan O'Donoghue (DeepMind)
[127]. Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning
Christoph Dann (Google Research) · Teodor Vanislavov Marinov (Johns Hopkins University) · Mehryar Mohri (Google Research & Courant Institute of Mathematical Sciences) · Julian Zimmert (University of Copenhagen)
[128]. How Well do Feature Visualizations Support Causal Understanding of CNN Activations?
Roland S. Zimmermann (University of Tübingen, International Max Planck Research School for Intelligent Systems) · Judy Borowski (University of Tuebingen) · Robert Geirhos (University of Tübingen) · Matthias Bethge (University of Tübingen) · Thomas Wallis (TU Darmstadt) · Wieland Brendel (AG Bethge, University of Tübingen)
[129]. Support vector machines and linear regression coincide with very high-dimensional features
Navid Ardeshir (Columbia University) · Clayton Sanford (Columbia University) · Daniel Hsu (Columbia University)
[130]. Structural Credit Assignment in Neural Networks using Reinforcement Learning
Dhawal Gupta (University of Alberta) · Gabor Mihucz (University of Alberta) · Matthew Schlegel (University of Alberta) · James Kostas (University of Massachusetts, Amherst) · Philip S. Thomas (CMU) · Martha White ()
[131]. COMBO: Conservative Offline Model-Based Policy Optimization
Tianhe Yu (Stanford University) · Aviral Kumar (UC Berkeley) · Rafael Rafailov (Stanford University) · Aravind Rajeswaran (University of Washington) · Sergey Levine (University of Washington) · Chelsea Finn (Stanford University)
[132]. Decentralized Q-learning in Zero-sum Markov Games
Muhammed Sayin (Massachusetts Institute of Technology) · Kaiqing Zhang (University of Illinois at Urbana-Champaign (UIUC)) · David Leslie (Lancaster University) · Tamer Basar (University of Illinois at Urbana-Champaign) · Asuman Ozdaglar (Massachusetts Institute of Technology)
[133]. Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning
Andrea Zanette (Stanford University) · Martin J Wainwright (UC Berkeley) · Emma Brunskill (Stanford University)
[134]. Uniform-PAC Bounds for Reinforcement Learning with Linear Function Approximation
Jiafan He (University of California, Los Angeles) · Dongruo Zhou (UCLA) · Quanquan Gu (UCLA)
[135]. Adaptive Ensemble Q-learning: Minimizing Estimation Bias via Error Feedback
Hang Wang (Arizona State University) · Sen Lin (Arizona State University) · Junshan Zhang (Arizona State University)
[136]. Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation
Nicklas Hansen (UC San Diego) · Hao Su (Stanford) · Xiaolong Wang (UC San Diego)
[137]. Learning to Ground Multi-Agent Communication with Autoencoders
Toru Lin (Massachusetts Institute of Technology) · Jacob MY Huh (UC Berkeley) · Christopher Stauffer (Facebook) · Ser Nam Lim (Facebook AI) · Phillip Isola (Massachusetts Institute of Technology)
[138]. Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning
Gen Li (Tsinghua University) · Laixi Shi (Carnegie Mellon University) · Yuxin Chen (Princeton University) · Yuantao Gu (Tsinghua University) · Yuejie Chi (Carnegie Mellon University)
[139]. Scalable Online Planning via Reinforcement Learning Fine-Tuning
Arnaud Fickinger (UC Berkeley) · Hengyuan Hu (Carnegie Mellon University) · Brandon Amos (Carnegie Mellon University) · Stuart Russell (UC Berkeley) · Noam Brown (Facebook AI Research)
[140]. Local Differential Privacy for Regret Minimization in Reinforcement Learning
Evrard Garcelon (Facebook AI Research) · Vianney Perchet (ENS Paris-Saclay & Criteo Research) · Ciara Pike-Burke (Imperial College London) · Matteo Pirotta (Facebook AI Research)
[141]. MAP Propagation Algorithm: Faster Learning with a Team of Reinforcement Learning Agents
Ming Hay Chung (University of Massachusetts Amherst)
[142]. Reinforcement Learning in Newcomblike Environments
James Bell (The Alan Turing Institute) · Linda Linsefors (Laboratory of Subatomic Physics & Cosmology) · Caspar Oesterheld (Department of Computer Science, Duke University) · Joar Skalse (University of Oxford)
[143]. A Provably Efficient Sample Collection Strategy for Reinforcement Learning
Jean Tarbouriech (Facebook AI Research & Inria) · Matteo Pirotta (Facebook AI Research) · Michal Valko (DeepMind Paris / Inria / ENS Paris-Saclay) · Alessandro Lazaric (INRIA)
[144]. Learning in Non-Cooperative Configurable Markov Decision Processes
Giorgia Ramponi (ETH Zurich) · Alberto Maria Metelli (Politecnico di Milano) · Alessandro Concetti (Politecnico di Milano) · Marcello Restelli (Politecnico di Milano)
[145]. Pessimism Meets Invariance: Provably Efficient Offline Mean-Field Multi-Agent RL
Minshuo Chen (Georgia Tech) · Yan Li (Georgia Institute of Technology) · Ethan Wang (Georgia Institute of Technology) · Zhuoran Yang (Princeton) · Zhaoran Wang (Princeton University) · Tuo Zhao (Georgia Tech)
[146]. Robust Inverse Reinforcement Learning under Transition Dynamics Mismatch
Luca Viano (EPFL) · Yu-Ting Huang (EPFL) · Parameswaran Kamalaruban (EPFL) · Adrian Weller (University of Cambridge) · Volkan Cevher (EPFL)
[147]. Heuristic-Guided Reinforcement Learning
Ching-An Cheng (Georgia Tech) · Andrey Kolobov (Microsoft Research) · Adith Swaminathan (Microsoft Research)
[148]. Deep Reinforcement Learning at the Edge of the Statistical Precipice
Rishabh Agarwal (IIT Bombay) · Max Schwarzer (Mila, Université de Montréal) · Pablo Samuel Castro (Google) · Aaron Courville (U. Montreal) · Marc Bellemare (Google Brain)
[149]. Improving Anytime Prediction with Parallel Cascaded Networks and a Temporal-Difference Loss
Michael Iuzzolino (University of Colorado Boulder) · Michael Mozer (Google Research / University of Colorado) · Samy Bengio (Apple)
[150]. Safe Policy Optimization with Local Generalized Linear Function Approximations
Akifumi Wachi (IBM Research AI) · Yunyue Wei (Tsinghua University) · Yanan Sui (California Institute of Technology)
[151]. Autonomous Reinforcement Learning via Subgoal Curricula
Archit Sharma (Google) · Abhishek Gupta (University of California, Berkeley) · Sergey Levine (University of Washington) · Karol Hausman (Google Brain) · Chelsea Finn (Stanford University)
[152]. Symbolic Regression via Deep Reinforcement Learning Enhanced Genetic Programming Seeding
Terrell Mundhenk (Lawrence Livermore National Labs) · Mikel Landajuela (Lawrence Livermore National Labs) · Ruben Glatt (Lawrence Livermore National Laboratory) · Claudio P Santiago (Lawrence Livermore National Laboratory) · Daniel Faissol (Lawrence Livermore National Labs) · Brenden K Petersen (Lawrence Livermore National Laboratory)
[153]. MobILE: Model-Based Imitation Learning From Observation Alone
Rahul Kidambi (Amazon Search & AI) · Jonathan Chang (Cornell University) · Wen Sun (Cornell University)
[154]. Support Recovery of Sparse Signals from a Mixture of Linear Measurements
Soumyabrata Pal (University of Massachusetts Amherst) · Arya Mazumdar (University of Massachusetts Amherst) · Venkata Gandikota (Syracuse University)
[155]. Decision Transformer: Reinforcement Learning via Sequence Modeling
Lili Chen (UC Berkeley) · Kevin Lu (UC Berkeley) · Aravind Rajeswaran (University of Washington) · Kimin Lee (UC Berkeley) · Aditya Grover (University of California, Los Angeles) · Misha Laskin (UC Berkeley) · Pieter Abbeel (UC Berkeley & Covariant) · Aravind Srinivas (UC Berkeley) · Igor Mordatch (University of Washington)
[156]. Minibatch and Momentum Model-based Methods for Stochastic Weakly Convex Optimization
(None) · Wenzhi Gao (Shanghai University of Finance and Economics)
[157]. Near-Optimal Offline Reinforcement Learning via Double Variance Reduction
Ming Yin (UC Santa Barbara) · Yu Bai (Salesforce Research) · Yu-Xiang Wang (UC Santa Barbara)
[158]. On sensitivity of meta-learning to support data
Mayank Agarwal (IBM Research AI, MIT-IBM Watson AI Lab) · Mikhail Yurochkin (IBM Research, MIT-IBM Watson AI Lab) · Yuekai Sun (University of Michigan)
[159]. Conservative Data Sharing for Multi-Task Offline Reinforcement Learning
Tianhe Yu (Stanford University) · Aviral Kumar (UC Berkeley) · Yevgen Chebotar (University of Southern California) · Karol Hausman (Google Brain) · Sergey Levine (University of Washington) · Chelsea Finn (Stanford University)
[160]. On the Convergence Theory of Debiased Model-Agnostic Meta-Reinforcement Learning
Alireza Fallah (MIT) · Kristian Georgiev (MIT) · Aryan Mokhtari (UT Austin) · Asuman Ozdaglar (Massachusetts Institute of Technology)
[161]. Accelerating Robotic Reinforcement Learning via Parameterized Action Primitives
Murtaza Dalal (Carnegie Mellon University) · Deepak Pathak (Carnegie Mellon University) · Russ Salakhutdinov (Carnegie Mellon University)
[162]. Reinforcement Learning with Latent Flow
Wenling Shang (University of Amsterdam) · Xiaofei Wang (UC Berkeley) · Aravind Srinivas (UC Berkeley) · Aravind Rajeswaran (University of Washington) · Yang Gao (Tsinghua University) · Pieter Abbeel (UC Berkeley & Covariant) · Misha Laskin (UC Berkeley)
[163]. Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection
Matteo Papini (Politecnico di Milano) · Andrea Tirinzoni (Politecnico di Milano) · Aldo Pacchiano (Microsoft Research) · Marcello Restelli (Politecnico di Milano) · Alessandro Lazaric (INRIA) · Matteo Pirotta (Facebook AI Research)
[164]. Online Knapsack with Frequency Predictions
Sungjin Im (UC Merced) · Ravi Kumar (Google) · Mahshid Montazer Qaem (University of California at Merced) · Manish Purohit (Google)
[165]. When Is Generalizable Reinforcement Learning Tractable?
Dhruv Malik (Carnegie Mellon University) · Yuanzhi Li (CMU) · Pradeep Ravikumar (Carnegie Mellon University)
[166]. Monte Carlo Tree Search With Iteratively Refining State Abstractions
Samuel Sokota (University of Alberta) · Caleb Y Ho (Facebook) · Zaheen Ahmad (University of Alberta) · J. Zico Kolter (Carnegie Mellon University / Bosch Center for A)
[167]. Optimization-Based Algebraic Multigrid Coarsening Using Reinforcement Learning
Ali Taghibakhshi (University of Illinois at Urbana Champaign) · Scott MacLachlan (Memorial University of Newfoundland) · Luke Olson (University of Illinois, Urbana Champaign) · Matthew West (University of Illinois, Urbana Champaign)
[168]. Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings
Ming Yin (UC Santa Barbara) · Yu-Xiang Wang (UC Santa Barbara)
[169]. Reinforcement Learning based Disease Progression Model for Alzheimer’s Disease
Krishnakant Saboo (University of Illinois Urbana Champaign) · Anirudh Choudhary (University of Illinois, Urbana-Champaign) · Gregory Worrell (Mayo Clinic, Rochester) · Ravishankar Iyer ()
[170]. Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses
Haipeng Luo (University of Southern California) · Chen-Yu Wei (University of Southern California) · Chung-Wei Lee (University of Southern California)
[171]. Taming Communication and Sample Complexities in Decentralized Policy Evaluation for Cooperative Multi-Agent Reinforcement Learning
Xin Zhang (Facebook) · Zhuqing Liu (Ohio State University) · Jia Liu (The Ohio State University) · Zhengyuan Zhu (Iowa State University) · Songtao Lu (IBM)
[172]. Bellman-consistent Pessimism for Offline Reinforcement Learning
Tengyang Xie (University of Illinois at Urbana-Champaign) · Ching-An Cheng (Georgia Tech) · Nan Jiang (University of Illinois at Urbana-Champaign) · Paul Mineiro (Microsoft) · Alekh Agarwal (Microsoft Research)
[173]. Reinforcement Learning Enhanced Explainer for Graph Neural Networks
Caihua Shan (Microsoft) · Yifei Shen (HKUST) · Yao Zhang (Fudan University) · Xiang Li (East China Normal University) · Dongsheng Li (IBM Research - China)
[174]. Learning Markov State Abstractions for Deep Reinforcement Learning
Cameron Allen (Brown University) · Neev Parikh (Brown University) · Omer Gottesman (Harvard University) · George Konidaris (Duke University)
[175]. Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Making by Reinforcement Learning
Kai Wang (Harvard University) · Sanket Shah (Harvard University) · Haipeng Chen (Dartmouth College) · Andrew Perrault (Harvard University) · Finale Doshi-Velez (Harvard) · Milind Tambe (Harvard University/Google Research India)
[176]. Provably efficient multi-task reinforcement learning with model transfer
Chicheng Zhang (University of Arizona) · Zhi Wang (University of California San Diego)
[177]. An Exponential Lower Bound for Linearly Realizable MDP with Constant Suboptimality Gap
Yuanhao Wang (Tsinghua University) · Ruosong Wang (Carnegie Mellon University) · Sham Kakade (University of Washington & Microsoft Research)
[178]. Contrastive Reinforcement Learning of Symbolic Reasoning Domains
Gabriel Poesia (Stanford University) · WenXin Dong (Stanford University) · Noah Goodman (Stanford University)
[179]. Offline Constrained Multi-Objective Reinforcement Learning via Pessimistic Dual Value Iteration
Runzhe Wu (Shanghai Jiao Tong University) · Yufeng Zhang (Northwestern University) · Zhuoran Yang (Princeton) · Zhaoran Wang (Princeton University)
[180]. Learning to Simulate Self-driven Particles System with Coordinated Policy Optimization
Zhenghao Peng (The Chinese University of Hong Kong) · Quanyi Li (CUHK) · Ka Ming Hui (The Chinese University of Hong Kong) · Chunxiao Liu (Sensetime Research) · Bolei Zhou (Massachusetts Institute of Technology)
[181]. Adversarial Intrinsic Motivation for Reinforcement Learning
Ishan Durugkar (University of Texas at Austin) · Mauricio B Tec (University of Texas at Austin) · Scott Niekum (UT Austin) · Peter Stone (The University of Texas at Austin, Sony AI)
[182]. Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning
Tengyang Xie (University of Illinois at Urbana-Champaign) · Nan Jiang (University of Illinois at Urbana-Champaign) · Huan Wang (Salesforce Research) · Caiming Xiong (State University of New York at Buffalo) · Yu Bai (Salesforce Research)
[183]. Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-RL
Charles Packer (University of California Berkeley) · Pieter Abbeel (UC Berkeley & Covariant) · Joseph Gonzalez (UC Berkeley)
[184]. Damped Anderson Mixing for Deep Reinforcement Learning: Acceleration, Convergence, and Stabilization
Ke Sun (University of Alberta) · Yafei Wang (University of Alberta) · Yi Liu (University of Alberta) · (None) · Bo Pan (University of Alberta) · Shangling Jui (Huawei) · Bei Jiang (University of Alberta) · Linglong Kong (University of Alberta)
[185]. Parametrized Quantum Policies for Reinforcement Learning
Sofiene Jerbi (University of Innsbruck) · Casper Gyurik (Leiden University) · Simon Marshall (Leiden University) · Hans Briegel (Universität Innsbruck) · Vedran Dunjko (Leiden University)
[186]. Teachable Reinforcement Learning via Advice Distillation
Olivia Watkins (UC Berkeley) · Abhishek Gupta (University of California, Berkeley) · Trevor Darrell (Electrical Engineering & Computer Science Department) · Pieter Abbeel (UC Berkeley & Covariant) · Jacob Andreas (UC Berkeley)
[187]. Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs
Jiafan He (University of California, Los Angeles) · Dongruo Zhou (UCLA) · Quanquan Gu (UCLA)
[188]. Offline Reinforcement Learning as One Big Sequence Modeling Problem
Michael Janner (UC Berkeley) · Qiyang Li (University of California, Berkeley) · Sergey Levine (University of Washington)
[189]. Agent Modelling under Partial Observability for Deep Reinforcement Learning
Georgios Papoudakis (University of Edinburgh) · Filippos Christianos (University of Edinburgh) · Stefano Albrecht (University of Edinburgh)
[190]. Brick-by-Brick: Combinatorial Construction with Deep Reinforcement Learning
Hyunsoo Chung (POSTECH) · (None) · Boris Knyazev (University of Guelph / Vector Institute) · Jinhwi Lee (POSTECH) · Graham Taylor (University of Guelph / Vector Institute) · Jaesik Park (POSTECH) · Minsu Cho (POSTECH)
[191]. Towards Instance-Optimal Offline Reinforcement Learning with Pessimism
Ming Yin (UC Santa Barbara) · Yu-Xiang Wang (UC Santa Barbara)
[192]. Variational Automatic Curriculum Learning for Sparse-Reward Cooperative Multi-Agent Problems
Jiayu Chen (Tsinghua University) · Yuanxin Zhang (Tsinghua University) · Yuanfan Xu (Tsinghua University) · Huimin Ma (Tsinghua University) · Huazhong Yang (None) · Jiaming Song (Stanford University) · Yu Wang (Tsinghua University) · Yi Wu (OpenAI)
[193]. VAST: Value Function Factorization with Variable Agent Sub-Teams
Thomy Phan (LMU Munich) · Fabian Ritz (LMU Munich) · Lenz Belzner (Technische Hochschule Ingolstadt) · Philipp Altmann (LMU Munich) · Thomas Gabor (Institut für Informatik) · Claudia Linnhoff-Popien (LMU Munich)
[194]. GRIN: Generative Relation and Intention Network for Multi-agent Trajectory Prediction
Longyuan Li (Shanghai Jiao Tong University) · Jian Yao (Fudan University) · Li Wenliang (University College London) · Tong He (Amazon Web Services) · Tianjun Xiao (Amazon) · Junchi Yan (Shanghai Jiao Tong University) · David P Wipf (AWS) · Zheng Zhang (Shanghai New York University)
[195]. Dynamic population-based meta-learning for multi-agent communication with natural language
Abhinav Gupta (Mila) · Marc Lanctot (DeepMind) · Angeliki Lazaridou (DeepMind)
[196]. Information Directed Reward Learning for Reinforcement Learning
David Lindner (ETH Zurich) · Matteo Turchetta (ETH Zurich) · Sebastian Tschiatschek (Microsoft Research) · Kamil Ciosek (Microsoft Research Cambridge) · Andreas Krause (ETH Zurich)
[197]. Functional Regularization for Reinforcement Learning via Learned Fourier Features
Alexander Li (Carnegie Mellon University) · Deepak Pathak (Carnegie Mellon University)
[198]. Reinforcement Learning in Reward-Mixing MDPs
Jeongyeol Kwon (University of Texas, Austin) · Yonathan Efroni (Microsoft Research, New York) · Constantine Caramanis (UT Austin) · Shie Mannor (Technion)
[199]. Identifiability in inverse reinforcement learning
Haoyang Cao (The Alan Turing Institute) · Samuel Cohen (University of Oxford / Alan Turing Institute) · Lukasz Szpruch (University of Edinburgh)
[200]. Play to Grade: Testing Coding Games as Classifying Markov Decision Process
Allen Nie (Stanford University) · Emma Brunskill (Stanford University) · Chris Piech (Stanford)
[201]. A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning
Christoph Dann (Google Research) · Mehryar Mohri (Google Research & Courant Institute of Mathematical Sciences) · Tong Zhang (The Hong Kong University of Science and Technology) · Julian Zimmert (University of Copenhagen)
[202]. Exploiting Opponents Under Utility Constraints in Sequential Games
Martino Bernasconi-de-Luca (Politecnico di Milano) · Federico Cacciamani (Politecnico di Milano) · Simone Fioravanti (Gran Sasso Science Institute (GSSI)) · Nicola Gatti (Politecnico di Milano) · Alberto Marchesi (Politecnico di Milano) · Francesco Trovò (Politecnico di Milano)
[203]. Reinforcement learning for optimization of variational quantum circuit architectures
Mateusz Ostaszewski (Institute of Theoretical and Applied Informatics, Polish Academy of Sciences) · Lea M. Trenkwalder (University of Innsbruck) · Wojciech Masarczyk (Warsaw University of Technology) · Eleanor Scerri (Leiden University) · Vedran Dunjko (Leiden University)
[204]. Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations
Ayush Sekhari (Cornell University) · Christoph Dann (Google Research) · Mehryar Mohri (Google Research & Courant Institute of Mathematical Sciences) · Yishay Mansour (Tel Aviv University) · Karthik Sridharan (Cornell)
[205]. Learning Domain Invariant Representations in Goal-conditioned Block MDPs
Beining Han (Tsinghua University) · Chongyi Zheng (CMU, Carnegie Mellon University) · Harris Chan (University of Toronto, Vector Institute) · Keiran Paster (University of Toronto) · Michael Zhang (University of Toronto / Vector Institute) · Jimmy Ba (University of Toronto / Vector Institute)
[206]. Exploration-Exploitation in Multi-Agent Competition: Convergence with Bounded Rationality
Stefanos Leonardos (Singapore University of Technology and Design) · Georgios Piliouras (Singapore University of Technology and Design) · Kelly Spendlove (University of Oxford)
[207]. Entropy-based adaptive Hamiltonian Monte Carlo
Marcel Hirt (University College London) · Michalis Titsias (DeepMind) · Petros Dellaportas (University College London and Athens University of Economics)
[208]. Causal Influence Detection for Improving Efficiency in Reinforcement Learning
Maximilian Seitzer (Max Planck Institute for Intelligent Systems, Max-Planck Institute) · Bernhard Schölkopf (MPI for Biological Cybernetics) · Georg Martius (IST Austria)
[209]. Learning Distilled Collaboration Graph for Multi-Agent Perception
Yiming Li (New York University) · Shunli Ren (Shanghai Jiao Tong University) · Pengxiang Wu (Rutgers University) · Siheng Chen (MERL) · Chen Feng (Mitsubishi Electric Research Laboratories (MERL)) · Wenjun Zhang (None)
[210]. Near Optimal Policy Optimization via REPS
Aldo Pacchiano (Microsoft Research) · Jonathan Lee (Google) · Peter Bartlett (UC Berkeley) · Ofir Nachum (Google Brain)
[211]. Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation
Yunhao Tang (Columbia University) · Tadashi Kozuno (University of Alberta) · Mark Rowland (DeepMind) · Remi Munos (DeepMind) · Michal Valko (DeepMind Paris / Inria / ENS Paris-Saclay)
[212]. Hierarchical Reinforcement Learning with Timed Subgoals
Nico Gürtler (Max Planck Institute for Intelligent Systems, Max-Planck Institute) · Dieter Büchler (Max-Planck Institute for Intelligent Systems) · Georg Martius (IST Austria)
[213]. A Law of Iterated Logarithm for Multi-Agent Reinforcement Learning
Gugan Chandrashekhar Thoppe (Indian Institute of Science) · Bhumesh Kumar (University of Wisconsin, Madison)
[214]. Wasserstein Flow Meets Replicator Dynamics: A Mean-Field Analysis of Representation Learning in Actor-Critic
Yufeng Zhang (Northwestern University) · Siyu Chen (Tsinghua University) · Zhuoran Yang (Princeton) · Michael Jordan (UC Berkeley) · Zhaoran Wang (Princeton University)
[215]. Learning Policies with Zero or Bounded Constraint Violation for Constrained MDPs
Tao Liu (Texas A&M University) · Ruida Zhou (Texas A&M) · Dileep Kalathil (Texas A&M University) · Panganamala Kumar (Texas A&M) · Chao Tian (Texas A&M)
[216]. SOPE: Spectrum of Off-Policy Estimators
Christina Yuan (University of Texas, Austin) · Yash Chandak (University of Massachusetts Amherst) · Stephen Giguere (UMass Amherst) · Philip S. Thomas (CMU) · Scott Niekum (UT Austin)
[217]. Online learning in MDPs with linear function approximation and bandit feedback.
Gergely Neu (Universitat Pompeu Fabra) · Julia Olkhovskaya (Universitat Pompeu Fabra)
[218]. EDGE: Explaining Deep Reinforcement Learning Policies
Wenbo Guo (Pennsylvania State University) · Xian Wu (Pennsylvania State University) · Usmann Khan (Georgia Institute of Technology) · Xinyu Xing (Penn State University)
[219]. Understanding the Effect of Stochasticity in Policy Optimization
Jincheng Mei (University of Alberta / Google Brain) · Bo Dai (Google Brain) · Chenjun Xiao (University of Alberta) · Csaba Szepesvari (DeepMind / University of Alberta) · Dale Schuurmans (Google Brain & University of Alberta)
[220]. Safe Reinforcement Learning by Imagining the Near Future
Garrett Thomas (UC Berkeley) · Yuping Luo (Princeton University) · Tengyu Ma (Stanford University)
[221]. Environment Generation for Zero-Shot Compositional Reinforcement Learning
Izzeddin Gur (Google) · Natasha Jaques (Google Brain, UC Berkeley) · Yingjie Miao (Google) · Jongwook Choi (University of Michigan) · Manoj Tiwari (None) · Honglak Lee (U. Michigan) · Aleksandra Faust (Google Brain)
[222]. Weighted model estimation for offline model-based reinforcement learning
Toru Hishinuma (Proxima Technology Inc.) · Kei Senda (Kyoto University)
[223]. Reinforcement Learning with State Observation Costs in Action-Contingent Noiselessly Observable Markov Decision Processes
HyunJi Nam (Stanford University) · Scott Fleming (Stanford University) · Emma Brunskill (Stanford University)
[224]. Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model
Bingyan Wang (Princeton University) · Yuling Yan (Princeton University) · Jianqing Fan (Princeton University)
[225]. KALE Flow: A Relaxed KL Gradient Flow for Probabilities with Disjoint Support
Pierre Glaser (University College London) · Michael Arbel (UCL) · Arthur Gretton (Gatsby Unit, UCL)
[226]. Successor Feature Landmarks for Long-Horizon Goal-Conditioned Reinforcement Learning
Christopher Hoang (University of Michigan) · Sungryull Sohn (University of Michigan) · Jongwook Choi (University of Michigan) · Wilka Carvalho (University of Michigan) · Honglak Lee (U. Michigan)
[227]. Control Variates for Slate Off-Policy Evaluation
Nikos Vlassis (Netflix) · Ashok Chandrashekar (Warner Media) · Fernando Amat (Netflix) · Nathan Kallus (Cornell University)
Feel free to discuss and share your thoughts in the comments (or in a separate thread)!