The workshop recording is now available on SlidesLive.
Schedule
The workshop is preceded by the conference’s first invited talk.
9:45–9:50 | Opening remarks
9:50–10:20 | Invited talk #1: Pieter Abbeel, “Model-based reinforcement learning via meta-model-free reinforcement learning”
10:20–10:30 | Contributed talk #1: Kate Rakelly & Aurick Zhou, “Efficient off-policy meta-RL via probabilistic context variables”
10:30–11:00 | Poster session #1 (during the morning coffee break)
11:00–11:30 | Invited talk #2: Matt Botvinick, “Meta-reinforcement learning: Quo vadis?”
11:30–12:00 | Invited talk #3: Katja Hofmann, “Directions and challenges in multi-task reinforcement learning”
12:00–12:30 | Invited talk #4: Tejas Kulkarni, “Self-supervised object-centric representations for reinforcement learning”
12:30–1:00 | Invited talk #5: Tim Lillicrap, “Learning models for representations and planning”
The workshop will recommence after lunch and the conference’s second invited talk.
3:20–3:50 | Invited talk #6: Karthik Narasimhan, “Task-agnostic priors for reinforcement learning”
3:50–4:00 | Contributed talk #2: Ben Eysenbach, Lisa Lee & Jacob Tyo, “Priors for exploration and robustness”
4:00–4:30 | Poster session #2 (during the afternoon coffee break)
4:30–5:00 | Invited talk #7: Doina Precup, “Inductive biases for temporal abstraction”
5:00–5:30 | Invited talk #8: Jane Wang, “Learning and development of structured, causal priors”
5:30–6:30 | Panel discussion: Matt Botvinick, Tejas Kulkarni, Sergey Levine, Tim Lillicrap, Karthik Narasimhan, Doina Precup, Jane Wang
The workshop is followed by the conference’s opening reception and the newcomers’ reception.
Program
Invited Speakers
Pieter Abbeel (UC Berkeley), “Model-based RL via meta-model-free RL”
Model-free reinforcement learning (RL) has seen great asymptotic successes, but its sample complexity tends to be high. Model-based RL carries the promise of better sample efficiency, and indeed has shown more data-efficient learning, but tends to fall well short of model-free RL in terms of asymptotic performance. In this presentation, I will describe a new approach to model-based RL that brings in ideas from domain randomization and meta-model-free RL, resulting in the best of both worlds: fast learning and great asymptotic performance. Our method is evaluated on several MuJoCo environments (PR2 Reacher, Swimmer, Hopper, Ant, Walker) and is able to learn Lego-block placement on a real robot in 10 minutes.
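To make the “meta-model-free” idea above concrete, here is a minimal NumPy sketch of one way such a pipeline can be organized: fit a bootstrapped ensemble of dynamics models from real transitions, treat each model as a task, and meta-train a policy parameter so that a few adaptation steps inside any ensemble member yield low cost. The 1-D linear system, quadratic cost, linear policy, and finite-difference gradients are all simplifying assumptions made for illustration; this is not the speaker’s implementation.

```python
# Toy sketch: ensemble of learned dynamics models as a task distribution
# for meta-training a quickly adaptable policy (illustrative assumptions only).
import numpy as np

rng = np.random.default_rng(0)
A_TRUE, B_TRUE, NOISE = 0.9, 0.5, 0.05      # unknown true dynamics: x' = a x + b u + eps

def rollout_cost(a, b, k, x0=1.0, horizon=20):
    """Quadratic cost of the linear policy u = -k * x under model (a, b)."""
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -k * x
        cost += x ** 2 + 0.1 * u ** 2
        x = a * x + b * u
    return cost

# 1) Collect real transitions with random actions and fit a bootstrapped ensemble.
X = rng.uniform(-1, 1, 200)
U = rng.uniform(-1, 1, 200)
Xn = A_TRUE * X + B_TRUE * U + NOISE * rng.standard_normal(200)
ensemble = []
for _ in range(5):
    idx = rng.integers(0, 200, 200)                      # bootstrap resample
    Phi = np.stack([X[idx], U[idx]], axis=1)
    a_hat, b_hat = np.linalg.lstsq(Phi, Xn[idx], rcond=None)[0]
    ensemble.append((a_hat, b_hat))

# 2) Meta-train the policy gain k0 so that a few adaptation steps inside any
#    ensemble member give low cost (gradients via finite differences).
def adapt(k, a, b, steps=3, lr=0.02, eps=1e-3):
    for _ in range(steps):
        g = (rollout_cost(a, b, k + eps) - rollout_cost(a, b, k - eps)) / (2 * eps)
        k -= lr * g
    return k

def meta_objective(k0):
    return np.mean([rollout_cost(a, b, adapt(k0, a, b)) for a, b in ensemble])

k0, eps, lr = 0.0, 1e-3, 0.02
for _ in range(50):
    g = (meta_objective(k0 + eps) - meta_objective(k0 - eps)) / (2 * eps)
    k0 -= lr * g

print(f"meta-learned gain k0 = {k0:.3f}, real-system cost after adaptation = "
      f"{rollout_cost(A_TRUE, B_TRUE, adapt(k0, A_TRUE, B_TRUE)):.3f}")
```

The same loop structure carries over, in spirit, when the linear models are replaced by learned neural-network dynamics and the scalar gain by a full policy network.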
Matt Botvinick (DeepMind), “Meta-reinforcement learning: Quo vadis?”
Katja Hofmann (Microsoft Research), “Directions and challenges in multi-task RL”
Multi-task reinforcement learning (RL) aims to develop approaches that learn to perform well across a range of related tasks, instead of specializing to a single task. This has high potential for real-world applications, where sharing data across tasks can dramatically improve data efficiency and make RL approaches economically viable. In this talk, I present two novel approaches that leverage learned task embeddings to exploit multi-task structure for sample-efficient learning. I conclude with key open challenges and a novel benchmark and NeurIPS competition designed to drive further research in this important area.
Tejas Kulkarni (DeepMind), “Self-supervised object-centric representations for RL”
Deep learning and reinforcement learning (RL) systems implicitly learn knowledge about objects, relations, agents and other abstractions given pre-specified task spaces. Infants, on the other hand, explicitly abstract their experiences into these representations. These building blocks later become key to solving other sensorimotor problems with better combinatorial generalization and sample efficiency. However, it has been hard to explicitly learn object representations from pixels that are useful for control across a wide variety of commonly used RL environments. In this talk, I will present techniques for learning controllable and spatio-temporally consistent object-level representations using self-supervision. These representations give RL agents longer temporally extended exploration, better generalization, and lower sample complexity.
Tim Lillicrap (DeepMind), “Learning models for representations and planning”
Karthik Narasimhan (Princeton), “Task-agnostic priors for reinforcement learning”
Despite the success of deep reinforcement learning (RL) on various problems, current techniques suffer from poor generalization and high sample complexity. In this talk, I will discuss methods to alleviate these challenges using two key sources of inductive bias that humans employ while learning new tasks: a basic comprehension of classical mechanics and an understanding of natural language. I will describe how representations for such priors can be acquired in a task-agnostic fashion, making them useful across multiple different environments. Our empirical results show that incorporating these “universal” priors into environment models for RL agents enables more efficient learning of generalizable policies.
Doina Precup (McGill / MILA / DeepMind), “Inductive biases for temporal abstraction”
In this talk, I will discuss existing work on option construction in reinforcement learning, which uses a variety of objectives for this process. The rationale for this diversity is often that options can be used for many different purposes (exploration, transfer learning, quick planning). I will discuss the challenge of empirically evaluating option-learning algorithms in the context of life-long learning, in order to assess whether the proposed inductive bias is actually helpful.
Jane Wang (DeepMind), “Learning and development of structured, causal priors”
In an interactive world, in which agents have an ongoing ability to modify their environment, structured, causal priors allow for better planning, reasoning, and goal attainment. For humans, many of these priors manifest spontaneously over the normal course of development, but many are also learned through experience. Strikingly, the resulting behaviors often appear neither normative nor Bayes-optimal. In this talk, I’ll argue that such deviations make sense when considered from a meta-learning perspective. Further, I’ll show evidence that causal reasoning and causal priors can be learned through reinforcement learning and training on a structured distribution of tasks. That is, within a meta-reinforcement learning setting, an RL agent can learn to perform causal reasoning at all three levels of Judea Pearl’s ladder of causation.
Invited Panelists
In addition to invited speakers Matt Botvinick, Tejas Kulkarni, Tim Lillicrap, Karthik Narasimhan, Doina Precup and Jane Wang, we are happy to have the following invited panelist:
Sergey Levine (UC Berkeley)
Panel question submission is here.
Contributed Talks
Kate Rakelly & Aurick Zhou (UC Berkeley), “Efficient off-policy meta-reinforcement learning via probabilistic context variables”
Deep reinforcement learning algorithms require large amounts of experience to learn an individual task. While in principle meta-reinforcement learning (RL) algorithms enable agents to learn new skills from small amounts of experience, several major challenges preclude their practicality. Current methods rely heavily on on-policy experience, limiting their sample efficiency. They also lack mechanisms to reason about task uncertainty when adapting to new tasks, limiting their effectiveness in sparse reward problems. In this paper, we address these challenges by developing an off-policy meta-RL algorithm that disentangles task inference and control. In our approach, we perform online probabilistic filtering of latent task variables to infer how to solve a new task from small amounts of experience. This probabilistic interpretation enables posterior sampling for structured and efficient exploration. We demonstrate how to integrate these task variables with off-policy RL algorithms to achieve both meta-training and adaptation efficiency. Our method outperforms prior algorithms in sample efficiency by 20-100X as well as in asymptotic performance on several meta-RL benchmarks.
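As a toy illustration of the two ingredients highlighted in the abstract, online probabilistic inference of a latent task variable and posterior sampling for exploration, here is a minimal NumPy sketch. The latent task is reduced to an unknown 1-D goal revealed through noisy observations, and the conjugate Gaussian belief update and greedy controller are simplifying assumptions for illustration; the paper itself learns both the task-inference network and the policy, which this sketch does not attempt.

```python
# Toy sketch: maintain a belief over a latent task variable, update it online
# from observations, and act via posterior sampling (illustrative assumptions only).
import numpy as np

rng = np.random.default_rng(1)
GOAL, OBS_NOISE = 2.0, 0.5          # latent task: unknown goal, observed noisily

mu, var = 0.0, 4.0                  # broad Gaussian prior over the latent goal z
x = 0.0                             # agent position

for step in range(15):
    z_sample = rng.normal(mu, np.sqrt(var))        # posterior sampling: commit to one hypothesis
    x += np.clip(z_sample - x, -0.5, 0.5)          # act greedily toward the sampled goal
    obs = GOAL + OBS_NOISE * rng.standard_normal() # noisy task-revealing observation
    # Conjugate Gaussian update of the belief given the new observation.
    prec = 1.0 / var + 1.0 / OBS_NOISE ** 2
    mu = (mu / var + obs / OBS_NOISE ** 2) / prec
    var = 1.0 / prec
    print(f"step {step:2d}: x = {x:+.2f}, belief = N({mu:+.2f}, {var:.3f})")
```

Because the agent acts on a sample from its belief rather than the mean, its behavior is exploratory while the belief is broad and becomes exploitative as the posterior concentrates.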
Ben Eysenbach, Lisa Lee & Jacob Tyo (CMU), “Priors for exploration and robustness”
Contributed Posters
Poster Session #1 (Morning)
(39) Efficient off-policy meta-reinforcement learning via probabilistic context variables Kate Rakelly, Aurick Zhou, Deirdre Quillen, Chelsea Finn, Sergey Levine
(1) Feudal multi-agent hierarchies for cooperative reinforcement learning Sanjeevan Ahilan, Peter Dayan
(6) Few-shot imitation learning with disjunctions of conjunctions of programs Tom Silver, Kelsey Allen, Leslie Kaelbling, Joshua Tenenbaum
(7) Graph-DQN: Fast generalization to novel objects using prior relational knowledge Varun Kumar, Hanlin Tang, Arjun K Bansal [appendix]
(15) Learning powerful policies by using consistent dynamics model Shagun Sodhani, Anirudh Goyal, Tristan Deleu, Yoshua Bengio, Sergey Levine, Jian Tang
(11) Learning to generalize from sparse and underspecified rewards Rishabh Agarwal, Chen Liang, Dale Schuurmans, Mohammad Norouzi [appendix]
(40) Meta-reinforcement learning with autonomous task inference Sungryull Sohn, Hyunjae Woo, Honglak Lee
(44) Meta-learning surrogate models for sequential decision making Jonathan Schwarz, Alexandre Galashov, Yee Whye Teh, Marta Garnelo, David Saxton, S. M. Ali Eslami, Pushmeet Kohli, Hyunjik Kim
(35) Mimicry constraint policy optimization Xiaojian Ma, Mingxuan Jing, Fuchun Sun, Huaping Liu [appendix]
(29) Perception-prediction-reaction agents for deep reinforcement learning Adam Stooke, Max Jaderberg, Valentin Dalibard, Siddhant Jayakumar, Wojciech M. Czarnecki
(19) Rapid trial-and-error learning in physical problem solving Kelsey Allen, Kevin Smith, Joshua Tenenbaum
(21) Recurrent learning reinforcement learning Pierre Thodoroff, Nishanth V. Anand, Lucas Caccia, Doina Precup, Joelle Pineau
(27) Search on the replay buffer: Bridging motion planning and reinforcement learning Ben Eysenbach, Sergey Levine, Ruslan Salakhutdinov
(32) Skill discovery with well-defined objectives Yuu Jinnai, David Abel, Jee Won Park, David Hershkowitz, Michael L. Littman, George Konidaris [appendix]
(23) Structured mechanical models for efficient reinforcement learning Kunal R Menda, Jayesh K Gupta, Zachary Manchester, Mykel Kochenderfer
(30) Variational task embeddings for fast adaptation in deep reinforcement learning Luisa M. Zintgraf, Kyriacos Shiarli, Maximilian Igl, Anuj Mahajan, Katja Hofmann, Shimon Whiteson
Poster Session #2 (Afternoon)
(16) Bayesian policy selection using active inference Ozan Catal
(10) Control what you can: Intrinsically motivated reinforcement learner with task planning structure Sebastian Blaes, Marin Vlastelica, Jia-Jie Zhu, Georg Martius
(18) Decoupling feature extraction from policy learning: Assessing benefits of state representation learning in goal based robotics Antonin Raffin, Ashley Hill, René Traoré, Timothée Lesort, Natalia Díaz Rodríguez, David Filliat
(34) Exploiting hierarchy for learning and transfer in KL-regularized RL Dhruva Tirumala, Hyeonwoo Noh, Alexandre Galashov, Leonard Hasenclever, Arun Ahuja, Greg Wayne, Razvan Pascanu, Yee Whye Teh, Nicolas Heess
(12) Language as an abstraction for hierarchical deep reinforcement learning YiDing Jiang, Chelsea Finn, Shixiang Gu, Kevin Murphy
(22) Learning effect-dependent embeddings for temporal abstraction William Whitney, Abhinav Gupta
(43) Perception-aware point-based value iteration for partially observable Markov decision processes Mahsa Ghasemi, Ufuk Topcu
(38) Planning with latent simulated trajectories Alexandre Piché, Valentin Thomas, Cyril Ibrahim, Julien Cornebise, Chris Pal
(41) Proprioceptive spatial representations for generalized locomotion Joshua Zhanson, Emilio Parisotto, Ruslan Salakhutdinov
(9) Provably efficient RL with rich observations via latent state decoding Simon S. Du, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal, Miroslav Dudik, John Langford
(26) Reinforcement learning with unknown reward functions Ben Eysenbach, Jacob Tyo, Shixiang Gu, Ruslan Salakhutdinov, Zachary Lipton, Sergey Levine
(25) State marginal matching with mixtures of policies Ben Eysenbach, Sergey Levine, Lisa Lee, Emilio Parisotto, Ruslan Salakhutdinov
(17) Symmetry-based disentangled representation learning requires interaction with environments Hugo Caselles-Dupré, David Filliat, Michael Garcia Ortiz
(14) Task-agnostic dynamics priors for deep reinforcement learning Yilun Du, Karthik Narasimhan
(37) Unsupervised subgoal discovery method for learning hierarchical representations Jacob Rafati, David C. Noelle
(20) Value preserving state-action abstractions David Abel, Nate Umbanhowar, Khimya Khetarpal, Dilip Arumugam, Doina Precup, Michael L. Littman [appendix]