***NEW*** With permission of speakers, talks and tutorials will be recorded and made available online after the conference. 

Sunday, June 7 (Lister Centre)

11:00-17:00 Registration desk open

12:00-13:00 light lunch

Track 1 – Chair: Susan Murphy (Location: Maple Leaf room)

  • 13:00-16:00 Michael Littman – Basics of computational reinforcement learning
  • 16:00-16:30 tea
  • 16:30-18:00 David Silver – Deep reinforcement learning

Track 2 – Chair: Clay Holroyd (Location: Wild Rose room)

  • 13:00-16:00 Nathaniel Daw – Natural RLDM: Optimal and suboptimal control in brain and behavior
  • 16:00-16:30 tea
  • 16:30-18:00 Ifat Levy – A neuroeconomics approach to pathological behavior

Monday, June 8 (CCIS 1-440)

8:00-8:30 Breakfast

Session 1 (8:30-10:30) – Chair: Yael Niv 

  • 8:30-9:10 Alison Gopnik – Childhood Is Evolution’s Way of Performing Simulated Annealing: A life history perspective on explore-exploit tensions
  • 9:10-9:30 Kevin Miller*, Matthew Botvinick, Carlos Brody – The Role of Orbitofrontal Cortex in Cognitive Planning in the Rat (M7)
  • 9:30-9:50 James MacGlashan*, Michael Littman, Stefanie Tellex – Escaping Groundhog Day (M55)
  • 9:50-10:30 Emma Brunskill – Quickly Learning to Make Good Decisions

10:30-11:00 coffee

Session 2 (11:00-13:00) – Chair: Susan Murphy 

  • 11:00-11:40 Ben Van Roy – Generalization and Exploration via Value Function Randomization
  • 11:40-12:00 Daniel Mankowitz*, Timothy Mann, Shie Mannor – Bootstrapping Skills (M4)
  • 12:00-12:20 Meropi Topalidou*, Daisuke Kase, Thomas Boraud, Nicolas Rougier – The formation of habits: a computational model mixing reinforcement and Hebbian learning (M23)
  • 12:20-13:00 Alexandre Pouget – What limits performance in decision making?

13:00-14:30 lunch

Session 3 (14:30-16:30) – Chair: Doina Precup

  • 14:30-15:10 Michael Woodford – Efficient Coding and Choice Behavior
  • 15:10-15:30 Kimberly Stachenfeld*, Matthew Botvinick, Samuel Gershman – Reinforcement learning objectives constrain the cognitive map (M43)
  • 15:30-15:50 Xue Bin Peng*, Michiel van de Panne – Learning Dynamic Locomotion Skills for Terrains with Obstacles (M15)
  • 15:50-16:30 David Parkes – Mechanism design as a toolbox for alignment of reward

Spotlights Session I (16:30-16:40) – Chair: Peter Dayan 

  • Eiji Uchibe*, Kenji Doya – Inverse Reinforcement Learning with Density Ratio Estimation (M14)
  • Harm Van Seijen*, Richard Sutton – A Deeper Look at Planning as Learning from Replay (M27)
  • Alborz Geramifard*, Christoph Dann, Robert Klein, William Dabney, Jonathan How – RLPy: A Value-Function-Based Reinforcement Learning Framework for Education and Research (M29)
  • Mark Ho*, Michael Littman, Fiery Cushman, Joseph Austerweil – Teaching Behavior with Punishments and Rewards (M25)
  • Sebastian Musslick*, Amitai Shenhav, Matthew Botvinick, Jonathan Cohen – A computational model of control allocation based on the Expected Value of Control (M59)

16:40-19:40 Posters I, wine & tea (posters available till midnight)

19:40 Banquet

Tuesday, June 9

8:00-8:30 Breakfast

Session 4 (8:30-10:30) – Chair: Ifat Levy 

  • 8:30-9:10 Claire Tomlin – Reachability and Learning for Hybrid Systems
  • 9:10-9:30 Craig Sherstan, Joseph Modayil, Patrick Pilarski* – Direct Predictive Collaborative Control of a Prosthetic Arm (T24)
  • 9:30-9:50 Falk Lieder*, Thomas Griffiths, Ming Hsu – Utility-weighted sampling in decisions from experience (T28)
  • 9:50-10:10 Marieke Jepma*, Tor Wager – Self-reinforcing expectancy effects on pain: Behavioral and brain mechanisms (T33)
  • 10:10-10:30 Emma Brunskill, Lihong Li* – The Online Discovery Problem and Its Application to Lifelong Reinforcement Learning (T3)

10:30-11:00 coffee

Session 5 (11:00-13:00) – Chair: Satinder Singh 

  • 11:00-11:20 Jalal Arabneydi*, Aditya Mahajan – Reinforcement Learning in Decentralized Stochastic Control Systems with Partial History Sharing (T47)
  • 11:20-11:40 Eunjeong Lee*, Olga Dal Monte, Bruno Averbeck – Dopamine type 2 receptors control inverse temperature beta for transition from perceptual inference to reinforcement learning (T42)
  • 11:40-12:00 Dan Lizotte*, Eric Laber – Multi-Objective Markov Decision Processes for Decision Support (T22)
  • 12:00-12:20 Yuki Sakai*, Saori Tanaka, Yoshinari Abe, Seiji Nishida, Takashi Nakamae, Kei Yamada, Kenji Doya, Kenji Fukui, Jin Narumoto – Reinforcement learning based on impulsively biased time scale and its neural substrate in OCD (T23)
  • 12:20-13:00 Ilana Witten – Dissecting reward circuits

13:00-14:30 lunch

Session 6 (14:30-16:30) – Chair: David Silver

  • 14:30-15:10 Sridhar Mahadevan – Proximal Reinforcement Learning: Learning to Act in Primal Dual Spaces
  • 15:10-15:30 Aaron Gruber*, Ali Mashhoori, Rajat Thapa – Choice reflexes in the rodent habit system (T40)
  • 15:30-15:50 Tim Brys*, Anna Harutyunyan, Matthew Taylor, Ann Nowé – Ensembles of Shapings (T8)
  • 15:50-16:30 Andrea Thomaz – Robots Learning from Human Teachers

Spotlights Session II (16:30-16:40) – Chair: Peter Dayan 

  • Robert Wilson*, Jonathan Cohen – Humans trade off information seeking and randomness in explore-exploit decisions (T48)
  • Ashique Rupam Mahmood*, Richard Sutton – Off-policy learning with linear function approximation based on weighted importance sampling (T18)
  • Akina Umemoto*, Clay Holroyd – Task-specific Effects of Reward on Task Switching (T14)
  • Halit Suay*, Sonia Chernova, Tim Brys, Matthew Taylor – Reward Shaping by Demonstration (T4)
  • Stefania Sarno*, Victor de Lafuente, Ranulfo Romo, Néstor Parga – Reinforcement learning modeling of decision-making tasks with temporal uncertainty (T49)

16:40-19:40 Posters II, wine & tea (posters available till midnight)

Wednesday, June 10

8:00-8:30 Breakfast

Session 7 (8:30-10:30) – Chair: Emma Brunskill 

  • 8:30-9:10 Peter Stone – Practical RL: Representation, Interaction, Synthesis, and Mortality (PRISM)
  • 9:10-9:50 Eric Laber – Online, semi-parametric estimation of optimal treatment allocations for the control of emerging epidemics
  • 9:50-10:30 Geoff Schoenbaum – The good, the bad and the ugly – neural coding of upshifted, downshifted, and blocked outcomes in the rat (lateral) orbitofrontal cortex

10:30-11:00 coffee

Session 8 (11:00-13:00) – Chair: Nathaniel Daw 

  • 11:00-11:40 Marcia Spetch – Gambling pigeons: Primary rewards are not all that matter
  • 11:40-12:20 Tim Behrens – Encoding, and updating world models in prefrontal cortex and hippocampus
  • 12:20-13:00 Charles Isbell – Reinforcement Learning as Software Engineering