***NEW*** With permission of speakers, talks and tutorials will be recorded and made available online after the conference.
Sunday, June 7 (Lister Centre)
11:00-17:00 Registration desk open
12:00-13:00 light lunch
Track 1 – Chair: Susan Murphy (Location: Maple Leaf room)
- 13:00-16:00 Michael Littman – Basics of computational reinforcement learning
- 16:00-16:30 tea
- 16:30-18:00 David Silver – Deep reinforcement learning
Track 2 – Chair: Clay Holroyd (Location: Wild Rose room)
- 13:00-16:00 Nathaniel Daw – Natural RLDM: Optimal and suboptimal control in brain and behavior
- 16:00-16:30 tea
- 16:30-18:00 Ifat Levy – A neuroeconomics approach to pathological behavior
Monday, June 8 (CCIS 1-440)
8:00-8:30 Breakfast
Session 1 (8:30-10:30) – Chair: Yael Niv
- 8:30-9:10 Alison Gopnik – Childhood Is Evolution’s Way of Performing Simulated Annealing: A life history perspective on explore-exploit tensions
- 9:10-9:30 Kevin Miller*, Matthew Botvinick, Carlos Brody – The Role of Orbitofrontal Cortex in Cognitive Planning in the Rat (M7)
- 9:30-9:50 James MacGlashan*, Michael Littman, Stefanie Tellex – Escaping Groundhog Day (M55)
- 9:50-10:30 Emma Brunskill – Quickly Learning to Make Good Decisions
10:30-11:00 coffee
Session 2 (11:00-13:00) – Chair: Susan Murphy
- 11:00-11:40 Ben Van Roy – Generalization and Exploration via Value Function Randomization
- 11:40-12:00 Daniel Mankowitz*, Timothy Mann, Shie Mannor – Bootstrapping Skills (M4)
- 12:00-12:20 Meropi Topalidou*, Daisuke Kase, Thomas Boraud, Nicolas Rougier – The formation of habits: a computational model mixing reinforcement and Hebbian learning (M23)
- 12:20-13:00 Alexandre Pouget – What limits performance in decision making?
13:00-14:30 lunch
Session 3 (14:30-16:30) – Chair: Doina Precup
- 14:30-15:10 Michael Woodford – Efficient Coding and Choice Behavior
- 15:10-15:30 Kimberly Stachenfeld*, Matthew Botvinick, Samuel Gershman – Reinforcement learning objectives constrain the cognitive map (M43)
- 15:30-15:50 Xue Bin Peng*, Michiel van de Panne – Learning Dynamic Locomotion Skills for Terrains with Obstacles (M15)
- 15:50-16:30 David Parkes – Mechanism design as a toolbox for alignment of reward
Spotlights Session I (16:30-16:40) – Chair: Peter Dayan
- Eiji Uchibe*, Kenji Doya – Inverse Reinforcement Learning with Density Ratio Estimation (M14)
- Harm Van Seijen*, Richard Sutton – A Deeper Look at Planning as Learning from Replay (M27)
- Alborz Geramifard*, Christoph Dann, Robert Klein, William Dabney, Jonathan How – RLPy: A Value-Function-Based Reinforcement Learning Framework for Education and Research (M29)
- Mark Ho*, Michael Littman, Fiery Cushman, Joseph Austerweil – Teaching Behavior with Punishments and Rewards (M25)
- Sebastian Musslick*, Amitai Shenhav, Matthew Botvinick, Jonathan Cohen – A computational model of control allocation based on the Expected Value of Control (M59)
16:40-19:40 Posters I, wine & tea (posters available till midnight)
19:40 Banquet
Tuesday, June 9
8:00-8:30 Breakfast
Session 4 (8:30-10:30) – Chair: Ifat Levy
- 8:30-9:10 Claire Tomlin – Reachability and Learning for Hybrid Systems
- 9:10-9:30 Craig Sherstan, Joseph Modayil, Patrick Pilarski* – Direct Predictive Collaborative Control of a Prosthetic Arm (T24)
- 9:30-9:50 Falk Lieder*, Thomas Griffiths, Ming Hsu – Utility-weighted sampling in decisions from experience (T28)
- 9:50-10:10 Marieke Jepma*, Tor Wager – Self-reinforcing expectancy effects on pain: Behavioral and brain mechanisms (T33)
- 10:10-10:30 Emma Brunskill, Lihong Li* – The Online Discovery Problem and Its Application to Lifelong Reinforcement Learning (T3)
10:30-11:00 coffee
Session 5 (11:00-13:00) – Chair: Satinder Singh
- 11:00-11:20 Jalal Arabneydi*, Aditya Mahajan – Reinforcement Learning in Decentralized Stochastic Control Systems with Partial History Sharing (T47)
- 11:20-11:40 Eunjeong Lee*, Olga Dal Monte, Bruno Averbeck – Dopamine type 2 receptors control inverse temperature beta for transition from perceptual inference to reinforcement learning (T42)
- 11:40-12:00 Dan Lizotte*, Eric Laber – Multi-Objective Markov Decision Processes for Decision Support (T22)
- 12:00-12:20 Yuki Sakai*, Saori Tanaka, Yoshinari Abe, Seiji Nishida, Takashi Nakamae, Kei Yamada, Kenji Doya, Kenji Fukui, Jin Narumoto – Reinforcement learning based on impulsively biased time scale and its neural substrate in OCD (T23)
- 12:20-13:00 Ilana Witten – Dissecting reward circuits
13:00-14:30 lunch
Session 6 (14:30-16:30) – Chair: David Silver
- 14:30-15:10 Sridhar Mahadevan – Proximal Reinforcement Learning: Learning to Act in Primal Dual Spaces
- 15:10-15:30 Aaron Gruber*, Ali Mashhoori, Rajat Thapa – Choice reflexes in the rodent habit system (T40)
- 15:30-15:50 Tim Brys*, Anna Harutyunyan, Matthew Taylor, Ann Nowé – Ensembles of Shapings (T8)
- 15:50-16:30 Andrea Thomaz – Robots Learning from Human Teachers
Spotlights Session II (16:30-16:40) – Chair: Peter Dayan
- Robert Wilson*, Jonathan Cohen – Humans trade off information seeking and randomness in explore-exploit decisions (T48)
- Ashique Rupam Mahmood*, Richard Sutton – Off-policy learning with linear function approximation based on weighted importance sampling (T18)
- Akina Umemoto*, Clay Holroyd – Task-specific Effects of Reward on Task Switching (T14)
- Halit Suay*, Sonia Chernova, Tim Brys, Matthew Taylor – Reward Shaping by Demonstration (T4)
- Stefania Sarno*, Victor de Lafuente, Ranulfo Romo, Néstor Parga – Reinforcement learning modeling of decision-making tasks with temporal uncertainty (T49)
16:40-19:40 Posters II, wine & tea (posters available till midnight)
Wednesday, June 10
8:00-8:30 Breakfast
Session 7 (8:30-10:30) – Chair: Emma Brunskill
- 8:30-9:10 Peter Stone – Practical RL: Representation, Interaction, Synthesis, and Mortality (PRISM)
- 9:10-9:50 Eric Laber – Online, semi-parametric estimation of optimal treatment allocations for the control of emerging epidemics
- 9:50-10:30 Geoff Schoenbaum – The good, the bad and the ugly: neural coding of upshifted, downshifted, and blocked outcomes in the rat (lateral) orbitofrontal cortex
10:30-11:00 coffee
Session 8 (11:00-13:00) – Chair: Nathaniel Daw
- 11:00-11:40 Marcia Spetch – Gambling pigeons: Primary rewards are not all that matter
- 11:40-12:20 Tim Behrens – Encoding and updating world models in prefrontal cortex and hippocampus
- 12:20-13:00 Charles Isbell – Reinforcement Learning as Software Engineering