Talk Abstracts

Cate Hartley + Michael Littman: What you need when you need it is all you need

There are many proposals for the key ingredient from which intelligence springs. In this talk, we explore what is known about learning and development, specifically around the role of the physical body, the significance of active experimentation, and the importance of teaching. The talk will take the form of a conversation between a pair of researchers, one each from the artificial and natural intelligence camps.

Andreas Krause: Uncertainty-guided Exploration in Model-based Deep Reinforcement Learning

How can we enable agents to efficiently and safely learn online, from interaction with the real world? I will first discuss safe Bayesian optimization, where we quantify uncertainty in the unknown reward function and constraints, and, under some regularity conditions, can guarantee both safety and convergence to a natural notion of reachable optimum. I will then generalize these ideas to Bayesian model-based deep reinforcement learning, where we use the epistemic uncertainty in the dynamics model to guide exploration while ensuring safety. Lastly, I will show how we can meta-learn flexible, data-driven priors from related tasks and simulations, and discuss several applications.
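
To make the confidence-bound idea concrete, here is a minimal sketch, assuming a scikit-learn Gaussian process and a toy objective (this is not the speaker's method or code): the reward and the safety constraint are each modeled with a GP, candidates whose pessimistic constraint estimate could be unsafe are discarded, and the remaining candidate with the most optimistic reward estimate is queried next. The function propose_next, the candidate grid, and the constant beta are illustrative assumptions.

```python
# Hypothetical sketch of confidence-bound-based safe Bayesian optimization.
# Not the speaker's implementation; names and the toy problem are assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def propose_next(X_obs, y_reward, y_safety, candidates, beta=2.0):
    """Return the most optimistic candidate among those deemed safe, or None."""
    gp_r = GaussianProcessRegressor().fit(X_obs, y_reward)   # reward model
    gp_s = GaussianProcessRegressor().fit(X_obs, y_safety)   # constraint model
    mu_r, sd_r = gp_r.predict(candidates, return_std=True)
    mu_s, sd_s = gp_s.predict(candidates, return_std=True)
    safe = mu_s - beta * sd_s >= 0.0       # pessimistic (lower-bound) safety check
    if not safe.any():
        return None
    ucb = mu_r + beta * sd_r               # optimistic (upper-bound) reward estimate
    ucb[~safe] = -np.inf
    return candidates[np.argmax(ucb)]

# Toy usage: reward = -x^2, constraint satisfied whenever x <= 0.5.
X = np.array([[0.0], [0.3], [-0.2]])
x_next = propose_next(X, -X[:, 0] ** 2, 0.5 - X[:, 0],
                      candidates=np.linspace(-1, 1, 201).reshape(-1, 1))
print(x_next)
```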

Malcolm A. MacIver: The Geological Basis of Intelligence

The ground beneath our feet may seem distant from the capacity to plan, but this talk proposes that this apparent distance is illusory. We have shown that when fish transitioned to land 350 million years ago, eye sizes tripled and visually accessible space grew by six orders of magnitude. Computational models suggest that in water, rapid threats demand decisions with minimal sensorimotor latencies, limiting the benefits of advanced planning. On land, vast sightlines mixed with the short sightlines of hiding places enable extended planning, spurring imagination. We review our computational work indicating minimal selective benefit to planning in water, but high benefit on land. We show preliminary results of a new rodent behavior paradigm designed to test these hypotheses via adjustable spatial complexity in predator-prey-like contests against an agile autonomous robot.

Claire Vernade: Partially Observable Reinforcement Learning with Memory Traces

Partially observable environments present a considerable computational challenge in reinforcement learning due to the need to consider long histories. Learning with a finite window of observations quickly becomes intractable as the window length grows. In this talk, we introduce memory traces, a compact representation of the history of observations in the form of exponential moving averages. We show that they have good theoretical properties compared to windows, and that they can be easily combined with general RL algorithms to efficiently learn in partially observable environments. 
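
As a minimal illustration of the representation named above (a memory trace as an exponential moving average of past observations), here is a short sketch; the decay parameter lam, the exact update convention, and the toy observation stream are assumptions for illustration only.

```python
# Hypothetical sketch: a memory trace as an exponential moving average of
# (feature vectors of) past observations, i.e. a fixed-size summary of history.
import numpy as np

def update_trace(trace, obs, lam=0.9):
    """m_t = lam * m_{t-1} + (1 - lam) * o_t  (decay rate lam is an assumption)."""
    return lam * trace + (1.0 - lam) * obs

# Toy usage: summarize a stream of 4-dimensional observations into one vector.
trace = np.zeros(4)
for obs in np.random.rand(100, 4):
    trace = update_trace(trace, obs)
print(trace)
```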

Romy Froemer: Aligning inputs with goals: attention as control in value-based decision-making

People are more likely to choose options they look at more. Past work has viewed this common observation as evidence that gaze increases value and thereby increases the probability that an option will be chosen. Recent work takes a different perspective and suggests that, instead, people look more at options they are more strongly considering choosing. Here, I will present a set of studies showing that, consistent with the latter account, 1) when attention is experimentally manipulated, the impact of attention on choice depends on options’ values, 2) compared to experimentally manipulated attention, free viewing is characterized by attention prioritization and, as a consequence, greater choice efficiency, and 3) how attention is allocated and relates to choice depends on one’s choice goal. Taken together, these studies show that attention serves as a means of efficiently sampling information in line with one’s goals, and that established, dominant models of decision-making are missing important cognitive and computational mechanisms. Identifying and understanding these mechanisms will provide new levers for understanding and improving aberrant decision-making.

Sam Devlin: Towards Human-AI Collaboration: Lessons Learnt From Imitating Human Gameplay

We find ourselves in an exciting time for AI products but remain a long way from realizing the full potential of autonomous AI agents. As we continue to see rapid growth in the number of human-AI interactions, it is important that we research and develop methods that enable effective collaboration. Over the past 10+ years I have studied human gameplay as an environment where people regularly form teams and demonstrate fast, online adaptation to achieve common goals. This talk will focus on learning by imitation and reinforcement to enable AI agents to recreate the human skill of ad-hoc teamwork.

Amanda Prorok: Synthesizing Diverse Policies for Multi-Robot Coordination

How can we effectively orchestrate large teams of robots and translate high-level goals into the nuanced local policies that guide individual robot behavior? Machine learning has revolutionized the way in which we address these challenges, enabling the automatic synthesis of agent policies directly from task objectives. In this presentation, I will first describe how we use data-driven approaches to learn the interaction strategies that foster coordination and cooperation within robot teams. I will then discuss methods for learning heterogeneous policies, where robots adopt different roles, and explain how this approach overcomes limitations inherent in traditional homogeneous models that force all robots to behave identically. Underpinning this work is a measure of ‘System Neural Diversity,’ a tool that allows us to quantify the degree of behavioral heterogeneity within multi-agent systems. I will demonstrate how this metric enables precise control over diversity in multi-robot tasks, leading to significant improvements in performance and efficiency, and unlocking the potential for novel and often surprising collective behaviors.

Tim Rocktaeschel: Open-Endedness and World Models

The pursuit of Artificial Superintelligence (ASI) requires a shift from narrow objective optimization towards embracing Open-Endedness—a research paradigm, pioneered in AI by Stanley, Lehman and Clune, that is focused on systems that generate endless sequences of novel but learnable artifacts. In this talk, I will present our work on large-scale foundation world models that can generate a wide variety of diverse environments, and in turn be used to train more general and robust agents. 

Angela Radulescu: Attention and affect in human RLDM: insights from computational psychiatry

Attention has been established as a critical mechanism by which humans decide, learn and generalize in complex environments. However, many questions remain about the nature of attentional biases: how do they emerge? How flexible are they across different environments? In this talk, I will present work examining the role of affect in shaping attentional biases. Studying attention and learning in the context of computational psychiatry, I will show that affect is a key signal for attention allocation in human RLDM.

Sanne de Wit: Investigating Habit Making and Breaking in Real-World Settings

Lab research into habits with experimental paradigms has shed light on the processes underlying goal-directed and habitual control in animals and humans. But to what extent do these experimental models capture the role of habits in everyday behaviours? To bridge this gap between the lab and the real world, I will discuss the challenge of capturing habit formation in real-world settings, and its effects on behavioural efficiency, persistence, and rigidity. I will present real-world habit research that includes objective measurement of the target behaviour (e.g., medication adherence), habit tracking with self-report habit measures, modeling of acquisition functions, and measuring real-life action slips towards no-longer-desirable outcomes (e.g., in the context of social media use). I will argue that experimental lab research and real-world investigations complement each other, and can be combined to gain a deeper understanding of the role of habits in our behaviour.

Xianyuan Zhan: Towards Real-World Deployable Data-Driven Reinforcement Learning

In recent years, reinforcement learning (RL) has achieved notable success on digital-world tasks. However, applying RL to solve physical-world tasks still faces significant challenges, including but not limited to a lack of reliable simulation environments, limited available data, and stringent safety and operational requirements. The emerging data-driven decision-making methods—represented by offline RL—have demonstrated unique potential, though they also encounter a number of practical difficulties. In this talk, we will delve into real-world issues encountered during the deployment of offline RL, such as generalization challenges, compliance with safety constraints, and imperfect reward signals. We will also discuss how we use offline RL to solve real-world industrial control problems, as well as the lessons learned through our development process.

Valentin Wyart: Alternatives to exploration? Moving up and down the ladder of causation in humans

Making adaptive decisions under uncertainty reflects a difficult yet ubiquitous challenge for human – and machine – intelligence. Balancing decisions between those aimed at seeking information about the uncertain state of the environment (exploration) and those aimed at maximizing reward (exploitation) is key for artificial agents to behave adaptively. The large variability of human decisions under uncertainty has therefore been theorized and understood in terms of explicit policies aimed at solving this ‘explore-exploit trade-off’. In this talk, I will argue that the way we think about exploration, and the tasks that we use to study exploration in the lab, have biased our understanding of human decision-making under uncertainty. Across several studies, ranging from sensory- to reward-guided decisions, I will show that the variability of human decisions is not mainly driven by exploration policies, but rather by the scheme and precision with which humans learn the state of uncertain environments. By simulating the same effects in artificial agents trained and tested in the same conditions, I will defend the idea that the adaptability of human decisions under uncertainty does not arise from the way humans choose, as previously thought, but from the way humans learn about their environment.

Doina Precup: On making artificial RL agents closer to natural ones

Reinforcement learning has produced tremendous successes in practical applications, from complex control problems like fusion to training large language models, and has helped us improve our understanding of the brain. However, computational reinforcement learning agents tend to be much less data and compute efficient than their natural counterparts, and they have trouble adapting quickly to new experiences. I believe this is a grand challenge that our field should tackle, and that the synergy of ideas from the study of reinforcement learning in computer science and neuroscience will be key. In this talk, I will outline ingredients that we already have and others that we need to develop to achieve this grand challenge.

Nicolas Tritsch: Defining timescales of neuromodulation by dopamine

Dopamine is critical for motor control and reinforcement learning, but the precise molecular mechanisms and timescales of neuromodulation are less clear. In this presentation, I will describe our attempts at better understanding how dopamine contributes to behavior through the study of release patterns and optogenetic manipulations. I will first describe published work highlighting the dynamics of dopamine release in the striatum of mice and their relationship to concurrent fluctuations in another important modulator implicated in learning and action: acetylcholine. I will then share unpublished work taking a critical look at the widely held view that phasic fluctuations in extracellular dopamine control the vigor of ongoing movements. Our findings help constrain the kinds of mechanisms and timescales through which dopamine likely acts to modify behavior.

Wei Ji Ma: Human planning and memory in combinatorial games

Planning is an integral part of the human experience, ranging from the mundane, such as meal preparation, to the profound, such as planning for the survival of our species. Artificial intelligence has made considerably more progress in mastering complex planning tasks than cognitive science has in understanding how humans perform such tasks. I will argue that combinatorial games can be useful in advancing this understanding. I will describe a heuristic search algorithm that accounts for human choices, response times, and eye movements in Four-in-a-Row, a variant of Tic-Tac-Toe. Then, in the context of Four-in-a-Row and chess, I will discuss how memories of previous experiences and cultural knowledge reduce the need to plan and yield better choices. Finally, I will describe frameworks that can incorporate memory into models of planning.
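
For readers unfamiliar with heuristic game-tree search, the sketch below shows a generic depth-limited search with a hand-crafted evaluation function; it is deliberately not the specific model described in the talk (which also accounts for response times and eye movements), and the toy "race to 10" game and its heuristic are assumptions for illustration.

```python
# Hypothetical sketch of depth-limited heuristic search in a toy two-player game:
# players alternately add 1 or 2 to a counter; whoever reaches 10 first wins.
def search(state, depth, maximizing=True):
    """Return (value, best_move) from the maximizing player's perspective."""
    if state >= 10:                        # terminal: the player who just moved won
        return (-1.0 if maximizing else 1.0), None
    if depth == 0:
        return state / 10.0, None          # crude heuristic value of the position
    best_value = float("-inf") if maximizing else float("inf")
    best_move = None
    for move in (1, 2):
        value, _ = search(state + move, depth - 1, not maximizing)
        if (maximizing and value > best_value) or (not maximizing and value < best_value):
            best_value, best_move = value, move
    return best_value, best_move

print(search(state=0, depth=6))            # (estimated value, recommended first move)
```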