Talk Abstracts

Malcolm A. MacIver: The Geological Basis of Intelligence

The ground beneath our feet may seem distant from the capacity to plan, but this talk proposes that this apparent distance is illusory. We have shown that when fish transitioned to land 350 million years ago, eye sizes tripled and visually accessible space grew by six orders of magnitude. Computational models suggest that in water, rapid threats demand decisions with minimal sensorimotor latency, limiting the benefits of advanced planning. On land, vast sightlines mixed with the short sightlines of hiding places enable extended planning, spurring imagination. We review our computational work indicating minimal selective benefit to planning in water but a high benefit on land. We show preliminary results from a new rodent behavior paradigm designed to test these hypotheses via adjustable spatial complexity in predator-prey-like contests against an agile autonomous robot.


Sanne de Wit: Investigating Habit Making and Breaking in Real-World Settings

Lab research using experimental habit paradigms has shed light on the processes underlying goal-directed and habitual control in animals and humans. But to what extent do these experimental models capture the role of habits in everyday behaviours? To bridge this gap between the lab and the real world, I will discuss the challenge of capturing habit formation in real-world settings, and its effects on behavioural efficiency, persistence, and rigidity. I will present real-world habit research that includes objective measurement of the target behaviour (e.g., medication adherence), habit tracking with self-report measures, modelling of acquisition functions, and measurement of real-life action slips towards no-longer-desirable outcomes (e.g., in the context of social media use). I will argue that experimental lab research and real-world investigations complement each other, and can be combined to gain a deeper understanding of the role of habits in our behaviour.


Valentin Wyart: Alternatives to exploration? Moving up and down the ladder of causation in humans

Making adaptive decisions under uncertainty reflects a difficult yet ubiquitous challenge for human – and machine – intelligence. Balancing decisions between those aimed at seeking information about the uncertain state of the environment (exploration) and those aimed at maximizing reward (exploitation) is key for artificial agents to behave adaptively. The large variability of human decisions under uncertainty has therefore been theorized and understood in terms of explicit policies aimed at solving this ‘explore-exploit trade-off’. In this talk, I will argue that the way we think about exploration, and the tasks that we use to study exploration in the lab, have biased our understanding of human decision-making under uncertainty. Across several studies, ranging from sensory- to reward-guided decisions, I will show that the variability of human decisions is not mainly driven by exploration policies, but rather by the scheme and precision with which humans learn the state of uncertain environments. By simulating the same effects in artificial agents trained and tested in the same conditions, I will defend the idea that the adaptability of human decisions under uncertainty does not arise from the way humans choose as previously thought, but from the way humans learn about their environment.
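As a toy illustration of this distinction (my own sketch, not the models from the studies; all parameters hypothetical), choice variability in a two-armed bandit can arise either from a stochastic exploration policy or from noise in the learning updates themselves, and the two regimes can look alike from the outside:

```python
import numpy as np

rng = np.random.default_rng(0)

def run_bandit(policy_noise: float, learning_noise: float,
               p_reward=(0.7, 0.3), alpha=0.2, n_trials=200):
    """Two-armed bandit learner. Choice variability can come from a
    softmax policy (policy_noise > 0) or from noisy value updates."""
    q = np.zeros(2)
    picks_of_worse_arm = 0
    for _ in range(n_trials):
        if policy_noise > 0:
            # Stochastic exploration policy: softmax over value estimates.
            p = np.exp(q / policy_noise)
            p /= p.sum()
            a = rng.choice(2, p=p)
        else:
            a = int(np.argmax(q))  # greedy: all variability comes from learning
        r = float(rng.random() < p_reward[a])
        # Delta-rule update, optionally corrupted by learning noise.
        q[a] += alpha * (r - q[a]) + learning_noise * rng.normal()
        picks_of_worse_arm += int(a == 1)
    return picks_of_worse_arm / n_trials

# Both regimes produce 'exploratory-looking' choices of the worse arm.
print(run_bandit(policy_noise=0.3, learning_noise=0.0))
print(run_bandit(policy_noise=0.0, learning_noise=0.3))
```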


Romy Froemer: Aligning inputs with goals: attention as control in value-based decision-making

People are more likely to choose options they look at more. Past work has viewed this common observation as evidence that gaze increases value and thereby increases the probability that an option will be chosen. Recent work takes a different perspective and suggests that, instead, people look more at options they are more strongly considering choosing. Here, I will present a set of studies showing that, consistent with the latter account, 1) when attention is experimentally manipulated, the impact of attention on choice depends on options’ values, 2) compared to experimentally manipulated attention, free viewing is characterized by attentional prioritization and, as a consequence, greater choice efficiency, and 3) how attention is allocated and relates to choice depends on one’s choice goal. Taken together, these studies show that attention serves as a means of efficiently sampling information in line with one’s goals, and that established, dominant models of decision-making are missing important cognitive and computational mechanisms. Identifying and understanding these mechanisms will provide new levers for understanding and improving aberrant decision-making.
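A minimal sketch of the kind of model the first account implies (in the spirit of gaze-weighted accumulation models such as the attentional drift-diffusion model; my own illustration, with hypothetical parameters): the momentarily unattended option's value is discounted, so gaze tilts the effective evidence.

```python
import numpy as np

rng = np.random.default_rng(1)

def gaze_weighted_choice(v_left: float, v_right: float, gaze_left: float = 0.6,
                         theta: float = 0.3, drift: float = 0.002,
                         noise: float = 0.02, bound: float = 1.0,
                         max_steps: int = 10_000):
    """Evidence accumulation in which the unattended option's value is
    discounted by theta, so where you look shifts the drift rate."""
    x = 0.0  # accumulated evidence for 'left'
    for t in range(max_steps):
        looking_left = rng.random() < gaze_left  # crude stand-in for a gaze process
        if looking_left:
            mu = drift * (v_left - theta * v_right)
        else:
            mu = drift * (theta * v_left - v_right)
        x += mu + noise * rng.normal()
        if abs(x) >= bound:
            return ("left" if x > 0 else "right"), t
    return ("left" if x > 0 else "right"), max_steps

# More gaze toward an option raises its probability of being chosen.
print(gaze_weighted_choice(v_left=4.0, v_right=3.0, gaze_left=0.7))
```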


Nicolas Tritsch: Defining timescales of neuromodulation by dopamine

Dopamine is critical for motor control and reinforcement learning, but the precise molecular mechanisms and timescales of neuromodulation are less clear. In this presentation, I will describe our attempts to better understand how dopamine contributes to behavior through the study of release patterns and optogenetic manipulations. I will first describe published work highlighting the dynamics of dopamine release in the striatum of mice and their relationship to concurrent fluctuations in another important modulator implicated in learning and action: acetylcholine. I will then share unpublished work taking a critical look at the widely held view that phasic fluctuations in extracellular dopamine control the vigor of ongoing movements. Our findings help constrain the kinds of mechanisms and timescales through which dopamine likely acts to modify behavior.


Angela Radulescu: Attention and affect in human RLDM: insights from computational psychiatry

Attention has been established as a critical mechanism by which humans decide, learn and generalize in complex environments. However, many questions remain about the nature of attentional biases: How do they emerge? How flexible are they across different environments? In this talk, I will present work examining the role of affect in shaping attentional biases. Studying attention and learning in the context of computational psychiatry, I will show that affect is a key signal for attention allocation in human RLDM.


Wei Ji Ma: Human planning and memory in combinatorial games

Abstract to follow.


Cate Hartley + Michael Littman

Title and abstract to follow.


Doina Precup

Title and abstract to follow. 


Amanda Prorok: Synthesizing Diverse Policies for Multi-Robot Coordination

How can we effectively orchestrate large teams of robots and translate high-level goals into the nuanced local policies that guide individual robot behavior? Machine learning has revolutionized the way in which we address these challenges, enabling the automatic synthesis of agent policies directly from task objectives. In this presentation, I will first describe how we use data-driven approaches to learn the interaction strategies that foster coordination and cooperation within robot teams. I will then discuss methods for learning heterogeneous policies, where robots adopt different roles, and explain how this approach overcomes limitations inherent in traditional homogeneous models that force all robots to behave identically. Underpinning this work is a measure of ‘System Neural Diversity,’ a tool that allows us to quantify the degree of behavioral heterogeneity within multi-agent systems. I will demonstrate how this metric enables precise control over diversity in multi-robot tasks, leading to significant improvements in performance and efficiency, and unlocking the potential for novel and often surprising collective behaviors.
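The formal definition of System Neural Diversity is given in the authors' work; as a rough, hypothetical sketch of the general idea, one can score a team's heterogeneity by averaging pairwise distances between the agents' action distributions over a shared batch of observations:

```python
import numpy as np

def behavioral_diversity(policies, observations) -> float:
    """Average pairwise behavioral distance between agents.

    policies: list of callables mapping an observation to an
        action-probability vector.
    observations: iterable of sampled observations shared by all agents.
    """
    n = len(policies)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            # Mean L1 distance between the two agents' action
            # distributions across the sampled observations.
            d = np.mean([
                np.abs(policies[i](o) - policies[j](o)).sum()
                for o in observations
            ])
            total += d
            pairs += 1
    return total / pairs if pairs else 0.0

# Example: two agents with different action preferences.
obs = [np.zeros(3) for _ in range(5)]
p1 = lambda o: np.array([0.9, 0.1])
p2 = lambda o: np.array([0.5, 0.5])
print(behavioral_diversity([p1, p2], obs))  # 0.8
```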


Andreas Krause: Uncertainty-guided Exploration in Model-based Deep Reinforcement Learning

Abstract to follow.


Karl Tuyls: Multi-agent AI/RL

Abstract to follow.


Tim Rocktaeschel: Open-Endedness and World Models

The pursuit of Artificial Superintelligence (ASI) requires a shift from narrow objective optimization towards embracing Open-Endedness—a research paradigm, pioneered in AI by Stanley, Lehman and Clune, that is focused on systems that generate endless sequences of novel but learnable artifacts. In this talk, I will present our work on large-scale foundation world models that can generate a wide variety of diverse environments, and in turn be used to train more general and robust agents. 


Weinan Zhang: Large Language Model-Based Multi-Agent Intelligence – The Progress So Far

In the era of large language models (LLMs), most operational processes can be reformulated and reproduced using LLM agents. LLM agents can perceive, control, and receive feedback from the environment to accomplish given tasks autonomously. Beyond interacting with the environment, LLM agents can call various external tools to ease task completion. Each tool can be regarded as a predefined operational process carrying private or real-time knowledge not present in the LLM's parameters. As a natural next step in this development, the tools being called are themselves becoming autonomous agents, so the full intelligent system becomes an LLM-based Multi-Agent System (LaMAS). Compared to a single-LLM-agent system, LaMAS has the advantages of i) dynamic task decomposition and organic specialization, ii) higher flexibility for system changes, iii) preservation of proprietary data for each participating entity, and iv) feasibility of monetization for each entity. This talk discusses the technical and business landscapes of LaMAS. To support the LaMAS ecosystem, we briefly describe a preliminary version of a LaMAS protocol, considering technical requirements, algorithms, data privacy, and business incentives. As such, LaMAS would be a practical solution for achieving artificial collective intelligence in the near future.
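A schematic sketch of the "tools become agents" idea (hypothetical interfaces of my own, not the LaMAS protocol itself): each agent exposes the same calling interface as a tool, so an orchestrating agent can delegate sub-tasks to peer agents exactly as it would invoke a tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Tool:
    """A callable capability: a task description goes in, a result comes out."""
    name: str
    run: Callable[[str], str]

class Agent:
    """An LLM agent that can also be exposed to other agents as a tool."""

    def __init__(self, name: str, tools: Dict[str, Tool]):
        self.name = name
        self.tools = tools

    def as_tool(self) -> Tool:
        # An agent presents the same interface as any other tool,
        # so "tools becoming agents" requires no change to the caller.
        return Tool(self.name, self.handle)

    def handle(self, task: str) -> str:
        # Placeholder for an LLM call that would decompose the task
        # and pick which tool (possibly another agent) to delegate to.
        if self.tools:
            delegate = next(iter(self.tools.values()))
            return f"{self.name} -> " + delegate.run(task)
        return f"{self.name} solved: {task}"

# A two-agent system: a planner delegates to a specialist agent
# that it sees simply as one of its tools.
specialist = Agent("retriever", tools={})
planner = Agent("planner", tools={"retriever": specialist.as_tool()})
print(planner.handle("survey recent work on multi-agent RL"))
```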


Claire Vernade: Partially Observable Reinforcement Learning with Memory Traces

Partially observable environments present a considerable computational challenge in reinforcement learning due to the need to consider long histories. Learning with a finite window of observations quickly becomes intractable as the window length grows. In this talk, we introduce memory traces, a compact representation of the history of observations in the form of exponential moving averages. We show that they have good theoretical properties compared to windows, and that they can be easily combined with general RL algorithms to efficiently learn in partially observable environments.
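A minimal sketch of the construction (my own illustration; the decay rate `lam` is a hypothetical parameter): a memory trace is an exponential moving average of past observations, so a single fixed-size vector summarizes an unbounded history and can be handed to a standard RL agent as its state input.

```python
import numpy as np

class MemoryTrace:
    """Fixed-size summary of an unbounded observation history:
    past observations decay geometrically with rate `lam`."""

    def __init__(self, obs_dim: int, lam: float = 0.9):
        self.lam = lam
        self.z = np.zeros(obs_dim)

    def update(self, obs: np.ndarray) -> np.ndarray:
        # z_t = lam * z_{t-1} + (1 - lam) * o_t
        self.z = self.lam * self.z + (1.0 - self.lam) * obs
        return self.z

# The trace (optionally concatenated with the current observation)
# replaces a growing window of raw observations as the agent's state.
trace = MemoryTrace(obs_dim=4, lam=0.9)
state = trace.update(np.array([1.0, 0.0, 0.0, 0.0]))
print(state)  # [0.1, 0.0, 0.0, 0.0]
```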