Building Taskable Reinforcement Learning Agents (Professor Sheila McIlraith, PhD)

Synthetic Intelligence Forum is excited to convene a presentation about Taskable Reinforcement Learning Agents with eminent computer scientist, Professor Sheila McIlraith, PhD, from the University of Toronto.

Title: Reward Machines: structuring reward function specifications and reducing sample complexity in reinforcement learning

Abstract: A standard assumption in reinforcement learning (RL) is that the agent does not have access to a faithful model of the world. As such, to learn optimal behaviour, an RL agent must interact with the environment and learn from its experience. While it seems reasonable to assume that the transition probabilities relating to the agent’s actions are unknown, there is less reason to hide the reward function from the agent.

Artificial agents cannot inherently perceive reward from the environment; someone must program those reward functions (even if the agent is interacting with the real world). Two challenges that face RL are reward specification and sample complexity. Specification of a reward function (a mapping from state to numeric value) can be challenging, particularly when reward-worthy behaviour is complex and temporally extended. Further, when reward is sparse, an RL agent can require millions of exploratory episodes to converge to a policy of reasonable quality.

In this talk, Professor McIlraith presents the notion of a Reward Machine, an automata-based structure that provides a normal form representation for reward functions. Reward Machines can be used natively to specify complex, possibly non-Markovian reward-worthy behavior. Alternatively, because of their automata-based structure, a variety of compelling human-friendly formal languages can be used as reward specification languages and straightforwardly translated into Reward Machines, including variants of Linear Temporal Logic (LTL), and a variety of regular languages.
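To make the idea concrete, here is a minimal sketch of a reward machine as a finite-state automaton whose transitions emit rewards. It is an illustration only, not the presenter's implementation: the class name `RewardMachine`, the event labels, and the "coffee then office" task are assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Finite-state machine whose transitions emit rewards (illustrative sketch)."""
    initial_state: int
    # (machine state, event label) -> (next machine state, reward)
    transitions: dict = field(default_factory=dict)
    terminal_states: frozenset = frozenset()

    def step(self, state, event):
        """Advance on one event; unlisted events keep the state and give reward 0."""
        return self.transitions.get((state, event), (state, 0.0))

    def run(self, events):
        """Feed a sequence of events and return (final state, total reward)."""
        state, total = self.initial_state, 0.0
        for event in events:
            if state in self.terminal_states:
                break
            state, reward = self.step(state, event)
            total += reward
        return state, total


# Hypothetical task: "get coffee, then deliver it to the office".
coffee_task = RewardMachine(
    initial_state=0,
    transitions={
        (0, "coffee"): (1, 0.0),   # picked up coffee, no reward yet
        (1, "office"): (2, 1.0),   # delivered coffee, reward 1
    },
    terminal_states=frozenset({2}),
)

final_state, total_reward = coffee_task.run(["office", "coffee", "office"])
```

The reward here is non-Markovian with respect to the environment state alone: reaching the office pays off only if coffee was collected earlier, and the machine state is what remembers that history.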

Furthermore, Reward Machines expose reward function structure in a normal form. The Q-Learning for Reward Machines (QRM) algorithm exploits Reward Machine structure in its learning, while preserving optimality guarantees. Experiments show that QRM significantly outperforms state-of-the-art (deep) RL algorithms, solving problems that otherwise can’t reasonably be solved and critically reducing the sample complexity.
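One way to picture how a learner might exploit that structure is a tabular sketch in which a single environment experience updates the Q-values associated with every reward machine state, not only the current one. This is a simplified illustration in the spirit of the abstract, not the published QRM algorithm; the machine, the state and action names, and the function `qrm_update` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical reward machine: (machine state, event) -> (next machine state, reward)
TRANSITIONS = {(0, "coffee"): (1, 0.0), (1, "office"): (2, 1.0)}
MACHINE_STATES = (0, 1, 2)
TERMINAL = {2}


def rm_step(u, event):
    """Unlisted events leave the machine state unchanged with reward 0."""
    return TRANSITIONS.get((u, event), (u, 0.0))


def qrm_update(q, s, a, s_next, event, actions, alpha=0.5, gamma=0.9):
    """Apply one environment experience to every non-terminal machine state."""
    for u in MACHINE_STATES:
        if u in TERMINAL:
            continue
        u_next, reward = rm_step(u, event)
        if u_next in TERMINAL:
            target = reward  # no bootstrapping past the end of the task
        else:
            target = reward + gamma * max(q[(u_next, s_next, b)] for b in actions)
        q[(u, s, a)] += alpha * (target - q[(u, s, a)])


# Q-table keyed by (machine state, environment state, action), initialised to 0.
q = defaultdict(float)
qrm_update(q, s="hall", a="east", s_next="office", event="office",
           actions=("east", "west"))
```

After this single step, the entry for machine state 1 (coffee already collected) has moved toward the delivery reward, while the entry for machine state 0 is unchanged. Reusing each experience across machine states in this way is one intuition for how sample complexity could be reduced.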

Biography: Sheila McIlraith is a Professor in the Department of Computer Science at the University of Toronto, a Canada CIFAR AI Chair and Faculty Member at the Vector Institute for Artificial Intelligence, and Research Lead at the Schwartz Reisman Institute for Technology and Society.

Her work focuses on AI sequential decision making broadly construed, through the lens of human-compatible AI.

McIlraith is a fellow of the ACM, a fellow of the Association for the Advancement of Artificial Intelligence (AAAI) and a past President of KR Inc., the international scientific foundation concerned with fostering research and communication on knowledge representation and reasoning.

Her research has also made practical contributions to the development of next-generation NASA space systems and to emerging Web standards.

Host and presenter:
• Vik Pant, PhD
• Sheila McIlraith, PhD

Special Thanks to our Partner:
• ET Business Services