Learning Hierarchical Compositional Task Definitions through Online Situated Interactive Language Instruction
Artificial agents, from robots to personal assistants, have become competent workers in many settings and embodiments, but for the most part they are limited to the capabilities and tasks with which they were initially programmed. Learning in these settings has predominantly focused on improving the agent’s performance on a task, not on learning the definition of the task itself. The primary method for giving an agent a task definition has been programming by humans who have detailed knowledge of the task, domain, and agent architecture. In contrast, humans quickly learn new tasks from scratch, often from instruction by another human. If we want AI agents to be flexible and dynamically extendable, they will need to emulate these learning capabilities rather than depend on task definitions acquired through programming.
This dissertation explores how an Interactive Task Learning (ITL) agent can rapidly learn the complete definition or formulation of novel tasks through online natural language instruction from a human instructor. Recent advances in natural language processing, memory systems, computer vision, spatial reasoning, robotics, and cognitive architectures make the time ripe to study how knowledge can be automatically acquired, represented, transferred, and operationalized. We present a learning approach, embodied in an ITL agent, that interactively learns the meaning of task concepts (goals, actions, failure conditions, and task-specific terms) for 60 games and puzzles. In our approach, the agent learns hierarchical symbolic representations of task knowledge that enable it to transfer and compose knowledge, analyze and debug multiple interpretations, and communicate with the teacher to resolve ambiguity. Our results show that the agent can correctly generalize, disambiguate, and transfer concepts across variations of language descriptions and world representations, even with distractors present.