Meta-Learning Through Learnt Self-Addition
Thesis or dissertation
University of Exeter
This thesis presents a meta-learning architecture designed to form an agent able to operate in a classic reinforcement learning environment. Drawing on several existing meta-learning techniques, this agent learns its environment by subdividing it between multiple predictive components, each with its own machine learning capabilities. These components are both competitive and co-operative. They compete for 'worth', which is used periodically to remove under-performing components and to direct the creation of new ones. Components compete to make as many accurate predictions as possible, balancing the number of predictions made against the accuracy they can achieve. Co-operation comes from trading information, either prediction values or memory data, to other components in return for a portion of any worth the receiving component earns. Components may vary in internal architecture: they can store different information, use different machine learning algorithms, share different information or subdivide the environment in different ways. Because they are all measured by the same worth metric, the agent's component set forms a highly heterogeneous pool, determined by which components work best in which roles. This creates a complex set of multiple machine learners, arranged into several trees of information-suppliers and information-users, rather than depending on a single machine learner to handle the entire problem. Meta-learning comes from the agent's ability to learn how to create these structures. It learns to predict which component types will work best in which circumstances, as well as which parameters to provide them with when it creates them. This thesis evaluates the basic properties of such an agent under different conditions.
Tests cover its ability to correctly evaluate the worth of its components, its ability to learn to usefully select which types of new components to generate, and its ability to learn from one task to improve its performance on the next. Experiments on a variety of learning problems confirm that the architecture exhibits the required component creation, deletion and balancing with no internal restructuring between tasks. Several possible future expansions are also explored, using component worth as a measure of the value of information. Testing suggests this allows the agent to learn an information-seeking drive, which directs it to move to a source of information in order to solve a later task, rather than depending exclusively on the information it receives passively.
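The worth mechanism described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the `Component` class, the fixed `share` fraction, the `reward_per_hit` value and the rule of pruning exactly one component per cycle are all hypothetical choices made for illustration. It shows only the general idea that a component earns worth for accurate predictions, passes a portion to the component that supplied it with information, and risks removal if its worth is lowest.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)  # compare components by identity, not by field values
class Component:
    name: str
    worth: float = 0.0
    # Hypothetical: the component this one receives information from, if any.
    supplier: Optional["Component"] = None
    # Hypothetical: fraction of earned worth passed on to the supplier.
    share: float = 0.2

    def reward(self, amount: float) -> None:
        """Credit worth, passing a fixed share up the chain of suppliers."""
        if self.supplier is not None:
            passed = amount * self.share
            self.supplier.reward(passed)
            amount -= passed
        self.worth += amount


def score(components: List[Component], accurate_counts: List[int],
          reward_per_hit: float = 1.0) -> None:
    """Reward each component in proportion to its accurate predictions."""
    for component, hits in zip(components, accurate_counts):
        component.reward(hits * reward_per_hit)


def prune(components: List[Component]) -> Component:
    """Remove and return the lowest-worth component (assumed pruning rule)."""
    worst = min(components, key=lambda c: c.worth)
    components.remove(worst)
    # Components that relied on the removed one lose their supplier.
    for component in components:
        if component.supplier is worst:
            component.supplier = None
    return worst


if __name__ == "__main__":
    memory = Component("memory")
    predictor = Component("predictor", supplier=memory)
    weak = Component("weak")
    pool = [memory, predictor, weak]

    # "memory" makes no predictions itself but earns worth through trading.
    score(pool, accurate_counts=[0, 10, 1])
    removed = prune(pool)
    print(removed.name, [(c.name, c.worth) for c in pool])
```

In this sketch a component that makes no predictions of its own can still survive pruning if the information it supplies helps another component earn worth, which is the co-operative incentive the abstract describes.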
MbyRes in Computer Science