We present an initial automated test to evaluate the capacity of LLMs to
perform inductive reasoning tasks. Using the GPT-3.5 and GPT-4 models, we
build a system that generates Python code as hypotheses for inductive
reasoning over transformation tasks from the one-dimensional Abstraction
and Reasoning Corpus (1D-ARC) challenge. We experiment with three prompting
techniques: standard prompting, Chain-of-Thought (CoT) prompting, and direct
feedback. We report results together with an analysis of cost relative to
success rate and the benefit-cost ratio. Our best result is an overall 25%
success rate, achieved with CoT prompting on GPT-4, which significantly
surpasses the standard prompting approach. We discuss potential avenues for
improving our experiments and for testing other strategies.
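For illustration, the following is a minimal sketch of what such a
generate-and-test loop could look like: the model is asked for a Python
program, and a candidate is accepted only if it reproduces every training
pair. All names, the prompt wording, and the retry budget here are
assumptions made for this sketch, not the paper's actual pipeline.

\begin{verbatim}
# Hedged sketch of a generate-and-test loop for 1D-ARC-style tasks.
# Names such as build_prompt, run_hypothesis, and solve are illustrative
# assumptions, not the system described in this paper.
from typing import Callable, List, Optional, Tuple

Grid = List[int]  # a 1D-ARC sequence of colour indices

def build_prompt(train_pairs: List[Tuple[Grid, Grid]]) -> str:
    """Format the training pairs into a prompt asking the model to
    return a Python function `transform(seq)`."""
    examples = "\n".join(f"input: {i}\noutput: {o}" for i, o in train_pairs)
    return ("Write a Python function `transform(seq)` that maps each "
            "input sequence to its output sequence.\n" + examples)

def run_hypothesis(code: str, seq: Grid) -> Grid:
    """Execute model-generated code and apply its `transform` function.
    (A real system would sandbox this; exec on untrusted code is unsafe.)"""
    namespace: dict = {}
    exec(code, namespace)
    return namespace["transform"](seq)

def solve(train_pairs: List[Tuple[Grid, Grid]], test_input: Grid,
          llm: Callable[[str], str], attempts: int = 3) -> Optional[Grid]:
    """Ask the LLM for candidate programs; accept the first one that
    reproduces every training pair, then apply it to the test input."""
    prompt = build_prompt(train_pairs)
    for _ in range(attempts):
        code = llm(prompt)  # LLM call, e.g. GPT-3.5/4 via an API client
        try:
            if all(run_hypothesis(code, i) == o for i, o in train_pairs):
                return run_hypothesis(code, test_input)
        except Exception:
            continue  # syntactically or semantically broken hypothesis
    return None  # no consistent program found within the budget
\end{verbatim}

In this reading, each generated program is an explicit inductive hypothesis,
and execution against the training pairs serves as the consistency check
before the hypothesis is applied to the held-out test input.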