Table Detection in Invoice Documents by Graph Neural Networks
Riba, P; Dutta, A; Goldmann, L; Fornés, A; Ramos, O; Lladós, J
Date: 3 February 2020
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
Publisher DOI
Abstract
Tabular structures in documents offer a complementary dimension to the raw textual data, representing logical
or quantitative relationships among pieces of information.
In digital mail room applications, where a large number of
administrative documents must be processed with reasonable
accuracy, the detection and interpretation of tables is crucial.
Table recognition has gained interest in document image
analysis, in particular in unconstrained formats (absence of
rule lines, no prior information about rows and columns). In
this work, we propose a graph-based approach for detecting
tables in document images. Instead of using the raw content
(recognized text), we make use of the location, context and
content type. The approach is thus purely one of structure perception,
independent of the language and of the quality of the text
reading. Our framework uses Graph Neural Networks (GNNs)
to describe the local repetitive structural information of
tables in invoice documents. Our proposed model
has been experimentally validated on two invoice datasets and
achieved encouraging results. Additionally, due to the scarcity
of benchmark datasets for this task, we have contributed to
the community a novel dataset derived from the RVL-CDIP
invoice data. It will be publicly released to facilitate future
research.
Computer Science
Faculty of Environment, Science and Economy