SEEK: Salford Environment for Expertise and Knowledge

Published Conference Proceedings - Paper
September 2011

Aletheia - An Advanced Document Layout and Text Ground-Truthing System for Production Environments

Clausner, C & Pletschacher, S & Antonacopoulos, A 2011, Aletheia - An Advanced Document Layout and Text Ground-Truthing System for Production Environments, in: 'Proceedings of the 11th International Conference on Document Analysis and Recognition (ICDAR2011)', 1st edition, IEEE Computer Society Conference Publishing Services (CPS), Los Alamitos, California, USA, pp.48-52. Conference details: 11th International Conference on Document Analysis and Recognition (ICDAR2011), Beijing, China, September 2011.

Abstract

Large-scale digitisation has led to a number of new possibilities with regard to adaptive and learning-based methods in the field of Document Image Analysis and OCR. For ground truth production of large corpora, however, there is still a gap in terms of productivity. Ground truth is crucial not only for training and evaluation at the development stage of tools but also for quality assurance in the scope of production workflows for digital libraries.
This paper describes Aletheia, an advanced system for accurate and yet cost-effective ground truthing of large amounts of documents. It aids the user with a number of automated and semi-automated tools, which were partly developed and improved based on feedback from major libraries across Europe and from their digitisation service providers, which are using the tool in a production environment. Novel features are, among others, the support of top-down ground truthing with sophisticated split and shrink tools as well as bottom-up ground truthing supporting the aggregation of lower-level elements to more complex structures. Special features have been developed to support working with the complexities of historical documents. The integrated rules and guidelines validator, in combination with powerful correction tools, enables efficient production of highly accurate ground truth.

Publication Details

Conference Proceedings
Clausner, C & Antonacopoulos, A & Pletschacher, S eds. 2011, Proceedings of the 11th International Conference on Document Analysis and Recognition (ICDAR2011), 1st edition, IEEE Computer Society Conference Publishing Services (CPS), Los Alamitos, California, USA, pp.48-52.

Conference Details
11th International Conference on Document Analysis and Recognition (ICDAR2011), Beijing, China, September 2011