SEEK: Salford Environment for Expertise and Knowledge

Published Conference Proceedings - Paper
September 2011

Restoration of Arbitrarily Warped Historical Document Images Using Flow Lines

Rahnemoonfar, M & Antonacopoulos, A 2011, Restoration of Arbitrarily Warped Historical Document Images Using Flow Lines, in: 'Proceedings of the 11th International Conference on Document Analysis and Recognition (ICDAR2011)', IEEE-CS, Los Alamitos, CA, USA, pp.905-909. Conference details: 11th International Conference on Document Analysis and Recognition (ICDAR2011), Beijing, China, September 2011.

Abstract

Historical documents frequently suffer from arbitrary geometric distortions (warping and folds) due to storage conditions, use and, to some extent, the printing process of the time. In addition, page curl can be prominent due to the scanning technique used. Such distortions adversely affect OCR and print-on-demand quality. Previous approaches to geometric restoration either focus only on the correction of page curl or require supplementary information obtained by additional scanning hardware, which is not practical for existing scans. This paper presents a new approach to detect and restore arbitrary warping and folds, in addition to page curl. Warped text lines and the smooth deformation between them are precisely modelled as primary and secondary flow lines that are then restored to their original linear shape. Preliminary, but representative, experimental results, in comparison to a leading page curl removal method and an industry-standard commercial system, demonstrate the effectiveness of the proposed method.

Notes

The first paper to describe a method to identify and correct arbitrary warping in scanned historical documents. Unlike previous approaches, this widely applicable method is not limited by the need to use a model for specific geometrical distortions, e.g. page curl. This is vital for large-scale digitization. Another important point for large-scale applications with mixed document conditions is that the proposed method does not introduce distortions in clean images, unlike most existing methods. Finally, a practical measure is introduced to reliably evaluate the performance of this and other systems (previously, only indirect measures, e.g. OCR performance, had been used).

Authors

SEEK Members

External Authors

Maryam Rahnemoonfar

Publication Details

Conference Proceedings
Antonacopoulos, A & Rahnemoonfar, M, eds. 2011, Proceedings of the 11th International Conference on Document Analysis and Recognition (ICDAR2011), IEEE-CS, Los Alamitos, CA, USA, pp.905-909.

Conference Details
11th International Conference on Document Analysis and Recognition (ICDAR2011), Beijing, China, September 2011