SEEK: Salford Environment for Expertise and Knowledge

Published Conference Proceedings - Paper
July 2009

A Realistic Dataset for Performance Evaluation of Document Layout Analysis

Antonacopoulos, A & Bridson, D & Papadopoulos, C & Pletschacher, S 2009, A Realistic Dataset for Performance Evaluation of Document Layout Analysis, in: 'Proceedings of the 10th International Conference on Document Analysis and Recognition (ICDAR2009)', IEEE Computer Society, Los Alamitos, USA, pp.296-300. Conference details: ICDAR2009, Barcelona, Spain, July 2009.

Abstract

There is a significant need for a realistic dataset on which to evaluate layout analysis methods and examine their performance in detail. This paper presents a new dataset (and the methodology used to create it) based on a wide range of contemporary documents. Strong emphasis is placed on comprehensive and detailed representation of both complex and simple layouts, and on colour originals. In-depth information is recorded both at the page and region level. Ground truth is efficiently created using a new semi-automated tool and stored in a new comprehensive XML representation, the PAGE format. The dataset can be browsed and searched via a web-based front end to the underlying database and suitable subsets (relevant to specific evaluation goals) can be selected and downloaded.

Authors

SEEK Members

External Authors

Stefan Pletschacher

David Bridson

Christos Papadopoulos

Publication Details

Conference Proceedings
Antonacopoulos, A & Bridson, D & Papadopoulos, C & Pletschacher, S, eds. 2009, Proceedings of the 10th International Conference on Document Analysis and Recognition (ICDAR2009), IEEE Computer Society, Los Alamitos, USA, pp.296-300.

Conference Details
ICDAR2009, Barcelona, Spain, July 2009