Hierarchy Identification for Automatically Generating Table-of-Contents
by Nicolai Erbs, Iryna Gurevych and Torsten Zesch
Abstract:
A table-of-contents (TOC) provides a quick reference to a document's content and structure. We present the first study on identifying the hierarchical structure for automatically generating a TOC using only textual features instead of structural hints, e.g. from HTML tags. We create two new datasets to evaluate our approaches for hierarchy identification. We find that our algorithm performs on a level that is sufficient for a fully automated system. For documents without given segment titles, we extend our work by automatically generating segment titles. We make the datasets and our experimental framework publicly available in order to foster future research in TOC generation.
Reference:
Hierarchy Identification for Automatically Generating Table-of-Contents. Nicolai Erbs, Iryna Gurevych and Torsten Zesch. In Proceedings of 9th Conference on Recent Advances in Natural Language Processing (RANLP 2013) (Galia Angelova, Kalina Bontcheva, Ruslan Mitkov, eds.), INCOMA Ltd., 2013.
Bibtex Entry:
@inproceedings{TUD-CS-2013-0198b,
abstract = {A table-of-contents (TOC) provides a quick reference to a document's content and structure. We present the first study on identifying the hierarchical structure for automatically generating a TOC using only textual features instead of structural hints, e.g. from HTML tags. We create two new datasets to evaluate our approaches for hierarchy identification. We find that our algorithm performs on a level that is sufficient for a fully automated system. For documents without given segment titles, we extend our work by automatically generating segment titles. We make the datasets and our experimental framework publicly available in order to foster future research in TOC generation.},
address = {Shoumen, Bulgaria},
author = {Erbs, Nicolai and Gurevych, Iryna and Zesch, Torsten},
booktitle = {Proceedings of 9th Conference on Recent Advances in Natural Language Processing (RANLP 2013)},
editor = {Angelova, Galia and Bontcheva, Kalina and Mitkov, Ruslan},
issn = {1313-8502},
pages = {252--260},
publisher = {INCOMA Ltd.},
title = {{Hierarchy Identification for Automatically Generating Table-of-Contents}},
url = {http://aclweb.org/anthology/R/R13/R13-1033.pdf},
year = {2013}
}