ERPAePRINTS

"The Naming of Cats": Automated Genre Classification.

Kim, Dr Yunhyong and Ross, Prof Seamus (2006) "The Naming of Cats": Automated Genre Classification.


Abstract

This paper builds on the work presented at ECDL 2006 ([29]) on automated genre classification as a step toward automating metadata extraction from digital documents for ingest into digital repositories such as those run by archives, libraries and eprint services. We divide document features into five types: visual layout features, linguistically modeled syntactic features, stylometric features, semantic structure features, and contextual features, which treat the document as an object linked to previously classified objects and other external sources. Results concerning the first two types have been described elsewhere ([29]). The current paper discusses results from testing classifiers based on image and stylometric features and shows that genres for which image features fail to cluster are the genres for which stylometric features cluster very well.

Item Type: Preprint
Subjects: M Resource Discovery
E Data Description, Documentation and Standards > EA Metadata
Document Language: English
ID Code: 123
Deposited By: Kim, Dr Yunhyong
Deposited On: 27 March 2007