ERPAePRINTS

Detecting Family Resemblance: Automated Genre Classification.

Kim, Dr Yunhyong and Ross, Prof Seamus (2006) Detecting Family Resemblance: Automated Genre Classification.


Abstract

This paper presents results in automated genre classification of digital documents in PDF format. It describes genre classification as an important ingredient in contextualising scientific data and in retrieving targeted material to improve research. The paper compares the roles of visual layout, stylistic features and language model features in clustering documents, and presents results in retrieving five selected genres (Scientific Article, Thesis, Periodicals, Business Report and Form) from a pool of documents drawn from the nineteen most popular genres found in our experimental data set.
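To make the idea of "language model features" more concrete, here is a minimal sketch of one common way such features can drive genre retrieval. It is an illustrative assumption and not the authors' method. Each genre gets a unigram language model with add-one (Laplace) smoothing. A document is assigned to the genre whose model gives it the highest log-likelihood, and "retrieving" a genre means keeping the documents assigned to it. The function names (`train_unigram_models`, `classify`, `retrieve`), the whitespace tokenisation, and the toy training texts are all hypothetical. The sketch also ignores the visual layout and stylistic features that the paper compares.

```python
import math
from collections import Counter


def train_unigram_models(labelled_docs):
    """Build per-genre token counts from (genre, text) pairs (hypothetical helper)."""
    counts = {}
    vocab = set()
    for genre, text in labelled_docs:
        tokens = text.lower().split()  # naive whitespace tokenisation (assumption)
        counts.setdefault(genre, Counter()).update(tokens)
        vocab.update(tokens)
    return counts, vocab


def log_likelihood(text, genre_counts, vocab_size):
    """Log-probability of `text` under one genre's add-one smoothed unigram model."""
    total = sum(genre_counts.values())
    # One extra slot in the denominator reserves probability mass for unseen tokens.
    denom = total + vocab_size + 1
    return sum(math.log((genre_counts[tok] + 1) / denom)
               for tok in text.lower().split())


def classify(text, counts, vocab):
    """Return the genre whose language model scores `text` highest (no class prior)."""
    return max(counts, key=lambda g: log_likelihood(text, counts[g], len(vocab)))


def retrieve(target_genre, docs, counts, vocab):
    """Keep only the documents classified as `target_genre`."""
    return [d for d in docs if classify(d, counts, vocab) == target_genre]
```

For example, after training on one toy "Form" text and one toy "Thesis" text, `retrieve("Thesis", docs, counts, vocab)` keeps only the documents whose wording is closer to the thesis model. A real system would use proper text extraction from the PDFs, richer features and a held-out evaluation. This sketch only shows the scoring mechanics.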

Item Type: Preprint
Keywords: automated genre classification, metadata, scientific information, information management, information extraction
Subjects: M Resource Discovery
E Data Description, Documentation and Standards > EA Metadata
Document Language: English
ID Code: 116
Deposited On: 25 April 2007