ERPAePRINTS

Genre Classification in Automated Ingest and Appraisal Metadata

Kim, Dr Yunhyong and Ross, Prof Seamus (2006) Genre Classification in Automated Ingest and Appraisal Metadata. In Gonzalo, Julio, Eds. Proceedings of the European Conference on Research and Advanced Technology for Digital Libraries (ECDL), LNCS 4172, pp. 63-74, Alicante, Spain.

Full text available as:
PDF - Requires Adobe Acrobat Reader or other PDF viewer.

Abstract

Metadata creation is a crucial aspect of the ingest of digital materials into digital libraries. The metadata needed to document and manage digital materials are extensive, and creating them manually is expensive. The Digital Curation Centre (DCC) has undertaken research to automate this process for some classes of digital material. We have segmented the problem, and this paper discusses results in genre classification as a first step toward automating metadata extraction from documents. Here we propose a classification method built on looking at documents from five directions: as an object exhibiting a specific visual format, as a linear layout of strings with characteristic grammar, as an object with stylometric signatures, as an object with intended meaning and purpose, and as an object linked to previously classified objects and other external sources. The results of some experiments in relation to the first two directions are described here; they are meant to be indicative of the promise underlying this multi-faceted approach.
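
As a rough illustration only, the idea of examining a document from several directions can be sketched as a set of independent "views" whose genre votes are combined. The abstract does not say how the directions are combined, so the voting rule, the feature names (columns, text), the genre labels, and every function below are hypothetical stand-ins, not the authors' method. Only the first two directions are mimicked, since those are the ones the reported experiments cover.

```python
from collections import Counter
from typing import Callable, Dict, List

# A document is modelled as a plain dict of pre-extracted features (assumption).
Document = Dict[str, object]
# A "view" is a hypothetical function that maps a document to a genre label.
View = Callable[[Document], str]


def visual_view(doc: Document) -> str:
    """Toy stand-in for the visual-format direction (assumed rule)."""
    return "scientific_article" if doc.get("columns") == 2 else "other"


def string_layout_view(doc: Document) -> str:
    """Toy stand-in for the linear-layout-of-strings direction (assumed rule)."""
    text = str(doc.get("text", "")).lower()
    has_markers = "abstract" in text and "references" in text
    return "scientific_article" if has_markers else "other"


def classify(doc: Document, views: List[View]) -> str:
    """Combine the views by simple voting (an assumed combination rule).

    On a tie, the label that received its first vote earliest wins.
    """
    votes = Counter(view(doc) for view in views)
    return max(votes, key=votes.get)


if __name__ == "__main__":
    views = [visual_view, string_layout_view]
    paper = {"columns": 2, "text": "Abstract ... References"}
    print(classify(paper, views))
```

Further directions (stylometric, semantic, contextual) would be added as extra entries in the views list under the same assumed interface.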

Item Type: Conference Paper
Keywords: genre, metadata, classification, extraction, information retrieval
Subjects:
    C Strategies and Procedures > CG Harvesting
    P Curation Issues
    E Data Description, Documentation and Standards > EG Representation Information
    O Costs
    E Data Description, Documentation and Standards > EA Metadata
Document Language: English
ID Code: 110
Deposited By: Ross, Professor Seamus
Deposited On: 18 October 2006