Non-hierarchic document clustering

Cluster analysis, or automatic classification, is a multivariate statistical technique that seeks to identify groups, or clusters, of similar objects in a multi-dimensional space. There have been many attempts over the years to use such procedures for the organisation of document databases, so that...

Bibliographic Details
Main Authors: Gareth Jones, Alexander M. Robertson, Chawchat Santimetvirul, Peter Willett
Format: Article
Language: English
Published: University of Borås 1995-01-01
Series: Information Research: An International Electronic Journal
Subjects:
Online Access: http://informationr.net/ir/1-1/paper1.html
collection DOAJ
description Cluster analysis, or automatic classification, is a multivariate statistical technique that seeks to identify groups, or clusters, of similar objects in a multi-dimensional space. There have been many attempts over the years to use such procedures for the organisation of document databases, so that documents with large numbers of index terms in common are grouped together. In this paper, we consider the use of a genetic algorithm, henceforth a GA, for document clustering. GAs are a class of non-deterministic algorithms that derive from Darwinian theories of evolution. They provide good, though not necessarily optimal, solutions to combinatorial optimisation problems, where the number of possible solutions is far too great for all of the possibilities to be explored in a reasonable time by a deterministic algorithm. One such problem is that of non-hierarchic clustering, where the clustering method seeks to partition a set of objects into a set of non-overlapping groups so as to maximise some external criterion of goodness of clustering, typically the extent to which the within-cluster inter-object similarities are maximised and the between-cluster similarities minimised.
id doaj-art-f54fafb7f69e4b8cb582110d3a310caf
institution Kabale University
issn 1368-1613
title Non-hierarchic document clustering
topic information retrieval
document clustering
clustering
algorithms
cluster analysis
automatic classification
genetic algorithms