Non-hierarchic document clustering

Cluster analysis, or automatic classification, is a multivariate statistical technique that seeks to identify groups, or clusters, of similar objects in a multi-dimensional space. There have been many attempts over the years to use such procedures for the organisation of document databases, so that...

Bibliographic Details
Main Authors:	Gareth Jones, Alexander M. Robertson, Chawchat Santimetvirul, Peter Willett
Format:	Article
Language:	English
Published:	University of Borås 1995-01-01
Series:	Information Research: An International Electronic Journal
Subjects:	information retrieval; document clustering; clustering; algorithms; cluster analysis; automatic classification; genetic algorithms
Online Access:	http://informationr.net/ir/1-1/paper1.html

author	Gareth Jones, Alexander M. Robertson, Chawchat Santimetvirul, Peter Willett
author_facet	Gareth Jones, Alexander M. Robertson, Chawchat Santimetvirul, Peter Willett
author_sort	Gareth Jones
collection	DOAJ
description	Cluster analysis, or automatic classification, is a multivariate statistical technique that seeks to identify groups, or clusters, of similar objects in a multi-dimensional space. There have been many attempts over the years to use such procedures for the organisation of document databases, so that documents with large numbers of index terms in common are grouped together. In this paper, we consider the use of a genetic algorithm, henceforth a GA, for document clustering. GAs are a class of non-deterministic algorithms that derive from Darwinian theories of evolution. They provide good, though not necessarily optimal, solutions to combinatorial optimisation problems, where the number of possible solutions is far too great for all of the possibilities to be explored in a reasonable time by a deterministic algorithm. One such problem is that of non-hierarchic clustering, where the clustering method seeks to partition a set of objects into a set of non-overlapping groups so as to maximise some external criterion of goodness of clustering, typically the extent to which the within-cluster inter-object similarities are maximised and the between-cluster similarities minimised.
format	Article
id	doaj-art-f54fafb7f69e4b8cb582110d3a310caf
institution	Kabale University
issn	1368-1613
language	English
publishDate	1995-01-01
publisher	University of Borås
record_format	Article
series	Information Research: An International Electronic Journal
title	Non-hierarchic document clustering
topic	information retrieval; document clustering; clustering; algorithms; cluster analysis; automatic classification; genetic algorithms
url	http://informationr.net/ir/1-1/paper1.html
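The approach outlined in the description field can be illustrated with a minimal Python sketch. It is not the authors' implementation: this record does not give the paper's chromosome encoding, genetic operators or fitness function, so the one-label-per-document encoding, the Dice coefficient, tournament selection, one-point crossover, and every name and parameter value below are assumptions chosen for illustration only.

```python
import random
from itertools import combinations

# Toy documents, each a set of index terms (illustrative data, not from the paper).
DOCS = [
    {"genetic", "algorithm"},
    {"genetic", "algorithm", "search"},
    {"genetic", "search"},
    {"cluster", "partition"},
    {"cluster", "partition", "similarity"},
    {"cluster", "similarity"},
]


def dice(a, b):
    """Dice coefficient between two sets of index terms."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def mean(values):
    """Arithmetic mean, defined as 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def fitness(labels, docs):
    """Assumed criterion: mean within-cluster similarity minus
    mean between-cluster similarity (higher is better)."""
    within, between = [], []
    for i, j in combinations(range(len(docs)), 2):
        sim = dice(docs[i], docs[j])
        (within if labels[i] == labels[j] else between).append(sim)
    return mean(within) - mean(between)


def ga_cluster(docs, k, pop_size=30, generations=60, mutation_rate=0.1, seed=0):
    """Evolve a partition of docs into at most k non-overlapping clusters.

    A chromosome is a list holding one cluster label per document.
    Requires at least two documents and pop_size >= 3.
    """
    rng = random.Random(seed)
    n = len(docs)
    pop = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]

    def score(chromosome):
        return fitness(chromosome, docs)

    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        next_pop = [list(c) for c in ranked[:2]]  # elitism: keep the two best
        while len(next_pop) < pop_size:
            # Tournament selection: best of three randomly drawn chromosomes.
            p1 = max(rng.sample(ranked, 3), key=score)
            p2 = max(rng.sample(ranked, 3), key=score)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: reassign a document to a random cluster.
            child = [rng.randrange(k) if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop

    best = max(pop, key=score)
    return best, score(best)


best_labels, best_score = ga_cluster(DOCS, k=2)
print(best_labels, round(best_score, 3))
```

Because the search is randomised, the result is a good partition rather than a guaranteed optimum; fixing the seed only makes a given run repeatable.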

Similar Items