Non-hierarchic document clustering
Main Authors: | , , , |
---|---|
Format: | Article |
Language: | English |
Published: | University of Borås, 1995-01-01 |
Series: | Information Research: An International Electronic Journal |
Online Access: | http://informationr.net/ir/1-1/paper1.html |
Summary: | Cluster analysis, or automatic classification, is a multivariate statistical technique that seeks to identify groups, or clusters, of similar objects in a multi-dimensional space. There have been many attempts over the years to use such procedures for the organisation of document databases, so that documents with large numbers of index terms in common are grouped together. In this paper, we consider the use of a genetic algorithm, henceforth a GA, for document clustering. GAs are a class of non-deterministic algorithms that derive from Darwinian theories of evolution. They provide good, though not necessarily optimal, solutions to combinatorial optimisation problems, where the number of possible solutions is far too great for all of the possibilities to be explored in a reasonable time by a deterministic algorithm. One such problem is that of non-hierarchic clustering, where the clustering method seeks to partition a set of objects into a set of non-overlapping groups so as to maximise some external criterion of goodness of clustering, typically the extent to which the within-cluster inter-object similarities are maximised and the between-cluster similarities minimised. |
---|---|
ISSN: | 1368-1613 |
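
The approach described in the summary can be illustrated with a minimal Python sketch. It is not the paper's implementation: the record does not specify the encoding, similarity measure, fitness function, or GA operators, so everything below is an assumption made for illustration. Documents are represented as sets of index terms, similarity is the Dice coefficient, a chromosome is a list giving one cluster label per document, and the fitness is the mean within-cluster similarity minus the mean between-cluster similarity. The names `dice`, `fitness`, and `evolve`, and all parameter defaults, are hypothetical.

```python
import random


def dice(a, b):
    """Dice coefficient between two sets of index terms (assumed measure)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def fitness(assignment, docs):
    """Mean within-cluster similarity minus mean between-cluster similarity.

    This criterion is an assumption; the record only says that within-cluster
    similarities should be high and between-cluster similarities low.
    """
    within, between = [], []
    n = len(docs)
    for i in range(n):
        for j in range(i + 1, n):
            s = dice(docs[i], docs[j])
            if assignment[i] == assignment[j]:
                within.append(s)
            else:
                between.append(s)

    def mean(xs):
        return sum(xs) / len(xs) if xs else 0.0

    return mean(within) - mean(between)


def evolve(docs, k, pop_size=30, generations=60, mutation_rate=0.1, seed=0):
    """Search for a partition of docs into at most k non-overlapping clusters.

    Requires at least two documents. Returns a list of cluster labels.
    """
    rng = random.Random(seed)
    n = len(docs)
    # Each chromosome assigns one label in 0..k-1 to every document.
    population = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda a: fitness(a, docs), reverse=True)
        parents = ranked[: pop_size // 2]   # truncation selection
        children = [ranked[0][:]]           # elitism: keep the current best
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, n)       # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: occasionally reassign a document to a random cluster.
            child = [rng.randrange(k) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
    return max(population, key=lambda a: fitness(a, docs))


docs = [{"gene", "dna"}, {"gene", "dna"}, {"index", "query"}, {"index", "query"}]
labels = evolve(docs, k=2)
print(labels, fitness(labels, docs))
```

Because the search is non-deterministic in general (a fixed `seed` is used here only for repeatability), the returned partition is a good candidate rather than a guaranteed optimum. The label-per-document encoding is also redundant, since permuting the labels yields the same partition; this is one reason a practical design might choose a different representation.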