AN EFFICIENT TEXT CLUSTERING
APPROACH USING AFFINITY
PROPAGATION WITH WEIGHT
MODIFICATION

Mr. Sumit Panjwani 
Text mining has recently emerged as one of the most important fields of data mining, because most web search is performed on
the basis of supplied text and because the growing use of social networks relies on text as a major component. Extracting
useful information from such text, directly or indirectly, requires an efficient clustering algorithm. The most widely used
techniques rely on the vector space model to obtain an equivalent vector for each text. The vector space model represents a
text as an n-tuple numeric array (vector) in which each dimension represents a unique word and the value is the weight of that
word, computed from term frequency-inverse document frequency (tf-idf). This technique has two problems. First, the number of
unique words in a document may be very large, which produces similarly long vectors whose processing requires large memory and
processing power. Second, an analysis may require a categorically biased grouping, which the technique does not address.
Hence, this paper presents an efficient clustering approach that uses one dimension for each group of words representing a
similar area of interest, and that also weights the dimensions unevenly according to the categorical bias during clustering.
After the vectors are created, clustering is performed using the seeds affinity clustering technique. Finally, to study its
performance, the presented algorithm is applied to the benchmark data set Reuters-21578 and compared with the k-means
algorithm and the original AP (affinity propagation) algorithm in terms of F-measure, entropy, and execution time. The
results show that the presented algorithm outperforms the others by an acceptable margin.
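
The contrast between the per-word tf-idf vector and the grouped, unevenly weighted vector can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not state how word groups are formed or how the bias weights are chosen, so the names WORD_GROUPS and GROUP_WEIGHTS, their contents, and the functions tfidf_vectors and grouped_vectors are all hypothetical. The clustering step itself is not shown.

```python
import math
from collections import Counter

# Hypothetical word groups (an assumption, not taken from the paper):
# each group of related words becomes a single vector dimension.
WORD_GROUPS = {
    "finance": {"bank", "loan", "profit", "shares"},
    "trade": {"export", "import", "tariff", "wheat"},
}

# Hypothetical categorical bias: one weight per group dimension.
GROUP_WEIGHTS = {"finance": 2.0, "trade": 1.0}


def tfidf_vectors(docs):
    """Classic vector space model: one dimension per unique word."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    n_docs = len(docs)
    # Document frequency: number of documents containing each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        length = max(len(tokens), 1)  # guard against empty documents
        vectors.append(
            [(tf[word] / length) * math.log(n_docs / df[word]) for word in vocab]
        )
    return vocab, vectors


def grouped_vectors(docs, groups=WORD_GROUPS, weights=GROUP_WEIGHTS):
    """Reduced model: one weighted dimension per word group."""
    names = sorted(groups)
    vectors = []
    for doc in docs:
        tokens = doc.lower().split()
        counts = Counter(tokens)
        length = max(len(tokens), 1)
        # Relative frequency of the group's words, scaled by the bias weight.
        vectors.append(
            [weights[g] * sum(counts[w] for w in groups[g]) / length for g in names]
        )
    return names, vectors


if __name__ == "__main__":
    docs = ["bank loan profit", "export wheat tariff bank"]
    vocab, full = tfidf_vectors(docs)
    names, reduced = grouped_vectors(docs)
    print(len(vocab), "dimensions per document in the word-level model")
    print(len(names), "dimensions per document in the grouped model")
    print(reduced)
```

In this sketch the vector length depends on the number of groups rather than on the vocabulary size, and raising a group's weight makes differences along that dimension count for more in any distance or similarity computed afterwards.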
