Keywords

Clustering, Data mining, Vectorization, Big data

Document Type

Article

Abstract

The exponential growth of data stored in databases and data warehouses has intensified the demand for analytical tools that can extract actionable insights efficiently. Data mining has emerged as a key methodology for turning large datasets into knowledge. However, traditional techniques often scale poorly: long computation times limit their usefulness in large-scale applications. This research improves the K-prototypes clustering algorithm to reduce computation time and increase clustering accuracy, enabling more effective knowledge discovery. The key modifications are vectorizing the computation to accelerate processing, redesigning the initialization strategy for cluster centers, and refining the convergence conditions. The improved algorithm was evaluated on the widely used Adult dataset in three experiments: a comparison of runtime and accuracy, an investigation of the best weight for categorical variables, and an evaluation of convergence behavior. Processing time fell from 20 minutes to 4 seconds, and clustering accuracy increased by 4%. The optimized algorithm also converged within four iterations. Together, these results demonstrate substantial gains in both computational efficiency and clustering quality for mixed-data clustering tasks.
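
The abstract does not spell out the dissimilarity measure or the implementation, so the following is only a minimal sketch of the standard K-prototypes formulation: squared Euclidean distance on numeric attributes plus a weight times the number of mismatched categorical attributes. The names `mixed_dissimilarity`, `assign`, and `gamma`, and the toy values, are assumptions made for illustration and are not taken from the article. The sketch is plain Python and is not vectorized; it shows only how the categorical weight can change which cluster a record joins.

```python
def mixed_dissimilarity(point_num, point_cat, center_num, center_cat, gamma):
    """Squared Euclidean distance on numeric features plus gamma times the
    number of mismatched categorical features (standard K-prototypes form)."""
    numeric = sum((x - c) ** 2 for x, c in zip(point_num, center_num))
    mismatches = sum(1 for x, c in zip(point_cat, center_cat) if x != c)
    return numeric + gamma * mismatches


def assign(point_num, point_cat, centers, gamma):
    """Return the index of the closest prototype (ties go to the lowest index)."""
    costs = [
        mixed_dissimilarity(point_num, point_cat, c_num, c_cat, gamma)
        for c_num, c_cat in centers
    ]
    return costs.index(min(costs))


# Hypothetical prototypes: (numeric part, categorical part). Values are made up.
centers = [
    ([0.0, 0.0], ["private", "married"]),
    ([1.0, 1.0], ["self-employed", "single"]),
]
point_num, point_cat = [0.9, 0.8], ["private", "married"]

# A small gamma lets the numeric features dominate; a large gamma lets the
# categorical matches dominate, so the same record changes cluster.
label_low_gamma = assign(point_num, point_cat, centers, gamma=0.1)
label_high_gamma = assign(point_num, point_cat, centers, gamma=2.0)
```

With these toy values the record is numerically closer to the second prototype but categorically identical to the first, which is why the choice of weight matters. A vectorized implementation would presumably compute the same quantities for all records and prototypes at once with array operations rather than Python loops.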
