Abstract: Traditional data mining algorithms that do not consider weighted association rules suffer from low efficiency. To overcome this, this paper proposes a distributed data flow mining algorithm based on matrix-weighted association rules. Metadata is separated from the data flow so that garbage data in the flow can be processed. The PCA algorithm is optimized with a sliding window and a data summary structure: a principal component decision matrix is formed within the window and then used to reduce the dimensionality of the data in the sliding window. Matrix-weighted association rules are used to mine the distributed data. After dimensionality reduction, the transactions in the database are clustered according to their time distribution, and a weighted analysis is carried out on each cluster to obtain the weighted frequent itemsets with their time information, which are output as the mining results. The experimental results show that the proposed algorithm is highly efficient and reaches a maximum accuracy of 98.9%.
Keywords: Matrix-weighted association rules, data flow, mining, data dimensionality reduction
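
To make the final mining step described in the abstract more concrete, the following is a minimal Python sketch, not the paper's implementation. It groups transactions into fixed-width time buckets and computes a weighted support for each candidate itemset in each bucket. The weighted-support definition used here (mean item weight multiplied by the fraction of transactions containing the itemset), the fixed-width bucketing as a stand-in for clustering by time distribution, and all names (`weighted_support`, `weighted_frequent_itemsets`, `bucket_size`, `min_wsup`, `max_len`) are assumptions made for illustration.

```python
from itertools import combinations


def weighted_support(itemset, transactions, weights):
    """Assumed definition: mean item weight times the fraction of
    transactions that contain every item of the itemset."""
    if not transactions:
        return 0.0
    items = frozenset(itemset)
    count = sum(1 for t in transactions if items <= t)
    mean_weight = sum(weights[i] for i in items) / len(items)
    return mean_weight * count / len(transactions)


def weighted_frequent_itemsets(timed_transactions, weights, min_wsup,
                               bucket_size, max_len=2):
    """Group transactions into fixed-width time buckets (a simple stand-in
    for clustering by time distribution), then mine each bucket on its own."""
    buckets = {}
    for timestamp, items in timed_transactions:
        buckets.setdefault(timestamp // bucket_size, []).append(frozenset(items))

    result = {}
    for bucket, transactions in sorted(buckets.items()):
        universe = sorted(set().union(*transactions))
        found = {}
        # Brute-force enumeration: this weighted support is not
        # anti-monotone, so Apriori-style pruning is not applied here.
        for k in range(1, max_len + 1):
            for candidate in combinations(universe, k):
                ws = weighted_support(candidate, transactions, weights)
                if ws >= min_wsup:
                    found[candidate] = ws
        result[bucket] = found
    return result


# Hypothetical toy data: (timestamp, items) pairs and hand-picked weights.
data = [(0, {"a", "b"}), (1, {"a", "c"}), (2, {"a", "b"}),
        (10, {"b", "c"}), (11, {"c"})]
item_weights = {"a": 1.0, "b": 0.5, "c": 0.8}
mined = weighted_frequent_itemsets(data, item_weights,
                                   min_wsup=0.45, bucket_size=10)
for bucket, itemsets in mined.items():
    print(bucket, itemsets)
```

The exhaustive candidate enumeration keeps the sketch short and is only practical for small item universes and short itemsets. The PCA-based dimensionality reduction and the distributed execution described in the abstract are outside the scope of this example.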