Research on Clustering Mining Methods Based on Swarm Intelligence Algorithms
Published: 2018-09-12 15:09
[Abstract]: In the Internet era, data mining carries the task of extracting valuable latent information from massive data sets and realizing the value of that data, so as to avoid the "data rich, information poor" predicament. It has become one of the most active research topics of the information age. Clustering is an important research area within data mining and serves as an analysis tool in many fields. Swarm intelligence algorithms are an emerging class of heuristic optimization algorithms modeled on the survival, foraging, and courtship behavior of organisms in an ecosystem. They are self-learning, distributed, self-organizing, and parallel, which lets them handle complex problems, data analysis in particular, that traditional computing methods struggle to solve, and they show considerable potential for complex optimization problems. This thesis reviews the fundamentals of data mining and several common swarm intelligence algorithms, analyzes the problems of existing clustering algorithms, studies and improves the firefly algorithm, and applies the improved algorithms to clustering. The main contributions are as follows.
(1) The traditional fuzzy C-means clustering algorithm selects its initial cluster centers at random, easily falls into local optima, and is inefficient. To address these problems, the thesis introduces chaos theory and proposes a chaotic initialization method, then uses the Logistic map to modify the firefly position update formula, which yields better clustering results. Experiments show that the algorithm reaches higher accuracy in fewer iterations.
(2) The traditional fuzzy C-means clustering algorithm also has weak global search ability, is sensitive to the choice of initial cluster centers, and clusters poorly. Building on the previous algorithm, the thesis proposes a new niche firefly fuzzy clustering algorithm. It first initializes the population with the cubic map, which has better randomness and ergodicity, and then introduces a random inertia weight into the firefly position update formula to balance exploration and exploitation. Experiments show that the algorithm improves clustering quality and is robust.
(3) The k-means clustering algorithm clusters poorly, depends heavily on the choice of initial cluster centers, and has weak global search ability. To address these shortcomings, the thesis proposes a firefly partitional clustering algorithm with a Lévy flight mechanism. The algorithm initializes the population using a density-based criterion together with the max-min distance method, and introduces Lévy flights into the position update formula of each firefly to avoid local optima, speed up convergence, and retain good global search ability. Finally, a balanced-variance evaluation function is used to optimize the objective function. Experiments show that the algorithm avoids falling into local optima, improves the quality of k-means clustering results, and weakens the dependence on initial values.
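As a rough illustration of the ideas in the abstract, the following Python sketch combines a Logistic-map chaotic initialization (as in contribution 1) with a firefly position update perturbed by a Lévy-flight step (as in contribution 3), using the k-means sum of squared errors as the brightness measure. The abstract gives none of the thesis's formulas, so everything concrete here is an assumption: the function names, the parameter values (`mu`, `beta0`, `gamma`, `alpha`, `beta`), the use of Mantegna's method to draw Lévy steps, and the standard firefly attraction term. The density and max-min distance initialization and the balanced-variance evaluation function are not reproduced.

```python
import math
import random


def logistic_sequence(x0, length, mu=4.0):
    """Iterate the logistic map x <- mu * x * (1 - x); mu = 4 is the chaotic regime."""
    xs, x = [], x0
    for _ in range(length):
        x = mu * x * (1.0 - x)
        xs.append(x)
    return xs


def chaotic_init(n_fireflies, k, lower, upper, x0=0.37):
    """Spread one chaotic sequence over n_fireflies candidate solutions.

    Each firefly is a flat list holding k cluster centers of len(lower) coordinates.
    The seed value x0 is an arbitrary illustrative choice.
    """
    dim = len(lower)
    size = k * dim
    seq = logistic_sequence(x0, n_fireflies * size)
    return [[lower[j % dim] + seq[i * size + j] * (upper[j % dim] - lower[j % dim])
             for j in range(size)]
            for i in range(n_fireflies)]


def sse(firefly, data, dim):
    """k-means objective: squared distance from each point to its nearest center."""
    centers = [firefly[i:i + dim] for i in range(0, len(firefly), dim)]
    return sum(min(sum((p[d] - c[d]) ** 2 for d in range(dim)) for c in centers)
               for p in data)


def levy_step(rng, beta=1.5):
    """Draw one heavy-tailed step with Mantegna's method (assumed; 1 < beta < 2)."""
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # guard against division by zero
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def firefly_cluster(data, k, n_fireflies=10, iterations=30,
                    beta0=1.0, gamma=1.0, alpha=0.01, seed=0):
    """Minimize the k-means SSE with a firefly search using Lévy-flight perturbations."""
    rng = random.Random(seed)
    dim = len(data[0])
    lower = [min(p[d] for p in data) for d in range(dim)]
    upper = [max(p[d] for p in data) for d in range(dim)]
    span = [(upper[d] - lower[d]) or 1.0 for d in range(dim)]  # avoid zero ranges
    pop = chaotic_init(n_fireflies, k, lower, upper)
    cost = [sse(f, data, dim) for f in pop]
    best_cost = min(cost)
    best = list(pop[cost.index(best_cost)])
    for _ in range(iterations):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is brighter (lower SSE)
                    # Squared distance on range-normalized coordinates.
                    r2 = sum(((a - b) / span[t % dim]) ** 2
                             for t, (a, b) in enumerate(zip(pop[i], pop[j])))
                    attract = beta0 * math.exp(-gamma * r2)
                    for t in range(len(pop[i])):
                        x = (pop[i][t] + attract * (pop[j][t] - pop[i][t])
                             + alpha * span[t % dim] * levy_step(rng))
                        # Clamp to the data bounding box; Lévy steps can be large.
                        pop[i][t] = min(max(x, lower[t % dim]), upper[t % dim])
                    cost[i] = sse(pop[i], data, dim)
                    if cost[i] < best_cost:
                        best_cost, best = cost[i], list(pop[i])
    return best, best_cost


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
    centers, error = firefly_cluster(points, k=2)
    print(centers, error)
```

The best solution found so far is tracked separately from the population, so the returned SSE can never be worse than the best chaotic starting candidate. In practice the result of such a search would typically be handed to a few k-means or fuzzy C-means iterations for refinement; that step is omitted here.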
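[Degree-granting institution]: Changsha University of Science and Technology (长沙理工大学)
[Degree level]: Master's
[Year degree conferred]: 2016
[Classification number]: TP18; TP311.13
Article ID: 2239416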
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2239416.html