Research Article

fcvalid: An R Package for Internal Validation of Probabilistic and Possibilistic Clustering

Year 2020, Volume 3, Issue 1, pp. 11-27, 30.04.2020
https://doi.org/10.35377/saucis.03.01.664560

Abstract

In exploratory data analysis and machine learning, partitioning clustering is a frequently used unsupervised learning technique for finding meaningful patterns in numeric datasets. In practice, clustering aims to identify and classify the objects or cases in a dataset. The quality of a clustering, or the performance of a clustering algorithm, is generally evaluated using internal validity indices. In this study, an R package named 'fcvalid' is introduced for validating fuzzy and possibilistic clustering results. The package implements a broad collection of internal indices that have been proposed to validate the results of fuzzy clustering algorithms. Additionally, the package includes options to compute generalized and extended versions of the fuzzy internal indices for validating possibilistic clustering.
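Internal indices are computed only from the clustering output itself: the c x n membership matrix and, for some indices, the data and the cluster prototypes. The sketch below illustrates three classical fuzzy indices from their standard definitions: Bezdek's partition coefficient PC (higher is better; range [1/c, 1] when each object's memberships sum to one), partition entropy PE (lower is better), and the Xie-Beni index XB (lower is better), with the fuzzifier m as a parameter (Xie and Beni's original definition uses m = 2). It is written in Python purely to show the formulas; it is not fcvalid's R interface, and the toy data, prototypes, and memberships are made up for the example.

```python
import math

def partition_coefficient(u):
    """Partition coefficient: mean over objects of the summed squared memberships.
    u is a c x n membership matrix (rows = clusters, columns = objects)."""
    c, n = len(u), len(u[0])
    return sum(u[i][k] ** 2 for i in range(c) for k in range(n)) / n

def partition_entropy(u, base=math.e):
    """Partition entropy; zero memberships contribute nothing (0 * log 0 := 0)."""
    c, n = len(u), len(u[0])
    total = 0.0
    for i in range(c):
        for k in range(n):
            if u[i][k] > 0:
                total += u[i][k] * math.log(u[i][k], base)
    return -total / n

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def xie_beni(x, v, u, m=2.0):
    """Xie-Beni index: weighted within-cluster scatter divided by
    n times the smallest squared distance between two prototypes."""
    c, n = len(v), len(x)
    compactness = sum(u[i][k] ** m * sq_dist(x[k], v[i])
                      for i in range(c) for k in range(n))
    separation = min(sq_dist(v[i], v[j])
                     for i in range(c) for j in range(c) if i != j)
    return compactness / (n * separation)

# Toy example: two well-separated groups of two points each (illustrative data).
x = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
v = [[0.0, 0.5], [5.0, 5.5]]
u = [[0.9, 0.9, 0.1, 0.1],
     [0.1, 0.1, 0.9, 0.9]]

if __name__ == "__main__":
    print(round(partition_coefficient(u), 4))  # 0.82
    print(round(partition_entropy(u), 4))      # 0.3251
    print(round(xie_beni(x, v, u), 4))         # 0.0141
```

Computing such an index for partitions obtained with different numbers of clusters c, and picking the c with the best value, is the typical way these indices are used to select the number of clusters.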

Supporting Institution

The Unit of Scientific Research Projects of Çukurova University

Project Number

FBA-2019-10285

Acknowledgment

Supplementary materials, including the manual and source code of the 'fcvalid' package, can be downloaded from GitHub at https://github.com/zcebeci/fcvalid.


fcvalid: An R Package for Fuzzy Validity Indices in Probabilistic and Possibilistic Partitioning Clustering


Abstract

Partitioning clustering is one of the unsupervised learning techniques widely used in exploratory data analysis and machine learning to find meaningful patterns in numeric datasets. In practice, clustering aims to recognize and classify the objects or cases in a dataset. The quality of a cluster analysis, or the performance of a clustering algorithm, is generally evaluated using internal validity indices. In this study, the functions of an R package named 'fcvalid' are introduced for validating fuzzy and possibilistic clustering results. The package implements many of the internal indices proposed for validating the results of fuzzy clustering algorithms. In addition, options for computing generalized and extended versions of the fuzzy internal indices for validating possibilistic clustering are included in the package.
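Possibilistic algorithms return typicalities whose values for one object need not sum to one across clusters, so a fuzzy index applied to them unchanged can leave its usual range. For example, the partition coefficient (the mean over objects of the summed squared memberships, normally in [1/c, 1]) can exceed 1. The sketch below shows one simple adaptation, assumed here purely for illustration: rescale each object's typicalities to sum to one, then compute the fuzzy index. This is not necessarily how fcvalid defines its generalized or extended indices; the helper names and the toy typicality matrix are hypothetical.

```python
def normalize_columns(t):
    """Rescale each object's typicalities (one column of the c x n matrix t)
    so they sum to 1 across clusters. An all-zero column is spread uniformly
    over the c clusters (an assumption made for this sketch)."""
    c, n = len(t), len(t[0])
    out = [[0.0] * n for _ in range(c)]
    for k in range(n):
        s = sum(t[i][k] for i in range(c))
        for i in range(c):
            out[i][k] = t[i][k] / s if s > 0 else 1.0 / c
    return out

def partition_coefficient(u):
    """Partition coefficient: mean over objects of the summed squared values."""
    c, n = len(u), len(u[0])
    return sum(u[i][k] ** 2 for i in range(c) for k in range(n)) / n

# Hypothetical typicalities for 2 clusters x 2 objects; column 1 sums to 1.8.
t = [[0.9, 0.9],
     [0.9, 0.1]]

if __name__ == "__main__":
    print(round(partition_coefficient(t), 4))                     # 1.22, above the fuzzy maximum of 1
    print(round(partition_coefficient(normalize_columns(t)), 4))  # 0.66
```

Normalization makes the index comparable with its fuzzy range, but it discards the absolute typicality level that distinguishes possibilistic from fuzzy output, which is one reason dedicated possibilistic variants of the indices are worth having.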


There are 34 references in total.

Details

Primary Language: English
Subjects: Software Engineering (Other)
Section: Articles
Authors

Zeynel Cebeci (ORCID: 0000-0002-7641-7094)

Project Number: FBA-2019-10285
Publication Date: 30 April 2020
Submission Date: 24 December 2019
Acceptance Date: 14 April 2020
Issue: Year 2020, Volume 3, Issue 1

Cite

IEEE: Z. Cebeci, “fcvalid: An R Package for Internal Validation of Probabilistic and Possibilistic Clustering”, SAUCIS, vol. 3, no. 1, pp. 11–27, 2020, doi: 10.35377/saucis.03.01.664560.

    Sakarya University Journal of Computer and Information Sciences in Applied Sciences and Engineering: An interdisciplinary journal of information science