Research Article

fcvalid: An R Package for Internal Validation of Probabilistic and Possibilistic Clustering

Year 2020, Volume: 3, Issue: 1, pp. 11-27, 30.04.2020
https://doi.org/10.35377/saucis.03.01.664560

Abstract

In exploratory data analysis and machine learning, partitioning clustering is a frequently used unsupervised learning technique for finding meaningful patterns in numeric datasets. In practice, clustering aims to identify and classify the objects or cases in a dataset. The quality of a clustering result, or the performance of a clustering algorithm, is generally evaluated with internal validity indices. This study introduces an R package named 'fcvalid' for validating fuzzy and possibilistic clustering results. The package implements a broad collection of internal indices that have been proposed for validating the results of fuzzy clustering algorithms. It also includes options to compute generalized and extended versions of these fuzzy internal indices for validating possibilistic clustering results.

Supporting Institution

The Unit of Scientific Research Projects of Çukurova University

Project Number

FBA-2019-10285

Thanks

Supplementary materials, including the manual and source code of the package 'fcvalid', can be downloaded from GitHub at https://github.com/zcebeci/fcvalid.

fcvalid: An R Package for Fuzzy Validity Indices in Probabilistic and Possibilistic Partitioning Clustering (Turkish title: fcvalid: Olasılıklı ve Olabilirlikli Bölümleyici Kümelemede Bulanık Geçerlilik İndeksleri için Bir R Paketi)

Abstract (Turkish version, translated)

Partitioning clustering is one of the unsupervised learning techniques widely used in exploratory data analysis and machine learning to find meaningful patterns in numeric datasets. In practice, clustering aims to identify and classify the objects or cases in a dataset. The quality of a cluster analysis, or the performance of a clustering algorithm, is generally evaluated using internal validity indices. This study introduces the functions of an R package named 'fcvalid' for validating fuzzy and possibilistic clustering results. The package implements a large number of internal indices proposed for validating the results of fuzzy clustering algorithms. Options for computing generalized and extended versions of the fuzzy internal indices for validating possibilistic clustering are also included in the package.


Details

Primary Language: English
Subjects: Software Engineering (Other)
Journal Section: Articles
Authors: Zeynel Cebeci (ORCID: 0000-0002-7641-7094)
Project Number: FBA-2019-10285
Publication Date: April 30, 2020
Submission Date: December 24, 2019
Acceptance Date: April 14, 2020
Published in Issue: Year 2020, Volume: 3, Issue: 1

Cite

IEEE: Z. Cebeci, “fcvalid: An R Package for Internal Validation of Probabilistic and Possibilistic Clustering”, SAUCIS, vol. 3, no. 1, pp. 11–27, 2020, doi: 10.35377/saucis.03.01.664560.

The papers in this journal are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.