Mfuzz webpage

Clustering is an important tool in gene expression data analysis, at both the transcript and the protein level. This unsupervised classification technique is commonly used to reveal structures hidden in large gene expression data sets. The vast majority of clustering algorithms applied produce hard partitions of the data, i.e. each gene or protein is assigned to exactly one cluster. Hard clustering is favourable if clusters are well separated. However, this is generally not the case for gene expression time-course data, where gene/protein clusters frequently overlap. Additionally, hard clustering algorithms are often highly sensitive to noise.

To overcome the limitations of hard clustering, we have implemented soft clustering, which offers several advantages for researchers. First, it generates accessible internal cluster structures, i.e. it indicates how well each cluster represents the genes or proteins assigned to it. This can be used for a more targeted search for regulatory elements (see Publication). Second, the overall relation between clusters, and thus a global clustering structure, can be defined. Additionally, soft clustering is more robust to noise, so a priori filtering of genes/proteins can be avoided. This prevents the exclusion of biologically relevant genes/proteins from the data analysis.

Further information

Questions & Answers


Soft clustering was implemented here using the fuzzy c-means algorithm. A software package for soft clustering, termed Mfuzz, has been developed based on the open-source statistical language R. It uses the cmeans function of the e1071 package. The current version can be downloaded from the Bioconductor repository (see below). A graphical interface for the Mfuzz package is included, but it may not offer all the functionality available from the command line.
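
To illustrate what soft cluster memberships look like, the following is a minimal sketch of the standard fuzzy c-means update rules in plain Python. It is not the Mfuzz code and not the cmeans function of e1071; the function name fuzzy_c_means, its parameters (including the fuzzifier m = 2), and the toy expression profiles are assumptions made only for this example.

```python
import math
import random


def fuzzy_c_means(data, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Illustrative fuzzy c-means; returns (centers, memberships).

    data: list of equal-length numeric vectors (e.g. expression profiles)
    c:    number of clusters
    m:    fuzzifier (> 1); larger values give softer memberships
    """
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])

    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(iters):
        # Centre update: mean of the data weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            wsum = sum(w)
            centers.append(
                [sum(w[i] * data[i][d] for i in range(n)) / wsum for d in range(dim)]
            )

        # Membership update from the relative distances to all centres.
        shift = 0.0
        for i in range(n):
            dist = [max(math.dist(data[i], centers[j]), 1e-12) for j in range(c)]
            new_row = [
                1.0 / sum((dist[j] / dist[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                for j in range(c)
            ]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once no membership value changes noticeably.
        if shift < tol:
            break

    return centers, u


# Hypothetical time-course profiles: two rising, two falling, one flat.
profiles = [
    [0.0, 1.0, 2.0, 3.0],
    [0.1, 1.1, 2.1, 2.9],
    [3.0, 2.0, 1.0, 0.0],
    [2.9, 2.1, 0.9, 0.1],
    [1.5, 1.5, 1.5, 1.5],
]
centers, memberships = fuzzy_c_means(profiles, c=2)
for profile, row in zip(profiles, memberships):
    print(profile, [round(v, 3) for v in row])
```

Each row of memberships sums to 1. In this toy data set the rising and falling profiles should receive a membership close to 1 for one cluster, whereas the flat profile is shared between both clusters. A hard partition would have to assign the flat profile to one of them; the soft memberships keep that ambiguity visible.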

Current version


Questions and suggestions regarding Mfuzz can be addressed to Matthias Futschik.