BPCA Missing Value Estimator for MATLAB

[JBPCAfill] Missing Value Estimator for Java

Shigeyuki Oba

BPCAfill is a missing value estimation software suitable for gene expression profile data. (DNA microarray)

BPCAfill estimates missing values in the gene expression matrix, and fill them by the estimated values. The estimation almost always exhibits the best accuracy of all existing methods (as long as I know in 2003). Especially, when your data set has large number of genes and/or large number of samples, BPCAfill will efficiently use all the provided informations and output the best result, even when there are large number of missing entries.

In addition, it is very easy to use, because it do not need any type of hand tuning parameters.

See also the paper,
Oba, S., Sato, M., Takemasa, I., Monden, M., Matsubara, K., and Ishii, S. A Bayesian Missing value estimation method, Bioinformatics, (2003), to appear.

13 May. 2003 : first release of the Java version

MATLAB version is here

Contents
Download
Install
Usage
- Simplest usage
- Using via command line

Download

Jar file JBPCAfill.jar (170kb)

Install

Put the JAR file in arbitrary directory.

Usage

Simplest usage

Prepare a data file.

Tab delimited text format
- for example, Microsoft EXCEL ".txt" format
Missing values must be denoted by 999.0.

JBPCAfill estimates the missing entries using BPCA model and returns complete matrix filled with the estimated values.

Open the JBPCAfill application frame from command line.

> java -classpath JBPCAfill.jar JBPCAfill

(1) Set the input filename.
(2) Set the output filename. ( When the input filename (ex. test999.dat) is set, default filename (ex. test999_filled.dat) is set )
(3) Click the EXECUTE button, and the calculation starts. Please wait for it to converge. It will take some minutes or hours till finish. When it finished, a new output file is written. Missing entries will be filled by the estimated values.
(4) (OPTIONAL) If you want to handle, extraordinarily large data set, you may set some parameters, and you can obtain coarse estimation in short calculation time. But, the default is the most recommended.
(5) Something appears in this window, but, please don't care.

Using via command line

You can execute the algorithm via command line. When the input filename is sample999.dat, and the output filename is sampleFilled.dat, you will obtain the result by:

> java -classpath JBPCAfill.jar JBPCAfill sample999.dat sampleFilled.dat