Python: module cmonkey.datamatrix

cmonkey.datamatrix

/home/weiju/Projects/ISB/cmonkey-python/cmonkey/datamatrix.py

datatypes.py - data types for cMonkey. This file is part of cMonkey Python. Please see README and LICENSE for more information and licensing details.

Modules

gzip
logging
numpy
operator
os
random
scipy
cmonkey.util

Classes



DataMatrix
DataMatrixFactory

class DataMatrix

    A two-dimensional data matrix class with optional row and column names. The matrix is initialized with a fixed dimension size and cannot be resized after initialization. Names and values of a matrix instance can be modified. The values themselves are implemented as a two-dimensional numpy array, and returned values are all based on numpy arrays.

Methods defined here:

__init__(self, nrows, ncols, row_names=None, col_names=None, values=None, init_value=None)
create a DataMatrix instance

__repr__(self)
returns a string representation of this matrix

__str__(self)
returns a string representation of this matrix

apply_log(self)
applies np.log to all values

column_indexes_for(self, column_names)
returns the column indexes with the matching names
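A minimal sketch of name-to-index lookup on a plain list of names. The helper `indexes_for` is hypothetical and not part of cmonkey.datamatrix. It assumes names are unique (as the submatrix_by_name note also requires) and, as a further assumption, silently skips names that are not present, since the documentation here does not say how missing names are handled. The same idea applies to row_indexes_for.

```python
def indexes_for(all_names, wanted_names):
    """Hypothetical helper: map each wanted name to its position in all_names.

    Assumes all_names has no duplicates; names that are not present are
    skipped (an assumption of this sketch, not documented cMonkey behavior).
    """
    # Build the lookup once so each query is O(1) instead of a list scan.
    position = {name: index for index, name in enumerate(all_names)}
    return [position[name] for name in wanted_names if name in position]


col_names = ["cond1", "cond2", "cond3"]
print(indexes_for(col_names, ["cond3", "cond1"]))  # [2, 0]
```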

column_values(self, column)
returns the values in the specified column

fix_extreme_values(self, min_value=-20.0)
replaces values below min_value (default -20.0) with the smallest value that is >= min_value, and replaces all NA/Inf values with the maximum value in the matrix
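A minimal sketch of the replacement rules on a plain float numpy array rather than a DataMatrix. It assumes min_value generalizes the -20 threshold and that "maximum value" means the largest finite value; both are assumptions, as is the in-place modification. Non-finite entries are replaced first, so -Inf is treated as an NA/Inf value rather than as a low outlier.

```python
import numpy as np


def fix_extreme_values(values, min_value=-20.0):
    """Sketch: clamp low outliers and replace NaN/Inf, modifying values in place.

    Values below min_value become the smallest finite value >= min_value;
    NaN, +Inf and -Inf become the largest finite value (assumed semantics).
    """
    finite = values[np.isfinite(values)]
    smallest_ok = finite[finite >= min_value].min()
    largest = finite.max()
    # Replace non-finite entries first so the comparison below only sees numbers.
    values[~np.isfinite(values)] = largest
    values[values < min_value] = smallest_ok
    return values


m = np.array([[-25.0, -3.0], [2.0, np.nan], [np.inf, 5.0]])
print(fix_extreme_values(m).tolist())  # [[-3.0, -3.0], [2.0, 5.0], [5.0, 5.0]]
```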

max(self)
return the maximum value in this matrix

mean(self)
returns the mean value

median(self)
returns the median value

min(self)
return the minimum value in this matrix

multiply_column_by(self, column, factor)
Multiplies the specified column by a certain factor

quantile(self, probability)
returns the result of the quantile function over all contained values

replace_nan_with(self, value)
replaces NaN with the specified value

residual(self, max_row_variance=None)
computes the residual for this matrix; if max_row_variance is given, the result is normalized by the row variance
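A sketch of one common definition of a bicluster residual, the mean absolute residual in the style of Cheng and Church, where each entry contributes a_ij - (row mean) - (column mean) + (overall mean). That residual() computes exactly this is an assumption. The normalization step is not sketched, because how the row variance is computed (row_variance has no docstring) is not documented here. The helper name `mean_abs_residual` is hypothetical.

```python
import numpy as np


def mean_abs_residual(values):
    """Mean absolute residual of a 2-D array (Cheng-and-Church style).

    Each entry's residual is a_ij - rowmean_i - colmean_j + overall mean.
    """
    row_means = values.mean(axis=1, keepdims=True)  # shape (n, 1)
    col_means = values.mean(axis=0, keepdims=True)  # shape (1, m)
    overall = values.mean()
    residuals = values - row_means - col_means + overall
    return float(np.abs(residuals).mean())


# A purely additive matrix (row effect + column effect) has zero residual.
additive = np.add.outer([0.0, 1.0, 2.0], [10.0, 20.0])
print(mean_abs_residual(additive))  # 0.0
```

A low residual means the rows of a submatrix move together up to additive shifts, which is why such a score is useful for judging bicluster coherence.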

row_indexes_for(self, row_names)
returns the row indexes with the matching names

row_values(self, row)
returns the values in the specified row

row_variance(self)

sorted_by_row_name(self)
returns a version of this table, sorted by row name

submatrix_by_name(self, row_names=None, column_names=None)
extract a submatrix with the specified rows and columns. Selecting by name is more common than selecting by index in cMonkey, because submatrices are often selected based on memberships. Note: currently, duplicate row names and column names are not supported. Furthermore, submatrices may share the original matrix's values, so writing to a submatrix can change the original matrix, too. It is recommended to use submatrices read-only.
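The shared-values caveat reflects general numpy behavior: basic slicing returns a view that shares memory with the original array, while integer-array (fancy) indexing returns a copy. Whether a particular DataMatrix submatrix shares values depends on how it is built internally, which is not documented here; this sketch only demonstrates the numpy side.

```python
import numpy as np

values = np.arange(12.0).reshape(3, 4)

view = values[0:2, 1:3]                # basic slicing -> view
copy = values[np.ix_([0, 2], [1, 3])]  # fancy indexing -> copy

print(np.shares_memory(values, view))  # True
print(np.shares_memory(values, copy))  # False

view[0, 0] = -1.0    # writes through to values[0, 1]
print(values[0, 1])  # -1.0
```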

submatrix_by_rows(self, row_indexes)
extract a submatrix with the specified rows; row_indexes must be sorted

subtract_with_quantile(self, quantile)
subtracts the specified quantile of this matrix's values from each of its values
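A minimal sketch on a plain numpy array. It assumes the quantile is taken over all values with NaNs ignored, using numpy's default linear interpolation (which matches R's default quantile type 7); whether cMonkey's quantile and subtract_with_quantile behave exactly this way is an assumption. Unlike the method, which presumably updates the matrix, this hypothetical helper returns a new array.

```python
import numpy as np


def subtract_quantile(values, probability):
    """Return values minus the given quantile of all (non-NaN) values.

    np.nanquantile flattens the array (axis=None) and skips NaN entries;
    that NaN handling is an assumption of this sketch.
    """
    return values - np.nanquantile(values, probability)


values = np.array([[1.0, 2.0], [3.0, 4.0]])
print(subtract_quantile(values, 0.5).tolist())  # [[-1.5, -0.5], [0.5, 1.5]]
```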

write_tsv_file(self, path, compressed=True)
writes this matrix to a tab-separated file
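A minimal sketch of writing a labeled matrix as tab-separated text, optionally gzip-compressed with Python's gzip module. The helper name `write_tsv`, the header layout (an empty leading cell followed by the column names) and the number formatting are hypothetical; the actual layout produced by write_tsv_file is not documented here.

```python
import gzip
import os
import tempfile


def write_tsv(path, row_names, col_names, values, compressed=True):
    """Hypothetical writer: a header row of column names, then one line per row."""
    opener = gzip.open if compressed else open
    # Mode "wt" opens both gzip and plain files for writing text.
    with opener(path, "wt") as out:
        out.write("\t".join([""] + list(col_names)) + "\n")
        for name, row in zip(row_names, values):
            out.write("\t".join([name] + [str(float(v)) for v in row]) + "\n")


path = os.path.join(tempfile.mkdtemp(), "ratios.tsv.gz")
write_tsv(path, ["geneA", "geneB"], ["c1", "c2"], [[0.5, -1.0], [2.0, 0.25]])
with gzip.open(path, "rt") as f:
    print(f.read())
```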

class DataMatrixFactory

    Reader class for creating a DataMatrix from a delimited file, applying all supplied filters. Currently, the assumption is that all input ratio files have a header row and that the first column contains the gene names.

    (To be moved to filter class comments): There are a couple of constraints and special things to consider:

    - Row names are unique: rows whose names are already in the matrix are thrown out (first come, first in). This is to throw out the second probe of a gene in certain cases.
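The "first come, first in" rule for duplicate row names can be sketched as follows. The helper `drop_duplicate_rows` is hypothetical and works on plain lists; it does not model the case_sensitive option of create_from.

```python
def drop_duplicate_rows(row_names, rows):
    """Keep only the first row seen for each name ("first come, first in")."""
    seen = set()
    kept_names, kept_rows = [], []
    for name, row in zip(row_names, rows):
        if name in seen:
            continue  # e.g. a second probe for the same gene
        seen.add(name)
        kept_names.append(name)
        kept_rows.append(row)
    return kept_names, kept_rows


names, rows = drop_duplicate_rows(["g1", "g2", "g1"], [[1.0], [2.0], [3.0]])
print(names, rows)  # ['g1', 'g2'] [[1.0], [2.0]]
```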

Methods defined here:

__init__(self, filters)
create a reader instance with the specified filters

create_from(self, delimited_file, case_sensitive=True)
creates and returns an initialized, filtered DataMatrix instance

Functions

center_scale_filter(matrix)
center the values of each row around their median and scale by their standard deviation
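A minimal sketch of row-wise median centering and standard-deviation scaling on a plain numpy array. Ignoring NaNs and using the sample standard deviation (ddof=1, as R's sd() does) are assumptions of this sketch; the helper name `center_scale_rows` is hypothetical.

```python
import numpy as np


def center_scale_rows(values):
    """Center each row on its median and divide by its standard deviation.

    NaNs are ignored; ddof=1 (sample standard deviation) is an assumption.
    """
    medians = np.nanmedian(values, axis=1, keepdims=True)
    stds = np.nanstd(values, axis=1, ddof=1, keepdims=True)
    return (values - medians) / stds


m = np.array([[1.0, 2.0, 6.0], [0.0, 2.0, 4.0]])
print(center_scale_rows(m))
```

Centering on the median rather than the mean keeps a single extreme measurement from shifting the whole row.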

nochange_filter(matrix)
returns a new filtered DataMatrix containing only the columns and rows that have large enough measurements
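The docstring does not define "large enough". One plausible way to implement such a filter is sketched below: an entry counts as uninformative if it is NaN or its absolute value is at or below a threshold, and a row (or column) is kept only if fewer than half of its entries are uninformative. The thresholds, the 50% cutoff and the helper name `keep_informative` are hypothetical placeholders, not cMonkey's documented values.

```python
import numpy as np


def keep_informative(values, row_threshold=0.2, col_threshold=0.1):
    """Hypothetical filter returning (filtered_values, row_indexes, col_indexes)."""
    def keep(axis, threshold):
        small = np.isnan(values) | (np.abs(values) <= threshold)
        # Fraction of uninformative entries per row (axis=1) or column (axis=0).
        return np.flatnonzero(small.mean(axis=axis) < 0.5)

    rows = keep(1, row_threshold)
    cols = keep(0, col_threshold)
    return values[np.ix_(rows, cols)], rows, cols


m = np.array([[0.5, 1.0, 0.0],
              [0.0, 0.1, np.nan],
              [-2.0, 0.05, 3.0]])
filtered, rows, cols = keep_informative(m)
print(rows.tolist(), cols.tolist(), filtered.tolist())  # [0, 2] [0] [[0.5], [-2.0]]
```

Here rows and columns are judged independently on the full matrix; a real filter might instead filter rows first and then judge columns on the remaining rows.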

Data

__all__ = ['DataMatrix', 'DataMatrixFactory', 'nochange_filter', 'center_scale_filter']
