| |
- DataMatrix
- DataMatrixFactory
class DataMatrix |
|
A two-dimensional data matrix class with optional row and column names.
The matrix is initialized with fixed dimensions and cannot
be resized after initialization.
Names and values of a matrix instance can be modified.
The values themselves are implemented as a two-dimensional numpy array
and returned values are all based on numpy arrays. |
|
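As a rough illustration of the description above, the following sketch shows one way a fixed-size, named matrix could be laid out on top of a numpy array. The class name NamedMatrixSketch, the generated default names and the 0.0 default fill are assumptions made for this example. Only the constructor parameters are taken from the signature documented here; this is not the actual DataMatrix implementation.

```python
import numpy as np


class NamedMatrixSketch:
    """Hypothetical stand-in for DataMatrix, for illustration only."""

    def __init__(self, nrows, ncols, row_names=None, col_names=None,
                 values=None, init_value=None):
        # Assumption: missing names default to generated placeholders.
        self.row_names = list(row_names) if row_names is not None else [
            "Row %d" % i for i in range(nrows)]
        self.column_names = list(col_names) if col_names is not None else [
            "Col %d" % i for i in range(ncols)]
        if values is not None:
            self.values = np.array(values, dtype=float)
            if self.values.shape != (nrows, ncols):
                raise ValueError("values must have shape (nrows, ncols)")
        else:
            # Assumption: an unspecified init_value leaves the cells at 0.0.
            fill = 0.0 if init_value is None else init_value
            self.values = np.full((nrows, ncols), fill, dtype=float)

    def row_values(self, row):
        # One row of the underlying 2D array.
        return self.values[row]

    def column_values(self, column):
        # One column of the underlying 2D array.
        return self.values[:, column]


m = NamedMatrixSketch(2, 3, row_names=["g1", "g2"], init_value=1.0)
m.values[0, 1] = 5.0      # values can be modified in place
m.row_names[1] = "g2b"    # names can be modified, too
```

The shape is fixed by the constructor arguments, while names and individual cells remain writable, which matches the constraints stated in the class description.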
Methods defined here:
- __init__(self, nrows, ncols, row_names=None, col_names=None, values=None, init_value=None)
- creates a DataMatrix instance
- __repr__(self)
- returns a string representation of this matrix
- __str__(self)
- returns a string representation of this matrix
- apply_log(self)
- applies np.log to all values
- column_indexes_for(self, column_names)
- returns the column indexes with the matching names
- column_values(self, column)
- returns the values in the specified column
- fix_extreme_values(self, min_value=-20.0)
- replaces values below min_value (default -20) with the smallest value
that is >= min_value, and replaces all NA/Inf values with the maximum
value in the matrix
- max(self)
- returns the maximum value in this matrix
- mean(self)
- returns the mean value
- median(self)
- returns the median value
- min(self)
- returns the minimum value in this matrix
- multiply_column_by(self, column, factor)
- multiplies the specified column by the given factor
- quantile(self, probability)
- returns the result of the quantile function over all contained
values
- replace_nan_with(self, value)
- replaces NaN with the specified value
- residual(self, max_row_variance=None)
- computes the residual for this matrix; if max_row_variance is given,
the result is normalized by the row variance
- row_indexes_for(self, row_names)
- returns the row indexes with the matching names
- row_values(self, row)
- returns the values in the specified row
- row_variance(self)
- sorted_by_row_name(self)
- returns a version of this matrix, sorted by row name
- submatrix_by_name(self, row_names=None, column_names=None)
- extracts a submatrix with the specified rows and columns.
Selecting by name is more common than selecting by index
in cMonkey, because submatrices are often selected based
on memberships.
Note: Duplicate row names or column names are currently not
supported. Furthermore, a submatrix may share the original
matrix's values, so writing to the submatrix can change the
original matrix, too. It is recommended to treat submatrices
as read-only.
- submatrix_by_rows(self, row_indexes)
- extracts a submatrix with the specified rows.
row_indexes must be sorted
- subtract_with_quantile(self, quantile)
- subtracts the specified quantile of this matrix's values from every value
- write_tsv_file(self, path, compressed=True)
- writes this matrix to a tab-separated file
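The value-transforming methods in the list above can be approximated with plain numpy. The sketch below mirrors the documented behaviour of fix_extreme_values and subtract_with_quantile on a bare array. The function names ending in _sketch are hypothetical, and the treatment of -Inf (handled like NaN and +Inf, before the lower bound is applied) and the use of np.quantile are assumptions rather than a description of the actual implementation.

```python
import numpy as np


def fix_extreme_values_sketch(values, min_value=-20.0):
    """Return a copy of values with too-small and non-finite cells replaced."""
    result = np.array(values, dtype=float)
    finite = np.isfinite(result)
    # Assumption: NaN, +Inf and -Inf are all replaced by the largest finite value.
    max_value = result[finite].max()
    result[~finite] = max_value
    # Values below min_value are raised to the smallest value that is >= min_value.
    floor = result[result >= min_value].min()
    result[result < min_value] = floor
    return result


def subtract_with_quantile_sketch(values, quantile):
    """Return values shifted down by the given quantile of all cells."""
    values = np.asarray(values, dtype=float)
    # The quantile is taken over all contained values (the flattened matrix).
    return values - np.quantile(values, quantile)


raw = np.array([[-25.0, -3.0, 1.0],
                [np.nan, np.inf, 4.0]])
fixed = fix_extreme_values_sketch(raw)
centered = subtract_with_quantile_sketch([[1.0, 2.0, 3.0],
                                          [4.0, 5.0, 6.0]], 0.5)
```

Both helpers return new arrays and leave their input untouched, which makes them easy to check in isolation; the documented methods are described as operating on the matrix itself.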
|
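The note on submatrix_by_name about shared values is consistent with how numpy indexing behaves: a basic slice returns a view onto the original buffer, while selection with an index list returns a copy. The sketch below looks up row indexes by name with a hypothetical helper (row_indexes_for_sketch, not the library's method) and shows both cases on a bare array.

```python
import numpy as np

row_names = ["geneA", "geneB", "geneC"]
values = np.array([[1.0, 2.0],
                   [3.0, 4.0],
                   [5.0, 6.0]])


def row_indexes_for_sketch(names, wanted):
    """Hypothetical helper: sorted indexes of the rows whose names are wanted."""
    wanted = set(wanted)
    return [i for i, name in enumerate(names) if name in wanted]


indexes = row_indexes_for_sketch(row_names, ["geneC", "geneA"])

# Selecting with an index list (fancy indexing) copies the data,
# so this write does not reach the original array.
by_index_list = values[indexes]
by_index_list[0, 0] = -1.0

# Selecting with a slice returns a view onto the same buffer,
# so this write changes the original array as well.
by_slice = values[0:2]
by_slice[0, 1] = 99.0

shares_list = np.shares_memory(values, by_index_list)
shares_slice = np.shares_memory(values, by_slice)
```

Because the caller cannot always tell which of the two cases applies, treating every submatrix as read-only is the safe default.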
class DataMatrixFactory |
|
Reader class for creating a DataMatrix from a delimited file,
applying all supplied filters. Currently, the assumption is
that all input ratio files have a header row and that the first
column denotes the gene names.
(To be moved to filter class comments):
There are a couple of constraints and special cases to consider:
- Row names are unique: rows whose names are already in the
matrix are thrown out (first come, first in). This discards the second
probe of a gene in certain cases. |
|
Methods defined here:
- __init__(self, filters)
- create a reader instance with the specified filters
- create_from(self, delimited_file, case_sensitive=True)
- creates and returns an initialized, filtered DataMatrix instance
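The file assumptions described for this class (a header row, gene names in the first column, and first come, first in for duplicate row names) can be sketched with the standard csv module. The function read_matrix_sketch is hypothetical, it returns plain lists and a numpy array instead of a DataMatrix, it omits the filter step entirely, and mapping empty or "NA" cells to NaN is an assumption.

```python
import csv
import io

import numpy as np


def read_matrix_sketch(lines, delimiter="\t"):
    """Hypothetical reader: returns (row_names, column_names, values)."""
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)
    column_names = header[1:]      # the first header cell labels the name column
    row_names, rows, seen = [], [], set()
    for record in reader:
        if not record:
            continue               # skip blank lines
        name = record[0]
        if name in seen:
            continue               # first come, first in: drop later duplicates
        seen.add(name)
        row_names.append(name)
        # Assumption: empty or "NA" cells become NaN.
        rows.append([float("nan") if cell in ("", "NA") else float(cell)
                     for cell in record[1:]])
    return row_names, column_names, np.array(rows, dtype=float)


text = "GENE\tcond1\tcond2\ng1\t0.5\t-1.0\ng2\tNA\t2.0\ng1\t9.0\t9.0\n"
names, columns, matrix = read_matrix_sketch(io.StringIO(text))
```

In this input the second g1 row is discarded, so the result has two rows; a real reader would additionally pass the matrix through each supplied filter before returning it.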
| |