In several real life applications matrices have often very large dimensions and contain a large number of zero entries. Matrices with these characteristics are called sparse. The performance both in terms of memory space and efficiency of linear algebra operations with these matrices can be improved if zero entries are avoided.
An example of sparse matrix are diagonal matrices. They can be stored by storing only the diagonal as an array. The function that transforms an array into a diagonal matrix is diag
.
More in general sparse matrices can be stored by means of scipy.sparse
in three different forms:
import scipy as sp
import scipy.sparse as sps
A=sp.array([[1.,0.,2.,0.],
[0.,0.,0.,0.],
[3.,0.,0.,0.],
[1.,0.,0.,4.]])
AS=sps.csr_matrix(A)
print AS.data
print AS.indptr
print AS.indices
print AS.nnz
data
contains the elements of the matrix different from zeroindptr
contains integers such that indptr[i]
is the index of the element in data
which is the first nonzero element of row $i$. If the entire row is zero then indptr[i]==indptr[i+1]
. If the original matrix has $m$ rows then len(indptr)==m+1
.indices
contains the column index information in such a way that indices[indptr[i]:indptr[i+1]]
is an integer array with the column indexes of the nonzero elements in row $i$. len(indices==len(data)==nnz
.
For example: for the fourth row $i=3$, indptr[3]
is 3 and indptr[4]
is 5; hence the portion of indices[3:5]
containing the indices of the columns for the elements in data
from 3 to 5 are 0 and 3. Hence 1. goes in column 0 and 4. goes in column 3. nnz
is the number of elements in data
CSC works similarly but with indptr
and indices
which are now column related.
A=sp.array([[1.,0.,2.,0.],
[0.,0.,0.,0.],
[3.,0.,0.,0.],
[1.,0.,0.,4.]])
AS=sps.csc_matrix(A)
print AS.data
print AS.indptr
print AS.indices
print AS.nnz
A=sp.array([[1.,0.,2.,0.],
[0.,0.,0.,0.],
[3.,0.,0.,0.],
[1.,0.,0.,4.]])
AS=sps.lil_matrix(A)
print "data:", AS.data
print "rows:", AS.rows
print "nnz:", AS.nnz
data
is a collection of lists such that data[k]
contains the non-zero elements of k
row of the matrix. If a row is empty, the corresponding list is empty.rows
is a collection of lists such that rows[k]
contains the indices of the columns of the corresponding elements in data
Altering and slicing are more efficiently executed on the LIL format:
AS[1,0]=17
BS=AS[1:3,0:2]
print BS.data, BS.rows
eye
, identity
, diag
, rand
have their sparse counterparts. sps.rand
has an additional argument to indicate the density of the generated random matrix. A dense matrix has density 1 while a zero matrix has density 0.
E=sps.eye(10,10,format='lil')
D=sps.spdiags(sp.ones((20,)),0,20,20,format='csr')
R=sps.rand(100,200,density=0.1,format='csc')
Element-wise operations can be executed only in CSR or CSC format.
AS.toarray
AS.tocsr
AS.tocsc
AS.tolil
sps.issparse(AS)
AS.todense()
Avoid using dot
command with sparse matrices since the result is unexpected. Other functions for specific tasks are defined in the module scipy.sparse.linalg
.
%matplotlib inline
import matplotlib.pyplot as plt
plt.imshow(R.todense(), interpolation='nearest')
from pylab import *
matshow(R.todense())
In numpy and scipy it is possible to read the number of bytes occupied by an array with th eattribute nbytes
. Hence:
sp.arange(100).nbytes
R.data.nbytes
R.data.nbytes + R.indptr.nbytes + R.indices.nbytes