Sparse Matrices

In several real life applications matrices have often very large dimensions and contain a large number of zero entries. Matrices with these characteristics are called sparse. The performance both in terms of memory space and efficiency of linear algebra operations with these matrices can be improved if zero entries are avoided.
An example of sparse matrix are diagonal matrices. They can be stored by storing only the diagonal as an array. The function that transforms an array into a diagonal matrix is diag.

Sparse Matrix Format

More in general sparse matrices can be stored by means of scipy.sparse in three different forms:

  • Compressed Sparse Row (CSR): used for matrix/matrix and matrix/vector operations
  • Compressed Sparse Column (CSC): used for matrix/matrix and matrix/vector operations
  • Row-Based Linked List Format (LIL): used for generating and altering sparse matrices

Compressed Sparse Row (CSR)

In [1]:
import scipy as sp
import scipy.sparse as sps
A=sp.array([[1.,0.,2.,0.],
            [0.,0.,0.,0.],
            [3.,0.,0.,0.],
            [1.,0.,0.,4.]])
AS=sps.csr_matrix(A)
print AS.data
print AS.indptr
print AS.indices
print AS.nnz
[ 1.  2.  3.  1.  4.]
[0 2 2 3 5]
[0 2 0 0 3]
5
  • data contains the elements of the matrix different from zero
  • indptr contains integers such that indptr[i] is the index of the element in data which is the first nonzero element of row $i$. If the entire row is zero then indptr[i]==indptr[i+1]. If the original matrix has $m$ rows then len(indptr)==m+1.
  • indices contains the column index information in such a way that indices[indptr[i]:indptr[i+1]] is an integer array with the column indexes of the nonzero elements in row $i$. len(indices==len(data)==nnz. For example: for the fourth row $i=3$, indptr[3] is 3 and indptr[4] is 5; hence the portion of indices[3:5] containing the indices of the columns for the elements in data from 3 to 5 are 0 and 3. Hence 1. goes in column 0 and 4. goes in column 3.
  • nnz is the number of elements in data

Compressed Sparse Column (CSC)

CSC works similarly but with indptr and indices which are now column related.

In [2]:
A=sp.array([[1.,0.,2.,0.],
            [0.,0.,0.,0.],
            [3.,0.,0.,0.],
            [1.,0.,0.,4.]])
AS=sps.csc_matrix(A)
print AS.data
print AS.indptr
print AS.indices
print AS.nnz
[ 1.  3.  1.  2.  4.]
[0 3 3 4 5]
[0 2 3 0 3]
5

Row-Based Linked List Format (LIL)

In [3]:
A=sp.array([[1.,0.,2.,0.],
            [0.,0.,0.,0.],
            [3.,0.,0.,0.],
            [1.,0.,0.,4.]])
AS=sps.lil_matrix(A)
print "data:", AS.data
print "rows:", AS.rows
print "nnz:", AS.nnz
data: [[1.0, 2.0] [] [3.0] [1.0, 4.0]]
rows: [[0, 2] [] [0] [0, 3]]
nnz: 5
  • data is a collection of lists such that data[k] contains the non-zero elements of k row of the matrix. If a row is empty, the corresponding list is empty.
  • rows is a collection of lists such that rows[k] contains the indices of the columns of the corresponding elements in data

Altering and slicing are more efficiently executed on the LIL format:

In [4]:
AS[1,0]=17
BS=AS[1:3,0:2]
print BS.data, BS.rows
[[17.0] [3.0]] [[0] [0]]

Generating sparse matrices

eye, identity, diag, rand have their sparse counterparts. sps.rand has an additional argument to indicate the density of the generated random matrix. A dense matrix has density 1 while a zero matrix has density 0.

In [5]:
E=sps.eye(10,10,format='lil')
D=sps.spdiags(sp.ones((20,)),0,20,20,format='csr')
R=sps.rand(100,200,density=0.1,format='csc')

Conversions

Element-wise operations can be executed only in CSR or CSC format.

In [6]:
AS.toarray
AS.tocsr
AS.tocsc
AS.tolil
sps.issparse(AS)
AS.todense()
Out[6]:
matrix([[  1.,   0.,   2.,   0.],
        [ 17.,   0.,   0.,   0.],
        [  3.,   0.,   0.,   0.],
        [  1.,   0.,   0.,   4.]])

Avoid using dot command with sparse matrices since the result is unexpected. Other functions for specific tasks are defined in the module scipy.sparse.linalg.

Plotting

In [7]:
%matplotlib inline
In [8]:
import matplotlib.pyplot as plt
plt.imshow(R.todense(), interpolation='nearest')
Out[8]:
<matplotlib.image.AxesImage at 0x7f5968986110>
In [9]:
from pylab import *
matshow(R.todense())
Out[9]:
<matplotlib.image.AxesImage at 0x7f5968190b50>

Memory occupation

In numpy and scipy it is possible to read the number of bytes occupied by an array with th eattribute nbytes. Hence:

In [10]:
sp.arange(100).nbytes
Out[10]:
800
In [11]:
R.data.nbytes
Out[11]:
16000
In [12]:
R.data.nbytes + R.indptr.nbytes + R.indices.nbytes
Out[12]:
24804