Part 4: Higher Mathematics

Matrices and linear algebra

Matrices are represented by arrays of rank 2, whereas vectors (in the more general sense of linear algebra, not geometrical vectors) are represented by arrays of rank 1. The module Numeric contains a few operations that are particularly useful for dealing with matrices and vectors:

dot(a, b) returns the dot product of a and b. This can be used with two vectors (scalar product), a matrix and a vector (matrix-vector product), or two matrices (matrix product). The last dimension of a must have the same length as the first dimension of b.
transpose(a) returns the transpose of the matrix a. This function can actually be applied to arrays of any rank and in general reverses the order of the dimensions.
identity(n) returns an nxn unit matrix.
ravel(a) returns a vector containing all elements of the array a. The order of the elements is defined by the first index varying most slowly, i.e. the elements of a 2x2 array are returned in the order (0,0), (0,1), (1,0), (1,1).
reshape(a, shape) returns an array of the specified shape (a tuple) containing the elements of the array a (in the same order as in ravel(a)). This is often useful to construct special matrices. The shape must lead to the same total number of elements. This restriction is lifted in the function resize(a, shape), which allows any shape. If more elements are needed than are contained in a, they are repeated from the start.

The module LinearAlgebra contains various common operations on matrices. Matrices are represented by two-dimensional arrays. Some operations require square matrices, and check their input accordingly. All operations are available for real and for complex matrices. The functions are based on LAPACK-routines, which are well tested and efficient.

The function solve_linear_equations(a, b) solves a system of linear equations with a square non-singular matrix a and a right-hand-side vector b. Several right-hand-side vectors can be treated simultaneously by making b a two-dimensional array (i.e. a sequence of vectors). The function inverse(a) calculates the inverse of the square non-singular matrix a by calling solve_linear_equations(a, b) with a suitable b.

The function eigenvalues(a) returns the eigenvalues of the square matrix a. The function eigenvectors(a) returns both the eigenvalues and the eigenvectors, the latter as a two-dimensional array (i.e. a sequence of vectors).

The function determinant(a) returns the determinant of the square matrix a.

The function singular_value_decomposition(a) returns three arrays V, S, and W^T whose matrix product is the original matrix a. V and W^T are unitary matrices (rank-2 arrays), whereas S is the vector (rank-1 array) of diagonal elements of the singular-value matrix. This function is mainly used to check whether (and in what way) a matrix is ill-conditioned.

The function generalized_inverse(a) returns the generalized inverse (also known as pseudo-inverse or Moore-Penrose-inverse) of the matrix a. It has numerous applications related to linear equations and least-squares problems.

The function linear_least_squares(a, b) returns the least-squares solution of an overdetermined system of linear equations. An optional third argument indicates the cutoff for the range of singular values (defaults to 10^-10). There are four return values: the least-squares solution itself, the sum of the squared residuals (i.e. the quantity minimized by the solution), the rank of the matrix a, and the singular values of a in descending order.

Example:

import Numeric; N = Numeric
import LinearAlgebra; LA = LinearAlgebra

# Create a matrix
a = N.reshape(N.arrayrange(25.), (5, 5)) + N.identity(5)

# Calculate the inverse
inv_a = LA.inverse(a)

# Verify the inverse by printing the largest absolute element
# of a * a^{-1} - identity(5)
print "Inversion error:", \
      N.maximum.reduce(N.fabs(N.ravel(N.dot(a, inv_a)-N.identity(5))))

# Print the eigenvalues of the matrix
print "Eigenvalues: ", LA.eigenvalues(a)

Note the way the abbreviations for the module names are created. Modules are just special data objects, so it is possible to assign them to variables.

Fourier Transforms

The module FFT contains functions that perform various variants of the Fast Fourier Transform in one and two dimensions, using the package FFTPACK. All functions can be applied to arrays of any rank; the dimension(s) being transformed can be specified and default to the last one or two.

The function fft(a) performs a standard complex one-dimensional FFT, and the function inverse_fft(a) performs the inverse transform. Both functions take an optional second argument specifying the length of the data sequence (by default the length of the array); if it is larger than the array, an appropriate number of zeros will be added. An optional third argument specifies the dimension along which the transform will be performed.

The function fft2d(a) performs a two-dimensional FFT. An optional second argument specifies the shape (a tuple of length two) of the subarray to be transformed; by default the whole array is transformed. An optional third argument (also a tuple of length two) specifies the two dimensions for the transformation.

Parameter fitting

The module LeastSquares provides the function leastSquaresFit(model, parameter, data) that performs general non-linear least-squares fit using the Levenberg-Marquardt algorithm.

The argument model specifies the function to be fitted. It will be called with two arguments: the first is a tuple containing the fit parameters, and the second is the first element of a data point (see below). The return value must be a number.

The argument parameter is a tuple of initial values for the fit parameters. The speed of convergence of the method depends on the quality of the initial values, so they should be chosen with care.

The argument data is a sequence of data points to which the model is to be fitted. Each data point is a sequence of length two or three. Its first element specifies the independent variables of the model. It is passed to the model function as its first parameter, but not used in any other way. The second element of each data point tuple is the number that the return value of the model function is supposed to match as well as possible. The third element (which defaults to 1.) is the statistical variance of the data point, i.e. the inverse of its statistical weight in the fitting procedure.

Example:

from LeastSquares import leastSquaresFit
import Numeric

# The mathematical model.
def exponential(parameters, x):
    a = parameters[0]
    b = parameters[1]
    return a*Numeric.exp(-b/x)

# The data to be fitted to.
data = [(100, 4.999e-8),
	(200, 5.307e+2),
	(300, 1.289e+6),
	(400, 6.559e+7)]

fit = leastSquaresFit(exponential, (1e13, 4700), data)

print "Fitted parameters:", fit[0]
print "Fit error:", fit[1]

Functions on a grid

In numerical calculations many functions are defined by a values on a grid of evaluation points. Evaluation of such functions for off-grid points requires interpolation. The module Scientific.Functions.Interpolation defines such functions and provides some common operations on them. The grids must be orthogonal, but do not have to be equidistant. A grid is defined by a set of axes, one per dimension, specifying the coordinates of the grid points.

A function is defined by InterpolatingFunction(grid, value), where grid is a sequence of axes whose length determines the dimensionality of the grid. The argument values must be an array with a rank equal to or higher than the dimensionality of the grid. If it is higher, then the function has multiple values (e.g. a vector field). An optional third argument specifies a value for the function outside the grid. If none is given, evaluation of the function outside the grid results in an error.

An interpolating function can be called like any other function. The number of arguments must be equal to the dimensionality of the grid, and the arguments must be integers or real numbers.

Interpolating functions offer some useful operations in addition to a simple evaluation:

function.derivative(variable) returns another interpolating function that is the (numerical) derivative with respect to the variable specified (default to zero).
function.integral(variable) returns another interpolating function that is the (numerical) indefinite integral with respect to the variable specified (default to zero).
function.definiteIntegral(variable) returns another interpolating function that is the (numerical) definite integral with respect to the variable specified (default to zero). This operation reduces the dimensionality by one; for a one-dimensional function, a simple number is returned.
function.selectInterval(first, last, variable) returns an interpolating function defined on a smaller grid, reduced to the interval from first to last along dimension variable. This is most useful for the calculation of definite integrals.

Example:

from Scientific.Functions.Interpolation import InterpolatingFunction
import Numeric; N = Numeric

# Define an axis with 10 points
axis = N.arrayrange(0, 1.1, 0.1)

# Calculate the values (here a simple square root)
values = N.sqrt(axis)

# Create the interpolating function
f = InterpolatingFunction((axis,), values)

# Print a table of values for comparison
for x in N.arrayrange(0, 1., 0.13):
    print "%5.2f: %10.5f %10.5f" % (x, N.sqrt(x), f(x))

Visualization (VRML)

Simple plots are not always sufficient for visualizing complex data, especially in three dimensions. The module Scientific.Visualization.VRML provides a straightforward way to construct three-dimensional scenes composed of standard objects (cubes, spheres, etc.). These scenes can be written to files in the VRML (Virtual Reality Modelling Language) format, to be viewed with a VRML viewer. VRML viewers are available for most systems. For more information on VRML and VRML viewers, see the VRML Repository at the San Diego Supercomputer Center.

If you want to display VRML scenes directly from your Python programs, instead of writing a file and running the VRML viewer manually, you must set the Unix environment variable VRMLVIEWER to the name of the VRML viewer you have installed.

Note: this course covers only the basic use of the modules VRML. For more details see the source code.

The typical sequence of operations is the following:

Create a VRML scene with Scene(objects), where objects is a list of VRML objects. You can pass an empty list to create an empty scene.
Create VRML objects (see below) and add them to the scene with scene.addObject(object).
Write the scene to a file with scene.writeToFile(filename), or view it directly using scene.view().

VRML objects are defined by geometrical data, e.g. a sphere is defined by a center (a vector) and a radius (a number). In addition, objects can have one or more properties that affect their appearance. The most important property is the material, which defines color, transparency, reflectivity, etc. In this course, only diffuse materials in various predefined colors are used, i.e. non-transparent non-reflecting materials. See a book on VRML for the full possibilities.

Attributes are always optional and specified by keyword parameters. The material, for example, is specified by material=the_material. An object with no material specification has a white diffuse material by default.

A material is created with DiffuseMaterial(color), where color is the color name. The available colors are black, white, grey, red, green, blue, yellow, magenta, and cyan. All of these can be prefixed with 'light ' or 'dark ' (note the space), e.g. 'light blue'.

The available VRML objects are:

Sphere(center, radius) creates a sphere.
Cube(center, edge) creates a cube centered around a given point. The edges are oriented along the x, y, and z axes.
Cylinder(point1, point2, radius) creates a cylinder whose axis runs from point1 to point2 with the radius specified.
Cone(point1, point2, radius) creates a cone whose base is a circle around point1 with radius radius and whose tip is at point2.
Line(point1, point2) creates a line from point1 to point2.
Arrow(point1, point2, radius) creates an arrow from point to point2 with a shaft radius of radius.

The following example program reads a PDB file and constructs a scene consisting of a black sphere for each C-alpha atom and red lines connecting neighbouring C-alpha atoms.

from Scientific.IO.TextFile import TextFile
from Scientific.IO.FortranFormat import FortranFormat, FortranLine
from Scientific.Geometry import Vector
import string
from Scientific.Visualization import VRML

generic_format = FortranFormat('A6')
atom_format = FortranFormat('A6,I5,1X,A4,A1,A3,1X,A1,I4,A1,' +
                            '3X,3F8.3,2F6.2,7X,A4,2A2')

# Read the PDB file and make a list of all C-alpha positions
c_alpha_positions = []
for line in TextFile('protein.pdb'):
    record_type = FortranLine(line, generic_format)[0]
    if record_type == 'ATOM  ' or record_type == 'HETATM':
        data = FortranLine(line, atom_format)
        atom_name = string.strip(data[2])
        position = Vector(data[8:11])
	if atom_name == 'CA':
	    c_alpha_positions.append(position)

# Create the VRML scene and the materials and radii for spheres and lines
scene = VRML.Scene([])
c_alpha_material = VRML.DiffuseMaterial('black')
c_alpha_radius = 0.2
line_material = VRML.DiffuseMaterial('red')

# Create the spheres and put them into the scene
for point in c_alpha_positions:
    scene.addObject(VRML.Sphere(point, c_alpha_radius,
				material=c_alpha_material))

# Create the lines and put them into the scene
for i in range(len(c_alpha_positions)-1):
    point1 = c_alpha_positions[i]
    point2 = c_alpha_positions[i+1]
    scene.addObject(VRML.Line(point1, point2, material=line_material))

# View the scene
scene.view()

Other scientific modules

There are more modules for scientific applications which are not covered in this course. See the manual for documentation.

Exercises

Write a function that applies a mathematical function (exp, sqrt, etc.) to a square matrix by diagonalization. The steps are: 1) find the eigenvalues and eigenvectors of the matrix 2) apply the mathematical function to the eigenvalues 3) undo the diagonalization by multiplying with the eigenvector matrix from both sides.
Test this program by calculating the exponential of a matrix alternatively from the Taylor expansion.
Write a program that reads two PDB files containing two conformations of the same protein and produces a VRML scene containing an arrow from conformation 1 to conformation 2 for each C-alpha atom.
Hint: put the code for extracting the C-alpha positions from a file into a function and call this function twice.

Table of Contents