Part 3: Advanced calculations

Mappings

Mappings store associations between a set of keys and a set of values. They can be regarded as generalizations of sequences, because sequences are like mappings whose keys are a range of integers. Mappings allow more general keys, but define no order of their elements. There are other essential differences between sequences and mappings, so the two should be kept clearly separate.

Mappings are accessed much like sequences: mapping[key] returns the value associated with the key, and mapping[key]=value changes it. A list of all keys is available with mapping.keys(), and a list of all values with mapping.values(). A list of (key, value) tuples can be obtained with mapping.items().

Dictionaries

The most frequently used mappings (and the only kind discussed in this course) are dictionaries. They allow any non-mutable object to be a key, i.e. any object that cannot change during its lifetime. This means that lists, dictionaries, and arrays cannot be used as keys, whereas integers, tuples, and strings (among many others) are allowed. Values can be arbitrary objects.

The following example creates a dictionary containing the atomic masses of some chemical elements. It then adds another entry and finally calculates the mass of a water molecule.

atomic_mass = {'H': 1., 'C': 12, 'S': 32}
atomic_mass['O'] = 16.
print atomic_mass['O'] + 2*atomic_mass['H']

A dictionary entry can be deleted with del dictionary[key].

Arrays

Arrays are another kind of sequence object, having special properties optimized for numerical applications. Arrays and functions dealing with them are defined in the module Numeric, which also contains the math functions introduced earlier. It is a good idea to begin an interactive session for numerical calculations by typing from Numeric import *.

Arrays have two properties that distinguish them from other sequences: they can be multidimensional, and their elements are all of the same type, either integers or real numbers or complex numbers. (There are also arrays of characters, single-precision real numbers, and general Python objects, but they will not be discussed in this course.) The number of dimensions is limited to fourty.

Arrays are created from other sequences by the function Numeric.array. Multidimensional arrays are created from nested sequences. An optional second argument indicates the type of the array (Int, Float, or Complex). If the type is not indicated, it is inferred from the data.

The following example creates a one-dimensional array of integers and a three-dimensional array of real numbers:

from Numeric import *

integer_array = array([2, 3, 5, 7])
print integer_array.shape

real_array = array([ [ [0., 1.],
                       [2., 3.] ],
                     [ [4., 5.],
                       [6., 7.] ],
                     [ [8., 9.],
                       [10., 11.] ] ])
print real_array.shape

The shape of an array, which is printed in the example, is a tuple containing the lengths of the dimensions of the array, (4,) for the first array and (3, 2, 2) for the second one. The number of dimensions, i.e. the length of the shape, is called the rank of the array. The standard number objects (integers, real and complex numbers) can be regarded as arrays of rank zero.

There are a few functions in module Numeric which create special arrays. The function zeros(shape, type) returns an array of the specified shape (a tuple) with all elements set to zero. The second argument specifies the type (Integer, Float or Complex) and is optional; the default is Integer. The function ones(shape, type) works analoguously and sets the elements to one. The function arrayrange(first, last, step) works much like range, but returns an array (of rank one) and accepts real-number arguments in addition to integers.

Arrays support an extended form of the indexing used for sequences. In addition to the forms a[i] (single element) and a[i:j] (subarray from i to j), there is the form a[i:j:k] that extracts a subarray from i to j with an increment of k. For multidimensional arrays, the indices/ranges are separated by commas. If there are fewer indices than dimensions, indices are assigned starting from the left. Indexed arrays can also be used as the target of an assignment, as for lists.

Here are some examples of array indices:

import Numeric

# Create a zero array
a = Numeric.zeros((4, 2, 3), Numeric.Float)

# Set a specific element to one
a[2, 1, 0] = 1.

# Set all elements with first index 0 to 2.5
a[0] = 2.5

# Print all elements with first index 0 and last index 1
print a[0,:,1]

There are two special indices that do not directly select elements. The special index ... (three dots) "skips" dimensions such that indices following it are assigned starting from the right. For example, a[..., 0] selects all elements with 0 as their last index, and works for any number of dimensions. The special index NewAxis (defined in module Numeric) inserts a new dimension of length one. For example, if a has the shape (2, 2), then a[:, Numeric.NewAxis, :] has the shape (2, 1, 2) and the same elements as a.

Arrays can also be used as sequences, e.g. in loops. In such a situation, the first index becomes the sequence index, and the elements of the sequences are arrays with a smaller rank. This is a consequence of the rule to assign indices to dimensions starting from the left.

The standard arithmetic operations (addition, multiplication, etc.) and the mathematical functions from the module Numeric can be applied to arrays, which means that they are applied to each element of the array. For binary operations (addition etc.), the arrays must have matching shapes. However, "matching" does not mean "equal". It is possible, for example, to multiply a whole array by a single number, or to add a row to all rows of a matrix. The precise matching rule is the following: Compare the shapes of the two arrays element by element, starting from the right and continuing until the smaller one is exhausted; if for all dimensions the lengths are equal or one of them is one, then the arrays match. A less rigorous description is that dimensions of length one are repeated along the corresponding dimension of the other array.

Example:

a = array([[1, 2]])               # shape (1, 2)
b = array([[10], [20], [30]])     # shape (3, 1)

When you ask for a+b, a is repeated three times along the first axis, and b is repeated twice along the second axis, giving

a = array([[1, 2], [1, 2], [1, 2]])
b = array([[10, 10], [20, 20], [30, 30]])

Now both arrays have the same shape, and a+b is array([[11, 12], [21, 22], [31, 32]]). Of course the arrays are not physically replicated, so there is no risk of running out of memory.

Binary operations also exist as functions (in module Numeric): add(a, b) is equivalent to a+b, and subtract, multiply, divide, and power can be used instead of the other binary operators. There are some more binary operations that exist only as functions:

maximum(a, b), minimum(a, b) larger/smaller of a and b

equal(a, b), not_equal(a, b) equality test (returns 0/1)

greater(a, b), greater_equal(a, b) comparison

less(a, b), less_equal(a, b) comparison

logical_and(a, b), logical_or(a,b) logical and/or

logical_not(a) logical negation

The binary operations in this list can also be applied to combinations of elements of a single array. For example, add.reduce(a) calculates the sum of all elements of a along the first dimension, and minimum.reduce(a) returns the smallest element in a. An optional second argument indicates the dimension explicitly. It can be positive (0 = first dimension) or negative (-1 = last dimension). A variation is accumulate: add.accumulate(a) returns an array containing the first element of a, the sum of the first two elements, the sum of the first three elements, etc. The last element is then equal to add.reduce(a).

Another way to derive an operation from binary functions is outer. add.outer(a, b) returns an array with the combined dimensions of a and b whose elements are all possible sum combinations of the elements of a and b.

More array operations will be described later.

Functions

You have already used many Python functions, and may have wished to write your own. Functions allow you to write (and test) a certain piece of code once and then use it again and again. This course will only treat functions defined in Python; it is also possible to define functions in C or compatible low-level languages.

The following code example defines a function called distance that calculates the distance between two points in space, and calls this function for two arbitrary vectors:

from Scientific.Geometry import Vector

def distance(r1, r2):
    return (r1-r2).length()

print distance(Vector(1., 0., 0.), Vector(3., -2., 1.))

You can imagine a function call as consisting of three steps:

The arguments in the function call are assigned to the corresponding variables in the function definition.
The code in the function is executed.
The value given to the return statement is put in the place of the original function call.

Functions can take any number of arguments (including zero), and return any number of values (including zero). They can also define any number of variables. These variables are local to the function, i.e. they have no relation to variables of the same name in another function or outside any function. However, function may use (but not change) the value of variables defined outside them. The function argument variables are also local to the function. Inside the function, they can be used like any other variable.

The following function converts cartesian to polar coordinates. It takes two arguments and returns two values. It also defines local variables.

import Numeric

def cartesianToPolar(x, y):
    r = Numeric.sqrt(x**2+y**2)
    phi = Numeric.arccos(x/r)
    if y < 0.:
        phi = 2.*Numeric.pi-phi
    return r, phi

radius, angle = cartesianToPolar(1., 1.)
print radius
print angle

Functions are just a special kind of data objects. Like all other data objects, they can be assigned to variables, stored in lists and dictionaries, or passed as arguments to other function. The following example shows a function that prints a table of values of another function that it receives as an argument:

import Numeric

def printTable(function, first_value, last_value, step):
    for x in Numeric.arrayrange(first_value, last_value, step):
	print x, function(x)


def someFunction(x):
    return Numeric.sin(x**2)

printTable(someFunction, 0., 5., 0.1)

Modules

Until now all programs have been written as single files containing everything necessary, and importing definitions from standard modules. A function to be used in several programs would have to be copied to several files, which is not very practical. Such generally useful definitions should be collected in modules.

Python makes no difference between its standard library and any other modules; the standard library just happens to come with the interpreter. The mechanisms for importing from modules are the same, irrespective of the origin.

A module is simply a file containing definitions (variables, functions, etc.). The name of the file must end in .py; everything before that makes up the module name. The command import Numeric, for example, looks for a file called Numeric.py and executes it, making the resulting definitions available to the importing program.

However, Python does not search the whole file system for modules; that would be inefficient and dangerous. It uses a strategy similar to that used by Unix to locate programs: an environment variable called PYTHONPATH can be set before running Python, and its value must be a sequence of directory names separated by colons. Python looks for modules in those directories in the specified order; if a module is not found there, Python tries its standard library.

Some of the module names that have been used before contain one or several dots. These are modules contained in packages. A package is a module that contains other modules. Packages are used to structure libraries and to prevent name clashes between libraries written by different people. The only package used so far is Scientific, which contains the modules Geometry and IO (plus others). And these modules are themselves packages; IO for example contains the modules TextFile and FortranFormat that have been used before. A full description of the package Scientific can be found in its manual.

Programs with parameters

You may also want to write Python programs for regular use that take parameters from the shell command line, i.e. you might want to run a program as

python my_program.py a b c

All you need to achieve this is a way to access the command line parameters. The module sys contains a variable argv whose value is a list of all parameters, starting with the name of the Python program. So in the example above, sys.argv would be ['my_program.py', 'a', 'b', 'c'].

On Unix systems, you can even make Python programs directly executable. This requires two steps: first, add

#!/usr/local/bin/python

as the first line of your Python program. You must change the path if Python is installed somewhere else on your system. Second, make the file executable with the Unix utility chmod.

The magical line shown above is ignored by Python, since it is a comment (a line starting with #). But it is recognized by the Unix shells as an indicator of the program that must be run to execute the file.

Statistics

The module Scientific.Statistics contains some elementary statistical analysis functions.

The functions average(data), variance(data) and standardDeviation(data) all take a sequence of data values and return the statistical quantity indicated by their name.

The submodule Histogram defines a histogram object. Histogram(data, number_of_bins) takes a sequence of data and calculates a histogram by dividing the range of the data into the specified number of bins and counting how many data values fall within each bin interval. An optional third argument is a tuple specifying the range to be analyzed (lower and upper bound). A histogram is an array of rank two representing a sequence of bins, with each bin being described by its center and its count.

Plotting

There is no full-featured plotting module for Python yet. For the moment, plotting is delegated to external programs. There are two modules that contain a function for simple line plots: Scientific.Mathematica and Gnuplot. They use the external program indicated by their name. Both modules define a function called plot, and both versions behave in a compatible way.

The function plot takes an arbitrary number of arguments, each specifying a dataset. All datasets will be plotted in the same graph. A dataset is a sequence of points. A point is specified by a two-element sequence (x and y coordinates) or by a simple number, which is interpreted as the y coordinate, the x coordinate being the index of the point in the sequence.

The function plot also takes an optional keyword argument of the form file='something.ps'. When this argument is present, the plot will be written to the specified file in PostScript format.

Example:

from Gnuplot import plot
from Scientific.Statistics.Histogram import Histogram
from Numeric import arrayrange, sqrt

plot(Histogram(sqrt(arrayrange(100.)), 10))

Exercises

Write a function that takes a sequence of amino acids and returns its mass. Make the function accept either a character string containing one-letter codes (e.g. 'AKG') or a sequence of three-letter strings (e.g. ['ala', 'lys', 'gly']). Also make sure that it accepts lower- and uppercase letters. Put this function into a module and write a simple program that imports and uses it.
Write a module containing two functions. One function should read a file containing a text representation of a matrix, i.e. a certain number of lines each containing the same number of numbers separated by spaces. The function should return an array containing this data. Note that the function should work for matrices of arbitrary sized. The second function should do the inverse: write an array of rank two to a text file.
Write a program that processes experimental data. Suppose the data consists of 50 files named "data1" to "data50". Each file contains two parameters in the first line; the remaining lines each contain a time value and a measured quantity. Assume that all files have the same length and that the time values agree. Produce another time series defined by the first parameter multiplied by the square root of the measured value, summed over all files. Plot this time series.
Hint: Use the module written in the last exercise.
To test the program, write another program that generates fictitious test data for which you can predict the result easily.

Table of Contents

`maximum(a, b)`, `minimum(a, b)`	larger/smaller of a and b
`equal(a, b)`, `not_equal(a, b)`	equality test (returns 0/1)
`greater(a, b)`, `greater_equal(a, b)`	comparison
`less(a, b)`, `less_equal(a, b)`	comparison
`logical_and(a, b)`, `logical_or(a,b)`	logical and/or
`logical_not(a)`	logical negation