Part 5: Make your own data types

What is a data type?

A data type is a specific collection of data with a specific set of operations. For example, an integer is a collection of bits with arithmetic operations, and a vector is a collection of three real numbers with operations like addition, cross product, or normalization.

Python allows the definition of new data types that can do anything that the built-in types can do. From a user's point of view there is no difference. In fact, many of the data types used in this course are defined in Python: vectors, text files, interpolating functions, etc. It is also possible to define new data types in C; the array data type from the module Numeric is an example.

Python provides a set of standard operations that any data type can use if appropriate. These include arithmetic, indexing, function call, etc. In addition, data types can provide arbitrary operations in the form of methods. Methods are like functions, but they depend on a specific data object. You have already seen many examples: append is a method defined for lists, close is a method defined for files, etc.
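
The difference between methods and functions can be seen with the built-in types alone. The following lines are a small illustrative sketch; the variable names are chosen only for this example.

```python
# Methods are called with the notation object.method(arguments).
numbers = [3, 1, 2]
numbers.append(5)        # append is a method of this particular list
numbers.sort()           # sort reorders the same list in place

text = "wavefunction"
shout = text.upper()     # string methods return a new string

# A plain function, by contrast, takes the object as an ordinary argument.
count = len(numbers)
```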

There are many reasons for defining new data types. They help to keep programs readable - it is much clearer to write (a+b)/2 for two vectors a and b than to write something like vector_scale(vector_add(a, b), 0.5), as many programming languages require. Type definitions also help to reduce the dependence between different parts of a program. A program using vectors doesn't have to know how vectors store their data (they could use a list, a tuple, an array, or three separate variables). So if the storage method is changed for some reason, other modules will not be affected.
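
As a rough sketch of this independence, the two hypothetical classes below (TupleVector and FieldVector are names invented for this example, not part of any library) store their coordinates differently but offer the same operations. The function midpoint uses only those operations, so it works unchanged with either class. Multiplication by 0.5 stands in for the division in (a+b)/2 to keep the example short.

```python
class TupleVector:
    """Hypothetical vector storing its coordinates in a single tuple."""

    def __init__(self, x, y, z):
        self.data = (x, y, z)

    def components(self):
        return self.data

    def __add__(self, other):
        a = self.components()
        b = other.components()
        return TupleVector(a[0] + b[0], a[1] + b[1], a[2] + b[2])

    def __mul__(self, factor):
        a = self.components()
        return TupleVector(a[0] * factor, a[1] * factor, a[2] * factor)


class FieldVector:
    """The same interface, but with three separate attributes."""

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

    def components(self):
        return (self.x, self.y, self.z)

    def __add__(self, other):
        b = other.components()
        return FieldVector(self.x + b[0], self.y + b[1], self.z + b[2])

    def __mul__(self, factor):
        return FieldVector(self.x * factor, self.y * factor, self.z * factor)


def midpoint(a, b):
    # Uses only the public operations, never the storage attributes.
    return (a + b) * 0.5


m1 = midpoint(TupleVector(0., 0., 0.), TupleVector(2., 4., 6.))
m2 = midpoint(FieldVector(0., 0., 0.), FieldVector(2., 4., 6.))
```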

Object-oriented programming

It is possible to write a complete program exclusively by defining data types, ranging from low-level general-purpose data types like vectors to high-level data types describing application-specific objects (e.g. molecules, force fields, wavefunctions, etc.). This technique is known as object-oriented programming. Experience has shown that it is generally better than the traditional procedural style (structuring code according to functions and subroutines), because it results in code that is easier to understand and easier to extend and modify due to a greater independence between the different parts of a complete program. Python supports most of the techniques that are commonly used in object-oriented programming.

The most difficult part of writing large object-oriented software systems is deciding on the data types to be used. As a rule of thumb, data types representing mathematical entities (such as arrays or functions) and data types representing physical entities (molecules, wavefunctions, etc.) are a good choice. But more abstract data types are equally important, for example data types that represent common data structures (lists, stacks, etc.) or algorithms. Object-oriented design is best learned by experience; whenever you write a non-trivial program (i.e. more than a few lines), consider doing it in an object-oriented way. There is also an extensive literature on the subject.

Classes

A definition of a new data type is called a class. A class defines all the operations of a type in the form of methods (standard operations like arithmetic are mapped to methods with special names). It also defines the initialization of a new object.

The following example shows a part of the definition of the class Vector. Only initialization, addition, and length calculation are shown explicitly, and some operations are simplified (less general). Look at the source code of the module Scientific.Geometry to see the complete definition.

import Numeric

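class Vector:

    def __init__(self, x, y, z):
        # store the three coordinates in a Numeric array
        self.array = Numeric.array([x,y,z])

    def __add__(self, other):
        # element-wise sum of the two coordinate arrays
        sum = self.array+other.array
        return Vector(sum[0], sum[1], sum[2])

    def length(self):
        # square root of the sum of the squared coordinates
        return Numeric.sqrt(Numeric.add.reduce(self.array*self.array))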

Methods are defined like functions, and also behave much like functions. However, their first argument has a special meaning: it stands for the object on which the method is called. This argument is by convention called self, but you could use any other name instead. In the method call v.length() (assuming v is a vector), the variable self gets the value of v.
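
The role of self can be made visible by calling a method through the class and passing the object explicitly. The sketch below uses a simplified stand-in for the Vector class: a plain tuple and the standard module math replace the Numeric array (an assumption made only so that the example runs without Numeric).

```python
import math

class Vector:
    # Simplified stand-in for the class above: a plain tuple replaces
    # the Numeric array so that the example runs without Numeric.

    def __init__(self, x, y, z):
        self.coordinates = (x, y, z)

    def length(self):
        x, y, z = self.coordinates
        return math.sqrt(x*x + y*y + z*z)

v = Vector(3., 4., 0.)
a = v.length()          # self is bound to v automatically
b = Vector.length(v)    # the same call with self passed explicitly
```

Both calls run the same code with self referring to v; the first form is the one normally used.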

The methods whose names begin and end with a double underscore have a special meaning; they are not normally called explicitly (although they can be). The most important special method is __init__, which is called immediately after an object has been created. The expression Vector(1., 0., 1.) creates a new vector object and then calls its method __init__, which stores the three coordinates in an array that is assigned to an attribute (self.array) of the new object.

The arithmetic operations are also implemented as special methods. The expression a+b is equivalent to a.__add__(b), and the other operations have similar equivalents; see the Python Language Reference for details.
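
The following sketch shows the equivalence for addition and adds two more special methods, __sub__ for subtraction and __neg__ for the unary minus. It again uses a simplified stand-in for Vector in which a tuple replaces the Numeric array; the attribute name coordinates is an assumption of this example.

```python
class Vector:
    # Simplified stand-in: a tuple replaces the Numeric array.

    def __init__(self, x, y, z):
        self.coordinates = (x, y, z)

    def __add__(self, other):
        a, b = self.coordinates, other.coordinates
        return Vector(a[0] + b[0], a[1] + b[1], a[2] + b[2])

    def __sub__(self, other):
        a, b = self.coordinates, other.coordinates
        return Vector(a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def __neg__(self):
        a = self.coordinates
        return Vector(-a[0], -a[1], -a[2])

p = Vector(1., 2., 3.)
q = Vector(4., 5., 6.)

s1 = p + q              # operator notation
s2 = p.__add__(q)       # the equivalent explicit method call
d = q - p               # calls q.__sub__(p)
n = -p                  # calls p.__neg__()
```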

There are more special methods that implement indexing, copying, printing, etc. Only the methods that make sense must be implemented, and only if the default behaviour is not sufficient. For vectors, for example, it makes sense to define a printed representation that shows the values of the coordinates. This is achieved by adding another special method:

    def __repr__(self):
        # repr() gives the printed representation of each coordinate
        return 'Vector(%s,%s,%s)' % (repr(self.array[0]),
                                     repr(self.array[1]),
                                     repr(self.array[2]))
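
A short sketch of where the printed representation is used, again with a simplified stand-in for Vector (a tuple instead of the Numeric array, which is an assumption of this example):

```python
class Vector:
    # Simplified stand-in: a tuple replaces the Numeric array.

    def __init__(self, x, y, z):
        self.coordinates = (x, y, z)

    def __repr__(self):
        x, y, z = self.coordinates
        return 'Vector(%s,%s,%s)' % (repr(x), repr(y), repr(z))

v = Vector(1., 0., 1.)
text = repr(v)           # 'Vector(1.0,0.0,1.0)'
as_string = str(v)       # str falls back to __repr__ when __str__ is missing
in_list = repr([v, v])   # containers use __repr__ for their elements
```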

Attributes

An object in Python can have any number of attributes, which are much like variables, except that they are attached to a specific object, whereas variables are attached to modules or functions. In fact, variables defined in modules are nothing but attributes of module objects. The notation for accessing attributes is always object.attribute. Method names are attributes too, just like function names are variables.
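
The sketch below illustrates these statements with a hypothetical class Particle (invented for this example) and the standard module math.

```python
import math

class Particle:
    # Hypothetical class used only for this illustration.
    def __init__(self, mass):
        self.mass = mass          # attribute attached to this instance

    def kinetic_energy(self, speed):
        return 0.5 * self.mass * speed * speed

p = Particle(2.0)
p.charge = -1.0                   # attributes can be added at any time

# A module variable is an attribute of the module object.
same_pi = getattr(math, 'pi') == math.pi

# A method name is an attribute as well; looking it up gives a
# callable object that remembers the instance it came from.
energy_of = p.kinetic_energy
e = energy_of(3.0)
```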

Unlike many other object-oriented languages, Python does not provide access control to attributes. Any code can use and even change any attribute in any object. For example, you can run

import Numeric
Numeric.sqrt = Numeric.exp

to make sqrt behave like exp. Obviously this is not a good idea, but Python does not try to protect you against your own stupidity. Of course there is a certain chance of accidentally changing an attribute, but in practice this is not a problem.

"Everything is an object": the Python universe

Python is a very consistent language. Its world view consists of nothing but objects, names, and name spaces. All data is kept in objects, but modules, functions, classes, and methods are also objects. Objects can be assigned to names, and names reside in name spaces. Every object has an associated name space for its attributes. In addition, functions and methods provide a temporary name space during execution (for local variables).

There are a few rules that decide in which name space a certain name resides. Definitions in modules end up in the attribute name space of the module object. Definitions within functions and methods are made in the temporary execution name space, but code in a function can also use (but not assign to) names in the surrounding module. Definitions in classes go to the attribute name space of the class. Finally, an object that is constructed from a class (a class instance) has of course its own attribute name space, in which all assignments happen, but when an attribute is requested that is not in this name space, it is searched for in the class name space; this is how methods are normally found.
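
The lookup rule for instances can be checked directly, because the attribute name space of an object is accessible as its attribute __dict__. The class Counter below is a hypothetical example written for this purpose.

```python
class Counter:
    step = 1                      # lives in the class name space

    def __init__(self):
        self.value = 0            # lives in the instance name space

    def advance(self):
        self.value = self.value + self.step

a = Counter()
b = Counter()

in_instance = 'step' in a.__dict__       # False: found via the class
in_class = 'step' in Counter.__dict__    # True

a.step = 10       # assignment always goes to the instance name space
a.advance()
b.advance()
```

After these lines, a uses its own step of 10, while b still finds the value 1 in the class name space. The method advance is found in the class name space in both cases.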

Specializing and extending classes

Often there are several data types that have something in common. One might be a specialization of another one; one could for example introduce normalized vectors as a special kind of vector. Or several data types could share many operations, but differ in certain features. For example, one could define data types representing scalar and vector fields, which would share some properties (e.g. being defined on a grid) but have specific operations like "gradient" for scalar fields and "divergence" for vector fields. Both data types would be implemented as specializations of a data type "field", which would define the common behaviour but not be used directly in programs (this is sometimes called an abstract class).

The technique for treating specialization is called inheritance. A class can inherit methods from another class, substitute those that require modification, and add some of its own. The main advantage is avoiding redundant code, which is an important source of mistakes and of course also a waste of memory.

The following code defines a class representing directions in space, i.e. vectors with length one, based on the vector class defined above:

class Direction(Vector):

    def __init__(self, x, y, z):
        Vector.__init__(self, x, y, z)
        self.array = self.array/self.length()

The only method being redefined is initialization, which now normalizes the vector. Note that the initialization method first calls the initialization method of the class Vector and then applies the normalization.

The class Direction inherits all the operations from Vector, which act as if their code were repeated in the new class. In particular, the sum of two directions will be a vector, not another direction. To obtain a normalized result, the method __add__ would have to be redefined as well. But since addition of directions is not such a useful operation, it might not be worth the effort.
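
If a normalized sum were wanted, one possible way to redefine __add__ is sketched below. This is only an illustration: the Vector class is a simplified stand-in (a tuple and the module math replace the Numeric array), and the approach of reusing the inherited addition is one choice among several.

```python
import math

class Vector:
    # Simplified stand-in: a tuple replaces the Numeric array.

    def __init__(self, x, y, z):
        self.coordinates = (x, y, z)

    def __add__(self, other):
        a, b = self.coordinates, other.coordinates
        return Vector(a[0] + b[0], a[1] + b[1], a[2] + b[2])

    def length(self):
        x, y, z = self.coordinates
        return math.sqrt(x*x + y*y + z*z)


class Direction(Vector):

    def __init__(self, x, y, z):
        Vector.__init__(self, x, y, z)
        norm = self.length()
        self.coordinates = (x/norm, y/norm, z/norm)

    def __add__(self, other):
        # Reuse the inherited addition, then normalize the result by
        # passing its coordinates through Direction.__init__.
        total = Vector.__add__(self, other)
        x, y, z = total.coordinates
        return Direction(x, y, z)


d = Direction(1., 0., 0.) + Direction(0., 1., 0.)
plain = Vector(1., 0., 0.) + Vector(0., 1., 0.)
```

Here d is a Direction of length one, whereas plain remains an ordinary Vector. Note that this sketch fails with a division by zero when two opposite directions are added, since their sum has length zero.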

Error handling

When an error occurs, Python prints a stack trace (a list of all active functions at the time the error occurred) and stops. This is often useful, but not always. You might want to deal with errors yourself, e.g. print a warning, ask for user input, or do some different calculation. Python allows any code to catch specific error conditions and deal with them in whatever way necessary.

To identify an error type, Python has several built-in error objects, for example ValueError (indicating that a value is unsuitable for an operation, e.g. negative numbers for a square root) or TypeError (indicating an unsuitable data type, e.g. when asking for the logarithm of a character string). A program can catch a specific error object, a specific collection, or any error.

The general form of error catching is

try:
    x = someFunction(a)
    anotherFunction(x)
except ValueError:
    print "Something's wrong"
else:
    print "No error"

The code after try: is executed first. If a ValueError occurs, the code after except ValueError: is executed; otherwise the code after else: is executed. The else part is optional. To catch several error types, use a tuple, e.g. except (ValueError, TypeError):. To catch all errors, use a blank except:. To deal with several error types in different ways, use several except ...: parts. For further details and possibilities, see the language reference manual.
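
The variants described above can be sketched with the standard module math, whose functions raise ValueError for unsuitable values and TypeError for unsuitable types. The function names classify and describe are hypothetical helpers invented for this example.

```python
import math

def classify(value):
    """Hypothetical helper: each error type gets its own handler."""
    try:
        math.sqrt(value)
    except ValueError:
        # Raised for values outside the domain, e.g. negative numbers.
        return "bad value"
    except TypeError:
        # Raised for unsuitable types, e.g. a character string.
        return "bad type"
    else:
        # Runs only if the try block raised no error.
        return "ok"

def describe(value):
    """Hypothetical helper: both error types share one handler."""
    try:
        math.log(value)
    except (ValueError, TypeError):
        return "unsuitable"
    return "ok"

results = [classify(4.0), classify(-1.0), classify("four")]
labels = [describe(1.0), describe(-1.0), describe("abc")]
```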

Of course Python programs can also generate errors, by using the statement raise ErrorObject, or raise ErrorObject, "Explanation" to add an explanation for the user. The ErrorObject can be any of the predefined error objects, but also a string. Many modules define their own error types as strings and let other modules import them:

# Module A

AError = "Error in module A"

def someFunction(x):
    raise AError

# Module B

import A

try:
    A.someFunction(0)
except A.AError:
    print "Something went wrong"
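
Recent Python versions no longer accept strings as error objects; an error type must be a class derived from Exception. The same pattern can then be sketched as follows, with both modules collapsed into one file for brevity. The class AError here is a hypothetical replacement for the string of the same name.

```python
# "Module A" part: the error type is a class derived from Exception.
class AError(Exception):
    """Hypothetical error type standing in for the string AError."""

def someFunction(x):
    raise AError("Error in module A")

# "Module B" part: catch the error and inspect the explanation.
try:
    someFunction(0)
except AError as error:
    message = str(error)
else:
    message = "no error"
```

Since AError is derived from Exception, a handler written for Exception would also catch it, so error types defined this way can be grouped by inheritance.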
