R-perl-python-bioinformatics

How to debug in Python?

March 26, 2024 Off By admin
Shares

 

Testing a program to find errors usually takes much more time than to write the code. This appendix is devoted to tools and good habits for effective debugging. Section 1 describes the Python debugger, a key tool for examining the internal workings of a code, while Section 2 explains how solve problems and write software to simplify the debugging process.

 Using a debugger

A debugger is a program that can help you to find out what is going on in a computer program. You can stop the execution at any prescribed line number, print out variables, continue execution, stop again, execute statements one by one, and repeat such actions until you have tracked down abnormal behavior and found bugs.

Here we shall use the debugger to demonstrate the program flow of the code Simpson.py1 (which can integrate functions of one variable with the famous Simpson’s rule). This development of this code is explained in Section 4.2. You are strongly encouraged to carry out the steps below on your computer to get a glimpse of what a debugger can do.

Step 1. Go to the folder src/funcif2 where the program Simpson.py resides.

Step 2. If you use the Spyder Integrated Development Environment, choose Debug on the Run pull-down menu. If you run your programs in a plain terminal window, start IPython:

 

Terminal

Terminal> ipython

Run the program Simpson.py with the debugger on (-d):

In [1]: run -d Simpson.py

We now enter the debugger and get a prompt

ipdb>

After this prompt we can issue various debugger commands. The most important ones will be described as we go along.

Step 3. Type continue or just c to go to the first line in the file. Now you can see a printout of where we are in the program:

1—> 1 def Simpson(f, a, b, n=500): 2 “””

3 Return the approximation of the integral of f

Each program line is numbered and the arrow points to the next line to be executed. This is called the current line.

Step 4. You can set a break point where you want the program to stop so that you can examine variables and perhaps follow the execution closely. We start by setting a break point in the application function:

ipdb> break application

Breakpoint 2 at /home/…/src/funcif/Simpson.py:30

You can also say break X, where X is a line number in the file.

Step 5. Continue execution until the break point by writing continue or c. Now the program stops at line 31 in the application function:

ipdb> c

> /home/…/src/funcif/Simpson.py(31)application()

2 30 def application():

—> 31 from math import sin, pi

32 print ’Integral of 1.5*sin^3 from 0 to pi:’

2http://tinyurl.com/pwyasaa/funcif

Step 6. Typing step or just s executes one statement at a time:

ipdb> s

31

—> 32

33

> /home/…/src/funcif/Simpson.py(32)application()

from math import sin, pi

print ’Integral of 1.5*sin^3 from 0 to pi:’ for n in 2, 6, 12, 100, 500:

ipdb> s

Integral of 1.5*sin^3 from 0 to pi:

> /home/…/src/funcif/Simpson.py(33)application()

32

—> 33

34

for n in 2, 6, 12, 100, 500:

approx = Simpson(h, 0, pi, n)

print ’Integral of 1.5*sin^3 from 0 to pi:’

Typing another s reaches the call to Simpson, and a new s steps into the function

Simpson:

ipdb> s

–Call–

> /home/…/src/funcif/Simpson.py(1)Simpson() 1—> 1 def Simpson(f, a, b, n=500):

2 “””

3 Return the approximation of the integral of f

Type a few more s to step ahead of the if tests.

Step 7. Examining the contents of variables is easy with the print (or p) command:

ipdb> print f, a, b, n

<function h at 0x898ef44> 0 3.14159265359 2

We can also check the type of the objects:

ipdb> whatis f Function h ipdb> whatis a

<type ’int’> ipdb> whatis b

<type ’float’> ipdb> whatis n

<type ’int’>

Step 8. Set a new break point in the application function so that we can jump directly there without having to go manually through all the statements in the Simpson function. To see line numbers and corresponding statements around some line with number X, type list X. For example,

ipdb> list 32

27 def h(x):

28 return (3./2)*sin(x)**3

2

29

30 from math import sin, pi 31

  1. def application():
  2. print ’Integral of 1.5*sin^3 from 0 to pi:’ 34 for n in 2, 6, 12, 100, 500:

35 approx = Simpson(h, 0, pi, n)

36 print ’n=%3d, approx=%18.15f, error=%9.2E’ % \

37 (n, approx, 2-approx)

We set a line break at line 35:

ipdb> break 35

Breakpoint 3 at /home/…/src/funcif/Simpson.py:35

Typing c continues execution up to the next break point, line 35.

Step 9. The command next or n is like step or s in that the current line is executed, but the execution does not step into functions, instead the function calls are just performed and the program stops at the next line:

ipdb> n

> /home/…/src/funcif/Simpson.py(36)application()

3 35 approx = Simpson(h, 0, pi, n)

—> 36 print ’n=%3d, approx=%18.15f, error=%9.2E’ % \

ipdb> print approx, n 1.9891717005835792 6

37

(n, approx, 2-approx)

Step 10. The command disable X Y Z disables break points with numbers X, Y, and Z, and so on. To remove our three break points and continue execution until the program naturally stops, we write

ipdb> disable 1 2 3 ipdb> c

n=100, approx= 1.999999902476350, error= 9.75E-08 n=500, approx= 1.999999999844138, error= 1.56E-10

In [2]:

At this point, I hope you realize that a debugger is a very handy tool for monitoring the program flow, checking variables, and thereby understanding why errors occur.

 How to debug

Most programmers will claim that writing code consumes a small portion of the time it takes to develop a program: the major portion of the work concerns testing the program and finding errors.

Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. Brian W. Kernighan, computer scientist, 1942-.

Newcomers to programming often panic when their program runs for the first time and aborts with a seemingly cryptic error message. How do you approach the art of debugging? This appendix summarizes some important working habits in this respect. Some of the tips are useful for problem solving in general, not only when writing and testing Python programs.

A recipe for program writing and debugging

  1. Understand the problem. Make sure that you really understand the task the program is supposed to solve. We can make a general claim: if you do not understand the problem and the solution method, you will never be able to make a correct program. It may be argued that this claim is not entirely true: sometimes students with limited understanding of the problem are able to grab a similar program and guess at a few modifications, and actually get a program that works. But this technique is based on luck and not on understanding. The famous Norwegian computer scientist Kristen Nygaard (1926-2002) phrased it precisely: Programming is understanding. It may be necessary to read a problem description or exercise many times and study relevant background material before starting on the programming part of the problem solving process.
  2. Work out examples. Start with sketching one or more examples on input and output of the program. Such examples are important for control- ling the understanding of the purpose of the program, and for verifying the implementation.
  3. Decide on a user interface. Find out how you want to get data into the program. You may want to grab data from the command-line, a file, or a dialog with questions and answers.
  4. Make algorithms. Identify the key tasks to be done in the program and sketch rough algorithms for these. Some programmers prefer to do this on a piece of paper, others prefer to start directly in Python and write Python-like code with comments to sketch the program (this is easily developed into real Python code later).
  5. Look up information. Few programmers can write the whole program without consulting manuals, books, and the Internet. You need to know and understand the basic constructs in a language and some fundamental problem solving techniques, but technical details can be looked up.

The more program examples you have studied (in this book, for instance), the easier it is to adapt ideas from an existing example to solve a new problem.

  1. Write the program. Be extremely careful with what you write. In particular, compare all mathematical statements and algorithms with the original mathematical expressions.

In longer programs, do not wait until the program is complete before you start testing it, test parts while you write.

  1. Run the program. If the program aborts with an error message from Python, these messages are fortunately quite precise and helpful. First, locate the line number where the error occurs and read the statement, then carefully read the error message. The most common errors (exceptions) are listed below.

SyntaxError: Illegal Python code.

File “somefile.py”, line 5 x = . 5

^

SyntaxError: invalid syntax

Often the error is precisely indicated, as above, but sometimes you have to search for the error on the previous line.

NameError: A name (variable, function, module) is not defined.

File “somefile.py”, line 20, in <module> table(10)

File “somefile.py”, line 16, in table

value, next, error = L(x, n)

File “somefile.py”, line 8, in L exact_error = log(1+x) – value_of_sum

NameError: global name ’value_of_sum’ is not defined

Look at the last of the lines starting with File to see where in the program the error occurs. The most common reasons for a NameError are

  • a misspelled name,
  • a variable that is not initialized,
  • a function that you have forgotten to define,
  • a module that is not imported.

TypeError: An object of wrong type is used in an operation.

File “somefile.py”, line 17, in table value, next, error = L(x, n)

File “somefile.py”, line 7, in L

first_neglected_term = (1.0/(n+1))*(x/(1.0+x))**(n+1) TypeError: unsupported operand type(s) for +: ’float’ and ’str’

Print out objects and their types (here: print x, type(x), n, type(n)), and you will most likely get a surprise. The reason for a TypeError is often far away from the line where the TypeError occurs.

ValueError: An object has an illegal value.

File “somefile.py”, line 8, in L y = sqrt(x)

ValueError: math domain error

Print out the value of objects that can be involved in the error (here: print x).

IndexError: An index in a list, tuple, string, or array is too large.

File “somefile.py”, line 21 n = sys.argv[i+1]

IndexError: list index out of range

Print out the length of the list, and the index if it involves a variable (here:

print len(sys.argv), i).

  1. Verify the results. Assume now that we have a program that runs without error messages from Python. Before judging the results of the program, set precisely up a test case where you know the exact solution. This is in general quite difficult. In complicated mathematical problems it is an art to construct good test problems and procedures for providing evidence that the program works.

If your program produces wrong answers, start to examine intermediate results. Never forget that your own hand calculations that you use to test the program may be wrong!

  1. Use a debugger. If you end up inserting a lot of print statements in the program for checking intermediate results, you might benefit from using a debugger as explained in Section 1.

Some may think that this list of nine points is very comprehensive. How- ever, the recipe just contains the steps that you should always carry out when developing programs. Never forget that computer programming is a difficult task.

Program writing is substantially more demanding than book writing. Why is it so? I think the main reason is that a larger attention span is needed when working on a large computer program than when doing other intellectual tasks. Donald Knuth [2, p. 18], computer scientist, 1938-.

Application of the recipe

Let us illustrate the points above in a specific programming problem: implemen- tation of the Midpoint rule for numerical integration. The Midpoint rule for

approximating an integral b f (x)dx reads

n

a

Σ 1 b a

I = h f (a + (i − )h), h = . (1)

i=1 2 n

We just follow the individual steps in the recipe to develop the code.

      1. Understand the problem. In this problem we must understand how to program the formula (1). Observe that we do not need to understand how the formula is derived, because we do not apply the derivation in the program. What is important, is to notice that the formula is an approximation of an integral. Comparing the result of the program with the exact value of the integral will in general show a discrepancy. Whether we have an approximation error or a programming error is always difficult to judge. We will meet this difficulty below.
      2. Work out examples. As a test case we choose to integrate

f (x) = sin1(x) . (2)

between 0 and π. From a table of integrals we find that this integral equals

x sin1(x) + 1 x2 π . (3)

h √ − i

0

The formula (1) gives an approximation to this integral, so the program will (most likely) print out a result different from (3). It would therefore be very helpful to construct a calculation where there are no approximation errors. Numerical integration rules usually integrate some polynomial of low order exactly. For the Midpoint rule it is obvious, if you understand the derivation of this rule, that a constant function will be integrated exactly. We therefore also introduce a test problem where we integrate g(x) = 1 from 0 to 10. The answer should be exactly 10.

Input and output: The input to the calculations is the function to integrate, the integration limits a and b, and the n parameter (number of intervals) in

the formula (1). The output from the calculations is the approximation to the integral.

      1. Decide on a user interface. We find it easiest at this beginning stage to program the two functions f (x) and g(x) directly in the program. We also specify the corresponding integration limits a and b in the program, but we read a common n for both integrals from the command line. Note that this is not a flexible user interface, but it suffices as a start for creating a working program. A much better user interface is to read f , a, b, and n from the command line, which will be done later in a more complete solution to the present problem.
      2. Make algorithms. Like most mathematical programming problems, also this one has a generic part and an application part. The generic part is the formula (1), which is applicable to an arbitrary function f (x). The implementation should reflect that we can specify any Python function f(x) and get it integrated. This principle calls for calculating (1) in a Python function where the input to the computation (f , a, b, n) are arguments. The function heading can look as integrate(f, a, b, n), and the value of (1) is returned.

The test part of the program consists of defining the test functions f (x) and g(x) and writing out the calculated approximations to the corresponding integrals.

A first rough sketch of the program can then be

def integrate(f, a, b, n):

# compute integral, store in I return I

def f(x):

def g(x):

# test/application part:

n = sys.argv[1]

I = integrate(g, 0, 10, n)

print “Integral of g equals %g” % I I = integrate(f, 0, pi, n)

# calculate and print out the exact integral of f

The next step is to make a detailed implementation of the integrate function. Inside this function we need to compute the sum (1). In general, sums are computed by a for loop over the summation index, and inside the loop we calculate a term in the sum and add it to an accumulation variable. Here is the algorithm in Python code:

s = 0

for i in range(1, n+1):

s = s + f(a + (i-0.5)*h) I = s*h

      1. Look up information. Our test function f (x) = sin1(x) must be eval- uated in the program. How can we do this? We know that many common mathematical functions are offered by the math module. It is therefore natural to check if this module has an inverse sine function. The best place to look for Python modules is the Python Standard Library3 [1] documentation, which has a search facility. Typing math brings up a link to the math module, there we find math.asin as the function we need. Alternatively, one can use the command line utility pydoc and write pydoc math to look up all the functions in the module.

In this simple problem, we use very basic programming constructs and there is hardly any need for looking at similar examples to get started with the problem solving process. We need to know how to program a sum, though, via a for loop and an accumulation variable for the sum. Examples are found in Sections ?? and 1.8.

3http://docs.python.org/2/library/

      1. Write the program. Here is our first attempt to write the program. You can find the whole code in the file integrate_v1.py4.

def integrate(f, a, b, n): s = 0

for i in range(1, n): s += f(a + i*h)

return s

def f(x):

return asin(x)

def g(x):

return 1

# Test/application part n = sys.argv[1]

I = integrate(g, 0, 10, n)

print “Integral of g equals %g” % I I = integrate(f, 0, pi, n)

I_exact = pi*asin(pi) – sqrt(1 – pi**2) – 1

print “Integral of f equals %g (exact value is %g)’ % \ (I, I_exact)

      1. Run the program. We try a first execution from IPython

In [1]: run integrate_v1.py

Unfortunately, the program aborts with an error:

File “integrate_v1.py”, line 8 return asin(x)

^

IndentationError: expected an indented block

We go to line 8 and look at that line and the surrounding code:

def f(x): return asin(x)

Python expects that the return line is indented, because the function body must always be indented. By the way, we realize that there is a similar error in the g(x) function as well. We correct these errors:

def f(x):

return asin(x)

def g(x):

return 1

Running the program again makes Python respond with

4http://tinyurl.com/pwyasaa/debug/integrate_v1.py

File “integrate_v1.py”, line 24 (I, I_exact)

^

SyntaxError: EOL while scanning single-quoted string

There is nothing wrong with line 24, but line 24 is a part of the statement starting on line 23:

print “Integral of f equals %g (exact value is %g)’ % \ (I, I_exact)

A SyntaxError implies that we have written illegal Python code. Inspecting line 23 reveals that the string to be printed starts with a double quote, but ends with a single quote. We must be consistent and use the same enclosing quotes in a string. Correcting the statement,

print “Integral of f equals %g (exact value is %g)” % \ (I, I_exact)

and rerunning the program yields the output

Traceback (most recent call last):

File “integrate_v1.py”, line 18, in <module> n = sys.argv[1]

NameError: name ’sys’ is not defined

Obviously, we need to import sys before using it. We add import sys and run again:

Traceback (most recent call last):

File “integrate_v1.py”, line 19, in <module> n = sys.argv[1]

IndexError: list index out of range

This is a very common error: we index the list sys.argv out of range because we have not provided enough command-line arguments. Let us use n = 10 in the test and provide that number on the command line:

In [5]: run integrate_v1.py 10

We still have problems:

Traceback (most recent call last):

File “integrate_v1.py”, line 20, in <module> I = integrate(g, 0, 10, n)

File “integrate_v1.py”, line 7, in integrate

for i in range(1, n):

TypeError: range() integer end argument expected, got str.

It is the final File line that counts (the previous ones describe the nested functions calls up to the point where the error occurred). The error message for line 7 is very precise: the end argument to range, n, should be an integer, but it is a string. We need to convert the string sys.argv[1] to int before sending it to the integrate function:

n = int(sys.argv[1])

After a new edit-and-run cycle we have other error messages waiting:

Traceback (most recent call last):

File “integrate_v1.py”, line 20, in <module> I = integrate(g, 0, 10, n)

File “integrate_v1.py”, line 8, in integrate

s += f(a + i*h)

NameError: global name ’h’ is not defined

The h variable is used without being assigned a value. From the formula (1) we see that h = (b a)/n, so we insert this assignment at the top of the integrate function:

def integrate(f, a, b, n): h = (b-a)/n

A new run results in a new error:

Integral of g equals 9

Traceback (most recent call last):

File “integrate_v1.py”, line 23, in <module> I = integrate(f, 0, pi, n)

NameError: name ’pi’ is not defined

Looking carefully at all output, we see that the program managed to call the integrate function with g as input and write out the integral. However, in the call to integrate with f as argument, we get a NameError, saying that pi is undefined. When we wrote the program we took it for granted that pi was π, but we need to import pi from math to get this variable defined, before we call integrate:

from math import pi

I = integrate(f, 0, pi, n)

The output of a new run is now

Integral of g equals 9

Traceback (most recent call last):

File “integrate_v1.py”, line 24, in <module> I = integrate(f, 0, pi, n)

File “integrate_v1.py”, line 9, in integrate

s += f(a + i*h)

File “integrate_v1.py”, line 13, in f return asin(x)

NameError: global name ’asin’ is not defined

A similar error occurred: asin is not defined as a function, and we need to import it from math. We can either do a

from math import pi, asin

or just do the rough

from math import *

to avoid any further errors with undefined names from the math module (we will get one for the sqrt function later, so we simply use the last “import all” kind of statement).

There are still more errors:

Integral of g equals 9

Traceback (most recent call last):

File “integrate_v1.py”, line 24, in <module> I = integrate(f, 0, pi, n)

File “integrate_v1.py”, line 9, in integrate

s += f(a + i*h)

File “integrate_v1.py”, line 13, in f return asin(x)

ValueError: math domain error

Now the error concerns a wrong x value in the f function. Let us print out x:

def f(x):

print x

return asin(x)

The output becomes

Integral of g equals 9 0.314159265359

0.628318530718

0.942477796077

1.25663706144

Traceback (most recent call last):

File “integrate_v1.py”, line 25, in <module> I = integrate(f, 0, pi, n)

File “integrate_v1.py”, line 9, in integrate

s += f(a + i*h)

File “integrate_v1.py”, line 14, in f return asin(x)

ValueError: math domain error

We see that all the asin(x) computations are successful up to and including x = 0.942477796077, but for x = 1.25663706144 we get an error. A math domain error may point to a wrong x value for sin1(x) (recall that the domain of a function specifies the legal x values for that function).

To proceed, we need to think about the mathematics of our problem: Since sin(x) is always between 1 and 1, the inverse sine function cannot take x values outside the interval [ 1, 1]. The problem is that we try to integrate sin1(x) from 0 to π, but only integration limits within [ 1, 1] make sense (unless we allow for complex-valued trigonometric functions). Our test problem is hence wrong from a mathematical point of view. We need to adjust the limits, say 0 to 1 instead of 0 to π. The corresponding program modification reads

I = integrate(f, 0, 1, n)

We run again and get

Integral of g equals 9 0

0

0

0

0

0

0

0

0

Traceback (most recent call last):

File “integrate_v1.py”, line 26, in <module> I_exact = pi*asin(pi) – sqrt(1 – pi**2) – 1

ValueError: math domain error

It is easy to go directly to the ValueError now, but one should always examine the output from top to bottom. If there is strange output before Python reports an error, there may be an error indicated by our print statements. This is not the case in the present example, but it is a good habit to start at the top of the output anyway. We see that all our print x statements inside the f function say that x is zero. This must be wrong – the idea of the integration rule is to pick n different points in the integration interval [0, 1].

Our f(x) function is called from the integrate function. The argument to f, a + i*h, is seemingly always 0. Why? We print out the argument and the values of the variables that make up the argument:

def integrate(f, a, b, n): h = (b-a)/n

s = 0

for i in range(1, n): print a, i, h, a+i*h s += f(a + i*h)

return s

Running the program shows that h is zero and therefore a+i*h is zero.

Why is h zero? We need a new print statement in the computation of h:

def integrate(f, a, b, n): h = (b-a)/n

print b, a, n, h

The output shows that a, b, and n are correct. Now we have encountered a very common error in Python version 2 and C-like programming languages: integer division (see Section ??). The formula (1 0)/10 = 1/10 is zero according to integer division. The reason is that a and b are specified as 0 and 1 in the call to integrate, and 0 and 1 imply int objects. Then b-a becomes an int, and n is an int, causing an int/int division. We must ensure that b-a is float to get the right mathematical division in the computation of h:

def integrate(f, a, b, n): h = float(b-a)/n

Thinking that the problem with wrong x values in the inverse sine function is resolved, we may remove all the print statements in the program, and run again.

The output now reads

Integral of g equals 9

Traceback (most recent call last):

File “integrate_v1.py”, line 25, in <module> I_exact = pi*asin(pi) – sqrt(1 – pi**2) – 1

ValueError: math domain error

That is, we are back to the ValueError we have seen before. The reason is that asin(pi) does not make sense, and the argument to sqrt is negative. The error is simply that we forgot to adjust the upper integration limit in the computation of the exact result. This is another very common error. The correct line is

I_exact = 1*asin(1) – sqrt(1 – 1**2) – 1

We could have avoided the error by introducing variables for the integration limits, and a function for f (x)dx would make the code cleaner:

a = 0; b = 1

def int_f_exact(x):

return x*asin(x) – sqrt(1 – x**2) I_exact = int_f_exact(b) – int_f_exact(a)

Although this is more work than what we initially aimed at, it usually saves time in the debugging phase to do things this proper way.

Eventually, the program seems to work! The output is just the result of our two print statements:

Integral of g equals 9

Integral of f equals 5.0073 (exact value is 0.570796)

      1. Verify the results. Now it is time to check if the numerical results are correct. We start with the simple integral of 1 from 0 to 10: the answer should be 10, not 9. Recall that for this particular choice of integration function, there is no approximation error involved (but there could be a small round-off error). Hence, there must be a programming error.

To proceed, we need to calculate some intermediate mathematical results by hand and compare these with the corresponding statements in the program. We choose a very simple test problem with n = 2 and h = (10 0)/2 = 5. The formula (1) becomes

I = 5 · (1 + 1) = 10 .

Running the program with n = 2 gives

Integral of g equals 1

We insert some print statements inside the integrate function:

def integrate(f, a, b, n): h = float(b-a)/n

s = 0

for i in range(1, n):

print ’i=%d, a+i*h=%g’ % (i, a+i*h) s += f(a + i*h)

return s

Here is the output:

i=1, a+i*h=5

Integral of g equals 1 i=1, a+i*h=0.5

Integral of f equals 0.523599 (exact value is 0.570796)

There was only one pass in the i loop in integrate. According to the formula, there should be n passes, i.e., two in this test case. The limits of i must be wrong. The limits are produced by the call range(1,n). We recall that such a call results in integers going from 1 up to n, but not including n. We need to include n as value of i, so the right call to range is range(1,n+1).

We make this correction and rerun the program. The output is now

i=1, a+i*h=5 i=2, a+i*h=10

Integral of g equals 2 i=1, a+i*h=0.5

i=2, a+i*h=1

Integral of f equals 2.0944 (exact value is 0.570796)

The integral of 1 is still not correct. We need more intermediate results!

In our quick hand calculation we knew that g(x) = 1 so all the f (a +(i 1 )h) evaluations were rapidly replaced by ones. Let us now compute all the x coordinates a + (i − 1 )h that are used in the formula:

2

2

1 1

i = 1 : a + (i − 2 )h = 2.5, i = 2 : a + (i − 2 )h = 7.5 .

Looking at the output from the program, we see that the argument to g has a different value – and fortunately we realize that the formula we have coded is wrong. It should be a+(i-0.5)*h.

We correct this error and run the program:

i=1, a+(i-0.5)*h=2.5

i=2, a+(i-0.5)*h=7.5

Integral of g equals 2

Still the integral is wrong. At this point you may give up programming, but the more skills you pick up in debugging, the more fun it is to hunt for errors! Debugging is like reading an exciting criminal novel: the detective follows different ideas and tracks, but never gives up before the culprit is caught.

Now we read the code more carefully and compare expressions with those in the mathematical formula. We should, of course, have done this already when writing the program, but it is easy to get excited when writing code and hurry for the end. This ongoing story of debugging probably shows that reading the code carefully can save much debugging time. (Actually, being extremely careful with what you write, and comparing all formulas with the mathematics, may be the best way to get more spare time when taking a programming course!)

We clearly add up all the f evaluations correctly, but then this sum must be multiplied by h, and we forgot that in the code. The return statement in integrate must therefore be modified to

return s*h

Eventually, the output is

Integral of g equals 10

Integral of f equals 0.568484 (exact value is 0.570796)

and we have managed to integrate a constant function in our program! Even the second integral looks promising!

To judge the result of integrating the inverse sine function, we need to run several increasing n values and see that the approximation gets better. For n = 2, 10, 100, 1000 we get 0.550371, 0.568484, 0.570714, 0.570794, to be

compared to the exact value 0.570796. (This is not the mathematically exact value, because it involves computations of sin1(x), which is only approximately calculated by the asin function in the math module. However, the approximation error is very small ( 1016).) The decreasing error provides evidence for a correct program, but it is not a strong proof. We should try out more functions. In particular, linear functions are integrated exactly by the Midpoint rule. We can also measure the speed of the decrease of the error and check that the speed is consistent with the properties of the Midpoint rule, but this is a mathematically more advanced topic.

The very important lesson learned from these debugging sessions is that you should start with a simple test problem where all formulas can be computed by hand. If you start out with n = 100 and try to integrate the inverse sine function, you will have a much harder job with tracking down all the errors.

      1. Use a debugger. Another lesson learned from these sessions is that we needed many print statements to see intermediate results. It is an open question if it would be more efficient to run a debugger and stop the code at relevant lines. In an edit-and-run cycle of the type we met here, we frequently need to examine many numerical results, correct something, and look at all the intermediate results again. Plain print statements are often better suited for this massive output than the pure manual operation of a debugger, unless one writes a program to automate the interaction with the debugger.

The correct code for the implementation of the Midpoint rule is found in

integrate_v2.py5. Some readers might be frightened by all the energy it took

5http://tinyurl.com/pwyasaa/debug/integrate_v2.py

to debug this code, but this is just the nature of programming. The experience of developing programs that finally work is very awarding.

People only become computer programmers if they’re obsessive about details, crave power over machines, and can bear to be told day after day exactly how stupid they are. Gregory J. E. Rawlins [3], computer scientist.

Refining the user interface. We briefly mentioned that the chosen user interface, where the user can only specify n, is not particularly user friendly. We should allow f , a, b, and n to be specified on the command line. Since f is a function and the command line can only provide strings to the program, we may use the StringFunction object from scitools.std to convert a string expression for the function to be integrated to an ordinary Python function (see Section 3.3). The other parameters should be easy to retrieve from the command line if Section 2 is understood. As suggested in Section 7, we enclose the input statements in a try-except block, here with a specific exception type IndexError (because an index in sys.argv out of bounds is the only type of error we expect to handle):

try:

f_formula = sys.argv[1] a = eval(sys.argv[2])

b = eval(sys.argv[3]) n = int(sys.argv[4])

except IndexError:

print ’Usage: %s f-formula a b n’ % sys.argv[0] sys.exit(1)

Note that the use of eval allows us to specify a and b as pi or exp(5) or another mathematical expression.

With the input above we can perform the general task of the program:

from scitools.std import StringFunction f = StringFunction(f_formula)

I = integrate(f, a, b, n) print I

Writing a test function. Instead of having these test statements as a main program we follow the good habits of Section 9 and make a module with

  • the integrate function,

a test_integrate function for testing the integrate function’s ability to exactly integrate linear functions,

a main function for reading data from the command line and calling

integrate for the user’s problem at hand.

Any module should also have a test block, as well as doc strings for the module itself and all functions.

The test_integrate function can perform a loop over some specified n

values and check that the Midpoint rule integrates a linear function exactly. As always, we must be prepared for round-off errors, so “exactly” means errors less than (say) 1014. The relevant code becomes

def test_integrate():

“””Check that linear functions are integrated exactly.”””

def g(x):

return p*x + q # general linear function

def int_g_exact(x): # integral of g(x) return 0.5*p*x**2 + q*x

a = -1.2; b = 2.8

p = -2; q = 10

success = True

for n in 1, 10, 100:

# “arbitrary” integration limits

# True if all tests below are passed

success = False assert success

I = integrate(g, a, b, n)

I_exact = int_g_exact(b) – int_g_exact(a) error = abs(I_exact – I)

if error > 1E-14:

We have followed the programming standard that will make this test function automatically work with the nose test framework:

        1. the name of the function starts with test_,
        2. the function has no arguments,
        3. checks of whether a test is passed or not are done with assert.

The assert success statement raises an AssertionError exception if success is false, otherwise nothing happens. The nose testing framework searches for functions whose name start with test_, execute each function, and record if an AssertionError is raised. It is overkill to use nose for small programs, but in larger projects with many functions in many files, nose can run all tests with a short command and write back a notification that all tests passed.

The main function is simply a wrapping of the main program given above. The test block may call or test_integrate function or main, depending on whether the user will test the module or use it:

if name == ’ main ’:

if sys.argv[1] == ’verify’: verify()

else:

# Compute the integral specified on the command line main()

Here is a short demo computing ∫ 2π (cos(x) + sin(x))dx with the aid of the

integrate.py6 file:

0

 

Terminal

integrate.py ’cos(x)+sin(x)’ 0 2*pi 10

-3.48786849801e-16

Getting help from a code analyzer

The tools PyLint7 and Flake88 can analyze your code and point out errors and undesired coding styles. Before point 7 in the lists above, Run the program, it can be wise to run PyLint or Flake8 to be informed about problems with the code.

Consider the first version of the integrate code, integrate_v1.py9. Run- ning Flake8 gives

 

Terminal

Terminal> flake8 integrate_v1.py

integrate_v1.py:7:1: E302 expected 2 blank lines, found 1 integrate_v1.py:8:1: E112 expected an indented block integrate_v1.py:8:7: E901 IndentationError: expected an indented block integrate_v1.py:10:1: E302 expected 2 blank lines, found 1 integrate_v1.py:11:1: E112 expected an indented block

Flake8 checks if the program obeys the official Style Guide for Python Code10 (known as PEP8 ). One of the rules in this guide is to have two blank lines before functions and classes (a habit that is often dropped in this book to reduce the length of code snippets), and our program breaks the rule before the f and g functions. More serious and useful is the expected an indented block at lines 8 and 11. This error is quickly found anyway by running the programming. PyLint does not a complete job before the program is free of syntax errors.

We must therefore apply it to the integrate_v2.py11 code:

 

Terminal

Terminal> pylint integrate_v2.py

C: 20, 0: Exactly one space required after comma I = integrate(f, 0, 1, n)

^ (bad-whitespace)

W: 19, 0: Redefining built-in ’pow’ (redefined-builtin) C: 1, 0: Missing module docstring (missing-docstring) W: 1,14: Redefining name ’f’ from outer scope (line 8) W: 1,23: Redefining name ’n’ from outer scope (line 16)

6http://tinyurl.com/pwyasaa/debug/integrate.py 7http://www.pylint.org/ 8https://flake8.readthedocs.org/en/2.0/ 9http://tinyurl.com/pwyasaa/debug/integrate_v1.py 10http://www.python.org/dev/peps/pep-0008/ 11http://tinyurl.com/pwyasaa/debug/integrate_v2.py

C: 1, 0: Invalid argument name “f” (invalid-name) C: 1, 0: Invalid argument name “a” (invalid-name)

There is much more output, but let us summarize what PyLint does not like about the code:

  1. Extra whitespace (after comma in a call to integrate)
  2. Missing doc string at the beginning of the file
  3. Missing doc strings in the functions
  4. Same name f used as local variable in integrate and global function name in the f(x) function
  5. Too short variable names: a, b, n, etc.
  6. “Star import” of the form from math import *

In short programs where the one-to-one mapping between mathematical notation and the variable names is very important to make the code self-explanatory, this author thinks that only points 1-3 qualify for attention. Nevertheless, for larger non-mathematical programs all the style violations pointed out are serious and lead to code that is easier to read, debug, maintain, and use.

Running Flak8 on integrate_v2.py leads to only three problems: missing two blank lines before functions (not reported by PyLint) and doing from math import *. Flake8 complains in general a lot less than PyLint, but both are very useful during program development to readability of the code and remove errors

Shares