Showing posts with label Tools. Show all posts

Wednesday, 20 October 2010

Memcheck : A Memory Error Detection Tool

Linux distros provide a tool suite called 'Valgrind', a collection of tools that help you make your programs faster and more correct. The most popular of these tools is memcheck, which detects memory-related errors in C and C++ programs. Such errors can lead to serious issues like segfaults, and to even more grievous ones like unpredictable behaviour.

Let's look at how to use 'memcheck' to detect, and so avoid, memory-related issues in our programs.

First of all we need a program with memory bugs so that we have something to detect. Here is one, in a file 'pgm.c':

1 #include <stdlib.h>
2 #include <string.h>
3 void f(char *s)
4 {
5 char *x = malloc(strlen(s));
6 x[7] = 0;
7 }


8 int  main()
9 {
10 char *s="Hello";
11 f(s);
12 }


As you can see, there are two major memory-related bugs in this code:

1) It writes to x[7], which lies outside the 5 bytes that were allocated. This is what is called a 'heap block overrun'.

2) The memory allocated to x is never freed. When the function returns, the only pointer to the block is lost, so the block can no longer be used or released. In short, you have a 'memory leak'.

Now let's try to detect the two problems using memcheck.

You need to have valgrind installed on your system. If you don't have it already, do

$: apt-get install valgrind

Do

$: cc -g pgm.c

We compile the program with the -g option so that memcheck can display line numbers in its reports.

Do

$: valgrind --leak-check=yes ./a.out

Setting the option '--leak-check' to yes makes memcheck report the details of each memory leak. Note that there must be no spaces around the '='.

Now let's look at the information that memcheck displays.

First, the 'heap block overrun' problem:

==3350== Invalid write of size 1
==3350==    at 0x80483F6: f (pgm.c:6)
==3350==    by 0x804841D: main (pgm.c:11)
==3350==  Address 0x419102f is 2 bytes after a block of size 5 alloc'd
==3350==    at 0x4023D6E: malloc (vg_replace_malloc.c:207)
==3350==    by 0x80483EC: f (pgm.c:5)
==3350==    by 0x804841D: main (pgm.c:11)


The lines above are the output generated by memcheck.

'3350' is the process id.

The actual error is on the very first line:

'Invalid write of size 1'.

Right after it comes a stack trace, which shows that the invalid write occurred at line 6, inside f, which was called from line 11 of main. A look at our code confirms this: line 6 is where the error occurs and line 11 is the function call.

The next line states that the location you are trying to access (x[7]) is 2 bytes after the allocated block of 5 bytes. It is followed by a second stack trace showing where that block was allocated (the malloc call at line 5 of f). This information is a great help to the programmer, especially when the program is a lot more complicated.

Now let's look at the information displayed about the memory leak:

==3350== LEAK SUMMARY:
==3350== definitely lost: 5 bytes in 1 blocks.
==3350== possibly lost: 0 bytes in 0 blocks.
==3350== still reachable: 0 bytes in 0 blocks.
==3350== suppressed: 0 bytes in 0 blocks.

These lines provide information about the memory leak.
The first line of the 'LEAK SUMMARY' is the most significant to us. It shows the amount of memory definitely lost (5 bytes), which is exactly the block allocated in f. The program needs to be changed so that the memory leak is prevented.

Memcheck's reports let the programmer see and correct memory-related issues with confidence. Note that Memcheck is a 'dynamic instrumentation tool' and not a static tool like 'lint'. Hence, to detect the memory errors in a program, control must actually reach the part of the program where they occur. In short, you need to invoke the function 'f' from your 'main' so that Memcheck can detect the memory-related issues that exist within 'f'.

Sunday, 3 October 2010

GNU make

The make utility determines which pieces of a program need to be recompiled and issues the commands to recompile them. GNU make is the most popular make available. Let's get into the working of make with a simple example, but please do read the post on SWIG before moving into further details.

By reading that, you will have understood that our ultimate aim is to produce a file '_avl.so' so that the C module 'avl.c' can be used from Python.
Let's check what exactly we need to do here.

We begin with 2 files, namely 'avl.c' and 'avl.i'. First we need to create a wrapper file. This is done by:

$: swig -python avl.i

Next we need to create an object file for each of the C files, i.e. we need to produce 'avl.o' and 'avl_wrap.o' from 'avl.c' and 'avl_wrap.c' respectively.
We do this by:

$: gcc -c avl.c avl_wrap.c -I /usr/include/python2.5/

The last and final step is to create '_avl.so' from the 2 object files.

$: ld -shared avl.o avl_wrap.o -o _avl.so

The dependencies and actions are quite clear now.


Typing in these commands time and again would be a tedious and inefficient job for anyone. We can automate the whole process by writing a file called Makefile and using the 'make' utility.

The Makefile for performing the above steps would be as follows:
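A Makefile matching the three commands above and the rule order described in the rest of this post might look like the sketch below. It is a reconstruction from that description, so treat the exact layout as an assumption. Every command line must start with a TAB character.

```makefile
# Sketch reconstructed from the steps in this post (assumed layout).

# First rule: link the two object files into the shared object.
_avl.so: avl.o avl_wrap.o
	ld -shared avl.o avl_wrap.o -o _avl.so

# Second rule: compile both C files into object files.
avl.o avl_wrap.o: avl.c avl_wrap.c
	gcc -c avl.c avl_wrap.c -I /usr/include/python2.5/

# Third rule: generate the wrapper from the interface file.
avl_wrap.c: avl.i
	swig -python avl.i

# Housekeeping rule: delete everything that was generated.
remove:
	rm avl.o avl_wrap.o avl.py _avl.so avl_wrap.c avl.pyc
```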



Each rule in a Makefile has the following format:

TARGET : DEPENDENCIES
	ACTIONS



It's important to note that the 'ACTIONS' line begins with a TAB, as that is part of the syntax of a Makefile. If you use spaces instead of a TAB, make stops with a 'missing separator' error.

Let's check how things work out.

$: make

This causes GNU make to start executing the commands in the Makefile. GNU make starts from the very beginning of the Makefile, so the first 'TARGET-DEPENDENCIES-ACTIONS' set is checked first.

The TARGET is '_avl.so', the DEPENDENCIES are 'avl.o' and 'avl_wrap.o', and the ACTION that needs to be performed is

ld -shared avl.o avl_wrap.o -o _avl.so

But at present 'avl.o' and 'avl_wrap.o' are not available. To generate them, the next set in the Makefile has to be checked. The same problem exists there as well, because 'avl_wrap.c' is missing. Hence the 3rd set in the Makefile is reached, where the DEPENDENCY is 'avl.i', which does exist. Hence the action

swig -python avl.i

is executed and the target 'avl_wrap.c' is generated.
Now the second set in the Makefile can be built, followed by the first set. The procedure that takes place here is recursive. All three actions are executed, the TARGETS are generated from the DEPENDENCIES by the ACTIONS, and finally we obtain '_avl.so'.

You can see one more rule in the Makefile that has not been mentioned up to now.
Its TARGET is 'remove', it has no DEPENDENCIES, and the ACTION that needs to be performed is
rm avl.o avl_wrap.o avl.py _avl.so avl_wrap.c avl.pyc

$: make remove

removes all the generated files in the directory when you plan to start again from the beginning.

If you would like to test the intelligence of 'make', you will be pleasantly surprised.

Let's look at that with an example as well.

Do:

$: swig -python avl.i

We have executed the first command manually.
Now do:

$: make

and you will find that make executes only the other two commands in the Makefile.
This again shows the recursive nature of the make process. The Makefile is read from the top. On reaching the second set, GNU make finds that both dependencies, 'avl.c' and 'avl_wrap.c', are available and executes the ACTION to produce the TARGETS 'avl.o' and 'avl_wrap.o'. The swig step is skipped because 'avl_wrap.c' already exists and is newer than 'avl.i'.

Now once again do
$: make

You would get a message as follows:
make: `_avl.so' is up to date.

This again brings to the forefront the power of make. The final target '_avl.so' is newer than everything it depends on, so it is 'up to date' and there is no need to execute any of the commands in the Makefile. GNU make, Richard Stallman and Roland McGrath's creation, has recognized this.

SWIG (Simplified Wrapper and Interface Generator)

Scripting languages like Python make coding a lot easier for the programmer, but the fact remains that certain tasks are painful when attempted in pure Python. So that part of the coding is done in a language like C and the modules are imported into Python. SWIG is a tool for exposing your C code to Python, making the C functions callable from Python.

Here we discuss how to import a C module into Python using SWIG.
Installing SWIG on your system, if you haven't already, is the first task.

$: apt-get install swig

would do that for you.

Now it's better to create a directory for the whole exercise and work inside it.
Consider a C file, avl.c, as our example (an AVL tree is a height-balanced tree; avl.c is an implementation of such a tree). You could download the source code for avl.c here.

Our first step is to create an interface file, 'avl.i'. It consists of declarations of the functions and global variables that are defined in 'avl.c', each prefixed with 'extern'.

%module avl

%{

extern struct node *root;

extern struct node *p;

%}


extern void insert(struct node* move,int item);

...
...
...
...

The 'avl.i' file looks like the above.
Now we have two files in our directory,

avl.c & avl.i

This is all that we have to code for the task of extending Python with a C module. From here on it's all about SWIG.


Let's move further. Do the following.

$: swig -python avl.i

Now have a look into your directory and you will find two more files there:

avl.py & avl_wrap.c
avl_wrap.c is the wrapper file that has been created for the purpose of extension. Wrapper functions act as a glue layer between languages.

$: gcc -c avl.c avl_wrap.c -I /usr/include/python2.5/

You can now see 2 more files in your directory:
avl.o & avl_wrap.o

These two are the object files that have been created for avl.c and avl_wrap.c respectively.

$: ld -shared avl.o avl_wrap.o -o _avl.so

This is the last thing you need to do before you can start using your C module in Python.
Check your directory and you will see a new file, '_avl.so'. This is the shared object file that has been created.
If you write a Makefile for these steps, you won't have to repeat them by hand.
Now the Python module corresponding to the C module avl.c has been created, which means you can start using it.

$: python

>>> import avl
>>> avl.my_insert(10)
>>> avl.my_insert(20)
>>> avl.traverse(1)

The source code can be downloaded here.

Sunday, 8 August 2010

Debugging with the GDB

The GNU debugger (GDB) lets you follow the execution of a program step by step. This is a great help with complex programs (especially ones involving recursion), where you want to see the exact flow of control. Here is how to debug with gdb, along with a few commonly used commands.
To debug a program with gdb, the steps are:
Compile with debugging information: 'cc -g file.c'
Start the debugger on the executable: 'gdb a.out' (a.out is your executable)
You will see messages about the gdb version displayed on the terminal.
Now set a breakpoint with 'break main'
Use the 'run' command to start running the program.
Three commonly used gdb commands are 's', 'n' and 'q':
's' (step) moves to the next line of code, entering functions as well.
'n' (next) moves to the next line of code too, but steps over function calls instead of entering them.
'q' (quit) exits the debugger.

A debugging session looks like this:

sunil:~/new# cc -g quicksort.c
sunil:~/new# gdb a.out
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...

(gdb) break main
Breakpoint 1 at 0x80483e5: file quicksort.c, line 7.
(gdb) run
Starting program: /root/new/a.out
Breakpoint 1, main () at quicksort.c:7
7 int left=0,right=4,i;

(gdb) s
8 int array[]={4,8,1,9,3};
.../*
...continue the process...
.../*

(gdb) q
Quits from the debugger