Softpanorama May the source be with you, but remember the KISS principle ;-)	Home	Switchboard	Unix Administration	Red Hat	TCP/IP Networks	Neoliberalism	Toxic Managers
	(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and bastardization of classic Unix

Memory allocation functions

	1. Basic Ideas
	2. Programming Examples
	3. De-allocation and memory leakage
	4. Debugging
	5. Meaning of Addresses

Basic Ideas

ANSI-C defines a group of four functions to allocate and de-allocate blocks of memory on request. These are to be found in the standard libraries, the prototypes are all in <stdlib.h>. They are very widely used by applications either directly or indrectly to create buffers for objects of inherently unpredictable sizes.

The functions are.

Function	Operation
void *calloc(size_t n, size_t s)	Allocates space for n objects of size s. The space is initialised to all bits zero.
void free(void *p)	De-allocates the space pointed to by p. This must have a value returned by a previous call to calloc, malloc or realloc.
void *malloc(size_t s)	Allocates a space of size s. The initial value is indeterminate.
void realloc(void p, size_t s)	Changes the size of space indicated by p by the amount s. If the new area is larger the contents of the new portion are indeterminate.

All these functions collectively administer a set of memory blocks sometimes called the arena or heap. Internally there is information specifying how large each block is. Freeing blocks may result in amalgamation of smaller blocks. A request for a larger block may result in an underlying software request for allocation of more real memory to the process by the host operating system.

If a program is using these calls it is important to ensure that code never writes beyond the end of a memory block, this will overwrite other blocks and may corrupt the arena administration data, it will almost certainly result in an obscure program crash. It also important to ensure that all memory blocks obtained using one of the "alloc" calls is released via a "free" call. Failure to do this will result in the program gradually (or rapidly) accumulating an ever larger amount of memory until the system runs out, this is known as a memory leak.

Under Solaris 2, confusingly, there are several sets of the basic functions, which you get depends on which library you link against. They are the bsdmalloc routines, the malloc(3C) functions and the malloc(3X) functions. They are all ANSI compliant, the 3X routines are space efficient, the bsdmalloc routines are fast and the 3C routines are a compromise. Each group includes some extra non-ANSI functionality that may be of interest.

The 3C group is the default.

Programming Examples
Here's an example of a program that creates an array and fills it with the character 'x'. The program also reports how much CPU time the various operations took.
```
#include        <stdlib.h>
#include        <time.h>
main()
{
        int     i,size;
        char    *base;
        clock_t start,end;
        printf("Enter size of array to create ");
        scanf("%d",&size);
        start = clock();
        base = (char *)calloc(size,sizeof (char));
        end = clock();
        if(base != NULL)
        {
                printf("Array of size %d created OK at address %p\n",size,base);
                printf("Allocation took %d milliseconds\n",
                        (int)((end-start)*1E3/CLOCKS_PER_SEC));
        }
        else
        {
                printf("Not enough memory\n");
                exit(1);
        }
        start = clock();
        for(i=0;i<size;i++)
                base[i] = 'x';
        end = clock();
        printf("Filling the array took %d milliseconds\n",
                        (int)((end-start)*1E3/CLOCKS_PER_SEC));
        
}
```
There are several interesting points to note.
- The value returned from the function calloc() is cast to a suitable type. This is normal practice when using this function since it's default return value is a generic pointer and there are no default conversions between pointer types in C.
- The value returned by calloc() is stored in the variable base. The value will, of course, be the start or base address of the allocated area of memory. The relationship between pointers and array notation in C allows the use of array notation when the array is filled.
- Timing information is here determined using the ANSI C function clock(), this does not have particularly fine resolution. You may find higher resolution timers available on your system.
Here are some results from running the program to create arrays of 1, 10, 100 and 1000 Megabytes. You may find it interesting to repeat these tests on different systems.
```
bash$ a.out
Enter size of array to create 1000000
Array of size 1000000 created OK at address 20be0
Allocation took 70 milliseconds
Filling the array took 130 milliseconds
bash$ a.out
Enter size of array to create 10000000
Array of size 10000000 created OK at address 20be0
Allocation took 700 milliseconds
Filling the array took 1390 milliseconds
bash$ a.out
Enter size of array to create 100000000
Array of size 100000000 created OK at address 20be0
Allocation took 7020 milliseconds
Filling the array took 13900 milliseconds
Enter size of array to create 1000000000
Not enough memory
```
The overheads involved in creation of the arrays should be noted as should the time taken to fill the arrays.
It is tempting when using calloc() to create instances of structure one by one as required by the program logic, this may not be wise as the overheads involved are significant. A useful alternative is to write some sort of wrapper function that maintains a "pool" of such structures and calls calloc() only when the "pool" is empty. Here's some typical code.
```
#define	POOLSIZE	100


struct	x *new()
{
	static	int	nfree = 0;	
	static	struct	x *base;
	if(!nfree)
	{
		base = (struct x *)calloc(POOLSIZE,sizeof(struct x));
		if(base == NULL) return NULL;
		nfree = POOLSIZE;
	}
	return base + POOLSIZE - nfre++;
}
```
A routine that creates new instances of memory consuming objects on demand in this fashion is often called a constructor.

A common practice is to use structures to store information about objects, this will typically include descriptive text. The descriptive text needs to be stored somewhere and it is very common practice to calloc() a suitable area of memory. Here's a modified version of the "new" function shown above that does this. This time we're not using a pool of structures but allocating them one by one, this may be worth the overheads if they are going to be subsequently released.
```
struct	x
{
	char	*text;
	int	value;
};


struct	x *new(char *s)
{
	static	int	nfree = 0;
	static	struct	x *base,*xptr;
	char	*sptr;
	sptr = (char *)calloc(strlen(s)+1,sizeof(char));
	if(sptr == NULL) return NULL;
	strcpy(sptr,s);
	xptr = (struct x *)calloc(1,sizeof(struct x));
	if(xptr == NULL)
	{
		free(sptr);
		return NULL;
	}
	xptr -> text = sptr;
	return xptr;
}
```
Again there are several interesting points here.
- The allocated space for the copy of the value is its string length plus one to allow for the string terminating NUL.
- The value is copied to its final storage place using the strcpy() function.
- If there is insufficient memory for the struct x the already allocated memory for the value is released before returning from the constructor.
Releasing memory space, leaks and garbage collection
If instances of struct x are being created by the code shown in the previous section, it is very tempting to dispose of them thus.
```
	struct	x *instance;

	instance = new("hello world");

	free(instance);
```
This is wrong as, although it does release the memory occupied by the instance of a struct x it does not release the memory used to store a copy of the associated text. The address of this memory location is lost and it is no longer accessible, however the memory management routines have no way of knowing this. This is a classic example of a memory leak. The programmer should have written :-
```
	free(instance->text);
	free(instance);
```
It is probably better practice to code a specific destructor function to go with the constructor function here called new(). This does not, of course, deal with the type of memory leak caused by creating an instance of something to deal with an input or some part of processing an input and then not destroying it, perhaps because the processing went through some unusual path through the program. Some C++ implementations, apparently, automatically destroy objects when they pass out of scope, this is of limited use if a pointer to an object is being used.

In some programming languages, such as Java, the memory management facilities include a facility known as garbage collection. This requires that the memory management facilities periodically examine the whole data space of the program looking for pointers into the arena and checking that all objects in the arena are actually targets of pointers. This further requires that the language supports comparatively strong typing so that such pointers can be readily identified. The overheads involved are significant although Java implementations multi-thread garbage collection with normal activity so garbage collection can proceed while waiting for input. The normal memory management functions supplied with C compilers do not provide garbage collection.
Debugging
Debugging a program that uses calloc() and free() extensively can be difficult. There are two types of error that need to be tracked down.
1. Those caused by over-writing the bounds of an allocated memory block. This is almost identical to the well-known C problem of "running off" the end of an array. The resultant damage to the arena will often cause an almost immediate crash.
2. Memory leaks. This is a longer term problem that simple testing with a small amount of data is unlikely to reveal. Testing against large volumes of data is important.
Under Solaris 2, dbx has some facilities to deal with such problems although they require linkage against a special set of allocation routines. They can be used periodically to display information about the arena. Solaris also provides a library function mallinfo that provides access to the arena information, this can be wrapped in a simple print function and included at suitable points in the program, or used in conjunction with assertions about the various values (i.e. that the values at the end of execution of a piece of code are the same as those at the start). Here's an example of the sort of code that might be used.
```
	struct	mallinfo old,new;
		.
		.
	old = mallinfo();
		.
		.
		.
	new = mallinfo();
	assert(old.uordblks == new.uordblks);
		.
```
If the number of uordblks has changed, probably indicating that some memory has been allocated but not released the failed assertion will cause a core dump which can then be examined using the debugger. The usefulness of this approach is slightly restricted by lack of information about the significance of the various items of arena information reported by mallinfo().
Addresses
The addresses returned by calloc() are fairly arbitrary. If two, or more, objects are created by successive calls to calloc() they will not, in general, occupy successive locations as this example shows.
```
#include	<time.h>
main()
{
	struct	tm *t[3];
	int	i,s = sizeof (struct tm);
	for(i=0;i<3;i++)
	{
		t[i] = (struct tm*)calloc(1,s);
	}
	printf("Size of struct tm = %x\n",s);
	for(i=0;i<3;i++)
		printf("Instance %d base address %x\n",
					i,t[i]);
}
```
The program allocates memory for three instances of a struct tm and reports the addresses. Here is the output.
```
bash$ a.out
Size of struct tm = 24
Instance 0 base address 209e8
Instance 1 base address 20a18
Instance 2 base address 20a48
```
This shows that the size of a struct tm is 24_x whereas the difference between the addresses of the successive locations is 30_x, this suggests an overhead of 12₁₀ bytes per allocated piece of memory.

It is also important to realise that the addresses associated with allocated memory objects are absolute within the program's address space. This means that the following practices are likely to cause severe problems.
- Linked lists within arrays
  It is tempting to create and array (or "pool") of objects via a single call to calloc() for reasons of efficiency, in preference to multiple single calls and then build a linked list (or other data structure) within the array. This is perfectly safe until all the objects of the array have been used, there is a temptation to use the function realloc() to expand the array. realloc() will preserve the contents of the first part of the array and attach extra space at the end, however, as the ANSI standard makes quite clear the allocated space may have been moved. Moving the array without changing the pointers that define the linked list destroys the linked list as the poointers no longer point to the correct place.
- Saving data structures
  Having constructed a possibly elaborate and expensive to build linked data structure in calloc'd space, it is tempting to write it to a file for future use by the program. Since there is no control over the absolute positioning of calloc'd space this will not work unless, by chance, the new area is positioned in the same place as the original. Similar constraints would apply if the structure were being created in ordinary "compile time allocated" memory and the program were still being developed. The best way of creating such persistent structures is to use the mmap() function to map a file into memory space at a specific location. This also has the advantage that there is no need to perform file operations to read in the persistent structure.