|
Home | Switchboard | Unix Administration | Red Hat | TCP/IP Networks | Neoliberalism | Toxic Managers |
(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and bastardization of classic Unix |
1. Basic Ideas | |
2. Programming Examples | |
3. De-allocation and memory leakage | |
4. Debugging | |
5. Meaning of Addresses |
ANSI-C defines a group of four functions to allocate and de-allocate
blocks of memory on request. These are to be found in the standard libraries,
the prototypes are all in <stdlib.h>
. They are very widely
used by applications either directly or indrectly to create buffers
for objects of inherently unpredictable sizes.
The functions are.
Function | Operation |
---|---|
void *calloc(size_t n, size_t s) | Allocates space for n objects of size s. The space is initialised to all bits zero. |
void free(void *p) | De-allocates the space pointed to by p. This must have a value returned by a previous call to calloc, malloc or realloc. |
void *malloc(size_t s) | Allocates a space of size s. The initial value is indeterminate. |
void *realloc(void *p, size_t s) | Changes the size of space indicated by p by the amount s. If the new area is larger the contents of the new portion are indeterminate. |
All these functions collectively administer a set of memory blocks sometimes called the arena or heap. Internally there is information specifying how large each block is. Freeing blocks may result in amalgamation of smaller blocks. A request for a larger block may result in an underlying software request for allocation of more real memory to the process by the host operating system.
If a program is using these calls it is important to ensure that code never writes beyond the end of a memory block, this will overwrite other blocks and may corrupt the arena administration data, it will almost certainly result in an obscure program crash. It also important to ensure that all memory blocks obtained using one of the "alloc" calls is released via a "free" call. Failure to do this will result in the program gradually (or rapidly) accumulating an ever larger amount of memory until the system runs out, this is known as a memory leak.
Under Solaris 2, confusingly, there are several sets of the basic functions, which you get depends on which library you link against. They are the bsdmalloc routines, the malloc(3C) functions and the malloc(3X) functions. They are all ANSI compliant, the 3X routines are space efficient, the bsdmalloc routines are fast and the 3C routines are a compromise. Each group includes some extra non-ANSI functionality that may be of interest.
The 3C group is the default.
Here's an example of a program that creates an array and fills it with the character 'x'. The program also reports how much CPU time the various operations took.
#include <stdlib.h> #include <time.h> main() { int i,size; char *base; clock_t start,end; printf("Enter size of array to create "); scanf("%d",&size); start = clock(); base = (char *)calloc(size,sizeof (char)); end = clock(); if(base != NULL) { printf("Array of size %d created OK at address %p\n",size,base); printf("Allocation took %d milliseconds\n", (int)((end-start)*1E3/CLOCKS_PER_SEC)); } else { printf("Not enough memory\n"); exit(1); } start = clock(); for(i=0;i<size;i++) base[i] = 'x'; end = clock(); printf("Filling the array took %d milliseconds\n", (int)((end-start)*1E3/CLOCKS_PER_SEC)); }
There are several interesting points to note.
Here are some results from running the program to create arrays of 1, 10, 100 and 1000 Megabytes. You may find it interesting to repeat these tests on different systems.
bash$ a.out Enter size of array to create 1000000 Array of size 1000000 created OK at address 20be0 Allocation took 70 milliseconds Filling the array took 130 milliseconds bash$ a.out Enter size of array to create 10000000 Array of size 10000000 created OK at address 20be0 Allocation took 700 milliseconds Filling the array took 1390 milliseconds bash$ a.out Enter size of array to create 100000000 Array of size 100000000 created OK at address 20be0 Allocation took 7020 milliseconds Filling the array took 13900 milliseconds Enter size of array to create 1000000000 Not enough memoryThe overheads involved in creation of the arrays should be noted as should the time taken to fill the arrays.
It is tempting when using calloc() to create instances of structure one by one as required by the program logic, this may not be wise as the overheads involved are significant. A useful alternative is to write some sort of wrapper function that maintains a "pool" of such structures and calls calloc() only when the "pool" is empty. Here's some typical code.
#define POOLSIZE 100 struct x *new() { static int nfree = 0; static struct x *base; if(!nfree) { base = (struct x *)calloc(POOLSIZE,sizeof(struct x)); if(base == NULL) return NULL; nfree = POOLSIZE; } return base + POOLSIZE - nfre++; }
A routine that creates new instances of memory consuming objects on demand in this fashion is often called a constructor.
A common practice is to use structures to store information about objects, this will typically include descriptive text. The descriptive text needs to be stored somewhere and it is very common practice to calloc() a suitable area of memory. Here's a modified version of the "new" function shown above that does this. This time we're not using a pool of structures but allocating them one by one, this may be worth the overheads if they are going to be subsequently released.
struct x { char *text; int value; }; struct x *new(char *s) { static int nfree = 0; static struct x *base,*xptr; char *sptr; sptr = (char *)calloc(strlen(s)+1,sizeof(char)); if(sptr == NULL) return NULL; strcpy(sptr,s); xptr = (struct x *)calloc(1,sizeof(struct x)); if(xptr == NULL) { free(sptr); return NULL; } xptr -> text = sptr; return xptr; }
Again there are several interesting points here.
If instances of struct x are being created by the code shown in the previous section, it is very tempting to dispose of them thus.
struct x *instance; instance = new("hello world"); free(instance);
This is wrong as, although it does release the memory occupied by the instance of a struct x it does not release the memory used to store a copy of the associated text. The address of this memory location is lost and it is no longer accessible, however the memory management routines have no way of knowing this. This is a classic example of a memory leak. The programmer should have written :-
free(instance->text); free(instance);
It is probably better practice to code a specific destructor function to go with the constructor function here called new(). This does not, of course, deal with the type of memory leak caused by creating an instance of something to deal with an input or some part of processing an input and then not destroying it, perhaps because the processing went through some unusual path through the program. Some C++ implementations, apparently, automatically destroy objects when they pass out of scope, this is of limited use if a pointer to an object is being used.
In some programming languages, such as Java, the memory management facilities include a facility known as garbage collection. This requires that the memory management facilities periodically examine the whole data space of the program looking for pointers into the arena and checking that all objects in the arena are actually targets of pointers. This further requires that the language supports comparatively strong typing so that such pointers can be readily identified. The overheads involved are significant although Java implementations multi-thread garbage collection with normal activity so garbage collection can proceed while waiting for input. The normal memory management functions supplied with C compilers do not provide garbage collection.
Debugging a program that uses calloc() and free() extensively can be difficult. There are two types of error that need to be tracked down.
Under Solaris 2, dbx has some facilities to deal with such problems although they require linkage against a special set of allocation routines. They can be used periodically to display information about the arena. Solaris also provides a library function mallinfo that provides access to the arena information, this can be wrapped in a simple print function and included at suitable points in the program, or used in conjunction with assertions about the various values (i.e. that the values at the end of execution of a piece of code are the same as those at the start). Here's an example of the sort of code that might be used.
struct mallinfo old,new; . . old = mallinfo(); . . . new = mallinfo(); assert(old.uordblks == new.uordblks); .
If the number of uordblks has changed, probably indicating that some memory has been allocated but not released the failed assertion will cause a core dump which can then be examined using the debugger. The usefulness of this approach is slightly restricted by lack of information about the significance of the various items of arena information reported by mallinfo().
The addresses returned by calloc() are fairly arbitrary. If two, or more, objects are created by successive calls to calloc() they will not, in general, occupy successive locations as this example shows.
#include <time.h> main() { struct tm *t[3]; int i,s = sizeof (struct tm); for(i=0;i<3;i++) { t[i] = (struct tm*)calloc(1,s); } printf("Size of struct tm = %x\n",s); for(i=0;i<3;i++) printf("Instance %d base address %x\n", i,t[i]); }
The program allocates memory for three instances of a struct tm and reports the addresses. Here is the output.
bash$ a.out Size of struct tm = 24 Instance 0 base address 209e8 Instance 1 base address 20a18 Instance 2 base address 20a48
This shows that the size of a struct tm is 24x whereas the difference between the addresses of the successive locations is 30x, this suggests an overhead of 1210 bytes per allocated piece of memory.
It is also important to realise that the addresses associated with allocated memory objects are absolute within the program's address space. This means that the following practices are likely to cause severe problems.
It is tempting to create and array (or "pool") of objects via a single call to calloc() for reasons of efficiency, in preference to multiple single calls and then build a linked list (or other data structure) within the array. This is perfectly safe until all the objects of the array have been used, there is a temptation to use the function realloc() to expand the array. realloc() will preserve the contents of the first part of the array and attach extra space at the end, however, as the ANSI standard makes quite clear the allocated space may have been moved. Moving the array without changing the pointers that define the linked list destroys the linked list as the poointers no longer point to the correct place.
Having constructed a possibly elaborate and expensive to build linked data structure in calloc'd space, it is tempting to write it to a file for future use by the program. Since there is no control over the absolute positioning of calloc'd space this will not work unless, by chance, the new area is positioned in the same place as the original. Similar constraints would apply if the structure were being created in ordinary "compile time allocated" memory and the program were still being developed. The best way of creating such persistent structures is to use the mmap() function to map a file into memory space at a specific location. This also has the advantage that there is no need to perform file operations to read in the persistent structure.