Documenting Code

Code is best documented with a combination of descriptive variable names (self-documenting code) and effective commenting. It is best to write code assuming that there is a competent programmer reading it. It's not necessary to explain what a for loop does, but it's nice to explain the layout of structures and algorithms using terms that another programmer would know.

We'll use a simple example to show how these suggestions can improve readability.

void calcResults ( int n, int* p, int* sum ) {

     int i;
     *sum = 0;
     for ( i = 0; i < n; i++ ) {
         *sum += p[i];
     }
}

Although it's pretty easy to figure out what it does, it still takes a minute to translate, and a simple function that sums an array of numbers shouldn't interrupt the reader's concentration.

Comments

Comments can be used as line item notes or descriptions of large chunks of code.

/* This is a single quick comment. */

/* This is a longer comment that might explain
   a few lines of code or a control loop. Emacs will
   format comments like these automatically. */

/*
 * This is a header type of comment.
 *
 * It can be used to stand out from the rest of the code
 * and draw immediate attention, or skipped if the reader
 * wants to go straight to the code.
 *
 */

It's worth thinking about which type of comment conveys the appropriate level of description for each section of code.

In our example, this is overkill:

/* 
 * function name: calcResults 
 * 
 * arguments: n - the number of items in p
 *            p - an array of integers
 *            sum - upon return, the sum of the items in p
 *
 */
void calcResults ( int n, int* p, int* sum ) {

     int i;           /* Array index */

     /* Loop from 0 to n. Sum up all the numbers in p. Return the
        result in sum. */
     *sum = 0;
     for ( i = 0; i < n; i++ ) {

         /* Add the element in p to the sum. */
         *sum += p[i];
     }
}

In this example, the level of documentation is too high. While documentation can improve the readability of this function, providing too much forces the reader to skip through it, teaches him that the comments are not worth his time, and may lead him to miss something important further on. Long function header comments are needed for complicated functions, but this one can be described very simply. For example, we all know what the += operator does, so it doesn't need a line comment of its own. We should be able to describe the function and its arguments with good naming conventions (described later), so we'll leave the header out for now. All that's needed is a quick comment describing the for loop.

void calcResults ( int n, int* p, int* sum ) {

     int i;           /* Array index */

     /* Loop from 0 to n. Sum up all the numbers in p. Return the
        result in sum. */
     *sum = 0;
     for ( i = 0; i < n; i++ ) {
         *sum += p[i];
     }
}
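By contrast, a header comment earns its space when a function has requirements that neither its name nor its signature can express. As a sketch, here is a hypothetical FindSortedValue (not part of the original text), a binary search whose caller needs to know that the input must be sorted and what a return value of -1 means:

```c
/*
 * FindSortedValue
 *
 * Searches values for target using binary search.
 *
 * Precondition: values must be sorted in ascending order. This is
 * not checked; on unsorted input the result is meaningless.
 *
 * Returns the index of an element equal to target, or -1 if no such
 * element exists. If target appears more than once, any one of its
 * indices may be returned.
 */
int FindSortedValue(const int* values, int count, int target) {
    int low = 0;
    int high = count - 1;

    while (low <= high) {
        /* Computed this way rather than (low + high) / 2 so the sum
           cannot overflow for very large counts. */
        int mid = low + (high - low) / 2;

        if (values[mid] == target) {
            return mid;
        } else if (values[mid] < target) {
            low = mid + 1;
        } else {
            high = mid - 1;
        }
    }
    return -1;
}
```

Here each line of the header tells the reader something the signature alone does not, which is a reasonable test for whether such a comment is worth writing.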

Function Naming

We all know that functions do things, but their names often don't reflect everything they do or what they do it to. Try to choose a name that is short but describes the full extent of the function. If the name would need to be too long to describe the function's effects, consider breaking it up into smaller functions, simply for readability.

Function names should be verbs that describe what the function does. However, that description should be complete. It does no good to name a function SumVariables if it not only computes a sum but also updates an internal database as a side effect. So one might try SumVariablesAndUpdateDatabase. But we can now see that this function name is getting unwieldy and involves two separate kinds of effects, so it might be worth breaking the function up into two functions. Naming your functions carefully helps make sure that your design is clear and leads to no surprises for the reader.
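As a rough sketch of that split, assuming a hypothetical SumLog structure stands in for the internal database (the text does not describe one), SumVariables and UpdateDatabase each do exactly what their names say and nothing more:

```c
/* Hypothetical stand-in for the "internal database": a fixed-size
   log of previously computed sums. */
#define SUM_LOG_CAPACITY 16

struct SumLog {
    int entries[SUM_LOG_CAPACITY];
    int count;
};

/* Computes the sum of count values. Has no side effects. */
int SumVariables(const int* values, int count) {
    int sum = 0;

    for (int i = 0; i < count; i++) {
        sum += values[i];
    }
    return sum;
}

/* Appends value to the log and does nothing else.
   Returns 0 on success, or -1 if the log is already full. */
int UpdateDatabase(struct SumLog* log, int value) {
    if (log->count >= SUM_LOG_CAPACITY) {
        return -1;
    }
    log->entries[log->count++] = value;
    return 0;
}
```

A caller that wants both effects now says so explicitly, for example `UpdateDatabase(&sum_log, SumVariables(values, count));`, so no reader is surprised by a hidden side effect.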

Consider this example:

void PrintError ( int index );

While we might know that this prints an error from an internal list of error messages, a reader wouldn't know that, and would have to look through the function to figure out why a numerical index is provided to a function that describes itself as printing an error message.

void PrintMessageInErrorList ( int index );

This is better, but it's going to be a pain to have to read (and write) this overly long function name that will presumably be used often in the code. Sometimes it's best to use abbreviations if the abbreviation is commonly known.

void PrintErrListMsg ( int index );

This is perhaps a good compromise. This, on the other hand, might be too short:

void PrntErrLstMsg ( int index );

Using abbreviations that don't save a worthwhile number of characters, such as leaving out a single vowel, might not be worth the added time it takes to parse the word, or the number of times you will mistype it as a programmer.
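To make the motivation concrete, here is one possible shape for PrintErrListMsg, assuming a hypothetical static list of messages and a hypothetical GetErrListMsg helper; neither appears in the original. With the fuller name, the integer parameter explains itself as a position in that list:

```c
#include <stdio.h>

/* Hypothetical internal list of error messages; the original text
   does not say what the list contains. */
static const char* const err_list[] = {
    "file not found",
    "permission denied",
    "out of memory",
};

static const int err_list_count =
    (int)(sizeof err_list / sizeof err_list[0]);

/* Returns the message at index in the error list, or NULL if index
   is out of range. */
const char* GetErrListMsg(int index) {
    if (index < 0 || index >= err_list_count) {
        return NULL;
    }
    return err_list[index];
}

/* Prints the message at index in the error list to stderr. An
   out-of-range index prints a placeholder instead of reading past
   the end of the list. */
void PrintErrListMsg(int index) {
    const char* msg = GetErrListMsg(index);

    fprintf(stderr, "error: %s\n", msg != NULL ? msg : "(unknown error)");
}
```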

In our example function, calcResults is a bad name because it doesn't describe what results it's calculating. CalcSum might be better, but it doesn't describe the fact that we're working on an array. It might also need to describe that we're summing integers as opposed to any other value type. We'll go with SumValueArray, as it combines the operation of summing with the fact that we're working on an array. The integer part should be apparent from the signature. In C++, we could even use the [] notation to reinforce the array type and return the sum through a reference, stating simply:

void SumValueArray ( int n, int p[], int& sum );

Again, assume that your reader is a programmer who can deduce some meaning from a function signature, such as that [] denotes an array and & denotes a variable passed by reference, which here carries the result back to the caller.

Variable Naming

We all know how to read code, but a badly named variable is both easy to misconstrue and easy to misuse. A little forethought and planning will make your code easier to read without resorting to comments.

The easiest thing to do is to use really long variable names to be descriptive.

void SumValueArray ( int number_of_elements, int* integer_array, int* sum_output ) {

     int array_index;
     *sum_output = 0;
     for ( array_index = 0; array_index < number_of_elements; array_index++ ) {
         *sum_output += integer_array[array_index];
     }
}

This is helpful because it's descriptive, but it's tiresome to type, and a programmer will probably recognize common contexts such as arrays and indices anyway. One method is to use common name parts to link together variables that share a context. In this case, number_of_elements, integer_array, and array_index share a context, so let's give them a common name element. We can even link it to the name of the function.

void SumValueArray ( int values_size, int* values_array, int* values_sum ) {

     int values_index;
     *values_sum = 0;
     for ( values_index = 0; values_index < values_size; values_index++ ) {
         *values_sum += values_array[values_index];
     }
}

Now we've linked the relevant variables together and related them to the function name. However, one problem here is that by giving the output (values_sum) the same name element as the inputs, we've mixed their contexts. So we might distinguish the output with an additional name element.

void SumValueArray ( int values_size, int* values_array, int* values_sum_result ) {

     int values_index;
     *values_sum_result = 0;
     for ( values_index = 0; values_index < values_size; values_index++ ) {
         *values_sum_result += values_array[values_index];
     }
}

If our reader saw that we commonly use the word result to refer to output variables, he would gain additional insight into a variable's purpose simply by seeing its name. If all our variables were as carefully named, the reader would begin to trust the variable names and pick up on the meaning of each function much more quickly. This function doesn't really need comments any more.
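As a sketch of that consistency, here is a hypothetical MaxValueArray written with the same scheme (it is not part of the original text). The values_ element ties the inputs together and the _result suffix marks the output. Unlike SumValueArray it returns a status, because an empty array has no maximum:

```c
/* Finds the largest element of values_array.
   Returns 0 on success, or -1 if values_size is not positive, in
   which case *values_max_result is left unchanged. */
int MaxValueArray(int values_size, const int* values_array,
                  int* values_max_result) {
    int values_index;

    if (values_size <= 0) {
        return -1;
    }
    *values_max_result = values_array[0];
    for (values_index = 1; values_index < values_size; values_index++) {
        if (values_array[values_index] > *values_max_result) {
            *values_max_result = values_array[values_index];
        }
    }
    return 0;
}
```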

Variable Prefixes

Taking this one step further, we can design a prefix system that further describes our variables. Note that this is similar to the Hungarian prefix notation, except that Hungarian uses prefixes to denote data types, whereas I believe it's better to use them to denote context.

Let's make a list of prefixes we might use:

Prefix    Meaning
a         Array pointer
c         Array element count
i         Input variable
n         Array index
o         Output variable

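Now let's rewrite our variable names. Prefixes can be combined, so the input count becomes icValues and the input array becomes iaValues: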

void SumValueArray ( int icValues, int* iaValues, int* oValuesSum ) {

     int nValue;
     *oValuesSum = 0;
     for ( nValue = 0; nValue < icValues; nValue++ ) {
         *oValuesSum += iaValues[nValue];
     }
}

Our variable names are now shorter and just as descriptive. With a little foreknowledge of our naming conventions, this function states its purpose without the use of comments and still uses variables that won't wear out your keyboard. This is the best of both worlds for programmer and reader.
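To show the prefixes in a slightly richer setting, here is a hypothetical ScaleValueArray (not from the original text) that reads one array and writes another. The oa prefix, an output array, is our own combination of the o and a prefixes from the list:

```c
/* Multiplies each input element by iFactor and stores the result in
   the output array.
     ic = input count, ia = input array, i = input value,
     oa = output array, n = array index.
   oaScaledValues must have room for at least icValues elements. */
void ScaleValueArray(int icValues, const int* iaValues, int iFactor,
                     int* oaScaledValues) {
    int nValue;

    for (nValue = 0; nValue < icValues; nValue++) {
        oaScaledValues[nValue] = iaValues[nValue] * iFactor;
    }
}
```

The prefixes tell the reader which pointer is read and which is written, but not how large the output buffer must be, so that requirement still earns a short comment.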