If you don't know much about cars and you take a look under the hood of an automobile, you will see a number of interconnected modules and parts designed to fit neatly into a restricted space. You may also observe that all the car's wheels are the same and that different car models may have identical or interchangeable parts.
Computer programs are a different kind of artifact, one whose structure is not readily available for inspection by the user. A user who could peek "under the hood" of a software system might find anything from an elegant design to a monstrous tangle.
One obvious way to reduce chaos and introduce some order into any complex project is to divide it into reasonably independent pieces. A large, well-structured software system typically comprises a number of modules that can be created and tested independently of each other. Each module implements a set of related data structures and functions. Modern programming languages and software development environments, including C++, provide support for this modular approach.
| Software modules can be compiled separately and then linked into one executable program. |
The benefits of modular software design, besides cleaner structure and more efficient implementation, include easier software maintenance and reusability. In a bird's-eye view of the project, each module exists to carry out a certain task. Once the interfaces between the modules are defined, each module can be implemented and even modified independently of the other modules. This property is called locality. A tested module can be used in other projects that present the same task. This property is called reusability. Since software's main costs are development and maintenance, not physical production, tested reusable code is essentially free.
Modularity is, of course, widely used in the physical world. Once the connections and the size of a dishwasher are standardized, a kitchen designer can proceed with the overall design with an "abstract" dishwasher in mind. The specific model will be installed later and can be replaced easily in case of a malfunction or upgrade.
In this chapter we discuss the more traditional approach to modularity in which each module implements a set of related functions. This will prepare us for later chapters where we will consider a more advanced concept of modularity associated with C++ classes.
As an example, let us consider a software application that processes banking transactions. In all likelihood this application will have to deal with dates. Even before knowing all the details, we can predict that this application will need functions that validate and compare dates and convert them into different formats. These functions can fit neatly into a separate module, which would deal only with a structure defined for representing dates.
Let us start by defining the DATE structure:
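The definition itself is not reproduced here. A minimal sketch, assuming three integer fields (the field names are illustrative, not necessarily those of the original listing), might look like this:

```cpp
// Hypothetical layout for the DATE structure; the field names are assumptions.
struct DATE {
    int month;  // 1 through 12
    int day;    // 1 through 31, depending on the month
    int year;   // full four-digit year, e.g. 1998
};
```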
We can then think of functions useful for the tasks at hand. For example:
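The original list of functions is not shown. As a sketch only, prototypes for validating, comparing, and printing dates could be written as follows; the names ValidDate, CompareDates, and PrintDate and their signatures are assumptions made for illustration.

```cpp
#include <iostream>

// Repeated here only so that the sketch compiles on its own.
struct DATE {
    int month;
    int day;
    int year;
};

// Hypothetical prototypes; the names and signatures are assumptions.

// Returns true if the month, day, and year form a real calendar date.
bool ValidDate(const DATE &date);

// Returns a negative value, zero, or a positive value if date1 is
// earlier than, the same as, or later than date2.
int CompareDates(const DATE &date1, const DATE &date2);

// Writes the date to the output stream in month/day/year form.
void PrintDate(std::ostream &out, const DATE &date);
```

Passing the structure by constant reference avoids copying it and documents that these functions do not modify their arguments.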
Other functions may be added later, as the application takes shape. Some reasonable functions may be included for completeness even if they are not immediately useful for this application, because the extra effort is minimal and they may be useful for testing or for a future application. The same or another member of the development team can implement the actual code.
In C++ the source code for each module is implemented in a separate file. The file that handles dates, for example, may be called dates.cpp. How would other modules "know" about the definitions and functions in dates.cpp? One approach would be to include the file's text into the main program by putting #include "dates.cpp" somewhere in the main program. This would be equivalent to copying the source code from dates.cpp into the main program, so the modules would be combined at the source code level. Any change to dates.cpp would require recompilation of all modules that use the date functions, and the object code for them would be repeated in every module that uses them. This approach would make it difficult to maintain the integrity of large software systems and would waste both space and compilation time.
A better approach, which is supported in C++ and other modular languages, is to compile the modules separately and then combine them into one executable program. The modules don't need to know the details of each other's implementation, but they do need to share some definitions and declarations. In C++ this is accomplished by means of header files. We have already used standard library modules and system header files provided with the compiler. Nothing prevents programmers from creating their own header files.
| Programmers create their own header files to let modules share definitions and declarations. |
C++ recognizes two forms of the #include directive. One form uses angle brackets and is used with system header files provided with the compiler; for example, #include <iostream>.
The other form uses double quotes instead of angle brackets. This form is intended for header files that you have written or that your organization has supplied; for example, #include "dates.h".
The difference between the two forms is in the order in which directories are searched for header files. The exact rules depend on the compiler, but typically the double quote form starts the search in the directory of the file that contains the directive (usually the directory that holds the programmer's source code) and then falls back to the system directories, whereas the angle bracket form searches only the compiler's system directories and any additional include directories configured for the project.
In our example, we can put the definition of the DATE structure and the function prototypes for the date functions into a separate header file. Following the convention, we name this file dates.h: the same name as the respective source module but with the extension ".h".
The code in dates.cpp may look as follows:
The header file is included into each module that needs to use the functions and structures defined in it, but the functions' actual code is not duplicated. Each module can be compiled separately.
A separate test program may be created for testing all the date functions.
A special program, the linker, is used to put the modules together. We will discuss the linker in Section 14.5.
Large projects may involve hierarchies of modules. One approach to good software architecture is to arrange functions in layers based on their "level of functionality" and the "level" of data structures that they deal with. For example, functions that deal with different types of transactions in a banking application may form a separate layer positioned above the layer that deals with dates. Each layer can be implemented as a separate module or several modules.
In an ideal architecture, each layer uses only functions and data structures from the layer immediately below it. Changes in the implementation of a module normally do not disturb other modules. In a "layered" design, even changes to the interface of a module do not propagate through the whole system but affect only the layer above it.
This kind of layered design creates a dilemma related to the use of header files. A higher layer may have its own header file which requires definitions from lower layers. For example, the module processing transactions, trans.cpp, may use its own header file, trans.h. Declarations and definitions in trans.h may require the DATE structure from dates.h, so the programmer may decide to include dates.h at the top of trans.h:
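The listing is not reproduced here. As a sketch, trans.h might look like the following, where the TRANSACTION structure and its fields are hypothetical. A stand-in definition of DATE replaces the #include "dates.h" line so that the fragment compiles on its own.

```cpp
// trans.h: a sketch of a higher-layer header (TRANSACTION is hypothetical).
// In a real project the next definition would come from:  #include "dates.h"
struct DATE { int month; int day; int year; };

struct TRANSACTION {
    DATE   date;    // requires the full definition of DATE from dates.h
    double amount;  // positive for deposits, negative for withdrawals
};
```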
It may be difficult, though, to keep track of which header files are included within other header files. Suppose another header file (say, accounts.h) also includes dates.h. Now suppose both trans.h and accounts.h are included into the same module. Then dates.h will be included twice and the compiler will generate error messages that structures and constants in dates.h are already defined.
To get around this problem, programmers often use conditional compilation preprocessor directives to eliminate duplicate inclusions of the same code. At the beginning of the header file, the programmer defines a preprocessor symbol that identifies that file. The symbol's name should be unlikely to clash with other names; a common choice is the file name in capital letters, such as DATES_H. (Names that begin with an underscore followed by a capital letter, or that contain a double underscore, are reserved for the compiler and its libraries and are best avoided.) The text of the header file is placed between the #ifndef-#endif preprocessor directives and is included only if the symbol is not yet defined; that is, only if that header file has not yet been included in this module. For example:
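The original listing is not shown, so the following is a sketch under assumptions: the guard symbol DATES_H and the fields of DATE are illustrative. To make the effect visible in a single file, the guarded text appears twice, which simulates the header being included twice in the same module.

```cpp
// dates.h, protected against duplicate inclusion.
// DATES_H is an assumed guard name; any name unique to the file will do.
#ifndef DATES_H
#define DATES_H

struct DATE {
    int month;
    int day;
    int year;
};

#endif  // DATES_H

// Simulated second inclusion of the same header in this module.
// DATES_H is already defined, so the preprocessor skips the text below
// and the compiler never sees a second definition of DATE.
#ifndef DATES_H
#define DATES_H

struct DATE {
    int month;
    int day;
    int year;
};

#endif  // DATES_H
```

Without the #ifndef-#endif pair, the second copy of the structure would be rejected as a redefinition.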
The same trick is used in system header files. Note that we have included iostream (iostream.h on older compilers) into dates.h because we needed the definition of the ostream type. Although both iostream and dates.h are included into testdate.cpp, the conditional compilation in iostream prevents duplicate definitions.
The process of combining different object modules into one executable module is called linking. In a nutshell, the groundwork for linking is laid by the compiler. When the compiler finds a function definition (not the prototype, but the actual code), it places the function's name and its address (relative to the beginning of the module) into a special table of global (also known as public) symbols. This function can be used from other modules. When the compiler finds the first call to a function that is not defined in the given module, it places the function's name in a special table of external symbols and reserves some logical external address for it, leaving that address temporarily undefined. This indicates that the function's code should be found in some other module. Each object module thus has a table of "globals" and a table of "externals."
(Actually, as we discussed in Section 13.5 of Part 1, C++ supports function overloading: functions with the same name but different sets or data types of arguments are considered by the compiler to be entirely different functions. Thus, a complete function signature, including its name and the types of all its arguments, is stored in the tables of globals and externals.)
| A special program, the linker, examines the set of object modules that you have specified in the linking command. It combines all the code contained in them and tries to resolve all external references by finding their names and addresses among the globals of other modules. The logical addresses of externals are then replaced with their actual addresses in the combined code, and the linker creates an executable file. |
The linker is provided to you together with other development tools.
In case of problems, the linker reports errors. One common error is "Unresolved external: YourFunction." This happens when none of the specified modules contains YourFunction's code. Another possible error is "Multiple definitions of YourFunction," which occurs when two or more modules define functions with the same signature. Note that many modules may contain the declaration of the function (i.e., the function prototype) but only one module may contain the actual code (the function's definition).
In large projects it is convenient to combine the object modules into one or more object module libraries using a utility program called the librarian. A librarian also helps to maintain a library and allows you to add, replace, and delete modules and list their globals and externals.
The linker is capable of searching specified libraries. It examines each library and picks only those modules that contain remaining unresolved externals.
C++ compilers all come with standard object module libraries containing standard functions. Libraries with various useful functions are also available from third-party vendors. The vendors can provide the documentation, the header files, and the object code while keeping their source code confidential - another advantage of integration at the object module level!
| In modern development environments, the process of linking is transparent to the user. The project maintenance facility usually allows programmers to specify which modules should be included into the current project; pressing a key or clicking on a menu item automatically compiles all the necessary modules and links them into an executable file. |
Several modules may not only use the same functions but also share global constants or variables.
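| A variable declared outside of any function is not only global in its own module, but is automatically considered global between modules; its name is included in the table of globals and passed to the linker. A constant declared outside of any function is, by default, visible only in its own module; it becomes global between modules only if it is declared with the extern keyword. |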
Other modules may gain access to that variable or constant by declaring it with the extern keyword:
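The original example is not reproduced. In the sketch below, the two modules are shown one after the other in a single file, and the names transactionCount and RecordTransaction, as well as the file names in the comments, are hypothetical.

```cpp
// ---- module that owns the variable (say, bank.cpp) ----
int transactionCount = 0;     // definition: storage is allocated here

// ---- another module (say, report.cpp) ----
extern int transactionCount;  // declaration only: refers to the variable above

void RecordTransaction()
{
    transactionCount++;       // updates the single shared variable
}
```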
A global variable cannot be defined (that is, declared without the extern modifier) in more than one module, because the linker will report multiple definitions of a global symbol. The same holds for a constant that has been made global with extern. On the other hand, extern declarations do not conflict with the actual definition:
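As a sketch (the names transactionCount and INTEREST_RATE and the file names in the comments are assumptions), the module that owns the symbols can see both the extern declarations and the definitions without any conflict:

```cpp
// Text that would come from a shared header (say, bank.h):
extern int transactionCount;        // declaration, visible to every module
extern const double INTEREST_RATE;  // declaration of a shared constant

// Text of the one module that owns them (say, bank.cpp), which also
// includes the header. The definitions agree with the declarations
// above, so the compiler accepts both.
int transactionCount = 0;
const double INTEREST_RATE = 0.05;
```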
So extern declarations may be placed in the header file, which is a good way to ensure consistency of external declarations between modules.
The extern modifier can also be used with function prototypes, but it is redundant, since a function is external by default unless it is explicitly hidden from other modules.
| Global variables are "considered harmful" even in one module and should be avoided at all costs between modules because they violate locality and make the project structure intractable. If used, they should be carefully documented. |
It is possible to declare a global variable or a function within one module but hide it from the other modules. This is done by using the keyword static in the declaration. For example:
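The original example is not shown. A minimal sketch, with hypothetical names callCount, Square, and SumOfSquares, could be:

```cpp
// Visible only inside this source file; other modules may reuse the names.
static int callCount = 0;

static int Square(int x)
{
    return x * x;
}

// Ordinary (external) function that other modules can call.
int SumOfSquares(int a, int b)
{
    callCount++;  // counts calls; accessible only within this module
    return Square(a) + Square(b);
}
```

Another module could define its own Square or callCount without causing a "multiple definitions" error, because the static versions never reach the linker.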
Static variables and functions are not placed into the table of globals and are not reported to the linker.
Static declarations belong in the source file; it wouldn't make much sense to place them in a header file. It is good practice always to use static for variables, constants and functions restricted to one module; this documents that they are used only in this module and allows other modules to use the same name without conflict.
C++ allows programmers to declare short functions as inline functions. An inline function looks like a normal function, but instead of generating a call, the compiler may insert a copy of the function's code wherever it encounters a call to the function. (The inline keyword is a request that the compiler is free to ignore.) Inline functions avoid the overhead associated with calling a function but can make the executable code bigger. They should be used only for very short functions.
An inline function's code must be visible to the compiler in every module that calls it, and all copies of the definition must be identical. If an inline function is to be accessible to many modules, it should therefore be defined in a header file. For example:
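The original example is not reproduced. As an illustration, a short helper of the kind that might be placed in dates.h is sketched below; IsLeapYear is a hypothetical name.

```cpp
// Text that would live in a header file (say, dates.h).
// IsLeapYear is a hypothetical helper, short enough to be worth inlining.
inline bool IsLeapYear(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}
```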
Inline functions may use prototypes but usually don't need them because their short code can be defined together with the declaration.
Modularity is essential for sound software design. Modular programs are easier to develop and test, especially for a team of programmers. They are also easier to understand and maintain because certain changes can be implemented locally and do not require extensive modifications or retesting of the entire application.
Modules should be designed, implemented, and documented with an eye to their possible future use in other projects. It is desirable to create reusable modules, isolating more general functions from more application-specific functions.
Each module is usually implemented in two separate files, a header file and a source file. The header file may contain constants, function prototypes, inline functions, definitions of data structures, and declarations of external variables. The source file contains function definitions (code) and static variables and functions. The header file is brought into the source file and into other modules with the #include directive.
The modules are compiled separately and linked together into one executable program by a linker program. Object modules may be combined into object module libraries by using a librarian utility program. The linker can search specified libraries for modules that supply remaining unresolved external references and include them into the executable file.