In C, C++, Java and many other programming languages, the function (also known as a method or subroutine) is the basic unit into which code is grouped. This article looks at how compilers analyse your code and work out what to actually call.
C and C++ compilers read your code from top to bottom, and only deal with a single compilation unit at any one time. That's just a fancy way of saying they deal with a single .c, .cc or .cpp file at once, and don't look at the contents of any other source files. This is an artifact of how the languages were designed (in the early 1970s for C, and from 1979 for C++), and the capabilities of the computers back then. Splitting code into individual compilation units also forces programmers to think about not only how code is divided between different files, but also how the code is exposed between different files.
Let's start with a basic example, where all the code for a program is in a single C++ source file; let's call it app.cc.
#include <math.h>
#include <iomanip>
#include <iostream>

float distance(float x1, float y1, float x2, float y2) {
    return sqrt(pow(x2 - x1, 2.0) + pow(y2 - y1, 2.0));
}

int main(int argc, char **argv) {
    std::cout << "Distance between points is " << std::setprecision(1)
              << std::fixed << distance(0.0, 0.0, 300.0, 218.0);
}
We can see here that our program is contained entirely within a single file, apart from standard library functions that we call such as sqrt() and pow(). We have two functions that we define, main() and distance(). But how does the C++ compiler know to call the function distance() when we refer to it inside main()? The answer is that we have already defined it, above the reference in main(). When it comes to parsing our C++ code, the compiler simply reads the code from top to bottom, building up a symbol table as it goes. When it reads the definition of distance(), that is, float distance(float x1, float y1, float x2, float y2), it maintains a reference back to that part of the parsed C++ code with the name 'distance'. When we subsequently refer to distance(0.0, 0.0, 300.0, 218.0), the compiler knows we already have a function defined named distance that takes four parameters, and will use the reference that it saved earlier.
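To make the top-to-bottom idea concrete, here is a toy model of that single pass. It is only a sketch and not how any real compiler is implemented: the Event type and the undeclared_calls() function are names invented for this illustration, and a "file" is reduced to a list of "define" and "call" events.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// One event in a heavily simplified source file: a function is either
// defined or called at this point. (Hypothetical type for illustration.)
struct Event {
    enum Kind { Define, Call } kind;
    std::string name;
};

// Walks the events once, from top to bottom, and returns the names of
// functions that were called before the pass had seen them.
std::vector<std::string> undeclared_calls(const std::vector<Event>& file) {
    std::set<std::string> symbols;       // the toy "symbol table"
    std::vector<std::string> errors;
    for (const Event& e : file) {
        if (e.kind == Event::Define) {
            symbols.insert(e.name);      // remember the name for later calls
        } else if (symbols.count(e.name) == 0) {
            errors.push_back(e.name);    // called before it was seen
        }
    }
    return errors;
}
```

Feeding it "define distance, then call distance" produces no errors, while the reverse order reports distance as unknown, because the pass never looks ahead.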
But what if we didn't want to put main() at the end of our file? You might re-write your code to look something like this:
#include <math.h>
#include <iomanip>
#include <iostream>

int main(int argc, char **argv) {
    std::cout << "Distance between points is " << std::setprecision(1)
              << std::fixed << distance(0.0, 0.0, 300.0, 218.0);
}

float distance(float x1, float y1, float x2, float y2) {
    return sqrt(pow(x2 - x1, 2.0) + pow(y2 - y1, 2.0));
}
However, unlike the earlier snippet of code, this code will not compile. You'll get an error something like this:
app.cc: In function ‘int main(int, char**)’:
app.cc:7: error: ‘distance’ was not declared in this scope
You might recall that I mentioned the C++ compiler reads the code from top to bottom, so when you refer to distance(...) before defining it, the compiler has no reference available to any code that it has parsed, and will simply give up. But there is a way to give the compiler a hint that a function does exist, even if you haven't defined it yet in the file.
There are two ways to tell a C/C++ compiler about a function. You can provide a definition, which includes the function's type signature and the code that actually makes up the function, or you can provide a declaration, which only includes the type signature as a hint to the compiler to say "I've defined this somewhere else, but it exists, honest". Let's make the code above able to compile again, and show the difference between a definition and a declaration.
#include <math.h>
#include <iomanip>
#include <iostream>

// This is a declaration of distance()
// It doesn't contain the code, just the type signature.
// Notice how it ends with a semicolon, and not curly braces.
float distance(float x1, float y1, float x2, float y2);

// This is a definition of main()
int main(int argc, char **argv) {
    std::cout << "Distance between points is " << std::setprecision(1)
              << std::fixed << distance(0.0, 0.0, 300.0, 218.0);
}

// This is a definition of distance()
float distance(float x1, float y1, float x2, float y2) {
    return sqrt(pow(x2 - x1, 2.0) + pow(y2 - y1, 2.0));
}
This code will compile successfully because we've already given the compiler a hint about distance() by including the declaration above main().
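Declarations are not only a matter of taste about where main() goes. When two functions call each other, no ordering of the definitions works on its own, so at least one declaration is required. The sketch below uses two hypothetical functions, is_even() and is_odd(), which are not part of the distance example.

```cpp
#include <cassert>

// Declaration only: tells the compiler that is_odd() exists
// before is_even() refers to it.
bool is_odd(unsigned n);

// Definition of is_even(). The call to is_odd() compiles because of
// the declaration above.
bool is_even(unsigned n) {
    return n == 0 ? true : is_odd(n - 1);
}

// Definition of is_odd(). is_even() has already been defined above,
// so no further declaration is needed.
bool is_odd(unsigned n) {
    return n == 0 ? false : is_even(n - 1);
}
```

Removing the first declaration gives the same "was not declared in this scope" style of error as before, whichever of the two definitions you put first.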
Let's say that we found the distance() routine fairly handy, and that we wanted to use it in a number of places. One way to re-use code is to move it into a separate file, and refer to the function in just the same way as we did above. Header files, like the ones we're already using above (math.h, iomanip and iostream), provide a set of declarations, exposing functions that we can call, despite not copying those functions into our own .cc file.
So, what do header files contain? They usually contain three things: a set of declarations, like the one we provided for distance(); an include guard to stop the same declarations from being processed multiple times; and optionally, #include lines for other headers.
Let's create our own header file for distance(), and call it distance.h.
#ifndef DISTANCE_H
#define DISTANCE_H

// This is a declaration of distance().
// distance() is a function that returns the Euclidean distance between
// two points, using Pythagoras' theorem.
extern float distance(float x1, float y1, float x2, float y2);

#endif // DISTANCE_H
The declaration of distance() is almost the same as the declaration that we had in app.cc, but you might notice the extern that was added at the start of the line. We'll discuss that later. Also, you might notice the #ifndef, #define and #endif lines surrounding the declaration. This is referred to as an include guard, and stops the same declaration from being processed more than once in a single compilation unit. It's good practice to have an include guard in your header files, typically named after the filename. Avoid names that contain a double underscore or start with an underscore followed by a capital letter, as those are reserved for the compiler and standard library.
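Repeating a function declaration is harmless, so the guard matters most once a header contains something that may only appear once per compilation unit, such as a type definition. The sketch below imitates a hypothetical header, point.h, being included twice by pasting its contents into one file two times. The names POINT_H, Point and sum_of_coordinates() are invented for this illustration.

```cpp
#include <cassert>

// --- first copy of the hypothetical header point.h ---
#ifndef POINT_H
#define POINT_H
struct Point {
    float x;
    float y;
};
#endif // POINT_H

// --- second copy, as if point.h had been included again ---
#ifndef POINT_H   // POINT_H is already defined, so everything up to #endif is skipped
#define POINT_H
struct Point {
    float x;
    float y;
};
#endif // POINT_H

// Helper for illustration: uses the single Point definition that survived.
float sum_of_coordinates(const Point& p) {
    return p.x + p.y;
}
```

Without the guard lines, the second struct Point would be a redefinition and the file would fail to compile.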
In addition to creating the header file, we need to split out the definition of distance() into another file so it can be compiled and linked into your program. We'll create another file, distance.cc, that looks like this:
#include <math.h>
#include "distance.h"

// This is a definition of distance()
float distance(float x1, float y1, float x2, float y2) {
    return sqrt(pow(x2 - x1, 2.0) + pow(y2 - y1, 2.0));
}
There are three important parts of this C++ file. Firstly, we #include the system header file math.h, which declares sqrt() and pow(). This is needed because we call these functions from distance(). Secondly, we include the header file with the declaration of distance(), the file we just created, distance.h. This isn't strictly necessary, but it is good practice: it lets the compiler check the declaration against the definition, which helps stop them falling out of sync. A C compiler will reject a definition that conflicts with the declaration. A C++ compiler will reject one that differs only in its return type, but if you change the parameter types it treats the definition as a new overload, and the mismatch only shows up later as a linker error. Finally, we've also put the definition of distance() into the C++ file. This file will be compiled by itself, as a single compilation unit, and the compiled code will be made available to the linker when it creates your application.
So, now that we've taken distance() into its own file, we need to remove it from app.cc. We need to remove the definition and declaration from app.cc, and replace them with a #include of the header file distance.h, which contains the declaration of distance(). The updated app.cc will look something like this:
#include <iomanip>
#include <iostream>
#include "distance.h"

int main(int argc, char **argv) {
    std::cout << "Distance between points is " << std::setprecision(1)
              << std::fixed << distance(0.0, 0.0, 300.0, 218.0);
}
app.cc no longer contains any declaration or definition of distance() of its own, but we can call it all the same. How does it work under the hood? When you use the #include pre-processor directive, the pre-processor literally includes the contents of that file in-place before the compiler tries to compile the C++ code. So, to the compiler, app.cc looks something like this:
#include <iomanip>
#include <iostream>

#ifndef DISTANCE_H
#define DISTANCE_H

// This is a declaration of distance().
// distance() is a function that returns the Euclidean distance between
// two points, using Pythagoras' theorem.
extern float distance(float x1, float y1, float x2, float y2);

#endif // DISTANCE_H

int main(int argc, char **argv) {
    std::cout << "Distance between points is " << std::setprecision(1)
              << std::fixed << distance(0.0, 0.0, 300.0, 218.0);
}
The lines from #ifndef to #endif are the code from distance.h that has been included verbatim, and the system headers iomanip and iostream will be similarly expanded by the pre-processor.
Earlier, we saw that we needed a definition of distance() in the same file for the compiler to know what to run when we call distance(). But now that the code has been split into three files, app.cc, distance.cc and distance.h, app.cc does not contain any definition of distance(), even after expanding the included header file. So, how does it all work?
When we moved the definition to a separate file, we also added the prefix extern to the declaration of distance(). Using extern tells the compiler that the definition might be in an external compilation unit (or, to put it another way, a different .cc file), and that the compiler shouldn't worry about trying to resolve it at compile time. For function declarations this is already the default, so the declaration would behave the same without extern; writing it out simply makes the intent explicit.
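The following sketch shows where extern does and does not make a difference. The function half() and the variable call_count are hypothetical names used only for this illustration. For a function, the declaration is the same with or without extern. For a variable, extern is what turns the line into a declaration rather than a definition.

```cpp
#include <cassert>

extern float half(float value);  // function declaration, with extern
float half(float value);         // the same declaration again, without extern

extern int call_count;           // variable declaration: no storage is reserved here

// Definition of half(). It can use call_count because the variable
// has been declared above, even though it is defined further down.
float half(float value) {
    ++call_count;
    return value / 2.0f;
}

int call_count = 0;              // the one definition of the variable
```

In a multi-file program, the two declarations at the top are the sort of lines that would live in a header, while the definitions would live in exactly one .cc file.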
Compilers run over your C and C++ code, translating one .cpp (or .cc) file at a time from human-readable code into machine code which can run natively on your processor. However, your code most likely depends on code that other people have written, in other .cpp or .cc files which are also compiled by themselves. How do we pull it all together into one coherent application? With a stage called linking, carried out by the linker. The linker takes all the loose ends from the compiler (the external symbols that the compiler couldn't resolve) and stitches together all the different compiled blobs of machine code, patching the loose ends together.
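As a rough mental model of that stitching step, the sketch below treats each compiled file as a pair of lists: the symbols it defines and the symbols it still needs. This is a deliberate simplification and not how a real linker such as ld works internally. The ObjectFile type and the unresolved_symbols() function are names invented for this illustration.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// A toy stand-in for a compiled object file: the symbols it defines,
// and the symbols it refers to but leaves for the linker to find.
struct ObjectFile {
    std::string name;
    std::set<std::string> defined;
    std::set<std::string> needed;
};

// Returns every needed symbol that none of the object files defines.
std::set<std::string> unresolved_symbols(const std::vector<ObjectFile>& objects) {
    // First pass: collect everything that is defined anywhere.
    std::set<std::string> defined;
    for (const ObjectFile& o : objects) {
        defined.insert(o.defined.begin(), o.defined.end());
    }
    // Second pass: report each needed symbol with no definition.
    std::set<std::string> missing;
    for (const ObjectFile& o : objects) {
        for (const std::string& symbol : o.needed) {
            if (defined.count(symbol) == 0) {
                missing.insert(symbol);
            }
        }
    }
    return missing;
}
```

Unlike the compiler's top-to-bottom pass, this model looks at every object file before deciding anything, which is why the order of definitions across files does not matter here.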
If we tried to compile and link app.cc by itself, we would get an error at the linker step, saying that it could not resolve the function distance. For example:
/tmp/ccFp1Ju6.o: In function `main':
app.cc:(.text+0x27): undefined reference to `distance(float, float, float, float)'
collect2: ld returned 1 exit status
To successfully build the application, we need to tell the compiler and linker to combine both app.cc and distance.cc into a single output file (with GCC, for example, g++ app.cc distance.cc -o app), and then the application can successfully be run:
Distance between points is 370.8