
Multithreading Tutorial


When you run two programs on an operating system that offers memory protection, as Windows and UNIX/LINUX do, the two programs are executed as separate processes which means they are given separate address spaces. This means that when program #1 modifies the address 0x800A 1234 in its memory space, program #2 does NOT see any change in the contents of its memory at address 0x800A 1234. With simpler operating systems that cannot accomplish this separation of processes, a faulty program can bring down not only itself but the other programs running on that computer (including the operating system itself).

The ability to execute more than one process at a time is known as multi-processing. A process consists of a program (usually called the application) whose statements are performed in an independent memory area. Each process has a program counter that remembers which statement should be executed next, a stack which holds the arguments passed to functions as well as the variables local to functions, and a heap which holds the remaining memory requirements of the program. The heap is used for the memory allocations that must persist longer than the lifetime of a single function. In the C language you use malloc to acquire memory from the heap and in C++ you use the new keyword.
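The difference between stack and heap lifetimes can be sketched in a few lines. This is only an illustration; the function names make_counter_c() and make_counter_cpp() are made up for the example. The local variable disappears when its function returns, while the heap allocation survives until the caller releases it.

```cpp
#include <cstdlib>

// C style: malloc() takes memory from the heap, so the int that p points to
// outlives this function. The variable 'local' lives on the stack and is
// gone as soon as the function returns.
int *make_counter_c()
{
    int local = 41;                                        // stack variable
    int *p = static_cast<int*>(std::malloc(sizeof(int)));  // heap allocation
    if (p != nullptr)
        *p = local + 1;                                    // copy a value to the heap
    return p;                                              // caller must call free()
}

// C++ style: the new keyword does the same job.
int *make_counter_cpp()
{
    return new int(42);                                    // caller must call delete
}
```

The caller owns both results: the first must be released with free() and the second with delete.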

Sometimes it is useful to arrange for two or more processes to work together to accomplish one goal. One situation where this is beneficial is where the computer's hardware offers multiple processors. In the old days this meant two sockets on the motherboard each populated with an expensive Xeon chip. Thanks to advances in VLSI, these two processor chips can now fit in a single package. Examples are Intel's "Core Duo" and AMD's "Athlon 64 X2". If you want to keep two microprocessors busy working on a single goal, you basically have two choices:

  1. design your program to use multiple processes (which usually means multiple programs), or
  2. design your program to use multiple threads.

So what's a thread? A thread is another mechanism for splitting the workload into separate execution streams. A thread is lighter weight than a process. This means it offers less flexibility than a full blown process, but can be initiated faster because there is less for the operating system to set up. What's missing? The separate address space is what is missing. When a program consists of two or more threads, all the threads share a single memory space. If one thread modifies the contents of address 0x800A 1234, then all the other threads immediately see a change in the contents of their address 0x800A 1234. Furthermore, all the threads share a single heap. If one thread allocates (via malloc or new) all of the memory available in the heap, then attempts at additional allocations by the other threads will fail.

But each thread is given its own stack. This means thread #1 can be calling FunctionWhichComputesALot() at the same time that thread #2 is calling FunctionWhichDrawsOnTheScreen(). Both of these functions were written in the same program. There is only one program. But there are independent threads of execution running through that program.
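The sharing rules just described can be sketched in a few lines. This is only an illustrative sketch: it uses the portable std::thread class from C++11 rather than the WIN32 calls discussed in the rest of this article, and the names worker() and run_two_workers() are made up for the example. Both threads write into one heap-allocated array (shared memory), while each keeps its own local variable on its own stack.

```cpp
#include <thread>
#include <vector>

// Hypothetical worker: 'local' lives on the calling thread's own stack,
// while 'shared' points into the single heap that all threads share.
void worker(int id, int *shared)
{
    int local = 0;                 // private to this thread's stack
    for (int i = 0; i < 1000; ++i)
        local += id;               // no other thread can see 'local'
    shared[id] = local;            // each thread writes only its own slot
}

// Starts two threads, waits for both, and returns what they left on the heap.
std::vector<int> run_two_workers()
{
    std::vector<int> results(3, 0);            // heap memory visible to every thread
    std::thread t1(worker, 1, results.data());
    std::thread t2(worker, 2, results.data());
    t1.join();                                 // wait for both threads to finish
    t2.join();
    return results;
}
```

Because each thread writes a different element of the array, this particular sketch needs no synchronization. As soon as two threads touch the same location, that is no longer true.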

What's the advantage? Well, if your computer's hardware offers two processors, then two threads can run simultaneously. And even on a uni-processor, multi-threading can offer an advantage. Most programs can't perform very many statements before they need to access the hard disk. This is a very slow operation and hence the operating system puts the program to sleep during the wait. In fact, the operating system assigns the computer's hardware resources to somebody else's program during the wait. But if you have written a multi-threaded program, then when one of your threads stalls, your other threads can continue.

The Jaeschke Magazine Articles

One good way to learn any new programming concept is to study other people's code. You can find source code in magazine articles and posted on the Internet at sites such as codeproject.com. I came across some good examples of multi-threaded programs in two articles written for the C/C++ Users Journal by Rex Jaeschke. In the October 2005 issue Jaeschke wrote an article entitled "C++/CLI Threading: Part 1" and in the November 2005 issue he wrote his follow-up article entitled "C++/CLI Threading: Part 2". Unfortunately the C/C++ Users Journal magazine folded shortly after these articles appeared. But the original articles and Jaeschke's source code are still available at the following web sites:

Part 1: http://www.ddj.com/dept/windows/184402018
Part 2: http://www.ddj.com/dept/windows/184402029

You'll notice that the content from the defunct C/C++ Users Journal has been integrated into the Dr. Dobb's Portal web site, which is associated with Dr. Dobb's Journal, another excellent programming magazine.

You might not be familiar with the notation C++/CLI. This stands for "C++ Common Language Infrastructure" and is a Microsoft invention. You're probably familiar with Java and C#, which are two languages that offer managed code where the run-time environment (through its garbage collector) rather than the programmer is responsible for deallocating all memory allocations made from the heap. C++/CLI is Microsoft's proposal to add managed code to the C++ language.

I am not a fan of this approach so I wasn't very interested in Jaeschke's original source code. I am sure Java and C# are going to hang around but C++/CLI attempts to add so many new notations (and concepts) on top of C++, which is already a very complicated language, that I think this language will disappear.

But I still read the original C/C++ Users Journal article and thought Jaeschke had selected good examples of multi-threading. I especially liked how his example programs were short and yet displayed data corruption when run without the synchronization methods that are required for successful communication between threads. So I sat down and rewrote his programs in standard C++. This is what I am sharing with you now. The source code I present could also be written in standard C. In fact, that's easier than accomplishing it in C++ for a reason we will get to in just a minute.

This is probably the right time to read Jaeschke's original articles, since I don't plan to repeat his great explanations of multitasking, reentrancy, atomicity, etc. For example, I don't plan to explain how a program is given its first thread automatically and all additional threads must be created by explicit actions by the program (oops). The URLs where you can find Jaeschke's two articles are given above.

Creating Threads Under Windows

When this article was written, the C++ language had not standardized a method for creating threads (std::thread arrived later, in C++11). Therefore various compiler vendors invented their own solutions. If you are writing a program to run under Windows then you will want to use the WIN32 API, usually by way of Microsoft's C run-time library, to create your threads. This is what I will demonstrate. The C run-time library offers the following function to create a new thread:

uintptr_t _beginthread(
    void( __cdecl *start_address )( void * ),
    unsigned stack_size,
    void *arglist
);

This function signature might look intimidating, but using it is easy. The _beginthread() function takes 3 passed parameters. The first is the name of the function which you want the new thread to begin executing. This is called the thread's entry-point-function. You get to write this function, and the only requirements are that it take a single passed parameter (of type void*) and that it returns nothing. That is what is meant by the function signature:

    void( __cdecl *start_address )( void * ),

The second passed parameter to the _beginthread() function is a requested stack size for the new thread (remember, each thread gets its own stack). However I always set this parameter to 0 which forces the Windows operating system to select the stack size for me, and I haven't had any problems with this approach. The final passed parameter to the _beginthread() function is the single parameter you want passed to the entry-point-function. This will be made clear by the following example program.

#include <stdio.h>
#include <windows.h>
#include <process.h>         // needed for _beginthread()

void  silly( void * );       // function prototype

int main()
{
    // Our program's first thread starts in the main() function.

    printf( "Now in the main() function.\n" );

    // Let's now create our second thread and ask it to start
    // in the silly() function.

    _beginthread( silly, 0, (void*)12 );

    // From here on there are two separate threads executing
    // our one program.

    // This main thread can call the silly() function if it wants to.

    silly( (void*)-5 );

    Sleep( 100 );            // give the second thread time to run

    return 0;
}

void  silly( void *arg )
{
    // Cast through INT_PTR so the pointer-sized value matches %d.
    printf( "The silly() function was passed %d\n", (int)(INT_PTR)arg ) ;
}

Go ahead and compile this program. Simply request a WIN32 Console Program from Visual C++ .NET 2003's New Project Wizard and then "Add a New item" which is a C++ source file (.CPP file) in which you place the statements I have shown. I am providing Visual C++ .NET 2003 workspaces for Jaeschke's (modified) programs but you need to know the key to starting a multi-threaded program from scratch: you must remember to perform one modification to the default project properties that the New Project Wizard gives you. Namely:

  1. Open up the Project Properties dialog (select "Project" from the main Visual C++ menu and then select "Properties").
  2. In the left hand column of this dialog you will see a tree view control named "Configuration Properties" with the main sub-nodes labeled "C/C++", "Linker", etc. Double-click on the "C/C++" node to open this entry up. Then click on "Code Generation".
  3. In the right hand area of the Project Properties dialog you will now see listed "Runtime Library". This defaults to "Single Threaded Debug (/MLd)". [The notation /MLd indicates that this choice can be accomplished from the compiler command line using the /MLd switch.] Click on this entry to observe a drop-down list control where you must select "Multi-threaded Debug (/MTd)".

If you forget to do this, your program won't compile and the error message will complain about the _beginthread() identifier.

A very interesting thing happens if you comment out the call to the Sleep() function seen in this example program. Without the Sleep() statement the program's output will probably only show a single call to the silly() function with the passed argument -5. This is because the program's process terminates as soon as the main thread returns from the main() function (the C run-time library then calls exit(), which ends the whole process) and this may occur before the operating system has had the opportunity to run the other thread for this process. This is one of the discrepancies from what Jaeschke says concerning C++/CLI. Evidently in C++/CLI each thread has an independent lifetime and the overall process (which is the container for all the threads) persists until the last thread has decided to die. Not so for straight C++ WIN32 programs: the process dies when the primary thread (the one that started in the main function) returns from main(). The death of the process means the death of all the other threads. Note also that Sleep( 100 ) only makes it likely, not certain, that the second thread finishes in time.

Using a C++ Member Function as the Thread's Entry-Point-Function

The example program I just listed really isn't a C++ program because it doesn't use any classes. It is just a C language program. The WIN32 API was really designed for the C language and when you employ it with C++ programs you sometimes run into difficulties, such as this one: "How can I employ a class member function (a.k.a. an instance function) as the thread's entry-point-function?"

If you are rusty on your C++, let me remind you of the problem. Every C++ member function has a hidden first passed parameter known as the this parameter. Via the this parameter, the function knows which instance of the class to operate upon. Because you never see these this parameters, it is easy to forget they exist.

Now let's again consider the _beginthread() function which allows us to specify an arbitrary entry-point-function for our new thread. This entry-point-function must accept a single void* passed parameter. Aye, there's the rub. The function signature required by _beginthread() does not allow the hidden this parameter and hence a C++ member function cannot be directly activated by _beginthread().

We would be in a bind were it not for the facts that C and C++ are incredibly expressive languages (famously allowing you the freedom to shoot yourself in the foot) and the additional fact that _beginthread() does allow us to specify an arbitrary passed parameter to the entry-point-function. So we use a two-step procedure to accomplish our goal: we ask _beginthread() to employ a static class member function (which, unlike an instance function, lacks the hidden this parameter) and we send this static class function the hidden this pointer as a void*. The static class function knows to convert the void* parameter to a pointer to a class instance. Voila! We now know which instance of the class should call the real entry-point-function and this call completes the 2 step process. The relevant code (from Jaeschke's modified Part 1 Listing 1 program) is shown below:

class ThreadX
{
public:

  // In C++ you must employ a free (C) function or a static
  // class member function as the thread entry-point-function.

  static unsigned __stdcall ThreadStaticEntryPoint(void * pThis)
  {
      ThreadX * pthX = (ThreadX*)pThis;   // the tricky cast
      pthX->ThreadEntryPoint();           // now call the true entry-point-function

      // A thread terminates automatically if it completes execution,
      // or it can terminate itself with a call to _endthreadex().

      return 1;          // the thread exit code
  }

  void ThreadEntryPoint()
  {
     // This is the desired entry-point-function but to get
     // here we have to use a 2 step procedure involving
     // the ThreadStaticEntryPoint() function.
  }
};
Then in the main() function we get the two step process started as shown below:

    hth1 = (HANDLE)_beginthreadex( NULL,         // security
                                   0,            // stack size
                                   ThreadX::ThreadStaticEntryPoint,  // entry-point-function
                                   o1,           // arg list holding the "this" pointer
                                   CREATE_SUSPENDED,  // so we can later call ResumeThread()
                                   &uiThread1ID );

Notice that I am using _beginthreadex() rather than _beginthread() to create my thread. The "ex" stands for "extended" which means this version offers additional capability not available with _beginthread(). This is typical of Microsoft's APIs: when shortcomings were identified, more powerful augmented versions were introduced. One of these new extended capabilities is that the _beginthreadex() function allows me to create but not actually start my thread. I elect this choice merely so that my program better matches Jaeschke's C++/CLI code. Furthermore, _beginthreadex() allows the entry-point-function to return an unsigned value and this is handy for reporting status back to the thread creator. The thread's creator can access this status by calling GetExitCodeThread(). This is all demonstrated in the "Part 1 Listing 1" program I provide (the name comes from Jaeschke's magazine article).
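For readers who want to try the two-step trick without a Windows compiler, here is a minimal portable sketch of the same idea. It is not the code from my example programs: it uses C++11's std::thread in place of _beginthreadex(), drops the Windows-specific __stdcall, and the names Counter, EntryPointFn and run_on_thread() are invented for the illustration. The point is only that a static member function fits a C-style function pointer and can recover the instance from a void*.

```cpp
#include <thread>

// Hypothetical class following the two-step pattern.
class Counter
{
public:
    Counter() : total(0) {}

    // Step 1: a static member function has no hidden "this" parameter,
    // so it matches a plain C-style entry-point signature.
    static unsigned StaticEntryPoint(void *pThis)
    {
        Counter *self = static_cast<Counter*>(pThis);  // recover the instance
        self->EntryPoint();                            // step 2: the real work
        return 1;                                      // the thread exit code
    }

    // The member function we actually wanted the thread to run.
    void EntryPoint()
    {
        for (int i = 1; i <= 10; ++i)
            total += i;
    }

    int total;
};

// C-style function pointer type, standing in for the kind of pointer that
// _beginthreadex() expects (an assumption for illustration only).
typedef unsigned (*EntryPointFn)(void *);

// Runs obj.EntryPoint() on a new thread by way of the static function.
int run_on_thread(Counter &obj)
{
    EntryPointFn entry = &Counter::StaticEntryPoint;
    std::thread t(entry, static_cast<void*>(&obj));    // pass "this" as a void*
    t.join();                                          // wait before reading total
    return obj.total;
}
```

Taking the address of Counter::EntryPoint instead would not compile here, because a pointer to a non-static member function is a different type from EntryPointFn.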

At the end of the main() function you will see some statements which have no counterpart in Jaeschke's original program. This is because in C++/CLI the process continues until the last thread exits. That is, the threads have independent lifetimes. Hence Jaeschke's original code was designed to show that the primary thread could exit and not influence the other threads. However in C++ the process terminates when the primary thread exits and when the process terminates all its threads are then terminated. We force the primary thread (the thread that starts in the main() function) to wait upon the other two threads via the following statements:

    WaitForSingleObject( hth1, INFINITE );
    WaitForSingleObject( hth2, INFINITE );

If you comment out these waits, the non-primary threads will probably never get a chance to run because the process will die when the primary thread reaches the end of the main() function.
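The same "wait for the workers before leaving" idea can be sketched portably. This is an assumption-laden stand-in, not the WIN32 code: std::thread::join() from C++11 plays the role of WaitForSingleObject( handle, INFINITE ), and the names worker_task() and start_and_wait() are made up.

```cpp
#include <atomic>
#include <thread>

std::atomic<int> finished(0);      // how many workers ran to completion

// Hypothetical worker: records that it ran.
void worker_task()
{
    ++finished;
}

// Starts two workers and blocks until both are done, much as main() does
// with its two WaitForSingleObject() calls.
int start_and_wait()
{
    finished = 0;
    std::thread t1(worker_task);
    std::thread t2(worker_task);
    t1.join();                     // analogous to WaitForSingleObject( hth1, INFINITE )
    t2.join();                     // analogous to WaitForSingleObject( hth2, INFINITE )
    return finished.load();        // both workers have finished by now
}
```

One difference worth knowing: standard C++ insists on the wait (or an explicit detach), whereas WIN32 lets you skip it and silently lose the threads.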

Synchronization Between Threads

In the Part 1 Listing 1 program, the multiple threads don't interact with one another, and hence they cannot corrupt each other's data. The point of the Part 1 Listing 2 program is to demonstrate how this corruption comes about. This type of corruption is very difficult to debug and this makes multi-threaded programs very time consuming to develop if you don't design them correctly. The key is to provide synchronization whenever shared data is accessed (either written or read).

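A synchronization object is an object that threads use to coordinate with one another. Most of them are objects whose handle can be specified in one of the WIN32 wait functions such as WaitForSingleObject(); the critical section, which has its own enter and leave functions, is the exception. The synchronization objects provided by WIN32 are: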

An event notifies one or more waiting threads that an event has occurred.

A mutex can be owned by only one thread at a time, enabling threads to coordinate mutually exclusive access to a shared resource (hence the name). The state of a mutex object is set to signaled when it is not owned by any thread, and to nonsignaled when it is owned by a thread.

Critical section objects provide synchronization similar to that provided by mutex objects, except that critical section objects can be used only by the threads of a single process (hence they are lighter weight than a mutex). Like a mutex object, a critical section object can be owned by only one thread at a time, which makes it useful for protecting a shared resource from simultaneous access. There is no guarantee about the order in which threads will obtain ownership of the critical section; however, the operating system will be fair to all threads. Another difference between a mutex and a critical section is that if the critical section object is currently owned by another thread, EnterCriticalSection() waits indefinitely for ownership whereas WaitForSingleObject(), which is used with a mutex, allows you to specify a timeout.

A semaphore maintains a count between zero and some maximum value, limiting the number of threads that are simultaneously accessing a shared resource.

A waitable timer notifies one or more waiting threads that a specified time has arrived.

This Part 1 Listing 2 program demonstrates the Critical Section synchronization object. Take a look at the source code now. Note that in the main() function we create 2 threads and ask them both to employ the same entry-point-function, namely the function called StartUp(). However, because the two object instances (o1 and o2) have different values for the isMover class data member, the two threads act completely differently from each other. Because in one case isMover = true and in the other case isMover = false, one of the threads continually changes the Point object's x and y values while the other thread merely displays these values. But this is enough interaction that the program will display a bug if used without synchronization.

Compile and run the program as I provide it to see the problem. Occasionally the print out of x and y values will show a discrepancy between the x and y values. When this happens the x value will be 1 larger than the y value. This happens because the thread that updates x and y was interrupted by the thread that displays the values between the moments when the x value was incremented and when the y value was incremented.

Now go to the top of the Main.cpp file and find the following statement:


Uncomment this statement (that is, remove the double slashes). Then re-compile and re-run the program. It now works perfectly. This one change activates all of the critical section statements in the program. I could have just as well used a mutex or a semaphore but the critical section is the most light-weight (hence fastest) synchronization object offered by Windows.
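The idea behind the fix can be sketched portably as follows. This is not the Part 1 Listing 2 source: it swaps the WIN32 critical section for C++11's std::mutex, and the names Point, move_point(), count_mismatches() and run_point_demo() are assumptions made for the illustration. The mover changes x and y only while holding the lock, and the reader looks at them only while holding the same lock, so it can never catch them out of step.

```cpp
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the Point object shared by the two threads.
struct Point
{
    int x = 0;
    int y = 0;
    std::mutex guard;      // plays the role of the critical section
};

// The "mover": increments x and y together while holding the lock.
void move_point(Point &p, int steps)
{
    for (int i = 0; i < steps; ++i)
    {
        std::lock_guard<std::mutex> lock(p.guard);   // enter the critical section
        ++p.x;
        ++p.y;
    }                                                // lock released on each pass
}

// The "reader": counts how often it catches x and y out of step.
int count_mismatches(Point &p, int reads)
{
    int mismatches = 0;
    for (int i = 0; i < reads; ++i)
    {
        std::lock_guard<std::mutex> lock(p.guard);
        if (p.x != p.y)
            ++mismatches;
    }
    return mismatches;
}

// Runs mover and reader concurrently and reports the mismatch count.
int run_point_demo()
{
    Point p;
    int mismatches = -1;
    std::thread mover(move_point, std::ref(p), 100000);
    std::thread reader([&] { mismatches = count_mismatches(p, 100000); });
    mover.join();
    reader.join();
    return mismatches;     // 0 as long as both sides take the lock
}
```

Removing either lock_guard line reintroduces the race, although on any given run the reader may or may not happen to observe it.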

The Producer/Consumer Paradigm

One of the most common uses for a multi-threaded architecture is the familiar producer/consumer situation where there is one activity to create packets of stuff and another activity to receive and process those packets. The next example program comes from Jaeschke's Part 2 Listing 1 program. An instance of the CreateMessages class acts as the producer and an instance of the ProcessMessages class acts as the consumer. The producer creates exactly 5 messages and then commits suicide. The consumer is designed to live indefinitely, until commanded to die. The primary thread waits for the producer thread to die and then commands the consumer thread to die.

The program has a single instance of the MessageBuffer class and this one instance is shared by both the producer and consumer threads. Via synchronization statements, this program guarantees that the consumer thread can't process the contents of the message buffer until the producer thread has put something there, and that the producer thread can't put another message there until the previous one has been consumed.
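The handshake described in the previous paragraph can be sketched portably. This is a simplified illustration, not the Part 2 Listing 1 source: it uses C++11's std::mutex and std::condition_variable instead of WIN32 objects, the consumer simply stops after 5 messages rather than waiting for a command to die, and the names MessageSlot and run_producer_consumer() are made up.

```cpp
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical one-slot buffer shared by the producer and the consumer.
class MessageSlot
{
public:
    // Blocks until the previous message has been consumed, then stores msg.
    void put(const std::string &msg)
    {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return !full; });
        text = msg;
        full = true;
        cv.notify_all();
    }

    // Blocks until a message is present, then removes and returns it.
    std::string take()
    {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return full; });
        std::string msg = text;
        full = false;
        cv.notify_all();
        return msg;
    }

private:
    std::mutex m;
    std::condition_variable cv;
    std::string text;
    bool full = false;
};

// The producer sends 5 messages; the consumer collects them in order.
std::vector<std::string> run_producer_consumer()
{
    MessageSlot slot;
    std::vector<std::string> received;

    std::thread producer([&] {
        for (int i = 1; i <= 5; ++i)
            slot.put("message " + std::to_string(i));
    });
    std::thread consumer([&] {
        for (int i = 0; i < 5; ++i)
            received.push_back(slot.take());
    });

    producer.join();
    consumer.join();
    return received;
}
```

Because the whole message is copied while the lock is held, the consumer can never see a half-written buffer.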

Since my Part 1 Listing 2 program demonstrates the critical section, I elected to employ a mutex in this Part 2 Listing 1 program. As with the Part 1 Listing 2 example program, if you simply compile and run the Part 2 Listing 1 program as I provide it, you will see that it has a bug. Whereas the producer creates the 5 following messages:

the consumer receives the 5 following messages:

There is clearly a synchronization problem: the consumer is getting access to the message buffer as soon as the producer has updated the first character of the new message. But the rest of the message buffer has not yet been updated.

Now go to the top of the Main.cpp file and find the following statement:


Uncomment this statement (that is, remove the double slashes). Then re-compile and re-run the program. It now works perfectly.

Between the English explanation in Jaeschke's original magazine article and all the comments I have put in my C++ source code, you should be able to follow the flow. The final comment I will make is that the GetExitCodeThread() function returns the special value 259 when the thread is still alive (and hence hasn't really exited). You can find the definition for this value, STILL_ACTIVE, in the WinBase header file:

#define STILL_ACTIVE      STATUS_PENDING

where you find STATUS_PENDING defined in the WinNT.h header file:

#define STATUS_PENDING    ((DWORD   )0x00000103L)    

Note that 0x00000103 = 259.

Thread Local Storage

Jaeschke's Part 2 Listing 3 program demonstrates thread local storage. Thread local storage is memory of which each thread gets its own private copy. At the start of this article I said that an operating system could initiate a new thread faster than it could initiate a new process because all threads share the same memory space (including the heap) and hence there is less that the operating system needs to set up when creating a new thread. But here is the exception to that rule. When you request thread local storage you are asking the operating system to give every thread a separate instance of certain variables, so that a change made by one thread is not seen by the others. (This is a convenience rather than a protection mechanism: the threads still share one address space, so a thread that is handed a pointer to another thread's copy can still reach it.)

In Microsoft's compilers, the storage-class attribute which declares that a variable should employ thread local storage is __declspec(thread).
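A minimal portable sketch of the idea follows. It uses the thread_local keyword from C++11 as a stand-in for __declspec(thread), together with std::thread instead of the WIN32 calls; the names tls_counter, bump() and run_tls_demo() are made up for the illustration.

```cpp
#include <thread>

// Each thread gets its own copy of this variable.
thread_local int tls_counter = 0;

// Bumps the calling thread's private copy n times and reports the result.
void bump(int n, int *out)
{
    for (int i = 0; i < n; ++i)
        ++tls_counter;             // no lock needed: the copy is per-thread
    *out = tls_counter;
}

// Returns true if each thread saw only its own increments.
bool run_tls_demo()
{
    int before = tls_counter;      // the calling thread's own copy
    int a = 0, b = 0;
    std::thread t1(bump, 3, &a);
    std::thread t2(bump, 7, &b);
    t1.join();
    t2.join();
    // Neither worker disturbed the other's copy or the caller's copy.
    return a == 3 && b == 7 && tls_counter == before;
}
```

Had tls_counter been an ordinary global, the two threads would have been incrementing the same variable and the results would depend on timing.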

As with my other example programs, this one will display an obvious synchronization problem if you compile and run it unchanged. After you have seen the problem go to the top of the Main.cpp file and find the following statement:


Uncomment this statement (that is, remove the double slashes). Then re-compile and re-run the program. It now works perfectly.

Atomicity

Jaeschke's Part 2 Listing 4 program demonstrates the problem of atomicity, which is the situation where an operation will fail if it is interrupted mid-way through. This usage of the word "atomic" relates back to the time when an atom was believed to be the smallest particle of matter and hence something that couldn't be further split. On a single processor an individual machine instruction cannot be interrupted half-way through (although on multi-processor hardware even a single read-modify-write instruction is not atomic with respect to the other processors unless it is locked). A high-level C or C++ statement, however, usually compiles into several instructions, and a thread can be interrupted between any two of them. Whereas you might consider an update to a 64 bit variable to be an atomic operation, it actually isn't on 32 bit hardware. Microsoft's WIN32 API offers the InterlockedIncrement() function as the solution for this type of atomicity problem.

This example program could be rewritten to employ 64 bit integers (the LONGLONG data type) and the InterlockedIncrement64() function if it only needed to run under Windows 2003 Server. But, alas, Windows XP does not support InterlockedIncrement64(). Hence I was originally worried that I wouldn't be able to demonstrate an atomicity bug in a Windows XP program that dealt only with 32 bit integers. But, curiously, such a bug can be demonstrated as long as we employ the Debug mode settings in the Visual C++ .NET 2003 compiler rather than the Release mode settings. Therefore, you will notice that unlike the other example programs inside the .ZIP file that I distribute, this one is set for a Debug configuration.

As with my other example programs, this one will display an obvious synchronization problem if you compile and run it unchanged. After you have seen the problem go to the top of the Main.cpp file and find the following statement:

static bool interlocked = false;    // change this to fix the problem

Change false to true and then re-compile and re-run the program. It now works perfectly because it is now employing InterlockedIncrement().
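The effect of an interlocked increment can be sketched portably. This is an illustration under stated assumptions, not the Part 2 Listing 4 source: C++11's std::atomic stands in for InterlockedIncrement(), std::thread stands in for the WIN32 thread calls, and the names hits, add_hits() and run_atomic_demo() are made up.

```cpp
#include <atomic>
#include <thread>

// Shared counter; the atomic increment plays the role of InterlockedIncrement().
std::atomic<long> hits(0);

// Adds n to the shared counter, one atomic increment at a time.
void add_hits(int n)
{
    for (int i = 0; i < n; ++i)
        ++hits;                    // indivisible read-modify-write
}

// Two threads increment the same counter; no increments are lost.
long run_atomic_demo()
{
    hits = 0;
    std::thread t1(add_hits, 100000);
    std::thread t2(add_hits, 100000);
    t1.join();
    t2.join();
    return hits.load();
}
```

With a plain long in place of std::atomic<long>, the two threads could overwrite each other's updates and the total would sometimes come up short.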

The Example Programs

In order that other C++ programmers can experiment with these multithreaded examples, I make available a .ZIP file holding five Visual C++ .NET 2003 workspaces for the Part 1 Listing 1, Part 1 Listing 2, Part 2 Listing 1, Part 2 Listing 3, and Part 2 Listing 4 programs from Jaeschke's original article (now translated to C++) here. Enjoy!
