An Introduction To Pthreads-Tcl

This document describes changes to the Tool Control Language (Tcl) that enable it to be used in conjunction with POSIX Threads (pthreads).

1 Introduction

Tcl was originally designed for use in single-threaded programs. Recently, however, there has been an increasing need to use the power of Tcl in applications that contain multiple threads of control. Unfortunately, stock Tcl will not function properly in a multi-threaded application if more than one thread uses it at the same time.

There have been prior attempts to address this issue. Steve Jankowski created a modified version of the Tcl sources called MTtcl which allows the use of Tcl in a multi-threaded environment. But Jankowski's implementation has limitations of its own:

This article describes a new implementation of multi-threaded Tcl that is based on POSIX threads and works with Tcl version 7.6. We are currently calling the new implementation ``Pthreads-Tcl'' or ``PtTcl'' for short. (Suggestions for better names are welcome.) PtTcl borrows some of Jankowski's ideas, but is a completely new implementation. The latest sources for PtTcl can be obtained from

http://www.hwaci.com/sw/pttcl/pttcl.tar.gz

2 Threading Model

PtTcl allows an application to have multiple Tcl interpreters running in independent threads. Each thread in a PtTcl program can contain any number of interpreters (including zero), but each interpreter can only be used from a single thread. If any other thread tries to use that interpreter, an error message is returned.
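
To make the one-thread-per-interpreter rule concrete, here is a minimal sketch in C of how such a check could work, assuming each interpreter records the pthread_t of the thread that created it. The SketchInterp type and the sketch_* helpers are hypothetical illustrations, not PtTcl's actual data structures.

```c
#include <pthread.h>

/* Hypothetical interpreter record; only the owner field matters here. */
typedef struct {
    pthread_t owner;              /* thread that created the interpreter */
} SketchInterp;

/* Record the calling thread as the interpreter's owner. */
void sketch_interp_init(SketchInterp *interp) {
    interp->owner = pthread_self();
}

/* Return 0 if the caller owns the interpreter, -1 otherwise
 * (the point where an error would be reported). */
int sketch_interp_check_thread(const SketchInterp *interp) {
    return pthread_equal(interp->owner, pthread_self()) ? 0 : -1;
}

/* Demo support: run the ownership check from a brand-new thread. */
typedef struct {
    const SketchInterp *interp;
    int result;
} CheckJob;

static void *run_check(void *arg) {
    CheckJob *job = arg;
    job->result = sketch_interp_check_thread(job->interp);
    return NULL;
}

int sketch_check_from_other_thread(const SketchInterp *interp) {
    CheckJob job = { interp, 0 };
    pthread_t tid;
    if (pthread_create(&tid, NULL, run_check, &job) != 0)
        return -2;                /* could not start the demo thread */
    pthread_join(tid, NULL);
    return job.result;
}
```

pthread_equal() is used because pthread_t values cannot portably be compared with ==.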

In regular Tcl, a single event queue is used to process all timer and file events. PtTcl extends this concept to one independent event queue per thread. Per-thread queues are a necessary consequence of the restriction that a Tcl interpreter must always run in the same thread. Recall that the usual action taken when an event arrives is to run a Tcl script in some interpreter. If an event arrived in the event queue of thread A, there would be no way for that queue to execute a Tcl script in an interpreter belonging to thread B. Hence each thread must have its own event queue in order to invoke Tcl scripts in response to events.

Each thread has the concept of a main interpreter. The main interpreter differs from the other interpreters in the same thread in only one way: it is the interpreter to which messages can be sent. Other than that, all interpreters are the same.

Messages sent to other threads are a kind of event, so a Tcl interpreter running in a given thread will not process any messages until it visits its event loop. A Tcl interpreter visits its event loop whenever it executes one of the commands vwait or update. Tcl might also visit the event loop in response to two commands that are new to PtTcl: thread eventloop and thread send.
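
The following minimal sketch illustrates the per-thread queue idea under stated assumptions: any thread may post a message, but scripts run only when the owning thread drains its queue, which stands in for a visit to the event loop. EventQueue, queue_post(), queue_process_pending() and count_handler() are hypothetical names, and the fixed 64-byte payload is a simplification; this is not PtTcl's actual implementation.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* One pending message; the fixed-size payload keeps the sketch short. */
typedef struct Msg {
    struct Msg *next;
    char script[64];
} Msg;

/* Hypothetical per-thread queue: any thread may post to it, but only the
 * owning thread drains it. */
typedef struct {
    pthread_mutex_t lock;
    Msg *head, *tail;             /* FIFO of pending messages */
} EventQueue;

void queue_init(EventQueue *q) {
    pthread_mutex_init(&q->lock, NULL);
    q->head = q->tail = NULL;
}

/* Enqueue a script. Nothing runs yet; returns -1 if out of memory. */
int queue_post(EventQueue *q, const char *script) {
    Msg *m = malloc(sizeof *m);
    if (m == NULL)
        return -1;
    snprintf(m->script, sizeof m->script, "%s", script);
    m->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail != NULL)
        q->tail->next = m;
    else
        q->head = m;
    q->tail = m;
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Called by the owning thread when it visits its event loop: runs every
 * message queued so far and returns how many were handled. */
int queue_process_pending(EventQueue *q, void (*handler)(const char *)) {
    pthread_mutex_lock(&q->lock);
    Msg *m = q->head;             /* take the whole list under the lock */
    q->head = q->tail = NULL;
    pthread_mutex_unlock(&q->lock);
    int n = 0;
    while (m != NULL) {
        Msg *next = m->next;
        handler(m->script);       /* run without holding the lock */
        free(m);
        m = next;
        n++;
    }
    return n;
}

/* Demo handler: counts the scripts it is asked to run. */
int handled_count = 0;
void count_handler(const char *script) {
    (void)script;
    handled_count++;
}
```

Messages posted while a drain is in progress simply wait for the owning thread's next visit to its loop.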

Messages can be either synchronous (meaning they will wait for a response) or asynchronous (fire and forget). The result returned to the sending thread from a synchronous message is the result of the Tcl script in the receiving thread or possibly an error message if the message couldn't be sent for some reason. An asynchronous message has no result unless there is an error. Asynchronous messages can be broadcast to all main interpreters, or to all main interpreters except the interpreter that is doing the sending.
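
One way to picture the difference between synchronous and asynchronous messages is a mailbox protected by a mutex and a condition variable. In this hypothetical sketch, channel_send() blocks until the receiver has produced a result when it is given a reply buffer, and returns at once otherwise. The Channel type, its single message slot, and the echo_run()/serve_two() demo helpers are assumptions made for illustration only.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical one-slot mailbox owned by a receiving thread. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int  pending;                 /* slot holds a message */
    int  done;                    /* receiver has produced the reply */
    int  wants_reply;             /* 0 means asynchronous */
    char request[64];
    char reply[64];
} Channel;

void channel_init(Channel *c) {
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->pending = c->done = c->wants_reply = 0;
}

/* Send a script. With a reply buffer the call is synchronous and blocks
 * until the result is available; with reply == NULL it returns at once. */
void channel_send(Channel *c, const char *script, char *reply, size_t n) {
    pthread_mutex_lock(&c->lock);
    while (c->pending)            /* wait until the slot is free */
        pthread_cond_wait(&c->cond, &c->lock);
    snprintf(c->request, sizeof c->request, "%s", script);
    c->pending = 1;
    c->done = 0;
    c->wants_reply = (reply != NULL);
    pthread_cond_broadcast(&c->cond);
    if (reply != NULL) {
        while (!c->done)
            pthread_cond_wait(&c->cond, &c->lock);
        snprintf(reply, n, "%s", c->reply);
        c->pending = 0;           /* sender frees the slot after reading */
        c->done = 0;
        pthread_cond_broadcast(&c->cond);
    }
    pthread_mutex_unlock(&c->lock);
}

/* Receiver side: wait for one message and run it. The lock is held while
 * the script runs, which keeps the sketch simple but is not ideal. */
void channel_serve_one(Channel *c, void (*run)(const char *, char *, size_t)) {
    pthread_mutex_lock(&c->lock);
    while (!c->pending || c->done)
        pthread_cond_wait(&c->cond, &c->lock);
    run(c->request, c->reply, sizeof c->reply);
    if (c->wants_reply)
        c->done = 1;              /* hand the result back to the sender */
    else
        c->pending = 0;           /* asynchronous: free the slot now */
    pthread_cond_broadcast(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/* Demo script runner and receiver thread. */
void echo_run(const char *script, char *out, size_t n) {
    snprintf(out, n, "ran: %s", script);
}

void *serve_two(void *arg) {
    channel_serve_one(arg, echo_run);
    channel_serve_one(arg, echo_run);
    return NULL;
}
```

For a synchronous send, the sender rather than the receiver frees the slot after copying the reply; this keeps a second sender from overwriting the reply before the first sender has read it.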

A message can be sent from any interpreter, not just the main interpreter, or directly from C code. There does not have to be a Tcl interpreter running in a thread in order for that thread to send a message, but a main interpreter is necessary in order for the message to be received.

Most variables used by a Tcl interpreter are private to that interpreter. But PtTcl implements a mechanism for sharing selected variables between two or more interpreters, even interpreters running in different threads. This mechanism provides a kind of crude shared memory between threads. However, it is not currently possible to put trace events on shared variables, and this limits their usefulness.

Here is a quick summary of the execution model used by PtTcl:

3 Building Pthreads-Tcl

To build PtTcl, first obtain and unpack the source tree, then cd into the directory pttcl7.6a2/unix and enter one of the commands

   ./configure --enable-pthreads
or
   ./configure --enable-mit-pthreads
Use the first form at installations where POSIX threads programs can be built simply by linking in the special -lpthreads library. The second form is for installations that use MIT pthreads and require the special pgcc C compiler.

After configuring the distribution, type

   make
to build a tclsh executable, as you normally would. Note that if you omit the --enable-pthreads or --enable-mit-pthreads option from the ./configure command, then the tclsh you build will not contain support for pthreads.

4 New Tcl Commands

The PtTcl package implements two new Tcl commands. The ``shared'' command is used to designate variables that are to be shared with other interpreters, and the ``thread'' command is used to create and control threads.

4.1 The ``shared'' command

The shared command is uncomplicated. It works very much like the standard global command. Shared takes one or more arguments, which are the names of variables to be shared by all Tcl interpreters, including interpreters in other threads. Note that each interpreter must execute the shared command itself before the interpreters will really be using the same variable.

Unfortunately, the trace command will not work on shared variables. This is another consequence of the fact that a given interpreter can only be used in a single thread. When a trace is set on a variable, a Tcl script is run whenever that variable is read, written or deleted. But if the trace was set by thread A and the variable is changed by thread B, there is no way for thread B to invoke the trace script in thread A.

In order to do much of anything with shared variables, there needs to be some way to lock a variable so that two threads cannot try to change it at the same time. This is not hard to implement from a technical standpoint, but it is tricky to get the design right. We're still working on the design.
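
Since the Tcl-level design is still open, the sketch below shows only one underlying C mechanism such a lock could rely on: a mutex held across each read-modify-write. SharedVar, shared_add(), shared_get() and run_bumpers() are hypothetical names, not part of PtTcl.

```c
#include <pthread.h>

/* Hypothetical shared integer with its own lock. */
typedef struct {
    pthread_mutex_t lock;
    long value;
} SharedVar;

void shared_init(SharedVar *v, long initial) {
    pthread_mutex_init(&v->lock, NULL);
    v->value = initial;
}

/* Read-modify-write under the lock, so concurrent updates are not lost. */
long shared_add(SharedVar *v, long delta) {
    pthread_mutex_lock(&v->lock);
    long now = (v->value += delta);
    pthread_mutex_unlock(&v->lock);
    return now;
}

long shared_get(SharedVar *v) {
    pthread_mutex_lock(&v->lock);
    long now = v->value;
    pthread_mutex_unlock(&v->lock);
    return now;
}

/* Demo: several threads increment the same variable concurrently. */
#define ITERATIONS 100000

static void *bump(void *arg) {
    for (int i = 0; i < ITERATIONS; i++)
        shared_add(arg, 1);
    return NULL;
}

long run_bumpers(SharedVar *v, int nthreads) {
    pthread_t tids[16];
    int started = 0;
    for (int i = 0; i < nthreads && i < 16; i++)
        if (pthread_create(&tids[started], NULL, bump, v) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return shared_get(v);
}
```

Without the mutex, two threads could read the same old value and one increment would be lost; with it, the final total is always the number of threads started times ITERATIONS.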

4.2 The ``thread'' command

The thread command is much more complex than shared. Thread contains nine separate subcommands used to create new threads, send and receive messages, query the thread database, and so forth. Each is described separately below.

thread self

Every thread in PtTcl that contains an interpreter is assigned a unique positive integer Id. This Id is used by other thread commands to designate a message recipient or the target of a join. The thread self command returns the Id of the thread that executes the command.

thread create [command] [-detach boolean]

New threads can be created using the thread create command. The optional argument to this command is a Tcl script that is executed by the new thread. After the specified script is completed, the new thread exits. If no script is specified, the command ``thread eventloop'' is used instead.

After a thread finishes executing its Tcl script, it normally waits for another thread to join with it and collect its return value. (See the thread join command below.) But if the -detach option evaluates to true, the thread terminates immediately upon finishing its script. A detached thread can never be joined.

Assuming the new thread is created successfully, the thread create command returns the thread Id of the new thread.
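
At the pthreads level, the -detach option plausibly corresponds to the thread's detach state. The following sketch assumes that mapping; sketch_thread_create() and double_it() are hypothetical helpers, and PtTcl's real thread create may be implemented differently.

```c
#include <pthread.h>
#include <stdint.h>

/* Start fn(arg) in a new thread; detach != 0 mirrors "-detach true". */
int sketch_thread_create(pthread_t *tid, void *(*fn)(void *), void *arg,
                         int detach) {
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_setdetachstate(&attr, detach ? PTHREAD_CREATE_DETACHED
                                                   : PTHREAD_CREATE_JOINABLE);
    if (rc == 0)
        rc = pthread_create(tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return rc;                    /* 0 on success, an error number otherwise */
}

/* Demo thread body: its return value is what a joiner would collect. */
void *double_it(void *arg) {
    intptr_t n = (intptr_t)arg;
    return (void *)(n * 2);
}
```

A joinable thread's return value is collected with pthread_join(); a detached thread releases its resources as soon as it exits, which is why it can never be joined.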

thread send whom message [-async boolean]

Use the thread send command to send a message from one thread to another. The arguments to this command specify the target thread and the message to be sent. The message is really just a Tcl script that is executed on the remote thread. The thread send command normally waits for the message to complete on the remote thread, then returns the result of the script. But if the -async option is true, thread send returns immediately without waiting for a reply.

thread broadcast message [-sendtoself boolean]

The thread broadcast command works like thread send except that it sends the message to all threads and always operates asynchronously. It does not normally send the message to the sending thread itself unless the -sendtoself option is true.
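
The effect of -sendtoself on the set of recipients fits in a few lines. This sketch assumes the sender already has the Ids of all threads that have a main interpreter; broadcast_targets() is a hypothetical helper.

```c
#include <stddef.h>

/* Copy into 'out' the Ids a broadcast from 'self' should reach and
 * return how many there are. */
size_t broadcast_targets(const int *ids, size_t n, int self, int sendtoself,
                         int *out) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (ids[i] == self && !sendtoself)
            continue;             /* skip the sender unless -sendtoself */
        out[count++] = ids[i];
    }
    return count;
}
```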

thread update

This command causes the current thread to process all pending messages, that is, messages that other threads have sent and are waiting for this thread to process. Only thread messages are processed by this command -- other kinds of pending events are ignored. If you want to process all pending events including thread messages, use the update command from regular Tcl.

thread eventloop

This command causes the current thread to go into an infinite loop processing events, including incoming messages. This command will not return until the interpreter is destroyed by either an exit command or an interp delete {} command.

thread join [-id Id] [-timeout milliseconds]

The thread join command causes the current thread to join with another thread that has completed processing. The return value of this command is the result of the last command executed by the thread that was joined.

By default, the first available thread is joined. But you can wait on a particular thread by using the -id option.

The calling thread will wait indefinitely for another thread to join, unless you specify a timeout value. When a timeout is specified, the thread join will return after that timeout regardless of whether or not it has found another thread to join. A timeout of zero (0) can be used if you just want to do a quick check to see if there are any threads already waiting to be joined, and don't want to block.
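
The join rules described above (the first available thread by default, a specific Id with -id, an optional timeout, and a timeout of zero meaning "just check") map naturally onto a condition variable with pthread_cond_timedwait(). The following is a minimal sketch under that assumption; JoinTable, join_table_finish() and join_table_join() are hypothetical, and a negative timeout stands in for "no timeout given".

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

#define MAX_FINISHED 32

/* Hypothetical record of threads that have finished and await a join. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int ids[MAX_FINISHED];
    int count;
} JoinTable;

void join_table_init(JoinTable *t) {
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->cond, NULL);
    t->count = 0;
}

/* Called by a thread when it finishes its script. */
void join_table_finish(JoinTable *t, int id) {
    pthread_mutex_lock(&t->lock);
    if (t->count < MAX_FINISHED)
        t->ids[t->count++] = id;
    pthread_cond_broadcast(&t->cond);
    pthread_mutex_unlock(&t->lock);
}

/* Join any finished thread (want == 0) or a specific Id. timeout_ms < 0
 * waits forever, 0 only polls. Returns the joined Id, or -1 if none. */
int join_table_join(JoinTable *t, int want, long timeout_ms) {
    struct timespec deadline;
    if (timeout_ms > 0) {
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec  += timeout_ms / 1000;
        deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_sec  += 1;
            deadline.tv_nsec -= 1000000000L;
        }
    }
    int result = -1;
    pthread_mutex_lock(&t->lock);
    while (result < 0) {
        for (int i = 0; i < t->count; i++) {
            if (want == 0 || t->ids[i] == want) {
                result = t->ids[i];
                t->ids[i] = t->ids[--t->count];  /* each thread joins once */
                break;
            }
        }
        if (result >= 0 || timeout_ms == 0)
            break;                /* found one, or polling only */
        int rc = (timeout_ms < 0)
            ? pthread_cond_wait(&t->cond, &t->lock)
            : pthread_cond_timedwait(&t->cond, &t->lock, &deadline);
        if (rc == ETIMEDOUT)
            break;
    }
    pthread_mutex_unlock(&t->lock);
    return result;
}
```

The deadline is absolute and computed once, so spurious wakeups do not extend the total wait.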

thread list

This command returns a list of Tcl thread Id numbers for each existing thread.
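
Behind thread self, thread create and thread list there must be some table that maps Tcl thread Ids to live threads. The sketch below is a hypothetical version of such a registry, assuming Ids are handed out from a counter starting at 1 and are not reused; registry_add(), registry_remove() and registry_list() are illustrative names only.

```c
#include <pthread.h>

#define MAX_THREADS 64

/* Hypothetical registry of live Tcl thread Ids. */
static pthread_mutex_t reg_lock = PTHREAD_MUTEX_INITIALIZER;
static int reg_ids[MAX_THREADS];
static int reg_count = 0;
static int next_id = 1;           /* Ids are positive and never reused */

/* Assign a fresh Id, as thread creation would; -1 if the table is full. */
int registry_add(void) {
    int id = -1;
    pthread_mutex_lock(&reg_lock);
    if (reg_count < MAX_THREADS) {
        id = next_id++;
        reg_ids[reg_count++] = id;
    }
    pthread_mutex_unlock(&reg_lock);
    return id;
}

/* Forget an Id once its thread has exited or been joined. */
void registry_remove(int id) {
    pthread_mutex_lock(&reg_lock);
    for (int i = 0; i < reg_count; i++) {
        if (reg_ids[i] == id) {
            reg_ids[i] = reg_ids[--reg_count];
            break;
        }
    }
    pthread_mutex_unlock(&reg_lock);
}

/* Copy up to 'max' live Ids into 'out': the data behind "thread list". */
int registry_list(int *out, int max) {
    pthread_mutex_lock(&reg_lock);
    int n = reg_count < max ? reg_count : max;
    for (int i = 0; i < n; i++)
        out[i] = reg_ids[i];
    pthread_mutex_unlock(&reg_lock);
    return n;
}
```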

thread yield

Finally, the thread yield command causes the current thread to yield its timeslice to some other thread that is ready to run, if any.

5 New C Functions

In addition to the new Tcl commands, PtTcl also provides several new C functions that can be used by C or C++ programs to create and control Tcl interpreters in a multi-threaded environment.

   int Tcl_ThreadCreate(
         char *cmdText,
         void (*initProc)(Tcl_Interp*,void*),
         void *argPtr
   );

The Tcl_ThreadCreate() function creates a new thread and starts a Tcl interpreter running in that thread. The first argument is the text of a Tcl script that the Tcl interpreter running in the new thread will execute. If this first argument is NULL, the interpreter executes the command thread eventloop instead. The second argument to Tcl_ThreadCreate() is a pointer to a function that can be used to initialize the new Tcl interpreter before it tries to execute its script. The third argument is passed as the second parameter to this initialization function. Either or both of these arguments can be NULL.

Tcl_ThreadCreate() returns an integer which is the Tcl thread Id of the new thread it creates. This is exactly the same integer that would have been returned if the thread had been created using the thread create Tcl command.

Note that Tcl_ThreadCreate() may be called from a thread that does not itself have a Tcl interpreter. This allows threads that don't use Tcl to create subthreads that do.

All threads created by Tcl_ThreadCreate() are detached.

Note that (*initProc)() might not have executed in the new thread by the time Tcl_ThreadCreate() returns, so the calling function should not free argPtr right away. It is safer to let (*initProc)() take responsibility for cleaning up argPtr.
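
The ownership rule for argPtr can be illustrated with plain pthreads. In this sketch, spawn_with_init() is a hypothetical stand-in for the start-up part of Tcl_ThreadCreate(): it starts a detached thread that calls the init procedure, which then frees its own argument. demo_init() and demo_wait() exist only to make the example observable.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the (initProc, argPtr) pair. */
typedef struct {
    void (*init)(void *);
    void *arg;
} StartInfo;

static void *thread_main(void *p) {
    StartInfo info = *(StartInfo *)p;   /* copy, then free the wrapper */
    free(p);
    if (info.init != NULL)
        info.init(info.arg);            /* may run after the creator returned */
    /* ... a real implementation would now run the Tcl script ... */
    return NULL;
}

/* Start a detached thread that calls init(arg) first. The caller must not
 * free 'arg': ownership passes to init(). Returns 0 on success. */
int spawn_with_init(void (*init)(void *), void *arg) {
    StartInfo *info = malloc(sizeof *info);
    if (info == NULL)
        return -1;
    info->init = init;
    info->arg = arg;
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    pthread_t tid;
    int rc = pthread_create(&tid, &attr, thread_main, info);
    pthread_attr_destroy(&attr);
    if (rc != 0) {
        free(info);
        return -1;
    }
    return 0;
}

/* Demo init proc: records the value, signals, and frees its argument. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t demo_cond = PTHREAD_COND_INITIALIZER;
static int demo_value = 0;

void demo_init(void *arg) {
    int *p = arg;
    pthread_mutex_lock(&demo_lock);
    demo_value = *p;
    pthread_cond_signal(&demo_cond);
    pthread_mutex_unlock(&demo_lock);
    free(p);                            /* the init proc cleans up */
}

/* Block until demo_init() has run, then return the value it saw. */
int demo_wait(void) {
    pthread_mutex_lock(&demo_lock);
    while (demo_value == 0)
        pthread_cond_wait(&demo_cond, &demo_lock);
    int v = demo_value;
    pthread_mutex_unlock(&demo_lock);
    return v;
}
```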

   int Tcl_ThreadSend(
         int toWhom,
         char **replyPtr,
         char *format,
         ...
   );

The Tcl_ThreadSend() function allows C or C++ code to send a message to another thread. The first argument is the Tcl thread Id number (not the pthread_t identifier) of the destination thread. You can specify a destination of zero (0) in order to broadcast a message.

The second parameter is a pointer to a pointer to a string. The message response is written into memory obtained from the Tcl function ckalloc(), and *replyPtr is set to point to this memory. If the second parameter is NULL, the message is sent asynchronously. If the first parameter is 0, the second parameter must be NULL; otherwise an error is returned and no messages are sent.

The third parameter is a format string, in the style of printf() that specifies the message that is to be sent. Subsequent arguments are added as needed, exactly as with printf().

The return value from Tcl_ThreadSend() is the return value of the call to Tcl_Eval() in the destination thread, if this is a synchronous message, or TCL_OK if this is an asynchronous message. But TCL_ERROR might be returned if an error is encountered, such as a destination Id number of a thread that doesn't exist.
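
Two details of Tcl_ThreadSend() lend themselves to a short sketch: building the message text from a printf()-style format, and the rule that a broadcast cannot request a reply. In this hypothetical sketch malloc() stands in for ckalloc(), and SKETCH_OK/SKETCH_ERROR stand in for TCL_OK/TCL_ERROR; format_message() and check_send_args() are not part of PtTcl.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_OK    0            /* stand-in for TCL_OK */
#define SKETCH_ERROR 1            /* stand-in for TCL_ERROR */

/* Build the message text the way printf() would, into heap memory
 * (malloc() stands in for ckalloc()). The caller frees the result. */
char *format_message(const char *format, ...) {
    va_list ap;
    va_start(ap, format);
    int len = vsnprintf(NULL, 0, format, ap);    /* measure first */
    va_end(ap);
    if (len < 0)
        return NULL;
    char *buf = malloc((size_t)len + 1);
    if (buf == NULL)
        return NULL;
    va_start(ap, format);
    vsnprintf(buf, (size_t)len + 1, format, ap); /* then format for real */
    va_end(ap);
    return buf;
}

/* A broadcast (toWhom == 0) must not also ask for a reply. */
int check_send_args(int toWhom, char **replyPtr) {
    if (toWhom == 0 && replyPtr != NULL)
        return SKETCH_ERROR;
    return SKETCH_OK;
}
```

Because the format is expanded with printf() rules, values substituted with %s are inserted verbatim; a value containing spaces or Tcl special characters may need to be quoted before it is sent.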

   Tcl_Interp *Tcl_GetThreadInterp(
         Tcl_Interp *interp
   );

One final function that might be useful is Tcl_GetThreadInterp(). This routine will return a pointer to the main interpreter for the calling thread. If the calling thread doesn't have a main interpreter, then the interpreter specified as its argument is made the main interpreter. If the argument is NULL, then Tcl_CreateInterp() is called to create a new Tcl interpreter which then becomes the main interpreter. At the conclusion of this function, the calling thread is guaranteed to have a main interpreter and a pointer to that interpreter will be returned.
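
The contract of Tcl_GetThreadInterp() resembles a classic per-thread singleton. The following sketch assumes it could be built on POSIX thread-specific data (pthread_key_create(), pthread_getspecific(), pthread_setspecific()); FakeInterp, get_thread_interp() and main_interp_of_new_thread() are hypothetical, and calloc() stands in for Tcl_CreateInterp().

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for Tcl_Interp. */
typedef struct { int unused; } FakeInterp;

static pthread_key_t main_interp_key;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;

static void make_key(void) {
    pthread_key_create(&main_interp_key, NULL);
}

/* Return this thread's main interpreter, adopting 'interp' or creating a
 * new one if the thread has none yet. */
FakeInterp *get_thread_interp(FakeInterp *interp) {
    pthread_once(&key_once, make_key);
    FakeInterp *mainInterp = pthread_getspecific(main_interp_key);
    if (mainInterp == NULL) {
        mainInterp = (interp != NULL) ? interp : calloc(1, sizeof *mainInterp);
        pthread_setspecific(main_interp_key, mainInterp);
    }
    return mainInterp;
}

/* Demo support: fetch the main interpreter of a brand-new thread. */
static void *other_thread(void *arg) {
    FakeInterp **out = arg;
    *out = get_thread_interp(NULL);
    return NULL;
}

FakeInterp *main_interp_of_new_thread(void) {
    FakeInterp *result = NULL;
    pthread_t tid;
    if (pthread_create(&tid, NULL, other_thread, &result) != 0)
        return NULL;
    pthread_join(tid, NULL);
    return result;                /* leaked in this demo; never freed */
}
```

Each thread sees its own value for the same key, so every thread ends up with a distinct main interpreter.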

6 Status Of PtTcl Development

PtTcl is still under development. The system contains known bugs, and existing features are subject to alteration and/or removal. You should therefore use it with caution. If you encounter bugs, or have suggestions for improvement, please send them to Richard Hipp or Mike Cruse. All comments are welcome.

So far, PtTcl has been compiled and tested only under the Linux operating system, version 2.0.0 and higher. For the pthreads library, we've used both Chris Provenzano's user-level implementation (also known as MIT Pthreads) and a kernel-level pthreads implementation by Xavier Leroy built on the clone() system call of Linux. Neither of these pthreads implementations is without flaw. Under MIT Pthreads, the exec Tcl command does not work reliably. The exec command works fine using Linux kernel pthreads, but under heavy load the kernel's process table has been known to become corrupted, resulting in a system crash. Both problems are being addressed.

Many features are also still missing from PtTcl. We've already mentioned that some kind of Tcl-level thread synchronization and locking is needed. This shouldn't be hard to implement; it is mostly a question of choosing the best interface. Also, some of the library functions called by the Tcl socket command (e.g., gethostbyname()) may not be thread-safe and may need better locking.

Last but not least, some small amount of work is yet to be done in order to get PtTcl to work with Tk.

7 PtTcl For Windows And Mac

While PtTcl has so far been tested only under Unix, there is nothing in its implementation that would preclude its use under Windows or Macintosh. All you need is a library for the target platform that implements basic pthreads functionality. We are not aware of any such library, but suspect that one exists. If not, it would not be much trouble to implement one as a wrapper around the native Windows or Macintosh thread capability. PtTcl uses only a few of the more basic pthreads routines, so most of the pthreads library could remain unimplemented.

If anyone undertakes to port PtTcl to Windows or Mac, we would appreciate hearing from you. Contact Richard Hipp or Mike Cruse.

8 Credits

PtTcl was developed for and released by Conservation Through Innovation, Ltd., a manufacturer of environmental and industrial control systems based in Prescott, AZ.