"It is rather difficult to find short, simple examples of
coroutines which illustrate the importance of the idea; the most useful
coroutine applications generally are quite lengthy" Knuth [The Art of Computer Programming] "Subroutines are special cases of ... coroutines." |
|
Coroutines are a classic programming-in-the-large methodology that most programmers know through the pipe concept, which Unix brought to the mainstream.
The term "coroutine" was coined by Melvin Conway in 1958, when he invented the concept and applied it to the construction of a compiler. He is also famous for formulation of so called Conway Law (first published in Datamation in 1968 and popularized by F. Books in The Mythical Man-Month) and Think Pascal debugger
Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure.
The fate of the coroutine concept was much more difficult than the fate of Conway's Law, which instantly found almost universal recognition. As often happens with innovative concepts, Conway had difficulties publishing his paper, and it appeared only later as "Design of a Separable Transition-Diagram Compiler", CACM 6 (1963), 396-408. In this paper Melvin Conway presented coroutines as a natural compiler-writing paradigm, which was a tremendous breakthrough in compiler design for the time: you can write a multipass compiler and debug it one stage at a time using intermediate files (the output of each stage can simply be saved), and then integrate all the passes using coroutines, achieving much greater efficiency and eliminating writing to disk except in the final stage of the compiler. Here are a couple of quotes:
Figure 4 presents the coroutine structure of the Cobol compiler designed at Case. The program is separable under the condition that the two pairs of modules which share tables are considered to be a single coroutine. The reader is asked to understand that the present treatment is concerned more with exposition than completeness. A more thorough treatment would not ignore copy, pictures, and literals, for example. Let it suffice to say that these features can be accommodated without any significant design changes over what is presented here.
In Figure 4 solid arrows are communication paths between coroutines; dashed arrows show access to tables. When the dashed arrow points to a table the latter is being built; when the dashed arrow points away the table is supplying values to the using coroutine. The specific operations performed by the coroutines will be discussed in the following four sections.
... ... ...
...The chief purpose of a compiler-writing technique is to reduce the labor which follows analysis and which is necessary for the production of the actual compiler. There are other ways to create a cheap compiler than simply to use a compiler as a programming aid. This article attempts to suggest one such way.
If a fast compiler is desired more can be said. The front end of any fast, one pass compiler will be written with an assembler; that’s a corollary of the Seventy-five Percent Rule and some common sense about efficiency of compiler-generated code. Furthermore, the really fast compilers will have only one pass; that’s the result of an analysis of how much extra work must be done by a multi-pass compiler. Notice that a corollary of these two statements is that really fast compilers can be written only for source languages which permit one-pass compilation. This proposition ought to be taken into account by language designers.
Our experience in the development of the prototype suggests that one analyst-programmer, with one or two understanding individuals around to talk to occasionally, can produce a Cobol compiler (sans library and object-program I-O control system) in a year or less, if he is provided with an assembler which permits incorporating all the special formats he will need into the assembly language.
But the concept itself was discovered much earlier and probably by multiple people. A primitive form of coroutine linkage had already been noted briefly as a 'programming tip' in an early UNIVAC publication. Coroutines were studied independently by J. Erdwinn and J. Merner at about the same time as Conway. The first academic discussion was in the first volume of The Art of Computer Programming by Donald Knuth (1968).
Later the concept was described, with several interesting examples, in the influential 1972 book Structured Programming by Edsger W. Dijkstra, C. A. R. Hoare, and Ole-Johan Dahl, drawing on the latter's experience with Simula.
Coroutines are a natural programming technique for assembler programmers. Actually, in some ways they are even more natural than subroutines. You just need to save the return address at the point where you leave the coroutine (yielding control) and later jump to this saved address, with the appropriate register saving/restoring manipulations. Of course that requires saving the state on exit and restoring it on entry, but that is a minor nuisance, and simple heap allocation will do the job.
Like many other brilliant ideas in programming, the concept of the coroutine was so far ahead of its time that it did not get any significant traction outside the very narrow field of simulation languages. It was almost forgotten until Unix reincarnated coroutines in the form of pipes. It was not included in explicit form in any major programming language (although PL/1 exceptions semantically were a special form of coroutine).
So there was dead silence for about 15 years (if we count from 1958). The "resurrection" of the concept happened only with the release of Unix, in the form of pipes. Pipes are one of the most significant contributions of Unix to OS design, along with the hierarchical file system, the notion of the shell as separate from the kernel implementation, one important compiled language (C), and one of the first scripting languages (AWK). Pipes serve as a cornerstone of the Unix Component Model, the first successful component model introduced in software engineering. Components of a pipe are usually called filters.
Despite its great value, as a programming language construct the coroutine was treated like a red-headed stepchild: it took almost 50 years to integrate it into mainstream programming languages, and that happened only with the growth in popularity of scripting languages, which occurred relatively recently, say, less than a decade ago. The first scripting language that included coroutines as a language construct was probably Python, where coroutines have been available since version 2.5. Ruby and JavaScript followed. Earlier attempts were by and large limited to specialized languages, or to general-purpose but far from mainstream languages such as Simula I (1965), Simula 67 (1970) and Modula-2 (1980).
The reason is that coroutines can't be implemented using the stack, which has been the standard, dominant implementation mechanism for subroutines since the days of Algol-60. There were a couple of "deviants", but of them only Modula-2 reached some level of popularity (Simula 67 was a general-purpose language too, but never managed to become popular). The only place where coroutines were actively used in programming from 1973 till, say, 2003 was Unix shell programming.
Good things never disappear, and the situation changed for the better some 50 years later, with scripting languages coming into the mainstream. Currently (as of 2012) coroutines are available in three major scripting languages: JavaScript (since 1.7), Python (since version 2.5) and Ruby (since 1.9). In Perl 5 they are still not a first-class construct, but they are available via the Coro package. One huge blunder of Java design is that, despite the fact that the language uses the heap as its main variable allocation mechanism, it does not include coroutines. But, in a way, Java is the "language for mediocrity, created by mediocrity" (aka the new Cobol). Despite the early hype it is a deeply flawed language that threw out a lot of valuable parts of the PL/1-C-C++ line of languages. In this sense this is far from surprising ;-).
In languages that do not support coroutines directly, threads can serve as a possible workaround, although threads separate namespaces more than coroutines do, losing some of the elegance of the concept. In the most primitive form, coroutines can be implemented using a switch statement plus one static variable that remembers the last "yield" point and provides a jump to this point from the beginning of the subroutine.
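For illustration, here is a minimal Python approximation of that primitive trick (all names are made up for the example; in C the dispatch would be a literal switch on the saved variable): one stored variable remembers the last yield point, and every call dispatches on it from the top of the routine.

    # Hypothetical sketch: emulating a coroutine with one saved "resume point".
    class Greeter:
        def __init__(self):
            self.point = 0                 # remembers the last yield point

        def resume(self):
            if self.point == 0:            # first entry: start of the body
                self.point = 1
                return "hello"             # first yield
            elif self.point == 1:          # re-enter just after the first yield
                self.point = 2
                return "world"             # second yield
            else:
                return None                # coroutine has finished

    g = Greeter()
    print(g.resume(), g.resume(), g.resume())   # hello world None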
Coroutines also have higher computational costs than subroutines, as they can't rely on the stack, but on modern computers this is a negligible penalty. While coroutines and threads may be used for the same purpose, they are definitely not the same in a very fundamental way. Coroutines share one processor, with explicit transfer of control at the level of the coroutine primitives, whereas threads may be executed in parallel or participate in time-sharing (without explicit transfer of control). One problem with coroutines is that blocking I/O in one coroutine blocks all of them.
Since coroutines can have more points of entry and exit than subroutines, it is possible to implement any subroutine as a coroutine. As Donald Knuth remarked "Subroutines are special cases of ... coroutines."
Exception handling is essentially based on coroutines, and it is difficult to implement it properly if the language uses the stack to implement subroutines. An exception handler is a coroutine that is activated when control reaches it and then stopped; it is resumed only when the exception occurs. In this sense PL/1 was one of the first languages which supported coroutines, long before they made their way into Modula-2.
Coroutines are more generic than subroutines. The lifespan of a subroutine is dictated by the stack, which often contains its temporary variables and is last-in, first-out (the last subroutine called is the first to return); in contrast, the lifespan of a coroutine is completely arbitrary, and such variables need to be allocated either statically or on the heap.
The start of a subroutine is its only point of entry. A subroutine can return control to the calling point only once per invocation; in contrast, a coroutine can return control (yield) several times. The start of a coroutine is its first point of entry, and subsequent points of entry follow the yield statements. The first execution of a yield operation returns a result to the calling coroutine and gives control back to it, like a usual subroutine. However, the next time the coroutine is called, execution does not start at the beginning of the coroutine but at the first statement after the yield statement. So a coroutine remembers its previous call context, while a subroutine does not.
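A minimal Python sketch of this difference (Python generators, mentioned earlier as available since version 2.5, are exactly this mechanism):

    def coro():
        print("first entry")
        yield 1                 # control (and a value) go back to the caller
        print("resumed after the first yield")
        yield 2

    g = coro()
    print(next(g))   # prints "first entry", then 1
    print(next(g))   # prints "resumed after the first yield", then 2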
A good introduction to coroutines can be found in Wikipedia. Here is an abridged version:
Coroutines are more generic than subroutines. The lifespan of subroutines is dictated by last in, first out (the last subroutine called is the first to return); in contrast, the lifespan of coroutines is dictated entirely by their use and need. The start of a subroutine is the only point of entry. Subroutines can return only once; in contrast, coroutines can return (yield) several times. The start of a coroutine is the first point of entry and subsequent points of entry are following yield commands. Practically, yielding returns the result to the calling coroutine and gives it back control, like a usual subroutine. However, the next time the coroutine is called, the execution does not start at the beginning of the coroutine but just after the yield call.
Here is a simple example of how coroutines can be useful. Suppose you have a consumer-producer relationship where one routine creates items and adds them to a queue and another removes items from the queue and uses them. For reasons of efficiency, you want to add and remove several items at once. The code might look like this:
    var q := new queue

    coroutine produce
        loop
            while q is not full
                create some new items
                add the items to q
            yield to consume

    coroutine consume
        loop
            while q is not empty
                remove some items from q
                use the items
            yield to produce

The queue is then completely filled or emptied before yielding control to the other coroutine using the yield command. Further coroutine calls start right after the yield, in the inner coroutine loop.
Although this example is often used to introduce multithreading, it is not necessary to have two threads to achieve this: the yield statement can be implemented by a branch directly from one routine into the other.
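For illustration, here is a sketch of the same producer/consumer pair written with Python generators. A plain generator can only yield to its caller, so a two-line driver loop (a stand-in for the direct "yield to" transfer above) alternates control between the two; all names here are made up for the example.

    from collections import deque

    q = deque()
    LIMIT = 3

    def produce():
        n = 0
        while True:
            while len(q) < LIMIT:        # fill the queue with new items
                q.append(n)
                n += 1
            yield                        # "yield to consume"

    def consume():
        while True:
            while q:                     # use items until the queue is empty
                print("using", q.popleft())
            yield                        # "yield to produce"

    producer, consumer = produce(), consume()
    for _ in range(2):                   # alternate control between the two
        next(producer)
        next(consumer)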
Detailed comparison
Since coroutines can have more points of entry and exit than subroutines, it is possible to implement any subroutine as a coroutine. "Subroutines are special cases of ... coroutines." —Donald Knuth[2]
Each time a subroutine is called (invoked), execution starts at the beginning of the invoked subroutine. Likewise, the first time a coroutine is invoked, execution starts at the beginning of the coroutine; however, each subsequent time a coroutine is invoked, execution resumes following the place where the coroutine last returned (yielded).
A subroutine returns only once. In contrast, a coroutine can return multiple times, making it possible to return additional values upon subsequent calls to the coroutine. Coroutines in which subsequent calls yield additional results are often known as generators.
Subroutines only require a single stack that can be preallocated at the beginning of program execution. In contrast, coroutines, able to call on other coroutines as peers, are best implemented using continuations. Continuations may require allocation of additional stacks and therefore are more commonly implemented in garbage-collected high-level languages. Coroutine creation can be done cheaply by preallocating stacks or caching previously allocated stacks.
... ... ...
Coroutines are useful to implement the following:
- State machines within a single subroutine, where the state is determined by the current entry/exit point of the procedure; this can result in more readable code.
- Actor model of concurrency, for instance in computer games. Each actor has its own procedures (this again logically separates the code), but they voluntarily give up control to a central scheduler, which executes them sequentially (this is a form of cooperative multitasking).
- Generators, which are useful for input/output and for generic traversal of data structures (see the sketch below).
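As a sketch of the last point (the Node class is made up for the example), here is a Python generator that performs a generic in-order traversal of a binary tree; the caller simply iterates over the values without knowing how the structure is walked:

    class Node:
        def __init__(self, value, left=None, right=None):
            self.value, self.left, self.right = value, left, right

    def inorder(node):
        # yield values from the left subtree, the node itself, then the right subtree
        if node is not None:
            yield from inorder(node.left)
            yield node.value
            yield from inorder(node.right)

    tree = Node(2, Node(1), Node(3))
    print(list(inorder(tree)))   # [1, 2, 3]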
Programming languages supporting coroutines
- BCPL
- Pascal (Borland Turbo Pascal 7.0 with the uThreads module)
- C#
- Erlang
- Go
- JavaScript (since 1.7)
- ...
- Lua
- ...
- Perl (Perl 5 with Coro, Perl 6 native)
- Prolog
- Python (since 2.5)
- Ruby (since 1.9, using Fibers)
- Tcl (since 8.6)
Since continuations can be used to implement coroutines, programming languages that support them can also quite easily support coroutines.
Early computer hardware handled errors it detected in one of four ways: (1) set a flag that the program must test after each operation; (2) set a flag that can be tested once after a whole block of instructions; (3) raise an interrupt that transfers control to an error-handling routine; or (4) produce a special value (as in floating-point arithmetic) as the result and continue.
To detect an error in case (1), the program would have to include a branch to test the flag after every operation, and doing so would increase the code's complexity substantially. Case (2) allows a block of instructions to be executed, and tested jointly for acceptable execution. Case (3) allows the flexibility of both, together with the option of writing a common "error handling" routine, which could even determine where in the program the error occurred, and "recover" from it (e.g., by setting the result of an arithmetic underflow to zero, and resuming the computation after the error-producing instruction). Case (4) was introduced for floating-point computation, and is helpful in allowing data-parallel computation to proceed without introducing separate computation paths for each datum. "Exceptions" in high-level languages represent the "interrupt" option in those languages, with some limitations.
An exception is an object (data-containing record) produced when an error is detected by hardware, or "thrown" explicitly by the program. The record can contain all the information that would be needed to "resume" execution after the exception-producing operation. When an exception is thrown, control passes immediately to an exception handling routine selected by rules of the language. Usually, each block or subroutine has the option of establishing an exception-handler for itself; the exception handler invoked is selected by: A. The type of exception, and B. The handler associated with the most-recently invoked block. That is, blocks are terminated until one is reached which contains an exception-handler for the particular "exception" that was thrown. In some languages, the exception-handler has the option of ending with "resume <expression>", which returns to the point after the exception was thrown, and (if in the middle of an expression) uses <expression> as the operation's result. E.g. Java:
Any method that throws an exception must declare that fact;
A method that uses an exception-throwing method must either declare that IT throws the same exception, or provide a handler for it;
Handlers (the form is "catch <Exception name> {}") cannot "resume", and cannot find out where the exception they handle came from.
Handlers are very useful for quickly terminating a deep nest of procedures, say in a compiler that detects a syntax error, and wants to abandon part of the parsing, and start skipping input until it locates, say, a ";". However, the current designs have some problems -- handlers may not have access to variables within the block that invoked them. Logically, they provide "sneak" exits from loops and other constructs, which may cause loop behavior to be hard to understand.
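A small Python sketch of that use (all names hypothetical): an exception raised deep inside nested parsing calls unwinds the whole nest at once, and the handler resynchronizes.

    class SyntaxAbort(Exception):
        """Raised to abandon the current parse."""

    def parse_expression(tokens, depth=0):
        if depth == 3:                        # pretend a syntax error is found here
            raise SyntaxAbort("unexpected token")
        parse_expression(tokens, depth + 1)   # descend into nested constructs

    try:
        parse_expression(["x", "=", ";"])
    except SyntaxAbort as err:
        print("abandoning parse, skipping to the next ';':", err)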
Coroutines are subroutines, with neither the caller nor the callee being "dominant". Instead, they allow "democratic" program-controlled interleaving of instructions generated by both. Suppose A calls B. Then B wants to allow A to perform some more computation. B can "resume A", which then runs until it "resumes B". Then A can execute until it needs data from B, which might produce part of that data, and resume A, to examine or compute with the part produced so far. Coroutines have been exploited in the past in compilers, where the "parser" asks the "lexer" to run until the lexer has to stop (say at end of line). The lexer then resumes the parser to process that line's data, and is itself resumed to continue reading input characters. The text also shows an example of a tree-comparison problem solved logically by coroutines. Their advantage is that the cooperative behavior allows the "high-level" program to terminate the computation early, before the companion routine "completes" its assigned task. I have also used them to simulate parallel computation, when I want to build my own control over the task scheduling process.
As an interesting exercise, the text shows how coroutines can be simulated in C, using C's "setjmp()" and "longjmp()" library procedures. These procedures are intended for use in setting exception-handler routines. However, they have the property that they create concrete realizations of a "stopped" task -- an instruction counter, along with a variable reference context is stored when a setjmp occurs, and is resumed when a longjmp to the saved item is performed. The longjmp(Buf, Return) causes the setjmp(Buf) to return (again), this time returning value Return, instead of the 0 setjmp(Buf) returns when it is called.
Iterators are a special flavor of coroutines. They produce a sequence of values, rather than just one. Each time an iterator executes a yield <value> statement, the iterator returns <value>. When it is called again, the iterator picks up its computation after the yield, so it can compute the next value to return. In CLU, when the iterator finishes, the controlling for loop in which it is being called also terminates.
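Python generators behave the same way as CLU iterators in this respect; a small sketch (not CLU syntax): the for loop below ends as soon as the generator runs off the end of its body.

    def upto(n):             # yields 1, 2, ..., n, then finishes
        i = 1
        while i <= n:
            yield i
            i += 1

    for value in upto(3):    # the loop terminates when the iterator finishes
        print(value)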
Java also provides the functionality of iterators. A Collection object in Java can produce an iterator object IT, with methods next(), hasNext(), and remove(). Repeatedly calling next() until hasNext() returns false steps through the elements of the collection. This seems to provide the essential features of an iterator. It also appears that no special internal mechanism is needed to implement Java Iterators -- each iterator maintains its own "state" in its private variables, and its methods can at any time test and modify that state.
A "continuation" is a parameter (descriptor or pointer) which represents "the remainder of the computation to be performed". The language Io uses continuations, together with a goto which passes parameters, as the sole control structure. Some syntactic fudging let you think of such goto's as ordinary procedures:
declare writeTwice: -> N; Continuation;
Write N; Write N; Continuation
writeTwice 7;
Write 6;
terminate.
writeTwice is declared to accept N and Continuation as parameters. The "call" on writeTwice performs a goto to its declaration, which, after writing N twice, "calls" the Continuation. In fact, the actual parameter corresponding to Continuation is "Write 6; terminate."
Continuations of a sort are present in LISP...
The "setjmp(Buf)" operation acts as a form of "continuation" in C. On call, it takes a snapshot of the current environment, and places it into data structure Buf. A subsequent "longjmp(Buf, ReturnValue)" causes the original "setjmp(Buf)" to return again. The first return from setjmp() returns 0, while subsequent returns via "longjmp()" return the ReturnValue instead. A mechanism like this allows the "concretization" of a "program in operation", allowing this conceptual object to be stored, copied, and "executed" under full program control. This forms the basis for the implementation of other control structures, like co-routines, in C.
These control structures are based on the notion of making a hidden object, a "partially completed execution", concrete, so it can be stored, passed as a parameter, returned from a procedure, and ultimately executed. Making hidden objects "first class" in this sense adds power to a language.
A block of code is said to be active while it is being executed. An activation record is a data structure which describes a single activation of a procedure, function or other block of code. The activation record of a block holds the values of all variables local to the block, and it holds sufficient information to establish an environment for interpreting references global to that block. In the context of procedures and functions, the times at which the activation records are allocated and deallocated determine the lifetimes of the local variables and whether or not the procedures or functions are allowed to be recursive.
Older languages such as FORTRAN had ill-specified local variable lifetimes, and most implementations of such languages use static allocation of activation records. As a result, classic FORTRAN implementations do not allow recursion, and local variables retain their values from one call to a procedure to the next. In contrast, the languages descended from Algol, including Pascal, C, Ada and Java, all allocate a new activation record with each call to a procedure and deallocate it on return. Thus, local variables in such languages do not retain their values from one call to the next but recursion is allowed.
When one procedure or function calls another, the activation of that procedure is not considered to be terminated; instead, it is considered to be suspended until the called procedure or function returns. As a result, when a procedure calls itself recursively, there will be a number of coexisting activations of that procedure, each with its own activation record. All but one of these activations will be suspended awaiting the termination of others, and the one which is not suspended will be the one which was called most recently.
The sequence of machine instructions which is used to transfer control to a procedure is called the calling sequence for that procedure. The code at the head of the called procedure to which the call transfers control is called the receiving sequence.
Other terms, such as register save sequence and entry sequence, are sometimes used. Finally, the code at the end of the procedure which returns control to the caller is called the return sequence. These sequences of instructions are strongly dependent on each other, so it is common to use the term calling sequences in a more generic sense to refer to all of them. The remainder of this section illustrates the concept of activation record management with a discussion of the calling sequences which might be used for procedures on a typical computer. Most computers have a special procedure calling instruction which will be at the heart of any calling sequence. For the examples used here, the JSR instruction will be used, a mnemonic for jump to subroutine. The basic resources of the example machine and the semantics of the JSR, JMP and MOV instructions are given in Figure 15.1.
It will be assumed that the example machine has multiple general registers (the particular number of registers is unimportant), and it will be assumed that the basic process performed by the central processor is to repeatedly fetch instructions from M[PC], increment PC, and then execute the instruction as suggested by the procedural descriptions of the instruction semantics given in Figure 15.1.

    type { basic resources }
        word = integer;
        address = integer;

    var { registers in the central processor }
        R: array [0...] of word;
        PC: address;
        { main memory }
        M: array [address] of word;

    { definitions of instructions }

    procedure JSR( var link: word; destination: address );
    begin { jump to subroutine }
        link := PC;
        PC := destination;
    end { JSR };

    procedure JMP( destination: address );
    begin { jump }
        PC := destination;
    end { JMP };

    procedure MOV( source: word; var destination: word );
    begin { move data }
        destination := source;
    end { MOV };

Figure 15.1. Formal definition of a partial machine architecture.
As an example of a simple instruction, MOV(R[1],R[2]) will move the contents of the register R[1] into the register R[2]. Note that, in the examples given here, it will be assumed that the assembler uses a Pascal or C like syntax to specify operand addressing modes.
The JSR instruction first moves the old value of the program counter register PC into a linkage register (or memory location) specified by the programmer and then it transfers control to the destination address. Because of the convention that the program counter is incremented after fetching the instruction and prior to execution, the saved value of the program counter refers to the instruction immediately following the JSR instruction; that is, it is the return address.
The JSR instruction described here is very similar to the corresponding instructions on the IBM 360/370/390, the Power PC, the Silicon Graphics R series super-microprocessors, and other modern RISC machines. Many older architectures, notably the classic Intel microprocessors, the widely studied DEC PDP-11 and VAX architectures, and the old Motorola 680x0 microprocessors have procedure call instructions which are far more complex.
All calling sequences on the example machine will have a structure which is similar to that shown in Figure 15.2.
    ; calling sequence
        -- save any important register values
        -- place parameters in agreed upon registers
        JSR( R[link], <destination> )
        -- get results from agreed upon registers
        -- restore any important register values

    ; receiving sequence
        MOV( R[link], M[linksave] )
        -- save parameters (if needed)

    ; return sequence
        -- get parameters (if needed)
        JMP( M[linksave] )

Figure 15.2. A skeletal calling sequence.
Lines starting with -- in Figure 15.2 indicate sequences of additional instructions which might have to be added when this skeletal calling sequence is used as part of a real program. The particular register used for linkage is of no importance, and the memory location in which the linkage register is saved depends on the procedure being called. Some calling sequences allow the use of a linkage register to be bypassed entirely by having the procedure call instruction save the return address directly in memory, for example, on a stack.
The skeletal calling sequence shown in Figure 15.2 is typical of the sequences used for hand-coded assembly language procedures with small numbers of parameters and statically allocated activation records. If large numbers of parameters are to be passed, data structures must be constructed in memory to hold them, and if activation records are dynamically allocated, code must be added to the calling, receiving and return sequences to take care of this.
The division of responsibility between the calling, receiving, and return sequences may vary considerably. Activation records may be allocated by either the calling sequence or the receiving sequence, and they may be deallocated by either the return sequence or the part of the calling sequence after the JSR instruction.
Activation records for languages such as Pascal, C, C++ and Java can be allocated on a stack, since they are allocated in a strictly last-in first-out order. Although most language implementations use a stack, there is no requirement that a stack be used for this. It would be perfectly practical to allocate activation records on a heap, using general purpose allocate and deallocate routines, but calls to these routines must use a special calling sequence and their activation records must be allocated specially, since we cannot call the allocate routine in order to allocate an activation record for itself.
In addition to the local variables of the called procedure, each activation record must contain at least two other items: The identity of the activation record of the caller, and information about how to return control to the caller. These fields have many different common names, but the former is typically called the dynamic link or return link, while the latter is called either a return address or a continuation point. Unfortunately, the latter terms are not interchangeable. Both refer to the place where the linkage register is saved, that is, the address in the calling routine where execution is to resume when the called routine returns. If the linkage register is saved in the called routine's activation record, it is called the return address; if it is saved in the calling routine's activation record, it is called the continuation point.
A typical activation record format using continuation points is shown in Figure 15.3.
During the execution of any procedure, some register must be reserved to point to the current activation record. This will be called R[local] in the examples used here, since this register is used as the index register for the local variables of the current procedure. Other names for this register abound, among them, FP, the frame pointer register; this name comes from the common use of the term stack frame to refer to an activation record allocated on a stack. The constants link and cont will be used for the offsets of the return link and continuation point fields of the activation record, so M[R[local]+link] and M[R[local]+cont] will be the memory locations holding the return link and continuation point of the current activation record.

    |          |
    |          |
    |==========|  <------ bottom of activation record
    |   link   |  the return link
    |----------|
    |   cont   |  the continuation point
    |----------|
    |          |
    .          .  parameters (if any)
    .          .
    |          |
    |----------|
    |          |
    .          .  local variables (if any)
    .          .
    |          |
    |----------|
    |          |
    .          .  temporary variables (if any)
    .          .
    |          |
    |==========|  <------ top of activation record
    |          |

Figure 15.3. The structure of an activation record.
The activation record illustrated in Figure 15.3 includes no assumptions about the use of a stack. While most modern languages push activation records onto a stack, nothing requires this! The calling sequence given in Figure 15.4, for example, makes no assumptions about the use of a stack.
The calling sequence given in Figure 15.4 uses calls to allocate and deallocate to manage activation records on the heap; as noted previously, the calling sequences for these routines must be special. As coded here, they use a special dedicated location, M[alloclink], to store their return address, and they presumably have statically allocated activation records. This calling sequence requires that the caller do most of the work! The result is receiving and return sequences which are relatively simple, and the code to move parameter values into place can be simple because the activation record of the caller is available for computing parameter values at the same time that the activation record of the called routine is available for storing them.

    ; special calling sequence for allocate
        -- put size of desired block in R[temp]
        JSR( M[alloclink], allocate )
        -- pointer to allocated block is returned in R[temp]

    ; special calling sequence for deallocate
        -- put pointer to deallocated block in R[temp]
        JSR( M[alloclink], deallocate )

    ; regular calling sequence
        -- save any important register values in caller's activation record
        MOV( <destination activation record size>, R[temp] )
        JSR( M[alloclink], allocate )
        -- insert parameter values in new activation record
        MOV( R[local], M[R[temp]+link] )
        JSR( M[R[local]+cont], <destination> )
        JSR( M[alloclink], deallocate )

    ; receiving sequence
        MOV( R[temp], R[local] )

    ; return sequence
        MOV( R[local], R[temp] )
        MOV( M[R[temp]+link], R[local] )
        JMP( M[R[local]+cont] )

Figure 15.4. A calling sequence with activation records on the heap.
One disadvantage of the calling sequence given in Figure 15.4 is that the caller must know the activation record size needed by the called routine in order to allocate it. Simple programming conventions can overcome this problem; for example, it can be required that each procedure declaration provide a symbolic definition of the activation record size for that procedure, for example, we might have the convention that the procedure proc has its activation record size given by procARS.
Many systems have calling sequences which are carefully designed to put most of the work in the receiving and return sequences. If this is done right, the result is a 'universal calling sequence' which allows programs written in any language to call procedures written in any other language. One way in which operating system designers encourage the use of a system wide standard calling sequence is by providing a set of macros, giving them names such as CALL, PROC and RETURN, which make the use of the standard sequences so convenient that few programmers invent other calling sequences. Where such macros are not provided, assembly language programmers are well advised to write their own in order to avoid problems with calling sequences. Programmers should be aware that the standard calling sequences provided by some machines are not particularly efficient, so it is sometimes profitable to design and use special purpose calling sequences.
The calling sequences used by languages such as Algol, Pascal, PL/I and Ada are more complex than that given above because of the problems of dealing with nested blocks. C and C++ have far more restricted nesting because they forbid functions from being declared local to other functions. The possibility of local procedures and functions introduces a need for a static link in each activation record which is used to identify the activation record of the block that statically encloses the block associated with the current activation record. Whenever a variable declared in one of the enclosing blocks is referenced, the chain of static links is followed out to the appropriate activation record. There is an alternative to the static link called a display, which is a global array of pointers to the current activation records of all accessible blocks, but this also requires a new entry in each activation record to save the previous display entry for each block. These issues are beyond the scope of this work.
Calling sequences using a stack are not difficult to design; space can be allocated by pushing the initial values of the allocated storage on the stack, and deallocation can be done by popping the stack. Figure 15.5 illustrates a typical structure for an activation record on a stack.
               |          |
               |          |  previous activation record
               |          |
               |==========|
    R[local]-> |   link   |  the return link         \
               |----------|                           > a stack mark
               |   cont   |  the continuation point  /
               |----------|
               |          |
               .          .  parameters, local variables,
               .          .  and temporaries (if any)
               |          |  top word on stack
               |==========|
    R[stack]-> |          |  first free word beyond stack top
               |          |

Figure 15.5. A typical activation record on the stack top.
In Figure 15.5, it is assumed that the stack pointer register R[stack] normally points to the first free word beyond the top item on the stack. We will also assume that there are instructions PUSH and POP for operating on the stack, and that these increment and decrement the stack pointer, respectively. The basic structure of the activation record shown here is the same as that shown in Figure 15.3 except for the fact that activation records are embedded in the stack.
When activation records are pushed on a stack, the pair of locations containing the return link and the continuation point are sometimes called a stack mark. Consecutive activation records on the stack are separated by the stack marks, so these locations mark the point in the stack which determines how far the stack must be popped when deallocating an activation record. Some machines have special instructions for pushing a stack mark onto the stack.
A calling sequence using the stack based activation record format shown in Figure 15.5 is given in Figure 15.6.
    ; regular calling sequence
        -- save any important register values in caller's activation record
        MOV( R[stack], R[temp] )   \
        PUSH( R[local] )            > push the stack mark
        PUSH( 0 )                  /
        -- PUSH parameter values on stack
        JSR( M[R[local]+cont], <destination> )

    ; receiving sequence
        MOV( R[temp], R[local] )
        -- PUSH initial values of local variables

    ; return sequence
        MOV( R[local], R[stack] )
        MOV( M[R[local]+link], R[local] )
        JMP( M[R[local]+cont] )

Figure 15.6. A stack based calling sequence.
The calling sequence in Figure 15.6 is a straightforward modification of that shown in Figure 15.4. It differs from the original in the use of PUSH instructions to build the stack mark and allocate storage for the new activation record. As a result of this, the caller need only push the stack mark and the required number of parameters on the stack; the called routine is responsible for pushing the local variables and any temporary storage. In fact, if the programming language does not advertise any particular initial value for local variables, the called routine may simply increment the stack pointer by the number of local variables needed in order to avoid the need to push default initial values one by one, only to have the programmer ignore the default initial values.
The return sequence shown in Figure 15.6 deallocates the activation record by resetting the stack pointer to point to the start of its activation record; note that the base register for the activation record was initially derived from the value of the stack pointer prior to the call, so this simple assignment to the stack pointer undoes the effects of any pushes since that time. It should be noted that the use of a stack pointer register is not always required when pushing activation records on a stack! Simpler, although less intuitive calling sequences make use of the fact that it is usually possible to statically determine the size of the stack within each activation record from an inspection of the code, so the stack top at any point in the code can be determined by a constant displacement from the base of the current activation record.
Some machines designed in the early 1970's include stack mechanisms which are particularly inconvenient from the point of view of activation record management. These include the DEC PDP-11, the R 6502, the Intel 8080 and its descendants up through the Pentium. The problem on these machines is that the procedure call instruction itself pushes the return address on the stack and the return instruction simply pops the stack into the program counter. As a result, the return address is typically above any parameters which were pushed on the stack instead of being incorporated into a stack mark.
The basic return instructions on these machines do nothing to solve the problem of activation record deallocation. As a result, the return sequence must typically pop all local variables off of the stack prior to the return instruction, and the calling sequence, after control returns, must finish deallocating the activation record by popping the parameters. Late in the history of these architectures, special instructions, for example the MARK instruction on the PDP-11, were introduced to try to solve these problems, but very few programmers ever understood how to use them, and as a result they are rarely used.
Coroutines
A coroutine is a routine which does not enforce a last-in first-out ordering on control transfers that we associate with procedures and functions. As a result, activation records for coroutines cannot usually be allocated on a stack, but must be allocated from a heap. Like a procedure, a coroutine is a body of code which must be associated with an activation record before it is run. Unlike procedures, however, the activation records for coroutines are not automatically allocated as a result of control transfers. Instead, the user must explicitly allocate coroutine activation records prior to a transfer of control, and the user may transfer control to a given coroutine, running in a given activation record, a number of times before that coroutine terminates.
The transfer of control to a coroutine is accomplished by a resume operation. This causes the computation which initiated the control transfer to be suspended, as a coroutine, and it causes the resumption of the computation to which control is transferred. As a result, two or more coroutine activations may easily transfer control back and forth, where each time a coroutine regains control, it picks up at the point it was most recently executing. Note the distinction between the terms coroutine and coroutine activation. A coroutine is a block of code which may be activated, producing an activation to which control may be transferred. The alternate term coroutine instance is sometimes used as a synonym for coroutine activation, because the activation record for a coroutine can be viewed as an instance of a special kind of class.
When procedures were the only concern, there was little reason to choose between the use of continuation points and return addresses in designing activation records and calling sequences. With coroutines, on the other hand, the choice is obvious: Continuation points should be used. This is because, when control is transferred to a coroutine, it must be possible to find the correct value of the program counter for that coroutine without knowledge of how control was most recently transferred away from that coroutine. Recall that continuation points are stored in the activation record of the routine as control is passed to some other routine, while return addresses tend to be stored in the activation record of the routine to which control is being transferred.
As motivation for the development of coroutines, consider the problem which was originally presented in Figure 10.3 through Figure 10.5. The problem was to write a program which processes data arriving at some rate at an input port, deletes all occurrences of 'z' from the input, doubles all occurrences of 'x', and passes the data on to an output port. Input/output at these ports happens at varying rates, so queues must be used to buffer the transfer of data between the input/output ports and the applications program.
A coroutine-based solution to this problem is given in Figure 15.7, using a Pascal-like language supporting coroutines.
    { global variables }
    var poll, appl: activation;

    coroutine pollingloop;
    begin
        repeat
            if inputstatus = ready and not full( inputqueue )
                then enqueue( inputqueue, inputdata );
            if outputstatus = ready and not empty( outputqueue )
                then dequeue( outputqueue, outputdata );
            resume appl;
        forever;
    end { pollingloop };

    coroutine application;
    var ch: char;
    begin
        repeat
            while empty( inputqueue ) do resume poll;
            dequeue( inputqueue, ch );
            if ch <> 'z' then begin
                if ch = 'x' then begin
                    while full( outputqueue ) do resume poll;
                    enqueue( outputqueue, ch );
                end;
                while full( outputqueue ) do resume poll;
                enqueue( outputqueue, ch );
            end;
        forever;
    end { application };

    begin { main program }
        poll := activate pollingloop;
        appl := activate application;
        resume poll;
    end { main program }.

Figure 15.7. An example using coroutines.
In Figure 15.7, variables of type activation hold pointers to activation records, and the special operator activate takes the name of a coroutine as an operand and returns a pointer to an activation record for that coroutine. The activation record allocated by activate has a size sufficient for the coroutine in question and has its continuation point set to the first instruction of the body of that coroutine. The resume statement takes a pointer to an activation record as an argument and transfers control to that activation record.
The important thing to note about Figure 15.7 is that it allowed the problem originally posed in Chapter 10 to be solved with what look like two main programs, each of which is a coroutine. The first of these, pollingloop, emphasizes the structure of the problem from the point of view of performing input/output; thus, the control structure is the same as that of the main polling loop solution illustrated in Figure 10.2. The second coroutine, application, has a structure which emphasizes the problem which is to be solved; thus, the control structure is the same as that of the polling procedure solution shown in Figure 10.5.
The code for the application coroutine shown in Figure 15.7 could have been improved by the use of read and write procedures which would encapsulate the references to the input and output queues and hide the small polling loops which await status changes in these queues. Unfortunately, this change introduces the problem of how a procedure or function can be called from within a coroutine and resume another. This problem has been solved, but the solutions are beyond the scope of this discussion.
As with calling sequences, there are many possible coroutine resume sequences. The resume sequences discussed here use the activation record structure shown in Figure 15.3. This structure includes a link field which is not really needed in the context of "pure" coroutines, but is quite useful in situations where it is useful for a coroutine to be able to resume its caller. The structure also includes provisions for parameters. Parameters were not used or needed in the example given in Figure 15.7, but many coroutine systems allow parameters to be passed to a coroutine at the time of activation record allocation.
Two coroutine resume sequences are shown in Figure 15.8.
    ; resume sequence 1
        -- save any important register values
        MOV( R[local], R[temp] )
        MOV( <destination activation record>, R[local] )
        JSR( M[R[temp]+cont], M[R[local]+cont] )
        -- restore saved register values

    ; resume sequence 2
        -- save any important register values
        MOV( <destination activation record>, R[temp] )
        JSR( M[R[local]+cont], M[R[temp]+cont] )
        MOV( R[temp], R[local] )
        -- restore saved register values

Figure 15.8. Two coroutine resume sequences.
The first resume sequence shown in Figure 15.8 does all of the work before transferring control, while the second transfers control as soon as possible, requiring that the destination coroutine finish the job. These resume sequences are equally good, but it is important to note that they are not compatible; a coroutine which uses one of these resume sequences cannot resume one which uses the other. The second resume sequence may be somewhat confusing because the first two instructions are executed prior to a transfer of control to a different coroutine, while the third instruction is executed as a result of some other coroutine resuming the original one.
Generators
A generator is a special kind of coroutine which is similar to a procedure in many ways. As with coroutines, the allocation of an activation record for a generator must be done explicitly prior to any transfer of control to that generator, but unlike coroutines, the transfer of control to a generator activation resembles a procedure call. The difference between a call to a generator activation and a call to a procedure is that each call to a generator activation causes it to begin execution immediately after the point where it most recently returned.
The term 'generator' stems from the use of functions which generate sequences of values. Conventionally, such a function would be written using global variables to remember previous values to be used in generating the next. Generators allow these to be replaced with local variables which persist from one call to the generator to the next. The function and generator approaches are contrasted in Figure 15.9.
Note that a 'generator function' is a generator which returns a value each time it returns to its caller.

    { global variables needed to generate the Fibonacci numbers }
    var i: integer { initially 0 };
        j: integer { initially 1 };

    function fibonacci: integer;
    var k: integer;
    begin
        fibonacci := i;
        k := i + j;
        i := j;
        j := k;
    end { fibonacci };

    generator function fibonacci: integer;
    var i,j,k: integer;
    begin
        i := 0;
        j := 1;
        repeat
            return i;
            k := i + j;
            i := j;
            j := k;
        forever;
    end { fibonacci };

Figure 15.9. Generators and functions contrasted.
Assuming that the global variables used by the function fibonacci in Figure 15.9 are properly initialized, and assuming that they are not accidentally used by other code, successive calls to this function will return successive values of the Fibonacci series. The generator returns these same values, but it has the advantage of having complete control over its local variables and their initialization. Note that, unlike the function, the generator cannot be called until an activation record has been allocated, for example, by an activate operation, and that the call to the generator must be done using something like the resume statement used with coroutines.
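For comparison, here is a minimal sketch of the same generator written with Python's yield (the construct that, as noted earlier, entered the language in version 2.5):

    def fibonacci():
        i, j = 0, 1
        while True:
            yield i            # return the current value and suspend here
            i, j = j, i + j    # execution resumes at this point on the next call

    fib = fibonacci()
    print([next(fib) for _ in range(6)])   # [0, 1, 1, 2, 3, 5]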
In an object oriented language, generators would be modeled as objects, where the local variables of the generator are private components of the object, and the generator function itself is a method of the object. This object oriented solution is intermediate between the simple functional model and the coroutine model for achieving the same effect.
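A sketch of that intermediate, object-oriented form in Python (names made up for the example): the state lives in private attributes of the object and the "generator function" becomes a method.

    class Fibonacci:
        def __init__(self):
            self._i, self._j = 0, 1        # state persists between calls

        def next_value(self):
            value = self._i
            self._i, self._j = self._j, self._i + self._j
            return value

    fib = Fibonacci()
    print([fib.next_value() for _ in range(6)])   # [0, 1, 1, 2, 3, 5]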
The transfer of control to a generator is accomplished by a generator resume sequence, and the return of control from a generator to its caller is done by a generator return sequence. These strongly resemble hybrids of the sequences used with procedures and coroutines. As was the case with coroutines, the activation record structure shown in Figure 15.3 is also sufficient for generators. Two example sets of sequences for use with generators are shown in Figure 15.10.
    ; generator resume sequence 1
        -- save any needed registers
        MOV( R[local], R[temp] )
        MOV( <destination activation record>, R[local] )
        MOV( R[temp], M[R[local]+link] )
        JSR( M[R[temp]+cont], M[R[local]+cont] )
        -- restore any needed registers

    ; generator return sequence 1
        -- save any needed registers
        MOV( R[local], R[temp] )
        MOV( M[R[temp]+link], R[local] )
        JSR( M[R[temp]+cont], M[R[local]+cont] )
        -- restore any needed registers

    ; generator resume sequence 2
        -- save any needed registers
        MOV( <destination activation record>, R[temp] )
        JSR( M[R[local]+cont], M[R[temp]+cont] )
        -- restore any needed registers

    ; generator return sequence 2
        -- save any needed registers
        MOV( R[local], R[temp] )
        MOV( M[R[temp]+link], R[local] )
        JSR( M[R[temp]+cont], M[R[local]+cont] )
        MOV( R[local], M[R[temp]+link] )
        MOV( R[temp], R[local] )
        -- save any needed registers

Figure 15.10. Two sets of generator calling and return sequences.
Both sets of sequences given in Figure 15.10 accomplish the same thing, but one set puts an equal burden on the caller and the generator itself, while the other puts most of the work in the generator, with very little work in the caller.
Note that the execution of the first half of a resume sequence will transfer control to the second half of some return sequence, and the execution of the first half of a return sequence will transfer control to the second half of a resume sequence. This symmetry illustrates the close relationship between generators and coroutines, but it also allows the relationship between generators and procedures to be clarified: A generator resume sequence is similar to a procedure calling sequence, and a generator return sequence is similar to a procedure return sequence followed immediately by a procedure receiving sequence, with the additional constraint that the procedure return sequence is modified to save the continuation point of the procedure so that the next call will transfer control to the following receiving sequence.
cpan.org
Coro started as a simple module that implemented a specific form of first class continuations called Coroutines. These basically allow you to capture the current point of execution and jump to another point, while allowing you to return at any time, as a kind of non-local jump, not unlike C's setjmp/longjmp. This is nowadays known as a Coro::State.

One natural application for these is to include a scheduler, resulting in cooperative threads, which is the main use case for Coro today. Still, much of the documentation and custom refers to these threads as "coroutines" or often just "coros".
A thread is very much like a stripped-down perl interpreter, or a process: Unlike a full interpreter process, a thread doesn't have its own variable or code namespaces - everything is shared. That means that when one thread modifies a variable (or any value, e.g. through a reference), then other threads immediately see this change when they look at the same variable or location.
Cooperative means that these threads must cooperate with each other when it comes to CPU usage - only one thread ever has the CPU, and if another thread wants the CPU, the running thread has to give it up. The latter happens either explicitly, by calling a function to do so, or implicitly, when waiting on a resource (such as a Semaphore, or the completion of some I/O request). This threading model is popular in scripting languages (such as Python or Ruby), and this implementation is typically far more efficient than threads implemented in other languages.
Perl itself uses rather confusing terminology - what perl calls a "thread" (or "ithread") is actually called a "process" everywhere else: The so-called "perl threads" are actually artifacts of the unix process emulation code used on Windows, which is consequently why they are actually processes and not threads. The biggest difference is that neither variables (nor code) are shared between processes or ithreads.
Cooperative Threads

Cooperative threads is what the Coro module gives you. Obviously, you have to use it first:

    use Coro;

To create a thread, you can use the async function that automatically gets exported from that module:

    async {
        print "hello\n";
    };

Async expects a code block as first argument (in indirect object notation). You can actually pass it extra arguments, and these will end up in @_ when executing the code block, but since it is a closure, you can also just refer to any lexical variables that are currently visible.

The above lines create a thread, but if you save them in a file and execute it as a perl program, you will not get any output.
The reason is that, although you created a thread, and the thread is ready to execute (because async puts it into the so-called ready queue), it never gets any CPU time to actually execute, as the main program - which also is a thread, almost like any other - never gives up the CPU but instead exits the whole program by running off the end of the file. Since Coro threads are cooperative, the main thread has to cooperate and give up the CPU.

To explicitly give up the CPU, use the cede function (which is often called yield in other thread implementations):

    use Coro;

    async {
        print "hello\n";
    };

    cede;

Running the above prints hello and exits.

Now, this is not very interesting, so let's try a slightly more interesting program:
    use Coro;
    async {
        print "async 1\n";
        cede;
        print "async 2\n";
    };
    print "main 1\n";
    cede;
    print "main 2\n";
    cede;
Running this program prints:
    main 1
    async 1
    main 2
    async 2
This nicely illustrates the non-local jump ability: the main program prints the first line, and then yields the CPU to whatever other threads there are. And there is one other, which runs and prints "async 1", and itself yields the CPU. Since the only other thread available is the main program, it continues running and so on.
Let's look at the example in more detail: async first creates a new thread. All new threads start in a suspended state. To make them run, they need to be put into the ready queue, which is the second thing that async does. Each time a thread gives up the CPU, Coro runs a so-called scheduler. The scheduler selects the next thread from the ready queue, removes it from the queue, and runs it.
cede also does two things: first it puts the running thread into the ready queue, and then it jumps into the scheduler. This has the effect of giving up the CPU, but also ensures that, eventually, the thread gets run again.
In fact, cede could be implemented like this:
    sub my_cede {
        $Coro::current->ready;
        schedule;
    }
This works because $Coro::current always contains the currently running thread, and the scheduler itself can be called directly via Coro::schedule.
What is the effect of just calling schedule without putting the current thread into the ready queue first? Simple: the scheduler selects the next ready thread and runs it. And the current thread, as it hasn't been put into the ready queue, will go to sleep until something wakes it up. If. Ever.
The following example remembers the current thread in a variable, creates a thread and then puts the main program to sleep.
The newly created thread uses rand to wake up the main thread by calling its ready method - or not:
    use Coro;
    my $wakeme = $Coro::current;
    async { $wakeme->ready if 0.5 > rand; };
    schedule;
Now, when you run it, one of two things happens: either the async thread wakes up the main thread again, in which case the program silently exits, or it doesn't, in which case you get something like this:
    FATAL: deadlock detected.
          PID SC  RSS USES Description              Where
     31976480 -C  19k    0 [main::]                 [program:9]
     32223768 UC  12k    1                          [Coro.pm:691]
     32225088 --  2068   1 [coro manager]           [Coro.pm:691]
     32225184 N-  216    0 [unblock_sub scheduler]  -
Why is that? Well, when the async thread runs into the end of its block, it will be terminated (via a call to Coro::terminate) and the scheduler is called again. Since the async thread hasn't woken up the main thread, and there aren't any other threads, there is nothing to wake up, and the program cannot continue. Since there are threads that could be running (main) but none are ready to do so, Coro signals a deadlock - no progress is possible. Usually you also get a listing of all threads, which might help you track down the problem.
However, there is an important case where progress is, in fact, possible, despite no threads being ready - namely in an event-based program. In such a program, some threads could wait for external events, such as a timeout, or some data to arrive on a socket.
Since a deadlock in such a case would not be very useful, there is a module named Coro::AnyEvent that integrates threads into an event loop. It configures Coro in a way that, instead of dying with an error message, it instead runs the event loop in the hope of receiving an event that will wake up some thread.
Dru's Blog
About 10 months ago, I was writing a library. As I was writing it, I started to look at the whole issue of notifying the caller of errors. In typical fashion, I tried to optimize the error handling problem rather than just do the right thing, and just use error codes. I did a ton of research. Here is a current list of links and articles on the subject.
Getting Started
To get you started here are some good starting points. They both received a lot of attention on the internet.
A colorful post by Damien Katz.
A nice opinion piece that is pro-error codes by the famous Joel of Joel on Software.
Read my original post with excellent comments by Daniel Lyons, Paul Clegg, and Neville of the North.
Nutshell
The default and standard way of handling errors since the beginning is to just use error codes with some convention of noticing them. For example, you could document the error condition in an API and then set a global variable for the actual code. It is up to the programmer calling the function to notice the error and do the right thing.
This is the technique used by operating systems and most libraries. Historically, these systems have never been consistent or compatible with other conventions. The most evolved system for this would probably be the Microsoft COM system. All functions return an HRESULT, which is essentially an error code.
The next system was the 'exception-handling' system. In this system errors cannot be ignored. Exception handlers are declared, optionally, at a given scope. If an exception is thrown (i.e., an error has occurred), handlers are searched up the stack until a matching handler is found.
IMHO, the exception system isn't used properly in 90% of the cases. There is a fine balance between a soft error and something exceptional. The syntax also tends to get in the way for even the simplest of errors. I agree that there should be errors that are not ignored, but there has to be a better way.
So, old skoolers are 'we use error codes, and we like them, dammit' - aka super-disciplined programming, usually for real-time, embedded and smaller systems.
The new schoolers are 'you have to be kidding about error codes, use exceptions' - aka, yeah, we use exceptions, that is what the language gives us… and btw, no, we don't mind typing on our keyboards a lot.
Somehow, there has to be a better way. Maybe it will be system- or application-specific.
Moving On - Old / New Ideas
If you don't mind it being a C++ article, here is an amazing one from Andrei Alexandrescu and Petru Marginean. (Andrei is widely known for his great work on policy-based design with C++, which is excellent.) The article is well written and practical. In fact, the idea was so good, the language 'D' made it part of the language.
Here is an example:
    void User::AddFriend(User& newFriend)
    {
        friends_.push_back(&newFriend);
        try {
            pDB_->AddFriend(GetName(), newFriend.GetName());
        }
        catch (...) {
            friends_.pop_back();
            throw;
        }
    }
10 lines, and this is for the super-simple example. With the ScopeGuard approach from the article, the same function becomes:
    void User::AddFriend(User& newFriend)
    {
        friends_.push_back(&newFriend);
        ScopeGuard guard = MakeObjGuard(friends_, &UserCont::pop_back);
        pDB_->AddFriend(GetName(), newFriend.GetName());
        guard.Dismiss();
    }
In D it would look even cleaner:
    void User::AddFriend(User& newFriend)
    {
        friends_.push_back(&newFriend);
        scope(failure) friends_.pop_back();
        pDB_->AddFriend(GetName(), newFriend.GetName());
    }
IMHO, exception handling will move more towards systems like this: higher level, simpler and cleaner.
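As a rough illustration of the same scope-guard idiom in another language, here is a Python sketch using contextlib.ExitStack; the user/db objects and their friends/add_friend members are made up to mirror the C++ example above.

    from contextlib import ExitStack

    def add_friend(user, new_friend, db):
        user.friends.append(new_friend)
        with ExitStack() as undo:
            undo.callback(user.friends.pop)           # rollback if anything below raises
            db.add_friend(user.name, new_friend.name)
            undo.pop_all()                            # success: dismiss the rollback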
Other interesting systems are the ones developed for Common Lisp, Erlang, and Smalltalk. I'm sure Haskell has something to say about this as well.
The Common Lisp and Smalltalk ones are similar. Instead of forcing a single unwind-the-stack mechanism like most exception handlers, these systems give the exception 'catcher' the choice of retrying or doing something different at the point of the exception. Very powerful.
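A toy Python sketch of that restart idea (entirely illustrative; the handler protocol and all names here are invented): the error site consults registered handlers before any unwinding happens.

    restart_handlers = []                  # innermost handler last

    def parse_entry(text):
        try:
            return int(text)
        except ValueError as exc:
            for handler in reversed(restart_handlers):
                action, value = handler(exc)
                if action == "use-value":      # handler supplies a replacement value
                    return value
                if action == "retry":          # handler supplies corrected input
                    return parse_entry(value)
            raise                              # nobody handled it: unwind as usual

    restart_handlers.append(lambda exc: ("use-value", 0))
    print(parse_entry("42"))     # 42
    print(parse_entry("oops"))   # 0 - decided without leaving parse_entry's frame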
Speaking of smalltalk, here is an excellent article called Subsystem Exception Handling in Smalltalk. I highly recommend it.
My Recommendation
If you are building a library, use error codes. Error codes are much easier to turn into exceptions by the language wrapper that will eventually be built on top.
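A minimal, purely illustrative Python sketch of that layering, with a made-up C-style function returning negative error codes and a thin wrapper that converts them into exceptions:

    ERRORS = {-1: "not found", -2: "permission denied"}

    def c_style_open(path):              # stand-in for a C function returning codes
        return -1 if path.startswith("/missing") else 42

    class LibraryError(OSError):
        pass

    def open_file(path):                 # the wrapper eventually built on top
        rc = c_style_open(path)
        if rc < 0:
            raise LibraryError(rc, ERRORS.get(rc, "unknown error"), path)
        return rc                        # a handle on success

    print(open_file("/etc/hosts"))       # 42
    open_file("/missing/file")           # raises LibraryError: [Errno -1] not found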
When programming, don't get trapped into thinking about the little picture. A lot of these errors are just pawns in the grand scheme of assuring that you have all of your resources in place before you begin your task at hand. If you present your code in that manner, it will be much easier to understand for all parties.
More Links
Error Codes vs. Exceptions by Damien Katz.
opinion piece that is pro-error codes by the famous Joel of Joel on Software.
Read my original post with excellent comments by Daniel Lyons, Paul Clegg, and Neville of the North.
D Language - Exception Safe Programming
Subsystem Exception Handling in Smalltalk - nice section on history as well
http://www.gigamonkeys.com/book/beyond-exception-handling-conditions-and-restarts.html
A nice long thread on comp.lang.c++.moderated
Slightly Wacky, But Neat
http://www.halfbakery.com/idea/C20exception20handling_20macros
http://www.nicemice.net/cexcept/
http://home.rochester.rr.com/bigbyofrocny/GEF/
http://www.on-time.com/ddj0011.htm
P160. Efficient coroutine generation of constrained Gray sequences, aka ``Deconstructing coroutines'' (written with Frank Ruskey).
From: Alan Kennedy ([email protected])
Subject: Re: Help with coroutine-based state machines?
Newsgroups: comp.lang.python
Date: 2003-06-03 02:21:55 PST
[Alan spends the weekend upgrading an old laptop, so he can look at this stuff in the comfort of the garden, which was bathed in glorious sunshine for the entirety of the Irish Bank Holiday weekend B-). He also prints and reads the writings of Christian Tismer[1], David Mertz[2,3] and the Timbot[4,5], in relation to generators, coroutines, continuations, pseudo/weightless/micro-threads, etc.]

Steven Taschuk wrote:
[ Lots of great stuff which illustrates the problems of calling from generator/coroutine code to functional code and vice versa. ]

I now understand. Thanks for taking the time to explain. Thanks to your clear examples, I now picture coroutines in the following way (which I hope is correct :-).

1. The "traditional" function/method call paradigm builds a stack frame every time a function/method is called. Since this stack frame holds the references to the local variables for the function/method, it must be destroyed and collected if memory is not to be "leaked". The only time when the stack frame can be destroyed is when the references it contains are no longer required: i.e. after the instance of the function/method call has finished.

2. The only method by which the "traditional" function call can invoke another "traditional" function/method call is to call it "recursively", thus causing the construction of another stack frame. There is no mechanism for it to "substitute" the stack frame of the called function "in place of" its own stack frame. (Although I believe that Stackless can do this, after much rewriting and reinvention of truth :-).

3. Because all "traditional" function/method calls involve the creation and eventual destruction of a stack frame, I label the space in which they operate "Linear stack space".

4. There is another space, which I call "Generator space". When a call is made into "Generator space", a stack frame is not constructed: at least not every time. Instead, a resumable and persistent stack frame is "resumed": this "resumable stack frame" was created once, sometime in the past: it is termed, in current python terminology, the generator-iterator.

5. When the code in "Generator space" (i.e. the generator-iterator) is called, it resumes immediately after where it last 'yield'ed, and continues until it 'yield's again, or returns or excepts. When it 'yield's, two things happen. A: The resumable stack frame is "frozen", so that it can later be resumed again. And B: A python object, which may be None, is transferred back to the caller.

6. For any call from "Linear stack space" into "Generator space", there is a crossover point, P. When the called code in "Generator space" finishes executing, it can only enter back into "Linear stack space" through point P: it cannot exit through any other point. (According to the current python model).

7. If any calls are made from "Generator space" into "Linear stack space", this leads to the creation of a stack frame, which must be destroyed. If the called function/method in "Linear stack space" then calls back into "Generator space", and the "Linear space" function is not allowed to exit naturally, this is essentially unbound recursion, which will lead eventually to a blown stack. (The non-functioning Producer/Consumer example illustrates this: the code in "Generator space" could be made to work by "traditionally" calling the write and read functions, but this mutual recursion would eventually blow the "Linear space" stack).

8. When a stack frame in "Generator space" 'yield's, it is possible to adopt a convention for the 'yield'ed value: the "dispatcher" convention. Under this convention, the value 'yield'ed can be a reference to another resumable stack frame in Generator space. A special "dispatch"-function is coded to recognise these references, and to immediately call back into "Generator space". The newly resumed stack frame is still limited by the same constraints as the original call into "Generator space": it must eventually return to "Linear stack space" through point P, in this case the dispatcher function.

9. A complex network of interacting "resumable stack frames", or generators, can mutually 'yield' control to each other, assuming the cooperation of the dispatcher function. Execution continually "bounces" back and forth between the dispatch function (Point P, in "Linear stack space") and various 'yield'ed resumable states in "Generator space".

10. A network of interacting generator-iterators could potentially execute much faster than an equivalent series of "traditional" function/method calls in "Linear stack space", since there is no overhead associated with con/destruction of stack frames, no parsing of parameters, etc. Such a network of interacting generators is called "coroutines", and requires the presence of a dispatcher function: the dispatcher function must be explicitly created by the programmer, as python currently exists.

10a. Refinement: In fact python generators are really "semi"-coroutines, because of the requirement to keep 'yield'ing to a dispatcher function. In a language which implemented "full"-coroutines (maybe a __future__ version of python?), the dispatcher would not be required (at least explicitly), and resumable states could transfer control directly to any other resumable state for which they hold a reference.

11. The efficiency benefits of generators can also be realised by *not* adopting the dispatcher convention. Instead, a generator can simply 'yield' the series of values it has been coded to generate: the nodes in a data structure, the numbers in a computed series, etc. This can lead to more natural code, particularly for the calling function which utilises the series of values 'yield'ed: it is mostly unaware that there is a resumable state involved in the generation of the values. Simple example:

    def squares(n):
        for i in xrange(n):
            yield i, i*i

    # Instantiate a generator-iterator
    sqgen = squares(100)

    # We can do the following because the generator-iterator
    # conveniently implements the iterator protocol.
    for i, sq in sqgen:
        print "The square of %d is %d" % (i, sq)

12. It is possible to avoid the "mutual" recursion problem of calling from "Generator space" into "Linear stack space", by the following actions:
    o Moving all "traditional" function/method calls required from "Linear stack space" into "Generator space".
    o By expanding the "dispatcher" convention to allow generators to yield special values representing a function call or a return value. The dispatcher function must then explicitly manage a call stack.

13. It is possible to model ultra-lightweight threads by representing each thread as a simple generator which steps through the states required to implement the functionality required of the "thread". The "execution pointer" of such threads is advanced simply by calling the ".next()" method of the "thread"'s generator-iterator.
(Incidentally, as well as being highly efficient in terms of resource consumption, these ultra-lightweight threads offer much finer control of thread-prioritisation, thread-creation, destruction and "collection", etc.)

Now that I think I've got my head around it, I think I'm going to try my hand at an example or two, which I will post to the newsgroup (might be a few weeks, I'm kinda busy right now). The two main interesting examples that come to mind are:
1. An ultra-lightweight thread implementation of an asyncore/medusa/twisted style socket server.
2. A coroutine based version of something that is normally (relatively) resource-intensive: a series of SAX2 callbacks to a chain of ContentHandlers/Filters.

Lastly, I think I'm beginning to "get" continuations. Although the examples given in Christian Tismer's paper "Continuations and Stackless Python"[1] are difficult to follow without the definition of a continuation object (which seems to require intimate familiarity with the Stackless implementation), I think I understand the general principle. And it's a principle I'm not really interested in pursuing, because I can't see that it will make me any more productive, or my programs more efficient (than they could be using the "coroutines" and "weightless-threads" described above). And I can imagine that continuation code could be a little brain-bending to write (thus making me *less* productive %-), as this example of a while loop (which I sort of understand) shows:

    def looptest(n):
        this = continuation.current()
        k = this.update(n)
        if k:
            this(k-1)
        else:
            del this.link

But I can see the power inherent in the continuations paradigm. Again, many thanks to all who responded to my original post.

Reading material:
[1] http://www.stackless.com/spcpaper.pdf
[2] http://www-106.ibm.com/developerworks/linux/library/l-pygen.html
[3] http://www-106.ibm.com/developerworks/library/l-pythrd.html
[4] http://tinyurl.com/dbyn
[5] http://www.tismer.com/research/stackless/coroutines.tim.peters.html

kind regards,
-- alan kennedy

From: Terry Reedy ([email protected])
Subject: Re: PEP 288: Generator Attributes
Newsgroups: comp.lang.python
Date: 2002-12-09 13:19:27 PST
"John Roth" <[email protected]> wrote in message news:[email protected]... > > "Rocco Moretti" <[email protected]> wrote > > process of set attributes -> call processing function -> repeat is an > > akward way of achieving what you really want to do - continue the > > generator with given data. I'd agree with what Guido said on > > Python-Dev: there is no advantage in this scheme over passing a > > mutable dummy object when the generator is created (you could even > > call it __self__ if you really wanted too ...). > I thoroughly agree. Me too. > with some bemusement, because, for me at least, the most > intuitive thing to do is make yield a built in function rather than > a statement. But then we would need a another keyword (possible, of course) to tell the compiler to change the generated bytecode. It would also break the parallel between 'return something' and 'yield something'. The only difference in execution is that yield does less -- by *not* deleting the execution frame. > That way, it can return a value, which is what > was passed in on the reinvocation. One of the design features of generators is that resumption is extremely quick precisely because the argument processing and function setup steps are bypassed. Just restore the execution frame and go on with the next statement [bytecode actually]. Another design feature is that they are meant to work seemlessly with for statements, with only one explicit call (to produce the iterator that 'for' needs). In this context, passing in additional values is not possible. Another problem is with multiple yields. Consider the following hypothetical generator with the proposed yield() 'function' (see next comment for why the '' marks): def genf(a): b=yield(a) b,c=yield(a+b) yield(a+b+c) gen = genf(1) Then the documentation must say: on the first call to gen.next(), pass one arg; on the next, pass two; finally, don't pass any. Not good, methings. > The trouble with the whole thing is that conceptually, "yield" has two > values: one that it returns to the caller, and one that it > gets from the caller. It's a two way pipe. (You mean, 'would be'.) Pipes and such usually have explicit read and write methods that make the order of operations clear. The yield() proposal hijacks function notation to squash a read and write together -- with the order switched at the two ends. To explain: y=f(x) usually means 'set y to a value depending on (calculated from) x' -- this is the meaning of 'function'. Operationally, a function call means to send x to process f to initialize the corresponding parameter, suspend operation while f operates, and set y to the value returned by f. The combined next(arg)/yield() idea warps and shuffles these semantics. Let p = pipe or whatever: Then (omitting suspend) y = yield(x) could mean p.write(x); y=p.read() where (contrary to the implication of function notation) the value read cannot depend on x since it is calculated at the other end before the write. The corresponding x = gen.next(y) then means x=p.read; p.write(y) which makes the order of value passing the opposite of what a function call means. The only reason to write y is for a possible future call to gen.next(). I think it much better to separate the read and write and pair the writing of y with the reading of the x that functionally depends on the value written. 
One could reverse the meaning of x = gen.next(y) to be p.write(y); x=p.read() but the x read would still not depend on the y written since the latter would have to be set aside and ignored until the predetermined x was sent back. IE, y=yield(x) would have to mean and be implemented as <hidden-slot> = p.read(); p.write(x); y=<hidden-slot> In either case, there is a synchronization problem in that the values sent to the generator are shifted by one call. The last gen.next() call must send a dummy that will be ignored. On the other hand, if, for instance, there is one yield function which depends on the variable reset by the yield function, then that variable must be separately initialized in the initial genf() call. So it arguably makes just as much sense to initialize via genf() with a mutable with paired write/read, send/receive, or set/get methods or the equivalent operations (as with dicts). One can then make explicit calls to pass data in either direction without fake 'function' calls. If one pairs 'yield None' with 'dummy=gen.next()' or 'for dummy in genf(mutable):', one can even do all value passing, in both directions, via the two-way channel or mutable and use 'yield' strictly for suspend/resume synchronization of the coroutines. -- This proposal come close to asking for what the original Stackless did with continuations. These allowed some mind-boggling code. Almost too fancy for what Python is meant to be 8-). Terry J. Reedy
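The two-way yield debated in this thread is essentially what Python 2.5 later added in PEP 342: yield became an expression and generators grew a send() method. A minimal sketch:

    def accumulator():
        total = 0
        while True:
            value = yield total   # yield sends total out and receives the next value
            total += value

    acc = accumulator()
    next(acc)                     # prime the generator: run to the first yield
    print(acc.send(10))           # 10
    print(acc.send(32))           # 42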
From: Tim Peters ([email protected])
Subject: PEP 255: Simple Generators, Revised Posting
Newsgroups: comp.lang.python.announce
Date: 2001-06-24 15:31:44 PST
Major revision: more details about exceptions, return vs StopIteration, and interactions with try/except/finally; more Q&A; and a BDFL Pronouncement. The reference implementation appears solid and works as described here in all respects, so I expect this will be the last major revision (and so also last full posting) of this PEP. The output below is in ndiff format (see Tools/scripts/ndiff.py in your Python distribution). Just the new text can be seen in HTML form here: http://python.sf.net/peps/pep-0255.html "Feature discussions" should take place primarily on the Python Iterators list: mailto:python[email protected] Implementation discussions may wander in and out of Python-Dev too. PEP: 255 Title: Simple Generators - Version: $Revision: 1.3 $ ? ^ + Version: $Revision: 1.12 $ ? ^^ Author: nas@python.ca (Neil Schemenauer), [email protected] (Tim Peters), [email protected] (Magnus Lie Hetland) Discussion-To: python[email protected] Status: Draft Type: Standards Track Requires: 234 Created: 18-May-2001 Python-Version: 2.2 - Post-History: 14-Jun-2001 + Post-History: 14-Jun-2001, 23-Jun-2001 ? +++++++++++++ Abstract This PEP introduces the concept of generators to Python, as well as a new statement used in conjunction with them, the "yield" statement. Motivation When a producer function has a hard enough job that it requires maintaining state between values produced, most programming languages offer no pleasant and efficient solution beyond adding a callback function to the producer's argument list, to be called with each value produced. For example, tokenize.py in the standard library takes this approach: the caller must pass a "tokeneater" function to tokenize(), called whenever tokenize() finds the next token. This allows tokenize to be coded in a natural way, but programs calling tokenize are typically convoluted by the need to remember between callbacks which token(s) were seen last. The tokeneater function in tabnanny.py is a good example of that, maintaining a state machine in global variables, to remember across callbacks what it has already seen and what it hopes to see next. This was difficult to get working correctly, and is still difficult for people to understand. Unfortunately, that's typical of this approach. An alternative would have been for tokenize to produce an entire parse of the Python program at once, in a large list. Then tokenize clients could be written in a natural way, using local variables and local control flow (such as loops and nested if statements) to keep track of their state. But this isn't practical: programs can be very large, so no a priori bound can be placed on the memory needed to materialize the whole parse; and some tokenize clients only want to see whether something specific appears early in the program (e.g., a future statement, or, as is done in IDLE, just the first indented statement), and then parsing the whole program first is a severe waste of time. Another alternative would be to make tokenize an iterator[1], delivering the next token whenever its .next() method is invoked. This is pleasant for the caller in the same way a large list of results would be, but without the memory and "what if I want to get out early?" drawbacks. However, this shifts the burden on tokenize to remember *its* state between .next() invocations, and the reader need only glance at tokenize.tokenize_loop() to realize what a horrid chore that would be. 
Or picture a recursive algorithm for producing the nodes of a general tree structure: to cast that into an iterator framework requires removing the recursion manually and maintaining the state of the traversal by hand. A fourth option is to run the producer and consumer in separate threads. This allows both to maintain their states in natural ways, and so is pleasant for both. Indeed, Demo/threads/Generator.py in the Python source distribution provides a usable synchronized-communication class for doing that in a general way. This doesn't work on platforms without threads, though, and is very slow on platforms that do (compared to what is achievable without threads). A final option is to use the Stackless[2][3] variant implementation of Python instead, which supports lightweight coroutines. This has much the same programmatic benefits as the thread option, but is much more efficient. However, Stackless is a controversial rethinking of the Python core, and it may not be possible for Jython to implement the same semantics. This PEP isn't the place to debate that, so suffice it to say here that generators provide a useful subset of Stackless functionality in a way that fits easily into the current CPython implementation, and is believed to be relatively straightforward for other Python implementations. That exhausts the current alternatives. Some other high-level languages provide pleasant solutions, notably iterators in Sather[4], which were inspired by iterators in CLU; and generators in Icon[5], a novel language where every expression "is a generator". There are differences among these, but the basic idea is the same: provide a kind of function that can return an intermediate result ("the next value") to its caller, but maintaining the function's local state so that the function can be resumed again right where it left off. A very simple example: def fib(): a, b = 0, 1 while 1: yield b a, b = b, a+b When fib() is first invoked, it sets a to 0 and b to 1, then yields b back to its caller. The caller sees 1. When fib is resumed, from its point of view the yield statement is really the same as, say, a print statement: fib continues after the yield with all local state intact. a and b then become 1 and 1, and fib loops back to the yield, yielding 1 to its invoker. And so on. From fib's point of view it's just delivering a sequence of results, as if via callback. But from its caller's point of view, the fib invocation is an iterable object that can be resumed at will. As in the thread approach, this allows both sides to be coded in the most natural ways; but unlike the thread approach, this can be done efficiently and on all platforms. Indeed, resuming a generator should be no more expensive than a function call. The same kind of approach applies to many producer/consumer functions. For example, tokenize.py could yield the next token instead of invoking a callback function with it as argument, and tokenize clients could iterate over the tokens in a natural way: a Python generator is a kind of Python iterator[1], but of an especially powerful kind. - Specification + Specification: Yield ? ++++++++ A new statement is introduced: yield_stmt: "yield" expression_list "yield" is a new keyword, so a future statement[8] is needed to phase - this in. [XXX spell this out] + this in. 
[XXX spell this out -- but new keywords have ripple effects + across tools too, and it's not clear this can be forced into the future + framework at all -- it's not even clear that Python's parser alone can + be taught to swing both ways based on a future stmt] The yield statement may only be used inside functions. A function that - contains a yield statement is called a generator function. + contains a yield statement is called a generator function. A generator ? +++++++++++++ + function is an ordinary function object in all respects, but has the + new CO_GENERATOR flag set in the code object's co_flags member. When a generator function is called, the actual arguments are bound to function-local formal argument names in the usual way, but no code in the body of the function is executed. Instead a generator-iterator object is returned; this conforms to the iterator protocol[6], so in particular can be used in for-loops in a natural way. Note that when the intent is clear from context, the unqualified name "generator" may be used to refer either to a generator-function or a generator- iterator. Each time the .next() method of a generator-iterator is invoked, the code in the body of the generator-function is executed until a yield or return statement (see below) is encountered, or until the end of the body is reached. If a yield statement is encountered, the state of the function is frozen, and the value of expression_list is returned to .next()'s caller. By "frozen" we mean that all local state is retained, including the current bindings of local variables, the instruction pointer, and the internal evaluation stack: enough information is saved so that the next time .next() is invoked, the function can proceed exactly as if the yield statement were just another external call. + Restriction: A yield statement is not allowed in the try clause of a + try/finally construct. The difficulty is that there's no guarantee + the generator will ever be resumed, hence no guarantee that the finally + block will ever get executed; that's too much a violation of finally's + purpose to bear. + + + Specification: Return + A generator function can also contain return statements of the form: "return" Note that an expression_list is not allowed on return statements in the body of a generator (although, of course, they may appear in the bodies of non-generator functions nested within the generator). - When a return statement is encountered, nothing is returned, but a + When a return statement is encountered, control proceeds as in any + function return, executing the appropriate finally clauses (if any - StopIteration exception is raised, signalling that the iterator is ? ------------ + exist). Then a StopIteration exception is raised, signalling that the ? ++++++++++++++++ - exhausted. The same is true if control flows off the end of the + iterator is exhausted. A StopIteration exception is also raised if + control flows off the end of the generator without an explict return. + - function. Note that return means "I'm done, and have nothing ? ----------- + Note that return means "I'm done, and have nothing interesting to ? +++++++++++++++ - interesting to return", for both generator functions and non-generator ? --------------- + return", for both generator functions and non-generator functions. ? +++++++++++ - functions. + + Note that return isn't always equivalent to raising StopIteration: the + difference lies in how enclosing try/except constructs are treated. + For example, + + >>> def f1(): + ... 
try: + ... return + ... except: + ... yield 1 + >>> print list(f1()) + [] + + because, as in any function, return simply exits, but + + >>> def f2(): + ... try: + ... raise StopIteration + ... except: + ... yield 42 + >>> print list(f2()) + [42] + + because StopIteration is captured by a bare "except", as is any + exception. + + + Specification: Generators and Exception Propagation + + If an unhandled exception-- including, but not limited to, + StopIteration --is raised by, or passes through, a generator function, + then the exception is passed on to the caller in the usual way, and + subsequent attempts to resume the generator function raise + StopIteration. In other words, an unhandled exception terminates a + generator's useful life. + + Example (not idiomatic but to illustrate the point): + + >>> def f(): + ... return 1/0 + >>> def g(): + ... yield f() # the zero division exception propagates + ... yield 42 # and we'll never get here + >>> k = g() + >>> k.next() + Traceback (most recent call last): + File "<stdin>", line 1, in ? + File "<stdin>", line 2, in g + File "<stdin>", line 2, in f + ZeroDivisionError: integer division or modulo by zero + >>> k.next() # and the generator cannot be resumed + Traceback (most recent call last): + File "<stdin>", line 1, in ? + StopIteration + >>> + + + Specification: Try/Except/Finally + + As noted earlier, yield is not allowed in the try clause of a try/ + finally construct. A consequence is that generators should allocate + critical resources with great care. There is no restriction on yield + otherwise appearing in finally clauses, except clauses, or in the try + clause of a try/except construct: + + >>> def f(): + ... try: + ... yield 1 + ... try: + ... yield 2 + ... 1/0 + ... yield 3 # never get here + ... except ZeroDivisionError: + ... yield 4 + ... yield 5 + ... raise + ... except: + ... yield 6 + ... yield 7 # the "raise" above stops this + ... except: + ... yield 8 + ... yield 9 + ... try: + ... x = 12 + ... finally: + ... yield 10 + ... yield 11 + >>> print list(f()) + [1, 2, 4, 5, 8, 9, 10, 11] + >>> Example # A binary tree class. class Tree: def __init__(self, label, left=None, right=None): self.label = label self.left = left self.right = right def __repr__(self, level=0, indent=" "): s = level*indent + `self.label` if self.left: s = s + "\n" + self.left.__repr__(level+1, indent) if self.right: s = s + "\n" + self.right.__repr__(level+1, indent) return s def __iter__(self): return inorder(self) # Create a Tree from a list. def tree(list): n = len(list) if n == 0: return [] i = n / 2 return Tree(list[i], tree(list[:i]), tree(list[i+1:])) # A recursive generator that generates Tree leaves in in-order. def inorder(t): if t: for x in inorder(t.left): yield x yield t.label for x in inorder(t.right): yield x # Show it off: create a tree. t = tree("ABCDEFGHIJKLMNOPQRSTUVWXYZ") # Print the nodes of the tree in in-order. for x in t: print x, print # A non-recursive generator. def inorder(node): stack = [] while node: while node.left: stack.append(node) node = node.left yield node.label while not node.right: try: node = stack.pop() except IndexError: return yield node.label node = node.right # Exercise the non-recursive generator. for x in t: print x, print + Both output blocks display: + + A B C D E F G H I J K L M N O P Q R S T U V W X Y Z + Q & A + Q. Why not a new keyword instead of reusing "def"? + + A. See BDFL Pronouncements section below. + - Q. Why a new keyword? Why not a builtin function instead? + Q. Why a new keyword for "yield"? 
Why not a builtin function instead? ? ++++++++++++ A. Control flow is much better expressed via keyword in Python, and yield is a control construct. It's also believed that efficient implementation in Jython requires that the compiler be able to determine potential suspension points at compile-time, and a new - keyword makes that easy. + keyword makes that easy. The CPython referrence implementation also + exploits it heavily, to detect which functions *are* generator- + functions (although a new keyword in place of "def" would solve that + for CPython -- but people asking the "why a new keyword?" question + don't want any new keyword). + + Q: Then why not some other special syntax without a new keyword? For + example, one of these instead of "yield 3": + + return 3 and continue + return and continue 3 + return generating 3 + continue return 3 + return >> , 3 + from generator return 3 + return >> 3 + return << 3 + >> 3 + << 3 + + A: Did I miss one <wink>? Out of hundreds of messages, I counted two + suggesting such an alternative, and extracted the above from them. + It would be nice not to need a new keyword, but nicer to make yield + very clear -- I don't want to have to *deduce* that a yield is + occurring from making sense of a previously senseless sequence of + keywords or operators. Still, if this attracts enough interest, + proponents should settle on a single consensus suggestion, and Guido + will Pronounce on it. + + Q. Why allow "return" at all? Why not force termination to be spelled + "raise StopIteration"? + + A. The mechanics of StopIteration are low-level details, much like the + mechanics of IndexError in Python 2.1: the implementation needs to + do *something* well-defined under the covers, and Python exposes + these mechanisms for advanced users. That's not an argument for + forcing everyone to work at that level, though. "return" means "I'm + done" in any kind of function, and that's easy to explain and to use. + Note that "return" isn't always equivalent to "raise StopIteration" + in try/except construct, either (see the "Specification: Return" + section). + + Q. Then why not allow an expression on "return" too? + + A. Perhaps we will someday. In Icon, "return expr" means both "I'm + done", and "but I have one final useful value to return too, and + this is it". At the start, and in the absence of compelling uses + for "return expr", it's simply cleaner to use "yield" exclusively + for delivering values. + + + BDFL Pronouncements + + Issue: Introduce another new keyword (say, "gen" or "generator") in + place of "def", or otherwise alter the syntax, to distinguish + generator-functions from non-generator functions. + + Con: In practice (how you think about them), generators *are* + functions, but with the twist that they're resumable. The mechanics of + how they're set up is a comparatively minor technical issue, and + introducing a new keyword would unhelpfully overemphasize the + mechanics of how generators get started (a vital but tiny part of a + generator's life). + + Pro: In reality (how you think about them), generator-functions are + actually factory functions that produce generator-iterators as if by + magic. In this respect they're radically different from non-generator + functions, acting more like a constructor than a function, so reusing + "def" is at best confusing. A "yield" statement buried in the body is + not enough warning that the semantics are so different. + + BDFL: "def" it stays. 
No argument on either side is totally + convincing, so I have consulted my language designer's intuition. It + tells me that the syntax proposed in the PEP is exactly right - not too + hot, not too cold. But, like the Oracle at Delphi in Greek mythology, + it doesn't tell me why, so I don't have a rebuttal for the arguments + against the PEP syntax. The best I can come up with (apart from + agreeing with the rebuttals ... already made) is "FUD". If this had + been part of the language from day one, I very much doubt it would have + made Andrew Kuchling's "Python Warts" page. Reference Implementation - A preliminary patch against the CVS Python source is available[7]. + The current implementation, in a preliminary state (no docs and no + focused tests), is part of Python's CVS development tree[9]. + Using this requires that you build Python from source. + + This was derived from an earlier patch by Neil Schemenauer[7]. Footnotes and References [1] PEP 234, http://python.sf.net/peps/pep-0234.html [2] http://www.stackless.com/ [3] PEP 219, http://python.sf.net/peps/pep-0219.html [4] "Iteration Abstraction in Sather" Murer , Omohundro, Stoutamire and Szyperski http://www.icsi.berkeley.edu/~sather/Publications/toplas.html [5] http://www.cs.arizona.edu/icon/ [6] The concept of iterators is described in PEP 234 http://python.sf.net/peps/pep-0234.html [7] http://python.ca/nas/python/generator.diff [8] http://python.sf.net/peps/pep-0236.html + [9] To experiment with this implementation, check out Python from CVS + according to the instructions at + http://sf.net/cvs/?group_id=5470 Copyright This document has been placed in the public domain. Local Variables: mode: indented-text indent-tabs-mode: nil End:
From: Tim Peters ([email protected])
Subject: RE: Stackless & String-processing
Newsgroups: comp.lang.python
Date: 1999/07/15
[Neel Krishnaswami] > ... > I've been looking at Icon, and it occurs to me that if coroutines and > generators were available at the Python level, it might yield a way of > doing string processing that is more "Pythonic" than regexps. Yes, and you can find much more about this in the very early days of the String SIG archives. > Regexps are nice because when you have a pattern that they can > directly represent, then you can simply specify a pattern and then you > don't have to worry about the tedious bookkeeping of looping over the > string and keeping track of the state variable. > > However, when you need to match a pattern that a regexp can't match, > then suddenly you need to break the string into tokens with regexps > and then loop over the pieces and keep track of a bunch of state > variables that don't necessarily correspond to the pieces you are > actually interested in. > > This is unpleasant, because a) clumsy bookkeeping is bad, and b) > there's two different sorts of syntax needed to do basically > similar tasks. The latter is one of the arguments Griswold gave in the paper that introduced Icon, contrasting Icon's uniform approach to SNOBOL4's distinct pattern and procedural sublanguages. I don't think he'd make the same argument today! Icon is an idiosyncratic delight, but most string pattern matching was in fact easier (to write, modify, or read) in SNOBOL4. Once you start using Icon for complex pattern matching, you'll soon discover that it's very hard to go back to old code and disentangle "the pattern" from "the processing" -- *because* everything looks the same, and it's all intermixed. The distinct sublanguages in SNOBOL4 enforced a clean separation; and that was more often an aid than a burden. Have noted before that writing string matching code in Icon feels a lot like writing in an assembly language for SNOBOL4. Of course the latter didn't have Icon's power in other areas (e.g. it's at least possible to program structural pattern-matchers in Icon, using generators to match e.g. tree patterns; SNOBOL4's patterns started & ended with strings). > If we could compose generators just like functions, then the bookkeeping > can be abstracted away and the same approach will work for arbitrarily > complicated parsing tasks. The real advantage of regexps is speed, & that's probably while they'll always be much more popular. SNOBOL4 and Icon didn't define "bal" builtins because you couldn't code that pattern yourself <wink>. bal is beyond a regexp's abilities, but it's still a simple kind of pattern, and just runs too darned slow if implemented via the recursive definition as run with the general backtracking machinery. > So I think it would be nice if these lovely toys were available at > the Python level. Is this a possibility? I've been bugging Guido about generators since '91, but for the billion and one uses other than string matching. Anything is possible -- except that I'll ever stop bugging him <wink>. > (I'd love to be able to define coroutines in Python like so: > > def evens(z): > for elt in z: > if z % 2 == 0: > suspend(z) How do you intend to resume it? > It would probably require some rethinking of Python's iteration > protocol though.) That's not a hangup; Guido already knows how to do it cleanly. Like any Big New Feature there are real questions about costs vs benefits, and so far this stuff has been widely viewed as "too esoteric". I think it's natural as breathing, to the point that a *non*-resumable function strikes me as never inhaling <wink>. 
For now, there's a working implementation of a generator protocol (somewhat more flexible than Icon's) in the source distribution, under Demo/threads/Generator.py. I also posted a general (thread-based) coroutine implementation a bit over 5 years ago. Building these on Christian's stackless Python instead should run much faster. BTW, generators are much easier to implement than coroutines -- the former don't require threads or stacklessness or continuations under the covers, just some relatively straightforward changes to Python's current implementation (Steven Majewski noted this 5 years ago). This is why generators work in all ports of Icon, but coroutines are an optional feature that's supported only if a platform guru has taken the time to write the platform-dependent context-switching assembly code Icon coexps require. degeneratoredly y'rs - tim
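The generator protocol sketched in this exchange did arrive with PEP 255 in Python 2.2; in modern Python the evens example, with its small slips corrected (testing elt rather than z), is simply:

    def evens(z):
        for elt in z:
            if elt % 2 == 0:
                yield elt

    print(list(evens(range(10))))   # [0, 2, 4, 6, 8]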
From: Daniel Larsson ([email protected])
Subject: Re: Python and Java
Newsgroups: comp.lang.python, comp.lang.modula3
Date: 1996/07/03
Aaron Watters wrote:
>
> James C. Phillips wrote:
> >
> > Daniel Larsson wrote:
> > > Coroutines are still rare in newer languages. I'm sure there
> > > are more, but I know only of Modula-2 and BETA.
> >
> > For us non-computer-science-types, what is a coroutine?
> > I've never heard this term before.
> >
> > -Jim
>
> I've never really used them, but as I recall they are like
> subroutines, except that more than one coroutine can be
> active at once and one coroutine can explicitly give control
> to another or something... I personally don't understand why
> you need programming language features to emulate this kind of
> behaviour, but I'm probably ill informed and wrong.
>
> Maybe some expert can tell me: is there anything you can do
> with coroutines that can't be emulated directly with instances
> of classes in Python or M3? Please educate... -- Aaron Watters

I would hardly describe myself as an expert in coroutines, but anyway... Suppose you have a binary tree class:

    class Node:
        def __init__(self, elem, left=None, right=None):
            self.elem = elem
            self.left, self.right = left, right

    class BinTree:
        def __init__(self):
            self.root = None
        def scan(self, node):
            if node:
                self.scan(node.left)
                suspend node.elem   # Suspend coroutine and return value
                self.scan(node.right)
        coroutine traverse(self):
            self.scan(self.root)
            return None             # Terminate coroutine

A coroutine has its own stack. When a coroutine is called, the caller's stack is saved, and the execution is moved to the callee's stack. When a coroutine suspends, stacks are reversed again. In the above case, each call to traverse will do one inorder step in the tree and return the value at that node. Here's how to use it to print the contents of a binary tree:

    tree = BinTree()
    ... # init tree
    while 1:
        elem = tree.traverse()
        if elem == None:
            break
        print elem

This could certainly be implemented in Python without coroutines, and I don't even know if this is even a good example of the benefits of coroutines. Anyway, some problems where you would like to use threads might be easier to solve with coroutines, since you don't have any synchronization problems (you have explicit control of when switching between threads). Hope I didn't confuse too many people out there...
-- Daniel Larsson, ABB Industrial Systems AB

From: Christian Tismer ([email protected])
Subject: Re: Iterators & generators (RE: Real Problems with Python)
Newsgroups: comp.lang.python
Date: 2000/02/15
Howdy, Andrew Cooke wrote: > > In article <000001bf768e$48e40580$45a0143f@tim>, > "Tim Peters" <[email protected]> wrote: > > def traverse_post(self): > > for child in self.left, self.right: > > if child is not None: > > suspend child.traverse_post() > > suspend self.data > > That finally hammered home to me just what continuations are all about. > Does anyone have something similarly elegant that shows a coroutine? > I've just had a look at the stackless python documentation and > coroutines seem to be described as something coming out of a single > detailed example. Is there a simpler definition? (I did look back > through some posts on Deja, but there was nothing recent that seemed to > explain what a coroutine actually is - sorry if I've missed something > obvious). What did you read, actually? Here some pointers: Homepage: http://www.tismer.com/research/stackless/ IPC8 paper: http://www.tismer.com/research/stackless/spcpaper.htm Slides: http://www.tismer.com/research/stackless/spc-sheets.ppt The latter gives you a bit of understanding what a continuation is. Tim Peters about coroutines can be found here: http://www.tismer.com/research/stackless/coroutines.tim.peters.html More on coroutines by Sam Rushing: http://www.nightmare.com/~rushing/copython/index.html On Scheme, continuations, generators and coroutines: http://www.cs.rice.edu/~dorai/t-y-scheme/ Revised Scheme 5 report: http://www.swiss.ai.mit.edu/~jaffer/r5rs_toc.html The following books are also highly recommended: - Andrew W. Appel, Compiling with Continuations, Cambridge University Press, 1992 - Daniel P. Friedman, Mitchell Wand, and Christopher T. Haynes, Essentials of Programming Languages, MIT Press, 1993 - Christopher T. Haynes, Daniel P. Friedman, and Mitchell Wand, Continuations and Coroutines, Computer Languages, 11(3/4): 143-153, 1986. - Strachey and Wadsworth, Continuations: A mathematical semantics which can deal with full jumps. Technical monograph PRG-11, Programming Research Group, Oxford, 1974. I don't think this is easy stuff at all, and you can't expect to find a simple answer by skimming a couple of web pages. It took me a lot of time to understand a fair part of this, and this is also a showstopper to get this stuff to be used. > Also, comp.lang.lisp is currently dissing continuations. Would that be > because the alternative is to pass the code that will process the node > into the iterator as a (first class) function? Obviously, in this case, > yes, but is that true in general (are there examples where continuations > or coroutines make something possible that really is tricky to do in > other ways)? An iterator is quite an easy thing, and it can be implemented without continuations rather easily. Continuations cover problems of another order of magnitude. It is due to too simple examples that everybody thinks that coroutines and generators are the only target. Continuations can express any kind of control flow, and they can model iterators, coroutines and generators easily. The opposite is not true! I know this isn't enough to convince you, but at the time I have no chance. I need to build some very good demo applications which use continuations without exposing them to the user, and this is admittedly difficult. ciao - chris -- Christian Tismer :^) <mailto:[email protected]> Applied Biometrics GmbH : Have a break! Take a ride on Python's Düppelstr. 
31 : *Starship* http://starship.python.net 12163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home
We add a new special form, (coroutine (lambda (v) <body>)), that evaluates to a coroutine object. Name a coroutine the same way you name an ordinary function; e.g.
    (define Coroutine-1 (coroutine (lambda (v) ...)))
The body of a coroutine is a non-terminating expression using standard Scheme constructs and one additional form. This is (resume <co> <val>), where <co> refers to any other coroutine defined in the same way.
The parameter v receives the value passed the first time the coroutine is called or resumed.
Suppose coroutine A executes (resume B x).
- If this is the first time B has been called, x is passed to B's parameter v.
- Otherwise, B was halted by having executed a (resume ...) call of its own; x is returned as the value of that call to resume.
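Since the above is only a protocol description, here is a rough Python analogue of the same resume semantics, sketched with generators and a small trampoline; all the names (Resume, coroutine, run, ping, pong) are invented for the illustration.

    class Resume:
        """Request that the trampoline switch to another coroutine."""
        def __init__(self, target, value):
            self.target = target
            self.value = value

    def coroutine(body):
        """Create a coroutine: advance the generator to its first yield,
        where it waits for the value v delivered by the first resume."""
        gen = body()
        next(gen)
        return gen

    def run(co, value):
        """Trampoline: keep switching to whichever coroutine was last resumed."""
        while True:
            request = co.send(value)          # resume co; get its next request
            if not isinstance(request, Resume):
                return request                # anything else ends the run
            co, value = request.target, request.value

    def ping_body():
        v = yield                             # v: value passed on the first resume
        for _ in range(3):
            v = yield Resume(pong, v + 1)     # like (resume pong (+ v 1))
        yield v                               # a plain value ends the trampoline

    def pong_body():
        v = yield
        while True:
            v = yield Resume(ping, v * 10)    # like (resume ping (* v 10))

    ping = coroutine(ping_body)
    pong = coroutine(pong_body)
    print(run(ping, 1))                       # bounces between ping and pong, prints 2110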
This PEP discusses changes required to core Python in order to efficiently support generators, microthreads and coroutines. It is related to PEP 220, which describes how Python should be extended to support these facilities. The focus of this PEP is strictly on the changes required to allow these extensions to work.
While these PEPs are based on Christian Tismer's Stackless[1] implementation, they do not regard Stackless as a reference implementation. Stackless (with an extension module) implements continuations, and from continuations one can implement coroutines, microthreads (as has been done by Will Ware[2]) and generators. But in more than a year, no one has found any other productive use of continuations, so there seems to be no demand for their support.
However, Stackless support for continuations is a relatively minor piece of the implementation, so one might regard it as "a" reference implementation (rather than "the" reference implementation).
Background
Generators and coroutines have been implemented in a number of languages in a number of ways. Indeed, Tim Peters has done pure Python implementations of generators[3] and coroutines[4] using threads (and a thread-based coroutine implementation exists for Java). However, the horrendous overhead of a thread-based implementation severely limits the usefulness of this approach.
Microthreads (a.k.a "green" or "user" threads) and coroutines involve transfers of control that are difficult to accommodate in a language implementation based on a single stack. (Generators can be done on a single stack, but they can also be regarded as a very simple case of coroutines.)
Real threads allocate a full-sized stack for each thread of control, and this is the major source of overhead. However, coroutines and microthreads can be implemented in Python in a way that involves almost no overhead. This PEP, therefore, offers a way for making Python able to realistically manage thousands of separate "threads" of activity (vs. today's limit of perhaps dozens of separate threads of activity).
Another justification for this PEP (explored in PEP 220) is that coroutines and generators often allow a more direct expression of an algorithm than is possible in today's Python.
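As a rough illustration (not part of the PEP) of how cheap such threads can be, plain generators already allow a toy round-robin scheduler where the only per-thread state is a frame:

    from collections import deque

    def worker(name, steps):
        for i in range(steps):
            print(f"{name} step {i}")
            yield                      # voluntary context switch

    def run(threads):
        ready = deque(threads)
        while ready:
            thread = ready.popleft()
            try:
                next(thread)           # resume until the next yield
            except StopIteration:
                continue               # thread finished; drop it
            ready.append(thread)       # otherwise put it back in the ready queue

    run([worker("a", 2), worker("b", 3)])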
Discussion
The first thing to note is that, while Python mingles interpreter data (normal C stack usage) with Python data (the state of the interpreted program) on the stack, the two are logically separate. They just happen to use the same stack.
A real thread gets something approaching a process-sized stack because the implementation has no way of knowing how much stack space the thread will require. The stack space required for an individual frame is likely to be reasonable, but stack switching is an arcane and non-portable process, not supported by C.
Once Python stops putting Python data on the C stack, however, stack switching becomes easy.
The fundamental approach of the PEP is based on these two ideas. First, separate C's stack usage from Python's stack usage. Secondly, associate with each frame enough stack space to handle that frame's execution.
In the normal usage, Stackless Python has a normal stack structure, except that it is broken into chunks. But in the presence of a coroutine / microthread extension, this same mechanism supports a stack with a tree structure. That is, an extension can support transfers of control between frames outside the normal "call / return" path.
Problems
The major difficulty with this approach is C calling Python. The problem is that the C stack now holds a nested execution of the byte-code interpreter. In that situation, a coroutine / microthread extension cannot be permitted to transfer control to a frame in a different invocation of the byte-code interpreter. If a frame were to complete and exit back to C from the wrong interpreter, the C stack could be trashed.
The ideal solution is to create a mechanism where nested executions of the byte code interpreter are never needed. The easy solution is for the coroutine / microthread extension(s) to recognize the situation and refuse to allow transfers outside the current invocation.
We can categorize code that involves C calling Python into two camps: Python's implementation, and C extensions. And hopefully we can offer a compromise: Python's internal usage (and C extension writers who want to go to the effort) will no longer use a nested invocation of the interpreter. Extensions which do not go to the effort will still be safe, but will not play well with coroutines / microthreads.
Generally, when a recursive call is transformed into a loop, a bit of extra bookkeeping is required. The loop will need to keep its own "stack" of arguments and results since the real stack can now only hold the most recent. The code will be more verbose, because it's not quite as obvious when we're done. While Stackless is not implemented this way, it has to deal with the same issues.
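The kind of bookkeeping meant here is easy to show in miniature. The following sketch is only an illustration of the general transformation (it is not Stackless code): a recursive tree walk becomes a loop that keeps its own explicit stack of pending work.

# Illustration of replacing recursion with a loop plus an explicit stack.
def sum_tree_recursive(node):
    # node is either None or a (value, left, right) triple
    if node is None:
        return 0
    value, left, right = node
    return value + sum_tree_recursive(left) + sum_tree_recursive(right)

def sum_tree_iterative(node):
    total = 0
    stack = [node]                   # the loop's own "stack" of pending nodes
    while stack:
        n = stack.pop()
        if n is None:
            continue
        value, left, right = n
        total += value
        stack.append(left)
        stack.append(right)
    return total

tree = (1, (2, None, None), (3, (4, None, None), None))
assert sum_tree_recursive(tree) == sum_tree_iterative(tree) == 10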
In normal Python, PyEval_EvalCode is used to build a frame and execute it. Stackless Python introduces the concept of a FrameDispatcher. Like PyEval_EvalCode, it executes one frame. But the interpreter may signal the FrameDispatcher that a new frame has been swapped in, and the new frame should be executed. When a frame completes, the FrameDispatcher follows the back pointer to resume the "calling" frame.
So Stackless transforms recursions into a loop, but it is not the FrameDispatcher that manages the frames. This is done by the interpreter (or an extension that knows what it's doing).
The general idea is that where C code needs to execute Python code, it creates a frame for the Python code, setting its back pointer to the current frame. Then it swaps in the frame, signals the FrameDispatcher and gets out of the way. The C stack is now clean - the Python code can transfer control to any other frame (if an extension gives it the means to do so).
In the vanilla case, this magic can be hidden from the programmer (even, in most cases, from the Python-internals programmer). Many situations present another level of difficulty, however.
The map builtin function involves two obstacles to this approach. It cannot simply construct a frame and get out of the way: not only is there a loop involved, but each pass through the loop requires some "post" processing. In order to play well with others, Stackless constructs a frame object for map itself.
Most recursions of the interpreter are not this complex, but fairly frequently some "post" operations are required. Stackless does not fix these situations, because of the amount of code change required. Instead, Stackless prohibits transfers out of a nested interpreter. While not ideal (and sometimes puzzling), this limitation is hardly crippling.
Advantages
For normal Python, the advantage to this approach is that C stack usage becomes much smaller and more predictable. Unbounded recursion in Python code becomes a memory error, instead of a stack error (and thus, in non-Cupertino operating systems, something that can be recovered from). The price, of course, is the added complexity that comes from transforming recursions of the byte-code interpreter loop into a higher order loop (and the attendant bookkeeping involved).

The big advantage comes from realizing that the Python stack is really a tree, and the frame dispatcher can transfer control freely between leaf nodes of the tree, thus allowing things like microthreads and coroutines.
References
[2] http://world.std.com/~wware/uthread.html
[3] Demo/threads/Generator.py in the source distribution
Python semi-coroutines

From: Steven D. Majewski ([email protected])
Subject: Python semi-coroutines
Newsgroups: comp.lang.python
Date: 1998/07/24
Since this has come up a couple of times, I finally dug thru my files and found a few traces of this experiment and put the files on <http://galen.med.virginia.edu/~sdm7g/Python/CO>

BTW: I considered the experiment a partial failure, because it would not work to implement full coroutines (because ceval.c is recursive and some of the state is implicit in the C calling stack), but I believe it did successfully implement semi-coroutines (i.e. Icon-like generators) -- which seems to be all that some folks want. The mods were done to python-1.0.2; I haven't looked at how hard this would be to update to 1.5.1.

--- README ---

There were experimental mods to python-1.0.2 to make frameobjects into resumable continuations. (Put here for the curious, and just in case anyone wants to try a similar experiment with a more current release.) The files modified were frameobject.h and ceval.c. A new opcode was added for SUSPEND, but no new suspend statement was added to the parser. I used a python disassembler/assembler to change specific RETURN opcodes into SUSPENDs. (Various .py files here were hacked in that manner and used for testing.) co.txt & cont.semantics were the beginning of some notes on various thread control issues. py-co is part of the mailing list correspondence between me, Tim and Guido that started me on that experiment. More notes on this later when I find time.

- Steve Majewski <[email protected]>
Microthreads are useful when you want to program many behaviors happening simultaneously. Simulations and games often want to model the simultaneous and independent behavior of many people, many businesses, many monsters, many physical objects, many spaceships, and so forth. With microthreads, you can code these behaviors as Python functions. You will still need to think a teeny bit about the fact that context switching occurs between threads, hence this documentation. The microthread package uses Stackless Python.
Microthreads switch faster and use much less memory than OS threads. You can run thousands of microthreads simultaneously. Additionally, the microthread library includes a rich set of objects for inter-thread communication, synchronization, and execution control.
Some sort of resolution to Stackless Python seems likely for 2.1. Guido is inclined to take a solution for 90% of the problems: "I still think that the current Stackless implementation is too complex, and that continuations aren't worth the insanity they seem to require (or cause :-), but that microthreads and coroutines *are* worth having and that something not completely unlike Stackless will be one day the way to get there..." He then went on to post a strawman API for microthreads:
http://www.python.org/pipermail/python-dev/2000-November/017216.html

Christian Tismer agreed with him that continuations aren't really necessary. "I'm happy to toss continuations for core Python, if we can find the right building blocks for coro/gen/uthreads. I think Guido comes quite near this, already."

http://www.python.org/pipermail/python-dev/2000-November/017252.html

And so did Tim: "I don't know of any comprehensible application of continuations that can't be done without them."

http://www.python.org/pipermail/python-dev/2000-November/017258.html

Christian Tismer enumerated the changes to ceval.c made by Stackless:

http://www.python.org/pipermail/python-dev/2000-November/017238.html

Finally, Gordon McMillan put up a Stackless intro and tutorial:
http://www.mcmillan-inc.com/stackless.html
Traditional pattern matching languages, like SNOBOL4, represent patterns as an abstract data type. A special and specific pattern evaluation routine traverses both the pattern structure and the subject text, applying evaluation rules as indicated by the pattern, and advancing or regressing depending on their outcome.
A co-expression can be thought of as an independent, encapsulated thread-like context, where the results of the expression can be picked off one at a time. Let us consider an example: suppose you are writing a program that generates code, and you need something that will generate names for you. This expression will generate names:
"name" || seq()(
seq
produces an infinite sequence of integers, by default starting at 1.) Of course, an expression exists at one point in the code; we need to separate the evaluation of the expression from its textual position in the program. We do this by creating a co-expression:c := create ("name" || seq())Now wherever a label name is desired, it can be obtained by activating the co-expression
c
:tempvar_name := @cAfter a co-expression has produced all its results, further evaluation with
@
will fail. The^
operator produces a new co-expression with the same expression as its argument, but `rewound' to the beginning.c := ^c
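The same name generator is easy to express with a Python generator playing the role of the co-expression. The translation below is our own sketch; itertools.count stands in for Icon's seq, next() for the @ operator, and `rewinding' is done by simply creating a fresh generator.

import itertools

def names(prefix="name"):
    # analogue of create ("name" || seq()): an endless stream name1, name2, ...
    for i in itertools.count(1):
        yield prefix + str(i)

c = names()
print(next(c))        # name1   (plays the role of @c)
print(next(c))        # name2

c = names()           # analogue of c := ^c, rewound to the beginning
print(next(c))        # name1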
Co-routines are of particular use in file processing and simulation applications.
In file processing, co-routines enable the programmer to separate records or characters in time, rather than in space (i.e. physical file position), and view his program in terms of data flow from module to module, rather than flow of control.
In simulation, co-routines allow a more natural modelling of a set of co-operating processes. It should be stressed that although co-routines share a number of properties with asynchronous processes (preservation of local variables etc.), which make modelling easy, co-routines are not separate processes, and the user must still manage flow of control between them.
For a formal definition of a co-routine and a full explanation of the fundamental concepts, the reader is referred to the technical literature (Knuth, CACM etc.).
Preserving State:
The current state of execution of a function can be largely described by its program instruction counter (IC) and its stack frame and pointer. The IC gives the location where execution is taking place, and the stack frame provides the values of all arguments and other storage local to the function. If the IC, stack frame, and stack pointer are preserved when a function is suspended, the function may be resumed at the exact point of suspension by restoring these values.
In regular functions, the current state of the function is lost when the function returns; the only way the state may be preserved when transferring to another function is to call the other function as a subroutine of the first. Then, of course, the state of the subroutine must be lost when it returns to resume execution in its caller.
In a like manner, one can conceive of a group of functions, like those that constitute a program, which can only preserve the group state by calling other groups of functions (programs) as sub-programs. The state of a sub-program, as with the state of a called function, vanishes when the sub-program returns to its caller. The states of both the caller and callee cannot be easily preserved.
Co-routines, on the other hand, are groups of functions which have been given a mechanism for preserving their states before transferring to other co-routines. Transfers among co-routines do not involve the regular caller/callee hierarchy, and the states of all co-routines may be thought of as existing concurrently.
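A tiny dispatcher makes the point concrete. The sketch below is our own illustration, using Python generators as the suspendable units: control is handed round-robin between two routines rather than flowing through a caller/callee hierarchy, and each routine's local counter survives every transfer.

# Illustration: two "co-routines" whose local state persists across transfers.
def greeter(times):
    for i in range(times):
        print("greeter: hello number", i)
        yield                        # suspend; local i is preserved

def counter(times):
    for i in range(times):
        print("counter: tick", i)
        yield                        # suspend; local i is preserved

def run_round_robin(routines):
    # Neither routine calls the other; a neutral dispatcher switches between them.
    while routines:
        r = routines.pop(0)
        try:
            next(r)
            routines.append(r)       # still alive, schedule it again
        except StopIteration:
            pass                     # finished; drop it

run_round_robin([greeter(2), counter(3)])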
See Also:
expl b coro - the B co-routine package
8.5 Coroutines
We've spent the last several pages on almost microscopic details of process behavior. Rather than continue our descent into the murky depths, we'll revert to a higher-level view of processes.
Earlier in this chapter, we covered ways of controlling multiple simultaneous jobs within an interactive login session; now we'll consider multiple process control within shell programs. When two (or more) processes are explicitly programmed to run simultaneously and possibly communicate with each other, we call them coroutines.
This is actually nothing new: a pipeline is an example of coroutines. The shell's pipeline construct encapsulates a fairly sophisticated set of rules about how processes interact with each other. If we take a closer look at these rules, we'll be better able to understand other ways of handling coroutines-most of which turn out to be simpler than pipelines.
When you invoke a simple pipeline, say ls | more, the shell invokes a series of UNIX primitive operations, a.k.a. system calls. In effect, the shell tells UNIX to do the following things; in case you're interested, we include in parentheses the actual system call used at each step:
Create two subprocesses, which we'll call P1 and P2 (the fork system call).
Set up I/O between the processes so that P1's standard output feeds into P2's standard input (pipe).
Start /bin/ls in process P1 (exec).
Start /bin/more in process P2 (exec).
Wait for both processes to finish (wait).
You can probably imagine how the above steps change when the pipeline involves more than two processes.
Now let's make things simpler. We'll see how to get multiple processes to run at the same time if the processes do not need to communicate. For example, we want the processes dave and bob to run as coroutines, without communication, in a shell script. Our initial solution would be this:
dave &
bob

Assume for the moment that bob is the last command in the script. The above will work-but only if dave finishes first. If dave is still running when the script finishes, then it becomes an orphan, i.e., it enters one of the "funny states" we mentioned earlier in this chapter. Never mind the details of orphanhood; just believe that you don't want this to happen, and if it does, you may need to use the "runaway process" method of stopping it, discussed earlier in this chapter.
8.5.1 wait
There is a way of making sure the script doesn't finish before dave does: the built-in command wait. Without arguments, wait simply waits until all background jobs have finished. So to make sure the above code behaves properly, we would add wait, like this:
dave &
bob
wait

Here, if bob finishes first, the parent shell will wait for dave to finish before finishing itself.
If your script has more than one background job and you need to wait for specific ones to finish, you can give wait the same type of job argument (with a percent sign) as you would use with kill, fg, or bg.
However, you will probably find that wait without arguments suffices for all coroutines you will ever program. Situations in which you would need to wait for specific background jobs are quite complex and beyond the scope of this book.
8.5.2 Advantages and Disadvantages of Coroutines
In fact, you may be wondering why you would ever need to program coroutines that don't communicate with each other. For example, why not just run bob after dave in the usual way? What advantage is there in running the two jobs simultaneously?
If you are running on a computer with one processor (CPU), then there is a performance advantage-but only if you have the bgnice option turned off (see Chapter 3, Customizing Your Environment), and even then only in certain situations.
Roughly speaking, you can characterize a process in terms of how it uses system resources in three ways: whether it is CPU intensive (e.g., does lots of number crunching), I/O intensive (does a lot of reading or writing to the disk), or interactive (requires user intervention).
We already know from Chapter 1 that it makes no sense to run an interactive job in the background. But apart from that, the more two or more processes differ with respect to these three criteria, the more advantage there is in running them simultaneously. For example, a number-crunching statistical calculation would do well when running at the same time as a long, I/O-intensive database query.
On the other hand, if two processes use resources in similar ways, it may even be less efficient to run them at the same time as it would be to run them sequentially. Why? Basically, because under such circumstances, the operating system often has to "time-slice" the resource(s) in contention.
For example, if both processes are "disk hogs," the operating system may enter a mode where it constantly switches control of the disk back and forth between the two competing processes; the system ends up spending at least as much time doing the switching as it does on the processes themselves. This phenomenon is known as thrashing; at its most severe, it can cause a system to come to a virtual standstill. Thrashing is a common problem; system administrators and operating system designers both spend lots of time trying to minimize it.
8.5.3 Parallelization
But if you have a computer with multiple CPUs (such as a Pyramid, Sequent, or Sun MP), you should be less concerned about thrashing. Furthermore, coroutines can provide dramatic increases in speed on this type of machine, which is often called a parallel computer; analogously, breaking up a process into coroutines is sometimes called parallelizing the job.
Normally, when you start a background job on a multiple-CPU machine, the computer will assign it to the next available processor. This means that the two jobs are actually-not just metaphorically-running at the same time.
In this case, the running time of the coroutines is essentially equal to that of the longest-running job plus a bit of overhead, instead of the sum of the run times of all processes (although if the CPUs all share a common disk drive, the possibility of I/O-related thrashing still exists). In the best case-all jobs having the same run time and no I/O contention-you get a speedup factor equal to the number of jobs.
Parallelizing a program is often not easy; there are several subtle issues involved and there's plenty of room for error. Nevertheless, it's worthwhile to know how to parallelize a shell script whether or not you have a parallel machine, especially since such machines are becoming more and more common.
We'll show how to do this-and give you an idea of some of the problems involved-by means of a simple task whose solution is amenable to parallelization.
Task 8.3
Write a utility that allows you to make multiple copies of a file at the same time.
We'll call this script mcp. The command mcp filename dest1 dest2 ... should copy filename to all of the destinations given. The code for this should be fairly obvious:
file=$1
shift
for dest in "$@"; do
    cp $file $dest
done

Now let's say we have a parallel computer and we want this command to run as fast as possible. To parallelize this script, it's a simple matter of firing off the cp commands in the background and adding a wait at the end:
file=$1
shift
for dest in "$@"; do
    cp $file $dest &
done
wait

Simple, right? Well, there is one little problem: what happens if the user specifies duplicate destinations? If you're lucky, the file just gets copied to the same place twice. Otherwise, the identical cp commands will interfere with each other, possibly resulting in a file that contains two interspersed copies of the original file. In contrast, if you give the regular cp command two arguments that point to the same file, it will print an error message and do nothing.
To fix this problem, we would have to write code that checks the argument list for duplicates. Although this isn't too hard to do (see the exercises at the end of this chapter), the time it takes that code to run might offset any gain in speed from parallelization; furthermore, the code that does the checking detracts from the simple elegance of the script.
As you can see, even a seemingly trivial parallelization task has problems resulting from multiple processes having concurrent access to a given system resource (a file in this case). Such problems, known as concurrency control issues, become much more difficult as the complexity of the application increases. Complex concurrent programs often have much more code for handling the special cases than for the actual job the program is supposed to do!
Therefore it shouldn't surprise you that much research has been and is being done on parallelization, the ultimate goal being to devise a tool that parallelizes code automatically. (Such tools do exist; they usually work in the confines of some narrow subset of the problem.) Even if you don't have access to a multiple-CPU machine, parallelizing a shell script is an interesting exercise that should acquaint you with some of the issues that surround coroutines.
8.5.4 Coroutines with Two-way Pipes
Now that we have seen how to program coroutines that don't communicate with each other, we'll build on that foundation and discuss how to get them to communicate-in a more sophisticated way than with a pipeline. The Korn shell has a set of features that allow programmers to set up two-way communication between coroutines. These features aren't included in most Bourne shells.
If you start a background process by appending |& to a command instead of &, the Korn shell will set up a special two-way pipeline between the parent shell and the new background process. read -p in the parent shell reads a line of the background process' standard output; similarly, print -p in the parent shell feeds into the standard input of the background process. Figure 8.2 shows how this works.
Figure 8.2: Coroutine I/O
This scheme has some intriguing possibilities. Notice the following things: first, the parent shell communicates with the background process independently of its own standard input and output. Second, the background process need not have any idea that a shell script is communicating with it in this manner. This means that the background process can be any pre-existing program that uses its standard input and output in normal ways.
Here's a task that shows a simple example:
Task 8.4
You would like to have an online calculator, but the existing UNIX utility dc(1) uses Reverse Polish Notation (RPN), a la Hewlett-Packard calculators. You'd rather have one that works like the $3.95 model you got with that magazine subscription. Write a calculator program that accepts standard algebraic notation.
The objective here is to write the program without re-implementing the calculation engine that dc already has-in other words, to write a program that translates algebraic notation to RPN and passes the translated line to dc to do the actual calculation. [12]
[12] The utility bc(1) actually provides similar functionality.
We'll assume that the function alg2rpn, which does the translation, already exists: given a line of algebraic notation as argument, it prints the RPN equivalent on the standard output. If we have this, then the calculator program (which we'll call adc) is very simple:
dc |&
while read line'?adc> '; do
    print -p "$(alg2rpn $line)"
    read -p answer
    print " = $answer"
done

The first line of this code starts dc as a coroutine with a two-way pipe. Then the while loop prompts the user for a line and reads it until the user types [CTRL-D] for end-of-input. The loop body converts the line to RPN, passes it to dc through the pipe, reads dc's answer, and prints it after an equal sign. For example:
$ adc
adc> 2 + 3
 = 5
adc> (7 * 8) + 54
 = 110
adc> ^D
$

Actually-as you may have noticed-it's not entirely necessary to have a two-way pipe with dc. You could do it with a standard pipe and let dc do its own output, like this:
{ while read line'?adc> '; do
    print "$(alg2rpn $line)"
done } | dc

The only difference from the above is the lack of equal sign before each answer is printed.
But: what if you wanted to make a fancy graphical user interface (GUI), like the xcalc program that comes with many X Window System installations? Then, clearly, dc's own output would not be satisfactory, and you would need full control of your own standard output in the parent process. The user interface would have to capture dc's output and display it in the window properly. The two-way pipe is an excellent solution to this problem: just imagine that, instead of print " = $answer ", there is a call to a routine that displays the answer in the "readout" section of the calculator window.
All of this suggests that the two-way pipe scheme is great for writing shell scripts that interpose a software layer between the user (or some other program) and an existing program that uses standard input and output. In particular, it's great for writing new interfaces to old, standard UNIX programs that expect line-at-a-time, character-based user input and output. The new interfaces could be GUIs, or they could be network interface programs that talk to users over links to remote machines. In other words, the Korn shell's two-way pipe construct is designed to help develop very up-to-date software!
8.5.5 Two-way Pipes Versus Standard Pipes
Before we leave the subject of coroutines, we'll complete the circle by showing how the two-way pipe construct compares to regular pipelines. As you may have been able to figure out by now, it is possible to program a standard pipeline by using |& with print -p.
This has the advantage of reserving the parent shell's standard output for other use. The disadvantage is that the child process' standard output is directed to the two-way pipe: if the parent process doesn't read it with read -p, then it's effectively lost.
3. Use Coroutines
A multiphase algorithm in which the phases are linked by temporary files (or arrays) can be reduced to a single-pass algorithm using coroutines.
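In Python the same idea can be sketched with generators: each phase consumes the previous phase's records as they are produced, so no temporary file or array is needed between phases. This is only an illustration of the fusion, not IRIX-specific code.

# Three phases fused into a single pass; no intermediate files or arrays.
def read_records(lines):
    for line in lines:               # phase 1: raw input
        yield line.strip()

def parse(records):
    for rec in records:              # phase 2: convert to numbers
        yield int(rec)

def total(values):
    s = 0
    for v in values:                 # phase 3: final reduction
        s += v
    return s

raw = ["10\n", "20\n", "12\n"]
print(total(parse(read_records(raw))))   # 42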
Coroutines are defined and described in detail in Knuth (Volume I) and most other modern books on algorithmics. Under IRIX, you can write coroutines in a natural way using any one of three models:
- The UNIX model of forked processes that communicate using pipes.
- The POSIX threads model using POSIX message queues to communicate.
- The MPI (or PVM) model of cooperating processes exchanging messages.
Coroutines are subroutines, with neither the caller nor the callee being "in charge". Instead, they allow program-controlled interleaving of instructions generated by both. Suppose A calls B. Then B wants to allow A to perform some more computation. B can "resume A", which then runs until it "resumes B". Then A can execute until it needs data from B, which might produce part of that data, and resume A, to examine or compute with the part produced so far. Coroutines have been exploited in the past in compilers, where the "parser" asks the "lexer" to run until the lexer has to stop (say at end of line). The lexer then resumes the parser to process that line's data, and is itself resumed to continue reading input characters. The text also shows an example of a tree-comparison problem solved logically by coroutines. Their advantage is that the cooperative behavior allows the "high-level" program to terminate the computation early, before the companion routine "completes" its assigned task. I have also used them to simulate parallel computation, when I want to build my own control over the task scheduling process.
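The parser/lexer arrangement described above maps naturally onto a generator in Python. The sketch below is our own minimal illustration of that division of labor (a real compiler front end would, of course, be far richer): the lexer is resumed only when the parser needs another token.

# A lexer that yields tokens on demand, resumed by the parser as needed.
def lexer(text):
    for word in text.split():
        if word.isdigit():
            yield ("NUM", int(word))
        else:
            yield ("OP", word)

def parse_sum(tokens):
    # Accepts "NUM (+ NUM)*" and returns the sum, pulling tokens lazily.
    kind, value = next(tokens)
    assert kind == "NUM"
    total = value
    for kind, value in tokens:
        assert kind == "OP" and value == "+"
        kind, value = next(tokens)
        assert kind == "NUM"
        total += value
    return total

print(parse_sum(lexer("1 + 2 + 39")))   # 42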
As an interesting exercise, the text shows how coroutines can be simulated in C, using C's "setjmp()" and "longjmp()" library procedures. These procedures are intended for use in setting exception-handler routines. However, they have the property that they create concrete realizations of a "stopped" task -- an instruction counter, along with a variable reference context is stored when a setjmp occurs, and is resumed when a longjmp to the saved item is performed. The longjmp(Buf, Return) causes the setjmp(Buf) to return (again), this time returning value Return, instead of the 0 setjmp(Buf) returns when it is called.
coroutines and continuations ( in Python? - long discussion )
Steven D. Majewski ([email protected])
Fri, 29 Apr 1994 13:58:25 -0400
A while ago there was a (crossposted) thread about pipes: most of the discussion was about Perl, but it caused me to repost my redirect.py modules ( to show how *easy* the problem was in Python! ;-), and also to think more about the utility of having a more general 2-way interface.

The tostring/tolines functions in redirect are able to package the printed output from a function and return it as a python object. This gives some of the capability of unix pipes without having to actually pipe thru the OS - the non-trivial semantic difference being that redirect does not produce any concurrency - the inner function must run to completion before the redirecting wrapper can return the collected output to the calling function.
Actual concurrency of independent threads is not required, though: coroutine would be sufficient since the read/write is a blocking operation that could be an implicit transfer of control.
I know Guido is not as enamored of coroutines as old Icon-ers like Tim and I. He has (more than once, I think) stated the view that anything you could do with coroutines, you can do with classes. This view is certainly true from a theoretical POV - complementarity of the object=method+data and closure=function+environment and all that, and coroutines are just a subset of continuations, etc., etc. But the difference is that saving of state into an object's instance variables has to be explicitly programmed, whereas with coroutines and continuations it is implicit. ( And being able to turn a pair of read/writes in a pair of functions into a coroutine return/continue, mediated transparently by a file-like python object, would seem to be a potential big win for code reuse. )

It is possible to program a pair of classes that cooperate with each other, but there is no way to make a class that can link two arbitrary functions by their read/write methods. The flow of control is not symmetrical: one of the functions must RETURN before the other can continue.
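The contrast Majewski draws can be seen in a few lines of present-day Python (generators arrived years after this post, so the comparison is ours, not his). With a class, the position in the loop must be saved by hand into instance variables; with a generator, the same state lives implicitly in the suspended frame.

# Explicit state in an object versus implicit state in a suspended frame.
class Squares:
    def __init__(self):
        self.i = 0                   # state saved by hand

    def next_value(self):
        self.i += 1
        return self.i * self.i

def squares():
    i = 0
    while True:
        i += 1
        yield i * i                  # i and the resume point kept implicitly

s1, s2 = Squares(), squares()
print(s1.next_value(), s1.next_value())   # 1 4
print(next(s2), next(s2))                 # 1 4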
The fact that Python frames are (heap allocated) objects and not just pushed on the stack means that they can persist whether that function is active or not. In fact, an exception passes you a link to a whole bunch of frameobjects in its traceback object.

A frameobject appears to contain all of the necessary state to serve as a continuation, except that the interpreter's stackpointer and blockpointer are not preserved, and when a stack frame is unwound by an exception, the stack is emptied by popping and DECREF-ing the contents.

It looks to me that it would not be difficult to give Python continuations: if a stackpointer and blockpointer are added to the frameobject, all that is missing is an operation to package a frameobject-continuation, and a 'resume' operation ( a change to ceval.c ) that takes a frame/continuation object instead of creating a new frameobject.
I am far from a complete understanding of ceval and what happens to frame and blocks in Python, so I may be missing some obvious problem ( And all the details of *exactly* how to do this aren't completely clear to me yet, either ).
Comments ?
What I initially wanted was some sort of cooperative coroutine control between two functions mediated by a file-like object that used its read/write methods to alternate control between the two functions. Except for inet servers, most of the cases where I might want threads seem to actually fit a blocking coroutine
model better than parallel threads, and in the cases where it doesn't, I'm quite happy to use unix processes for threading. I would like to have threads on dos/mac, but I don't know if any of the portable (on unix) threads packages work on those platforms, or how much work it would be to get really portable threads written in C to work in Python. But since the easiest method of adding coroutine control appears to be the more general case of adding continuations to Python, I would think that this also gives a possible method of adding preemptive threads to the Python interpreter in a portable manner by coding it into the virtual machine rather than the real machine.
Comments ?
[ Encouraging or _discouraging_ comments invited! If I'm blind
to some fatal flaw or don't understand what's going on in
ceval, I'd be happy to hear about it before I actually
waste time in trying to implement it! ]
-- Steve Majewski (804-982-0831) <[email protected]> --
-- UVA Department of Molecular Physiology and Biological Physics --
-- Box 449 Health Science Center Charlottesville,VA 22908 --
[ "Cognitive Science is where Philosophy goes when it dies ...
if it hasn't been good!" - Jerry Fodor ]
Subject: Re: Coroutines (was: Compiler.)
From: [email protected] (Eric Werme - replace nobody with werme)
Date: 1998/05/05
Message-ID: <[email protected]>
Newsgroups: alt.folklore.computers,alt.sys.pdp10

[email protected] (Alan Bowler) writes in a.f.c:

>In article <[email protected]> [email protected] writes:
>>
>>Another neat thing that was used was the notion of co-routines.
>>Perhaps somebody more qualified would talk about this?
>
>You mean "co-operative multitasking with threads" :-)
Well, there are coroutines that take many instructions to switch between and there are coroutines that take one instruction. Barbara was talking about the latter. The definition of a coroutine as I heard it is that the two code paths use the same mechanism to transfer control back and forth. Of the many styles of subroutine calls on the PDP-10, JSP ac,addr is the fastest, as it's the only one that doesn't require a memory store. Its ISP is something like:
ac = PC
PC = effective address    [addr in the usual case]

The subroutine return, of course, is:

JRST (ac)

Here, the effective address is the contents of the register.
The coroutine instruction combined the two:
JSP ac,(ac)
This essentially exchanged the PC with ac.
There's one big catch here - there can't be unanswered pushes or pops in either piece of code; this is one reason why many people equate coroutines with context switches and the exchange of many registers.
I have two good examples to describe. I'll put the second one in a separate posting.
I wrote the client side of TOPS-10 TELNET for the Arpanet that ran at Carnegie Mellon, Harvard, and Rutgers. Telnet has several multi-character sequences and they can be split across message boundaries. TOPS-10 made it easiest for us to get a message at interrupt level and step through each octet in sequence, calling the TTY input code as necessary. However, parsing the Telnet options is more easily done by code that can call a subroutine to fetch one character at a time.
So I compromised. The network side looked something like:

prcmsg: PUSHJ P,SAVE1       ;Save P1
        MOVE P1,LAST_PC(F)  ;Get PC where telnet left off
        PUSHJ P,GET_CH      ;Get next byte from message
        JRST DONE           ;None left
        JSP P1,(P1)         ;call telnet processor
        JRST LOOP
DONE:   MOVEM P1,LAST_PC(F)
        POPJ P,

Not too exciting.
The telnet side looked something like:
PRCTEL: CAIN T1,IAC             ;Telnet escape?
        JRST NORMCH             ;just text
        JSP P1,(P1)             ;get next character
        CAIGE T1,MINTEL         ;command in range 240 - 255
        JRST BAD                ;Out of range
        JRST DISPTBL-MINTEL(T1) ;Dispatch
        ...
WONT:   JSP P1,(P1)             ;Get option code
        ...

The nice thing about all this was that the telnet processor had no idea that some of its coroutine calls actually resulted in dismissing an interrupt.
Another way of looking at this code is to see a state machine where the PC is the state variable.
A decade later when I was at Alliant, I fielded a phone call from a customer with a Macintosh machine that was sometimes having trouble with its Telnet link. The customer had managed to trace it to telnet commands
split between two TCP messages. I really tried to be sympathetic, but I was firm that Alliant's system, while perhaps not being Mac-friendly, was compliant with the protocols and that I was sure a Macintosh should
be able to handle split telnet commands.

Other architectures have coroutine instructions too. On the PDP-11:
JSR Rx,@Rx
The Intel 860 _almost_ has one:
calli r1
(Calli is like jsp r1,ea in that the return pc is stored in r1.) However, the i860 manual says this is a no-no.

I should know if Alpha has one, but I just don't do enough assembly language here.
--
<> Eric (Ric) Werme <> The above is unlikely to contain <>
<> ROT-13 addresses: <> official claims or policies of <>
<> <[email protected]> <> Digital Equipment Corp. <>
<> <[email protected]> <> http://www.cyberportal.net/werme