Reading some papers for my class last night, and in particular Andrew's "Reactive Objects" paper, has stimulated a few thoughts about Infopipe DSL issues. I thought I'd write these down and send them out to stimulate further thoughts and discussion.
Input/Output
------------
It occurs to me that streaming applications are fundamentally I/O driven. Well, all programs are, I guess, but I/O is particularly prominent in streaming applications since they are all about moving data from one place to another, often with some transformations along the way and some timing constraints. So we need to hide the complexity of programming I/O. The complexity we want to avoid includes approaches such as (a) using blocking I/O calls with multi-threading, (b) using non-blocking I/O calls with callbacks, and (c) polling, perhaps timer driven. We need to be able to support concurrent waiting, both within a single Infopipe component and across Infopipe components. We also need to support active and passive sources and sinks. The reactive object approach seems to me to be a useful starting point for all of this.

Concurrency
-----------
Obviously, the Infopipe abstraction lends itself nicely to pipelined concurrency - meaning that, conceptually, one stage of a pipeline, represented as an Infopipe component, is processing one item at the same time as another stage (component) is processing a different item. We need to support this kind of concurrency virtually (on a single CPU) and truly (on several CPUs). True concurrency includes running different components on different CPUs of a shared memory multiprocessor, and running them on different nodes in a distributed system. We need to do this without forcing the application programmer to deal with the complexity of local or remote inter-component communication and synchronization.

So far, I have said nothing about concurrency within an Infopipe component, but we may also want to support this. In this case the concurrency may be constrained by ordering constraints on the output. For example, an encoder component may encode two successive video frames in parallel on an SMMP, but must output the first before the second. For this kind of concurrency the Infopipe abstraction doesn't help us as much, but the property specifications might be a good starting point for generating synchronization code with different behaviors (a rough sketch of one such scheme follows the Synchronization section below).

Synchronization
---------------
We need to think about what synchronization means in a reactive model. In a single-CPU system one might make event handling non-preemptable and quit worrying about synchronization. However, we will have to support true concurrency, so we need to think about synchronization. It occurs to me that the synchronization for the pipelined concurrency case always follows a producer-consumer pattern; this is forced upon it by the Infopipe abstraction. It also occurs to me that this passing of items among components may be viewed as adding and removing items from a linked list. For tasks like this there are some highly efficient non-blocking synchronization approaches that run well on shared memory multiprocessors. If a DSL compiler could generate this kind of code automatically, that would save a lot of programming complexity. We need to look closely to see whether this stuff is appropriate, though. For true concurrency within a component we need to look at high-performance approaches to synchronization. Taking a look at techniques like RCU, and thinking about whether it would be possible to automatically generate code for certain synchronization patterns, would be an interesting research topic.
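To make the producer-consumer point a bit more concrete, here is roughly the kind of code I imagine the compiler emitting for the buffer on the edge between two adjacent stages: a single-producer/single-consumer ring that needs no locks, because each index is written by only one side. This is just a sketch to show the shape of the thing - the names, the choice of C, and the fixed ring size are all invented, not a proposal for the actual generated code.

    #include <stdatomic.h>
    #include <stdbool.h>

    #define RING_SLOTS 64u             /* power of two, chosen arbitrarily */

    struct item;                       /* whatever flows through the pipe */

    struct spsc_ring {
        _Atomic unsigned head;         /* advanced only by the consumer */
        _Atomic unsigned tail;         /* advanced only by the producer */
        struct item *slot[RING_SLOTS];
    };

    /* Producer side: called by the upstream component; returns false if full. */
    static bool ring_push(struct spsc_ring *r, struct item *it)
    {
        unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
        unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
        if (tail - head == RING_SLOTS)
            return false;              /* full: drop, block, or back off */
        r->slot[tail % RING_SLOTS] = it;
        atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
        return true;
    }

    /* Consumer side: called by the downstream component; returns NULL if empty. */
    static struct item *ring_pop(struct spsc_ring *r)
    {
        unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
        unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);
        if (head == tail)
            return NULL;               /* empty */
        struct item *it = r->slot[head % RING_SLOTS];
        atomic_store_explicit(&r->head, head + 1, memory_order_release);
        return it;
    }

Each edge between two components would get its own ring, and what the producer does when the ring is full (drop, block, back off) is exactly the sort of behavior a property specification could select.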
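And for the intra-component case (the encoder example above), the ordering constraint on the output could be handled with something like a ticket scheme: frames carry sequence numbers, workers encode them in parallel, and a small release gate holds a finished frame until its predecessors have gone out. Again, this is purely illustrative - the names and the pthreads choice are mine, and the point is only that the compiler could plausibly generate something equivalent from an "ordered output" property.

    #include <pthread.h>

    /* Release gate: workers finish frames in any order but emit them in
       sequence-number order. */
    struct ordered_gate {
        pthread_mutex_t lock;
        pthread_cond_t  turn;
        unsigned long   next_seq;      /* sequence number allowed to emit next */
    };

    static void gate_init(struct ordered_gate *g)
    {
        pthread_mutex_init(&g->lock, NULL);
        pthread_cond_init(&g->turn, NULL);
        g->next_seq = 0;
    }

    /* Called by a worker after it has finished encoding frame 'seq'. */
    static void emit_in_order(struct ordered_gate *g, unsigned long seq,
                              void (*emit)(void *frame), void *frame)
    {
        pthread_mutex_lock(&g->lock);
        while (seq != g->next_seq)         /* wait until earlier frames are out */
            pthread_cond_wait(&g->turn, &g->lock);
        emit(frame);                       /* push the frame downstream */
        g->next_seq++;
        pthread_cond_broadcast(&g->turn);  /* wake whoever holds seq + 1 */
        pthread_mutex_unlock(&g->lock);
    }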
DSL Compiler
------------
If the DSL compiler could generate the code needed to hide concurrent I/O waiting within and across components, deal with active and passive I/O, synchronize data exchange between producer and consumer components, and marshal data for exchange across networks, we would have gone a long way toward achieving our goal. The compiler and runtime also need to somehow schedule the execution of pipeline components in order to satisfy the timing and prioritization properties.

Also
----
If we implement Infopipe components as reactive objects or event handlers, we will need to consider how to do event dispatching. This is roughly the equivalent of a scheduling policy in a thread-based approach. It seems to me that event dispatching policy is going to be application specific and will probably encompass notions of fairness, timeliness, and graceful degradation under load, as well as performance issues having to do with making best use of the underlying resources (processor/cache affinity, etc.).

This view also raises the question of whether buffers are components at the same level as other Infopipe components, since for the event dispatching policies mentioned above one would have to have event queues/buffers between components, with the servicing of those queues/buffers determined by the dispatching policy. In such a system, do you think it is more sensible to view Priority-Progress Streaming as an Infopipe component or as an event dispatching policy?

-- Jon
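P.S. A couple of rough sketches to make some of the above more concrete. All of the names and interfaces below are invented for illustration; I'm not proposing them as the actual design. First, the kind of push/pull component skeleton the compiler might generate so that active and passive sources and sinks compose without the programmer writing any I/O or threading code:

    #include <stddef.h>

    struct infopipe {
        void  (*push)(struct infopipe *self, void *item);  /* passive input:
                                                               upstream hands us an item */
        void *(*pull)(struct infopipe *self);               /* passive output:
                                                               downstream asks for an item,
                                                               NULL if nothing is ready */
        struct infopipe *next;     /* downstream neighbour, set when composed */
        void *state;               /* component-private state */
    };

    /* The user-supplied body of the component: transform one item. */
    void *transform(void *state, void *item);

    /* Generated glue for a simple passive filter: react to a pushed item,
       transform it, and push the result downstream.  An active source would
       instead be driven by its own thread or timer events in the runtime and
       would call next->push() itself; an active sink would call pull() on its
       upstream neighbour. */
    static void filter_push(struct infopipe *self, void *item)
    {
        void *out = transform(self->state, item);
        if (out != NULL && self->next != NULL)
            self->next->push(self->next, out);
    }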
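Second, the kind of dispatcher loop I have in mind for the event dispatching question: queues sit on the edges between components, and a pluggable policy decides which queue to service next. Priority-Progress Streaming would then show up as one such policy (pick the queue whose head item is most urgent) rather than as a component.

    /* A "queue" here is the buffer on the edge between two components;
       handler() is the downstream component's reactive event handler. */
    struct edge_queue {
        int   (*empty)(struct edge_queue *q);   /* 1 if nothing is queued   */
        void *(*pop)(struct edge_queue *q);     /* next item, NULL if empty */
        void  (*handler)(void *item);           /* downstream event handler */
        int     priority;                       /* whatever the policy uses */
    };

    /* Dispatching policy: pick the next edge to service, or -1 if nothing is
       runnable.  Fairness, timeliness, graceful degradation, cache affinity,
       priority-progress, etc. would all live behind this hook. */
    typedef int (*dispatch_policy)(struct edge_queue *edges, int nedges);

    static void dispatch_loop(struct edge_queue *edges, int nedges,
                              dispatch_policy policy)
    {
        for (;;) {
            int i = policy(edges, nedges);
            if (i < 0)
                continue;                       /* real code would block here */
            void *item = edges[i].pop(&edges[i]);
            if (item)
                edges[i].handler(item);         /* run one handler to completion */
        }
    }

    /* One possible policy: service the highest-priority non-empty queue.
       A priority-progress policy would look at timestamps/deadlines instead. */
    static int highest_priority_first(struct edge_queue *edges, int nedges)
    {
        int best = -1;
        for (int i = 0; i < nedges; i++)
            if (!edges[i].empty(&edges[i]) &&
                (best < 0 || edges[i].priority > edges[best].priority))
                best = i;
        return best;
    }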