If you're anything like us, you've probably written countless scripts to manipulate data in various ways. The more operations and transformations you pile onto the data, the more your code starts to look like spaghetti. What if you could write small pieces of logic that each perform a specific task, then connect them so that the output of one task becomes the input of another? This is precisely how Unix pipes work.
Pypes is an attempt to bring that same ideology into a more modern world. Pypes lets you write small applications called "components". Each component performs a specific operation on a stream of data objects called "packets". This programming paradigm is known as flow-based programming, and pypes supports a subset of flow-based features.
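To make the idea concrete, here is a minimal sketch of components and packets in plain Python. This is not the actual pypes API; the function names and the dict-based packets are illustrative, showing only how small transformations compose like Unix pipes:

```python
# Each "component" transforms a stream of dict "packets" and yields them
# downstream; components compose by feeding one's output into the next.
# (Illustrative only -- real pypes components use the pypes framework.)

def uppercase(packets):
    """Component: upper-case the 'body' field of each packet."""
    for packet in packets:
        packet["body"] = packet["body"].upper()
        yield packet

def word_count(packets):
    """Component: annotate each packet with its word count."""
    for packet in packets:
        packet["words"] = len(packet["body"].split())
        yield packet

# Connect components: the output of one becomes the input of another.
source = [{"body": "hello pypes"}, {"body": "flow based programming"}]
results = list(word_count(uppercase(source)))
```

Because each stage is a generator, packets flow through the chain lazily, one at a time, much like lines through a Unix pipeline.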
Pypes also provides a rich Web 2.0 interface that lets users connect components with simple drag-and-drop. The UI was heavily inspired by Yahoo! Pipes and provides a similar experience. The end result is a visual programming environment where non-developers can build complex data-flow architectures for manipulating digital content.
If you are a developer, pypes lets you write custom components. The UI detects any custom components and makes them available for use. Pypes even provides project templates for generating new components; the templates create the boilerplate code needed to write a custom component, along with the build scripts used to produce the final product. If you're familiar with Python, components are nothing more than egg files that define specific entry points.
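As a rough sketch of what that packaging looks like, the snippet below shows an entry-point mapping of the kind a `setup.py` would pass to setuptools. The group name `pypes.plugins` and all module/class names here are assumptions for illustration; the project templates generate the real boilerplate:

```python
# Hypothetical entry-point wiring for a custom component packaged as an
# egg.  The group name "pypes.plugins" and the names below are
# illustrative, not the actual values pypes expects.
ENTRY_POINTS = {
    "pypes.plugins": [
        "Uppercase = mycomponents.uppercase:Uppercase",
    ],
}

# In setup.py this mapping would be handed to setuptools, e.g.:
#   from setuptools import setup
#   setup(name="MyComponents", version="0.1",
#         packages=["mycomponents"], entry_points=ENTRY_POINTS)
```

At runtime, the framework can then discover installed components by scanning that entry-point group, which is how the UI picks up custom components automatically.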
Pypes is more than just a pretty face, though. The underlying framework can take advantage of multi-core/multi-CPU architectures while avoiding the overhead of traditional threading models. Each component is an abstraction over a Stackless tasklet and consumes only a few hundred bytes of memory. Context switching between components is extremely cheap, allowing systems to scale up and take full advantage of the underlying hardware.
At the same time, the system scales out through its REST interface. For large data streams, more nodes can be added to build out entire clusters capable of processing large numbers of documents in parallel. The system is completely decentralized (no single point of failure) and homogeneous (every node is identical, working independently toward a common goal).
Lastly, pypes is not limited to linear pipelines. It fully supports directed acyclic graphs (DAGs), giving you the ability to publish content streams to multiple endpoints and aggregate several input streams with branching and merging operations.
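The branch-and-merge idea can be sketched in plain Python, again as an illustration of the concept rather than the pypes API. One input stream is published to two branches, each branch processes its copy, and the branch outputs are merged back into a single stream:

```python
# Illustrative DAG sketch: branch one stream to two consumers, process
# each branch, then merge the results.  Not the actual pypes API.
import itertools

def branch(packets, n=2):
    """Duplicate a packet stream for n independent consumers."""
    return itertools.tee(packets, n)

def tag(packets, label):
    """Component: tag each packet with the branch that processed it."""
    for packet in packets:
        tagged = dict(packet)
        tagged["via"] = label
        yield tagged

def merge(*streams):
    """Aggregate several input streams into one output stream."""
    return itertools.chain(*streams)

source = iter([{"id": 1}, {"id": 2}])
left, right = branch(source)
merged = list(merge(tag(left, "A"), tag(right, "B")))
```

In a real deployment the branches would be distinct components (say, an indexer and an archiver) rather than a simple tagger, but the graph shape is the same: one publisher, multiple endpoints, and a merge point.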
Our original intent when we started developing pypes was to build a framework for indexing content for search. That typically means hundreds of millions of documents that have to be scrubbed, transformed, and indexed, which played a major role in our design decisions and pushed us to create something highly scalable. Even so, the system also scales down nicely, allowing it to run on very modest hardware.
Whenever we introduce someone new to pypes, we hear about interesting ideas we never envisioned. We've come to realize that the system has the potential to solve a number of problems that were never part of the original roadmap. If you have interesting ideas of your own, we'd like to hear about them.
The pypes development team
- RESTful HTTP Interface (Web 2.0)
- Pure Message Passing (JSON objects)
- Distributed (Decentralized & Homogeneous)
- Micro-Threaded (Components are Stackless tasklets)
- Multi-processor support (Utilize SMP and multicore hardware)
- Lightweight (Scales down)
- Pluggable Architecture (Create your own components)
- Project Templates (Generate boilerplate components)
- Simple Authentication (Password protection)
- Import/Export Projects (Share your project)
- Stackless Python 2.6.x
- Mailing List (Google Groups)