nabu: a Publishing System using Text Files
"Nabu is the Babylonian god of wisdom and writing [...] Nabu is accorded the office of patron of the scribes, taking over from the Sumerian goddess Nisaba. His consort is Tashmetum. His symbols are the clay writing tablet with the writing stylus. [...] His power over human existence is immense, because Nabu engraves the destiny of each person, as the Gods have decided, on the tablets of sacred record."
- Reporting Bugs
- Installation and Dependencies
- Copyright and License
Assuming that a user can easily create text documents --this is the case for most programmers and techies, we do this all day, and we all have our favourite text editors-- this system allows you to store various kinds of data across multiple documents. You can create and maintain a body of text files from which various elements automatically could get stored in an organized manner in a database.
Nabu is a simple framework that extracts chunks of various types of information from documents written in simple text files (written with reStructuredText conventions) and that stores this information (including the document) in a remote database for later retrieval. The processing and extraction of the document is handled on a server, and there is a small and simple client that is used to push the files to the server for processing and storage (think rsync). The client requires only Python to work. The presentation layer is left unspecified: you can use whichever web application framework you like to present the extracted data in the way that you prefer.
It is probably best illustrated by some examples...
For example, imagine that you write several files about your travels, say, one for each destination. Most likely they contain various contact information. You could insert them like this in any of your text files and they would automatically be extracted to a global list of contact info:
:Contact: :n: Martin Blais :e: email@example.com :a: 1 rue de la Montagne, Mongueuil, France
You could use this to feed into an online address book system, or create a birthday notification system for yourself by adding your birth date.
Another example: you could easily build a blogging system by detecting documents with certain bibliographic fields and a date at the top of document, for example:
Cookies and Cream Today ======================= :Blog: general :Date: 2005-06-25 Ahh, I like to talk about cookies and cream...
All you would have to do is present the documents in an orderly way, e.g. by category, by date, just the most recent, whatever you like.
Another idea: you could automatically extract all the URLs from the documents pushed to the server (across all the files) and automatically build an online RSS bookmarks feeder, where your bookmarks would be organized by keywords, maybe even fetching the keywords in the respective documents from which they were extracted.
Various simple extractors will come with the Nabu package itself, but we suspect you will very soon want to write your own. Writing extractors is very easy.
All of this is built using the very powerful docutils tools, which includes the reStructuredText parser and provides the data structures that represent your source documents. Extractors are simply docutils transforms that visit the document structure (which is provided and saved by the Nabu system). To add a new type of entry, all you need to do is write code that visits the tree, finds what you want, and write a simple object to store it in the way you prefer. You can use any system you like to store the data in a database or in text files (SQL DBAPI, an ORM, files, ...).
At the moment I have not had time to write complex extractors yet, due to work/time constraints, but I will soon write a system to selectively publish lots of disparate documents using Nabu and the extractors I will write for this will come with the Nabu distribution. I'll be happy to include external contributions of extractor codes in the distribution if anyone cares to provide them.
This system is made to be a simple as possible to use. However, it is not designed for your mother. In order to be able to use this efficiently, you must
- understand the conventions and simple no-markup syntax of reStructuredText;
- be able to edit simple text files.
I suspect that this will cater to people who are already familiar with computers.
For more details, see the design document and project proposal under the doc subdirectory.
Nabu is cool, because:
it is flexible: you can use
- any text editor you like to edit the files;
- any source code control system you like to store and maintain them (or none);
- any database for storage;
- and any web application framework for presentation. Nabu does not dictate how the information is presented/served to the clients;
you edit files locally, not in a bleeping web browser window (programmers will appreciate the value of this), in your favourite editor environment;
the organization of the source files in subdirectories has nothing to do with how the content is presented. We use a unique ID system (similar to arch) where your document to be published must contain a unique string to mark it with that id. You can put that string in a reStructuredText comment or a bibliographic field. Unlike Wikis, this allows you to change the title of your documents while keeping the possibility of a permanent link to them.
It effectively offers you a sandbox for creating content, and then how you organize and present the content is dictated by ways that you decide, most likely independent of the source file organization structure;
the input data can be scattered over many files, it does not have to be stored in files per-category (for example, you don't have to store all your "contacts" in a single "address book" file, they can be found within/across all your body of published file and a server might present as a single list if desired). I conjecture that this may be closer to how humans think of this data. This body of files can be used to create a mind-mapping system;
we recognize that the value of the information lies in the source itself, the text files. This valuable source remains with you, and you are free to manage them in any way you prefer, with any version control system you like (if you want to do that). You can completely dump the data stored in the database and rebuild it from the text files;
various semantic chunks of content are automatically extracted from your document. These semantic things are easily written with little code and are configurable. Nabu comes with example content extractors;
a light-weight program with minimal dependencies is used to upload the files to the server. The server processes the files for content. This maximizes the potential that you will be able to use Nabu anywhere, on any platform. The client only requires Python to work;
- Not a Wiki
- Although we upload documents (like Wiki pages) to a server, it may be that the presentation layer never serves the pages/documents themselves. Maybe it just serves bits of information extracted from the files. Also, documents are identified by a unique string, and not by Wiki names. This allows one to use whatever title for single documents and to create permanent links to specific documents (as long as the file ID is maintained);
- Not just a Blogging system
It can be used to add any document set to a database. Organization by date and category is simply implemented, but not a requirement. You present the data set (which can be anything) in your preferred manner.
You could, however, very easily implement a flexible and very complete blogging system using Nabu.
- Not a Word Processing system
It is not meant to just serve documents. You can use it to publish just the information that is found within documents, used to contain that information.
Also, you write your documents locally, but they have the limitations of reStructuredText, so it is not really like a Word processing system. You must use a simple text editor.
- Not a generic data entry format
Although you could potentially use it to enter larges lists of similarly structured data, the overhead of document processing does not make it the most efficient way to do this. If you have lots of similarly structured bits of human-editable data, it would probably be better to write custom code to parse a known format rather than leverage Nabu for this. The docutils processing on the server is not extremely fast, and if you need speed for lots of data (say, thousands of files), you will probably find this too slow.
However, for smaller sets of files, Nabu could be used this way. Or if you have to input various kinds of structures within different contexts which can be grouped in files.
Nabu sits between all these things. I do not know what to call it, so I just call it "Nabu".
- Server Setup: instructions for installing and configuring a Nabu server;
- Writing an Extractor: an example extractor source code, with details and commentary. Since you are expected to provide or at least configure your own extractors for your application, this will be useful if you intend to put Nabu to good use. This could also help you understand more precisely the nature of what Nabu actually provides and what it does not provide;
- TODO list: list of future changes, broken things, etc.;
- Change Log: recent changes, history;
A Mercurial repository can be found at:
It consists mostly of server code, but it also contains the publisher client. When I get some feedback I will decide on the version number.
You will need a recent snapshot of the docutils svn development tree (after stamp 3624) to be able to install and run a server. This does not affect clients.
Stability is a relative matter. Personally, I consider these snapshots to correspond to a beta stability for a 1.0 release. I'm already using the system on personal projects (works for me), and I'm committed to fix bugs reported by others. When I get some feedback I will select version number (probably 1.0).
- Nabu Publisher Client
A single file written in Python.
The publisher finds files on the local client filesystem and figures out which ones are meant for publication by detecting the ids within the top of the files, connects to the Nabu Server and asks for comparison of the files and incrementally uploads the files that have changed to update the content on the server. The client code may also contain a easy-to-use editor for source files.
Requires Python-2.3 or greater (and perhaps some kind of shell environment to invoke the Python script with some arguments (cmd.exe that comes with Windows can do the job)). That's it.
- Nabu Server / Publisher Handler
Receives updates of the source files from the publisher client, helps the client figure out which files have changed since the last update by serving MD5 sums of the files it has processed, and makes the necessary updates. The content is stored in a database after having been parsed.
We provide generic handler code for the publisher server, which you can bind to a CGI script with XML-RPC (very simple) or within your own server system (if you can write a little bit of Python code, still very simple). On the server, you must have:
- Presentation Layer
This is up to you. Nabu does not provide nor dictate the way in which you present your content, it only deals with the upload and processing of the source documents and the storage of extracted information. You can use any computer language or framework you prefer to build a front-end for data fed into a database by Nabu.
However, we provide a simple CGI script that allows you to browse the uploaded contents for debugging and visualizing what has been stored in the Nabu database.
Copyright (C) 2005 Martin Blais. This code is distributed under the GNU General Public License.
This project uses two important libraries to do its thing:
- the Docutils project (written mostly by Dave Goodger), which provides the reStructuredText parser, the internal document structure, and the HTML writer (and more). Docutils is a major facilitator for this system--without it this idea probably would not have seen the light of day;
It is also built in the Python language.