A few weeks ago at the New Orleans symposium in Plone, I presented on some software I was writing called Content Mirror, which is an addon product for Plone content serialization to a structured/relational database. Its a tool for doing content data deployments. Its nothing particularly new, Alan and I have been talking about the deployment story for Plone since 2002. I was building CMFDeployment at the time for pushing out static copies of plone sites for ultra-secure systems. Alan and EnfoldSystems went on to build a data deployment solution in Entransit, for doing data deployments. Both were fairly succesful deployment solutions and are still in use and deployed today by a variety of Plone vendors. But both also had some failings, they required alot of configuration and committment to get working for an existing plone site. For example, Entransit required using the instance layout and additional products needed by EnSimpleStaging. While CMFDeployment has a bewildering array of configuration options. In my opinion, both are specialized consulting ware, ie. their primary deployed successfully by folks doing Plone development full time.
For most of the last year, I’ve primarily been doing Zope3 applications, in relational databases. A large part of the reason why I’ve enjoyed Zope3 so much, is that the impedence mismatch between application development is much less than with Plone. Paul Everitt asked at the first Plone conference in New Orleans if Plone was a Product or Framework. Its a question still heard in the community to this day. But to me the answer is clear, Plone is a product, and frankly thats a good thing for both the software and its users. Its however a bad thing when your building applications, their tends to be much more policy with products, that needs to be replaced or worked around when your building on them. As a result, products tend to have two other downsides in application development, developer inefficiencies and computational ineffiencies.
As an example of a developer inefficiency, one is evident just in starting up and serving a page from Plone. I call it the Plone tax, and over the course of a year its about a man month of work. This becomes more stark in comparison to other web app frameworks ( pylons, django, rails, z3, etc) which startup and serve a page in a few seconds. There’s been alot of work recently to this with plone.reload, optional loading of translations, and some heroic work by hanno, but the fact is that there is a lot of code to load up as well as data from the zodb to startup and serve a page in Plone.
Developer inefficiencies are also evident in the learning curve associated with being productive with a product. A product is typically a much bigger software stack, and Plone has and utilizes many components, from zope2, zope3, cmf, archetypes providing foundations, in addition to a growing number of plone specific infrastructure. Plone is the OpenOffice of opensource content management systems. We could drop in a pylons in a cubby hole of a plone tarball. Smaller systems offer a much better productivity to new developers, by giving them the ability to focus on the problem domain and solution, rather than how to frame the problem domain in terms of product concepts and contexts like Plone.
The real key in a data deployment scenario is to keep all the many and great features of Plone as part of the content management process, but also making that data accessible for use in other applications. By deploying content to an rdbms we get language and framework neutrality to interact with it, as well as access to a widespread number of developers and tools. In a nutshell, its data portability.
As a bonus, when using Plone as a product, and reserving customizations to applications onto of the content of a data deployment, the migration story for a Plone instance also becomes much easier.
In terms of computation inefficiency, Plone does alot of work, which makes it easier to use as a product, but its also computationaly expensive for content delivery compared to simpler solutions that fit the needs of an application/problem domain. ie. the first rule of optimization, do less work. Replacing Plone as a content delivery mechanism, is a great way to make a system more responsive and vertically scalable, while still allowing a dynamic system.
Plone is a great product, and out of the box its offers ease of through the web customization, installation, and a wide range of functionality. My goal for data deployment with Plone was to make something that would enable reusing Plone as a product, as a content management system, but would allow flexibility in usage of that data. Moreover a tool that was easy to drop into new or existing sites.
Data deployments can also bring new features into a Plone. Its much easier to mine business intelligence and reports out of a relational system. For example getting graphs of content creation broken down by month and type or user or using commercial reporting tools.
So back to introducing content mirror. Its basically a system for doing data deployments, it features.. .
- Out of the Box support for Default Plone Content Types.
- Out of the Box support for all builtin Archetypes Fields (including files,
and references ).
- Support for 3rd Party / Custom Archetypes Content Types in one line of configuration.
- Supports Capturing Containment and Workflow in the serialized database.
- Completely Automated Mirroring, zero configuration required beyond installation.
- Easy customization via the Zope Component Architecture
- Elegant and Simple Design, less than 600 lines of code, extensive doctests
- Support for Plone 2.5, 3.0, and 3.1
- Opensource ( GPLv3 )
installation docs
http://code.google.com/p/contentmirror/wiki/Installation
technical introduction / readme
http://code.google.com/p/contentmirror/wiki/Introduction
in a nutshell its technical architecture, is an event observers with aggregation by object on txn boundaries, using an operation pattern for serialization actions, along with schema transformation of archetypes to relational databases tables, using a sqlalchemy runtime generated orm layer. the technical introduction goes into more details.
