In software, an XML pipeline is formed when XML processes, especially XML transformations and XML validations, are connected. For instance, given two transformations T1 and T2, the two can be connected so that an input XML document is transformed by T1 and the output of T1 is then fed as the input document to T2. Simple pipelines like this one are called linear: a single input document always goes through the same sequence of transformations to produce a single output document.
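The T1-then-T2 arrangement described above can be sketched in Python with the standard library's ElementTree module. This is a minimal illustration only: the two transformations (`t1`, `t2`) and the helper `run_linear` are hypothetical names chosen for the example, not part of any pipeline standard.

```python
import xml.etree.ElementTree as ET
from functools import reduce


def t1(doc):
    """T1 (hypothetical): rename every <item> element to <entry>."""
    for el in list(doc.iter("item")):
        el.tag = "entry"
    return doc


def t2(doc):
    """T2 (hypothetical): record the number of root children as an attribute."""
    doc.set("count", str(len(doc)))
    return doc


def run_linear(doc, steps):
    """Feed the output of each step to the next one, in order."""
    return reduce(lambda current, step: step(current), steps, doc)


source = ET.fromstring("<list><item>a</item><item>b</item></list>")
result = run_linear(source, [t1, t2])
output = ET.tostring(result, encoding="unicode")
print(output)
```

Because every step takes one document and returns one document, the list of steps fully describes the pipeline, which is what makes it linear.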
Linear operations
Linear operations can be divided into several groups.
Micro-operations
They operate inside a document, at the level of individual elements and attributes.
Rename - renames elements or attributes without modifying the content
Replace - replaces elements or attributes
Insert - adds a new data element to the output stream at a specified point
Delete - removes an element or attribute
Wrap - wraps elements with additional elements
Reorder - changes the order of elements
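The micro-operations listed above can each be expressed in a few lines with Python's ElementTree. The sample document and element names below are invented for illustration, and the mapping of each operation to ElementTree calls is one possible reading, not a definition taken from any standard.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<order><id>7</id><note>draft</note><qty>2</qty><sku>A1</sku></order>"
)

# Rename: change the element name, keep its content.
doc.find("qty").tag = "quantity"

# Replace: swap an element for a new one at the same position.
old = doc.find("id")
new = ET.Element("order-id")
new.text = "ORD-7"
position = list(doc).index(old)
doc.remove(old)
doc.insert(position, new)

# Insert: add a new element at a specified point (here, index 1).
status = ET.Element("status")
status.text = "open"
doc.insert(1, status)

# Delete: remove an element.
doc.remove(doc.find("note"))

# Wrap: move two elements inside a new <line> element.
line = ET.Element("line")
for name in ("quantity", "sku"):
    child = doc.find(name)
    doc.remove(child)
    line.append(child)
doc.append(line)

# Reorder: sort the root's children by element name.
doc[:] = sorted(doc, key=lambda el: el.tag)

result = ET.tostring(doc, encoding="unicode")
print(result)
```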
Document operations
They take the input document as a whole
Identity transform - makes a verbatim copy of its input to the output
Transform - executes a transform on the input document using a specified XSLT file. The XSLT version (1.0 or 2.0) should be specified.
Split - takes a single XML document and splits it into distinct documents
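As a rough sketch, the identity and split operations can be written with ElementTree as shown below. The function names and the sample document are assumptions made for this example. The XSLT transform step is left out because Python's standard library does not include an XSLT processor.

```python
import copy
import xml.etree.ElementTree as ET


def identity(doc):
    """Identity transform: return a verbatim (deep) copy of the input."""
    return copy.deepcopy(doc)


def split(doc, child_tag):
    """Split: produce one standalone document per matching child of the root."""
    return [copy.deepcopy(child) for child in doc.findall(child_tag)]


library = ET.fromstring(
    "<library><book>A</book><book>B</book><dvd>C</dvd></library>"
)

same = identity(library)
books = split(library, "book")

print(ET.tostring(same, encoding="unicode"))
print([ET.tostring(b, encoding="unicode") for b in books])
```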
Sequence operations
They were mainly introduced in XProc and handle a sequence of documents as a whole
Count - takes a sequence of documents and counts them
Identity transform - makes a verbatim copy of its input sequence of documents to the output
split-sequence - takes a sequence of documents as input and routes them to different outputs depending on matching rules
wrap-sequence - takes a sequence of documents as input and wraps them into one or more documents
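The sequence operations above can be approximated in Python by treating a sequence of documents as a list of ElementTree elements. This is only a sketch: the function signatures are hypothetical, and the matching rule is passed as a Python callable, which is an assumption of this example rather than how XProc expresses it.

```python
import xml.etree.ElementTree as ET


def count(docs):
    """Count: return the number of documents in the sequence."""
    return len(docs)


def split_sequence(docs, matches):
    """split-sequence: route each document to one of two outputs."""
    matched, not_matched = [], []
    for doc in docs:
        (matched if matches(doc) else not_matched).append(doc)
    return matched, not_matched


def wrap_sequence(docs, wrapper_tag):
    """wrap-sequence: place all documents inside a single wrapper document."""
    wrapper = ET.Element(wrapper_tag)
    wrapper.extend(docs)
    return wrapper


docs = [ET.fromstring(text) for text in ("<a/>", "<b/>", "<a/>")]

total = count(docs)
a_docs, other_docs = split_sequence(docs, lambda d: d.tag == "a")
wrapped = ET.tostring(wrap_sequence(a_docs, "batch"), encoding="unicode")
print(total, len(a_docs), len(other_docs), wrapped)
```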
Non-linear
Non-linear operations on pipelines may include:
Conditionals - where a given transformation is executed if a condition is met while another transformation is executed otherwise
Loops - where a transformation is executed on each node of a node set selected from a document or a transformation is executed until a condition evaluates to false
Tees - where a document is fed to multiple transformations potentially happening in parallel
Aggregations - where multiple documents are aggregated into a single document
Exception handling - where failures in processing can result in an alternate pipeline being processed
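The non-linear constructs listed above can be sketched as small Python combinators over ElementTree documents. All function names, the sample document, and the steps are hypothetical, and the tee runs its branches one after another rather than in parallel.

```python
import copy
import xml.etree.ElementTree as ET


def conditional(doc, test, then_step, else_step):
    """Run one step if the condition holds, another otherwise."""
    return then_step(doc) if test(doc) else else_step(doc)


def for_each(doc, path, step):
    """Loop: apply a step to every node selected from the document."""
    for node in doc.findall(path):
        step(node)
    return doc


def tee(doc, steps):
    """Feed an independent copy of the document to each step."""
    return [step(copy.deepcopy(doc)) for step in steps]


def aggregate(docs, wrapper_tag):
    """Combine several documents into a single one."""
    wrapper = ET.Element(wrapper_tag)
    wrapper.extend(docs)
    return wrapper


def try_catch(doc, step, fallback):
    """Run an alternate step if the main one fails."""
    try:
        return step(doc)
    except Exception:
        return fallback(doc)


def publish(d):
    d.set("published", "yes")
    return d


def mark_unreviewed(d):
    d.set("reviewed", "no")
    return d


def double(node):
    node.text = str(int(node.text) * 2)


def total(d):
    el = ET.Element("total")
    el.text = str(sum(int(v.text) for v in d.iter("v")))
    return el


def maximum(d):
    el = ET.Element("max")
    el.text = str(max(int(v.text) for v in d.iter("v")))
    return el


def failing(d):
    raise ValueError("simulated processing failure")


doc = ET.fromstring('<report status="draft"><v>1</v><v>2</v></report>')
doc = conditional(doc, lambda d: d.get("status") == "final", publish, mark_unreviewed)
doc = for_each(doc, "v", double)
summary = aggregate(tee(doc, [total, maximum]), "summary")
recovered = try_catch(doc, failing, lambda d: ET.Element("error"))
print(ET.tostring(summary, encoding="unicode"), recovered.tag)
```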
Some standards also categorize transformations as macro or micro.
XML pipeline languages
XML pipeline languages are used to define pipelines. A program written with an XML pipeline language is implemented by software known as an XML pipeline engine, which creates processes, connects them together and finally executes the pipeline. Existing XML pipeline languages include:
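To illustrate the division of labor between a pipeline language and a pipeline engine, the sketch below defines a pipeline in a small, entirely made-up XML vocabulary (`<pipeline>` and `<step>` are not taken from XProc or any other language listed here) and a toy engine that creates the steps, connects them, and executes them.

```python
import xml.etree.ElementTree as ET


def rename(doc, old, new):
    """Hypothetical step: rename every <old> element to <new>."""
    for el in list(doc.iter(old)):
        el.tag = new
    return doc


def delete(doc, name):
    """Hypothetical step: remove every child element called `name`."""
    for parent in list(doc.iter()):
        for child in parent.findall(name):
            parent.remove(child)
    return doc


# Steps the toy engine knows how to instantiate.
STEP_LIBRARY = {"rename": rename, "delete": delete}

# Pipeline definition in an invented vocabulary.
PIPELINE = """
<pipeline>
  <step type="rename" old="item" new="entry"/>
  <step type="delete" name="debug"/>
</pipeline>
"""


def build(definition):
    """Create one callable per <step> element, in document order."""
    steps = []
    for step_el in ET.fromstring(definition).findall("step"):
        params = dict(step_el.attrib)
        func = STEP_LIBRARY[params.pop("type")]
        steps.append(lambda doc, f=func, p=params: f(doc, **p))
    return steps


def execute(steps, doc):
    """Connect the steps by passing each output to the next input."""
    for step in steps:
        doc = step(doc)
    return doc


source = ET.fromstring("<list><item>a</item><debug>x</debug><item>b</item></list>")
output = ET.tostring(execute(build(PIPELINE), source), encoding="unicode")
print(output)
```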
Standards
XProc: An XML Pipeline Language is a W3C Recommendation for defining linear and non-linear XML pipelines.
Product-specific
W3C XML Pipeline Definition Language is specified in a W3C Note.
W3C XML Pipeline Language (XPL) Version 1.0 is specified in a W3C Submission and is a component of Orbeon Presentation Server (OPS). This specification provides an implementation of an earlier version of the language. XPL allows the declaration of complex pipelines with conditionals, loops, tees, aggregations, and sub-pipelines. XProc is roughly a superset of XPL.
Cocoon sitemaps allow, among other functionality, the declaration of XML pipelines. Cocoon sitemaps are one of the earliest implementations of the concept of XML pipeline.
smallx XML Pipelines are used by the smallx project.
ServingXML defines a vocabulary for expressing flat-XML, XML-flat, flat-flat, and XML-XML transformations in pipelines.
PolarLake's runtime defines pipelines as circuits. Circuits are collections of paths through which fragments of XML stream. Components are placed on paths to interact with the stream in a low-latency process.
Stylus Studio XML Pipeline is a visual grammar which defines the following operations: Input, Output, XQuery, XSLT, Validate, XSL-FO to PDF, Convert To XML, Convert From XML, Choose, Warning, Stop.
Pipe granularity
Different XML pipeline implementations support different granularities of flow.
Document: Whole documents flow through the pipe as atomic units. A document can only be in one place at a time, although multiple documents may usually be in the pipe at once.
Event: Element and text node events may flow through different paths. A single document may be flowing through many components at the same time.
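The difference between the two granularities can be sketched with Python's standard library. In the document-level version a stage receives a fully parsed tree, while in the event-level version stages are generators that receive parse events one at a time from `xml.etree.ElementTree.iterparse`. The sample data and stage names are invented for the example.

```python
import io
import xml.etree.ElementTree as ET

XML = (
    b"<log>"
    b"<entry level='info'>start</entry>"
    b"<entry level='error'>disk full</entry>"
    b"</log>"
)


# Document granularity: the stage sees the whole document at once.
def document_stage(doc):
    return [e.text for e in doc.iter("entry") if e.get("level") == "error"]


document_result = document_stage(ET.fromstring(XML))


# Event granularity: stages are generators connected by a stream of events,
# so a later stage can start working before parsing has finished.
def event_source(stream):
    yield from ET.iterparse(stream, events=("start", "end"))


def only_errors(events):
    for event, el in events:
        if event == "end" and el.tag == "entry" and el.get("level") == "error":
            yield el.text


event_result = list(only_errors(event_source(io.BytesIO(XML))))
print(document_result, event_result)
```

Both versions produce the same result here; the event-level version differs in that each entry can be handled as soon as its end tag has been read.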
Standardization
Until May 2010, there was no widely used standard for XML pipeline languages. XProc became a W3C Recommendation in May 2010, which was expected to encourage wider adoption.
History
1972 Douglas McIlroy of Bell Laboratories adds the pipe operator to the UNIX command shell. This allows the output of one shell program to go directly into the input of another shell program without going to disk, so that programs such as the UNIX awk and sed can be specialized yet work together. For more details see Pipeline.
1998 Stefano Mazzocchi releases the first version of Apache Cocoon, one of the first software programs to use XML pipelines.
1998 build , which includes .
2002 Notes submitted by Norman Walsh and Eve Maler from Sun Microsystems, as well as a W3C Submission submitted in 2005 by Erik Bruchez and Alessandro Vernet from Orbeon, were important steps toward an actual standardization effort. While neither submission directly became a W3C Recommendation, both were considered key sources of inspiration for the W3C XML Processing Working Group.
September 2005 W3C XML Processing Working Group started. The task of this working group was to create a specification for an XML pipelining language.
August 2008, , an XML pipeline language was announced at