An XML database is a data persistence software system that allows data to be specified, and sometimes stored, in XML format. This data can be queried, transformed, exported and returned to a calling system. XML databases are a flavor of document-oriented databases, which are in turn a category of NoSQL database.
Rationale for XML in databases
There are a number of reasons to directly specify data in XML or other document formats such as JSON. For XML in particular, they include:
An enterprise may have a lot of XML in an existing standard format
Data may need to be exposed or ingested as XML, so using another format such as relational forces double-modeling of the data
XML is very well suited to sparse data, deeply nested data and mixed content
XML is human readable whereas relational tables require expertise to access
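As a small illustration of the mixed-content and sparse-data points in the list above, the following sketch parses a fragment in which text and child elements are interleaved and in which an optional attribute appears on only one element. The fragment and its element names are invented for this example, and Python's standard xml.etree.ElementTree module is used only as a convenient parser, not as an XML database.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: text and child elements interleaved (mixed content),
# plus an optional attribute present on only one element (sparse data).
FRAGMENT = '<p>XML is <em>well</em> suited to <b lang="en">mixed</b> content.</p>'

p = ET.fromstring(FRAGMENT)

# Text before the first child is stored on the parent element.
leading = p.text

# Text that follows a child element is stored as that child's "tail".
tails = [child.tail for child in p]

# itertext() walks element text and tails in document order.
flat = "".join(p.itertext())

# An absent attribute is simply missing; no placeholder column is required.
langs = [child.get("lang") for child in p]
```

A relational model of the same sentence would need either an ordered table of text runs or a single opaque string column, which is the double-modeling cost mentioned above.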
Steve O'Connell gives one reason for the use of XML in databases: the increasingly common use of XML for data transport, which has meant that "data is extracted from databases and put into XML documents and vice-versa". It may therefore prove more efficient and easier to store the data in XML format in the first place. In content-based applications, a native XML database also minimizes the need to extract or enter metadata to support searching and navigation.
XML enabled databases
XML enabled databases typically offer one or more of the following approaches to storing XML within the traditional relational structure:
XML is stored in a CLOB (character large object) column
XML is "shredded" into a series of tables based on a schema
XML is stored in a native XML type as defined by ISO/IEC 9075-14 (SQL/XML)
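The first two approaches in the list above can be sketched with Python's standard sqlite3 and xml.etree.ElementTree modules. This is a minimal illustration under stated assumptions, not how any particular XML-enabled database implements them: SQLite has no CLOB or native XML type, so a TEXT column stands in for the CLOB, and the sample document, table names and helper functions (store_as_clob, shred) are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are illustrative.
DOC = """<journals>
  <journal id="1" vol="12"><name>Data Letters</name></journal>
  <journal id="2" vol="3"><name>Markup Review</name></journal>
</journals>"""


def store_as_clob(xml_text, conn):
    """Approach 1: keep the whole document as one opaque text value."""
    conn.execute("CREATE TABLE journal_docs (doc TEXT)")
    conn.execute("INSERT INTO journal_docs VALUES (?)", (xml_text,))


def shred(xml_text, conn):
    """Approach 2: map each <journal> element to one relational row."""
    conn.execute(
        "CREATE TABLE journals (id INTEGER PRIMARY KEY, vol INTEGER, name TEXT)"
    )
    root = ET.fromstring(xml_text)
    rows = [
        (int(j.get("id")), int(j.get("vol")), j.findtext("name"))
        for j in root.findall("journal")
    ]
    conn.executemany("INSERT INTO journals VALUES (?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
store_as_clob(DOC, conn)
count = shred(DOC, conn)

# The shredded form can be queried with plain SQL ...
names = [r[0] for r in conn.execute("SELECT name FROM journals ORDER BY id")]
# ... while the stored text must be re-parsed before it can be queried.
stored = conn.execute("SELECT doc FROM journal_docs").fetchone()[0]
```

The trade-off visible here is that the text column preserves the document exactly but is opaque to SQL, whereas shredding makes values queryable at the cost of fixing a mapping from elements to columns in advance.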
Typically an XML enabled database is best suited where the majority of data are non-XML. For datasets where the majority of data are XML, a native XML database is better suited.
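A query against a native XML type column typically combines the SQL/XML function XMLQUERY in the select list with XMLEXISTS in the where clause. In outline, with the arguments of both functions omitted:
select id, vol, xmlquery(...) as name from journals where xmlexists(...)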
Native XML databases
Native XML databases are especially tailored for working with XML data. Because managing XML as large strings would be inefficient, and because XML is hierarchical, these systems use custom optimized data structures for storage and querying. This usually increases performance for both read-only queries and updates. XML nodes and documents are the fundamental units of storage, just as fields and rows are in a relational database. The W3C-recommended standard for querying XML data is XQuery; the latest version is XQuery 3.1. XQuery includes XPath as a sub-language, and XML itself is a valid sub-syntax of XQuery. In contrast to XML enabled databases, native databases provide full support for XQuery. In addition to XQuery and XPath, some XML databases support XSLT as a method of transforming documents or query results retrieved from the database.
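To give a feel for the path expressions that XQuery builds on, the sketch below runs three XPath-style queries over a small document. It assumes a hypothetical sample document and uses Python's standard xml.etree.ElementTree module, which implements only a limited subset of XPath and no XQuery at all; a native XML database would evaluate such expressions against its own indexed storage rather than an in-memory tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are illustrative.
DOC = """<library>
  <book year="2003"><title>XML Basics</title><author>Lee</author></book>
  <book year="2017"><title>Query Patterns</title><author>Kim</author></book>
  <article year="2017"><title>Index Notes</title></article>
</library>"""

root = ET.fromstring(DOC)

# Path expression: every <title> element at any depth below the root.
all_titles = [t.text for t in root.findall(".//title")]

# Attribute predicate: <book> children whose year attribute equals "2017".
books_2017 = [b.findtext("title") for b in root.findall("./book[@year='2017']")]

# Child-text predicate: <book> children with an <author> whose text is "Lee".
by_lee = [b.findtext("title") for b in root.findall("./book[author='Lee']")]
```

Each query names a path through the element hierarchy rather than a join across tables, which is why storage structures that mirror the tree tend to suit this workload.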
Language features
Supported APIs
Data-centric XML datasets
For data-centric XML datasets, a keyword search method named XDMA has been designed and developed for XML databases. It is based on dual indexing and mutual summation.