This item on "Metadata for Publishing" is the second of a segment on "Metadata and Electronic Document Management for Electronic Commerce" first presented for the Australian National University course "Information Technology in Electronic Commerce" (COMP3410/COMP6341).
This document is intended both for live group presentation and as accompanying lecture notes for individual use. The slides and these notes are provided in a single HTML document, using HTML Slidy.
Metadata can be described simply as "Data about Data". As an example the "creator" of this document is "Tom Worthington". The data is "Tom Worthington" and the metadata is "creator".
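The distinction can be sketched in a few lines of Python. This is only an illustration: the dictionary name `record` is made up, with the key playing the role of the metadata and the value the data.

```python
# The key is the metadata (it says what the value means);
# the value is the data itself.
record = {"creator": "Tom Worthington"}

for element, value in record.items():
    print(f"metadata: {element!r}  data: {value!r}")
```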
Metadata is essential for e-commerce, as it provides standard data items to allow parties to communicate about their organisations, products, terms and conditions. In an electronic transaction, the payment and the "money" itself consist of data in an agreed metadata format. Without suitable metadata standards, e-commerce could not take place and "money" in our online financial systems would cease to exist.
Metadata can also be used to describe published documents. The use of metadata for e-commerce and for publishing has converged in the last few years with the use of the same XML technology for both applications.
This metadata is from the Australian Government home page. It was intended that data in this format would be inserted into the HEAD of all government web pages, to aid data retrieval.
The challenge is to create formats which are sufficiently expressive to be able to communicate what is needed, but simple enough to be implemented efficiently.
Creating and using metadata standards is both a technical and political process. Most standards need to be profiled, to create a workable subset, before they can be used for practical purposes. Some standards need to be enhanced and others should not be used at all.
Here is an example of an e-commerce transaction. This is an Australian Taxation Office electronic tax form for the Goods and Services Tax (GST). This is a different use of metadata, for defining the data in a financial transaction.
The World Wide Web Consortium (W3C) standard for Scalable Vector Graphics (SVG) provides a way to define images in web pages. As well as the expected features of shapes, filling, symbols, colours and patterns, there is the 'metadata' element.
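As a rough sketch of how the 'metadata' element might be read, the Python below parses a small SVG image and collects the child elements found inside 'metadata'. The SVG source, the choice of Dublin Core elements as the content, and the function name `read_svg_metadata` are assumptions made for illustration; real SVG files often wrap their metadata in RDF instead.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical SVG image, invented for this example.
svg_source = f"""<svg xmlns="{SVG_NS}" width="100" height="100">
  <metadata>
    <dc:title xmlns:dc="{DC_NS}">Red circle</dc:title>
    <dc:creator xmlns:dc="{DC_NS}">Tom Worthington</dc:creator>
  </metadata>
  <circle cx="50" cy="50" r="40" fill="red"/>
</svg>"""


def read_svg_metadata(source):
    """Return {local element name: text} for children of <metadata>."""
    root = ET.fromstring(source)
    metadata = root.find(f"{{{SVG_NS}}}metadata")
    if metadata is None:
        return {}
    result = {}
    for child in metadata:
        # ElementTree tags look like "{namespace}local"; keep the local part.
        local_name = child.tag.split("}", 1)[-1]
        result[local_name] = (child.text or "").strip()
    return result


print(read_svg_metadata(svg_source))
```

The drawing itself (the circle) is ignored; only the descriptive elements are returned.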
Simple definition politically complex
The apparently technically simple definition of metadata for SVG is made politically complex by this paragraph in the standard.
The ease of defining metadata using new web based tools has made standardization more difficult. It is technically simple to define a new standard if an existing definition is not quite right. However, having many standards is as much a problem as having no standards at all.
According to this official version of events, the Australian Government Locator Service (AGLS) metadata standard (discussed later) was originally called "AUSGILS" and intended to be based on the U.S. Government Information Locator Service (GILS), but this was abandoned in favour of the Dublin Core metadata standard in 1997. However, the proposed standard was first called "AGILS" in an earlier architecture proposal:
META tag of HTML is used in the header section of HTML documents. Example: <META NAME="Date" CONTENT="1966-01-12">. The field identifiers from the selected meta-data set is used in the NAME field and the field value in CONTENT. The set of meta-data definitions being used (the meta-meta-data) should be included in a tag. Example: <meta name="metadata" content="AGILS">.
From: Architecture For Access To Government Information, Report of the IMSC -Technical Group, Commonwealth of Australia, 25 July 1996, URL (archived copy): http://www.defence.gov.au/imsc/imsctg/imsctg1c.htm#RTFToC87
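The META tag convention quoted above can be read back with a few lines of Python. This is a minimal sketch using the standard library's `html.parser`; the class name `MetaTagCollector` and the sample page are invented for illustration, and only the two META examples come from the report.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect NAME/CONTENT pairs from META tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so <META NAME=...>
        # and <meta name=...> are handled the same way.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name is not None and content is not None:
            self.meta[name] = content


# Hypothetical page; only the two META tags are taken from the quoted report.
page = """<html><head>
<title>Example</title>
<META NAME="Date" CONTENT="1966-01-12">
<meta name="metadata" content="AGILS">
</head><body><p>Text</p></body></html>"""

collector = MetaTagCollector()
collector.feed(page)
print(collector.meta)
```

The "metadata" entry tells a reader of the page which set of definitions the other entries follow.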
This was done for political reasons, to suggest compatibility with the US Government standard. The name was later shortened to AGLS.
Standards politics are important to metadata and electronic document development in the real world. Standards are selected based on the importance of the organisations and individuals supporting them, not technical merit. Standards are then adapted, extended, made into subsets or combined.
E-commerce and electronic publishing depend on decisions made on what standards to use. Previously separate standards for electronic commerce, documents and television are converging to use the same format (XML).
One example of where standards for document formats and commercial interests collide is the Portable Document Format (PDF). Developed by Adobe as an extension to the PostScript format for desktop publishing, PDF has provided a popular electronic document format. However, PDF has a number of limitations as an on-screen format and for disabled users. Adobe have attempted to address these limitations with "Tagged Adobe PDF", which added some XML interoperability to the PDF format.
However, additional work is needed by document creators to use these features. There is also an inherent contradiction between one of PDF's original selling points, providing an accurate representation of a printed document, and the aim of the enhancement, which is to allow that representation to be transformed. Adobe are not the only ones struggling with this problem. One possible solution is OpenOffice.org's XML Packages format. This packages up XML documents and supplementary binary format data, such as images, in ZIP file format.
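The packaging idea can be sketched with Python's standard `zipfile` module. This is an illustration of the general approach only, not of the actual OpenOffice.org file layout: the function name `build_package`, the entry names and the placeholder image bytes are all assumptions.

```python
import io
import zipfile


def build_package(xml_text, images):
    """Bundle one XML document and some binary files into a ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        # The XML document itself (a str is stored as UTF-8).
        package.writestr("content.xml", xml_text)
        # Supplementary binary data, such as images, kept in a sub-folder.
        for name, data in images.items():
            package.writestr(f"Pictures/{name}", data)
    return buffer.getvalue()


# Hypothetical document and image data.
package_bytes = build_package(
    "<document><title>Example</title></document>",
    {"logo.png": b"\x89PNG placeholder bytes"},
)

with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
    print(package.namelist())
```

Because the text stays as XML inside the archive, it can still be transformed by other tools, while the images travel with it in the same file.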
Dublin Core (DC) is a metadata standards project originating from a workshop held in Dublin, Ohio, USA in 1995. The "Dublin Core" metadata element set is a small set of metadata definitions intended for cross-domain information resources. However, DC has its origins in the work of librarians and so tends to work better for describing printed text than other items, such as video.
The intention with DC is to provide a brief standard set of essential metadata items for resources: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights.
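As a minimal sketch, the fifteen elements listed above could be turned into HTML META tags as follows. The `DC.` name prefix is a common convention and is assumed here rather than taken from these notes; the function name `dc_meta_tags` and the sample values are likewise illustrative.

```python
from html import escape

# The fifteen Dublin Core elements, in the order listed above.
DC_ELEMENTS = (
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
)


def dc_meta_tags(values):
    """Return META tags for the supplied Dublin Core element values."""
    unknown = set(values) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"Not Dublin Core elements: {sorted(unknown)}")
    tags = []
    for element in DC_ELEMENTS:  # keep a predictable element order
        if element in values:
            content = escape(values[element])  # protect quotes and angle brackets
            tags.append(f'<meta name="DC.{element}" content="{content}">')
    return tags


for tag in dc_meta_tags({"Creator": "Tom Worthington",
                         "Title": "Metadata for Publishing"}):
    print(tag)
```

Rejecting unknown element names is a design choice in this sketch: it keeps the output within the agreed set, which is the point of having a standard.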
Other examples of controlled vocabulary are the use of Internet Media Types (MIME) for defining computer media formats in the format element, and of language tags, such as "en-AU" for Australian English.
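The following Python sketch shows how such controlled values might be produced and checked. The file name is hypothetical, and the regular expression is a deliberately simplified pattern (a language code with an optional two-letter region), not the full language tag grammar.

```python
import mimetypes
import re

# Simplified assumption: 2-3 lower-case letters, optionally followed by
# a hyphen and a 2-letter upper-case region code, as in "en-AU".
LANGUAGE_TAG = re.compile(r"^[a-z]{2,3}(-[A-Z]{2})?$")


def format_for(filename):
    """Guess the Internet Media Type for a file name (None if unknown)."""
    media_type, _encoding = mimetypes.guess_type(filename)
    return media_type


def looks_like_language_tag(value):
    """Check a value against the simplified language tag pattern."""
    return LANGUAGE_TAG.match(value) is not None


print(format_for("notes.html"))
print(looks_like_language_tag("en-AU"))
print(looks_like_language_tag("Australian English"))
```

Free text such as "Australian English" fails the check, which is how a controlled vocabulary keeps values consistent enough for software to compare.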
The Australian Digital Theses Program provides a database of digitised theses produced at Australian universities. Authors at ANU use a deposit form, the data from which is expressed as DC metadata and provided via a search facility.
The Australian Government Locator Service (AGLS) metadata standard is a set of 19 descriptive elements to improve the visibility and accessibility of services and information over the Internet. The AGLS standard is based on the 15 Dublin Core elements, plus four extra elements:
No elements are mandatory for DC, but AGLS requires five (or six) of them.
Qualifiers are used to restrict the semantics of the relationship between the resource and the element value. AGLS encourages more use of qualifiers than DC, but does not require it.
AGLS uses two types of qualifiers.
Metadata is rarely entered by the document author typing in text. When encoded in the header of a HTML document, the metadata is not displayed by a web browser. Specialized software, such as content management systems, or features in word processors, are used to enter and display the metadata. The user of the system is likely to be unaware they are using a metadata standard or how it is encoded. Examples of these systems will be shown later.
The Distributed Systems Technology Centre (DSTC Pty Ltd) has produced a metadata tool to create AGLS and Dublin Core metadata. Rege can be used to generate AGLS metadata syntax. This would be too cumbersome for creating real metadata, but is a useful way to learn about the process.