Metadata and Electronic Document Management for Electronic Commerce
Metadata for Publishing
Version of 2 August 2008
Metadata
metadata n., a set of data that describes and gives information
about other data...
[1968 Proc. IFIP 4th Congr.: Suppl. 10 I. 113/2 There are
categories of information about each data set as a unit in a data
set of data sets, which must be handled as a special meta data
set.] 1987 Philos. Trans. Royal Soc. A. 322 373 The challenge is to
accumulate data..from diverse sources, convert it to
machine-readable form with a harmonized array of *metadata
descriptors and present the resulting database(s) to the user. 1998
New Scientist 30 May 35/2 With XML, attaching metadata to a
document is easy, at least in theory.
Oxford English Dictionary, (Online) Draft entry
Dec. 2001, URL: http://dictionary.oed.com/cgi/entry/00307096/00307096se19
Metadata can be described simply as
"Data about Data". As an example the "creator"
of this document is "Tom Worthington". The data is
"Tom Worthington" and the metadata is
"creator".
Metadata is essential for e-commerce, as it provides standard
data items to allow parties to communicate about their
organisations, products, terms and conditions. The payment
and the "money" itself consists of data in an agreed
metadata format, in an electronic transaction. Without suitable
metadata standards, e-commerce could not take place and
"money" in our online financial systems would cease to
exist.
Metadata can also be used to describe published documents. The
use of metadata for e-commerce and for publishing has converged in
the last few years with the use of the same XML technology
for both applications.
Australian Government Metadata
<meta name="DC.Publisher" scheme="X500" content="ou=Australian Government Information Management Office (AGIMO) ; o= Commonwealth of Australia ; c=AU">
<meta name="DC.Description" content="The australia.gov.au website is your connection with government in Australia...">
<meta name="DC.Subject" scheme="TAGS" content="Government information; Federal government; Government services; Government publications; Web sites">
<meta name="DC.Type.documentType" scheme="agls-document" content="homepage">
From: "australia.gov.au : your connection with government", Australian Government Information Management Office, 2004-06-30, URL: http://www.australia.gov.au/
This metadata is from the Australian Government home page. It was intended that data in this format would be inserted into the HEAD of all government web pages, to aid data retrieval.
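As a rough illustration of how such tags aid retrieval, the following Python sketch reads the name, scheme and content of each meta tag from a page header. It uses only the standard library. The class name MetaTagCollector and the shortened sample header are assumptions made for this example, not part of any government tool.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect (name, scheme, content) from <meta> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling this.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if "name" in attributes:
            self.records.append(
                (attributes["name"], attributes.get("scheme"), attributes.get("content"))
            )


# Sample header built from two of the tags quoted above.
SAMPLE_HEAD = """<head>
<meta name="DC.Subject" scheme="TAGS" content="Government information; Federal government; Government services; Government publications; Web sites">
<meta name="DC.Type.documentType" scheme="agls-document" content="homepage">
</head>"""

collector = MetaTagCollector()
collector.feed(SAMPLE_HEAD)
collector.close()

for name, scheme, content in collector.records:
    print(f"{name} [{scheme}]: {content}")

# The Subject value is a semicolon-separated list of keywords.
subject_keywords = [k.strip() for k in collector.records[0][2].split(";")]
```

A search engine or catalogue could index the keywords and document type collected this way, rather than guessing them from the body text.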
The challenge is to create formats which are sufficiently expressive to be able
to communicate what is needed, but simple enough to be implemented
efficiently.
Creating and using metadata standards is both a technical and political process. Most standards need to be profiled, to create a workable
subset, before they can be used for practical purposes. Some
standards need to be enhanced and others should not be used at all.
Tax Office e-commerce transaction
<FORM_PERIOD_LABEL_TEXT>July to September
2001</FORM_PERIOD_LABEL_TEXT>
<EFT_CODE> 51111 121 059 9059</EFT_CODE>
<BILLER_CODE>75556</BILLER_CODE>
<PAYG_WITHHOLDING>0</PAYG_WITHHOLDING>
<PAYG_INSTALMENT>12541</PAYG_INSTALMENT>
<DEFERRED_COMPANY_FUND_INSTALMENT>7879801 </DEFERRED_COMPANY_FUND_INSTALMENT>
<TOTAL_DEBITS>7892342</TOTAL_DEBITS>
<TOTAL_CREDITS>0</TOTAL_CREDITS>
<NET_AMOUNT_FOR_THIS_STATEMENT>7892342 </NET_AMOUNT_FOR_THIS_STATEMENT>
<GST_LABEL_TEXT>for the QUARTER from 1 Jul 2001 to 30 Sep
2001</GST_LABEL_TEXT>
<GST_ACCOUNTING_METHOD_LABEL_TEXT>Cash ...
From: Formatting the eBAS with XSL, Tom
Worthington, 29 November 2002, URL: http://www.tomw.net.au/2002/atoxml.html
Here is an example of an e-commerce transaction. This is an
Australian Taxation Office electronic tax form for the Goods and
Services Tax (GST). This is a different use of metadata, for defining the data in a financial transaction.
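Because each amount is labelled by its element name, software can check the figures without knowing the layout of the form. The sketch below is a minimal illustration in Python. The enclosing EBAS element is an assumption added only to make the quoted fragment parseable, and the choice of which items add up to TOTAL_DEBITS is inferred from the numbers in the example, not taken from Tax Office rules.

```python
import xml.etree.ElementTree as ET

# The quoted fragment has no enclosing element, so a hypothetical <EBAS>
# root is added here purely so that the parser accepts it.
FRAGMENT = """<EBAS>
<PAYG_WITHHOLDING>0</PAYG_WITHHOLDING>
<PAYG_INSTALMENT>12541</PAYG_INSTALMENT>
<DEFERRED_COMPANY_FUND_INSTALMENT>7879801 </DEFERRED_COMPANY_FUND_INSTALMENT>
<TOTAL_DEBITS>7892342</TOTAL_DEBITS>
<TOTAL_CREDITS>0</TOTAL_CREDITS>
<NET_AMOUNT_FOR_THIS_STATEMENT>7892342 </NET_AMOUNT_FOR_THIS_STATEMENT>
</EBAS>"""

root = ET.fromstring(FRAGMENT)

# Map each element name (the metadata) to its integer value (the data).
amounts = {child.tag: int(child.text.strip()) for child in root}

# Assumed relationship, inferred from the sample figures only.
debit_items = ("PAYG_WITHHOLDING", "PAYG_INSTALMENT", "DEFERRED_COMPANY_FUND_INSTALMENT")
debits_sum = sum(amounts[tag] for tag in debit_items)
net = amounts["TOTAL_DEBITS"] - amounts["TOTAL_CREDITS"]

print(debits_sum == amounts["TOTAL_DEBITS"])
print(net == amounts["NET_AMOUNT_FOR_THIS_STATEMENT"])
```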
Scalable Vector Graphics Metadata Definition
<!ENTITY % metadataExt "" >
<!ELEMENT metadata (#PCDATA %metadataExt;)* >
<!ATTLIST metadata %stdAttrs; >
From 21.2 The
'metadata' element, Scalable Vector Graphics (SVG) 1.0
Specification W3C Proposed Recommendation 19 July, 2001, URL:
http://www.w3.org/TR/SVG/
The World Wide Web Consortium (W3C) standard for Scalable Vector
Graphics (SVG) provides a way to define images in web pages. As
well as the expected features of shapes, fills, symbols, colours
and patterns, there is the 'metadata' element.
Scalable Vector Graphics Metadata Explanation
Individual industries or individual content creators are free to
define their own metadata schema but are encouraged to follow
existing metadata standards and use standard metadata schema
wherever possible to promote interchange and interoperability. If a
particular standard metadata schema does not meet your needs, then
it is usually better to define an additional metadata schema in an
existing framework such as RDF and to use custom metadata schema in
combination with standard metadata schema, rather than totally
ignore the standard schema.
From 21.1
Introduction, Scalable Vector Graphics (SVG) 1.0 Specification
W3C Proposed Recommendation 19 July, 2001, URL: http://www.w3.org/TR/SVG/
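The recommendation to reuse standard schema within a framework such as RDF can be sketched as follows. The SVG document below is a made-up sample (the drawing title is invented for illustration) that places two Dublin Core elements inside an RDF description in the 'metadata' element. The Python code, using only the standard library, then pulls the Dublin Core values out by namespace.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for SVG, RDF and the Dublin Core element set.
NS = {
    "svg": "http://www.w3.org/2000/svg",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical sample image; only the structure matters here.
SVG_DOC = """<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <metadata>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description rdf:about="">
        <dc:title>Example drawing</dc:title>
        <dc:creator>Tom Worthington</dc:creator>
      </rdf:Description>
    </rdf:RDF>
  </metadata>
  <circle cx="50" cy="50" r="40"/>
</svg>"""

root = ET.fromstring(SVG_DOC)
description = root.find("svg:metadata/rdf:RDF/rdf:Description", NS)

# ElementTree stores qualified names as "{namespace-uri}localname".
dc_prefix = "{" + NS["dc"] + "}"
dublin_core = {
    child.tag[len(dc_prefix):]: child.text
    for child in description
    if child.tag.startswith(dc_prefix)
}
print(dublin_core)
```

Software that knows nothing about the drawing itself can still read the title and creator, because they use a schema it already understands.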
Simple definition politically complex
The apparently technically simple definition of metadata for SVG is made politically complex by this paragraph in the standard.
The ease of defining metadata using new web-based
tools has made standardization more difficult. It is technically simple to define
a new standard if an existing definition is not quite right. However, having many standards is as much a problem as having no standards at all.
AUSGILS to AGLS
At the time of the IMSC it was thought that an Australian
Government Locator Service would be a variant of the U.S.
Government Information Locator Service (GILS). Consequently, for
much of its gestation period what is now known as AGLS was referred
to as AUSGILS. However, late last year when a workshop of experts
convened to develop the AUSGILS standard it was decided to abandon
the GILS framework and instead base the online locator service on
the Dublin Core metadata standard.
From: Enabling Seamless Online Access to
Government, Adrian Cunningham, National Archives of Australia, 26
August 1998, URL (archived copy):
http://www.naa.gov.au/recordkeeping/gov_online/agls/Metadata_paper22sept98.html
According to this official version of events, the Australian
Government Locator Service (AGLS) metadata standard (discussed later) was originally called "AUSGILS" and intended to be based on the U.S. Government Information Locator Service (GILS), but this was abandoned in favour of the Dublin Core metadata standard in 1997. However, the proposed
standard was first called "AGILS" in an earlier
architecture proposal:
META tag of HTML is used in the header section of HTML
documents. Example: <META NAME="Date"
CONTENT="1966-01-12">. The field identifiers from the
selected meta-data set is used in the NAME field and the field
value in CONTENT. The set of meta-data definitions being used (the
meta-meta-data) should be included in a tag. Example: <meta
name="metadata" content="AGILS">.
From: Architecture For Access To Government
Information, Report of the IMSC -Technical Group, Commonwealth of
Australia, 25 July 1996, URL (archived copy):
http://www.defence.gov.au/imsc/imsctg/imsctg1c.htm#RTFToC87
The GILS-style name was chosen for political reasons, to suggest
compatibility with the US Government standard. The name was later
shortened to AGLS.
Standards, Definitions and Dollars
Adobe Acrobat 5.0 software introduced tagged Adobe PDF, an
enhancement to the PDF specification that allows PDF files to
contain logical document structure. Logical structure refers to the
organization of a document, such as the title page, chapters,
sections, and subsections. Tagged Adobe PDF documents can be
reflowed to fit small-screen devices and offer better support for
repurposing content. They also are more accessible to the visually
impaired.
From Adobe PDF, Adobe Systems Incorporated,
2001, URL: http://www.adobe.com/products/acrobat/adobepdf.html
Standards politics are important to metadata and electronic
document development in the real world. Standards are selected based on the importance of the organisations and individuals supporting them, not technical merit. Standards are then adapted, extended, made into subsets or combined.
E-commerce and electronic publishing depend on decisions made on what
standards to use. Previously separate standards for electronic
commerce, documents and television are converging to use the same
format (XML).
One example of where standards for document formats and
commercial interests collide is the Portable Document Format (PDF).
Developed by Adobe as an extension to the PostScript format for
desktop publishing, PDF has provided a popular electronic document
format. However, PDF has a number of limitations as an on-screen
format and for disabled users. Adobe have attempted to address
these limitations with "Tagged Adobe PDF", which added
some XML interoperability to the PDF format.
However, additional work is needed by document creators to use
these features. There is also an inherent contradiction between
one of PDF's original selling points, providing an accurate
representation of a printed document, and the aim of the
enhancement, allowing the representation to be transformed. Adobe
are not the only ones struggling with this problem. One possible
solution is OpenOffice.org's XML
Packages format. This packages up XML documents and
supplementary binary format data, such as images, in ZIP file
format.
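The packaging idea can be sketched in a few lines of Python using the standard zipfile module. This is a simplified illustration, not the OpenOffice.org specification: the XML content is a placeholder, and the image bytes are not a real picture.

```python
import io
import zipfile

# Build the package in memory so the example leaves no files behind.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
    # XML parts: placeholder markup, not the real OpenOffice.org schema.
    package.writestr("content.xml", "<document><p>Hello</p></document>")
    package.writestr("meta.xml", "<meta><creator>Tom Worthington</creator></meta>")
    # Supplementary binary data travels alongside the XML.
    package.writestr("Pictures/logo.png", b"\x89PNG placeholder bytes")

# Any ZIP tool can list the parts and extract the metadata on its own.
with zipfile.ZipFile(buffer) as package:
    names = package.namelist()
    meta_xml = package.read("meta.xml").decode("utf-8")

print(names)
print(meta_xml)
```

Keeping the metadata in its own XML part means it can be read or updated without interpreting the document body or the binary attachments.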
Dublin Core
Title | Typically, Title will be a name by which the resource is formally known. |
Creator | Examples of Creator include a person, an organization, or a service. ... |
Subject | ... keywords, key phrases or classification codes that describe a topic of the resource. Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme. |
Description | ... an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content. ... |
Adapted from "Dublin Core Metadata Element
Set", Version 1.1: Reference Description, DCMI, 2003-06-02,
URL: http://dublincore.org/documents/dces/
Dublin Core (DC) is a metadata standards project originating from a
workshop held in Dublin, Ohio, USA in 1995. The "Dublin Core"
metadata element set is a small set of metadata definitions
intended for cross-domain information resources. However, DC has
its origins in the work of librarians and so tends to work better
for describing printed text than for other items, such as video.
The intention with DC is to provide a brief standard set of
essential metadata items for resources: Title, Creator, Subject,
Description, Publisher, Contributor, Date, Type, Format,
Identifier, Source, Language, Relation, Coverage, Rights.
Other examples of controlled vocabularies are the Internet Media
Types (MIME), used to define computer media formats in the Format
element, and language tags, such as "en-AU" for Australian
English, used in the Language element.
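A short Python sketch shows why controlled values are easier for software to handle than free text. The function names are invented for this example. The language tag pattern is a simplified check of the RFC 3066 syntax (a primary subtag of one to eight letters, followed by optional subtags of one to eight letters or digits); it does not check that the tag is registered.

```python
import mimetypes
import re

# Simplified RFC 3066 shape: letters first, then optional alphanumeric subtags.
LANGUAGE_TAG = re.compile(r"^[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*$")


def format_value(filename):
    """Suggest an Internet Media Type for the Format element (hypothetical helper)."""
    media_type, _encoding = mimetypes.guess_type(filename)
    return media_type


def is_language_tag(value):
    """Return True if the value has the shape of a language tag."""
    return LANGUAGE_TAG.match(value) is not None


print(format_value("thesis.pdf"))
print(is_language_tag("en-AU"))
print(is_language_tag("Australian English"))
```

The free-text value "Australian English" fails the check, which is the point: a program can reject or flag it, whereas it could not tell a misspelt free-text language name from a valid one.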
Australian Digital Theses Program
Metadata Standards
Dublin Core metadata will be automatically generated out of the
ADT Deposit form. This metadata will form the basis of the database
of distributed digitised theses across the 7 participating
institutions. ...
<meta name="DC.language" scheme="RFC3066"
content="en">
*** English will be the default language. In order to add another
language the Deposit form will need to be amended to add another
field. As theses will be predominantly in English, this will remain
the default and the issue of other languages and the appropriate
scheme to use will be investigated at a future date if
necessary.
From: "Metadata standard", Australian
Digital Theses Program, UNSW Library 1997, Updated 12/09/03 URL:
http://www.library.unsw.edu.au/thesis/adt-ADT/info/metadata.html
Other Dublin Core Projects are listed at URL: http://dublincore.org/projects/subject.shtml
AGLS
Element | Example |
Function | <META NAME="AGLS.Function" CONTENT="School Education"> |
Availability | <META NAME="AGLS.Availability" CONTENT="Medical assistance is available by contacting the after hours hotline on ..."> |
Audience | <agls:audience>anglers</agls:audience> |
Mandate | <META NAME="AGLS.Mandate.case" SCHEME="URI" CONTENT="http://..."> |
Compiled from AGLS Metadata Element Set, Part 2:
Usage Guide, Version 1.3 , National Archives of Australia, 2002,
URL:
http://www.naa.gov.au/recordkeeping/gov_online/agls/metadata_element_set.html
The Australian Government Locator Service (AGLS) metadata standard
is a set of 19 descriptive elements to improve the visibility and
accessibility of services and information over the Internet. The
AGLS standard is based on the 15 Dublin Core elements, plus four
extra elements: Function, Availability, Audience and Mandate.
No elements are mandatory for DC, but AGLS requires five (or
six) of them.
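The relationship between the two element sets, and the idea of a profile that makes some elements mandatory, can be sketched in Python. The 15 Dublin Core names are those listed earlier and the four extras are those in the table above. The required list used in the example is an illustrative placeholder only; it is not the actual AGLS list of mandatory elements.

```python
# The 15 Dublin Core elements, as listed earlier in these notes.
DUBLIN_CORE = [
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
]

# The four extra AGLS elements shown in the table above.
AGLS_EXTRA = ["Function", "Availability", "Audience", "Mandate"]

AGLS = DUBLIN_CORE + AGLS_EXTRA


def missing_elements(record, required):
    """List the required elements that are absent or empty in a record."""
    return [name for name in required if not record.get(name)]


# Placeholder profile for illustration only, NOT the real AGLS mandatory list.
EXAMPLE_REQUIRED = ["Title", "Creator", "Date"]

record = {"Title": "Metadata for Publishing", "Creator": "Tom Worthington"}
missing = missing_elements(record, EXAMPLE_REQUIRED)

print(len(AGLS))
print(missing)
```

Under plain DC this record would be acceptable, since nothing is mandatory. Under a profile with required elements, the same record is flagged as incomplete.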
Qualifiers
Qualifiers are additions and extensions to the metadata elements
that give metadata creators the option to refine the semantics of
the element set, and add precision to the values of the metadata
elements. For example, it may be useful to indicate that the value
has been selected from a particular controlled vocabulary, such as
a list of keywords, or is encoded using a particular convention -
the format for dates is an important case - or in a particular
natural language.
From: AGLS Metadata Element Set, Part 2: Usage
Guide, Version 1.3 , National Archives of Australia, 2002, URL:
http://www.naa.gov.au/recordkeeping/gov_online/agls/metadata_element_set.html
Qualifiers are used to restrict the semantics of the
relationship between the resource and the element value. AGLS
encourages more use of qualifiers than DC, but does not require
it.
AGLS Qualifiers
- Element refinements are represented in HTML
<meta> syntax with qualifiers appended to the element
names. For example: "DC.Type.documentType". Note that the "T" in "Type" in the
example is in upper case, whereas the "d" of
"document" is not. This is a somewhat odd practice in
DC.
- Encoding schemes indicate how the value is to
be interpreted if it has been chosen from a controlled vocabulary,
or externally defined standard. For example:
<META NAME="DC.Date.modified"
SCHEME="ISO8601" CONTENT="1998-08-27">
AGLS uses two types of qualifiers.
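The two types of qualifier can be handled separately by software, as this Python sketch suggests. The names QualifiedName, parse_name and interpret are invented for the example. The refinement is split off the dotted element name, and the encoding scheme decides how the content is read. Only the ISO8601 scheme is interpreted here, and only for plain calendar dates in the form shown in the example tag.

```python
from datetime import date
from typing import NamedTuple, Optional


class QualifiedName(NamedTuple):
    element_set: str           # e.g. "DC" or "AGLS"
    element: str               # e.g. "Date"
    refinement: Optional[str]  # e.g. "modified", or None if unqualified


def parse_name(name):
    """Split a dotted meta name into element set, element and refinement."""
    parts = name.split(".", 2)
    if len(parts) < 2:
        raise ValueError(f"expected 'Set.Element', got {name!r}")
    refinement = parts[2] if len(parts) == 3 else None
    return QualifiedName(parts[0], parts[1], refinement)


def interpret(content, scheme=None):
    """Use the encoding scheme to decide how to read the content."""
    if scheme == "ISO8601":
        # Handles YYYY-MM-DD dates only, as in the example above.
        return date.fromisoformat(content)
    return content  # unknown or absent scheme: keep the text as-is


modified_name = parse_name("DC.Date.modified")
modified_value = interpret("1998-08-27", "ISO8601")

print(modified_name)
print(modified_value.year)
```

A program that does not understand the "modified" refinement can ignore it and still treat the value as a Date, which is the behaviour the qualifier design is meant to allow.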
Metadata Tools
This is a demonstration of DSTC's Reg metadata editor. Reg
allows you to:
- enter metadata
- export metadata in a number of syntaxes
- save metadata records to a test repository
- reload metadata records from a repository for editing
Reg uses metadata schemas to customize itself for different
metadata element sets. ...
"Reg - Metadata Editor", DSTC Pty Ltd,
1998, 2000, URL: http://metadata.net/cgi-bin/reg/demo.cgi.
Metadata is rarely entered by the document author typing in
text. When encoded in the header of an HTML document, the metadata is
not displayed by a web browser. Specialized software, such as a
content management system, or features in word processors are used
to enter and display the metadata. The user of the system is likely
to be unaware they are using a metadata standard or how it is
encoded. Examples of these systems will be shown later.
The Distributed Systems Technology Centre (DSTC Pty Ltd) has
produced a metadata tool to create AGLS and Dublin Core metadata.
Reg can be used
to generate AGLS metadata syntax. This would be too cumbersome for
creating real metadata, but is a useful way to learn about the
process.
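To give a feel for what such a tool does when it exports a record, here is a minimal Python sketch that turns field values into HTML meta tag syntax. It is not how Reg is implemented; the function name meta_tag and the sample record are assumptions made for illustration.

```python
from html import escape


def meta_tag(name, content, scheme=None):
    """Format one metadata value as an HTML meta tag (hypothetical helper)."""
    attributes = [f'name="{escape(name)}"']
    if scheme:
        attributes.append(f'scheme="{escape(scheme)}"')
    # escape() also converts double quotes, so values cannot break the tag.
    attributes.append(f'content="{escape(content)}"')
    return "<meta " + " ".join(attributes) + ">"


# Sample record: (name, content, scheme) triples as a form might collect them.
record = [
    ("DC.Creator", "Tom Worthington", None),
    ("DC.Date.modified", "1998-08-27", "ISO8601"),
]

tags = [meta_tag(name, content, scheme) for name, content, scheme in record]
for tag in tags:
    print(tag)
```

The author fills in a form and never sees this syntax, which is why users are often unaware that a metadata standard is in use.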