Metadata and Electronic Data Management

Lecture notes on Metadata and Electronic Data Management. First presented for Course: COMP3410 "Information Technology in Electronic Commerce" at the Australian National University, 25 July 2000, revised: 2001, 2003, 2004, 2005 to 2007, 2006 to 2008. This is version 9 October 2009.


1 Introduction

Metadata, document and e-commerce standards add a layer over the web to provide global services for civil society, government and business. The systems of non-government, government and commercial organisations can be designed to inter-operate securely to provide services to the public. These services can use formats which are globally standardized, usable for decades and have legal standing. Services can be made available from hand-held wireless devices, as well as desktop computers.

HTML markup is designed for human-readable web pages. XML can be used to give documents enough structure to be processed by an automated system. The same XML documents can be transformed using XSLT and formatted using CSS to be read as a web page.

Building systems which can be read by both people and machines is challenging. Such formats need to be efficient for storage and transmission, while being able to be converted into a format for human reading (rendered). The format needs to be agreed by all those who use it (ideally worldwide) and fixed for long enough to be useful (ideally for decades), but adaptable for use.

Metadata provides a tool to make electronic documents more efficient and flexible. In many cases a short summary of a document (the metadata) can be used in place of the full document, saving on transmission and processing as well as saving time for the human reader. The metadata can be used to manipulate the information in documents to create new documents. The same encoding used for describing documents can be used by data processing systems to carry out electronic commerce.

Standards for a Civil Society, Government and Commerce

In 2009 collaborative development of policy recommendations for the Australian government commenced online, under the title "Public Sphere". The process uses RSS feeds, blogs, instant messages, wikis, streamed and podcast video, as well as traditional conferences. Underlying and unifying these different modes of online communication is a common set of metadata and document standards.

Documents or databases?

Documents and databases represent two views of data in a computer system. Electronic documents are fixed in content and format, are individual distinct entities, can be displayed using software from different suppliers, are expected to last for years and outlive the software which created them. Databases have content which changes, can be displayed in different ways, may only be of value for minutes or months and may depend on one version of database software. This is not to say that all documents are fixed and all databases fluid, but it is a useful generalization.

document, n. ...

Something written, inscribed, etc., which furnishes evidence or information upon any subject, as a manuscript, title-deed, tomb-stone, coin, picture, etc.

Database ...

A structured collection of data held in computer storage; esp. one that incorporates software to make it accessible in a variety of ways; transf., any large collection of information.

From: OED Online, SECOND EDITION, 1989

XML: The Answer?

XML now provides formatting options to allow HTML-like documents and database processing. OpenOffice.org's XML based file format provides a way to package up all the elements of an XML document (including images) into one compressed file. This provides the prospect of formats which can be edited in a word processor, displayed as a web page, transformed for a hand held device or printed with specific styles.

In 2002 OASIS announced a committee to work on an office XML standard format based on OpenOffice.org's XML format, and the first draft was released in March 2004. The call for participation set out these requirements:

  1. it must be suitable for office documents containing text, spreadsheets, charts, and graphical documents,
  2. it must be compatible with the W3C Extensible Markup Language (XML) v1.0 and W3C Namespaces in XML v1.0 specifications,
  3. it must retain high-level information suitable for editing the document,
  4. it must be friendly to transformations using XSLT or similar XML-based languages or tools
  5. it should keep the document's content and layout information separate such that they can be processed independently of each other, and
  6. it should "borrow" from similar, existing standards ...
From: "OASIS TC Call For Participation: Open Office XML TC", Karl Best, 4 Nov 2002, URL: http://lists.oasis-open.org/archives/tc-announce/200211/msg00001.html

Open Access to Information

The Internet has lowered the cost of access to information, placing pressure on organisations to make their information available. Systems such as Creative Commons provide licences which allow information to be provided freely while the owner retains ownership.

Access to public sector information is discussed in the Final Report of the Inquiry into Improving Access to Victorian Public Sector Information and Data (Parliament of Victoria, June 2009) and, more recently, at Public Sphere #2 - Government 2.0: Policy & Practice (Senator Lundy, June 2009).

Social Networking for Business and Government

... <h1 id="name">
<span class="fn n">
<span class="given-name">Tom</span>
<span class="family-name">Worthington</span>
</span></h1> </div>
<div class="content">
<div class="info"> <div
class="image">
<img src="http://media.linkedin....jpg" class="photo" alt="Tom Worthington"></div>
<p class="headline title">Adjunct Senior Lecturer ...</p>
<div class="adr"> <p class="locality"> Canberra Area, Australia ...

From: Source code of a Profile, LinkedIn Corporation, 2008

Social networking software allows a computer system to help people interact in groups. While normally thought of for social purposes, it is now being adopted for business. LinkedIn provides a way for professionals to interact with each other and find colleagues. Naymz provides a reputation management service. It is likely such systems will be used within and between organisations, including government, to manage work, grant access to information, and work out remuneration for staff. This requires the metadata about people and their actions to be carefully encoded and stored.

HTML has only limited provision for metadata. Systems such as LinkedIn get around the problem using Microformats, which use HTML class names as the metadata element names. This allows the metadata to be included in the body of the HTML document, instead of the header, and requires less duplication of information.
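Because microformat metadata lives in ordinary class attributes, any HTML parser can recover it. The sketch below uses Python's standard html.parser to collect the text of elements whose class names match a few hCard-style properties. The sample markup is an abbreviated version of the profile excerpt above. The chosen property subset, and the simplifying rule that text belongs to every wanted class of any enclosing element, are assumptions for illustration. This is not LinkedIn's or the hCard specification's processing model.

```python
from html.parser import HTMLParser

# Abbreviated sample in the style of the profile excerpt above.
SAMPLE = """
<h1 id="name"><span class="fn n">
<span class="given-name">Tom</span>
<span class="family-name">Worthington</span>
</span></h1>
<p class="headline title">Adjunct Senior Lecturer</p>
<div class="adr"><p class="locality">Canberra Area, Australia</p></div>
"""

# Class names treated as metadata element names (a small, assumed subset of hCard).
WANTED = {"given-name", "family-name", "title", "locality"}
# Elements that never have an end tag, so they must not be pushed on the stack.
VOID = {"img", "br", "hr", "meta", "link", "input"}


class MicroformatParser(HTMLParser):
    """Collects the text inside elements whose class attribute names a wanted property."""

    def __init__(self):
        super().__init__()
        self.stack = []  # wanted class names of each currently open element
        self.props = {}  # property name -> collected text

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = set((dict(attrs).get("class") or "").split())
        self.stack.append(classes & WANTED)

    def handle_endtag(self, tag):
        if tag not in VOID and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Text belongs to every wanted property of any enclosing element.
        for classes in self.stack:
            for name in classes:
                self.props[name] = (self.props.get(name, "") + " " + text).strip()


def extract(html):
    parser = MicroformatParser()
    parser.feed(html)
    parser.close()
    return parser.props


print(extract(SAMPLE))
```

The same page can therefore serve a human reader through the browser and a program through the class names, without a separate metadata header.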

Are e-documents legal?

  1. ... The ordinary dictionary meaning of "document" is a printed or written paper containing information. ... No violence is done to the object or language of s 418(3) by holding that "document" includes information that is stored in a computer or a fax machine and which can be printed out by pressing one or more keys or buttons. No reason appears for thinking that Parliament intended to distinguish between information stored on paper and information stored in the electronic impulses of a computer that can be printed on paper by pressing a key or keys on the computer's keyboard. ...

From: "Muin v Refugee Review Tribunal; Lie v Refugee Review Tribunal", 8 August 2002, High Court of Australia, http://www.austlii.edu.au/au/cases/cth/high_ct/2002/30.html

In 2002 the High Court of Australia concluded that the Migration Act's definition of documents included electronic documents stored in a database.

Identifying e-documents

  1. "Documents" may include electronic documents: ... Today, in ordinary speech, one can readily refer to a "document" in a database, although such a document may never have been reduced to tangible form. Typically, a database will yield information that appears in paginated format....

  2. ... Electronic "documents" could perhaps be "given" by separate identification and annexure to an electronic transmission. Yet even that was not done in the present case. Merely making such "documents" (or some of them) "available" in a mass of undifferentiated material in a database of constantly changing content does not comply with the language and particular design of the Act ...

From: "Muin v Refugee Review Tribunal; Lie v Refugee Review Tribunal", 8 August 2002, High Court of Australia, http://www.austlii.edu.au/au/cases/cth/high_ct/2002/30.html

The High Court also concluded electronic documents need to be separately identified.

1.1 Metadata

Metadata can be described as data about data:

metadata n., a set of data that describes and gives information about other data...

[1968 Proc. IFIP 4th Congr.: Suppl. 10 I. 113/2 There are categories of information about each data set as a unit in a data set of data sets, which must be handled as a special meta data set.] 1987 Philos. Trans. Royal Soc. A. 322 373 The challenge is to accumulate data..from diverse sources, convert it to machine-readable form with a harmonized array of *metadata descriptors and present the resulting database(s) to the user. 1998 New Scientist 30 May 35/2 With XML, attaching metadata to a document is easy, at least in theory.

Oxford English Dictionary, (Online) Draft entry Dec. 2001, URL: http://dictionary.oed.com/cgi/entry/00307096/00307096se19

In e-commerce, metadata provides standard data items to allow parties to communicate about their organisations, products, terms and conditions. Electronic payment details of a transaction, and the money itself, consist of data defined by metadata.

Metadata is also used to describe published documents. The same XML technology can be used to express metadata for e-commerce and for publishing.

Australian Government Metadata

<meta name="DC.Publisher" scheme="X500" content="ou=Australian Government Information Management Office (AGIMO) ; o= Commonwealth of Australia ; c=AU">
<meta name="DC.Description" content="The australia.gov.au website is your connection with government in Australia...">
<meta name="DC.Subject" scheme="TAGS" content="Government information; Federal government; Government services; Government publications; Web sites">
<meta name="DC.Type.documentType" scheme="agls-document" content="homepage">

From: "australia.gov.au : your connection with government", Australian Government Information Management Office, 2004-06-30, URL: http://www.australia.gov.au/

This metadata is from the Australian Government home page. It was intended that data in this format would be inserted into the HEAD of all government web pages, to aid data retrieval.
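A search tool can only use header metadata like this if it can pull the META elements back out of a page. The following is a minimal sketch using Python's standard html.parser. It keeps META elements whose names start with "DC." or "AGLS." and records the name, scheme and content of each. It then splits the TAGS-scheme subject list on semicolons. That splitting rule is inferred from the example and is an assumption, not a documented part of the scheme.

```python
from html.parser import HTMLParser

# Part of the australia.gov.au head shown above.
HEAD = """<head>
<meta name="DC.Publisher" scheme="X500" content="ou=Australian Government Information Management Office (AGIMO) ; o= Commonwealth of Australia ; c=AU">
<meta name="DC.Subject" scheme="TAGS" content="Government information; Federal government; Government services; Government publications; Web sites">
<meta name="DC.Type.documentType" scheme="agls-document" content="homepage">
</head>"""


class MetaCollector(HTMLParser):
    """Records (name, scheme, content) for DC and AGLS META elements."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so <META NAME=...> also matches.
        if tag != "meta":
            return
        a = dict(attrs)
        name = a.get("name") or ""
        if name.startswith(("DC.", "AGLS.")):
            self.records.append((name, a.get("scheme"), a.get("content", "")))


def dc_metadata(html):
    parser = MetaCollector()
    parser.feed(html)
    parser.close()
    return parser.records


def split_tags(content):
    """Assumed rule: TAGS values are a semicolon-separated keyword list."""
    return [s.strip() for s in content.split(";") if s.strip()]


for name, scheme, content in dc_metadata(HEAD):
    print(name, scheme, content)
```
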

The challenge is to create formats which are sufficiently expressive to be able to communicate what is needed, but simple enough to be implemented efficiently.

Creating and using metadata standards is both a technical and political process. Most standards need to be profiled, to create a workable subset, before they can be used for practical purposes. Some standards need to be enhanced and others should not be used at all.

Tax Office e-commerce transaction

<FORM_PERIOD_LABEL_TEXT>July to September 2001</FORM_PERIOD_LABEL_TEXT>
<EFT_CODE> 51111 121 059 9059</EFT_CODE>
<BILLER_CODE>75556</BILLER_CODE>
<PAYG_WITHHOLDING>0</PAYG_WITHHOLDING>
<PAYG_INSTALMENT>12541</PAYG_INSTALMENT>
<DEFERRED_COMPANY_FUND_INSTALMENT>7879801 </DEFERRED_COMPANY_FUND_INSTALMENT>
<TOTAL_DEBITS>7892342</TOTAL_DEBITS>
<TOTAL_CREDITS>0</TOTAL_CREDITS>
<NET_AMOUNT_FOR_THIS_STATEMENT>7892342 </NET_AMOUNT_FOR_THIS_STATEMENT>
<GST_LABEL_TEXT>for the QUARTER from 1 Jul 2001 to 30 Sep 2001</GST_LABEL_TEXT>
<GST_ACCOUNTING_METHOD_LABEL_TEXT>Cash ...

From: Formatting the eBAS with XSL, Tom Worthington, 29 November 2002, URL: http://www.tomw.net.au/2002/atoxml.html

Here is an example of an e-commerce transaction. This is an Australian Taxation Office electronic tax form for the Goods and Services Tax (GST). This is a different use of metadata, for defining the data in a financial transaction.
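Because each amount in the eBAS is a separately tagged element, software can check the statement's arithmetic before it is lodged. In the fragment above, the three debit items add up to TOTAL_DEBITS (0 + 12541 + 7879801 = 7892342). The sketch below parses the fragment with Python's xml.etree.ElementTree and repeats that check. Several parts are assumptions rather than facts from the ATO specification: the wrapping <EBAS> root element, the list of debit items, and the rule that the net amount is debits minus credits. The units of the amounts are not stated in the source and are left as plain integers.

```python
import xml.etree.ElementTree as ET

# Amounts from the eBAS fragment above, wrapped in a hypothetical <EBAS> root
# element so that the fragment is well-formed XML.
EBAS = """<EBAS>
<PAYG_WITHHOLDING>0</PAYG_WITHHOLDING>
<PAYG_INSTALMENT>12541</PAYG_INSTALMENT>
<DEFERRED_COMPANY_FUND_INSTALMENT>7879801 </DEFERRED_COMPANY_FUND_INSTALMENT>
<TOTAL_DEBITS>7892342</TOTAL_DEBITS>
<TOTAL_CREDITS>0</TOTAL_CREDITS>
<NET_AMOUNT_FOR_THIS_STATEMENT>7892342 </NET_AMOUNT_FOR_THIS_STATEMENT>
</EBAS>"""

# Assumption: these are the items that make up TOTAL_DEBITS.
DEBIT_ITEMS = ["PAYG_WITHHOLDING", "PAYG_INSTALMENT", "DEFERRED_COMPANY_FUND_INSTALMENT"]


def amount(root, tag):
    """Integer value of a child element; strips stray whitespace such as '7879801 '."""
    return int(root.findtext(tag).strip())


def check_statement(xml_text):
    root = ET.fromstring(xml_text)
    debits = sum(amount(root, tag) for tag in DEBIT_ITEMS)
    net = amount(root, "TOTAL_DEBITS") - amount(root, "TOTAL_CREDITS")  # assumed rule
    return {
        "debits_match": debits == amount(root, "TOTAL_DEBITS"),
        "net_matches": net == amount(root, "NET_AMOUNT_FOR_THIS_STATEMENT"),
        "net": net,
    }


print(check_statement(EBAS))
```
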

Scalable Vector Graphics Metadata Definition

<!ENTITY % metadataExt "" >
<!ELEMENT metadata (#PCDATA %metadataExt;)* >
<!ATTLIST metadata %stdAttrs; >

From 21.2 The 'metadata' element, Scalable Vector Graphics (SVG) 1.0 Specification W3C Proposed Recommendation 19 July, 2001, URL: http://www.w3.org/TR/SVG/

The World Wide Web Consortium (W3C) standard for Scalable Vector Graphics (SVG) provides a way to define images in web pages. As well as the expected features of shapes, filling, symbols, colours and patterns, there is the 'metadata' element.

Scalable Vector Graphics Metadata Explanation

Individual industries or individual content creators are free to define their own metadata schema but are encouraged to follow existing metadata standards and use standard metadata schema wherever possible to promote interchange and interoperability. If a particular standard metadata schema does not meet your needs, then it is usually better to define an additional metadata schema in an existing framework such as RDF and to use custom metadata schema in combination with standard metadata schema, rather than totally ignore the standard schema.

From 21.1 Introduction, Scalable Vector Graphics (SVG) 1.0 Specification W3C Proposed Recommendation 19 July, 2001, URL: http://www.w3.org/TR/SVG/

Simple definition politically complex

The apparently technically simple definition of metadata for SVG is made politically complex by this paragraph in the standard.

The ease of defining metadata using new web-based tools has made standardization more difficult. It is technically simple to define a new standard if an existing definition is not quite right. However, having many standards is as much a problem as having no standards at all.

AUSGILS to AGLS

At the time of the IMSC it was thought that an Australian Government Locator Service would be a variant of the U.S. Government Information Locator Service (GILS). Consequently, for much of its gestation period what is now known as AGLS was referred to as AUSGILS. However, late last year when a workshop of experts convened to develop the AUSGILS standard it was decided to abandon the GILS framework and instead base the online locator service on the Dublin Core metadata standard.

From: Enabling Seamless Online Access to Government, Adrian Cunningham, National Archives of Australia, 26 August 1998, URL (archived copy): http://www.naa.gov.au/recordkeeping/gov_online/agls/Metadata_paper22sept98.html

According to this official version of events, the Australian Government Locator Service (AGLS) metadata standard (discussed later) was originally called "AUSGILS" and intended to be based on the U.S. Government Information Locator Service (GILS), but this was abandoned in favour of the Dublin Core metadata standard in 1997. However, the proposed standard was first called "AGILS" in an earlier architecture proposal:

META tag of HTML is used in the header section of HTML documents. Example: <META NAME="Date" CONTENT="1966-01-12">. The field identifiers from the selected meta-data set is used in the NAME field and the field value in CONTENT. The set of meta-data definitions being used (the meta-meta-data) should be included in a tag. Example: <meta name="metadata" content="AGILS">.

From: Architecture For Access To Government Information, Report of the IMSC -Technical Group, Commonwealth of Australia, 25 July 1996, URL (archived copy): http://www.defence.gov.au/imsc/imsctg/imsctg1c.htm#RTFToC87

The "GILS" name was chosen for political reasons, to suggest compatibility with the US Government standard. The name was later shortened to AGLS.

Standards, Definitions and Dollars

Adobe Acrobat 5.0 software introduced tagged Adobe PDF, an enhancement to the PDF specification that allows PDF files to contain logical document structure. Logical structure refers to the organization of a document, such as the title page, chapters, sections, and subsections. Tagged Adobe PDF documents can be reflowed to fit small-screen devices and offer better support for repurposing content. They also are more accessible to the visually impaired.

From Adobe PDF, Adobe Systems Incorporated, 2001, URL: http://www.adobe.com/products/acrobat/adobepdf.html

Standards politics are important to metadata and electronic document development in the real world. Standards are selected based on the importance of the organisations and individuals supporting them, not technical merit. Standards are then adapted, extended, made into subsets or combined.

E-commerce and electronic publishing depend on decisions made on what standards to use. Previously separate standards for electronic commerce, documents and television are converging to use the same format (XML).

One example of where standards for document formats and commercial interests collide is the Portable Document Format (PDF). Developed by Adobe as an extension to the PostScript format for desktop publishing, PDF has provided a popular electronic document format. However, PDF has a number of limitations as an on-screen format and for disabled users. Adobe have attempted to address these limitations with "Tagged Adobe PDF", which adds some XML interoperability to the PDF format.

However, additional work is needed by document creators to use these features. There is also an inherent contradiction between one of PDF's original selling points, providing an accurate representation of a printed document, and the aim of the enhancement, allowing that representation to be transformed. Adobe are not the only ones struggling with this problem. One possible solution is OpenOffice.org's XML Packages format, which packages up XML documents and supplementary binary data, such as images, in ZIP file format.
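The packaging idea can be sketched with Python's standard zipfile module. The XML content and its images go into a single compressed archive, and a reader can pull out the XML without touching the binary parts. The entry names follow the later OASIS OpenDocument convention, which OpenOffice.org's format developed into: an uncompressed "mimetype" entry first, then "content.xml" and a "Pictures/" folder. Treat these names as an assumption to check against the specification. The sketch also omits the required META-INF/manifest.xml, so its output is not a valid office document. It only illustrates the packaging.

```python
import io
import zipfile


def build_package(content_xml: str, images: dict) -> bytes:
    """Bundle XML content and binary images into one ZIP archive held in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        # ODF convention (assumed here): an uncompressed "mimetype" entry comes first.
        z.writestr("mimetype", "application/vnd.oasis.opendocument.text",
                   compress_type=zipfile.ZIP_STORED)
        z.writestr("content.xml", content_xml, compress_type=zipfile.ZIP_DEFLATED)
        for name, data in images.items():
            z.writestr("Pictures/" + name, data, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def read_package(data: bytes):
    """Return the entry names and the decoded XML content of a package."""
    with zipfile.ZipFile(io.BytesIO(data)) as z:
        return z.namelist(), z.read("content.xml").decode("utf-8")


# Hypothetical content and placeholder image bytes.
pkg = build_package("<document>Hello</document>", {"logo.png": b"\x89PNG placeholder"})
names, content = read_package(pkg)
print(names, content)
```
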

Dublin Core

Title: Typically, Title will be a name by which the resource is formally known.
Creator: Examples of Creator include a person, an organization, or a service. ...
Subject: ... keywords, key phrases or classification codes that describe a topic of the resource. Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme.
Description: ... an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content. ...

Adapted from "Dublin Core Metadata Element Set", Version 1.1: Reference Description, DCMI, 2003-06-02, URL: http://dublincore.org/documents/dces/

Dublin Core (DC) is a metadata standards project originating from a workshop held in Dublin, Ohio, USA in 1995. The Dublin Core Metadata Element Set is a small set of metadata definitions intended for describing information resources across different domains. However, DC has its origins in the work of librarians and so tends to work better for describing printed text than other items, such as video.

The intention with DC is to provide a brief standard set of essential metadata items for resources: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights.

Other examples of controlled vocabularies are the Internet Media Types (MIME), used to define computer media formats in the Format element, and language tags, such as "en-AU" for Australian English, used in the Language element.
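A metadata tool can enforce controlled vocabularies when it writes DC elements. The sketch below emits HTML META elements for any of the 15 DC elements in a record. It rejects Format values outside a small list of media types and Language values that do not look like a language tag. Several details are assumptions for illustration. The media-type list is a tiny sample, not the IANA registry. The language pattern is a simplified check and not a full RFC 3066 validator. The scheme names "IMT" and "RFC3066" follow DCMI encoding-scheme naming, and the second matches the ADT example that follows.

```python
import re
from html import escape

DC_ELEMENTS = ("Title", "Creator", "Subject", "Description", "Publisher", "Contributor",
               "Date", "Type", "Format", "Identifier", "Source", "Language",
               "Relation", "Coverage", "Rights")

# Tiny illustrative sample of Internet Media Types, not the full registry.
MEDIA_TYPES = {"text/html", "application/pdf", "application/xml", "image/svg+xml"}

# Simplified language-tag check: "en" or "en-AU" style only (not full RFC 3066).
LANG_TAG = re.compile(r"^[a-z]{2,3}(-[A-Z]{2})?$")

# Encoding-scheme names written into the SCHEME attribute (assumed DCMI-style names).
SCHEMES = {"Format": "IMT", "Language": "RFC3066"}


def dc_meta_tags(record: dict) -> str:
    unknown = set(record) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    lines = []
    for element in DC_ELEMENTS:  # emit in the standard element order
        if element not in record:
            continue
        value = record[element]
        if element == "Format" and value not in MEDIA_TYPES:
            raise ValueError(f"media type not in vocabulary: {value!r}")
        if element == "Language" and not LANG_TAG.match(value):
            raise ValueError(f"not a language tag: {value!r}")
        scheme = f' scheme="{SCHEMES[element]}"' if element in SCHEMES else ""
        lines.append(f'<meta name="DC.{element}"{scheme} content="{escape(value)}">')
    return "\n".join(lines)


print(dc_meta_tags({"Title": "Metadata and Electronic Data Management",
                    "Language": "en-AU", "Format": "text/html"}))
```
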

Australian Digital Theses Program

Metadata Standards

Dublin Core metadata will be automatically generated out of the ADT Deposit form. This metadata will form the basis of the database of distributed digitised theses across the 7 participating institutions. ...

<meta name="DC.language" scheme="RFC3066" content="en">

*** English will be the default language. In order to add another language the Deposit form will need to be amended to add another field. As theses will be predominantly in English, this will remain the default and the issue of other languages and the appropriate scheme to use will be investigated at a future date if necessary.

From: "Metadata standard", Australian Digital Theses Program, UNSW Library 1997, Updated 12/09/03 URL: http://www.library.unsw.edu.au/thesis/adt-ADT/info/metadata.html

Other Dublin Core Projects are listed at URL: http://dublincore.org/projects/subject.shtml

The Australian Digital Theses Program provides a database of digitised theses produced at Australian universities. Authors at ANU use a deposit form, from which DC metadata is generated and made available through a search facility.

AGLS

Element Example
Function <META NAME="AGLS.Function" CONTENT="School Education">
Availability <META NAME="AGLS.Availability" CONTENT="Medical assistance is available by contacting the after hours hotline on ...">
Audience <agls:audience>anglers</agls:audience>
Mandate <META NAME="AGLS.Mandate.case" SCHEME="URI" CONTENT="http://...">

Compiled from AGLS Metadata Element Set, Part 2: Usage Guide, Version 1.3, National Archives of Australia, 2002, URL: http://www.naa.gov.au/recordkeeping/gov_online/agls/metadata_element_set.html

The Australian Government Locator Service (AGLS) metadata standard is a set of 19 descriptive elements to improve the visibility and accessibility of services and information over the Internet. The AGLS standard is based on the 15 Dublin Core elements, plus four extra elements: Audience, Availability, Function and Mandate (illustrated in the table above).

AGLS Mandatory Elements

  • Creator
  • Publisher (note: this element is not mandatory for descriptions of services)
  • Title
  • Date
  • Subject OR Function
  • Identifier OR Availability

From: AGLS Metadata Element Set, Part 2: Usage Guide, Version 1.3 , National Archives of Australia, 2002, URL: http://www.naa.gov.au/recordkeeping/gov_online/agls/metadata_element_set.html

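No elements are mandatory for DC, but AGLS requires six of them (five for descriptions of services, where Publisher is optional). Two of the six can be satisfied by either of a pair of elements.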
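The mandatory-element rules above, including the two "OR" pairs and the Publisher exemption for services, can be checked mechanically before a record is published. The sketch below assumes a record is a plain dictionary from AGLS element names to values. That representation is an illustration, not part of the standard. It returns the requirements that are not yet met.

```python
def missing_mandatory(record: dict, is_service: bool = False) -> list:
    """List the AGLS mandatory-element requirements a record does not meet."""
    missing = [e for e in ("Creator", "Title", "Date") if not record.get(e)]
    # Publisher is not mandatory for descriptions of services.
    if not is_service and not record.get("Publisher"):
        missing.append("Publisher")
    # Either element of each pair satisfies the requirement.
    if not (record.get("Subject") or record.get("Function")):
        missing.append("Subject OR Function")
    if not (record.get("Identifier") or record.get("Availability")):
        missing.append("Identifier OR Availability")
    return missing


# Hypothetical record describing a service.
service = {"Creator": "Example Agency", "Title": "After hours medical hotline",
           "Date": "2002-01-01", "Function": "Health care",
           "Availability": "Medical assistance is available by contacting the after hours hotline"}
print(missing_mandatory(service, is_service=True))
print(missing_mandatory(service))
```
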

Qualifiers

Qualifiers are additions and extensions to the metadata elements that give metadata creators the option to refine the semantics of the element set, and add precision to the values of the metadata elements. For example, it may be useful to indicate that the value has been selected from a particular controlled vocabulary, such as a list of keywords, or is encoded using a particular convention - the format for dates is an important case - or in a particular natural language.

From: AGLS Metadata Element Set, Part 2: Usage Guide, Version 1.3 , National Archives of Australia, 2002, URL: http://www.naa.gov.au/recordkeeping/gov_online/agls/metadata_element_set.html

Qualifiers are used to restrict the semantics of the relationship between the resource and the element value. AGLS encourages more use of qualifiers than DC, but does not require it.

AGLS Qualifiers

  1. Element refinements are represented in HTML <meta> syntax with qualifiers appended to the element names. For example: "DC.Type.documentType". Note that the "T" in "Type" in the example is in upper case, whereas the "d" of "document" is not. This is a somewhat odd practice in DC.

  2. Encoding schemes indicate how the value is to be interpreted if it has been chosen from a controlled vocabulary, or externally defined standard. For example:
    <META NAME="DC.Date.modified" SCHEME="ISO8601" CONTENT="1998-08-27">

AGLS uses two types of qualifiers.
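Both kinds of qualifier can be read back from HTML META syntax. The element refinement comes from the dotted name, and the encoding scheme comes from the SCHEME attribute, which says how to interpret the value. The sketch below splits names such as "DC.Date.modified" into namespace, element and refinement. When the scheme is ISO8601, it checks the value with Python's date.fromisoformat. The two-or-three-part name rule is an assumption based on the examples shown. date.fromisoformat confirms a real calendar date but does not accept every ISO 8601 form, for example year-only values.

```python
from datetime import date
from typing import NamedTuple, Optional


class QualifiedElement(NamedTuple):
    namespace: str             # e.g. "DC" or "AGLS"
    element: str               # e.g. "Date"
    refinement: Optional[str]  # element refinement, e.g. "modified"
    scheme: Optional[str]      # encoding scheme, e.g. "ISO8601"
    value: str


def parse_meta(name: str, content: str, scheme: Optional[str] = None) -> QualifiedElement:
    parts = name.split(".")
    # Assumed: names are "NS.Element" or "NS.Element.refinement".
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected element name: {name!r}")
    refinement = parts[2] if len(parts) == 3 else None
    if scheme == "ISO8601":
        # Raises ValueError if the value is not a valid calendar date.
        date.fromisoformat(content)
    return QualifiedElement(parts[0], parts[1], refinement, scheme, content)


print(parse_meta("DC.Date.modified", "1998-08-27", "ISO8601"))
print(parse_meta("DC.Type.documentType", "homepage", "agls-document"))
```
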

Metadata Tools

This is a demonstration of DSTC's Reg metadata editor. Reg allows you to:

  • enter metadata
  • export metadata in a number of syntaxes
  • save metadata records to a test repository
  • reload metadata records from a repository for editing

Reg uses metadata schemas to customize itself for different metadata element sets. ...

"Reg - Metadata Editor", DSTC Pty Ltd, 1998, 2000, URL: http://metadata.net/cgi-bin/reg/demo.cgi.

Metadata is rarely entered by the document author typing in text. When encoded in the header of an HTML document the metadata is not displayed by a web browser. Specialized software, such as content management systems, or features in word processors are used to enter and display the metadata. The user of the system is likely to be unaware they are using a metadata standard or how it is encoded. Examples of these systems will be shown later.

The Distributed Systems Technology Centre (DSTC Pty Ltd) has produced a metadata tool, Reg, to create AGLS and Dublin Core metadata. Reg can be used to generate AGLS metadata syntax. It would be too cumbersome for creating real metadata, but is a useful way to learn about the process.

1.2 Standards for E-commerce

E-document and E-commerce Standards

Metadata for managing documents tends to have a few dozen elements for each document. Most elements are text fields, rather than numeric values or qualified values. Metadata for electronic commerce uses more elements, and more of them hold qualified or numeric values.

Early E-commerce Standards: UN/EDIFACT and ANS X12

The United Nations agreed standards for world e-commerce called UN/EDIFACT. This is one of the two early, internationally cited families of standards for Electronic Data Interchange (EDI). The other is the USA's ANS X12 syntax. In most cases the same metadata elements can be used with EDIFACT and ANS X12.

UN/EDIFACT

26. United Nations rules for Electronic Data Interchange For Administration, Commerce and Transport. They comprise a set of internationally agreed standards, directories and guidelines for the electronic interchange of structured data, and in particular that related to trade in goods and services between independent, computerized information systems.

27. Recommended within the framework of the United Nations, the rules are approved and published by UN/ECE in the (this) United Nations Trade Data Interchange Directory (UNTDID) and are maintained under agreed procedures.

From: "UN/EDIFACT Draft Directory", United Nations Economic Commission for Europe, (undated), http://www.unece.org/trade/untdid/texts/d100_d.htm

ANS X12

This code list is used by United States Government contracting and grant activities to indicate the data expressions that are contained herein. It is designed principally for use with Electronic Date Interchange (EDI) in either the American National Standard X12 syntax or the United Nations/Electronic Data Interchange for Administration, Commerce, and Transport (UN/EDIFACT) syntax. It may be used in other data systems as appropriate, to include as domain values for standard data schemes or as application data. ...

From: Federal Procurement Code List One (FP1), National Institute of Standards and Technology, 1998 (Revised: April 25, 2001), URL: http://snad.ncsl.nist.gov/dartg/edi/fededi-coding.html
No longer on-line, copy at URL: http://fedebiz.disa.mil/private/edit/document/resource/fp1.rtf

ANS X12 Example

BTA Small Disadvantaged Business Performing in the US
BTB Other Small Business Performing in the US
BTC Large Business Performing in the US
BTD Javits-Wagner-O'Day Act (JWOD) Participating Nonprofit Agencies
BTF Hospital
BTL Foreign Concern/Entity ...

From: Federal Procurement Code List One (FP1), National Institute of Standards and Technology, 1998 (Revised: April 25, 2001), URL: http://snad.ncsl.nist.gov/dartg/edi/fededi-coding.html
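A code list like this works because both trading partners hold the same table. The message carries only the short code, and each side expands it locally. The sketch below holds the FP1 codes quoted above in a Python dictionary and refuses codes that are not in the list. It covers only this subset of FP1, and the function name is illustrative.

```python
# Business-type codes copied from the FP1 excerpt above (a subset only).
BUSINESS_TYPE = {
    "BTA": "Small Disadvantaged Business Performing in the US",
    "BTB": "Other Small Business Performing in the US",
    "BTC": "Large Business Performing in the US",
    "BTD": "Javits-Wagner-O'Day Act (JWOD) Participating Nonprofit Agencies",
    "BTF": "Hospital",
    "BTL": "Foreign Concern/Entity",
}


def describe(code: str) -> str:
    """Expand a business-type code, rejecting anything outside the shared list."""
    try:
        return BUSINESS_TYPE[code]
    except KeyError:
        raise ValueError(f"{code!r} is not in the FP1 subset") from None


print(describe("BTF"))
```
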

USA Standards for Business Forms

  • 810 Invoice - Updated in January 1996 and published as NIST Special Pub 881-10 - (ASCII, RTF, or PDF) - Version Control Number: 003040FED01A regenerated as 003040F810_0.

  • 820A Payment Order/Remittance Advice (Automated Standard Application for Payments): Version Control Number 003040F820A1 - updated April 20, 1999 - (PDF, ASCII, RTF)

From: Federal Procurement Code List One (FP1), National Institute of Standards and Technology, 1999 URL: http://snad.ncsl.nist.gov/dartg/edi/3040-ic.html

Standards were developed as electronic versions of commonly used business forms, such as invoices and remittance advices.

An XML/EDI: Payment Order

<?xml version="1.0"?>
<!DOCTYPE PAY-NAT SYSTEM "pay-nat.dtd">
<PAY-NAT RefNo="0005">
<BGM>AA124</BGM>
<DTM1>19980812</DTM1>
<DTM1 Type="203">19970815</DTM1>
<MOA>100</MOA>
<FII Party="OR">
<UKB>010344</UKB>
<ACC>23412345</ACC>
<ACN>MR N SMITH</ACN>...

From: "Interim Report", CEN/ISSS XML/EDI Workshop, 2000, Archived at URL: http://www.cenorm.be/isss/workshop/ec/xmledi/documents_99/xml001_99.htm#NatPay

The Interim Report for the CEN/ISSS XML/EDI Pilot Project gave an example of an XML version of an EDIFACT National Payment Order.

Payment Order Elements

PAY-NAT Container for the message segments ...
BGM Identifies the beginning of the message...
MOA Monetary amount of payment. Defaults to GBP - Pounds sterling ...
FII Container for financial institution information...

From: "Interim Report", CEN/ISSS XML/EDI Workshop, 2000, Archived at URL: http://www.cenorm.be/isss/workshop/ec/xmledi/documents_99/xml001_99.htm#NatPay

Some elements used for the CEN/ISSS Payment Order.

XML DTD

...<!ATTLIST PAY-NAT
UN-EDIFACT:Prefix CDATA #FIXED "UNH"
RefNo CDATA #IMPLIED
MessageTypeID CDATA #FIXED "PAYEXT"
Version CDATA #FIXED "D"
ReleaseNumber CDATA #FIXED "96A"
Agency CDATA #FIXED "UN"
AssociationCode CDATA #FIXED "SIMP01" >
...
<!ELEMENT MOA (#PCDATA) >
<!ATTLIST MOA
UN-EDIFACT:Prefix CDATA #FIXED "MOA"
Type CDATA #FIXED "9"
Currency CDATA "GBP" >

From: "Interim Report", CEN/ISSS XML/EDI Workshop, 2000, Archived at URL: http://www.cenorm.be/isss/workshop/ec/xmledi/documents_99/xml001_99.htm#NatPay

Part of the XML document type definition (DTD) of the CEN/ISSS Payment Order
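A general XML parser can read this payment order, but it will not necessarily apply the DTD. Python's xml.etree.ElementTree does not load the external pay-nat.dtd. As a result, the default Currency of "GBP" declared for MOA is not visible to the program, and neither are the #FIXED attributes on PAY-NAT. The sketch below closes the elements left open in the truncated excerpt, adding no new data, and applies the currency default by hand. The amount is kept as text because the excerpt does not say how it is scaled.

```python
import xml.etree.ElementTree as ET

# The payment-order excerpt above, with its open elements closed so it is well-formed.
ORDER = """<?xml version="1.0"?>
<!DOCTYPE PAY-NAT SYSTEM "pay-nat.dtd">
<PAY-NAT RefNo="0005">
<BGM>AA124</BGM>
<DTM1>19980812</DTM1>
<DTM1 Type="203">19970815</DTM1>
<MOA>100</MOA>
<FII Party="OR">
<UKB>010344</UKB>
<ACC>23412345</ACC>
<ACN>MR N SMITH</ACN>
</FII>
</PAY-NAT>"""


def read_order(xml_text):
    root = ET.fromstring(xml_text)  # the external DTD is not fetched or applied
    moa = root.find("MOA")
    fii = root.find("FII")
    return {
        "ref": root.get("RefNo"),
        "message": root.findtext("BGM"),
        # Apply the DTD's default Currency "GBP" by hand, since the DTD is not read.
        "amount": (moa.text, moa.get("Currency", "GBP")),
        "dates": [(d.get("Type"), d.text) for d in root.findall("DTM1")],
        "party": fii.get("Party"),
        "account": fii.findtext("ACC"),
        "account_name": fii.findtext("ACN"),
    }


print(read_order(ORDER))
```
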

W3C XML E-commerce Standards

WSDL Web Services Description Language
SOAP A lightweight protocol for exchanging structured information in a decentralized, distributed environment.
XML Schema For describing the structure and constraining the contents of XML 1.0 documents

W3C provide a very useful table to compare XML protocols.
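To show what SOAP adds, here is a sketch of a SOAP 1.1 message built with Python's xml.etree.ElementTree. The protocol itself defines only a standard envelope with a body. The application payload inside it is defined by whoever designs the service. The namespace urn:example:ebas, the GetStatement request and its Period value are hypothetical. Only the envelope namespace comes from SOAP 1.1. In a real service the payload would be described by a WSDL document and its XML Schema.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:ebas"  # hypothetical namespace for the application payload


def soap_envelope(payload: ET.Element) -> bytes:
    """Wrap an application element in a SOAP 1.1 Envelope/Body."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    body.append(payload)
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


# Hypothetical request asking a service for one statement.
request = ET.Element(f"{{{APP_NS}}}GetStatement")
ET.SubElement(request, f"{{{APP_NS}}}Period").text = "2001-Q3"
message = soap_envelope(request)
print(message.decode("utf-8"))
```
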

Document Related Standards

XSL Extensible Stylesheet Language
XSLT XSL Transformations: For transforming XML documents into other XML documents.
XHTML Basic XHTML subset for Small Information Appliances
XML Extensible Markup Language

1.3 E-commerce Examples

Web Services Demonstration

Web services can be thought of as the transaction processing equivalent of the world wide web. The web provided a relatively easy and standardized way to create distributed hypertext. Web Services is a set of standards which aims to provide easy and standardized distributed transaction processing.

Formatting the eBAS with XSL

The Australian Taxation Office (ATO) provides specifications for electronic versions of tax forms, including the Business Activity Statement (BAS) used for the Goods and Services Tax (GST). This is a demonstration of how XML transactions can be transformed into printable documents.
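The demonstration uses XSL. Python's standard library has no XSLT processor, so the sketch below substitutes a plain ElementTree loop to show the same transformation idea: each tagged item becomes a labelled row of an HTML table that a browser can display or print. The <EBAS> root and the human-readable labels are hypothetical. A real system would keep the mapping in an XSL stylesheet, which a library such as lxml can apply. Using a stylesheet means the layout can change without changing program code.

```python
import xml.etree.ElementTree as ET
from html import escape

# A few items from the eBAS fragment shown earlier, wrapped in a hypothetical <EBAS> root.
EBAS = """<EBAS>
<FORM_PERIOD_LABEL_TEXT>July to September 2001</FORM_PERIOD_LABEL_TEXT>
<PAYG_INSTALMENT>12541</PAYG_INSTALMENT>
<TOTAL_DEBITS>7892342</TOTAL_DEBITS>
</EBAS>"""

# Hypothetical human-readable labels; unknown tags fall back to the tag name.
LABELS = {
    "FORM_PERIOD_LABEL_TEXT": "Period",
    "PAYG_INSTALMENT": "PAYG instalment",
    "TOTAL_DEBITS": "Total debits",
}


def to_html(xml_text):
    root = ET.fromstring(xml_text)
    rows = []
    for item in root:  # one table row per child element, in document order
        label = LABELS.get(item.tag, item.tag)
        value = (item.text or "").strip()
        rows.append(f"<tr><th>{escape(label)}</th><td>{escape(value)}</td></tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


print(to_html(EBAS))
```
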

Research Data Australia

Research Data Australia is a directory of Australian research data collections which makes use of metadata to provide a marketplace for researchers.

2 Electronic Document Management

Electronic Document Management allows business to be conducted with legally recognised e-commerce transactions.

Electronic document management systems are more than just systems for tracking the location of electronic documents. Such systems should manage documents for their complete life cycle based on the value of the document to the agency's business. Just as there are standard procedures for the registration of paper documents and records, suitable procedures should be implemented to manage each electronic document throughout its life from creation to disposal...

From: Improving Electronic Document Management: Guidelines for Australian Government Agencies, Office of Government Information Technology, 1995, Archive copy at URL: http://www.defence.gov.au/imsc/edmsc/iedmtc.htm

The State Records of South Australia has a useful description of the process of: Records Creation to Archive.

In 1995 the Australian Government released "Improving Electronic Document Management: Guidelines for Australian Government Agencies". The guidelines identified seven requirements for e-document management:

  1. Provision of context
  2. Authenticity
  3. Disposal of documents and records
  4. Robustness against organisational change
  5. Robustness against technological change
  6. Management of working documents
  7. Links to paper systems

The guidelines identified three design responses to these requirements:

  1. Metadata in a text readable format (mostly supersets of Dublin Core) to describe the records. The metadata can be held with the record or separately.

  2. Standard document formats to store and transport the documents. Implementations either use the original format the document was created in, a standardised format (such as XML or PDF) or multiple formats.

  3. Security, using digital signatures, to authenticate the records and protect their integrity.
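Design responses 1 and 2 can be illustrated with a minimal sketch using Python's xml.etree.ElementTree: a text-readable Dublin Core description of a record, held separately from the document it describes. The wrapper element name `metadata` and the example values are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialise as dc:title, dc:creator, ...

def dublin_core_record(fields: dict) -> str:
    """Build a text-readable Dublin Core description of one record.

    The result can be stored beside the document (e.g. as a separate file)
    or in a register; the document itself is not modified.
    """
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")

record = dublin_core_record({
    "title": "Minutes of the Records Committee, March 2009",
    "creator": "Records Management Unit",
    "date": "2009-03-12",
    "format": "application/pdf",  # design response 2: a standard document format
    "identifier": "minutes-2009-03.pdf",
})
print(record)
```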

Records Management

Electronic document management is a specialised form of records management. The International Standard on Records Management (ISO 15489), which was based on Australian Standard AS 4390-1996, covers both paper and electronic records:

  • applies to the management of records, in all formats or media, created or received by any public or private organization in the conduct of its activities, or any individual with a duty to create and maintain records,

  • provides guidance on determining the responsibilities of organizations for records and records policies, procedures, systems and processes,

  • provides guidance on records management in support of a quality process framework ...

  • provides guidance on the design and implementation of a records system, but

  • does not include the management of archival records within archival institutions. ...

From: "Introduction to Australian Standard AS ISO 15489", State Records of South Australia, 2005, URL: http://www.archives.sa.gov.au/files/management_ARM_ISO15489.pdf

RKMS

The Recordkeeping Metadata Standard for Commonwealth Agencies (RKMS) defines 20 elements (eight mandatory) and 65 sub-elements for the recordkeeping systems used by Commonwealth government agencies. It is based on the Australian Government Locator Service (AGLS) metadata standard, but adds metadata items for maintaining government records:

... help agencies to identify, authenticate, describe and manage their electronic records in a systematic and consistent way to meet business, accountability and archival requirements. The standard is designed to be used as a reference tool by agency corporate managers, IT personnel and software vendors involved in the design, selection and implementation of electronic recordkeeping and related information management systems. ...

From: "Recordkeeping Metadata Standard for Commonwealth Agencies", Version 1.0, National Archives of Australia, 1999, URL: http://www.naa.gov.au/recordkeeping/control/rkms/summary.htm

RKMS Elements from AGLS

RKMS is not a strict superset of AGLS. Some elements are taken unchanged from AGLS:

Element
SUBJECT
DESCRIPTION
LANGUAGE
COVERAGE
FUNCTION
TYPE

Adapted from: "Recordkeeping Metadata Standard for Commonwealth Agencies", Version 1.0, National Archives of Australia, 1999, URL: http://www.naa.gov.au/recordkeeping/control/rkms/summary.htm

RKMS Elements Extending AGLS

 

Element
TITLE
RELATION (also AGLS SOURCE)
DATE
FORMAT
MANDATE

RKMS Elements Differently Named

Element              AGLS Equivalent
AGENT                CREATOR, PUBLISHER, OTHER CONTRIBUTOR
RIGHTS MANAGEMENT    RIGHTS
AGGREGATION LEVEL    TYPE + Aggregation level
RECORD IDENTIFIER    IDENTIFIER
MANAGEMENT HISTORY   DATE (partial only)

RKMS Elements not in AGLS

Element
USE HISTORY
PRESERVATION HISTORY
LOCATION
DISPOSAL
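Taken together, the four tables form a crosswalk. A minimal sketch, assuming each record is held as a dictionary mapping element names to lists of values, of reducing an RKMS description to AGLS elements: shared and extended elements are copied (any sub-element detail would be flattened), renamed elements are mapped, and RKMS-only elements are reported as dropped. Mapping AGENT only to CREATOR is a simplification; a fuller crosswalk would choose CREATOR, PUBLISHER or CONTRIBUTOR according to the agent's role. The example record values are hypothetical.

```python
# Element groups taken from the four RKMS/AGLS tables.
SAME_AS_AGLS = {"SUBJECT", "DESCRIPTION", "LANGUAGE", "COVERAGE", "FUNCTION", "TYPE"}
EXTENDS_AGLS = {"TITLE", "RELATION", "DATE", "FORMAT", "MANDATE"}
RENAMED = {
    "AGENT": "CREATOR",            # simplification: could also be PUBLISHER or CONTRIBUTOR
    "RIGHTS MANAGEMENT": "RIGHTS",
    "AGGREGATION LEVEL": "TYPE",
    "RECORD IDENTIFIER": "IDENTIFIER",
    "MANAGEMENT HISTORY": "DATE",  # partial equivalent only
}
RKMS_ONLY = {"USE HISTORY", "PRESERVATION HISTORY", "LOCATION", "DISPOSAL"}

def rkms_to_agls(rkms: dict) -> tuple:
    """Return (AGLS elements, RKMS elements that have no AGLS equivalent)."""
    agls, dropped = {}, []
    for element, values in rkms.items():
        if element in SAME_AS_AGLS or element in EXTENDS_AGLS:
            target = element
        elif element in RENAMED:
            target = RENAMED[element]
        elif element in RKMS_ONLY:
            dropped.append(element)
            continue
        else:
            raise ValueError(f"not an RKMS element: {element}")
        # Several RKMS elements can land on one AGLS element (e.g. TYPE).
        agls.setdefault(target, []).extend(values)
    return agls, dropped

agls, dropped = rkms_to_agls({
    "TITLE": ["Annual report 2008"],
    "AGENT": ["Department of Finance"],
    "RECORD IDENTIFIER": ["R2008/123"],
    "DISPOSAL": ["Retain permanently"],
})
print(agls, dropped)
```

The dropped list makes the loss explicit: history, location and disposal information exists only in the recordkeeping system.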

VERS

The Public Record Office of Victoria (PROV) issued a more prescriptive standard for the management of electronic records than other Australian governments: the Victorian Electronic Records Strategy (VERS). VERS uses a superset of the National Archives of Australia (NAA) recordkeeping metadata (RKMS). The standard has five specifications:

  1. System Requirements for Preserving Electronic Records

  2. Metadata Scheme

  3. Standard Electronic Record Format

  4. Long Term Preservation Formats

  5. Export of Electronic Records to PROV

From: "The Victorian Electronic Records Strategy (VERS)", 31 July, , URL: http://www.prov.vic.gov.au/vers/standard/default.htm.

VERS allows multiple encodings of one document and fixes the record at the time of creation using digital signatures. Any new metadata must therefore be kept separate from the document, or wrapped around the original record to form a new compound record. The approach also assumes that a particular digital signature will remain verifiable over a long period and that the digital signature standards used will be supported in the long term. VERS uses text, PDF and TIFF as its standard formats.
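A minimal sketch of the wrapping approach, assuming hypothetical element names (this is not the VERS Encapsulated Object format): the original document, in one or more encodings, is base64-encoded inside a compound XML record alongside new metadata. A SHA-256 digest stands in for the digital signature, because Python's standard library has no public-key signing. A digest shows that the content has changed, but unlike a signature it does not show who fixed the record, since anyone can recompute it.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET

def wrap_record(encodings: dict, metadata: dict) -> str:
    """Wrap one document (in one or more encodings) plus new metadata into a
    compound XML record. Element names are hypothetical, not the VERS format."""
    root = ET.Element("CompoundRecord")
    meta = ET.SubElement(root, "Metadata")
    for name, value in metadata.items():
        ET.SubElement(meta, name).text = value
    for fmt, content in encodings.items():
        # The digest fixes the content at wrapping time; VERS itself uses a
        # digital signature, which also identifies the signer.
        enc = ET.SubElement(root, "Encoding",
                            {"format": fmt, "sha256": hashlib.sha256(content).hexdigest()})
        enc.text = base64.b64encode(content).decode("ascii")
    return ET.tostring(root, encoding="unicode")

def verify_record(xml_text: str) -> bool:
    """True if every embedded encoding still matches the digest stored with it."""
    root = ET.fromstring(xml_text)
    return all(
        hashlib.sha256(base64.b64decode(enc.text or "")).hexdigest() == enc.get("sha256")
        for enc in root.iter("Encoding")
    )

record = wrap_record(
    {"text/plain": b"Approved: new records policy."},
    {"Title": "Records policy approval", "Date": "2009-10-09"},
)
print(verify_record(record))
```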

2.1 The Digital Library

A digital library allows access to electronic documents, while respecting the intellectual property rights of the author.

An overview of e-publishing issues is provided in the Australian Government Information Management Office Web Publishing Guide.

Publishing Mistakes Are Dangerous

Publishing, even academic publishing, is a significant economic activity and can also have significant effects on the lives of the public. This example from the Journal of Neurology contains a potentially dangerous mistake:

Sirs: Recently we found out that our abstract "Severe Tardive Dystonia: Treatment with Continuous Intrathecal Baclofen Administration" (J Neurol 243 Suppl 2: S75) contains a severe and potentially dangerous mistake.

The dose of intrathecal baclofen in the patient presented was 100 µg/day rather than 100 mg/day. The abstract submitted as well as the computer disk (Microsoft Word for Windows Version 2.0b) additionally handed in for electronic publication contained the correct figure spelled with the Greek character "µ".

Investigations into this subject revealed that occasionally special characters may be misinterpreted by different versions of the same wordprocessing programme ...

From: "Risks of electronic publishing", D. Dressler, page 61, Letters to the Editors, Journal of Neurology, Steinkopff Verlag, Volume 244, Number 1/November 28, 1996, URL: http://www.springerlink.com/openurl.asp?genre=article&eissn=1432-1459&volume=244&issue=1&spage=61

Library Metadata

Libraries, such as the ANU Library, now provide web-based search facilities, but libraries have been in the information business for far longer than the web. For example, the ancient Library of Alexandria was destroyed by fire about 2000 years ago; its modern successor, the Bibliotheca Alexandrina, opened in 2002, with a web site:

The new Bibliotheca Alexandrina will be officially opened by Egyptian President Hosni Mubarak at a ceremony attended by other heads of state and top officials.

Based on the old Library of Alexandra, the most famous library of Ancient Times, this modern public study centre will be open to students, researchers and the general public. ...

From: "Inauguration of the Alexandria Library", UNESCO, 2002

On-line Public Access Catalog (OPAC)

Libraries previously used paper-based card catalogs. The metadata elements for the card catalogs were carried over to the electronic systems:

Author Aristotle, 384-322 B.C.
Title Athenaion Politeia / Aristoteles; Edidit Mortimer Chambers.
Publisher Stuttgart : B.G. Teubner, 1994.
Call Number 089.81
Description xx, 84p., [4]p. of Plates : Plates ; 20cm.
Series Stmt Bibliotheca Scriptorum Graecorum et Romanorum Teubneriana ; No. 1113

From: "On-line Public Access Catalog (OPAC)", Bibliotheca Alexandrina, URL: http://www.bibalex.org/English/

Catalogues adapted to paper and e-documents

As with corporate records management systems, library catalogues have been adapted to record both paper and electronic documents. The ANU library catalogue includes links to on-line versions of documents, where available:

Author     Bourk, Michael J
Title      Universal service? : telecommunications policy in Australia and people with disabilities / Michael J Bourk ; edited by Tom Worthington
Published  Belconnen, A.C.T. : TomW Communications, 2000
View electronic text (link to the on-line version)

LOC'N      CALL #               STATUS
CHIFLEY    HV1559.A8B682 2000   AVAILABLE ...

From: "ANU Full Database", ANU, 2003

MAchine-Readable Cataloging (MARC) Format

The same catalogue information can also be displayed in the MARC format, developed in the 1960s for "MAchine-Readable Cataloging" by libraries. This format uses numeric tags to identify each metadata item; within a field, subfields are marked by a letter code (shown here after a "|" character):

050     HV1559.A8B682 2000
100 1   Bourk, Michael J
245 10  Universal service? :|btelecommunications policy in Australia and people with disabilities /|cMichael J Bourk ; edited by Tom Worthington
246 3   Telecommunications policy in Australia and people with disabilities
260     Belconnen, A.C.T. :|bTomW Communications,|c2000
300     xiv, 273 p. ;|c21 cm

From: "ANU Full Database", ANU
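A minimal sketch, in Python, of reading this display back into tags, indicators and subfields. It assumes the conventions visible in the display: a three-digit tag, an optional one- or two-digit indicator token, "|" followed by a one-letter code to start each subfield, and an unmarked first subfield taken to be "a". This is a heuristic for this display only; records exchanged between library systems use the MARC 21 communication format (ISO 2709 structure), which should be read with a proper MARC parser.

```python
import re

# Tag, optional 1-2 digit indicator token, then the field data.
# Heuristic: data beginning with a short number (e.g. "12 p.") would be
# misread as indicators.
LINE = re.compile(r"^(\d{3})(?:\s+(\d{1,2}))?\s+(.*)$")

def parse_marc_display(text: str) -> list:
    """Split each display line into tag, indicators and (code, value) subfields."""
    fields = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # skip blank or unrecognised lines
        tag, indicators, data = m.groups()
        parts = data.split("|")
        # The display omits the code of the first subfield; assume "a".
        subfields = [("a", parts[0].strip())]
        subfields += [(p[0], p[1:].strip()) for p in parts[1:] if p]
        fields.append({"tag": tag, "indicators": indicators or "", "subfields": subfields})
    return fields

SAMPLE = """050     HV1559.A8B682 2000
100 1   Bourk, Michael J
245 10  Universal service? :|btelecommunications policy in Australia and people with disabilities /|cMichael J Bourk ; edited by Tom Worthington
260     Belconnen, A.C.T. :|bTomW Communications,|c2000"""

for f in parse_marc_display(SAMPLE):
    print(f["tag"], f["indicators"], f["subfields"])
```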

 

MARC adapted to XML

As with other metadata formats, MARC is being adapted to XML formats:

<?xml version="1.0" encoding="UTF-8" ?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    ...
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Arithmetic /</subfield>
      <subfield code="c">Carl Sandburg ; illustrated as an anamorphic adventure by Ted Rand.</subfield>
    </datafield>
    ...
  </record>
</collection>

From: URL: http://www.loc.gov/standards/marcxml//Sandburg/sandburg.xml

MARC to Dublin Core

<?xml version="1.0" ?>
<dc xmlns="http://purl.org/dc/elements/1.1/">
  <title>Arithmetic /</title>
  <creator>Sandburg, Carl, 1878-1967.</creator>
  <creator>Rand, Ted, ill.</creator>
  <type />
  <publisher>San Diego :Harcourt Brace Jovanovich,</publisher>
  <date>c1993.</date>
  <language>eng</language>
  ...
</dc>

From: URL: http://www.loc.gov/standards/marcxml//Sandburg/sandburgdc.xml
see: MARC 21 XML Schema, The Library of Congress, 2003, URL: http://www.loc.gov/standards/marcxml//

While MARCXML keeps the library-specific MARC structure, metadata converted to Dublin Core is more useful in non-library systems.
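The MARC to Dublin Core conversion can be sketched with Python's xml.etree.ElementTree. This is a small, assumed subset of a MARC-to-Dublin Core crosswalk (245 to title, 100/110/700/710 to creator, 260 to publisher and date), not the Library of Congress stylesheet itself. The sample record is cut down to the fields in the Sandburg example, with the split into subfields assumed. Subfields are joined with a space, so the publisher comes out as "San Diego : Harcourt Brace Jovanovich," rather than exactly as in the Library of Congress output.

```python
import xml.etree.ElementTree as ET

MARC = "{http://www.loc.gov/MARC21/slim}"
DC_NS = "http://purl.org/dc/elements/1.1/"

def join_subfields(field, codes=None):
    """Join the text of a datafield's subfields (all of them if codes is None)."""
    return " ".join(
        sf.text for sf in field.findall(f"{MARC}subfield")
        if sf.text and (codes is None or sf.get("code") in codes)
    )

def marcxml_to_dc(xml_text: str) -> list:
    """Map a MARCXML record to (element, value) pairs (assumed subset crosswalk)."""
    root = ET.fromstring(xml_text)
    record = root if root.tag == f"{MARC}record" else root.find(f"{MARC}record")
    pairs = []
    for field in record.findall(f"{MARC}datafield"):
        tag = field.get("tag")
        if tag == "245":    # title; statement of responsibility ($c) omitted
            pairs.append(("title", join_subfields(field, {"a", "b"})))
        elif tag in ("100", "110", "700", "710"):  # personal and corporate names
            pairs.append(("creator", join_subfields(field)))
        elif tag == "260":  # imprint: place and publisher, then date
            pairs.append(("publisher", join_subfields(field, {"a", "b"})))
            pairs.append(("date", join_subfields(field, {"c"})))
    return pairs

def to_dc_xml(pairs) -> str:
    """Serialise the pairs as a <dc> element in the Dublin Core namespace."""
    root = ET.Element(f"{{{DC_NS}}}dc")
    for name, value in pairs:
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode", default_namespace=DC_NS)

SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim"><record>
<datafield tag="100" ind1="1" ind2=" "><subfield code="a">Sandburg, Carl,</subfield><subfield code="d">1878-1967.</subfield></datafield>
<datafield tag="245" ind1="1" ind2="0"><subfield code="a">Arithmetic /</subfield><subfield code="c">Carl Sandburg ; illustrated as an anamorphic adventure by Ted Rand.</subfield></datafield>
<datafield tag="260" ind1=" " ind2=" "><subfield code="a">San Diego :</subfield><subfield code="b">Harcourt Brace Jovanovich,</subfield><subfield code="c">c1993.</subfield></datafield>
<datafield tag="700" ind1="1" ind2=" "><subfield code="a">Rand, Ted,</subfield><subfield code="e">ill.</subfield></datafield>
</record></collection>"""

print(to_dc_xml(marcxml_to_dc(SAMPLE)))
```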

2.2 Electronic Publishing

In 2001 an advocacy group of 11 people from US and UK academic institutions proposed an online public library containing the text of all published scientific articles. Since then, more research papers have been made available online. The group's open letter stated:

We believe that the permanent, archival record of scientific research and ideas should neither be owned nor controlled by publishers, but should belong to the public, and should be made freely available.

We support the establishment of international online public libraries of science that contain the complete text of all published scientific articles in searchable and interlinked formats. ...

From: "Open Letter", Public Library of Science, Patrick O. Brown and Michael Eisen, 2001, URL: http://www.plos.org/about/letter.html.

Open Archives Initiative

The Open Archives Initiative (OAI) has constructed a virtual library of material using distributed document archives and shared metadata, complying with industry standards.

Digital Library Federation Encourages Use of Open Archives Initiative The Digital Library Federation (DLF) is supporting the development of a small number of Internet gateways through which users will access distributed digital library holdings as if they were part of a single uniform collection. The gateways will be built using the OAI Metadata Harvesting Protocol. DLF gateways will contribute to a practical evaluation of the OAI's harvesting technique and its application within libraries to encourage digital collection managers to expose metadata and build services. ...

From: Open Archives Initiative, URL: http://www.openarchives.org/, 2001

ACS and IFIP Digital Libraries

In November 2006 the Australian Computer Society released the ACS Digital Library, followed by the IFIP Digital Library in August 2008. Both libraries provide papers under a Creative Commons open access licence, using the Open Journal Systems (OJS) open source software, which implements OAI metadata standards.

Capturing Australia's Scholarly Publishing

At a roundtable in 2004 a thought experiment was outlined to transform the distribution of scholarly information in Australia. It proposed making research results available online to government funding bodies, the universities where the research was conducted, industry and the public. For open access publications this process can now be automated, using digital libraries built on metadata standards and XML.

Thought experiment

  • A researcher, and ARC grant recipient, at an Australian university completes an article
  • Following peer review the article is accepted by an international proprietary journal
  • A post print copy of the article is also lodged with the university's open access digital repository ...
From: "Governmental Policy Frameworks", Dr Evan Arthur, Department of Education, Science and Training, 2004, URL: http://www.humanities.org.au/NSCF/PowerPoints/NSCF%20(Arthur).ppt

Automated Distribution

It was proposed that a researcher lodging an article in a university repository would automatically update institutional, government and public lists of research publications.

  • ... These actions lead to automatic updating of
    • the researcher's open access publication list
    • the university's open access record of staff research activity
    • the ARC's open access record of research activity related to its grants
    • a gateway site providing sophisticated, industry tailored access to research activities in Australian research institutions
    • the publicly accessible data warehouse which provides input into quality assessments of Australian research institutions ...
    From: "Governmental Policy Frameworks", Dr Evan Arthur, Department of Education, Science and Training, 2004, URL: http://www.humanities.org.au/NSCF/PowerPoints/NSCF%20(Arthur).ppt
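The automatic updating proposed here is essentially the publish/subscribe (observer) pattern: one deposit notifies every interested list. A minimal sketch, in which the class names, list names and grant identifier are all hypothetical; in practice services such as ARROW instead harvest metadata from repositories.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Article:
    title: str
    author: str
    university: str
    grant: Optional[str] = None  # hypothetical funding-body grant identifier

@dataclass
class Repository:
    """A university repository that notifies every subscriber on each deposit."""
    subscribers: list = field(default_factory=list)

    def subscribe(self, callback: Callable[[Article], None]) -> None:
        self.subscribers.append(callback)

    def lodge(self, article: Article) -> None:
        # One deposit by the researcher fans out to every registered list.
        for notify in self.subscribers:
            notify(article)

# Hypothetical lists kept for the researcher, the university and the funding body.
researcher_list: dict = {}
university_list: dict = {}
grant_list: dict = {}

def add_to(index: dict, key: str, title: str) -> None:
    index.setdefault(key, []).append(title)

def grant_update(a: Article) -> None:
    if a.grant:  # only grant-funded work is reported to the funding body
        add_to(grant_list, a.grant, a.title)

repo = Repository()
repo.subscribe(lambda a: add_to(researcher_list, a.author, a.title))
repo.subscribe(lambda a: add_to(university_list, a.university, a.title))
repo.subscribe(grant_update)
repo.lodge(Article("Metadata for e-records", "Citizen, Jane", "ANU", grant="GRANT-001"))
print(researcher_list, university_list, grant_list)
```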

Automated Capture by the ARROW Discovery Service

The availability of standard interfaces and metadata has simplified the approach proposed in the thought experiment. The ARROW Discovery Service automatically collects metadata from Australian university research repositories and the ACS Digital Library. Searches can be made of papers by author and institution.

Ingesting Documents

The term "ingest" describes the process of incorporating an electronic document and its metadata into an electronic archive. Usually only the metadata is converted; the content of the paper remains in its original format (usually PDF).

Journal and conference indexes are traditionally provided in formats such as Refer and BibTeX. These formats can be converted using XSLT to transform the metadata into the XML format used by OAI and OJS.
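Because BibTeX is not itself XML, this sketch uses Python's re and xml.etree.ElementTree rather than XSLT to turn a BibTeX entry into an oai_dc record of the kind OAI harvesters exchange. The BibTeX-to-Dublin Core mapping is an assumption for illustration, and the simple pattern does not handle nested braces or @string macros. The example entry uses the Bourk catalogue record shown earlier.

```python
import re
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)

# name = {value} pairs; nested braces and @string macros are not handled.
FIELD = re.compile(r"(\w+)\s*=\s*\{([^{}]*)\}")

# Assumed BibTeX-to-Dublin Core mapping, for illustration only.
MAPPING = [("title", "title"), ("publisher", "publisher"), ("year", "date"),
           ("journal", "source"), ("booktitle", "source")]

def bibtex_to_oai_dc(entry: str) -> str:
    """Convert one simple BibTeX entry into an oai_dc XML record."""
    # Collapse line breaks and runs of spaces inside each value.
    fields = {k.lower(): " ".join(v.split()) for k, v in FIELD.findall(entry)}
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for author in fields.get("author", "").split(" and "):
        if author.strip():
            ET.SubElement(root, f"{{{DC}}}creator").text = author.strip()
    for bib_name, dc_name in MAPPING:
        if bib_name in fields:
            ET.SubElement(root, f"{{{DC}}}{dc_name}").text = fields[bib_name]
    return ET.tostring(root, encoding="unicode")

ENTRY = """@book{bourk2000,
  author    = {Bourk, Michael J},
  title     = {Universal service? : telecommunications policy in Australia
               and people with disabilities},
  publisher = {TomW Communications},
  year      = {2000}
}"""
print(bibtex_to_oai_dc(ENTRY))
```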

Syndication Formats

As well as being converted into formats for archives, publishing metadata can be transformed into Web content syndication formats, such as RSS (Really Simple Syndication) and Atom (IETF RFC 4287).

The ANU E Press provides a custom feed, in RSS or Atom, of Weblog entries, New Products and Product Reviews. ACM "Queue" magazine has a "feed" button on its home page:

<?xml version="1.0" ?>
<rss version="2.0">
<channel>
<title>ACM Queue</title>
<link>http://www.acmqueue.com/</link>
<description>Tomorrow's Computing Today</description>
<language>en-us</language>
<item>
<title>Samba Does Windows-to-Linux Dance</title>
<link>http://acmqueue.com/?...pid=171</link>
<description>Mounting remote Linux ...</description>
</item>

From: "RSS feed", Queue magazine, ACM, 2004, URL: http://acmqueue.com/rss.rdf
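Generating such a feed from publishing metadata is a small transformation. A minimal sketch using Python's xml.etree.ElementTree to produce an RSS 2.0 document in the same shape as the Queue feed; the example.org channel and item link are hypothetical, and the item describes the Bourk book from the catalogue examples.

```python
import xml.etree.ElementTree as ET

def make_rss(channel: dict, items: list) -> str:
    """Build an RSS 2.0 document from channel metadata and a list of items."""
    rss = ET.Element("rss", {"version": "2.0"})
    ch = ET.SubElement(rss, "channel")
    # title, link and description are the required RSS 2.0 channel elements.
    for tag in ("title", "link", "description", "language"):
        if tag in channel:
            ET.SubElement(ch, tag).text = channel[tag]
    for item in items:
        it = ET.SubElement(ch, "item")
        for tag in ("title", "link", "description"):
            if tag in item:
                ET.SubElement(it, tag).text = item[tag]
    return '<?xml version="1.0" ?>\n' + ET.tostring(rss, encoding="unicode")

feed = make_rss(
    {"title": "Example Press: new titles", "link": "http://www.example.org/",
     "description": "Recently published books", "language": "en-au"},
    [{"title": "Universal service? : telecommunications policy in Australia and people with disabilities",
      "link": "http://www.example.org/books/bourk2000",
      "description": "Michael J Bourk, TomW Communications, 2000"}],
)
print(feed)
```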

OAI Static Repository

The OAI Static Repository and Gateway Protocol, while more complex than RSS, is conceptually similar. The details of a list of published documents can be provided in a static file which can be harvested by a remote system. This file can simply be placed on the publisher's web site, alongside the RSS file.

<ListRecords metadataPrefix="oai_dc">
<oai:record>
<oai:header>
<oai:identifier>oai:arXiv:cs/0112017...
<oai:datestamp>2001-12-14</oai:datestamp>
</oai:header>
<oai:metadata>
<oai_dc:dc ...
<dc:title>Using Structural Metadata ...
<dc:creator>Dushay, Naomi</dc:creator>
<dc:subject>Digital Libraries</dc:subject>
<dc:description>With the increasing ...
</oai_dc:dc>
</oai:metadata>
</oai:record>

From: "Specification for an OAI Static Repository and an OAI Static Repository Gateway Protocol", Version 2.0 of 2002-06-14, URL: http://www.openarchives.org/OAI/2.0/guidelines-static-repository.htm
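A harvester reading such a file, or an OAI-PMH ListRecords response, only has to walk the records and pull out the header and Dublin Core fields. A minimal sketch using Python's xml.etree.ElementTree; the sample record is hypothetical, and the namespace URIs are those of OAI-PMH 2.0 and oai_dc. Deleted records, which carry a header but no metadata, yield empty title and creator lists.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def harvest(xml_text: str) -> list:
    """Extract identifier, datestamp, titles and creators from each oai:record."""
    records = []
    for rec in ET.fromstring(xml_text).iter(f"{{{NS['oai']}}}record"):
        header = rec.find("oai:header", NS)
        dc = rec.find("oai:metadata/oai_dc:dc", NS)  # None for deleted records
        records.append({
            "identifier": header.findtext("oai:identifier", namespaces=NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=NS),
            "titles": [] if dc is None else [t.text for t in dc.findall("dc:title", NS)],
            "creators": [] if dc is None else [c.text for c in dc.findall("dc:creator", NS)],
        })
    return records

# Hypothetical one-record static repository fragment.
SAMPLE = """<ListRecords metadataPrefix="oai_dc"
    xmlns:oai="http://www.openarchives.org/OAI/2.0/"
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <oai:record>
    <oai:header>
      <oai:identifier>oai:example.org:article/1</oai:identifier>
      <oai:datestamp>2009-10-09</oai:datestamp>
    </oai:header>
    <oai:metadata>
      <oai_dc:dc>
        <dc:title>Example article</dc:title>
        <dc:creator>Citizen, Jane</dc:creator>
      </oai_dc:dc>
    </oai:metadata>
  </oai:record>
</ListRecords>"""

print(harvest(SAMPLE))
```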

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is conceptually similar to the web services interface provided by Amazon.com to their list of publications.

Repository Explorer

Tools such as the Open Archives Initiative Repository Explorer allow demonstration access to a digital library's OAI interface. The formats the metadata is available in can be queried and then records of electronic documents requested in that format.

Repository Name ACS Digital Library
Base URL http://dl.acs.org.au/index.php/index/oai
Protocol Version 2.0
Admin Email dl@tomw.net.au
Earliest Datestamp 2006-12-05T00:40:05Z
Deleted Record Handling no
Granularity YYYY-MM-DDThh:mm:ssZ
Compression gzip
Compression deflate
Other Information
description: 
oai-identifier:
scheme: oai
repositoryIdentifier: acs.ojs.journals.sfu.ca
delimiter: :
sampleIdentifier: oai:acs.ojs.journals.sfu.ca:article/1

Archive Self-Description, for http://dl.acs.org.au/index.php/index/oai, Repository Explorer, 2008-08-15T02:32:07Z
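OAI-PMH requests are plain HTTP GET requests carrying a `verb` and its arguments. A minimal sketch of building the requests a harvester would send to the base URL reported above (Identify, ListMetadataFormats, then ListRecords with a chosen metadataPrefix), and of following the resumptionToken a repository returns when it splits a long list into pages. No network access is made; fetching the URLs is left out.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "http://dl.acs.org.au/index.php/index/oai"  # from the Identify response
OAI = "{http://www.openarchives.org/OAI/2.0/}"

def request_url(verb: str, **args: str) -> str:
    """Build an OAI-PMH GET request URL for the given verb and arguments."""
    return BASE_URL + "?" + urlencode({"verb": verb, **args})

def next_request(response_xml: str) -> Optional[str]:
    """Return the URL for the next page of a ListRecords response, or None."""
    token = ET.fromstring(response_xml).find(f"{OAI}ListRecords/{OAI}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None  # missing or empty token: the list is complete
    # resumptionToken is an exclusive argument: only the verb accompanies it.
    return request_url("ListRecords", resumptionToken=token.text.strip())

# Typical sequence: describe the archive, list its formats, then fetch records.
for url in (request_url("Identify"),
            request_url("ListMetadataFormats"),
            request_url("ListRecords", metadataPrefix="oai_dc")):
    print(url)
```

For incremental harvesting, a `from` date can be added; because `from` is a Python keyword it has to be passed as `request_url("ListRecords", metadataPrefix="oai_dc", **{"from": "2008-01-01"})`.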

 

2.3 Electronic Document Management Issues

IFIP Digital Library

The digital library for the International Federation for Information Processing (IFIP) has abstracts of conference papers and the full text of some papers. Metadata standards are used and materials are provided using XML based interfaces. However, conferences are more than just the papers presented. How can the discussions be facilitated and represented in digital format?

Public Sphere

Senator Lundy set up Public Sphere, a series of public policy development events using the web, with wikis, blogs, instant messages, digital video and Google Docs.

Issues for metadata and e-documents:

  1. Complexity of tools and information: would metadata and XML help?
  2. How can the materials, including video, be archived for long-term use?