Metadata and Electronic Document Management

Searching for a Common Understanding

Tom Worthington FACS HLM

Visiting Fellow, Department of Computer Science, Australian National University, Canberra

For: COMP3410: Information Technology in Electronic Commerce, at The Australian National University
This document is Version 2.3 21 August 2003: http://www.tomw.net.au/2003/dm/index.html

Contents

  1. Metadata

    1. Metadata for Electronic Publishing

    2. Metadata for Electronic Commerce

    3. Metadata Case Studies

    4. Tutorial

  2. Data Management

    1. Electronic Document Management and Records Management

    2. The Digital Library

    3. Case Studies

    4. Tutorial

Introduction

This material was prepared for "Information Technology in Electronic Commerce" (COMP3410), at the Australian National University, semester 2, 2003. It introduces two topics: metadata and data management (the digital library and electronic document management). These are covered in three lectures on metadata, three lectures on data management, a tutorial on each topic, assignments and examination questions. There have been minor revisions to the material from 2002, with some of the more technical aspects of XML (such as XML Transformation) now covered in earlier lectures.

Use of the technology for practical e-commerce and e-publishing applications is emphasised through case studies and anecdotes drawing on the author's personal experience. This is intended to complement earlier components on XML Basics, XML Validation, XML Formatting and Transformation (CSS), and XML Formatting and Transformation (XSL), and leads into later components on Data Mining, Security and Electronic Trading.

This document is intended both for live group presentation and as accompanying lecture notes for individual use. The material may also be of use to those interested in the issues but not undertaking formal study. However, it is not intended as an on-line course. Those wishing to use the material as part of a formal or for-profit course are invited to contact the author.

Getting there from here

This section is adapted from “Documents and databases: Making sense of developments in eBusiness, eCommerce, ePublishing and eLaw”, Tom Worthington, Information Industry Outlook Conference, Canberra, 2002, URL: http://www.tomw.net.au/2002/ebcwxml.html

In its report “Accelerating the Uptake of E-Commerce by SMEs: A Report and Action Plan” (July 2002), the SME E-Commerce Taskforce published an action plan for accelerating the uptake of e-commerce. The suggested steps for Australian small and medium-sized enterprises (SMEs) were:

  1. Get on-line and get email
  2. Get Internet banking
  3. Get a website, initially to advertise the business phone number and email address
  4. Get an interactive dynamic e-commerce system integrated with traditional business systems
  5. Get voice and data systems integrated.

The first three steps are good simple advice, but step 4 is an absurdly large leap. This is the equivalent of telling a new aerospace company: "first build a balsa wood glider, then a space shuttle". There need to be more steps for an easier transition by SMEs between a simple web site and an interactive dynamic e-commerce system. Also the last step of voice and data integration doesn't appear to relate to e-commerce and seems to have been included because the list came from a telecommunications vendor.

Suggested steps for small business e-commerce:

  1. Internet access
  2. Internet banking
  3. Check statements
  4. Website
  5. Start replacing paper documents with electronic ones
  6. Progressively automate the processing of the electronic documents

Using the Internet for business is much harder than it looks, and this is made worse by "experts" recommending complicated implementations. Small businesses can be shown how to save money by using the Internet for simple things, such as replacing paperwork with electronic documents. They will then be ready to do something more complex, with integrated e-commerce. New XML technologies can make that transition possible.
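
As a minimal sketch of what replacing a paper document with an electronic one can mean in practice, the Python below treats a hypothetical invoice as XML and totals it automatically. The element and attribute names (`invoice`, `line`, `unit-price`) and the supplier are invented for illustration and follow no published schema; a real system would use whatever format its trading partners agree on.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# A hypothetical electronic invoice replacing a paper one.
# Element and attribute names are illustrative only, not a real schema.
INVOICE_XML = """<invoice number="1042" date="2003-08-21">
  <supplier>Example Pty Ltd</supplier>
  <line description="Consulting" quantity="3" unit-price="120.00"/>
  <line description="Travel" quantity="1" unit-price="85.50"/>
</invoice>"""


def invoice_total(xml_text: str) -> Decimal:
    """Sum quantity x unit price over every <line> element."""
    root = ET.fromstring(xml_text)
    total = Decimal("0")
    for line in root.iter("line"):
        # Decimal avoids binary floating-point rounding on money amounts
        quantity = Decimal(line.get("quantity"))
        unit_price = Decimal(line.get("unit-price"))
        total += quantity * unit_price
    return total


def summarise(xml_text: str) -> str:
    """Produce a one-line summary, e.g. for a bookkeeping entry."""
    root = ET.fromstring(xml_text)
    supplier = root.findtext("supplier")
    return f"Invoice {root.get('number')} from {supplier}: ${invoice_total(xml_text)}"


if __name__ == "__main__":
    print(summarise(INVOICE_XML))
```

Once invoices arrive in a structured form like this, the same few lines can feed a bookkeeping system without retyping, which is the kind of progressive automation suggested in the steps above.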

The Australian Taxation Office has an ambitious project to introduce electronic processing of taxation forms using XML technology. This could be made more tangible to small businesses by making the forms directly printable.

Documents or databases?

document, n. ...
Something written, inscribed, etc., which furnishes evidence or information upon any subject, as a manuscript, title-deed, tomb-stone, coin, picture, etc.
Database ...
A structured collection of data held in computer storage; esp. one that incorporates software to make it accessible in a variety of ways; transf., any large collection of information.
From: OED Online, SECOND EDITION, 1989

Documents and databases represent two extremes in the aims and methods of electronic commerce. At one extreme we have electronic documents, which are fixed in content and format, are individual distinct entities, can be displayed using software from different suppliers, and are expected to last for years and outlive the software which created them. At the other extreme, a database has content which changes, can be displayed in different ways, may only be of value for minutes or months, and may depend on one version of database software. This is not to say that all documents are fixed and all databases fluid, but it is a useful generalisation.
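
The generalisation can be made concrete with a small Python sketch, assuming nothing beyond the standard library: a frozen dataclass stands in for a fixed document, and an in-memory SQLite table stands in for a database whose content is routinely updated. The `Document` class, the `price` table and the widget prices are hypothetical.

```python
import sqlite3
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Document:
    """A fixed electronic document: its content cannot change once created."""
    title: str
    body: str


def try_revise(doc: Document, new_body: str) -> bool:
    """Attempt to edit a document in place; returns False because it is fixed."""
    try:
        doc.body = new_body  # type: ignore[misc]
    except FrozenInstanceError:
        return False
    return True


def make_price_database() -> sqlite3.Connection:
    """An in-memory database whose content is expected to change."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE price (item TEXT PRIMARY KEY, cents INTEGER)")
    db.execute("INSERT INTO price VALUES ('widget', 500)")
    return db


def current_price(db: sqlite3.Connection, item: str) -> int:
    """Return whatever the database holds for an item right now."""
    return db.execute("SELECT cents FROM price WHERE item = ?", (item,)).fetchone()[0]


if __name__ == "__main__":
    quote = Document("Quote 17", "Widgets: $5.00 each")
    print("revised?", try_revise(quote, "Widgets: $6.00 each"), "|", quote.body)

    db = make_price_database()
    db.execute("UPDATE price SET cents = 600 WHERE item = 'widget'")  # routine change
    print("database now says:", current_price(db, "widget"))
```

The quote keeps the price it was issued with, while the database simply reports the latest value; neither behaviour is wrong, they serve different purposes.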

Among document formats, HTML sits at one extreme: it provides a way to create simple electronic documents which can display on a variety of systems, including small wireless devices, TV displays and special devices for the disabled. But HTML doesn't provide fine control over the format of the document, especially when printed.

At the other extreme, PDF provides close control over the look of a document (layout, fonts and so on), but less flexibility. While recent improvements in PDF do allow more options for flowing text to make it more readable, and for structuring the document in an XML-like format, this requires extra work from the author and so far few people have bothered. In practice two versions of a document have to be produced: the web version for on-screen display and the PDF one for printing. Even where these two versions are automatically generated from one common source, they involve extra effort for the people creating and reading them.
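
A sketch of the single-source approach, assuming a hypothetical XML source with `title` and `para` elements: one function produces HTML that leaves fonts and line breaks to the browser, the other produces a fixed-width text layout for printing. Python's standard library has no XSLT processor, so plain ElementTree code stands in for the transformation step here; the 30-character print width is an arbitrary choice.

```python
import textwrap
import xml.etree.ElementTree as ET
from html import escape

# One hypothetical source document; element names are illustrative only.
SOURCE = """<doc>
  <title>Documents or databases?</title>
  <para>Documents and databases represent two extremes.</para>
  <para>HTML reflows; PDF fixes the layout.</para>
</doc>"""


def to_html(source: str) -> str:
    """Screen version: structure only, leaving fonts and line breaks to the browser."""
    root = ET.fromstring(source)
    parts = [f"<h1>{escape(root.findtext('title'))}</h1>"]
    parts += [f"<p>{escape(p.text)}</p>" for p in root.iter("para")]
    return "\n".join(parts)


def to_print(source: str, width: int = 30) -> str:
    """Print version: line width is fixed by the author, not the reader."""
    root = ET.fromstring(source)
    title = root.findtext("title").upper()
    lines = [title, "=" * len(title)]
    for p in root.iter("para"):
        # Wrap each paragraph to the fixed width, then leave a blank line
        lines += textwrap.wrap(p.text, width) + [""]
    return "\n".join(lines).rstrip()


if __name__ == "__main__":
    print(to_html(SOURCE))
    print()
    print(to_print(SOURCE))
```

Both outputs come from the same source, but someone still has to write and maintain two renderers, which is the extra effort described above.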

XML now provides formatting options that allow HTML-like flexibility plus the fine formatting control of PDF. OpenOffice.org's XML-based file format is not perfect, but it does provide an efficient way to package all the elements of an XML document (including images) into one compressed file. This offers the prospect of formats which can be edited in a word processor, displayed as a web page, transformed for a hand-held device or printed with specific styles.
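
To illustrate the packaging idea, the Python below bundles separate XML parts and an image into one compressed ZIP archive, the container used by OpenOffice.org 1.x files. The part names (`mimetype`, `content.xml`, `styles.xml`, `Pictures/`) and the Writer MIME type are assumed to follow that layout, but the contents are placeholders: a real file also carries `meta.xml` and `META-INF/manifest.xml`, so this toy archive will not open in OpenOffice.org.

```python
import io
import zipfile


def build_package(content_xml: str, styles_xml: str, images: dict[str, bytes]) -> bytes:
    """Bundle XML parts and images into one compressed ZIP archive (a toy package)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Assumed convention: a "mimetype" entry first, stored uncompressed
        # (a bare ZipInfo defaults to ZIP_STORED)
        zf.writestr(zipfile.ZipInfo("mimetype"), "application/vnd.sun.xml.writer")
        zf.writestr("content.xml", content_xml)  # text and structure
        zf.writestr("styles.xml", styles_xml)    # layout, kept separate from content
        for name, data in images.items():
            zf.writestr(f"Pictures/{name}", data)
    return buffer.getvalue()


def list_parts(package: bytes) -> list[str]:
    """Names of the parts inside the package, in archive order."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return zf.namelist()


def read_part(package: bytes, name: str) -> str:
    """Read one XML part back out as text."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return zf.read(name).decode("utf-8")


if __name__ == "__main__":
    pkg = build_package("<content/>", "<styles/>", {"logo.png": b"\x89PNG"})
    print(list_parts(pkg))
    print(read_part(pkg, "content.xml"))
```

Because content and styles are separate parts, a tool can rewrite `styles.xml` for a different device without touching the text in `content.xml`.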

OASIS has announced a committee to work on an Open Office XML Format based on OpenOffice.org:

The resulting file format must meet the following requirements:
  1. it must be suitable for office documents containing text, spreadsheets, charts, and graphical documents,
  2. it must be compatible with the W3C Extensible Markup Language (XML) v1.0 and W3C Namespaces in XML v1.0 specifications,
  3. it must retain high-level information suitable for editing the document,
  4. it must be friendly to transformations using XSLT or similar XML-based languages or tools,
  5. it should keep the document's content and layout information separate such that they can be processed independently of each other, and
  6. it should 'borrow' from similar, existing standards wherever possible and permitted.
Since the OpenOffice.org XML format specification meets these criteria and has proven its value in real life, this TC will use it as the basis for its work.
From: OASIS TC Call For Participation: Open Office XML TC, Karl Best, Mon, 04 Nov 2002 08:32:27 -0500, URL: http://lists.oasis-open.org/archives/tc-announce/200211/msg00001.html

The law recognises electronic documents and databases

Recently the High Court of Australia considered the curly question of whether the Migration Act's definition of “document” included electronic documents stored in a database, and how you “give” someone a document that is stored in a database:

  1. ... The ordinary dictionary meaning of "document" is a printed or written paper containing information. That definition of "document" is not apt to cover the sequence of electronic impulses in the electronic circuits of a computer disc that store information. ... No violence is done to the object or language of s 418(3) by holding that "document" includes information that is stored in a computer or a fax machine and which can be printed out by pressing one or more keys or buttons. No reason appears for thinking that Parliament intended to distinguish between information stored on paper and information stored in the electronic impulses of a computer that can be printed on paper by pressing a key or keys on the computer's keyboard. Statutes are always speaking to the present. If we can, we should give the words of a statute - which after all are only the means of conveying ideas and information to the public - a meaning that covers contemporary processes and accords with the object of the enactment[25]. ...
  2. "Documents" may include electronic documents: What, then, does the word "document" mean in such a context? Today, in ordinary speech, one can readily refer to a "document" in a database, although such a document may never have been reduced to tangible form. Typically, a database will yield information that appears in paginated format....
  3. ... Electronic "documents" could perhaps be "given" by separate identification and annexure to an electronic transmission. Yet even that was not done in the present case. Merely making such "documents" (or some of them) "available" in a mass of undifferentiated material in a database of constantly changing content does not comply with the language and particular design of the Act ...
Muin v Refugee Review Tribunal; Lie v Refugee Review Tribunal [2002] HCA 30 (8 August 2002), Last Updated: 11 September 2002, HIGH COURT OF AUSTRALIA

The High Court did not say whether it wanted documents printed in a particular font or with page numbers, but the decision itself is published as a web page with no font style or size specified, and with paragraph numbers rather than page numbers. As the High Court web site links to these documents, it seems reasonable to assume the court is happy with this format.

Further Information

  1. Computing 3410

  2. Author's home page

Comments and corrections to: webmaster@tomw.net.au

Copyright © Tom Worthington 2001-2003.