Electronic documents have existed for many years and predate the web. However, web technology, particularly the use of XML standards, facilitates document production and management. Some pre-web examples of formats will be given, but XML implementations will be emphasized.
With the expansion of web-based systems and their acceptance by the business and general community, it is likely that they will be the first choice for electronic document management (EDM) systems. The use of 'social networking' may accelerate that use and see EDM systems become more than just computerized versions of paper files.
Several examples are introduced, from business, government and academia. While web technology is normally thought of as applying to desktop computers with wired broadband, it is likely to expand to mobile wireless applications.
Lecture Outline
This document is intended to serve both as slides for live group presentation and as accompanying lecture notes for individual use. The slides and these notes are provided in a single HTML document, using HTML Slidy.
There have been revisions to the material from 2001/2002, 2003, 2004, 2005, 2006 and 2007.
Before looking in detail at metadata and electronic document technology, it is useful to consider what the technologist is attempting to accomplish with these. The Internet was designed to provide a global computer data communications network. The World Wide Web provides hypertext-linked electronic documents, most commonly over the Internet. The aim with metadata, document and e-commerce standards is to add an additional layer over the web to provide a global service for civil society, government and business.
Services can be available from handheld wireless devices, as well as desktop computers. These services should use formats which are globally standardized, usable for decades and have legal standing. The systems of non-government, government and commercial organisations should be able to interoperate securely to provide services to the public. Such a global service is now achievable.
The Internet was intended for people to communicate with computers using computer terminals, and for computers to communicate with each other using specially designed machine-to-machine protocols. The web was designed for people to create electronic documents for people to read. Later web technologies, such as XML, were introduced for machine-to-machine communications (for example, in e-commerce).
Electronic formats can be optimized for either human or machine communication. However, there are benefits from compromising with a format which can be used for both. Web technologies provide this ability. Documents can have sufficient structure to be processed by an automated system, but also rendered to a format readable by a person.
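As a minimal sketch of this dual use, the example below processes one small XML document in two ways: once to compute a total (the machine view) and once to render plain text (the human view). The invoice structure, element names and values are hypothetical and chosen only for illustration; they do not come from any particular e-commerce standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical invoice document; element and attribute names are illustrative only.
INVOICE_XML = """<invoice number="42">
  <customer>Example Pty Ltd</customer>
  <item price="10.00" quantity="3">Widget</item>
  <item price="2.50" quantity="4">Gadget</item>
</invoice>"""


def total(doc: ET.Element) -> float:
    """Machine view: compute the invoice total from the structured data."""
    return sum(
        float(item.get("price")) * int(item.get("quantity"))
        for item in doc.findall("item")
    )


def render(doc: ET.Element) -> str:
    """Human view: render the same document as plain text."""
    lines = [f"Invoice {doc.get('number')} for {doc.findtext('customer')}"]
    for item in doc.findall("item"):
        lines.append(f"  {item.get('quantity')} x {item.text} @ {item.get('price')}")
    lines.append(f"Total: {total(doc):.2f}")
    return "\n".join(lines)


invoice = ET.fromstring(INVOICE_XML)
print(render(invoice))
```

The same source text serves both purposes, so there is no separate "data" copy and "display" copy to keep in step.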
Building systems which can be read by both people and machines is challenging. Such formats need to be efficient for storage and transmission, while being able to be converted into a format for human reading (rendered). The format needs to be agreed by all those who use it (ideally worldwide) and fixed for long enough to be useful (ideally for decades), but still adaptable to new uses.
Metadata provides a tool to make electronic documents more efficient and flexible. In many cases a short summary of a document (the metadata) can be used in place of the full document, saving on transmission and processing as well as saving time for the human reader. The metadata can be used to manipulate the information in documents to create new documents. The same encoding used for describing documents can be used by data processing systems to carry out electronic commerce.
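The idea can be illustrated with a small sketch in which documents are selected, and a new index document is built, using only their metadata records. The record fields, identifiers and sample values here are assumptions made for the example, not a prescribed metadata scheme.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical metadata record describing one document."""
    identifier: str
    title: str
    creator: str
    date: str      # ISO 8601 date, so string order matches date order
    subject: str


# Made-up sample records; the full documents are never opened.
RECORDS = [
    Record("doc-1", "Annual Report", "Finance Branch", "2006-07-01", "finance"),
    Record("doc-2", "Web Style Guide", "Web Team", "2007-02-15", "publishing"),
    Record("doc-3", "Budget Summary", "Finance Branch", "2007-05-08", "finance"),
]


def find_by_subject(records, subject):
    """Select documents using only their metadata, not their full text."""
    return [r for r in records if r.subject == subject]


def build_index(records):
    """Create a new document (a dated index) from the metadata alone."""
    ordered = sorted(records, key=lambda r: r.date)
    return "\n".join(f"{r.date}  {r.title} ({r.creator})" for r in ordered)


print(build_index(find_by_subject(RECORDS, "finance")))
```

Only the short records are transmitted and processed; the full text is fetched later, and only for the documents a reader actually wants.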
The Internet has allowed lower cost access to information, placing pressure on governments and others to provide the information. Systems such as Creative Commons provide a way to license information so that it can be provided freely, while the owner retains ownership.
The Victorian Parliament is conducting an inquiry into government open access.
Social networking software allows a computer system to help people interact in groups. While normally thought of for social purposes, it is now being adopted for business. LinkedIn provides a way for professionals to interact with each other and find colleagues. Naymz provides a reputation management service. It is likely such systems will be used within and between organisations, including government, to manage work, grant access to information, and work out remuneration for staff. This requires the metadata about people and their actions to be carefully encoded and stored.
HTML has only limited provision for metadata. Systems such as LinkedIn get around the problem using Microformats, which use HTML class names as the metadata element names. This allows the metadata to be included in the body of the HTML document, instead of the header, and requires less duplication of information.
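A minimal sketch of reading such metadata is shown below, using the class names of the hCard microformat (vcard, fn, title, org). The HTML fragment and the person in it are made up, and it is not taken from any real site's markup. The parser is deliberately simplified: it assumes each property is a leaf element containing only text, which a full microformat parser would not require.

```python
from html.parser import HTMLParser

# Illustrative HTML fragment; the person and organisation are made up.
# The class names (vcard, fn, title, org) are from the hCard microformat.
PROFILE_HTML = """
<div class="vcard">
  <span class="fn">Jane Citizen</span>,
  <span class="title">Records Manager</span> at
  <span class="org">Example Agency</span>
</div>
"""


class HCardParser(HTMLParser):
    """Collect the text of elements whose class is an hCard property name."""

    PROPERTIES = {"fn", "title", "org"}

    def __init__(self):
        super().__init__()
        self.card = {}
        self._current = None  # property whose text is currently being read

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.PROPERTIES:
                self._current = name

    def handle_data(self, data):
        if self._current and data.strip():
            self.card[self._current] = data.strip()

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current property.
        self._current = None


parser = HCardParser()
parser.feed(PROFILE_HTML)
print(parser.card)
```

The text a person reads on the page and the values a program extracts are the same characters, which is why this approach needs less duplication than separate header metadata.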
Documents and databases represent two views of data in a computer system. Electronic documents are fixed in content and format, are individual distinct entities, can be displayed using software from different suppliers, and are expected to last for years and outlive the software which created them. Databases have content which changes, can be displayed in different ways, may only be of value for minutes or months and may depend on one version of database software. This is not to say that all documents are fixed and all databases fluid, but it is a useful generalization.
XML now provides formats which support both HTML-like documents and database-style processing. OpenOffice.org's XML-based file format provides a way to package up all the elements of an XML document (including images) into one compressed file. This provides the prospect of formats which can be edited in a word processor, displayed as a web page, transformed for a handheld device or printed with specific styles.
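The packaging idea can be sketched with a ZIP archive holding the XML content and an image together. This is a simplified illustration only and does not produce a valid OpenOffice.org file: the member names, the XML vocabulary and the placeholder image bytes are all assumptions made for the example.

```python
import io
import zipfile

# Simplified illustration only: this is NOT a valid OpenOffice.org file.
# The member names and XML content below are hypothetical.
CONTENT_XML = '<document><para>Hello</para><image href="Pictures/logo.png"/></document>'
FAKE_IMAGE = b"placeholder bytes standing in for a real image"


def build_package() -> bytes:
    """Bundle the XML content and an image into one compressed archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as package:
        package.writestr("content.xml", CONTENT_XML)
        package.writestr("Pictures/logo.png", FAKE_IMAGE)
    return buffer.getvalue()


def list_package(data: bytes) -> list[str]:
    """Return the names of the parts stored in the package."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        return package.namelist()


def read_content(data: bytes) -> str:
    """Extract the XML content part from the package as text."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        return package.read("content.xml").decode("utf-8")


package_bytes = build_package()
print(list_package(package_bytes))
```

Because the result is a single file, the document and its images travel together, while the XML part can still be extracted and transformed separately for the web, a handheld device or print.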
In 2002 OASIS announced a committee to work on an office XML standard format based on OpenOffice.org's XML format. The first draft was released in March 2004.
In 2003 the High Court of Australia concluded that the Migration Act's definition of documents included electronic documents stored in a database.
The High Court also concluded electronic documents need to be separately identified.