This item on "Electronic Publishing" is one of a segment on "Metadata and Electronic Document Management for Electronic Commerce" first presented for the Australian National University course "Information Technology in Electronic Commerce" (COMP3410/COMP6341).
This document is intended both for live group presentation and as accompanying lecture notes for individual use. The slides and these notes are provided in the one HTML document, using HTML Slidy.
How and why might the ACS do scholarly publishing online?
The question of how free and open access to scholarly research should be came up with a 2001 petition from the Public Library of Science. An advocacy group made up of 11 people from US-based academic institutions and one from a UK institution proposed the establishment of international online public libraries of science with the complete text of all published scientific articles.
The group claimed 26,144 researchers from 170 countries signed the open letter urging publishers to allow research reports from their journals to be publicly available. The web site for the group is maintained by Patrick O. Brown of the Stanford University School of Medicine and the Howard Hughes Medical Institute, and Michael Eisen of the Lawrence Berkeley National Lab and the University of California at Berkeley.
There was no subsequent boycotting of traditional publishers, but there has been a gradual change in the way research publishing is done. An interesting question is the position of information technology researchers on open access, given their role in creating the technology used for electronic publishing.
Some editions of some publications are made available free on-line in PDF or web format. However, there was no overall digital library. The ACS took several years considering publishing strategies, including e-publishing.
Activities such as the Open Archives Initiative are attempting to construct a virtual library of material using distributed document archives and shared metadata.
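As a rough illustration of how shared metadata can be harvested from a distributed archive, the sketch below builds a request for the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and pulls Dublin Core titles out of a response. The repository address and the sample response are hypothetical and hand-written for this example; a real harvester would also need to fetch the URL and handle resumption tokens and errors.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical repository endpoint, used for illustration only.
BASE_URL = "https://repository.example.org/oai"

# Namespace of the Dublin Core elements carried in oai_dc records.
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hand-written, abbreviated sample of what a repository might return.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Article</dc:title>
          <dc:creator>A. Author</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def extract_titles(xml_text):
    """Return the text of every dc:title element in a harvested response."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter(DC + "title")]


if __name__ == "__main__":
    print(list_records_url(BASE_URL))
    print(extract_titles(SAMPLE_RESPONSE))
```

A virtual library service could run requests like this against many archives and merge the results into one searchable index, without holding the documents themselves.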
Organisations now considering electronic publishing strategies can take an integrated approach, using newer XML tools to create and maintain content. The ACS had a tradition of providing the content of its journal free for non-profit use. This was extended into an electronic edition in a format suitable for direct citation, annotated with metadata in a format suitable for harvesting by specialised virtual library tools as well as traditional web search engines. The content was made available for education.
A pioneer of e-publishing for IT has been the Association for Computing Machinery (ACM). The ACM collection was made available on-line in 1997 and the web interface allows the contents pages of the journals to be browsed and metadata searched. New content was created in SGML, then web, PDF and print versions were generated from that. The online service is by paid subscription to members, non-members and institutions, or by sales per article. The service has proved popular and ACM is considering discontinuing some print titles.
ACM journals accept articles in a number of electronic formats using supplied templates. The PDF versions of documents generated are close in format to the print editions, but the HTML versions use a different format more suited to on-line viewing. Graphics are shown as small thumbnail versions, with links to high resolution versions.
The minutes of the ACM Publications Board show the considerable complexities and manual processing steps which had to be automated.
An example of a document converted using OpenOffice.org is "ICT Development in Australia - A Strategic Policy Review" prepared for the Australian Computer Society by Professor Houghton. The web adaptation of the report was created from the MS-Word version. This was done by first importing the MS-Word document into OpenOffice.org and saving in HTML. The HTML was run through the "Tidy" utility to replace formatting commands throughout the document with styles. The table of contents was then manually re-linked to the document sections and ALT text placed on images.
Using OpenOffice.org to produce HTML has limitations. A better approach may be to use OpenOffice's internal XML format as an intermediate format. This retains more information about the original MS-Word document than is present in an HTML translation.
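One of the manual steps described above, placing ALT text on images, can at least be checked automatically. The following is a minimal sketch, not the process used for the report: it uses Python's standard HTML parser to list images that have no ALT attribute. The file names in the sample are made up for illustration.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> tag that has no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attr_map = dict(attrs)
            # Only a completely absent alt attribute is reported;
            # an empty alt="" (a decorative image) is accepted.
            if "alt" not in attr_map:
                self.missing.append(attr_map.get("src", "(no src)"))


def images_missing_alt(html_text):
    """Return the sources of images in html_text that lack ALT text."""
    finder = MissingAltFinder()
    finder.feed(html_text)
    finder.close()
    return finder.missing


if __name__ == "__main__":
    # Hypothetical fragment of converted HTML.
    sample = '<p><img src="fig1.png" alt="Figure 1"><img src="fig2.png"></p>'
    print(images_missing_alt(sample))
```

A check like this only finds where ALT text is missing; writing useful descriptions remains a manual editorial task.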
As an example the Microsoft Word Style Files for ACM Journals and IEEE Transactions were converted to OpenOffice format:
OpenOffice files are stored as a ZIP-compressed archive containing a directory of files. The text of the word processing document is stored in a file named "content.xml" in the archive. Images and other binary files are stored in sub-directories.
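Because the file is an ordinary ZIP archive, its parts can be read with standard tools. The sketch below uses Python's zipfile module to list the members of a package and read "content.xml". So that the example is self-contained, it builds a tiny stand-in package in memory; this stand-in is an assumption for illustration and is not a valid word processing document.

```python
import io
import zipfile


def list_members(source):
    """Return the names of all files stored in the package."""
    with zipfile.ZipFile(source) as archive:
        return archive.namelist()


def read_content_xml(source):
    """Return the text of content.xml from the package."""
    with zipfile.ZipFile(source) as archive:
        return archive.read("content.xml").decode("utf-8")


def make_demo_package():
    """Build a minimal stand-in package in memory (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("content.xml", "<office:document-content/>")
        archive.writestr("Pictures/figure1.png", b"not a real image")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(list_members(make_demo_package()))
    print(read_content_xml(make_demo_package()))
```

With a real document, the path to the saved file can be passed in place of the in-memory buffer.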
Styles from the original style sheet are reflected in text styles in the translated XML documents.
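To see how styles carry through, the style names attached to headings and paragraphs can be counted directly from the XML. This is a sketch under stated assumptions: it uses the text namespace of the OASIS Open Document Format (earlier OpenOffice.org files used different namespace URIs), and the sample content and style names are hand-written for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Text namespace used by the OASIS Open Document Format.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
STYLE_ATTR = "{%s}style-name" % TEXT_NS

# Hand-written sample in the shape of a content.xml file.
SAMPLE_CONTENT = """<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <office:body><office:text>
    <text:h text:style-name="Heading_20_1">Introduction</text:h>
    <text:p text:style-name="Text_20_body">First paragraph.</text:p>
    <text:p text:style-name="Text_20_body">Second paragraph.</text:p>
  </office:text></office:body>
</office:document-content>"""


def count_styles(content_xml):
    """Count how often each text style name is used in the content."""
    root = ET.fromstring(content_xml)
    return Counter(
        element.get(STYLE_ATTR)
        for element in root.iter()
        if element.get(STYLE_ATTR)
    )


if __name__ == "__main__":
    for name, uses in count_styles(SAMPLE_CONTENT).most_common():
        print(name, uses)
```

A count like this gives a quick view of whether a converted document uses the named styles from a journal template or has fallen back to ad hoc formatting.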
A modified version of the OpenOffice format developed by Sun Microsystems was adopted as an OASIS Standard on May 1, 2005. This "Open Document Format" (ODF) was adopted as an international standard, ISO/IEC 26300:2006, on May 3, 2006.
Microsoft's Office Open XML format (OOXML) has similar features to ODF and is Draft International Standard 29500.
Both ODF and OOXML suffer from being derived from legacy word processing packages. A better alternative would be to use XHTML 2 and new CSS standards.
While the formats for publishing have been controversial, progress has been made on the metadata for publishing systems. The ACS has produced a Digital Library system which implements metadata for scholars via services such as the Arrow Discovery Service. A similar system is being implemented at the ANU for IFIP in 2008: IFIP Digital Library.