Metadata and Electronic Document Management for Electronic Commerce
Capturing Australia's Scholarly Publishing
Version of 15 August 2008
Capturing Scholarly Publishing
Thought experiment
- A researcher, and ARC grant recipient, at an Australian
university completes an article
- Following peer review the article is accepted by an
international proprietary journal
- A post print copy of the article is also lodged with the
university's open access digital repository ...
From: "Governmental Policy
Frameworks", Dr Evan Arthur, Department of Education, Science
and Training, 2004, URL:
http://www.humanities.org.au/NSCF/PowerPoints/NSCF%20(Arthur).ppt
At a roundtable in 2004, a thought experiment was outlined for transforming the distribution of scholarly information in Australia. It proposed making research results available online to government funding bodies, the universities where the research was conducted, industry and the public. This process can now be automated for open access publications, using digital libraries built on metadata standards and XML.
Automated Distribution
- ... These actions lead to automatic updating of
- the researcher's open access publication list
- the university's open access record of staff research
activity
- the ARC's open access record of research activity related
to its grants
- a gateway site providing sophisticated, industry tailored
access to research activities in Australian research
institutions
- the publicly accessible data warehouse which provides input
into quality assessments of Australian research institutions
From: "Governmental Policy
Frameworks", Dr Evan Arthur, Department of Education, Science
and Training, 2004, URL:
http://www.humanities.org.au/NSCF/PowerPoints/NSCF%20(Arthur).ppt
It was proposed that a researcher lodging their article in a university repository would automatically trigger updates to institutional, government and public lists of research publishing.
Automating Capture
- Journal publishes metadata for all papers in machine readable
format on-line
- Institution archive scans metadata for its authors
- Institution publishes its author's metadata
- ARC ingests metadata from the institution (checks against
publisher)
- Gateways provide industry tailored indexes to research
The "thought experiment" can be simplified if an article to be lodged is already online with the required metadata. The step of lodging the article can then be replaced by an automated scan.
To simplify the harvest process, the OAI Static Repository
format is available for participating journals to publish their
list of articles.
The papers can be automatically harvested from the digital library. The metadata can automatically populate publication lists,
research gateways and quality assessment data warehouses.
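The scanning step ("Institution archive scans metadata for its authors") can be sketched in Python. This is a minimal illustration, assuming the journal's metadata has already been parsed into dictionaries; the function names, the staff list and the second record are hypothetical, and real matching would need a more robust author identifier than a name string.

```python
def normalise(name):
    """Lower-case a 'Family, Given' name and collapse whitespace for comparison."""
    return " ".join(name.lower().split())


def records_for_institution(records, staff_names):
    """Return the records that have at least one author on the staff list."""
    staff = {normalise(n) for n in staff_names}
    return [r for r in records
            if any(normalise(a) in staff for a in r.get("authors", []))]


# Illustrative data only: the first title is abbreviated as in the Refer
# example in these notes, the second record is a made-up placeholder.
journal_records = [
    {"title": "Optimizing a 3D Image Reconstruction...",
     "authors": ["Aa, Tom Vander", "Eeckhout, Lieven"]},
    {"title": "A hypothetical second paper",
     "authors": ["Placeholder, Pat"]},
]

matches = records_for_institution(journal_records, ["eeckhout,  lieven"])
for record in matches:
    print(record["title"])
```

Matching on names alone is fragile (initials, name changes, common surnames), which is one reason the outline above has the ARC check the institution's metadata against the publisher's.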
%0 Conference Proceedings
%A Aa, Tom Vander
%A Eeckhout, Lieven ...
%D 2002
%T Optimizing a 3D Image Reconstruction...
%O Seventh Asia-Pacific ... Conference (ACSAC2002)
%E Lai, Feipei
%E Morris, John
%I ACS
%C Melbourne, Australia
%P 119-126
%S Conferences in Research and Practice ...
%K CRPITVol6
%O confpapers/CRPITV6Aa.pdf
From: "Refer file of all papers",
CRPIT, 2004, URL: http://crpit.com/CRPIT.refer
Same metadata in BibTeX
@inproceedings{CRPIT-6-119-126,
Author = {Aa, Tom Vander and Eeckhout...},
Title = {Optimizing a 3D Image ...},
BookTitle = {Seventh Asia-Pacific ... },
Editor = {Lai, Feipei and Morris, John},
Series= {Conferences in Research ...},
Address= {Melbourne, Australia},
Publisher = {ACS},
Volume = {6},
Pages = {119-126},
Year = {2002} }
From: "BiBTeX file of all papers",
CRPIT, 2004, URL: http://crpit.com/CRPIT.bib
The term "ingest" is used to describe the process of
incorporating an electronic document and its metadata into an
electronic archive. "Masticate" therefore seems a
suitable term to describe the preceding step of breaking the
document into ingestible items.
Extracting the metadata is a much easier task than that of
converting the entire content of a paper to an e-publishing
format. The metadata required is not much more than already
provided for bibliographic services.
Journal and conference indexes are traditionally provided in
formats such as Refer
and BibTeX.
These formats can be converted to those needed by digital
repositories. The resulting metadata files can be placed on the
publisher's web site for harvesting tools used by readers and
by archives. The archives can use XSLT to transform the metadata into other formats as required.
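As a first step in such a conversion, a Refer file can be read into a neutral structure. The following Python sketch assumes the simple layout shown in the Refer example above: one "%X value" field per line and blank lines between records. The function name is hypothetical, and continuation lines (lines not starting with "%") are ignored for brevity.

```python
def parse_refer(text):
    """Parse refer-format text into a list of {tag: [values]} dictionaries.

    A tag may repeat within a record (for example %A once per author),
    so the values for each tag are kept in a list, in file order.
    """
    records, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current record.
            if current:
                records.append(current)
                current = {}
            continue
        if line.startswith("%") and len(line) > 2:
            tag, value = line[1], line[3:].strip()
            current.setdefault(tag, []).append(value)
    if current:
        records.append(current)
    return records


# Fields taken from the abbreviated Refer example in these notes.
SAMPLE = """%0 Conference Proceedings
%A Aa, Tom Vander
%A Eeckhout, Lieven
%D 2002
%E Lai, Feipei
%E Morris, John
%I ACS
%C Melbourne, Australia
%P 119-126
%K CRPITVol6
%O confpapers/CRPITV6Aa.pdf
"""

record = parse_refer(SAMPLE)[0]
print(record["A"], record["D"], record["P"])
```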
While Refer and BibTeX contain most of the needed bibliographic
information, an XML format would be more convenient for use
in XML based systems. There are utility programs available to
convert between bibliographic formats and to XML versions of these
formats (such as BibXML). The XML versions can then be transformed
using XSLT into other XML formats.
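The idea of an XML version of a bibliographic record can be illustrated with Python's standard library. This is a sketch only: the element names REF and TITLE appear in the XSLT shown in these notes, but the other element names and the tag mapping are assumptions for illustration and are not claimed to be the layout produced by any particular conversion utility.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from Refer tags to XML element names.
TAG_NAMES = {"A": "AUTHOR", "T": "TITLE", "D": "YEAR", "P": "PAGES"}


def record_to_xml(record):
    """Build a <REF> element from a {tag: [values]} dictionary."""
    ref = ET.Element("REF")
    for tag, values in record.items():
        name = TAG_NAMES.get(tag)
        if name is None:
            continue  # tags without a mapping are skipped in this sketch
        for value in values:
            ET.SubElement(ref, name).text = value
    return ref


# Values taken from the abbreviated Refer example in these notes.
record = {
    "A": ["Aa, Tom Vander", "Eeckhout, Lieven"],
    "T": ["Optimizing a 3D Image Reconstruction..."],
    "D": ["2002"],
    "P": ["119-126"],
}

xml_text = ET.tostring(record_to_xml(record), encoding="unicode")
print(xml_text)
```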
BibXML to RSS
<?xml version="1.0" ?>
<rss version="2.0">
<channel>
<title>CRPIT</title>
<link>http://crpit.com/</link>
<description>Conferences in Research and Practice in Information Technology</description>
<language>en</language>
<item>
<title>Fast Segmentation of Large Images</title>
<link>http://crpit.com/confpapers/CRPITV16Crisp.pdf</link>
</item>
Converted using XSLT
XSLT used for BibXML to RSS
<xsl:template match="REF">
<item>
<title><xsl:value-of select="TITLE" /></title>
<link>http://crpit.com/<xsl:apply-templates select="UNRECOGNIZED/ITEM[TAG='note']"/></link>
</item>
</xsl:template>
XSLT used, Tom Worthington, 2004, URL:
crpit.xsl
The XML version of the metadata can then be made available and converted further.
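For readers without an XSLT processor, the same kind of transformation can be sketched in plain Python. This is not the XSLT above but an equivalent written with the standard library, which has no XSLT support. The input layout is simplified and hypothetical: each REF carries a TITLE and a PATH element, where PATH stands in for the note item selected by the stylesheet.

```python
import xml.etree.ElementTree as ET

BASE_URL = "http://crpit.com/"


def refs_to_rss(bib_root, title, description):
    """Build an RSS 2.0 tree with one <item> per <REF> in the input."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = BASE_URL
    ET.SubElement(channel, "description").text = description
    for ref in bib_root.iter("REF"):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = ref.findtext("TITLE", default="")
        # Strip whitespace so the link has no stray leading or trailing spaces.
        path = ref.findtext("PATH", default="").strip()
        ET.SubElement(item, "link").text = BASE_URL + path
    return rss


# Hypothetical simplified input; title and path are from the RSS example above.
bib = ET.fromstring(
    "<BIB><REF><TITLE>Fast Segmentation of Large Images</TITLE>"
    "<PATH> confpapers/CRPITV16Crisp.pdf </PATH></REF></BIB>"
)

rss = refs_to_rss(
    bib, "CRPIT",
    "Conferences in Research and Practice in Information Technology",
)
print(ET.tostring(rss, encoding="unicode"))
```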
RSS
<?xml version="1.0" ?>
<rss version="2.0">
<channel>
<title>ACM Queue</title>
<link>http://www.acmqueue.com/</link>
<description>Tomorrow's Computing Today</description>
<language>en-us</language>
<item>
<title>Samba Does Windows-to-Linux Dance</title>
<link>http://acmqueue.com/?...pid=171</link>
<description>Mounting remote Linux ...</description>
</item>
From: "RSS feed", Queue magazine, ACM, 2004, URL:
http://acmqueue.com/rss.rdf
RSS (Really Simple
Syndication) is a Web content syndication format usually used for
news items, but it can also make research papers more widely available. As an
example, the ACM "Queue" magazine has a
"feed" button on the home page.
Atom (IETF RFC 4287) provides a more advanced, standardised and feature-rich syndication format than RSS. The ANU E Press provides a custom feed in RSS or Atom of Weblog entries,
New Products and Product Reviews.
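A reader or archive consuming such a feed only needs to pull the title and link out of each item. The Python sketch below does this for an RSS 2.0 document held in a string; the feed text is the CRPIT example from these notes with closing tags added, and the function name is hypothetical. Fetching the feed over the network is left out.

```python
import xml.etree.ElementTree as ET

FEED = """<?xml version="1.0" ?>
<rss version="2.0">
<channel>
<title>CRPIT</title>
<link>http://crpit.com/</link>
<description>Conferences in Research and Practice in Information Technology</description>
<language>en</language>
<item>
<title>Fast Segmentation of Large Images</title>
<link>http://crpit.com/confpapers/CRPITV16Crisp.pdf </link>
</item>
</channel>
</rss>"""


def feed_items(feed_text):
    """Return a list of (title, link) pairs from an RSS 2.0 document."""
    channel = ET.fromstring(feed_text).find("channel")
    return [
        (item.findtext("title", "").strip(), item.findtext("link", "").strip())
        for item in channel.findall("item")
    ]


items = feed_items(FEED)
for title, link in items:
    print(title, link)
```

Stripping whitespace matters here because generated feeds can contain stray spaces inside the link element.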
OAI Static Repository
<ListRecords metadataPrefix="oai_dc">
<oai:record>
<oai:header>
<oai:identifier>oai:arXiv:cs/0112017...
<oai:datestamp>2001-12-14</oai:datestamp>
</oai:header>
<oai:metadata>
<oai_dc:dc ...
<dc:title>Using Structural Metadata ...
<dc:creator>Dushay, Naomi</dc:creator>
<dc:subject>Digital Libraries</dc:subject>
<dc:description>With the increasing ...
</oai_dc:dc>
</oai:metadata>
</oai:record>
From: "Specification for an OAI Static
Repository and an OAI Static Repository Gateway Protocol",
Version 2.0 of 2002-06-14, URL:
http://www.openarchives.org/OAI/2.0/guidelines-static-repository.htm
The OAI Static Repository Gateway Protocol, while more complex than RSS, is conceptually similar. The details of a list of published documents can be provided in a static file which can be harvested by a remote
system. This file can simply be placed on the publisher's web site,
alongside the Refer, BibTeX and RSS files.
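Harvesting from such a static file amounts to reading namespaced XML. The Python sketch below extracts the datestamp, title and creators from each oai_dc record. The sample is a trimmed version of the excerpt above (the identifier and description, which are truncated in the excerpt, are left out), and the function name is hypothetical. A complete static repository also carries Identify and ListMetadataFormats sections, which are omitted here.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by OAI static repositories and unqualified Dublin Core.
NS = {
    "sr": "http://www.openarchives.org/OAI/2.0/static-repository",
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

SAMPLE = """<Repository xmlns="http://www.openarchives.org/OAI/2.0/static-repository"
            xmlns:oai="http://www.openarchives.org/OAI/2.0/">
  <ListRecords metadataPrefix="oai_dc">
    <oai:record>
      <oai:header>
        <oai:datestamp>2001-12-14</oai:datestamp>
      </oai:header>
      <oai:metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Using Structural Metadata ...</dc:title>
          <dc:creator>Dushay, Naomi</dc:creator>
          <dc:subject>Digital Libraries</dc:subject>
        </oai_dc:dc>
      </oai:metadata>
    </oai:record>
  </ListRecords>
</Repository>"""


def harvest(static_repository_xml, prefix="oai_dc"):
    """Return one dictionary per record stored under the given metadata prefix."""
    root = ET.fromstring(static_repository_xml)
    results = []
    for records in root.findall("sr:ListRecords", NS):
        if records.get("metadataPrefix") != prefix:
            continue
        for record in records.findall("oai:record", NS):
            results.append({
                "datestamp": record.findtext(
                    "oai:header/oai:datestamp", default="", namespaces=NS),
                "title": record.findtext(
                    ".//dc:title", default="", namespaces=NS),
                "creators": [c.text for c in record.findall(".//dc:creator", NS)],
            })
    return results


harvested = harvest(SAMPLE)
print(harvested)
```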
The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is conceptually similar to the web services interface provided by Amazon.com to their list of publications.
Repository Explorer
Repository Name | ACS Digital Library |
Base URL | http://dl.acs.org.au/index.php/index/oai |
Protocol Version | 2.0 |
Admin Email | dl@tomw.net.au |
Earliest Datestamp | 2006-12-05T00:40:05Z |
Deleted Record Handling | no |
Granularity | YYYY-MM-DDThh:mm:ssZ |
Compression | gzip, deflate |
Other Information | description: oai-identifier (scheme: oai; repositoryIdentifier: acs.ojs.journals.sfu.ca; delimiter: ":"; sampleIdentifier: oai:acs.ojs.journals.sfu.ca:article/1) |
Archive Self-Description, for http://dl.acs.org.au/index.php/index/oai, Repository Explorer, 2008-08-15T02:32:07Z
Tools such as the Open Archives Initiative Repository Explorer allow demonstration access to a digital library's OAI interface. The user can first query which metadata formats the repository offers (the ListMetadataFormats request) and then request the records of electronic documents in one of those formats.
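The requests that such a tool sends are ordinary HTTP GET URLs made up of the repository's base URL, a verb and any arguments. The Python sketch below only builds the URLs and does not contact the server, which may no longer be available at this address. The helper name is hypothetical; the verbs and the metadataPrefix argument are those defined by OAI-PMH 2.0.

```python
from urllib.parse import urlencode

# Base URL as reported in the archive self-description above.
BASE_URL = "http://dl.acs.org.au/index.php/index/oai"


def oai_request(base_url, verb, **arguments):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    return base_url + "?" + urlencode({"verb": verb, **arguments})


# Step 1: ask which metadata formats the repository offers.
formats_url = oai_request(BASE_URL, "ListMetadataFormats")

# Step 2: request the records in one of those formats (here Dublin Core).
records_url = oai_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc")

print(formats_url)
print(records_url)
```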