Monday, April 07, 2008

Glen Newton suggests an additional criterion for the definition of open access:

Open Access must include access by machines:

* At minimum, one must allow crawls of the site/content or (to reduce the impact of badly configured crawlers) create a compressed XML file containing all metadata and either the content itself or direct links to it, and make that file available for download (if bandwidth is still an issue, put it on a P2P network like BitTorrent).
* Preferably, offer some kind of API (such as OTMI) or protocol (such as OAI-PMH) to get at content, metadata and citations.
* Better still, offer access to the XML of the articles in addition to the PDF and/or HTML; if the XML actually has some semantic content, then we are approaching the optimum.
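To make the "protocol" option a little more concrete, here is a minimal sketch of a machine client for OAI-PMH, the protocol named in the second point. It is an illustration under stated assumptions, not a production harvester: it uses only the Python standard library, requests the Dublin Core format (`oai_dc`), pulls out just identifiers, datestamps, deletion status and titles, and follows `resumptionToken`s until the repository signals the last page. The function names and the `http://example.org/oai` base URL are hypothetical, and it ignores details such as flow control (HTTP 503 with `Retry-After`) and selective harvesting by date or set.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional, Union

# XML namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request URL (hypothetical helper).

    In OAI-PMH, resumptionToken is an exclusive argument: when it is
    sent, metadataPrefix must be left out.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_list_records(xml_text: Union[str, bytes]) -> tuple[list[dict], Optional[str]]:
    """Return (records, next_token) for one page of a ListRecords response."""
    root = ET.fromstring(xml_text)
    error = root.find("oai:error", NS)
    if error is not None:
        if error.get("code") == "noRecordsMatch":
            return [], None  # an empty result set is reported as an error code
        raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
    records = []
    for rec in root.iterfind("oai:ListRecords/oai:record", NS):
        header = rec.find("oai:header", NS)
        records.append({
            "identifier": header.findtext("oai:identifier", namespaces=NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=NS),
            # Deleted records keep their header but carry no metadata.
            "deleted": header.get("status") == "deleted",
            # Dublin Core titles; a record may have zero or several.
            "titles": [t.text for t in rec.iterfind(".//dc:title", NS)],
        })
    token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=NS)
    # An empty or missing resumptionToken marks the final page.
    return records, (token or "").strip() or None


def _default_fetch(url: str) -> bytes:
    """Fetch a URL over HTTP and return the raw response body."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def harvest(base_url: str,
            fetch: Optional[Callable[[str], Union[str, bytes]]] = None) -> Iterator[dict]:
    """Yield every record from the repository, page by page."""
    fetch = fetch or _default_fetch
    token = None
    while True:
        page, token = parse_list_records(
            fetch(list_records_url(base_url, resumption_token=token)))
        yield from page
        if token is None:
            break
```

Usage would look like `for rec in harvest("http://example.org/oai"): print(rec["identifier"], rec["titles"])`. Passing `fetch` as a parameter keeps the paging logic separate from the network, so the same code can be pointed at cached responses when experimenting with text mining offline.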

The end goal is to support and encourage text mining and analysis of the full text (preferably semantically rich XML), metadata and citations, to allow literature-based exploration and discovery in support of the scientific research process.

Comment: hear, hear! Open Access is about more than incremental steps. The World Wide Web is a fabulous gift that lets us do a great deal more than we ever could before. We have serious issues to tackle in this world of ours, such as finding a solution to global warming and learning how to live together in peace. We need to speed up our learning (as this true open access can do), and we don't have a moment to waste.