Tuesday, December 29, 2009

Policy Forum on Public Access to Federally Funded Research: Features and Technology

Following is my comment on the U.S. Office of Science and Technology Policy's Policy Forum on Public Access to Federally Funded Research: Features and Technology (second phase). Reader note: this post is more technical than the average IJPE post.

Q: In what format should published papers be submitted in order to make them easy to find, retrieve, and search and to make it easy for others to link to them?

A: XML is the best format. It is also important to take into account how researchers work; ideally, the submission process should fit into their workflow. Microsoft has been working on an automated upload feature for repositories. Researchers should be able to cross-deposit to as many open access archives as are desirable for their work (I already have 3 archives myself, and there are good reasons to deposit in all of them).

Q: Are there existing digital standards for archiving and interoperability to maximize public benefit?

- The Open Archives Initiative – Protocol for Metadata Harvesting (OAI-PMH) is key to harvesting and cross-searching metadata from all open access archives.
- Stable URLs, preferably ones that meet the standards for OpenURL (and possibly DOI), are essential.
- The SWORD protocol allows for cross-deposit into multiple archives.
- Creative Commons licensing, to facilitate both human and machine reading of licensing terms.
- For archiving (preservation): LOCKSS, CLOCKSS, and Portico. For preservation as well as assured ongoing access, multiple mirror sites are recommended.
- Open standards are recommended. For example, video materials should use a format like MPEG-4. Open standards will allow the greatest possible number of people to access the materials, and will facilitate the task of preservation.
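
To make the OAI-PMH point a little more concrete, here is a minimal Python sketch of how a harvester might build a ListRecords request and pull Dublin Core titles out of the response. The `verb` and `metadataPrefix` parameters and the XML namespaces come from the OAI-PMH specification; the base URL `https://repository.example.org/oai`, the function names, and the sample record are hypothetical and only for illustration. A real harvester would also need to fetch the URL and follow resumption tokens, which this sketch leaves out.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Dublin Core element namespace used by the oai_dc metadata format.
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (no network access here)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def extract_titles(xml_text):
    """Return every dc:title found in an OAI-PMH response document."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter("{%s}title" % DC_NS)]


# Hypothetical, hand-written response with a single record.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample open access article</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://repository.example.org/oai"))
    print(extract_titles(SAMPLE_RESPONSE))
```

Because every compliant archive answers the same request in the same format, a cross-search service only needs to repeat this step for each archive it harvests.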

Q: How are these anticipated to change?
A: OAI-PMH is quite stable. SWORD is new; the ability to cross-deposit is very important to researchers, so watch for growth.

Q: What are the best examples of usability in the private sector (both domestic and international) and what makes them exceptional?
- E-LIS, the Open Archive for Library and Information Studies, has exceptional tools for searching, including a custom-designed subject classification scheme – not surprising for a tool developed by and for librarians: http://eprints.rclis.org/
- Google provides a very effective search engine to materials in repositories, particularly for known items. Google strikes me as more effective in this instance than Google Scholar.
- BASE, the Bielefeld Academic Search Engine, aims to be the world’s most comprehensive search service for open archives, using OAI-PMH: http://www.base-search.net/
- It is worthwhile looking at initiatives that use the same standards for journals, conferences, and archives, providing a foundation for cross-searching materials in all these venues. For example, the Directory of Open Access Journals (DOAJ) http://www.doaj.org features an article-level search based on OAI-PMH. Open Journal Systems (OJS), which is free, open source software, also supports OAI-PMH, and there is a PKP harvester: http://pkp.sfu.ca/?q=ojs OJS is part of the Public Knowledge Project, which also includes Open Conference Systems and Open Monograph Press (in development, to be released this February).

Q: Should those who access papers be given the opportunity to comment or provide feedback?

A: Of course; the only question is which venues are best for providing comments or feedback. My perspective is that opening up access to these papers has tremendous potential to inform public debate and commentary on a wide variety of issues; this potential will come to fruition over time, as people will need time for learning and exploration. The most fruitful discussions, in my opinion, will happen when people take ideas from the papers and bring them to their communities for discussion.

For example, it makes sense to me that a patient advocacy group might lead a discussion on research in their advocacy area, perhaps on their own website, including references to articles of interest. Researchers in this area might well wish to participate in special events with such a group from time to time; this would provide them with feedback in a focused way, and could also be a way for researchers to connect with people who might be good candidates for clinical trials.

Another example: a variety of businesspeople, scientists, and the environmentally minded public might well be interested in research that has the potential to uncover new green technologies.

What would be most helpful in facilitating this kind of discussion is to ensure that papers have stable URLs so that these communities can reference them, that there is ideally an easy way to export a proper citation, and that Creative Commons licensing is used so that rights issues are clear (and also to encourage the broadest re-use rights; for example, allowing a portion of an article to be posted, with appropriate attribution, to the website of a not-for-profit discussion group).
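
As a rough illustration of what "an easy way to export a proper citation" could look like, here is a small Python sketch that formats a record with a stable URL as a BibTeX entry. BibTeX is just one possible export format that I am assuming for the example; the function name `to_bibtex`, the citation key, and the sample record (author, title, URL) are all hypothetical.

```python
def to_bibtex(key, author, title, year, url):
    """Format one record as a BibTeX @misc entry that carries its stable URL."""
    fields = [
        ("author", author),
        ("title", title),
        ("year", str(year)),
        ("url", url),
    ]
    body = ",\n".join("  %s = {%s}" % (name, value) for name, value in fields)
    return "@misc{%s,\n%s\n}" % (key, body)


if __name__ == "__main__":
    # Hypothetical record; a repository would fill these in from its metadata.
    print(to_bibtex(
        key="doe2009",
        author="Doe, Jane",
        title="A sample open access article",
        year=2009,
        url="https://repository.example.org/id/123",
    ))
```

A community website could offer the resulting text behind an "export citation" link next to each paper, so that discussion groups cite the same stable URL.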

There can be roles for journalists and media here to act as intermediaries in setting up such discussions, and also for government staff to conduct groups on public policy issues, much like this one.

Q: By what metrics (e.g. number of articles or visitors) should the Federal government measure success of its public access collections?

A: The first important metric is the number of articles that are freely available. This can be measured as a simple count of articles; as the percentage of articles covered under policies that are actually freely accessible; as the percentage of all scholarly articles published anywhere that are freely accessible (an indirect measure of extended policy influence: for example, hundreds of scholarly journals voluntarily participate fully in PubMed Central in a way that goes far beyond what is required by the NIH Public Access Policy); and (a little harder) as levels of inability to access materials, which may require developing a reporting system.
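
The first three of these measures are simple arithmetic, and a short Python sketch may help show how they relate. The function name, the argument names, and the numbers in the usage example are hypothetical and invented purely for illustration; they are not real figures for any collection.

```python
def access_metrics(policy_covered, policy_covered_free, all_published, all_free):
    """Compute the count and the two percentages described above.

    policy_covered      -- articles that fall under a public access policy
    policy_covered_free -- of those, articles that are actually freely accessible
    all_published       -- all scholarly articles published anywhere
    all_free            -- of those, articles that are freely accessible
    """
    def percent(part, whole):
        # Avoid dividing by zero when a collection is still empty.
        return 100.0 * part / whole if whole else 0.0

    return {
        "free_article_count": all_free,
        "policy_compliance_pct": percent(policy_covered_free, policy_covered),
        "overall_free_pct": percent(all_free, all_published),
    }


if __name__ == "__main__":
    # Invented numbers, for illustration only.
    print(access_metrics(
        policy_covered=1000,
        policy_covered_free=800,
        all_published=10000,
        all_free=1500,
    ))
```

Keeping the two percentages separate matters: the first shows how well a policy is being followed, while the second captures the wider, voluntary spread of open access.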

As for use, the number of visitors, abstract views, or article downloads would be useful. It is important to focus on this kind of usage in the aggregate, and not at the level of the individual paper. There are potentially serious issues with using metrics to evaluate scholarly work, as I have touched on in my book chapter, The Implications of Usage Statistics as an Economic Factor in Scholarly Communication: http://eprints.rclis.org/4889/