More than 2 million research papers have disappeared from the Internet

Old documents and books stored on shelves in a library's archive. A study identified more than two million articles that did not appear in a major digital archive, despite having an active DOI. Credit: Anna Berkut/Alamy

More than one-quarter of scholarly articles are not being properly archived and preserved, a study of more than seven million digital publications suggests. The findings, published in the Journal of Librarianship and Scholarly Communication on 24 January¹, indicate that systems to preserve papers online have failed to keep pace with the growth of research output.

“Our entire epistemology of science and research relies on the chain of footnotes,” explains author Martin Eve, a researcher in literature, technology and publishing at Birkbeck, University of London. “If you can’t verify what someone else has said at some other point, you’re just trusting to blind faith for artefacts that you can no longer read yourself.”

Eve, who is also involved in research and development at digital-infrastructure organization Crossref, checked whether 7,438,037 works labelled with digital object identifiers (DOIs) are held in archives. DOIs — which consist of a string of numbers, letters and symbols — are unique fingerprints used to identify and link to specific publications, such as scholarly articles and official reports. Crossref is the largest DOI registration agency, allocating the identifiers to about 20,000 members, including publishers, museums and other institutions.

The sample of DOIs included in the study was made up of a random selection of up to 1,000 registered to each member organization. Twenty-eight percent of these works — more than two million articles — did not appear in a major digital archive, despite having an active DOI. Only 58% of the DOIs referenced works that had been stored in at least one archive. The other 14% were excluded from the study because they were published too recently, were not journal articles or did not have an identifiable source.
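The sampling rule and the headline shares can be illustrated with a short sketch. This is not the study's code: the function name, the member names and the DOI strings below are invented for illustration, and the only figures taken from the article are the cap of 1,000 per member, the total of 7,438,037 works and the three percentages.

```python
import random

def sample_per_member(dois_by_member, cap=1000, seed=0):
    """Draw a random sample of at most `cap` DOIs from each member (hypothetical helper)."""
    rng = random.Random(seed)
    sample = {}
    for member, dois in dois_by_member.items():
        # Members with fewer than `cap` DOIs contribute everything they have.
        k = min(cap, len(dois))
        sample[member] = rng.sample(dois, k)
    return sample

# Made-up registry: one small member and one large member.
registry = {
    "small-press": [f"10.1000/sp.{i}" for i in range(40)],
    "large-publisher": [f"10.2000/lp.{i}" for i in range(5000)],
}
sample = sample_per_member(registry)
sizes = {member: len(dois) for member, dois in sample.items()}

# Shares reported in the article, applied to the total number of works checked.
TOTAL_WORKS = 7_438_037
shares = {"not archived": 0.28, "in at least one archive": 0.58, "excluded": 0.14}
counts = {label: round(TOTAL_WORKS * share) for label, share in shares.items()}

for label, count in counts.items():
    print(f"{label}: about {count:,} works")
```

Capping the draw per member keeps the largest publishers from dominating the sample, which is why small members appear in full while large ones are represented by 1,000 works each. Because the reported percentages are rounded, the derived counts are approximate.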

Preservation challenge

Eve notes that the study has limitations: namely that it tracked only articles with DOIs, and that it did not search every digital repository for articles (he did not check whether items with a DOI were stored in institutional repositories, for example).

Nevertheless, preservation specialists have welcomed the analysis. “It’s been hard to know the real extent of the digital preservation challenge faced by e-journals,” says William Kilbride, managing director of the Digital Preservation Coalition, headquartered in York, UK. The coalition publishes a handbook detailing good preservation practice.

“Many people have the blind assumption that if you have a DOI, it’s there forever,” says Mikael Laakso, who studies scholarly publishing at the Hanken School of Economics in Helsinki. “But that doesn’t mean that the link will always work.” In 2021, Laakso and his colleagues reported² that more than 170 open-access journals had disappeared from the Internet between 2000 and 2019.

Kate Wittenberg, managing director of the digital archiving service Portico in New York City, warns that small publishers are at higher risk of failing to preserve articles than are large ones. “It costs money to preserve content,” she says, adding that archiving involves infrastructure, technology and expertise that many smaller organizations do not have access to.

Eve’s study suggests some measures that could improve digital preservation, including stronger requirements at DOI registration agencies and better education and awareness of the issue among publishers and researchers.

“Everybody thinks of the immediate gains they might get from having a paper out somewhere, but we really should be thinking about the long-term sustainability of the research ecosystem,” Eve says. “After you’ve been dead for 100 years, are people going to be able to get access to the things you’ve worked on?”


Tyler Fields

