Will Linked Data be the Next Dublin Core?

 

 

Launched in 1995 by a small, elite group of librarians and computer scientists, Dublin Core was created with the intention of replacing the community-developed MARC standard. Its creators thought all old standards, especially library standards and practices, were obsolete and useless. Their naïve goal was to index the internet using a simple metadata scheme and to create easily shareable metadata that could be widely used to discover and manage networked resources.

 

Dublin Core failed from the beginning. Web page creators abused the metadata scheme to promote their own pages; the goodwill that DC's creators assumed was quickly exploited to promote commercial and political interests. The limited number of DC fields and the scheme's lack of a sufficiently strong connection with content standards made it almost useless for discovery; in fact, full-text searching was often more successful than a search of DC metadata. It provided no real way to collocate authors, titles, or subjects; it was essentially free-form tagging behind the mask of a metadata structure. Nor did it allow for meaningful alphabetical browse displays, because competing content standards made aggregations of DC metadata unmanageable. In the early 2000s, the Government of Canada adopted DC as the metadata scheme for all Canadian government web pages, but the scheme soon proved useless and unwieldy to them. The early promise of Dublin Core, to be an easy-to-use and universal metadata scheme, was soon broken: a scheme designed to appeal to the lowest common denominator proved unusable for most searches.

 

Still, Dublin Core survived because it attracted a dedicated group of fervent enthusiasts who artificially propped it up. They made enormous efforts to create applications that could use Dublin Core (such as DSpace, which now has many superior competitors), and many believed that the standard would soon make MARC obsolete, so they continued their work towards that goal. They established the notion of "application profiles" in DC to make it function more like MARC. The reasons that DC needed to be propped up to survive are clear. First, its creators never fully realized that DC would not fit into the market economy that libraries work in. Second, the scheme was not created to fulfill an information retrieval need; it was created to eliminate MARC. Thus Dublin Core has never been commercially successful, despite the fact that the Dublin Core Metadata Initiative now boasts a CEO and a band of faithful supporters. Few commercial systems use it, it lacks a central database of records, and whatever applications exist to create Dublin Core metadata are largely unused and unknown. Dublin Core never fit into the information retrieval ecology. It has always been an outsider trying to fit in. Indeed, few applications now use the scheme; it is quickly losing ground to MODS.

 

Now many of these same DC enthusiasts are taking the same approach with linked data that they took with Dublin Core. They are trying to push it on the library community with little or no explanation of its value or of the need for it. They have not explained why linking at the word level is better than linking at the document level, or why libraries need to do it.

 

It's almost as if they have given up on replacing MARC with Dublin Core and are turning to the next new thing to give it a try instead. Linked data is a largely untested tool in search of a problem to solve. There is no market for it and no clear commercial need for it. Linked data is part of the long-promised Semantic Web, which linked data fans warn we must all embrace, lest libraries be left out of the next big thing in online information retrieval.

 

I remain skeptical of the potential applicability and success of the Semantic Web and linked data in libraries. Information retrieval has yet to solve some major problems in Internet searching. Two of these problems, word-sense ambiguity and synonymy, are solved by human-created metadata and controlled vocabularies, but computers haven't reached the level where they can effectively disambiguate homonyms or collocate synonyms. For example, when you enter "boxers" in the search box of Google, the world's most advanced search engine, it doesn't know what type of boxers you want and will deliver results about the dog, the athlete, the garment, etc. The search engine does not offer a way to make such searches precise and to return only results about one of a homonym's multiple meanings. If Google can't even link effectively at the document level, how is it going to link at the word level? How will linked data know and correctly link the contextual sense of every homonym it encounters?

 

Manual approaches to linked data will not work because they won't scale to the size of the Internet. We can't even catalog everything now at the document level, so it will be impossible to do it at the word level. I predict that despite all the fervent support that linked data will receive, it will turn out just like Dublin Core: an artificially propped-up, low-quality standard that has no commercial viability and that doesn't effectively solve information retrieval problems. Many trusted DC's developers when they said it would be a wildly successful replacement for MARC. Their trust was misplaced, for MARC is as vital as ever despite the attacks on it, and Dublin Core has failed, despite being propped up. We should not make the same mistake with the linked data movement. We should insist on seeing successful commercial implementations of linked data before we devote resources to it, just as we do with library vendors.

 

 July, 2010