Determinism and the Semantic Web
According to the OED Online, determinism is "the doctrine that everything that happens is determined by a necessary chain of causation."
Good old card and online library catalogs provided (and still provide) information discovery by means of a deterministic process. You search for an author, for example, and the catalog searches metadata to find records that match your search. It then generates a search result set for you that includes everything by that author, and nothing by other authors. It's deterministic because it's a chain of events, a chain that is predictable and (normally) occurs the same way every time the same search is done.
Internet search engines don't operate deterministically for the most part. Instead, they use a probabilistic means of achieving information retrieval. For example, you enter a search for "inflation," and the search engine returns results to you that contain the word "inflation." These results are probably what you are looking for, but not definitely.
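The contrast between the two approaches can be sketched in a few lines of Python. This is only a toy illustration: the records, the documents, and the function names `catalog_search` and `keyword_search` are made up for the example, and real search engines rank results with far more than simple word matching.

```python
# Toy data, invented for illustration only.
CATALOG = [
    {"title": "Hard Times", "author": "Dickens, Charles"},
    {"title": "Bleak House", "author": "Dickens, Charles"},
    {"title": "Dickens: A Biography", "author": "Kaplan, Fred"},
]

DOCUMENTS = [
    "Inflation rose as consumer prices climbed.",
    "Check tire inflation before a long drive.",
    "Cosmic inflation describes the early universe.",
]

def catalog_search(author):
    """Catalog style: exact match on the author metadata field."""
    return [r["title"] for r in CATALOG if r["author"] == author]

def keyword_search(word):
    """Keyword style: return any document whose text contains the word."""
    word = word.lower()
    return [d for d in DOCUMENTS if word in d.lower()]

by_dickens = catalog_search("Dickens, Charles")
about_inflation = keyword_search("inflation")
```

The catalog search returns only the two titles whose author field matches, and leaves out the biography even though "Dickens" appears in its title. The keyword search returns all three documents, and only one of them is about economics.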
The Semantic Web returns us to a deterministic method of information retrieval. Web pages and data on the web will be marked up with standard uniform resource identifiers (URIs) that will work like metadata records in a library catalog. The level of indexing in the Semantic Web can be much more granular than in library catalogs, for it can index at the word level.
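As a rough sketch of what that markup buys, here is a tiny set of subject-predicate-object triples held in plain Python tuples. Every URI below is a hypothetical placeholder under example.org, not a real vocabulary, and `subjects_with` is a made-up helper; a real system would use an RDF store and a query language.

```python
# Hypothetical URIs under example.org; none of these are real vocabularies.
EX = "http://example.org/"

TRIPLES = [
    (EX + "doc/1", EX + "vocab/creator", EX + "person/charles-dickens"),
    (EX + "doc/2", EX + "vocab/creator", EX + "person/charles-dickens"),
    (EX + "doc/3", EX + "vocab/subject", EX + "person/charles-dickens"),
]

def subjects_with(predicate, obj):
    """Return every subject linked to obj by predicate (exact URI match)."""
    return [s for (s, p, o) in TRIPLES if p == predicate and o == obj]

written_by = subjects_with(EX + "vocab/creator", EX + "person/charles-dickens")
```

Because the match is on identifiers rather than on words, the lookup returns the two documents created by that person and excludes the one that is merely about him, every time the same query is run.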
There are a couple of problems with the Semantic Web. One is that it doesn't really exist yet, at least in terms of popular applications. The second problem is a trend that I am observing in which systems generate Semantic Web URIs, metadata, and triples probabilistically.
For example, services such as OpenCalais Semantic Proxy and Calais Viewer take raw text and try to semanticize it. Generating the markup this way dilutes the deterministic advantages that the Semantic Web promises.
For instance, suppose a text contains the word "pitch." One of these services has to choose among several meanings (tree pitch, baseball pitch, airplane pitch) through a probabilistic process, and often that process will choose wrong. But whatever meaning the service chooses will be hard-wired into the text using Semantic Web markup.
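The failure mode can be illustrated with a deliberately crude disambiguator. This is not how OpenCalais or any real service works; the cue words, the example.org URIs, and the functions `guess_sense` and `annotate` are all assumptions made up for the sketch. It scores each sense of "pitch" by counting cue words in the text, then writes the winning sense into a triple.

```python
# Hypothetical sense URIs and cue words, invented for illustration.
SENSE_CUES = {
    "http://example.org/sense/pitch-baseball": {"ball", "batter", "throw", "inning"},
    "http://example.org/sense/pitch-resin": {"tree", "pine", "sticky", "resin"},
    "http://example.org/sense/pitch-aircraft": {"aircraft", "nose", "angle", "wing"},
}

def guess_sense(text):
    """Pick the sense with the most cue words in the text, plus a confidence."""
    words = set(text.lower().replace(".", " ").replace(",", " ").split())
    scores = {uri: len(cues & words) for uri, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    total = sum(scores.values())
    confidence = scores[best] / total if total else 0.0
    return best, confidence

def annotate(doc_uri, text):
    """Emit a triple for the guessed sense; the confidence is thrown away."""
    sense, _confidence = guess_sense(text)
    return (doc_uri, "http://example.org/vocab/mentions", sense)

text = "The ball rolled to a stop in sticky pitch under the batter."
sense, confidence = guess_sense(text)
triple = annotate("http://example.org/doc/42", text)
```

A human reader sees that the sentence is about sticky tree pitch, but "ball" and "batter" outvote "sticky," so the sketch picks the baseball sense with a confidence of only about two in three. The resulting triple carries no trace of that uncertainty; it looks exactly like one a cataloger would have asserted.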
If the Semantic Web is built this way, it won't be an improvement over the current probabilistic methods that search engines use to enable information retrieval and discovery.
Because artificial intelligence-powered systems are still primitive at effectively collocating and disambiguating information resources, making the Semantic Web work satisfactorily will require human effort on a large scale, similar to the cataloging process that occurs in libraries all over the world.
Many have criticized cataloging for its inability to scale to the size of the web. The Semantic Web, if created properly, will suffer the same criticism.
This may be the one factor that keeps the Semantic Web from becoming mainstream: the inability to automate Semantic Web metadata creation.
