2015-02-26

Statements are only statements

A few days ago, in the comments of this post by +Teodora Petkova on Google+, I promised +Aaron Bradley a post explaining why I am uneasy with the reference to things in Tim Berners-Lee's 2006 reference document defining Linked Data. The challenge was to make it readable by seven-year-old kids or marketers, but I'm not sure the following meets this requirement.

When Google launched its Knowledge Graph in 2012 with the tagline "things, not strings", it was not much more than the principles of Linked Data set out six years earlier in the above-mentioned document, but implemented as a Google enclosure of mostly public-source data, with neither an API nor even public reusable URIs. I ranted about that here at the time, and nothing seems to have changed since in that respect.
But something important I missed at the time is a subtle drift between TBL's prose and Google's. The former speaks about things and information about those things. The latter also starts with the term information, but quickly switches to objects and facts:
[The Knowledge Graph] currently contains more than 500 million objects, as well as more than 3.5 billion facts about and relationships between these different objects.
The document uses "thing", "entity" and "object" in various places as apparently broad synonyms, conveying (maybe unwittingly) the (very naive) notion that the Knowledge Graph is a neat projection into data of well-defined "real-world" things-entities-objects, and of proven (true) facts about them. This impression is reinforced by expressions such as "Find the right thing". And actually, that's how most people are ready to buy it: "Don't be evil" implies "Don't lie, just facts". In a nutshell, if you want to know (true, proven, quality-checked) facts about things, just ask Google. It used to be "just ask Wikipedia", but since the Knowledge Graph taps into Wikipedia, it inherits the trust placed in its source.
But similarly naive presentations can be found here and there, uttered by enthusiastic Linked Data supporters. Granted, TBL's discourse avoids any reference to "facts", but it does not close the door either, and through this opening a pervasive neo-Platonist view of the world has rushed in. There are things and facts out there: just represent them on the Web using URIs and RDF, et voilà. The DBpedia Knowledge Base description contains typical sentences of this kind, blurring the ontological status of what is described:
All these [DBpedia] versions together describe 38.3 million things, out of which 23.8 million are localized descriptions of things that also exist in the English version of DBpedia.
It is left to everyone's guess what "existence in the English version" can mean for a thing. What should such documents say instead of "things" and "facts" to avoid this kind of confusion? Simply what they are: databases of statements using names (URIs) and sentences (RDF triples), which just copy, translate, adapt, in a word re-present on the Web, statements already present in documents and data, in a variety of more or less natural, structured, formal, shared or idiomatic languages. As often stressed here (for at least five years), this representation is just another translation.
And, as with statements of any kind in any language, to figure out whether you can trust them, you should be able to track their provenance: the context and time of their utterance. That's for example how Wikidata is intended to work. Look at the image below: nothing like a real-world thing or fact is mentioned, only a statement with its claim and context.
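To make the distinction between "facts" and "statements with provenance" a bit more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Wikidata's actual data model nor any triple store's API: a statement is a triple of names and values, and each stored record keeps the source and date of the statement alongside it. All URIs, class names and values below are hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Statement:
    """A triple, RDF-like: subject and predicate are names (URIs), obj is a URI or a literal."""
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class Record:
    """A statement kept together with its provenance (hypothetical structure)."""
    statement: Statement
    source: str      # URI of the document the statement was copied from
    stated_on: date  # when the statement was made or retrieved


def claims_about(store, subject, predicate):
    """Return every recorded value for (subject, predicate) with its source and date.

    No attempt is made to decide which value is "true": all statements are kept.
    """
    return [
        (r.statement.obj, r.source, r.stated_on)
        for r in store
        if r.statement.subject == subject and r.statement.predicate == predicate
    ]


# Hypothetical names and values, for illustration only.
CITY = "http://example.org/city/Exampleville"
POP = "http://example.org/prop/population"

store = [
    Record(Statement(CITY, POP, "100000"), "http://example.org/source/census", date(2010, 1, 1)),
    Record(Statement(CITY, POP, "98000"), "http://example.org/source/encyclopedia", date(2014, 6, 1)),
]

for value, source, when in claims_about(store, CITY, POP):
    print(value, "according to", source, "as of", when.isoformat())
```

Nothing in this store says which population is the "real" one. Two sources disagreeing is not a contradiction to be resolved by the data: the store just records who said what, and when, and leaves the question of trust to whoever reads the statements.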
The relationship of names and statements to any real-world referents is a deep question, opened by philosophers ages ago, and one which should certainly remain open. In any case, the Web, Linked Data and the Knowledge Graph do not, will not, and should not pretend to close it, whether insidiously or with no evil in mind. Those technologies just provide incredibly efficient ways to exchange, link, access and share statements, based on Web architecture and a minimalist standard grammar. That is indeed a great achievement, no less, but no more. At the end of the day, data are only data, statements are only statements.
