Turing Test Rant

Pablo Nogueira

Updated: 6 July 2012

The Turing Test is advocated by many as a satisfactory test of (machine or human) intelligence. However, it is an anthropomorphic, functionalist, and behaviourist test that relies on the subjective judgement of an arbitrary observer. Certainly, a less circumscribed version is used by humans to attest to human intelligence in everyday situations. But does the test meet scientific standards?

Ironically, that's Turing's point: since we don't know how to formally define intelligence, we had better rely on everyday practical judgement.

Strengths and Weaknesses of Generality

Conducting a verbal conversation with a machine about any topic confers generality on the test. But certain kinds of verbal responses are to be classified as manifestations of intelligence, and others are not. The exact criteria (objective, in the sense of agreed-upon) for making the distinction are left unspecified. The judgement is left to the tester, who has his own criteria (subjective, in the sense of not agreed-upon).

Simulated vs Real – Function vs Structure

Let's phrase the problem in words related to Chinese Room lore: How can we tell whether an entity that appears to have X has the real X or a simulation of X? (This question could be asked of Science as a whole, for example: Do the laws of physics reflect the real world or a simulation (model) obtained with partial information?)

Leaving aside the debate about the nature of X (whether it's an essence, attribute, feature, etc.), if we don't know how to make the distinction other than operationally, then we must content ourselves with ascribing X when the simulation is convincing. But this only reminds us that we don't know how to make the distinction other than operationally. It reminds us that we lack well-defined criteria for establishing X, as well as for recognising how the real X comes about. In short, it reminds us of the frailty and fallibility of our knowledge and judgement.

Examples demonstrating the weakness of functional and operational tests abound. For instance, in his book Robot: The Future of Flesh and Machines [notes], Rodney A. Brooks recounts that many owners of Sony's robot AIBO ascribed to it abilities the company categorically denied were possible [Penguin edition, pages 105-107].

There's an old dichotomy between function and structure. The Turing Test is a reminder that, unavoidably, only the function can be explored when the structure is unknown. We may assume the systematic exploration of the function will eventually reveal the structure, but this presupposes that function is a function of structure, which isn't true in general: several structures may implement the same function, and some may have better features than others (power, scalability, flexibility, etc.). At any rate, the aim of the Turing Test is not to reveal structure but to judge function.
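
As a minimal illustration (a sketch of my own, not an example from the text): two programs with different internal structures compute exactly the same function, so exploring the function alone cannot tell us which structure lies behind it, even though the structures differ in features such as scalability.

    # Two structurally different programs computing the same function.
    # Probing their input/output behaviour cannot tell them apart.

    def fib_recursive(n):
        # Naive recursion: simple structure, exponential running time.
        if n < 2:
            return n
        return fib_recursive(n - 1) + fib_recursive(n - 2)

    def fib_iterative(n):
        # Iteration with two accumulators: different structure, linear time.
        a, b = 0, 1
        for _ in range(n):
            a, b = b, a + b
        return a

    # Functionally indistinguishable on these inputs...
    assert all(fib_recursive(n) == fib_iterative(n) for n in range(20))
    # ...yet one scales far better than the other (try n = 40).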

Newell and Simon's Empirical Stance

One must feel uneasy about the claim that a simulation of X can be built while not understanding X. To adopt an empirical stance is to beg the question. The empirical stance is the proposition put forth by Allen Newell and Herbert Simon in their Physical Symbol System Hypothesis, which states that questions of AI are a matter of empirical enquiry and research, that is, they can only be proved or disproved empirically. However, this proposition is only partially decidable and its contrapositive is forbidden: it will be proved that machines can do X if we manage to make them do it, but we are not allowed to claim they cannot do X just because we haven't yet figured out how to. Questions of AI are not to be settled by arguments of principle. That's unfair. If we want to claim the negative we must use an indirect route, an argument, a contradiction. (Arguments are indeed proposed, but then they're questioned, counter-arguments are proposed, and the whole thing gets a spin of its own.) The failures of AI in conjunction with the empirical stance smell of Thomas Aquinas: ‘we cannot know what X is, only what X is not’. Replace X by God, Intelligence, Beauty, etc.

Varieties of Intelligence

Certainly, claims that machines can't in principle do X have been empirically refuted in many areas. Does it follow that we fully understand X? No.

To paraphrase Brooks, ‘something can emerge from simple rules applied relentlessly that is able to do better at a task than an extremely gifted human’. Think of Deep Blue. (And let's acknowledge its rules are anything but ‘simple’.) It seems to have a strategy even though it only performs a very fast tactical search. But the amazing thing is not that Deep Blue was able to beat Kasparov, but that Kasparov was able to beat Deep Blue.
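
A toy sketch of my own (not Deep Blue's actual machinery, which combined specialised hardware, deep alpha-beta search and hand-tuned evaluation) shows the flavour of the point: a bare minimax search applies one simple rule relentlessly down the game tree, yet from the outside its play looks strategic.

    # Minimax for a toy game: take 1 to 3 counters from a pile;
    # whoever takes the last counter wins. One rule, applied relentlessly.

    def minimax(pile, maximising):
        # Score from the maximising player's viewpoint: +1 win, -1 loss.
        if pile == 0:
            # The previous player took the last counter and won.
            return -1 if maximising else 1
        scores = [minimax(pile - take, not maximising)
                  for take in (1, 2, 3) if take <= pile]
        return max(scores) if maximising else min(scores)

    def best_move(pile):
        # Pick the move with the best minimax score for the player to move.
        return max((t for t in (1, 2, 3) if t <= pile),
                   key=lambda t: minimax(pile - t, False))

    print(best_move(10))  # prints 2: it 'plans' to leave a multiple of 4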

Soft-AI positions need not provide satisfactory explanations of human intelligence, just like planes do not provide satisfactory explanations of bird flight. Present-day machines outperform humans in domains that ‘require intelligence’ and have been amenable to symbolic formalisation and implementation in some form of program, even though they ‘fly’ in different ways. After all, calculators are better than people at calculating.

When someone claims computers can't do X, they might implicitly add the proviso ‘the way we do it’, so that a machine doing X should shed light on how we do X.

Mechanising X should belong more to the consequence than to the antecedent of a general theory of X. Seldom is this the case, notwithstanding the claims of Strong-AI supporters, heirs of the behaviourist movement who staunchly believe that observable outward behaviour is enough to explain phenomena.

But why care so much about how we do X? From a Soft-AI perspective, we need not care how we do X. The goal is to build machines that do it, and do it better, and flawlessly. But from a cognitive-science perspective, we do care. The key issue is awareness [more], which seems to be present in many important aspects of intelligence. It remains unexplained and somewhat disregarded.

Minsky's Empirical Stance

Related to Newell and Simon's empirical stance is Marvin Minsky's remark that one way to understand a problem is to write a computer program that solves it. It carries the implicit assumption that writing the program is a drive to find out more about the problem and its solution. It also carries the implicit assumption that any problem is solvable by instrumental means (Weizenbaum's terminology). Taken to the extreme, we should expect hackers to solve the world's problems by writing the programs that solve them.

But programs aren't built ex nihilo. There has to be a starting point, some assumptions, a model of sorts. And that is where the problem shifts. Researchers building simulations of X rally their tools under the assumption that they are sufficient for the task. They already have a model, a computational one. Some have tried and failed, and some have explained the failure by resorting to arguments that partly fall outside science (e.g., philosophy and metaphysics).

We must distinguish between algorithmically described processes (those that can be described formally or symbolically) and processes that are executed by algorithms whose interaction or functioning works but is not entirely understood or explained. For example, the distribution of weights among the nodes of a neural network at the end of training does not shed much light on how recognition works, even though we know how to design and build such networks. Accusing the bearers of this argument of simply refusing to ascribe human attributes to a computer won't do.
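
A small sketch of my own (a toy XOR task in plain numpy, not any particular system discussed here) makes the point concrete: the trained network works, but the weight matrices printed at the end do not by themselves explain how it does what it does.

    # A toy network trained on XOR by gradient descent. The learnt weights
    # 'work', yet reading the numbers reveals little about how they work.
    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    # One hidden layer of 8 sigmoid units, with biases.
    W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
    W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    for _ in range(20000):
        h = sigmoid(X @ W1 + b1)               # hidden activations
        out = sigmoid(h @ W2 + b2)             # network output
        d_out = (out - y) * out * (1 - out)    # gradient at the output layer
        d_h = (d_out @ W2.T) * h * (1 - h)     # gradient at the hidden layer
        W2 -= h.T @ d_out;  b2 -= d_out.sum(axis=0)
        W1 -= X.T @ d_h;    b1 -= d_h.sum(axis=0)

    print(np.round(out, 2))  # the function: (close to) the XOR outputs
    print(W1, W2)            # the 'structure': numbers that explain little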

Finally, after the program is written, is it understood? [cf. Weizenbaum's Incomprehensible Programs].