What is DOP?

 

Rens Bod

 

 

It is a fortunate coincidence that the P in DOP can be instantiated in various interesting ways. Standing originally for Data-Oriented Parsing, DOP has also become known as Data-Oriented Processing and Data-Oriented Perception. The underlying idea of DOP is that newly perceived input is understood in terms of previously perceived input. Or more concretely, DOP analyzes new data by probabilistically combining fragments from a corpus of previously analyzed data. This idea has become particularly influential in some of the cognitive and informatics sciences, such as machine learning and natural language processing, and DOP can be seen as a general umbrella that covers most of these approaches.

 

Although the DOP idea has led to some very successful models for linguistic as well as for musical and visual processing, it has some cognitive consequences that not everybody has immediately accepted. One consequence is that humans massively store previous experiences, a view which for a long time was regarded as highly controversial. During the last decade, however, a large body of research has shown that people in fact build a huge fragment memory. In music, people store an enormous number of musical patterns, and in vision, people have a remarkably large visual memory, especially with respect to face recognition. In psycholinguistics, it has been shown that people not only store lexical items, bigrams and collocations, but also frequent phrases and whole sentences, and that such units can be used directly for processing new input.

 

These insights clearly go against the idea that human perception could be modeled by a system of rules, i.e. a grammar. "All grammars leak" is the well-known dictum by Edward Sapir. There are so many ambiguities, continua and gradient categories in language, music and vision that only an approach which takes into account massively stored previous experiences can accurately model human perception.

 

So there is good reason to believe in Data-Oriented Perception. And I also think there is good reason to extend DOP to other fields of cognitive psychology. For many cognitive activities, such as manual reaches, arithmetic operations and problem solving, people store results in memory so that they can be retrieved whenever needed rather than being computed from scratch (Data-Oriented Psychology). And if you believe in Jerry Fodor's dictum that "cognitive science is where philosophy goes when it dies", then DOP could just as well mean Data-Oriented Philosophy. And why not Data-Oriented Problem solving, Data-Oriented Proof theory, and so on?

 

Yet there is one field where common wisdom has it that a system of "rules" works so well that a DOP approach seems useless. That field is Physics. What would Data-Oriented Physics look like? Rather than laws, we would have a corpus of derivations for all known physical phenomena ("derivations" describe each step in linking laws to phenomena). New phenomena can then be explained or predicted by combining sub-derivations of previous phenomena. But does this make sense if we can do the same job with laws only? The answer is that we cannot do the same job with laws only. It is nowadays well known that there are no general (bridge) principles that link laws to (models of) phenomena. Each physical phenomenon has its own way of being linked to laws, usually via approximation schemes, corrections, renormalizations, and the like.

 

Just take the well-known exponential-decay phenomenon in radioactive processes. This phenomenon cannot be derived from the equations of quantum mechanics by some set of general principles. It can only be approximately derived, for example by a Markov approximation over a perturbation expansion of Pauli's equation. But even before you can do this approximation, you first need to create what Nancy Cartwright called a "theory-friendly" description of the phenomenon that will bring it into the theory. You will have to know what boundary conditions can be used, what normalization procedures are valid, and the like. Thus the laws of quantum mechanics alone don't predict anything. And the same holds even for the laws of classical mechanics! In order to fit Newton's equations of motion to an actual phenomenon such as a pendulum, you need to know which assumptions and approximations should be made at which steps in the derivation.

 

Thus for each phenomenon you have to figure out how it can be linked to the relevant laws. Fortunately this does not mean that a resulting link is useless for understanding new phenomena. As every student of physics knows, once you have learned how to fit Newton's equations to a number of phenomena you can use certain derivation steps for a range of other phenomena (for example, parts of the derivation of the pendulum carry over to oscillators). Thomas Kuhn was on the right track when he emphasized the importance of "exemplars" in the training of scientists. And I agree with Ronald Giere that scientists possess a large collection of exemplars which makes it possible for them to recognize a new situation as "similar" to previous situations.

 

It is exactly here that I think Data-Oriented Physics should come in. Kuhn's exemplars are DOP's derivations from laws to phenomena, and Giere's notion of similarity is DOP's analysis mechanism that tries to build new derivations out of previous derivations. The fewer subderivations you need to derive a new phenomenon, the more similar this phenomenon is to previous phenomena. (This has a probabilistic correlate in that fewer subderivations tend to result in a higher probability for the whole derivation.)
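The probabilistic remark can be made concrete with a minimal sketch. It assumes that the probability of a derivation is the product of the probabilities of the sub-derivations it is built from, which is one common way to set up such a model and is not spelled out in the text above. The fragment names and the numbers are hypothetical, invented purely for illustration.

```python
from math import prod

# Hypothetical probabilities of sub-derivations in a corpus of derivations.
# These numbers are made up for illustration and come from no real corpus.
fragment_prob = {
    "pendulum_full": 0.05,           # one large stored derivation
    "small_angle_step": 0.30,        # three smaller stored steps
    "restoring_force_step": 0.40,
    "harmonic_solution_step": 0.25,
}

def derivation_probability(fragments):
    """Assumed model: multiply the probabilities of the fragments used."""
    return prod(fragment_prob[name] for name in fragments)

# The same phenomenon derived from one large fragment...
one_piece = derivation_probability(["pendulum_full"])

# ...or assembled from three smaller fragments.
three_pieces = derivation_probability(
    ["small_angle_step", "restoring_force_step", "harmonic_solution_step"]
)

print(one_piece > three_pieces)
```

Since every factor is at most 1, each extra sub-derivation can only lower the product or leave it unchanged. That is why, under this assumption, fewer sub-derivations tend to give a higher probability, although a single very rare fragment can still lose out to several common ones.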

 

So yes, I also believe in Data-Oriented Physics. Physics is not just about discovering laws or models for phenomena; it is about discovering derivations from laws to phenomena, usually via approximations, corrections, boundary conditions and the like. Finding these derivations can be hard, but once you have found some of them, you can productively re-use parts of them for explaining and predicting new phenomena.

 

Rather than a minimalist system of laws, science should be viewed as a "maximalist" system of known phenomena in the light of which future phenomena are understood. It was perhaps this that W.V.O. Quine envisaged in 1953 when he wrote: "As an empiricist I continue to think of the conceptual scheme of science as a tool, ultimately, for predicting future experience in the light of past experience."

 

I'll soon make a case for Data-Oriented Politics!