Wednesday, April 30, 2008

Cogent Confabulation

Continuing the "new ideas in artificial intelligence" theme from the Heidegger meeting and the Baum meeting, on Sunday January 27, 2008 the Emergent Epistemology Salon met to talk about an idea that one of our members was exposed to by virtue of taking a class from the author.

Cogent Confabulation
Robert Hecht-Nielsen

A new model of vertebrate cognition is introduced: maximization of cogency P(αβγδ | φ). This model is shown to be a direct generalization of Aristotelian logic, and to be rigorously related to a calculable quantity. A key aspect of this model is that in Aristotelian logic information environments it functions logically. However, in non-Aristotelian environments, instead of finding the conclusion with the highest probability of being true (a popular past model of cognition), this model instead functions in the manner of the 'duck test': by finding that conclusion which is most supportive of the truth of the assumed facts.

When we sat down to talk about the idea there were two seemingly unrelated ideas going on. One had to do with probabilistic reasoning and the other had to do with the uses, setup, and general engineering orientation of thinking software.

First, there appeared to be the rather simple idea of training a model on a linear data set so that it can use what it has seen in the past to predict "the next symbol/word/letter/whatever that should show up" independently for the i-th symbol back (for i from 1 to N). Brutally simple Markov modeling would look up the last N symbols all together and be done with it, but the memory cost of the simple Markov model would be roughly (numSymbols)^N whereas the memory cost of Hecht-Nielsen's cogent confabulator would be only N*(numSymbols)^2 (one pairwise tally per lag). Nonetheless, this didn't seem like a particularly "deep" idea. Whether it is a good idea or not depends in part on the structural patterns in the data, and there is already reconstructability analysis to help there.
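To make the memory trade-off concrete, here is a minimal sketch of the pairwise-tally approach. The function names and the scoring details are our own reconstruction, not Hecht-Nielsen's code: for each lag i we keep one table of (symbol at t-i, symbol at t) counts, then score a candidate next symbol by the product over lags of P(context symbol | candidate), which is the "cogency" direction rather than the posterior.

```python
from collections import Counter
from math import log

def train(seq, N=2):
    """Tally, for each lag i in 1..N, joint counts of (symbol at t-i, symbol at t).
    Memory is N tables of at most S*S entries, versus S^N contexts for a full
    order-N Markov table."""
    joint = [Counter() for _ in range(N)]   # joint[i-1][(prev, nxt)]
    totals = Counter()                      # totals[nxt]: how often each symbol appears as a target
    for t in range(N, len(seq)):
        totals[seq[t]] += 1
        for i in range(1, N + 1):
            joint[i - 1][(seq[t - i], seq[t])] += 1
    return joint, totals

def confabulate(joint, totals, context):
    """Pick the candidate c maximizing the product over lags of P(context[-i] | c),
    i.e. the most 'cogent' continuation, working in log space."""
    N = len(joint)
    best, best_score = None, float("-inf")
    for c in totals:
        score = 0.0
        for i in range(1, N + 1):
            p = joint[i - 1][(context[-i], c)] / totals[c]
            if p == 0:
                score = float("-inf")
                break
            score += log(p)
        if score > best_score:
            best, best_score = c, score
    return best

joint, totals = train("abcabcabd", N=2)
print(confabulate(joint, totals, "ab"))  # 'c' (ties broken in favor of the first candidate seen)
```

Note that each lag is consulted independently, which is what keeps the tables small; the full joint distribution over N-symbol contexts is never stored.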

Second, there was the idea of cogency - the new standard proposed by Hecht-Nielsen for generating a "theory" from co-occurrence tallies. The goal was not to use new data to update the probability of a theory according to the standard understanding of Bayesian rationality. The way he puts it is that, instead of maximizing a posteriori probability, humans maximize a priori probability: rather than picking the conclusion most likely given the facts, they pick the conclusion that makes the assumed facts most likely. This "probabilistically backwards" quantity that a theory maximizes is called "cogency". In other words, a cogent confabulator is supposed to leap to something like "the obvious but kind of dumb theory" in a defiance-with-a-twist of the growing Bayesian orthodoxy. There's an attempt to support the claim with examples from ordinary language: given the phrase "company rules forbid taking", the technically most likely next word is some function word like "the", whereas humans are likely to guess something like "naps".
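The "the" versus "naps" example can be shown with a few lines of arithmetic on toy co-occurrence counts (the numbers here are made up for illustration). The posterior P(w | cue) shares a denominator across candidates, so it just tracks the raw co-occurrence count and favors frequent words; cogency P(cue | w) divides by each word's overall frequency, so a rare word that nearly always appears with the cue wins.

```python
# Made-up counts for the cue phrase and two candidate continuations:
joint = {"the": 50, "naps": 5}          # count(cue followed by w)
word_total = {"the": 10000, "naps": 6}  # count(w) anywhere in the corpus
cue_total = 60                          # count(cue)

posterior = {w: joint[w] / cue_total for w in joint}      # P(w | cue): favors frequent words
cogency = {w: joint[w] / word_total[w] for w in joint}    # P(cue | w): favors distinctive words

print(max(posterior, key=posterior.get))  # 'the'
print(max(cogency, key=cogency.get))      # 'naps'
```

The same tallies support both answers; the two standards just read the table in opposite directions.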

Hecht-Nielsen claims that this is a general cortical theory and his writings include gestures toward neurobiology, but we didn't discuss those aspects of his writings.

Most of our substantive discussion revolved around understanding the algorithm, and then looking for refinements of and uses for a cogent confabulator to get a sense of why it might be as important as Hecht-Nielsen claims it is. When we first started discussing the papers we weren't that impressed by the ideas. It's a small paper that's not that different from a lot of things you can find elsewhere. But the productivity of the conversation was impressive relative to those expectations.

For example, the mechanism might offer an easily calculated notion of "obviousness to a listener" that a speaker could then correct for in the course of producing maximally informative speech. Both sides could do computationally cheap confabulation; then the speaker would compare the impression the confabulation would leave to the actual model she was trying to convey. The uttered speech could be something slightly more carefully chosen that corrects what's wrong with the (presumptively) mutual confabulation of where the conversation was going. Neat :-)

Monday, April 28, 2008

Evolutionary Economics of Intelligence (Take Two)

After the Cybernetic Totalism talk, the Emergent Epistemology Salon wanted to hunt around for something brilliantly new in the ballpark of "general reasoning" that could actually be implemented. To that end we looped back to something we talked about back in May of 2007 and met again on Jan 13, 2008 to talk about...

Manifesto for an Evolutionary Economics of Intelligence
Eric B. Baum, 1998

PARTIAL ABSTRACT: We address the problem of reinforcement learning in ultra-complex environments. Such environments will require a modular approach. The modules must solve subproblems, and must collaborate on solution of the overall problem. However a collection of rational agents will only collaborate if appropriate structure is imposed. We give a result, analogous to the First Theorem of Welfare Economics, that shows how to impose such structure. That is, we describe how to use economic principles to assign credit and ensure that a collection of rational (but possibly computationally limited) agents will collaborate on reinforcement learning. Conversely, we survey catastrophic failure modes that can be expected in distributed learning systems, and empirically have occurred in biological evolution, real economics, and artificial intelligence programs, when such structure was not enforced.

We conjecture that simulated economies can evolve to reinforcement learn in complex environments in feasible time scales, starting from a collection of agents which have little knowledge and hence are *not* rational. We support this with two implementations of learning models based on these principles.

Compared to the previous discussion on this paper, we were more focused on the algorithm itself instead of on the broader claims about the relevance of economics to artificial intelligence. We were also more focused on a general theme the group has been following - the ways that biases in an optimizing process are (or are not) suited to the particularities of a given learning problem.

We covered some of the same ideas related to the oddness that a given code fragment within the system needed both domain knowledge and "business sense" in order to survive. Brilliant insights that are foolishly sold for less than their CPU costs might be deleted; at the same time, the potential for "market charlatanism" might introduce hiccups in the general system's ability to learn. By analogy to the real world: is it easier to invent an economically viable fusion technology or to defraud investors with a business that falsely claims to have an angle on such technology?

We also talked about the reasons real economies are so useful - they aggregate information from many contexts and agents into a single number (price) that can be broadcast to all contexts to help agents in those contexts solve their local problems more effectively. It's not entirely clear how well the analogy from real economies maps onto the idea of a general learning algorithm. You already have a bunch of agents in the real world, and there's already all kinds of structure (physical distance, varied resources, and so on) in the world. The scope of the agents is already restricted to what they have at hand, and the expensive problem that economics solves is getting enough of the right information to all the dispersed agents. In a computer, with *random access* memory, the hard part is discovering and supporting structure in giant masses of data in the first place. It seemed that economic inspiration might be a virtue in the physical world because of the necessities that world imposes. Perhaps something more cleanly mathematical would be better inside a computer?

Finally, there was discussion around efforts to re-implement the systems described in the paper and how different re-implementation choices might improve or hurt the performance.
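For anyone attempting such a re-implementation, the core structural rule the paper imposes - conservation of money, with control of the world auctioned off each step - fits in a few lines. This is our own hedged sketch, not Baum's code: `Agent`, `hayek_step`, and `env_step` are hypothetical names, and the bidding `rule` is a stub where Baum's system would have an evolved program.

```python
class Agent:
    """Stub for one of the system's agents: `rule` maps a world state to a
    (bid, action) pair. In the paper the rules are evolved, not hand-written."""
    def __init__(self, rule):
        self.rule = rule
        self.wealth = 10.0  # arbitrary starting capital

def hayek_step(agents, state, env_step, prev_winner):
    """One auction round: the highest bidder buys control of the world, its
    payment going to the previous owner (conservation of money); it then acts
    and collects any external reward from the environment."""
    bids = [(agent.rule(state), agent) for agent in agents]
    (bid, action), winner = max(bids, key=lambda b: b[0][0])
    winner.wealth -= bid
    if prev_winner is not None:
        prev_winner.wealth += bid  # payments flow only between agents
    state, reward = env_step(state, action)
    winner.wealth += reward        # reward is the only money entering the system
    return state, winner
```

An agent profits only when the reward plus the next winner's bid exceeds what it paid, which is the structure the paper argues forces otherwise selfish modules to collaborate.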

Wednesday, February 13, 2008

Heideggerian A.I.

After reading Lanier's article there was some discussion about potential in the field of Artificial Intelligence and the perception that it didn't seem to have any brilliantly new ideas about "general reasoning". Machine learning techniques from the 60's are in some senses still the state of the art. Or are they? With this background, we thought it would be interesting to spend some time trying to find something new and good in AI. The Emergent Epistemology Salon met on December 16, 2007 to discuss...

Why Heideggerian AI failed and how fixing it would require making it more Heideggerian
Quoting from Hubert L. Dreyfus's text (links added):

As luck would have it, in 1963, I was invited by the RAND Corporation to evaluate the pioneering work of Allen Newell and Herbert Simon in a new field called Cognitive Simulation (CS)...

As I studied the RAND papers and memos, I found to my surprise that, far from replacing philosophy, the pioneers in CS had learned a lot, directly and indirectly from the philosophers. They had taken over Hobbes' claim that reasoning was calculating, Descartes' mental representations, Leibniz's idea of a "universal characteristic" – a set of primitives in which all knowledge could be expressed – Kant's claim that concepts were rules, Frege's formalization of such rules, and Russell's postulation of logical atoms as the building blocks of reality. In short, without realizing it, AI researchers were hard at work turning rationalist philosophy into a research program.

At the same time, I began to suspect that the critical insights formulated in existentialist armchairs, especially Heidegger's and Merleau-Ponty's, were bad news for those working in AI laboratories -- that, by combining rationalism, representationalism, conceptualism, formalism, and logical atomism into a research program, AI researchers had condemned their enterprise to reenact a failure.

Dreyfus's proposed solutions are (very generally) to "eliminate representation" by building "behavior based robots" and to program the ready-to-hand.

In our discussion we spent a good chunk of time reviewing Heidegger. Heideggerian scholars are likely to be horrified by the simplification, but one useful way we found to connect the ideas to more prosaic concepts was to say that Heidegger was taking something like flow in Csikszentmihalyi's sense as the primary psychological state people are usually in and the prototypical experience on which to build the rest of philosophy (and by extension through Dreyfus, the mental architecture that should be targeted in AI research).

There was discussion around difficulties with Dreyfus's word choices. Dreyfus is against "symbols" and "representation", but it would seem that he means something more particularly philosophical than run-of-the-mill computer scientists might assume. It's hard to see how he could be objecting to 1's and 0's working as pointers and instructions and a way of representing regular expressions... or how he could object to clusters of neurons that encode/enable certain psychological states that happen to work as neurological intermediaries between perceptions and actions. In some sense these are symbols, but probably not in the way Dreyfus is against symbols. There's a temptation to be glib and say "Oh yeah, symbol grounding is a good idea."

One side track I thought was interesting was the degree to which object oriented programming could be seen as a way for programmers to create explicit affordances over data: methods that dangle off of objects hide potentially vast amounts of detail, leaving a surface that other programmers can use in the course of solving other problems.

Lastly, it's amusing that others were blogging about Heideggerian AI just after we discussed it. The subject must be in the air :-)

Monday, January 21, 2008

Cybernetic Totalism

On Sunday, November 25, 2007 the Emergent Epistemology Salon met to talk about Jaron Lanier's December 2000 essay in Wired "One-Half of a Manifesto: Why stupid software will save the future from neo-Darwinian machines". Here is an excerpt from the beginning of the essay:

I hope no one will think I'm equating cybernetics and what I'm calling cybernetic totalism. The distance between recognizing a great metaphor and treating it as the only metaphor is the same as the distance between humble science and dogmatic religion.

Here is a partial roster of the component beliefs of cybernetic totalism:
1. Cybernetic patterns of information provide the ultimate and best way to understand reality.
2. People are no more than cybernetic patterns.
3. Subjective experience either doesn't exist, or is unimportant because it is some sort of ambient or peripheral effect.
4. What Darwin described in biology, or something like it, is in fact also the singular, superior description of all creativity and culture.
5. Qualitative as well as quantitative aspects of information systems will be inexorably accelerated by Moore's law.

And finally, the most dramatic:
6. Biology and physics will merge with computer science (becoming biotechnology and nanotechnology), resulting in life and the physical universe becoming mercurial; achieving the supposed nature of computer software. Furthermore, all of this will happen very soon! Since computers are improving so quickly, they will overwhelm all the other cybernetic processes, like people, and will fundamentally change the nature of what's going on in the familiar neighborhood of Earth at some moment when a new "criticality" is achieved - maybe in about the year 2020. To be a human after that moment will be either impossible or something very different than we now can know.

At the Salon we were all familiar with at least pieces of this sketched belief system (though in some cases the familiarity went with a degree of contempt). Our conversation wound around each of the issues distinctly with an effort to pull out some of the details and see what we thought of them specifically.

Once we had developed a common understanding of the six points there was some discussion about the degree to which the ideas were coherent. Do these ideas actually hang together? Is the final eschatological point really entailed by the first five? If so, do you need all five? And assuming it all hangs together, is everything true?

The truth question shifted the conversation to our own agreements and disagreements about the state of computer science and artificial intelligence, but eventually we swung around to talk about the beliefs in more "religious" terms. How is cybernetic totalism psychologically stable? Leaving aside issues of truth, what do people get from being cybernetic totalists that causes them to hold onto and spread the ideas?

In the last part of the essay, Lanier explained why he didn't think the future would be as bleak as all that, but due to time constraints we mostly didn't talk about his vision of where things were headed and why.