Wednesday, April 30, 2008

Cogent Confabulation

Continuing the "new ideas in artificial intelligence" theme from the Heidegger meeting and the Baum meeting, on Sunday, January 27, 2008 the Emergent Epistemology Salon met to talk about an idea that one of our members had encountered while taking a class from the author.

Cogent Confabulation
Robert Hecht-Nielsen

A new model of vertebrate cognition is introduced: maximization of cogency P(α,β,γ,δ | ε). This model is shown to be a direct generalization of Aristotelian logic, and to be rigorously related to a calculable quantity. A key aspect of this model is that in Aristotelian logic information environments it functions logically. However, in non-Aristotelian environments, instead of finding the conclusion with the highest probability of being true (a popular past model of cognition); this model instead functions in the manner of the ‘duck test;’ by finding that conclusion which is most supportive of the truth of the assumed facts.

When we sat down to discuss the paper, two seemingly unrelated ideas were in play. One had to do with probabilistic reasoning; the other had to do with the uses, setup, and general engineering orientation of thinking software.

First, there appeared to be the rather simple idea of training a model on a linear data set so that it can use what it has seen in the past to predict "the next symbol/word/letter/whatever that should show up", treating the i-th symbol back (for i from 1 to N) as an independent source of evidence. A brute-force Markov model would look up the last N symbols jointly and be done with it, but it needs an entry for each of the roughly (numSymbols)^N possible contexts, whereas Hecht-Nielsen's cogent confabulator only needs N pairwise tables, about N*(numSymbols)^2 counts in total. Nonetheless, this didn't seem like a particularly "deep" idea. Whether it is a good idea depends in part on the structural patterns in the data, and there is already reconstructability analysis to help there.
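A minimal Python sketch of the counting step, assuming a simple layout of one co-occurrence table per offset. The function names `train_links`, `markov_table_size`, and `link_table_size` are hypothetical, and the paper's own architecture of lexicons and knowledge links is richer than this. The size functions count a full order-N Markov model as one next-symbol distribution per N-symbol context, against N square pairwise tables.

```python
from collections import Counter


def train_links(tokens, n):
    """Count how often symbol a appears exactly i positions before symbol b,
    keeping one table per offset i = 1..n (a hypothetical layout)."""
    links = {i: Counter() for i in range(1, n + 1)}
    for t, target in enumerate(tokens):
        for i in range(1, n + 1):
            if t - i >= 0:
                links[i][(tokens[t - i], target)] += 1
    return links


def markov_table_size(num_symbols, n):
    # Full order-n Markov model: one next-symbol distribution per n-symbol context.
    return num_symbols ** (n + 1)


def link_table_size(num_symbols, n):
    # n pairwise tables, each num_symbols x num_symbols.
    return n * num_symbols ** 2


links = train_links(list("abracadabra"), 3)
print(links[1][("a", "b")])  # 2: "ab" occurs twice
print(markov_table_size(10_000, 4), link_table_size(10_000, 4))
# 100000000000000000000 400000000
```

With a 10,000-word vocabulary and N = 4, the joint table is exponential in N while the pairwise tables grow only linearly in N, which is the memory argument in a nutshell.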

Second, there was the idea of cogency, the new standard Hecht-Nielsen proposes for generating a "theory" from co-occurrence tallies. The goal is not to use new data to update the probability of a theory according to the standard understanding of Bayesian rationality. Instead of maximizing the a posteriori probability of a conclusion given the facts, he argues, humans maximize the probability of the facts given the conclusion (what a statistician would call the likelihood). This "probabilistically backwards" number that a theory maximizes is called "cogency". In other words, a cogent confabulator is supposed to leap to something like "the obvious but kind of dumb theory", a defiance-with-a-twist of the growing Bayesian orthodoxy. He supports the claim with an everyday example: after the phrase "company rules forbid taking", the technically most likely next word is a function word such as "the", whereas humans are likely to guess something like "naps".
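To make the cogency/posterior contrast concrete, here is a toy sketch with invented numbers (the probabilities are made up for illustration, not taken from the paper). It approximates cogency P(facts | conclusion) as a product of pairwise conditionals, which assumes the facts are conditionally independent given the conclusion; Hecht-Nielsen's own formulation may include terms this sketch leaves out. The Bayesian score multiplies in the prior P(conclusion), which is what lets a very common word like "the" win.

```python
import math

# Invented toy numbers for illustration only.
# PRIOR[c]: overall frequency of candidate next word c.
PRIOR = {"the": 0.06, "naps": 0.000001}

# LIKELIHOOD[c][f]: P(fact word f appears in the context | next word is c).
LIKELIHOOD = {
    "the":  {"rules": 0.002, "forbid": 0.001, "taking": 0.01},
    "naps": {"rules": 0.02,  "forbid": 0.01,  "taking": 0.2},
}

# Assumed facts taken from "company rules forbid taking".
FACTS = ["rules", "forbid", "taking"]


def log_cogency(candidate):
    """log P(facts | candidate), assuming the facts are conditionally independent."""
    return sum(math.log(LIKELIHOOD[candidate][f]) for f in FACTS)


def log_posterior(candidate):
    """Unnormalized log P(candidate | facts): likelihood times prior (Bayes' rule)."""
    return log_cogency(candidate) + math.log(PRIOR[candidate])


print(max(PRIOR, key=log_cogency))    # naps
print(max(PRIOR, key=log_posterior))  # the
```

With these numbers "naps" is about 2,000 times more cogent than "the", but "the" is 60,000 times more common a priori, so the Bayesian ranking flips.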

Hecht-Nielsen claims that this is a general cortical theory, and his writings gesture toward neurobiology, but we didn't discuss those aspects of his work.

Most of our substantive discussion revolved around understanding the algorithm and then looking for refinements of, and uses for, a cogent confabulator, to get a sense of why it might be as important as Hecht-Nielsen claims. When we first started discussing the papers we weren't that impressed by the ideas. It's a short paper that's not that different from a lot of things you can find elsewhere. But the conversation turned out to be surprisingly productive relative to those expectations.

For example, the mechanism might provide an easily calculated measure of "obviousness to a listener" that a speaker could then correct for in order to produce maximally informative speech. Both sides could do computationally cheap confabulation; the speaker would then compare the impression the conversation would leave against the actual model she was trying to convey. The uttered speech could then be chosen slightly more carefully, to correct whatever is wrong with the (presumptively) mutual confabulation about where the conversation was going. Neat :-)
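A deliberately tiny, hypothetical sketch of that idea. The listener's cheap confabulation is stood in for by a lookup of the most obvious next word; `LISTENER_MODEL` and `corrections_needed` are invented names, and a real version would rank candidates by cogency rather than use a fixed table. The speaker walks through what she intends to say and flags the points where the listener would confabulate something else; those are the words that carry the correction.

```python
# Hypothetical stand-in for the listener's confabulator:
# previous word -> most obvious next word.
LISTENER_MODEL = {
    "company": "rules",
    "rules": "forbid",
    "forbid": "taking",
    "taking": "naps",
}


def corrections_needed(intended, model):
    """Return (position, intended_word, listener_guess) wherever the listener's
    cheap guess from the previous word differs from what the speaker means."""
    corrections = []
    for i in range(1, len(intended)):
        guess = model.get(intended[i - 1])
        if guess != intended[i]:
            corrections.append((i, intended[i], guess))
    return corrections


print(corrections_needed(["company", "rules", "forbid", "taking", "photos"], LISTENER_MODEL))
# [(4, 'photos', 'naps')]
```

Everything the listener would have guessed anyway carries little information; the flagged word is where the speaker's care should go.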

Monday, April 28, 2008

Evolutionary Economics of Intelligence (Take Two)

After the Cybernetic Totalism talk, the Emergent Epistemology Salon wanted to hunt around for something brilliantly new in the ballpark of "general reasoning" that could actually be implemented. To that end we returned to a paper we first discussed in May 2007, meeting again on January 13, 2008 to talk about...

Manifesto for an Evolutionary Economics of Intelligence
Eric B. Baum, 1998

PARTIAL ABSTRACT: We address the problem of reinforcement learning in ultra-complex environments. Such environments will require a modular approach. The modules must solve subproblems, and must collaborate on solution of the overall problem. However a collection of rational agents will only collaborate if appropriate structure is imposed. We give a result, analogous to the First Theorem of Welfare Economics, that shows how to impose such structure. That is, we describe how to use economic principles to assign credit and ensure that a collection of rational (but possibly computationally limited) agents will collaborate on reinforcement learning. Conversely, we survey catastrophic failure modes that can be expected in distributed learning systems, and empirically have occurred in biological evolution, real economics, and artificial intelligence programs, when such structure was not enforced.

We conjecture that simulated economies can evolve to reinforcement learn in complex environments in feasible time scales, starting from a collection of agents which have little knowledge and hence are *not* rational. We support this with two implementations of learning models based on these principles.

Compared to our previous discussion of this paper, we focused more on the algorithm itself than on the broader claims about the relevance of economics to artificial intelligence. We also focused more on a theme the group has been following generally: the ways that biases in an optimizing process are (or are not) suited to the particularities of a given learning problem.

We revisited the oddity that a given code fragment within the system needs both domain knowledge and "business sense" in order to survive. Brilliant insights that are foolishly sold for less than their CPU costs might be deleted, and at the same time the potential for "market charlatanism" might introduce hiccups in the system's ability to learn. By analogy to the real world, is it easier to invent an economically viable fusion technology or to defraud investors with a business that falsely claims to have an angle on such technology?
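A simplified sketch of the auction-style credit assignment under discussion, loosely in the spirit of Baum's systems rather than a faithful reimplementation: the highest bidder buys control from the current owner, pays its bid to that owner, collects any external reward, and pays a fixed compute cost; agents that end up with negative wealth are deleted. All names, bids, rewards, and costs are hypothetical. In the toy run, an "insight" agent goes first and a "closer" agent later earns the reward, yet the insight agent is deleted because it sold control for less than it paid plus its CPU cost.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    wealth: float


def auction_step(agents, bids, owner, rewards, compute_cost):
    """One step of a simplified, auction-based credit scheme (hypothetical sketch).

    The highest bidder buys control of the world from the current owner,
    paying its bid to that owner, then collects any external reward for
    acting and pays a fixed compute cost. Returns the new owner."""
    winner = max(agents, key=lambda agent: bids.get(agent.name, 0.0))
    price = bids.get(winner.name, 0.0)
    winner.wealth -= price
    if owner is not None:
        owner.wealth += price  # credit flows back to whoever set up this state
    winner.wealth += rewards.get(winner.name, 0.0) - compute_cost
    return winner


def cull_bankrupt(agents):
    # Agents with negative wealth are removed from the population.
    return [agent for agent in agents if agent.wealth >= 0]


insight = Agent("insight", wealth=1.0)
closer = Agent("closer", wealth=1.0)
agents = [insight, closer]
owner = auction_step(agents, {"insight": 2.0, "closer": 0.5}, None, {}, compute_cost=1.0)
owner = auction_step(agents, {"insight": 0.0, "closer": 1.5}, owner, {"closer": 10.0}, compute_cost=1.0)
agents = cull_bankrupt(agents)
print([agent.name for agent in agents])  # ['closer']
```

Had the insight agent held out for a price above its bid plus compute cost, it would have survived; the mechanism rewards business sense as much as domain knowledge.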

We also talked about why real economies are so useful: they aggregate information from many contexts and agents into a single data point (price) that can be broadcast to all contexts, helping agents there solve their local problems more effectively. It's not entirely clear how well the analogy from real economies maps onto a general learning algorithm. In the real world you already have a bunch of agents, and there's already all kinds of structure (physical distance, varied resources, and so on). The scope of each agent is already restricted to what it has at hand, and the expensive problem that economics solves is "getting enough of the right information to all the dispersed agents". In a computer, with *random access* memory, the hard part is discovering and supporting structure in giant masses of data in the first place. It seemed that economic inspiration might be a virtue in the physical world because of the constraints the physical world imposes. Perhaps something more cleanly mathematical would be better inside a computer?

Finally, there was discussion around efforts to re-implement the systems described in the paper and how different re-implementation choices might improve or hurt the performance.