Wednesday, April 30, 2008
A new model of vertebrate cognition is introduced: maximization of cogency P(α, β, γ, δ | φ). This model is shown to be a direct generalization of Aristotelian logic, and to be rigorously related to a calculable quantity. A key aspect of this model is that in Aristotelian information environments it functions logically. In non-Aristotelian environments, however, instead of finding the conclusion with the highest probability of being true (a popular past model of cognition), this model functions in the manner of the 'duck test': it finds the conclusion that is most supportive of the truth of the assumed facts.
When we sat down to talk about the idea there were two seemingly unrelated ideas going on. One had to do with probabilistic reasoning and the other had to do with the uses, setup, and general engineering orientation of thinking software.
First, there appeared to be the rather simple idea of training a model on a linear data set so that it can use what it's seen in the past to predict "the next symbol/word/letter/whatever that should show up" independently for the i-th symbol back (for i from 1 to N). A brutally simple Markov model would look up the last N symbols all together and be done with it, but its memory costs would be roughly (numSymbols)^N, whereas the memory costs of Hecht-Nielsen's cogent confabulator would be only about N*(numSymbols)^2 - one pairwise co-occurrence table per lag. Nonetheless, this didn't seem like a particularly "deep" idea. Whether it is a good idea or not depends in part on the structural patterns in the data, and there is already reconstructability analysis to help there.
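To make the storage contrast concrete, here is a minimal sketch (our own illustration with made-up data, not Hecht-Nielsen's code): a full-context Markov table keyed by the last N symbols, next to N independent per-lag pairwise tables.

```python
from collections import Counter

def train_markov(seq, n):
    """Full-context model: one count per distinct (last-n-symbols, next-symbol)
    pair. Worst case ~ numSymbols**(n+1) entries."""
    table = Counter()
    for i in range(n, len(seq)):
        table[(tuple(seq[i - n:i]), seq[i])] += 1
    return table

def train_confab(seq, n):
    """Per-lag pairwise tables: for each lag, count (symbol-at-that-lag, next).
    Worst case ~ n * numSymbols**2 entries."""
    tables = [Counter() for _ in range(n)]
    for i in range(n, len(seq)):
        for lag in range(1, n + 1):
            tables[lag - 1][(seq[i - lag], seq[i])] += 1
    return tables

seq = ("the cat sat on the mat the cat ate the rat " * 20).split()
N = 4
markov = train_markov(seq, N)
confab = train_confab(seq, N)
print(len(markov), sum(len(t) for t in confab))
```

The point of the sketch is only the scaling: the full-context table can need on the order of numSymbols^(N+1) cells, while the per-lag tables top out around N·numSymbols², at the price of throwing away joint structure across the lags.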
Second, there was the idea of cogency - the new standard proposed by Hecht-Nielsen for generating a "theory" from co-occurrence tallies. The goal was not to use new data to update the probability of a theory according to the standard understanding of Bayesian rationality. The way he puts it is that, instead of maximizing a posteriori probability, humans instead maximize a priori probability. This "probabilistically backwards" number that a theory maximizes is called "cogency". In other words, a cogent confabulator is supposed to leap to something like "the obvious but kind of dumb theory" in a defiance-with-a-twist on the growing Bayesian orthodoxy. Hecht-Nielsen attempts to support the claim with examples from ordinary life: given the phrase "company rules forbid taking", the technically most likely next word would be some part of speech like "the", whereas humans are likely to guess something like "naps".
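Our reading of the decision rule, as a minimal sketch: a Bayesian picks the conclusion c maximizing P(c | context), while the confabulator picks the c maximizing P(context | c) - the conclusion that most "expects" the assumed facts. All the counts below are our own invention for illustration, not Hecht-Nielsen's data.

```python
# Toy co-occurrence statistics (made up for illustration).
# counts[c][w] = how often context word w co-occurred with conclusion c.
counts = {
    "the":  {"company": 50, "rules": 40, "forbid": 5, "taking": 30},
    "naps": {"company": 8,  "rules": 9,  "forbid": 9, "taking": 10},
}
prior = {"the": 10000, "naps": 12}  # raw frequency of each conclusion
context = ["company", "rules", "forbid", "taking"]

def likelihood(c):
    """Naive factorization of P(context | c) over the context words."""
    total = sum(counts[c].values())
    p = 1.0
    for w in context:
        p *= counts[c][w] / total
    return p

posterior = {c: prior[c] * likelihood(c) for c in counts}  # up to a constant
cogency = {c: likelihood(c) for c in counts}

print(max(posterior, key=posterior.get))
print(max(cogency, key=cogency.get))
```

Under this naive factorization the two rankings differ exactly by the prior P(c), which is what lets a very frequent filler word like "the" win the posterior while "naps" wins on cogency.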
Hecht-Nielsen claims that this is a general cortical theory, and his writings include gestures toward neurobiology, but we didn't discuss those aspects of his writings.
Most of our substantive discussion revolved around understanding the algorithm, and then looking for refinements of, and uses for, a cogent confabulator to get a sense of why it might be as important as Hecht-Nielsen claims it is. When we first started discussing the papers we weren't that impressed by the ideas. It's a small paper that's not that different from a lot of things you can find elsewhere. But the productivity of the conversation was impressive against those expectations.
For example, the mechanism might provide an easily calculated notion of "obviousness to a listener" that a speaker could then correct for in the course of producing maximally informative speech. Both sides could do computationally cheap confabulation; the speaker would then compare the impression the conversation would leave to the actual model she was trying to convey. The uttered speech could be something slightly more carefully chosen that corrects what's wrong with the (presumptively) mutual confabulation of where the conversation was going. Neat :-)
Monday, April 28, 2008
Manifesto for an Evolutionary Economics of Intelligence
Eric B. Baum, 1998
PARTIAL ABSTRACT: We address the problem of reinforcement learning in ultra-complex environments. Such environments will require a modular approach. The modules must solve subproblems, and must collaborate on solution of the overall problem. However a collection of rational agents will only collaborate if appropriate structure is imposed. We give a result, analogous to the First Theorem of Welfare Economics, that shows how to impose such structure. That is, we describe how to use economic principles to assign credit and ensure that a collection of rational (but possibly computationally limited) agents will collaborate on reinforcement learning. Conversely, we survey catastrophic failure modes that can be expected in distributed learning systems, and empirically have occurred in biological evolution, real economics, and artificial intelligence programs, when such structure was not enforced.
We conjecture that simulated economies can evolve to reinforcement learn in complex environments in feasible time scales, starting from a collection of agents which have little knowledge and hence are *not* rational. We support this with two implementations of learning models based on these principles.
Compared to the previous discussion on this paper, we were more focused on the algorithm itself instead of on the broader claims about the relevance of economics to artificial intelligence. We were also more focused on a general theme the group has been following - the ways that biases in an optimizing process are (or are not) suited to the particularities of a given learning problem.
We covered some of the same ideas related to the oddness that a given code fragment within the system needed both domain knowledge and "business sense" in order to survive. Brilliant insights that are foolishly sold for less than their CPU costs might be deleted, and at the same time the potential for "market charlatanism" might introduce hiccups in the general system's ability to learn. By analogy to the real world: is it easier to invent an economically viable fusion technology, or to defraud investors with a business that falsely claims to have an angle on such technology?
We also talked about the reasons real economies are so useful - they aggregate information from many contexts and agents into a single data point (price) that can be broadcast to all contexts to help agents in those contexts solve their local problems more effectively. It's not entirely clear how well the analogy from real economies maps onto the idea of a general learning algorithm. You already have a bunch of agents in the real world. And there's already all kinds of structure (physical distance, varied resources, and so on) in the world. The scope of the agents is already restricted to what they have at hand, and the expensive problem that economics solves is getting enough of the right information to all the dispersed agents. In a computer, with *random access* memory, the hard part is discovering and supporting structure in giant masses of data in the first place. It seemed that economic inspiration might be a virtue in the physical world due to the necessities imposed by the physical world. Perhaps something more cleanly mathematical would be better inside a computer?
Finally, there was discussion around efforts to re-implement the systems described in the paper and how different re-implementation choices might improve or hurt the performance.
Wednesday, February 13, 2008
Why Heideggerian AI failed and how fixing it would require making it more Heideggerian
Quoting from Hubert L. Dreyfus's text (links added):
As luck would have it, in 1963, I was invited by the RAND Corporation to evaluate the pioneering work of Allen Newell and Herbert Simon in a new field called Cognitive Simulation (CS)...
As I studied the RAND papers and memos, I found to my surprise that, far from replacing philosophy, the pioneers in CS had learned a lot, directly and indirectly from the philosophers. They had taken over Hobbes' claim that reasoning was calculating, Descartes' mental representations, Leibniz's idea of a "universal characteristic" – a set of primitives in which all knowledge could be expressed, -- Kant's claim that concepts were rules, Frege's formalization of such rules, and Russell's postulation of logical atoms as the building blocks of reality. In short, without realizing it, AI researchers were hard at work turning rationalist philosophy into a research program.
At the same time, I began to suspect that the critical insights formulated in existentialist armchairs, especially Heidegger's and Merleau-Ponty's, were bad news for those working in AI laboratories-- that, by combining rationalism, representationalism, conceptualism, formalism, and logical atomism into a research program, AI researchers had condemned their enterprise to reenact a failure.
Dreyfus's proposed solutions are (very generally) to "eliminate representation" by building "behavior based robots" and to program the ready-to-hand.
In our discussion we spent a good chunk of time reviewing Heidegger. Heideggerian scholars are likely to be horrified by the simplification, but one useful way we found to connect the ideas to more prosaic concepts was to say that Heidegger was taking something like flow in Csikszentmihalyi's sense as the primary psychological state people are usually in and the prototypical experience on which to build the rest of philosophy (and by extension through Dreyfus, the mental architecture that should be targeted in AI research).
There was discussion around difficulties with Dreyfus's word choices. Dreyfus is against "symbols" and "representation" but it would seem that he means something more particularly philosophical than run of the mill computer scientists might assume. It's hard to see how he could be objecting to 1's and 0's working as pointers and instructions and a way of representing regular expressions... or how he could object to clusters of neurons that encode/enable certain psychological states that happened to work as neurological intermediaries between perceptions and actions. In some sense these are symbols but probably not in the way Dreyfus is against symbols. There's a temptation to be glib and say "Oh yeah, symbol grounding is a good idea."
One side track I thought was interesting was the degree to which object oriented programming could be seen as a way for programmers to create explicit affordances over data by writing methods that dangle off of objects in a way that hides potentially vast amounts of detail from other programmers to use in the course of solving other problems.
Lastly, it's amusing that others were blogging about Heideggerian AI just after we discussed it. The subject must be in the air :-)
Monday, January 21, 2008
I hope no one will think I'm equating cybernetics and what I'm calling cybernetic totalism. The distance between recognizing a great metaphor and treating it as the only metaphor is the same as the distance between humble science and dogmatic religion.
Here is a partial roster of the component beliefs of cybernetic totalism:
1. Cybernetic patterns of information provide the ultimate and best way to understand reality.
2. People are no more than cybernetic patterns.
3. Subjective experience either doesn't exist, or is unimportant because it is some sort of ambient or peripheral effect.
4. What Darwin described in biology, or something like it, is in fact also the singular, superior description of all creativity and culture.
5. Qualitative as well as quantitative aspects of information systems will be inexorably accelerated by Moore's law.
And finally, the most dramatic:
6. Biology and physics will merge with computer science (becoming biotechnology and nanotechnology), resulting in life and the physical universe becoming mercurial; achieving the supposed nature of computer software. Furthermore, all of this will happen very soon! Since computers are improving so quickly, they will overwhelm all the other cybernetic processes, like people, and will fundamentally change the nature of what's going on in the familiar neighborhood of Earth at some moment when a new "criticality" is achieved - maybe in about the year 2020. To be a human after that moment will be either impossible or something very different than we now can know.
At the Salon we were all familiar with at least pieces of this sketched belief system (though in some cases the familiarity went with a degree of contempt). Our conversation wound around each of the issues distinctly with an effort to pull out some of the details and see what we thought of them specifically.
Once we had developed a common understanding of the six points there was some discussion about the degree to which the ideas were coherent. Do these ideas actually hang together? Is the final eschatological point really entailed by the first five? If so, do you need all five? And assuming it all hangs together, is everything true?
The truth question shifted the conversation to our own agreements and disagreements about the state of computer science and artificial intelligence, but eventually we swung around to talk about the beliefs in more "religious" terms. How is cybernetic totalism psychologically stable? Leaving aside issues of truth, what do people get from being cybernetic totalists that causes them to hold onto and spread the ideas?
In the last part of the essay, Lanier explained why he didn't think the future would be as bleak as all that, but due to time constraints we mostly didn't talk about his vision of where things were headed and why.
Thursday, December 6, 2007
A 20-minute chat at TED (which appears to have been given when the actual algorithm was only in the intuition stage) entitled "Brain science is about to fundamentally change computing".
"Prospects and Problems of Cortical Theory" given at UC Berkeley on October 7, 2005 - it's a little over an hour long and gives all the basics of the theory. (Warning: the words don't perfectly sync with the images... it's a pretty good talk but the medium imposes on you a little.)
This talk, "Hierarchical Temporal Memory: Theory and Implementation" is less chatty and spends more time on the theory. There's a pitch at the end for the software tools his startup wrote.
Significant parts of this material are also covered in his 2004 book On Intelligence and he has working algorithms (designed by Dileep George) with publicly accessible code (if you don't mind some non-standard licensing) that you can find by clicking around the website of his startup company, Numenta. The code is implemented in C++ with an eye towards scaling to clusters and has Python bindings.
Our actual discussion of the material was weaker than normal. Mostly we went over the algorithms and talked about whether they might be able to capture various kinds of cognition and/or concept formation. Part of the problem may have been that we turned out to be a much more literate crowd than a video watching crowd.
(The above link goes to the full online text; alternatively, try Amazon or Google Books)
From the Google Books blurb: "Suppose legislators propose that armed robbers receive life imprisonment. Editorial pages applaud them for getting tough on crime. Constitutional lawyers raise the issue of cruel and unusual punishment. Legal philosophers ponder questions of justness. An economist, on the other hand, observes that making the punishment for armed robbery the same as that for murder encourages muggers to kill their victims. This is the cut-to-the-chase quality that makes economics not only applicable to the interpretation of law, but beneficial to its crafting. Drawing on numerous commonsense examples, in addition to his extensive knowledge of Chicago-school economics, David D. Friedman offers a spirited defense of the economic view of law. He clarifies the relationship between law and economics in clear prose that is friendly to students, lawyers, and lay readers without sacrificing the intellectual heft of the ideas presented. Friedman is the ideal spokesman for an approach to law that is controversial not because it overturns the conclusions of traditional legal scholars--it can be used to advocate a surprising variety of political positions, including both sides of such contentious issues as capital punishment--but rather because it alters the very nature of their arguments. For example, rather than viewing landlord-tenant law as a matter of favoring landlords over tenants or tenants over landlords, an economic analysis makes clear that a bad law injures both groups in the long run."
The book covered basic economic concepts such as economic efficiency, externalities, and Coase's Theorem. The author's honesty about how much work and thinking it takes to get the concept of "property" right is charming and intellectually productive.
When something is owned, in a rather deep sense what's owned is not simply "a thing" but a complicated bundle of rights related to the thing. With "my land", for example, there are: the right to build on the land, the right to not have certain things built on neighboring land, the right to control the movement of physical objects through the airspace above the land, the right to the minerals beneath the surface, the right to have it supported by neighboring land, the right to make loud noises on the edge of the land, and so on and so forth.
Once bundling of rights is acknowledged to be going on in potentially arbitrary ways it opens up the discussion to questions about how rights relating to different things should be bundled, who should initially own the bundles, what sort of transfer schemes should hold. Mr. Friedman takes a position on what should be happening but it's rather complicated. Chapter 5 has a "spaghetti diagram" showing a variety of possible initial assignments of different kinds of rights, further ramified by the relative costs and benefits that accrue to various outcomes after rights have been renegotiated by various means (up front purchase contract, after the fact court dispute, etc).
The "thing that should happen" isn't a single way of doing things but a situationally sensitive rule. It requires estimating the costs and benefits that parties on each side of an allocation of rights face, guessing who is likely to be able to see how many of those costs and benefits (the parties involved, the courts, etc.), recognizing how many people own various rights and how many people any particular agent would have to negotiate with in order to get anything useful, and estimating transaction costs (like the cost of making all these estimates) to boot.
If the initial assignment of rights is done poorly, various game theoretic barriers to collective action can arise. Given the complexity of the decision, this is not necessarily a conclusion that inspires happiness and hope. Mr. Friedman discusses institutions for working through these issues, including an examination of the claim that the best institution for achieving economically efficient outcomes in the long run is common law.
Our discussion of the book ranged rather widely. One of the juiciest veins of thought we found was in the question of bundling rights in novel ways and trying to understand how they might be rebundled by the market over time. For example, the idea of "salesright" (inspired by copyright) was a sort of "horrifying or amazing" concept that fell out of the discussions. Salesright would be "the right to a sale given certain propaganda efforts". If one company advertised at you, and you ended up buying something in their industry (when you wouldn't otherwise have done so) but bought it from one of their competitors... then in some sense the competitor has "stolen a sale" that "rightfully" belonged to the company that paid for the advertisement. (And you thought patents and copyright were bad :-P)
Another theme we examined (that Mr. Friedman mostly ignored) was the similarity between the questions of rights bundling and what, in modern philosophy, is known as Goodman's new riddle of induction.
Saturday, September 29, 2007
A quote from decomplexity's Amazon review: Kauffman's start point is autocatalysis: that it is very likely that self-reproducing molecular systems will form in any large and sufficiently complex chemical reaction. He then goes on to investigate what qualities a physical system must have to be an autonomous agent. His aim is to define a new law of thermodynamics for those systems such as the biosphere that may be hovering in a state of self-organised criticality and are certainly far from thermodynamic equilibrium. This necessitates a rather more detailed coverage of Carnot work cycles and information compressibility than was covered in passing in his earlier books. It leads to the idea that a molecular autonomous agent is a self-reproducing molecular system capable of carrying out one or more work cycles.
But Kauffman now pushes on further into stranger and uncharted territory. The Universe, he posits, is not yet old enough to have synthesised more than a minute subset of the total number of possible proteins. This leads to the fundamental proposition that the biosphere of which we are part cannot have reached all its possible states. The ones not yet attained - the `adjacent possible' as Kauffman terms it - are unpredictable since they are the result of the interaction of the large collection of autonomous agents: us - or rather our genes - and all the other evolving things in the external world. His new fourth law of thermodynamics for self-constructing systems implies that they will try to expand into the `adjacent possible' by trying to maximise the number of types of events that can happen next.

The book covers more than that (see the rest of the quoted review) but we focused on the early part of the book. Kauffman points out that he wasn't really doing science and he's right about that. However, he had a number of ideas arranged in a sequence that made some sense to us... the trick was that in between the ideas there appeared to be high flying prose and analogies to mathematical concepts where perhaps the details of the original math weren't being faithfully imported. Or maybe they were and we just couldn't see it? It was interesting to hypothesize the existence of mechanical connections and see if we could reconstruct some of them.
We spent some time with autocatalytic sets and looked into some assumptions about what it took to make them work right (various kinds of neighborliness of molecular species, differential rates of reaction, etc) especially the presumption that real chemistry possessed the "algorithmic generosity" required. It inspired an interesting analogy to the "debates" between working biologists and theists promoting "intelligent design"... one could imagine people insisting that autocatalysis was a sufficient "algorithm" to explain biogenesis while another group insisted that chemistry had to work in certain ways for the algorithm to successfully operate and that the fact that chemistry did work in such ways was evidence that it had been "designed".
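One way to make the bootstrap question concrete is a toy closure computation (entirely our own sketch, with made-up molecules and catalytic assignments; this is not Kauffman's actual model): whether a "food set" can sustain anything beyond itself depends on whether the catalytic assignments happen to line up - the "algorithmic generosity" question.

```python
# Toy autocatalytic closure (our illustration, not Kauffman's actual model).
# Molecules are strings; a reaction is (reactants, product, catalyst) and can
# fire only when every reactant AND the catalyst are already in the pool.
reactions = [
    (frozenset({"a", "b"}), "ab", "a"),      # catalyzed by a food molecule
    (frozenset({"ab", "b"}), "abb", "ab"),   # catalyzed by an earlier product
    (frozenset({"ab", "a"}), "aba", "abb"),  # catalyzed by another product
]
food = {"a", "b"}

def closure(food, reactions):
    """Iterate to a fixed point: the set of molecules the food set can sustain."""
    pool = set(food)
    changed = True
    while changed:
        changed = False
        for reactants, product, catalyst in reactions:
            if product not in pool and reactants <= pool and catalyst in pool:
                pool.add(product)
                changed = True
    return pool

print(sorted(closure(food, reactions)))
```

Swap the catalyst of the first reaction for one of the later products and the whole set deadlocks - nothing ever forms - which is exactly the kind of dependence on chemistry's cooperativeness that the "design" argument above leans on.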
There was also some discussion around Kauffman's claims that the processes or parameters of evolution could not be pre-stated or predicted in any meaningful way. It seems that he was inspired by theorems about computability but it would have been nice if he'd spent more time wondering if the axioms involved in those theorems really applied to biology at the level of biology that humans are interested in. It appeared that he believed biological systems were doing something "non algorithmic" in the sense that you'd have to know every detail of everything to predict what an ecosystem (or an economy) would think up next. It would have been nice if his analogy for scientific theories was "lossy compressions of reality with possibly calculable error bounds" instead of something more pristine. (Mysterians were mentioned as having a vaguely similar attitude towards cognition... seeming to want to find something that was impossible to understand.)
Sunday, July 15, 2007
The theory of facilitated variation
Gerhart & Kirschner, May 2007
This theory concerns the means by which animals generate phenotypic variation from genetic change. Most anatomical and physiological traits that have evolved since the Cambrian are, we propose, the result of regulatory changes in the usage of various members of a large set of conserved core components that function in development and physiology. Genetic change of the DNA sequences for regulatory elements of DNA, RNAs, and proteins leads to heritable regulatory change, which specifies new combinations of core components, operating in new amounts and states at new times and places in the animal. These new configurations of components comprise new traits. The number and kinds of regulatory changes needed for viable phenotypic variation are determined by the properties of the developmental and physiological processes in which core components serve, in particular by the processes' modularity, robustness, adaptability, capacity to engage in weak regulatory linkage, and exploratory behavior. These properties reduce the number of regulatory changes needed to generate viable selectable phenotypic variation, increase the variety of regulatory targets, reduce the lethality of genetic change, and increase the amount of genetic variation retained by a population. By such reductions and increases, the conserved core processes facilitate the generation of phenotypic variation, which selection thereafter converts to evolutionary and genetic change in the population. Thus, we call it a theory of facilitated phenotypic variation.
This paper is, roughly, an eight page long abstract for Gerhart & Kirschner's book "The Plausibility of Life". It covers a lot of ground idea-wise, with entire chapters in the book compressed down to a few paragraphs in the paper. The paper has a really high idea-to-word density (which is great in some ways) but if you're looking for elaborated concrete examples to ground the theory or inspire your own intuitions, the book is probably the place to go.
A lot of our discussion revolved around laying out the theory of G & K and trying to find equivalent patterns (of conserved structures reused and able to interact via the influence of thin regulatory signals) in the processes of science and the algorithms of machine learning.
Thursday, June 28, 2007
I propose a program for seeking thick, data-rich analogies between learning systems. The goal is to understand why evolution is able to design species; why animals are able to acquire useful behavior patterns; and why communities of scientists are able to find predictively useful theories. Each individual system is already being studied by a large number of system-specific specialists. My proposal offers a framework for enrolling these system-specific research efforts into a single, larger endeavor.
Specifically, I argue that each of the above systems can be understood as a procedure of natural selection undertaken on a certain kind of fitness landscape with a certain kind of variation-making. I argue that if we make the fitness landscape and variation-making central objects of study, we will be able to move past the thin cross-system models that have previously been offered to make rich contact with the data.
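As a toy illustration of making the landscape and the variation-making explicit objects of study (entirely our own sketch, with arbitrary parameters), here is the same selection loop run with two different mutation rates over the same "royal road" style landscape:

```python
import random

def fitness(bits):
    # A toy "royal road" landscape: reward each complete block of four 1s.
    return sum(all(bits[i:i + 4]) for i in range(0, len(bits), 4))

def mutate(bits, rate, rng):
    # The variation-maker: independently flip each bit with probability `rate`.
    return [b ^ (rng.random() < rate) for b in bits]

def hill_climb(rate, steps=3000, n=32, seed=0):
    # Natural-selection-as-procedure: keep a variant when it is at least as fit.
    rng = random.Random(seed)
    cur = [0] * n
    for _ in range(steps):
        cand = mutate(cur, rate, rng)
        if fitness(cand) >= fitness(cur):
            cur = cand
    return fitness(cur)

print(hill_climb(rate=1 / 32), hill_climb(rate=0.3))
```

Holding the landscape fixed and varying only the variation operator (or vice versa) is the kind of cross-system experiment the proposal has in mind; the particular landscape and rates here are arbitrary choices for illustration.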
Having talked about the paper, we found the abstract especially abstract relative to the content of the paper. For example, there's no mention in the abstract of the No Free Lunch Theorem or Occam's Razor or Grue or...