Thursday, April 24, 2008

AI Luminaries and their claims

This is a bit of a rant. In our weekly AI colloquium we have covered two high-profile AI people: Dr. Robert Hecht-Nielsen and his Confabulation theory, and Jeff Hawkins with his book On Intelligence plus his ideas around Hierarchical Temporal Memory.

Neither of the two is terribly famous in AI academic circles per se, but they do get some press. Both are founders of very successful businesses, HNC (now Fair Isaac) and Palm/PalmPilot. Particularly in the case of HNC's use of AI for fraud detection in financial transactions, I am very impressed. Yet this is where it gets ugly.

If you read Hawkins's book or listen to a YouTube lecture of Dr. Hecht-Nielsen, you would think these guys have invented the next great AI of the 'I have modeled the brain' variety. They both demonstrate very interesting neural-network-inspired architectures and basic computation, complete with deep layering, feedback loops and other structures that NN people have known about for years.

Yet both of them repeat similar claims that this is how the brain actually works, and that with this architecture real cognition is possible, even potentially trivial. Hogwash. Batshit insanity.

Those aren't my words; they were used to describe Stephen Wolfram's work on cellular automata and his claims that his flavor of CAs can build anything and describe all physics. He makes lots of strong claims about a theory he spent decades toiling on nearly alone.

Would Peter Norvig ever make these types of claims? I tend to think no way.

First, how can you make the claim that this architecture is really how the brain works and as such will lead to cognition or reasoning? To me that is what they are: architectures of computation.

Where are the programs? Where is the embedded/encoded/learned method that can actually reason in some logical or nearly logical fashion? I.e., chaining facts together to create derived knowledge? Picking apart disparate background statements/data to answer queries for information? How does this architecture answer the question of what the necessary and sufficient conditions are to produce computerized general reasoning?

It's a massive leap of faith for me to take for granted that a neural architecture, with all its bells and whistles, will just lead to reasoning. Doesn't it take just one counter-example, something that such an architecture and its associated learning methods cannot learn correctly, to break the bold claims?

To me that is a central lesson of genetic algorithm theory. For years people went around insisting that the GA was a better optimizer (mousetrap) than all that had come before. They invented theories to describe its behavior and made bold claims. Then Wolpert and Macready came along and showed the No Free Lunch theorems, which basically blew many of the bolder claims of GA superiority to hell. The result has since spread into general machine learning as well.
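As I recall the Wolpert and Macready (1997) statement, it says that when performance is averaged over all possible objective functions f, no search algorithm beats any other: for any two algorithms a_1 and a_2, and any number of function evaluations m,

```latex
\sum_{f} P(d^y_m \mid f, m, a_1) \;=\; \sum_{f} P(d^y_m \mid f, m, a_2)
```

where d^y_m is the sequence of cost values the algorithm has seen after m evaluations. Any superiority on one class of problems is paid for with inferiority on another.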

I think people, particularly ones in AI with monster egos, need to exercise some humility and not make such strong claims. Didn't that type of behavior contribute to the AI Winter?

Every time I hear or read these types of claims my mind sees an image of Tom Hanks in Cast Away dancing around his fire and saying "Yes! Look what I have created! I have made fire!! I... have made fire!" The horrible irony of the scene is that he remains fundamentally lost: he couldn't find his island on a map, and he is no closer to finding home as a result of making his fire.

Sunday, April 20, 2008

Series of Economist Stories on Digital Nomads

During a flight to Silicon Valley last week for AdTech I read a special report in the Economist on digital nomads. The intro article sets the tone and talks about how the mobility of cellular devices is obviously, and quickly, changing the world. I love the description of early digital nomads as more akin to astronauts who must carry everything with them (cables, disks, dongles, etc.).

My comments:
I do tend to carry a good inch of paper everywhere I go, 90% of which is CS/AI research papers or dissertation notes to work on as I have the time. Opening the notebook is way too much of a hassle as a mobile reading device (on an airplane). The author also states that engineers at Google tend to carry only a smart phone of some kind and no laptop when traveling. Is this true? I find it a bit hard to believe unless they are managers and not active coders.. I can't imagine writing code on a BlackBerry or, worse, an iPhone.
I'll admit I have no smart phone and do carry a laptop everywhere. I'd like it to be smaller, but not at the cost of horsepower or decent display size. I tried using a Microsoft-powered Samsung smart phone and hated it: whatever it gave me in cool new functions, I sacrificed in basic phone function.

The next article in the series discusses the newfound benefits and costs of our ability to work anywhere and everywhere.

My comments:
My laptop is definitely a desktop replacement for me. I want mobility.. but I am old-fashioned enough to want a regular desk to work at.. even if that is my current count of three different ones at different times of the week. I have tried the coffee shop thing.. it works to a point, but I find myself only being efficient for the first 2-3 hours; then it goes to hell as I start hearing and being distracted by the conversations around me. I suppose it would be fine if I needed to do mostly emailing and short-attention-span coding. I also think that most business conversations are sensitive enough that holding them in a public space is not acceptable in my mind... nothing to hide, yet why broadcast every mundane and not-so-mundane detail to the people around you?
The parts about culture clashes between old cubicle work and new mobile work are spot on. It points to the need to trust every employee and to set a tone that what matters is production, not the act of working.

The third article in the series explores the need for new types of spaces and architecture for this new way of working.

My comments:
I loved this one, even if it assumes that I 100% embrace the nomad ethos. Perhaps if I had such a space to work in, and people to work with in that space, I'd abandon some of my older ways. The closest thing I can imagine these 'third places' being is a combination of a student-union building and a college library: very non-uniform places with nooks and crannies for every type of 'work'. The Bozeman paper just had an article on a new business for 'on demand offices' and I saw two others described in a Seattle tech magazine at the airport. Winning idea.. 50% of the reason I go to campus two days a week is for socialization.

Great quote:
These places are "physically inhabited but psychologically evacuated" ... leaving people feeling "more isolated than if the cafe were merely empty".

Great food for thought in these articles. I do notice one thing in my travels that disturbs me: some people's inability to ignore their cell phone or crackberry when they are engaged in a face-to-face conversation or meeting. I was very appreciative of one executive's recent demonstration of 100% ignoring his device when it rang or vibrated.. he didn't even flinch. That was the exception to the rule over the past week.

I need to let this brew more.. at some point I am sure it will spur some good ideas. Perhaps there is an algorithm or platform waiting to be discovered that will spur us to look up from our devices and engage each other again. I suspect a big part of our addiction to them is that they are a much higher-bandwidth information pipe than simple conversation is.

Friday, April 18, 2008

Using human relevance judgements in search and advertising

This is old news on a couple of dimensions. ReadWriteWeb had a post on how Google uses human relevance studies to help judge/QA their search results. This resulted from an interview that Peter Norvig gave to MIT Technology Review and caused some commenting in the blogosphere (New York Times tech blog, Google Blogoscoped). Old news on old news.

We now know that both Yahoo and Microsoft are using (to some degree) human studies to evaluate computational advertising algorithms (see this and this). The correlation between which informational items an algorithm predicts are relevant to a context and which items humans judge relevant becomes the performance metric of your algorithm.
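As a concrete (and entirely invented) illustration of what such a metric might look like, here is a precision-at-k sketch in Python; the ad IDs, ranking and judgment set below are made up, not from any of the studies.

```python
# Hypothetical sketch: scoring an ad-matching algorithm against human
# relevance judgments. All names and data are invented for illustration.

def precision_at_k(ranked_ads, judged_relevant, k=3):
    """Fraction of the algorithm's top-k picks that humans judged relevant."""
    top_k = ranked_ads[:k]
    hits = sum(1 for ad in top_k if ad in judged_relevant)
    return hits / k

# The algorithm's ranked ads for one page, and the human panel's relevant set.
ranked = ["ad_42", "ad_7", "ad_13", "ad_99"]
relevant = {"ad_7", "ad_13", "ad_55"}

print(precision_at_k(ranked, relevant, k=3))  # 2 of the top 3 are relevant -> 0.666...
```

Averaging this over many page/ad contexts gives one crude version of the "correlation with humans" score.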

Question: When will TREC have a computational advertising contest?

Thursday, April 17, 2008

Text Summarization and Advertising

Recently I read another CompAdvert paper (CIKM '07) from the Yahoo group, Just-in-Time Contextual Advertising. They describe a system where the current page is scraped on-line via a JavaScript tag, summarized, and then that summary is passed to servers to match with ad listings. Interesting points are:
  • 5% of page text, carefully chosen, can yield 97+% of full-text advert-matching relevance.
  • The best parts of the document are the URL, referring URL, title, meta tags and headings.
  • Adding in a classification against a topical taxonomy adds to the accuracy of the ad matches.
  • They judged ad-matching relevance against human judgments of ad-to-page relevance.
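For illustration, here is a minimal sketch of that kind of field-based summarization using Python's stdlib HTML parser. This is my own toy, not the paper's system; the field list and the sample page are invented.

```python
# A minimal sketch (not Yahoo's actual system) of summarizing a page from
# the "best" fields the paper identifies: title, meta description, headings.
from html.parser import HTMLParser

class PageSummarizer(HTMLParser):
    FIELDS = {"title", "h1", "h2", "h3"}  # tags whose text we keep

    def __init__(self):
        super().__init__()
        self.parts = []
        self._in_field = False

    def handle_starttag(self, tag, attrs):
        if tag in self.FIELDS:
            self._in_field = True
        elif tag == "meta":
            d = dict(attrs)
            if d.get("name") == "description" and d.get("content"):
                self.parts.append(d["content"])

    def handle_endtag(self, tag):
        if tag in self.FIELDS:
            self._in_field = False

    def handle_data(self, data):
        if self._in_field and data.strip():
            self.parts.append(data.strip())

html = """<html><head><title>Trail Running Shoes</title>
<meta name="description" content="Reviews of lightweight trail shoes">
</head><body><h1>Top Picks</h1><p>Lots of body text...</p></body></html>"""

p = PageSummarizer()
p.feed(html)
summary = " ".join(p.parts)
print(summary)  # body text is dropped; only title, meta and heading survive
```

The resulting short summary would then play the role of the "query" shipped off to the ad servers.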
I found these papers within the last few months as OthersOnline focused on behavioral advertising. In many ways their findings are interesting, affirming and unsurprising. Interesting in that they are pushing the state of the art in advert matching; affirming in that we @ OO are on the right track. Unsurprising in that using the document fields above is the classic approach to indexing webpages and documents.
Of course internet search engines have used this for years (it defines the SEO industry's eco-system), and the old/retired open source engine HtDig has had special treatment of those fields since the late 90s. The difference now is the direction: the documents are the "query" and the hits are the ads. The best part about the method is that it's cheap... JavaScript + the browser becomes your distributed spider and summarizer of the web.
I do love finding these papers.. we just don't have the time or resources to run a study like this and confirm the approach with a paid human-factors study. We just go forward on educated gut feel day to day, and the human measure is whether we get clicks on the ads.
This approach is similar to the one we outlined and implemented before finding this paper. The difference is what we do with the resulting "query": we use the signal to learn a predictive interest model of users.
Still no mention of any relative treatment of words within the same field... one would assume this would move the needle on relevance as well.
I still believe that this type of summarization approach can be used to make an implicit page tagger and social recommender like ... if you can filter the summary based upon some knowledge of the user's real (as opposed to statistical) interests. A key route to auto-personalization of the web.

Friday, April 04, 2008

Itemset mining in data streams

We covered a nice paper by de Graaf et al. of Leiden University in the AI colloquium: Clustering Co-occurrences of Maximal Frequent Patterns in Streams. It deals with the problem of finding frequent itemsets in a data stream. Ideally you want an incremental algorithm for this, with an 'updatable model', rather than being forced to reprocess the entire data/transaction sequence when adding new data. The paper's approach has the extra benefit that the itemsets are clustered by similarity as well. I really enjoy using and learning about algorithms that have nice side-effects.

A rough overview of the algorithm: it performs three basic operations on incoming data. First, it builds a support model of patterns encountered. It does this with a reinforcement-and-decay technique, reinforcing support for patterns encountered in the current transaction and decaying those that aren't. Second, it maintains a geometric organization of itemsets according to a distance metric in a boxed 2-D area. As new data is processed, itemsets' (x,y) coordinates in the box shift around according to their similarity with other patterns. Third, it performs a pattern merging/splitting mechanism to derive new patterns for the model to track. New patterns get a random (x,y) position.
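To make the three operations concrete, here is a toy Python sketch of how I read them. Every constant, name and update rule below is my own guess at the spirit of the method, not the authors' actual algorithm (in particular, I stand in for merge/split by just adding the whole transaction as a pattern).

```python
# Toy sketch of the stream miner's three ingredients, as I read the paper.
import random

DECAY, REINFORCE, STEP = 0.99, 1.0, 0.1  # invented constants

class StreamMiner:
    def __init__(self):
        self.support = {}  # itemset (frozenset) -> decayed support
        self.pos = {}      # itemset -> (x, y) position in the unit box

    def process(self, transaction):
        txn = frozenset(transaction)
        # 1) reinforce patterns contained in this transaction, decay the rest
        for p in self.support:
            if p <= txn:
                self.support[p] += REINFORCE
            else:
                self.support[p] *= DECAY
        # crude stand-in for the merge/split step: track the new pattern
        if txn not in self.support:
            self.support[txn] = REINFORCE
            self.pos[txn] = (random.random(), random.random())
        # 2) pull similar itemsets toward each other in the 2-D box
        tx, ty = self.pos[txn]
        for p, (x, y) in list(self.pos.items()):
            if p != txn and p & txn:
                sim = len(p & txn) / len(p | txn)  # Jaccard similarity
                self.pos[p] = (x + STEP * sim * (tx - x),
                               y + STEP * sim * (ty - y))

m = StreamMiner()
for t in [{"a", "b"}, {"a", "b", "c"}, {"a", "b"}]:
    m.process(t)
# {"a","b"} is contained in every transaction, so it ends with the top support
```

The appeal is exactly what the paper claims: one pass, an always-current support model, and a similarity layout for free.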

After processing some amount of data, you are left with a list of itemsets and their support/frequency, as well as a nice grouping by similarity.

One advantage of their presentation is that it is stripped of all excess complexity. They note clearly that it learns an approximation of what you would get from a full-data scan with a traditional itemset miner. Fine with me.. I don't get hung up on exactness and have lots of faith that incremental model building works well in practice.

The minor flaw of the paper is that they fail to point out (or notice??) that what they have built is a hybrid of a Swarm Intelligence and a Self-Organizing Map. The Swarm/Ant portion comes from the reinforcement & decay of the support model, and the SOM from the geometric clustering of the itemsets. One could duplicate this algorithm in spirit by implementing an Ant System + SOM with the merging/splitting for new-pattern production. By 'Ant System' here I refer to the spirit of an ant system, where you use pheromone reinforcement and decay of a model; actual ants traversing a path in a graph are not necessary. The cells in the SOM would contain a sparse vector of itemsets and apply the standard update rules.

Yet, even as I see the connection, this is a pointy-headed comment. The paper is nice in that the algorithm is presented without flourish in a straightforward way... sometimes using the word 'hybrid' and casting your work that way is just a form of academic buzzword bingo.

I'll definitely look to implement something similar to this ASAP. I may skip the merging/splitting and instead use a standard itemset miner offline over the 'last X transactions' to form an itemset pattern dictionary. Only itemsets in the dictionary will be tracked and clustered within the data stream, and every so often I'll run the offline algorithm to learn new patterns and support to merge into the dictionary.
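A hedged sketch of that offline dictionary-building step, done brute force over a small window; all names, thresholds and sample data here are mine, purely for illustration.

```python
# Brute-force frequent-itemset mining over the 'last X transactions' window,
# producing the dictionary of patterns the streaming side would then track.
from itertools import combinations
from collections import Counter

def mine_dictionary(transactions, min_support=2, max_size=2):
    """Return itemsets (up to max_size items) appearing in >= min_support txns."""
    counts = Counter()
    for txn in transactions:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(txn), size):
                counts[frozenset(combo)] += 1
    return {itemset for itemset, c in counts.items() if c >= min_support}

# A toy window of recent transactions.
window = [{"a", "b"}, {"a", "b", "c"}, {"b", "c"}, {"a"}]
dictionary = mine_dictionary(window)
print(sorted("".join(sorted(s)) for s in dictionary))  # e.g. singletons plus frequent pairs
```

Brute force is fine for a periodic offline pass; a real implementation over large windows would want Apriori-style pruning instead of enumerating every combination.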