Saturday, December 27, 2008

Marshall Kirkpatrick and Data Mining

[somehow this got lost in drafts in my blog - back in August]

Marshall Kirkpatrick's RRW post "Four Ad-Free Ways that Mined Data Can Make Money" is interesting.

Around 2002 I wrote version 1.0 of the RightNow Tech sentiment analysis software, which analyzed the overall positive versus negative tone of incoming support requests and emails in the CRM system. I called it 'Emotix', but the Marketing people renamed it SmartSense. Later Steve Durbin and I bolted on a POS tagger to get a bit more accuracy, since phrases like 'I am not very happy' and 'I am very angry' require that the modifiers be taken into account.
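To make the modifier problem concrete, here is a minimal sketch of lexicon scoring where negators and intensifiers change the value of the next sentiment word. It is not the Emotix code: it swaps the POS tagger for a crude "modifier applies to the next sentiment word" rule, and the word lists, weights, and the name score_text are all made up for illustration.

```python
import re

# Hypothetical toy lexicon and weights, invented for this example.
LEXICON = {"happy": 2.0, "great": 2.5, "angry": -3.0, "terrible": -3.0}
INTENSIFIERS = {"very": 1.5, "extremely": 2.0, "slightly": 0.5}
NEGATORS = {"not", "never", "no"}


def score_text(text):
    """Sum lexicon values, letting modifiers act on the next sentiment word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    scale = 1.0      # pending intensifier multiplier
    negate = False   # pending negation flag
    for tok in tokens:
        if tok in NEGATORS:
            negate = True
        elif tok in INTENSIFIERS:
            scale *= INTENSIFIERS[tok]
        elif tok in LEXICON:
            value = LEXICON[tok] * scale
            total += -value if negate else value
            scale, negate = 1.0, False
        else:
            # Any other word ends the reach of pending modifiers.
            scale, negate = 1.0, False
    return total


print(score_text("I am not very happy"))  # negative score
print(score_text("I am very angry"))      # more negative score
```

Without the modifier handling, 'I am not very happy' would come out positive because 'happy' is the only word in the dictionary. A real tagger does this far more carefully than a "next word" rule.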

Basically Emotix was tasked with attaching a numerical positive/negative emotional score to each incoming request, so that the queue of requests could be ordered to service angry customers first. We weren't interested in extreme accuracy, just a decent ordering that was fairly predictive.
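The ordering step itself is simple once each request carries a score. A rough sketch, assuming lower scores mean angrier and that requests arrive as (id, score) pairs; the function name order_queue and the sample data are invented here, not taken from the product.

```python
def order_queue(requests):
    """Return request ids with the most negative scores first.

    `requests` is a list of (request_id, score) pairs in arrival order.
    sorted() is stable, so requests with equal scores keep arrival order.
    """
    return [rid for rid, _score in sorted(requests, key=lambda r: r[1])]


# Hypothetical incoming queue: (request id, emotional score).
incoming = [("a", 1.0), ("b", -4.5), ("c", -3.0), ("d", -3.0)]
print(order_queue(incoming))
```

Only the relative order matters here, which is why a rough score was good enough: being off by a point rarely changes who gets answered first.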

There were two interesting stories to version 1.0. The first concerned the negative/neutral/positive word dictionary. Basically my office mate and I sat down and compiled a list of every positive and negative word we could find and put them on a wide numerical scale. When it was time for swear words, we shut the door and howled in laughter as we threw mock insults around the room. The Wicked Words book was an invaluable source of inspiration.

When it came time to litmus test our word ratings, we put co-workers in front of a terminal that would display random words from the list and ask them to agree or disagree with the rating. Needless to say we had to forewarn everyone that it WOULD be offensive and that this did not constitute any form of harassment. Watching the process was excruciating and hilarious.

The second humorous story concerned testing and training the system on real support messages. My favorite data set was from a well-known customer that made specialty ice cream. Their customers tended to begin each service contact with a large block of text extolling the virtues of the company and its ice cream, with the negative comments on their experience with the ice cream last, usually written in apologetic terms. Obviously the creamery wanted to get the customers with genuinely negative experiences to the top of the queue; ice cream is all about the eating experience. But how do you filter out and correct for the overall 'fan mail' tone of 90% of the requests? Fun stuff to work on. Some details here, others here.

Recently Steve extended it to work with both the RNT Voice product and the marketing-automation product as well. The big lesson here as an engineer is that good enough can be just fine, and often it will be used in unexpected ways down the line. The largely un-refactored code is still running and processing billions of textual contacts every year. OK, this is exaggerated a bit, but it hasn't really been rewritten, just optimized frequently.
