Sunday, December 28, 2008

Statutory Inventions and the Public Domain

This is interesting: USPTO Statutory Invention Registration

Basically it's a way of publishing an invention to the public via the USPTO. It's rarely used, for obvious reasons: it grants no enforceable patent rights.

Question #1: Why don't open source people apply for these USPTO invention registrations? They seem to be the patent-law equivalent of a BSD/MIT license, i.e., "use/extend it for any purpose, but it's still my work and you can't claim it as your own."

The more interesting tidbit there, while obvious, is that it points to a potential source of great technical material. Since 1999, when you file a patent application it is published 18 months after the filing date. Once an application is abandoned, the application is still published by the USPTO and it becomes public domain.
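If you want to know when a given filing should show up in the published record, the 18-month rule is simple date arithmetic. Here's a minimal sketch, assuming the clock runs from a single filing date as described above; in practice it runs from the earliest claimed priority date, and some applicants request non-publication, so treat the result as an estimate. The function names are just illustrative.

```python
from datetime import date
import calendar


def add_months(d: date, months: int) -> date:
    """Shift d forward by `months`, clamping the day to the target month's length."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def expected_publication_date(filing_date: date) -> date:
    # Assumption: publication roughly 18 months after the filing date.
    return add_months(filing_date, 18)


print(expected_publication_date(date(2008, 12, 28)))  # 2010-06-28
```

The clamping matters for filings at the end of a month: something filed on August 31 lands on the last day of February rather than on an invalid date.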

Question #2: Are rejected patents whose appeals have run out then public domain? I can't seem to find a clear answer.

How many rejected or abandoned software patent applications from Microsoft, Oracle, IBM, etc. are out there that contain very valuable algorithms and techniques that are now public domain? Yes, this is a bit like looking for gold in the trash can...

Notes:

The European Patent Office, by treaty, publishes many USPTO patent applications, and honestly it has a better interface for checking the status of your patent than anything I have yet found at the USPTO.

The Patent Reform Act of 2005 (Republican-sponsored) was an attempt to close the publication 'loophole'. The Patent Reform Act of 2007 (Democrat-sponsored) keeps the current publication system in place. Neither is law (yet).

Saturday, December 27, 2008

New Year's Resolutions - Blog more

New Year's resolution: blog more. I've been very busy doing cool stuff at Others Online, and in the process I developed some tunnel vision to stay on task.

Here's to hoping that more blogging will help me see things in a different light more easily, as well as get me more into 'writing mode' to finish the PhD before summer 2009.

Twittering has replaced blogging as my outlet for the second half of the year, yet the 140-character format isn't much good for a personal musing and research blog.

Marshall Kirkpatrick and Data Mining

[somehow this got lost in drafts in my blog - back in August]

Marshall Kirkpatrick's ReadWriteWeb post Four Ad-Free Ways that Mined Data Can Make Money is interesting.

Around 2002 I wrote version 1.0 of the RightNow Tech sentiment analysis software to analyze the positive versus negative overall tone of incoming support requests/emails in the CRM system. I called it 'Emotix', but the Marketing people renamed it SmartSense. Later Steve Durbin and I bolted on a part-of-speech (POS) tagger to get a bit more accuracy, since phrases like 'I am not very happy' and 'I am very angry' require that the modifiers be taken into account.
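To make the modifier problem concrete, here's a minimal sketch of dictionary-based scoring where negators flip and intensifiers scale the next sentiment word. This is NOT the Emotix code and doesn't use a POS tagger; the word lists, weights, and function name are hypothetical stand-ins, and the naive sign flip is exactly the kind of crude heuristic that a tagger helps improve on.

```python
import re

# Hypothetical word scores on a wide numerical scale (negative = unhappy).
LEXICON = {"happy": 3.0, "great": 4.0, "love": 5.0,
           "angry": -4.0, "broken": -3.0, "terrible": -5.0}
# Hypothetical modifiers: intensifiers scale the next sentiment word, negators flip it.
INTENSIFIERS = {"very": 1.5, "extremely": 2.0, "slightly": 0.5}
NEGATORS = {"not", "never", "no"}


def score_text(text: str) -> float:
    """Sum lexicon scores, applying modifiers seen since the last sentiment word."""
    total = 0.0
    multiplier = 1.0
    for token in re.findall(r"[a-z']+|[.!?]", text.lower()):
        if token in ".!?":
            multiplier = 1.0  # modifiers don't carry across sentence boundaries
        elif token in NEGATORS:
            multiplier *= -1.0
        elif token in INTENSIFIERS:
            multiplier *= INTENSIFIERS[token]
        elif token in LEXICON:
            total += multiplier * LEXICON[token]
            multiplier = 1.0  # modifiers apply only to the next sentiment word
    return total


print(score_text("I am not very happy"))  # -4.5
print(score_text("I am very angry"))      # -6.0
```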

Basically, Emotix was tasked with attaching a numerical positive/negative emotional score to each incoming request, so that the queue of requests could be ordered to service angry customers first. We weren't interested in extreme accuracy, just a decent ordering that was fairly predictive.
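Once every request has a score, the queue ordering itself is trivial. A minimal sketch, assuming lower (more negative) scores mean angrier customers; the `Request` type and `service_order` function are hypothetical, not anything from the actual product.

```python
from dataclasses import dataclass


@dataclass
class Request:
    request_id: int
    score: float  # emotional score; more negative = angrier


def service_order(requests: list[Request]) -> list[int]:
    """Return request ids with the angriest (lowest score) first.

    sorted() is stable, so requests with equal scores keep their arrival order.
    """
    return [r.request_id for r in sorted(requests, key=lambda r: r.score)]


queue = [Request(1, 2.0), Request(2, -6.0), Request(3, 0.0), Request(4, -6.0)]
print(service_order(queue))  # [2, 4, 3, 1]
```

If requests stream in continuously, a heap keyed on (score, arrival order) would do the same job without re-sorting the whole queue each time.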

There were two interesting stories from version 1.0. The first concerned the negative/neutral/positive word dictionary. Basically, my office mate and I sat down and compiled a list of every positive and negative word we could find and put them on a wide numerical scale. When it was time for swear words, we shut the door and howled with laughter as we threw mock insults around the room. The Wicked Words book was an invaluable source of inspiration.

When it came time to litmus-test our word ratings, we put co-workers in front of a terminal that would display random words from the list and ask them to agree or disagree with the rating. Needless to say, we had to forewarn everyone that it WOULD be offensive and that this did not constitute any form of harassment. Watching the process was excruciating and hilarious.

The second humorous story concerned testing and training the system on real support messages. My favorite data set was from a well-known customer that made specialty ice cream. Their customers tended to begin each service contact with a large block of text extolling the virtues of the company and its ice cream, with the negative comments about their experience coming last, usually written in apologetic terms. Obviously the creamery wanted the customers with genuinely negative experiences at the top of the queue, since ice cream is all about the eating experience. But how do you filter out, or correct for, the overall 'fan mail' tone of 90% of the requests? Fun stuff to work on. Some details here, others here.
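One possible heuristic for the fan-mail problem (not necessarily what we shipped) is to score each sentence separately and rank by the most negative sentence instead of the total, so a long block of praise can't cancel out a short apologetic complaint. A minimal sketch; the lexicon, weights, and function names are hypothetical.

```python
import re

# Hypothetical lexicon; a real one would be far larger.
LEXICON = {"love": 4.0, "delicious": 4.0, "best": 3.0, "favorite": 3.0,
           "melted": -3.0, "stale": -4.0, "disappointed": -4.0}


def sentence_scores(text: str) -> list[float]:
    """Split on sentence punctuation and sum lexicon scores per sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text.lower()) if s.strip()]
    return [sum(LEXICON.get(w, 0.0) for w in re.findall(r"[a-z']+", s))
            for s in sentences]


def complaint_score(text: str) -> float:
    """Most negative sentence score, or 0.0 if no sentence is negative.

    Summing everything would let the praise outweigh one complaint;
    taking the minimum keeps the complaint visible for queue ordering.
    """
    return min(sentence_scores(text) + [0.0])


msg = ("We love your ice cream. It is our favorite and the best! "
       "Sorry, but the last pint was stale and I was disappointed.")
print(sentence_scores(msg))  # [4.0, 6.0, -8.0]
print(complaint_score(msg))  # -8.0
```

Summed, that message scores +2.0 and would sink to the bottom of the queue; the per-sentence minimum pulls it back up.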

Recently Steve extended it to work with the RNT Voice product and the marketing-automation product as well. The big lesson here as an engineer is that good enough can be just fine, and it will often be used in unexpected ways down the line. The largely un-refactored code is still running and processing billions of textual contacts every year. OK, this is exaggerated a bit, but it hasn't really been rewritten, just optimized frequently.