Musings about artificial intelligence, search engines, machine learning, computational advertising, intellectual property law, social media & widgets, and good beer.
Tuesday, March 22, 2011
Pamela Samuelson on startups and software patents
Two-thirds of the approximately 700 software entrepreneurs who participated in the 2008 Berkeley Patent Survey report that they neither have nor are seeking patents for innovations embodied in their products and services. These entrepreneurs rate patents as the least important mechanism among seven options for attaining competitive advantage in the marketplace. Even software startups that hold patents regard them as providing only a slight incentive to invest in innovation.
Thursday, March 10, 2011
Comments re "The Noisy Channel: A Practical Rant About Software Patents"
The Noisy Channel: A Practical Rant About Software Patents - [My comments cross-posted here]
Daniel, nice writeup.
I worked for a BigCo and filed many patents. It was a mixed bag. The time horizon is so long that even after I’ve been gone for 3.5 years, many of them are still lost in the USPTO. The average time for me to see a granted patent was 5+ years.
Here are my biased opinions:
1) Patents really matter for BigCos operating on a long time horizon. It’s a strategic investment.
2) Patents are nearly worthless for a Startup or SmallCo. The time horizon is way past your foreseeable future, so the whole effort is akin to planning for an alternate reality different from the current business context. Throwing coins in a fountain for good luck is about as relevant. You are simply better off getting a filing date on a provisional design writeup and hiring an engineer with the money you’d spend on patent lawyers.
3) As an Acquiring company evaluating a target, Provisional or Pending Patents are a liability, not an asset. They take time and resources to push to completion for a strategy of deterrence.
4) Patents are mostly ignored in the professional literature. Take Sentiment Analysis as one example: academic publishing on it exploded around 2001, yet there are more than a few older patents discussing good technical work on Sentiment Analysis. I’ve NEVER seen an algorithm from a patent cited in a paper as prior work, and I have seen academic papers whose algorithms were already 90% covered by an older patent… and the papers are cited as ‘novel work’.
5) Finding relevant patents is ludicrously hard. It might be the most challenging IR problem with respect to a corpus, IMO. Different words mean the same thing and vice versa, thanks to the pseudo-ability in a Patent to redefine a word away from its obvious meaning. Two different lawyers rendering the same technical design into a writeup and claims will produce wildly different work product.
6) I’ve seen some doozy granted Patents: things that appear to be either implementations of very old CS ideas in new domains, or worse, stuff that would be a class project for an undergrad.
It’s just plain ugly in this realm.
On Strategic Plans

Sunday, March 06, 2011
Hilarious system calls in the BeOS
Returns 1 if the computer is on. If the computer isn't on, the value returned by this function is undefined.
Returns the temperature of the motherboard if the computer is currently on fire. If the computer isn't on fire, the function returns some other value.
#include <stdio.h>
#include <be/kernel/OS.h>

int main()
{
    printf("[%d] = is_computer_on()\n", is_computer_on());
    printf("[%f] = is_computer_on_fire()\n", is_computer_on_fire());
    return 0;
}
Friday, March 04, 2011
Contractor Needed: HTML/CSS/Javascript Ninja
Thursday, March 03, 2011
Job Post: Software Engineer/Scientist: Ad Serving, Optimization and Core Team
LOCATION: the Rubicon Project HQ in West Los Angeles or Salt Lake City
the Rubicon Project is on a mission to automate buying and selling for the $65 billion global online advertising industry. Backed by $42 million in funding, we are currently looking for the best engineers in the world to work with us.
Team Description
The mission of the Core Team is to build robust, scalable, maintainable and well documented systems for ad serving, audience analytics, and market analysis. Every day we serve billions of ads, process terabytes of data and provide valuable data and insights to our publishers. If building software that touches 500+ million people every month is interesting to you, you'll fit in well here.
Some of the custom software we've built to solve these problems include:
A patented custom ad engine delivering thousands of ad impressions per second with billions of real time auctions daily
A real time bid engine designed to scale out to billions of bid requests daily
Optimization Algorithms capable of scheduling and planning adserving opportunities to maximize revenue
Client side Javascript that performs real-time textual analysis of web pages to extract semantically meaningful data and structures
A web-scale key value store based on ideas from the Amazon Dynamo paper used to store 100s of millions of data points
Unique audience classification system using various technologies such as Solr and Javascript for rich, real-time targeting of web site visitors
Data Mining buying and selling strategies from a torrent of transactional data
Analytics systems capable of turning a trillion data points into real business insight
Job Description
Your job, should you accept it, is to build new systems and new features and to extend the functionality of our existing systems. You will be expected to architect new systems from scratch, add incremental features to existing systems, fix bugs in other people's code and help manage production operations of the services you build. Sometimes you'll have to (or want to) do this work when you are not in the office, so working remotely can't scare you off.
Most of our systems are written in Perl, Java, and C, but we have pieces of Python, Clojure and server-side Javascript as well. Hopefully you have deep expertise in at least one of these; you'll definitely need to have a desire to quickly learn and work on systems written in all of the above.
You should also have worked with and/or designed service-oriented architectures, advanced DB schemas, big data processing, and highly scalable and available web services, and be well aware of the issues surrounding the software development lifecycle. We expect that your resume will itemize your 3+ years of experience, mention your BS or MS in Computer Science and be Big Data Buzzword Compliant.
Bonus points for experience with some of the technologies we work with:
- Hadoop
- NodeJS
- MySql
- Solr/Lucene
- RabbitMQ
- MongoDB
- Thrift
- Amazon EC2
- Memcached
- MemcacheQ
- Machine Learning
- Optimization Algorithms
- Economic Modeling
Monday, February 28, 2011
A note on software teams and individuals
- What direction I am going relative to team goals.
- What specific items I am working on today.
- Does anyone need any help from me?
- Do I need any help with my work?
- Are we as a group going the right direction (towards the goal)?
- Will we meet the timeline and/or functional goals?
- Is there any functional or task ambiguity that needs working out?
- Are any course corrections needed?
- Does this person communicate well and often?
- Does this person have the capability and desire to resolve ambiguity on their own when possible?
Monday, February 07, 2011
RightNow - Our cowboys ride code

RightNow Technologies serves about 10 billion [customer interactions] a year through the companies and institutions it works with. “Every person in North America has used one of our solutions about 25 times,” says Gianforte.
The quality of life here is a huge advantage, but more importantly, says Gianforte, “there’s a ranch saying around here that goes, ‘When something needs to get done, well then, we’re just gonna get ‘er done.’ In many environments, they have to form a committee, pull in consultants and such to make things happen, but our clients appreciate that when something needs to get done, we can easily make that happen because of the work ethic here.”
Sunday, January 30, 2011
The Provenance of Data, Data Branding and "Big Data" Hype
The credibility of where data comes from in all these "big data" plays is absolutely crucial. Waving hands re "algorithms" won't cut it. (@nealrichter tweet, Jan 27, 2011)
- Web analytics - crunch web traffic and distill visitation and audience analytics reports for web site owners. Often they use these summaries to make decisions and sell their ad-space to advertisers.
- Semantic Web APIs - crunch webpages, tweets etc and return topical and semantic annotations of the content
- Comparison shopping - gather up product catalogs and pricing to aggregate for visitors
- Web publishers - companies who run websites
- Prediction services - companies that use data to predict something
Friday, January 14, 2011
Finance for Engineers
- His MIT and Stanford MBA students often run off to found start-ups and forget the basic Wilson Lumber case. By the time they approach him for help, it's too late and they are in Mr. Wilson's position: shut down, take in $$ with lots of equity dilution (and loss of control), or slow growth dramatically.
- Also a quote along the lines of "Startups founded by MIT PhDs fail at a rate far above average".
Thursday, December 30, 2010
List of Best Paper awards in CS/AI/ML conferences
http://jeffhuang.com/best_paper_awards.html
Wednesday, December 29, 2010
Managing Open Source Licenses
- A clear company policy is set on what open source licenses are allowed and how developers can use open source code or components.
- The corporate code is cleanly annotated with any third party attributions (see below).
- Open Source code that has bad licenses for commercial usage is identified and removed before release.
- A Bill of Materials is created for each release listing third-party software in the release.
- Necessary copyright or other notices appear in About dialogs, manuals or product websites.
/*
* XYZ.com Third-party or Open Source Declaration
* Name: Bart Simpson
* Date of first commit: 04/25/2009
* Release: 3.5 “The Summer Lager Release”
* Component: tinyjson
* Description: C++ JSON object serializer/deserializer
* Homepage: http://blog.beef.de/projects/tinyjson/
* License: MIT style license
* Copyright: Copyright (c) 2008 Thomas Jansen (thomas@beef.de)
* Note: See below for original declarations from the code
*/
Friday, December 17, 2010
Stochastic Universal Sampling/Selection
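Stochastic Universal Sampling selects N parents with one random offset and N equally spaced pointers over the cumulative fitness, which gives the same expected selection counts as roulette-wheel selection but with far lower variance. A minimal sketch in C (function and variable names are illustrative, not taken from any particular library):
/* A minimal SUS sketch: select num_select individuals from a population of
 * pop_size, proportionally to their (non-negative) fitness, using one random
 * starting point and equally spaced pointers. */
#include <stdio.h>
#include <stdlib.h>

void sus_select(const double *fitness, int pop_size, int *selected, int num_select)
{
    double total = 0.0;
    for (int i = 0; i < pop_size; i++)
        total += fitness[i];

    double step  = total / num_select;                  /* spacing between pointers */
    double point = ((double)rand() / RAND_MAX) * step;  /* one random start in [0, step] */

    double cumulative = 0.0;   /* fitness mass covered so far */
    int i = 0;
    for (int k = 0; k < num_select; k++) {
        /* advance until individual i's slice of the wheel covers the pointer */
        while (cumulative + fitness[i] < point) {
            cumulative += fitness[i];
            i++;
        }
        selected[k] = i;
        point += step;
    }
}

int main(void)
{
    double fitness[] = { 1.0, 2.0, 3.0, 4.0 };
    int selected[4];

    srand(42);
    sus_select(fitness, 4, selected, 4);
    for (int k = 0; k < 4; k++)
        printf("pointer %d -> individual %d\n", k, selected[k]);
    return 0;
}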
Monday, November 22, 2010
Computing, economics and the financial meltdown (a collection of links)
Information technology has enabled the development of a global financial system of incredible sophistication. At the same time, it has enabled the development of a global financial system of such complexity that our ability to comprehend it and assess risk, both localized and systemic, is severely limited. Financial-oversight reform is now a topic of great discussion. The focus of these discussions is primarily on the structure and authority of regulatory agencies. Little attention has been given to what I consider a key issue—the opaqueness of our financial system—which is driven by its fantastic complexity. The problem is not a lack of models. To the contrary, the proliferation of models may have created an illusion of understanding and control, as is argued in a recent report titled "The Financial Crisis and the Systemic Failure of Academic Economics."
Krugman's essay at the time, How Did Economists Get It So Wrong?, gave a nice history of economic ideas, the models behind them, and his interpretation of their correctness.
The theoretical model that finance economists developed by assuming that every investor rationally balances risk against reward — the so-called Capital Asset Pricing Model, or CAPM (pronounced cap-em) — is wonderfully elegant
[snip]
Economics, as a field, got in trouble because economists were seduced by the vision of a perfect, frictionless market system.
[snip]
H. L. Mencken: “There is always an easy solution to every human problem — neat, plausible and wrong.”
“You put chicken into the grinder”—he laughed with that infectious Wall Street black humor—“and out comes sirloin.”
Poormojo: "Any sufficiently advanced financial instrument is indistinguishable from fraud."
Recipe for Disaster: The Formula That Killed Wall Street
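As a reference point for the models named above: CAPM prices an asset's expected return as E[R_i] = R_f + beta_i * (E[R_m] - R_f), where beta_i is the asset's covariance with the market portfolio divided by the market's variance. The Wired piece covers David X. Li's Gaussian copula, the formula widely used to model default correlation when pricing CDOs.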
Wednesday, October 27, 2010
Review of "Learning to Rank with Partially-Labeled Data"

Tuesday, October 26, 2010
Stanford Computational Advertising course - Fall 2010
Monday, August 16, 2010
Strategic Marketing
1) Making decisions by experimentation versus meetings+intuition is crucial.
2) Don't assume your role is to know the answer. Your role is really to work out how to find the answer as quickly as possible.
3) Brand is unimportant when customers can observe you are meeting their needs.
4) Brand is important when they can't search/observe and must reason with less data.
5) Don't price your products based on cost or competition; work out your true value to the customer.
Tuesday, July 27, 2010
Dissertation done!
Really happy to be done.
Advice for working professionals attempting a PhD:
1) pick something relevant to your work.
2) think twice about a theoretical topic.
3) don't make it longer/bigger than necessary.
4) don't grow your family during this time.
I did not follow this advice, and that likely resulted in a 4-year delay. Still, the outcome was great and the topic is now relevant to the new job at Rubicon.
http://nealrichter.com/research/dissertation/
On Mutation and Crossover in the Theory of Evolutionary Algorithms
Abstract:
The Evolutionary Algorithm is a population-based metaheuristic optimization algorithm. The EA employs mutation, crossover and selection operators inspired by biological evolution. It is commonly applied to find exact or approximate solutions to combinatorial search and optimization problems.
This dissertation describes a series of theoretical and experimental studies on a variety of evolutionary algorithms and models of those algorithms. The effects of the crossover and mutation operators are analyzed. Multiple examples of deceptive fitness functions are given where the crossover operator is shown or proven to be detrimental to the speedy optimization of a function. While other research monographs have shown the benefits of crossover on various fitness functions, this is one of the few (if not the only one) to do the inverse.
A background literature review is given of both population genetics and evolutionary computation with a focus on results and opinions on the relative merits of crossover and mutation. Next, a family of new fitness functions is introduced and proven to be difficult for crossover to optimize. This is followed by the construction and evaluation of executable theoretical models of EAs in order to explore the effects of parameterized mutation and crossover.
These models link the EA to the Metropolis-Hastings algorithm. Dynamical systems analysis is performed on models of EAs to explore their attributes and fixed points. Additional crossover deceptive functions are shown and analyzed to examine the movement of fixed points under changing parameters. Finally, a set of online adaptive parameter experiments with common fitness functions is presented.
Finalized April 19, 2010
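To make the operators in the abstract concrete, here is a minimal generational EA sketch on the textbook OneMax problem (maximize the number of 1-bits), using tournament selection, one-point crossover and per-bit mutation. The parameter values and names are illustrative and are not drawn from the dissertation itself.
#include <stdio.h>
#include <stdlib.h>

#define POP_SIZE       50
#define GENOME_LEN     40
#define GENERATIONS    200
#define CROSSOVER_RATE 0.7
#define MUTATION_RATE  (1.0 / GENOME_LEN)

static double frand(void) { return (double)rand() / ((double)RAND_MAX + 1.0); }

/* OneMax fitness: simply count the 1 bits in the genome. */
static int fitness(const int *genome)
{
    int f = 0;
    for (int j = 0; j < GENOME_LEN; j++)
        f += genome[j];
    return f;
}

/* Binary tournament selection: pick two individuals at random, keep the fitter. */
static int tournament(int pop[][GENOME_LEN])
{
    int a = rand() % POP_SIZE, b = rand() % POP_SIZE;
    return (fitness(pop[a]) >= fitness(pop[b])) ? a : b;
}

int main(void)
{
    static int pop[POP_SIZE][GENOME_LEN], next[POP_SIZE][GENOME_LEN];
    srand(42);

    /* random initial population */
    for (int i = 0; i < POP_SIZE; i++)
        for (int j = 0; j < GENOME_LEN; j++)
            pop[i][j] = rand() % 2;

    for (int g = 0; g < GENERATIONS; g++) {
        for (int i = 0; i < POP_SIZE; i += 2) {
            int p1 = tournament(pop), p2 = tournament(pop);

            /* one-point crossover with probability CROSSOVER_RATE;
             * a cut point of 0 simply copies the two parents */
            int cut = (frand() < CROSSOVER_RATE) ? (rand() % GENOME_LEN) : 0;
            for (int j = 0; j < GENOME_LEN; j++) {
                next[i][j]     = (j < cut) ? pop[p1][j] : pop[p2][j];
                next[i + 1][j] = (j < cut) ? pop[p2][j] : pop[p1][j];
            }

            /* independent per-bit mutation on both children */
            for (int j = 0; j < GENOME_LEN; j++) {
                if (frand() < MUTATION_RATE) next[i][j]     ^= 1;
                if (frand() < MUTATION_RATE) next[i + 1][j] ^= 1;
            }
        }
        /* generational replacement */
        for (int i = 0; i < POP_SIZE; i++)
            for (int j = 0; j < GENOME_LEN; j++)
                pop[i][j] = next[i][j];
    }

    int best = 0;
    for (int i = 1; i < POP_SIZE; i++)
        if (fitness(pop[i]) > fitness(pop[best])) best = i;
    printf("best fitness after %d generations: %d/%d\n",
           GENERATIONS, fitness(pop[best]), GENOME_LEN);
    return 0;
}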
Monday, November 02, 2009
RFP: Bounding memory usage in Tokyo Cabinet and Tokyo Tyrant
Here's a start: http://github.com/nealrichter/tokyotyrant_rsshack
I've attempted to do this myself and have not had the time to finish or fully test it. I've asked Mikio for feedback/help finishing this and he's been nearly silent on the request.
At the moment we (myself, Sam Tingleff and Mike Dierken) work around this issue by continuing to play with various TC tuning parameters and restarting the daemon when memory usage grows beyond a comfort level, roughly as sketched below.
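For the curious, a minimal sketch of that watchdog workaround, assuming Linux /proc, a ttserver pid file, and an init script to do the restart; the paths, threshold and restart command below are placeholders, not anything shipped with Tokyo Tyrant.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define RSS_LIMIT_KB (2L * 1024 * 1024)              /* restart above ~2 GB resident */
#define PID_FILE     "/var/run/ttserver.pid"         /* assumed pid file location    */
#define RESTART_CMD  "/etc/init.d/ttserver restart"  /* assumed restart command      */

/* Return VmRSS in kB for the given pid (from /proc), or -1 on error. */
static long rss_kb(long pid)
{
    char path[64], line[256];
    long rss = -1;
    snprintf(path, sizeof(path), "/proc/%ld/status", pid);
    FILE *fp = fopen(path, "r");
    if (!fp) return -1;
    while (fgets(line, sizeof(line), fp))
        if (sscanf(line, "VmRSS: %ld kB", &rss) == 1) break;
    fclose(fp);
    return rss;
}

int main(void)
{
    for (;;) {
        long pid = 0;
        FILE *pf = fopen(PID_FILE, "r");
        if (pf && fscanf(pf, "%ld", &pid) == 1) {
            long rss = rss_kb(pid);
            if (rss > RSS_LIMIT_KB) {
                fprintf(stderr, "ttserver RSS %ld kB over limit, restarting\n", rss);
                system(RESTART_CMD);
            }
        }
        if (pf) fclose(pf);
        sleep(60);   /* check once a minute */
    }
    return 0;  /* not reached */
}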
I'm a C coder and have hacked the internals of BerkeleyDB in the past, so can help review code, trade ideas, etc. We (as a team) don't have the time to work on this at the moment.
If you are interested contact me! We've got a few other ideas for TT enhancements as well...
Wednesday, October 28, 2009
Current Computational Advertising Course and Events
The National Institute of Statistical Sciences is holding a workshop on CompAdvert in early November. The upcoming WINE'09 conference in Rome contains a few accepted papers in CompAdvert. SIGKDD 2010 mentions it in the CFP.
Luckily the search engine results for 'Computational Advertising' are still free of noise: Google, Yahoo, Bing.
Prior Events: