Musings about artificial intelligence, search engines, machine learning, computational advertising, intellectual property law, social media & widgets, and good beer.
Tuesday, December 27, 2011
Stanford's Introduction to Computational Advertising course
Summer Program on Computational Advertising
Tuesday, November 15, 2011
What Software Engineers should know about Control Theory
Control theory is an interdisciplinary branch of engineering and mathematics that deals with the behavior of dynamical systems. The desired output of a system is called the reference. When one or more output variables of a system need to follow a certain reference over time, a controller manipulates the inputs to the system to obtain the desired effect on its output.
- Create a webservice that calls another (or three) for data/inputs, then does X with them.
- Meters the usage of the other web services.
- Your webservice must respond within Y milliseconds with good output or a NULL output.
- Support high concurrency, i.e. do not use too many servers.
- Open Loop, Feed-forward: Requires good model of system inputs and response of the system.
- Closed Loop, Feed-back
- Linear Feedback
- Stability Analysis
- Frequency response
- response time
- Gain Scheduling
- Model Reference Adaptive Systems
- Self-tuning regulators
- Dual Control
- http://research.microsoft.com/en-us/um/people/liuj/cse590k2008winter/
- Joseph L Hellerstein et al "Feedback control of computing systems" 2004 Wiley Google Books Amazon
- Hellerstein 2003 IBM Tech Report "Challenges in Control Engineering of Computing Systems"
- Hellerstein et al "Research challenges in control engineering of computing systems" Volume: 6 Issue: 4, 2010 IEEE Trans on Network and Service Management
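The closed-loop feedback idea from the control theory post above can be sketched in a few lines. This is only an illustration under stated assumptions: the "plant" is a made-up linear latency model, and the names simulatedLatencyMs, makeIntegralController and runLoop are hypothetical, not taken from any of the references. The controller adjusts a concurrency limit so that the measured latency follows a 100 ms reference.

```javascript
// Hypothetical plant model (an assumption for illustration only):
// latency grows linearly with the number of concurrent requests.
function simulatedLatencyMs(concurrency) {
  return 20 + 2 * concurrency;
}

// Integral-style feedback controller: each step moves the concurrency
// limit in proportion to the error between reference and measurement.
function makeIntegralController(referenceMs, gain) {
  return function nextLimit(currentLimit, measuredMs) {
    const error = referenceMs - measuredMs; // positive means latency headroom
    return Math.max(1, currentLimit + gain * error);
  };
}

// Run the closed loop: measure the output, feed the error back into the input.
function runLoop(steps) {
  const controller = makeIntegralController(100, 0.2);
  let limit = 5;
  const history = [];
  for (let i = 0; i < steps; i++) {
    const latency = simulatedLatencyMs(limit);
    history.push({ limit: limit, latency: latency });
    limit = controller(limit, latency);
  }
  return history;
}

const history = runLoop(50);
console.log(history[history.length - 1]);
```

For this made-up plant the update is limit' = (1 - 2*gain) * limit + 80*gain, so the loop only converges for a gain between 0 and 1. That is a toy instance of the stability analysis item in the list above; a real service would need a measured model rather than this assumed one.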
Thursday, November 10, 2011
Open RTB panel - IAB Ad Ops Summit 2011
The clip shows an exchange after Steve from the IAB asked a question about how webpage inventory is described in RTB. I described an example of differentiating a simple commodity, barley.
Two of the major uses of barley in the US are animal feed and malting for making beer. Malting barley has specific requirements in terms of moisture content, protein percentage and other factors. Farmers don't always know what quality their crop will finish at. They count on having two general markets: if the tested quality meets malting standards, then the premium over feed prices can be healthy. A 2011 report noted that malting barley provided a 70% premium over feedstock barley. Growing specific varieties and/or using organic farming methods can provide additional premiums over generic feed barley. The curious can follow the links below.
How does this relate to publishers and advertising and OpenRTB? In my opinion we need several things standardized:
Tuesday, July 12, 2011
SchemaMgr - MySQL schema management tool
Each change is assigned a version number and placed in a file. When the SQL in the file is executed successfully, a special table is set with that version number. Subsequent runs install only the higher versioned files.
It can also be used to reinstall views and stored procedures.
The best practice is to copy the file and change X in the filename and in the $DB_NAME variable.
$ ./bin/schemamgr_X.pl
Usage: either create or upgrade X database
schemamgr_X.pl -i -uUSERNAME -pPASSWORD [-vVERSION] [-b]
updates DB of to current (default) or requested version
schemamgr_X.pl -s -uUSERNAME -pPASSWORD
reinstalls all stored procedures
schemamgr_X.pl -w -uUSERNAME -pPASSWORD
reinstalls all views
schemamgr_X.pl -q -uUSERNAME -pPASSWORD
Requests and prints current version
Optional Params
-vXX -- upgrades upto a specific version number XX
-b -- backs up the database (with data) before upgrades
-nYY -- runs the upgrades against database YY - default is X
All other files are update files with greater than v1 numbers.
build/
|-- create_X_objects_v1_20110615.sql
|-- update_X_objects_v2_20110701.sql
`-- update_X_objects_v3_20110702.sql
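The version rule described above (install only files with a version higher than the one recorded in the database) can be sketched as follows. This is not how the Perl tool is implemented; it is a minimal JavaScript illustration that assumes the filename pattern shown in the build/ listing, and the function names versionOf and pendingFiles are hypothetical.

```javascript
// Extract the version number from names like "update_X_objects_v2_20110701.sql".
// Returns null when the name does not match the assumed pattern.
function versionOf(filename) {
  const match = /_v(\d+)_\d{8}\.sql$/.exec(filename);
  return match ? parseInt(match[1], 10) : null;
}

// List the files that still need to run, in ascending version order.
// targetVersion is optional and mirrors the idea behind the -v flag.
function pendingFiles(filenames, currentVersion, targetVersion) {
  const limit = targetVersion === undefined ? Infinity : targetVersion;
  return filenames
    .map((name) => ({ name: name, version: versionOf(name) }))
    .filter((f) => f.version !== null && f.version > currentVersion && f.version <= limit)
    .sort((a, b) => a.version - b.version)
    .map((f) => f.name);
}

const files = [
  'update_X_objects_v3_20110702.sql',
  'create_X_objects_v1_20110615.sql',
  'update_X_objects_v2_20110701.sql',
];

// With the database at version 1, only the v2 and v3 files remain.
console.log(pendingFiles(files, 1));
```

Sorting numerically on the parsed version, rather than on the filename, keeps v10 after v9.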
Tuesday, June 28, 2011
Managing yourself to tasks and finishing them.
Thursday, April 07, 2011
JSON parsing speed in various Node.JS versions
Out of curiosity I ran some tests on JSON parsing speed in different versions of Node.JS
node.js code:
var sys = require('sys');
var data = "{ \"item_uuid\": \"8ec56438-d3cf-442a-bbf7-7f076f229f35\", \"return_code\": 0, \"data\": [ { \"valid\": true, \"votes\": 2345, \"date\":\"Thu, 07 Apr 2011 15:17:17 EDT\", \"headline\": \"Senate Majority Leader Harry Reid indicates there likely will be a government shutdown on Friday. Lawmakers have been unable to agree on a new federal budget\", \"source\": \"Yahoo News\", \"published\":{\"hour\":\"19\",\"timezone\":\"UTC\",\"second\":\"17\",\"month\":\"4\",\"minute\":\"17\",\"utime\":\"1302203837\",\"day\":\"7\",\"day_of_week\":\"4\",\"year\":\"2011\"} } ] }";
try {
    for(var i = 0; i < 1000000; i++)
    {
        var tmp = JSON.parse(data);
    }
} catch(e) { sys.puts("ERROR: on parsing JSON with v8 parser"); }
sys.puts(data);
var tmp = JSON.parse(data);
sys.puts(JSON.stringify(tmp));
sys.puts("\n DONE \n");
process.exit();
Essentially this re-parses the same example JSON (I created a fake RSS-like JSON package) 1M times.
- Node 0.1.3x: real 0m30.050s
- Node 0.2.6: real 0m30.050s
- Node 0.3.8: real 0m9.915s
- Node 0.4.5: real 0m9.999s
- jsmn: real 0m2.276s
- vjson: real 0m7.465s Note that vjson is a destructive parser, and I had to fix that first.
- Node 0.1.33: v8: 2010-03-17: Version 2.1.5
- Node 0.2.6: v8: 2010-08-16: Version 2.3.8
- Node 0.3.8: v8: 2011-02-02: Version 3.1.1
- Node 0.4.5: v8: 2011-03-02: Version 3.1.8
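The figures above are wall-clock times for the whole process. As a complementary sketch, the parse loop alone can be timed in-process. This assumes a much newer Node than the versions tested above (process.hrtime.bigint is available from Node 10.7), and the helper name timeJsonParse is hypothetical.

```javascript
// Time only the JSON.parse loop, excluding process startup and output.
function timeJsonParse(json, iterations) {
  const start = process.hrtime.bigint();
  let last = null;
  for (let i = 0; i < iterations; i++) {
    last = JSON.parse(json);
  }
  const elapsedNs = process.hrtime.bigint() - start;
  // Convert nanoseconds to milliseconds; also return the last parsed
  // value so the parse result is actually used.
  return { elapsedMs: Number(elapsedNs) / 1e6, last: last };
}

const sample = JSON.stringify({ return_code: 0, data: [{ valid: true, votes: 2345 }] });
const result = timeJsonParse(sample, 100000);
console.log('parsed 100000 times in ' + result.elapsedMs.toFixed(1) + ' ms');
```

Numbers from this loop are not directly comparable with the results listed above, since those include interpreter startup and printing.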
Tuesday, March 22, 2011
Pamela Samuelson on startups and software patents
Two-thirds of the approximately 700 software entrepreneurs who participated in the 2008 Berkeley Patent Survey report that they neither have nor are seeking patents for innovations embodied in their products and services. These entrepreneurs rate patents as the least important mechanism among seven options for attaining competitive advantage in the marketplace. Even software startups that hold patents regard them as providing only a slight incentive to invest in innovation.
Thursday, March 10, 2011
Comments re "The Noisy Channel: A Practical Rant About Software Patents"
The Noisy Channel: A Practical Rant About Software Patents - [My comments cross-posted here]
Daniel, nice writeup.
I worked for a BigCo and filed many patents. It was a mixed bag. The time horizon is so long that even after I’ve been gone for 3.5 years many of them are still lost in the USPTO. Average time for me to see granted patents was 5+ years.
Here are my biased opinions:
1) Patents really matter for BigCos operating on a long time horizon. It’s a strategic investment.
2) Patents are nearly worthless for a Startup or SmallCo. The time horizon is way past your foreseeable future, and thus the whole effort is akin to planning for an alternate reality different than the current business context. Throwing coins in a fountain for good luck is about as relevant. You simply are better off getting a filing date on a provisional design writeup and hiring an engineer with the money you’d spend on Patent lawyers.
3) As an Acquiring company looking at a company to acquire, Provisional or Pending Patents are a liability not an asset. They take time and resources to push to completion for a strategy of deterrence.
4) Patents are mostly ignored in the professional literature. Take Sentiment Analysis as one example. Sentiment Analysis exploded in 2001 w.r.t. Academic publishing, yet there are more than a few older patents discussing good technical work on Sentiment Analysis. I’ve NEVER seen an algorithm in a patent cited in a paper as previous work. And I have seen academic papers with algorithms already 90% covered by an older patent… and the papers are cited as ‘novel work’.
5) Finding relevant patents is ludicrously hard. It might be the most challenging problem in IR w.r.t. a corpus IMO. Different words mean the same thing and vice versa due to the pseudo-ability in a Patent to redefine a word away from the obvious meaning. Two different lawyers rendering the same technical design into a writeup and claims will produce wildly different work product.
6) I’ve seen some doozy granted Patents. Things that appear to be either implementations of very old CS ideas in new domains, or worse, stuff that would be a class project for an undergrad.
It’s just plain ugly in this realm.
On Strategic Plans
Sunday, March 06, 2011
Hilarious system calls in the BeOS
is_computer_on(): Returns 1 if the computer is on. If the computer isn't on, the value returned by this function is undefined.
is_computer_on_fire(): Returns the temperature of the motherboard if the computer is currently on fire. If the computer isn't on fire, the function returns some other value.
#include <stdio.h>
#include <be/kernel/OS.h>
int main()
{
    printf("[%d] = is_computer_on()\n", is_computer_on());
    printf("[%f] = is_computer_on_fire()\n", is_computer_on_fire());
    return 0;
}
Friday, March 04, 2011
Contractor Needed: HTML/CSS/Javascript Ninja
Thursday, March 03, 2011
Job Post: Software Engineer/Scientist: Ad Serving, Optimization and Core Team
LOCATION: the Rubicon Project HQ in West Los Angeles or Salt Lake City
the Rubicon Project is on a mission to automate buying and selling for the $65 billion global online advertising industry. Backed by $42 million in funding, we are currently looking for the best engineers in the world to work with us.
Team Description
The mission of the Core Team is to build robust, scalable, maintainable and well documented systems for ad serving, audience analytics, and market analysis. Every day we serve billions of ads, process terabytes of data and provide valuable data and insights to our publishers. If building software that touches 500+ million people every month is interesting to you, you'll fit in well here.
Some of the custom software we've built to solve these problems includes:
A patented custom ad engine delivering thousands of ad impressions per second with billions of real time auctions daily
A real time bid engine designed to scale out to billions of bid requests daily
Optimization Algorithms capable of scheduling and planning adserving opportunities to maximize revenue
Client side Javascript that performs real-time textual analysis of web pages to extract semantically meaningful data and structures
A web-scale key value store based on ideas from the Amazon Dynamo paper used to store 100s of millions of data points
Unique audience classification system using various technologies such as Solr and Javascript for rich, real-time targeting of web site visitors
Data Mining buying and selling strategies from a torrent of transactional data
Analytics systems capable of turning a trillion data points into real business insight
Job Description
Your job, should you accept it, is to build new systems, new features and extend the functionality of our existing systems. You will be expected to architect new systems from scratch, add incremental features on existing systems, fix bugs in other people's code and help manage production operations of the services you build. Sometimes you'll have to (or want to) do this work when you are not in the office, so working remote can't scare you off.
Most of our systems are written in Perl, Java, and C, but we have pieces of Python, Clojure and server-side Javascript as well. Hopefully you have deep expertise in at least one of these; you'll definitely need to have a desire to quickly learn and work on systems written in all of the above.
You should also have worked with and/or designed service oriented architectures, advanced db schemas, big data processing, and highly scalable and available web services, and be well aware of the issues surrounding the software development lifecycle. We expect that your resume will itemize your 3+ years experience, mention your BS or MS in Computer Science and be Big Data Buzzword Compliant.
Bonus points for experience with some of the technologies we work with:
- Hadoop
- NodeJS
- MySql
- Solr/Lucene
- RabbitMQ
- MongoDB
- Thrift
- Amazon EC2
- Memcached
- MemcacheQ
- Machine Learning
- Optimization Algorithms
- Economic Modeling
Monday, February 28, 2011
A note on software teams and individuals
- What direction I am going relative to team goals.
- What specific items I am working on today.
- Does anyone need any help from me?
- Do I need any help with my work?
- Are we as a group going the right direction (towards the goal)?
- Will we meet the timeline and/or functional goals?
- Is there any functional or task ambiguity that needs working out?
- Are any course corrections needed?
- Does this person communicate well and often?
- Does this person have the capability and desire to resolve ambiguity on their own when possible?
Monday, February 07, 2011
RightNow - Our cowboys ride code
RightNow Technologies serves about 10 billion [customer interactions] a year through the companies and institutions it works with. “Every person in North America has used one of our solutions about 25 times,” says Gianforte.
The quality of life here is a huge advantage, but more importantly, says Gianforte, “there’s a ranch saying around here that goes, ‘When something needs to get done, well then, we’re just gonna get ‘er done.’ In many environments, they have to form a committee, pull in consultants and such to make things happen, but our clients appreciate that when something needs to get done, we can easily make that happen because of the work ethic here.”
Sunday, January 30, 2011
The Provenance of Data, Data Branding and "Big Data" Hype
The credibility of where data comes from in all these "big data" plays is absolutely crucial. Waving hands re "algorithms" won't cut it. @nealrichter Jan 27, 1010 Tweet
- Web analytics - crunch web traffic and distill visitation and audience analytics reports for web site owners. Often they use these summaries to make decisions and sell their ad-space to advertisers.
- Semantic Web APIs - crunch webpages, tweets etc and return topical and semantic annotations of the content
- Comparison shopping - gather up product catalogs and pricing to aggregate for visitors
- Web publishers - companies who run websites
- Prediction services - companies that use data to predict something
Friday, January 14, 2011
Finance for Engineers
- His MIT and Stanford MBA students often run off to found start-ups and forget the basic Wilson Lumber case. By the time they approach him for help it's too late and they are in Mr Wilson's position: shut-down, take in $$ and lots of equity dilution (and loss of control) or slow growth dramatically.
- Also a quote along the lines of "Startups founded by MIT PhDs fail at a rate far above average".