Musings about artificial intelligence, search engines, machine learning, computational advertising, intellectual property law, social media & widgets, and good beer.
Tuesday, March 22, 2011
Pamela Samuelson on startups and software patents
Two-thirds of the approximately 700 software entrepreneurs who participated in the 2008 Berkeley Patent Survey report that they neither have nor are seeking patents for innovations embodied in their products and services. These entrepreneurs rate patents as the least important mechanism among seven options for attaining competitive advantage in the marketplace. Even software startups that hold patents regard them as providing only a slight incentive to invest in innovation.
Thursday, March 10, 2011
Comments re "The Noisy Channel: A Practical Rant About Software Patents"
The Noisy Channel: A Practical Rant About Software Patents - [My comments cross-posted here]
Daniel, nice writeup.
I worked for a BigCo and filed many patents. It was a mixed bag. The time horizon is so long that even after I’ve been gone for 3.5 years, many of them are still lost in the USPTO. The average time for me to see a granted patent was 5+ years.
Here are my biased opinions:
1) Patents really matter for BigCos operating on a long time horizon. It’s a strategic investment.
2) Patents are nearly worthless for a Startup or SmallCo. The time horizon is way past your foreseeable future, so the whole effort is akin to planning for an alternate reality different from the current business context. Throwing coins in a fountain for good luck is about as relevant. You are simply better off getting a filing date on a provisional design writeup and hiring an engineer with the money you’d spend on patent lawyers.
3) As an Acquiring company evaluating a target, Provisional or Pending Patents are a liability, not an asset. They take time and resources to push to completion for a strategy of deterrence.
4) Patents are mostly ignored in the professional literature. Take Sentiment Analysis as one example: academic publishing on it exploded around 2001, yet there are more than a few older patents discussing good technical work on Sentiment Analysis. I’ve NEVER seen an algorithm from a patent cited in a paper as prior work, and I have seen academic papers whose algorithms were already 90% covered by an older patent… and the papers are cited as ‘novel work’.
5) Finding relevant patents is ludicrously hard. It might be the most challenging IR problem with respect to a corpus, IMO. Different words mean the same thing and vice versa, thanks to the pseudo-ability in a Patent to redefine a word away from its obvious meaning. Two different lawyers rendering the same technical design into a writeup and claims will produce wildly different work product.
6) I’ve seen some doozy granted Patents: things that appear to be either implementations of very old CS ideas in new domains, or worse, stuff that would be a class project for an undergrad.
It’s just plain ugly in this realm.
On Strategic Plans

Sunday, March 06, 2011
Hilarious system calls in the BeOS
Returns 1 if the computer is on. If the computer isn't on, the value returned by this function is undefined.
Returns the temperature of the motherboard if the computer is currently on fire. If the computer isn't on fire, the function returns some other value.
#include <stdio.h>
#include <be/kernel/OS.h>

int main()
{
    printf("[%d] = is_computer_on()\n", is_computer_on());
    printf("[%f] = is_computer_on_fire()\n", is_computer_on_fire());
    return 0;
}
Friday, March 04, 2011
Contractor Needed: HTML/CSS/Javascript Ninja
Thursday, March 03, 2011
Job Post: Software Engineer/Scientist: Ad Serving, Optimization and Core Team
LOCATION: the Rubicon Project HQ in West Los Angeles or Salt Lake City
the Rubicon Project is on a mission to automate buying and selling for the $65 billion global online advertising industry. Backed by $42 million in funding, we are currently looking for the best engineers in the world to work with us.
Team Description
The mission of the Core Team is to build robust, scalable, maintainable and well documented systems for ad serving, audience analytics, and market analysis. Every day we serve billions of ads, process terabytes of data and provide valuable data and insights to our publishers. If building software that touches 500+ million people every month is interesting to you, you'll fit in well here.
Some of the custom software we've built to solve these problems include:
A patented custom ad engine delivering thousands of ad impressions per second with billions of real time auctions daily
A real time bid engine designed to scale out to billions of bid requests daily
Optimization Algorithms capable of scheduling and planning adserving opportunities to maximize revenue
Client side Javascript that performs real-time textual analysis of web pages to extract semantically meaningful data and structures
A web-scale key value store based on ideas from the Amazon Dynamo paper used to store 100s of millions of data points
Unique audience classification system using various technologies such as Solr and Javascript for rich, real-time targeting of web site visitors
Data Mining buying and selling strategies from a torrent of transactional data
Analytics systems capable of turning a trillion data points into real business insight
Job Description
Your job, should you accept it, is to build new systems and new features and to extend the functionality of our existing systems. You will be expected to architect new systems from scratch, add incremental features to existing systems, fix bugs in other people's code and help manage production operations of the services you build. Sometimes you'll have to (or want to) do this work when you are not in the office, so working remotely can't scare you off.
Most of our systems are written in Perl, Java, and C, but we have pieces of Python, Clojure and server-side Javascript as well. Hopefully you have deep expertise in at least one of these; you'll definitely need to have a desire to quickly learn and work on systems written in all of the above.
You should also have worked with and/or designed service-oriented architectures, advanced DB schemas, big data processing, and highly scalable and available web services, and be well aware of the issues surrounding the software development lifecycle. We expect that your resume will itemize your 3+ years of experience, mention your BS or MS in Computer Science and be Big Data Buzzword Compliant.
Bonus points for experience with some of the technologies we work with:
- Hadoop
- NodeJS
- MySql
- Solr/Lucene
- RabbitMQ
- MongoDB
- Thrift
- Amazon EC2
- Memcached
- MemcacheQ
- Machine Learning
- Optimization Algorithms
- Economic Modeling
Monday, February 28, 2011
A note on software teams and individuals
- What direction I am going relative to team goals.
- What specific items I am working on today.
- Does anyone need any help from me?
- Do I need any help with my work?
- Are we as a group going the right direction (towards the goal)?
- Will we meet the timeline and/or functional goals?
- Is there any functional or task ambiguity that needs working out?
- Are any course corrections needed?
- Does this person communicate well and often?
- Does this person have the capability and desire to resolve ambiguity on their own when possible?
Monday, February 07, 2011
RightNow - Our cowboys ride code

RightNow Technologies serves about 10 billion [customer interactions] a year through the companies and institutions it works with. “Every person in North America has used one of our solutions about 25 times,” says Gianforte.
The quality of life here is a huge advantage, but more importantly, says Gianforte, “there’s a ranch saying around here that goes, ‘When something needs to get done, well then, we’re just gonna get ‘er done.’ In many environments, they have to form a committee, pull in consultants and such to make things happen, but our clients appreciate that when something needs to get done, we can easily make that happen because of the work ethic here.”
Sunday, January 30, 2011
The Provenance of Data, Data Branding and "Big Data" Hype
The credibility of where data comes from in all these "big data" plays is absolutely crucial. Waving hands re "algorithms" won't cut it. (@nealrichter tweet, Jan 27, 2011)
- Web analytics - crunch web traffic and distill visitation and audience analytics reports for web site owners. Often they use these summaries to make decisions and sell their ad-space to advertisers.
- Semantic Web APIs - crunch webpages, tweets etc and return topical and semantic annotations of the content
- Comparison shopping - gather up product catalogs and pricing to aggregate for visitors
- Web publishers - companies who run websites
- Prediction services - companies that use data to predict something
Friday, January 14, 2011
Finance for Engineers
- His MIT and Stanford MBA students often run off to found start-ups and forget the basic Wilson Lumber case. By the time they approach him for help, it's too late and they are in Mr. Wilson's position: shut down, take in $$ with lots of equity dilution (and loss of control), or slow growth dramatically.
- Also a quote along the lines of "Startups founded by MIT PhDs fail at a rate far above average".
Thursday, December 30, 2010
List of Best Paper awards in CS/AI/ML conferences
http://jeffhuang.com/best_paper_awards.html
Wednesday, December 29, 2010
Managing Open Source Licenses
- A clear company policy is set on what open source licenses are allowed and how developers can use open source code or components.
- The corporate code is cleanly annotated with any third party attributions (see below).
- Open Source code that has bad licenses for commercial usage is identified and removed before release.
- A Bill of Materials is created for each release listing third-party software in the release.
- Necessary copyright or other notices appear in About dialogs, manuals or product websites.
/*
* XYZ.com Third-party or Open Source Declaration
* Name: Bart Simpson
* Date of first commit: 04/25/2009
* Release: 3.5 “The Summer Lager Release”
* Component: tinyjson
* Description: C++ JSON object serializer/deserializer
* Homepage: http://blog.beef.de/projects/tinyjson/
* License: MIT style license
* Copyright: Copyright (c) 2008 Thomas Jansen (thomas@beef.de)
* Note: See below for original declarations from the code
*/
Friday, December 17, 2010
Stochastic Universal Sampling/Selection
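Stochastic Universal Sampling selects N parents with one random offset and N equally spaced pointers over the cumulative fitness, which gives the same expected selection counts as roulette-wheel selection but with far lower variance. A minimal sketch in C (function and variable names are illustrative, not taken from any particular library):
/* A minimal SUS sketch: select num_select individuals from a population of
 * pop_size, proportionally to their (non-negative) fitness, using one random
 * starting point and equally spaced pointers. */
#include <stdio.h>
#include <stdlib.h>

void sus_select(const double *fitness, int pop_size, int *selected, int num_select)
{
    double total = 0.0;
    for (int i = 0; i < pop_size; i++)
        total += fitness[i];

    double step  = total / num_select;                  /* spacing between pointers */
    double point = ((double)rand() / RAND_MAX) * step;  /* one random start in [0, step] */

    double cumulative = 0.0;   /* fitness mass covered so far */
    int i = 0;
    for (int k = 0; k < num_select; k++) {
        /* advance until individual i's slice of the wheel covers the pointer */
        while (cumulative + fitness[i] < point) {
            cumulative += fitness[i];
            i++;
        }
        selected[k] = i;
        point += step;
    }
}

int main(void)
{
    double fitness[] = { 1.0, 2.0, 3.0, 4.0 };
    int selected[4];

    srand(42);
    sus_select(fitness, 4, selected, 4);
    for (int k = 0; k < 4; k++)
        printf("pointer %d -> individual %d\n", k, selected[k]);
    return 0;
}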
Monday, November 22, 2010
Computing, economics and the financial meltdown (a collection of links)
Information technology has enabled the development of a global financial system of incredible sophistication. At the same time, it has enabled the development of a global financial system of such complexity that our ability to comprehend it and assess risk, both localized and systemic, is severely limited. Financial-oversight reform is now a topic of great discussion. The focus of these discussions is primarily on the structure and authority of regulatory agencies. Little attention has been given to what I consider a key issue—the opaqueness of our financial system—which is driven by its fantastic complexity. The problem is not a lack of models. To the contrary, the proliferation of models may have created an illusion of understanding and control, as is argued in a recent report titled "The Financial Crisis and the Systemic Failure of Academic Economics."
Krugman's essay at the time, How Did Economists Get It So Wrong?, gave a nice history of economic ideas, the models behind them, and his interpretation of their correctness.
The theoretical model that finance economists developed by assuming that every investor rationally balances risk against reward — the so-called Capital Asset Pricing Model, or CAPM (pronounced cap-em) — is wonderfully elegant
[snip]
Economics, as a field, got in trouble because economists were seduced by the vision of a perfect, frictionless market system.
[snip]
H. L. Mencken: “There is always an easy solution to every human problem — neat, plausible and wrong.”
“You put chicken into the grinder”—he laughed with that infectious Wall Street black humor—“and out comes sirloin.”
Poormojo: "Any sufficiently advanced financial instrument is indistinguishable from fraud."
Recipe for Disaster: The Formula That Killed Wall Street
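As a reference point for the models named above: CAPM prices an asset's expected return as E[R_i] = R_f + beta_i * (E[R_m] - R_f), where beta_i is the asset's covariance with the market portfolio divided by the market's variance. The Wired piece covers David X. Li's Gaussian copula, the formula widely used to model default correlation when pricing CDOs.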
Wednesday, October 27, 2010
Review of "Learning to Rank with Partially-Labeled Data"

Tuesday, October 26, 2010
Stanford Computational Advertising course - Fall 2010
Monday, August 16, 2010
Strategic Marketing
1) Making decisions by experimentation versus meetings+intuition is crucial.
2) Don't assume your role is to know the answer. Your role is really to work out how to find the answer as quickly as possible.
3) Brand is unimportant when customers can observe you are meeting their needs.
4) Brand is important when they can't search/observe and must reason with less data.
5) Don't price your products based on cost or competition; work out your true value to the customer.
Tuesday, July 27, 2010
Dissertation done!
Really happy to be done.
Advice for working professionals attempting a PhD:
1) pick something relevant to your work.
2) think twice about a theoretical topic.
3) don't make it longer/bigger than necessary.
4) don't grow your family during this time.
I did not follow this advice, and that likely resulted in a 4-year delay. Still, the outcome was great and the topic is now relevant to the new job at Rubicon.
http://nealrichter.com/research/dissertation/
On Mutation and Crossover in the Theory of Evolutionary Algorithms
Abstract:
The Evolutionary Algorithm is a population-based metaheuristic optimization algorithm. The EA employs mutation, crossover and selection operators inspired by biological evolution. It is commonly applied to find exact or approximate solutions to combinatorial search and optimization problems.
This dissertation describes a series of theoretical and experimental studies on a variety of evolutionary algorithms and models of those algorithms. The effects of the crossover and mutation operators are analyzed. Multiple examples of deceptive fitness functions are given where the crossover operator is shown or proven to be detrimental to the speedy optimization of a function. While other research monographs have shown the benefits of crossover on various fitness functions, this is one of the few (if not the only one) to do the inverse.
A background literature review is given of both population genetics and evolutionary computation with a focus on results and opinions on the relative merits of crossover and mutation. Next, a family of new fitness functions is introduced and proven to be difficult for crossover to optimize. This is followed by the construction and evaluation of executable theoretical models of EAs in order to explore the effects of parameterized mutation and crossover.
These models link the EA to the Metropolis-Hastings algorithm. Dynamical systems analysis is performed on models of EAs to explore their attributes and fixed points. Additional crossover deceptive functions are shown and analyzed to examine the movement of fixed points under changing parameters. Finally, a set of online adaptive parameter experiments with common fitness functions is presented.
Finalized April 19, 2010
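To make the operators in the abstract concrete, here is a minimal generational EA sketch on the textbook OneMax problem (maximize the number of 1-bits), using tournament selection, one-point crossover and per-bit mutation. The parameter values and names are illustrative and are not drawn from the dissertation itself.
#include <stdio.h>
#include <stdlib.h>

#define POP_SIZE       50
#define GENOME_LEN     40
#define GENERATIONS    200
#define CROSSOVER_RATE 0.7
#define MUTATION_RATE  (1.0 / GENOME_LEN)

static double frand(void) { return (double)rand() / ((double)RAND_MAX + 1.0); }

/* OneMax fitness: simply count the 1 bits in the genome. */
static int fitness(const int *genome)
{
    int f = 0;
    for (int j = 0; j < GENOME_LEN; j++)
        f += genome[j];
    return f;
}

/* Binary tournament selection: pick two individuals at random, keep the fitter. */
static int tournament(int pop[][GENOME_LEN])
{
    int a = rand() % POP_SIZE, b = rand() % POP_SIZE;
    return (fitness(pop[a]) >= fitness(pop[b])) ? a : b;
}

int main(void)
{
    static int pop[POP_SIZE][GENOME_LEN], next[POP_SIZE][GENOME_LEN];
    srand(42);

    /* random initial population */
    for (int i = 0; i < POP_SIZE; i++)
        for (int j = 0; j < GENOME_LEN; j++)
            pop[i][j] = rand() % 2;

    for (int g = 0; g < GENERATIONS; g++) {
        for (int i = 0; i < POP_SIZE; i += 2) {
            int p1 = tournament(pop), p2 = tournament(pop);

            /* one-point crossover with probability CROSSOVER_RATE;
             * a cut point of 0 simply copies the two parents */
            int cut = (frand() < CROSSOVER_RATE) ? (rand() % GENOME_LEN) : 0;
            for (int j = 0; j < GENOME_LEN; j++) {
                next[i][j]     = (j < cut) ? pop[p1][j] : pop[p2][j];
                next[i + 1][j] = (j < cut) ? pop[p2][j] : pop[p1][j];
            }

            /* independent per-bit mutation on both children */
            for (int j = 0; j < GENOME_LEN; j++) {
                if (frand() < MUTATION_RATE) next[i][j]     ^= 1;
                if (frand() < MUTATION_RATE) next[i + 1][j] ^= 1;
            }
        }
        /* generational replacement */
        for (int i = 0; i < POP_SIZE; i++)
            for (int j = 0; j < GENOME_LEN; j++)
                pop[i][j] = next[i][j];
    }

    int best = 0;
    for (int i = 1; i < POP_SIZE; i++)
        if (fitness(pop[i]) > fitness(pop[best])) best = i;
    printf("best fitness after %d generations: %d/%d\n",
           GENERATIONS, fitness(pop[best]), GENOME_LEN);
    return 0;
}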
Monday, November 02, 2009
RFP: Bounding memory usage in Tokyo Cabinet and Tokyo Tyrant
Here's a start: http://github.com/nealrichter/tokyotyrant_rsshack
I've attempted to do this myself and have not had the time to finish or fully test it. I've asked Mikio for feedback/help finishing this and he's been nearly silent on the request.
At the moment we (myself, Sam Tingleff and Mike Dierken) work around this issue by continuing to play with various TC tuning parameters and restarting the daemon when memory usage grows beyond a comfort level, roughly as sketched below.
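For the curious, a minimal sketch of that watchdog workaround, assuming Linux /proc, a ttserver pid file, and an init script to do the restart; the paths, threshold and restart command below are placeholders, not anything shipped with Tokyo Tyrant.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define RSS_LIMIT_KB (2L * 1024 * 1024)              /* restart above ~2 GB resident */
#define PID_FILE     "/var/run/ttserver.pid"         /* assumed pid file location    */
#define RESTART_CMD  "/etc/init.d/ttserver restart"  /* assumed restart command      */

/* Return VmRSS in kB for the given pid (from /proc), or -1 on error. */
static long rss_kb(long pid)
{
    char path[64], line[256];
    long rss = -1;
    snprintf(path, sizeof(path), "/proc/%ld/status", pid);
    FILE *fp = fopen(path, "r");
    if (!fp) return -1;
    while (fgets(line, sizeof(line), fp))
        if (sscanf(line, "VmRSS: %ld kB", &rss) == 1) break;
    fclose(fp);
    return rss;
}

int main(void)
{
    for (;;) {
        long pid = 0;
        FILE *pf = fopen(PID_FILE, "r");
        if (pf && fscanf(pf, "%ld", &pid) == 1) {
            long rss = rss_kb(pid);
            if (rss > RSS_LIMIT_KB) {
                fprintf(stderr, "ttserver RSS %ld kB over limit, restarting\n", rss);
                system(RESTART_CMD);
            }
        }
        if (pf) fclose(pf);
        sleep(60);   /* check once a minute */
    }
    return 0;  /* not reached */
}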
I'm a C coder and have hacked the internals of BerkeleyDB in the past, so can help review code, trade ideas, etc. We (as a team) don't have the time to work on this at the moment.
If you are interested contact me! We've got a few other ideas for TT enhancements as well...
Wednesday, October 28, 2009
Current Computational Advertising Course and Events
The National Institute of Statistical Sciences is holding a workshop on CompAdvert in early November. The upcoming WINE'09 conference in Rome contains a few accepted papers in CompAdvert. SIGKDD 2010 mentions it in the CFP.
Luckily the search engine results for 'Computational Advertising' are still free of noise: Google, Yahoo, Bing.
Prior Events: