aicoder: January 2012

Problem Statement

Imagine you are writing a web service that must call a back-end service, such as a data store. Let's assume (without loss of generality) that the data store (and the hardware supporting it) has some limit on the QPS it can handle. We'd like the client system (your web service) to impose a limit on the QPS it sends to the back-end service. Also assume that this is a distributed web service, with lots of worker threads on lots of different machines.

Requirements

Given a goal in QPS, hold the maximum outgoing requests per second to that goal.
Be fast. Maintain a short controller settling time when the goal or the query load changes.
Be adaptive. Respond to swings in the incoming requests that need to be queried against the service.
Be distributed. Act locally on global numbers without knowing the number of workers.
Be robust. Handle additions and removals of workers/clients without coordination. Minimize overshoot.

Design

Assume that querying this back-end service, while valuable and usually needed, is optional under duress. It's far more important for your front-end service to stay responsive and return some error or 'No Content' than to hang on a busted back-end. As a result, we'll use a sampling rate 'r' to denote the fraction of requests for which the web service should query the back-end. Under normal conditions this rate is 1 (100%). Also assume that the goal in QPS to the back-end is set in some configuration area in your system. Under duress, the rate r is adaptively tuned to obey the QPS goal. Finally, assume you have smartly implemented some monitoring system like Ganglia/Nagios/Cacti and are emitting events to it whenever you call the back-end service.

Inputs

G = Goal QPS
M = Current measured QPS (from Ganglia).
r = Current sampling rate [0,1]

Outputs

r_new = A sampling rate [0,1]

Adaptive Mechanism

r_new = r * (G/M)
r_new = MAX(0,MIN(1,r_new)) //clamp r_new between [0,1]
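
The update above is small enough to write out directly. Here is a minimal Python sketch of one controller step. The function name update_rate and the handling of M <= 0 are my own assumptions; the formula itself does not say what to do when nothing has been measured.

```python
def update_rate(r: float, goal_qps: float, measured_qps: float) -> float:
    """One proportional step: r_new = r * (G / M), clamped to [0, 1]."""
    if measured_qps <= 0:
        # Assumption (not in the original formula): with no measured
        # traffic there is nothing to scale against, so keep r unchanged.
        return r
    r_new = r * (goal_qps / measured_qps)
    # Clamp r_new to the valid sampling range [0, 1].
    return max(0.0, min(1.0, r_new))
```

For example, with r = 1.0, G = 500 and M = 1000, the step halves the sampling rate to 0.5. With r = 0.5, G = 500 and M = 250, it climbs back to 1.0.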

Benefits

Needs only the global G and M as inputs.
No coordination is needed between workers/servers other than the globally observed M.
Adaptively moves each worker's sampling rate independently of all other workers' rates.
Workers can receive different incoming QPS rates from a load balancer; the controller will adapt.
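
To see why no coordination is needed, here is a toy, deterministic simulation. It assumes worker i sends exactly r_i * q_i QPS, that every worker sees the same M at the same moment, and that there is no measurement lag, none of which holds exactly in a real system. The function simulate and its arguments are hypothetical names for illustration. In this idealized model, when no rate is clamped, scaling every r_i by G/M scales the total by G/M too, so the measured QPS lands on G after one update.

```python
def simulate(goal_qps, incoming, steps=5):
    """Toy model: worker i sends rates[i] * incoming[i] QPS to the back-end."""
    rates = [1.0] * len(incoming)  # every worker starts at r = 1
    history = []                   # global measured QPS seen at each step
    for _ in range(steps):
        measured = sum(r * q for r, q in zip(rates, incoming))
        history.append(measured)
        if measured > 0:
            # Each worker applies the same local rule using only G and M.
            rates = [max(0.0, min(1.0, r * goal_qps / measured)) for r in rates]
    return history, rates


history, rates = simulate(500, [100, 300, 600])
```

With three workers receiving 100, 300 and 600 QPS and a goal of 500, the measured total in this model starts at 1000 and drops to 500 after the first update, without any worker knowing how many peers exist.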

Failure Modes

If the sensor for M fails to be updated, then the controller is blind.
If the goal is set or reset to zero, then the controller will stop traffic. Because r_new is a multiple of r, a rate that has reached zero does not recover on its own.
Both of these can be addressed easily.
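
One possible way to guard against both failure modes is sketched below. This is my own assumption about a reasonable policy, not something the design above prescribes: hold the last rate when the measurement is stale, and keep a small floor on r so the rate can climb back after being driven toward zero. The names guarded_update, R_MIN and MAX_SENSOR_AGE_S, and their values, are hypothetical.

```python
import time

R_MIN = 0.01             # hypothetical floor so r can recover from ~0
MAX_SENSOR_AGE_S = 30.0  # hypothetical staleness threshold for M, in seconds


def guarded_update(r, goal_qps, measured_qps, measured_at, now=None):
    """Proportional update with guards for a stale sensor and a zero goal."""
    now = time.time() if now is None else now
    if now - measured_at > MAX_SENSOR_AGE_S:
        # Sensor is stale: hold the last rate rather than act on old data.
        return r
    if goal_qps <= 0:
        # An explicit zero goal means "stop traffic"; honor it deliberately.
        return 0.0
    if measured_qps <= 0:
        # Nothing measured (for example r was 0): restart from the floor.
        return max(r, R_MIN)
    r_new = r * (goal_qps / measured_qps)
    # Clamp to [R_MIN, 1] instead of [0, 1] so the rate never sticks at zero.
    return max(R_MIN, min(1.0, r_new))
```

The trade-off of the floor is that a worker always lets a small trickle through while the goal is non-zero, so a very small goal can be overshot slightly.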

Desired Response

The overshoot/undershoot is called 'ringing'
The time to approach the goal is called the 'settling time'

Implementation Notes

Emit a Ganglia/Graphite/XXX counter for both requests sent and requests skipped.
Use a smoothed average of the measured QPS to reduce controller jitter.
(Optional) Each worker should slightly randomize when it performs its sampling-rate update, to smooth out any startup/restart ringing.
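
The smoothing and randomization notes can be sketched as follows. I picked an exponentially weighted moving average as the smoother and a uniform plus-or-minus jitter on the update interval; both are my own choices, and the names SmoothedQps and next_update_delay and the default values are hypothetical.

```python
import random


class SmoothedQps:
    """Exponentially weighted moving average of the measured QPS."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha  # weight of the newest sample, in (0, 1]
        self.value = None   # no sample seen yet

    def update(self, sample):
        if self.value is None:
            self.value = float(sample)  # first sample seeds the average
        else:
            self.value = self.alpha * sample + (1 - self.alpha) * self.value
        return self.value


def next_update_delay(base_s=10.0, jitter=0.2, rng=random):
    """Base interval scaled by a uniform factor in [1 - jitter, 1 + jitter]."""
    return base_s * (1.0 + rng.uniform(-jitter, jitter))
```

A worker would feed each new reading of M through SmoothedQps.update before computing r_new, then sleep for next_update_delay() seconds so that workers restarted together drift apart instead of all updating in lockstep.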

Invertibility

This design can be inverted for a server implementation. If your web service has a set of APIs that are heavy to execute, then this controller can be used to control the incoming QPS that is delegated to the heavy work.

Receive the request.
Test it against the sampling rate r.
If yes, delegate the request to the executor.
If no, respond to the client with 'HTTP 204 No Content' or an equivalent empty response.
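
The four steps above can be sketched as a small server-side gate. This is framework-free Python: handle_request, the executor callable and the (status, body) return shape are hypothetical stand-ins for whatever your web framework provides.

```python
import random


def handle_request(request, r, executor, rng=random):
    """Return (status, body) for one request under sampling rate r."""
    # rng.random() is uniform in [0, 1), so r = 1 always delegates
    # and r = 0 never does.
    if rng.random() < r:
        # Sampled in: hand the request to the heavy executor.
        return 200, executor(request)
    # Sampled out: answer immediately with an empty 204 No Content.
    return 204, b""
```

The rate r passed in here would be whatever the controller last computed, with M now measuring the QPS actually delegated to the heavy work.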

aicoder

Saturday, January 07, 2012

Software pattern for proportional control of QPS in a webservice
