During the past year, CIM has contributed many features to the HttpClient caching module. We recently ran some benchmarks to quantify the performance benefits and to test a few failure scenarios.
Through our benchmarks, we aimed to do the following:
- Characterize latency and capacity benefits provided by the caching module.
- Characterize latency and capacity benefits of locally bound memcached instances versus a shared memcached pool. (When is one better than the other?)
- Verify the failover behavior of the consistent-hashing memcached configuration by killing one of the memcached instances after the cache had warmed.
We made a few choices to help simplify our testing:
- The client issued unconditional requests. Conditional requests (If-None-Match, If-Modified-Since) were only sent from the cache to the origin.
- The client issued requests evenly distributed across the URL space.
- The origin only responded with HTTP 200 or 304 status codes.
- Target figures are for a warmed cache. We let the tests run long enough so that the warm-up period was negligible.
Our test environment:
- Java 1.6.0_22
- HttpClient trunk, revision 1024393
- 4 large EC2 instances running Ubuntu 10.04 (Lucid Lynx) (ami-4234de2b)
- Ubuntu packages:
  - memcached 1.4.2-1ubuntu3
  - apache2 2.2.14-5ubuntu8.3
  - libapache2-mod-php5 5.3.2-1ubuntu4.5
At startup the client creates a connection pool and starts a group of worker threads, each of which submits requests to the server.
Request URLs are randomly selected from a configured range, resulting in an even distribution.
Each worker collects its own statistics, which are aggregated and logged by a reporter thread at a configured interval. The statistics include requests per second; latency (average, median, 95th, and 99th percentile); counts of each HTTP response code; and counts of cache events (hits, misses, validations).
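For the curious, here is a minimal sketch of the client's shape, assuming HttpClient 4.1-era APIs. The URL space, worker count, and origin host are hypothetical stand-ins, and the real harness records richer statistics than shown:

```java
import java.util.Random;

import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.impl.client.cache.CachingHttpClient;
import org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager;
import org.apache.http.util.EntityUtils;

public class BenchmarkClient {
    static final int URL_SPACE = 1000; // hypothetical size of the URL range
    static final int WORKERS = 50;     // hypothetical worker count

    public static void main(String[] args) {
        // Shared connection pool for all workers
        ThreadSafeClientConnManager connMgr = new ThreadSafeClientConnManager();
        connMgr.setMaxTotal(WORKERS);
        // Wrap the plain client with the caching module (in-memory storage here;
        // the memcached tests swap in a different HttpCacheStorage)
        final HttpClient client = new CachingHttpClient(new DefaultHttpClient(connMgr));

        for (int i = 0; i < WORKERS; i++) {
            new Thread(new Runnable() {
                public void run() {
                    Random random = new Random();
                    while (true) {
                        // Uniformly random URL: the even distribution described above
                        HttpGet get = new HttpGet("http://origin.example.com/object.php?id="
                                + random.nextInt(URL_SPACE));
                        long start = System.nanoTime();
                        try {
                            HttpResponse response = client.execute(get);
                            EntityUtils.consume(response.getEntity()); // release the connection
                            long micros = (System.nanoTime() - start) / 1000L;
                            // record micros, the status code, and cache events in
                            // per-worker stats; a reporter thread aggregates them
                        } catch (Exception e) {
                            get.abort();
                        }
                    }
                }
            }).start();
        }
    }
}
```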
Our origin server is a PHP script that was dropped into an Apache instance's docroot.
Our origin requirements (see the sketch after this list):
- Serve varied max-age values across the URL space to ensure staggered cache revalidation.
- Recognize conditional requests and decide, in a predictable way, whether to return a 200 or a 304.
- Sleep for a configured amount of time during each response in order to establish a minimum latency.
- Generate responses of a configured size.
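Our origin was a PHP script, but its behavior is simple enough to sketch. Below is a rough Java equivalent built on the JDK 6 embedded HTTP server; the latency, response size, and max-age values are illustrative placeholders rather than our actual settings:

```java
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.util.Arrays;

import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;

public class OriginSketch {
    static final int MIN_LATENCY_MS = 100;       // assumed minimum latency
    static final int RESPONSE_BYTES = 5 * 1024;  // assumed response size

    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/", new HttpHandler() {
            public void handle(HttpExchange ex) throws java.io.IOException {
                // Establish a minimum latency
                try { Thread.sleep(MIN_LATENCY_MS); } catch (InterruptedException ignored) { }

                String uri = ex.getRequestURI().toString();
                int hash = Math.abs(uri.hashCode());
                String etag = "\"" + hash + "\"";
                // Vary max-age across the URL space so revalidations stagger
                ex.getResponseHeaders().set("Cache-Control", "max-age=" + (30 + hash % 90));
                ex.getResponseHeaders().set("ETag", etag);

                // Deterministic 304s: a fixed 25% slice of the URL space
                // revalidates successfully; the rest get a full 200 response
                String ifNoneMatch = ex.getRequestHeaders().getFirst("If-None-Match");
                if (etag.equals(ifNoneMatch) && hash % 4 == 0) {
                    ex.sendResponseHeaders(304, -1); // -1 means no response body
                    ex.close();
                    return;
                }
                byte[] body = new byte[RESPONSE_BYTES];
                Arrays.fill(body, (byte) 'x');
                ex.sendResponseHeaders(200, body.length);
                OutputStream os = ex.getResponseBody();
                os.write(body);
                os.close();
            }
        });
        server.start();
    }
}
```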
We looked at production access logs to get an idea of our real-world cache hit ratio, and from them we derived that 25% of our conditional requests (issued during cache revalidation) should return a 304 response.
We ran the following tests:
1. No Caching
All requests were sent directly to the origin.
2. Local Memcached – Bounded
Each client used its own memcached instance, configured to store around half of our total data set (the data set was 5 MB; memcached could store only 2.5 MB).
3. Local Memcached – Unbounded
Each client used its own memcached instance. memcached was able to store our entire data set.
4. Pooled Memcached
Each client shared a memcached pool that used consistent hashing (ketama) to distribute cache entries. (A configuration sketch follows this list.)
5. Consistent Hashing Memcached – Failover
Using the same shared memcached pool as in test 4, we failed two of the memcached nodes, one at a time, and then restored them.
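The tests differed only in how cache storage was wired up. Here is a sketch of the pooled configuration, assuming the MemcachedHttpCacheStorage class from the caching module and the spymemcached client; the host names are placeholders, and the local tests simply point at localhost:11211:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.Arrays;

import net.spy.memcached.KetamaConnectionFactory;
import net.spy.memcached.MemcachedClient;

import org.apache.http.client.HttpClient;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.impl.client.cache.CacheConfig;
import org.apache.http.impl.client.cache.CachingHttpClient;
import org.apache.http.impl.client.cache.memcached.MemcachedHttpCacheStorage;

public class PooledCacheSetup {
    public static HttpClient build() throws IOException {
        // Consistent hashing (ketama) across the shared pool; for the local
        // tests this list is just localhost:11211
        MemcachedClient memcached = new MemcachedClient(
                new KetamaConnectionFactory(),
                Arrays.asList(
                        new InetSocketAddress("cache1.example.com", 11211),
                        new InetSocketAddress("cache2.example.com", 11211),
                        new InetSocketAddress("cache3.example.com", 11211),
                        new InetSocketAddress("cache4.example.com", 11211)));

        // Back the HTTP cache with memcached instead of the in-memory default
        return new CachingHttpClient(
                new DefaultHttpClient(),
                new MemcachedHttpCacheStorage(memcached),
                new CacheConfig());
    }
}
```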
This graph shows latency over time for all tests. After the cache filled, latency was constant.
Note that the high latency for the “Local Memcached – Bounded” test can be attributed to our even distribution of request URLs, which ensured that items were constantly being evicted from the cache. A more Pareto-like distribution, similar to the access patterns we see in our production environments, would have kept the most frequently accessed items in the cache for longer.
Memcached Failover – Cache Misses
The first spike after cache warming corresponds to the first memcached node being killed; the second spike came after the second memcached node was killed.
The last two spikes correspond to the two failed memcached nodes being brought back to life and the likely shuffling of data around the consistent-hashing ring: keys now hash to different nodes, so cache misses occur even though entries still exist on some of the nodes that were alive for the entire test.
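This graceful degradation comes from the memcached client's consistent hashing rather than from HttpClient itself. As a sketch, spymemcached can be configured (exact class and enum names vary across its releases) so that a dead node's keys are redistributed to the survivors:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.List;

import net.spy.memcached.ConnectionFactoryBuilder;
import net.spy.memcached.ConnectionFactoryBuilder.Locator;
import net.spy.memcached.DefaultHashAlgorithm;
import net.spy.memcached.FailureMode;
import net.spy.memcached.MemcachedClient;

public class FailoverFriendlyMemcached {
    // Build a memcached client that rehashes a dead node's keys onto the
    // surviving nodes instead of queueing operations against the dead one
    public static MemcachedClient build(List<InetSocketAddress> pool) throws IOException {
        return new MemcachedClient(
                new ConnectionFactoryBuilder()
                        .setLocatorType(Locator.CONSISTENT)           // ketama ring
                        .setHashAlg(DefaultHashAlgorithm.KETAMA_HASH) // ketama key hashing
                        .setFailureMode(FailureMode.Redistribute)     // shift keys to survivors
                        .build(),
                pool);
    }
}
```

With a setup like this, only the keys owned by a failed node miss; they are re-fetched from the origin and stored on the surviving nodes, and when the node returns, its keys hash back to it cold, which is consistent with the later pair of spikes.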
Memcached Failover – Cache Events
You can see that despite multiple memcached nodes failing, our cache hit ratio never dropped by more than 20%.
The HttpClient caching module successfully reduces request latency and server load given the proper operating conditions.
Using the cache, we observed that while higher-percentile latency was not heavily affected (cache misses will always dominate the tail end of the latency distribution), median and average latency were greatly reduced.
Using a local memcached instance lowered request latency, which can be partially attributed to memcached running on the same host.
Using pooled memcached instances, we saw that load on the origin was greatly reduced at the expense of slightly higher latency. The lower load can be attributed to the cache filling more quickly and to each node cooperating to keep the cache current; the higher latency can be attributed to the overhead of locating data and the cost of communicating with other memcached instances over the network.
Finally, we see that when using consistent hashing, node failure is handled gracefully.
Thanks to Michajlo Matijkiw for doing the bulk of the benchmarking work.
Also thanks to everyone at CIM who has contributed to the HttpClient caching module: Jon Moore, Ben Schmaus, Joe Campbell, Mohammed Uddin, Dave Mays, Brad Spenla, Dave Cleaver.