I spent the last few days setting up a new web serving structure for Wine, the largest wine e-commerce company in Latin America. After a lot of testing, studying and learning, we built a nice solution based on nginx and memcached. I will use a picture to describe the architecture (sorry, I’m not so good with pictures =P):

As you can see, when a client makes a request to the nginx server, nginx first checks memcached to see whether the response is already cached. If the response is not found on the cache server, nginx forwards the request to Tomcat, which processes the request, caches the response on memcached and returns it to nginx. Tomcat does the work only for the first client; all other clients requesting the same resource get the cached response from RAM until the cache entry expires. My objective with this post is to show how we built this architecture.
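To make the flow above concrete, here is a minimal sketch in plain Java. It is only a toy model: a HashMap stands in for memcached, and the names CacheFlowSketch, frontEnd and backend are made up for this illustration (they are not part of nginx, Tomcat or the application). It assumes the behaviour described in this post: the front end only reads the cache, and the backend is the one that writes to it.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the request flow: the front end reads the cache, the backend fills it. */
final class CacheFlowSketch {
    // Stand-in for memcached: the key is the request URI, the value is the response body.
    static final Map<String, String> cache = new HashMap<>();
    // Counts how many times the (hypothetical) backend had to render a response.
    static int backendCalls = 0;

    // Plays the role of Tomcat plus the caching filter: render, store (unless POST), return.
    static String backend(String method, String uri) {
        backendCalls++;
        String body = "<html>page for " + uri + "</html>";
        if (!"POST".equals(method)) {
            cache.put(uri, body);
        }
        return body;
    }

    // Plays the role of nginx: POST bypasses the cache, everything else tries the cache first.
    static String frontEnd(String method, String uri) {
        if ("POST".equals(method)) {
            return backend(method, uri);
        }
        String cached = cache.get(uri);
        return cached != null ? cached : backend(method, uri);
    }

    public static void main(String[] args) {
        frontEnd("GET", "/users"); // cache miss: the backend renders and stores the page
        frontEnd("GET", "/users"); // cache hit: the backend is not touched
        System.out.println("backend calls: " + backendCalls); // prints "backend calls: 1"
    }
}
```

The second GET never reaches the backend, which is the whole point of the architecture.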
nginx
nginx was compiled following the Linode instructions for installing nginx from source. The only difference is that we added the nginx memcached module. So, first I downloaded the memc_module source from Github and then built nginx with it. Here are the commands for compiling nginx with the memcached module:
$ make
$ sudo make install
After installing nginx and creating an init script for it, we can work on its settings for integration with Tomcat. To keep the site settings in separate files, we changed the nginx.conf file (located in the /opt/nginx/conf directory), and it now looks like this:
worker_processes 1;
error_log logs/error.log;

events {
    worker_connections 1024;
}

http {
    include mime.types;
    default_type application/octet-stream;

    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';
    access_log logs/access.log main;

    sendfile on;
    #tcp_nopush on;

    #keepalive_timeout 0;
    keepalive_timeout 65;

    #gzip on;

    include /opt/nginx/sites-enabled/*;
}
See the last line of the http block: it tells nginx to include all settings present in the /opt/nginx/sites-enabled directory. So, now, let’s create a default file in this directory, with this content:
server {
    listen 80;
    server_name localhost;
    default_type text/html;

    location / {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header Host $http_host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # POST requests are never cached: send them straight to Tomcat
        if ($request_method = POST) {
            proxy_pass http://localhost:8080;
            break;
        }

        # Look the URI up on memcached; on failure, fall back to Tomcat
        set $memcached_key "$uri";
        memcached_pass 127.0.0.1:11211;
        error_page 501 404 502 = /fallback$uri;
    }

    location /fallback/ {
        internal;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header Host $http_host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_redirect off;
        proxy_pass http://localhost:8080;
    }
}
Some things must be explained here: the default_type directive is necessary for properly serving cached responses (if you cache other content types, like application/json or application/xml, you should take a look at the nginx documentation and deal with content types conditionally). The location / block defines some proxy settings, like IP and host. We do this because we need to pass the right information to our backend (Tomcat or memcached). See more about proxy_set_header in the nginx documentation. After that, there is a simple check on the request method: we don’t want to cache POST requests, so they go straight to Tomcat.
Now we get to the magic: first we set $memcached_key to the URI, and then we use the memcached_pass directive. memcached_pass is very similar to proxy_pass: nginx “proxies” memcached to get the response, so we can get an HTTP status code like 200, 404 or 502. The error_page line sends the error codes to the fallback location (it also lists 501); two of them deserve an explanation:
- 404: the memcached module returns a 404 error when the key is not on the memcached server
- 502: the memcached module returns a 502 error when it can’t find the memcached server (it is a bad gateway error, the same one you get if you start nginx without starting Tomcat ;D)
So, when nginx gets any of those errors, it should forward the request to Tomcat through another proxy. We configured this in fallback, an internal location that proxies between nginx and Tomcat (listening on port 8080). That is everything on the nginx side. As you can see in the picture or in the nginx configuration file, nginx doesn’t put anything in the cache; it only reads cached items. The application must put everything in the cache. Let’s do it :)
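If you are curious about what a lookup looks like on the wire, the sketch below models it in plain Java using the memcached text protocol: a hit comes back as a VALUE header, the data and END, while a miss is just END. The class MemcachedGetSketch and its methods are hypothetical names for this illustration, and the status mapping is a simplified model of the behaviour described above, not nginx’s actual implementation.

```java
import java.nio.charset.StandardCharsets;

/** Simplified model of a memcached "get" and the HTTP status it leads to. */
final class MemcachedGetSketch {
    // Text-protocol request for a single key (here, the request URI).
    static String getCommand(String key) {
        return "get " + key + "\r\n";
    }

    // Maps a raw reply to the status discussed above; null models "server unreachable".
    static int statusFor(String reply) {
        if (reply == null) {
            return 502; // could not talk to memcached at all
        }
        if (reply.startsWith("VALUE ")) {
            return 200; // hit: the value becomes the response body
        }
        return 404; // a bare "END" means the key is not stored
    }

    // Extracts the data block from a hit: VALUE <key> <flags> <bytes>\r\n<data>\r\nEND\r\n
    static String bodyOf(String reply) {
        int headerEnd = reply.indexOf("\r\n");
        String[] header = reply.substring(0, headerEnd).split(" ");
        int length = Integer.parseInt(header[3]); // length is counted in bytes
        byte[] raw = reply.getBytes(StandardCharsets.UTF_8);
        int start = reply.substring(0, headerEnd + 2).getBytes(StandardCharsets.UTF_8).length;
        return new String(raw, start, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String hit = "VALUE /users 0 5\r\nhello\r\nEND\r\n";
        System.out.println(statusFor(hit) + " " + bodyOf(hit)); // prints "200 hello"
        System.out.println(statusFor("END\r\n"));               // prints "404"
    }
}
```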
Java application
Now it is time to write some code :) I chose an application written by a friend. It’s a very simple CRUD of users, written by Washington Botelho with the goal of introducing VRaptor, a powerful web framework focused on fast development. Washington also wrote a blog post explaining the application; if you don’t know VRaptor or want to know how the application was built, check the blog post “Getting started with VRaptor 3”. I forked the application, made some minor changes and added a magic filter for caching. The only Java code I want to show here is the filter code:
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.net.InetSocketAddress;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletOutputStream;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpServletResponseWrapper;

import net.spy.memcached.MemcachedClient;

/**
 * Servlet Filter implementation class MemcachedFilter
 */
public class MemcachedFilter implements Filter {

    private MemcachedClient mmc;

    static class MemcachedHttpServletResponseWrapper extends HttpServletResponseWrapper {

        private StringWriter sw = new StringWriter();

        public MemcachedHttpServletResponseWrapper(HttpServletResponse response) {
            super(response);
        }

        public PrintWriter getWriter() throws IOException {
            return new PrintWriter(sw);
        }

        public ServletOutputStream getOutputStream() throws IOException {
            throw new UnsupportedOperationException();
        }

        public String toString() {
            return sw.toString();
        }
    }

    /**
     * Default constructor.
     */
    public MemcachedFilter() {
    }

    /**
     * @see Filter#destroy()
     */
    public void destroy() {
    }

    /**
     * @see Filter#doFilter(ServletRequest, ServletResponse, FilterChain)
     */
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {
        MemcachedHttpServletResponseWrapper wrapper = new MemcachedHttpServletResponseWrapper((HttpServletResponse) response);
        chain.doFilter(request, wrapper);

        HttpServletRequest inRequest = (HttpServletRequest) request;
        HttpServletResponse inResponse = (HttpServletResponse) response;

        String content = wrapper.toString();
        PrintWriter out = inResponse.getWriter();
        out.print(content);

        if (!inRequest.getMethod().equals("POST")) {
            String key = inRequest.getRequestURI();
            mmc.set(key, 5, content);
        }
    }

    /**
     * @see Filter#init(FilterConfig)
     */
    public void init(FilterConfig fConfig) throws ServletException {
        try {
            mmc = new MemcachedClient(new InetSocketAddress("localhost", 11211));
        } catch (IOException e) {
            e.printStackTrace();
            throw new ServletException(e);
        }
    }
}
First, the dependency: for memcached communication, we used the spymemcached client. It is a simple and easy-to-use memcached library. I won’t explain all the code line by line, but I can explain the idea behind it: first, we call the doFilter method on FilterChain, because we want to get the response and work with it. Look at the MemcachedHttpServletResponseWrapper object: it encapsulates the response and makes it easier to play with the response content.
We get the content, write it to the response writer and put it in the cache using the MemcachedClient provided by spymemcached. The request URI is the key and the expiration time is 5 seconds.
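To illustrate what the key and the 5-second expiration mean in practice, here is a small sketch of an expiring cache in plain Java. It is not spymemcached and does not talk to memcached; ExpiringCacheSketch and its injected clock are assumptions made for this example, and only the argument order of set (key, expiration in seconds, value) mirrors the mmc.set call in the filter.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** In-memory stand-in for memcached that honours a per-entry expiration time. */
final class ExpiringCacheSketch {
    private static final class Entry {
        final String value;
        final long expiresAtMillis;

        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final LongSupplier clock; // injected so that time can be controlled in examples

    ExpiringCacheSketch(LongSupplier clock) {
        this.clock = clock;
    }

    // Same argument order as the filter's mmc.set(key, 5, content).
    void set(String key, int ttlSeconds, String value) {
        entries.put(key, new Entry(value, clock.getAsLong() + ttlSeconds * 1000L));
    }

    // Returns null on a miss or when the entry has expired.
    String get(String key) {
        Entry entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() >= entry.expiresAtMillis) {
            entries.remove(key);
            return null;
        }
        return entry.value;
    }

    public static void main(String[] args) {
        long[] now = {0L}; // fake clock, in milliseconds
        ExpiringCacheSketch cache = new ExpiringCacheSketch(() -> now[0]);
        cache.set("/users", 5, "<html>users</html>");
        System.out.println(cache.get("/users")); // prints "<html>users</html>"
        now[0] = 6000L;                          // six seconds later
        System.out.println(cache.get("/users")); // prints "null"
    }
}
```

After the entry expires, the next request for that URI misses the cache and has to be rendered by the application again.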
web.xml
The last step is to add the filter to the project’s web.xml file. Mapping it before the VRaptor filter is very important for it to work properly:
<web-app xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://java.sun.com/xml/ns/javaee" xmlns:web="http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd" xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd" id="WebApp_ID" version="2.5">
    <display-name>memcached sample</display-name>

    <filter>
        <filter-name>vraptor</filter-name>
        <filter-class>br.com.caelum.vraptor.VRaptor</filter-class>
    </filter>

    <filter>
        <filter-name>memcached</filter-name>
        <filter-class>com.franciscosouza.memcached.filter.MemcachedFilter</filter-class>
    </filter>

    <filter-mapping>
        <filter-name>memcached</filter-name>
        <url-pattern>/*</url-pattern>
    </filter-mapping>

    <filter-mapping>
        <filter-name>vraptor</filter-name>
        <url-pattern>/*</url-pattern>
        <dispatcher>FORWARD</dispatcher>
        <dispatcher>REQUEST</dispatcher>
    </filter-mapping>
</web-app>
That is it! Now you can just run Tomcat on port 8080 and nginx on port 80, and access http://localhost in your browser. Try it out: raise the cache timeout, navigate through the application and then turn off Tomcat. You will still be able to navigate to the pages that use the GET request method and are already cached (users list, home and users form).
Check the entire code out on Github: https://github.com/fsouza/starting-with-vraptor-3. If you have any questions, troubles or comments, please let me know! ;)
Nice article – clear, clean and simple! I like that you access memcached directly from nginx instead of having Java code to check if an item is cached.
Was there a reason you didn’t use proxy_cache? You wouldn’t need memcached then.
Yeah, there was.
We also tested proxy_cache. It is really fast and much easier to implement, but memcached is faster. Speed was the only reason.
Great post and good work on Wine Francisco.
Congratulations! (:
Thanks for the post. That’s a nice image — what tools did you use to create the architecture image?
Hi Steve!
I used Gliffy [1], an online diagram tool.
[1] http://www.gliffy.com/
“I spent last days setting …”
I see a lot of misuse of ‘last’ and ‘since’ by non-native English speakers.
In this post, you would have wanted to start it as, “I spent the last x days setting …”
Great article, regardless. Have a great Christmas!
Thanks Anonymous :)
Isn’t EHCache (or another Java-based cache) a better choice for Java apps than memcached? I’ve seen some benchmarks showing that EHCache performs better (too lazy to find them now). Did you review Java caching options as well?
No, I didn’t review other Java caching options. We chose memcached because we are more experienced with it.
Thank you for your suggestion and comment. I will check it! ;)
Very good article. Congratulations !
In your doFilter, I would check for POST first. If it is then you don’t need to use your wrapper response and can just call chain.doFilter. This avoids the overhead of writing the response to an intermediate string and then writing the response again in POSTs.
Check out TeeOutputStream in Commons IO. This would allow the response to be streamed directly back to the browser and to your temporary buffer to write to memcached. Should be a smoother experience all round.
Looks great!
Thanks for the tip ;)
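The tip above can be sketched with the JDK alone. The commenter mentions TeeOutputStream from Commons IO; since the filter works with a Writer, the hypothetical TeeWriterSketch below applies the same idea to characters: everything written goes to the client and to a buffer at the same time, so the response is not written twice. This is only an illustration of the technique, not code from the application.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

/** Writer that duplicates everything it receives into two underlying writers. */
final class TeeWriterSketch extends Writer {
    private final Writer primary; // e.g. the real response writer
    private final Writer copy;    // e.g. a buffer whose content is cached afterwards

    TeeWriterSketch(Writer primary, Writer copy) {
        this.primary = primary;
        this.copy = copy;
    }

    @Override
    public void write(char[] buffer, int offset, int length) throws IOException {
        primary.write(buffer, offset, length);
        copy.write(buffer, offset, length);
    }

    @Override
    public void flush() throws IOException {
        primary.flush();
        copy.flush();
    }

    @Override
    public void close() throws IOException {
        try {
            primary.close();
        } finally {
            copy.close();
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter client = new StringWriter();   // stands in for the HTTP response
        StringWriter forCache = new StringWriter(); // content that would go to memcached
        try (Writer tee = new TeeWriterSketch(client, forCache)) {
            tee.write("<html>ok</html>");
        }
        System.out.println(client.toString().equals(forCache.toString())); // prints "true"
    }
}
```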
How many QPS (queries per second) did you achieve with this architecture?
I am using an Amazon EC2 instance with the following configuration:
7 GB of memory
20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each)
1690 GB of instance storage
OS: Linux, 64-bit platform
With your architecture, I am not able to reach more than 600 qps.
I tried changing worker_processes to 5 and worker_connections to 1000, and on the memcached side I changed MAXCONN to 20000.
But it still doesn’t go above 600 qps.
If I instead change my Java code and let it handle setting and getting data from memcached (instead of nginx), I am able to reach 900 qps.
Thank you for this wonderful article. Thank you once again for the reference to gliffy.com.
…just stumbled over your article while being redirected from one page to another on the nginx topic – actually I forgot how I came to start browsing about nginx at all :-) but just wanted to say thanks for the shared information. Great read!
Hello, I’m also a great fan of NGINX and its caching system (I’ve developed a plugin to invalidate Nginx cache from wordpress http://wordpress.org/extend/plugins/nginx-manager/)
With your setup, is it possible to use this nginx module https://github.com/FRiCKLE/ngx_cache_purge to delete cache entries? Or do I have to manage it at the application level?
How do you manage cache lifetime with your system?
Thanks
–
Simone
Hi Simone,
you’d need to invalidate the cache from the application, or use another plugin for memcached invalidation. It is not related to nginx’s cache, but memcached itself.
You can use the memc-nginx-module, which provides delete, so you can have your purge url that purges from memcached instead of nginx’s cache.
Regards,
Francisco