Varnish'ing over Drupal

by ekes

I've recently been spending more time getting Drupal sites set up to survive a storm of traffic. We've had one site running the multiple Apache mirrors trick using Boost, and now a few sites use Varnish as a specialised reverse proxy.

There's some great work going on for Drupal 7 to make it more cache friendly: not actually starting sessions until they are needed, and adding appropriate headers, for example. There are several issues to take into account here. For example, Drupal is going to serve some content in different languages at the same URL, even if in the future that may only be the front page. It will also serve content to authorised, logged-in users as well as anonymous users at the same URL. Even if we cache cleverly, we need to make sure that any other caches on the way do so too, or don't cache at all.

Varnish is quite nice for this, as you can write pretty complicated caching rules very simply in its VCL syntax. What I have here is just a start on what could be done, especially once Drupal is more cache aware. For now I'm just correcting some of the things that in the future will be done by core.

'Static' files

Drupal sends lots of files that can be cached, and they could carry headers so that they aren't re-requested, or so that if they are they receive a 304 Not Modified response rather than the whole file. Which files these are will depend on your site setup: some content changes for logged-in users, or you may have messages displayed to users who aren't logged in. Anything in your /sites directory (CSS, JavaScript) and, depending on where you put them, uploaded files are really promising candidates for general caching. In Varnish this can be as simple as:
  sub vcl_recv {
    if (req.request == "GET" && req.url ~ "^/sites/") {
      /* we only ever want to deal with GET requests, we are working
      /* on the assumption that everything in sites is served the same
      /* to all users so we don't want the cookie */

      unset req.http.cookie;
      lookup;
    }
  }

  sub vcl_fetch {
    if (req.request == "GET" && req.url ~ "^/sites/") {
      /* we can unset the Cookie Drupal adds, set a lifetime for the object
      /* and make it cacheable */

      unset obj.http.Set-Cookie;
      set obj.cacheable = true;
      # we can set how long Varnish will keep the object here, or later
      # set obj.ttl = 30m;
      # debug add this and you'll see it in the headers if we came here
      # set obj.http.X-Drupal-Varnish-Debug = "1";
    }
    if (obj.cacheable) {
      /* Things common to all cacheable objects, here it removes
      /* the Expires that are often in the past, sets cache control
      /* and how long varnish will keep it
      /* and mark it for delivery (and storing) */

      unset obj.http.expires;
      set obj.http.cache-control = "max-age=900";
      set obj.ttl = 1w;
      set obj.http.magicmarker = "1";
      deliver;
    }
  }

  sub vcl_deliver {
    if (resp.http.magicmarker) {
      /* unset marker and serve it for upstream as new */
      unset resp.http.magicmarker;
      set resp.http.age = "0";
    }
  }
As I put my files in /files, and as the file links that want storing often have a language code before them, I use another block, elsif (req.request == "GET" && req.url ~ "^(/[a-zA-Z]{2})?/files/"), and can use this to set alternative cache times on them. This could also help with the load created by the private file method: using the trick below for knowing which files can be seen by anonymous users, those files can be given modified, cacheable headers, reducing the number of times PHP has to serve files that anonymous users are allowed to see.
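
A minimal sketch of what that extra /files block could look like, in the same Varnish 2.0 syntax as above. It assumes the files are public and served identically to everyone; the one hour lifetime and the max-age value are made-up numbers for illustration, not the ones from my config. The block calls deliver itself so that the common cacheable block doesn't overwrite its ttl.

```vcl
sub vcl_recv {
  if (req.request == "GET" && req.url ~ "^/sites/") {
    unset req.http.cookie;
    lookup;
  }
  elsif (req.request == "GET" && req.url ~ "^(/[a-zA-Z]{2})?/files/") {
    /* uploaded files, with or without a two letter language prefix;
       assumed to be the same for all users so the cookie can go */
    unset req.http.cookie;
    lookup;
  }
}

sub vcl_fetch {
  if (req.request == "GET" && req.url ~ "^(/[a-zA-Z]{2})?/files/") {
    unset obj.http.Set-Cookie;
    unset obj.http.expires;
    set obj.cacheable = true;
    # illustrative values only: a shorter lifetime than for /sites
    set obj.http.cache-control = "max-age=300";
    set obj.ttl = 1h;
    set obj.http.magicmarker = "1";
    deliver;
  }
}
```

The magicmarker header is picked up by the same vcl_deliver routine shown above, so the Age header is reset for these files too.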

Caching anonymous page views

In addition to the language and automatic session cookie issues mentioned above, our cache doesn't know whether users are logged in. We want to cache the anonymous page views (at least those that won't have drupal_set_messages, or other individually changing content, on them). Boost does this neatly by setting its own cookie when users log in and unsetting it when they log out. I've used this alongside Boost, checking for req.http.Cookie !~ "DRUPAL_UID", and also pinched the code to make a very simple Varnish helper module. So, adding some caching for these pages:
  sub vcl_recv {
    ...
    elsif (req.request == "GET" && req.http.Cookie !~ "DRUPAL_VARNISH") {
      /* this site has drupal_set_messages and importantly changing content
      /* for anon users only on the /user page */

      /* It was tempting to unset req.http.cookie; here but it's needed to
      /* stop users who log out getting the last page they saw logged in */

      lookup;
    }
  }

  sub vcl_fetch {
    ...
    elsif (req.request == "GET" && req.http.cookie !~ "DRUPAL_VARNISH") {
      if (req.url !~ "(/[a-zA-Z]{2})?/user" && req.url !~ "(/[a-zA-Z]{2})?/admin") {
        /* We don't want the ttl so long on these pages, so we must set
        /* it in the different if blocks rather than cacheable here */

        set obj.ttl = 30m;
        unset obj.http.Set-Cookie;
        if (req.url !~ "^/[a-zA-Z]{2}/") {
          /* make sure that language is taken into account on caching pages
          /* without a language code in the url, and make sure that caches
          /* know if there is a cookie with the page it's not to use the
          /* cached one */

          set obj.http.Vary = "Accept-Language, Cookie";
        }
        else {
          set obj.http.Vary = "Cookie";
        }
      }
    }
  }

Older versions of Varnish

Varnish comes out of EPEL for RHEL/CentOS, and is pretty up to date. I've also done this on a Debian stable box, and as the version of Varnish there is older the supported VCL syntax is a bit more limited. You can't change the obj.cacheable boolean, for example, so I used an obj.http.value header that I then unset. The comparison (a !~ b) was causing errors, when (! a ~ b) didn't. The command deliver is called insert, and unset is remove.
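
As a rough, untested sketch, the 'static' files rules rewritten with only the substitutions described above would look something like this. The X-Cacheable header name is made up for illustration, the ttl is written in seconds in case the older parser doesn't know the w unit, and I'm assuming the older version accepts the header test as written.

```vcl
sub vcl_recv {
  if (req.request == "GET" && req.url ~ "^/sites/") {
    /* remove instead of unset */
    remove req.http.cookie;
    lookup;
  }
}

sub vcl_fetch {
  if (req.request == "GET" && req.url ~ "^/sites/") {
    remove obj.http.Set-Cookie;
    /* obj.cacheable can't be changed, so mark the object with a
       (hypothetical) header instead and remove it again below */
    set obj.http.X-Cacheable = "1";
  }
  if (obj.http.X-Cacheable ~ "1") {
    remove obj.http.X-Cacheable;
    remove obj.http.expires;
    set obj.http.cache-control = "max-age=900";
    set obj.ttl = 604800s;   # one week
    /* insert instead of deliver */
    insert;
  }
}
```

For the anonymous page rules the negated match would then be written as (! req.http.Cookie ~ "DRUPAL_VARNISH") rather than with !~.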

Comments

Older versions of Varnish

> I've done this also with a Debian stable box and as the version
> of Varnish is older the supported syntax for vcl is a bit more
> limited.

I found it to be impossible to get Pressflow working with varnish 1.1.2 from Debian GNU/Linux "Lenny".

Would you mind elaborating on your "Debian stable" setup?

Thanks & greetings, -asb

Incorrect operator

First of all, thanks for the article, you have some good tips here.

I did notice one problem, though: you have an incorrect operator in the second sub vcl_fetch routine:

elsif (req.request == "GET" && req.http.cookie ~! "DRUPAL_VARNISH") {

should read

elsif (req.request == "GET" && req.http.cookie !~ "DRUPAL_VARNISH") {

A Drupal 6 version of the module

Just the basic D5 module linked above with the basic changes for Drupal 6: varnish module 2009-05-08

And even on a quieter site the effort can be worth it

for that sudden rise in traffic. Varnish graph below is for a Drupal site that has a lot of anon node and comment posts and traffic, but not really that many logged in users:

May First graph

Except for a huge load spike at the start of the day, which solved itself without intervention, the rest wasn't really noticeable on Apache/MySQL/load.

I see you use the munin

I see you use Munin graphs. I use it too, but I don't know what the script (or conf) should be for analysing Varnish.
So I'd like to ask: can you share it with me?

thanks

Another path to exclude

with /admin and /user for Drupal 6 is /openid
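
A sketch of how the exclusion in the article's vcl_fetch could be extended, assuming the same Varnish 2.0 syntax and the DRUPAL_VARNISH cookie from the helper module. Folding the three paths into one anchored regex is my own choice here, not something from the article, and like the original it matches any path that merely starts with /user, /admin or /openid.

```vcl
sub vcl_fetch {
  if (req.request == "GET" && req.http.cookie !~ "DRUPAL_VARNISH") {
    /* skip /user, /admin and /openid, with or without a two letter
       language prefix */
    if (req.url !~ "^(/[a-zA-Z]{2})?/(user|admin|openid)") {
      set obj.ttl = 30m;
      unset obj.http.Set-Cookie;
    }
  }
}
```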