Part 5.2 Varnish

Varnish

This chapter will teach you about the web accelerator proxy cache: Varnish.


Objectives: You will learn how to:

✔ Install and configure Varnish;
✔ Cache the content of a website.

🏁 reverse-proxy, cache

Knowledge: ⭐ ⭐
Complexity: ⭐ ⭐ ⭐

Reading time: 30 minutes


Generalities

Varnish is an HTTP reverse proxy cache, also known as a website accelerator.

Varnish receives HTTP requests from visitors:

  • if the response to the request is already in the cache, Varnish returns it directly to the client from the server's memory,
  • if the response is not in the cache, Varnish sends the request to the web server, retrieves the response, stores it in its cache, and responds to the client.

Responding from the in-memory cache improves response times for clients. In this case, there is no access to physical disks.

By default, Varnish listens on port 6081 and uses VCL (Varnish Configuration Language) for its configuration. Thanks to VCL, it is possible to:

  • decide which content is delivered to the client,
  • decide which content is cached,
  • decide from which site the content is fetched and how the response is modified.

Varnish is extensible with VMOD modules (Varnish Modules).

Ensuring high availability

The use of several mechanisms ensures high availability throughout a web chain:

  • If Varnish is behind load balancers (LBs), these are already in HA mode, as LBs generally run as a cluster. A check from the LBs verifies Varnish availability. If a Varnish server no longer responds, it is automatically removed from the pool of available servers. In this case, Varnish is in ACTIVE/ACTIVE mode.
  • If Varnish is not behind an LB cluster, clients address a VIP (see Heartbeat chapter) shared between the 2 Varnish servers. In this case, Varnish is in ACTIVE/PASSIVE mode. The VIP switches to the second Varnish node if the active server is unavailable.
  • When a backend is no longer available, you can remove it from the Varnish backend pool, either automatically (with a health check) or manually in CLI mode (useful for easing upgrades or updates).

Ensuring scalability

If the backends are no longer sufficient to support the workload:

  • either add more resources to the backends and reconfigure the middleware
  • or add another backend to the varnish backend pool

Facilitating scalability

A web page is often composed of HTML (often dynamically generated by PHP) and more static resources (JPG, GIF, CSS, JS, and so on). It quickly becomes worthwhile to cache the cacheable resources (the static ones), which offloads many requests from the backends.

Note

Caching web pages (HTML, PHP, ASP, JSP, etc.) is possible but more complicated. You need to know the application and whether the pages are cacheable, which should be true with a REST API.

When clients access a web server directly, the server must return the same image every time a client requests it. Once a client has received the image for the first time, it is cached on the browser side, depending on the configuration of the site and the web application.

When clients access the server through a properly configured cache server, the first client requesting the image triggers an initial backend request. The image is then cached for a certain period of time and delivered directly from the cache to the other clients requesting the same resource.

A well-configured browser-side cache also reduces the number of requests to the backend. It complements a Varnish proxy cache rather than replacing it.

TLS certificate management

Varnish cannot communicate in HTTPS (and it is not its role to do so).

The certificate must, therefore, be either:

  • carried by the LB when the flow passes through it (the recommended solution, as it centralizes certificate management). The flow then passes unencrypted through the data center.
  • carried by an Apache, Nginx, or HAProxy service on the Varnish server itself, which only acts as a proxy to Varnish (from port 443 to port 80). This solution is useful if clients access Varnish directly.
  • Similarly, Varnish cannot communicate with backends on port 443. When necessary, use an Nginx or Apache reverse proxy between Varnish and the backend to handle the TLS connection.

How it works

In a basic Web service, the client communicates directly with the service with TCP on port 80.

How a standard website works

To use the cache, the client must communicate with the web service on the default Varnish port 6081.

How Varnish works by default

To make the service transparent to the client, you must change the default listening port for Varnish and the web service vhosts.

Transparent implementation for the customer

To provide an HTTPS service, add either a load balancer upstream of the varnish service or a proxy service on the varnish server, such as Apache, Nginx, or HAProxy.

Configuration

Installation is simple:

dnf install -y varnish
systemctl enable varnish
systemctl start varnish

Configuring the varnish daemon

With systemd, the Varnish parameters are set in the service file /usr/lib/systemd/system/varnish.service:

[Unit]
Description=Varnish Cache, a high-performance HTTP accelerator
After=network-online.target

[Service]
Type=forking
KillMode=process

# Maximum number of open files (for ulimit -n)
LimitNOFILE=131072

# Locked shared memory - should suffice to lock the shared memory log
# (varnishd -l argument)
# Default log size is 80MB vsl + 1M vsm + header -> 82MB
# unit is bytes
LimitMEMLOCK=85983232

# Enable this to avoid "fork failed" on reload.
TasksMax=infinity

# Maximum size of the corefile.
LimitCORE=infinity

ExecStart=/usr/sbin/varnishd -a :6081 -f /etc/varnish/default.vcl -s malloc,256m
ExecReload=/usr/sbin/varnishreload

[Install]
WantedBy=multi-user.target

Change the default values with systemctl edit varnish.service, which creates the /etc/systemd/system/varnish.service.d/override.conf file. The empty ExecStart= line clears the value inherited from the unit file before the new one is set:

$ sudo systemctl edit varnish.service
[Service]
ExecStart=
ExecStart=/usr/sbin/varnishd -a :6081 -f /etc/varnish/default.vcl -s malloc,512m

The -s option specifies a cache storage backend; you can give it several times. Possible storage types are malloc (cache in memory, then swap if needed) or file (create a file on disk, then map it to memory). Sizes are expressed in K/M/G/T (kilobytes, megabytes, gigabytes, or terabytes).

Configuring the backends

Varnish uses a specific language called VCL for its configuration.

Varnish compiles the VCL configuration file to C. If the compilation succeeds without warnings, the service can be reloaded.

You can test the varnish configuration with the following command:

varnishd -C -f /etc/varnish/default.vcl

Note

It is advisable to check the VCL syntax before restarting the varnishd daemon.

Reload the configuration with the command:

systemctl reload varnish

Warning

A systemctl restart varnish empties the Varnish cache and causes a peak load on the backends. You should, therefore, avoid restarting Varnish and reload it instead.

Note

To configure Varnish, please follow the recommendations on this page: https://www.getpagespeed.com/server-setup/varnish/varnish-virtual-hosts.

VCL language

Subroutines

Varnish uses VCL files, segmented into subroutines containing the actions to run. These subroutines run only in the specific cases they define. The default /etc/varnish/default.vcl file contains the vcl_recv, vcl_backend_response and vcl_deliver routines:

#
# This is an example VCL file for Varnish.
#
# It does not do anything by default, delegating control to the
# builtin VCL. The builtin VCL is called when there is no explicit
# return statement.
#
# See the VCL chapters in the Users Guide at https://www.varnish-cache.org/docs/
# and http://varnish-cache.org/trac/wiki/VCLExamples for more examples.

# Marker to tell the VCL compiler that this VCL has been adapted to the
# new 4.0 format.
vcl 4.0;

# Default backend definition. Set this to point to your content server.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {

}

sub vcl_backend_response {

}

sub vcl_deliver {

}

  • vcl_recv: routine called before sending the request to the backend. In this routine, you can modify HTTP headers and cookies, choose the backend, etc. See the set req. actions.
  • vcl_backend_response: routine called after receiving the backend response (beresp means BackEnd RESPonse). See the set bereq. and set beresp. actions.
  • vcl_deliver: routine called before delivering the response to the client. If you need to modify the final object (e.g., add or remove a header), you can do so in vcl_deliver.

VCL operators

  • =: assignment
  • ==: comparison
  • ~: match against a regular expression or an ACL
  • !: negation
  • &&: logical AND
  • ||: logical OR

Varnish objects

  • req: the request object. Varnish creates req when it receives the request. Most of the work in the vcl_recv subroutine concerns this object.
  • bereq: the request object destined for the web server. Varnish creates this object from req.
  • beresp: the web server response object. It contains the object headers from the application. You can modify the server response in the vcl_backend_response subroutine.
  • resp: the HTTP response sent to the client. Modify this object with the vcl_deliver subroutine.
  • obj: the cached object. Read-only.

Varnish actions

The most frequent actions:

  • pass: When returned, the request goes to the application server and the response comes from it, without any caching. pass is returned from the vcl_recv subroutine.
  • hash: When returned from vcl_recv, Varnish looks the content up in the cache, even if the request would otherwise have been passed without caching.
  • pipe: Used to manage flows. In this case, Varnish will no longer inspect each request but let all bytes pass to the server. Websockets or video stream management, for example, use pipe.
  • deliver: Delivers the object to the client. Usually from the vcl_backend_response subroutine.
  • restart: Restarts the request processing process. Retains modifications to the req object.
  • retry: Transfers the request back to the application server. Used from vcl_backend_response or vcl_backend_error if the application response is unsatisfactory.
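
As a rough illustration of how these actions are returned from subroutines, here is a minimal sketch. The backend address, the WebSocket test on the Upgrade header, the /account/ and /static/ paths, and the limit of two retries are all assumptions chosen for the example, not recommendations:

```vcl
vcl 4.0;

# Hypothetical backend, present only to make the example complete.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # Hand WebSocket upgrades over to the backend as a raw byte stream.
    if (req.http.Upgrade ~ "(?i)websocket") {
        return (pipe);
    }

    # Never cache the (hypothetical) account area.
    if (req.url ~ "^/account/") {
        return (pass);
    }

    # Always look up the (hypothetical) static area in the cache.
    if (req.url ~ "^/static/") {
        return (hash);
    }
}

sub vcl_pipe {
    # Forward the upgrade headers so the backend can complete the handshake.
    if (req.http.Upgrade) {
        set bereq.http.Upgrade = req.http.Upgrade;
        set bereq.http.Connection = req.http.Connection;
    }
}

sub vcl_backend_response {
    # Ask the backend again, at most twice, when it answers with a 503.
    if (beresp.status == 503 && bereq.retries < 2) {
        return (retry);
    }
}
```

Requests that match none of these tests reach no explicit return statement and therefore fall through to the builtin VCL.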

In summary, the diagram below illustrates the possible interactions between subroutines and actions:

Transparent implementation for the customer

Verification/Testing/Troubleshooting

It is possible to verify that a page comes from the varnish cache from the HTTP response headers:

Simplified varnish operation

Backends

Varnish uses the term backend for the vhosts it needs to proxy.

You can define several backends on the same Varnish server.

Backends are configured in /etc/varnish/default.vcl.
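
The health check mentioned earlier in this chapter is declared as a probe inside the backend definition. The following is a minimal sketch in which the backend name, the address, the probed URL, and all timing values are assumptions to adapt to your own site:

```vcl
vcl 4.0;

# Hypothetical backend with a health check (probe).
backend docs {
    .host = "127.0.0.1";
    .port = "8080";
    .probe = {
        .url = "/";        # page requested by each health check
        .interval = 5s;    # time between two checks
        .timeout = 1s;     # maximum wait for an answer
        .window = 5;       # number of recent checks taken into account
        .threshold = 3;    # checks in the window that must succeed
    }
}
```

With such a probe, Varnish marks the backend as sick by itself when fewer than 3 of the last 5 checks succeed, and stops sending requests to it until it recovers.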

ACL management

# Deny ACL
acl deny {
    "10.10.0.10"/32;
    "192.168.1.0"/24;
}

Apply ACL:

# Block ACL deny IPs (in the vcl_recv subroutine)
if (client.ip ~ deny) {
    return (synth(403, "Access forbidden"));
}

Do not cache certain pages:

# Do not cache login and admin pages
if (req.url ~ "/(login|admin)") {
  return (pass);
}

POST and cookies settings

Varnish never caches HTTP POST requests or requests containing cookies (whether from the client or the backend).

If the backend uses cookies, content caching will not occur.

To correct this behavior, you can unset the cookies in your requests:

sub vcl_recv {
    unset req.http.cookie;
}

sub vcl_backend_response {
    unset beresp.http.set-cookie;
}
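
Removing every cookie also removes session cookies, which dynamic pages usually need, so a more selective variant may be preferable. The sketch below strips cookies for static resources only. The backend address, the list of file extensions, and the one-hour lifetime are assumptions:

```vcl
vcl 4.0;

# Hypothetical backend, present only to make the example complete.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # Static resources rarely depend on cookies (to be verified per site).
    if (req.url ~ "(?i)\.(css|js|jpg|jpeg|gif|png|ico)(\?.*)?$") {
        unset req.http.Cookie;
    }
}

sub vcl_backend_response {
    # Same test on the backend side: drop Set-Cookie and cache for a while.
    if (bereq.url ~ "(?i)\.(css|js|jpg|jpeg|gif|png|ico)(\?.*)?$") {
        unset beresp.http.Set-Cookie;
        set beresp.ttl = 1h;    # arbitrary example lifetime
    }
}
```

All other requests keep their cookies and follow the default behavior described above.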

Distribute requests to different backends

When hosting several sites, such as a document server and a wiki, it is possible to distribute requests to the right backend.

Backends declaration:

backend docs {
    .host = "127.0.0.1";
    .port = "8080";
}

backend wiki {
    .host = "127.0.0.1";
    .port = "8081";
}

The vcl_recv subroutine sets req.backend_hint according to the host called in the HTTP request:

sub vcl_recv {
    if (req.http.host ~ "^doc\.rockylinux\.org$") {
        set req.backend_hint = docs;
    }

    if (req.http.host ~ "^wiki\.rockylinux\.org$") {
        set req.backend_hint = wiki;
    }
}

Load distribution

Varnish can handle load balancing with specific backends called directors.

The round-robin director distributes requests to the backends in turn (alternately). With the random director, you can assign a weight to each backend.

The client director distributes requests according to a sticky session affinity on any header element (that is, with a session cookie). In this case, a client is always returned to the same backend.

Backends declaration

backend docs1 {
    .host = "192.168.1.10";
    .port = "8080";
}

backend docs2 {
    .host = "192.168.1.11";
    .port = "8080";
}

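The director allows you to associate the 2 defined backends.

Director declaration (since VCL 4.0, directors are provided by the directors VMOD):

import directors;

sub vcl_init {
    new docs_director = directors.round_robin();
    docs_director.add_backend(docs1);
    docs_director.add_backend(docs2);
}

All that remains is to define the director as the backend for the requests:

sub vcl_recv {
    set req.backend_hint = docs_director.backend();
}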
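
The sticky affinity described above for the client director can be approximated in a similar way. This is only a sketch, assuming the hash director of the directors VMOD and using the Cookie header as the affinity key. The director name docs_sticky and the equal weights are hypothetical choices:

```vcl
vcl 4.0;

import directors;

backend docs1 {
    .host = "192.168.1.10";
    .port = "8080";
}

backend docs2 {
    .host = "192.168.1.11";
    .port = "8080";
}

sub vcl_init {
    # Hypothetical director name; both backends get the same weight.
    new docs_sticky = directors.hash();
    docs_sticky.add_backend(docs1, 1.0);
    docs_sticky.add_backend(docs2, 1.0);
}

sub vcl_recv {
    # The same Cookie header always hashes to the same backend.
    set req.backend_hint = docs_sticky.backend(req.http.Cookie);
}
```

Any other stable string, such as a single session identifier extracted from the cookie, could serve as the key instead of the whole header.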

Managing backends with CLI

Marking backends as sick or healthy is possible for administration or maintenance purposes. This action allows you to remove a node from the pool without modifying the Varnish server configuration (without restarting it) or stopping the backend service.

View backend status: The backend.list command displays all backends, even those without a health check (probe).

$ varnishadm backend.list
Backend name                   Admin      Probe
site.default                   probe      Healthy (no probe)
site.front01                   probe      Healthy 5/5
site.front02                   probe      Healthy 5/5

To switch from one state to another:

varnishadm backend.set_health site.front01 sick

varnishadm backend.list
Backend name                   Admin      Probe
site.default                   probe      Healthy (no probe)
site.front01                   sick       Sick 0/5
site.front02                   probe      Healthy 5/5

varnishadm backend.set_health site.front01 healthy

varnishadm backend.list
Backend name                   Admin      Probe
site.default                   probe      Healthy (no probe)
site.front01                   probe      Healthy 5/5
site.front02                   probe      Healthy 5/5

To let Varnish decide on the state of its backends again, you must switch the backends that were manually set to sick or healthy back to auto mode.

varnishadm backend.set_health site.front01 auto

For examples of backend declarations, see: https://github.com/mattiasgeniar/varnish-6.0-configuration-templates.

Apache logs

As the HTTP service is reverse proxied, the web server no longer sees the client's IP address, only that of the Varnish service.

To take reverse proxy into account in Apache logs, change the format of the event log in the server configuration file:

LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" varnishcombined

and take this new format into account in the website vhost:

CustomLog /var/log/httpd/www-access.log.formatux.fr varnishcombined

and have Varnish pass the client's address in the X-Forwarded-For header (in the vcl_recv subroutine):

if (req.restarts == 0) {
    if (req.http.X-Forwarded-For) {
        set req.http.X-Forwarded-For = req.http.X-Forwarded-For + ", " + client.ip;
    } else {
        set req.http.X-Forwarded-For = client.ip;
    }
}

Cache purge

A few requests to purge the cache:

on the command line:

varnishadm 'ban req.url ~ .'

using a secret and a port other than the default:

varnishadm -S /etc/varnish/secret -T 127.0.0.1:6082 'ban req.url ~ .'

on the CLI:

varnishadm

varnish> ban req.url ~ ".css$"
200

varnish> ban req.http.host == example.com
200

varnish> ban req.http.host ~ .
200

via an HTTP PURGE request:

curl -X PURGE http://example.com/foo.txt

Configure Varnish to accept this request with:

acl local {
    "localhost";
    "10.10.1.50";
}

sub vcl_recv {
    # directive to be placed first,
    # otherwise another directive may match first
    # and the purge will never be performed
    if (req.method == "PURGE") {
        if (client.ip ~ local) {
            return (purge);
        }
        # refuse PURGE requests from any other address
        return (synth(405, "Not allowed"));
    }
}
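
The bans shown earlier on the command line can also be triggered over HTTP. The following sketch assumes a custom BAN request method and a hypothetical ACL named ban_allowed. It uses the VCL ban() function to build the same kind of expression as the CLI examples:

```vcl
vcl 4.0;

# Hypothetical backend, present only to make the example complete.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

# Hypothetical ACL of the addresses allowed to send BAN requests.
acl ban_allowed {
    "localhost";
}

sub vcl_recv {
    if (req.method == "BAN") {
        if (client.ip !~ ban_allowed) {
            return (synth(405, "Not allowed"));
        }
        # Invalidate the cached objects of this host whose URL matches
        # the regular expression sent as the request URL.
        ban("req.http.host == " + req.http.host +
            " && req.url ~ " + req.url);
        return (synth(200, "Ban added"));
    }
}
```

Unlike a purge, which removes one object immediately, a ban is an expression that Varnish checks against cached objects before serving them, so it can invalidate many objects at once.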

Log management

Varnish writes its logs in memory, in binary format, so as not to penalize its performance. When it runs out of memory space, it overwrites the oldest records, starting again from the beginning of its memory space.

It is possible to consult the logs with the varnishstat (statistics), varnishtop (top for Varnish), varnishlog (verbose logging), or varnishncsa (logs in NCSA format, like Apache) tools:

varnishstat
varnishtop -i ReqURL
varnishlog
varnishncsa

Use the -q option to apply filters to the logs:

varnishlog -q 'TxHeader eq MISS' -q "ReqHeader ~ '^Host: rockylinux\.org$'"
varnishncsa -q "ReqHeader eq 'X-Cache: MISS'"

The varnishlog and varnishncsa daemons write logs to disk independently of the varnishd daemon. The varnishd daemon continues to populate its logs in memory without penalizing performance towards clients; the other daemons then copy the logs to disk.

Workshop

For this workshop, you will need one server with Apache services installed, configured, and secured, as described in the previous chapters.

You will configure a reverse proxy cache in front of it.

Your server has the following IP addresses:

  • server1: 192.168.1.10

If you do not have a service to resolve names, fill the /etc/hosts file with content like the following:

$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.1.10 server1 server1.rockylinux.lan

Task 1: Installation and configuration of Apache

sudo dnf install -y httpd mod_ssl
sudo systemctl enable httpd --now
sudo firewall-cmd --permanent --add-service=http
sudo firewall-cmd --permanent --add-service=https
sudo firewall-cmd --reload
echo "<html><body>Node $(hostname -f)</body></html>" | sudo tee "/var/www/html/index.html"

Verify:

$ curl http://server1.rockylinux.lan
<html><body>Node server1.rockylinux.lan</body></html>

$ curl -I http://server1.rockylinux.lan
HTTP/1.1 200 OK
Date: Mon, 12 Aug 2024 13:16:18 GMT
Server: Apache/2.4.57 (Rocky Linux) OpenSSL/3.0.7
Last-Modified: Mon, 12 Aug 2024 13:11:54 GMT
ETag: "36-61f7c3ca9f29c"
Accept-Ranges: bytes
Content-Length: 54
Content-Type: text/html; charset=UTF-8

Task 2: Install varnish

sudo dnf install -y varnish
sudo systemctl enable varnish --now
sudo firewall-cmd --permanent --add-port=6081/tcp
sudo firewall-cmd --reload

Task 3: Configure Apache as a backend

Modify /etc/varnish/default.vcl to use Apache (port 80) as the backend:

# Default backend definition. Set this to point to your content server.
backend default {
    .host = "127.0.0.1";
    .port = "80";
}

Reload Varnish:

sudo systemctl reload varnish

Check if varnish works:

$ curl -I http://server1.rockylinux.lan:6081
HTTP/1.1 200 OK
Server: Apache/2.4.57 (Rocky Linux) OpenSSL/3.0.7
X-Varnish: 32770 6
Age: 8
Via: 1.1 varnish (Varnish/6.6)

$ curl http://server1.rockylinux.lan:6081
<html><body>Node server1.rockylinux.lan</body></html>

As you can see, Apache serves the index page.

Varnish has added some headers: Via shows that Varnish handled the request, and Age gives the time the page has spent in the cache, which tells us that the page was served directly from Varnish memory rather than by Apache from disk.

Task 4: Remove some headers

We will remove some headers that give away unnecessary information to attackers.

In the vcl_deliver subroutine, add the following:

sub vcl_deliver {
    unset resp.http.Server;
    unset resp.http.X-Varnish;
    unset resp.http.Via;
    set resp.http.node = "F01";
    set resp.http.X-Cache-Hits = obj.hits;
    if (obj.hits > 0) { # Add debug header to see if it is a HIT/MISS and the number of hits, disable when not needed
      set resp.http.X-Cache = "HIT";
    } else {
      set resp.http.X-Cache = "MISS";
    }
}

Test your config and reload varnish:

$ sudo varnishd -C -f /etc/varnish/default.vcl
...
$ sudo systemctl reload varnish

Check the differences:

$ curl -I http://server1.rockylinux.lan:6081
HTTP/1.1 200 OK
Age: 4
node: F01
X-Cache-Hits: 1
X-Cache: HIT
Accept-Ranges: bytes
Connection: keep-alive

As you can see, the unwanted headers are gone and the headers needed for troubleshooting have been added.

Conclusion

You now have all the knowledge you need to set up a basic cache server and add functionality to it.

Having a varnish server in your infrastructure can be very useful for many things besides caching: for backend server security, for handling headers, for facilitating updates (blue/green or canary mode, for example), etc.

Check your Knowledge

✔ Can Varnish host static files?

  • True
  • False

✔ Does the varnish cache have to be stored in memory?

  • True
  • False

Author: Antoine Le Morvan

Contributors: Ganna Zhyrnova