Just enough Ansible for Drupal

Introduction

I had been getting by with shell scripts and SFTP to deploy Drupal sites until recently. After using Ansible for a few weeks, I realized how much I had been missing all this time. In this post, I share some of my notes on how to use Ansible to set up and deploy Drupal infrastructure. It is also meant to be a full-blown introduction to Ansible. A lot of tutorials don't go beyond the "hello world" realm and I wanted to, hence an epic post!

There are some assumptions I'm holding about the reader and the setup, like:

  • You know what Ansible is and what it does. In simple terms, it is a tool to automate your IT infrastructure.
  • You already have Ansible set up and installed on your OS, whatever it is. The Ansible site contains instructions for each OS. Make sure you download the latest stable version of Ansible, which is 2.3.1.0 at the time of writing this.

  • Most Ansible walkthroughs tell you how to set up and provision stuff using Vagrant. Though this is useful in a pedagogical sense, we don't use Ansible to provision on Vagrant in production. I suggest you try it on a real setup, like DigitalOcean or AWS. This post will use DigitalOcean (affiliate link), where you will run a server for a few hours at most. It will incur costs, but you will end up spending less than the cost of a latte for running a DO machine for a few hours. Besides, you get a $10 credit while signing up, which means you can spin up and use a 1GB RAM machine for 1 full month. That's good enough to get going with Ansible.

  • You have an idea of how to set up Drupal, as in, it runs on a LAMP stack and requires the settings.php file.

  • I'll stick to using the latest stable versions of everything. I'll run this on an Ubuntu 16.04 box. I'll set up Drupal on top of PHP 7. We will modify our Ansible scripts to tweak PHP versions later in the article.

With assumptions out of the way, let's dive in and provision infrastructure. Your first step would be to sign up with DigitalOcean (affiliate link) if you haven't already, and create a $10 Ubuntu 16.04 server. It could be from any region, preferably with some geographical proximity to your location. Go do it! Later in this article, we shall automate this server creation task as well using Ansible.

Once your machine is created (takes about a minute), note down its IP address; we are going to need it in a moment. Meanwhile, here's a brief intermezzo about how Ansible works. It communicates with target servers, runs remote commands etc. via SSH. This means that you need not install any agent software on your infrastructure for Ansible to work. The fancy term for this is agentless architecture. This is one of the USPs of Ansible.

Ansible uses YAML syntax to describe the instructions needed to do a job, which in our case is to deploy Drupal. Each of these instructions is called a "task" in Ansible lingo. Now, there are a lot of debatable aspects to using YAML to describe tasks. The major advantage is that YAML is easy to read, write and understand for both humans and computers. The downside is that YAML is not a real programming language; it is more of a declarative format. This implies that we can't write arbitrary for-loops, functions and complex business logic. Ansible does allow the use of basic conditional and looping constructs though.

Hardly any software setup is a single task or command. It is usually a bunch of them. For instance, setting up Drupal involves:

  • Setting up the webserver, either Apache or Nginx
  • Setting up the database
  • Installing PHP and dependencies
  • Installing Drupal itself

Each of these tasks is a set of subtasks. For example, setting up the database usually involves:

  • Installing the latest stable version of MySQL
  • Creating a database user with appropriate permissions
  • Creating a new database
  • Access control and security measures (sadly, this step is skipped in many installs).

A bunch of subtasks which achieve a common goal is called a "playbook" in Ansible world.

Setup

Back to our newly minted machine. Let's ensure Ansible is able to communicate with our server. First, we need to create a configuration file for Ansible. This is called ansible.cfg and it's in INI format. This provides Ansible with a set of sensible defaults to operate on. This exists globally, but we can configure it on a per project basis.

[defaults]
hostfile = hosts
host_key_checking=false

The hostfile setting points to the file listing the machines on which our software needs to be provisioned. This list is usually called the "inventory" in Ansible. In our case, the hosts file will be a plain text file containing the IP of the new DigitalOcean server we created.

The host_key_checking setting is a boolean which indicates whether the host key should be checked when connecting over SSH. In other words, if you set this to false, you won't get this kind of prompt while running Ansible.

The authenticity of host '138.197.84.9 (138.197.84.9)' can't be established.
ECDSA key fingerprint is SHA256:cVfgg3nMw6K3sT/fQaLBiysbigx8YblQ7xaB8EtgpHw.
Are you sure you want to continue connecting (yes/no)?

We don't want any questions to prompt us and interfere with our automation, do we? There is one more caveat we should be aware of. Ansible expects a Python interpreter to be available on the remote machine. Most Linux setups have Python installed, which is the good part. The bad part is, Ansible by default looks for /usr/bin/python and expects it to be Python 2.x. So, if the remote machine only ships Python 3.x, as Ubuntu 16.04 does, Ansible bails out with this kind of an error.

104.236.233.105 | FAILED! => {
    "changed": false, 
    "failed": true, 
    "module_stderr": "Shared connection to 104.236.233.105 closed.\r\n", 
    "module_stdout": "/bin/sh: 1: /usr/bin/python: not found\r\n", 
    "msg": "MODULE FAILURE"
}

We need to tell Ansible to use the Python 3 interpreter on all the hosts it hits. We do that by adding these lines to our hosts file.

104.236.233.105

[all:vars]
ansible_python_interpreter=/usr/bin/python3

The ansible_python_interpreter setting is the absolute path of the Python interpreter on the remote machine. The alternative is to SSH into the remote machine and install Python 2.x, as in:

$ apt -y update
$ apt install -y python-minimal

…which defeats the whole point of automation.

NOTE: Jeff Geerling has an alternate workaround to this issue.
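
One commonly used pattern for this, which may or may not match the linked workaround, is to install Python 2 with the raw module before anything else runs. The raw module sends a plain command over SSH and does not need Python on the remote side. The sketch below assumes an Ubuntu host and a root login; the task names are made up for illustration.

```yaml
---
- hosts: all
  # Fact gathering needs Python on the remote host, so hold it off for now.
  gather_facts: no

  pre_tasks:
    - name: Install Python 2 if it is missing (raw does not need Python)
      raw: test -e /usr/bin/python || (apt -y update && apt install -y python-minimal)

    - name: Gather facts now that Python is available
      setup:
```

With this approach, the ansible_python_interpreter line in the hosts file would not be needed.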

The [all:vars] part needs a little explaining. It is possible to group hosts in Ansible. For example,

[webservers]
srv1.example.com
srv2.example.com

[dbs]
main.example.com
stg.example.com
bkup.example.com

The above hosts file contains 2 groups of infrastructure: web servers, indicated by [webservers], and databases, indicated by [dbs]. Groups let us run Ansible tasks only on specific group(s) of servers, referred to by group name. For example, you might want to run DB migration related tasks only on the DB servers, or update Nginx only on the web servers. Ansible also allows running tasks across all groups, using the built-in all group, which stands for all the machines in the inventory. [all:vars] is a set of variables which hold true for all the hosts in the inventory.

Let's run our first ansible command.

$ ansible all -m ping

This command runs the "ping" module, indicated by the -m flag, on the target machine and gets us the result.

104.236.233.105 | UNREACHABLE! => {
    "changed": false, 
    "msg": "Failed to connect to the host via ssh: Permission denied (publickey,password).\r\n", 
    "unreachable": true
}

This is because we haven't specified which user to connect to the machine as. We can do that with a small change:

$ ansible all -u root -m ping
104.236.233.105 | SUCCESS => {
    "changed": false, 
    "ping": "pong"
}

Success! Let's run another command to list all files in / on the remote machine. Note that we can skip the -u root part of the command if we add ansible_user=root under the variables section of the hosts file, similar to the ansible_python_interpreter variable. Let's make that change and run the next command:

$ ansible all -m shell -a 'ls /'
104.236.233.105 | SUCCESS | rc=0 >>
bin
boot
dev
etc
home
initrd.img
lib
lib64
lost+found
media
mnt
opt
proc
root
run
sbin
snap
srv
sys
tmp
usr
var
vmlinuz

Tasks

Enough with command line stuff. Let's write an Ansible playbook already. A playbook is a collection of Ansible tasks written in YAML and, like any other YAML file, begins with ---. The hosts key indicates the hosts on which this playbook will be run. The tasks section contains the list of tasks to be run. Let's run a single task of updating the apt cache.

---
- hosts: all

  tasks:
    - name: Run apt update
      apt: update_cache=yes

This runs the command apt-get update on the target machine. To run this playbook, run this in your terminal.

$ ansible-playbook playbook.yml

PLAY [all] *********************************************************************

TASK [setup] *******************************************************************
ok: [104.236.233.105]

TASK [Run apt update] **********************************************************
changed: [104.236.233.105]

PLAY RECAP *********************************************************************
104.236.233.105            : ok=2    changed=1    unreachable=0    failed=0

The TASK [setup] part gathers some general information about the target machine like distro name, CPU info, IP address etc. This is optional and you can skip this step if you don't need this information. In order to do this, set the gather_facts to False just below the hosts line.

Each task consists of an optional name (usually a one-line description of what the task does), followed by a module name (apt in this case) and the module's parameters (update_cache, which holds the value yes). The actual grunt work for each task in Ansible is done by a module. A module does a single job and can accept arguments. Modules can move files around, run shell commands, start/stop services etc. Most common tasks have modules for them in Ansible and you can also define your own. The name is optional but strongly recommended, so that other developers can read and understand what the task does.

For instance, here's a terse but less readable version of the same playbook (with gather_facts set to false).

---
- hosts: all
  gather_facts: False

  tasks:
    - apt: update_cache=yes
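
Module arguments can also be written as a YAML dictionary instead of key=value pairs on one line, which tends to read better once a task has several arguments. Here is a small sketch of the same task in that style. The cache_valid_time option of the apt module is not used elsewhere in this post and is included only to show a second argument.

```yaml
---
- hosts: all

  tasks:
    - name: Run apt update
      apt:
        update_cache: yes
        # Skip the refresh if the cache is less than an hour old (value in seconds).
        cache_valid_time: 3600
```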

We can also run it as a typical shell script using another core module, command. As in,

- command: apt-get update

Ansible might also warn us to use the apt module instead.

$ ansible-playbook playbook.yml

PLAY [all] *********************************************************************

TASK [command] *****************************************************************
changed: [104.236.233.105]
 [WARNING]: Consider using apt module rather than running apt-get


PLAY RECAP *********************************************************************
104.236.233.105            : ok=1    changed=1    unreachable=0    failed=0

Ansible is superior to shell scripts in many ways, but mainly:

  • Idempotency. You can run the same task on the same system again and again and expect the state of the system to remain the same. Shell scripts are not idempotent unless you carefully write them to be. Ansible also handles some edge cases for tasks which shell scripts don't.
  • Readability. An Ansible playbook is more human-friendly than a shell script; I'm sure this needs no explaining. The reporting capabilities of Ansible are also better, indicating which task failed, with color coding.
  • Composability. As we will see, playbooks can be made reusable and packaged across different systems. Though this can be achieved with shell scripts as well, Ansible makes it a lot easier.
  • Parallel execution. If we were to set up the Apache webserver using a playbook on 4 different servers, Ansible can run it across the 4 servers in parallel. Shell scripts don't give this capability out of the box.
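
To make the idempotency point concrete, here is a sketch that adds a line to /etc/hosts in two ways. The hostname drupal.local is made up for illustration, and the two tasks are alternatives rather than something to run together.

```yaml
---
- hosts: all

  tasks:
    # Not idempotent: appends a duplicate line on every run
    # and always reports "changed".
    - name: Add hosts entry with a shell one-liner
      shell: echo "127.0.0.1 drupal.local" >> /etc/hosts

    # Idempotent: adds the line only if it is not already there
    # and reports "ok" on later runs.
    - name: Add hosts entry with lineinfile
      lineinfile:
        dest: /etc/hosts
        line: "127.0.0.1 drupal.local"
```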

Tip: Always try to write a playbook when you think of writing a shell script, even locally!

We have successfully created our first playbook, which runs apt-get update on a remote machine. This is great, but not very useful. Let's add more tasks to it.

Installing PHP

We decided to stick with PHP 7, so let's install PHP 7 and its dependencies required to run Drupal.

tasks:
    # PHP 7 stuff
  - name: PHP | Add php-7.0 PPA
    apt_repository: repo='ppa:ondrej/php'
                state=present
                update_cache=yes

  - name: PHP | install php packages
    apt: pkg={{ item }} state=present
    with_items:
      - php7.0-fpm
      - php7.0-cli
      - php7.0-common
      - php7.0-curl
      - php7.0-json
      - php7.0-gd
      - php7.0-mcrypt
      - php7.0-odbc
      - php7.0-mbstring
      - php7.0-mysql
      - php7.0-xmlrpc
      - php7.0-opcache
      - php7.0-intl
      - php7.0-bz2
      - php7.0-xml

The "install php packages" task uses a looping pattern that is frequently seen in Ansible. For every item in the with_items list, the apt command is run with the package name substituted for {{ item }}.

Each task is followed by the status of the task and also the state change. If there is no state change, the result starts with ok:, and with changed: otherwise, each in a different color. Try running the same playbook again on the same remote machine. Your output will be similar to this.

$ ansible-playbook playbook.yml

PLAY [all] *********************************************************************

TASK [PHP | Add php-7.0 PPA] ***************************************************
ok: [104.236.233.105]

TASK [PHP | install php packages] **********************************************
ok: [104.236.233.105] => (item=[u'php7.0-fpm', u'php7.0-cli', u'php7.0-common', u'php7.0-curl', u'php7.0-json', u'php7.0-gd', u'php7.0-mcrypt', u'php7.0-odbc', u'php7.0-mbstring', u'php7.0-mysql', u'php7.0-xmlrpc', u'php7.0-opcache', u'php7.0-intl', u'php7.0-bz2', u'php7.0-xml'])

PLAY RECAP *********************************************************************
104.236.233.105            : ok=2    changed=0    unreachable=0    failed=0

Download the code at this point.

Add a few variables

Let's install the next set of dependencies, MySQL. We begin by installing the required MySQL packages.

- name: MySQL | Install MySQL
  apt: pkg={{ item }} state=present
  with_items:
    - mysql-common
    - mysql-server
    - python-mysqldb

While we are at it, let's implement some basic security measures, like disallowing root login from remote, resetting root password etc.

- name: MySQL | Disallow root login remotely
  command: 'mysql -NBe "{{ item }}"'
  with_items:
    - DELETE FROM mysql.user WHERE User='root' AND Host NOT IN ('localhost', '127.0.0.1', '::1')

- name: MySQL | Update MySQL root password for localhost root account.
  shell: >
    mysql -u root -NBe
    'ALTER USER root@localhost IDENTIFIED WITH mysql_native_password BY "^superSecurePassword123$";'

We might want the user of this playbook to set their own root user password before running the playbook. Ansible allows this using variables. Variables go in a separate section of the playbook, much like hosts or tasks, denoted by vars. Variable names are case sensitive, can contain letters, digits and underscores, and start with a letter or an underscore.

vars:
  mysql_root_password: ^superSecurePassword123$

The root password reset task then becomes:

- name: MySQL | Update MySQL root password for localhost root account.
  shell: >
    mysql -u root -NBe
    'ALTER USER root@localhost IDENTIFIED WITH mysql_native_password BY "{{ mysql_root_password }}";'
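
If the password should not be written into the playbook at all, Ansible can ask for it at run time with a vars_prompt section. This is a sketch of that alternative and not part of the downloadable code. It would take the place of the mysql_root_password entry under vars.

```yaml
---
- hosts: all

  vars_prompt:
    - name: mysql_root_password
      prompt: "MySQL root password"
      # Do not echo the typed value back to the terminal.
      private: yes

  tasks:
    - name: MySQL | Update MySQL root password for localhost root account.
      shell: >
        mysql -u root -NBe
        'ALTER USER root@localhost IDENTIFIED WITH mysql_native_password BY "{{ mysql_root_password }}";'
```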

Ansible also provides a set of predefined variables about the remote machine, like the distribution name, the default package manager etc. This info can be obtained by running ansible all -m setup (warning: will throw up a huge dump of variables) and is available for use in the playbook if gather_facts is set to true (which is the default value). These predefined variables have the prefix ansible_. For instance,

- shell: echo "Either Ubuntu or Debian 'cause I'm using '{{ ansible_pkg_mgr }}'"
  when: ansible_pkg_mgr == "apt"

Sometimes, we might want to run a task depending on the output of some other task. We can do this by "register"ing the output of the earlier task into a variable and using that variable in the later one. For example, this task gets the MySQL version and registers it in a variable.

- name: Get MySQL version.
  command: 'mysql --version'
  register: mysql_version

These variables can be used in further tasks.

- name: Run a command
  command: 'some command'
  when: "'5.7.' in mysql_version.stdout"

Now that we have set up the root user for MySQL, let's use Ansible's MySQL modules to create Drupal's DB and DB users.

First, augment the vars section.

mysql_root_home: /root
mysql_root_username: 'root'
mysql_root_password: "^superSecurePassword123$"

drupal_db:
  user: drupal
  password: "wo#24n$fTD&CqNSqD6"
  name: drupal

mysql_users:
  - name: "{{ drupal_db.user }}"
    host: "%"
    password: "{{ drupal_db.password }}"
    priv: "{{ drupal_db.name }}.*:ALL"

mysql_databases:
  - name: "{{ drupal_db.name }}"
    encoding: utf8mb4
    collation: utf8mb4_general_ci

This shows us 2 things about Ansible variables. They can be nested dictionaries, as in drupal_db.user. They can also refer to variables defined earlier, as is the case with mysql_users.0.name.

Second, the actual task to create these users and databases.

- name: MySQL | Ensure MySQL users are present.
  mysql_user:
    name: "{{ item.name }}"
    host: "{{ item.host | default('localhost') }}"
    password: "{{ item.password }}"
    priv: "{{ item.priv | default('*.*:USAGE') }}"
    state: "{{ item.state | default('present') }}"
    append_privs: "{{ item.append_privs | default('no') }}"
  with_items: "{{ mysql_users }}"


- name: MySQL | Ensure MySQL databases are present.
  mysql_db:
    name: "{{ item.name }}"
    collation: "{{ item.collation | default('utf8_general_ci') }}"
    encoding: "{{ item.encoding | default('utf8') }}"
    state: present
  with_items: "{{ mysql_databases }}"

Now, when you try to run this playbook (don't do it yet!), you might run into an issue where MySQL tries to log in as the root user without a password and fails, thus failing both the above tasks. Something along the lines of:

TASK [MySQL | Ensure MySQL users are present.] *********************************
failed: [104.236.233.105] (item={u'host': u'%', u'password': u'wo#24n$fTD&CqNSqD6', u'name': u'drupal', u'priv': u'drupal.*:ALL'}) => {"failed": true, "item": {"host": "%", "name": "drupal", "password": "wo#24n$fTD&CqNSqD6", "priv": "drupal.*:ALL"}, "msg": "unable to connect to database, check login_user and login_password are correct or /root/.my.cnf has the credentials. Exception message: (1045, \"Access denied for user 'root'@'localhost' (using password: NO)\")"}
        to retry, use: --limit @/home/lakshmi/ansible/d7/playbook.retry

When MySQL is installed remotely (in a headless fashion), no root password is set. To allow root login from Ansible and to secure our MySQL installation, we add these 2 steps, in this order, after installing MySQL (Source).

- name: Update MySQL root password for localhost root account.
  shell: >
    mysql -u root -NBe
    'ALTER USER "{{ mysql_root_username }}"@"{{ item }}" IDENTIFIED WITH mysql_native_password BY "{{ mysql_root_password }}";'
  with_items: "{{ mysql_root_hosts.stdout_lines|default([]) }}"

- name: Copy .my.cnf file with root password credentials.
  copy:
    src: .my.cnf
    dest: "{{ mysql_root_home }}/.my.cnf"
    owner: root
    group: root
    mode: 0600
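
One loose end in the password task above: it loops over mysql_root_hosts, which none of the snippets shown so far register. The linked source presumably collects it in an earlier task. Here is a minimal sketch of such a task, assuming it runs right before the password update. The query is my guess at what is needed and is not taken from the original code.

```yaml
- name: Get the list of hosts for the root user.
  command: >
    mysql -NBe
    "SELECT Host FROM mysql.user WHERE User = '{{ mysql_root_username }}'"
  register: mysql_root_hosts
  # A read-only query never changes the server, so never report "changed".
  changed_when: false
```

Each line of the query output then becomes one item in the with_items loop of the password task.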

The first task updates the MySQL password of the root user using the shell module. The second task copies a .my.cnf file from the local machine to the target machine using the copy module. Here are the contents of .my.cnf file.

[client]
user=root
password=^superSecurePassword123$

The issue with the second task is, if the MySQL credentials are changed, the .my.cnf file needs to be updated to reflect the new credentials. It would be great if we could inject the mysql_root_password variable into the cnf file. Ansible allows us to do this using the template module. It works like the copy module, except that it injects the variables into the file before copying it. Here's how our updated .my.cnf file looks now:

[client]
user="{{ mysql_root_username }}"
password="{{ mysql_root_password }}"

It's renamed with a .j2 extension to indicate that it's a template file, and placed under a templates directory as a convention. Ansible uses the Jinja templating language for its templates. The second task is now modified as follows,

- name: Copy .my.cnf file with root password credentials.
  template:
    src: templates/root-my.cnf.j2
    dest: "{{ mysql_root_home }}/.my.cnf"
    owner: root
    group: root
    mode: 0600

Now, let's run this updated playbook. After the playbook has finished running successfully, try SSHing into the machine. You should be able to log in to your MySQL terminal without giving any credentials.

Download the code at this point.

Strive for modularity

Our playbook contains a list of assorted tasks to install PHP and MySQL. We might add more tasks in future, like installing a web server like Apache or Nginx, downloading and setting up Drupal etc. Instead of having a big wall of code, splitting each set of tasks into separate files improves maintenance in the long run. Ansible has a construct called include to achieve this. There is also a provision to move all the variables to a separate file, usually called the vars.yml. This can be referred to in the main playbook under the vars_files section.

To start with, let's move the variables to vars.yml, split the current tasks into 2 files, one for PHP related stuff and another for MySQL. Then, let's update the playbook to include both these files.

---
- hosts: all

  vars_files:
    - vars.yml

  tasks:
    - include: mysql.yml
    - include: php.yml

Looks a lot cleaner. To add a task to install Apache, it's a matter of adding another file and including it in the playbook. While we are at it, let's also add 2 more files. One, extras.yml, installs Drupal related dependencies like Drush. This might, in future, contain other utilities like Composer. The other file downloads the latest stable version of Drupal 7 and places it in a folder of choice (currently hard coded).
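
An included file is nothing more than a bare list of tasks, without the hosts, vars or tasks keys. As a sketch, php.yml would hold the PHP tasks from earlier (package list shortened here for brevity):

```yaml
---
# php.yml: included from the main playbook, so it contains only tasks.
- name: PHP | Add php-7.0 PPA
  apt_repository: repo='ppa:ondrej/php' state=present update_cache=yes

- name: PHP | install php packages
  apt: pkg={{ item }} state=present
  with_items:
    - php7.0-fpm
    - php7.0-cli
    - php7.0-mysql
```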

Download the code at this point.

Handlers

Whenever we change some configuration related to Apache, we need to restart Apache for this change to take effect. In Ansible, we do this using handlers. A handler is a task which gets triggered by a notify construct. Handlers are added to a playbook in the handlers section, similar to tasks and vars. Let's add a handler to restart Apache.

handlers:
  - name: restart apache
    service: name=apache2 state=restarted

We can call this handler using notify from the "Enable modules" task. This will trigger the "restart apache" handler when the task makes a change.

- name: Apache | Enable modules
  apache2_module:
    state: present
    name: rewrite
  notify: restart apache

How is this different from invoking a task to restart Apache? Firstly, handlers get triggered only if the notifying task changes the state, and not otherwise. Secondly, a handler gets called only once (at the end) even if it is notified multiple times by several tasks.
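
To see both points in action, here is a sketch where two tasks notify the same handler. The headers module is only an example of a second Apache module and is not part of this post's playbook.

```yaml
---
- hosts: all

  tasks:
    - name: Apache | Enable rewrite module
      apache2_module:
        state: present
        name: rewrite
      notify: restart apache

    - name: Apache | Enable headers module
      apache2_module:
        state: present
        name: headers
      notify: restart apache

  handlers:
    # Runs once after the tasks, and only if at least one of them reported "changed".
    - name: restart apache
      service: name=apache2 state=restarted
```

On a second run, both modules are already enabled, neither task reports a change, and Apache is not restarted at all.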

Download the code at this point.

Multiple PHP versions

Currently, our PHP version is hard coded into the package names in the playbook. If we want to install a different version of PHP, we have to change every one of them. Let's make the PHP version configurable instead, so that it only needs to change in one place. This is just a simple change in the vars.yml file.

php_version: "5.6"
#php_version: "7.0"
php_packages:
  - "php{{ php_version }}"
  - "php{{ php_version }}-fpm"
  - "php{{ php_version }}-cli"
  - "php{{ php_version }}-common"
  - "php{{ php_version }}-curl"
  - "php{{ php_version }}-json"
  - "php{{ php_version }}-gd"
  - "php{{ php_version }}-mcrypt"
  - "php{{ php_version }}-odbc"
  - "php{{ php_version }}-mbstring"
  - "php{{ php_version }}-mysql"
  - "php{{ php_version }}-xmlrpc"
  - "php{{ php_version }}-opcache"
  - "php{{ php_version }}-intl"
  - "php{{ php_version }}-bz2"
  - "php{{ php_version }}-xml"
  - "libapache2-mod-php{{ php_version }}"

The actual task of installing PHP can get more detailed, as in uninstalling other versions of PHP if any in the system etc. For simplicity's sake, we don't include these tasks in our playbook.
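
The install task in php.yml can then loop over this list instead of over hard coded package names. A sketch, assuming the rest of the task stays as it was:

```yaml
- name: PHP | install php packages
  apt: pkg={{ item }} state=present
  # php_packages comes from vars.yml and already has the version filled in.
  with_items: "{{ php_packages }}"
```

Switching PHP versions is now a matter of changing php_version and running the playbook again.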

Download the code at this point.

Apache or Nginx?

Drupal can be configured to work with both Apache and Nginx. Our playbook currently adds support only for the former. It's easy to add Nginx support. We first add a YAML file, nginx.yml, to install and configure Nginx.

---
- name: Nginx | Install Nginx
  apt: pkg={{ item }} state=present
  with_items:
    - nginx
    - "php{{ php_version }}-fpm"

- name: Nginx | Copy over vhosts configuration
  template:
    src: templates/nginx/vhosts.conf.j2
    dest: "/etc/nginx/sites-available/drupal.conf"
    owner: root
    group: root
    mode: 0600
  notify: restart nginx

- name: Nginx | Enable site
  file:
    src: "/etc/nginx/sites-available/drupal.conf"
    dest: "/etc/nginx/sites-enabled/drupal.conf"
    state: link
  notify: restart nginx

- name: Nginx | Disable default site
  file:
    path: "/etc/nginx/sites-enabled/default"
    state: absent
  notify: restart nginx

We add a configurable variable in vars.yml called webserver which is set to either apache or nginx.

# webserver: "apache"
webserver: "nginx"

In the main playbook, this can be referenced as follows,

tasks:
  - include: mysql.yml
  - include: php.yml
  - include: "{{ webserver }}.yml"
  - include: extras.yml
  - include: drupal.yml

That way, only the appropriate webserver task file is included in the first place! This is a common pattern in Ansible. I wouldn't go so far as to call it a best practice, but I've seen this style in a lot of playbooks. The handler part is a bit tricky. Nginx has 2 parts to it, the Nginx server itself and the PHP FPM process manager. If you have no idea about PHP FPM, I suggest you watch these videos, they're neat! We have to take care not to trigger the wrong handler, i.e. an Apache handler when the server is Nginx, and vice versa. We use a when condition to prevent this:

handlers:
  - name: restart apache
    service: name=apache2 state=restarted
    when: webserver == "apache"
  - name: restart nginx
    service: name=nginx state=restarted
    notify: restart php-fpm
    when: webserver == "nginx"
  - name: restart php-fpm
    service: name="php{{ php_version }}-fpm" state=restarted
    when: webserver == "nginx"

Download the code at this point.

Great. How about we install some actual Drupal? First, we clone the repository using the git module.

- name: Drupal | Get the latest stable version(7)
  git:
    repo: "https://github.com/drupal/drupal.git"
    version: "7.x"
    dest: "{{ drupal_docroot }}"

The git module has a lot of other useful options, but we shall stick to the essentials here: the repo to clone from, the version (this can be a branch or tag name) and the directory to clone the repo into.

After getting the source, we can programmatically install Drupal using Drush. Let's write a task for this.

- name: Drupal | Drush site install
  command: >
    drush site-install standard -y
    --site-name="{{ drupal_site_name }}"
    --account-name={{ drupal_account_name }}
    --account-pass={{ drupal_account_pass }}
    --db-url=mysql://{{ drupal_db.user }}:{{ drupal_db.password }}@localhost/{{ drupal_db.name }}
    -r {{ drupal_docroot }}
  notify: "restart {{ webserver }}"

This invokes the Drush site-install command with the appropriate arguments picked up from the variables section. This task is not exactly Ansible friendly.

When run the second time, this task will fail stating that Drupal is already installed. This is the expected behavior, but we can make it more friendly and compatible with Ansible. First, we check if Drupal is already installed successfully,

- name: Drupal | Get site status
  command: >
    drush status --root={{ drupal_docroot }}
  register: drush_status

The register construct stores the output of the command in a variable, drush_status in this case. We then check whether the output contains the line "Drupal bootstrap : Successful", which indicates that the site was successfully installed.

Here's what my drush_status holds after a successful install.

TASK [debug] *******************************************************************
ok: [139.59.76.57] => {
    "drush_status.stdout": " Drupal version                  :  7.56                       \n Site URI                        :  http://default             \n Database driver                 :  mysql                      \n Database hostname               :  localhost                  \n Database port                   :                             \n Database username               :  drupal                     \n Database name                   :  drupal                     \n Database                        :  Connected                  \n Drupal bootstrap                :  Successful                 \n Drupal user                     :                             \n Default theme                   :  bartik                     \n Administration theme            :  seven                      \n PHP configuration               :  /etc/php/5.6/cli/php.ini   \n PHP OS                          :  Linux                      \n Drush script                    :  /usr/local/bin/drush       \n Drush version                   :  8.1.12                     \n Drush temp directory            :  /tmp                       \n Drush configuration             :                             \n Drush alias files               :                             \n Install profile                 :  standard                   \n Drupal root                     :  /var/www/html/drupal       \n Drupal Settings File            :  sites/default/settings.php \n Site path                       :  sites/default              \n File directory path             :  sites/default/files        \n Temporary file directory path   :  /tmp                       "
}
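
Output like the above comes from the debug module. Here is a sketch of the pair of tasks, assuming drupal_docroot is defined in vars.yml. The changed_when line is my addition, so that a read-only status check is not reported as a change.

```yaml
- name: Drupal | Get site status
  command: drush status --root={{ drupal_docroot }}
  register: drush_status
  # Only reads state, so never mark the task as changed.
  changed_when: false

# Left unnamed, so it shows up as TASK [debug] in the output.
- debug:
    var: drush_status.stdout
```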

We run the site install task only if Drupal was not bootstrapped successfully. Let's tweak our site-install task a bit.

- name: Drupal | Drush site install if not already installed
  command: >
    drush site-install standard -y
    --site-name="{{ drupal_site_name }}"
    --account-name={{ drupal_account_name }}
    --account-pass={{ drupal_account_pass }}
    --db-url=mysql://{{ drupal_db.user }}:{{ drupal_db.password }}@localhost/{{ drupal_db.name }}
    -r {{ drupal_docroot }}
  when: not drush_status.stdout | search("Drupal bootstrap\s+:\s+Successful")
  notify: "restart {{ webserver }}"

We just added a when condition which searches for a regex in drush_status. This, again, is a common pattern in Ansible.
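
A note for readers on newer Ansible releases: as far as I know, using search as a filter with a pipe was deprecated after the 2.3 series used in this post, in favour of the "is" test syntax. Here is a sketch of the same condition in that form, attached to a throwaway debug task so that it stands on its own. The task name and message are made up for illustration.

```yaml
- name: Drupal | Report when the site still needs installing
  debug:
    msg: "Drupal is not installed yet"
  # Same regex as before, written as a Jinja test instead of a filter.
  when: drush_status.stdout is not search("Drupal bootstrap\s+:\s+Successful")
```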

Composer and Updating for Drupal 8

So far, we've only run Drupal 7 using this playbook. Drupal 8 is the new and happening thing now. People use both versions. Let's modify our playbook to accommodate both versions.

First, add a Drupal major version variable.

#drupal_major_version: 7
drupal_major_version: 8
drupal_7_branch: "7.x"
drupal_8_branch: "8.4.x"
drupal_version_branch: "{{ drupal_8_branch if drupal_major_version == 8 else drupal_7_branch }}"

The last variable is interesting. It sets the branch to clone from depending on the major version, as in,

- name: Drupal | Get the latest stable version
  git:
    repo: "https://github.com/drupal/drupal.git"
    version: "{{ drupal_version_branch }}"
    dest: "{{ drupal_docroot }}"

Also, we need to add composer support if we are installing Drupal 8 from source. We add 2 tasks for installing composer in extras.yml.

- name: Install Composer
  get_url:
    url: "{{ composer_phar_url }}"
    dest: /usr/local/bin/composer
  when: drupal_major_version == 8

- name: Ensure Composer is executable.
  file:
    path: /usr/local/bin/composer
    mode: "0755"
  when: drupal_major_version == 8

Then, we run composer install inside Drupal's docroot. We can use the composer module for this.

- name: Drupal | Run composer if we are running D8
  composer:
    command: install
    working_dir: "{{ drupal_docroot }}"
  when: drupal_major_version == 8

Download the code at this point.

Converting our playbook into an ansible role

Our playbook is pretty useful now, but in Ansible terms it could be made more reusable and modular by converting it into an Ansible role. Roles present a higher level of abstraction and reusability. What if we could reuse not just playbooks by including them, but also variables, templates, and handlers? That's exactly what roles do. Ansible also provides a means to share roles via Ansible Galaxy, unlike playbooks, which have limited reusability.

A role can be scaffolded by running the following command:

$ ansible-galaxy init lakshminp.drupal

The naming convention of a role is <user/organization name>.<role name>. This is not a hard and fast rule, just a rudimentary way to namespace roles. The command creates an empty scaffold with the following files & directories:

  1. meta

A directory which contains meta information about the role: what OS the role is built for, author info, other roles this role depends on, code license etc.

  2. files

A directory of non-templated files the role uses to provision.

  3. templates

Templates the role uses for provisioning.

  4. vars

The set of variables the role uses.

  5. defaults

Default values for the role's variables. These have the lowest precedence, so they are the easiest to override, unlike the variables inside vars.

  6. tasks

The actual tasks the role executes to provision.

  7. handlers

Contains handlers, if any, required for the role.

  8. README.md

A markdown flavored README which contains instructions readable by humans ;)

Note that not all roles need to contain all these directories. For instance, our Drupal role won't have the files directory. Likewise, a role which has no servers or services to restart won't have the handlers directory.

Once we've generated the scaffold, it's quite easy to migrate our current setup into a role. First, we move all the files included in our playbook.yml to the tasks directory, and move the contents of playbook.yml to the main.yml file inside tasks.

Next, we move all the templates contents to the new templates directory as is. For the handlers, we move the handlers section to a separate main.yml inside the handlers directory.
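
The moves described above can be sketched in shell. This runs in a scratch directory with empty stand-in files, and the file names (mysql.yml, php.yml, settings.php.j2) are hypothetical, so substitute whatever your playbook actually includes. The role path assumes the role lives under a roles directory next to the playbook.

```shell
#!/bin/sh
set -e
# Work in a scratch directory so nothing real is touched.
work=$(mktemp -d)
cd "$work"

# Stand-ins for the existing project (file names are hypothetical).
mkdir -p templates
touch playbook.yml mysql.yml php.yml extras.yml templates/settings.php.j2

# Directories that `ansible-galaxy init` would have generated for the role.
role=roles/lakshminp.drupal
mkdir -p "$role/tasks" "$role/templates" "$role/handlers" "$role/defaults"

# Task files included by playbook.yml move into tasks/.
mv mysql.yml php.yml extras.yml "$role/tasks/"

# Templates move over as is.
mv templates/* "$role/templates/"
rmdir templates

# The tasks and handlers sections of playbook.yml are pasted by hand into these.
touch "$role/tasks/main.yml" "$role/handlers/main.yml"

moved=$(find roles -type f | sort)
echo "$moved"
```

The last two files are created empty on purpose: splitting playbook.yml into its task list and its handlers is a copy and paste job rather than a file move.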

Now, we're left with only variables. How do we decide what goes into vars and what goes into defaults? The rule of thumb is that variables which we rarely override or modify go into vars, and the rest go into defaults.

Things like the MySQL password, Drupal admin user name, docroot, and the choice of webserver (Apache vs Nginx) go into defaults.

mysql_root_password: "SuperSecurePassword123"

drupal_db:
  user: drupal
  password: "tilInTIngyrAtr"
  name: drupal

drupal_docroot: "/var/www/html/drupal"
php_version: "5.6"
webserver: "nginx"

drupal_site_name: "Test site"
drupal_account_name: "admin"
drupal_account_pass: "admin123"

drupal_major_version: 8

Done. We've successfully converted our playbook into a role. Now, how to use this role? We have to do 2 things. First, point out to Ansible where to pick the role from, and second, actually include the role in our playbook. For the first part, we change our ansible.cfg,

[defaults]
hostfile = hosts
host_key_checking = false
roles_path = roles

For the second part,

---
- hosts: all
  gather_facts: no
  vars:
    php_version: "7.0"
    drupal_account_pass: "admin"
  roles:
    - lakshminp.drupal

The role variables which we want to override can be added in the vars section, followed by the role name in the roles section. Our playbook looks a lot simpler now. But do you notice that we are going to check in this code (along with our secrets like password and username) into version control? We shall see how to handle this in the next section.

Download the code at this point.

Storing secrets in Ansible

Why check in our playbooks in the first place? It has a lot of benefits. We get reproducible builds and predictable deployments, for one. So, it's inevitable we check in our secret stuff as well, right?

Yes and no. Ansible has a provision to encrypt our secrets before putting them in version control, using Ansible Vault. Ansible Vault (not to be confused with HashiCorp Vault, another excellent tool which addresses problems in the same space) encrypts a YAML file and decrypts it using a user-provided password.

Let's first move all the secrets to an individual file.

---
vault_mysql_root_password: "foridErYnThIvESTAItYpiGO"
vault_drupal_db_user: "drupal"
vault_drupal_db_password: "OGHaWMEdartotORdiCAnKLiE"
vault_drupal_account_name: "admin"
vault_drupal_account_pass: "rUsectaRATeNdEFIChUseadE"

We can reference this file in the main playbook as follows,

---
- hosts: all
  gather_facts: no
  vars_files:
    - secrets.yml
  vars:
    mysql_root_password: "{{ vault_mysql_root_password }}"
    drupal_db:
      user: "{{ vault_drupal_db_user }}"
      password: "{{ vault_drupal_db_password }}"
      name: drupal

    php_version: "7.0"
    drupal_account_name:  "{{ vault_drupal_account_name }}"
    drupal_account_pass: "{{ vault_drupal_account_pass }}"
  roles:
    - lakshminp.drupal

Download the code at this point.

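The only thing left to do is to encrypt the secrets.yml file. We first choose a vault password before running the encryption. Ansible does not read VAULT_PASSWORD by itself, so unless a vault password file is configured to hand that variable over, ansible-vault will prompt for the password.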

$ export VAULT_PASSWORD=POrmandecRyNoMEndUcHEiLe
$ ansible-vault encrypt secrets.yml

Our encrypted file now looks like this:

$ cat secrets.yml 
$ANSIBLE_VAULT;1.1;AES256
63326564336631643834623735613731356461633337373833656464376435313565393063383861
6164333035306164666263613236623638623237333265640a623234333235313238653364303863
# ... a long sequence of numbers.

We can safely check in this encrypted file in version control. The only thing left to do now is to make sure Ansible can read this encrypted file.

If we try to run the playbook, we will get an error (assuming that you didn't set the VAULT_PASSWORD env variable before running the playbook).

$ ansible-playbook playbook.yml
ERROR! Decryption failed on /home/lakshmi/ansible/d7/secrets.yml

We can mitigate it by asking ansible to prompt for the password,

$ ansible-playbook playbook.yml --ask-vault-pass

or, even better, by setting the VAULT_PASSWORD env variable before running the playbook and letting a vault password file pass it to Ansible.
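
One way to connect the VAULT_PASSWORD variable to Ansible is an executable vault password file: when the file given to --vault-password-file (or vault_password_file in ansible.cfg) is executable, Ansible runs it and uses what it prints as the password. The sketch below writes such a helper into a scratch directory. The name vault-env and the demo password are made up for illustration.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# Hypothetical helper: prints the vault password taken from the environment.
cat > "$work/vault-env" <<'EOF'
#!/bin/sh
# Fail loudly instead of handing Ansible an empty password.
: "${VAULT_PASSWORD:?VAULT_PASSWORD is not set}"
printf '%s\n' "$VAULT_PASSWORD"
EOF
chmod +x "$work/vault-env"

# Dummy value for this demo only.
export VAULT_PASSWORD=example-only-password
printed=$("$work/vault-env")
echo "$printed"

# With the helper copied next to the playbook, a run would look like:
#   ansible-playbook playbook.yml --vault-password-file ./vault-env
```

The helper itself holds no secret, so it can be checked in alongside the playbook.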

Download the code at this point.

Spinning instances using Ansible

All this while, we were provisioning Drupal on an existing machine. Ansible allows us to create infrastructure and provision/configure on top of it. Though there are specialized tools to achieve this, I'll demonstrate how to do it using Ansible.

Before we jump into this, what is the rationale behind spinning infrastructure using Ansible?

  1. No assumptions are made about our infrastructure. Everything from creating a new VM down to the Drupal admin password is handled by a single system.
  2. Our whole infrastructure is codified into our version control. This is an industry-wide good practice known as infrastructure as code.
  3. Everything is automated. We just need to run the playbook after tailoring our specifications and we can have a working setup ready within minutes.

Now that you are convinced, let's automate our infrastructure creation as well. We shall use DigitalOcean to spin our new inventory. Ansible has a digital_ocean module which can help create DigitalOcean droplets based on our specification. In order to use this, we have to install dopy, the DigitalOcean API Python library.

$ sudo pip install dopy

The next thing to do is to generate a DigitalOcean API key and add the key as an environment variable.

DigitalOcean API key generation

$ export DO_API_TOKEN=3a5e32c759c073058ebd554e7e800c64ededed896c2cd7d782c02cbb015eca8f

We can use this key to make API calls.

$ curl --silent "https://api.digitalocean.com/v2/images?type=distribution&per_page=100" -H "Authorization: Bearer $DO_API_TOKEN" | python -m json.tool

Let's add a task to create a new droplet.

---
- hosts: localhost
  connection: local
  gather_facts: no
  tasks:
    - name: Create new Droplet.
      digital_ocean:
        state: present
        command: droplet
        name: ansible-drupal
        size_id: 1gb
        image_id: ubuntu-16-04-x64
        region_id: blr1
        ssh_key_ids: 10151022
        unique_name: yes
      register: drupal

How did we derive the values in the arguments above? We use the API and get them. For instance, we can get the SSH key IDs using this endpoint.

$ curl --silent "https://api.digitalocean.com/v2/account/keys" -H "Authorization: Bearer $DO_API_TOKEN" | python -m json.tool

Same with regions.

$ curl --silent "https://api.digitalocean.com/v2/regions" -H "Authorization: Bearer $DO_API_TOKEN" | python -m json.tool
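
To pick out just the slugs from such a response, something like the following could be used. The JSON here is a trimmed, hand-written sample shaped like the /v2/regions response, not real API output. With a token in place, the curl output would be piped in instead.

```shell
#!/bin/sh
# Hand-written sample shaped like the regions response (not real API output).
sample='{"regions": [{"name": "Bangalore 1", "slug": "blr1", "available": true}, {"name": "New York 1", "slug": "nyc1", "available": true}]}'

# Keep every "slug": "..." pair, then strip everything but the value.
slugs=$(printf '%s\n' "$sample" | grep -o '"slug": *"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/')
echo "$slugs"
```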

You can check the DO API documentation for more details. Notice that we "register" the newly created droplet in a variable named drupal. Previously we maintained a static inventory file which contained the IP address or hostname of the machine where Drupal needs to be installed. In this case, our inventory is dynamic: we will know the IP address of the newly created droplet only after running the playbook, so we need to create the inventory ad hoc. Ansible maintains dynamic inventory scripts for all popular cloud providers, which we can leverage for this.

$ wget https://raw.githubusercontent.com/ansible/ansible/devel/contrib/inventory/digital_ocean.py
$ chmod +x digital_ocean.py

We indicate that we will use this inventory file going forward, in ansible.cfg.

[defaults]
hostfile = digital_ocean.py
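
For a feel of what a dynamic inventory script does, here is a minimal stand-in written in shell. Ansible calls an executable inventory with --list and expects JSON describing groups and hosts. The script name, the group contents and the documentation-range IP address are all hypothetical; digital_ocean.py builds the equivalent structure from the DigitalOcean API.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# Hypothetical stand-in for digital_ocean.py with one hard-coded host.
cat > "$work/inventory.sh" <<'EOF'
#!/bin/sh
case "$1" in
  --list)
    # Groups with their hosts, plus per-host variables under _meta.
    cat <<'JSON'
{
  "drupal_group": {"hosts": ["203.0.113.10"]},
  "_meta": {"hostvars": {"203.0.113.10": {"ansible_user": "root"}}}
}
JSON
    ;;
  --host)
    # Host variables are already served via _meta, so nothing extra here.
    echo '{}'
    ;;
  *)
    echo "usage: $0 --list | --host <name>" >&2
    exit 1
    ;;
esac
EOF
chmod +x "$work/inventory.sh"

listing=$("$work/inventory.sh" --list)
echo "$listing"
```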

Finally, one last change. We need to create an inventory group from the newly created droplet. We add this as an ansible task in the playbook.

- name: Add new host to our inventory.
  add_host:
    name: "{{ drupal.droplet.ip_address }}"
    groups: drupal_group
    ansible_ssh_private_key_file: /home/lakshmi/.ssh/id_ansible_drupal
    ansible_python_interpreter: /usr/bin/python3
    ansible_user: root
  when: drupal.droplet is defined

We create a new drupal_group inventory group, add the newly created droplet to this group, along with some default variables, like which SSH key file to use, remote username etc. We can now install Drupal in this group using the lakshminp.drupal role.

- hosts: drupal_group
  gather_facts: no
  vars_files:
    - secrets.yml
# ...
  pre_tasks:
    - name: Wait for port 22 to become available.
      local_action: "wait_for port=22 host={{ inventory_hostname }}"
  roles:
    - lakshminp.drupal

There is a pre_tasks section which runs before the roles and tasks of the play. Here, we wait for the machine to become available so that we can run the tasks on it. Let's give the new playbook a spin.

$ ansible-playbook playbook.yml
 [WARNING]: provided hosts list is empty, only localhost is available


PLAY [localhost] ************************************************************************************************************************************************************

TASK [Create new Droplet.] **************************************************************************************************************************************************
changed: [localhost]

TASK [Add new host to our inventory.] ***************************************************************************************************************************************
changed: [localhost]

PLAY [drupal_group] *********************************************************************************************************************************************************

TASK [Wait for port 22 to become available.] ********************************************************************************************************************************
ok: [139.59.68.236 -> localhost]

TASK [lakshminp.drupal : MySQL | Install MySQL] *****************************************************************************************************************************
changed: [139.59.68.236] => (item=[u'mysql-common', u'mysql-server', u'python3-mysqldb'])

TASK [lakshminp.drupal : Disallow root login remotely] **********************************************************************************************************************
changed: [139.59.68.236] => (item=DELETE FROM mysql.user WHERE User='root' AND Host NOT IN ('localhost', '127.0.0.1', '::1'))

...

RUNNING HANDLER [lakshminp.drupal : restart nginx] **************************************************************************************************************************
changed: [139.59.68.236]

RUNNING HANDLER [lakshminp.drupal : restart php-fpm] ************************************************************************************************************************
changed: [139.59.68.236]

PLAY RECAP ******************************************************************************************************************************************************************
139.59.68.236              : ok=26   changed=23   unreachable=0    failed=0   
localhost                  : ok=2    changed=2    unreachable=0    failed=0

Download the final code.

Where to go from here

We have done something substantial here. We have created our own infrastructure, installed a specific version of Drupal, specified our own config settings like user credentials, databases, and server configuration. From an Ansible/DevOps perspective, we've barely scratched the surface. There are a lot of places to foray into at this point:

  1. How to effectively test Ansible playbooks and roles. (Resources)
  2. Add SSL to your site/domain using Let's Encrypt.
  3. How to add your own custom site instead of Vanilla Drupal.
  4. How to run site updates periodically using Ansible.

If you are bogged down by so many details and just want to benefit from using Ansible to manage your Drupal setup, I suggest you give DrupalVM a try. It is a hundred times better and more battle-tested than this Ansible role.