David Egan – David Egan

Checking Backup Integrity

If you’ve set up a backup solution for WordPress or other dynamic PHP websites, you will probably be backing up site files as well as the site database. For a proper backup solution, you need to check that the backup copy is viable.

You may have a copy of the site files, along with a (hopefully properly) dumped database, but unless you connect these up, how do you know that your backup copy is sound?

The integrity of your backups is not something that you should discover during an emergency recovery situation.

Manually rebuilding a working copy of a dynamic website is time-consuming. In each site database, internal site URLs all point to production domains, which complicates the rebuild process. When you have a server backup with ten or twenty important client sites, verification of backups looks pretty daunting – and I suspect that a lot of people just don’t bother.

This article describes how to partially automate this process.

If you want to hack on this, get the files on GitHub.

Context

In our case, site files from the Apache root of a production server are backed up incrementally on a daily basis to a date-stamped directory. This contains:

A subdirectory html – which in turn contains a subdirectory for each site under the document root
A subdirectory sql which contains a collection of dumped databases for the sites in question

Important config files are also backed up, but that is beyond the scope of this article.

Production server backup procedure
Secure rsync between servers – describes consolidating backup copies on a central backup server
Rsync between backup server and local machine

This article assumes that the backup has been downloaded to a local machine.

Checking Integrity: Overview

To test the integrity of backed up sites, one option is to build working clones of the sites on a virtual machine. To avoid the need to change URLs on the backup copies, the /etc/hosts file is amended on the guest VM.
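As a rough illustration of the hosts-file step, the sketch below prints one /etc/hosts line per site directory. It assumes that each subdirectory under html is named after the site's domain, which may not hold for every setup; the function name `hosts_entries` and the demo site names are hypothetical.

```shell
#!/bin/bash
# Print one /etc/hosts line per site directory, pointing each name at the
# loopback address so that the guest's own Apache answers for it.
# ASSUMPTION: each subdirectory of html/ is named after the site's domain.
hosts_entries() {
  local html_dir=$1 dir
  for dir in "$html_dir"/*/; do
    [ -d "$dir" ] || continue   # skip when the glob matches nothing
    printf '127.0.0.1\t%s\n' "$(basename "$dir")"
  done
}

# Demo with a throwaway directory tree (hypothetical site names)
demo=$(mktemp -d)
mkdir -p "$demo/html/example.com" "$demo/html/client-site.ie"
hosts_entries "$demo/html"
rm -rf "$demo"
```

On the guest, the output could be reviewed and then appended with something like `hosts_entries /var/www/html | sudo tee -a /etc/hosts`.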

Obviously, the guest VM needs to run a server that broadly matches the original backed up server (in this case Apache), and the virtual hosts settings for the guest VM server need to be set up correctly (this is a one-time import from the backed-up config directory).

You don’t necessarily need to use a VM – you could use any machine on the local network. The reason this is done on a VM/separate machine is so that the main host computer can access the actual live sites for maintenance purposes.

This method also keeps separation between backed-up clones and ongoing development websites – which are two different things.

This article assumes that a backup archive is available. Building working copies involves:

One-time setup of a suitable Virtual Machine – in this case, an Ubuntu Xenial, Apache, MariaDB and PHP 7 LAMP stack
A one-time import of relevant database users to the VM
Exporting files from the Host machine backup archive to the Guest VM (run command in Host)
Importing databases in the Guest (run command in Guest)

Aims

Check the integrity of multiple site backups by building working local copies. This is achieved by:

Moving site files and databases for a backed-up production server from a host machine into a local virtual machine
Importing MySQL/MariaDB databases and setting up working sites on the VM

Backup integrity should be checked regularly, so this should be a simple process.

Ideally, once the system has been set up it should be run by administrators rather than developers.

Requirements

These BASH scripts have been tested on Ubuntu Xenial Xerus 16.04 Desktop.

Zenity is used to create user dialogues.

VirtualBox is required for the Virtual Machine. In this case, the VM runs Ubuntu 16.04 Xenial Xerus desktop – desktop rather than server because it allows easy checking of the moved sites. To achieve this, the guest machine hosts file (/etc/hosts) must be set up properly to point at the local copies.

The database server is MariaDB, but the commands would work on a standard MySQL database server.

The sql backups directory includes the performance_schema.sql, phpmyadmin.sql, mysql.sql and log files from the original server. These aren’t necessary to build clones from backups, and if imported will probably mess up the VM MySQL configuration. Because of this, we exclude these files from the transfer – see the sql-verification-exclude file in the linked repo for an example.
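For illustration, an exclude list along these lines could be generated as follows. The file names come from the paragraph above, but the exact patterns in the real sql-verification-exclude file are an assumption, as is the helper name `write_sql_exclude`.

```shell
#!/bin/bash
# Write an rsync exclude list for the sql/ directory, one pattern per line.
# ASSUMPTION: these patterns approximate the real sql-verification-exclude file.
write_sql_exclude() {
  cat > "$1" <<'EOF'
performance_schema.sql
phpmyadmin.sql
mysql.sql
*.log
EOF
}

exclude_file=$(mktemp)
write_sql_exclude "$exclude_file"
cat "$exclude_file"
```

rsync reads a file like this through its `--exclude-from` option, skipping any file that matches one of the listed patterns.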

Note: For the backed up sites on the guest machine to work properly, the MySQL users from the original server should be imported in a one-time operation.

Move Files to the VM

This is achieved with the move-backups script. This script prompts the user to choose a directory to move. The script is tightly coupled to our requirements, but would be easy to amend.

The directory to be moved is a datestamped directory that contains the entire html directory (i.e. document root) from a backed-up Apache server. It also contains backed up MySQL files (originally created by mysqldump) in a sql directory.

Move Backups Script

Run on the Host computer.


#!/bin/bash
#
# Move a directory into a local Virtual Machine for testing purposes.
#
# This file should be executable and in your path. E.g.:
# - `mv move-backups /usr/local/bin`
# - `chmod +x /usr/local/bin/move-backups`
# - Run: `move-backups`
#
# Add username@network-IP for your VM in place of `david@192.168.1.145`
# Add root@network-IP for your VM in place of `root@192.168.1.145`
# ------------------------------------------------------------------------------
STORAGE=/media/david/storage/servername
SQL_DESTINATION=david@192.168.1.145:staging
HTML_DESTINATION=root@192.168.1.145:/var/www/html
VM=Xenial
SQL_EXCLUDE=/media/david/storage/sql-verification-exclude # rsync excludes are controlled in a file

# Stop if the storage area is not available
cd "$STORAGE" || exit 1

# Select the Directory to move
# ------------------------------------------------------------------------------
zenity --info \
--text="Begin the build process for backed up client websites. Click \"OK\" to begin. Then select the date-stamped directory in the backup storage area."

SOURCE=$(zenity --file-selection --directory --title="Select a Directory to Sync")

# zenity exits with 0 on OK and 1 on Cancel; anything else is an error
case $? in
0)
echo "\"$SOURCE\" selected.";;
1)
echo "No directory selected."; exit 1;;
*)
echo "An unexpected error has occurred."; exit 1;;
esac

# Start the Virtual Machine headless - remove `--type headless` to show the VM window
# ------------------------------------------------------------------------------
VBoxManage startvm "$VM" --type headless | zenity --progress \
--pulsate --width="320" --height="150" \
--text="Starting $VM Virtual Machine" \
--title="Please Wait while $VM is started" --auto-close

# Sync HTML directories INDIVIDUALLY
# ------------------------------------------------------------------------------
HTML_DIR="$SOURCE/html"

# Loop through Directories only
for DIR in "$HTML_DIR"/*/; do

# For our rsync setup, the source directory MUST NOT have a trailing slash -
# rsync then creates a directory with the same name under the destination
# if it doesn't already exist.
SOURCE_DIR=${DIR%/}

# The destination is `/var/www/html/` - rsync appends the basename of $SOURCE_DIR
rsync -azv --progress --delete "$SOURCE_DIR" "$HTML_DESTINATION/" | zenity --progress \
--pulsate --width="320" --height="150" \
--text="Syncing the HTML directory: $SOURCE_DIR" \
--title="Please Wait" --auto-close

done

# Sync SQL backups to a staging directory
# ------------------------------------------------------------------------------
rsync -azv --exclude-from="$SQL_EXCLUDE" --progress --delete "$SOURCE/sql/" "$SQL_DESTINATION/sql" | zenity --progress \
--pulsate --width="320" --height="150" \
--text="Syncing the SQL directory" \
--title="Please Wait" --auto-close

# Tidy up
zenity --question \
--text="Sync complete. Do you want to shut down the VM?"

case $? in

0)
echo "0"
# Close up the VM, maintain state
VBoxManage controlvm "$VM" savestate | zenity --progress \
--pulsate --width="320" --height="150" \
--text="Shutting down $VM Virtual Machine" \
--title="Please Wait while VMs are Saved" \
--auto-close
zenity --info \
--window-icon="info" \
--text="The VM $VM has been shut down."
echo "$VM was closed to a saved state"

;;

1)
echo "1"
zenity --info \
--window-icon="info" \
--text="Your VM $VM is Running - though it may be in a headless state."
;;

*)
echo "An unexpected error has occurred."
;;

esac

GitHub repo with scripts: https://github.com/DavidCWebs/check-backups

Usage:

Add move-backups to /usr/local/bin on the Host computer: mv move-backups /usr/local/bin
Make executable: chmod +x /usr/local/bin/move-backups
Run move-backups in a terminal and follow instructions

When prompted, you should select the date-stamped directory that contains the backed-up html directory from the Apache doc root – the directory that is normally located at /var/www/html in a standard Apache setup.

Note that the moved files won’t do anything unless you also import the associated databases on the guest machine.

Import Databases

Add import-databases to /usr/local/sbin on the Guest computer/VM: mv import-databases /usr/local/sbin
Make executable: chmod +x /usr/local/sbin/import-databases
Run sudo import-databases in a terminal on the Guest VM


#!/bin/bash
#
# The purpose of this script is to import databases so that working copies of
# backed up PHP/WordPress websites can be quickly and easily checked.
#
# The script loops through all databases in a staging directory and imports them
# into MySQL/MariaDB. Existing databases having the same name will be overwritten.
#
#-------------------------------------------------------------------------------

SOURCE=/home/david/staging/sql
PASSWORD=thenicelongpassword
DATABASES=("$SOURCE"/*)

for (( i = 0; i < ${#DATABASES[@]}; i++ )); do

# The file extension - in our case, there are *.log files that should be ignored.
# `##*.` strips everything up to the last dot, so extra dots in the path are harmless.
EXT=${DATABASES[$i]##*.}

if [[ "sql" == "$EXT" ]]; then

DB_SOURCE=${DATABASES[$i]}
DB_NAME=$(basename "${DATABASES[$i]}" .sql)

# If a Database exists with this name, DROP it
mysql --user=root --password="$PASSWORD" -e "DROP DATABASE IF EXISTS \`$DB_NAME\`"

# Create new DB with the name of the DB backup file
mysql --user=root --password="$PASSWORD" -e "create database \`$DB_NAME\`; GRANT ALL PRIVILEGES ON \`$DB_NAME\`.* TO root@localhost IDENTIFIED BY '$PASSWORD'"

# Import the Database
mysql --user=root --password="$PASSWORD" "$DB_NAME" < "$DB_SOURCE"

fi

done

# Set proper ownership of site files
chown -R www-data /var/www/html/*
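
The extension test and database-name derivation can be tried in isolation, without touching a database server. This is a minimal sketch rather than part of the import script: the function name `db_name_for` and the sample paths are made up for the demonstration. It uses `${file##*.}`, which removes the longest prefix ending in a dot and so leaves only the final extension.

```shell
#!/bin/bash
# Derive a database name from a dump file path, or fail for non-.sql files.
db_name_for() {
  local file=$1
  # ${file##*.} removes everything up to the LAST dot, leaving the extension
  [ "${file##*.}" = "sql" ] || return 1
  basename "$file" .sql
}

# Hypothetical file names, used only to show the behaviour
for f in /staging/sql/client_site.sql /staging/sql/backup.log /staging/sql/shop.v2.sql; do
  if name=$(db_name_for "$f"); then
    echo "$f -> import as '$name'"
  else
    echo "$f -> skipped"
  fi
done
```

Only the two .sql files produce a database name; the .log file is skipped, and the dump with an extra dot in its name keeps that dot in the database name.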

TODO

These scripts are a good start, and allow us to build and check backup copies quite easily. There is room for further automation – ideally we’d like the process to be fully automated, integrated naturally into the backup process.

Our setup includes passwordless SSH keys which allows for easier rsync’ing, and this has not been documented.

Other enhancements might include:

Prevent selection of the ‘wrong’ backup directory
Auto creating the staging directory for the sql files transfer
Trigger the `import-databases` script from the host, so working copies are built with a single command
Better feedback on the `import-databases` script (there’s none at the moment!)
Document how to import users from original server to the guest machine
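
As a starting point for the first item, a check along the following lines could reject a directory that does not look like a backup before anything is synced. It is only a sketch: the function name `is_backup_dir` is hypothetical, and the test (an html and a sql subdirectory, with at least one .sql dump) is an assumption based on the layout described under Context.

```shell
#!/bin/bash
# Succeed only if the directory looks like a date-stamped backup:
# it must contain html/ and sql/, and sql/ must hold at least one .sql dump.
is_backup_dir() {
  local dir=$1 f
  [ -d "$dir/html" ] || return 1
  [ -d "$dir/sql" ] || return 1
  for f in "$dir"/sql/*.sql; do
    [ -e "$f" ] && return 0   # found at least one dump
  done
  return 1
}

# Demo with throwaway directories
good=$(mktemp -d)
mkdir -p "$good/html/example.com" "$good/sql"
touch "$good/sql/example.sql"
bad=$(mktemp -d)

is_backup_dir "$good" && echo "good: looks like a backup"
is_backup_dir "$bad" || echo "bad: not a backup directory"
rm -rf "$good" "$bad"
```

In move-backups, a check like this could run straight after the directory is selected, showing an error dialogue and exiting if it fails.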

Resources

Filtering the WordPress Menu

You can filter both the <li> tag and the contained anchor tag in a WordPress menu using the ‘nav_menu_css_class’ and ‘nav_menu_link_attributes’ filters.

The ‘nav_menu_link_attributes’ filter is not well documented – but very useful nonetheless.

This example shows how to add the required classes for a Bootstrap 4 menu markup – in which the <li> requires the class nav-item and the anchor tag requires the class nav-link:

<?php
add_filter( 'nav_menu_css_class', function($classes) {
$classes[] = 'nav-item';
return $classes;
}, 10, 1 );

add_filter( 'nav_menu_link_attributes', function($atts) {
$atts['class'] = "nav-link";
return $atts;
}, 100, 1 );

Prevent User Enumeration in WordPress

We recently had to deal with a hacking attempt against a client WordPress site that had a few interesting aspects. The fix involved additional .htaccess rules to block user enumeration.

We experienced multiple failed login attempts against WordPress. This wasn’t particularly worrying – the originating IP address was automatically blocked by our Fail2Ban setup after three unsuccessful attempts. The attacker (probably a script) then switched IP address and repeated the process, triggering a further ban and repeating the cycle. The attack lasted for approximately 10 minutes and triggered more than 50 bans.

The attack focused on usernames that were very close to (but not actually the same as) actual usernames on the site. It looked like a partially-successful user-enumeration attempt made up the initial phase of the attack. Puzzlingly, only some usernames had been enumerated.

User Enumeration

User Enumeration is when would-be attackers collect usernames by interacting with your app. Unfortunately, by default WordPress makes this process easy. Entering http://example.com/?author=1 in the browser will trigger display of all articles authored by the user with an ID of ‘1’ – along with their registered username. This provides would-be attackers with a toe-hold – they can attempt to log in to valid usernames rather than having to guess.

Our usual setup involves user-enumeration prevention measures – so it was surprising to see (almost valid) usernames cropping up in the log.

It turns out that our user-enumeration prevention relied on the ‘redirect_canonical’ WordPress filter. This filter is triggered if you navigate to http://example.com/?author=1 – in this case, it performs a redirect to the Author archives for the author with an ID of 1.

The problem: if a registered user on the site has not authored any articles, they have no author archive, so the redirect does not take place and the user-enumeration can proceed.

In our case, the enumerated users had a custom membership role rather than an author role – so they will never have an archive page. In our context, these are pretty low risk users, with very few permissions on the site. Nevertheless, it’s a pain having to check when these attacks occur, and it places unnecessary load on the server.

We verified the partially successful enumeration attempt by doing some penetration testing using WPScan – this turned up the exact “usernames” that were tried during the hack attempt.

The solution involved extra .htaccess rules to prevent user-enumeration. We also added some extra rules to block login attempts using the enumerated (incorrect) usernames – just in case the attacker is logging them for future usage.

.htaccess Rule to Prevent User enumeration


RewriteEngine On
RewriteCond %{REQUEST_URI} !^/wp-admin [NC]
RewriteCond %{QUERY_STRING} author=\d
RewriteRule (.*) $1? [L,R=301]

Explanation

Line 1

Turn on rewriting functionality – the Apache mod_rewrite module must be installed on the server. This module rewrites requested URLs on the fly by means of a rule-based rewriting engine. The rewrite engine is based on a Perl Compatible Regular Expressions (PCRE) parser.

Line 2

Apply a rewrite condition such that the rule will be ignored if the REQUEST_URI begins with /wp-admin.

REQUEST_URI

The path component of the requested URI, such as “/index.html”. This notably excludes the query string which is available as its own variable named QUERY_STRING. — Apache mod_rewrite Docs

REQUEST_URI in simple terms is the bit after your domain.

The author=\d string that we’ll use to match the user enumeration attempt is used legitimately to display author posts in back end – so the rewrite rule should not apply if the request takes place in the WordPress admin area.

Line 3

Specify the rewrite condition – the target query string must include 'author=\d', where \d means a single digit.

This means that http://example.com/?dummy&author=1 will trigger the rewrite, as will http://example.com/?dummy&author=100 – provided we’re not in the admin area, as specified by the previous condition.

Note that the rule doesn’t anchor the ‘author’ variable to the start of the query string (e.g. ^author=([0-9]*) – a query string that starts with author= followed by any number of digits), so it matches wherever the parameter appears.

Line 4

The rewrite rule: replace the entire path (.*) with itself $1 but with an empty query string ?.

Make this the last rule and specify that it is a permanent redirect [L,R=301].
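
The logic of the two conditions and the rule can be mimicked in a few lines of Bash, which makes it easy to try different requests without touching the server. This is a simplified sketch, not how Apache evaluates the rules: Bash regular expressions have no \d, so [0-9] stands in for it, the [NC] flag is approximated by lower-casing the path, and the function name `redirect_target` is made up for the demonstration.

```shell
#!/bin/bash
# Mimic the rewrite conditions and rule for a given path and query string.
# Prints the redirect target and succeeds, or fails when no redirect applies.
redirect_target() {
  local path=$1 query=$2 lower
  lower=$(printf '%s' "$path" | tr '[:upper:]' '[:lower:]')
  # Condition 1: the path must NOT start with /wp-admin (case-insensitive)
  case $lower in /wp-admin*) return 1 ;; esac
  # Condition 2: the query string must contain author=<digit> anywhere
  [[ $query =~ author=[0-9] ]] || return 1
  # Rule: same path, empty query string
  printf '%s\n' "$path"
}

redirect_target "/" "author=1"
redirect_target "/" "dummy&author=100"
redirect_target "/wp-admin/edit.php" "author=2" || echo "no redirect (admin area)"
redirect_target "/blog" "page=2" || echo "no redirect (no author parameter)"
```

The first two calls print `/`, the redirect target; the last two fall through to the "no redirect" messages.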

TLDR: .htaccess Rules

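Add these rules to .htaccess to block user-enumeration attempts made through the author query string. Such requests are redirected to the same path with the query string removed – for http://example.com/?author=1, that is the site home page: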



RewriteEngine On
RewriteCond %{REQUEST_URI} !^/wp-admin [NC]
RewriteCond %{QUERY_STRING} author=\d
RewriteRule (.*) $1? [L,R=301]

Note that preventing user-enumeration is only one component of an effective security policy.

References

Parse YAML in PHP Using Symfony YAML

Convert data in YAML format into a PHP array.

I do a lot of work in WordPress. I also build a lot of static websites – both for rapid design in-the-browser and as a low-cost small-business website solution. I mainly use the excellent Jekyll static site generator.

Jekyll uses YAML for config and data files (it can also use CSV format, but that’s another story). WordPress doesn’t use YAML.

I like YAML because it is very human friendly – the whole team (including non-developers) can easily build YAML config files in a way that you’re not going to see with formats like JSON or XML. This is an example of YAML data used to create JavaScript variables for use in a Google map:


# Map centre Latitude & Longitude
# ------------------------------------------------------------------------------
latitude: 52.7157856867271
longitude: -8.8741735070805

zoom: 15
#height: 548px

# Set custom colour variables
# ------------------------------------------------------------------------------
waterColour: "#398A8D"
landColour: "#dec7c7"
mainRoadColour: "#777777"
minorRoadColour: "#a9a9a9"

# A nested array
# ------------------------------------------------------------------------------
test:
- One
- Two
- Three

In the context of Jekyll, you could place this data in a data file – e.g. /_data/map.yml – writing JavaScript variables into <head> something like this:


<script>
var cwCentre = {
latitude:{{ site.data.map.latitude }},
longitude:{{ site.data.map.longitude }},
zoom:{{ page.map-zoom }},
mainMarker:"{{ site.baseurl}}/{{ site.data.map.mainMarker }}",
secondaryMarker:"{{ site.baseurl}}/{{ site.data.map.secondaryMarker }}",
waterColour:"{{ site.data.map.waterColour }}",
landColour:"{{ site.data.map.landColour }}",
mainRoadColour:"{{ site.data.map.mainRoadColour }}",
minorRoadColour:"{{ site.data.map.minorRoadColour }}",
title: "{{ site.data.map.title | escape }}",
description:'{{ map_description | markdownify | strip_newlines }}',
};
{% if "multi-centre" == page.map-type %}
var markers = [
{% for location in site.data.secondary-map-coords %}
[
'{{ location.name | escape }}',
{{ location.latitude }},
{{ location.longitude }},
'{{ location.description | escape }}'
]
{% unless forloop.last %},{% endunless %}
{% endfor %}
];
{% endif %}
</script>

YAML in WordPress

I recently needed to convert a Jekyll site to a WordPress theme. Moving the map config settings required parsing YAML data into a PHP array. Fortunately this can be achieved pretty easily thanks to the Symfony YAML component.

I’m a recent convert to Composer, and find it amazingly powerful. You can add the Symfony YAML component with a single composer command.

Add Symfony/YAML Using Composer


composer require symfony/yaml

When you run this, composer will add a new `symfony/yaml` directory under the project ‘vendor’ directory. It will also add the relevant namespace to the ‘autoload_psr4.php’ file, so that the new class will be autoloaded.

Using the YAML parser

To read the YAML contents of the config file into a PHP array:


<?php
use Symfony\Component\Yaml\Parser;

$yaml = new Parser();

$value = $yaml->parse( file_get_contents( get_template_directory() . '/assets/map.yml' ) );

For the YAML content presented above, $value will hold the following array:


$value = array (
'latitude' => 52.715785686727102,
'longitude' => -8.8741735070804992,
'zoom' => 15,
'waterColour' => '#398A8D',
'landColour' => '#dec7c7',
'mainRoadColour' => '#777777',
'minorRoadColour' => '#a9a9a9',
'test' => array (
0 => 'One',
1 => 'Two',
2 => 'Three',
),
);

This array can be passed to wp_localize_script() when enqueuing the map script.

The WordPress/PHP way would be to collect such data from a form on an admin page, storing the data in the wp_options table. However taking variables from YAML files can be a good way to quickly port settings, which might even be used as defaults. It might also be a good way to configure certain project settings.

Semantic Bootstrap Layout Classes with an Offset

Making CSS classes descriptive, or semantic, can help improve code maintainability by describing an element’s purpose rather than its presentational function.

People level this as a criticism against Bootstrap (and other CSS frameworks) – where column names are presentational rather than semantic. In other words, in a typical Bootstrap project the main content area might have a class name like .col-md-8, which is not semantic.

Fortunately, Bootstrap makes it pretty easy to define semantic classes that apply the built-in presentational logic, by use of the make-x-column() mixins.

SAGE – EXAMPLE OF SEMANTICALLY DEFINING CLASSES

The Sage WordPress starter theme (a great starting point for WordPress projects) defines a .main and a .sidebar class out of the box.

The column widths for these elements can then be set in a _variables.scss file.

I’ve modified the Sage definition of .main and .sidebar to add an offset to the sidebar, using the additional Bootstrap mixin make-sm-column-offset().

I find the offset really good for user experience – constraining the main content area generally makes for greater readability, and the extra whitespace between content and sidebar can reduce the sensation of clutter.



// Grid system

.main {

// No sidebar, `.main` is full width
@include make-sm-column($main-sm-columns);

// `.main` is contained by the `.sidebar-primary` parent class -

// this class is added to the <body> of pages that display the sidebar.

// If a sidebar displays, `.main` can take a reduced width.

.sidebar-primary & {

// `.main` is narrower by 2 columns, to give room for the offset
@include make-sm-column($main-sm-columns - $sidebar-sm-columns - 2 );

}

}

.sidebar {

// Set the width, then the offset, using Bootstrap mixins
@include make-sm-column($sidebar-sm-columns);
@include make-sm-column-offset(2);

}

RESOURCES

Checking Bash Scripts

I found this very useful resource that allows you to check bash scripts online:

http://www.shellcheck.net/

I’ve been writing quite a few bash scripts lately – amongst other things, I find them useful for running automatic backups and scaffolding out projects from a GitHub repo starting point.

The tool helps with:

…typical beginner and intermediate level syntax errors and pitfalls where the shell just gives a cryptic error message or strange behavior, but it also reports on a few more advanced issues where corner cases can cause delayed failures.

http://www.shellcheck.net/about.html

It is certainly helping me to improve my bash scripting.
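
As an example of the kind of pitfall involved, the sketch below shows what happens when a variable containing a space is used unquoted. ShellCheck reports this pattern as SC2086. The path and the helper `count_args` are hypothetical.

```shell
#!/bin/bash
# Report how many arguments a command actually receives.
count_args() { echo "$#"; }

dir="/media/backups/my site"   # hypothetical path containing a space

count_args $dir     # unquoted: the shell splits on the space -> prints 2
count_args "$dir"   # quoted: passed as a single argument     -> prints 1
```

The unquoted form would make a command such as rsync or cd see two separate paths, which is why quoting variable expansions is usually the safer default.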

Linter Shellcheck in Atom

You can set up Shellcheck as an Atom package.

In Ubuntu:


# Install shellcheck on your system
sudo apt-get install shellcheck

# Install Base linter for Atom
apm install linter

# Install shellcheck
apm install linter-shellcheck

You’ll need to restart Atom.

Resources

Apollo Images

Public domain image archive of the Apollo missions

As a kid I was obsessed with the Apollo missions – it’s great to see that NASA have made the images available on Flickr – with no copyright restrictions.


Adding a .htaccess File to a Jekyll Site

How to have Jekyll build a .htaccess file into your project

If you serve your site with Apache, adding a .htaccess file to your document root allows fine control over access permissions.

Amongst other things, .htaccess rules can set:

In-browser caching
Access – you could allow/disallow access from certain IP addresses or ranges
Redirects
Rewrites

You can also prevent modification of code over 3G on some European providers (I’ve experienced UK providers in particular totally mangling site styles).

When setting up WordPress sites, I would typically lock down access to the entire admin area by IP address as a security measure. While this isn’t necessary for Jekyll sites (where there is no login), .htaccess rules can be a useful way of controlling how your site resources are cached. This has the potential to speed up site loading times.

.htaccess in Jekyll

Update: by default, Jekyll includes .htaccess files – so explicitly including your .htaccess file is unnecessary.

Jekyll excludes most dotfiles from the build – .htaccess is already on the default include list, and you can include other dotfiles in the same way.

In the project _config.yml (or config_prod.yml if you have a production environment config file), add this line:


include: ['.htaccess']

Jekyll will now build the .htaccess file – which is much more convenient than editing the file on the server.

Create a .htaccess file in the root of your Jekyll project.

When you build the project, the .htaccess file will be included in the project root.

Sample .htaccess for Cache Control

The following .htaccess directive tells browsers to use cached content for the specified files. Just add the directive to the project .htaccess file:


# Set browser caching
# ------------------------------------------------------------------------------
<IfModule mod_expires.c>
ExpiresActive On
ExpiresByType image/jpg "access 1 year"
ExpiresByType image/jpeg "access 1 year"
ExpiresByType image/gif "access 1 year"
ExpiresByType image/png "access 1 year"
ExpiresByType text/css "access 1 month"
ExpiresByType text/html "access 1 month"
ExpiresByType application/pdf "access 1 month"
ExpiresByType text/x-javascript "access 1 month"
ExpiresByType application/x-shockwave-flash "access 1 month"
ExpiresByType image/x-icon "access 1 year"
ExpiresDefault "access 1 month"
</IfModule>
# End caching block

Hide Git Repos on Public Sites

Don’t provide would-be hackers with your source code!

If you use git to manage projects, you should be careful about publicly disclosing your repo – this would give would-be hackers access to your entire source code.


WordPress Debugging

On-screen debugging output, writing errors to a log file, real time monitoring of the debug log

In general, error logging should be enabled in the development environment and disabled in production environments.

To enable WordPress error reporting to the browser, and to enable error logging to file, add the following lines to wp-config.php:

<?php
// Enable error reporting output to browser. Default value is false.
define( 'WP_DEBUG', true );
// Log errors to `/wp-content/debug.log`. Useful when debugging code that does not output to browser.
define( 'WP_DEBUG_LOG', true );

