Intro

One of the most important components of a robust infrastructure is an automated backup solution. Typically this consists of both onsite and offsite backups, each running on a host separate from the machines being backed up. One of the things I like most about Proxmox is its companion backup solution, Proxmox Backup Server (PBS). It works great and allows automatic backups, but at the cost of another server running and sipping power. In a production environment, the power draw and associated cost of one additional server are negligible. However, in my home lab, and others like it, an additional 200W of power draw 24/7 works out to almost $50/month, which is a lot for an appliance that is only used a few times a week. With this in mind, I started down the path of configuring my backup server as warm storage to save power and money.
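The monthly figure depends on the local electricity rate, which is not stated above. As a rough sanity check, the sketch below computes the cost of a constant load. The function name `monthly_cost`, the rate of $0.33/kWh, and the 30-day month are assumptions chosen purely for illustration.

```shell
#!/bin/bash
# monthly_cost WATTS RATE_PER_KWH [DAYS]
# Prints the cost of running a constant load for DAYS days (default 30).
# The rate is whatever you pass in; 0.33 below is an assumed example value.
monthly_cost() {
  local watts=$1 rate=$2 days=${3:-30}
  awk -v w="$watts" -v r="$rate" -v d="$days" \
    'BEGIN { printf "%.2f\n", w / 1000 * 24 * d * r }'
}

# Example: 200 W running 24/7 at an assumed $0.33/kWh
monthly_cost 200 0.33
```

With these assumed inputs the function prints 47.52. The result scales linearly with the rate, so plug in the number from your own bill.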

What are Storage Temperatures?

Storage temperatures (hot and cold) describe the latency to access data on a storage device. Hot means the data can be accessed immediately and frequently, like an SSD with a high-bandwidth link. Cold means the data can only be accessed slowly and infrequently, like an external HDD or old archival storage. Often cold also means the storage is durable, not networked, and located offsite for security purposes. For my onsite backup application, I want an in-between solution: warm storage. I know the exact times each week that the storage needs to be accessible for backup jobs, and I don't mind several minutes of latency to access the data. The goal with warm storage is to keep the backup server powered off until a backup job is scheduled or the data needs to be accessed manually.

Implementation

Overview

Great, so how do we implement a solution to automatically power on/off the server based on backup jobs? Well, there are many ways to skin this cat but I chose to start with the simplest solution. Let's break down what I want to have happen:

  1. The backup server is powered on a few minutes before a backup job starts.
  2. The backup job starts and takes some time to complete.
  3. The backup job completes and the backup server is gracefully shut down.
  4. This behavior repeats for every scheduled backup job automatically.

Let's go over how to implement each one of these behaviors.

To power on the server, I use ipmitool to send a power-on command. This is the most reliable option for my setup, and my Dell PowerEdge server supports it. Alternatively, you can use Wake-on-LAN (WoL) if your server does not have an IPMI interface.
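As a minimal sketch of the power-on step, the snippet below asks the BMC for the current power state first and only sends the power-on command if the chassis is off. The function names and the `IPMI_HOST` and `IPMI_USER` variables are hypothetical placeholders. The `-E` flag makes ipmitool read the password from the `IPMI_PASSWORD` environment variable rather than the command line.

```shell
#!/bin/bash
# is_chassis_off STATUS_TEXT
# Succeeds when the text returned by `ipmitool ... chassis power status`
# reports that the chassis is off.
is_chassis_off() {
  [[ "$1" == *"Chassis Power is off"* ]]
}

# power_on_if_needed: hypothetical wrapper around the power-on step.
# Assumes IPMI_HOST, IPMI_USER and IPMI_PASSWORD are set by the caller.
power_on_if_needed() {
  local status
  status=$(ipmitool -I lanplus -H "$IPMI_HOST" -U "$IPMI_USER" -E \
    chassis power status) || return 1
  if is_chassis_off "$status"; then
    echo "Powering on PBS"
    ipmitool -I lanplus -H "$IPMI_HOST" -U "$IPMI_USER" -E chassis power on
  else
    echo "PBS already powered on, nothing to do"
  fi
}
```

Checking the state first is optional, but it makes the logs clearer when the server was already started by hand.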

Once the server is powered on, my original plan was to keep things simple and wait a predetermined amount of time that slightly exceeds how long the backup job takes. However, it turns out that the backup time is highly variable, depending on how much of the image has changed (is dirty) since the last backup job. Because of this, I instead detect when the backup job has completed by watching the journalctl logs for the backup completion message.
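The log-watching idea can be sketched as a polling loop. The version below adds a timeout, which is my own assumption and not something described above, so that the wait cannot hang forever if the completion message never appears. The function names are hypothetical, and the search string "Backup job finished" is assumed to match what your Proxmox host writes to its journal.

```shell
#!/bin/bash
# count_finished: number of "Backup job finished" lines logged today.
# Kept as its own function so it can be replaced when testing.
count_finished() {
  journalctl --since today | grep -c "Backup job finished"
}

# wait_for_completion TIMEOUT_S [INTERVAL_S]
# Returns 0 once the completion count rises above its starting value,
# or 1 if roughly TIMEOUT_S seconds pass first.
wait_for_completion() {
  local timeout=$1 interval=${2:-5}
  local initial waited=0
  initial=$(count_finished)
  while [ "$(count_finished)" -le "$initial" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 0
}
```

A caller could run something like `wait_for_completion 14400 5` and decide what to do on failure, for example shutting the server down anyway or leaving it on for inspection.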

To shut down the server, IPMI cannot be used (even the soft shutdown) since it does not result in a truly graceful shutdown. This is at least true with my server model (Dell PE R730XD) and OS (PBS). Instead, I use SSH to run a shutdown command, which initiates a graceful shutdown from within PBS.

Scheduling and execution are handled by a systemd service and timer, which call a bash script. I set the systemd timer to the same days of the week as the backup jobs I scheduled in Proxmox, firing a few minutes ahead of the job time to give the backup server time to boot.
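Working out the timer time is simple arithmetic, but it is easy to get wrong near midnight. As a small sketch, the hypothetical helper below subtracts a boot buffer in minutes from a backup start time given as HH:MM. The 06:00 start and 5 minute buffer are only example inputs.

```shell
#!/bin/bash
# timer_time HH:MM BUFFER_MIN
# Prints the time BUFFER_MIN minutes before HH:MM, wrapping past midnight.
timer_time() {
  local hh=${1%%:*} mm=${1##*:} buffer=$2
  # 10# forces base 10 so values like "08" are not parsed as octal
  local total=$(( (10#$hh * 60 + 10#$mm - buffer + 1440) % 1440 ))
  printf '%02d:%02d\n' $(( total / 60 )) $(( total % 60 ))
}

# Example: backup job at 06:00 with a 5 minute boot buffer
timer_time 06:00 5
```

The example prints 05:55. If the result wraps past midnight, remember that the timer's day of the week has to move back by one day as well. An expression such as `Mon,Fri *-*-* 05:55:00` can be checked with `systemd-analyze calendar` before putting it in a timer.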

Technical Details

Two dependencies (ipmitool and sshpass) are required for the bash script to wake and shut down the backup server. They can be installed with the following commands:

apt install ipmitool
apt install sshpass

Below is the simple bash script (/root/backup_power.sh) I made to power on the server, wait for the backup job to complete, and then gracefully power down the server. This script is on my main Proxmox server that schedules the backup jobs and also runs 24/7.

#!/bin/bash

# backup_power.sh
# A script called by systemd to wake pbs server
# over ipmi before backup job starts and shutdown
# after backup job completes

# Power on server once called by systemd timer:
echo "Powering on PBS"
ipmitool -I lanplus -H <server-ip> -U <user> -P <pass> chassis power on

# Record how many backup completions have been logged today so far
backup_finished_count_initial=$(journalctl --since today | grep -c "Backup job finished")
backup_finished_count=$backup_finished_count_initial

echo "Waiting for backup job completion"

# Poll once per second until a new completion message appears
while [ "$backup_finished_count" -eq "$backup_finished_count_initial" ]; do
  sleep 1s
  backup_finished_count=$(journalctl --since today | grep -c "Backup job finished")
done

echo "Detected backup job completion, powering off PBS in 1 minute"
sleep 1m  # Wait one minute before graceful shutdown

# Gracefully power off server after backup job is complete
echo "Gracefully powering off PBS"
sshpass -p "<pass>" ssh -o StrictHostKeyChecking=no root@<server-ip> shutdown

Fill in your server IP, IPMI user and password, and SSH user and password, and don't forget to make the script executable with:

chmod +x backup_power.sh
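Hardcoding passwords in the script works, but they then show up in the process list while the commands run. One possible alternative, sketched below under my own assumptions, is to keep them in a root-only file and pass them through the environment: ipmitool reads `IPMI_PASSWORD` when given `-E`, and sshpass reads `SSHPASS` when given `-e`. The function name and the file path in the comments are hypothetical.

```shell
#!/bin/bash
# load_credentials FILE
# Sources FILE only if its mode is 600 or 400, then exports the
# variables that `ipmitool -E` and `sshpass -e` read.
# FILE is expected to contain lines such as:
#   IPMI_PASSWORD='...'
#   SSHPASS='...'
load_credentials() {
  local file=$1 mode
  mode=$(stat -c '%a' "$file") || return 1
  if [ "$mode" != "600" ] && [ "$mode" != "400" ]; then
    echo "Refusing to read $file: mode is $mode, expected 600" >&2
    return 1
  fi
  . "$file"
  export IPMI_PASSWORD SSHPASS
}

# Possible usage inside backup_power.sh (path is a placeholder):
#   load_credentials /root/.backup_power.env || exit 1
#   ipmitool -I lanplus -H <server-ip> -U <user> -E chassis power on
#   sshpass -e ssh root@<server-ip> shutdown
```

SSH key authentication would remove the need for sshpass entirely, but the environment-variable route keeps the script closest to the version above.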

Below is the systemd service (/etc/systemd/system/backup_power.service) which executes the above bash script.

[Unit]
Description=Service to power on and off PBS server for backup jobs
Wants=backup_power.timer

[Service]
Type=oneshot
ExecStart=/root/backup_power.sh

[Install]
WantedBy=multi-user.target

The service is enabled with the following.

systemctl enable backup_power.service

Below is the systemd timer (/etc/systemd/system/backup_power.timer) which fires the above service on my backup days (Monday and Friday) 5 minutes before the backup job begins (5:55 AM).

[Unit]
Description=Calls backup_power service on backup days 5 minutes before backup job starts.
Requires=backup_power.service

[Timer]
Unit=backup_power.service
OnCalendar=Mon,Fri *-*-* 5:55:00
AccuracySec=10s

[Install]
WantedBy=timers.target

You'll want to enable and start the timer with the following after creation:

systemctl enable backup_power.timer
systemctl start backup_power.timer

This will also start the service and run your script for the first time. Make sure it is functioning as expected. Check the status of the service and timer with the following.

systemctl status backup_power.service
systemctl status backup_power.timer

Check the logs for more detailed debugging with

journalctl

or search for specifics in the logs with

journalctl | grep <string-to-search>

Hopefully your timer and service run without issue and you now have automatic power on and power off of your now warm storage backup server.

Final Thoughts

This solution has worked well for my needs and was very simple to get running. One improvement I would like to add is scheduled ZFS pool scrubbing once a month or so to make sure the disks are healthy. Also, adding a 10GbE fiber link between the servers would bring the backup jobs down to the order of a few minutes. Well, that's it for this project. I hope this was useful for people setting up their own backup solution, and that it saves you some power and money as it has for me.
