Synchronize directories with Ansible

Introduction

Ansible has a synchronize module, which is a wrapper around rsync for synchronizing two directories. It does not provide the full rsync functionality, but it gets the job done.

However, if sudo on the remote host requires a password and you want to use the synchronize module, things can get tricky. From the synchronize module documentation:

Currently, synchronize is limited to elevating permissions via passwordless sudo. This is because rsync itself is connecting to the remote machine and rsync doesn’t give us a way to pass sudo credentials in.

There may be workarounds and hacks for this situation; I will leave those to GitHub and Stack Overflow rather than duplicate them here. Instead, I will describe an alternative way to synchronize two directories with Ansible, using other modules that have no problem with sudo and passwords.

Another way

In this particular use case, we were interested in keeping a directory up-to-date: not only adding new files and updating existing ones, but also removing any files from the remote system that are no longer present locally.

One way, of course, is to empty or remove the whole target directory on the remote system before copying, or simply to drop the sudo password requirement, but neither option is always safe or even possible.

With that in mind, another way to accomplish the goal was devised with the help of other Ansible modules, namely copy, find and file.

First off, we copy all of the local files to the remote target. A simple copy task:

- name: update files on remote host
  copy:
    src: "{{ role_path }}/files/rules-files/"
    dest: "/target/rules"
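
Because copy is a regular module and not a wrapper around rsync, privilege escalation goes through Ansible's usual become mechanism. A minimal sketch of the same task with escalation enabled; the become setting is an assumption (the original task does not show it), as is supplying the sudo password at run time with the --ask-become-pass (-K) option of ansible-playbook:

- name: update files on remote host
  copy:
    src: "{{ role_path }}/files/rules-files/"
    dest: "/target/rules"
  # assumption: the target directory is only writable by root
  become: true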

This takes care of new and updated files. However, the remote system may still hold files that we no longer need, because we have removed them locally.

To make the same changes remotely (that is, to remove these files), we need to compare the local and remote directories, figure out which files need to be removed, and remove them.

We start by getting a list of the files in question from both the local and the remote system:

- name: get a list of local files
  find:
    paths:
      - "{{ role_path }}/files/rules-files"
    patterns:
      - "*.rules"
    file_type: file
  delegate_to: localhost
  register: local_rules

- name: get a list of remote files
  find:
    paths:
      - /target/rules
    patterns:
      - "*.rules"
    file_type: file
  register: remote_rules

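A side note: the tasks below append to local_filenames and remote_filenames, so both variables must already be defined as empty lists before the first iteration. Otherwise the set_fact expressions fail on an undefined variable, and an empty directory would leave the list undefined altogether. That should go somewhere in defaults, probably:

local_filenames: []
remote_filenames: []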

Next, we extract the bare file names so that we can compare them, since the full paths returned by the find module differ between the local and remote systems:

- name: prepare a list of local rules files
  set_fact:
    local_filenames: "{{ local_filenames + [item.path | basename] }}"
  with_items: "{{ local_rules.files }}"

- name: prepare a list of remote rules files
  set_fact:
    remote_filenames: "{{ remote_filenames + [item.path | basename] }}"
  with_items: "{{ remote_rules.files }}"
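
The two loops above can also be collapsed into a single task. A minimal sketch, assuming the same registered variables local_rules and remote_rules; it chains the Jinja2 map filter with Ansible's basename filter, so the lists are built in one expression and no empty-list default is needed:

- name: prepare lists of local and remote rules filenames
  set_fact:
    # take the path attribute of every found file, then strip the directory part
    local_filenames: "{{ local_rules.files | map(attribute='path') | map('basename') | list }}"
    remote_filenames: "{{ remote_rules.files | map(attribute='path') | map('basename') | list }}"

If find matches nothing, files is an empty list and both expressions simply evaluate to an empty list.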

We can now compare these two lists to find out which filenames are present on the remote system but not locally. We will use one of the Ansible set theory filters I wrote about earlier:

- name: prepare a list of files not present locally anymore
  set_fact:
    absent_filenames: "{{ absent_filenames | default([]) +
                          remote_filenames | difference(local_filenames) }}"
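
To see what the difference filter does in isolation, here is a small illustration with made-up data. The variable names sample_remote and sample_local and the file names are hypothetical and only serve the example:

- name: illustrate the difference filter with sample data
  debug:
    # keeps the items of the first list that are missing from the second
    msg: "{{ sample_remote | difference(sample_local) }}"
  vars:
    sample_remote: ["a.rules", "b.rules", "old.rules"]
    sample_local: ["a.rules", "b.rules"]

The message should contain only old.rules, the one name that exists remotely but not locally.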

With that, we have a list of the files that should be removed from the remote system:

- name: remove locally absent files from remote
  file:
    path: "/target/rules/{{ item }}"
    state: absent
  when:
    - absent_filenames | length > 0
  with_items: "{{ absent_filenames }}"

Conclusion

The initial version of this used MD5 hashes of the files to compare them, but that approach does not work if a file is renamed locally. In that case there would be two identical files on the remote system with different names, which would be unacceptable.

As is often the case, there is more than one way to do something. This approach gives you a synchronization-like workflow in an environment where sudo requires a password and the Ansible synchronize module is therefore unlikely to work.