Synchronize directories with Ansible
Introduction
Ansible has a synchronize module, which is a wrapper around rsync for synchronizing two directories. It does not provide the full rsync functionality, but it gets the job done.
However, if sudo requires a password on the remote host, using the synchronize module becomes tricky. From the synchronize module documentation:
Currently, synchronize is limited to elevating permissions via passwordless sudo. This is because rsync itself is connecting to the remote machine and rsync doesn’t give us a way to pass sudo credentials in.
There may be workarounds or hacks for this, which I will leave to GitHub and Stack Overflow rather than duplicate here. Instead, I will describe an alternative way to synchronize two directories with Ansible, using other modules that have no problem with sudo and passwords.
Another way
In this particular use case, we wanted to keep a directory up to date: not only adding new files and updating existing ones, but also removing any files from the remote system that were no longer present locally.
One option, of course, is to remove or empty the whole target directory on the remote system before copying, and another is simply to drop the sudo password requirement. Neither is always safe, or even possible.
With that in mind, we devised another way to accomplish the goal with the help of other Ansible modules, namely copy, find and file.
First off, we copy all of the local files to the remote target with a simple copy task:
- name: update files on remote host
  copy:
    src: "{{ role_path }}/files/rules-files/"
    dest: "/target/rules"
This takes care of new and updated files. However, there might still be files on the remote system that we no longer need, because we have removed them locally.
To make the same changes remotely, we need to compare the local and remote directories, figure out which files are missing locally, and remove them from the remote.
We start by getting a list of the files in question from both the local and the remote system:
- name: get a list of local files
  find:
    paths:
      - "{{ role_path }}/files/rules-files"
    patterns:
      - "*.rules"
    file_type: file
  delegate_to: localhost
  register: local_rules

- name: get a list of remote files
  find:
    paths:
      - /target/rules
    patterns:
      - "*.rules"
    file_type: file
  register: remote_rules
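A side note: the next step appends to two list variables, local_filenames and remote_filenames, so both should default to an empty list. Otherwise the first append fails on an undefined variable, and a list also stays undefined if its directory is empty. That should go somewhere in defaults, probably:
local_filenames: []
remote_filenames: []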
Next, the actual file names are extracted so that we can compare them, since the full paths returned by the find module differ between the local and remote systems:
- name: prepare a list of local rules files
  set_fact:
    local_filenames: "{{ local_filenames + [item.path | basename] }}"
  with_items: "{{ local_rules.files }}"

- name: prepare a list of remote rules files
  set_fact:
    remote_filenames: "{{ remote_filenames + [item.path | basename] }}"
  with_items: "{{ remote_rules.files }}"
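As an alternative sketch, the same two lists can be built with one expression each, without a loop and without relying on empty defaults. It assumes the local_rules and remote_rules results registered above, and combines the standard Jinja2 map filter with Ansible's basename filter. The task name is made up for illustration:

```yaml
- name: prepare lists of local and remote rules file names (loop-free variant)
  set_fact:
    # take the "path" of every found file, then keep only its base name;
    # an empty "files" list simply yields an empty list
    local_filenames: "{{ local_rules.files | map(attribute='path') | map('basename') | list }}"
    remote_filenames: "{{ remote_rules.files | map(attribute='path') | map('basename') | list }}"
```

Because each variable is assigned in one go rather than appended to, a rerun within the same play also cannot accumulate duplicates.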
We can now compare these two lists to find out which file names are present on the remote but not locally. We will use one of the Ansible set theory filters I wrote about earlier:
- name: prepare a list of files not present locally anymore
  set_fact:
    absent_filenames: "{{ absent_filenames | default([]) +
      remote_filenames | difference(local_filenames) }}"
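To make the behaviour of the difference filter concrete, here is a small standalone sketch. The file names are hypothetical and only serve as an illustration: the filter keeps the items of the first list that do not appear in the second.

```yaml
- name: illustrate the difference filter with hypothetical file names
  debug:
    # first list: what is on the remote; second list: what still exists locally
    msg: "{{ ['a.rules', 'b.rules', 'old.rules'] | difference(['a.rules', 'b.rules']) }}"
  # msg is a list containing only 'old.rules'
```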
With that, we have a list of the files that should be removed from the remote system:
- name: remove locally absent files from remote
  file:
    path: "/target/rules/{{ item }}"
    state: absent
  when:
    - absent_filenames | length > 0
  with_items: "{{ absent_filenames }}"
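Since the file module goes through Ansible's regular privilege escalation, a password-protected sudo is not an obstacle here. A minimal sketch, assuming the target directory is writable only by root and the play is started with --ask-become-pass so that Ansible can prompt for the sudo password. The become line is an illustration, not part of the original role:

```yaml
- name: remove locally absent files from remote
  become: true  # assumption: /target/rules is only writable by root
  file:
    path: "/target/rules/{{ item }}"
    state: absent
  # an empty list means zero iterations, so no extra "when" guard is needed
  with_items: "{{ absent_filenames }}"
```

The same become setting could be applied to the copy task if the target directory requires it.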
Conclusion
The initial version of this used MD5 hashes of the files to compare them, but that approach does not work if a file is renamed locally. In that case there would be two identical files on the remote with different names, which would be unacceptable.
As is often the case, there is more than one way to do something. This approach allows a synchronization-like workflow in an environment where sudo requires a password and the Ansible synchronize module is therefore less likely to succeed.