Rewriting git committer information

There are situations in which the committer information of a git repo must be rewritten. For instance, a foreign repo may need to be migrated into an environment with stricter rules, where committer information must contain real names and surnames and proper e-mail addresses, rather than nicknames and user@workstation.local-style addresses.

If a repo has a short history, this can be done manually, commit by commit, but in many cases that is impractical, especially if there are hundreds of commits or more.

Fortunately, it is possible to do this automatically, using native git functionality within a script, coupled with a mapping file that maps the improper committer e-mail addresses to proper names, surnames and e-mail addresses.

To start, a mapping is needed so that the script knows which e-mail addresses to replace with which names, surnames and e-mail addresses. It is a simple plain-text file with a specific format.

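First, to get a list of all unique contributors (and the count of their commits) in the repo history, git shortlog can be used. Note that it groups commits by author by default; adding the -c (--committer) option groups them by committer instead: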

git shortlog -sne

This would output a list similar to this:

   123  Name Surname <zing@zingg.me>
    91  nickname <hey@me.local>
    10  w00t <w00t@w00t.w00t>
   ...

To replace these inconsistent names in the git repo history with proper ones, this list needs to be converted into a mapping file like this:

Name Surname name.surname@domain.com zing@zingg.me
Nick Name nick.name@domain.com hey@me.local
Wo Ot wo.ot@domain.com w00t@w00t.w00t
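
A starting point for such a file could be generated from the shortlog output rather than typed from scratch. The sketch below is only an illustration: the function name shortlog_to_mapping, the FIXME_* placeholder tokens and the file name mapping.txt are made up here, and the placeholders still have to be replaced by hand with the real name, surname and e-mail address of each person.

```shell
# Hypothetical helper: read `git shortlog -sne` output from stdin and print
# one draft mapping line per unique old e-mail address.
shortlog_to_mapping() {
    # Keep only the text between the last '<' and '>' on each line;
    # lines without an address (such as "...") are dropped.
    sed -n 's/.*<\([^<>]*\)>[[:space:]]*$/\1/p' |
        sort -u |
        while IFS= read -r old_email; do
            # The first three fields are placeholders to be edited by hand.
            printf 'FIXME_NAME FIXME_SURNAME FIXME_EMAIL %s\n' "$old_email"
        done
}

# Usage inside a repository (HEAD is passed explicitly so that shortlog
# does not try to read a log from standard input):
#   git shortlog -sne HEAD | shortlog_to_mapping > mapping.txt
```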

Each line holds four space-separated fields. Users are identified by the e-mail address found in the history (4th field). For each of them, the mapping file must also provide a proper name (1st field), surname (2nd field) and e-mail address (3rd field). The mapping file is then read by a conversion script, which rewrites all matching commits in the git history.
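
The conversion script could look roughly like the sketch below. It assumes that the native functionality in question is git filter-branch with its --env-filter option, that the mapping file has the four fields described above, and that no field contains quotes, $ or backticks. The function names build_env_filter and rewrite_history and the file name mapping.txt are made up for illustration. The sketch rewrites the author fields as well as the committer fields, which may or may not be desired.

```shell
# Hypothetical helper: print a shell snippet for `git filter-branch --env-filter`
# built from a mapping file with lines "Name Surname new_email old_email".
build_env_filter() {
    mapping_file=$1
    while read -r name surname new_email old_email extra || [ -n "$name" ]; do
        # Skip blank lines.
        [ -z "$name" ] && continue
        # Require exactly four fields per line.
        if [ -z "$old_email" ] || [ -n "$extra" ]; then
            echo "malformed mapping line starting with: $name" >&2
            return 1
        fi
        # Assumption: field values contain no quotes, '$' or backticks.
        cat <<EOF
if [ "\$GIT_COMMITTER_EMAIL" = "$old_email" ]; then
    export GIT_COMMITTER_NAME="$name $surname"
    export GIT_COMMITTER_EMAIL="$new_email"
fi
if [ "\$GIT_AUTHOR_EMAIL" = "$old_email" ]; then
    export GIT_AUTHOR_NAME="$name $surname"
    export GIT_AUTHOR_EMAIL="$new_email"
fi
EOF
    done < "$mapping_file"
}

# Hypothetical wrapper: rewrite all branches and tags using the mapping file.
rewrite_history() {
    env_filter=$(build_env_filter "$1") || return 1
    git filter-branch --env-filter "$env_filter" \
        --tag-name-filter cat -- --branches --tags
}

# Usage inside a fresh clone of the repository:
#   rewrite_history mapping.txt
```

Rewriting changes the hash of every affected commit, so this is best tried on a fresh clone first. Recent git versions also print a warning that recommends alternatives to filter-branch before the rewrite starts.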

If a centralized solution is used for git repo management (e.g. Bitbucket), it may run a commit checker (e.g. Yet Another Commit Checker) to ensure that the committer information matches the user pushing the changes. Since the commits in the imported repo will not all come from the same user, this commit checker should be disabled for the repo in question so that the changes can be pushed.