Git submodules are somewhat of an ‘advanced’ git feature, akin to Subversion
externals for those of us unlucky enough to have the pleasure of knowing svn.
The most common usage is pulling third-party libraries into your project. You
can think of git submodules as git checkouts within checkouts. The ‘parent’
checkout, or ‘super-project’ as I call it, knows that there are submodules and
knows which SHA each submodule is at. In my work as both a Systems
Engineer at $dayjob and a part-time general IT consultant, I needed to use
submodules slightly differently than the most common use case.
During some time off from having a dayjob in 2011, I developed a lot of
Puppet code that used what were at the time new features like parameterized
classes, hashes, etc. I also pretty painstakingly made sure the code would
work on OpenBSD, Debian, Ubuntu, Red Hat, and CentOS. The plan was to keep
this code private and charge people for supporting this set of
modules, developing new ones, etc. So, I needed a way to let people pull this
code into their Puppet repo and use it. Submodules are the only easy way to do
this. This code, along with some scripts to set up an environment utilizing
it, was what I called the GRand Unified Modular Puppet System (GRUMPS).
First, not to go too far off track, but if you are pulling Puppet modules
off the net, you should already be aware of git submodules. If you are
downloading modules and then just copying them into your setup, you’re doing it wrong.
Puppet code is code. You need to treat improvements to public modules the
same way you treat any open-source project: you have your own local branch,
submit fixes upstream, etc.
Now let’s talk about Puppet environments. You should at the very least have
two environments. The name of the one that isn’t production probably doesn’t
matter much, but we’ll call this staging, since that’s what I use in the code
snippets later on. Many people will make these git checkouts under /etc/puppet
that they run git pull on to update. This is an atrocious update mechanism, and
a poor development layout.
Instead, what I’ve done is to make each sub-directory of each Puppet env correspond
to specific branches in various sub-modules. Example:
“Develop” was a pre-existing convention I had for branch naming, but that’s
really irrelevant, make it what you want. Now this shows I have 8 submodules
for my two environments. I actually have one more for the SSL dir, but that
is off-topic really. Some of the code samples may reference develop or master.
To get all this set up you will need to run git submodule add from the root of
your checkout for each submodule, which records it in the .gitmodules file.
Anyone who clones the super-project afterwards will also need to run git
submodule init (followed by git submodule update), which registers those
submodules in their local .git/config. Totally not confusing :-).
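As a rough sketch of that setup, the loop below prints the git submodule add commands for a layout like the one described: each module is added once under staging (tracking develop) and once under production (tracking master). The module names are taken from paths that show up later in this post. The remote base URL is a placeholder I made up, and the run helper is hypothetical. Nothing is executed unless you set DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch: add each module once per environment. Prints the commands by default.
# ASSUMPTION: git@example.com:yourorg is a placeholder remote base, not a real one.
REMOTE_BASE="${REMOTE_BASE:-git@example.com:yourorg}"
DRY_RUN="${DRY_RUN:-1}"

# Run a command, or just print it when DRY_RUN is 1.
run() {
   if [ "$DRY_RUN" = "1" ]; then
      printf '%s\n' "$*"
   else
      "$@"
   fi
}

# Add every module under the given environment directory on the given branch.
add_env_submodules() {
   local env="$1" branch="$2" module
   for module in manifests grumps-modules yourorg-modules; do
      run git submodule add -b "$branch" "$REMOTE_BASE/$module.git" "$env/$module"
   done
}

add_env_submodules staging develop
add_env_submodules production master
```

Run it from the root of the super-project checkout, read the printed commands, then re-run with DRY_RUN=0 once they look right.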
The 1000-foot view of the workflow after setup is complete:
1. Team members commit like it’s April 29 1992 on develop
2. Everybody sees each other’s work and reviews it
3. Merge every sub-module’s develop branch to master
4. Update super-project
5. Push and deploy
Another, more common workflow for me is:
1. Committing like fire on develop
2. Reviewing work
3. Some guy runs up to your desk and needs XXX fixed in production
4. O NO! You have about 20 commits that can’t go to prod
5. Relax, we use modern version control: cherry-pick the change(s)
6. Update super-project
7. Push and deploy
You’re thinking “this is just a bunch of overhead”. Well, yeah, if I didn’t
script all this monotony away I would say the same thing. But one
thing I’ve learned from heavy git sub-module usage is that if you don’t
script all these things, you will forget a step and break something. You’re
also constantly repeating yourself, since you always need N+1 commits for
N changes (you must also commit in the git super-project, which records the SHA
each submodule is at).
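One way to catch a forgotten “+1” commit is git submodule status, which prefixes a line with + when the checked-out submodule commit differs from the SHA the super-project has recorded. The helper below is a minimal sketch that reads that output on stdin and prints the affected paths. The function name is made up for illustration, and it assumes submodule paths contain no spaces.

```shell
#!/usr/bin/env bash
# Sketch: list submodules whose checked-out commit differs from the SHA the
# super-project has recorded. Reads `git submodule status` output on stdin.
# ASSUMPTION: unrecorded_submodules is a hypothetical helper name.
unrecorded_submodules() {
   local line
   while IFS= read -r line; do
      case "$line" in
         +*)
            # Line looks like "+<sha> <path> (<describe>)"; keep only <path>.
            line="${line#+}"
            line="${line#* }"
            printf '%s\n' "${line%% *}"
            ;;
      esac
   done
}

# Usage, from the root of the super-project:
#   git submodule status | unrecorded_submodules
```

If this prints anything right before a deploy, the super-project commit has probably been skipped for those submodules.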
So, here is the script for committing a single change, which may be across
several submodules. Please note that on one system I had to change the
git-submodule script that comes with git to use bash in its shebang. This is
because I often use bash regex matching in the foreach commands:
#!/usr/bin/env bash
# This is a silly QnD that lets you develop in the staging sub modules, then
# run this to go into each one to commit and push. Saves you some keystrokes.
# Force needs to be used sometimes to really push things. What force does is
# just change the command chain to use ; instead of &&.

export COMMIT_MESSAGE="$1"

if [ -z "$COMMIT_MESSAGE" ]; then
   printf "You must enter a commit message as the only argument to this script!\n" >&2
   exit -1
fi

if [ "$2" == "force" ]; then
   printf "Forcing the operation...\n"
   git submodule foreach 'if [[ $path =~ ^staging.* ]]; then git checkout develop; git add .; git commit -am "$COMMIT_MESSAGE"; git push -u origin develop; fi'
else
   git submodule foreach 'if [[ $path =~ ^staging.* ]]; then git checkout develop && git add . && git commit -am "$COMMIT_MESSAGE" && git push -u origin develop; fi'
fi

git commit -am "Updated submodules: $COMMIT_MESSAGE"
git push

#vim: set expandtab ts=3 sw=3:
So, this needs to be run from the root of the checkout. My convention for all
code checkouts is $HOME/working/$vcs/$project, so I do:
[~/working/git/puppet]> vim staging/grumps-modules/common/manifests/debuntu.pp
[~/working/git/puppet]> vim staging/manifests/node_templates.pp
[~/working/git/puppet]> scripts/stagcom.sh "Wrote debuntu class for Debian and Ubuntu machines, made sure basenode includes it"
[~/working/git/puppet]> cap deploy
This is all I need to do to get commits in staging out to the master. The
snippet which shows the capistrano stuff I will show last. Promotion is as
easy as:
[~/working/git/puppet]> scripts/promote.sh
[~/working/git/puppet]> cap deploy
Or to promote a single commit:
[~/working/git/puppet]> scripts/promote.sh -c 73acef8 -m grumps-modules
[~/working/git/puppet]> cap deploy
Now let’s talk about promotion of changes. This is somewhat dependent on your
CR process (if you have one), but here is the script I use for full-env
promotion as well as cherry-pick promotions:
#!/usr/bin/env bash
# == Synopsis
# This is a small script that will promote the develop branches in each git
# submodule in the Puppet staging environment to master, then pull in
# the changes to the production environment. Note that you can promote single
# commits by using the cherry-pick (-c) functionality. If you use cherry pick
# you must pass a modulename, ie 'grumps-modules' or 'manifests', with the -m
# switch
#
# == Usage
# See usage() function below
#
# == Notes
# This script does serious changes and pushes to master. Don't run it all
# willy nilly.
#
# == Authors
# Joe McDonagh <jmcdonagh@thesilentpenguin.com>
#
# == Copyright
# 2012 The Silent Penguin LLC
#
# == License
# Licensed under The Silent Penguin Proprietary License
#

puppetroot="$HOME/working/git/puppet"

# Useful helper function, shorthand for cat'ing files that have error output
function caterror() {
   cat "$1" >&2
}

# Like caterror, shorthand for sending strings to stderr
function perror() {
   printf "%s\n" "$1" >&2
}

# Print a fatal error and exit with exit code $2 (defaults to 1 so that a
# die without an explicit code still reports failure)
function die() {
   perror "$1"
   exit "${2:-1}"
}

# This will go through all submodules and run the appropriate git log command
# to show what commits master needs to be on track with develop.
function list_pending_commits() {
   pushd "$puppetroot" >/dev/null 2>&1
   git submodule --quiet foreach 'if [[ $path =~ ^production.* ]]; then printf "%s:\n" "$(basename $name)"; changes=$(git log master..develop); if [ -z "$changes" ]; then printf "No pending commits.\n"; else printf "%s\n" "$changes"; fi; fi'
   # popd takes no directory argument
   popd >/dev/null 2>&1
}

# Use this function to print out usage information, and exit with code of $1.
# This is useful to exit with a non-zero code due to an error in argument
# processing.
function usage() {
   printf "Usage:\n"
   printf "   %s [-c commit -m modulename] [-h]\n" "$(basename $0)"
   printf "   -l List all commits that master needs in all submodules. Make sure\n"
   printf "      your checkout is up to date if you run this, otherwise results may\n"
   printf "      be inaccurate.\n"
   printf "   -c Cherry pick the commit given as the argument to this switch\n"
   printf "   -f Force mode- this will bypass the prompt when promoting all of staging\n"
   printf "   -h Print this message\n"
   printf "   -m This is required if you use -c; it is the submodule directory name\n"
   printf "   -M This overrides the default commit message with whatever you specify\n"
   printf "\n"
   printf "Example:\n"
   printf "   %s -c d4cb267 -m manifests\n\n" "$(basename $0)"
   printf "This will cherry-pick commit d4cb267 from the manifests submodule.\n"
   printf "\n"
   printf "Passing no arguments to this script will promote the entire staging env to\n"
   printf "production.\n"
   exit $1
}

while getopts c:lfm:M:h option; do
   case "$option" in
      c) export COMMIT="$OPTARG" ;;
      f) force="true" ;;
      h) usage 0
         ;;
      l) list_pending_commits
         exit 0
         ;;
      m) export MODULE="$OPTARG" ;;
      M) export MESSAGE="$OPTARG" ;;
      *) perror "Passing bad arguments"
         usage -1
         ;;
   esac
done

if [ -n "$MODULE" -a -z "$COMMIT" ]; then
   perror "You passed -m but did not pass -c, you need both."
   usage -10
fi

if [ -z "$MODULE" -a -n "$COMMIT" ]; then
   perror "You passed -c but did not pass -m, you need both."
   usage -20
fi

if [ ! -e "production/$MODULE" ]; then
   die "The module $MODULE does not exist!"
fi

# Verify whether or not commit given actually exists in develop branch
if [ -n "$COMMIT" ]; then
   pushd staging/$MODULE >/dev/null 2>&1
   if git branch --contains "$COMMIT" 2>/dev/null | grep -q develop; then
      commit_exists="true"
   else
      commit_exists="false"
   fi
   popd >/dev/null 2>&1

   if [ "$commit_exists" == "false" ]; then
      die "It appears commit $COMMIT does not exist in the develop branch of module $MODULE!"
   fi
fi

# Force mode in case the script is being used in a batch fashion.
if [ "$force" != "true" -a -z "$COMMIT" ]; then
   read -p "Are you sure you want to promote the entire staging environment? (y/N) " answer
   if [ "$answer" != "y" -a "$answer" != "Y" ]; then
      die "Did not confirm full promotion, exiting." -5
   fi
fi

# Set commit message, dependent on whether message is passed and if cherry
# picking or promoting the whole environment.
if [ -z "$MESSAGE" -a -n "$COMMIT" ]; then
   MESSAGE="Promote commit $COMMIT in submodule $MODULE from staging to production"
fi

if [ -z "$MESSAGE" -a -z "$COMMIT" ]; then
   MESSAGE="Promote all of staging to production."
fi

# Change dir into puppetroot with pushd so we can keep track of where we were
pushd "$puppetroot" >/dev/null 2>&1

# Make sure everything is up to date.
scripts/updatecheckout.sh

# Do the actual merging or cherry-picking into production and push
if [ -z "$COMMIT" ]; then
   git submodule foreach --quiet 'if [[ $path =~ ^production.* ]]; then git checkout develop && git pull && git checkout master && git merge develop && git push; fi'
else
   git submodule foreach --quiet 'if [ "$name" == "production/$MODULE" ]; then git checkout develop && git pull && git checkout master && git cherry-pick $COMMIT && git push; fi'
fi

git commit -am "$MESSAGE"
git push

popd >/dev/null 2>&1

#vim: set expandtab ts=3 sw=3:
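The promote script checks that a commit is on develop by grepping the output of git branch --contains, which would also match a branch with a name like develop-foo. A stricter alternative, sketched below, is git merge-base --is-ancestor, which exits 0 only when the commit is reachable from the named branch. The function name is hypothetical, and this is not what the script above does.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if <commit> is reachable from <branch> in repo <dir>.
# ASSUMPTION: commit_on_branch is a hypothetical helper name.
# Exit status: 0 = on the branch, 1 = not on the branch, other = git error
# (for example, the commit does not exist at all).
commit_on_branch() {
   local dir="$1" commit="$2" branch="$3"
   git -C "$dir" merge-base --is-ancestor "$commit" "$branch"
}

# Usage, from the root of the super-project:
#   commit_on_branch staging/manifests d4cb267 develop || echo "not on develop"
```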
One major point I want to drive home is that each of your commits in git should fix
the smallest error or add the smallest feature possible. This is just a
version control best practice, and it makes cherry picking possible. If you’re doing
giant commits that change a million things, cherry pick promotions aren’t
going to work for you.
You may have noticed that I call a script named updatecheckout.sh above. This
is the script for updating all the submodules because it’s a serious PITA to
do anything with sub-modules that isn’t scripted. That includes just keeping
your checkout up to date and on the proper branches:
#!/usr/bin/env bash
# == Synopsis
# This is a small script that will update your whole checkout.
#
# == Usage
# See usage() function below.
#
# == Notes
# Don't eat the yellow snow.
#
# == Authors
# Joe McDonagh <jmcdonagh@thesilentpenguin.com>
#
# == Copyright
# 2012 The Silent Penguin LLC
#
# == License
# Licensed under The Silent Penguin Proprietary License
#

puppetroot="$HOME/working/git/puppet"

# Useful helper function, shorthand for cat'ing files that have error output
function caterror() {
   cat "$1" >&2
}

# Like caterror, shorthand for sending strings to stderr
function perror() {
   printf "%s\n" "$1" >&2
}

# Print a fatal error and exit with exit code $2
function die() {
   perror "$1"
   exit $2
}

# Change dir into puppetroot with pushd so we can keep track of where we were
pushd $puppetroot >/dev/null 2>&1

# Make sure everything is up to date.
git pull --all
git submodule update --merge
git submodule foreach --quiet 'if [[ $path =~ ^staging.* ]]; then git checkout develop && git pull; fi'
git submodule foreach --quiet 'if [[ $path =~ ^production.* ]]; then git checkout master && git pull && git checkout develop && git pull && git checkout master; fi'

popd >/dev/null 2>&1

#vim: set expandtab ts=3 sw=3:
Last but not least, you’ll need some capistrano action. I use capistrano to
deploy, with the railsless-deploy.rb floating around the net. I have one small
modification to the railsless-deploy.rb which will ensure that the submodules
are on the proper branch in the cached-checkout:
# Override deploy! to do some submodule branch stuff
module Capistrano
   module Deploy
      module Strategy
         class RemoteCache < Remote
            def deploy!
               update_repository_cache

               # This will make sure the proper branches are used in the submodules
               logger.info "Ensuring staging sub-modules are using proper develop branches..."
               devmodules = capture("ls -d #{shared_path}/cached-copy/staging/*")
               devmodules = devmodules.split("\n")
               devmodules.each do |devmod|
                  run "cd #{devmod}; git checkout develop; git pull; cd -"
               end

               logger.info "Ensuring production sub-modules are using proper master branches..."
               prodmodules = capture("ls -d #{shared_path}/cached-copy/production/*")
               prodmodules = prodmodules.split("\n")
               prodmodules.each do |prodmod|
                  run "cd #{prodmod}; git checkout master; git pull; cd -"
               end

               copy_repository_cache
            end
         end
      end
   end
end
This is done for posterity and to avoid confusion. If you are familiar with
git submodules you know they are typically in a detached state. This means
they are not on any branch in particular. The super-project simply knows what
SHA a given submodule is at. That SHA may correspond to the tip of a branch,
but git submodule isn’t really aware of that. So, Capistrano’s
git_enable_submodules variable should be enough, but I like to make sure the cached-checkout is an
exact mirror of what Puppet developers have locally. You also don’t want to
ever be developing in a detached state, because you want your commits to stay
on the proper branch. Git will let you commit in a detached state, but those
commits are not on any branch and are easy to lose. I haven’t had to deal with those little mistakes since I wrote
this set of scripts. Here is the actual deploy.rb:
# Configuration Variables
set :admin_group, "webops"
set :admin_group_gid, "1337"
set :app_user, "puppet"
set :application, "puppet"
set :copy_exclude, [".git"]
set :deploy_lockfile, "/tmp/puppet_being_deployed"
set :deploy_to, "/usr/local/puppet"
set :deploy_via, "remote_cache"
set :git_enable_submodules, 1
set :keep_releases, 5
set :local_checkout, "#{ENV['HOME']}/working/git/puppet"
set :notification_email, "webops@yourorg.com"
set :repository, "ssh://git@github.com/yourorg/puppet"
set :scm, "git"
set :storeconfig_password, "oFzx.218.jkshfkh82."
set :use_sudo, "true"
set :use_storeconfigs, "true"

# SSH Config Variables
ssh_options[:forward_agent] = true

# Role for deployment
role :puppet_masters, "puppet.yourorg.com"

# Task to run after everything is done. This runs every time.
task :afterparty do
   run "if [ ! -d /etc/#{application} ]; then sudo -u root mkdir /etc/#{application}; fi"
   run "sudo -u root rsync --exclude='tagmail.conf' --delete -cvrP --no-o --no-g --no-p #{deploy_to}/current/ /etc/#{application}/"
   run "sudo -u root chown -R root:#{app_user} /etc/#{application}; sudo -u root find /etc/#{application} -type d -name 'lib' -prune -o -type d -exec chmod 750 {} \\; && sudo chmod -R g+r /etc/#{application}"
   run "sudo -u root find /etc/#{application} -type d -name 'lib/*' -exec chmod -R 755 {} \\;"
   run "sudo -u root chmod -R g+w #{deploy_to}/shared/cached-copy"
   run "sudo -u root /etc/init.d/apache2 restart"
end

# Generate puppet docs
task :gendocs, :on_error => :continue do
   run "sudo rm -rf /var/www/puppetdocs/staging /var/www/puppetdocs/production"
   run "sudo /usr/bin/puppet doc -a -m rdoc --outputdir /var/www/puppetdocs/staging/ --manifestdir /etc/puppet/staging/manifests --modulepath '/etc/puppet/staging/grumps-modules:/etc/puppet/staging/yourorg-modules'"
   run "sudo /usr/bin/puppet doc -a -m rdoc --outputdir /var/www/puppetdocs/production/ --manifestdir /etc/puppet/production/manifests --modulepath '/etc/puppet/production/grumps-modules:/etc/puppet/production/yourorg-modules'"
end

# This task won't do much to repair a broken git repo
task :fix_deploys, :on_error => :continue do
   logger.info "Make cached check out group-writeable by #{admin_group}..."
   run "sudo -u root chgrp -R #{admin_group} #{deploy_to}/shared/cached-copy"
   run "sudo -u root chmod -R g+w #{deploy_to}/shared/cached-copy"
   logger.info "Get repo to pristine state..."
   run "git checkout #{deploy_to}/shared/cached-copy"
   logger.info "Force permission fixes..."
   run "sudo -u root find #{deploy_to}/shared/cached-copy -type d -exec chmod 2770 {} \\;"
   run "sudo -u root find #{deploy_to}/shared/cached-copy -type f -exec chmod 660 {} \\;"
   logger.info "Unlocking deploys..."
   run "sudo -u root rm -f #{deploy_lockfile}"
end

task :notify do
   require 'etc'
   require 'rubygems'
   require 'action_mailer'

   ActionMailer::Base.delivery_method = :sendmail
   ActionMailer::Base.sendmail_settings = {
      :location => '/usr/sbin/sendmail',
      :arguments => '-i -t'
   }

   class NotificationMailer < ActionMailer::Base
      def deployment(application, message, notification_email)
         mail(:from => "#{Etc.getpwnam(ENV['USER']).gecos} <#{ENV['USER']}@yourorg.com>",
              :to => notification_email,
              :subject => "Puppet Deployment - #{Time.now.to_s}",
              :body => message)
      end
   end

   message = "This is a notification of deployment of a Puppet update.\n\n"
   message << "Deployed at: #{Time.now.to_s}\n"
   message << "Revision: #{real_revision}\n\n"

   # if the revision has not changed then don't look for logs, also if #SEC
   # is in the commit message, a full diff is not displayed for security
   # reasons.
   begin
      if previous_revision != real_revision
         message << "SCM Revisions Deployed\n"
         gitlog = `#{source.local.log(latest_revision, real_revision)} -v --oneline`
         if gitlog.include? '#SEC'
            message << gitlog
         else
            message << `#{source.local.log(latest_revision, real_revision)} --submodule=log -v --patch-with-stat`
         end
      end
   rescue
      message << "SCM Revisions Deployed\n"
      message << 'Previous revision not available'
   end

   mail = NotificationMailer.deployment(application, message, notification_email)
   mail.deliver
end

# Task that cats out what revision is deployed
task :getrev do
   run "sudo -u root cat /etc/puppet/REVISION"
end

# Task to add a lock file and bail deploys with a message if one exists
task :lock_deploys do
   require 'etc'

   logger.info "Locking deploys..."

   if ENV.has_key?('lock_reason')
      lock_reason = ENV['lock_reason']
   else
      lock_reason = "Deployment"
   end

   data = capture("cat #{deploy_lockfile} 2>/dev/null; echo").to_s.strip

   if !data.empty?
      logger.info "\e[0;31;1mATTENTION:\e[0m #{data}"
      abort "Deploys are locked."
   end

   timestamp = Time.now.strftime("%m/%d/%Y %H:%M:%S %Z")
   lock_message = "Deploys locked by #{Etc.getpwnam(ENV['USER']).gecos} (#{ENV['USER']}) at #{timestamp} for #{lock_reason}"
   put lock_message, "#{deploy_lockfile}", :mode => 0644
end

task :unlock_deploys do
   logger.info "Unlocking deploys..."
   run "rm -f #{deploy_lockfile}"
end

# Before and After hooks
after "deploy:symlink", :afterparty
after "deploy:rollback", :afterparty
before "deploy", "deploy:cleanup"
before "deploy:cleanup", :lock_deploys
after "deploy", :notify
before "notify", :unlock_deploys
after "notify", :gendocs

#vim: set expandtab ts=3 sw=3:
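Coming back to the detached-state concern: if you want a script to refuse to touch a submodule that is not on a branch, git symbolic-ref -q HEAD exits non-zero when HEAD is detached. The helper below is a small sketch of that check. The function name is hypothetical and it is not part of the scripts above.

```shell
#!/usr/bin/env bash
# Sketch: print the current branch of a checkout, or fail if HEAD is detached.
# ASSUMPTION: current_branch_or_fail is a hypothetical helper name.
current_branch_or_fail() {
   local dir="${1:-.}" branch
   # symbolic-ref exits non-zero when HEAD does not point at a branch
   if branch="$(git -C "$dir" symbolic-ref -q --short HEAD)"; then
      printf '%s\n' "$branch"
   else
      printf 'Detached HEAD in %s; check out a branch first.\n' "$dir" >&2
      return 1
   fi
}

# Usage, from the root of the super-project:
#   current_branch_or_fail staging/manifests || exit 1
```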
I take no responsibility whatsoever for how you use these scripts. This is
mostly just a demonstration of a workflow that works for me, and keeps everything
clearly separated. I might be able to help people who try to do this, or I
might not (unless you are a paying customer).