Whopper Tacos

Ops, Security, Efficiency

Puppet With Git Submodules for Fun and Profit


Git submodules are a somewhat ‘advanced’ git feature, akin to Subversion externals for those of us unlucky enough to have had the pleasure of knowing svn. The most common usage is pulling third-party libraries into your project. You can think of git submodules as git checkouts within checkouts. The ‘parent’ checkout, or ‘super-project’ as I call it, knows that there are submodules and knows which SHA each submodule is at. Because I work both as a Systems Engineer at $dayjob and as a part-time general IT consultant, I needed to use submodules slightly differently than the most common use case.

During some time off from having a dayjob in 2011, I developed a lot of Puppet code that used what were at the time new features like parameterized classes, hashes, etc. I also pretty painstakingly made sure the code would work on OpenBSD, Debian, Ubuntu, Red Hat, and CentOS. The plan was to keep this code private and charge people for the service of supporting this set of modules, developing new ones, etc. So, I needed a way to let people pull this code into their Puppet repo and use it. Submodules are the only easy way to do this. This code, along with some scripts to set up an environment utilizing it, was what I called the GRand Unified Modular Puppet System (GRUMPS).

First, not to go too far off track, but if you are pulling Puppet modules off the net, you should already be aware of git submodules. If you are downloading modules and then just copying them into your setup, you’re doing it wrong. Puppet code is code. You need to treat improvements to public modules the same way you treat any open-source project: keep your own local branch, submit fixes upstream, etc.

Now let’s talk about Puppet environments. You should at the very least have two environments. The name of the one that isn’t production probably doesn’t matter much, but we’ll call this staging, since that’s what I use in the code snippets later on. Many people will make these git checkouts under /etc/puppet that they run git pull on to update. This is an atrocious update mechanism, and a poor development layout.

Instead, what I’ve done is to make each sub-directory of each Puppet env correspond to specific branches in various sub-modules. Example:

Module Layout

                          _ grumps-modules => git@github.com://thesilentpenguin/grumps-modules  | master
                         /  dayjob-modules => git@github.com://dayjob/dayjob-modules            | master
/etc/puppet/production ->   manifests      => ssh://git.yoursite.com/dayjob-manifests           | master
                         \_ extdata        => ssh://git.yoursite.com/dayjob-extdata             | master

                          _ grumps-modules => git@github.com://thesilentpenguin/grumps-modules  | develop
                         /  dayjob-modules => git@github.com://dayjob/dayjob-modules            | develop
/etc/puppet/staging    ->   manifests      => ssh://git.yoursite.com/dayjob-manifests           | develop
                         \_ extdata        => ssh://git.yoursite.com/dayjob-extdata             | develop

“Develop” was a pre-existing convention I had for branch naming, but that’s really irrelevant, make it what you want. Now this shows I have 8 submodules for my two environments. I actually have one more for the SSL dir, but that is off-topic really. Some of the code samples may reference develop or master.
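The environment-to-branch mapping in the layout above can be sketched as a tiny bash helper. This is only an illustration: the function name branch_for_path is made up for this example, and it assumes submodule paths begin with staging/ or production/ as in the layout.

```bash
#!/usr/bin/env bash
# branch_for_path (hypothetical helper): map a submodule path to the
# branch its environment tracks. Prints nothing and returns 1 for any
# path outside the two environments.
branch_for_path() {
   case "$1" in
      staging/*)    printf 'develop\n' ;;
      production/*) printf 'master\n' ;;
      *)            return 1 ;;
   esac
}

# Usage:
#   branch_for_path staging/grumps-modules
#   branch_for_path production/extdata
```

The scripts later in this post make the same decision inline, with a regex test on the submodule path.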

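To get all this set up you will need to run git submodule add from the root of your checkout for each submodule. That command records each submodule in the .gitmodules file. Anyone working from a fresh clone of the super-project will also need to run git submodule init, which copies the entries from .gitmodules into their local .git/config, followed by git submodule update to actually check the submodules out. Totally not confusing :-).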
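As a rough sketch of that setup step, the loop below registers the four submodules for one environment. BASE_URL, the DRY_RUN switch, and the function name are assumptions made for this example; substitute your real repository URLs. The -b flag of git submodule add selects the branch to check out.

```bash
#!/usr/bin/env bash
# Sketch only: BASE_URL and DRY_RUN are illustrative assumptions.
BASE_URL="${BASE_URL:-ssh://git.example.com}"

# add_env_submodules ENV BRANCH: add the four submodules under ENV/,
# each on BRANCH. With DRY_RUN=1 the commands are printed, not run.
add_env_submodules() {
   local envdir="$1" branch="$2" name
   for name in grumps-modules dayjob-modules manifests extdata; do
      if [ "${DRY_RUN:-0}" = "1" ]; then
         printf 'git submodule add -b %s %s/%s %s/%s\n' \
            "$branch" "$BASE_URL" "$name" "$envdir" "$name"
      else
         git submodule add -b "$branch" "$BASE_URL/$name" "$envdir/$name" || return 1
      fi
   done
}

# Usage, from the root of the super-project checkout:
#   add_env_submodules production master
#   add_env_submodules staging develop
#   git commit -m "Add environment submodules"
```

Running it once with DRY_RUN=1 is a cheap way to eyeball the paths and URLs before anything touches the repo.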

The 1000-foot view of the workflow after setup is complete:

  1. Team members committing like it’s April 29 1992 on develop
  2. Everybody sees each other’s work and reviews it
  3. Merge every sub-module’s develop branch to master
  4. Update super-project
  5. Push and Deploy

Another, more common workflow for me is:

  1. Committing like fire on develop
  2. Reviewing work
  3. Some guy runs up to your desk and needs XXX fixed in production
  4. O NO! you have about 20 commits that can’t go to prod
  5. Relax, we use modern version control, Cherry pick change(s)
  6. Update super-project
  7. Push and Deploy

You’re thinking “this is just a bunch of overhead”. Well yeah, if I hadn’t scripted all this monotony away I would say the same thing. But one thing I’ve learned from heavy git sub-module usage is that if you don’t script all these things, you will forget a step and break something. You’re also constantly repeating yourself, since you always require N+1 commits for N changes: you must also commit in the git super-project, which updates the submodule SHAs it records. (Those SHAs live in the super-project’s tree as special entries, not in .gitmodules, which only holds each submodule’s path and URL.)
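If you want to see exactly which commit the super-project has pinned for each submodule, one option is to filter the index listing for gitlink entries, which git stores with mode 160000. A minimal sketch follows; list_pins is a name invented for this example, and it assumes submodule paths contain no spaces.

```bash
#!/usr/bin/env bash
# list_pins (hypothetical helper): read `git ls-files --stage` output
# on stdin and print "<path> <sha>" for every gitlink (mode 160000)
# entry, i.e. every submodule pinned by the super-project.
list_pins() {
   awk '$1 == "160000" { print $4, $2 }'
}

# Usage, from the root of the super-project checkout:
#   git ls-files --stage | list_pins
```

Comparing this output before and after the super-project commit shows the "+1" change in isolation.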

So, here is the script for committing a single change, which may span several submodules. Please note that on one system I had to change the git-submodule script that ships with git to use bash in its shebang. This is because I often use bash regex matching ([[ ... =~ ... ]]) in the foreach commands:

#!/usr/bin/env bash
# This is a silly QnD that lets you develop in the staging sub modules, then
# run this to go into each one to commit and push. Saves you some keystrokes.
# Force needs to be used sometimes to really push things. What force does is
# just change the command chain to use ; instead of &&.

export COMMIT_MESSAGE="$1"

if [ -z "$COMMIT_MESSAGE" ]; then
   printf "You must enter a commit message as the only argument to this script!\n" >&2
   exit 1
fi

if [ "$2" == "force" ]; then
   printf "Forcing the operation...\n"
   git submodule foreach 'if [[ $path =~ ^staging.* ]]; then git checkout develop; git add .; git commit -am "$COMMIT_MESSAGE"; git push -u origin develop; fi'
else
   git submodule foreach 'if [[ $path =~ ^staging.* ]]; then git checkout develop && git add . && git commit -am "$COMMIT_MESSAGE" && git push -u origin develop; fi'
fi

git commit -am "Updated submodules: $COMMIT_MESSAGE"
git push

#vim: set expandtab ts=3 sw=3:

So, this needs to be run from the root of the checkout. My convention for all code checkouts is $HOME/working/$vcs/$project, so I do:

[~/working/git/puppet]> vim staging/grumps-modules/common/manifests/debuntu.pp
[~/working/git/puppet]> vim staging/manifests/node_templates.pp
[~/working/git/puppet]> scripts/stagcom.sh "Wrote debuntu class for Debian and Ubuntu machines, made sure basenode includes it"
[~/working/git/puppet]> cap deploy

This is all I need to do to get commits in staging out to the Puppet master. I will show the snippet with the capistrano stuff last. Promotion is as easy as:

[~/working/git/puppet]> scripts/promote.sh
[~/working/git/puppet]> cap deploy

Or to promote a single commit:

[~/working/git/puppet]> scripts/promote.sh -c 73acef8 -m grumps-modules
[~/working/git/puppet]> cap deploy

Now let’s talk about promotion of changes. This is somewhat dependent on your CR process (if you have one), but here is the script I use for full-env promotion as well as cherry-pick promotions:

#!/usr/bin/env bash
# == Synopsis
# This is a small script that will promote the develop branches in each git
# submodule in the Puppet staging environment to master, then pull in
# the changes to the production environment. Note that you can promote single
# commits by using the cherry-pick (-c) functionality. If you use cherry pick
# you must pass a modulename, ie 'grumps-modules' or 'manifests', with the -m
# switch
#
# == Usage
# See usage() function below
#
# == Notes
# This script does serious changes and pushes to master. Don't run it all
# willy nilly.
#
# == Authors
# Joe McDonagh <jmcdonagh@thesilentpenguin.com>
#
# == Copyright
# 2012 The Silent Penguin LLC
#
# == License
# Licensed under The Silent Penguin Proprietary License
#

puppetroot="$HOME/working/git/puppet"

# Useful helper function, shorthand for cat'ing files that have error output
function caterror() {
   cat "$1" >&2
}

# Like caterror, shorthand for sending strings to stderr
function perror() {
   printf "%s\n" "$1" >&2
}

# Print a fatal error and exit with exit code $2 (defaults to 1 if omitted,
# so a die without a code never exits with success)
function die() {
   perror "$1"
   exit "${2:-1}"
}

# This will go through all submodules and run the appropriate git log command
# to show what commits master needs to be on track with develop.
function list_pending_commits() {
   pushd $puppetroot >/dev/null 2>&1
   git submodule --quiet foreach 'if [[ $path =~ ^production.* ]]; then printf "%s:\n" "$(basename $name)"; changes=$(git log master..develop); if [ -z "$changes" ]; then printf "No pending commits.\n"; else printf "%s\n" "$changes"; fi; fi'
   popd >/dev/null 2>&1
}

# Use this function to print out usage information, and exit with code of $2.
# This is useful to exit with a non-zero code due to an error in argument
# processing.
function usage() {
   printf "Usage:\n"
   printf "   %s  [-l] [-f] [-h] [-c commit -m modulename] [-M message]\n" "$(basename $0)"
   printf "   -l  List all commits that master needs in all submodules. Make sure\n"
   printf "       your checkout is up to date if you run this, otherwise results may\n"
   printf "       be inaccurate.\n"
   printf "   -c  Cherry pick the commit given as the argument to this switch\n"
   printf "   -f  Force mode- this will bypass the prompt when promoting all of staging\n"
   printf "   -h  Print this message\n"
   printf "   -m  This is required if you use -c; it is the submodule directory name\n"
   printf "   -M  This overrides the default commit message with whatever you specify\n"
   printf "\n"
   printf "Example:\n"
   printf "   %s -c d4cb267 -m manifests\n\n" "$(basename $0)"
   printf "This will cherry-pick commit d4cb267 from the manifests submodule.\n"
   printf "\n"
   printf "Passing no arguments to this script will promote the entire staging env to\n"
   printf "production.\n"

   exit $1
}

while getopts c:lfm:M:h option; do
   case "$option" in
      c)
         export COMMIT="$OPTARG"
      ;;
      f)
         force="true"
      ;;
      h)
         usage 0
      ;;
      l)
         list_pending_commits
         exit 0
      ;;
      m)
         export MODULE="$OPTARG"
      ;;
      M)
         export MESSAGE="$OPTARG"
      ;;
      *)
         perror "Passing bad arguments"
         usage -1
   esac
done

if [ -n "$MODULE" -a -z "$COMMIT" ]; then
   perror "You passed -m but did not pass -c, you need both."
   usage -10
fi

if [ -z "$MODULE" -a -n "$COMMIT" ]; then
   perror "You passed -c but did not pass -m, you need both."
   usage -20
fi

# Only check for the module when one was actually passed with -m
if [ -n "$MODULE" ] && [ ! -e "production/$MODULE" ]; then
   die "The module $MODULE does not exist!"
fi

# Verify whether or not commit given actually exists in develop branch
if [ -n "$COMMIT" ]; then
   pushd staging/$MODULE >/dev/null 2>&1

   if git branch --contains "$COMMIT" 2>/dev/null | grep -q develop; then
      commit_exists="true"
   else
      commit_exists="false"
   fi

   popd >/dev/null 2>&1

   if [ "$commit_exists" == "false" ]; then
      die "It appears commit $COMMIT does not exist in the develop branch of module $MODULE!"
   fi
fi

# Force mode in case the script is being used in a batch fashion.
if [ "$force" != "true" -a -z "$COMMIT" ]; then
   read -p "Are you sure you want to promote the entire staging environment? (y/N) " answer

   if [ "$answer" != "y" -a "$answer" != "Y" ]; then
      die "Did not confirm full promotion, exiting." -5
   fi
fi

# Set commit message, dependent on whether message is passed and if cherry
# picking or promoting the whole environment.
if [ -z "$MESSAGE" -a -n "$COMMIT" ]; then
   MESSAGE="Promote commit $COMMIT in submodule $MODULE from staging to production"
fi

if [ -z "$MESSAGE" -a -z "$COMMIT" ]; then
   MESSAGE="Promote all of staging to production."
fi

# Change dir into puppetroot with pushd so we can keep track of where we were
pushd $puppetroot >/dev/null 2>&1

# Make sure everything is up to date.
scripts/updatecheckout.sh

# Do the actual merging or cherry-picking into production and push
if [ -z "$COMMIT" ]; then
   git submodule foreach --quiet 'if [[ $path =~ ^production.* ]]; then git checkout develop && git pull && git checkout master && git merge develop && git push; fi'
else
   git submodule foreach --quiet 'if [ "$name" == "production/$MODULE" ]; then git checkout develop && git pull && git checkout master && git cherry-pick $COMMIT && git push; fi'
fi

git commit -am "$MESSAGE"
git push

popd >/dev/null 2>&1
#vim: set expandtab ts=3 sw=3:

One major point I want to drive home is that each of your commits in git should fix the smallest possible error or add the smallest possible feature. This is just a version control best practice, and it is what makes cherry picking possible. If you’re doing giant commits that change a million things, cherry-pick promotions aren’t going to work for you.
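One thing to watch once you start cherry-picking: a cherry-picked commit gets a new SHA on master, so git log master..develop keeps listing the original as pending. git cherry compares the patches themselves and marks develop commits that have no equivalent on master with a leading "+". A small sketch of filtering that output; unpromoted is a name made up for this example:

```bash
#!/usr/bin/env bash
# unpromoted (hypothetical helper): read `git cherry -v` output on
# stdin and keep only the "+" lines, i.e. commits whose patch has no
# equivalent on the upstream branch. Lines starting with "-" (already
# cherry-picked) are dropped. Prints "<sha> <subject>".
unpromoted() {
   grep '^+ ' | cut -c3-
}

# Usage, inside one submodule checkout:
#   git cherry -v master develop | unpromoted
```

With small, single-purpose commits, this list stays short and readable, and picking the next one to promote is easy.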

You may have noticed that I call a script named updatecheckout.sh above. This is the script for updating all the submodules because it’s a serious PITA to do anything with sub-modules that isn’t scripted. That includes just keeping your checkout up to date and on the proper branches:

#!/usr/bin/env bash
# == Synopsis
# This is a small script that will update your whole checkout.
#
# == Usage
# See usage() function below.
#
# == Notes
# Don't eat the yellow snow.
#
# == Authors
# Joe McDonagh <jmcdonagh@thesilentpenguin.com>
#
# == Copyright
# 2012 The Silent Penguin LLC
#
# == License
# Licensed under The Silent Penguin Proprietary License
#

puppetroot="$HOME/working/git/puppet"

# Useful helper function, shorthand for cat'ing files that have error output
function caterror() {
   cat "$1" >&2
}

# Like caterror, shorthand for sending strings to stderr
function perror() {
   printf "%s\n" "$1" >&2
}

# Print a fatal error and exit with exit code $2 (defaults to 1 if omitted)
function die() {
   perror "$1"
   exit "${2:-1}"
}

# Change dir into puppetroot with pushd so we can keep track of where we were
pushd $puppetroot >/dev/null 2>&1

# Make sure everything is up to date.
git pull --all
git submodule update --merge
git submodule foreach --quiet 'if [[ $path =~ ^staging.* ]]; then git checkout develop && git pull; fi'
git submodule foreach --quiet 'if [[ $path =~ ^production.* ]]; then git checkout master && git pull && git checkout develop && git pull && git checkout master; fi'

popd >/dev/null 2>&1
#vim: set expandtab ts=3 sw=3:

Last but not least, you’ll need some capistrano action. I use capistrano to deploy, with the railsless-deploy.rb floating around the net. I have one small modification to the railsless-deploy.rb which will ensure that the submodules are on the proper branch in the cached-checkout:

  # Override deploy! to do some submodule branch stuff
  module Capistrano
    module Deploy
      module Strategy
        class RemoteCache < Remote
          def deploy!
            update_repository_cache

            # This will make sure the proper branches are used in the submodules
            logger.info "Ensuring staging sub-modules are using proper develop branches..."

            devmodules = capture("ls -d #{shared_path}/cached-copy/staging/*")
            devmodules = devmodules.split("\n")
            devmodules.each do |devmod|
              run "cd  #{devmod}; git checkout develop; git pull; cd -"
            end

            logger.info "Ensuring production sub-modules are using proper master branches..."

            prodmodules = capture("ls -d #{shared_path}/cached-copy/production/*")
            prodmodules = prodmodules.split("\n")
            prodmodules.each do |prodmod|
              run "cd #{prodmod}; git checkout master; git pull; cd -"
            end

            copy_repository_cache
          end
        end
      end
    end
  end

This is done for posterity and to avoid confusion. If you are familiar with git submodules you know they are typically in a detached state. This means they are not on any branch in particular. The super-project simply knows what SHA a given submodule is at. That SHA may correspond to the tip of a branch, but git submodule isn’t really aware of that. So, Capistrano’s git_enable_submodules variable should be enough, but I like to make sure the cached-checkout is an exact mirror of what Puppet developers have locally. You also don’t ever want to be developing in a detached state, because you want your commits to stay on the proper branch. Git will let you commit in a detached state, but the commit won’t belong to any branch and is easy to lose the next time you check something else out. I haven’t had to deal with those little mistakes since I wrote this set of scripts. Here is the actual deploy.rb:
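If you want to check by hand whether a checkout is detached, git symbolic-ref is one way to ask. The following is a sketch rather than part of my scripts; current_branch is a name invented for this example.

```bash
#!/usr/bin/env bash
# current_branch (hypothetical helper): print the checked-out branch
# name, or "DETACHED" when HEAD points straight at a commit, which is
# the usual state of a submodule after `git submodule update`.
current_branch() {
   git symbolic-ref -q --short HEAD || printf 'DETACHED\n'
}

# Usage, from the root of the super-project checkout:
#   git submodule --quiet foreach \
#      'printf "%s: " "$path"; git symbolic-ref -q --short HEAD || echo DETACHED'
```

Anything reported as DETACHED under staging or production needs a git checkout of the right branch before you start editing.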

# Configuration Variables
set :admin_group,             "webops"
set :admin_group_gid,         "1337"
set :app_user,                "puppet"
set :application,             "puppet"
set :copy_exclude,            [ ".git" ]
set :deploy_lockfile,         "/tmp/puppet_being_deployed"
set :deploy_to,               "/usr/local/puppet"
set :deploy_via,              "remote_cache"
set :git_enable_submodules,   1
set :keep_releases,           5
set :local_checkout,          "#{ENV['HOME']}/working/git/puppet"
set :notification_email,      "webops@yourorg.com"
set :repository,              "ssh://git@github.com/yourorg/puppet"
set :scm,                     "git"
set :storeconfig_password,    "oFzx.218.jkshfkh82."
set :use_sudo,                "true"
set :use_storeconfigs,        "true"

# SSH Config Variables
ssh_options[:forward_agent] = true

# Role for deployment
role :puppet_masters,
   "puppet.yourorg.com"

# Task to run after everything is done. This runs every time.
task :afterparty do
   run "if [ ! -d /etc/#{application} ]; then sudo -u root mkdir /etc/#{application}; fi"
   run "sudo -u root rsync --exclude='tagmail.conf' --delete -cvrP --no-o --no-g --no-p #{deploy_to}/current/ /etc/#{application}/"
   run "sudo -u root chown -R root:#{app_user} /etc/#{application}; sudo -u root find /etc/#{application} -type d -name 'lib' -prune -o -type d -exec chmod 750 {} \\; && sudo chmod -R g+r /etc/#{application}"
   # find's -name never matches a pattern containing a slash, so match the lib dirs themselves
   run "sudo -u root find /etc/#{application} -type d -name 'lib' -exec chmod -R 755 {} \\;"
   run "sudo -u root chmod -R g+w #{deploy_to}/shared/cached-copy"
   run "sudo -u root /etc/init.d/apache2 restart"
end

# Generate puppet docs
task :gendocs, :on_error => :continue do
   run "sudo rm -rf /var/www/puppetdocs/staging /var/www/puppetdocs/production"
   run "sudo /usr/bin/puppet doc -a -m rdoc --outputdir /var/www/puppetdocs/staging/ --manifestdir /etc/puppet/staging/manifests --modulepath '/etc/puppet/staging/grumps-modules:/etc/puppet/staging/yourorg-modules'"
   run "sudo /usr/bin/puppet doc -a -m rdoc --outputdir /var/www/puppetdocs/production/ --manifestdir /etc/puppet/production/manifests --modulepath '/etc/puppet/production/grumps-modules:/etc/puppet/production/yourorg-modules'"
end

# This task won't do much to repair a broken git repo
task :fix_deploys, :on_error => :continue do
   logger.info "Make cached check out group-writeable by #{admin_group}..."
   run "sudo -u root chgrp -R #{admin_group} #{deploy_to}/shared/cached-copy"
   run "sudo -u root chmod -R g+w #{deploy_to}/shared/cached-copy"
   logger.info "Get repo to pristine state..."
   run "cd #{deploy_to}/shared/cached-copy && git checkout -- ."
   logger.info "Force permission fixes..."
   run "sudo -u root find #{deploy_to}/shared/cached-copy -type d -exec chmod 2770 {} \\;"
   run "sudo -u root find #{deploy_to}/shared/cached-copy -type f -exec chmod 660 {} \\;"
   logger.info "Unlocking deploys..."
   run "sudo -u root rm -f #{deploy_lockfile}"
end

task :notify do
   require 'etc'
   require 'rubygems'
   require 'action_mailer'

   ActionMailer::Base.delivery_method = :sendmail
   ActionMailer::Base.sendmail_settings = {
      :location   => '/usr/sbin/sendmail',
      :arguments  => '-i -t'
   }

   class NotificationMailer < ActionMailer::Base
      def deployment(application, message, notification_email)
         mail(
            :from    => "#{Etc.getpwnam(ENV['USER']).gecos} <#{ENV['USER']}@yourorg.com>",
            :to      => notification_email,
            :subject => "Puppet Deployment - #{Time.now.to_s}",
            :body    => message
         )
      end
   end

   message = "This is a notification of deployment of a Puppet update.\n\n"
   message << "Deployed at: #{Time.now.to_s}\n"
   message << "Revision: #{real_revision}\n\n"

   # if the revision has not changed then don't look for logs, also if #SEC
   # is in the commit message, a full diff is not displayed for security
   # reasons.
   begin
      if previous_revision != real_revision
         message << "SCM Revisions Deployed\n"
         gitlog = `#{source.local.log(latest_revision, real_revision)} -v --oneline`
         if gitlog.include? '#SEC'
            message << gitlog
         else
            message << `#{source.local.log(latest_revision, real_revision)} --submodule=log -v --patch-with-stat`
         end
      end
   rescue
      message << "SCM Revisions Deployed\n"
      message << 'Previous revision not available'
   end

   mail = NotificationMailer.deployment(application, message, notification_email)
   mail.deliver
end

# Task that cats out what revision is deployed
task :getrev do
   run "sudo -u root cat /etc/puppet/REVISION"
end

# Task to add a lock file and bail deploys with a message if one exists
task :lock_deploys do
   require 'etc'

   logger.info "Locking deploys..."

   if ENV.has_key?('lock_reason')
      lock_reason = ENV['lock_reason']
   else
      lock_reason = "Deployment"
   end

   data = capture("cat #{deploy_lockfile} 2>/dev/null; echo").to_s.strip

   if !data.empty?
      logger.info "\e[0;31;1mATTENTION:\e[0m #{data}"
      abort "Deploys are locked."
   end

   timestamp = Time.now.strftime("%m/%d/%Y %H:%M:%S %Z")
   lock_message = "Deploys locked by #{Etc.getpwnam(ENV['USER']).gecos} (#{ENV['USER']}) at #{timestamp} for #{lock_reason}"
   put lock_message, "#{deploy_lockfile}", :mode => 0644
end

task :unlock_deploys do
   logger.info "Unlocking deploys..."
   run "rm -f #{deploy_lockfile}"
end

# Before and After hooks
after "deploy:symlink", :afterparty
after "deploy:rollback", :afterparty
before "deploy", "deploy:cleanup"
before "deploy:cleanup", :lock_deploys
after "deploy", :notify
before "notify", :unlock_deploys
after "notify", :gendocs

#vim: set expandtab ts=3 sw=3:

I take no responsibility whatsoever for how you use these scripts. This is mostly just a demonstration of a workflow that works for me, and keeps everything clearly separated. I might be able to help people who try to do this, or I might not (unless you are a paying customer).
