Comparing Merge vs Rebase for GitHub Desktop

Recently I’ve been using and recommending the GitHub Desktop client for non-programmers who need to interact with a Git repository. However, I ran into some interesting issues with the end result when multiple users are participating. This blog post takes a methodical walk through the common actions and how they behave when ‘merge’ or ‘rebase’ is the default.

The process I’ll follow will be to have two separate projects (test-merge and test-rebase), each being edited by two users (Mike and Sonja). The actions that will be tested are:

  • Add a new file
  • Change an independent file, commit, pull, push
  • Change an independent file, pull, commit, push
  • Change the same file, commit, pull, push
  • Change the same file, pull, commit, push
  • Delete a file

Add a New File

Initially we’ll start with Mike adding a file, committing and pushing to GitHub. Then Sonja will clone the repo, add a second file, commit and push. Mike will then pull.

As expected, both sides end up with both files, and the repo looks good.

Change an independent file, commit, pull, push

In this case, Mike will change his file and commit it to his local repository. Then Sonja will change her file, commit it to her local repository, and push the changes. Finally, Mike will pull her changes in and push his own.

Interestingly, this generated a special merge commit. Because Mike attempted to push his changes after Sonja, GitHub Desktop prompted with a message indicating that Mike’s local repository was behind and needed to be updated.

He then needed to click the ‘Pull’ button to bring Sonja’s changes into his local repository. A merge commit was automatically created that merged Sonja’s ‘newer’ changes into his repository. Mike could then click ‘Push’ to send the changes to GitHub. Note that there are 2 changes (Mike’s original change and the merge commit bringing in Sonja’s) that Mike is sending. That’s reflected in the graph, which shows Sonja’s changes (blue line) being merged back in.

Now, if instead, we do the process with a rebase (git config pull.rebase true), then the experience is almost identical. The only perceived difference is that Mike sees a “Pull with rebase” button instead of a “Pull” button. Once Mike does the Pull and Push, the history looks much cleaner, since Mike’s latest changes are rebased on top of Sonja’s changes.
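If you want to double-check the resulting history outside of GitHub Desktop, the plain Git CLI works fine against the same repository (this is just standard Git, nothing specific to the client):

# Show the commit graph for all branches
git log --oneline --graph --all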

Change an independent file, pull, commit, push

This time, Mike will make a change, but won’t commit it before pulling Sonja’s changes.

In the case of the merge, everything worked as expected. No special commits were made.

In the case of the rebase, when Mike went to pull in Sonja’s changes, an error was displayed:

Mike was able to click ‘Stash changes and continue’, and the pull then succeeded. Mike then had to click ‘Stash’ and ‘Restore’ to get his changes back. As expected, the graph looks fine:

While this provides a better Git history, it is much more complicated for the user. Fortunately, there is another Git option that can fix it. Setting git config rebase.autoStash true makes the stash process transparent. The end result is that Mike just clicks Pull, Commit, and Push normally, and doesn’t notice that his changes were stashed and unstashed around the Pull.
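If you want to confirm that the settings took effect, they can be checked with standard Git commands from the same shell:

git config --get pull.rebase
git config --get rebase.autoStash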

However, one important point is that the stashing can cause difficult merge conflicts depending on the type of file and the type of changes.

Change the same file, commit, pull, push

In this scenario, Mike will change his file, commit, then Sonja will change Mike’s file, commit and push, and finally, Mike will pull Sonja’s changes and push. To keep any conflict simple, Sonja’s change will be 100 lines away from Mike’s.

In the case of the Merge, we again see the dirty error, then a pull, and a push. Since Sonja’s change was far enough away, there was no merge conflict. The end graph again shows the merge update.

In the case of the Rebase, the same error, Pull with rebase, and push, worked, and the graph is clean:

Change the same file, pull, commit, push

And finally, Mike will make a change, Sonja will make a change, Sonja will push, Mike will pull, then Mike will commit and push.

In the case of the Merge, we again see the dirty error, then a pull, with a stash.

And with the rebase, we get the same messages, and a clean graph.

Summary

For certain cases, most notably when there is only one branch, multiple people are working on that branch, and the same files are not often changed by multiple people, I would recommend using a rebase with autostash. This can be set in your Git repo by typing (if starting in GitHub Desktop, click Repository -> Open in Command Prompt to get a shell):

git config pull.rebase true
git config rebase.autoStash true

Setting up a build environment

A build environment is composed of many moving parts.

  1. Source control environment
  2. Build process
  3. Build environment
  4. Build Orchestration
  5. Build Results
  6. Build Artifacts

All of these require their own specific setups, and there can be many different ways to accomplish each. I’ll talk about my opinionated way, with detailed setups.

Source Control Environment

First, there is the source code itself. Generally, this will be stored in something like Git/GitHub/GitLab, SVN, RTC, etc. For most open source environments, GitHub is currently the king, but I’m seeing quite a lot of movement to GitLab as well. Even for closed source, GitHub is a pretty cheap environment to use, and it cuts down on one more system to set up and manage at the tool level.

All of my code is currently stored in a variety of repositories within GitHub. Quite a few are stored as private repositories, especially before I decide that I’m ready to ‘make it public’.

Build Process

There are countless ways the build process can be set up, but I generally like to break it into 4 stages.

  1. Code Compiling
  2. Unit Testing
  3. Assembly
  4. Integration Testing

Code Compiling

When it comes to compiling the code, I’m generally a fan of Maven, but there are, obviously, a lot of different choices such as SBT, Gradle, Ant, etc. I highly recommend that you standardize on one build tool, since it keeps things consistent. Regardless of choice, I do recommend that you leverage a Maven-style artifact repository, since managing dependencies manually is too error-prone. Fortunately, just about every build tool today supports one.

Within Maven, I like using a specific directory structure that works well with Eclipse, and that means that all POMs need to be in ‘non-recursive’ folders (i.e. one POM cannot be in a child folder of another POM). Thus, my project structures tend to look like:

MyProject/
   root/
      pom.xml
   services/
      root/
         pom.xml
      service1/
         pom.xml

In this structure, each level has a ‘root’ folder that contains a parent pom. Building the entire project simply means going into the top level root folder, and running

mvn install
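Because the layout is non-recursive, it’s also easy to build just one subtree by pointing Maven at that subtree’s parent POM. Using the hypothetical MyProject layout above:

# Build only the services subtree
mvn -f services/root/pom.xml install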

Unit Testing

While I tend to be somewhat hit-and-miss in my consistent application of unit testing, I do strongly believe in it. There are many different unit testing frameworks for each language you work in. For Java, JUnit is the most well known.

I generally put all my unit tests within each project, and have them run as part of the compile process. Thus, each project doesn’t finish compiling until its unit tests pass.

Unit tests are meant to be reasonably quick. The entire suite shouldn’t take more than 5-10 minutes to run. Anything more complicated belongs in the Integration Testing stage.
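With Maven, the standard lifecycle already behaves this way; these are stock Maven commands, not anything custom to my setup:

# Run only the unit tests
mvn test

# Full build, but skip the unit tests (handy occasionally, use sparingly)
mvn install -DskipTests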

Assembly

Once all the pieces are compiled and unit tested, the next step is to assemble them into the final structure. These days, that usually means Docker for me. This is where I build the Docker image. This isn’t a complex stage, since it’s almost always just following a Dockerfile that is checked into source control, but it’s critical to have the final assembly done before Integration Testing happens.
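As a rough sketch (the image name and tag are placeholders, not from a real project), the assembly stage usually boils down to something like:

# Build the image from the Dockerfile in source control, tagged with the build number
docker build -t myproject/service1:${BUILD_NUMBER} .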

Integration Testing

This is where more complex and longer running testing happens. This can range anywhere from a few minutes to days of testing. I’ll generally set up my testing environment to run only ONE integration test at a time. This means that if we’re still running an integration test when a new build comes along, the old integration test is cancelled, and the new integration test is started.

There are many different tools to use here, and I generally use multiple different ones. Things like Gatling, …
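The details depend heavily on the tool, but as a hedged sketch (all container and image names below are placeholders), an integration test stage tends to look like:

# Start the freshly assembled image
docker run -d --name myproject-it myproject/service1:${BUILD_NUMBER}

# Run the test suite against it from a second container (e.g. a Gatling runner image)
docker run --rm --link myproject-it:app myproject/integration-tests:${BUILD_NUMBER}

# Tear the environment down whether the tests passed or not
docker rm -f myproject-it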

Build Environment

Over the years, I’ve used many different build environments, but I’m currently moving towards standardizing on Jenkins. It’s well known, easy to set up, and easy to customize.

Like everything else I do, I’ll only run tools within a Docker environment. Currently, I’m using the out-of-the-box build, so

docker run -d -v /var/run/weave/weave.sock:/var/run/docker.sock -v /data/docker:/usr/bin/docker -v /data/jenkins:/var/jenkins_home -t --name jenkins jenkins:latest

Because I’m going to use Docker during the build process, I want to expose the docker socket (in this case the Weave.Works socket, since I also do everything in a weave controlled environment). Additionally, I need the docker binary to be available. NOTE: Make sure that you mount a statically compiled docker binary, since the majority of the ones installed by default are NOT statically linked, and your Jenkins container won’t have the libraries they depend on. Finally, I’ve mounted a data folder to hold all the Jenkins data and updates.

NOTE: Because this Jenkins is running in my isolated Weave environment, I do need to expose it to the outside world so that GitHub hooks, etc. can reach it.

In my case, since I’m usually working at my house, which is NAT’d behind a firewall/router, I find it easier to just expose all my different servers under specific port numbers. Eventually, I’ll probably set up a good reverse proxy for it all, but until then, I’ve decided to expose Jenkins under port 1234. Additionally, enp0s3 happens to be the Linux network interface on my box; most other people probably have eth0 or eth1.

iptables -t nat -A PREROUTING -p tcp -i enp0s3 --dport 1234 -j DNAT --to-destination $(weave dns-lookup jenkins):8080
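To double-check that the rule is in place (purely a sanity check), the NAT table can be listed:

iptables -t nat -L PREROUTING -n --line-numbers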

With a small change at my DNS provider (i.e. Cloudflare), I now have http://jenkins.mydomain.com:1234 available to everyone.

Build Orchestration

I’ve just started playing with Jenkinsfiles and the Multi-branch Pipeline code, but so far, it’s been very good and easy to use (although I still don’t have the GitHub change hook working after fighting with it for 5 hours).

Groovy has never been a language of interest for me, but it’s close enough to Java that I don’t generally have a problem. Of course, every example uses Groovy’s DSL structure instead of just plain functions, so it always “looks” weirder than it is.

One of the things I like to do is to have all my POMs use a consistent XXX-SNAPSHOT version, and then, as part of the build, replace the versions with the latest build number. Within my Jenkinsfile, I’m currently using this:

// First, read the version number out of the top-level POM
def pom = readFile 'root/pom.xml'
def project = new XmlSlurper().parseText(pom)
def version = project.version.toString()
// Set this to null so the non-serializable XmlSlurper result doesn't break Jenkins' serialisation of the job
project = null

// Update the version to contain the build number
version = version.replace("-SNAPSHOT", "")
def lastOffset = version.lastIndexOf("-")
if (lastOffset != -1)
   version = version.substring(0, lastOffset)
version = version + "-" + env.BUILD_NUMBER
env.buildVersion = version
Two key issues here.

1) In Jenkins 2’s new security sandbox, the use of XmlSlurper causes a whole bunch of script security errors, so you’ll have to run this about 3 or 4 times, each time approving a new method call (the new(), the parseText(), the getProperty(), etc.). A little annoying, but once you’ve done it, it won’t bother you again.

2) Any non-serializable object, such as the XmlSlurper results, can’t be kept around, since some of the later “special functions” actually serialize the entire context so that the data can be restored later. Thus, a simple solution is to just assign null to those variables (such as the project variable above). There’s a @NonCPS annotation as well, but I don’t really understand it, and this works fine for these simple variables.

Build Results

I generally like to make the different build results available, such as the test results, etc. The Jenkinsfile has a couple of step commands that make that pretty easy.

 step([$class: 'ArtifactArchiver', artifacts: '**/target/*.jar', fingerprint: true])
 step([$class: 'JUnitResultArchiver', testResults: '**/target/surefire-reports/TEST-*.xml'])

In this case, I’m storing all the JARs as artifact results, and the JUnit results as JUnit Results (Jenkins has some nice tooling to make the JUnit results display nicely).

Build Artifacts

Finally, I want to store the artifacts, such as the Maven artifacts or Docker images into a repository, either public or private. I’ll talk more later about hooking up to the public Maven repository or standing up your own personal Docker repository.
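As a rough sketch (the registry address and image names are placeholders), pushing the Docker image to a private registry looks like:

# Tag the freshly built image for the private registry and push it
docker tag myproject/service1:${BUILD_NUMBER} registry.mydomain.com:5000/myproject/service1:${BUILD_NUMBER}
docker push registry.mydomain.com:5000/myproject/service1:${BUILD_NUMBER}

# Maven artifacts go out with the standard deploy goal (requires a distributionManagement section in the POM)
mvn deploy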

docker-machine, none/generic drivers and TLS

I really like the idea of docker-machine. It provides a nice interface where I can see the machines that I’m working with. It’s easy to use the commands to quickly switch between machines, and it has lots of great commands for scripting.

However, if you didn’t create the machine on the computer where you are running docker-machine, it’s a complete mess (at least as of Docker 1.9). There are quite a few reported issues, and acknowledgements that it’s broken (See Docker Issue #1221).

But I was able to get at least the basic connections working by piecing together different comments and other google articles.

The primary issue is the TLS security that surrounds the Docker socket and allowing docker-machine to have access to it.

Additionally, the only docker-machine driver that ‘kind of works’ is the ‘none’ driver. However, it’s really meant as a test driver, so the fact that it works is a hack, and it sounds like they plan to remove it (See Docker Issue #2437). It seems that the intent in the future is for the ‘generic’ driver to be used for this purpose, but at this point, the generic driver automatically regenerates all certificates and restarts the Docker daemon. So, it’s completely useless when you have multiple docker-machine installs managing the same box (i.e. in a production environment, you might have multiple administrators who look after the boxes).

So, for now, these steps work, but this will likely fail before long.

Download the necessary files

At this point, the complete set of TLS files is needed on the client box. These are ca.pem, ca-key.pem, server.pem and server-key.pem.

Most of these are present in the /etc/docker folder on the host, but the ca-key.pem may only be present wherever you originally created the machine (i.e. if you used docker-machine create on some other box, the ca-key.pem is only on that ‘other box’).

Copy all these files to a directory on your client box.
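For example, assuming SSH access to the Docker host (the hostnames here are placeholders), something like this works:

# Copy the server-side certificates from the Docker host
scp root@dockerhost:/etc/docker/ca.pem root@dockerhost:/etc/docker/server.pem root@dockerhost:/etc/docker/server-key.pem .

# ca-key.pem has to come from wherever the machine was originally created
scp user@otherbox:.docker/machine/certs/ca-key.pem .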

Generate a new Client Certificate

Now, we need to generate a client certificate for your client box, and then sign it with the server certificate.

openssl genrsa -out key.pem 4096
openssl req -subj '/CN=client' -new -key key.pem -out client.csr
echo extendedKeyUsage = clientAuth > extfile.cnf
openssl x509 -req -days 365 -sha256 -in client.csr -CA ca.pem -CAkey ca-key.pem -CAcreateserial -out cert.pem -extfile extfile.cnf
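Before going any further, it’s worth verifying that the new client certificate really was signed by the CA (standard openssl, nothing Docker specific):

openssl verify -CAfile ca.pem cert.pem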

Create the Machine

Now we need to create the machine using docker-machine and then fix up the configuration.

docker-machine --tls-ca-cert ca.pem --tls-client-cert cert.pem --tls-client-key key.pem create -d "none" --url tcp://104.236.140.57:2376 digitalocean-wordpress

Of course, replace the IP address with the IP address of your Docker host. The final argument is the machine name that you want it to be known by.

Unfortunately, the driver doesn’t copy the certificate information into the right folder, so you have to fix things up.

Navigate into the ~/.docker/machine/machines/digitalocean-wordpress folder

cd ~/.docker/machine/machines/digitalocean-wordpress

Now, copy all 5 files (ca.pem, server.pem, server-key.pem, cert.pem and key.pem) into this folder.

NOTE: Annoyingly, Docker expects the files to have specific names, even though there is a config file that points to them, so don’t rename them from what’s listed.
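For example, if the certificates and keys are sitting in the directory where they were generated (adjust the source path as needed):

cp /path/to/certs/ca.pem /path/to/certs/server.pem /path/to/certs/server-key.pem /path/to/certs/cert.pem /path/to/certs/key.pem .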

Next, modify the config.json, and update the bottom section:

 "AuthOptions": {
 "CertDir": "/home/mmansell/.docker/machine/certs",
 "CaCertPath": "/home/mmansell/.docker/machine/machines/digitalocean-wordpress/ca.pem",
 "CaPrivateKeyPath": "/home/mmansell/.docker/machine/certs/ca-key.pem",
 "CaCertRemotePath": "",
 "ServerCertPath": "/home/mmansell/.docker/machine/machines/digitalocean-wordpress/server.pem",
 "ServerKeyPath": "/home/mmansell/.docker/machine/machines/digitalocean-wordpress/server-key.pem",
 "ClientKeyPath": "/home/mmansell/.docker/machine/machines/digitalocean-wordpress/key.pem",
 "ServerCertRemotePath": "",
 "ServerKeyRemotePath": "",
 "ClientCertPath": "/home/mmansell/.docker/machine/machines/digitalocean-wordpress/cert.pem",
 "ServerCertSANs": [],
 "StorePath": "/home/mmansell/.docker/machine/machines/digitalocean-wordpress"
 }

Specifically, you will be updating the CaCertPath, ClientKeyPath and ClientCertPath entries.

Testing

At this point, you should be able to use the docker-machine commands.

docker-machine ls

Or

eval $(docker-machine env digitalocean-wordpress)
docker ps

However, some commands such as docker-machine ssh, etc. will not work, since the ssh keys are not present. According to some of the discussions, this functionality is completely broken in the none driver.

Hopefully they’ll fix the generic driver (or create a new one) to allow full access without the ‘reinstall’ that the generic driver currently does.

Setting up a Weave Environment

So, I’ve been doing some more reading on running a production docker environment, and it’s clear that docker really “stops” at the host level. Managing multiple hosts and multiple applications is a real hassle in the default docker environment.

Enter Weave.

It automatically provides a more robust network overlay that your docker containers can work with. I highly suggest that you read through their website.

I wanted to record the process I went through to set up a Weave environment across two different Cloud Providers (Digital Ocean and Vultr).

Vultr is not directly supported for the new docker-machine functionality, so I used their website to create a basic host using the Ubuntu 14.04 LTS version.

Once the host was provisioned, I connected using SSH and installed Docker:

vultrbox% wget -qO- https://get.docker.com/ | sh

As of this article, that installed Docker 1.8, which is pretty close to the minimum version needed to get Weave working properly.

Back in my docker management box (at my house), I added the new box to management under docker-machine:

home% docker-machine create --driver generic --generic-ip-address vultrbox --generic-ssh-user root --generic-ssh-key .ssh/MikeMansell.priv vultr3

This box is now managed by docker-machine, which makes it much easier to issue commands.

The next step was to download the Weave commands onto the management box.

home% curl -L git.io/weave -o /usr/local/bin/weave
home% chmod a+x /usr/local/bin/weave

Since we’re going to issue all the next set of commands in the context of our vultr box, we’ll set up an environment variable so that it automatically connects there:

home% export DOCKER_CLIENT_ARGS="$(docker-machine config vultr3)"

Next, we need to set up the Weave agent, but we want it to be secure and to support multiple isolated applications. So, I ran:

home% weave launch -password $WEAVE_PASSWORD -iprange 10.2.0.0/16 -ipsubnet 10.2.1.0/24

This sets up a password that will be used to make sure that all traffic between the different Weave peers is encrypted. It also says that Weave will manage the 10.2.0.0/16 range, and that if an application is launched without explicitly specifying a subnet, it will run in the 10.2.1.0/24 subnet. This launches the Weave router as a container on the vultr3 box.
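A quick way to confirm that the router came up with the right settings is Weave’s own status command (the exact output varies by Weave version):

home% weave status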

Next, we want to have a DNS server running.

home% weave launch-dns

And finally, we want the Docker API proxy so that any docker container commands are automatically routed through Weave. However, since we want to make sure that everything remains secure, we’ll have to use the same TLS certificates that were used to secure Docker in the first place.

home% weave launch-proxy --tlsverify --tlscacert /etc/docker/ca.pem --tlscert /etc/docker/server.pem --tlskey /etc/docker/server-key.pem

We now have three containers running on our vultr3 box. This represents our ‘base’ environment. We can do a quick test to make sure that everything is fine by redirecting our docker commands through weave.

home% eval $(weave proxy-env)

And now we can run a test command to see how it’s working.

home% docker run --name a1 -ti ubuntu
root@a1:/# hostname
a1.weave.local

We started an Ubuntu shell container over on vultr3, but we can see that it ran through Weave, since it was assigned a DNS name in the weave.local DNS namespace.

We’ll clean that up.

root@a1:/# exit
exit
home% docker rm a1
a1

For a real scenario, we’re going to run a basic Cassandra server on this box. NOTE: We’ll run it in a separate isolated subnet.

home% docker run --name cassandra1 -e WEAVE_CIDR=net:10.2.2.0/24 -v /root/data/cassandra1:/var/lib/cassandra/data -d cassandra:2.2.0

Next, we want to get a Digital Ocean host set up, so that it can join the network. Similar to my other posts, I’ll assume that the Digital Ocean Access Token is stored in an environment variable.

home% export DO_API_TOKEN=xxxxx

Then, we’ll create the machine

home% docker-machine create --driver digitalocean --digitalocean-access-token $DO_API_TOKEN --digitalocean-image "ubuntu-14-04-x64" --digitalocean-region "nyc3" --digitalocean-size "512mb" donyc3

This creates a new Docker host running in New York, with the cheapest hosting plan.

We’ll switch over to using this host for the next set of commands

home% eval $(docker-machine env donyc3)
home% export DOCKER_CLIENT_ARGS="$(docker-machine config donyc3)"

And, we’ll launch the Weave environment here as well. The big change is that when we launch Weave, we’ll provide the host of the vultr3 box so that they can connect together.

home% weave launch -password $WEAVE_PASSWORD -iprange 10.2.0.0/16 -ipsubnet 10.2.1.0/24 $(docker-machine ip vultr3)
home% weave launch-dns
home% weave launch-proxy --tlsverify --tlscacert /etc/docker/ca.pem --tlscert /etc/docker/server.pem --tlskey /etc/docker/server-key.pem

Now, we can use the Docker proxy for this second box:

home% eval $(weave proxy-env)

And lastly, we can run a Cassandra query from Digital Ocean in NYC to the Cassandra server box over in Vultr.

home% docker run --name cq -e WEAVE_CIDR=net:10.2.2.0/24 -ti cassandra:2.2.0 cqlsh cassandra1

From the point-of-view of the containers, they think they are running in the same network. That’s pretty cool.