Scribblings: 2011

Git, as we know, is a fast, open source, distributed version control system that is quickly replacing Subversion in open source and corporate programming communities. As a developer, I have many times been amazed by the power of Git and how it takes care of our code repository. It tracks the files in the project, and we periodically commit the state of the project whenever we want a saved point. The history of the project is shared with other developers so that we can collaborate, merge their work with ours, and compare or revert to previous versions of the project or of individual files.

Because Git is replacing Subversion so quickly, most open source developers have had a taste of it and its power by now. We have all done a git init, push, pull, rebase and the like in our day-to-day programming, and those operations are fairly trivial for most developers.

However, certain facets of Git (merges, conflicts, reverts and such) do confuse developers, at least the first time they run into them. What prompted this post is an incident that happened to a colleague of mine at work. I will get to it shortly, but first, a brief note on revert and reset in Git.

Revert and Reset

Git provides several ways to fix mistakes during development. This matters because it protects not just our own work but also the work of everyone else on the same project.

If you have made a mess of your working directory but haven't committed the changes yet, the simplest fix is a hard reset.

$ git reset --hard HEAD

This discards all changes in the index (the staging area) and in the working tree, restoring every tracked file to its state at HEAD. Untracked files are left alone.

Now suppose you have committed your changes but haven't pushed them yet, and you suddenly realize you shouldn't have made the last commit (or the last few commits). You can again do a hard reset. This is as simple as

$ git reset --hard HEAD~n

This moves the current branch (and with it HEAD) back to the commit n commits before the current one, and resets the index and working tree to match. The problem with git reset --hard is obvious, though. Here is the commit log, with A at its head:

o  ->  o  ->  o  ->  D  ->  C  ->  B  ->  A

Suppose you do

$ git reset --hard HEAD~3

Now your commit log would be.

o  ->  o  ->  o  ->  D

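This means that the changes you made in commits C, B and A are gone from the branch. They are not destroyed immediately (Git's reflog still records the old tip, so they can usually be recovered until the reflog entry expires and they are garbage collected), but they no longer appear in your history. The bottom line is simple: reset cannot undo the effect of a single commit in the middle of the history without also discarding every commit after it. The exception, of course, is your last commit, as we have already seen.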
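To make the reset behaviour concrete, here is a toy model in Ruby. It is only an illustration under simple assumptions, not how Git stores anything: a branch is a name pointing at a commit, each commit records its parent, and reset_hard (a hypothetical helper) moves the pointer n parents back. The ids o1, o2 and o3 are made-up names for the unlabeled o commits in the diagram.

```ruby
# Toy model of "git reset --hard HEAD~n" (an illustration, not Git internals):
# a branch is a name pointing at a commit, and every commit knows its parent.
Commit = Struct.new(:id, :parent)

class ToyRepo
  attr_reader :objects, :head

  def initialize
    @objects = {}   # every commit ever created, keyed by id
    @head = nil     # the commit the current branch points at
  end

  # Create a commit on top of the current head and advance the branch.
  def commit(id)
    @head = @objects[id] = Commit.new(id, @head)
  end

  # Move the branch n parents back, like "git reset --hard HEAD~n".
  def reset_hard(n)
    n.times { @head = @head.parent }
  end

  # Ids reachable from the branch tip, oldest first.
  def log
    ids = []
    c = @head
    while c
      ids.unshift(c.id)
      c = c.parent
    end
    ids
  end
end

repo = ToyRepo.new
%w[o1 o2 o3 D C B A].each { |id| repo.commit(id) }
repo.reset_hard(3)

p repo.log                 # => ["o1", "o2", "o3", "D"]
p repo.objects.key?("A")   # => true: A is off the branch, but the object still exists
```

In the toy, A still exists after the reset; it is simply no longer reachable from the branch. Real Git behaves similarly, keeping the old tip in the reflog until that entry expires.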

Undoing the effect of a single commit is exactly what git revert is for.

The current commit log would look like this

o  ->  o  ->  o  ->  D  ->  C  ->  B  ->  A

If at some point you realize that C is bound to break your code (hopefully it hasn't yet), you may well want to undo the changes made by C. This can be done with

$ git revert <commit id of C>

This creates a new commit that undoes commit C. You will get a chance to edit the commit message, but the default one (Revert "<subject of C>", followed by "This reverts commit <id of C>.") is usually the most descriptive message to keep.

o  ->  o  ->  o  ->  D  ->  C  ->  B  ->  A  ->  rC

where rC is the reverse of C.

This is a straightforward revert: it only undoes the changes introduced by the reverted commit. Since only a single branch is involved, no complications arise here.
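A toy model in Ruby shows why a revert is safe on a single branch. The assumptions are simple ones: each commit is modelled as a change set mapping a file name to [old_content, new_content] (nil meaning the file does not exist), the file names are hypothetical, and unlike Git the toy applies the inverse blindly instead of detecting conflicts with later commits. Reverting C appends rC, whose change set is the inverse of C's, while D, B and A stay exactly as they were.

```ruby
# Toy model of "git revert" on a single branch (an illustration, not Git
# internals). A change set maps a file name to [old_content, new_content];
# nil stands for "file does not exist".
Change = Struct.new(:id, :diff)

# Apply one change set to a tree (a Hash of file name => content).
def apply(tree, diff)
  diff.each do |path, (_before, after)|
    if after.nil?
      tree.delete(path)   # the change removed the file
    else
      tree[path] = after  # the change created or edited the file
    end
  end
  tree
end

# Replay the whole history, oldest commit first, to get the current tree.
def snapshot(history)
  history.reduce({}) { |tree, change| apply(tree, change.diff) }
end

# A revert is a new commit whose change set is the inverse of the old one.
def revert_commit(change)
  inverse = change.diff.transform_values { |before, after| [after, before] }
  Change.new("r#{change.id}", inverse)
end

history = [
  Change.new("D", { "app.rb" => [nil, "v1"] }),
  Change.new("C", { "app.rb" => ["v1", "v2"], "risky.rb" => [nil, "oops"] }),
  Change.new("B", { "README" => [nil, "docs"] }),
  Change.new("A", { "notes.txt" => [nil, "todo"] })
]

history << revert_commit(history.find { |c| c.id == "C" })

p history.map(&:id)  # => ["D", "C", "B", "A", "rC"]
p snapshot(history)  # C's edits are gone; B's and A's files remain
```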

Merge and reverting a faulty merge

Now for the incident I mentioned earlier. It all started with an accidental merge. My colleague ran

$ git pull origin experimental

while he was still on his master branch. git pull fetches the experimental branch from origin and merges it into the current branch, so experimental was now merged into master. This was completely unintentional (he never planned to merge). There were no merge conflicts, but the mainline code broke, and we had to revert this faulty merge.

Master        ->  P ----- x ----- M
                   \             /
Experimental  ->    A --------- B

This gives the picture. P is the point where the branches diverged. x is a commit made on the mainline, unrelated to the side branch. The side branch has two commits of its own, A and B. M is the merge commit (experimental merged into master). The code broke, so we need to revert M, the merge commit.

Master        ->  P ----- x ----- M --- W
                   \             /
Experimental  ->    A --------- B

As shown, the merge has been reverted (W is the reverse of M). This was done with

$ git revert -m 1 <commit id of M>

This adds W to the commit log as well. The -m 1 option tells Git to treat the first parent of M (master, the branch that was checked out during the merge) as the mainline, so W undoes the changes M brought in from experimental. The faulty code in the experimental branch was then worked on and fixed, and the branch was made ready for merging (again!). When experimental was merged into master a second time, we noticed something that seemed weird to us at the time: the changes made after the merge revert appeared in master, but the changes made before the revert did not. That is:

Master        ->  P ----- x ----- M --- W --- x --- x --------- M2
                   \             /                            /
Experimental  ->    A --------- B ----------------- C ----- D

Again, the x commits are unrelated to the experimental branch, and M2 is the second merge. Commits C and D on the experimental branch fix the faulty code from A and B. Notice that after the updated experimental branch was merged, none of the changes made by A and B appeared in master, whereas the changes made in C and D did. We soon found out why.

Linus Torvalds explains the situation:

     Reverting a regular commit just effectively undoes what that commit
     did, and is fairly straightforward. But reverting a merge commit also
     undoes the _data_ that the commit changed, but it does absolutely
     nothing to the effects on _history_ that the merge had.

     So the merge will still exist, and it will still be seen as joining
     the two branches together, and future merges will see that merge as
     the last shared state - and the revert that reverted the merge brought
     in will not affect that at all.

That is exactly what happened here. W (the merge revert) undid the data changed by M (the merge), but did nothing to the history that M introduced: B is still recorded as merged into master. So when the second merge, M2, was made, Git took B, the experimental commit that M had already merged, as the merge base (the 'last shared state' in Linus's words). Only the changes made on experimental after that point, commits C and D, were merged into master. None of the changes from A and B came back, because according to the history they had already been merged, and W's removal of them counts as a change made on master that the merge preserves.
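The merge-base reasoning can be checked with a toy commit graph in Ruby. This is a sketch, not Git's implementation: the parent lists mirror the diagrams above, the ids x1, x2 and x3 are hypothetical names for the unrelated x commits, and merge_base applies the definition used by git merge-base (common ancestors that are not ancestors of another common ancestor) by brute force rather than with Git's actual algorithm.

```ruby
require "set"

# Parents of each commit in the faulty-merge story. A merge has two parents;
# the first is the branch that was checked out (master).
PARENTS = {
  "P"  => [],
  "x1" => ["P"],
  "A"  => ["P"],
  "B"  => ["A"],
  "M"  => ["x1", "B"],  # experimental (B) merged into master
  "W"  => ["M"],        # the revert changes data, not ancestry
  "x2" => ["W"],
  "x3" => ["x2"],
  "C"  => ["B"],
  "D"  => ["C"]
}.freeze

# All commits reachable from tip, including tip itself.
def ancestors(tip)
  seen = Set.new
  stack = [tip]
  until stack.empty?
    c = stack.pop
    next unless seen.add?(c)  # add? returns nil if c was already seen
    stack.concat(PARENTS.fetch(c))
  end
  seen
end

# Best common ancestors: common ancestors that are not an ancestor
# of some other common ancestor.
def merge_base(a, b)
  common = ancestors(a) & ancestors(b)
  common.reject { |c| common.any? { |o| o != c && ancestors(o).include?(c) } }
end

base = merge_base("x3", "D").to_a
to_merge = (ancestors("D") - ancestors("x3")).to_a.sort

p base      # => ["B"]
p to_merge  # => ["C", "D"]
```

Because B is already an ancestor of master, only C and D are new from the merge's point of view, which is exactly what we observed.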

Linus also explains the solution: revert the revert. That is, revert W, the commit that reverted the merge, before making the next merge, M2.

Thus the main line commit log would be

P  ->  x  ->  M  ->  W  ->  x  ->  x  ->  Y  ->  M2.

where Y is the reverse of W and M2 is the merge made after that.

$ git revert <commit id of W>

adds Y to the commit log. In terms of content, the commit log above is equivalent to

P  ->  x  -> M  ->  x  ->  x  ->  M2

where neither W nor Y exists, followed by the second merge, M2. Now everything works: all the changes made on the experimental branch show up in master (merge conflicts aside). If a merge conflict does arise, Git leaves the index and the working tree in a special state that gives us all the information needed to resolve the merge.
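Why reverting W fixes the second merge can be sketched with a toy three-way merge in Ruby. The assumptions are loud ones: each snapshot is modelled as a set of "features" rather than file contents, A, B, C and D simply add the features "a", "b", "c" and "d", and the x commits add "x", "x2" and "x3". The merge base for M2 is B (the experimental commit that M already merged), so a feature present at B but missing on master counts as a deliberate removal by master, unless Y brings it back.

```ruby
require "set"

# Toy three-way merge (an illustration, not Git's algorithm). For each
# feature: if both sides agree, keep that; otherwise the side that differs
# from the merge base made the change, and that change wins.
def three_way_merge(base, ours, theirs)
  merged = (base | ours | theirs).select do |f|
    b = base.include?(f)
    o = ours.include?(f)
    t = theirs.include?(f)
    if o == t
      o            # both sides agree
    elsif o == b
      t            # only theirs changed this feature
    else
      o            # only ours changed this feature
    end
  end
  merged.to_set
end

b_tip    = Set["a", "b"]             # experimental at B, the merge base for M2
d_tip    = Set["a", "b", "c", "d"]   # experimental after the fixes C and D
master_w = Set["x", "x2", "x3"]      # master after W: A's and B's data removed
master_y = master_w | Set["a", "b"]  # master after Y, the revert of W

p three_way_merge(b_tip, master_w, d_tip).to_a.sort  # => ["c", "d", "x", "x2", "x3"]
p three_way_merge(b_tip, master_y, d_tip).to_a.sort  # => ["a", "b", "c", "d", "x", "x2", "x3"]
```

Without Y, "a" and "b" look like something master removed on purpose, so the merge keeps them removed; with Y, master and experimental agree on them and they survive.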

Merge Conflict

A merge conflict produces output like this:

CONFLICT (content): Merge conflict in sample_script.rb
Automatic merge failed; fix conflicts and then commit the result.

Trying to switch to the experimental branch at this point gives you this:

error: you need to resolve your current index first

The files with conflicts will contain conflict markers:

<<<<<<< HEAD:sample_script.rb
"We would be starting off now"
=======
"This would be the end"
>>>>>>> d31f96832d54c2702914d4f605c1d641511fef13:sample_script.rb
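The marker layout is regular enough to parse. The Ruby sketch below (conflict_hunks is a hypothetical helper, not part of Git) splits each conflicted hunk into the HEAD side and the incoming side. It assumes Git's default marker style and does not handle the diff3 style, which adds a ||||||| section for the common ancestor.

```ruby
# Split a conflicted file into hunks of { ours: [...], theirs: [...] } lines.
# Assumes the default conflict style: <<<<<<< / ======= / >>>>>>>.
def conflict_hunks(text)
  hunks = []
  state = nil  # nil outside a hunk, :ours or :theirs inside one
  text.each_line(chomp: true) do |line|
    if line.start_with?("<<<<<<< ")
      hunks << { ours: [], theirs: [] }
      state = :ours
    elsif line == "=======" && state == :ours
      state = :theirs
    elsif line.start_with?(">>>>>>> ") && state == :theirs
      state = nil
    elsif state
      hunks.last[state] << line
    end
  end
  hunks
end

conflicted = <<~TEXT
  <<<<<<< HEAD:sample_script.rb
  "We would be starting off now"
  =======
  "This would be the end"
  >>>>>>> d31f96832d54c2702914d4f605c1d641511fef13:sample_script.rb
TEXT

p conflict_hunks(conflicted).size  # => 1
```

A file is ready for git add only once a check like this finds no hunks left.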

We now need to resolve these conflicts by hand, then add the file and commit it.

$ git add sample_script.rb
$ git commit -a

The commit message will already be filled in for the merge. I always prefer not to add anything extra to it.

gitk

It also helps to have the gitk tool when you are analyzing your commit history, especially when you have more than one branch. It gives you a neat graphical view of your commits and branches.

$ sudo apt-get install gitk

if you don't already have it (on Debian or Ubuntu).

It definitely helps in getting a better picture.

Module functions as class and instance methods

Suppose you have a module Mymod and a class Myclass.

Certain functions in the module need to end up as instance methods of Myclass, while others need to be class methods. Such situations are easy to imagine. Say you are using ActiveRecord and have a subclass Subscription backed by a database table. You want to put logic in a module that handles each of the following cases:

1) When a subscription fails or succeeds.

2) When an unsubscription fails or succeeds

You do this.

module SubscriptionLogic

  def after_sub
    # ....
  end

  def after_unsub
    # ....
  end

  def after_sub_fail
    # ....
  end

  def after_unsub_fail
    # ....
  end

end

class Subscription

   include SubscriptionLogic

   # .....

end

You put the logic in functions of a module, SubscriptionLogic. Ideally you want the methods containing the logic to be instance methods of the class Subscription, so you include SubscriptionLogic in the class, and all the functions in the module become available as instance methods. Now consider the case where an unsubscription fails, i.e. there is no existing subscription to unsubscribe from. There is no way to call after_unsub_fail from the module as an instance method, simply because there is no instance available. You decide to use that function as a class method instead, but you have included the module in your class, so its functions are not class methods. You cannot simply extend the class with the entire module either, because you want the rest of the logic to be instance methods.
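The problem is easy to reproduce in plain Ruby. This sketch assumes a bare Subscription class instead of an ActiveRecord model, and the method bodies are placeholders; it only checks which receivers respond to which methods after include.

```ruby
# include inserts the module into the class's ancestor chain, so its
# methods answer on instances of the class, not on the class itself.
module SubscriptionLogic
  def after_sub
    "after_sub on #{self.class}"
  end

  def after_unsub_fail
    "after_unsub_fail"
  end
end

class Subscription
  include SubscriptionLogic
end

p Subscription.new.respond_to?(:after_sub)     # => true  (instance method)
p Subscription.respond_to?(:after_unsub_fail)  # => false (no class method)
```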

You can solve this with a small amount of extra code.

module SubscriptionLogic

  # Ruby calls this hook when SubscriptionLogic is included; base is the
  # class that includes the module, so ClassMethods become its class methods.
  def self.included(base)
    base.extend(ClassMethods)
  end

  def after_sub
    # ....
  end

  def after_unsub
    # ....
  end

  def after_sub_fail
    # ....
  end

  module ClassMethods

    def after_unsub_fail
      # ....
    end

  end

end

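Now include the module in the class. The extend line makes the ClassMethods functions available as class methods; it is redundant (but harmless) when SubscriptionLogic itself defines an included hook that extends the class with ClassMethods.

class Subscription

   include SubscriptionLogic

   extend SubscriptionLogic::ClassMethods

   # .....

end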
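A runnable plain-Ruby version of this pattern, with the included hook on the outer module so that a single include is enough, looks like this. It is a sketch that assumes a stand-in Subscription class with a hypothetical id attribute instead of an ActiveRecord model; the method bodies and the user argument are hypothetical too.

```ruby
module SubscriptionLogic
  # Runs when a class includes this module; base is that class.
  def self.included(base)
    base.extend(ClassMethods)
  end

  # Instance method: there is a subscription record to work with.
  def after_sub
    "instance: subscribed #{id}"
  end

  module ClassMethods
    # Class method: used when no subscription instance exists.
    def after_unsub_fail(user)
      "class: nothing to unsubscribe for #{user}"
    end
  end
end

class Subscription
  include SubscriptionLogic  # the hook also extends ClassMethods

  attr_reader :id

  def initialize(id)
    @id = id
  end
end

puts Subscription.new(42).after_sub          # instance: subscribed 42
puts Subscription.after_unsub_fail("alice")  # class: nothing to unsubscribe for alice
```

Keeping the hook on the outer module means callers only need to remember one line, and the split between instance and class methods stays inside the module.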


Sunday, 18 December 2011

Git Reset, Revert, Merge Conflicts and the Case Of The Faulty Merge

Tuesday, 12 July 2011

Solution to 'Wireless disabled by Hardware switch'

Monday, 4 July 2011

RubyconfIndia 2011

Module functions as class and instance methods

Sunday, 3 July 2011

Back after a Long Gap...
