Tuesday, October 02, 2007

I'm a Git (User)

Ok, I'm not a git. At least I don't call myself one. I just wanted to say that I'm now a devoted (if novice) Git user. You can find plenty of articles about Git by Googling, so I'm not going to write a detailed technical document. Instead I'll just say why I like Git and why I'm going to stick with it. Some of the things I mention are features common to distributed Version Control Systems (VCSs), but from my own experience and from what I've heard, Git excels at them in a way no other does. Performance alone, for example, is enough to win users over to Git, and there are quite a few benchmarks comparing Git to other VCSs.

I first heard of Git in 2005, when Linus Torvalds announced that he was going to write a new Version Control System. A few days later the project was announced, and within weeks it was out. From what I read on the Net, it was aimed at kernel developers, a group I obviously had nothing to do with. Time passed and I never gave Git a try. This year I watched a talk Linus gave at Google Tech Talks, and after watching it I was really, really convinced that it was time I tried it. After all, I felt great about the concept of a distributed version control system.

The distributed VCS concept was new to me. All I knew was that I really liked Subversion (svn), which I felt was a lot more user friendly than CVS. And it isn't only me: I know for a fact that even the largest software firms in Sri Lanka use Subversion internally (yes, even for .NET development :), and more and more people are switching from cvs to svn. So I was feeling good about it, and I even told a few people to switch to Subversion. However, Linus's talk shook my beliefs and understanding about VCSs, and I started learning about distributed VCSs. As you may know, I can be obsessive when learning about new technologies, so it didn't take me long to learn about a few of them, including Git, Mercurial, Monotone and Bazaar.

After some more research I realized that if I was going to use a VCS seriously, I'd choose between Git and Mercurial (hg). I cloned a few repos with both git and hg and used them to try out branching, merging and so on. So here I am, in the Git camp today. Here are a few reasons I like Git. Some are my own observations; others I haven't thoroughly experienced myself, so for those I rely on Internet sources.


For one thing, I can't help but admit that Git is blazingly fast. I'm not hyping it; go and see for yourself. After using Git, anything that takes longer than a few seconds in a VCS looks pathetic. :) If you love Linux and the command line, you are most probably going to love Git too. For GUI tools, Git has gitk and git-gui among others, and some Git GUI tools are even used with Mercurial.

Part of this performance comes from the fact that many operations in Git are local, as opposed to remote operations in centralized systems. But even among distributed systems like Mercurial, Bazaar and Monotone, Git stands out for its performance. See the benchmarks mentioned above for proof.

Work Offline:

With Git I can clone a repo and take it home to work on. There's no need to be connected to a server to view the log or anything else related to metadata. In Git, when you clone a repo, you literally clone it: you get a complete repository with all the history and so on. This also means that if I clone Linus Torvalds's kernel repo onto my notebook, my copy is no less complete than the official repo (not that I'm a kernel hacker :).

Working offline is a huge thing for me. I can clone a repo, go anywhere I want and keep using it.
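
Here's a minimal sketch of what "literally clone it" buys you, assuming bash and a reasonably modern Git on your PATH. Everything happens in a throwaway temp directory, the repo names (`upstream`, `laptop`) are made up for the demo, and the `GIT_*` variables only supply a hypothetical identity so the commits work without any global config. After cloning, the original is deleted outright, yet the clone still answers history questions on its own.

```shell
#!/usr/bin/env bash
# Sketch: a clone carries the whole history, so history queries need no server.
set -e
WORK=$(mktemp -d)
# Hypothetical identity so the demo commits work without global git config.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# Stand-in for the "official" repo, with three commits of history.
git init -q "$WORK/upstream"
for n in 1 2 3; do
    echo "change $n" >> "$WORK/upstream/notes.txt"
    git -C "$WORK/upstream" add notes.txt
    git -C "$WORK/upstream" commit -q -m "Change $n"
done

# Clone it, then delete the original to mimic having no connection at all.
git clone -q "$WORK/upstream" "$WORK/laptop"
rm -rf "$WORK/upstream"

# Log and diff still work, answered entirely from the local copy.
git -C "$WORK/laptop" log --oneline
git -C "$WORK/laptop" diff HEAD~2 HEAD -- notes.txt
```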

Creating a Repository is nothing:

This is another big reason for me to love Git. To create a repo in a directory, I just run:

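$ cd myproject
$ git init                        # turn the directory into a Git repository
$ git add .                       # stage everything in it
$ git commit -m "Initial import"  # record the first snapshot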

Just like that, I'm ready to go. No need to set up servers, no checking server or network configurations, no imports, none of the usual grunt work. I can just jump in and start working.

Given how easy it is to create and use Git repos, I create repos for pretty much anything I'm working on these days. For example, if you are working on a research paper, you could use Git to keep track of the changes you make. Whether it's code, a research paper or a novel, Git can create a repo to track your content in a snap.
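
As a small illustration of the research-paper case, here's a sketch that saves two drafts and then pulls the first one back out of history. It assumes bash and Git; `paper.txt`, the temp directory and the `GIT_*` identity are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: tracking drafts of a plain-text paper with Git.
set -e
PAPER=$(mktemp -d)   # hypothetical project directory
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$PAPER"
git init -q
printf 'Abstract: first draft.\n' > paper.txt
git add paper.txt
git commit -q -m "First draft"

# Revise, review exactly what changed, then record the new draft.
printf 'Abstract: second draft, tighter wording.\n' > paper.txt
git diff
git commit -q -a -m "Tighten the abstract"

# Every saved draft stays one command away.
git log --oneline
git show HEAD~1:paper.txt    # prints the first draft
```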

No overhead for SysAdmins:

With a traditional VCS, there's SysAdmin overhead for creating and maintaining a server, and then there are other important things like backups and security. With Git, you can create as many repositories as you like without even thinking about the SysAdmin. You can keep your repos behind whatever security measures you already have, and developers can even back up repos themselves.
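
A developer-run backup can be as simple as a bare clone kept somewhere safe. The sketch below assumes bash and a reasonably modern Git; the `usb-stick` directory is just a hypothetical stand-in for a second disk or removable drive, and the identity variables are made up.

```shell
#!/usr/bin/env bash
# Sketch: a developer backs up a repo with no server or SysAdmin involved.
set -e
WORK=$(mktemp -d)
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q "$WORK/project"
echo "v1" > "$WORK/project/app.txt"
git -C "$WORK/project" add app.txt
git -C "$WORK/project" commit -q -m "First version"

# Hypothetical backup location (a USB stick, a second disk, ...).
mkdir -p "$WORK/usb-stick"
BACKUP="$WORK/usb-stick/project.git"
git clone -q --bare "$WORK/project" "$BACKUP"   # bare: history only, no working files
git -C "$BACKUP" fsck                           # check the copy's integrity

# Later: keep working, then refresh the backup with a push.
echo "v2" >> "$WORK/project/app.txt"
git -C "$WORK/project" commit -q -a -m "Second version"
git -C "$WORK/project" push -q "$BACKUP" HEAD

# Disaster strikes? Restore by cloning the backup.
git clone -q "$BACKUP" "$WORK/restored"
git -C "$WORK/restored" log --oneline
```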

Since every object, including every commit, is stored under its SHA-1 hash, any corruption (due to filesystem corruption, hard disk failure, etc.) can be easily detected. Even if you lose a repo, most probably someone has already cloned it, which means everything, including history and metadata, is safe. Nice, huh?
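
To make the SHA-1 point concrete, here's a sketch showing that a file's object name is just the SHA-1 of a short header plus its bytes, and that `git fsck` notices when a stored object no longer checks out. It assumes bash, Git, GNU coreutils' `sha1sum`, and a default SHA-1 repository (not Git's newer SHA-256 mode). Deliberately overwriting an object file is for demonstration only.

```shell
#!/usr/bin/env bash
# Sketch: Git names objects by their SHA-1, so damaged objects get noticed.
set -eo pipefail
WORK=$(mktemp -d)
printf 'hello\n' > "$WORK/file.txt"

# The name Git gives this content as a "blob" object...
GIT_ID=$(git hash-object "$WORK/file.txt")

# ...is SHA-1 over the header "blob <size>" plus a NUL byte, then the raw bytes.
SIZE=$(wc -c < "$WORK/file.txt" | tr -d ' ')
MANUAL_ID=$( { printf 'blob %s\0' "$SIZE"; cat "$WORK/file.txt"; } | sha1sum | cut -d' ' -f1 )
echo "git hash-object: $GIT_ID"
echo "by hand:         $MANUAL_ID"

# Store the blob in a repo, then scribble over it as a failing disk might.
git init -q "$WORK/repo"
cp "$WORK/file.txt" "$WORK/repo/"
git -C "$WORK/repo" add file.txt
OBJ="$WORK/repo/.git/objects/${GIT_ID:0:2}/${GIT_ID:2}"
chmod u+w "$OBJ"
printf 'garbage' > "$OBJ"

# fsck re-reads the stored objects and exits non-zero when it finds damage.
if git -C "$WORK/repo" fsck 2>/dev/null; then FSCK=clean; else FSCK=corrupt; fi
echo "fsck says: $FSCK"
```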

Branching and Merging a reality:

I have never tried to use branches on svn or cvs servers, mainly because I never got to the point of being allowed to do it. :) But people say it's hard... I mean put-a-pen-through-your-eye hard. I've actually tried a few simple things with svn and had no clue how to get past certain problems. With Git, however, branching and merging (if you can't merge back, branches aren't much use) are easy.

This means that if I want to test something, I don't have to spend weeks planning in fear. I can just create a branch, work on it and merge it when it's ready, without being a nuisance to the project maintainer. I know that for real projects this is a big factor.
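
The branch-then-merge loop described above can be sketched like this, assuming bash and a reasonably modern Git. The branch name `experiment` and the file names are hypothetical, and since the default branch may be called `master` or `main` depending on your configuration, the script looks it up instead of guessing.

```shell
#!/usr/bin/env bash
# Sketch: try an idea on a branch, then merge it back.
set -e
REPO=$(mktemp -d)
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$REPO"
git init -q
echo "stable code" > main.txt
git add main.txt
git commit -q -m "Stable baseline"
MAINLINE=$(git symbolic-ref --short HEAD)   # "master" or "main", depending on config

# Experiment on a throwaway branch; the mainline is untouched.
git checkout -q -b experiment
echo "risky idea" > idea.txt
git add idea.txt
git commit -q -m "Try a risky idea"

# Meanwhile the mainline keeps moving.
git checkout -q "$MAINLINE"
echo "bug fix" >> main.txt
git commit -q -a -m "Fix a bug on the mainline"

# The idea worked out: merge it back (a real merge commit, no conflicts here).
git merge -q --no-edit experiment
git branch -d experiment     # already merged, so it is safe to delete
git log --oneline --graph
```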

Commit Access Always:

Although I'm not going to talk about the underlying project politics, I do have to say how I feel about this commit access thing. It may or may not be good to have a separate group who have commit access. But as a developer, what affects me most is the thought "what if my commit breaks things?". So (if I have commit access at all) I tend to hold back my commits until I'm sure of them and until I have something a little larger to commit.

With Git I don't have to worry about that, because I commit to my own repo, and I don't need special commit permission from a project leader. I can just commit and keep working; I can commit early and often. When my code is ready, I can ask the project leader to pull from my repo, or I can push to her repo.
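
Here's a sketch of the "ask the project leader to pull" route, assuming bash and a reasonably modern Git; the `leader` and `me` directories, the commit messages and the identity are all hypothetical. I commit three times in my own clone without anyone's permission, and the leader then pulls my branch into her repo.

```shell
#!/usr/bin/env bash
# Sketch: no commit access needed; the leader pulls from my repo.
set -e
WORK=$(mktemp -d)
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# The project leader's repository (hypothetical path).
git init -q "$WORK/leader"
echo "v1" > "$WORK/leader/app.txt"
git -C "$WORK/leader" add app.txt
git -C "$WORK/leader" commit -q -m "Initial release"
BRANCH=$(git -C "$WORK/leader" symbolic-ref --short HEAD)

# I clone it and commit as often as I like, all in my own repo.
git clone -q "$WORK/leader" "$WORK/me"
for step in "Add parser" "Fix parser edge case" "Document parser"; do
    echo "$step" >> "$WORK/me/app.txt"
    git -C "$WORK/me" commit -q -a -m "$step"
done

# When I'm ready, the leader pulls my branch into her repo.
git -C "$WORK/leader" pull -q "$WORK/me" "$BRANCH"
git -C "$WORK/leader" log --oneline
```

For the push route, the target is usually a bare repository, since by default Git refuses to push into the currently checked-out branch of a non-bare one.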

Distributed or Centralized:

Although Git is a distributed system, there's no restriction on how you use it. It can be used in a purely distributed way, or in the same spirit as today's centralized systems, but with a twist: we still get all the facilities of a distributed VCS, if we know how to use them. Many people believe that a centralized VCS is a must for their project, but after giving it some thought, I'm beginning to think that in most cases they could switch to a distributed model without much fuss. This is no accident: if you think about it, a centralized system is just a special case of a distributed system. So think again. :)
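
To see the centralized setup as a special case, here's a sketch where one shared bare repository plays the role of the "server" and two developers push to it and pull from it, much like svn commit and svn update. It assumes bash and a reasonably modern Git; `central.git`, `alice`, `bob` and the identity are hypothetical, and in real life the shared repo would sit on a server rather than in a temp directory.

```shell
#!/usr/bin/env bash
# Sketch: Git used centrally, with one shared bare repo acting as the "server".
set -e
WORK=$(mktemp -d)
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q --bare "$WORK/central.git"      # hypothetical shared repo

# Alice starts the project and publishes it to the shared repo.
git clone -q "$WORK/central.git" "$WORK/alice"
echo "hello" > "$WORK/alice/README"
git -C "$WORK/alice" add README
git -C "$WORK/alice" commit -q -m "Add README"
git -C "$WORK/alice" push -q -u origin HEAD  # -u so later plain pulls know where to go

# Bob checks out the shared code, commits locally, then publishes.
git clone -q "$WORK/central.git" "$WORK/bob"
echo "world" >> "$WORK/bob/README"
git -C "$WORK/bob" commit -q -a -m "Extend README"
git -C "$WORK/bob" push -q

# Alice picks up Bob's work, much like "svn update".
git -C "$WORK/alice" pull -q
cat "$WORK/alice/README"
```

The twist is that both Alice and Bob still hold full repositories, so they can commit, branch and browse history locally between pushes.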

Let's consider a scenario. Say I started a project called RailzCRM, and I'm the lead developer and maintainer. I set up a public Git repository somewhere like ShareSource (they are still working on Git support). I have my working repository on my notebook PC, and I can commit to it as frequently as I like. I'll pull from my trusted fellow developers, or they'll push to my repo. They in turn will work with their own trusted sources and online contributors via push/pull, patches, etc.

When I think my repo is ready, I'll merge in the changes and push them to the public repo. Then the public can get the latest code of our official development tree. This way a hierarchy of people can be created if we want, while we still get the benefits of distributed development.

Interoperability with Other VCSs:

Git can work with many other VCSs without any fuss; end users may not even realize that Git is involved. Git can work with CVS, Subversion and Mercurial, among others. For example, git-cvsserver lets end users keep working with their existing CVS clients without problems. Git can also import from a good number of VCSs.
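
One general-purpose mechanism for importing history is `git fast-import`, which reads a plain-text stream of commits on standard input. The sketch below (bash and a reasonably modern Git assumed) hand-writes a one-commit stream of the kind an importer might generate from another VCS's history; the branch name `imported`, the committer, the timestamp and the file contents are all made up. It only illustrates the mechanism and is not how any particular importer works.

```shell
#!/usr/bin/env bash
# Sketch: feeding history into Git through git fast-import.
set -e
WORK=$(mktemp -d)
git init -q "$WORK/imported"

# A tiny hand-written stream: one commit on branch "imported" adding README.
# An importer for another VCS would generate a stream like this automatically.
git -C "$WORK/imported" fast-import --quiet <<'EOF'
commit refs/heads/imported
committer Jane Doe <jane@example.com> 1191283200 +0000
data <<END_MSG
Imported from another VCS
END_MSG
M 644 inline README
data <<END_FILE
Hello from the old repository
END_FILE

EOF

# The imported commit is now ordinary Git history.
git -C "$WORK/imported" log --oneline imported
git -C "$WORK/imported" show imported:README
```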

A Brief History of Git:

Git is a version control system used in software development. It was started by Linus Torvalds (yes, the same Linus) when he had to stop using BitKeeper, a commercial VCS. BitKeeper was used by Linus and several other Linux kernel developers to track the kernel source. However, there was a fiasco (not a fiaSCO) about licensing, which made the Open Source community crave an alternative to BitKeeper (specifically, a distributed VCS).

They started considering the available Open Source tools. According to Linus, it wasn't much trouble: everything other than Monotone was ruled out early, and even Monotone was ruled out on performance. I haven't heard what happened with Bazaar there, but it looks like the Mozilla people later ruled it out over performance issues. Then Linus thought that he could write something better than any VCS around in 2 weeks. It looks like he was right, yet again.

Although Linus started the project, it's now maintained by Junio Hamano. In the early days a lot of fuss was made about Git not being a general-purpose tool, since it was started with the kernel as its target. However, Git has come a long way since then and is undoubtedly one of the best VCSs around now.

The only major drawback I see is the lack of support for the Windows platform (which I don't care much about, but obviously not everyone feels that way :). There are already ports for Windows that people claim to be using at work without any trouble. Even if troubles come up, I believe full Windows support will be available very soon. Anyway, just so you know, Git on Linux is blazingly fast, way faster than any other option.

So..... that's all on Git for now, folks. I wrote this in a bit of a hurry, so I won't be surprised if you find mistakes. Just let me know and I'll fix them later.

And oh... I also hope that projects I keep track of, like Nmap and MPlayer, will switch to Git. Yes, that's a hope. :)

Here's a basic diagram which might help you to grok the concept.

Monday, October 01, 2007

ShareSource.org - The Next SourceForge?

ShareSource is a new site providing hosting for FOSS projects. As you might already know, SourceForge is the most popular choice for this type of service. SourceForge (owned by SourceForge, Inc., formerly known as VA Software) is without a doubt the largest FOSS hosting provider. It has been, and still continues to be, the trailblazer in FOSS project hosting. However, I see a lot of promise and potential in ShareSource. The only downside I see is that the name reminds me of a Microsoft license. :) ShareSource is maintained by Tim Groeneveld. I guess this is the same Tim Groeneveld who created AegeanLinux (never used it), because aegeanlinux.org now points to ShareSource.org. It's still way too early to say whether ShareSource can be the next SourceForge, but it's definitely worth watching.

The first thing you'll notice is that the site loads faster than SourceForge. Its interface is simple, nice and fast. ShareSource provides many nice services, such as VCS support for Mercurial, Subversion and, to my utter happiness, (very soon) Git. It also provides a release management mechanism, a bug tracker and so on. True, it has a lot of catching up to do if it is to reach SourceForge's standards, but ShareSource is improving very quickly. It's only about 3 months old and already has more than 90 projects registered.

I liked the site and I'm hoping to give it a try. Honestly, after seeing the nice and flexible features they provide, I feel like sticking with ShareSource. So I invite all you FOSS developers out there to try ShareSource.org too. And let's not forget to give Tim a big round of applause for starting ShareSource, where FOSS projects can find a (quality) home for free. Kudos.