Use of VCS in Debian packages – some stats

Everyone loves stats. OK, at least I do. I was doing some research on package maintenance within the Debian distribution, and since the results might be interesting for someone else, here they are.

As of 19 August 2011 there were:

  • 16935 unique source packages in Debian/sid
  • 9977 packages with Vcs-* field in Debian/sid
  • 6957 packages without a Vcs-* field in Debian/sid

Therefore ~59% of all packages in Debian/sid are officially managed with a version control system (VCS). Now, which VCS do those packages use?

  1. Svn: 4939
  2. Git: 4377
  3. Darcs: 284
  4. Bzr: 247
  5. Hg: 61
  6. Cvs: 31
  7. Arch: 28
  8. Mtn: 10
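
The percentages in this post can be re-derived from the counts above. The following is a minimal Python sketch, not the script used for the original research; the variable and function names are made up for illustration, and only the numbers come from the lists above.

```python
# Counts copied from the lists above (Debian/sid, 19 August 2011).
TOTAL_SOURCE_PACKAGES = 16935
VCS_COUNTS = {
    "Svn": 4939, "Git": 4377, "Darcs": 284, "Bzr": 247,
    "Hg": 61, "Cvs": 31, "Arch": 28, "Mtn": 10,
}


def percent(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Packages with any Vcs-* field, and the Svn+Git subset.
with_vcs = sum(VCS_COUNTS.values())
svn_git = VCS_COUNTS["Svn"] + VCS_COUNTS["Git"]

print("packages with a VCS:", with_vcs)
print("share of all packages with a VCS:", percent(with_vcs, TOTAL_SOURCE_PACKAGES))
for name, count in sorted(VCS_COUNTS.items(), key=lambda item: -item[1]):
    print(name, count, percent(count, with_vcs))
print("Svn+Git share of VCS-managed packages:", percent(svn_git, with_vcs))
print("Svn+Git share of all packages:", percent(svn_git, TOTAL_SOURCE_PACKAGES))
```

Summing the per-VCS counts gives the same 9977 as the "packages with Vcs-* field" figure, which is a quick sanity check that no package was counted under two systems.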

I’ve retrieved the numbers from the Ultimate Debian Database (UDD). Sadly there’s a bug in UDD regarding the Vcs-Type information, see #637524. Therefore I’ve extracted a list of 80 packages where a Vcs-Browser header is available but the Vcs-Type entry is empty in UDD. Of these, 29 packages are managed in CVS but don’t appear as such in UDD, so I manually corrected the CVS count in the numbers above. The remaining 51 packages have a Vcs-Browser field set but lack the corresponding Vcs-* entry; some of them point to the upstream VCS instead of the Debian packaging repository, some of them result in 404 errors, etc. As a result I’ve reported bugs where applicable (#638466, #638468, #638469, #638470, #638471, #638472, #638474, #638475, #638476, #638477, #638479, #638482, #638486, #638488, #638493, #638497, #638501, #638502, #638503, #638505, #638506, #638508, #638509, #638510, #638511, #638512, #638513, #638516, #638518, #638519, #638520, #638522, #638523, #638524, #638525, #638526, #638527, #638528, #638529, #638530, #638531).
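
The check described above (Vcs-Browser present, Vcs-Type empty) can be sketched in Python roughly like this. This is only an illustration: the rows, package names and URLs are invented, the keys `vcs_type` and `vcs_browser` are assumed names for the corresponding UDD fields, and `looks_like_cvs` is a crude, hypothetical heuristic rather than the manual inspection that was actually done.

```python
def missing_vcs_type(rows):
    """Return rows that carry a Vcs-Browser URL but have no Vcs-Type set."""
    result = []
    for row in rows:
        browser = (row.get("vcs_browser") or "").strip()
        vcs_type = (row.get("vcs_type") or "").strip()
        if browser and not vcs_type:
            result.append(row)
    return result


def looks_like_cvs(url):
    """Crude guess: treat URLs containing 'cvs' as pointing at a CVS viewer."""
    return "cvs" in url.lower()


# Invented example rows; real data would come from a UDD query.
rows = [
    {"source": "pkg-a", "vcs_type": "Git", "vcs_browser": "http://example.org/git/pkg-a"},
    {"source": "pkg-b", "vcs_type": None, "vcs_browser": "http://example.org/cgi-bin/cvsweb/pkg-b"},
    {"source": "pkg-c", "vcs_type": "", "vcs_browser": "http://example.org/browse/pkg-c"},
    {"source": "pkg-d", "vcs_type": None, "vcs_browser": None},
]

suspects = missing_vcs_type(rows)
probably_cvs = [r["source"] for r in suspects if looks_like_cvs(r["vcs_browser"])]
needs_review = [r["source"] for r in suspects if not looks_like_cvs(r["vcs_browser"])]
print(probably_cvs, needs_review)
```

A substring match like this will misclassify some URLs, so anything it flags still needs a manual look before correcting counts or filing bugs.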

Disclaimer: I found Debian’s Statistics wiki page and Zack’s VCS usage stats after starting to play with my own stats. AFAICT Zack’s slightly higher numbers are the result of counting multiple versions of the same source package, as you’ll see when comparing the numbers from UDD’s sources_uniq view (which I used) with either 1) UDD’s sources table, 2) the source table count from projectb or 3) the package count from http://$DEBIAN_MIRROR/debian/dists/unstable/{main,contrib,non-free}/source/Sources.bz2.
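
To illustrate why counting stanzas instead of unique source packages inflates the numbers, here is a small Python sketch that parses Sources-style text. It is an assumption-laden toy: the sample data is invented, the parser ignores continuation lines, and it is not the code behind either set of stats.

```python
import re

# Invented sample in the style of a Sources file: "foo" appears in two versions.
SAMPLE_SOURCES = """\
Package: foo
Version: 1.0-1
Vcs-Git: git://example.org/foo.git

Package: foo
Version: 1.1-1
Vcs-Git: git://example.org/foo.git

Package: bar
Version: 2.0-3
"""


def parse_stanzas(text):
    """Split text into stanzas and return a list of {field: value} dicts."""
    stanzas = []
    for block in re.split(r"\n\s*\n", text.strip()):
        fields = {}
        for line in block.splitlines():
            # Continuation lines start with whitespace; this sketch skips them.
            if not line or line[0].isspace() or ":" not in line:
                continue
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
        if fields:
            stanzas.append(fields)
    return stanzas


def vcs_stats(stanzas):
    """Return (stanza count, unique packages, unique packages with a Vcs-* field)."""
    all_names = set()
    vcs_names = set()
    for stanza in stanzas:
        name = stanza.get("Package")
        if name is None:
            continue
        all_names.add(name)
        if any(key.lower().startswith("vcs-") for key in stanza):
            vcs_names.add(name)
    return len(stanzas), len(all_names), len(vcs_names)


print(vcs_stats(parse_stanzas(SAMPLE_SOURCES)))
```

For the sample, the stanza count is 3 while there are only 2 unique source packages, which is the kind of gap described above.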

Conclusion: 9316 packages are officially managed with Subversion or Git as of today, representing ~93% of the VCS-managed packages (9316 of 9977). This means ~55% of all Debian (source) packages are available through either a Git or Subversion repository – and that’s actually the number I was originally interested in.

Thanks to Alexander Wirt, Christian Hofstaedter, Gerfried Fuchs, Jörg Jaspert and Michael Renner for hints that helped shape the final stats.

6 Responses to “Use of VCS in Debian packages – some stats”

  1. bremner Says:

    FWIW the balance will shift to git as pkg-perl uploads its roughly 2000 packages.

  2. mika Says:

    Hi David,

    I’m aware of this step (I attended your nice Git Packaging talk and the discussions afterwards at DC11 :)) and am looking forward to this. Would also be interesting to have team-based stats in general, though that’s still on my todo list. ;)

    Thanks for mentioning!


  3. Stefano Zacchiroli Says:

    JFTR, I confirm your diagnosis: my stats show higher numbers because multiple versions of the same package are counted more than once. I guess the code was written before we allowed them in a single Sources file and has never been updated to be correct. If some kind soul wants to fix it (which should be trivial), the bug is clearly here. (git format-)patches welcome!

  4. mika Says:

    Hi Zack,

    thanks for verification. :)


  5. Gio Says:

    Wonderful, apparently the Haskell Team manages (nearly) alone to put Darcs at the third place, just after the two unbeatable Git and Svn. :-)

    Thanks for the stats.

  6. mirabilos Says:

    Oh well… I just need to upload another 31 packages to take over
    the next place ;)