Giving Away Software For Free Costs More Than You Would Think (Part 2)

In Part 1 of this series, I introduced the idea that free software is expensive to give away.

For example, a report from 2000 estimated that the Debian distribution would have cost nearly 1.9 billion dollars to rewrite from the ground up.

It’s been almost 8 years since that report was generated, and Debian has somewhat faded from the limelight. Ubuntu has since taken the lead as the most-used desktop Linux distribution, and I would like to calculate roughly what it costs to give Ubuntu away at no charge.

My method for answering this question breaks down into 8 steps:

  1. Download the list of all programs in the Ubuntu repositories.
  2. Reformat the list and create shell scripts to fetch everything.
  3. Install the build dependencies.
  4. Download the actual source.
  5. Filter out the garbage.
  6. Use SLOCCount to count lines of code.
  7. Estimate the average 2007 programmer salary.
  8. Plug the numbers SLOCCount spits out along with all variables needed into the COCOMO model.
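
To give a feel for step 8, here is a minimal sketch of the Basic COCOMO calculation in organic mode, where effort in person-months is 2.4 × KSLOC^1.05. The function name cocomo_cost is made up for this example, and the salary and overhead values passed to it are placeholders, not the figures this series will end up using.

```shell
#!/bin/sh
# Basic COCOMO (organic mode) cost estimate.
# Hypothetical helper: the name and argument order are illustrative only.
#   $1 = physical source lines of code (SLOC)
#   $2 = annual programmer salary in dollars (placeholder value)
#   $3 = overhead multiplier on top of salary (placeholder value)
cocomo_cost() {
    awk -v sloc="$1" -v salary="$2" -v overhead="$3" 'BEGIN {
        ksloc = sloc / 1000                  # model works in thousands of lines
        effort_pm = 2.4 * ksloc ^ 1.05       # effort in person-months
        cost = effort_pm / 12 * salary * overhead
        printf "%.0f\n", cost
    }'
}
```

For instance, cocomo_cost 1000 60000 2.4 prints 28800: 1,000 lines come to 2.4 person-months, or 0.2 person-years, and 0.2 × 60,000 × 2.4 is 28,800.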

In this article, I will detail how I completed steps 1 through 4.

First, download the entire list in text format:

wget http://packages.ubuntu.com/gutsy/allpackages.en.txt.gz

Using awk and OpenOffice.org Calc, I reformatted the list into a shell script that looks like the following:

#!/bin/sh
sudo apt-get -y build-dep 2vcard
sudo apt-get -y build-dep 3270-common
sudo apt-get -y build-dep 3dchess
sudo apt-get -y build-dep 4digits
sudo apt-get -y build-dep 4g8
sudo apt-get -y build-dep 6tunnel
[...]

The -y flag keeps me from being prompted 27,163 times. This simple switch saved me 54,326 keystrokes (a “y” and Enter for each package). :) Installing the build dependencies took many hours to complete.
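
The reformatting step could probably also be done in one pass with awk alone. This is a minimal sketch, assuming each package line in the decompressed listing has the form "name (version) description" and that header lines do not start with a lowercase letter or digit. That layout is an assumption, so check it against the real file first. The function name make_script is invented for the example.

```shell
#!/bin/sh
# Turn a package listing into a script of apt-get commands.
# Assumption: package lines look like "name (version) description";
# anything else (headers, blank lines) fails the pattern and is skipped.
#   $1 = apt-get action, e.g. build-dep or source
#   $2 = path to the decompressed listing
make_script() {
    echo '#!/bin/sh'
    awk -v action="$1" '/^[a-z0-9][a-z0-9+.-]* \(/ {
        # $1 here is the first whitespace-separated field: the package name
        print "sudo apt-get -y " action " " $1
    }' "$2"
}
```

After gunzip allpackages.en.txt.gz, running make_script build-dep allpackages.en.txt > build-dep.sh would write the first script, and swapping build-dep for source would write the second.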

Then I ran a similar script that started downloading all of the source code:

#!/bin/sh
sudo apt-get -y source 2vcard
sudo apt-get -y source 3270-common
[...]
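
A loop over the package names is an alternative to one generated line per package. The following is a sketch, not the script used here: it assumes a hypothetical file with one package name per line, and it logs failures to a second file so that a run lasting many hours does not have to be restarted from scratch. The names fetch_all and FETCH_CMD are invented for the example. It also leaves out sudo, since apt-get source unpacks into the current directory and should not need root.

```shell
#!/bin/sh
# Fetch source for every package named in a file, one name per line.
# Hypothetical helper; FETCH_CMD can be overridden for a dry run.
#   $1 = file of package names
#   $2 = file that collects the names of packages that failed
fetch_all() {
    while IFS= read -r pkg; do
        [ -n "$pkg" ] || continue            # skip blank lines
        # stdin is redirected so the command cannot eat the package list
        ${FETCH_CMD:-apt-get source} "$pkg" </dev/null || echo "$pkg" >> "$2"
    done < "$1"
}
```

A dry run such as FETCH_CMD=echo fetch_all packages.txt failed.txt only prints the names, which is a cheap way to check the list before starting the real download.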

Now comes the not-so-fun part. Once the source is downloaded, I have to make sure that I am not counting things that should not be counted, like readme files. Filtering out the garbage is likely to be the most difficult part.

Article Index:
Part 1: Introduction
Part 2: Method
Part 3: Conclusion

  • http://smashcrab.org SmashcraB

    That’s crazy, I can’t even imagine how long it’d take to get source for EVERY package... Good luck with that :P

    Btw, I’ve added your rss feed to my site, hope this is okay, loving your work so far :)

  • http://www.fsckin.com/ Wayne

    Thanks for the positive comment SmashcraB – comments like yours make my day, that’s for sure.

    It’s taking a long time.

    It’s been going for 7 HOURS and I am barely 50% done.

    This is on a 25 Mbit/sec cable connection. Can you imagine it on 56k?

  • venkatesh nandakumar

    added u to my blog-roll, good going!

  • http://www.fsckin.com/ Wayne

    2 days later, this morning it was pretty close to being done downloading all the source code.
