In Part 1 of this series, I introduced the idea that free software is expensive to give away.
For example, in 2000, the Debian distribution would have cost nearly 1.9 billion dollars to re-write from the ground up.
It’s been almost 8 years since that report was generated, and Debian has somewhat faded from the limelight. Ubuntu has taken the lead as the most-used desktop Linux distribution since then, and I would like to roughly calculate how much Ubuntu costs to give away for no charge.
I will outline my method for answering this question in 8 steps:
- Download list of all programs in the Ubuntu repositories.
- Reformat the list, create shell script to download all source.
- Build dependencies.
- Download the actual source.
- Filter out the garbage.
- Use SLOCCount to count lines of code.
- Estimate the average 2007 programmer salary.
- Plug the numbers SLOCCount spits out along with all variables needed into the COCOMO model.
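To make that last step concrete, here is a rough sketch of the arithmetic, assuming the Basic COCOMO "organic" formula that SLOCCount applies by default (effort in person-months = 2.4 × KSLOC^1.05). The function name cocomo_cost, the line count, and the salary are placeholders I made up for illustration; the 2.4 overhead multiplier mirrors SLOCCount's default. The real inputs come from the later steps.

```shell
#!/bin/sh
# cocomo_cost SLOC SALARY [OVERHEAD]   (hypothetical helper, not part of SLOCCount)
# Basic COCOMO, organic mode: effort = 2.4 * KSLOC^1.05 person-months.
# Cost = effort / 12 * annual salary * overhead multiplier.
cocomo_cost() {
    awk -v sloc="$1" -v salary="$2" -v overhead="${3:-2.4}" 'BEGIN {
        ksloc  = sloc / 1000
        effort = 2.4 * ksloc ^ 1.05              # person-months
        cost   = effort / 12 * salary * overhead # dollars
        printf "%.1f person-months, about $%.0f\n", effort, cost
    }'
}

# Made-up inputs: 1,000 lines of code and a $60,000 salary.
cocomo_cost 1000 60000
```

The exponent is slightly above 1, so the estimated cost grows a little faster than the line count.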
In this article, I will detail how I completed steps 1 through 4.
First, download the entire list in text format:
wget http://packages.ubuntu.com/gutsy/allpackages.en.txt.gz
Using awk and OO.org Calc, I reformatted the list into a shell script that looks like the following:
#!/bin/sh
sudo apt-get -y build-dep 2vcard
sudo apt-get -y build-dep 3270-common
sudo apt-get -y build-dep 3dchess
sudo apt-get -y build-dep 4digits
sudo apt-get -y build-dep 4g8
sudo apt-get -y build-dep 6tunnel
[...]
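For anyone who wants to skip the spreadsheet, the reformatting could probably be done with awk alone. This is only a sketch, not the exact commands I ran. It assumes every real package line in the list starts with the package name followed by a version in parentheses, and that header lines and virtual-package lines do not. The function name make_script and the output file name are made up for illustration.

```shell
#!/bin/sh
# make_script ACTION   (hypothetical helper)
# Reads a package listing on stdin and prints an apt-get script on stdout.
# Assumption: package lines look like "name (version) description";
# anything whose second field does not start with "(" is skipped.
make_script() {
    echo '#!/bin/sh'
    awk -v action="$1" '$2 ~ /^\(/ { print "sudo apt-get -y " action " " $1 }'
}

# Example usage, with the file fetched by the wget command above:
# gunzip -c allpackages.en.txt.gz | make_script build-dep > build-deps.sh
```

Passing "source" instead of "build-dep" would give the second script in the same way.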
The -y flag keeps me from being prompted 27,163 times. At two keystrokes per prompt, this simple switch saved me 54,326 keystrokes. 🙂 Building dependencies took many hours to complete.
Then I ran a similar script that started downloading all of the source code:
#!/bin/sh
sudo apt-get -y source 2vcard
sudo apt-get -y source 3270-common
[...]
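With a run this long, a loop that records failures is easier to audit than one line per package. The sketch below is a variation on the script above, not what I actually ran. The names fetch_all, FETCHER, FAILED_LOG, and packages.txt are made up for illustration, and it assumes a text file with one package name per line.

```shell
#!/bin/sh
# fetch_all LISTFILE   (hypothetical helper)
# Runs the fetch command once per package name and logs any that fail.
# FETCHER and FAILED_LOG are illustrative variables, overridable from the environment.
fetch_all() {
    fetcher="${FETCHER:-sudo apt-get -y source}"
    failed_log="${FAILED_LOG:-failed-packages.txt}"
    while IFS= read -r pkg; do
        [ -n "$pkg" ] || continue   # skip blank lines
        # $fetcher is left unquoted on purpose so it splits into command and flags;
        # stdin is redirected so the command cannot swallow the rest of the list.
        $fetcher "$pkg" < /dev/null || echo "$pkg" >> "$failed_log"
    done < "$1"
}

# Example usage:
# fetch_all packages.txt
```

Anything listed in the failure log can be retried later without starting the whole download over.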
Now comes the not-so-fun part. Once the source is downloaded, I have to make sure that I am not counting things that should not be counted, like readme files. Filtering the garbage out is likely going to be the most difficult part.
That’s crazy, I can’t even imagine how long it’d take to get source for EVERY package.. Good luck with that 😛
Btw, I’ve added your rss feed to my site, hope this is okay, loving your work so far 🙂
Thanks for the positive comment SmashcraB – comments like yours make my day, that’s for sure.
It’s taking a long time.
It’s been going for 7 HOURS and I am barely 50% done.
This is on a 25 mbit/sec cable connection. Can you imagine it on 56k?
added u to my blog-roll, good going!
Two days later: as of this morning, it was pretty close to done downloading all the source code.