I’m going to do something very dangerous – talk about specific raw, unanalyzed, and likely inaccurate statistics.
But I don’t know how else to combine radical transparency with my work of tracking and analyzing community health.
Does that mean someone is likely to read my posts and cherry-pick information that serves their own agenda? Perhaps. But I don’t choose to be open and transparent because it’s the risk-free position. I do it because it’s the better way to do things.
Do I expect some healthy criticism of my methods and results? Yes, I encourage it, and I figure criticism will be more accurate and reasonable when the critic can view my methods transparently. I know I’m doing things incorrectly and getting things wrong. I need the eyes and help of others who care as much as (or more than) I do.
Part of the better way is that I’m doing the work of tracking community health directly in the communities themselves. The tools and data already belong there, and so will the gathered statistics and (at least) the base analyses. There is no opportunity or reason for me to keep all this a secret, but it is going to be a while until a coherent story unfolds. In the meantime, I’m making the proverbial sausage but without a recipe.
A website we’ve been working on at Red Hat is going live next week – and I’ll announce it when it goes live, but give us our rare Big Reveal without tittering, thankyouverymuch – and one part of that work is going to be based on these statistics I cooked up about Fedora Packaging today. At a minimum, this post is a reference for how those statistics came about.
Thankfully Toshio still had the script he wrote with Max a few years ago (2007!), and with some tweaks to the interaction with the Fedora account system (FAS), I got these raw, unanalyzed, and likely inaccurate statistics about who owns and co-maintains Fedora 17 packages.
Why am I so certain they are inaccurate? Simply, folks at Red Hat who work on Fedora don’t always use their @redhat.com email address. Unfortunately, this tool creates statistics based solely on the email address of the packager. We don’t have anything more than a wild guess about how many Red Hat people use a different address (most likely firstname.lastname@example.org but others from personal domains and mail hosting services.)
Is there anything I can do to make them more accurate? I need to build a mapping from the different email addresses Red Hat folks use to their main Red Hat account. This will help with sorting out this sort of detail. (Currently 659 accounts to check: not a terrible manual research job, just tedious.)
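A minimal sketch of what such a mapping could look like, assuming it is kept as a simple lookup table from alternate address to main Red Hat address. The names `ALIASES`, `canonical_email`, and `is_red_hat` are hypothetical, the sample entry is a made-up placeholder, and none of this is part of `rhpkgers.py` today:

```python
# Hypothetical alias table: alternate address -> main Red Hat address.
# The entry below is a made-up placeholder, not a real account.
ALIASES = {
    "jdoe@example.net": "jdoe@redhat.com",
}


def canonical_email(email, aliases=ALIASES):
    """Return the main address for a packager, or the address itself if unmapped."""
    normalized = email.strip().lower()
    return aliases.get(normalized, normalized)


def is_red_hat(email, aliases=ALIASES):
    """True if the address resolves to an @redhat.com address after alias lookup."""
    return canonical_email(email, aliases).endswith("@redhat.com")


if __name__ == "__main__":
    for address in ("JDoe@example.net", "someone@example.com", "a@redhat.com"):
        print(address, is_red_hat(address))
```

The table would have to be filled in by hand from the manual research, and anyone missing from it still counts as non-Red Hat, so the result would stay an undercount rather than an exact figure.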
Why am I so concerned with who owns or maintains what packages? Aren’t Red Hat people community contributors, too? Darn tootin’ they are, but that’s not the point. I know people look at such statistics as a competition, but that’s not my goal. These statistics are all out there in public; I’m just doing the job of gathering them together. I’m doing that job because it’s one useful way of knowing whether projects are being successful. We all want to see steady, sustainable growth in packages in Fedora, with a healthy balance of package ownership so that one person or one organization isn’t taking on too much for itself. I’d like to be able to give a more accurate account of who at Red Hat contributes to Fedora – I’m sure even Red Hat doesn’t know exactly how much effort goes into Fedora (and other upstreams) from Red Hat folks.
What is my future plan for these specific statistics? I feel responsible for reporting these, now that I’m starting. I recall Simon Phipps once wisely saying, “Don’t start reporting on any statistics that you don’t want to report on forever.” I accept that, and I accept that any changes to the reporting – the methods, sources, results, etc. – need to be highlighted and explained. (I also hope that by putting tools and methodologies directly into the projects, others can be involved in the creation and delivery of statistics and reporting.) I’ll link out to everything I do from the canonical Fedora statistics page, and I’ll host tools, configurations, and documentation on Fedora Hosted and the Fedora Project Wiki. Anything that is generic will also get contributed to the Metrics Working Group (metrics-wg) of TheOpenSourceWay.org.
And now, the stats and how I got them:
./rhpkgers.py f17 maint.list users.list

Total Packages: 12157
Total RH Maintainers: 386
Total NonRH Maintainers: 659

These stats are for people able to commit to the package. The first set disregards packages which are open for anyone to commit:

Packages which have at least one Red Hat maintainer and packages which have at least one non-Red Hat maintainer:
@redhat.com: 5330
!@redhat.com: 8575

Packages which have solely Red Hat maintainers, solely non Red Hat maintainers, and a mixture of both:
solely @redhat.com: 3490
solely !@redhat.com: 6735
mixed redhat+!redhat: 1840
orphaned packages: 92

This set factors in the possible effects of open acls (ie: anyone in cvsextras can commit:

Packages to which only Red Hat packagers can commit, only non Red Hat packagers can commit, or both:
solely @redhat.com: 0
solely !@redhat.com: 0
mixed redhat+!redhat: 12157

Total Packages to which anyone can commit: 12154
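As a quick sanity check, the first set of numbers is at least internally consistent. The sketch below only re-adds the figures printed above; the variable names are mine, and it says nothing about whether the underlying email-based classification is right:

```python
# Figures copied from the rhpkgers.py output above.
total_packages = 12157
at_least_one_rh = 5330
at_least_one_non_rh = 8575
solely_rh = 3490
solely_non_rh = 6735
mixed = 1840
orphaned = 92

# Each check re-derives one reported number from the others.
checks = {
    "four buckets add up to the total":
        solely_rh + solely_non_rh + mixed + orphaned == total_packages,
    "at least one RH = solely RH + mixed":
        solely_rh + mixed == at_least_one_rh,
    "at least one non-RH = solely non-RH + mixed":
        solely_non_rh + mixed == at_least_one_non_rh,
}

for label, passed in checks.items():
    print(f"{label}: {'ok' if passed else 'MISMATCH'}")
```

All three checks pass with the numbers as reported, which means the four ownership buckets do not overlap and together cover every package.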
- Get a list of all maintainers for all versions of Fedora (8.2 MB file in the end):
curl 'https://admin.fedoraproject.org/pkgdb/lists/vcs?tg_format=plain' \
  > maint.list
- Get a list of all active user accounts (61201 accounts, 3 MB file); substitute the USERNAME and PASSWORD with those of any active FAS user (I used my own):
curl -d 'user_name=USERNAME&password=PASSWORD&login=Login' \
  'https://admin.fedoraproject.org/accounts/group/dump' > users.list
- Run the script, specifying the Fedora release (here f17):
./rhpkgers.py f17 maint.list users.list
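To make the ownership buckets in the output concrete, here is a minimal sketch of bucketing packages purely by maintainer email domain. It is an illustration under my own assumptions, not the actual logic of `rhpkgers.py`: the `classify` and `tally` functions, the input shape (package name mapped to a list of maintainer addresses), the sample data, and the rule that a package with no maintainers counts as orphaned are all hypothetical.

```python
from collections import Counter

RED_HAT_DOMAIN = "@redhat.com"


def classify(maintainer_emails):
    """Bucket one package by the email domains of the people who can commit to it."""
    if not maintainer_emails:
        # Assumption: no maintainers at all is treated as orphaned.
        return "orphaned packages"
    red_hat_count = sum(
        1 for email in maintainer_emails if email.lower().endswith(RED_HAT_DOMAIN)
    )
    if red_hat_count == len(maintainer_emails):
        return "solely @redhat.com"
    if red_hat_count == 0:
        return "solely !@redhat.com"
    return "mixed redhat+!redhat"


def tally(packages):
    """Count packages per bucket; `packages` maps package name -> list of emails."""
    return Counter(classify(emails) for emails in packages.values())


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = {
        "pkg-a": ["alice@redhat.com"],
        "pkg-b": ["bob@example.com", "carol@example.org"],
        "pkg-c": ["alice@redhat.com", "bob@example.com"],
        "pkg-d": [],
    }
    for bucket, count in sorted(tally(sample).items()):
        print(f"{bucket}: {count}")
```

A classification like this is exactly where the inaccuracy creeps in: a Red Hat employee packaging under a personal address lands in the non-Red Hat bucket.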
I will be getting a git repo started on Fedora Hosted soon to put relevant bits in, and I’ll update this post with a link to that repo when ready.