CSDb Analyses II

This time we need to take a more detailed look at what is going on behind the data. We need to see what the distributions of various CSDb variables look like.

First off, the Released By field. Unfortunately, this can be anything and anyone: a group or a scener, or a coop between several groups or persons. On top of that, the information can be incomplete. Be that as it may, let’s look at the distribution of the number of releases per unique Released By entity.

Figure 1. Distribution of unique Released By entities by number of releases in CSDb.

As can be seen, a large number of Released By entities have reportedly released only 1 product. In fact, out of a total of 8769 unique Released By entities, including coops (joined entries such as “No Name and Xentax”), 4582 (about 52%) released only 1 product: coops, fake groups, one-day flies, individual sceners, what have you. The figure also shows how enormously skewed this distribution is. This is to be expected and quite logical: not every group has the talent or time to release a lot, and not every group keeps at it for more than a few years. Figure 2 shows what the proportions really are.
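To make the counting behind Figure 1 concrete, here is a minimal Python sketch. It assumes a hypothetical flat list holding one Released By string per release (the actual CSDb export format is not described here) and treats every joined coop string as an entity of its own, as the counts above do. The toy data is made up and not taken from CSDb.

```python
from collections import Counter


def releases_per_entity(released_by):
    """Count how many releases each unique Released By entity has.

    `released_by` holds one Released By string per release. A coop such as
    "No Name and Xentax" is kept as one joined string and therefore counts
    as a separate entity.
    """
    return Counter(released_by)


def distribution_of_counts(per_entity):
    """Map each observed number of releases N to how many entities have exactly N."""
    return Counter(per_entity.values())


# Hypothetical toy data, NOT real CSDb figures.
releases = ["Triad", "Triad", "Triad", "Xentax", "Xentax",
            "No Name and Xentax", "Lone Scener"]
per_entity = releases_per_entity(releases)
dist = distribution_of_counts(per_entity)
print(sorted(dist.items()))  # [(1, 2), (2, 1), (3, 1)]
```

Plotting `dist` with N on the X-axis gives a histogram of the kind shown in Figure 1.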

Figure 2. Percentage of groups versus percentage of releases entered in CSDb. The X-axis shows all observed categories of total number of releases per unique Released By entity. The blue line shows the cumulative percentage of all Released By entities that fall into the X-axis categories, while the dark line shows the cumulative percentage of releases NOT released by the entities in the X-axis categories up to that point.

While the figure may seem puzzling at first, it can simply be read as the percentage of Released By entities that are not responsible for a given percentage of releases. The X-axis shows every observed total number of releases per Released By entity. So if Xentax released 50 releases, and two other groups did as well, all three would fall into the same category (“50”). Start from the entities that created only 1 product, cumulate their releases category by category, and express the result as the inverse of the percentage of the total number of releases (100% minus the cumulative share): that is the dark line. I inverted it to see where the two lines would cross, which is marked by the arrow. At that point, 82% of Released By entities are not responsible for about 82% of all releases. In other words, 82% of all Released By entities are responsible for only 18% of all releases in CSDb. Read as groups: there are a lot of inactive groups out there. Now, the Released By variable is distorted, noisy and inaccurate, and it is responsible for a huge skewness in the data. However, it is what we have to go on at the moment. Concretely, the 82% of Released By entries correspond to the X-axis categories up to “7” releases in total. So it is a bunch of groups, sceners and coops that released only 1 to 7 products in their lifetime!
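The crossing point in Figure 2 can be sketched as follows, assuming the releases-per-entity counts are already available as a list of integers, one per entity. The function name `crossing_point` and the toy counts are hypothetical. Because the categories are discrete, the two lines rarely meet exactly, so the sketch returns the first category where the entity line reaches or passes the dark line.

```python
from collections import Counter


def crossing_point(counts):
    """Return (category, pct_entities, pct_releases_not_covered) at the
    first X-axis category where the cumulative % of entities reaches or
    exceeds the % of releases NOT produced by those entities."""
    counts = list(counts)
    total_entities = len(counts)
    total_releases = sum(counts)
    dist = Counter(counts)
    cum_entities = 0
    cum_releases = 0
    for n in sorted(dist):                  # X-axis categories, smallest N first
        cum_entities += dist[n]
        cum_releases += n * dist[n]         # each entity in category n released n products
        pct_entities = 100 * cum_entities / total_entities           # blue line
        pct_not_released = 100 - 100 * cum_releases / total_releases  # dark line
        if pct_entities >= pct_not_released:
            return n, pct_entities, pct_not_released
    return None


# Hypothetical toy counts: 5 entities with 1 release, 3 with 2, one with 10, one with 20.
counts = [1] * 5 + [2] * 3 + [10, 20]
result = crossing_point(counts)
print(result)  # category 2: 80% of entities vs about 73% of releases not covered
```

With the real CSDb counts, the same procedure would be expected to land near the “7” category and the 82% mark described above.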

The figure also identifies a number of stages in the percentage of releases (dark line). I’ve named them Acceleration, Linear and Exponential. That is rather inaccurate, since the X-axis has nothing to do with time, but you’ll catch my drift. I also wrote some numbers next to them. I chose the cut-off points by eye, just to see the proportions. In terms of groups/entities releasing a number of products, the higher the number of products (from 1 onward), the faster we reach high numbers of releases. This is caused by the many groups/entities that produce only a little: each releases few products, but because there are so many of them, together they account for a steep increase in releases. This Acceleration phase is responsible for about 50% of all releases, while 8520 (97%) Released By entities were behind it! Then come the groups/entities that make up the Linear phase. These are 200 unique Released By entities that created 25% of all releases in CSDb. It looks linear because the line declines roughly steadily with each step up in total number of releases. Then, finally, the Exponential phase. Only 49 groups/entities made the final quarter (25%) of all releases in CSDb. These are obviously groups that were around for a long time and, while they were, produced vastly above average. Mind you, this figure tells NOTHING about the quality or the type of releases, just their quantity. Of course, Triad tops this list, having been around for very long and having produced 1181 products, 800 of them Cracks. Still, roughly 400 of the rest are demos, one-file demos etc. Certainly an active group, no doubt.
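Once cut-offs are chosen, the share of entities and releases in each phase follows directly from the releases-per-entity counts. In this minimal sketch the function name `phase_shares`, the cut-off values and the toy counts are all hypothetical placeholders, since the cut-offs picked by eye for Figure 2 are not given as numbers.

```python
def phase_shares(counts, cutoffs):
    """Share of entities and releases falling in each phase.

    `cutoffs` are hypothetical inclusive upper bounds on releases per
    entity for every phase except the last; e.g. (a, b) splits entities
    into N <= a, a < N <= b and N > b.
    Returns a list of (n_entities, pct_entities, pct_releases) per phase.
    """
    counts = list(counts)
    total_entities = len(counts)
    total_releases = sum(counts)
    bounds = list(cutoffs) + [float("inf")]  # last phase is open-ended
    shares = []
    lower = 0
    for upper in bounds:
        members = [n for n in counts if lower < n <= upper]
        shares.append((len(members),
                       100 * len(members) / total_entities,
                       100 * sum(members) / total_releases))
        lower = upper
    return shares


# Hypothetical toy counts and cut-offs, NOT the CSDb values.
counts = [1] * 5 + [2] * 3 + [10, 20]
shares = phase_shares(counts, (2, 10))
for n_entities, pct_e, pct_r in shares:
    print(n_entities, round(pct_e, 1), round(pct_r, 1))
```

Even in this tiny example, 80% of the entities produce only about 27% of the releases, the same kind of imbalance as in the Acceleration phase.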

Skewness and Kurtosis of N, Start Year, End Year, Active Year and Period.

Next, we take a look at the distribution of some interesting computed variables. From the Released By and Release Year variables, we can calculate when the first release of any Released By entity appeared (Start Year), when its last release appeared (End Year), how many Active Years it had (any year between 1982 and 2010 with at least 1 release counts as an Active Year), and how long its Period was (the span from the year of its first release to the year of its last release; intermittent inactive years are also counted in this one!).
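A minimal sketch of the four derived variables for one entity, assuming a list of that entity’s release years. Two assumptions are flagged in the code: releases outside 1982 to 2010 are ignored, and Period is taken as the inclusive span End Year minus Start Year plus 1, so an entity active in a single year has a Period of 1. The text does not say which convention was used, and the function name `derived_years` is hypothetical.

```python
def derived_years(years, first=1982, last=2010):
    """Start Year, End Year, Active Years and Period for one entity.

    `years` holds the release year of every release by the entity.
    Assumption: years outside first..last are ignored.
    Assumption: Period = End - Start + 1 (inclusive), so inactive years
    in between are counted in it.
    """
    in_range = [y for y in years if first <= y <= last]
    if not in_range:
        return None
    start, end = min(in_range), max(in_range)
    return {
        "start": start,
        "end": end,
        "active_years": len(set(in_range)),  # distinct years with >= 1 release
        "period": end - start + 1,
    }


# Hypothetical entity: releases in 1986, 1987 (twice) and 1990.
print(derived_years([1986, 1987, 1987, 1990]))
# {'start': 1986, 'end': 1990, 'active_years': 3, 'period': 5}
```

The gap between Active Years (3) and Period (5) in the example is exactly the intermittent inactivity that Period counts and Active Years does not.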

Table 1. Descriptives of Released By derivatives. Note the large variances, skewness and kurtosis.

The table above shows that the skewness and kurtosis of the number of releases per Released By entity (N) are huge. Both are 0 for a normal distribution (for kurtosis this holds when it is reported as excess kurtosis), but the C64 scene as depicted by CSDb has nothing to do with a normal distribution. This means that everyday statistics such as the mean have no meaning here (pun absolutely intended). Ye olde mean means nothing in an abnormal situation, after all. In fact, looking at the table, only two variables come more or less near to 0 in terms of skewness and kurtosis: Start Year and End Year. Let’s take a look at their histograms.
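For reference, here is a sketch of moment-based sample skewness and excess kurtosis, both 0 for a normal distribution. Many statistics packages report bias-adjusted variants, so values from this sketch may differ slightly from those in Table 1, especially for small samples. The function name is hypothetical.

```python
def skewness_and_excess_kurtosis(xs):
    """Moment-based sample skewness (g1) and excess kurtosis (g2).

    Both are 0 for a normal distribution; g2 subtracts 3 from the raw
    kurtosis so that the normal reference point is 0.
    """
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    if m2 == 0:
        raise ValueError("need at least two distinct values")
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    return g1, g2


# Symmetric data: skewness 0. A single large value: strong positive skew,
# like N, where a few groups release far more than the rest.
print(skewness_and_excess_kurtosis([1, 2, 3, 4, 5]))
print(skewness_and_excess_kurtosis([1, 1, 1, 1, 10]))
```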

Figure 3. Distribution of Start Year for Released By entities.

Figure 4. Distribution of End Year for Released By entities.

As you can see, both the Start Year and End Year show a more or less normal pattern in the first era of the scene.