CZ:Statistics: Difference between revisions

From Citizendium
imported>Aleksander Stos
(→‎Progress in time: yet this..)
imported>Aleksander Stos
(updated)
Line 3: Line 3:
Since its 2006 inception and official launch (March 28, 2007), Citizendium has grown. This page provides statistics on Citizendium's output of articles and its contributor base.<ref>The graphs have been produced using the publicly available data from the history of edits of all Citizendium pages. Concerning the  comparison with the Wikipedia, the "stub-meta-history" dump files were used (see the appropriate [http://download.wikimedia.org/backup-index.html subpages from this index]).</ref> Our meta-discussions take place on the [http://forum.citizendium.org/index.php forum], the relevant statistics page is [http://forum.citizendium.org/index.php?action=stats here].


The statistical charts below draw upon data between October 22, 2006 and October 28, 2007. Note that this implies that the data for this month may be incomplete.
The statistical charts below draw upon data between October 22, 2006 and November 1, 2007.


==Pages==
Line 35: Line 35:
The following graphs describe the CZ human resources.  


* How many authors edit each month? The Fig. 6 presents the number of users that made at least one  edit (separately for each month).  Note that this does not include the entire month of October.
* How many authors edit each month? The Fig. 6 presents the number of users that made at least one  edit (separately for each month).   




[[Image:Editing_users.jpg|thumb|center|600px|Fig. 6. Editing authors]]


* How many users are active? If by "activity" we define at least 20 edits per month, and by "high activity" we understand at least 100 edits per month, then the answer is given by the Fig. 7 below.  Note that this does not include the entire month of October.
* How many users are active? If by "activity" we define at least 20 edits per month, and by "high activity" we understand at least 100 edits per month, then the answer is given by the Fig. 7 below.  




Line 46: Line 46:


===Daily use===
How many users you could meet here daily? While correlated with other HR measures, this one seems to be interesting since it shows how many people make the community on a daily basis. See the figure below.  Note that this does not include the entire month of October.
How many users you could meet here daily? While correlated with other HR measures, this one seems to be interesting since it shows how many people make the community on a daily basis. See the figure below.  




Line 52: Line 52:


===New arrivals===
Fig. 9: How many new authors arrive each month? This can be measured by counting new user pages. More substantial metric would be, however, to detect a new user on his first edit. Notice that in the period of self-registration (essentially, one week in January and two weeks in February 2007) the two metric largely coincide, as the new users were supposed to provide their bio.  There was also a spike in March, which continued into April, due to our launch.  New arrivals have been almost exclusively the result of press coverage, of which there has been relatively little over the summer, since our public launch.  There were also fewer arrivals in the summer, probably due to the lower amount of academic activity generally.  We do expect another spike soon as we do a press release and initiate a formal and permanent recruitment program--our first serious such program.  Note that this does not include the entire month of October.
Fig. 9: How many new authors arrive each month? This can be measured by counting new user pages. More substantial metric would be, however, to detect a new user on his first edit. Notice that in the period of self-registration (essentially, one week in January and two weeks in February 2007) the two metric largely coincide, as the new users were supposed to provide their bio.  There was also a spike in March, which continued into April, due to our launch.  New arrivals have been almost exclusively the result of press coverage, of which there has been relatively little over the summer, since our public launch.  There were also fewer arrivals in the summer, probably due to the lower amount of academic activity generally.  We do expect another spike soon as we do a press release and initiate a formal and permanent recruitment program--our first serious such program.   


[[Image:New_users.jpg|thumb|center|600px|Fig. 9. New arrivals]]
Line 129: Line 129:


===Progress in time===
Here we graph the  number of articles in various workgroups vs. time. For the time being they are scaled in days, starting from Oct 22, 2006 to Oct 28, 2007.
Here we graph the  number of articles in various workgroups vs. time.  


Remarks.

Revision as of 02:38, 2 November 2007



Since its 2006 inception and official launch (March 28, 2007), Citizendium has grown. This page provides statistics on Citizendium's output of articles and its contributor base.[1] Our meta-discussions take place on the forum; the relevant statistics page is here.

The statistical charts below draw upon data between October 22, 2006 and November 1, 2007.

Pages

Number of articles and pages

The first graph shows the number of articles (technically speaking, all pages from mainspace without redirects), including articles that are not "live."[2]

Fig. 1. Number of articles

The second graph shows the number of pages in all namespaces (user pages, talk pages and images are included; redirects are not). This is the green line. The blue line is the one from the first graph (i.e. the mainspace pages). What happened around the 125th day? It was Saint Valentine's Day, 14/02/2007, when, after the site was slashdotted, many new users registered (and were welcomed on their talk pages!). Notice that there was no parallel growth in the mainspace at the same time. Apparently, the newly registered users were mainly watching, since at that time there was no unregistered access. A more stable growth rate was established after the launch.

Fig. 2. Number of all pages from all namespaces without redirects (green) and articles (blue)

Rate of article and page creation

The third and fourth figures present the "global creation rate", a rough measure of activity on the wiki expressed in new pages per day. The rate for "pure" articles (technically: mainspace without redirects) is depicted in blue; the green line corresponds to all pages (still without redirects). It is calculated as the number of articles (respectively, pages) divided by the number of days elapsed since the beginning. This is therefore a "global average", which can be compared with the recent creation rate shown in the fifth graph of this section; that graph gives the article creation rate taking only the last 30 days into account.

Fig. 3. Creation rate (articles per day)
Fig. 4. Overall creation rate in pages per day (green line); the blue line corresponds to articles' creation rate
Fig. 5. Recent creation rate (last 30 days counted)
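The two rates can be sketched in a few lines of Python. This is only an illustration, not the script used to draw Figs. 3 to 5: it assumes a hypothetical list `counts` in which `counts[d]` is the number of articles (or pages) at the end of day `d`, with day 0 as the first day of the data, and it computes the recent rate as net growth over the last 30 days, so deleted pages lower it.

```python
def global_creation_rate(counts, day):
    """Pages per day since the start: counts[day] / number of days elapsed.

    counts[d] is the number of pages at the end of day d (hypothetical input);
    day 0 is the first day of the data.
    """
    days_elapsed = day + 1  # day 0 already counts as one day of operation
    return counts[day] / days_elapsed


def recent_creation_rate(counts, day, window=30):
    """Net new pages per day over the last `window` days only."""
    if day < window:
        raise ValueError("not enough history for the requested window")
    return (counts[day] - counts[day - window]) / window


# Toy series (not real Citizendium data): 3 new pages every day.
counts = [3 * d for d in range(61)]
print(global_creation_rate(counts, 60))  # about 2.95
print(recent_creation_rate(counts, 60))  # 3.0
```

Because the global rate divides by every day since the start, it reacts slowly to changes in activity, which is why the 30-day rate is shown alongside it.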

Edits daily

The number of edits varies greatly from one day to the next, and a graph of the raw data is hard to read. The 30-day moving average[3] depicted below is more meaningful: trends are easily visible. The price of readability is a small lag, so a change appears on the graph a few days after it happened. For example, the impact of the launch, which occurred on the 157th day, can be observed here a bit later. The graph takes into account edits in all namespaces.

Fig. 5a. Edits by day, 30-day moving average.
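A 30-day trailing moving average, as described in the note, might be computed as below. The function name is hypothetical, and days with fewer than 29 preceding days are simply left empty (`None`), which is an assumption; the original graph may treat the first month differently. The toy input illustrates the lag mentioned above: a jump from 10 to 40 edits per day is fully reflected only 29 days later.

```python
from collections import deque


def trailing_moving_average(values, window=30):
    """Average of each day and the window - 1 preceding days.

    Returns a list aligned with `values`; days without a full window get None.
    """
    result = []
    recent = deque()
    total = 0
    for v in values:
        recent.append(v)
        total += v
        if len(recent) > window:
            total -= recent.popleft()  # drop the day that left the window
        result.append(total / window if len(recent) == window else None)
    return result


# Toy data: 29 days at 10 edits/day, then 30 days at 40 edits/day.
daily_edits = [10] * 29 + [40] * 30
smoothed = trailing_moving_average(daily_edits)
print(smoothed[28], smoothed[29], smoothed[58])  # None 11.0 40.0
```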

Human resources

Number of authors

The following graphs describe CZ's human resources.

  • How many authors edit each month? Fig. 6 presents the number of users who made at least one edit in each month.


Fig. 6. Editing authors
  • How many users are active? If we define "activity" as at least 20 edits per month and "high activity" as at least 100 edits per month, the answer is given by Fig. 7 below.


Fig. 7. Active authors
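Both counts can be derived from a list of edits. The sketch below assumes a hypothetical input of `(user, date)` pairs, one per edit (for instance extracted from the page histories); the function name is also hypothetical. Users with 100 or more edits in a month are counted both as active and as highly active.

```python
from collections import Counter, defaultdict
from datetime import date


def monthly_author_stats(edits, active=20, highly_active=100):
    """Per-month counts of editing, active and highly active users.

    edits: iterable of (user, date) pairs, one per edit (hypothetical format).
    """
    edits_per_user_month = Counter((d.year, d.month, user) for user, d in edits)
    stats = defaultdict(lambda: {"editing": 0, "active": 0, "highly_active": 0})
    for (year, month, _user), n in edits_per_user_month.items():
        s = stats[(year, month)]
        s["editing"] += 1            # at least one edit this month
        if n >= active:
            s["active"] += 1         # reaches the `active` threshold (20 by default)
        if n >= highly_active:
            s["highly_active"] += 1  # reaches the `highly_active` threshold (100 by default)
    return dict(stats)


# Toy data for April 2007: 120, 25 and 1 edits by three users.
edits = (
    [("Alice", date(2007, 4, 1 + i % 28)) for i in range(120)]
    + [("Bob", date(2007, 4, 2))] * 25
    + [("Carol", date(2007, 4, 3))]
)
print(monthly_author_stats(edits)[(2007, 4)])
# {'editing': 3, 'active': 2, 'highly_active': 1}
```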

Daily use

How many users could you meet here on a given day? Although correlated with the other HR measures, this one is interesting because it shows how many people make up the community on a daily basis. See the figure below.


Fig. 8. Authors daily

New arrivals

Fig. 9: How many new authors arrive each month? This can be measured by counting new user pages. A more substantial metric, however, would be to detect a new user at their first edit. Notice that in the period of self-registration (essentially, one week in January and two weeks in February 2007) the two metrics largely coincide, as new users were expected to provide their bio. There was also a spike in March, continuing into April, due to our launch. Since our public launch, new arrivals have been almost exclusively the result of press coverage, of which there was relatively little over the summer. There were also fewer arrivals in the summer, probably because of generally lower academic activity. We expect another spike soon, as we issue a press release and start a formal and permanent recruitment program, our first serious program of this kind.

Fig. 9. New arrivals
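The "first edit" metric mentioned above can be sketched as follows, again assuming a hypothetical list of `(user, date)` pairs and a hypothetical function name. Each user is attributed to the month of their earliest edit, whatever order the edits arrive in.

```python
from collections import Counter
from datetime import date


def new_authors_per_month(edits):
    """Count users by the month of their first edit.

    edits: iterable of (user, date) pairs in any order (hypothetical format).
    """
    first_edit = {}
    for user, d in edits:
        if user not in first_edit or d < first_edit[user]:
            first_edit[user] = d
    return Counter((d.year, d.month) for d in first_edit.values())


edits = [
    ("A", date(2007, 2, 14)),
    ("A", date(2007, 3, 1)),
    ("B", date(2007, 3, 28)),
    ("C", date(2007, 4, 2)),
    ("B", date(2007, 2, 20)),  # B's earliest edit comes last in the list
]
arrivals = new_authors_per_month(edits)
print(arrivals[(2007, 2)], arrivals[(2007, 3)], arrivals[(2007, 4)])  # 2 0 1
```

Unlike counting new user pages, this ignores users who registered but never edited.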

Comparison to other wikis

How does the statistical data shed light on Citizendium's strength in terms of human resources? Since April 2007 is the first month after the wiki's official launch, it is instructive to compare Citizendium at that point with several active projects of similar size and mission. In the chart below, Citizendium is compared to several language editions of Wikipedia. This analysis counts the registered users of each site.[4]

Fig. 10. CZ - WP human resources comparison

As of April 2007, the human resources of CZ are comparable to those of the Wikipedias in the category "more than 25,000 entries".[5] For example, CZ would be of the same order of magnitude as hr.wikipedia.org, lt.wikipedia.org, sl.wikipedia.org (these were slightly smaller) or sr.wikipedia.org (this one was a bit bigger than CZ). As a side note, there were few active anonymous IP users on these wikis (about 10), roughly as many as the robots that were counted. Notice also that there are 24 Wikipedias altogether in the categories "more than 50,000", "more than 100,000" and "more than 250,000" entries.
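Counting registered users means separating them from anonymous editors. A minimal sketch, assuming (as MediaWiki does) that logged-out edits are attributed to an IP address in place of a username; the helper names are hypothetical. Robots are deliberately kept, matching the simplification described in the note.

```python
import ipaddress


def is_anonymous(username):
    """True if the 'username' is really an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(username)
        return True
    except ValueError:
        return False


def registered_editors(usernames):
    """Distinct non-IP editors; bots are kept and counted as regular users."""
    return {u for u in usernames if not is_anonymous(u)}


names = ["ExampleUser", "192.0.2.7", "2001:db8::1", "ExampleBot", "192.0.2.7"]
print(sorted(registered_editors(names)))  # ['ExampleBot', 'ExampleUser']
```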


Structure of articles and workgroups

Checklisted articles

Recall that we categorize the articles as follows:

  • External (imported and not yet improved)
  • Stubs (no more than a few sentences)
  • Developing (beyond a stub but incomplete)
  • Developed (complete or nearly so)
  • Approved (that's it!)

It turns out that external articles and stubs are less numerous than developing, developed and approved ones.

Structure of articles.

Articles by workgroup

Articles by workgroups

Members by workgroup

Authors
Registered editors

Progress in time

Here we graph the number of articles in various workgroups vs. time.

Remarks.

  • The legends show the number of articles as of June 9. This may differ slightly from the figures presented above because the database changes constantly.[6]
  • Some classes (e.g. Applied Arts and Sciences) are too "crowded", and for technical reasons different workgroups are shown in the same colour. In that case, the associated number of articles identifies the workgroup.
  • The Topic Informant Workgroup, which covers living people, was traced within Humanities.
Natural Sciences.
Social Sciences.
Humanities.
Arts.
Applied Arts and Sciences.
Recreation.

Word count

As of the end of July 2007, Citizendium contained about 4100K words in its articles. A typical article was about 562 words long.[7] Strictly speaking, this is the median size, which means that, at the time, half of our articles were longer. There were about 3170 clusters.

As of the end of August 2007, Citizendium contained about 4420K words in its articles.[8] The number of clusters increased by about 300 (to about 3480), and the median size was 551 words.

As of the end of September 2007, Citizendium contained about 4600K words in its articles.[9] The number of clusters increased by about 300 (to about 3770), and the median size was 511 words.

As of the end of October 2007, Citizendium contained about 4900K words in its articles.[10] The number of clusters increased by about 430 (to about 4200), and the median size was 468 words. Apparently, many of the new articles are stubs.
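The per-day figures in the notes follow from the monthly totals above. The sketch below assumes that each snapshot was taken on the last day of the month and that "K" means thousand; the names are hypothetical. It also shows, on made-up article lengths, why the median can sit well below the mean when a few articles are very long.

```python
from datetime import date
from statistics import mean, median

# End-of-month word totals reported above (snapshot dates are assumed).
WORD_TOTALS = {
    date(2007, 7, 31): 4_100_000,
    date(2007, 8, 31): 4_420_000,
    date(2007, 9, 30): 4_600_000,
    date(2007, 10, 31): 4_900_000,
}


def words_per_day(totals):
    """Average daily growth between consecutive snapshots, keyed by the later date."""
    snapshots = sorted(totals.items())
    rates = {}
    for (d0, w0), (d1, w1) in zip(snapshots, snapshots[1:]):
        rates[d1] = (w1 - w0) / (d1 - d0).days
    return rates


for d, rate in words_per_day(WORD_TOTALS).items():
    print(d, f"{rate / 1000:.1f}K words/day")
# 2007-08-31 10.3K words/day
# 2007-09-30 6.0K words/day
# 2007-10-31 9.7K words/day

# Made-up article lengths: one long article pulls the mean far above the median.
lengths = [120, 300, 562, 900, 4000]
print(median(lengths), mean(lengths))  # 562 1176.4
```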

Notes

  1. The graphs have been produced using the publicly available data from the history of edits of all Citizendium pages. For the comparison with Wikipedia, the "stub-meta-history" dump files were used (see the appropriate subpages from this index).
  2. Here we count clusters, not subpages. We are working on a presentation that takes the subpages into account.
  3. That is, for every day, the average over that day and the 29 preceding days.
  4. Although the analysis excludes anonymous IP users, such users make relatively few edits overall (8-15%, depending on the wiki), and an IP is rarely really active (more than 20 edits). Excluding those few active IPs is somewhat compensated by the fact that, for the sake of simplicity, we count Wikipedia 'robots' as regular users.
  5. As listed on the Main Page of the English Wikipedia
  6. More precisely, the above "Articles by Workgroups" graph was created by simply counting the workgroups' categories at one moment in time. To create the "progress in time" graph, however, this is not enough: one must also look at the history of groups of articles, which involves a dump file created on another occasion. Only articles present at both moments in time can be traced back. The difference, if any, should not be significant.
  7. We do not count tables or "infoboxes". Obvious technical information, such as categories or HTTP links, is not counted. Draft pages are excluded. Here an "article" is the set of pages describing a given subject (i.e. the cluster, the basic unit of Citizendium).
  8. This means about 10K words per day last month. Ahem, how much did you contribute to that, and why so little? ;-)
  9. This means about 6K words per day last month.
  10. This means about 10K words per day last month.