- #cemetech Channel Stats
- 11 Jul 2012 12:13:34 am
- Last edited by _player1537 on 11 Jul 2012 03:26:57 pm; edited 1 time in total
I decided to go through my logs today and create some general stats on the things we've said over the past year (almost, 11 months). These are just my own, personal logs, so there were a few downtimes, but nothing more than a week or two, I believe. I still think that they are pretty accurate, though.
I wrote a Perl script (which I can share if you want, but it's very tailored to my own log format) to do all of this, and I'll go over what each result means, in case it's not clear.
Also, I just remembered about the Laughing Out Loud filter, so anywhere you see a 0x5, mentally replace it with l o l, since that's what actually shows up in the logs. (Except for one place, and I've made a note there)
Code:
Average response time: 53.1253440333749s, 0.885422400556248m, 0.0147570400092708h, 0.00061487666705295d
Average messages per day: 1624.32941176471/day
We average about 53 seconds in between each person saying something, and get a total of ~1624 lines per day.
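For anyone curious how numbers like these can fall out of a log, here is a minimal Perl sketch, not the actual script. It assumes the timestamps have already been parsed into epoch seconds (my real log format isn't shown here), and `average_gap` and `lines_per_day` are names made up for the example.

```perl
use strict;
use warnings;

# Hypothetical input: message timestamps as epoch seconds, oldest first.
sub average_gap {
    my @times = @_;
    return 0 if @times < 2;
    # Total elapsed time divided by the number of gaps between messages.
    return ($times[-1] - $times[0]) / (scalar(@times) - 1);
}

sub lines_per_day {
    my @times = @_;
    return 0 if @times < 2;
    # 86400 seconds in a day.
    my $days = ($times[-1] - $times[0]) / 86400;
    return $days > 0 ? scalar(@times) / $days : 0;
}
```

Dividing the average gap by 60, 3600, and 86400 gives the minute, hour, and day figures shown in the output above.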
Code:
Most used messages:
[2267] >> lol
[1375] >> :p
[1346] >> haha
[1209] >> ah
[1202] >> ok
[1198] >> heh
# OLD
Most used messages:
[2261] >> lol
[1374] >> :p
[1345] >> haha
[1209] >> ah
[1202] >> ok
[1198] >> heh
These are the lines that people have said on their own. In other words (going by the old numbers), there have been 2261 times that someone has said "lol" on its own line, 1374 times for ":p", and so on.
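A rough sketch of how a tally like this could be built, assuming the message bodies are already in a list and that case is ignored (the lowercase output suggests that, but it is an assumption). `top_messages` is a hypothetical helper, not something from the real script.

```perl
use strict;
use warnings;

# Count identical messages, ignoring case, and return the $n most common
# as [count, message] pairs. Ties are broken alphabetically.
sub top_messages {
    my ($n, @messages) = @_;
    my %count;
    $count{ lc $_ }++ for @messages;
    my @sorted = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    $n = @sorted if $n > @sorted;
    return map { [ $count{$_}, $_ ] } @sorted[ 0 .. $n - 1 ];
}

# Print in the same "[count] >> message" layout used in this post.
printf "[%d] >> %s\n", @$_ for top_messages(2, 'lol', 'LOL', ':P', 'ok', ':p', 'lol');
```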
Code:
The users that spoke the most are:
[70542] >> kermm
[25835] >> ashbad
[21495] >> catherine
[21427] >> aes_sedia5
[21194] >> merth
[18725] >> forty-two
# OLD
The users that spoke the most are:
[54004] >> kermm
[25421] >> ashbad
[21117] >> catherine
[18215] >> aes_
[14753] >> forty-two
[14231] >> qazz42
Pretty self-explanatory. Kerm's said 54004 lines, Ashbad: 25421, Me: 21117, etc.
Code:
The most highlighted users are:
[7238] >> kermm
[3567] >> http
[2667] >> merth
[2560] >> well
[2349] >> catherine
[2303] >> benryves
# OLD
The most highlighted users are:
[4493] >> kermm
[3567] >> http
[2556] >> well
[2303] >> benryves
[2287] >> catherine
[2144] >> oh
This one is meant to count the number of times people have highlighted one another. A highlight counts as anything that matches this (straight from my code):
Code:
$msg =~ /^([^ ]+)[:,]/
Which means that any time someone says "KermM, [. . .]" or "Benryves: [. . .]" it will match. It also means that any time someone does "http://cemetech.net" it gets matched (I could take that out, but it's kinda interesting). And the number in brackets is the number of times it's been said before, like the others.
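To make the behavior of that pattern concrete, here is a small sketch that wraps it in a function. The regex is the one quoted above; the `highlighted` wrapper and the lowercasing are assumptions added for the example.

```perl
use strict;
use warnings;

# Return the lowercased leading word if the message starts with a word
# followed by ':' or ',', otherwise undef.
sub highlighted {
    my ($msg) = @_;
    return $msg =~ /^([^ ]+)[:,]/ ? lc($1) : undef;
}

for my $msg ('KermM, ping', 'http://cemetech.net', 'well, that works', 'no highlight here') {
    my $nick = highlighted($msg);
    printf "%-22s => %s\n", $msg, defined $nick ? $nick : '(no match)';
}
```

Because the pattern only looks for a leading word plus a colon or comma, URLs and filler words like "well" and "oh" land in the list alongside real nicknames.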
Code:
The SAX users that spoke the most are:
[24929] >> ashbad
[2998] >> sarah
[2947] >> zeldaking
[2374] >> aetios
[1921] >> lincolnb
[1716] >> willrandship
# OLD
The SAX users that spoke the most are:
[24924] >> ashbad
[2998] >> sarah
[2947] >> zeldaking
[2374] >> aetios
[1921] >> lincolnb
[1716] >> willrandship
I felt the SAX users were under-appreciated, so I did a check for them, too. These are the number of lines each user from SAX has said.
Code:
The most-used SAX phrases are:
[487] >> hi
[369] >> 0x5 (This one is actually 0x5)
[363] >> :p
[211] >> ok
[195] >> :)
[167] >> oh
These are also from SAX. They're similar to the first set, at the top of this post.
Edit:
I checked for runs, smileys, and the most-used words overall. I also added Merth's DecBot links to the program, so now all of the variants of "Kerm" contribute to his count. I left the old numbers in there so that everything still makes sense, but the ones above them are more accurate.
Code:
The short-runners are:
[205] >> kermm
[55] >> geekboy
[52] >> ashbad
[42] >> nikky
[40] >> forty-two
[35] >> comicidiot
The medium-runners are:
[266] >> kermm
[224] >> ashbad
[175] >> doorscs
[101] >> comicidiot
[86] >> ahelper
[84] >> forty-two
The long-runners are:
[72] >> doorscs
[25] >> ashbad
[20] >> ahelper
[16] >> forty-two
[16] >> kermm
[15] >> dtal
A "run" is defined as the same user saying something 4 times in a row, without being interrupted, within a set period of time: 1 minute for short runs, 3 minutes for medium, and 5 minutes for long. Runs aren't counted twice, so if you say 5 things in a row, it still counts as 1 run.
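Here is one way that definition could be turned into code. It is only a sketch under my own reading of the rule: an uninterrupted streak by one nick counts as a single run if any 4 consecutive lines in it fit inside the window. The `[epoch_seconds, nick]` input shape and the name `count_runs` are assumptions, and the real script may draw the line between short, medium, and long runs differently.

```perl
use strict;
use warnings;

# Count runs per nick. @log is a chronological list of [epoch_seconds, nick]
# pairs; $window is the time limit in seconds (60, 180, or 300 here).
sub count_runs {
    my ($window, @log) = @_;
    my (%runs, @streak);
    my $flush = sub {
        # One streak yields at most one run, however long it is.
        for my $i (3 .. $#streak) {
            if ($streak[$i][0] - $streak[$i - 3][0] <= $window) {
                $runs{ lc($streak[0][1]) }++;
                last;
            }
        }
        @streak = ();
    };
    for my $entry (@log) {
        # Someone else speaking interrupts the current streak.
        $flush->() if @streak && lc($streak[-1][1]) ne lc($entry->[1]);
        push @streak, $entry;
    }
    $flush->();
    return %runs;
}
```

Calling it once per window, for example `count_runs(60, @log)` for short runs, gives a nick-to-count hash that can be sorted into a top 6.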
Code:
The average line-lengths for short runs are:
[7.98819301848049] >> kermm
[6.576] >> geekboy
[10.5921926910299] >> ashbad
[6.14189189189189] >> nikky
[5.41071428571429] >> forty-two
[6.18262411347518] >> comicidiot
Kerm wanted me to see how long the lines were for people who did runs (Since he did them more than Geekboy). This is the number of characters on average.
Code:
The highest line-lengths for runs are:
[108.5] >> jared__
[33.5] >> brandonw|
[27.8888888888889] >> iambian
[25.25] >> alberth
[25.125] >> valberth
[24.5] >> jet322
Since I already had the run data for line lengths, I figured it'd be interesting to see what the longest line lengths were for it. This one is also the number of characters on average.
Code:
The most-used smileys are:
[11395] >> :p
[11203] >> :)
[6468] >> :d
[4870] >> :(
[2616] >> :/
[2516] >> ;)
This one just counts the number of words that are actually smileys. I have some terrible regular expression that matches as many smileys as I could think of, including backwards (To get all those fiends that do smileys like "(:").
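My actual regex is too ugly to paste, so here is a much smaller illustrative pattern in the same spirit. It is not the original: it only knows about eyes, an optional nose, and a mouth, plus the mirrored bracket forms like "(:". All of the variable names and `count_smileys` are invented for this sketch.

```perl
use strict;
use warnings;

# Illustrative pieces: eyes, optional nose, mouth, and the brackets that
# are allowed to lead a backwards smiley.
my $eyes    = qr/[:;=]/;
my $nose    = qr/-?/;
my $mouth   = qr/[()\/\\|dpo\[\]]/i;
my $bracket = qr/[()\[\]]/;
my $smiley  = qr/^(?:$eyes$nose$mouth|$bracket$nose$eyes)$/;

# Count whitespace-separated words that are entirely a smiley.
sub count_smileys {
    my ($msg) = @_;
    my %count;
    for my $word (split ' ', $msg) {
        $count{ lc $word }++ if $word =~ $smiley;
    }
    return %count;
}
```

Anchoring the pattern to the whole word keeps things like "not:this" from being counted, at the cost of missing smileys glued to punctuation.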
Code:
The smiley-users are:
[8204] >> kermm
[4163] >> catherine
[3769] >> benryves
[2987] >> ashbad
[2805] >> tifreak8x
[2097] >> alberthro
These are the people that used smileys the most. I just realized that it counts people multiple times if they say "something like this :/".
Code:
The most-used words overall are:
[95352] >> i
[88523] >> the
[68434] >> to
[63401] >> a
[47308] >> it
[42311] >> you
No surprises here. This just splits all the words we've ever used and counts their occurrences.
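The word count boils down to something like the sketch below, assuming the message bodies are in a list and everything is lowercased first. `word_counts` is a hypothetical name, and the `$strip` flag optionally applies a letters-only substitution (`s/[^a-z]//g`) so the counts can be compared with and without punctuation.

```perl
use strict;
use warnings;

# Count words across @messages. With $strip true, every character that is
# not a lowercase letter is removed from each word before counting.
sub word_counts {
    my ($strip, @messages) = @_;
    my %count;
    for my $msg (@messages) {
        for my $word (split ' ', lc $msg) {
            $word =~ s/[^a-z]//g if $strip;
            # Skip words that were nothing but punctuation.
            $count{$word}++ if length $word;
        }
    }
    return %count;
}
```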
EndEdit
Observations:
:: Ashbad says a freaking TON of things. In total (IRC plus SAX), he's said 50345 lines. That's only 3659 lines fewer than Kerm has said from IRC, and 4916 lines fewer than Kerm's said total. Ashbad accounts for 9% of the lines for #cemetech.
:: Originally, before I took out all the "USER has entered the room." or "USER has logged in" lines, Souvik ranked up at the top. "(C) *Souvik has entered the room." showed up 2717 times, 456 more times than "lol".
:: Kerm does a lot in this channel. That's to be expected, obviously, but it's neat to see the data prove it, too.
:: Thank GOD "xD" didn't show up anywhere near the top. I'd have to stab something if it ranked #1, or even #6. I'm not too happy about "lol" being up there, but I don't suppose there's much to do about that after The Rise and Fall of L O L (And, I'm guilty of using it, too).
:: I am kind of surprised about the smileys being ranked so highly. ":p" being #2 and #3, and ":)" being #5 in the lists. I expected them both to be around #6.
:: I also think it's interesting that the people who spoke the most tended to be intelligent. Or, at least, Kerm says high-quality things a lot; Ashbad also does, for the most part. I like to think that I do, but I know a lot of it can just be rambling. And then there's Aes, Seana, and Qazz. Aes has been missing for a couple weeks/months. Seana says random stuff all the time (<3). And Qazz... is Qazz. I find it weird that there's this divide in that sense.
:: I added a quick check to see how many unique messages there are in the channel: 372229. ~66%, or 2/3 of our channel is completely unique in that sense! The other 1/3..? I guess a lot of it has to do with "lol" and ":p", along with the ones further down the list (I only listed the top 6 for each category, but I assume that 7-15 are going to be high numbers, too).
Edit:
:: I like smileys. I find it kind of funny that I used a little over half as many smileys as Kerm used. Also, about 20% of the things I've said have at least one smiley in them.
:: Another interesting thing is that Kerm does more runs than Geekboy. Almost 4 times as many. He also only uses ~1.4 characters more than Geekboy does, on average.
:: Who is this "jared__" guy? Whoever he is, he must have been telling a story (perhaps to himself!)
:: I am very, very happy to see that "XD" or any variant isn't in the top 6 for smiley usage. I was pretty worried it would be #6, but we only seem to use colon-smileys.
:: Merth jumped up to #3 on the Most Highlighted Users list! He was right about being about half-and-half when it comes to which nickname he used ("Shaun" or "Merth").
:: The number of lines that have ":p" in them makes up 3% of our unique messages. 7% of our total messages have at least one smiley in them (from the top 6 list).
:: The most-used words overall are still the most-used when you include punctuation words. I originally had some regex in there that would get rid of the punctuation, but when I removed that line, the rankings stayed the same:
Code:
s/[^a-z]//g
:: I also got the regex wrong for smileys the first time, and found out that Aes used a smiley "<<<.>>>" one time when his nickname was "Aes_santa".
EndEdit
What does everyone else think about this? If you want me to add any other tests or post the source code, reply and I will. I find this sort of thing really fascinating, and I'm sure several of you do too. I also want to try to make these numbers as correct as possible, so if you think I might have messed up on my calculations, please tell me.