I got a list of all the usernames on Cemetech using methods that were neither automated nor programmatic, as per the TOS. I only used services Cemetech provides to users in their intended ways, and I wrote exactly zero code to retrieve the list from the site. All of the regex stripping and other processing to get these statistics happened outside of Cemetech, in a text editor. Here are some stats.

  • The average username is about 9.2 characters long.
  • In order from most common to least common, the characters in usernames (and their counts) can be found here.
  • You can find a list of email domains for users who registered with an email as their username here. This was particularly interesting to me: I was expecting more users to have registered with an email, but only a minuscule 1.6% did. I was also expecting entire classes to have registered with their school accounts to use, say, jsTIfied, but there did not appear to be any.
  • About a third of all Cemetech usernames contain capital letters.
  • About 97% of all Cemetech usernames contain lowercase letters.
  • About 7% of all Cemetech usernames contain characters that are not letters, numbers, or an underscore.
  • About 0.4% of all Cemetech usernames are palindromic.
  • About 1.4% of all Cemetech usernames start with a number, but 30% end with one.
  • About 3% of all Cemetech usernames contain the string 'ti'.
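These numbers were produced with regex in a text editor, not with code. For anyone who wants to reproduce them on their own list, here is a minimal Python sketch. It assumes the usernames are already in a list of strings; the function name `username_stats` is hypothetical, and the case-sensitive palindrome and 'ti' checks are my assumptions about how the figures were counted.

```python
import re
from statistics import mean


def username_stats(names: list[str]) -> dict[str, float]:
    """Summary statistics like the ones listed above.

    Percentages are returned as fractions between 0 and 1.
    """
    n = len(names)

    def frac(pred) -> float:
        # Fraction of usernames for which pred(name) is true.
        return sum(1 for name in names if pred(name)) / n

    return {
        "avg_length": mean(len(name) for name in names),
        "has_upper": frac(lambda s: re.search(r"[A-Z]", s) is not None),
        "has_lower": frac(lambda s: re.search(r"[a-z]", s) is not None),
        # Anything other than an ASCII letter, digit, or underscore.
        "has_other": frac(lambda s: re.search(r"[^A-Za-z0-9_]", s) is not None),
        # Case-sensitive palindrome test (an assumption; "Abba" would not count).
        "palindrome": frac(lambda s: s == s[::-1]),
        "starts_digit": frac(lambda s: re.match(r"[0-9]", s) is not None),
        "ends_digit": frac(lambda s: re.search(r"[0-9]\Z", s) is not None),
        # Case-sensitive substring test for "ti".
        "contains_ti": frac(lambda s: "ti" in s),
    }


stats = username_stats(["Alex", "nik", "_iPhoenix_", "abba", "user42", "x.y"])
for key, value in stats.items():
    print(f"{key}: {value:.3f}")
```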

For those who want to know how I did it: I'm not going to share the method in this post, because it's probably not great for Cemetech's servers to handle a lookup like that for however many people are interested. If you just want a query run, post it in the thread and I'll do it if I find time.
This is very interesting! I think the tidbit that stood out the most is that only 96% of usernames contain lowercase letters. That would mean we have some usernames that are all uppercase, all numbers, or a combination of the two (and symbols).

If you find other stats to pull, I'd love to see them. Perhaps usernames that share the first three characters (since SAX highlights users when a string matches the first three characters of their username)? And anything else that comes to your mind!
100% of users named MateoConLechuga manage all the lettuce.
MateoConLechuga wrote:
100% of users named MateoConLechuga manage all the lettuce.


You did a fantastic job keeping Cemetech safe during that E. coli lettuce outbreak a short while back.
Thank you.
How many usernames have the word calc in them?
Alex wrote:
If you find other stats to pull, I'd love to see them. Perhaps usernames that share the first three characters (since SAX highlights users when a string matches the first three characters of their username)? And anything else that comes to your mind!


That's an interesting one, actually. Of the 25248 usernames I have, there are only 8629 unique three-letter starting sequences (warning for mobile users on data: the file is around 100 KB). This means the odds of you sharing a SAX highlight with another user are quite high.

Unlike SAX highlights, my search was case-sensitive. With a case-insensitive search there are far fewer: only 6371 unique three-letter starting strings.

In conclusion, the odds of a random Cemetech user being highlighted by the same strings as someone else are quite high: nearly 75%, since 1 − 6371/25248 ≈ 0.75.
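The estimate 1 − (distinct prefixes / usernames) is really a lower bound on the share of users who collide: a prefix that counts once as "unique" can still be used by two or more people. A minimal Python sketch that computes both the distinct-prefix count and the exact fraction of users whose prefix is used by at least one other user. The function name `prefix_collisions` and its `n` / `fold_case` parameters are hypothetical, not anything from the original analysis.

```python
from collections import Counter


def prefix_collisions(names: list[str], n: int = 3, fold_case: bool = True) -> tuple[int, float, float]:
    """Return (distinct prefixes, naive estimate, exact shared fraction)."""
    # Key each username by its first n characters, optionally case-folded.
    keys = [(name.casefold() if fold_case else name)[:n] for name in names]
    counts = Counter(keys)
    total = len(keys)
    distinct = len(counts)
    # The estimate used in the thread: 1 - distinct / total.
    naive = 1 - distinct / total
    # Exact share: users whose prefix is also used by at least one other user.
    shared = sum(c for c in counts.values() if c >= 2) / total
    return distinct, naive, shared


names = ["Alex", "alexander", "nik", "nikky", "Mateo"]
print(prefix_collisions(names))                   # case-insensitive, 3 characters
print(prefix_collisions(names, fold_case=False))  # case-sensitive
print(prefix_collisions(names, n=4))              # four-character keys
```

Passing `n=4` gives the four-character variant discussed below; `shared` is always at least `naive`.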




But wait: it turns out that SAX actually uses the first four characters as the highlight string.

With this small adjustment, the number of starting sequences is 14047, changing the odds to roughly one in two (by the same estimate, 1 − 14047/25248 ≈ 0.44). The ranking data my program generated is really interesting, because you see a lot of popular four-letter words and names.


I found this result of 1/2 really surprising, thanks for the suggestion!

jcgter777 wrote:
How many usernames have the word calc in them?


115.
This seems pretty familiar. I was once into a certain web-based MMO and did a similar analysis to put on my then-website. That was the night before my birthday and I couldn't sleep.

What brought this on? I mean, I get it. A corpus is a corpus, and deserves to be scraped.
Weregoose wrote:
What brought this on? I mean, I get it. A corpus is a corpus, and deserves to be scraped.


There was a conversation on Discord where KryptonicDragon was talking about drawing dragons, and it made me wonder what kinds of trends could be found in usernames.

Inspired by your post, I did a search for the only mathematical constants people seem to care about these days.


  • The number 1337 occurs exactly 18 times.
  • The number 666 also occurs exactly 18 times.
  • The number 69 appears in 92 usernames.
  • The number 420 appears all of 24 times.


For the curious, the number 314 only appears 16 times.
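Counts like these (and the earlier "calc" question) come down to a substring test per username. A minimal sketch, with a hypothetical helper name `count_containing`: it counts how many usernames contain each string at least once, which is my assumption about how the figures were tallied, rather than total occurrences. Short patterns such as 69 also match inside longer numbers like 1969, so these counts can overstate deliberate references. Whether the original "calc" count ignored case is not stated, so case-folding is left as an option.

```python
def count_containing(names: list[str], patterns: list[str], fold_case: bool = False) -> dict[str, int]:
    """For each pattern, count the usernames that contain it at least once."""
    def norm(s: str) -> str:
        return s.casefold() if fold_case else s

    normed = [norm(name) for name in names]
    return {p: sum(1 for name in normed if norm(p) in name) for p in patterns}


sample = ["xX1337Xx", "calc69", "TI-Calc", "1969", "user420"]
print(count_containing(sample, ["1337", "666", "69", "420", "314"]))
print(count_containing(sample, ["calc"], fold_case=True))
```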
Wow, that is really cool. I never knew that people put such random characters in their usernames. I am also surprised that 0.4% of usernames are palindromic. I wonder who those people are.
_iPhoenix_ wrote:
the number of starting sequences is 14047,

I believe your analysis may be faulty. For instance, my username is "nik", which has fewer than four characters, but I share highlights with nikky, whose username has the starting sequence "nikk". If a username has fewer than four characters but matches the beginning of an existing four-character sequence, it should not be counted separately, as that is not how SAX treats it.
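One way to account for this, assuming SAX highlights a user whenever their key (the first four characters, case-insensitive) appears anywhere in a message, is to say two users collide when their keys are equal or one key contains the other: typing "nikk" then highlights both nikky and nik. Under that substring assumption, a short key that sits in the middle of a longer one (say "ik" inside "nikk") also collides, which slightly generalizes the prefix rule above. This is a sketch of the assumed rule, not SAX's actual code, and the function name `substring_collisions` is hypothetical.

```python
from collections import Counter


def substring_collisions(names: list[str], n: int = 4) -> float:
    """Fraction of users whose highlight key overlaps another user's key.

    Assumed rule: a user is highlighted when their key (first n characters,
    case-folded) appears anywhere in a message. Two users collide when their
    keys are equal or one key contains the other, e.g. "nik" inside "nikk",
    because typing the longer key then highlights both.
    """
    keys = [name.casefold()[:n] for name in names]
    counts = Counter(keys)
    distinct = set(counts)

    def substrings(key: str) -> set[str]:
        # All non-empty substrings of key, including key itself.
        return {key[i:j] for i in range(len(key)) for j in range(i + 1, len(key) + 1)}

    # For each string, the number of distinct keys that contain it.
    contained_in = Counter()
    for key in distinct:
        contained_in.update(substrings(key))

    def collides(key: str) -> bool:
        if counts[key] >= 2:
            return True  # another user has exactly the same key
        if contained_in[key] >= 2:
            return True  # some longer key contains this one
        # Some other user's key sits inside this one.
        return any(sub in distinct for sub in substrings(key) if sub != key)

    return sum(1 for key in keys if collides(key)) / len(keys)


users = ["nik", "nikky", "Alex", "alexa", "Mateo", "zq"]
print(f"{substring_collisions(users):.2f}")  # prints 0.67
```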

_iPhoenix_ wrote:
In conclusion, the odds of a random cemetech user being highlighted by the same stuff as someone else is quite high, nearly a 75% chance.

That is only partially true. Typing "nikky" highlights me ("nik") on SAX (well, at least it used to; I'm not sure what it does nowadays), but none of my IRC clients treat that as a highlight. Another example: with most IRC clients, users abcd1 and abcd2 would be highlighted by their shared starting sequence "abcd", but once you add the rest of their usernames, only one will receive the highlight. Not sure about SAX, though.
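The difference between the two behaviours can be sketched with two hypothetical matchers: `sax_style`, modelled on the rule described in this thread (first four characters, case-insensitive, anywhere in the message), and `whole_nick_style`, which only fires when a full username appears as a standalone word, matching what nik describes his IRC clients doing with "nikky". Both are simplified assumptions, not the real code of SAX or of any IRC client.

```python
import re


def sax_style(message: str, users: list[str], n: int = 4) -> set[str]:
    """Users whose first n characters appear anywhere in the message (assumed SAX-like rule)."""
    text = message.casefold()
    return {u for u in users if u and u.casefold()[:n] in text}


def whole_nick_style(message: str, users: list[str]) -> set[str]:
    """Users whose full name appears as a standalone word (simplified IRC-like rule)."""
    hits = set()
    for u in users:
        # Lookarounds act as word boundaries and still work for names with symbols.
        pattern = r"(?<!\w)" + re.escape(u) + r"(?!\w)"
        if re.search(pattern, message, flags=re.IGNORECASE):
            hits.add(u)
    return hits


users = ["nik", "nikky", "abcd1", "abcd2"]
print(sorted(sax_style("hi nikky", users)))         # ['nik', 'nikky']
print(sorted(whole_nick_style("hi nikky", users)))  # ['nikky']
```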
Hmm...
I'm one of the 3% of usernames without lowercase letters.

Very interesting data about the users. I'm still wondering how you found these things out without any programming!
What is the percentage of Cemetech usernames whose first post is an advertisement?