Statistics on HN usage

A few months ago, I made figures for various statistics about HN usage. I didn’t think they were very interesting by themselves, and I’m currently running a more sophisticated analysis of comments that was going to accompany them. However, in light of the recent announcement that there will soon be pending comments on HN, I thought there might be some interest in seeing them.

The data for these figures were taken from the (now-defunct) HNSearch in mid-October 2013, so I only included analysis up to September 30, 2013. I should note that I started the graphs in early 2007 because, though HN dates back to late 2006, there was a fairly long dry spell without any activity.

“Basic” Variables

The first figure is submissions per day. I separated weekdays from weekends and added a 7-day average curve:

submissionsPerDay.png
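For anyone who wants to reproduce this kind of plot, here is a minimal sketch of the two steps involved: splitting daily counts into weekdays and weekends, and smoothing with a 7-day average. It is not the code behind the figure. The function names and the sample counts are made up for illustration, and I am assuming a trailing window (the average of the current day and the six before it); a centered window would work just as well.

```python
from datetime import date, timedelta


def rolling_mean(values, window=7):
    """Trailing mean over the most recent `window` values.

    For the first few entries, where fewer than `window` values exist,
    the mean is taken over whatever is available so far.
    """
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        out.append(running / min(i + 1, window))
    return out


def split_by_weekend(days, counts):
    """Split (day, count) pairs into weekday and weekend lists."""
    weekdays, weekends = [], []
    for d, c in zip(days, counts):
        # date.weekday(): Monday is 0, Saturday is 5, Sunday is 6
        (weekends if d.weekday() >= 5 else weekdays).append((d, c))
    return weekdays, weekends


# Hypothetical submission counts for two weeks starting on a Monday.
start = date(2013, 9, 2)
days = [start + timedelta(days=i) for i in range(14)]
counts = [100, 110, 120, 110, 100, 40, 50] * 2

smoothed = rolling_mean(counts)
weekdays, weekends = split_by_weekend(days, counts)
```

Plotting `weekdays` and `weekends` as separate scatter series with `smoothed` drawn as a line on top gives the same general layout as the figure.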

The next figure is total (registered) users. I’m not sure how HNSearch was getting this number (whether or not they had access to an internal HN API), so it might not be totally accurate:

totalUsers.png

I was curious about what the user growth curve looked like, so I made that too:
userGrowth.png
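One simple way to get a growth curve from a cumulative user count is to take the day-to-day difference. The sketch below assumes that definition (new users per day); the function name and the numbers are hypothetical and only there to show the arithmetic.

```python
def daily_growth(totals):
    """New users per day, computed as the difference between
    consecutive cumulative totals. The result has one fewer entry
    than the input because the first day has no predecessor."""
    return [later - earlier for earlier, later in zip(totals, totals[1:])]


# Hypothetical cumulative user counts on four consecutive days.
totals = [10, 15, 27, 30]
growth = daily_growth(totals)
```

A raw difference like this is noisy from day to day, so in practice it would likely be smoothed with the same kind of 7-day average used for submissions.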

Topic Analysis

To understand how the interactions between users and submissions have changed over time, I looked at a couple of different quantities for the top 30 submissions, by points, for each day. Obviously, this has some pitfalls because some days had fewer than 30 submissions. In principle, HN stories can be commented on and up-voted at any time after their submission, but the only two threads that saw any appreciable activity of this kind are the first HN submission and the HN suggestions thread.

The first figure shows the average number of points and the average number of comments that the top 30 submissions received, along with the 7-day average of both:
avgQuantities_30.png
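The per-day selection can be sketched as follows: group stories by day, keep the top 30 by points, and average points and comments over that group. This is an illustration and not the original analysis code. The record layout (`day`, `points`, `comments` keys) and the function name are assumptions, and the sample data is made up.

```python
from collections import defaultdict
from statistics import mean


def top_n_daily_averages(stories, n=30):
    """For each day, average the points and comments of the n
    highest-scoring stories. Days with fewer than n stories simply
    use every story submitted that day."""
    by_day = defaultdict(list)
    for story in stories:
        by_day[story["day"]].append(story)

    result = {}
    for day, items in sorted(by_day.items()):
        top = sorted(items, key=lambda s: s["points"], reverse=True)[:n]
        result[day] = (
            mean(s["points"] for s in top),
            mean(s["comments"] for s in top),
        )
    return result


# Hypothetical records; n=2 keeps the example small.
stories = [
    {"day": "2013-09-01", "points": 10, "comments": 1},
    {"day": "2013-09-01", "points": 30, "comments": 3},
    {"day": "2013-09-01", "points": 20, "comments": 8},
    {"day": "2013-09-02", "points": 4, "comments": 2},
]
averages = top_n_daily_averages(stories, n=2)
```

The second day in the sample has only one story, which shows the pitfall in miniature: its "top 2" average is really an average over a single submission.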

The second figure shows the ratio of these two quantities and its 7-day average:
ratio_30.png
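As a small sketch of the ratio calculation, the snippet below divides the daily average points by the daily average comments. The direction of the ratio (points per comment rather than comments per point) is my assumption for the example, as are the function name and the numbers. Days where the average comment count is zero are returned as `None` rather than raising an error.

```python
def points_per_comment(avg_points, avg_comments):
    """Element-wise ratio of average points to average comments.
    Days with zero average comments yield None."""
    return [
        p / c if c else None
        for p, c in zip(avg_points, avg_comments)
    ]


# Hypothetical daily averages for three days.
ratios = points_per_comment([25, 4, 3], [5, 2, 0])
```

Any `None` entries would need to be skipped or filled before applying a 7-day average to the ratio.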

Other Notes and Future Work

Overall, I think the data are consistent with expectations for a growing, engaged user base. It would probably be interesting to see how the number of points on the top (or all) submissions per day looks when normalized by the number of registered users.

A comparison of cohorts of earlier users (by registration date) with later users would be very interesting. I did make a figure of the karma distributions divided by registration date, but I never took the time to make it look pretty enough to show. It also pulls its input directly from HNSearch, so if anyone wants to see it, that’ll have to wait until I convert the code to pull from the new Algolia search. But I think someone who has access to the voting records of all users could make a very interesting figure comparing how an early cohort votes now to how a later cohort does.

Another interesting thing to see might be how the shape of the comment graph for each submission has changed over time, but I haven’t yet come up with a good statistic that isn’t heavily influenced by the number of comments.

My current efforts are focused on understanding how commenting sentiment has changed over time.

 