Visualization, D3.js, Python, Reddit, Mobile

November 10, 2019

In this project, I...

  • Connect to the streaming and static Reddit API using Python and Django.
  • Use VADER to score and plot the trending sentiment and activity of specified subreddits and their comments.
  • Display encoded results in a mobile-friendly real-time data visualization using D3.js.

Figure: Side-by-side comparison of Dottit screenshots

This project was born out of a conversation with a couple of Reddit moderators, after I told them of my interest in addressing hate, disinformation and abuse on social media. They had two main complaints about the suite of tools that Reddit provides for dealing with abusive or toxic comments:

  1. Reddit can flag comments, but it can't give a heads-up warning when conversations are trending towards toxicity.
  2. Mobile tools are limited, compared to their desktop counterparts.

Dottit is an attempt to address these issues within the constraints of the Reddit public API. Designed from the ground up to be usable on mobile as well as desktop, the visualization uses VADER sentiment as a stand-in for a more appropriate NLP model trained to recognize toxic language in Reddit posts. The intention is to let a moderator or admin track when a subreddit is going sour, and drill into any given submission to see which comments are pulling the conversation down.
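Because the sentiment score is only a stand-in, it helps to treat the scorer as a swappable piece. The sketch below is not Dottit's actual code: `Scorer`, `TOY_LEXICON`, `toy_scorer` and `score_comments` are hypothetical names, and the toy lexicon exists only so the example runs without extra dependencies. It assumes any scorer returns a value between -1 and 1, as VADER's compound score does.

```python
from typing import Callable, Iterable, List, Tuple

# A scorer maps comment text to a score in [-1.0, 1.0].
# Assumption: with the vaderSentiment package installed, an equivalent scorer
# would be
#   lambda text: SentimentIntensityAnalyzer().polarity_scores(text)["compound"]
Scorer = Callable[[str], float]

# Hypothetical toy lexicon, for illustration only (this is NOT VADER's lexicon).
TOY_LEXICON = {"great": 1.0, "thanks": 0.5, "stupid": -1.0, "hate": -1.0}


def toy_scorer(text: str) -> float:
    """Average the lexicon values of known words; 0.0 if none are known."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    hits = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def score_comments(comments: Iterable[str], scorer: Scorer) -> List[Tuple[str, float]]:
    """Pair each comment with its score, using whichever scorer is supplied."""
    return [(text, scorer(text)) for text in comments]
```

Under this arrangement, replacing sentiment with a toxicity model would mean passing a different function to `score_comments`, while the plotting code stays unchanged.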

There are two layers to the visualization. The first shows thread activity (Reddit calls threads 'submissions') on a given set of subreddits in real time, up to the most recent 500 comments across four subreddits. Each dot is sized to reflect the volume of comments, and placed on the y-axis according to the average sentiment score of the last ten comments. Selecting a submission reveals the second layer: all of its comments, charted by time and sentiment. Selecting a single comment reveals its place in the comment forest, traced up to the top level and down to all subsequent branches. In this manner, it is possible to browse a subreddit and submission by sentiment, as opposed to the more generic rankings of "New," "Hot," etc. that Reddit provides. Clusters of dramatic negativity (or positivity) can be readily surfaced and explored, to identify turning points in the conversation.
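The two layers can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's real implementation: the `Comment` fields and the functions `summarize_submissions` and `trace_thread` are hypothetical, and each comment is assumed to already carry a sentiment score. The first function produces the data behind each dot (comment count for size, mean sentiment of the latest ten comments for y-position); the second traces a selected comment up to its top-level ancestor and down through all of its replies.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Comment:
    id: str
    submission_id: str
    parent_id: Optional[str]  # None for a top-level comment
    created_utc: float
    sentiment: float  # assumed precomputed, e.g. a score in [-1, 1]


def summarize_submissions(
    comments: List[Comment], window: int = 10
) -> Dict[str, Tuple[int, float]]:
    """Map submission id to (comment count, mean sentiment of latest `window`)."""
    by_submission = defaultdict(list)
    for c in comments:
        by_submission[c.submission_id].append(c)
    summary = {}
    for sub_id, items in by_submission.items():
        items.sort(key=lambda c: c.created_utc)
        recent = items[-window:]
        mean = sum(c.sentiment for c in recent) / len(recent)
        summary[sub_id] = (len(items), mean)
    return summary


def trace_thread(
    comments: List[Comment], selected_id: str
) -> Tuple[List[str], List[str]]:
    """Return (ancestor ids from top level down, ids of all descendants)."""
    by_id = {c.id: c for c in comments}
    children = defaultdict(list)
    for c in comments:
        if c.parent_id is not None:
            children[c.parent_id].append(c.id)

    # Walk parent links upward until a top-level comment is reached.
    ancestors = []
    parent = by_id[selected_id].parent_id
    while parent is not None and parent in by_id:
        ancestors.append(parent)
        parent = by_id[parent].parent_id
    ancestors.reverse()

    # Depth-first walk downward through every reply branch.
    descendants = []
    stack = list(children[selected_id])
    while stack:
        cid = stack.pop()
        descendants.append(cid)
        stack.extend(children[cid])
    return ancestors, descendants
```

The summary could be serialized to JSON for the D3.js front end to draw; the ten-comment window is exposed as a parameter so that the smoothing can be tuned.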

While exploring by sentiment is fun, a more practical version of this tool would focus on the things that are of greater concern to moderators: violations of community guidelines. An NLP model trained to recognize toxic or abusive language would be useful across all subreddits, while more specific models, trained on deleted or flagged posts within a subreddit's history, would provide more subreddit-specific visibility.

Try It Out!