README.txt

There are five files in this data repository.

The files titled EntrepreneurForum_2018.csv and LibertarianForum_2018.csv contain data taken from Reddit. The data were collated from Google BigQuery and consist of all posts and comments in the r/Libertarian and r/Entrepreneur subreddits in the year 2018. For ethical reasons, all data have been anonymised and the post content has been removed. The files contain the counts of linguistic categories as output by LIWC 2015. The column headings refer to the LIWC variables. There is also a column titled 'p_c', which specifies whether each row is a comment or a post. The authors have been anonymised using numbers, which are stored in the column titled 'users_anon'. The column 'subreddit' refers to the forum from which the data were collected. All bots and all deleted or removed posts and users have been removed.

The file titled SilkRoadData.csv contains data taken from the publicly available dataset collated by Branwen and colleagues and available at
Once again, these data have been anonymised and contain no post content, in line with ethics requirements. They also contain counts of linguistic categories per post as calculated through LIWC 2017. This file also contains a column titled 'users_anon', which is a numeric, anonymised version of the usernames displayed in the original dataset. There is also a column titled 'C', which refers to the forum from which the data were collected. Data from both Vendor Roundtable and Philosophy, Economics and Justice are stored in this one file.

The final two files relate to Study 2 of the paper "Using Computational Techniques to Study Social Influence Online". These files contain the social interaction indices used to examine the relationship between influential individuals and prototypicality.
Both of these files contain the following columns:

  users_anon: an anonymised username.
  prototypicality: a prototypicality score determined using the Extra Trees classifier outlined in the main paper.
  total_contributions: the total number of contributions that a user has submitted to the forum.
  total_posts: the total number of posts that a user has submitted to the forum.
  total_comments: the total number of comments that a user has submitted to the forum.
  indegree: a measure of the user's indegree centrality, as calculated using UCINET and verified using the NetworkX module in Python.
  CentralityRatio: the centrality ratio, calculated by dividing the individual's indegree centrality by their total number of contributions.

These metrics were determined using the post and comment data taken from Google BigQuery for the Libertarian and Entrepreneur subreddits in the month of March 2019. The centrality measures were determined by constructing a network in which an edge connects two individuals when one user commented on another user's post or comment.

For more detail about the data outlined here, please email
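
As an illustration of how the indegree and CentralityRatio columns relate to each other, the following is a minimal Python sketch and not the authors' actual pipeline. It assumes a directed edge runs from the commenter to the author of the post or comment being replied to, that repeated replies between the same pair count once, that self-replies are ignored, and that indegree is a raw count rather than a normalised score. None of these details are confirmed by this README. The function name and the toy data are hypothetical. The same count could be cross-checked with the in_degree method of a NetworkX DiGraph.

```python
from collections import defaultdict


def centrality_ratios(interactions, total_contributions):
    """Compute indegree and CentralityRatio for each user.

    interactions: iterable of (commenter, parent_author) pairs, one per
        comment (hypothetical input format, not a file in this repository).
    total_contributions: dict mapping user -> total posts + comments.
    """
    # Assumption: an edge points from the commenter to the author being
    # replied to, and duplicate edges between the same pair count once.
    in_neighbours = defaultdict(set)
    for commenter, parent_author in interactions:
        if commenter != parent_author:  # assumption: self-replies ignored
            in_neighbours[parent_author].add(commenter)

    results = {}
    for user, total in total_contributions.items():
        indegree = len(in_neighbours.get(user, ()))
        # CentralityRatio = indegree / total contributions, as described above.
        ratio = indegree / total if total else float("nan")
        results[user] = {"indegree": indegree, "CentralityRatio": ratio}
    return results


# Toy data using anonymised numeric user IDs (invented for illustration).
interactions = [(2, 1), (3, 1), (1, 2), (3, 1)]
total_contributions = {1: 4, 2: 2, 3: 2}
ratios = centrality_ratios(interactions, total_contributions)
```

In this toy example, user 1 receives replies from two distinct users and has made four contributions, so the ratio is the indegree divided by four.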