Mission
Does positive sentiment lead to improved engagement? This is the theory we wanted to test while I was working with Digital Yalo, a marketing agency based in Atlanta, GA. Using open-source Natural Language Processing (NLP) algorithms, we set out to analyze thousands of social media posts and measure the correlation between the positivity of an author’s tone and the engagement their posts received from followers. With that data, we wanted to build a tool that helps social media influencers adjust the sentiment of their posts to boost their engagement online.
While some variants of this tool are already available online, we specifically wanted to explore domain-specific sentiment analysis, which lets us build a custom data set for each individual industry. Instead of trying to generate a one-size-fits-all model, we wanted our data sets to be laser-focused on the target audience of each of our specific clients.
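To make the question posed above measurable, each post needs a sentiment score and an engagement figure, and the two can then be correlated. The sketch below is a minimal illustration, not necessarily the analysis we ran. It assumes engagement is a single number per post (for example, upvotes plus comments, a hypothetical choice). It also includes a rank-based Spearman variant, because engagement counts are often heavy-tailed.

```python
import math
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of the same length (>= 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (spread_x * spread_y)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = average_rank
        start = end + 1
    return result


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))
```

Here `xs` would hold per-post sentiment scores and `ys` the matching engagement numbers. A positive coefficient shows the two move together. On its own, it cannot show that positive tone causes the higher engagement.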
Challenge
The most crucial challenge was conducting our research ethically with user-generated content. We had to work closely with legal consultants, data analysts, and social media sites to determine the best way to access user content without compromising users’ ownership of it.
Beyond this, we needed to set up a technical stack that could ingest online content, clean the data to make it suitable for our research, feed it into our Natural Language Processing engine, and then train our lexicon to assign every learned word a sentiment score.
Finally, we needed to package all of this into a prototype website where users could submit their draft posts and receive immediate sentiment scores and suggested revisions.
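The cleaning stage of the pipeline described above can be illustrated with a small sketch. The rules here (lowercasing, dropping URLs, keeping only letters, digits, and apostrophes) are assumptions for illustration, not the exact rules we used. The output, one list of tokens per post, is the shape that Python Word2Vec implementations such as Gensim's accept as training sentences.

```python
import re
from typing import Iterable, List

# Assumed cleaning rules: strip links, then keep lowercase word-like tokens.
URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[a-z0-9']+")


def clean_post(text: str) -> List[str]:
    """Lowercase a raw post, drop URLs, and split it into word tokens."""
    text = URL_RE.sub(" ", text.lower())
    return TOKEN_RE.findall(text)


def clean_corpus(posts: Iterable[str]) -> List[List[str]]:
    """Clean every post and skip any that end up empty (e.g. link-only posts)."""
    cleaned = (clean_post(post) for post in posts)
    return [tokens for tokens in cleaned if tokens]
```

Punctuation is dropped entirely here. A real pipeline might keep emoji or exclamation marks, since those can carry sentiment.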
Solution
While it still took a lot of work and research, we were fortunate to be able to build on top of existing solutions and reach our goal quickly. With expert guidance from our legal consultants and data analysts, we developed a linear technology stack, with each stage feeding its output into the next, that produced our final prototype.
For the initial prototype, we chose Reddit as the social media site for training our lexicons. Reddit’s “subreddit” feature, which groups users’ posts under predetermined topics, made it an excellent testing ground for training several domain-specific lexicons without modifying our code. To automate ingestion, we used the Pushshift.io API to read thousands of posts and write them into raw text files.
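As a rough sketch of the ingestion step, the function below turns one JSON response into raw text, one post per line. It assumes a Pushshift-style response: a top-level `data` list whose items carry Reddit's `title` and `selftext` fields. The HTTP request itself is omitted, and the function names are hypothetical.

```python
import json
from pathlib import Path
from typing import List, Union

# Placeholder bodies Reddit uses for moderated or deleted posts.
REMOVED_MARKERS = {"[removed]", "[deleted]"}


def posts_from_response(raw_json: str) -> List[str]:
    """Extract one whitespace-normalized text line per submission."""
    payload = json.loads(raw_json)
    texts = []
    for post in payload.get("data", []):
        title = post.get("title", "")
        body = post.get("selftext", "")
        if body in REMOVED_MARKERS:
            body = ""
        text = " ".join(part for part in (title, body) if part)
        # Collapse newlines and repeated spaces so each post fits on one line.
        text = " ".join(text.split())
        if text:
            texts.append(text)
    return texts


def append_to_raw_file(texts: List[str], path: Union[str, Path]) -> None:
    """Append posts to a UTF-8 text file, one post per line."""
    with Path(path).open("a", encoding="utf-8") as handle:
        for text in texts:
            handle.write(text + "\n")
```

Writing one file per subreddit yields one corpus per domain, which matches the subreddit-per-domain setup described above.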
Once we had these raw data files, we needed to convert them into mathematical models that a computer algorithm could read and understand. We chose Word2Vec, a popular technique with widely used implementations in Python (our language of choice). It converts plain text into a vector model in which every word becomes a vector placed according to the contexts it appears in, which lets the computer find related words such as synonyms. By limiting our data sets to domain-specific messages, we improved how the model learns relationships between words within the industry of choice. This matters most for words that may only exist within that domain, such as Scrum (an agile project-management framework) or GOAT (Greatest Of All Time).
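The sketch below shows why a domain-specific corpus matters. It generates the (center, context) training pairs that Word2Vec's skip-gram variant learns from: every word paired with its neighbors inside a fixed window. Real implementations add refinements that this simplified version leaves out, such as subsampling frequent words and randomly shrinking the window. Because pairs only come from text in the corpus, a term like GOAT is positioned purely by the contexts it has in the chosen domain.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Pair each token with every other token at most `window` positions away."""
    if window < 1:
        raise ValueError("window must be at least 1")
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    print(skipgram_pairs(["the", "goat", "scored"], window=1))
    # [('the', 'goat'), ('goat', 'the'), ('goat', 'scored'), ('scored', 'goat')]
```

In Gensim 4.x, the `window` argument of `Word2Vec` controls this same neighborhood size.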
The algorithm that handles the sentiment grading is SocialSent, which was developed by researchers at Stanford University. Its SentProp label-propagation algorithm reads our domain-specific word vectors and starts from a small set of positive and negative seed words. It then grades each word’s overall sentiment based on how that word is used within the larger context. We chose SocialSent because it focuses on inducing domain-specific lexicons rather than larger, general-purpose models.
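The sketch below is a simplified, SentProp-style label propagation, written from the general description of the method rather than from SocialSent's code. It links each word to its k nearest neighbors in the vector space. It then runs a random walk with restarts from a handful of positive seed words, and again from negative seed words. Each word is scored by the share of visits that came from the positive walk. The seed words, `k`, `beta`, and the edge weighting are illustrative assumptions. The published method includes further refinements that this sketch omits.

```python
import math
from typing import Dict, Iterable, Sequence

Vectors = Dict[str, Sequence[float]]
Graph = Dict[str, Dict[str, float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_graph(vectors: Vectors, k: int) -> Graph:
    """Connect each word to its k most similar words (edges made symmetric)."""
    graph: Graph = {word: {} for word in vectors}
    for word, vec in vectors.items():
        neighbors = sorted(
            ((cosine(vec, other_vec), other) for other, other_vec in vectors.items() if other != word),
            reverse=True,
        )[:k]
        for sim, other in neighbors:
            # arccos(-cos) is non-negative and grows as two words get more similar.
            weight = math.acos(max(-1.0, min(1.0, -sim)))
            graph[word][other] = weight
            graph[other][word] = weight
    return graph


def random_walk(graph: Graph, seeds: Iterable[str], beta: float = 0.9, iterations: int = 100) -> Dict[str, float]:
    """Visit probabilities of a random walk that restarts at the seed words."""
    seed_list = list(dict.fromkeys(s for s in seeds if s in graph))
    if not seed_list:
        raise ValueError("no seed word appears in the graph")
    restart = {word: 0.0 for word in graph}
    for seed in seed_list:
        restart[seed] = 1.0 / len(seed_list)
    probs = dict(restart)
    for _ in range(iterations):
        # With probability 1 - beta the walker jumps back to a seed word.
        nxt = {word: (1.0 - beta) * restart[word] for word in graph}
        for word, edges in graph.items():
            total = sum(edges.values())
            if total == 0:
                # A word with no weighted edges sends its mass back to the seeds.
                for seed in seed_list:
                    nxt[seed] += beta * probs[word] * restart[seed]
                continue
            for other, weight in edges.items():
                nxt[other] += beta * probs[word] * weight / total
        probs = nxt
    return probs


def sentprop_scores(vectors: Vectors, positive_seeds: Iterable[str], negative_seeds: Iterable[str],
                    k: int = 3, beta: float = 0.9) -> Dict[str, float]:
    """Polarity in [0, 1]: the positive walk's share of each word's visits."""
    graph = knn_graph(vectors, k)
    pos = random_walk(graph, positive_seeds, beta)
    neg = random_walk(graph, negative_seeds, beta)
    scores = {}
    for word in graph:
        total = pos[word] + neg[word]
        # 1.0 = reached only from positive seeds, 0.0 = only from negative seeds.
        scores[word] = pos[word] / total if total > 0 else 0.5
    return scores


if __name__ == "__main__":
    # Made-up 2-D vectors standing in for a trained model.
    toy_vectors = {
        "great": (1.0, 0.1), "awesome": (0.9, 0.2), "good": (1.0, 0.3),
        "goat": (0.95, 0.15), "terrible": (-1.0, 0.1), "awful": (-0.9, 0.2),
        "bad": (-1.0, 0.3),
    }
    scores = sentprop_scores(toy_vectors, ["great", "good"], ["terrible", "bad"], k=2)
    print(round(scores["goat"], 2), round(scores["awful"], 2))  # 1.0 0.0
```

Neither "goat" nor "awful" is a seed. Both receive a polarity purely from their neighborhoods in the vector space, which is what allows domain slang to be graded.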
Once we had a graded lexicon, we could load it into our analysis tool, which takes a drafted post and cross-references its words against the lexicon we provided. We based our version of the analysis tool heavily on the open-source VADER, a popular general-purpose, lexicon- and rule-based sentiment analysis tool.
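The draft-scoring step can be sketched as a lexicon lookup followed by VADER's compound normalization, x / sqrt(x^2 + 15), which squashes the summed word valences into the range (-1, 1). The negation rule below is a deliberately simplified stand-in for VADER's heuristics, which also handle boosters, capitalization, and punctuation. `to_valence` is a hypothetical mapping from a 0-to-1 polarity score onto VADER's -4 to +4 valence scale.

```python
import math
import re
from typing import Dict, List, Union

# Simplified negation list; VADER's own list is much longer.
NEGATIONS = {"not", "no", "never", "isn't", "don't", "can't"}
TOKEN_RE = re.compile(r"[a-z']+")


def to_valence(polarity: float) -> float:
    """Hypothetical mapping from a 0..1 polarity onto a -4..+4 valence."""
    return (polarity - 0.5) * 8.0


def normalize(total: float, alpha: float = 15.0) -> float:
    """VADER-style compound normalization into the open interval (-1, 1)."""
    return total / math.sqrt(total * total + alpha)


def score_draft(draft: str, lexicon: Dict[str, float]) -> Dict[str, Union[float, List[str]]]:
    """Sum lexicon valences for a draft and list the words that pulled it down."""
    tokens = TOKEN_RE.findall(draft.lower())
    total = 0.0
    low_words: List[str] = []
    for i, token in enumerate(tokens):
        if token not in lexicon:
            continue
        valence = lexicon[token]
        # Simplified rule: flip a word preceded by a negation within 3 tokens.
        if any(prev in NEGATIONS for prev in tokens[max(0, i - 3):i]):
            valence = -valence
        total += valence
        if valence < 0:
            low_words.append(token)
    return {"compound": normalize(total), "low_words": low_words}
```

The `low_words` list is the natural hand-off point for suggesting replacements, because it identifies which words dragged the score down.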
Once the initial sentiment grading worked with this technical stack, we could take advantage of existing capabilities to extend the tool. Although we originally set out only to grade users’ drafts, we also used our word vector model to let the website recommend synonyms, replacing low-sentiment words with more positive alternatives. As a result, the tool not only informs users of potential issues with their drafts but also gives specific recommendations to improve their sentiment!
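The synonym recommendations can be sketched as a nearest-neighbor search. For a low-scoring word, the other words in the vector model are ranked by cosine similarity. Only words whose lexicon score is positive and higher than the original's are kept. The similarity threshold and function names are hypothetical. One design caveat: words that appear in similar contexts are not always synonyms (antonyms such as "hot" and "cold" often sit close together). Suggestions are therefore best presented for the author to review rather than applied automatically.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def suggest_replacements(word: str, vectors: Dict[str, Sequence[float]],
                         lexicon: Dict[str, float], top_n: int = 3,
                         min_similarity: float = 0.5) -> List[str]:
    """Most similar words that score positive and higher than `word`."""
    if word not in vectors:
        return []
    current = lexicon.get(word, 0.0)
    candidates = []
    for other, vec in vectors.items():
        if other == word or other not in lexicon:
            continue
        # Only propose alternatives that are positive and better than the original.
        if lexicon[other] <= max(current, 0.0):
            continue
        sim = cosine(vectors[word], vec)
        if sim >= min_similarity:
            candidates.append((sim, other))
    candidates.sort(reverse=True)
    return [other for _, other in candidates[:top_n]]
```

Raising `min_similarity` keeps suggestions closer in meaning, at the cost of returning fewer of them.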
Result
By the end of my contract, we had deployed a fully functional alpha that was available internally to the authors at Digital Yalo. While it was difficult to track exactly how the tool affected engagement, we received a lot of anecdotal praise from the authors who used it. They said the Sentiment Analysis tool boosted their confidence when drafting posts and noticeably improved how people responded to their content.
Once the initial technology stack was in place, we could reuse it repeatedly to grow our existing lexicons and add new ones. At the time of writing, we could provide targeted sentiment analysis for domains such as gaming, technology, retail, real estate, marketing, and more!
You can learn more about this on Digital Yalo’s website: https://digitalyalo.com/sentiment-analysis-services/