Content Moderation for User Generated Content (UGC) | HCL Technologies

Text Moderation – A Use Case on Detecting Offensive Text Using NLP

November 03, 2015


We have observed a phenomenal increase in user-generated content (UGC) over the last decade. By one estimate, Facebook users share nearly 2.5 million pieces of content every minute [1]. This growth has been driven by more people gaining access to the internet and mobile phones. A recent report showed that the share of the global population with mobile phones rose from 1% to 73% in the decade ending 2014 [2]. The number of internet users grew to about 3.3 billion globally, an increase of more than 800% between 2000 and 2015 [3]. However, internet penetration (internet users / total population) is still modest at around 45%. This suggests that the number of internet users, and concomitantly the volume of user-generated content, could grow even further in the coming decade.

However, this raises an obvious question about the kind of content being generated. One of ScanSafe's monthly Global Threat Reports found that up to 80% of blogs contain offensive content in the form of adult language or pornographic images and videos [4]. The next question is whether offensive content on user forums, blogs, and social networking sites needs to be moderated at all. Some forums are meant for adult content, so profanity may be permissible there; it may be fine to leave such words as posted or to redact them (f***, s***). However, exposing adolescents to offensive content can have repercussions. Recent statistics show that almost 70% of American teens (aged 12-17) use social media sites daily, and that they are at an increased risk of smoking, drinking, and drug use [5]. Regulations such as the Children’s Internet Protection Act (CIPA) in the United States respond to these concerns. In addition, the quality of user-generated content is of utmost importance for brand awareness and for the success of digital marketing promotions and campaigns [6]. A report by Ipsos MediaCT, Crowdtap, and the Social Media Advertising Consortium showed that users aged 18-36 spend about 6 hours a day creating and viewing UGC, and that UGC is 20% more influential on their purchase decisions than other forms of media [7].

To address these concerns, many user forums, blogs, and social networking sites apply mechanisms to screen and filter offensive content. However, most of this filtering relies on a lexicon-based approach [8]. That is, if a user’s comment contains profanity or other offensive content, the system looks up each typed word (or its synonyms) in a pre-built repository and removes or redacts (***) the matching words. The system’s ability to deal with offensive content therefore depends on how comprehensive that repository is. In some cases, lexicons are predefined (as in YouTube’s safety mode); in others, they are composed by the users themselves (as on Facebook) [8].
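
The lexicon lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not the filter used by any particular site: the word list is a hypothetical three-entry `LEXICON`, the lookup is exact and case-insensitive, and the synonym expansion mentioned above is left out.

```python
import re

# Hypothetical, tiny lexicon for illustration only; real systems use far larger lists.
LEXICON = {"damn", "crap", "bullshit"}

def redact(text: str, lexicon: set[str] = LEXICON) -> str:
    """Replace every whole word found in the lexicon with its first letter
    followed by asterisks (e.g. 'crap' -> 'c***'); other words are unchanged."""
    def mask(match: re.Match) -> str:
        word = match.group(0)
        if word.lower() in lexicon:          # exact, case-insensitive lookup
            return word[0] + "*" * (len(word) - 1)
        return word
    # \b\w+\b matches each run of word characters, i.e. one whole word
    return re.sub(r"\b\w+\b", mask, text)

print(redact("That was damn good, no crap."))
# That was d*** good, no c***.
```

As the text notes, such a filter is only as good as its word list: anything not spelled exactly as a lexicon entry passes through untouched.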

However, the lexicon-based approach alone is not enough, as the recent Yik Yak incident in the US shows. Yik Yak is a social media smartphone application that lets users microblog anonymously to other nearby users by combining GPS and instant messaging. A member of a feminist group at a university in Virginia was murdered after she had been harassed on Yik Yak. Comments posted on Yik Yak in that context included “Gonna tie these feminists to the radiator and grape them in the mouth” and “Can we euthanize whoever caused this bulls--t?” [9]. As these comments show, users evaded the lexicon-based filter by writing ‘grape’ instead of ‘rape’ and ‘bulls--t’ instead of ‘bullshit’. A lexicon-based approach fails here because ‘grape’ and ‘bulls--t’ are identified as non-offensive, although their intent is clearly offensive. One might ask whether such content could instead be flagged by manually reviewing every post on user blogs and forums. Given the sheer volume of content generated online, this is simply impossible [1]. Hence, an automated, non-lexicon-based approach is needed to identify offensive text for moderation.
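
To make this failure mode concrete, the sketch below compares exact whole-word matching with naive substring matching, using a hypothetical two-word `LEXICON` and illustrative sample strings. Exact matching misses both the obfuscated `bulls--t` and `grape`; substring matching catches `grape` but also flags a harmless sentence about grapes, and still misses `bulls--t`.

```python
import re

# Hypothetical two-word lexicon, for illustration only.
LEXICON = {"rape", "bullshit"}

def exact_match(text: str) -> set[str]:
    """Flag lexicon words that appear as whole words (case-insensitive)."""
    words = re.findall(r"[a-z0-9'-]+", text.lower())
    return {w for w in words if w in LEXICON}

def substring_match(text: str) -> set[str]:
    """Flag lexicon words that appear anywhere inside the text."""
    lowered = text.lower()
    return {w for w in LEXICON if w in lowered}

samples = [
    "what a load of bulls--t",     # obfuscated: missed by both matchers
    "grape them",                  # deliberate misspelling: missed by exact matching
    "I had grapes for breakfast",  # harmless, yet flagged by substring matching
]
for s in samples:
    print(f"{s!r}: exact={exact_match(s)} substring={substring_match(s)}")
```

Widening the lexicon or loosening the match trades missed offensive text for false positives, which is why the approach below moves away from word lists alone.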

Against this background, we approach online moderation using natural language processing (NLP) and machine learning (ML). We implemented end-to-end text moderation for a content management engagement with a strategic partner in the US. The solution performs feature engineering (more on that below) and generates, in near real time, a numeric probability score indicating how likely a text is to be offensive. In this engagement, offensiveness was defined primarily as misogynistic behavior. However, we expect the approach can easily be extended to cover racism or cyberbullying without much loss in prediction accuracy.

As in any ML exercise, a data scientist spends more than two-thirds of their time on feature engineering. In NLP, feature engineering means generating numeric features from each text, thereby converting unstructured information (text) into a structured data matrix of rows and columns. For offensive text detection, we generate character and token features, among others. Character features count the alphabetic, numeric, and non-alphanumeric characters in the text; token features count, for example, how many tokens (words) have fewer than 5 or more than 5 characters and how many contain ASCII or non-ASCII characters. We also generate sentiment, readability, and lexical diversity features. Sentiment features score how positive or negative the text is, while readability and lexical diversity features compute indices such as the Bormuth and Spache readability scores and various type/token ratios. Part-of-speech tagging and entity recognition features count nouns, adjectives, adverbs, verbs, dates, times, and so on. Finally, we generate n-gram features to better capture the context of the text.
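
A few of the character, token, lexical diversity, and n-gram features above can be sketched with the Python standard library alone. The feature names, the whitespace tokenizer, and the use of word bigrams are assumptions for illustration; sentiment, readability (Bormuth, Spache), part-of-speech, and entity features need external resources and are omitted. The hypothetical `to_matrix` helper shows how per-text feature dictionaries become the rows-and-columns matrix described above.

```python
from collections import Counter

def extract_features(text: str) -> dict[str, float]:
    """Flat feature dict for one text: character counts, token counts,
    a type/token ratio and word-bigram counts (a small illustrative subset)."""
    tokens = text.split()                      # naive whitespace tokenizer (assumption)
    lowered = [t.lower() for t in tokens]
    feats: dict[str, float] = {
        # Character features
        "n_alpha": sum(c.isalpha() for c in text),
        "n_digit": sum(c.isdigit() for c in text),
        # Non-alphanumeric characters, excluding whitespace
        "n_other": sum(not c.isalnum() and not c.isspace() for c in text),
        # Token features
        "n_tokens": len(tokens),
        "n_short_tokens": sum(len(t) < 5 for t in tokens),
        "n_long_tokens": sum(len(t) > 5 for t in tokens),
        "n_non_ascii_tokens": sum(not t.isascii() for t in tokens),
        # Lexical diversity: distinct tokens / all tokens
        "type_token_ratio": len(set(lowered)) / len(lowered) if lowered else 0.0,
    }
    # Word-bigram counts, keyed like "bigram=so stupid"
    for (w1, w2), count in Counter(zip(lowered, lowered[1:])).items():
        feats[f"bigram={w1} {w2}"] = count
    return feats

def to_matrix(texts: list[str]) -> tuple[list[str], list[list[float]]]:
    """Turn per-text feature dicts into (column names, rows); absent features become 0."""
    rows = [extract_features(t) for t in texts]
    columns = sorted({name for row in rows for name in row})
    return columns, [[row.get(name, 0) for name in columns] for row in rows]

feats = extract_features("You are so so stupid !!")
print(feats["n_alpha"], feats["n_other"], feats["n_short_tokens"], feats["n_long_tokens"])
# 16 2 5 1
```

Because n-gram columns are created per vocabulary item, the column count grows quickly with the corpus, which is one way a pipeline reaches the large feature counts mentioned later.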

The way offensive text is usually written poses an additional challenge: it often contains masked words built from non-alphanumeric characters, such as f*** and s***. Such tokens are hard for an ML model to interpret as offensive. To circumvent this problem, we designed a heuristic that transforms the original text: each masked word is replaced by its best approximate match in an English lexicon, so that the transformed text contains only alphanumeric words (for instance, f*** becomes ‘fuck’). We then generate all the features described above (character, token, sentiment, readability, lexical diversity, and so on) on this transformed text as well.
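
A minimal sketch of such a de-masking step follows. It assumes that each masking character (`*`, `-`, `#`, and a few others) stands for exactly one hidden letter, and that the "best match" is the first word in a hypothetical priority-ordered `LEXICON` whose visible letters line up; both the matching and the ranking rules are assumptions, one possible reading of the heuristic rather than the method actually used.

```python
import re

# Hypothetical priority-ordered lexicon: when several words fit a mask, the
# earliest entry wins. A real system would use a full English word list plus
# some ranking (e.g. word frequency); this is only an illustration.
LEXICON = ["damn", "crap", "bullshit", "hell", "hall", "hill"]

MASK_CHARS = "*-#@$%_"   # characters assumed to stand in for hidden letters

def restore_token(token: str, lexicon: list[str] = LEXICON) -> str:
    """Replace a masked token such as 'd*mn' or 'bulls--t' with the first
    lexicon word of the same length whose visible letters line up."""
    if not any(c in MASK_CHARS for c in token) or not any(c.isalpha() for c in token):
        return token  # nothing masked, or no visible letter to anchor on
    # Each mask character matches exactly one letter; visible characters match literally.
    pattern = re.compile(
        "".join("[a-z]" if c in MASK_CHARS else re.escape(c) for c in token.lower())
    )
    for word in lexicon:
        if pattern.fullmatch(word):
            return word
    return token  # no candidate found: keep the original token

def restore_text(text: str) -> str:
    """Apply restore_token to each whitespace token, keeping trailing punctuation."""
    out = []
    for tok in text.split():
        core = tok.rstrip(".,!?;:")
        out.append(restore_token(core) + tok[len(core):])
    return " ".join(out)

print(restore_text("What a load of bulls--t, d*mn it"))
# What a load of bullshit, damn it
```

Note that `h*ll` would resolve to `hell` only because of the lexicon order; ranking ambiguous candidates is the hard part of any such heuristic.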

Overall, the solution generates about 1,600 features for each text. These capture what is written and, most importantly, how it is written. The features serve as input to ML classifiers, which produce a numeric probability score in the range 0-1 indicating how offensive the text is. As a result, the classifier may score variants of the same sentence differently: for instance, “You are stupid !” and “You are STUPID” are scored as stronger (that is, receive a higher probability) than the plain “You are stupid”.
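
The step from features to a probability score can be illustrated with a logistic (sigmoid) model. The post does not name the classifiers used, so logistic regression is a stand-in here, and the three features, the `INSULTS` list, and the weights are hypothetical and hand-picked rather than learned. With these weights, capitalization and exclamation marks raise the score, mirroring the “You are STUPID” example.

```python
import math

# Hypothetical insult list and hand-picked weights, for illustration only.
# In practice the weights would be learned from labeled data, and the model
# would see on the order of 1,600 features rather than three.
INSULTS = {"stupid", "idiot"}
WEIGHTS = {"has_insult": 2.5, "upper_ratio": 2.0, "n_exclaim": 0.8}
BIAS = -2.0

def features(text: str) -> dict[str, float]:
    letters = [c for c in text if c.isalpha()]
    words = {w.strip("!?.,").lower() for w in text.split()}
    return {
        "has_insult": float(bool(words & INSULTS)),
        # Share of letters that are uppercase ("shouting")
        "upper_ratio": sum(c.isupper() for c in letters) / len(letters) if letters else 0.0,
        "n_exclaim": float(text.count("!")),
    }

def offensiveness(text: str) -> float:
    """Logistic-regression-style score in (0, 1): sigmoid(bias + w . x)."""
    x = features(text)
    z = BIAS + sum(WEIGHTS[name] * value for name, value in x.items())
    return 1.0 / (1.0 + math.exp(-z))

for t in ["You are stupid", "You are stupid !", "You are STUPID"]:
    print(f"{t!r}: {offensiveness(t):.2f}")
```

In a real pipeline the weights would be fitted on labeled examples over the full feature set, for instance with scikit-learn's `LogisticRegression`, whose `predict_proba` method returns per-class probabilities.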

To conclude, an ML- and NLP-based approach to identifying offensive text is more robust than a lexicon-based one, and it is automated. As a next step, it can be further improved by incorporating the user-level information usually captured on social networking sites. The same text moderation methodology can also be deployed in other content management engagements involving customer feedback, online product reviews, and similar text.