TY - JOUR
T1 - BiSAL - A bilingual sentiment analysis lexicon to analyze Dark Web forums for cyber security.
AU - Al-Rowaily, Khalid
AU - Abulaish, Muhammad
AU - Haldar, Nur Al Hasan
AU - AlRubaian, Majed A.
N1 - DBLP License: DBLP's bibliographic metadata records provided through http://dblp.org/ are distributed under a Creative Commons CC0 1.0 Universal Public Domain Dedication. Although the bibliographic metadata records are provided consistent with CC0 1.0 Dedication, the content described by the metadata records is not. Content may be subject to copyright, rights of privacy, rights of publicity and other restrictions.
PY - 2015
Y1 - 2015
AB - In this paper, we present the development of a Bilingual Sentiment Analysis Lexicon (BiSAL) for the cyber security domain, which consists of a Sentiment Lexicon for ENglish (SentiLEN) and a Sentiment Lexicon for ARabic (SentiLAR) that can be used to develop opinion mining and sentiment analysis systems for bilingual textual data from Dark Web forums. For SentiLEN, a list of 279 sentiment-bearing English words related to cyber threats, radicalism, and conflicts is identified, and a unifying process is devised to combine their sentiment scores obtained from four different sentiment data sets. For SentiLAR, sentiment-bearing Arabic words are identified from a collection of 2000 message posts from the Alokab Web forum, which contains radical content. SentiLAR provides a list of 1019 sentiment-bearing Arabic words related to cyber threats, radicalism, and conflicts, along with their morphological variants and sentiment polarity. For polarity determination, three Arabic language experts perform a semi-automated analysis, and their ratings are combined using aggregate functions. A Web interface is developed to access both lexicons of BiSAL (SentiLEN and SentiLAR) online, and a beta version is available at http://www.abulaish.com/bisal.
U2 - 10.1016/j.diin.2015.07.006
DO - 10.1016/j.diin.2015.07.006
M3 - Article
SN - 1742-2876
VL - 14
SP - 53
EP - 62
JO - Digital Investigation
JF - Digital Investigation
ER -