TY - GEN
T1 - Identifying Unconvincing User on Social Media with Limited Features
AU - Li, Yufei
AU - Chen, Tianhao
AU - Pang, Patrick Cheong Iao
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Unconvincing users on social media are responsible for many malicious online activities. Therefore, developing classifiers to identify them is crucial. Due to the trade-off between the crawling costs of features and sample size, it remains challenging to distinguish unconvincing users from normal human users using limited features with low API call cost. To address the challenge, we present a case study where we develop machine learning-based classifiers with limited features to specifically identify a predefined category of unconvincing users: fake Twitter followers. Four models, namely Random Forest (RF), XGBoost, LightGBM, and Support Vector Machine (SVM), are employed and trained using a tabular dataset consisting of only 19 low-cost features. Experimental results show that the LightGBM classifier achieves the highest accuracy at 98.7%. Furthermore, feature importance analysis is carried out on LightGBM, and both SHAP analysis and Information Gain (IG) results indicate that the ratio between friends and followers is the most important feature.
KW - bot detection
KW - fake Twitter followers
KW - machine learning
KW - social media analysis
UR - http://www.scopus.com/inward/record.url?scp=85192981398&partnerID=8YFLogxK
U2 - 10.1109/ICCC59590.2023.10507669
DO - 10.1109/ICCC59590.2023.10507669
M3 - Conference contribution
AN - SCOPUS:85192981398
T3 - 2023 9th International Conference on Computer and Communications, ICCC 2023
SP - 2217
EP - 2221
BT - 2023 9th International Conference on Computer and Communications, ICCC 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 9th International Conference on Computer and Communications, ICCC 2023
Y2 - 8 December 2023 through 11 December 2023
ER -