TY - JOUR
T1 - Performance Evaluation and Application Potential of Small Large Language Models in Complex Sentiment Analysis Tasks
AU - Yang, Yunchu
AU - Li, Jiaxuan
AU - Guo, Jielong
AU - Pang, Patrick Cheong-Iao
AU - Wang, Yapeng
AU - Yang, Xu
AU - Im, Sio Kei
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Sentiment analysis using Large Language Models (LLMs) has gained significant attention in recent research due to its outstanding performance and ability to understand complex texts. However, popular LLMs, such as ChatGPT, are typically closed-source and carry substantial API costs, posing challenges for resource-limited scenarios and raising privacy concerns. To address this, our study evaluates the feasibility of using small LLMs (sLLMs) as alternatives to GPT for aspect-based sentiment analysis of Chinese healthcare reviews. We compared several Chinese sLLMs of varying sizes with GPT-3.5, using GPT-4o's results as the benchmark, and assessed their classification accuracy by computing an F1 score for each individual aspect as well as an overall F1 score. Additionally, we examined the sLLMs' instruction-following capabilities, VRAM requirements, and generation times, and the impact of temperature settings on the top-performing models. The results demonstrate that several sLLMs can effectively follow instructions and even surpass GPT-3.5 in accuracy. For instance, InternLM2.5 achieved an F1 score of 0.85 with zero-shot prompting, while the smaller Qwen2.5-3B model performed well despite its compact size. Prompt strategies significantly influenced smaller and older models such as Qwen2.5-1.5B and ChatGLM3.5 but had limited impact on newer models. Temperature settings had minimal effect, while older models generated responses faster and newer models offered higher accuracy. This study underscores the potential of sLLMs as resource-efficient, privacy-preserving alternatives to closed-source LLMs in specialized domains. Our approach is versatile, with potential applications in domains such as finance and education and in tasks such as sentiment analysis, credit risk assessment, and learning behavior analysis, offering valuable insights for real-world use cases.
AB - Sentiment analysis using Large Language Models (LLMs) has gained significant attention in recent research due to its outstanding performance and ability to understand complex texts. However, popular LLMs, such as ChatGPT, are typically closed-source and carry substantial API costs, posing challenges for resource-limited scenarios and raising privacy concerns. To address this, our study evaluates the feasibility of using small LLMs (sLLMs) as alternatives to GPT for aspect-based sentiment analysis of Chinese healthcare reviews. We compared several Chinese sLLMs of varying sizes with GPT-3.5, using GPT-4o's results as the benchmark, and assessed their classification accuracy by computing an F1 score for each individual aspect as well as an overall F1 score. Additionally, we examined the sLLMs' instruction-following capabilities, VRAM requirements, and generation times, and the impact of temperature settings on the top-performing models. The results demonstrate that several sLLMs can effectively follow instructions and even surpass GPT-3.5 in accuracy. For instance, InternLM2.5 achieved an F1 score of 0.85 with zero-shot prompting, while the smaller Qwen2.5-3B model performed well despite its compact size. Prompt strategies significantly influenced smaller and older models such as Qwen2.5-1.5B and ChatGLM3.5 but had limited impact on newer models. Temperature settings had minimal effect, while older models generated responses faster and newer models offered higher accuracy. This study underscores the potential of sLLMs as resource-efficient, privacy-preserving alternatives to closed-source LLMs in specialized domains. Our approach is versatile, with potential applications in domains such as finance and education and in tasks such as sentiment analysis, credit risk assessment, and learning behavior analysis, offering valuable insights for real-world use cases.
KW - aspect-based sentiment analysis
KW - data privacy
KW - natural language processing
KW - resource-constrained environments
KW - sentiment analysis
KW - Small large language model
UR - http://www.scopus.com/inward/record.url?scp=105001654450&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2025.3549733
DO - 10.1109/ACCESS.2025.3549733
M3 - Article
AN - SCOPUS:105001654450
SN - 2169-3536
VL - 13
SP - 49007
EP - 49017
JO - IEEE Access
JF - IEEE Access
ER -
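
The abstract describes computing an F1 score for each individual aspect plus an overall F1 score. A minimal sketch of that scoring step follows, assuming scikit-learn; the aspect names, example reviews, and labels are illustrative placeholders, not taken from the paper.

    # Sketch: per-aspect and overall F1 for aspect-based sentiment labels.
    # Aspect set and data below are hypothetical, for illustration only.
    from sklearn.metrics import f1_score

    aspects = ["service", "environment", "effectiveness"]  # assumed aspects

    # gold[i][aspect] / pred[i][aspect]: one sentiment label per review and aspect
    gold = [
        {"service": "positive", "environment": "neutral", "effectiveness": "positive"},
        {"service": "negative", "environment": "positive", "effectiveness": "neutral"},
    ]
    pred = [
        {"service": "positive", "environment": "neutral", "effectiveness": "neutral"},
        {"service": "negative", "environment": "positive", "effectiveness": "neutral"},
    ]

    # Per-aspect F1: compare labels for one aspect across all reviews.
    for aspect in aspects:
        y_true = [g[aspect] for g in gold]
        y_pred = [p[aspect] for p in pred]
        print(aspect, f1_score(y_true, y_pred, average="macro"))

    # Overall F1: pool every (review, aspect) pair into one label sequence.
    y_true_all = [g[a] for g in gold for a in aspects]
    y_pred_all = [p[a] for p in pred for a in aspects]
    print("overall", f1_score(y_true_all, y_pred_all, average="macro"))

Pooling every (review, aspect) pair is one common reading of "overall F1"; the paper may aggregate differently (e.g., averaging the per-aspect scores).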