Comparative assessment of ChatGPT, DeepSeek, and human reviewers for full-text screening in systematic reviews on the impact of air pollution in respiratory diseases
Conference paper
Manullang, A., Gao, X., Viavattene, C. and Rahmanti, A.R. 2025. Comparative assessment of ChatGPT, DeepSeek, and human reviewers for full-text screening in systematic reviews on the impact of air pollution in respiratory diseases. Bramer, M. and Stahl, F. (eds.) 45th SGAI International Conference on Artificial Intelligence. Cambridge, UK, 16-18 Dec 2025. Springer. pp. 478-484. https://doi.org/10.1007/978-3-032-11442-6_39
| Type | Conference paper |
|---|---|
| Title | Comparative assessment of ChatGPT, DeepSeek, and human reviewers for full-text screening in systematic reviews on the impact of air pollution in respiratory diseases |
| Authors | Manullang, A., Gao, X., Viavattene, C. and Rahmanti, A.R. |
| Abstract | Large language models (LLMs) are increasingly used in scientific research, education, and healthcare. Their roles in data searching, screening, extraction, and quality assessment hold promise for improving the systematic review process. However, concerns remain about their accuracy in literature research. This study aimed to compare the accuracy of LLMs and human reviewers in full-text screening for a systematic review. We searched for relevant studies from databases and registers. Full-text screening was performed by human reviewers, ChatGPT, and DeepSeek based on predefined inclusion and exclusion criteria. We observed that ChatGPT had 73.6% agreement with human reviewers (κ = 0.43), while DeepSeek had 70.3% (κ = 0.35). Moreover, ChatGPT showed 73.6% accuracy, high sensitivity (0.923), but low specificity (0.487) compared to the human consensus. Similarly, DeepSeek had 70.3% accuracy, higher sensitivity (0.962), but lower specificity (0.359). Both ChatGPT and DeepSeek show promise for assisting full-text screening in systematic reviews but require further evaluation with well-defined prompt engineering. |
| Sustainable Development Goals | 11 Sustainable cities and communities |
| Middlesex University Theme | Sustainability |
| Research Group | Flood Hazard Research Centre (FHRC) |
| Conference | 45th SGAI International Conference on Artificial Intelligence |
| Page range | 478-484 |
| Proceedings Title | Artificial Intelligence XLII: 45th SGAI International Conference on Artificial Intelligence, AI 2025, Cambridge, UK, December 16-18, 2025, Proceedings, Part II |
| Series | Lecture Notes in Computer Science |
| Editors | Bramer, M. and Stahl, F. |
| ISSN | 0302-9743 |
| ISSN (electronic) | 1611-3349 |
| ISBN (paperback) | 9783032114419 |
| ISBN (electronic) | 9783032114426 |
| Publisher | Springer |
| Copyright Year | 2026 |
| Publication dates | |
| Online | 24 Nov 2025 |
| Publication process dates | |
| Accepted | Aug 2025 |
| Deposited | 02 Dec 2025 |
| Output status | Published |
| Accepted author manuscript | File access level: Open |
| Copyright Statement | This version of the paper has been accepted for publication, after peer review and is subject to Springer Nature’s AM terms of use (https://www.springernature.com/gp/open-research/policies/accepted-ma...), but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: http://dx.doi.org/10.1007/978-3-032-11442-6_39 |
| Digital Object Identifier (DOI) | https://doi.org/10.1007/978-3-032-11442-6_39 |
| Web address (URL) of conference proceedings | https://doi.org/10.1007/978-3-032-11442-6 |
| Language | English |
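
The agreement, accuracy, sensitivity, specificity, and κ figures quoted in the abstract are all functions of a 2×2 table that crosses each LLM's include/exclude decision with the human consensus. The following is a minimal sketch of that arithmetic, not the authors' code. The class `Confusion`, the function `screening_metrics`, and the counts in `example` are assumptions introduced for illustration: this record does not report the underlying counts, and the ones used here were chosen only because they are consistent with the rounded ChatGPT figures in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """2x2 table of LLM decisions against the human consensus (hypothetical helper)."""
    tp: int  # included by both the LLM and the human consensus
    fp: int  # included by the LLM, excluded by the human consensus
    fn: int  # excluded by the LLM, included by the human consensus
    tn: int  # excluded by both


def screening_metrics(c: Confusion) -> dict:
    """Return accuracy, sensitivity, specificity, and Cohen's kappa."""
    n = c.tp + c.fp + c.fn + c.tn

    # Observed agreement; for two raters and two classes this equals accuracy.
    accuracy = (c.tp + c.tn) / n
    sensitivity = c.tp / (c.tp + c.fn)  # share of human-included papers the LLM kept
    specificity = c.tn / (c.tn + c.fp)  # share of human-excluded papers the LLM dropped

    # Agreement expected by chance, from each rater's marginal include/exclude rates.
    p_both_include = ((c.tp + c.fn) / n) * ((c.tp + c.fp) / n)
    p_both_exclude = ((c.tn + c.fp) / n) * ((c.tn + c.fn) / n)
    expected = p_both_include + p_both_exclude

    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Hypothetical counts (NOT taken from the paper), for illustration only.
example = Confusion(tp=48, fp=20, fn=4, tn=19)
metrics = screening_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts the results round to 73.6% accuracy, 0.923 sensitivity, 0.487 specificity, and κ ≈ 0.43. The pattern of high sensitivity with low specificity corresponds to a screener that rarely drops a paper the human reviewers kept but passes through many papers they excluded.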
https://repository.mdx.ac.uk/item/2zqvxw