Towards new material discovery from CdTe solar cell literature with machine learning
Masters thesis
Liu, X. 2021. Towards new material discovery from CdTe solar cell literature with machine learning. Masters thesis. Middlesex University, Science and Technology.
Type | Masters thesis |
---|---|
Title | Towards new material discovery from CdTe solar cell literature with machine learning |
Authors | Liu, X. |
Abstract | CdTe solar cells are the most successful second-generation solar technology and produce the lowest-cost electricity in the solar industry. The overarching aim of this project is to apply natural language processing (NLP) technologies to accelerate research on CdTe photovoltaic devices by automatically discovering new material applications. Several language models are used to extract the words most similar to a given term, and a knowledge diagram is then built by connecting these related words. The language models (word2vec, GloVe, fastText and BERT) are trained on a dataset of more than 22,500 paper abstracts. Their performance is evaluated on a custom test dataset of 62 word pairs whose members are conceptually related in the field of CdTe solar cells: the more similar a trained model judges the two words of a pair to be, the higher its score. The purpose of this evaluation is to check whether related concepts appear among a word's most similar words. The GloVe model achieves the highest score on the custom test dataset. The knowledge diagram established in this work shows the relationships between materials and concepts of interest. In addition, language models trained on abstracts from consecutive time periods are used to track the timeline of material applications. When the top 500 words most similar to “defect” are tracked over time, “selenium” appears in the GloVe model trained on abstracts published between 2010 and 2020. This corresponds to a journal paper abstract published in 2019 that discussed the passivation effect of selenium on bulk defects in CdTe. The knowledge diagram and the timeline of material applications therefore provide useful insights for future research and will accelerate material discovery in the field of CdTe solar cells. |
Sustainable Development Goals | 7 Affordable and clean energy; 9 Industry, innovation and infrastructure |
Middlesex University Theme | Sustainability |
Department name | Science and Technology |
Institution name | Middlesex University |
Publication dates | 31 Oct 2022 |
Publication process dates | |
Deposited | 31 Oct 2022 |
Accepted | 30 Mar 2021 |
Output status | Published |
Accepted author manuscript | |
Language | English |
https://repository.mdx.ac.uk/item/8q210
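The abstract describes two technical steps: scoring trained models on related word pairs, and tracking when a word (such as “selenium”) enters the nearest neighbours of “defect” in models trained on successive time periods. The sketch below illustrates both ideas in plain Python under stated assumptions. The embeddings are tiny hand-made vectors rather than trained word2vec, GloVe, fastText or BERT models; the scoring rule (fraction of pairs whose second word is among the first word's top-N cosine neighbours) is one plausible reading of the evaluation, not the thesis's exact metric; and the function names, toy words and period labels are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical embedding table: word -> vector. In practice these vectors would
# come from a trained language model; here they are toy values for illustration.
Embeddings = Dict[str, Sequence[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(emb: Embeddings, word: str, topn: int) -> List[str]:
    """Return the `topn` words closest to `word` by cosine similarity."""
    query = emb[word]
    scored = [(other, cosine(query, vec)) for other, vec in emb.items() if other != word]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [other for other, _ in scored[:topn]]


def pair_score(emb: Embeddings, pairs: List[Tuple[str, str]], topn: int) -> float:
    """Assumed metric: fraction of pairs whose second word is a top-`topn` neighbour
    of the first word. Pairs with out-of-vocabulary words count as misses."""
    hits = 0
    for first, second in pairs:
        if first in emb and second in emb and second in most_similar(emb, first, topn):
            hits += 1
    return hits / len(pairs) if pairs else 0.0


def first_appearance(models: Dict[str, Embeddings], query: str, target: str,
                     topn: int) -> Optional[str]:
    """Earliest period (dict insertion order) whose model lists `target` among
    the top-`topn` neighbours of `query`; None if it never appears."""
    for period, emb in models.items():
        if query in emb and target in emb and target in most_similar(emb, query, topn):
            return period
    return None


# Hypothetical models for two consecutive periods (toy 2-D vectors).
TOY_MODELS: Dict[str, Embeddings] = {
    "2000-2010": {"defect": [1.0, 0.0], "chlorine": [0.9, 0.1], "oxygen": [0.7, 0.7],
                  "selenium": [0.0, 1.0], "glass": [-1.0, 0.0]},
    "2010-2020": {"defect": [1.0, 0.0], "chlorine": [0.9, 0.1], "oxygen": [0.7, 0.7],
                  "selenium": [0.95, 0.05], "glass": [-1.0, 0.0]},
}

print(first_appearance(TOY_MODELS, "defect", "selenium", topn=2))  # → 2010-2020
print(pair_score(TOY_MODELS["2010-2020"],
                 [("defect", "selenium"), ("defect", "glass")], topn=2))  # → 0.5
```

With vectors exported from real trained models, the same functions would apply unchanged; tracking the top 500 neighbours, as in the abstract, would simply mean passing `topn=500`.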