Towards new material discovery from CdTe solar cell literature with machine learning

Masters thesis


Liu, X. 2021. Towards new material discovery from CdTe solar cell literature with machine learning. Masters thesis. Middlesex University, Science and Technology.
Type: Masters thesis
Title: Towards new material discovery from CdTe solar cell literature with machine learning
Authors: Liu, X.
Abstract

CdTe solar cells are the most successful second-generation solar technology and produce the lowest-cost electricity in the solar industry. The overarching aim of this project is to apply natural language processing (NLP) technologies to accelerate research in the field of CdTe photovoltaic devices by automatically discovering new material applications. The NLP technologies use various language models to extract the words most similar to a given term, and a knowledge diagram is then established by connecting these similar words. The language models are word2vec, GloVe, fastText and BERT, each trained on a dataset of more than 22,500 paper abstracts. Their performance is evaluated using a custom test dataset of 62 word pairs that are conceptually related in the field of CdTe solar cells. The more similar the first word of a pair is to the second word, the higher the trained language model scores. The evaluation is intended to show how well related concepts appear among a word's most similar words. The GloVe model achieves the highest score on the custom test dataset. The knowledge diagram established in this work shows the relationships between materials and concepts of interest. In addition, language models trained on consecutive periods are used to track the timeline of material applications. The top 500 words most similar to “defect” are tracked over time, and “selenium” is observed to appear in the GloVe model trained on paper abstracts published between 2010 and 2020. This corresponds to a journal paper abstract published in 2019, which discussed the passivation effect of selenium on the bulk defects of CdTe. The knowledge diagram and the timeline of material applications therefore provide useful insights for future research and are expected to accelerate material discoveries in the field of CdTe solar cells.
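
The word-pair evaluation described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact scoring rule, so the sketch assumes one plausible version: a pair counts as a hit when its second word appears among the top-n words most similar to its first word, ranked by cosine similarity. The function names, the toy vectors and the toy pairs are hypothetical and are not taken from the thesis.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(vectors, word, topn):
    """Return the topn words closest to `word`, most similar first."""
    scored = [
        (other, cosine(vectors[word], vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [other for other, _ in scored[:topn]]


def pair_score(vectors, pairs, topn):
    """Fraction of pairs whose second word is in the first word's topn.

    Assumed scoring rule for illustration only.
    """
    hits = sum(
        1 for first, second in pairs
        if second in most_similar(vectors, first, topn)
    )
    return hits / len(pairs)


# Toy 3-dimensional embeddings, invented purely for illustration.
toy_vectors = {
    "defect": [0.9, 0.1, 0.0],
    "passivation": [0.8, 0.3, 0.1],
    "selenium": [0.7, 0.2, 0.3],
    "glass": [0.0, 0.1, 0.9],
    "efficiency": [0.1, 0.9, 0.2],
}
toy_pairs = [("defect", "passivation"), ("defect", "glass")]

score = pair_score(toy_vectors, toy_pairs, topn=2)
print(score)
```

With real embeddings, the toy dictionary would be replaced by the vectors of a trained model, and the same ranking step would yield the list of most similar words used to build the knowledge diagram.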

Sustainable Development Goals: 7 Affordable and clean energy
9 Industry, innovation and infrastructure
Middlesex University Theme: Sustainability
Department name: Science and Technology
Institution name: Middlesex University
Publication dates
Print: 31 Oct 2022
Publication process dates
Deposited: 31 Oct 2022
Accepted: 30 Mar 2021
Output status: Published
Accepted author manuscript
Language: English
Permalink: https://repository.mdx.ac.uk/item/8q210
