Named Entity Recognition and Normalization Applied to Large-Scale Information Extraction from the Materials Science Literature

Publication Type

Journal Article

Date Published

07/2019

Authors

Weston, L, V Tshitoyan, J Dagdelen, O Kononova, A Trewartha, K A Persson, G Ceder, A Jain

DOI

10.1021/acs.jcim.9b00470

Abstract

The number of published materials science articles has increased manyfold over the past few decades. Now, a major bottleneck in the materials discovery pipeline arises in connecting new results with the previously established literature. A potential solution to this problem is to map the unstructured raw text of published articles onto structured database entries that allow for programmatic querying. To this end, we apply text mining with named entity recognition (NER) for large-scale information extraction from the published materials science literature. The NER model is trained to extract summary-level information from materials science documents, including inorganic material mentions, sample descriptors, phase labels, material properties and applications, as well as any synthesis and characterization methods used. Our classifier achieves an accuracy (F1 score) of 87%, and is applied to information extraction from 3.27 million materials science abstracts. We extract more than 80 million materials-science-related named entities, and the content of each abstract is represented as a database entry in a structured format. We demonstrate that simple database queries can be used to answer complex “meta-questions” about the published literature that would previously have required laborious, manual literature searches to answer. All of our data and functionality have been made freely available on our GitHub (https://github.com/materialsintelligence/matscholar) and website (http://matscholar.com), and we expect these results to accelerate the pace of future materials science discovery.
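To make the idea of querying structured entries concrete, the following is a minimal sketch in Python. It is not the Matscholar API or the authors' database schema: the `AbstractEntry` class, its field names, the `materials_for_application` helper, and the four toy records are all hypothetical and invented here for illustration. It assumes each abstract has already been reduced by an NER step to sets of entities grouped by type, and shows how a simple aggregate query might answer a question such as "which materials are most often mentioned alongside a given application?"

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class AbstractEntry:
    """One abstract stored as a structured record (hypothetical schema)."""
    doi: str
    materials: set[str] = field(default_factory=set)
    properties: set[str] = field(default_factory=set)
    applications: set[str] = field(default_factory=set)
    synthesis_methods: set[str] = field(default_factory=set)


# Toy records invented for illustration; not taken from the real dataset.
ENTRIES = [
    AbstractEntry("doi-1", {"LiCoO2"}, {"capacity"}, {"battery"}, {"sol-gel"}),
    AbstractEntry("doi-2", {"LiFePO4"}, {"capacity"}, {"battery"}, {"hydrothermal"}),
    AbstractEntry("doi-3", {"Bi2Te3"}, {"seebeck coefficient"}, {"thermoelectric"}, {"ball milling"}),
    AbstractEntry("doi-4", {"LiFePO4"}, {"conductivity"}, {"battery"}, {"sol-gel"}),
]


def materials_for_application(entries, application):
    """Count how many abstracts mention each material together with `application`."""
    counts = Counter()
    for entry in entries:
        if application in entry.applications:
            counts.update(entry.materials)
    # Most frequently co-mentioned materials first.
    return counts.most_common()


top_battery_materials = materials_for_application(ENTRIES, "battery")
print(top_battery_materials)  # [('LiFePO4', 2), ('LiCoO2', 1)]
```

On a real corpus the same kind of count would presumably be run by a database engine rather than a Python loop, but the shape of the query (filter on one entity type, aggregate over another) would be similar.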

Journal

Journal of Chemical Information and Modeling

Year of Publication

2019

ISSN

1549-9596

Organization

Applied Energy Materials Group, Energy Technologies & Systems Division

Research Areas

No Research Area