Information Extraction from Chinese Scientific Literature
The Fraunhofer Institute for Algorithms and Scientific Computing (SCAI) today reports the successful completion of a pilot project in Chinese text mining, conducted at Merck Serono, a division of Merck KGaA, Darmstadt, Germany. The pilot project was initiated as a feasibility study to evaluate how well current text mining technology can support automated information extraction from Chinese text sources such as scientific publications and the patent literature.
In the course of this project, ProMiner, the named entity recognition software developed at Fraunhofer SCAI, has been adapted to the specific requirements of text mining in the Chinese biomedical and pharmaceutical scientific literature. Most commercial text mining technology can analyze English text, and some solutions also support the analysis of German or French text. However, given the steep increase in Chinese scientific output and the ever-growing importance and attractiveness of the Chinese market to Western companies, the ability to automatically analyze unstructured Chinese information sources is of utmost importance for scientific and competitive intelligence efforts that aim to follow developments in China closely.
The joint evaluation of the pilot system's performance demonstrates that Chinese literature can be mined for biomedical terms with performance similar to that achieved on English literature. However, "the challenge of Chinese Text Mining cannot be regarded as being solved", as Dr. Juliane Fluck, Head of the Text Mining Team at Fraunhofer SCAI, makes clear: "We have just demonstrated that we are able to mine the Chinese biomedical scientific literature automatically. The real work, which aims at providing all the functionalities needed for true knowledge discovery from unstructured Chinese text sources, starts now, after the proof of principle."
Prof. Martin Hofmann-Apitius, Head of the Department of Bioinformatics at Fraunhofer SCAI, sheds some light on another, rather "academic" aspect of this work: "We were in the favourable situation of having Chinese students doing their Master's degree in Life Science Informatics at the Bonn-Aachen International Center for Information Technology (B-IT)."
The next steps in this collaboration will extend it to another Fraunhofer institute: the Fraunhofer Institute for Systems and Innovation Research (ISI). ISI, based in Karlsruhe, has strong ties to China and specializes in monitoring Chinese research, innovation and markets. Through its collaboration with the Chinese Institute of Policy and Management, an institute of the Chinese Academy of Sciences (CAS), ISI is a premier partner when it comes to understanding science and innovation in China.