Authors
Lukas Jonathan Weber (1), Alice Kirchheim (2), Axel Zimmermann (3)
(1) Mercedes-Benz AG, Stuttgart, Germany
(2) Helmut-Schmidt-University, Germany
(3) esz-partner Eber, Baden-Württemberg, Germany
Abstract
The demand for precise text mining applications that extract information from company-based automotive warranty and goodwill (W&G) data is steadily increasing. Progress in the analytical capability of text mining methods for information extraction rests, among other things, on developments and insights from deep learning techniques applied in natural language processing (NLP). Directly applying NLP-based architectures to automotive W&G text mining would lead to a significant performance loss, because word distributions differ between general-domain and W&G-specific corpora. Labelled W&G training datasets are therefore necessary to transform a general-domain language model into a domain-specific one and thereby increase performance on W&G text mining tasks. This article describes a concept for adapting the generally pre-trained language model BERT to the automotive W&G context using the popular two-stage language model training approach. We plan to use the common metrics recall, precision and F1-score for performance evaluation.
Keywords
Natural language processing, Domain-specific language models, BERT, Labelled domain-specific datasets, Automotive warranty and goodwill.