Scraping and Clustering Techniques for the Characterization of LinkedIn Profiles

Authors

Kais Dai, Celia Gonzalez Nespereira, Ana Fernandez Vilas and Rebeca P. Diaz Redondo, University of Vigo, Spain

Abstract

The socialization of the web has taken on a new dimension since the emergence of Online Social Networks (OSNs). Because every Internet user is now a potential content creator, a large amount of data must be managed. This paper explores the most popular professional OSN, LinkedIn. A scraping technique was implemented to collect around 5 million public profiles. Natural language processing (NLP) techniques were then applied to classify the educational background and to cluster the professional background of the collected profiles. The results provide insights into this OSN’s users and allow us to evaluate the relationships between educational degrees and professional careers.

Keywords

Scraping, Online Social Networks, Social Data Mining, LinkedIn, Data Set, Natural Language Processing, Classification, Clustering, Education, Professional Career.

Full Text  Volume 5, Number 1