Chunli Liu, Nanhong Sheng
Data science is an emerging interdisciplinary subject in the era of big data, integrating knowledge from many fields such as machine learning, statistics, and data visualization. By analyzing the output and basic characteristics of data science papers published from 2015 to 2021, this paper examines the influence of author country, open access status, discipline category, literature type, publication year, and research hotspot on the citation counts and social attention scores of data science papers. The results show that the output of data science papers has grown overall, with the highest number in 2017. The authors are mainly from the United States, England, Germany, and China, and accordingly mainly from North America, Europe, and Asia. Articles, reviews, and editorial material are the main types of papers. Open-access papers are nearly twice as common as non-open-access papers. Statistical analysis further confirms that publication age and literature type have a significant influence on citation counts, while publication age, literature type, author country, open access status, and discipline category have a significant influence on the social attention score. A comparison of the keyword co-occurrence clustering diagrams of highly cited papers and papers with high social attention shows both similarities and differences in their research hotspots. Machine learning, big data visualization, and big data analysis of electronic health records are hotspots common to both groups. The difference is that highly cited data science papers also focus on big data analysis of business competitive advantage and big data analysis of social media, whereas data science papers with high Altmetric attention scores (AAS) also focus on open science big data analysis, bioinformatics big data analysis, and reproducible research.
Data science; Citation; Altmetric attention score; Influencing factor