Chaolemen Borjigin, Chen Zhang
Data science is a rapidly growing academic field with significant implications for all conventional scientific research. However, most relevant studies have been limited to one or a few facets of data science, viewed from the perspective of a specific application domain, and have paid less attention to its theoretical framework. Data science is unique in that its research goals, perspectives, and body of knowledge are distinct from those of other sciences. The core theories of data science are the DIKW pyramid, data-intensive scientific discovery, the data science life cycle, data wrangling or munging, big data analytics, data management and governance, data product DevOps, and big data visualization. Recent theoretical studies on data science are characterized by six main trends: (1) the growing significance of DataOps, (2) the rise of citizen data scientists, (3) enabling augmented data science, (4) integrating the data warehouse with the data lake, (5) the diversity of domain-specific data science, and (6) implementing data stories as data products. Further development of data science should prioritize four ways of turning challenges into opportunities: (1) accelerating theoretical studies of data science, (2) balancing the trade-off between explainability and performance, (3) achieving data ethics, privacy, and trust, and (4) aligning academic curricula with industrial needs.
CCS Concepts: General and reference → Surveys and overviews.
Keywords: Data science; Big data; Data products; Data-driven management; The DIKW pyramid