Yanan Liu, Fang He, Jin Wen, Zhiguang Zhou, Jinchang Li
With the rapid development of Internet technology, government departments collect a rich set of e-government data. For example, a variety of feedback text can be obtained quickly and efficiently through channels such as the mayor's mailbox. Extracting hot topics from large-scale e-government text, establishing the correlation between topics and geographic space, and interactively exploring the sources of problems raised by the public are effective ways to improve the working efficiency of the government. However, exploring large-scale e-government text with traditional visualization methods such as word clouds is difficult, because too many words must be placed in a limited space, which severely interferes with visual perception. In this paper, we propose a visual analytics system for exploring large-scale e-government data by means of a simplified word cloud. First, a representation learning model embeds the text data into a high-dimensional space that quantitatively represents the semantic structure of e-government text. Then, the high-dimensional vectors are projected into a two-dimensional space in which the distribution of points reflects the semantic similarity of the original words; this space also exhibits geographic features that can be quantified with a similarity computing model. To simplify the understanding of large-scale e-government data and improve the cognitive efficiency of the word cloud, we adopt an adaptive blue noise method to sample the topic words, which simplifies the visual expression of the word cloud and makes e-government data easier to understand without losing its semantic structure.
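The abstract names adaptive blue noise sampling but does not give its details, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes each topic word already has a 2D position from the projection step and a weight such as its frequency. The hypothetical function `adaptive_blue_noise_sample` gives each word an exclusion radius equal to `scale` times its distance to its k-th nearest neighbour (a simple density estimate chosen here for illustration), so radii shrink where words are crowded and grow where they are sparse. Words are visited in descending weight order, and a word is dropped if an already kept word lies closer than the smaller of the two radii.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def knn_distance(points: List[Point], i: int, k: int) -> float:
    """Distance from points[i] to its k-th nearest neighbour (brute force)."""
    dists = sorted(math.dist(points[i], p) for j, p in enumerate(points) if j != i)
    if not dists:
        return 0.0
    # If fewer than k neighbours exist, fall back to the farthest one.
    return dists[min(k, len(dists)) - 1]


def adaptive_blue_noise_sample(
    points: List[Point], weights: List[float], k: int = 3, scale: float = 1.5
) -> List[int]:
    """Greedy dart-throwing sampler with a density-adaptive exclusion radius.

    Hypothetical sketch: the radius rule and conflict test are assumptions,
    not the paper's exact formulation. Returns indices of kept words.
    """
    n = len(points)
    # Assumed density-adaptive radius: small in dense regions, large in sparse ones.
    radii = [scale * knn_distance(points, i, k) for i in range(n)]
    # Heavier (more important) words are considered first, so they win conflicts.
    order = sorted(range(n), key=lambda i: -weights[i])
    kept: List[int] = []
    for i in order:
        # Conflict if closer than the smaller radius, so an isolated word is
        # not suppressed by a distant dense cluster.
        if all(math.dist(points[i], points[j]) >= min(radii[i], radii[j]) for j in kept):
            kept.append(i)
    return sorted(kept)


if __name__ == "__main__":
    # Hypothetical projected word positions and frequencies.
    words = ["housing", "rent", "loan", "traffic", "parking"]
    points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (5.0, 5.0), (10.0, 0.0)]
    weights = [5, 3, 1, 4, 2]
    kept = adaptive_blue_noise_sample(points, weights, k=1, scale=1.5)
    print([words[i] for i in kept])  # ['housing', 'loan', 'traffic']
```

The brute-force neighbour search is O(n²) and is meant only to show the idea; a spatial index would be needed for large vocabularies.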
Furthermore, we design and implement an abstraction and visual analysis system for large-scale e-government text data that integrates the above representation learning model, the sampling-based word cloud abstraction model, and a topic and geographic correlation analysis model. The system provides convenient human-computer interaction and enables users to explore and extract the characteristics hidden in large-scale e-government data. It also helps government departments quickly locate hot topics of public concern and their regional distribution, and provides decision support for further improving the work efficiency of the government. Case studies on real-world datasets verify the effectiveness and practicality of our system.
E-government; Text mining; Text visualization; Visual analytics