Barbara Hammer, Bielefeld University
Laurens van der Maaten, Delft University of Technology
Daniel Keim, University of Konstanz
The rapidly increasing availability of electronic data in virtually all aspects of daily life confronts people with huge amounts of potentially valuable information that is often not easily accessible because of its sheer size and complexity. In this context, data visualization plays a key role in establishing an intuitive interface to the data, so that humans can rely directly on their astonishing cognitive capabilities for visual perception. Data visualization is a mature field of research with widespread applications in diverse areas such as medical data analysis, text mining, bioinformatics, geoinformatics, and scientific visualization, and dedicated topics of data visualization have been discussed in numerous contributions to international conferences and journals. This is accompanied by open source and commercial software packages which allow users to visualize data directly.
In the context of machine learning and data mining, numerous algorithms have recently been developed. Popular dimensionality reduction techniques, which replace a given set of (mostly high-dimensional vectorial) data points by low-dimensional vectors and display these on the screen, include, for example, t-SNE, Isomap, Isotop, SOM, XOM, MDS, and LLE. Recent efficient clustering techniques such as affinity propagation, spectral clustering, or relational neural gas make it possible to deal with general dissimilarity data. Further developments which are important for the proposed topic include topic modeling such as latent Dirichlet allocation, developments in collaborative filtering such as weighted SVD, and methods which allow an efficient inspection of receptive fields or similar structures, such as LVQ and ICA. Classical information visualization, as considered e.g. in connection with computer graphics, typically goes far beyond data projection and deals with dedicated techniques to display information in appropriate images or video streams.
Interestingly, while many dimensionality reduction techniques have been developed in the field of data mining and machine learning, more advanced information visualization tools are largely unknown to the CIS community. Conversely, many recent dimensionality reduction and data mining techniques are not yet sufficiently known in classical information visualization. One goal of the proposed task force is to bridge this gap between the communities.
Further, the rapid technological development in sensor technology, dedicated data formats, and automated data storage poses new challenges for data visualization. First, methods have to deal with very large data sets, such that only tools which rely on bounded memory and at most linear time remain affordable. Second, rather than simple Euclidean vectors, data exhibit very high dimensionality or even more complex structures, such that methods have to cope with specific (possibly non-metric) dissimilarity measures or dedicated data formats. Third, a direct projection of data to low dimensions is often no longer sufficient for adequate data visualization; rather, dedicated data mining tools such as prior clustering, feature selection, or information extraction have to be integrated. Fourth, users of data visualization tools are usually not experts in the field of data visualization (or even computer science), such that there is a need for parameterless methods and an automatic selection of the best method for the given problem. Currently, users often still rely on generally unsatisfactory methods such as Euclidean MDS simply because they are readily available. Fifth, data visualization per se constitutes an ill-posed problem, and it is easily possible to visualize aspects of the data which are not relevant for the specific application or which are even due to pure noise within the data; hence it is necessary to choose an appropriate bias of what to visualize. Finally, since the objective of data visualization is neither clear a priori nor universal across tasks, the question of how methods can be evaluated is an important unsolved problem. These challenges will be addressed in the proposed task force.
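To make the mention of Euclidean MDS and of dissimilarity data concrete, the following is a minimal sketch of classical (Torgerson) MDS that starts from a pairwise dissimilarity matrix rather than from vectors. It is an illustration only, not a method proposed by the task force: the function name `classical_mds`, its parameters, and the use of power iteration for the eigenvectors are assumptions made for this example. The sketch also assumes a symmetric matrix whose dominant eigenvalues after double centering are positive, which holds for Euclidean distances but not necessarily for non-metric dissimilarities.

```python
import math
import random


def classical_mds(dissim, dim=2, iters=500, seed=0):
    """Embed n objects in `dim` dimensions from an n x n dissimilarity matrix.

    Hypothetical helper for illustration; assumes `dissim` is symmetric.
    """
    n = len(dissim)
    # Squared dissimilarities and their row and grand means.
    sq = [[dissim[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J with J = I - (1/n) * ones.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * dim for _ in range(n)]
    for k in range(dim):
        # Power iteration for the eigenvector of largest magnitude.
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue estimate for unit vector v.
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))
        if lam <= 1e-12:
            break  # no positive variance left; remaining axes stay zero
        scale = math.sqrt(lam)
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflation: remove the component just found before the next axis.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords


# Usage: four corners of a 3 x 4 rectangle, given only by their distances.
points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0)]
d = [[math.hypot(p[0] - q[0], p[1] - q[1]) for q in points] for p in points]
embedding = classical_mds(d, dim=2)
```

For Euclidean input such as the rectangle above, the embedding reproduces the original pairwise distances up to rotation and reflection. The sketch stores the full n x n matrix and therefore needs quadratic memory and time, which is precisely the kind of cost that the large-data challenge described above rules out.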