WHY analyze provenance data

The spectrum of possible reasons for conducting meta-analyses on provenance data is broad. Our goal is to provide a comprehensive overview of the existing body of literature that analyzes provenance data for specific purposes. At a high level, we can categorize the goals of the existing work as:

Understanding the User
"The goal of visualization is to create visual tools to support the user’s reasoning and decision-making with data."
Evaluation of Systems and Algorithms
"Evaluation of a visual design or system with the primary purpose of non-trivial analysis of provenance data."
Adaptive Systems
"A better understanding of the system and the user’s analytic process gives rise to opportunities to create adaptive systems."
Model Steering
"Model steering leverages provenance data to improve the underlying data representations, machine learning models, or projection calculations in the case of high-dimensional datasets."
Replication, Verification, and Re-Application
"Using interaction logs to perform real-time or post-hoc quantification to validate the analysis results or to replicate the process when a similar problem arises."
Report Generation and Storytelling
"Provenance data are used to automatically generate summary reports of an analysis session."

WHAT types of provenance data to analyze

Now that we have considered the different reasons WHY researchers analyze provenance data, the next challenge is to determine how the user’s interactions can be encoded, represented, and stored. The choice of the encoding has a direct impact on the downstream analysis of the provenance data as well as the expected outcome. From the perspective of analysis techniques, the decision of how to encode the provenance data can also dictate the types of analyses that can be applied.

Sequence of Interactions
"Recording of the user's interactions with a visualization tool as a temporally ordered list."
Grammar
"To examine and re-apply users' provenance information for automated analyses, rules and grammars are applied."
Model
"The goal of provenance and interaction analysis is often expressed as the (machine learning) model that a user is constructing, steering, or exploring."
Graph
"Since the purpose of using a visualization is to explore data, discover patterns and relations, and eventually build knowledge, many visualization systems encode the user’s interactions as knowledge graphs, concept graphs, or history graphs."
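To make the difference between encodings more concrete, here is a minimal Python sketch, assuming a hypothetical `Interaction` record with a timestamp, an action type, and a target. It stores one log as a temporally ordered sequence and derives a simple transition graph from it, where nodes are action types and edge weights count how often one action directly follows another. The names `as_sequence` and `as_transition_graph` are illustrative only, and this graph is just one of many possible graph encodings; history graphs in practice often use visualization states as nodes instead.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    # Hypothetical log record: when it happened, what was done, and to what
    timestamp: float
    action: str
    target: str


def as_sequence(events):
    """Sequence encoding: order the raw events by timestamp."""
    return sorted(events, key=lambda e: e.timestamp)


def as_transition_graph(events):
    """Graph encoding: map (action, next_action) edges to the number of
    times that transition occurs in the time-ordered sequence."""
    ordered = as_sequence(events)
    edges = defaultdict(int)
    for prev, nxt in zip(ordered, ordered[1:]):
        edges[(prev.action, nxt.action)] += 1
    return dict(edges)


# Toy log, deliberately recorded out of order
log = [
    Interaction(3.0, "filter", "year"),
    Interaction(1.0, "select", "country"),
    Interaction(2.0, "zoom", "map"),
    Interaction(4.0, "select", "country"),
    Interaction(5.0, "zoom", "map"),
]

print([e.action for e in as_sequence(log)])
# ['select', 'zoom', 'filter', 'select', 'zoom']
print(as_transition_graph(log))
# {('select', 'zoom'): 2, ('zoom', 'filter'): 1, ('filter', 'select'): 1}
```

The sequence keeps every event and its order, while the graph discards timing and targets but makes recurring transitions explicit; this is the kind of trade-off through which the choice of encoding shapes the downstream analysis.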

HOW to analyze provenance data

User interactions collected from a visual data exploration or analysis session can be analyzed in a variety of ways. In the simplest cases, the user interaction data can be stored as part of the “undo/redo” mechanism with little data processing required. In this HOW section, we focus on the more complex analysis methods that researchers apply to the interaction data to derive insight into the user’s analysis intent, re-purpose past analyses, predict future user actions, or create analysis summaries. We organize the observed methods into five primary categories:

Classification Models
"The overarching goal of classification is to map a user action to one or more categories."
Pattern Analysis
"Detect patterns in data or logs, either as an automated pattern analysis or a manual pattern analysis."
Probabilistic Models / Prediction
"Identify trends and predict possible future interactions in provenance data with probabilistic models."
Program Synthesis
"Convert the user interactions into an executable script, or an executable set of operations based on past user interactions, that can be applied to a new dataset."
Interactive Visual Analysis
"Additional analysis methods, with an overarching theme of interacting with visual representations of provenance data and generating insight from such interactions."
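As a rough illustration of two of these categories, the following Python sketch applies automated pattern analysis (counting frequent action n-grams) and a probabilistic model (a first-order Markov chain over action types) to a toy interaction sequence. The function names, the `min_count` threshold, and the action vocabulary are hypothetical choices for illustration, not methods taken from any particular surveyed system.

```python
from collections import Counter, defaultdict


def frequent_ngrams(actions, n=2, min_count=2):
    """Automated pattern analysis: count contiguous n-grams of actions
    and keep those that occur at least min_count times."""
    grams = Counter(tuple(actions[i:i + n]) for i in range(len(actions) - n + 1))
    return {g: c for g, c in grams.items() if c >= min_count}


def fit_markov(actions):
    """Probabilistic model: estimate first-order transition probabilities
    P(next | current) from observed consecutive action pairs."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(actions, actions[1:]):
        counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for cur, row in counts.items()
    }


def predict_next(model, current):
    """Return the most probable next action, or None if `current`
    never appeared with a successor in the training sequence."""
    row = model.get(current)
    if not row:
        return None
    return max(row, key=row.get)


actions = ["select", "zoom", "filter", "select", "zoom", "select", "zoom", "filter"]

print(frequent_ngrams(actions))
# {('select', 'zoom'): 3, ('zoom', 'filter'): 2}
model = fit_markov(actions)
print(model["select"])
# {'zoom': 1.0}
print(predict_next(model, "zoom"))
# filter
```

A first-order chain conditions only on the current action; higher-order or hidden Markov models can capture longer context, at the cost of needing more interaction data to estimate reliably.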