Multimodal Interactions in Network Data Visualization

Background

For my HCI Master's project, I investigated multimodal interaction in network visualization. Specifically, I studied how the individual modalities, speech and touch, lend themselves to network exploration tasks, as well as how priming with one modality affects the way people use both modalities in a multimodal system.

I collaborated with a PhD student in the Vis Lab at Georgia Tech who had created a multimodal system for network visualization called Orko, and who had already conducted preliminary evaluations of the system.

Orko - the original system developed by the PhD student

Research Objective

We wanted to take a closer look at how these modalities were actually utilized, and came up with the research questions outlined below.

Objectives

Research Questions

Process

I first made improvements to the existing multimodal system, based on feedback from the earlier evaluative user study.

To understand how natural language and touch each support interactions on their own, I derived two new systems that were functionally equivalent to the multimodal system but supported interaction only via speech and only via touch, respectively.

This let us investigate the individual modalities in depth and set us up to study how priming with one modality affects the way a multimodal system is used.

Improved Orko
Unimodal System - Using only Speech input
Unimodal System - Using only Touch input

Design of Experiment

I designed an experiment to answer the research questions above, recruiting a total of 18 participants and splitting them into three groups of six.

  1. The first group interacted with two systems: the touch-only system followed by the improved multimodal system.
  2. The second group interacted with two systems: the speech-only system followed by the improved multimodal system.
  3. The third group interacted with only the improved multimodal system.
Design of the study as it relates to the research questions

Design of Tasks

I designed a total of six tasks for each system: five were close-ended, and one was designed to encourage open exploration of the system. The tasks were also constructed to span a set of common network exploration operations (finding nodes, finding connections, finding paths, filtering nodes, visually encoding nodes, etc.), as sketched below. Participants could use any of the operations supported by the system to complete a given task.
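To make these operations concrete, here is a minimal, hypothetical sketch of what they look like programmatically, using NetworkX on a toy airport graph. The airport codes and routes are invented for illustration and are not the study's dataset.

# Hypothetical sketch of common network exploration operations on a toy
# airport graph; the airports and routes below are invented examples.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("SYD", "BKK"), ("BKK", "PEK"), ("PEK", "DME"),
    ("SYD", "PEK"), ("BKK", "DME"),
])

# Finding/filtering nodes: airports with at least two direct connections
popular = [a for a in G.nodes if G.degree(a) >= 2]

# Finding connections: direct flights out of one airport
direct_from_syd = list(G.neighbors("SYD"))

# Finding paths: a route between two airports
route = nx.shortest_path(G, source="SYD", target="DME")

print(popular, direct_from_syd, route)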

To achieve our objective of understanding the modalities and how they are utilized in the system, the tasks used in the evaluation needed to have the following characteristics:

Example Close-ended Task

Let us call airports that have direct flights to 55 or more other airports “popular” airports. Visually prove that China has the largest number of “popular” airports. Now, assume that you are traveling from Sydney Kingsford Smith airport to Domodedovo through one of these “popular” airports. Yes or no: must you then be traveling through either Thailand or China?

Example Open-ended Task

Pick any two airports that have at least one direct international flight. Consider these two airports and the airports they have direct flights to. Now compare the two groups of airports with regard to:
  • Accessibility
  • Altitude ranges
  • Variety of time zones
You may also list any additional observations you make while interacting with the network.

Measures

Data Analysis

The sessions were video recorded, and we performed a closed coding of the recordings using the six network operations as our pre-established codes. Across the 18 participants and the two study interfaces, we coded a total of 945 interactions corresponding to the different network data operations, distributed as shown in the tables below.
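As a rough illustration of how such a tally could be produced from coded video data, here is a minimal Python sketch; the event-log format, operation names, and example tuples are assumptions for illustration, not the actual study data.

from collections import Counter

# Each coded event: (participant, interface, operation, modality), where
# interface is "U" (unimodal) or "M" (multimodal) and modality is
# "S" (speech), "T" (touch), or "ST" (a multimodal interaction).
coded_events = [
    ("P1", "U", "filter nodes", "T"),
    ("P1", "M", "find paths", "S"),
    ("P13", "M", "find nodes", "ST"),
    # ... one tuple per coded interaction (945 in total)
]

# Count interactions per (interface, operation, modality) cell,
# mirroring the distribution tables below.
tally = Counter((iface, op, mod) for _, iface, op, mod in coded_events)

for (iface, op, mod), n in sorted(tally.items()):
    print(f"{iface} | {op} | {mod}: {n}")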

Distribution of interactions used by the 1st group, in both the Unimodal (touch) and Multimodal systems.
U: Unimodal interface, M: Multimodal interface, S: Speech, T: Touch, ST: Multimodal interactions.
A ‘-’ indicates that a modality was not supported in a condition or that participants were not assigned to a condition.
Distribution of interactions used by the 2nd group, in both the Unimodal (speech) and Multimodal systems.
U: Unimodal interface, M: Multimodal interface, S: Speech, T: Touch, ST: Multimodal interactions.
A ‘-’ indicates that a modality was not supported in a condition or that participants were not assigned to a condition.
Distribution of interactions for the 3rd group of participants, who used only the multimodal system.
S: Speech, T: Touch, ST: Multimodal interactions.
A ‘-’ indicates that a modality was not supported in a condition or that participants were not assigned to a condition.

High-Level Findings

100% of participants preferred multimodal interaction to unimodal interaction
Our qualitative observations and feedback from the post-session debriefs suggested that participants preferred multimodal interaction for the following reasons.
  • Freedom of Expression
The combination is certainly better. Voice is great when I was asking questions or finding something I couldn’t see. Touch let me directly interact.
  • Complementary nature of modalities
I liked that I could correct with touch. Because it’s not always going to be perfect right. Like the smart assistant on the phone sometimes gets the wrong thing but doesn’t let me correct and just goes okay.
I used voice when I didn’t know how to do it with touch.
  • Integrated interaction experience
It was somehow less complex even though more interactions were added.
Priming participants with one input modality did not impact how they interacted with the multimodal system
Our findings indicated that participants who had prior experience working with a unimodal system (P1-P12) interacted with the multimodal system comparably to participants (P13-P18) who worked only with the multimodal system.
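If one wanted to check this comparability quantitatively, a simple approach would be a chi-square test on the modality-usage counts of the two groups in the multimodal interface; the sketch below uses placeholder counts, not the study's actual numbers.

from scipy.stats import chi2_contingency

# Rows: participant group; columns: interactions coded S, T, ST in the
# multimodal interface. The counts are placeholders for illustration.
observed = [
    [120, 95, 40],  # primed participants (P1-P12)
    [ 60, 48, 21],  # multimodal-only participants (P13-P18)
]

chi2, p, dof, expected = chi2_contingency(observed)
# A large p-value would be consistent with "no detectable priming effect".
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")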
The single most important factor in deciding which modality would be used was the operation being performed. For example, when interacting with the multimodal interface, participants P1-P6 switched to using only speech commands for certain operations, even though they had all previously performed those operations using touch. This observation was also confirmed when we asked participants about it at the end of their sessions.
Now that I think of it, not consciously but I did use speech to mostly narrow down to a subset and then touch to do more detailed tasks.
Participants expected the system to be more conversational and even “answer” questions
Given the availability of speech as an input modality, participants unsurprisingly expected the system to be more conversational and even “answer” questions. For instance, one participant said:
Working with the system for a while starts making you want to ask higher level questions and get specific answers or summaries as opposed to just the visualization.