A large part of what participants "accountably" do in face-to-face interaction is accomplished via actions constructed in a visuo-spatial medium as opposed to an aural one: that is, gesture, the face, the body, orientation and movement, co-present space, and tools and other materials.

We have an orthography for transcribing talk, as well as conventions for capturing certain non-segmental features and for depicting its temporal organization (i.e., the horizontal and vertical axes both depict linear movement in time). These make talk, especially its segmental construction, most amenable to analysis, whether one is investigating sequential organization, turn construction, turn-taking, action formation, etc.

Despite the trend toward interactional studies of co-present interaction and the recent development of conventions for depicting visual data and organizing it with textual transcripts, I think there is still room for exploring how social actors coordinate aurally and visuo-spatially encoded actions.

Much of the work I will discuss here intersects with this problem: how to adequately depict those interactive practices accomplished through spatially, materially, and visually encoded actions, and how their depiction in disciplinary texts either informs or detracts from the analysis.