Thanks for Jiajin's reference list. I'd like to add two items. In a recent e-mail, Prof. Gu sent me his two new articles on situated discourse. As they are working papers, it might not be OK to upload them here. But, as a sneak preview, here are the abstracts of the two articles:
1. A multimodal text analysis of an anniversary ceremony in China
This paper presents a multimodal text analysis of the 60th anniversary ceremony of Beijing Foreign Studies University (BFSU for short hereafter) in China held in 2001 in a lecture theatre with about 500 participants. The ceremony, lasting about 66 minutes, was videotaped live. The video streams were digitized and stored in AVI format. The term “multimodal text” will be used to refer to these digitized moving images, and the term “multimodal content” to what the moving images are about. The interacting participants that give life to the ceremony are the authors, so to speak, of the multimodal text. The camera man or woman who produces the footage is a footage person. The footage person is not the “author” of a multimodal text, since s/he has no control at all over the multimodal content. All s/he does is the selection of shooting angles, and close-up shots. Leaving these two biases aside, the multimodal content of a multimodal text is factually accurate.
2. Multimodal Text Analysis: A corpus linguistic approach to Situated Discourse
Discourse analysis or conversation analysis approaches audio or video data by way of transcription. It takes the sentence or utterance as its point of departure, from which it moves up to discourse or conversation, or down to parts of a sentence or utterance. The present study departs from this mainstream paradigm by outlining and demonstrating a corpus linguistic approach to multimodal text analysis which starts from the analytic unit of social situation and proceeds to that of activity type, task/episode, and the participants’ behavior of talking and doing. The primary data consists of video streams with synchronized sounds rather than orthographic transcripts. The segmentation and annotation of non-discrete streams of a multimodal text are demonstrated in accordance with the latest Text Encoding Initiative (TEI P4).