Interactions and Screens in Research and Education

Theoretical and methodological framework

Theoretical and methodological framework for visual reflexive ethology

Christine Develotte

Morgane Domanchin

Samira Ibnelkaïd

Christine Develotte, Morgane Domanchin, Samira Ibnelkaïd, « Theoretical and methodological framework for visual reflexive ethology », Interactions and Screens in Research and Education (enhanced edition), Les Ateliers de [sens public], Montreal, 2023, isbn:978-2-924925-25-6,
version:0, 11/15/2023
Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)

The research presented in this book is based on an interdisciplinary approach to multimodal and multisemiotic interactional data. The polyartefacted seminar analysed here is the subject of a multidimensional study, which in our approach required audiovisual access to the sequences of verbal and non-verbal actions of the participants. This involves adopting a comprehensive ethological approach (Cosnier 1978), i.e., a “direct observation of behaviours experienced in the here and now” (Cosnier 2013, 258), taking into account interactional events as much as affects and empathic processes (Cosnier 2013).

In this chapter, we introduce the theoretical and methodological foundations underlying the collection, selection and analysis of this corpus of audiovisual data. We also justify our interest in this field of research leading to the emergence of what we call a “visual ethology”.

Theoretical and methodological choices

Ethology as a global approach to the field

Jacques Cosnier (originally trained as a biologist), a researcher and one of the founders of a laboratory in Lyon (the Communication ethology laboratory, the foundation of the current ICAR laboratory), chose ethology to describe situations involving interpersonal communication (1978; 1986; 1987). Based on a descriptive analysis of human behaviour, this approach also includes individuals’ points of view observed through interviews.

Cosnier has called this naturalistic approach “comprehensive ethology”.

Reflexive ethology

He explains that:

the ethological method is particularly heuristic in approaches where observation is essential, for example in clinical, developmental and social psychology, i.e., wherever inter-individual communication is a privileged object of study (in Hotier 2001).

Comprehensive ethology

Cosnier notes that “human ethology is obviously in line with the ethnography of communication of Hymes and Gumperz, the microsociology of Goffman and the ethnomethodology of Garfinkel. Goffman himself spoke of the ethology of interaction” (Cosnier in Hotier 2001). He also specifies that “in this type of approach, we do not start with hypotheses, we end up with them. […] This naturalistic observational approach does not preclude complementary interviews with agents and users nor taking their experiences into account”. For these reasons, he refers to this approach as “comprehensive ethology”. An advantage of this approach is that it allows different types of data (behavioural and interview data, for example) to be cross-referenced. The approach was adopted by Cosnier and Develotte (2011) in their first research on online conversation (Develotte, Kern, and Lamy 2011).

Taking up this ethological perspective, we have sought here to develop a new approach: “visual reflexive ethology”. Our new approach deals with video interaction data and is applied to ourselves, thereby integrating the advantages and limitations due to the fact that the ethologist and their object are intertwined and that, in this case, the interviews were conducted between and among ourselves.

Collecting data for each of the sessions naturally modified the traditional seminar environment by adding microphones and cameras which, by their presence, could influence the participants’ behaviour. Our research takes this factor into account; it does not invalidate our naturalistic approach, which is accomplished precisely through the recording of our behaviour.

The “observer’s paradox”

The problem raised when a social scientist observes social behaviour scientifically is not new. The “observer’s paradox” was first described by William Labov in 1978 in the course of his sociolinguistic research – his study aimed at “trying to observe the language that people speak when they [were] not being observed” (Traverso 1999, 22). This paradox is based on researchers’ desire to “reproduce elements that are as close as possible to lived reality, whereas this reality must be subjected to the artificiality of systematic observation” (Mouchon 1985, 2). The methodological reflections undertaken since then have led to two main ways of bypassing this difficulty:

  • Either the researcher becomes a member of the community through immersion and a long period of observation in the field (Mouchon 1985, 2),

  • Or the observer and the observed are one and the same – “the linguist is also a member of the community being observed, as Labov undertook in his study of the Harlem ghetto” (Boutet 2002).

Our research group chose the latter solution. Certain artefacts used by the members of the group served simultaneously to let the participants communicate with each other and to collect data (for example, the laptop ran both video-conference platforms for communication and dynamic screen capture software for collecting interactional data). The computer screen was thus both a medium for communication and a medium for observing this same communication: it was used to interact online during the seminar and to capture these interactions for future analysis. Using the same screen for interaction and data collection reduced the number of artefacts we needed to handle simultaneously.

This approach requires making informed technical choices regarding the number of cameras and their location. We relied on the ICAR laboratory’s expertise when dealing with such matters. The visual reflexive ethological approach takes its place in a landscape of human and social sciences delineated by visual ethnography and interactional analysis.

Visual ethnography

In our view, the complexity of studying the presence of subjects on a screen requires a multimodal and multisemiotic approach. Therefore, we draw on visual ethnography (Ruby 1996; Banks and Morphy 1997; Pink 2007; Dion 2007) to explore the general ecology of physical-digital interactions and the flow of these interactions across different media platforms and formats. To do so, we make use of digital tools available to researchers in digital humanities (digital cameras, dynamic screen captures, video editing software, etc.). This approach allows us to study both verbal and non-verbal communicative behaviours on and off screen, and leads us to understand onscreen presence as a linguistic, sensory and technical phenomenon.

Visual ethnography originates from the idea that social practices are manifested through visible symbols embodied in gestures, ceremonies, rituals and artefacts located in both natural and constructed environments (Ruby 1996, 1345).

As soon as social practices become visible, researchers must be able to use (audio)visual technologies (photos, videos etc.) in order to collect them and constitute data that can be exploited, analysed and disseminated (Ruby 1996, 1345). The image is an “intrinsic and not extrinsic element of the research process” in visual ethnography (Dion 2007, 62), a heuristic methodology seeking to “graph” (study and represent) the “ethnos” (culturalities, practices and social relations) through (audio)visual data and media. The visual medium, with still or moving images, is both a tool and an object of research (Dion 2007).

The visual approach cannot be a copy or substitute for verbal ethnography but must develop an alternative methodology and objectives that benefit anthropology as a whole (MacDougall 1997, 292). By focusing on (audio)visual data, visual ethnography offers new ways of understanding individuals, social relations, material cultures and ethnographic knowledge itself (Pink 2007, 22). The research methodology is thus based on three main activities (Banks and Morphy 1997):

  • Collecting (audio)visual data (analysing social practices by producing images);

  • Examining pre-existing (audio)visual data (analysing images that provide knowledge about society);

  • Collaborating with social actors in the production of (audio)visual data.

Within visual methods, video recording is more than a data collection tool – it is a technology involved in negotiating social relations and a medium through which ethnographic knowledge is produced (Pink 2007, 173). (Technology can only be used in the field if there is informed cooperation and explicit negotiation with the participants, in order to establish the relationship of trust that is essential for the ethical constitution of the interactional dataset.) Moreover, new digital technologies, interfaces and sociodigital networks are gradually introducing ethnographic studies of the everyday digital communication practices of individuals and communities (Pink 2007, 197). In addition to visual ethnography, a form of digital ethnography that is delinearised, multimodal and multisemiotic is emerging (Pink 2007, 197).

Interactional analysis

The notion of interaction has more or less restricted definitions depending on one’s approach towards it. Goffman, a linguist and sociologist, and one of the founders of interaction analysis, explains that:

Interaction (i.e., face-to-face interaction) is defined as the reciprocal influence of individuals upon one another’s actions when in one another’s immediate physical presence (Goffman 1973, 23).

Recipient design principle

Catherine Kerbrat-Orecchioni notes that in order to qualify a situation of interaction, “it is necessary and sufficient to have a group of participants that is variable but without rupture, who speak about an object that is alterable but without disruption in a spatio-temporal framework that is dynamic but without disruption” (1990, 216).

Therefore, in order to guarantee the flow of the interaction, the speaking activity needs to be adapted to its audience according to the principle of “recipient design”. This concept implies that “throughout the production process, the speaker takes into account the projective interpretation that they assume the listener will make of their words” (Kerbrat-Orecchioni 2005, 16). In developing the notion of recipient design, Harvey Sacks, Emanuel A. Schegloff and Gail Jefferson refer to the multiple resources visible in a speaker’s turn, which demonstrate an obvious orientation towards the co-participants. This process is reflected in the selection of lexical and thematic units, in the way sequences are ordered, and also in the obligations and alternatives chosen to open and close an interaction (Sacks, Schegloff, and Jefferson 1974, 727). The principle of recipient design allows interactants to structure their language resources in order to create a common focus of conversational attention and allows them to jointly construct and control the interaction. Recipient design also ensures the intelligibility of the elements that seem relevant and preserves the stability of the interactional link (Sacks, Schegloff, and Jefferson 1974, 727).

The overall conversational resources thus inform us about the activity that the participants construct, from turn-taking to the overall structure of the interaction. They result in the definition of the content, form and the modalities of presence brought into play. Some of the chapters presented here will analyse the participants’ language productions from an interactionist perspective initiated by Goffman, Sacks, Schegloff and Jefferson, and then pursued in France by Cosnier, Kerbrat-Orecchioni, Véronique Traverso and Lorenza Mondada. In addition, the research gathered here aims at extending this interactionist approach by studying the impact of the screen on interactional rituals observed off-screen until now. This volume describes “the boundary between new practices and normative structures, as well as the appropriation by human actors of both the tools and the discursive or semiotic practices they induce” (Develotte, Kern, and Lamy 2011, 19).

Décrire la conversation en ligne (Describing Online Conversation)

The contributors to the book Décrire la conversation en ligne (Develotte, Kern, and Lamy 2011) were already part of a renewed approach to traditional logocentric analyses. These authors identified and adapted methods for analysing interactions by integrating elements of multimodality, vocal-postural-facial-gestural resources and multisemioticity that can be found especially in graphics and audio and video recordings. The present research emerges from these foundations and seeks to gain an interdisciplinary understanding of onscreen experiences based on multimodal and multisemiotic behaviours that we have made observable (through dynamic screen captures, video recordings, etc.).

A transdisciplinary approach: Visual reflexive ethology

We have chosen to employ a video-based approach to record, analyse and illustrate interactional phenomena (supplemented by semi-guided and explanatory interviews). Therefore, we chose not to transcribe verbal productions complemented with gesture-related annotations in the tradition of Conversation Analysis (initially based on audio recordings). Instead, we aimed at preserving the primary audiovisual material and guiding the reader-observer via a semiotic and narrative enrichment process applied in post-production. The video thus constitutes a mode of analytical representation in itself which follows a scenario established beforehand by the researcher. Video clips as dynamic illustrations are thus an innovative way of displaying data analyses and contributing to the renewal of the study of social interactions by making use of the technological tools available to researchers in digital humanities.

Beyond this general theoretical-methodological framework, the authors of the different chapters of this book have chosen other frameworks specifically adapted to their topic and presented in each chapter. The fact that we call upon different fields in our analyses implies that the same concepts are sometimes used differently depending on the chosen approach.

Material situation

In this section, we will first describe the “digital ecosystem” (Bourassa 2018) of the seminar, emphasizing its material and human dimensions. The concept of digital ecosystem allows us to think of contexts as sites where multiple actors, both human and non-human, come into play, linked by organic, technical and dynamic relationships.

Renée Bourassa’s conception of the digital

The digital does not isolate itself, it is irreducibly interwoven with the physical world, and in an equally material way (Bourassa 2018).

In the case of our seminar, the face-to-face and remote dimensions are intertwined through communication tools and artefacts.

Spatial organisation

The Pedagogical and Digital Innovation Room (LiPeN)

The “Screen-based Multimodal Interactions” (IMPEC, Interactions Multimodales Par ÉCrans) seminar was held at the École Normale Supérieure (ENS) in Lyon in a room used for teaching workshops. The open-plan room contained mobile and modular furniture. The room is called LiPeN: “Laboratoire d’Innovation Pédagogique et Numérique” (Pedagogical and Digital Innovation Room).

Communication set up

The first figure illustrates the communication set-up used in this hybrid seminar, which included face-to-face participants in Lyon and remote participants connected via different artefacts.

Figure 1: Communication set up

The second figure describes the technical set up used to collect the data.

Figure 2: Recording set up

The third figure illustrates the phenomenological experience of the seminar from the different perspectives of the participants.

Figure 3: The seminar room in Lyon during the first session, as seen by Amélie in Caen

The process involved in bringing the subjects together is called here a “chronotope”, following the literary work of Bakhtin (1978). This notion refers to the enactment of a form of unity between time and place. Here the chronotope is enacted within a hybrid (physical-digital), reticular (network of participants, places, and communicational artefacts) and ever-evolving spatio-temporal framework. Each participant progresses from an objective place (any room) to a subjective place (their connection on their computer and one or more software programs) to an intersubjective place (the perception of the other participants via their artefacts).

Examples of chronotopes

Figure 4: Seminar, Session 2, from Samira’s perspective
Figure 5: Seminar, Session 3, from Tatiana’s perspective
Figure 6: Seminar, Session 4, from Christelle’s perspective
Figure 7: Seminar, Session 5, from Jean-François’s perspective

In the recorded sessions, depending on the session (see “Introduction”), the remote participants were located in London (UK), Hangzhou (China), Besançon, Caen and Aix-en-Provence (France). As they were geographically spread around the globe, they used various artefacts to communicate.

Remote communication artefacts and their positions in the room

In this book we differentiate between the notions of “set-up”, “artefact” and “platform”. By set-up, we mean the organisation of multiple artefacts and the use of different platforms to produce forms of presence.

Communication device, artefact and platform

For example, the remote communication set-up in the seminar relied on communication and document transfer platforms (Adobe Connect, Skype, Beam’s embedded software, Google Drive, etc.) as well as the artefacts hosting these platforms (computers, tablets, a Kubi robot, a Beam robot, a video projector, a mobile webcam, etc.).

The notion of “artefact” (also referred to in the “Introduction”, section “Multidisciplinary inspirations”) distinguishes the human from the non-human. It thus designates any non-animated object without specifying its function. (In contrast, the term “tool” is based on the object: tools allow you to do something that you cannot do without them, or at least facilitate the action.) Artefacts (i.e., physical objects) are in this sense distinct from platforms (i.e., the software embedded in those artefacts): Skype is a platform that can be used on different artefacts – a computer, a tablet, a phone, etc.

The data collection set-up included microphones and video cameras to record interactional data.

Figure 8: The Beam robot
Figure 9: The Kubi robot

Project timeline

The general research program was set as follows:

Conduct of the seminar

The IMPEC seminar which hosted the “Digital presences” workshop was held on a monthly basis for a full day. In 2016 we began to structure the monthly seminars in two parts. In the sessions we studied, the first part focused on the work of doctoral students or guest lectures, while the second part was used to work on the project. While the first part of the seminar was open to all members of the ICAR laboratory, the second part was restricted to the participants in the study. Of the dozen or so participants in the group, about a third (not always the same people) attended the seminar remotely (either occasionally or regularly).

Two data sets

Two types of data were collected and will be presented in the following sections: first, the interactional data, and then the data from interviews conducted with the participants.

Interactional data

During the 2016-2017 academic year, the choice of five seminar sessions (see “Data collection set-up” below) was based on varying the communication situations as much as possible. We sought to place the guest lecturers alternately in face-to-face and remote situations (in person in Lyon, or remotely via the Beam or the Kubi robot), so as to multiply the communication scenarios to be studied (see the section on the “Specificities of a reflexive study”). We also tried to vary other criteria, such as the status of each speaker (doctoral student or senior researcher).

Choice of conferences and speakers

Each intervention, whether a talk or a data session, was followed by a 45-minute discussion period with the in-person and remote participants (who were also part of the corpus). The guest lecturers were chosen according to the proximity of their research interests to ours, as they were likely to enrich our reflection on the notion of “digital presences”.

Data on participants’ affects

The interviews were mainly audio or video recorded. According to the research perspective being adopted, two types of methodologically different interviews were conducted: explicitation interviews (Vermersch 1994) and semi-guided interviews to clarify specific aspects.

Explicitation interview

The explicitation interview “focuses on the lived experience, and more precisely on procedural information, with the aim of reconstructing the structure of the action” (Martinez 1997, 2). In other words, explicitation aims at leading the participant to describe an action as precisely as possible and integrating their emotions, thoughts and actions associated with the provided description. All of this constitutes (in part) the structure of the action. Some of the interviews were filmed in order to capture the multimodal dimension of speech (behavioural cues, voice inflexions, gestures, etc.) of the subjective experience.

Other data on individual perceptions were collected in writing at the end of each session in order to evaluate participants’ feelings of co-presence. This type of data was collected asynchronously and hence mirrors a deeper reflective approach. In total, four questions were related to the participants’ feelings.

In addition, 18 audio and video interviews were transcribed and reviewed by the participant in question. (For the participants, multiple experiences were often not possible. Only a few were able to experience the use of all the remote communication artefacts.)

Data collection set-up

Selected sessions

The five selected sessions are summarised and commented on below:

Data collection | Date | Session | Content
Data collection 1 | 21/10/2016 | Session 1 | Morgane’s data session
Data collection 2 | 18/11/2016 | Session 2 | The anthropologists’ talk
Data collection 3 | 20/01/2017 | Session 3 | Group work (Part 1 and Part 2)
Data collection 4 | 24/03/2017 | Session 4 | Susan Herring’s talk
Data collection 5 | 28/04/2017 | Session 5 | Christelle’s data session (Part 1 and Part 2)

Data collection 1 (in Lyon): Doctoral student Morgane Domanchin presented her progress on her doctoral thesis. The collection includes a 20-minute presentation and a 28-minute discussion. The presentation is entitled “Complexities in screen-based pedagogical interactions: the case of multitasking among learners”.

Data collection 2 (in Lyon): Evelyne Lasserre (University of Lyon 1) and Axel Guïoux (University of Lyon 2) gave a talk entitled “Mobilis Immobile – La présence au-delà de l’empêchement” (“Moving the Unmoveable – Presence beyond impediment”). This collection includes a 45-minute presentation and a 52-minute discussion.

Data collection 3 (Lyon and remote): This working session contains a 32-minute presentation and a 68-minute discussion. The interactions took place between the participants of the “Digital presences” workshop in Lyon and its remote members. The aim of this session was to determine which research focus would be adopted by the different sub-groups. Each sub-group, in turn, presented its ideas which were discussed collectively in order to articulate the different research foci all together.

Data collection 4 (remote): This remote talk given by Susan Herring in San Diego (USA) was presented via the Beam robot and Adobe Connect. The talk includes a 50-minute presentation entitled “Discourse Pragmatics of Robot-Mediated Communication” and a 50-minute discussion.

Data collection 5 (remote): This remote talk given by Christelle Combe (Aix-en-Provence) conducted via the Kubi robot and Adobe Connect includes a 46-minute presentation and a 52-minute discussion. The presentation is entitled “From the imagined ethos to the ethos produced by online apprentices”. The presentation was then followed by a collective working session.

Five of the ten sessions from 2016–2017 were selected to constitute the research corpus which has a total duration of 9 hours and 16 minutes. Each session was filmed in Lyon from three to four different angles, and two to four different sound recordings were produced. In addition, at least two video recordings were collected at each session to document the behaviour of the remote participants through dynamic screen captures or external videos. These data can be arranged in a multi-screen format and, depending on the analysis, certain aspects can be zoomed in on (see photos in the appendix “Technical issues and methodological challenges”).

Complex Corpora Center

The first two sessions (September and October) as well as the February session focused entirely on group discussions on the project. The latter session focused on exchanges aimed at adjusting the ongoing data collection.

In order to ensure the quality of the data collection, and especially of the videos which constitute the basis of our analyses, we received technical and material support from the Cellule de Corpus Complexes (CCC, Complex Corpora Center) within the ICAR laboratory. The CNRS research engineers at this research support structure – particularly Julien Gachet, Justine Lascar and Daniel Valero for this project – offered us assistance at the different stages of our data collection. The CCC was also of great help in the post-production process, and more specifically in the dissemination and publishing process.

Data collection

The first step involved identifying sites and selecting the technical equipment to be used for the data collection (i.e., microphones, a webcam and video cameras). The purpose was to collect a large amount of footage that would be relevant for our research objectives.

The following choices were made:

View from video cameras on tripods

View 1
View 2

“Overhead” view from the GoPro action video camera

View from the 360° video camera

Setting up the equipment

We began setting up the equipment the day before each session so that the device and the electrical wires could be installed. The recordings started once all the remote participants were connected, at the beginning of each session of the seminar.

In order to standardise the data referencing, we have adopted the following names for the sessions and video camera views:

  • Session 1 – Morgane’s data session

  • Session 2 – The anthropologists’ talk

  • Session 3 – Group work (Part 1 and Part 2)

  • Session 4 – Susan Herring’s talk

  • Session 5 – Christelle’s data session (Part 1 and Part 2)

The excerpts analysed throughout this book will therefore be referred to using the session numbers above.

Post-production work

After each data collection, each recording source (audio, video, screen capture, remote view) was processed and synchronised with the same time scale (also called a “time code”).

This synchronisation helps to facilitate and enrich the analysis of phenomena by integrating different viewpoints (for instance, in-situ and ex-situ). Subsequently, the audio and video data were edited using Final Cut Pro X and QuickTime Pro software. The video clips chosen were multiscope (combining several shooting angles on the same screen): six to eight views were selected and combined simultaneously. During the production of these videos, the audio sources were integrated into the video files to provide better distribution of the sound. (For example, unlike .mp4 files, .mov files opened in QuickTime Pro allow researchers to check or uncheck an audio track. When participants’ speech overlaps, this feature allows one of the audio tracks to be silenced by unchecking it, which can be useful for transcribing speaking turns.) These initial edited video clips formed the basis for the research subgroups’ analyses.

Video editing

Subsequently, the videos were edited to include semiotic enrichment (embedding verbal transcripts, graphics with analytical meaning — arrows, circles, and zoom-ins). This type of multimodal video has the particularity of making the overall ecology of the interaction accountable and highlights significant interactional micro-events. The video is therefore not a simple illustration of the scientific statement but an integral part of the scientific argument. The analyses proposed in this book are built around videos which are simultaneously a source of study, an analytical process and a demonstration of new theoretical concepts resulting from these analyses.

Adding photos with editing

Data collection summary

The table below summarises our data collection. It shows the recordings’ duration from the video and audio tracks, as well as the artefacts used for each session.

Session | Duration | Video tracks | Audio tracks | Platforms and artefacts
Session 1 – Morgane’s data session | 00:48:44 | 8 | 4 | Adobe Connect and Beam
Session 2 – The anthropologists’ talk | 01:37:00 | 7 | 4 | Adobe Connect and Beam
Session 3 – Group work (Part 1, Part 2) | (1) 00:32:43, (2) 01:08:00 | 7 | 4 | Adobe Connect, Kubi robot and Beam
Session 4 – Susan Herring’s talk | 01:40:00 | 7 | 4 | Adobe Connect, Kubi robot and Beam
Session 5 – Christelle’s data session (Part 1, Part 2) | (1) 00:46:49, (2) 00:52:57 | 7 | 4 | Adobe Connect, Kubi robot and Beam

In total, the “Digital Presences” corpus includes:

Data storage

The data have been stored in the Ortolang database, which will be presented in “Document sharing tools”. In order to make it easier to share data among group members, the digitised sound and video data were classified and listed according to a nomenclature that made them easy to find. The data were then stored in folders associated with each of the five sessions presented above (i.e., IMPEC_LiPeN-year-month-day). A summary data sheet containing a brief description of all the views available is included in each of the data collections.
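
As a purely illustrative sketch of the folder nomenclature described above, the following Python snippet derives one folder name per session from the five collection dates listed earlier. Only the pattern IMPEC_LiPeN-year-month-day comes from the project; its exact rendering (four-digit year, zero-padded month and day) and the name `session_folder` are assumptions made for the example.

```python
from datetime import date

# Collection dates of the five selected sessions, as listed in this chapter.
SESSION_DATES = {
    1: date(2016, 10, 21),
    2: date(2016, 11, 18),
    3: date(2017, 1, 20),
    4: date(2017, 3, 24),
    5: date(2017, 4, 28),
}


def session_folder(day: date) -> str:
    """Build a folder name following the IMPEC_LiPeN-year-month-day pattern.

    Assumption: four-digit year and zero-padded month and day.
    """
    return f"IMPEC_LiPeN-{day:%Y-%m-%d}"


# One folder name per session (hypothetical rendering of the nomenclature).
folders = {number: session_folder(day) for number, day in SESSION_DATES.items()}

for number, name in folders.items():
    print(f"Session {number}: {name}")
```

A date-based name of this kind keeps the folders in chronological order when they are sorted alphabetically, which may be one reason for preferring such a nomenclature.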

Developing synopses and setting up a collective workspace

During our meetings, we sought to establish an effective methodology to collectively annotate our data. We thus created “synopsis” files in digital workspaces (Google Drive) which were accessible to the whole group, in which each person was asked to enter events that were particularly relevant to their research focus.

Synopsis and key moments

More specifically, the participants were asked 1) to annotate the time codes (starting and ending points) of events identified as being in line with their research focus and 2) to write a descriptive comment. These synopses provided an overall view of the collection and helped to compare the annotations. The three research subgroups (attention, politeness and corporeality) thus identified the same five key moments in the data set that synthesised different significant aspects. The identification of these key moments helped each subgroup to focus on their research objects. This process later helped to open discussions on transcription methodology.
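
The annotation procedure described above can be sketched in Python, under explicit assumptions. The actual synopses were shared files filled in by hand; the `Annotation` structure, the HH:MM:SS time-code format, the function names and the sample entries below are all hypothetical. The sketch simply shows how passages annotated by all three subgroups could be located by comparing time codes.

```python
from dataclasses import dataclass

SUBGROUPS = ("attention", "politeness", "corporeality")


def to_seconds(timecode: str) -> int:
    """Convert an HH:MM:SS time code to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in timecode.split(":"))
    return hours * 3600 + minutes * 60 + seconds


@dataclass(frozen=True)
class Annotation:
    """One synopsis entry (hypothetical structure)."""
    subgroup: str  # one of SUBGROUPS
    session: int   # session number, 1 to 5
    start: str     # starting time code, HH:MM:SS
    end: str       # ending time code, HH:MM:SS
    comment: str   # descriptive comment


def overlaps(a: Annotation, b: Annotation) -> bool:
    """True if both annotations cover a common stretch of the same session."""
    return (
        a.session == b.session
        and to_seconds(a.start) < to_seconds(b.end)
        and to_seconds(b.start) < to_seconds(a.end)
    )


def shared_moments(annotations, subgroups=SUBGROUPS):
    """Keep annotations that overlap with entries from every other subgroup."""
    shared = []
    for a in annotations:
        others = {
            b.subgroup
            for b in annotations
            if b.subgroup != a.subgroup and overlaps(a, b)
        }
        if others == set(subgroups) - {a.subgroup}:
            shared.append(a)
    return shared


# Invented entries, for illustration only.
synopsis = [
    Annotation("attention", 4, "00:12:10", "00:13:00", "hypothetical event A"),
    Annotation("politeness", 4, "00:12:30", "00:12:50", "hypothetical event A"),
    Annotation("corporeality", 4, "00:12:40", "00:13:20", "hypothetical event A"),
    Annotation("attention", 2, "00:40:00", "00:41:00", "hypothetical event B"),
]

shared = shared_moments(synopsis)
for entry in shared:
    print(entry.session, entry.subgroup, entry.start, entry.end)
```

Comparing intervals rather than exact time codes allows for the fact that different annotators rarely mark identical starting and ending points for the same event.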


We transcribed the discussions that followed the talks in order to see how each of the artefacts we used took part in the exchanges, and the effects their presence had on the interactions. We opted for a minimal transcription limited to the verbal cues, providing a first “basic” transcript. The 18 (audio and video) interviews conducted with each participant from the research group were transcribed in the same way. These transcripts allowed the group to study the 14 hours and 36 minutes of audio recordings. The transcribed version of these interviews resulted in a 231-page transcript booklet, in which the transcripts were arranged in chronological order. This booklet was produced and distributed to the various participants in May 2019 and was later released to the public.

The decision-making process and organisation of discussions

Participants’ roles in the seminar

Apart from the role of the seminar leader mentioned by Christine in the Introduction, other roles were assigned while still others emerged spontaneously during the sessions. The organisation of the seminar – both logistically and technically – was ensured by its members.

For example, Morgane, a doctoral student in Lyon who is highly involved in the life of the ICAR laboratory, supported the technical set-up of the room hand-in-hand with the members of the Cellule Corpus Complexes. She also monitored the digitisation of the videos and transcribed the interviews. In addition to this “official” technical and methodological assistance, other types of support helped the operation to function properly: Dorothée was in charge of the Beam robot; Christelle set up the video-conferencing sessions on Adobe Connect and provided a space on Ortolang for our data storage; Caroline booked the material and the room, took notes on Google Doc and managed the Google Drive storage system.

Collaborative data collection and analysis

The choice was made to involve all participants in each and every stage of the research project. The members’ contributions resulted in the co-construction of the project, which implied consulting one another collaboratively throughout the tasks and sub-tasks we encountered. The discussions related to decision-making took the form of brainstorming workshops, data sessions or even opinions exchanged by email between sessions. The topics discussed included, for instance, the location of the camera in the room in Lyon and whether or not to anonymise the data for the publications.

Decision-making process

Once a decision was agreed upon by the whole group, any members absent from the discussion were considered to have endorsed the decisions taken by the others, so as not to slow down progress on the project and to respect its timetable.

During the data sessions, visualisation proposals emerged, which were improved collectively and integrated as common resources for the group (Morgane’s device illustration, Samira’s chronotopes, etc., with mention of the authors involved).

Document-sharing tools

A scientific repository platform

The Ortolang platform is a facility designed for language data storage and processing, supported by the Huma-Num infrastructure. Its aim is to construct a network infrastructure including a repository of language data (corpora, lexicons, dictionaries etc.) and readily available, well-documented tools for its processing.


The platform’s “expected outcomes comprise:

  1. Promoting research on analysis, modelling and automatic processing of our language to their highest international levels thanks to effective resource pooling;

  2. Facilitating the use and transfer of resources and tools set up within public laboratories to industrial partners, notably SMEs which often cannot develop such resources and tools for language processing given the cost of investment;

  3. Promoting French language and the regional languages of France by sharing expertise acquired by public laboratories.” (Ortolang website)

This platform has hosted our data since the beginning of our project. It was chosen not only for its simplicity of use, user-friendly interface, and large storage capacity, but also for a feature which makes it possible to provide access to the corpus to various audiences, including researchers and even the general public (the “Digital Presences” corpus can be accessed on the Ortolang platform). This open data approach reflects the project’s support for open science.

A shared space for non-sensitive data

We occasionally used Google Drive to store elements related to the project, especially to collaborate on publications, plan abstracts for conferences, comment on video extracts, etc. Google Docs was used for collective note-taking during the seminars, and these notes were also archived on Google Drive. We also created and stored collaborative synopses associated with each session.

Google Drive

As we are aware of the ethical issues raised by the use of private platforms, especially concerning data protection, we chose this shared space in a responsible and thoughtful way. This platform was only used to exchange notes and non-sensitive information, and the data corpus was stored on Ortolang. Google Drive provided the most convenient platform for us to use, as no other platform offered an equivalent service at the institutional level at the time (other solutions have since been implemented, such as AMUbox in Aix-en-Provence).

Writing process

The idea was to give an account of the group’s experience in a diffracted way by highlighting different aspects that seemed the most interesting to prioritise in our project.

Choice of the different chapters

In 2017, our brainstorming session based on research areas resulted in the identification of three key themes: attention, corporeality and politeness. The participants were each asked to join a team on one of the three themes. In September 2018, four additional chapters were added: 1) comparing the effect of presence according to the artefacts, 2) the co-construction of a form of collective intelligence, 3) research training and 4) the effect of group dynamics. For this reason, additional groups were then formed to work together.

The authors were chosen freely and the analyses were conducted in sub-groups. The idea underlying this choice was to let each sub-group develop its own angle of study, distinct from those adopted elsewhere in the volume.

Group process

During the seminar, each of the sub-groups was asked to present their intended approach to the analyses, their theoretical-methodological angle and a few examples of relevant data. Each presentation led to numerous exchanges with the whole group, allowing certain points to be clarified and others to be enriched.

The feedback provided by the group throughout the writing process led to a two-day research workshop in June 2019 which focused on the first drafts of the various chapters.

Specificities of a reflexive study

Using oneself as the object of study on a topic such as digital presences is by no means trivial. On the contrary, it generates effects that must be integrated into the analyses.

Reflexivity on the research purpose

The effects of familiarity with the subject suggest that the research results relate to a “non-naive” audience who may adopt more appropriate behaviours (e.g., in positioning themselves in relation to the webcam or using the chat function on Adobe Connect) than an unsuspecting audience.

Moreover, neutrality becomes relative when the interviewers are close colleagues involved in the same project. It can be assumed that the preservation of each other’s face is reinforced, especially when all parties know that everything said will be made public. The quality of the socio-affective relationship between the members of the group is taken into account in handling the data (especially regarding opinions collected in interviews and questionnaires).

Ethical aspects of participants’ consent

Studying oneself as a group certainly avoids the problems of image rights and consent to use the videos for research purposes. This is what makes such a study feasible. However, the self-exposure implied by this decision must be consented to by every member throughout the various research stages, since this commitment engages the image of each participant over time. We will reconsider this aspect in more detail in the conclusion of this book.


Self-exposure is a classic feature of today’s online videos and is often associated with public lectures. Recording an online video enlists different discursive strategies according to the image of oneself that one wishes to preserve (correction of language, humour, etc.). Yet the situation of enunciation here is much more complex than a controlled monologue. The spontaneity that the polyartefacted set-up imposed on the participants, for instance in their reactions to the various technological glitches, was likely to disrupt the interaction flow. This means that they could not control their ethos as they might have wished. Though it is precisely these communication difficulties that are at the core of our study, it can be challenging to manage the feeling of being overwhelmed by events, entangled in an unexpected situation, while knowing that our experiences of disarray are being recorded.

Public release of the dataset

The participants’ awkward and laborious exchanges are at the very heart of our project and are therefore acknowledged. However, knowing that the participants’ voices and behaviours will serve not only as data for the study but also as data for other researchers is not insignificant. In addition to the pressure of recording the data, there was also the pressure of knowing that these data would ultimately be accessible to the scientific community. As a result, a number of adjustments were necessary to negotiate both the participants’ face work and the scientific community’s access to the data.

Adjustments to the dataset

The decision to use the participants’ first names in the volume breaks with a methodological tradition in the field of interaction analysis. Usually, only initials or pseudonyms are used. In our case, the reflexive nature of our study implies exposure both as an author and as a research object.

Moreover, the collective decision not to resort to anonymisation stemmed from its artificial nature in this specific context, as we would have been de facto recognised by the other group members. Assigning ourselves different first names would have therefore complicated the analyses in vain, which is why we decided to use our real first names.

As the interviews were conducted within our group and not by outsiders, the forms of language, humour or language level of the exchanges relate to the social proximity of the participants. For this reason, it was agreed that the audio files would only be used by the participants in the research group and that only their transcripts would be made public.

In addition, once the interviews had been transcribed, they were discussed in groups and it was decided that some members would review the initial transcripts and “clean them up” by completing sentences, elucidating implicit and deictic statements, and clarifying them before releasing the transcripts publicly.

Towards “visual ethology”

Through the theoretical and methodological framework presented in this chapter, we aim to lay the foundations of what we call “visual reflexive ethology”. This research approach adopts a video-based methodology and places emphasis on the participants’ (inter)subjective experience.

Visual reflexive ethology

Christine Develotte and Samira Ibnelkaïd

Visual reflexive ethology is based on the ethology of human communication as defined by Jacques Cosnier (1978 ; 1986 ; 1987): a science of observation of human behaviour paired with an awareness of the feelings and emotions of the subjects observed. Here, we applied this “comprehensive” ethology (éthologie compréhensive, Cosnier in Hotier 2001) to ourselves (as both participants producing spontaneous interactional data and researchers analysing these data). For this reason, we describe our approach as “reflexive”. The notion of reflexivity in social sciences can be defined as “the ability of the subject to consider their own activity in order to analyse its genesis, procedures or consequences” (Bertucci 2009).

The reflexive posture entails the need to develop both a capacity for subjective introspection (being able to observe one’s own practices) and a decentring of one’s own perspective (looking critically at one’s practice and approach). Reflexive ethology brings about technical benefits for researchers (access to the field, easy understanding of deictic elements, of spatio-temporal context, of interactional history, etc.). Nonetheless, this approach also gives rise to intersubjective difficulties (self-exposure, group management, interpersonal relationships, levels and modalities of involvement, scholarly rigour and “axiological neutrality” (Weber 1917), etc.). Reflexive ethology thus requires a strong commitment on the part of the researchers involved.

Tools can facilitate reflexive practice by mediating between interactional events and the subjective experience of the “researcher-participants”: this is the case for video recordings in particular. Indeed, the observation and analysis of practices in this research is based on video recordings of interactional data (using digital cameras, dynamic screen captures, video editing software, etc.). The use of video resources in this reflexive ethological and hence “visual” approach can be connected to studies in visual ethnography (Pink 2007 ; Dion 2007). The image, whether static or dynamic, is shown to be both the instrument and the object of research (Dion 2007). Per this logic, video images are not only illustrations, but constitute the material for the collection and analysis of the interactional data, and they are also used semiotically to present the results of these analyses. Various graphic techniques (multicam editing, circling and arrows, zoom-ins, gifs...) provide a visualisation or explicitation of the analyses or the theoretical concepts that were created within the very interaction being studied.

Visual reflexive ethology aims to be a transdisciplinary approach that places multimodality, subjective experience and sensory perception at the heart of interactional analysis, from observation to scholarly dissemination.

Bakhtin, Mikhaïl. 1978. Esthétique et théorie du roman. Collection Tel 120. Paris: Gallimard.
Banks, Marcus, and Howard Morphy, eds. 1997. Rethinking Visual Anthropology. New Haven: Yale University Press.
Bertucci, Marie-Madeleine. 2009. “Place de la réflexivité dans les sciences humaines et sociales : quelques jalons.” Cahiers de sociolinguistique n° 14 (1): 43–55.
Bourassa, Renée. 2018. “Design des écosystèmes numériques : des modèles éditoriaux stabilisés vers l’intégration de la conversation scientifique.” Lyon.
Boutet, Josiane. 2002. “Pratiques langagières ; Formation langagière.” In Dictionnaire d’analyse du discours, edited by Patrick Charaudeau and Dominique Maingueneau. Paris: Éditions du Seuil.
Cosnier, Jacques. 1978. “Spécificité de l’attitude éthologique dans l’étude du comportement humain.” Psychologie Française 23 (1): 19–26.
———. 1986. “Ethology : A Transdisciplinary Discipline.” In Ethology and Psychology, 19–28. Toulouse: Privat, I.E.C.
———. 1987. “L’éthologie du dialogue.” In Décrire la conversation, edited by Jacques Cosnier and Catherine Kerbrat-Orecchioni, 291–315. Linguistique et Sémiologie : travaux du Centre de recherches linguistiques et sémiologiques de l’Université de Lyon II. Lyon: Presses universitaires de Lyon.
———. 2013. “Cinquante ans d’interactionnisme : Introduction pour une éthologie compréhensive. Écrits colligés (1963-2013).”
Cosnier, Jacques, and Christine Develotte. 2011. “Éthologie compréhensive de la conversation en visioconférence poste à poste.” In Décrire la conversation en ligne : le face à face distanciel, edited by Christine Develotte, Richard Kern, and Marie-Noëlle Lamy. Lyon: ENS Éditions.
Develotte, Christine, Richard Kern, and Marie-Noëlle Lamy, eds. 2011. Décrire la conversation en ligne : le face à face distanciel. Lyon: ENS Éditions.
Dion, Delphine. 2007. “Les apports de l’anthropologie visuelle à l’étude des comportements de consommation.” Recherche et Applications en Marketing (French Edition) 22 (1): 61–78.
Goffman, Erving. 1973. La présentation de soi. La mise en scène de la vie quotidienne 1. Paris: Les Éditions de Minuit.
Hotier, Hugues. 2001. “Entretien avec Jacques Cosnier.” Communication et organisation, no. 19 (May).
Kerbrat-Orecchioni, Catherine. 1990. Les interactions verbales. Tome I. Paris: Armand Colin.
———. 2005. Le discours en interaction. Paris: Armand Colin.
MacDougall, David. 1997. “The Visual in Anthropology.” In Rethinking Visual Anthropology, edited by Marcus Banks and Howard Morphy, 276–95. New Haven: Yale University Press.
Martinez, Claudine. 1997. “L’entretien d’explicitation comme instrument de recueil de données.” Expliciter 21: 2–7.
Mouchon, Jean. 1985. “À propos de la notion de ‘paradoxe de l’observateur’ en sciences humaines.” Semen. Revue de sémio-linguistique des textes et discours, no. 2.
Pink, Sarah. 2007. Doing Visual Ethnography: Images, Media, and Representation in Research. 2nd ed. London ; Thousand Oaks, Calif: Sage Publications.
Ruby, Jay. 1996. “Visual Anthropology.” In Encyclopedia of Cultural Anthropology, edited by David Levinson and Melvin Ember, 4:1345–51. New York: H. Holt.
———. 2000. Picturing Culture: Explorations of Film & Anthropology. Chicago: University of Chicago Press.
Sacks, Harvey, Emanuel A. Schegloff, and Gail Jefferson. 1974. “A Simplest Systematics for the Organization of Turn-Taking for Conversation.” Language 50 (4): 696–735.
Traverso, Véronique. 1999. L’analyse des conversations. Collection 128 Linguistique 226. Paris: Nathan.
Vermersch, Pierre. 1994. L’entretien d’explicitation en formation initiale et en formation continue. Paris: ESF Éditeur.
Weber, Max. 1917. “Essai sur le sens de la « neutralité axiologique » dans les sciences sociologiques et économiques.” In Essais sur la théorie de la science, 2006 ed. Classiques des sciences sociales. Chicoutimi: J.-M. Tremblay.