Coordinating Joint Action
This is a slightly extended version of a published chapter. (The PDF is the chapter version.)
Introduction
It is often necessary that agents’ actions are coordinated if they are to successfully exercise shared (or ‘collective’) agency in acting together. An eloping couple clink plastic beakers of cheap wine together to toast their escape, sharing a smile of achievement; on the beach in front of them a small group of roadies are putting up a marquee outside for a concert later that evening while the musicians, having been made to wait while the audio technicians replace a cable, playfully improvise on stage. In cases like these, successfully exercising shared agency involves coordinating actions precisely in space and time. Such precise coordination is not, or not only, a matter of having intentions and knowledge, whether individual or collective. Intentions and knowledge states may play a role in long-term coordination—they may explain, for instance, why the couple’s both being on the beach tonight is no accident. But they cannot explain how the precise coordination needed to clink beakers or to share a smile is achieved. Given that it is not only intention or knowledge, what does enable two or more agents’ actions to be coordinated and so enable exercises of shared agency such as these?
Much psychological and neuroscientific research bears directly on this question. This chapter introduces that research: it outlines some of the key findings and describes a minimal theoretical framework, identifying along the way issues likely to be of interest to researchers studying collective intentionality.
Joint Action
Where philosophers tend to focus on notions such as intentional shared agency, scientific research on coordination mechanisms is usually interpreted in terms of a broader and simpler notion of joint action. This is standardly defined by appeal to Sebanz, Bekkering, and Knoblich (2006)’s working definition as:
‘any form of social interaction whereby two or more individuals coordinate their actions in space and time to bring about a change in the environment’ (Sebanz, Bekkering, and Knoblich 2006, 70).
Although widely used, this working definition has some drawbacks. It requires that joint actions should be ‘social interactions’, thereby raising tricky issues about which interactions are social. The working definition also appears to require that coordinating their actions is something the individuals involved in joint action do, perhaps even requiring that this is done with the end of bringing about a change. As we will see, there are reasons to consider the possibility that actions can be coordinated without both (or even either) of these requirements being met. We can avoid the drawbacks while remaining true to the implicit conception underlying scientific research with a simpler and extremely broad definition:
A joint action is an event grounded 1 by two or more agents’ actions.
This definition of joint action, like Sebanz, Bekkering, and Knoblich (2006)’s working definition, is neutral on representations and processes. So when two people swing their arms in synchrony, the event of them swinging their arms is a joint action. Likewise, if fish are agents then the movements of a shoal are joint actions. 2
How does research on coordination in joint action bear on the question about shared agency? Not all joint actions involve exercising shared agency, but some or all exercises of shared agency are, or involve, joint actions. It is reasonable to conjecture that what enables the actions of agents exercising collective agency to be precisely coordinated are mechanisms of coordination common to many different forms of joint action. To illustrate, consider entrainment.
Entrainment
Entrainment, the process of synchronizing two or more rhythmic behaviours with respect to phase, is a feature of everyday life. People walking side by side may fall into the same walking patterns (Ulzen et al. 2008; Nessler and Gilliland 2009), conversation partners sometimes synchronize their body sway (Shockley, Santana, and Fowler 2003) and gaze (D. C. Richardson, Dale, and Kirkham 2007), clusters of male fiddler crabs wave their claws synchronously to attract mates (Backwell et al. 1998; Merker, Madison, and Eckerdal 2009), and an audience will sometimes briefly synchronise its clapping (Néda et al. 2000).
As these examples suggest, entrainment enables the coordination of a wide range of joint actions, not all of which involve shared agency. In fact interpersonal entrainment is sometimes treated as a special case of a process by which sequences of actions can be synchronised with sequences of environmental stimuli such as a metronome (e.g. Repp and Su 2013; Konvalinka et al. 2010), and, more boldly, sometimes even as just one instance of what happens when oscillators are coupled (e.g. Shockley, Richardson, and Dale 2009, 314).
Which exercises of shared agency might entrainment enable? Entrainment allows for extremely precise coordination of movements (Repp 2000) and is probably essential for joint actions involving rhythmic music, dance, drill, and some martial arts.
How is entrainment related to agents’ intentions concerning coordination or the lack thereof? Entrainment of two or more agents’ actions can occur without any intention concerning coordination (e.g. Varlet et al. 2015), and without the agents being aware of the coordination of their actions (Michael J. Richardson, Marsh, and Schmidt 2005). Further, although subjects can sometimes intentionally prevent entrainment, entrainment and related forms of coordination do sometimes occur even despite individuals attempting not to coordinate their actions (e.g. Ulzen et al. 2008; Issartel, Marin, and Cadopi 2007). So whether two agents’ actions become entrained is not always, and perhaps not typically, something which they do or could control.
But entrainment is not always independent of agents’ intentions (Miles et al. 2010; Nessler and Gilliland 2009). Because no one can perform two actions without introducing some tiny variation between them, entrainment of any kind depends on continuous monitoring and ongoing adjustments (Repp 2005, 976). One kind of adjustment is a phase shift, which occurs when one action in a sequence is delayed or brought forwards in time. Another kind of adjustment is a period shift; that is, an increase or reduction in the speed with which all future actions are performed, or in the delay between all future adjacent pairs of actions. These two kinds of adjustment, phase shifts and period shifts, appear to be made by mechanisms acting independently, so that correcting errors involves a distinctive pattern of overadjustment. 3 Repp (2005, 987) argues, further, that while adjustments involving phase shifts are largely automatic, adjustments involving changes in frequency are to some extent controlled. This may be key to understanding the influence of intention on entrainment. One way or another (contrast Fairhurst, Janata, and Keller (2013, 2599) with Repp and Keller (2008)’s ‘coordinative strategies’ proposal), intentions play a role in frequency adjustments and thereby influence how tightly agents synchronise their actions.
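The distinction between phase shifts and period shifts can be made concrete with a toy simulation. The sketch below is not taken from the chapter or from the studies cited: it assumes a simplified, noise-free linear error-correction scheme in which a tapper follows a metronome, and the function name, the gains `alpha` (phase correction) and `beta` (period correction), and all numerical values are hypothetical choices made for illustration.

```python
def simulate_tapping(alpha, beta, metronome_period=450.0,
                     initial_period=500.0, n_taps=100):
    """Return the asynchronies (tap time minus beat time, in ms).

    alpha: gain of the phase shift applied to the next tap only (assumed).
    beta:  gain of the period shift applied to all later taps (assumed).
    """
    asynchrony = 0.0          # the tapper starts exactly on the beat
    period = initial_period   # the tapper's internal beat interval
    history = []
    for _ in range(n_taps):
        history.append(asynchrony)
        # The next tap comes one internal period later, moved by the phase
        # correction; the next beat comes one metronome period later.
        next_asynchrony = (asynchrony + period
                           - alpha * asynchrony - metronome_period)
        # The period correction changes the interval used from now on.
        period = period - beta * asynchrony
        asynchrony = next_asynchrony
    return history


if __name__ == "__main__":
    phase_only = simulate_tapping(alpha=0.5, beta=0.0)
    both = simulate_tapping(alpha=0.5, beta=0.2)
    print("final asynchrony, phase correction only:", round(phase_only[-1], 3))
    print("final asynchrony, phase and period correction:", round(both[-1], 3))
```

In this toy model a tapper who makes only phase shifts settles into a constant lag behind a metronome that is faster than her internal period, whereas adding period shifts removes the lag. Nothing here is meant to settle which adjustments are automatic and which are controlled.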
Entrainment is clearly necessary for coordination in many joint actions requiring precise synchronisation such as those involving rhythmic music or dance. Entrainment may also be important in ways as yet barely understood for a much wider range of joint actions in which such precise synchronisation initially appears unnecessary. 4 But there must be more to coordinating joint actions than entrainment. After all, entrainment depends on repetition whereas many joint actions are one-off events, as when a couple clink plastic beakers. Which forms of coordination enable one-off joint actions?
Motor Simulation
Many one-off joint actions—those which do not depend on repetition or rhythm—require precise coordination. In clinking beakers, swinging a toddler between our arms, and executing a pass in football, the window for success may be fractions of a second in duration and only millimetres wide. One way—perhaps the only way—of achieving such precise coordination depends on the existence of a phenomenon often called ‘motor simulation’ or ‘mirroring’. What is this?
To understand motor simulation it is necessary first to get a rough fix on the idea that motor processes and representations are involved in performing ordinary, individual actions. Preparing for, and performing, bodily actions involves not only intentions and practical reasoning but also motor representations and processes. To illustrate, consider a cook who has grasped an egg between her finger and thumb and is now lifting it from the egg box. She will typically grip the egg just tightly enough to secure it. But how tightly she needs to grip it depends in part, of course, on the forces to which she will subject the egg in lifting it. The fact that she grips eggs just tightly enough throughout such action sequences which vary in how she lifts the egg implies that how tightly she grips the egg depends on the path along which she will lift it. This in turn indicates (along with much other evidence) that information about her anticipated future hand and arm movements appropriately influences how tightly the cook initially grips the egg (Kawato 1999). This fine-grained, anticipatory control of grasp, like many other features of action performance (see Rosenbaum 2009, chap. 1, for more examples), is not plausibly a consequence of mindless physiology, nor of intention and practical reasoning. The processes and representations it depends on are motoric.
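Why the required grip depends on the planned lift can be illustrated with elementary mechanics. The sketch below is an assumption-laden idealisation rather than anything reported in the work cited: it treats the egg as a rigid object held between two opposing digits, lifted straight up, with friction as the only thing preventing slip. The function name, the egg's mass and the friction coefficient are all hypothetical.

```python
G = 9.81  # gravitational acceleration in m/s^2


def min_grip_force(mass_kg, upward_accel, friction_coeff):
    """Smallest force (N) each digit must apply to stop the object slipping.

    Two contact surfaces each supply at most friction_coeff * grip force,
    and together they must support the weight plus the force needed to
    accelerate the object upwards.
    """
    return mass_kg * (G + upward_accel) / (2.0 * friction_coeff)


if __name__ == "__main__":
    egg_mass = 0.06   # kg, assumed
    friction = 0.5    # assumed skin-shell friction coefficient
    for accel in (0.0, 2.0, 5.0):   # holding still, gentle lift, brisk lift
        force = min_grip_force(egg_mass, accel, friction)
        print(f"upward acceleration {accel} m/s^2: at least {force:.3f} N")
```

Because the minimum grip force grows with the acceleration of the lift, gripping 'just tightly enough' at the outset requires information about the movement still to come, which is the point of the egg example.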
Motor processes and representations lead a double life: they occur not only in performing actions but also observing them. For instance, in someone observing the cook gripping and lifting the egg, there may be motor processes and representations related to those which would occur in her if she, the observer, were performing this action herself. One dramatic piece of evidence for this claim comes from a study in which activity in an observer’s motor cortex was artificially boosted with transcranial magnetic stimulation (TMS). This caused minute patterns of activation (specifically, motor-evoked potentials) to occur in a muscle of the observer at just the times the agent being observed used the corresponding muscle. 5 As this illustrates, motor processes in an observer can carry detailed information about the timing of components of actions. Motor simulation is the occurrence of motor processes and representations in an observer concerning an action which she is observing or imagining and which are driven by observing or imagining that action.
Motor simulation enables observers to anticipate how others’ actions will unfold and the likely outcomes the actions will achieve (Wolpert, Doya, and Kawato 2003; Wilson and Knoblich 2005). Such anticipation is reflected both in explicit judgements (e.g. Aglioti et al. 2008) and in spontaneous eye movements (e.g. Flanagan and Johansson 2003; Rotman et al. 2006; Costantini et al. 2014; Ambrosini, Costantini, and Sinigaglia 2011).
How does any of this bear on the coordination of joint action? If motor simulation is to play a role in coordinating joint actions, agents must be capable of using anticipation based on motor simulation in preparing and performing actions different from those simulated. Accordingly, Kourtis, Sebanz, and Knoblich (2013) used neural markers of motor activity to show that motor simulation can occur in joint action even where agents are performing different actions in close succession. To investigate, further, whether motor simulation in joint action can facilitate coordination, Vesper et al. (2013) instructed pairs of people to jump and land at the same time. They found evidence that, in some subjects, a motor simulation of the partner’s jump influenced how the subject herself jumped and so enabled precise coordination in landing together. This is one example of how motor simulation may enable coordination in joint action. 6
Reflecting on entrainment and coordination driven by motor simulation, it is striking that one-off motor simulation allows greater flexibility at the cost of some precision. Is there a more general trade-off between flexibility and precision in mechanisms underpinning coordination? If so, what might this tell us about the nature of mechanisms underpinning coordination for joint action and their relations to each other?
Flexibility vs Precision
Consider two ways of partially ordering mechanisms underpinning coordination. The first is precision: How precise, in space and time, is the coordination they underpin in the best cases? For instance, mechanisms underpinning entrainment enable expert musicians to coordinate their actions to within tens of milliseconds, whereas one-off motor simulation permits coordination of actions to within larger fractions of a second. A second partial ordering is flexibility: How wide is the range of situations in which this mechanism can underpin coordination? For instance, motor simulation can underpin coordination whether or not repetition or rhythm is involved, unlike entrainment. Thinking just about motor simulation and mechanisms underpinning entrainment, there appears to be a trade-off between precision and flexibility. This appears to generalise to other forms of coordination too, such as forms of coordination driven by shared intention. Gains in flexibility seem to come at the cost of precision.
Why? Before attempting to answer this question, it is useful to fix terminology with some stipulations. A goal of an action or behaviour is an outcome to which it is directed. Relative to a particular action or behaviour, goals can be partially ordered by the means-end relation. In saying that one goal is more abstract than another relative to a behaviour or action, I shall mean that the latter is linked to the former by a chain of outcomes ordered as means to ends. A goal-state is a mental state (or a structure of mental states) which represents, or otherwise specifies, an outcome and is the kind of thing in virtue of which some actions or behaviours can be directed to certain outcomes. Given that intentions are mental states, they are paradigmatic goal-states. But intentions are not the only goal-states: as we saw in the section on motor simulation, some motor representations are also goal-states. 7
So why might flexibility in a mechanism underpinning coordination come at the cost of precision? One possibility involves two conjectures. First, achieving flexibility generally depends on representing goals, and the more abstract the goals that can be represented, the greater the flexibility. To illustrate, entrainment can occur without any representations of goals at all, whereas motor simulation involves motor representations which are goal-states. But relative to intentions or knowledge states, motor representations are limited with respect to how abstract the outcomes they can specify are. Motor representations can specify outcomes such as grasping or transporting a fragile object, and even sequences of such outcomes (see, e.g., Fogassi et al. 2005). But they cannot specify outcomes such as selecting an organic egg or testing for freshness: motor processes and representations are mostly blind to things so distantly related to bodily action. A further conjecture is that processes involving more abstract goal representations typically (but not necessarily always) place greater demands on cognitive resources, which typically (but not necessarily always) results in lower precision. This conjecture is suggested by an analogy with the physiological. Because physiological processes are a source of variability, coordinating with a given degree of precision should get harder as the duration and complexity of the actions to be coordinated increase. Given that cognitive processes, like physiological processes, are a source of variability, increasing cognitive demands by relying on representations of more abstract goals should likewise increase variability and so limit precision.
In short, flexibility may come at the cost of precision because increasing flexibility requires representations of more abstract goals, which impose greater cognitive demands and thereby increase variability, so reducing how precise the coordination underpinned by a mechanism can be in the best cases. This may be why forms of coordination such as entrainment and motor representation can occur independently of, and even contrary to, intentions concerning coordination: precision requires such independence.
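The two partial orderings introduced in this section, and the claim that they trade off against each other, can be sketched in a few lines. The ranks below are purely ordinal and are assumptions made for illustration: they encode only the comparisons stated in the text (entrainment is most precise and least flexible, coordination driven by shared intention is the reverse, and motor simulation falls between them). The names `MECHANISMS`, `dominates` and `is_tradeoff` are hypothetical.

```python
from itertools import combinations

# Ordinal ranks only (higher means more); illustrative, not measured values.
MECHANISMS = {
    "entrainment": {"precision": 3, "flexibility": 1},
    "motor simulation": {"precision": 2, "flexibility": 2},
    "shared intention": {"precision": 1, "flexibility": 3},
}


def dominates(a, b):
    """True if a is at least as good as b on every dimension, better on one."""
    at_least_as_good = all(a[key] >= b[key] for key in a)
    strictly_better = any(a[key] > b[key] for key in a)
    return at_least_as_good and strictly_better


def is_tradeoff(mechanisms):
    """True if no mechanism dominates any other on both orderings."""
    return not any(
        dominates(mechanisms[x], mechanisms[y])
        or dominates(mechanisms[y], mechanisms[x])
        for x, y in combinations(mechanisms, 2)
    )


if __name__ == "__main__":
    print("trade-off among mechanisms:", is_tradeoff(MECHANISMS))
```

On this way of putting it, a trade-off holds just when no single mechanism beats another on both orderings; a hypothetical mechanism ranked highest on both would break it.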
Thinking about trading precision for flexibility suggests that there is a gap in the forms of coordination so far considered. To see why, consider the situation of a couple alone on a beach. Having filled plastic beakers with wine, they spontaneously and fluidly clink them together in a toast without spilling a drop of wine. To explain how they are able to coordinate so precisely we cannot appeal to motor simulation alone; but it would be no more plausible to appeal only to practical deliberation involving intentions or other propositional attitudes. We need something more flexible than motor simulation and more precise than practical deliberation.
Task Co-representation
Consider individual agents acting alone for a moment. A task representation links an event to an outcome in such a way that, normally, the event’s actual or expected occurrence would trigger motor preparation for actions that should realise the outcome. Why do we need task representations? Imagine yourself cycling up to a crossroad. Even if you are concentrating hard on dodging potholes without being hit by the rapidly approaching car behind you (will it slow down or should you risk going through this hole?), it is likely—hopefully—that the traffic light’s turning red will cause you to brake. The connection between red light events and braking actions need not require intentional control, thanks to task representations.
How is task representation relevant to coordinating joint actions? Let us say that two individuals have a task co-representation if there is a task concerning which each has a task representation. 8 Sebanz, Bekkering, and Knoblich (2006) argue that the agents of a joint action can have a task co-representation concerning a task which only one of them is actually supposed to perform. This, they suggest, would enable agents to exploit motor simulation prior to, and independently of, observing any actual actions. Thus task co-representation could in principle greatly extend the range of situations in which motor simulation could underpin coordination in joint action. To illustrate, consider again the couple on the beach filling beakers with wine and then clinking them together. As noted earlier (in the section on flexibility vs precision), their doing this spontaneously, fluidly and with precision could not be explained by motor simulation alone when neither of them plays the role of leader. But it could be explained by Sebanz et al.’s proposal about task co-representation. If the couple expect to clink beakers after the wine is poured and have task co-representations concerning each’s task in the clinking, then they will be able to use motor simulation to anticipate each other’s actions in advance of starting to act. This is one illustration of how task co-representation might underpin coordination for one-off joint actions where agents have to respond to events in ways they have never done before.
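The definitions of task representation and task co-representation can be rendered as a small data structure. This is a minimal sketch under stated assumptions, not a model from the literature: a task is idealised as an event-outcome link with a designated performer, each agent's task representations are idealised as a set of tasks, and every name (`Task`, `co_represented_tasks`, `prepared_outcomes`, the agents `ana` and `ben`) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    event: str       # the triggering event
    outcome: str     # the outcome the triggered action should realise
    performer: str   # who is actually supposed to perform the task


def co_represented_tasks(reps_a, reps_b):
    """Tasks concerning which both individuals have a task representation."""
    return set(reps_a) & set(reps_b)


def prepared_outcomes(reps, event):
    """Outcomes an agent prepares for when the event occurs or is expected.

    Preparation is triggered for every represented task linked to the
    event, whoever is supposed to perform it.
    """
    return {task.outcome for task in reps if task.event == event}


if __name__ == "__main__":
    raise_a = Task("wine poured", "ana's beaker raised to meet ben's", "ana")
    raise_b = Task("wine poured", "ben's beaker raised to meet ana's", "ben")
    ana_reps = {raise_a, raise_b}
    ben_reps = {raise_a, raise_b}
    print(co_represented_tasks(ana_reps, ben_reps))
    print(prepared_outcomes(ana_reps, "wine poured"))
```

In the sketch the event triggers preparation in each agent for both tasks, including the one only the other will perform, which is what would allow anticipation in advance of observing any action.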
The task co-representation hypothesis—agents involved in a joint action can have a task co-representation concerning a task that only one of them is supposed to perform—generates a variety of predictions. It predicts interference and facilitation effects: when acting together with another, your performance of your task will be affected by facts about which task the other is performing, and your performance will be impaired or enhanced in ways analogous to those in which it would be affected if you were performing both tasks alone. This prediction has been confirmed for a variety of tasks (Sebanz, Knoblich, and Prinz 2005; Atmaca, Sebanz, and Knoblich 2011; Böckler, Knoblich, and Sebanz 2012; Wel and Fu 2015). The task co-representation hypothesis also predicts that, in some situations when you are acting with another, events linked to the other’s task will trigger some preparation (but not necessarily full preparation) in you for a task which is actually supposed to be performed by the other. Evidence in support of this prediction includes signs that agents of a joint action sometimes inhibit tendencies to act when another, rather than she herself, is supposed to act (Sebanz et al. 2006; C.-C. Tsai et al. 2008), as well as signs that agents of a joint action are sometimes preparing for, or even covertly performing, actions that another is supposed to perform (e.g. Kourtis, Sebanz, and Knoblich 2013; Baus et al. 2014). 9
Task co-representation is valuable in coordinating joint actions at least in part because it is more flexible than bare motor simulation while also more precise than practical reasoning. But there is a limit to what can be explained with either motor simulation or task co-representation, at least as we have conceived of them so far. Suppose motor simulation (whether or not triggered by a task co-representation) enables agents of a joint action to anticipate each other’s actions. How could these anticipations inform preparation for their own actions, and, in particular, how could they do so without requiring cognitive processes inimical to precision? To offer even a candidate answer to this question requires going beyond motor simulation and task co-representation as we have so far conceived them.
Emergent vs Planned Coordination
In thinking about coordination for joint action it is useful to have plural counterparts of the notions of goal and goal-state introduced earlier (in the section on flexibility vs precision). To say of an outcome that it is a collective goal of some actions or behaviours is to say that they are collectively directed to this outcome—that is, they are directed to this outcome and their being so directed is not, or not only, a matter of each action or behaviour being individually directed to that outcome. This is a broad notion: raising a brood can be a collective goal of some eusocial insects’ behaviours, 10 and repairing a broken fence can be a collective goal of some neighbours’ actions. A collective goal-state is a mental state or, more likely, a structure of mental states, which specifies an outcome and is the kind of thing in virtue of which some pluralities of actions or behaviours can be collectively directed to certain outcomes. Bratman’s account of shared intention aims to describe one kind of collective goal-state (Bratman 1993).
Following Knoblich, Butterfill, and Sebanz (2011), we can distinguish between emergent and planned coordination. Planned coordination is coordination driven by a collective goal-state, 11 whereas emergent coordination is coordination not so driven. Planned coordination is familiar from philosophical discussions of shared intention, one of the functions of which is to coordinate agents’ actions (Bratman 1993, 99). By contrast, all the forms of coordination discussed in this chapter so far—entrainment as well as coordination driven by action and task co-representations—are naturally thought of as forms of emergent coordination insofar as it seems they could occur independently of the agents having any collective goal-state. 12 But there is also a growing body of evidence about the existence of planned coordination for joint action.
Collective Goal-States
Two pianists are producing tones in the course of playing a duet. Consider one of the pianists. There is an outcome to which her action is directed, the production of a tone or melody; and there is an outcome to which her and her partner’s actions are collectively directed, the production of a combination of pitches or harmony. Do dueting pianists represent collective goals, that is, outcomes to which their actions are collectively directed?
One way to investigate this question involves covertly introducing errors. Janeen D. Loehr et al. (2013) contrasted two kinds of error: those which were errors relative to the goal of an individual pianist’s actions (the pitch) but not relative to the collective goal of the two pianists’ actions (the harmony); and those which were errors relative to both. They found neural signatures for both kinds of errors in expert pianists. This is evidence that dueting pianists do indeed represent collective goals. A further study indicates that these collective goals are represented motorically (Janeen D. Loehr and Vesper 2015). 13
How might motor representations of collective goals underpin coordination for joint action? One possible answer is suggested by Gallotti and Frith (2013) who propose that a ‘we-mode’ is required. They explain:
‘The central idea of the we-mode is that interacting agents share their minds by representing their contributions to the joint action as contributions to something that they are going to pursue together, as a ‘we’. […] To represent things in the we-mode is for interacting individuals to have the content of their individual actions specified by representing aspects of the interactive scene in a distinct psychological attitude of intending-together, believing-together, desiring-together, etc’ (Gallotti and Frith 2013, 163).
An alternative possible answer is suggested by what Vesper et al. (2010) call a ‘minimal architecture for joint action’. They propose to start by attempting to characterise joint action and its coordination without postulating distinct psychological attitudes and without invoking representations of interacting agents as comprising a ‘we’. Instead their proposal is that some or all of the representations underpinning coordination for joint action are ordinary motor representations, task representations and other representations that are also involved in the coordination of ordinary, individual action. Relatedly, in at least some cases, coordination is driven by representations which are agent-neutral, that is, which do not specify any particular agent or agents. This proposal is consistent with theories about the roles of motor simulation and task co-representation in coordinating joint action (see the sections on motor simulation and task co-representation): anticipating another’s actions and their effects appears to involve much the same agent-neutral motor and task representations which would be involved if one were actually performing those actions oneself. Of course, motor and task representations concerning actions others will eventually perform must ultimately have effects different from those concerning actions the agent will perform; but this is necessary for both observation and joint action and need not involve a novel kind of attitude.
But how, given Vesper et al. (2010)’s ‘minimal architecture’ proposal, could motor representations of collective goals underpin coordination for joint action? In each agent of a joint action, the motor representations of collective goals trigger preparation for action in just the way any motor representations do. This has the effect that each agent is preparing to perform all of the actions comprising a joint action, although not necessarily in much detail (compare Janeen D. Loehr and Vesper 2015). Now this may appear wasteful given that each agent will only perform a subset of the actions prepared for. But it is not. One agent’s preparing (to some extent) to perform all of the actions that will comprise a joint action ensures that the resulting motor plan for her actions will be constrained by her motor plan for the others’ actions. And, given that she is sufficiently similar to the others and that the possibilities for action are sufficiently constrained in their situation, her motor plan for the others’ actions will reliably match their motor plans for their actions. So one agent’s preparing to perform all of the actions has the effect that her motor plan for her actions is indirectly constrained by the others’ motor plans for their actions. In this way, motor representations of collective goals could in principle underpin coordination for joint action by enabling agents to meet relational constraints on their actions (see further Butterfill 2016).
The conjecture that motor representations of collective goals underpin coordination for joint action provides one response to a question raised at the end of the section on task co-representation. The question was how anticipations concerning another’s actions arising from motor simulation (whether bare motor simulation or occurring as a consequence of task co-representation) feed into preparing and monitoring your own actions. When coordination depends on motor representations of collective goals, the presupposition this question makes is incorrect. There are not two processes but one. Anticipation of another’s actions and preparation for your own are not two separate things. They are parts of a single process in the same sense that, in preparing to perform a bimanual action, preparation for the actions to be performed by the left hand and anticipation of the movements of the right hand are parts of a single process. So where motor simulation and task co-representation involve collective goals to which a joint action is directed, motor processes themselves can ensure the integration of anticipations concerning another’s actions with preparation for your own.
This is not quite the end of the story about collective goals. Research on perceiving joint affordances points to a second way in which motor representations of collective goals may underpin coordination in joint action.
Joint Affordances
A joint affordance is an affordance for the agents of a joint action collectively—that is, it is an affordance for these agents and this is not, or not only, a matter of its being an affordance for any of the individual agents. Perceiving (or otherwise detecting) joint affordances is critical for many mundane joint actions such as appropriately gripping objects and applying the right force in moving them together, and crossing a busy road while holding hands. It is possible that motor representations of collective goals enable the agents of some joint actions to perceive joint affordances, or so I will suggest in this section. 14 But first, what grounds are there for supposing that joint affordances even exist?
Doerrfeld, Sebanz, and Shiffrar (2012, 474) argue that ‘the joint action abilities of a group shape the individual perception of its members.’ In their experiment, perceptual judgements of weight were affected by whether the perceiver was about to lift the box alone or with another. Others have investigated different situations in which performing actions independently or as part of a joint action can affect how you perceive affordances. For instance, consider two individuals walking through a doorway. How wide must the doorway be for them to walk through it without rotating their shoulders? Davis et al. (2010, Experiment 1) show that the answer cannot be obtained simply by adding the minimum widths for each individual, and (in Experiments 2–4) that people can perceive whether doorway-like openings will allow a particular pair of walkers to pass through comfortably. 15 Importantly, people can perceive joint affordances for walkers not only when they are one of those walking but also when they are merely observing others walking together (Davis et al. 2010, Experiment 4). This suggests that the perceptual capacity does not depend on the perceiver’s own current possibilities for action. So what makes perception of joint affordances possible?
Consider the conjecture that joint affordances are perceived as a consequence of motor simulation (this is one of two possibilities discussed by Doerrfeld, Sebanz, and Shiffrar 2012). This conjecture is made plausible by independent evidence for two hypotheses. First, motor representations can modulate perceptual experience; for instance, how an event is represented motorically can affect how a pair of tones are perceived with respect to pitch (Repp and Knoblich 2007, 2009; for discussion, see Sinigaglia and Butterfill 2015). Second, perceiving another’s affordance involves motor activity (Cardellicchio, Sinigaglia, and Costantini 2012). These two findings make it plausible that, in general, perceiving some affordances is facilitated or even enabled by motor simulation. The findings just discussed suggest that the same may be true for joint affordances, that is, affordances for agents involved in one or another kind of joint action. But of course this is possible only given that there are motor representations of collective goals. After all, perceiving joint affordances requires motor simulation concerning the joint action, which would be triggered by a motor representation of a collective goal of the actions grounding the joint action; merely having separate motor simulations of each agent’s actions could not underpin the identification of a joint affordance. This is why motor representations of collective goals may facilitate coordination in joint actions not only by enabling the agents to meet relational constraints on their actions (see the section on collective goal-states) but also by enabling them to perceive joint affordances.
Conclusion
What forms of coordination for joint action enable humans to exercise shared agency in doing things such as clinking beakers, sharing smiles, erecting marquees, or producing rhythmic music? We have seen that there is much diversity. Coordination for joint action includes not only emergent varieties such as entrainment (see section 3) as well as the forms underpinned by motor simulation (see section 4) and task co-representation (see section 6), but also planned coordination underpinned by motor representations of collective goals (see section 8). 16
This diversity in forms of coordination may exist in part because of a trade-off between flexibility and precision for individual mechanisms underpinning coordination (see section 5). Having multiple mechanisms is useful partly because each makes a different trade-off between flexibility and precision.
Many exercises of shared agency appear to require both flexibility and extremely precise coordination. Improvising musicians ideally achieve temporal synchrony without becoming enslaved to a rhythm. How is this possible? Exercises of shared agency can depend on multiple forms of coordination, of course. Individual mechanisms underpinning coordination may be constrained by the precision–flexibility trade-off, but this constraint does not apply to a diversity of mechanisms considered in aggregate. So there is no theoretical obstacle to relying on highly flexible mechanisms yet achieving extremely precise coordination. This requires only that diverse mechanisms can have synergistic effects on coordination.
Just here we encounter the synergy challenge. Achieving precise coordination in space and time probably demands that mechanisms underpinning different forms of coordination are to a significant degree independent of each other (see section 5). Yet acting flexibly requires that the different mechanisms sometimes nonaccidentally operate synergistically: the shared intention, the task co-representation, and the motor representation of the collective goal cannot all be pulling in different directions. The challenge is to understand how, in some situations, mechanisms which underpin different forms of coordination and which are driven by largely independent representational structures can nevertheless nonaccidentally have synergistic effects. Meeting this challenge may require attention to differences between novices and experts, to why practice is sometimes necessary, to the effects of common knowledge on moment-by-moment coordination (see, for example, D. C. Richardson, Dale, and Kirkham 2007), and to phenomenal aspects of coordination (as Keller, Novembre, and Hove (2014) hint), among other things. The synergy challenge is currently a significant obstacle to progress in understanding how high degrees of flexibility and precision can be combined in the coordination of joint actions.
Another issue likely to demand future research concerns which, if any, forms of coordination require postulating novel kinds of representations or processes specific to shared agency (see section 8). Although scientists sometimes adopt terms from philosophical discussions of collective intentionality such as ‘shared’ and ‘we-’ representations, the discoveries about the representations and processes underpinning coordination reviewed in this chapter do not require representations to be shared other than in the sense in which barrel organ aficionados share a taste in music.
One theme in this chapter was that much coordination of joint action appears to involve not fully distinguishing others’ actions from your own. Take motor simulation, task co-representation and motor representation of collective goals. In each case, coordination involves motor or task representations of actions, tasks or goals that relate primarily to another’s part in the joint action. This is not a matter of representing another’s goals or plans as an observer: it is a matter of preparing actions and representing tasks she will perform in ways that would also be appropriate if it were you, not her, who was about to perform them. To a limited but significant extent, then, coordination involves representing both another’s actions and your own in ways that give them equal status as parts of a single activity. The existence of such a perspective on the actions grounding a joint action might just turn out to matter not only for coordination but also for other aspects of collective intentionality such as commitment and cooperation. 17
1. Events D1, ... Dn ground E just if: D1, ... Dn and E occur; D1, ... Dn are each part of E; and every event that is a part of E but does not overlap D1, ... Dn is caused by some or all of D1, ... Dn. (This is a generalisation of the notion specified by Pietroski (1998).)
2. Note that what follows is neutral on whether joint actions are actions. As a terminological stipulation, I shall say that an individual is an agent of a joint action just if she is an agent of an action which, together with some other events, grounds this joint action. (Depending on your views about events, causation and agents, getting some edge cases right may require adding that for this individual to be an agent of this joint action, this particular plurality of grounding events, that is, her action and the other events, must include actions with agents other than her.)
3. See Schulze, Cordes, and Vorberg (2005, 474–76). Keller, Novembre, and Hove (2014) suggest, further, that the two kinds of adjustment involve different brain networks. Note that this view is currently controversial: Janeen D. Loehr and Palmer (2011) could be interpreted as providing evidence for a different account of how entrainment is maintained.
4. See, for example, D. C. Richardson and Dale (2005). For relatively speculative discussions, see D. Richardson, Schockley, and Kevin (2008); Merker, Madison, and Eckerdal (2009); and Keller, Novembre, and Hove (2014, sec. 4).
5. Gangitano, Mottaghy, and Pascual-Leone (2001); see further Fadiga, Craighero, and Olivier (2005) and Ambrosini, Sinigaglia, and Costantini (2012). For a review of evidence that, when observing an action, motor processes and representations occur in the observer like those which would occur if she were performing an action of the kind observed rather than merely observing it, see Rizzolatti and Sinigaglia (2010).
6. For evidence that motor simulation also enables coordination in musical performances, see Keller, Knoblich, and Repp (2007); Janeen D. Loehr and Palmer (2011); and Novembre et al. (2014). For evidence on development, see the investigation by Meyer et al. (2011) of motor processes and coordination in three-year-old children.
7. For more detailed arguments that some motor representations are goal-states, see Prinz (1997, 143–46), Pacherie (2008) and Butterfill and Sinigaglia (2014).
8. This definition needs refining in various ways not directly relevant to the present discussion.
9. Wenke et al. (2011) and Dolk et al. (2011, 2014) have defended hypotheses which, if true, would enable some of the evidence for these predictions to be explained without accepting the task co-representation hypothesis.
10. The insects’ behaviours cannot be regarded as directed to raising a brood just in virtue of each individual insect behaviour being so directed because there is (typically, at least) a division of labour.
11. Note that, despite the name, planned coordination does not by definition involve planning.
12. Some forms of entrainment are probably a hybrid of emergent and planned coordination since, as we saw in section 3, the precision with which entrained actions are synchronised can be influenced by the agents’ intentions concerning coordination and therefore probably also by collective goal-states.
13. Further evidence for motor representations of collective goals is provided by J. C.-C. Tsai, Sebanz, and Knoblich (2011); Ramenzoni, Sebanz, and Knoblich (2014); Ménoret et al. (2014); and Meyer, Wel, and Hunnius (2013).
14. The notion of a collective goal was introduced in section 7; evidence for the existence of motor representations of collective goals was discussed in section 8.
15. See Michael J. Richardson, Marsh, and Baron (2007) for a further study involving jointly lifting planks.
16. This is not a comprehensive list. Relevant reviews include Knoblich, Butterfill, and Sebanz (2011); Keller, Novembre, and Hove (2014); and Marsh, Richardson, and Schmidt (2009).
17. ACKNOWLEDGMENTS. I have benefitted immeasurably from extended collaborations with Natalie Sebanz, Guenther Knoblich and Corrado Sinigaglia as well as from shorter (so far) collaborations with Cordula Vesper and Lincoln Colling. I am also indebted to many people for discussion. Thank you!
BIOGRAPHICAL NOTE. Stephen Butterfill researches and teaches on joint action, mindreading and other philosophical issues in cognitive science at the University of Warwick (UK).
Coordinating Joint Action
Stephen A. Butterfill
Introduction
It is often necessary that agents' actions are coordinated if they are to successfully exercise shared (or 'collective') agency in acting together. An eloping couple clink plastic beakers of cheap wine together to toast their escape, sharing a smile of achievement; on the beach in front of them a small group of roadies are putting up a marquee outside for a concert later that evening while the musicians, having been made to wait while the audio technicians replace a cable, playfully improvise on stage. In cases like these, successfully exercising shared agency involves coordinating actions precisely in space and time. Such precise coordination is not, or not only, a matter of having intentions and knowledge, whether individual or collective. Intentions and knowledge states may play a role in long-term coordination; they may explain, for instance, why the couple's both being on the beach tonight is no accident. But they cannot explain how the precise coordination needed to clink beakers or to share a smile is achieved. Given that it is not only intention or knowledge, what does enable two or more agents' actions to be coordinated and so enables exercises of shared agency such as these?
Much psychological and neuroscientific research bears directly on this question. This chapter introduces that research: it outlines some of the key findings and describes a minimal theoretical framework, identifying along the way issues likely to be of interest to researchers studying collective intentionality.
Joint Action
Where philosophers tend to focus on notions such as intentional shared agency, scientific research on coordination mechanisms is usually interpreted in terms of a broader and simpler notion of joint action. This is standardly defined, following the working definition of Sebanz, Bekkering, and Knoblich (2006), as:
'any form of social interaction whereby two or more individuals coordinate their actions in space and time to bring about a change in the environment' (Sebanz, Bekkering, and Knoblich 2006, 70).
Although widely used, this working definition has some drawbacks. It requires that joint actions should be 'social interactions', thereby raising tricky issues about which interactions are social. The working definition also appears to require that coordinating their actions is something the individuals involved in joint action do, perhaps even requiring that this is done with the end of bringing about a change. As we will see, there are reasons to consider the possibility that actions can be coordinated without both (or even either) of these requirements being met. We can avoid the drawbacks while remaining true to the implicit conception underlying scientific research with a simpler and extremely broad definition:
A joint action is an event grounded 1 by two or more agents' actions.
This definition of joint action, like the working definition of Sebanz, Bekkering, and Knoblich (2006), is neutral on representations and processes. So when two people swing their arms in synchrony, the event of them swinging their arms is a joint action. Likewise, if fish are agents then the movements of a shoal are joint actions. 2
How does research on coordination in joint action bear on the question about shared agency? Not all joint actions involve exercising shared agency, but some or all exercises of shared agency are, or involve, joint actions. It is reasonable to conjecture that what enables the actions of agents exercising collective agency to be precisely coordinated are mechanisms of coordination common to many different forms of joint action. To illustrate, consider entrainment.
Entrainment
Entrainment, the process of synchronising two or more rhythmic behaviours with respect to phase, is a feature of everyday life. People walking side by side may fall into the same walking patterns (Ulzen et al. 2008; Nessler and Gilliland 2009), conversation partners sometimes synchronise their body sway (Shockley, Santana, and Fowler 2003) and gaze (D. C. Richardson, Dale, and Kirkham 2007), clusters of male fiddler crabs wave their claws synchronously to attract mates (Backwell et al. 1998; Merker, Madison, and Eckerdal 2009), and an audience will sometimes briefly synchronise its clapping (Néda et al. 2000).
As these examples suggest, entrainment enables the coordination of a wide range of joint actions, not all of which involve shared agency. In fact interpersonal entrainment is sometimes treated as a special case of a process by which sequences of actions can be synchronised with sequences of environmental stimuli such as a metronome (e.g. Repp and Su 2013; Konvalinka et al. 2010), and, more boldly, sometimes even as just one instance of what happens when oscillators are coupled (e.g. Shockley, Richardson, and Dale 2009, 314).
Which exercises of shared agency might entrainment enable? Entrainment allows for extremely precise coordination of movements (Repp 2000) and is probably essential for joint actions involving rhythmic music, dance, drill, and some martial arts.
How is entrainment related to agents' intentions concerning coordination or the lack thereof? Entrainment of two or more agents' actions can occur without any intention concerning coordination (e.g. Varlet et al. 2015), and without the agents being aware of the coordination of their actions (Michael J. Richardson, Marsh, and Schmidt 2005). Further, although subjects can sometimes intentionally prevent entrainment, entrainment and related forms of coordination do sometimes occur even despite individuals attempting not to coordinate their actions (e.g. Ulzen et al. 2008; Issartel, Marin, and Cadopi 2007). So whether two agents' actions become entrained is not always, and perhaps not typically, something which they do or could control.
But entrainment is not always independent of agents' intentions (Miles et al. 2010; Nessler and Gilliland 2009). Because no one can perform two actions without introducing some tiny variation between them, entrainment of any kind depends on continuous monitoring and ongoing adjustments (Repp 2005, 976). One kind of adjustment is a phase shift, which occurs when one action in a sequence is delayed or brought forwards in time. Another kind of adjustment is a period shift; that is, an increase or reduction in the speed with which all future actions are performed, or in the delay between all future adjacent pairs of actions. These two kinds of adjustment, phase shifts and period shifts, appear to be made by mechanisms acting independently, so that correcting errors involves a distinctive pattern of overadjustment. 3 Repp (2005, 987) argues, further, that while adjustments involving phase shifts are largely automatic, adjustments involving changes in frequency are to some extent controlled. This may be key to understanding the influence of intention on entrainment. One way or another (contrast Fairhurst, Janata, and Keller (2013, 2599) with Repp and Keller (2008)'s 'coordinative strategies' proposal), intentions play a role in frequency adjustments and thereby influence how tightly agents synchronise their actions.
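The difference between the two kinds of adjustment can be made concrete with a toy simulation. What follows is only a minimal sketch, assuming a simple linear error-correction scheme in which a tapper synchronises with a perfectly regular metronome and there is no noise. The function name `simulate_tapping`, the gains `alpha` (phase correction) and `beta` (period correction), and all numerical values are illustrative assumptions; they are not taken from, and should not be read as, the models tested in the studies cited above.

```python
def simulate_tapping(metronome_period, n_taps, alpha, beta,
                     initial_period, initial_asynchrony=0.0):
    """Return the asynchrony (tap time minus beat time, in ms) at each tap.

    alpha: hypothetical phase-correction gain; it shifts only the next tap.
    beta:  hypothetical period-correction gain; it changes the tapper's
           internal interval and so affects all future taps.
    """
    period = initial_period          # tapper's current internal interval
    asynchrony = initial_asynchrony  # positive means the tap came late
    asynchronies = [asynchrony]
    for _ in range(n_taps - 1):
        # Period shift: a lasting change to the interval between taps.
        period = period - beta * asynchrony
        # Phase shift: a one-off advance or delay of the next tap, added to
        # the drift caused by any remaining mismatch between the intervals.
        asynchrony = (1.0 - alpha) * asynchrony + (period - metronome_period)
        asynchronies.append(asynchrony)
    return asynchronies


# A tapper whose internal interval (520 ms) is too slow for a 500 ms metronome.
phase_only = simulate_tapping(500.0, 60, alpha=0.5, beta=0.0, initial_period=520.0)
phase_and_period = simulate_tapping(500.0, 60, alpha=0.5, beta=0.1, initial_period=520.0)
print(round(phase_only[-1], 3), round(phase_and_period[-1], 3))
```

Under these assumptions, phase correction alone never removes the mismatch between the tapper's interval and the metronome's, so the tapper settles into a constant lag (40 ms with the values above). Adding period correction lets the asynchrony decay towards zero. This illustrates why a change in the period, unlike a phase shift, affects how tightly all future actions are synchronised.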
Entrainment is clearly necessary for coordination in many joint actions requiring precise synchronisation such as those involving rhythmic music or dance. Entrainment may also be important in ways as yet barely understood for a much wider range of joint actions in which such precise synchronisation initially appears unnecessary. 4 But there must be more to coordinating joint actions than entrainment. After all, entrainment depends on repetition whereas many joint actions are one-off events, as when a couple clink plastic beakers. Which forms of coordination enable one-off joint actions?
Motor Simulation
Many one-off joint actions (those which do not depend on repetition or rhythm) require precise coordination. In clinking beakers, swinging a toddler between our arms, and executing a pass in football, the window for success may be fractions of a second in duration and only millimetres wide. One way, perhaps the only way, of achieving such precise coordination depends on the existence of a phenomenon often called 'motor simulation' or 'mirroring'. What is this?
To understand motor simulation it is necessary first to get a rough fix on the idea that motor processes and representations are involved in performing ordinary, individual actions. Preparing for, and performing, bodily actions involves not only intentions and practical reasoning but also motor representations and processes. To illustrate, consider a cook who has grasped an egg between her finger and thumb and is now lifting it from the egg box. She will typically grip the egg just tightly enough to secure it. But how tightly she needs to grip it depends in part, of course, on the forces to which she will subject the egg in lifting it. The fact that she grips eggs just tightly enough throughout such action sequences which vary in how she lifts the egg implies that how tightly she grips the egg depends on the path along which she will lift it. This in turn indicates (along with much other evidence) that information about her anticipated future hand and arm movements appropriately influences how tightly the cook initially grips the egg (Kawato 1999). This fine-grained, anticipatory control of grasp, like many other features of action performance (see Rosenbaum 2009, chap. 1, for more examples), is not plausibly a consequence of mindless physiology, nor of intention and practical reasoning. The processes and representations it depends on are motoric.
Motor processes and representations lead a double life: they occur not only in performing actions but also in observing them. For instance, in someone observing the cook gripping and lifting the egg, there may be motor processes and representations related to those which would occur in her if she, the observer, were performing this action herself. One dramatic piece of evidence for this claim comes from a study in which activity in an observer's motor cortex was artificially boosted with transcranial magnetic stimulation (TMS). This caused minute patterns of activation (specifically, motor-evoked potentials) to occur in a muscle of the observer at just the times the agent being observed used the corresponding muscle. 5 As this illustrates, motor processes in an observer can carry detailed information about the timing of components of actions. Motor simulation is the occurrence of motor processes and representations in an observer concerning an action which she is observing or imagining and which are driven by observing or imagining that action.
Motor simulation enables observers to anticipate how others' actions will unfold and the likely outcomes the actions will achieve (Wolpert, Doya, and Kawato 2003; Wilson and Knoblich 2005). Such anticipation is reflected both in explicit judgements (e.g. Aglioti et al. 2008) and in spontaneous eye movements (e.g. Flanagan and Johansson 2003; Rotman et al. 2006; Costantini et al. 2014; Ambrosini, Costantini, and Sinigaglia 2011).
How does any of this bear on the coordination of joint action? If motor simulation is to play a role in coordinating joint actions, agents must be capable of using anticipation based on motor simulation in preparing and performing actions different from those simulated. Accordingly, Kourtis, Sebanz, and Knoblich (2013) used neural markers of motor activity to show that motor simulation can occur in joint action even where agents are performing different actions in close succession. To investigate, further, whether motor simulation in joint action can facilitate coordination, Vesper et al. (2013) instructed pairs of people to jump and land at the same time. They found evidence that, in some subjects, there was a motor simulation of the partner's jump which influenced how the subject herself jumped and so enabled precise coordination in landing together. This is one example of how motor simulation may enable coordination in joint action. 6
Reflecting on entrainment and coordination driven by motor simulation, it is striking that one-off motor simulation allows greater flexibility at the cost of some precision. Is there a more general trade-off between flexibility and precision in mechanisms underpinning coordination? If so, what might this tell us about the nature of mechanisms underpinning coordination for joint action and their relations to each other?
Flexibility vs Precision
Consider two ways of partially ordering mechanisms underpinning coordination. The first is precision: How precise, in space and time, is the coordination they underpin in the best cases? For instance, mechanisms underpinning entrainment enable expert musicians to coordinate their actions to within tens of milliseconds, whereas one-off motor simulation permits coordination of actions to within larger fractions of a second. A second partial ordering is flexibility: How wide is the range of situations in which this mechanism can underpin coordination? For instance, motor simulation can underpin coordination whether or not repetition or rhythm is involved, unlike entrainment. Thinking just about motor simulation and mechanisms underpinning entrainment, there appears to be a trade-off between precision and flexibility. This appears to generalise to other forms of coordination too, such as forms of coordination driven by shared intention. Gains in flexibility seem to come at the cost of precision.
Why? Before attempting to answer this question, it is useful to fix terminology with some stipulations. A goal of an action or behaviour is an outcome to which it is directed. Relative to a particular action or behaviour, goals can be partially ordered by the means-end relation. In saying that one goal is more abstract than another relative to a behaviour or action, I shall mean that the latter is linked to the former by a chain of outcomes ordered as means to ends. A goal-state is a mental state (or a structure of mental states) which represents, or otherwise specifies, an outcome and is the kind of thing in virtue of which some actions or behaviours can be directed to certain outcomes. Given that intentions are mental states, they are paradigmatic goal-states. But intentions are not the only goal-states: as we saw in section 4, some motor representations are also goal-states. 7
So why might flexibility in a mechanism underpinning coordination come at the cost of precision? One possibility involves two conjectures. First, achieving flexibility generally depends on representing goals, and the more abstract the goals that can be represented, the greater the flexibility. To illustrate, entrainment can occur without any representations of goals at all, whereas motor simulation involves motor representations which are goal-states. But relative to intentions or knowledge states, motor representations are limited with respect to how abstract the outcomes they can specify are. Motor representations can specify outcomes such as grasping or transporting a fragile object, and even sequences of such outcomes (see, e.g., Fogassi et al. 2005). But they cannot specify outcomes such as selecting an organic egg or testing for freshness: motor processes and representations are mostly blind to things so distantly related to bodily action. A further conjecture is that processes involving more abstract goal representations typically (but not necessarily always) place greater demands on cognitive resources, which typically (but not necessarily always) results in lower precision. This conjecture is suggested by an analogy with the physiological. Because physiological processes are a source of variability, coordinating with a given degree of precision should get harder as the duration and complexity of the actions to be coordinated increases. Given that cognitive processes, like physiological processes, are a source of variability, increasing cognitive demands by relying on representations of more abstract goals should likewise increase variability and so limit precision.
In short, flexibility may come at the cost of precision because increasing flexibility requires representations of more abstract goals, which impose greater cognitive demands and thereby increase variability, so reducing how precise the coordination underpinned by a mechanism can be in the best cases. This may be why forms of coordination such as entrainment and motor representation can occur independently of, and even contrary to, intentions concerning coordination: precision requires such independence.
Thinking about trading precision for flexibility suggests that there is a gap in the forms of coordination so far considered. To see why, consider the situation of a couple alone on a beach. Having filled plastic beakers with wine, they spontaneously and fluidly clink them together in a toast without spilling a drop of wine. To explain how they are able to coordinate so precisely we cannot appeal to motor simulation alone; but it would be no more plausible to appeal only to practical deliberation involving intentions or other propositional attitudes. We need something more flexible than motor simulation and more precise than practical deliberation.
Task Co-representation
Consider individual agents acting alone for a moment. A task representation links an event to an outcome in such a way that, normally, the event's actual or expected occurrence would trigger motor preparation for actions that should realise the outcome. Why do we need task representations? Imagine yourself cycling up to a crossroad. Even if you are concentrating hard on dodging potholes without being hit by the rapidly approaching car behind you (will it slow down or should you risk going through this hole?), it is likely---hopefully---that the traffic light's turning red will cause you to brake. The connection between red light events and braking actions need not require intentional control, thanks to task representations.
How is task representation relevant to coordinating joint actions? Let us say that two individuals have a task co-representation if there is a task concerning which each has a task representation. 8 Sebanz, Bekkering, and Knoblich (2006) argue that the agents of a joint action can have a task co-representation concerning a task which only one of them is actually supposed to perform. This, they suggest, would enable agents to exploit motor simulation prior to, and independently of, observing any actual actions. Thus task co-representation could in principle greatly extend the range of situations in which motor simulation could underpin coordination in joint action. To illustrate, consider again the couple on the beach filling beakers with wine and then clinking them together. As noted earlier (in section 5), their doing this spontaneously, fluidly and with precision could not be explained by motor simulation alone when neither of them plays the role of leader. But it could be explained by Sebanz et al.'s proposal about task co-representation. If the couple expect to clink beakers after the wine is poured and have task co-representations concerning each's task in the clinking, then they will be able to use motor simulation to anticipate each other's actions in advance of starting to act. This is one illustration of how task co-representation might underpin coordination for one-off joint actions where agents have to respond to events in ways they have never done before.
The task co-representation hypothesis (agents involved in a joint action can have a task co-representation concerning a task that only one of them is supposed to perform) generates a variety of predictions. It predicts interference and facilitation effects: when acting together with another, your performance of your task will be affected by facts about which task the other is performing, and your performance will be impaired or enhanced in ways analogous to those in which it would be affected if you were performing both tasks alone. This prediction has been confirmed for a variety of tasks (Sebanz, Knoblich, and Prinz 2005; Atmaca, Sebanz, and Knoblich 2011; Böckler, Knoblich, and Sebanz 2012; Wel and Fu 2015). The task co-representation hypothesis also predicts that, in some situations when you are acting with another, events linked to the other's task will trigger some preparation (but not necessarily full preparation) in you for a task which is actually supposed to be performed by the other. Evidence in support of this prediction includes signs that an agent of a joint action sometimes inhibits tendencies to act when another, rather than she herself, is supposed to act (Sebanz et al. 2006; C.-C. Tsai et al. 2008), as well as signs that agents of a joint action are sometimes preparing for, or even covertly performing, actions that another is supposed to perform (e.g. Kourtis, Sebanz, and Knoblich 2013; Baus et al. 2014). 9
Task co-representation is valuable in coordinating joint actions at least in part because it is more flexible than bare motor simulation while also more precise than practical reasoning. But there is a limit to what can be explained with either motor simulation or task co-representation, at least as we have conceived of them so far. Suppose motor simulation (whether or not triggered by a task co-representation) enables agents of a joint action to anticipate each other's actions. How could these anticipations inform preparation for their own actions, and, in particular, how could they do so without requiring cognitive processes inimical to precision? To offer even a candidate answer to this question requires going beyond motor simulation and task co-representation as we have so far conceived them.
Emergent vs Planned Coordination
In thinking about coordination for joint action it is useful to have plural counterparts of the notions of goal and goal-state introduced earlier (in section 4). To say of an outcome that it is a collective goal of some actions or behaviours is to say that they are collectively directed to this outcome: that is, they are directed to this outcome and their being so directed is not, or not only, a matter of each action or behaviour being individually directed to that outcome. This is a broad notion: raising a brood can be a collective goal of some eusocial insects' behaviours, 10 and repairing a broken fence can be a collective goal of some neighbours' actions. A collective goal-state is a mental state or, more likely, a structure of mental states, which specifies an outcome and is the kind of thing in virtue of which some pluralities of actions or behaviours can be collectively directed to certain outcomes. Bratman's account of shared intention aims to describe one kind of collective goal-state (Bratman 1993).
Following Knoblich, Butterfill, and Sebanz (2011), we can distinguish between emergent and planned coordination. Planned coordination is coordination driven by a collective goal-state, 11 whereas emergent coordination is coordination not so driven. Planned coordination is familiar from philosophical discussions of shared intention, one of the functions of which is to coordinate agents' actions (Bratman 1993, 99). By contrast, all the forms of coordination discussed in this chapter so far---entrainment as well as coordination driven by action and task co-representations---are naturally thought of as forms of emergent coordination insofar as it seems they could occur independently of the agents having any collective goal-state. 12 But there is also a growing body of evidence about the existence of planned coordination for joint action.
Collective Goal-States
Two pianists are producing tones in the course of playing a duet. Consider one of the pianists. There is an outcome to which her action is directed, the production of a tone or melody; and there is an outcome to which her and her partner's actions are collectively directed, the production of a combination of pitches or harmony. Do dueting pianists represent collective goals, that is, outcomes to which their actions are collectively directed?
One way to investigate this question involves covertly introducing errors. Janeen D. Loehr et al. (2013) contrasted two kinds of error: those which were errors relative to the goal of an individual pianist's actions (the pitch) but not relative to the collective goal of the two pianists' actions (the harmony); and those which were errors relative to both. They found neural signatures for both kinds of errors in expert pianists. This is evidence that dueting pianists do indeed represent collective goals. A further study indicates that these collective goals are represented motorically (Janeen D. Loehr and Vesper 2015). 13
How might motor representations of collective goals underpin coordination for joint action? One possible answer is suggested by Gallotti and Frith (2013) who propose that a 'we-mode' is required. They explain:
'The central idea of the we-mode is that interacting agents share their minds by representing their contributions to the joint action as contributions to something that they are going to pursue together, as a 'we'. [...] To represent things in the we-mode is for interacting individuals to have the content of their individual actions specified by representing aspects of the interactive scene in a distinct psychological attitude of intending-together, believing-together, desiring-together, etc' (Gallotti and Frith 2013, 163).
An alternative possible answer is suggested by what Vesper et al. (2010) call a 'minimal architecture for joint action'. They propose to start by attempting to characterise joint action and its coordination without postulating distinct psychological attitudes and without invoking representations of interacting agents as comprising a 'we'. Instead their proposal is that some or all of the representations underpinning coordination for joint action are ordinary motor representations, task representations and other representations that are also involved in the coordination of ordinary, individual action. Relatedly, in at least some cases, coordination is driven by representations which are agent-neutral, that is, which do not specify any particular agent or agents. This proposal is consistent with theories about the roles of motor simulation and task co-representation in coordinating joint action (see sections 4 and 6): anticipating another's actions and their effects appears to involve much the same agent-neutral motor and task representations which would be involved if one were actually performing those actions oneself. Of course, motor and task representations concerning actions others will eventually perform must ultimately have effects different from those concerning actions the agent will perform; but this is necessary for both observation and joint action and need not involve a novel kind of attitude.
But how, given Vesper et al. (2010)'s 'minimal architecture' proposal, could motor representations of collective goals underpin coordination for joint action? In each agent of a joint action, the motor representations of collective goals trigger preparation for action in just the way any motor representations do. This has the effect that each agent is preparing to perform all of the actions comprising a joint action, although not necessarily in much detail (compare Janeen D. Loehr and Vesper 2015). Now this may appear wasteful given that each agent will only perform a subset of the actions prepared for. But it is not. One agent's preparing (to some extent) to perform all of the actions that will comprise a joint action ensures that the resulting motor plan for her actions will be constrained by her motor plan for the others' actions. And, given that she is sufficiently similar to the others and that the possibilities for action are sufficiently constrained in their situation, her motor plan for the others' actions will reliably match their motor plans for their actions. So one agent's preparing to perform all of the actions has the effect that her motor plan for her actions is indirectly constrained by the others' motor plans for their actions. In this way, motor representations of collective goals could in principle underpin coordination for joint action by enabling agents to meet relational constraints on their actions (see further Butterfill 2016).
The conjecture that motor representations of collective goals underpin coordination for joint action provides one response to a question raised at the end of section 6. The question was how anticipations concerning another's actions arising from motor simulation (whether bare motor simulation or occurring as a consequence of task co-representation) feed into preparing and monitoring your own actions. When coordination depends on motor representations of collective goals, the presupposition this question makes is incorrect. There are not two processes but one. Anticipation of another's actions and preparation for your own are not two separate things. They are parts of a single process in the same sense that, in preparing to perform a bimanual action, preparation for the actions to be performed by the left hand and anticipation of the movements of the right hand are parts of a single process. So where motor simulation and task co-representation involve collective goals to which a joint action is directed, motor processes themselves can ensure the integration of anticipations concerning another's actions with preparation for your own.
This is not quite the end of the story about collective goals. Research on perceiving joint affordances points to a second way in which motor representations of collective goals may underpin coordination in joint action.
Joint Affordances
A joint affordance is an affordance for the agents of a joint action collectively---that is, it is an affordance for these agents and this is not, or not only, a matter of its being an affordance for any of the individual agents. Perceiving (or otherwise detecting) joint affordances is critical for many mundane joint actions such as appropriately gripping objects and applying the right force in moving them together, and crossing a busy road while holding hands. It is possible that motor representations of collective goals enable the agents of some joint actions to perceive joint affordances, or so I will suggest in this section. 14 But first, what grounds are there for supposing that joint affordances even exist?
Doerrfeld, Sebanz, and Shiffrar (2012, 474) argue that 'the joint action abilities of a group shape the individual perception of its members.' In their experiment, perceptual judgements of weight were affected by whether the perceiver was about to lift the box alone or with another. Others have investigated different situations in which performing actions independently or as part of a joint action can affect how you perceive affordances. For instance, consider two individuals walking through a doorway. How wide must the doorway be for them to walk through it without rotating their shoulders? Davis et al. (2010, Experiment 1) show that the answer cannot be obtained simply by adding the minimum widths for each individual, and (in Experiments 2--4) that people can perceive whether doorway-like openings will allow a particular pair of walkers to pass through comfortably. 15 Importantly, people can perceive joint affordances for walkers not only when they are one of those walking but also when they are merely observing others walking together (Davis et al. 2010, Experiment 4). This suggests that the perceptual capacity does not depend on the perceiver's own current possibilities for action. So what makes perception of joint affordances possible?
Consider the conjecture that joint affordances are perceived as a consequence of motor simulation (this is one of two possibilities discussed by Doerrfeld, Sebanz, and Shiffrar 2012). This conjecture is made plausible by independent evidence for two hypotheses. First, motor representations can modulate perceptual experience; for instance, how an event is represented motorically can affect how a pair of tones is perceived with respect to pitch (Repp and Knoblich 2007, 2009; for discussion, see Sinigaglia and Butterfill 2015). Second, perceiving another's affordance involves motor activity (Cardellicchio, Sinigaglia, and Costantini 2012). These two findings make it plausible that, in general, perceiving some affordances is facilitated or even enabled by motor simulation. The findings just discussed suggest that the same may be true for joint affordances, that is, affordances for agents involved in one or another kind of joint action. But of course this is possible only given that there are motor representations of collective goals. After all, perceiving joint affordances requires motor simulation concerning the joint action, which would be triggered by a motor representation of a collective goal of the actions grounding the joint action; merely having separate motor simulations of each agent's actions could not underpin the identification of a joint affordance. This is why motor representations of collective goals may facilitate coordination in joint actions not only by enabling the agents to meet relational constraints on their actions (see section 8) but also by enabling them to perceive joint affordances.
Conclusion
What forms of coordination for joint action enable humans to exercise shared agency in doing things such as clinking beakers, sharing smiles, erecting marquees, or producing rhythmic music? We have seen that there is much diversity. Coordination for joint action includes not only emergent varieties such as entrainment (see section 3) as well as the forms underpinned by motor simulation (see section 4) and task co-representation (see section 6), but also planned coordination underpinned by motor representations of collective goals (see section 8). 16
This diversity in forms of coordination may exist in part because of a trade-off between flexibility and precision for individual mechanisms underpinning coordination (see section 5). Having multiple mechanisms is useful partly because each makes a different trade-off between flexibility and precision.
Many exercises of shared agency appear to require both flexibility and extremely precise coordination. Improvising musicians ideally achieve temporal synchrony without becoming enslaved to a rhythm. How is this possible? Exercises of shared agency can depend on multiple forms of coordination, of course. Individual mechanisms underpinning coordination may be constrained by the precision--flexibility trade-off, but this constraint does not apply to a diversity of mechanisms considered in aggregate. So there is no theoretical obstacle to relying on highly flexible mechanisms yet achieving extremely precise coordination. This requires only that diverse mechanisms can have synergistic effects on coordination.
Just here we encounter the synergy challenge. Achieving precise coordination in space and time probably demands that mechanisms underpinning different forms of coordination are to a significant degree independent of each other (see section 5). Yet acting flexibly requires that the different mechanisms sometimes nonaccidentally operate synergistically---the shared intention, the task co-representation, and the motor representation of the collective goal cannot all be pulling in different directions. The challenge is to understand how, in some situations, mechanisms which underpin different forms of coordination and which are driven by largely independent representational structures can nevertheless nonaccidentally have synergistic effects. Meeting this challenge may require attention to differences between novices and experts, to why practice is sometimes necessary, to the effects of common knowledge on moment-by-moment coordination (see, for example, D. C. Richardson, Dale, and Kirkham 2007), and to phenomenal aspects of coordination (as Keller, Novembre, and Hove (2014) hint), among other things. The synergy challenge is currently a significant obstacle to progress in understanding how high degrees of flexibility and precision can be combined in the coordination of joint actions.
Another issue likely to demand future research concerns which, if any, forms of coordination require postulating novel kinds of representations or processes specific to shared agency (see section 8). Although scientists sometimes adopt terms from philosophical discussions of collective intentionality such as 'shared' and 'we-' representations, the discoveries about the representations and processes underpinning coordination reviewed in this chapter do not require representations to be shared other than in the sense in which barrel organ aficionados share a taste in music.
One theme in this chapter was that much coordination of joint action appears to involve not fully distinguishing others' actions from your own. Take motor simulation, task co-representation and motor representation of collective goals. In each case, coordination involves motor or task representations of actions, tasks or goals that relate primarily to another's part in the joint action. This is not a matter of representing another's goals or plans as an observer: it is a matter of preparing actions and representing tasks she will perform in ways that would also be appropriate if it were you, not her, who was about to perform them. To a limited but significant extent, then, coordination involves representing both another's actions and your own in ways that give them equal status as parts of a single activity. The existence of such a perspective on the actions grounding a joint action might just turn out to matter not only for coordination but also for other aspects of collective intentionality such as commitment and cooperation. 17
Footnotes
1. Events $D_1, \ldots, D_n$ ground $E$ just if: $D_1, \ldots, D_n$ and $E$ occur; $D_1, \ldots, D_n$ are each part of $E$; and every event that is a part of $E$ but does not overlap $D_1, \ldots, D_n$ is caused by some or all of $D_1, \ldots, D_n$. (This is a generalisation of the notion specified by Pietroski (1998).)
2. Note that what follows is neutral on whether joint actions are actions. As a terminological stipulation, I shall say that an individual is an agent of a joint action just if she is an agent of an action which, together with some other events, grounds this joint action. (Depending on your views about events, causation and agents, getting some edge cases right may require adding that for this individual to be an agent of this joint action, this particular plurality of grounding events---her action and the other events---must include actions with agents other than her.)
3. See Schulze, Cordes, and Vorberg (2005, 474--76). Keller, Novembre, and Hove (2014) suggest, further, that the two kinds of adjustment involve different brain networks. Note that this view is currently controversial: Janeen D. Loehr and Palmer (2011) could be interpreted as providing evidence for a different account of how entrainment is maintained.
4. See, for example, D. C. Richardson and Dale (2005). For relatively speculative discussions, see D. Richardson, Schockley, and Kevin (2008), Merker, Madison, and Eckerdal (2009) and Keller, Novembre, and Hove (2014, sec. 4).
5. Gangitano, Mottaghy, and Pascual-Leone (2001); see further Fadiga, Craighero, and Olivier (2005) and Ambrosini, Sinigaglia, and Costantini (2012). For a review of evidence that, when observing an action, motor processes and representations occur in the observer like those which would occur if she were performing an action of the kind observed rather than merely observing it, see Rizzolatti and Sinigaglia (2010).
6. For evidence that motor simulation also enables coordination in musical performances, see Keller, Knoblich, and Repp (2007), Janeen D. Loehr and Palmer (2011) and Novembre et al. (2014). For evidence on development, see Meyer et al. (2011)'s investigation of motor processes and coordination in three-year-old children.
7. For more detailed arguments that some motor representations are goal-states, see Prinz (1997, 143--46), Pacherie (2008) and Butterfill and Sinigaglia (2014).
8. This definition needs refining in various ways not directly relevant to the present discussion.
9. Wenke et al. (2011) and Dolk et al. (2011, 2014) have defended hypotheses which, if true, would enable some of the evidence for these predictions to be explained without accepting the task co-representation hypothesis.
10. The insects' behaviours cannot be regarded as directed to raising a brood just in virtue of each individual insect behaviour being so directed because there is (typically, at least) a division of labour.
11. Note that, despite the name, planned coordination does not by definition involve planning.
12. Some forms of entrainment are probably a hybrid of emergent and planned coordination since, as we saw in section 3, the precision with which entrained actions are synchronised can be influenced by the agents' intentions concerning coordination and therefore probably also by collective goal-states.
13. Further evidence for motor representations of collective goals is provided by J. C.-C. Tsai, Sebanz, and Knoblich (2011), Ramenzoni, Sebanz, and Knoblich (2014), Ménoret et al. (2014) and Meyer, Wel, and Hunnius (2013).
14. The notion of a collective goal was introduced in section 7; evidence for the existence of motor representations of collective goals was discussed in section 8.
15. See Michael J. Richardson, Marsh, and Baron (2007) for a further study involving jointly lifting planks.
16. This is not a comprehensive list. Relevant reviews include Knoblich, Butterfill, and Sebanz (2011), Keller, Novembre, and Hove (2014) and Marsh, Richardson, and Schmidt (2009).
17. ACKNOWLEDGMENTS. I have benefitted immeasurably from extended collaborations with Natalie Sebanz, Guenther Knoblich and Corrado Sinigaglia as well as from shorter (so far) collaborations with Cordula Vesper and Lincoln Colling. I am also indebted to many people for discussion. Thank you!
    BIOGRAPHICAL NOTE. Stephen Butterfill researches and teaches on joint action, mindreading and other philosophical issues in cognitive science at the University of Warwick (UK).