Abstract

This article offers a way of thinking about closed captioning that goes beyond quality (narrowly defined in current style guides in terms of visual design) to consider captioning as a rhetorical and interpretative practice that warrants further analysis and criticism from scholars in the humanities and social sciences. A rhetorical perspective recasts quality in terms of how genre, audience, context, and purpose shape the captioning act. Drawing on a range of Hollywood movies and television shows, this article addresses a set of topics that are central to an understanding of the effectiveness, significance, and reception of captions: overcaptioning, undercaptioning, subtitles vs. captions, the manipulation of time, non-speech information, series awareness, and the backchannel.

Distinguished guests, audience members, and to all of you watching at home tonight, or more realistically, Monday morning on a computer…

— Alec Baldwin, Saturday Night Live monologue (2010, May 15)

We do not want to be left behind as television moves to the Internet.

— Rosaline Crawford, National Association of the Deaf (cited in Stelter, 2010)

Video has exploded in popularity on the Web in recent years. More viewers are catching their favorite TV shows, like Saturday Night Live, on Monday mornings with laptops and mobile devices. Leave aside, for the moment, the millions of so-called "disposable" videos (Reid, 2008) that users are creating on-the-fly with their mobile phones and web cams, and then uploading to YouTube, Facebook, and other social networking sites (Bilton, 2010). What remains are thousands of hours of television shows and movies that broadcasters and retransmitters are increasingly making available online, usually for free (ad-supported). Very few of these TV shows and movies, when retransmitted on the Web, have closed captions. The episode of SNL featured in the epigraph to this essay (Baldwin, 2010) is required by law to be closed captioned when broadcast on TV. When the same episode is rebroadcast on the Web (via NBC's official retransmitter, Hulu.com), it is, at this time, uncaptioned.

But that will soon be changing. A new law, The 21st Century Communications and Video Accessibility Act of 2010, will require "captioned television programs to be captioned when delivered over the Internet" (AAPD, 2010). To date, online discussions about web captioning have centered on questions of quantity: How do we increase the number of captioned videos on the Web? How do we encourage our representatives in Congress to support one of the pending pieces of accessibility legislation? How do we increase awareness among content providers like Netflix of the importance of offering captions on their streaming web videos? Despite the passage of new legislation requiring Internet captioning on some types of content, questions of quantity will continue to dominate the discussions of online captioning because TV content accounts for only a fraction of the video content available online.

Before we can talk about quality, so the thinking goes, we need to have something to talk about. But closed-captioning technology predates the Web by about twenty years. The history of captioning on TV goes back to the early 1970s (see Earley, 1978), and TV captions in the U.S. are by law plentiful today (FCC, 2010; Robson, 2004, pp. 39-42). We have had plenty to talk about for a long time. Despite the age of captioning technology, we still do not have a comprehensive approach to caption quality that goes beyond basic issues of typography, placement, and timing. Current practice, at least on television, is too often burdened by a legacy of styling captions in all capital letters with centered alignment, among other lingering and pressing problems. In other words, caption quality has been evaluated in terms of visual design for the most part—i.e., how legibility and readability interact with screen placement, timing, and caption style (e.g., scroll-up style vs. pop-on style).

What we do not have yet, and what I intend to offer in this essay, is a way of thinking about captions that goes beyond quality (narrowly defined in terms of visual design) to consider captioning as a rhetorical and interpretative practice that warrants further analysis and criticism from scholars in the humanities and social sciences. My argument is not simply that we do not have any guidelines on quality but rather that we have not explored quality rhetorically. A rhetorical perspective recasts quality in terms of how readers and viewers make meaning: What do captioners need to know about a text or plot in order to honor it? Which sounds are essential to the plot? Which sounds do not need to be captioned? How should genre, audience, context, and purpose shape the captioning act? What are the differences between making meaning through reading and making meaning through listening? Is it even possible, given the inherent differences between, and different affordances of, writing and sound, to provide the same information in writing as in sound? The concepts that structure these questions—effectiveness, meaning, purpose, context, genre, audience—are of abiding interest to rhetoricians.

Closed captions have gone unnoticed by mainstream scholars in rhetoric and related fields, despite the seemingly obvious ways in which captions can be analyzed as text by textual/rhetorical critics and as a mode by scholars in composition interested in multimodal composition. The invisibility of captioning is a result of the invisibility of disability and disabled people in our scholarship. Despite some important exceptions in the fields of rhetoric and composition (e.g., Brueggemann, 1999; Lewiecki-Wilson & Brueggemann, 2007; Wilson & Lewiecki-Wilson, 2001) and technical communication (e.g., Bayer & Pappas, 2006; Carter & Markel, 2001; Lippincott, 2004; O'Hara, 2004; Palmeri, 2006; Theofanos & Redish, 2003, 2005; Van Der Geest, 2006; Wilson, 2000), research studies in our fields tend to be populated with able-bodied "users" and "students" who are often assumed to be seeing, hearing, fleet-footed, multi-tasking, nimble-fingered digital natives. Our scholarly texts have not typically made room for, or even seemed to recognize the existence of, disabled subjects. The able-bodied subject is the unmarked norm (on "ableism," see Cherney, this volume; Linton, 2010). For example, Jonathan Alexander's (2008) guest editorial for a special issue of Computers and Composition on "media convergence" features a vibrant, exciting scene populated by young people expertly, ably, and nimbly remixing and repurposing multimedia content (p. 2). Since many online texts are inaccessible to students with disabilities—e.g., only 7% of web images "provide adequate alt text" for blind and low vision users (Chisholm & May, 2009, p. 24), very few audio podcasts include text transcripts (see Thatcher et al., 2006, p. 153; Zdenek, 2009), and very few videos are closed captioned (as I will discuss below)—we must conclude that the students being described and imagined by Alexander are not disabled.

A representative, accurate account of how our students and users interact with multimedia texts, then, must include people with disabilities. A sampling of statistics about hearing and deafness in the U.S. begins to suggest just how many people may require or benefit from closed captioning:

  • In the U.S., approximately 36 million adults—about 11% of the population—"report some degree of hearing loss" (NIDCD, 2010).
  • The number of closed-caption users in the U.S. is estimated at 50 million (CaptionsOn, 2010)—i.e., about 1 in 6 Americans.
  • The number of U.S. students with disabilities going to college "more than tripled" between 1978 and 1996 (OCR, 1999).
  • "According to the Deafness Research Foundation, hearing loss is the No. 1 diagnosis for U.S. soldiers in Afghanistan and more than 65 percent of Afghan war veterans are suffering from hearing damage" (Hemstreet, 2010).
  • The percentage of Americans 65 years of age and older—a population group more likely to benefit from accommodations such as closed captioning—is projected to rise from 13% in 2010 to 20% by 2050 (U.S. Census, 2008).
  • "One third of all senior citizens have hearing problems" (CaptionsOn, 2010). Thus when we focus on digital natives and so-called Millennials, we risk ignoring the needs of this fast-growing group of older Americans.

To take these numbers seriously requires a re-centering of our research studies around universal design and away from an able-bodied, youth-oriented norm. We do a disservice to all of our students and users when we assume that captions and other accommodations can only benefit people who are disabled (if we consider accessibility at all). As any hearing caption user (or universal design text) can tell you, closed captions are helpful in situations that create temporary or "situational disabilities" for able-bodied users (Chisholm & May, 2009, p. 12): watching TV in a noisy sports bar, helping your child learn to read, learning a second language, watching a movie with low production values, studying in a quiet area such as a library, marketing your Internet videos by making text captions available to search engines, and so on. (Search engines such as Google and video sharing sites such as YouTube are designed to index textual data such as video descriptions, tags, and closed captions [Ballek, 2010; Sizemore, 2010; Stelter, 2010].) From a universal design perspective, then, captions can potentially benefit all of us as we move through an increasingly noisy, increasingly less private, increasingly mobile world.
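The search-engine benefit follows from the form captions take behind the scenes: a caption track is, at bottom, a timed text file. A hypothetical fragment in the common SubRip (.srt) format, one of several plain-text formats that video sites such as YouTube accept, suggests why caption content is as indexable as any other text (the timestamps and wording here are invented for illustration):

    1
    00:01:12,500 --> 00:01:15,000
    [Man on P.A.]
    Attention travelers, you are not required--

    2
    00:01:15,200 --> 00:01:16,400
    Shall we go?

Each numbered cue pairs a start and end time with the caption text displayed during that interval; a search engine can simply ignore the timing data and index the words.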

As closed captions online become both more important in a mobile world and more prevalent for certain types of content, we need to attend to the myriad ways in which captions create different, sometimes richer experiences for viewers than non-captioned viewing experiences. By exploring captions rhetorically, we take them seriously as vital components of the video landscape (as opposed to incidental, invisible, imitative, or merely required by law). This perspective is admittedly informed by the Hollywood movies and TV shows I analyze below; a different set of texts, especially texts that are not structured by traditional narrative techniques, will most likely yield a different set of claims about rhetoric and closed captions. Regardless of the texts chosen for analysis, however, captioning for deaf and hard of hearing viewers will remain a rhetorically complex act, not simply a matter of copying dialogue from the script file to the caption track. The question of which sounds should be captioned, for example, can only be answered by attending to the rhetorical needs of the text itself. Descriptions of non-speech information, like descriptions of alternative text for Web images (Chisholm & May, 2009, p. 26), should always be situated in the context of a specific text or situation. Two closely related non-speech sounds (e.g. audible heavy breathing from two different characters) may need different captions or no captions at all. Likewise, two different sounds may need the same caption. Sounds are embedded in contexts that must be nourished. A rhetorical perspective is interested in the work that a particular sound does within the context of a particular scene or text. Captions must be responsive not to the objective qualities of sound but to the contexts in which sounds occur.

To support this perspective, I address a central question for closed captioning, one that rhetoricians are well-suited to address: Which sounds are significant? This question has not been discussed in detail, even though it structures the practice of captioning itself. It also provides a lens through which to address the rhetorician's abiding interest in genre, audience, context, meaning, and purpose. My materials are a small and varied collection of Hollywood movies and TV shows. My methods include a review of a small collection of captioning style guides, against which a rhetorical perspective can then be contrasted. In the course of my analysis I discuss a set of topics that are central to an understanding of the effectiveness, significance, and reception of captions: overcaptioning, undercaptioning, subtitles vs. captions, the manipulation of time, non-speech information, series awareness, and the backchannel.

What I propose is a way of thinking about captions that goes beyond 1) a correspondence model in which captions are assumed to merely duplicate the sound track, and 2) a richness model in which the "real" action is assumed to be found on the sound track and captions are marked as inferior attempts to embody sound. An alternative model acknowledges the qualitative differences between sound and writing and draws attention to the need for a rhetorical understanding of captioning. For disability and deaf studies scholars, the stakes are obvious. Deaf and hard-of-hearing viewers need high quality captions and more studies that explore and extend our understanding of caption quality. For composition and technical communication scholars, the stakes are just as high but perhaps not as obvious. When we assume that our students are not disabled, we build ableist pedagogies that fail to account for the rich variety of students who are increasingly populating our classrooms. A rhetorical understanding of captioning can also inform our theories of multimodal composition by helping us reflect on the rhetorical work of sound, the differences between writing and sound, and a concept of audience as more than a narrow group of hearing and seeing digital natives.

This essay is divided into five parts. After reviewing web captioning laws in the United States and offering a rough estimate of the number of web videos available with closed captions, I turn to an analysis of four captioning style guides that will set the stage for a rhetorical approach to closed captioning. This rhetorical approach goes beyond current discussions of quality and style—and at times runs counter to the style guides—to reflect how captions create different experiences for viewers. The concluding section imagines a scholarly landscape in which captions (and accessibility more broadly) are viewed as natural and necessary elements of our theories and pedagogies.

U.S. web captioning laws

On television, nearly all English language content is required by U.S. law to be transmitted with closed captions (FCC, 2010). On the Web, Section 508 of the Rehabilitation Act of 1973, as amended in 1998, requires federal agencies that "develop, procure, maintain, or use electronic and information technology" to make their products and services, including their websites, accessible (Section 508). §1194.22b of Section 508 mandates the use of synchronized alternatives (e.g., open or closed captions) for video content: "Equivalent alternatives for any multimedia presentation shall be synchronized with the presentation" (Section 508). In the private sector, businesses that have contracts or hope to have contracts with the federal government must ensure that the products they deliver to the government comply with Section 508. State universities that receive federal money, even indirectly (e.g., through federal student loan programs), are also responsible for adhering to Section 508. State agencies, including state universities, may have additional or different obligations. In my home state, for example, the Texas Administrative Code regulates, among other things, website accessibility for state agencies, including state institutions of higher education. The regulations for higher education websites in Texas follow Section 508 almost to the letter, with one major exception: §1194.22b of Section 508, which mandates synchronized alternatives such as captions for multimedia content, is compulsory in the Texas Administrative Code only after an institution of higher education receives a request from a user for "alternative form(s) of accommodation" (Texas Administrative Code, Rule §206.70). For example, an informational video on the website of a state university in Texas only needs to be accessible to deaf and hard-of-hearing users after a request for accommodation is made by a visitor to the site.

Other laws in the U.S., particularly the Americans with Disabilities Act (ADA), may also require closed captioning for the Web's private sector. For example, the judge presiding over the landmark National Federation of the Blind v. Target case ruled in 2006 that the ADA, which was signed into law before the advent of the Web, applies to private businesses regardless of whether goods and services are offered in brick-and-mortar stores or online: "Judge Marilyn Patel rejected Target's position that their site couldn't be sued under the ADA because the services of Target.com were separate from Target's brick-and-mortar stores" (Chisholm & May, 2009, p. 16). But because Target settled the case in 2008 "without admitting any wrongdoing," "the question of the ADA's applicability to the Web [is] somewhat unresolved" (p. 16). Regardless, the Department of Justice has declared that the ADA does indeed apply to the Internet. According to Thomas E. Perez, Assistant Attorney General in the DOJ's civil rights division, "It is and has been the position of the Department of Justice since the late 1990s that Title III of the ADA applies to websites. We intend to issue regulations under our Title III authority in this regard to help companies comply with their obligations to provide equal access" (quoted in Evans, 2010).

Finally, a new law, The 21st Century Communications and Video Accessibility Act (2010), will require "captioned television programs to be captioned when delivered over the Internet" (AAPD, 2010). Signed into law by President Obama in October 2010 and roughly coinciding with the twentieth anniversary of the Americans with Disabilities Act, this law points to the growing seriousness with which the U.S. government is taking the issue of Web accessibility in the private sector. Currently, when TV shows and movies are redistributed on the Web by the original TV networks and authorized retransmitters like Hulu or Netflix, they are not likely to be accompanied by closed captions. Advocacy groups such as the Coalition of Organizations for Accessible Technology (COAT) lobbied for the passage of the bill because it would extend "closed captioning obligations to video programming provided by, or generally considered comparable to programming provided by, a television broadcast station, even when distributed over the Internet" (COAT, 2009). The legislation met with resistance, most notably from the consumer electronics industry (Shapiro, 2010), but it was only a matter of time before the same regulations and agreements that govern television and DVD content—and ensure equal access to information and entertainment for millions of deaf and hard-of-hearing users in the U.S.—were extended to the Internet.

How much Web content is captioned?

How much TV and movie content is currently retransmitted on the Internet with closed captions? It's hard to say for sure; the situation on the Web is always in flux. What no one disputes is that very few television shows are closed captioned when re-presented on the Web. An informal study by Jamie Berke (2009) found that only five major content providers and retransmitters on the Web—ABC, CNET, Fox, Hulu, and NBC—were offering closed-captioned content online. A whopping seventy-seven providers did not offer or support closed captions. Yet even Hulu's captioned collection is embarrassingly small—and for a time appeared to be getting smaller. An informal study I conducted over ten days in August 2009 found that Hulu's captioned content was hovering at around 4.5% for full episodes and 6.5% for movies (Zdenek, 2009). Five months later (on Jan. 31, 2010), I noted that Hulu's captioned full episodes had fallen to 2.7% (1,355 episodes out of 50,463 total) and captioned movies to 3.8% (36 movies out of 941 total). On my most recent check of the site on June 29, 2010, I found that the amount of captioned content had increased slightly but was still less than the percentage of captioned content from a year earlier: 4.3% of episodes (2,407 out of 55,468) and 4.9% of movies (63 out of 1,285) were available with captions. So while Hulu is sometimes held up as a model of accessibility by users of TV and video content on the Web, the fact is that Hulu's reputation is relative. The rest of the field is doing very little (and often nothing) to ensure the accessibility of their online content for deaf and hard-of-hearing viewers. At the same time, Hulu deserves praise for continuing to explore the potential of captions to serve the goals of universal design. For example, Hulu has taken steps to exploit the power of captioned media to provide a more fine-grained search experience. Search results can be targeted to specific instances within captioned shows. In addition, users are able to see visually on a "Heat Map" graph "the parts of the video that have been viewed the most; you can also click on the chart to navigate to any point within the captions" (Hulu). According to Eric Feng, the chief technology officer at Hulu, captions have "turned into a very important part of our user experience" (quoted in Stelter, 2010). These features leverage the power of captions to address the ongoing challenges of helping all users—regardless of ability—browse, search, find, and analyze large collections of video content.

The (un)captioned Web is in transition. Positive signs of change are everywhere. In June 2010 (when I composed the first draft of this essay), ESPN was partnering with VITAC to live caption World Cup soccer games on the Web, arguably "the first time that a cable network has added live captions to a streaming webcast simultaneous with TV captioning" (GlobalPittsburgh, 2010). Even Netflix, a company notorious among captioning advocates for seeming to be indifferent, even insensitive, to requests by users and organizations such as the National Association of the Deaf (NAD) to provide closed captioning on its web-based streaming movie service (e.g., see K_Yew, 2009; NAD, 2009), finally overcame the technical hurdles that had been preventing the company from offering closed captions on the Web (Netflix, 2009, 2010). In February 2011, Netflix increased the amount of captioned streaming content to 30% and "expect[s] to get to 80% viewing coverage by the end of 2011" (Netflix, 2011). While Netflix users are still not able to search for captioned streaming movies on the Netflix site, other services such as FeedFliks.com allow users to do so. In contrast, users can easily search iTunes for closed-captioned movies. But as of June 2010, according to my own count, iTunes was only offering 318 closed-captioned movies (up from 53 in April 2008 [Buchanan, 2008]) out of "thousands" of downloadable movies (Apple, 2010), thus making the captioned content on iTunes most likely comparable (percentage-wise) to what is being offered with captions on Hulu—i.e., just a drop in the bucket. 1

Given the recent and continuing explosion of online video, the global saturation of mobile devices (especially the iPhone and iPad), the increasing awareness of and support for web accessibility among U.S. law makers, and the increasing sophistication and bandwidth capacity of the U.S. Internet infrastructure, the situation is sure to continue to evolve rapidly and unexpectedly.

Captioning style guides

Style guides for closed captioning are plentiful on the Web. They run the gamut from informal suggestions posted to personal websites to formal documents written by large captioning firms. Style guides cover roughly the same ground, even if they do not agree on specific guidelines (on the problem of standardization, see Clark, 2004). My own analysis of the style guides is limited to what is publicly available on the Web. I am keenly aware that captioning vendors and individual captioners make use of a wide range of learning and training methods that are not typically available to the outside researcher (e.g., in-house style manuals, formal training sessions, informal conversations, etc.). However, my goal is not to provide a comprehensive review of in-house captioning style guides but rather to sample briefly a small number of authoritative, publicly available guides in order to suggest something about the contours of these texts and the terrain they cover. This review will allow me to situate a rhetorical approach to closed captioning alongside, and in some cases in opposition to, the information that is currently available on captioning style. My review is based on the following style guides:

  • Captioning Key for Educational Media: Guidelines and Preferred Techniques (DCMP, 2009). A major resource on captioning style, this guide is authored by the Described and Captioned Media Program (DCMP), which is funded by the U.S. Department of Education and administered by the National Association of the Deaf. The DCMP maintains a list of approved captioning vendors (including CaptionMax, National Captioning Institute, The Media Access Group/WGBH, and VITAC). Captioning Key mandates style guidelines for vendors who produce captioned content for the DCMP/Department of Education. Captioning Key's style guidelines are based on "captioning manuals…from major captioning vendors in the United States" (DCMP, 2009, p. 2).
  • The CBC Captioning Style Guide (2003). The Canadian Broadcasting Corporation (CBC) released this in-house captioning style guide in response to an information request filed by Joe Clark, a well-known and longtime expert on closed captioning (see Clark, 2008).
  • Gary D. Robson's The Closed Captioning Handbook (2004). Chapter 3 of Robson's book covers "Captioning Styles and Conventions." Other chapters in this book focus on caption timing and caption placement, two issues that are usually discussed as part of captioning style.
  • WGBH's "Suggested styles and conventions for closed captioning" (2002). Caption technology for television was developed at WGBH, a PBS station in Boston, in the early 1970s (Earley, 1978). The first captioning agency—The Caption Center—was established at WGBH in 1972 (Robson, 2004, p. 10). WGBH has arguably been thinking about captioning style longer than any other agency, thus making their style guidelines of particular value.

Standard topics in the style guides include: methods of captioning (prerecorded or live), styles of captions (pop on or roll up), accuracy, verbatim vs. edited captions, screen placement, line breaking, timing, typeface and type case, grammar and punctuation, and non-speech information (speaker IDs, music, sound descriptions). Style guides are light on theory; individual guidelines are typically offered up as truths in no need of justification. While readers should not necessarily expect style guides to provide a lengthy explanation of each best practice, the lack of good reasons is troubling for those practices that seem counter-intuitive, such as the requirement to style captions in upper case letters only, using centered alignment, in the shape of an inverted pyramid, in no more than two rows, with new sentences always starting on a new line, and with speaker IDs and sound descriptions set in mixed case.

Of these requirements, the all-caps guideline is undoubtedly going to be the most troubling and confusing to rhetoricians and document designers. While an all-caps style may have been necessary in the early days of TV captioning, it is unnecessary today and at odds with the most basic rules of good typography. With printed texts, lower case letters are "more legible than those in upper case" (Kostelnick & Roberts, 1998, p. 144). The same is true for electronic letters produced in the high resolution TV and web environments of today. In his review of The CBC Captioning Style Guide, Joe Clark (2008) dismisses the preference for uppercase caption styling as "1980s nonsense." Uppercase styling is "a mistake, an archaism. It only ever came about because the original decoders' fonts were so lousy that all-upper-case captioning was deemed less illegible than mixed-case. CBC doesn't even know the reasons why it is using capital letters, or that such reasons are no longer in effect" (Clark, 2008). Whereas CBC offers no explanation for why offline captions need to be set in uppercase style ("All text shall be presented in upper case, except for…" [CBC, 2003, p. 9]), WGBH (2002) offers conflicting advice about type case. Examples in WGBH's captioning style guide are set in uppercase, with lowercase reserved for non-speech information and speaker IDs. But WGBH's style guideline for type case stipulates that "caption text is generally rendered in uppercase and lowercase, or sentence case, Roman font." Readers are thus presented with a WGBH guideline that WGBH itself does not follow, a guideline that seems better suited to a pre-Web, low resolution, analog world. Moreover, if an all-caps style is interpreted as screaming by viewers steeped in texting and instant messaging conventions, then how can uppercase captions convey whispering?

(whispering)
PLEASE OPEN THE DOOR!

In this example from WGBH's (2002) style guide, a screaming uppercase style comes into conflict with the intent to convey the opposite of screaming. Uppercase captions, in addition to being less legible than standard sentence case, are troubled by an association between all-caps and screaming. Is it even possible, in a post-Web world shaped by email and texting conventions, to whisper in all-caps?

A second issue likely to be of particular interest to rhetoricians is verbatim captioning—i.e., whether it is possible to caption every word of dialogue and every audible sound, and whether captioners should edit captions for certain readers (e.g., children), to meet maximum presentation rates, or to clean up a speaking style marked by presumably irrelevant verbal fillers like "um" and "uh." The first regularly captioned show on television—a nightly rebroadcast on PBS stations of The ABC News with open captions—was edited for content and for reading speed (Earley, 1978). The producers at WGBH who created the open captioned version of The ABC News decided to edit the audio content for two reasons:

Captioners at WGBH recognize two needs: (1) to reduce the amount of text in the average television program (roughly 180 words per minute) to allow time to read the captions and watch the program video, and (2) to adjust the language itself so that comprehension of the captions can be rapid and accurate. (Earley, 1978, p. 6)

Today, reading speed continues to guide decisions by captioners to edit content. According to WGBH, "[A]ny editing that occurs nowadays is usually for reading speed only." The authors of The Captioning Key agree and have specified maximum presentation rates for captions (ranging from 130 words per minute [wpm] for children to "near verbatim" or 235 wpm for theatrical productions for adults [DCMP, 2009, p. 12]). Some deaf and hard-of-hearing viewers may benefit from slower caption presentation speeds: "Where reading speed data are available, they suggest that the reading speeds of deaf and hard-of-hearing viewers are typically slower than those for hearing viewers" (Burnham et al., 2008, p. 391). At the same time, deaf and hard-of-hearing viewers have made it clear that they prefer verbatim or near-verbatim captioning because they want the same access as hearing viewers. As a result, the original practice of adjusting the language of captions for comprehension is no longer advocated in the captioning manuals and in fact has even been referred to as censorship: "Extreme rewriting of narration for captions develops problems, such as 'watered-down' language and omitted concepts. Language should not be censored" (DCMP, 2009, p. 1). With the exception of CBC's (2003, p. 7) style guide, which describes verbatim captioning as "difficult" to achieve, the style guides embrace verbatim captioning as standard practice that only presentation rate has the power to alter: "Editing is performed only when a caption exceeds a specified rate limit" (DCMP, 2009, p. 12).
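The tension between verbatim captioning and rate ceilings can be made concrete with a rough calculation of my own, using the figures cited above. If dialogue in an average television program arrives at roughly 180 words per minute (Earley, 1978) and captions for young readers are capped at 130 wpm, then about 50 of every 180 words, or nearly 30% of the dialogue (1 − 130/180 ≈ 0.28), must be cut or condensed. Under such a ceiling, verbatim captioning is arithmetically impossible; the only open question is which words to cut, and that question is a rhetorical one.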

Verbatim captioning and editing for reading speed are complicated by non-speech information (NSI) and in particular the variety of approaches to captioning it. NSI—a third issue likely to be of interest to rhetoricians—includes sound descriptions, speaker IDs, manners of speaking, music lyrics, and any other information that might be needed to convey a full understanding of the sound track. Captioning NSI is a creative and at times complex rhetorical act that involves careful attention to the context and nature of the video text. The amount, quality, and variety of NSI can vary wildly from DVD to DVD. Movies and TV shows are teeming with sounds that either cannot be captioned due to space or rate constraints, or are not deemed significant enough to warrant being captioned. It is simply not possible to caption every sound in a movie or TV show. Discussions of NSI in the style guides dutifully list which categories of non-speech sounds need to be captioned and how, at a basic level, to caption them, but they do not offer suggestions for helping captioners identify significant sounds or develop and hone their rhetorical powers of description. The style guides seem to assume that it will be obvious which non-speech sounds need to be described. In some cases, they are right, as with the guideline to caption music lyrics verbatim (e.g., DCMP, 2009, p. 23). But given the diversity of approaches to captioning NSI, the lack of standardized guidelines across service providers for handling NSI, and the few cues in a movie or television script for describing NSI, it is more often the case that captioners must choose which NSI sounds need to be captioned and how to caption them so that they fit seamlessly into the signifying world of the text.

NSI is, to a large extent, invented by the captioner. A style guide may direct the captioner to "convey all the concomitant non-dialog information that the hearing audience takes for granted" (WGBH, 2002), but the guide will not explain what challenges are involved in making captioners aware of information they usually take for granted or how to caption that information. Should a character's breathy sound be described as a gasp, sigh, scoff, grunt, pant, groan, moan, or nothing at all? A style guide may warn the captioner "not to congest a show with unnecessary descriptive captions" (CBC, 2003, p. 15), but the guide will not help the captioner understand where to draw the line between too much and not enough, beyond providing examples of how to edit content to achieve a set presentation rate. (It is worth noting that the style guides' examples of editing draw solely on speech captions; they never draw on NSI for examples of how to edit content [e.g., see DCMP, 2009, pp. 12-13].) For rhetoricians, then, NSI is of interest for at least three reasons: 1) NSI is often the most distinctive and subjective aspect of a caption track, highlighting the captioner's rhetorical agency; 2) NSI calls attention to the relationship between writing and sound, because similar sounds may have different captions and different sounds may have the same caption; and 3) NSI raises awareness of the challenges of distinguishing significant from insignificant sounds.

The question of significance cuts to the heart of a rhetorical approach to captioning, as I will suggest in the next section. Definitions of captioning often make a distinction, sometimes implicit, between significant and insignificant sounds. But they seem to assume that the question of significance, importance, or essence is a straightforward one, easily answered. Captioning Key (DCMP, 2009) states that "background sound effects" should only be captioned "when they're essential to the plot" (p. 21). Background music "that is not important to the content of the program" (p. 24) only needs to be captioned with a music icon. And "background audio that is essential to the plot, such as a PA system or TV" (p. 17), needs to be captioned in italics. These directives assume either that the captioner already understands implicitly which sounds are important or essential, or that the style guide authors do not believe that the question of significance is a significant one. As a result, the style guides are weighed down by the micro details of presentation and design, some of which have resulted in guidelines that are misguided, arbitrary, or inconsistent from one manual to the next:

  • Alignment.

    Caption alignment remains an area of inconsistency. Should caption lines be center-aligned at bottom center of the screen, as CBC (2003, p. 3) and CaptionMax recommend, or "left-aligned at center screen on the bottom two lines within the safe zone," as Captioning Key (DCMP, 2009, p. 6) recommends? Why have captioners not drawn more heavily on the convention in written English of using a strong left alignment with ample margins? What reasons continue to implicitly support centered alignment in an era of high resolution digital captioning?
  • Type case.

    Why does uppercase text style persist as a viable and popular option for captions? The technical challenges of the early days of TV captioning (i.e., low resolution) have been resolved (Clark, 2008). Given the superior readability of sentence case for all other types of extended reading (both on the printed page and on the computer screen), why haven't more major captioning vendors changed their practices? The persistence of all-caps style for offline, prerecorded captioning has to be the most troubling and baffling aspect of the visual rhetoric of closed captioning.
  • New sentences.

    Two major style guides advocate starting new sentences on their own line for offline, pop-on style captioning (DCMP, 2009, p. 10; WGBH, 2002). According to Captioning Key (DCMP, 2009, p. 10), captioners should "Never end a sentence and begin a new sentence on the same line unless they are short, related sentences containing one or two words." According to WGBH (2002), "A period usually indicates the end of a caption, and the next sentence starts with a new caption." Nowhere else in the world of English sentences do we find such an unusual guideline. New readers of English learn to move their eyes from the end of one line to the beginning of the next without needing each new sentence to begin on a new line. Captioners should always break lines for sense, but guidelines for breaking lines need not include breaking lines after periods.
  • Caption shape.

    Readability should always take precedence over the desire to create an allegedly pleasing shape out of a multi-line caption. In other words, breaking caption lines for sense should always take precedence over the desire to "present a block of captions that are similar in size and duration" (CBC, 2003, p. 8) or to present "captions [that] appear in a two-line pyramid or inverted pyramid shape" (CBC, 2003, p. 17).
  • Maximum number of lines.

    According to CBC's (2003, p. 8) style guide, "Captions should always be one or two lines in length. Three line captions are acceptable when time or space is limited. Four line captions are unacceptable." This prohibition against four-line captions seems arbitrary. If a captioner is trying to honor the viewer's preference for verbatim captions and avoid editing captions to meet a specified presentation rate, then why deny the captioner an extra line when the presentation speed and nature of the text call for it?

The style guides avoid higher-level issues that seem crucial to a full account of how captions make meaning. These issues are grounded in a rhetorical concern for how readers and viewers make meaning; how genre, context, and purpose can and should shape caption production and reception; and the differences between making meaning through reading and making meaning through listening. In the next section, I address these issues in the context of a central and underexplored question for closed captioning: Which sounds are significant?

Towards a rhetoric of closed captioning

A rhetoric of closed captioning goes beyond questions of accuracy, timing, and screen placement to consider the ways in which users, multimedia texts, and genres interface with captions to make meaning. While The CBC Captioning Style Guide (CBC, 2003) asserts that a "caption viewer should not receive more or less information than a hearing one" (p. 15), a rhetorical perspective starts from the assumption that sound and writing are fundamentally different. Providing precisely the same "information" aurally and visually (in sound and in writing) is not possible. This does not mean that captions are not capable of providing sufficient accommodations for deaf and hard-of-hearing viewers. But it does entail a different way of thinking about what captions do and mean. The rhetorician starts from the perspective that the caption mode, simply by nature of its written form, provides a different interpretation and experience of the text. In this section, I explore a central and deceptively simple question: Which sounds are significant? I developed my response to this question in 2009-11 over the course of creating a number of video commentaries for my blog (AccessibleRhetoric.com). I have included links to relevant blog entries below. All examples refer to the official DVD captions. The analysis is admittedly limited by my own viewing preferences. I did not randomly select movies and TV shows to watch on DVD. The analysis is also limited by genre, broadly speaking. All of the texts are mainstream, Hollywood narrative fare. Despite these limitations, the perspective I offer is broad enough to encompass a much more diverse set of narrative texts.

Most definitions of closed captions either 1) assume naively that captions embody the full complement of audible sounds in a text, or 2) draw on an undefined notion of significant or important sounds. Consider these definitions of closed captioning:

  • "Captions describe all the audio content, as well as information about the identity of speakers and their tone of voice." (WordIQ, my emphasis)
  • Closed captions "display speech and non-speech information…" (eHow)
  • "Closed captions are a text version of the spoken part of a television, movie, or computer presentation." (WhatIs)
  • "'Captions' aim to describe to the deaf and hard of hearing all significant audio content…" (Wikipedia, my emphasis)

The first two definitions assume simplistically that captions can, by default and without qualification, capture every sound in a movie, TV show, or other multimedia text. The third definition reflects a misunderstanding of the distinction between subtitles and captions, because true closed captions not only capture "the spoken part" but the non-speech content as well (e.g., music, sound effects, speaker IDs). The fourth definition introduces a qualifier ("significant"), which implies that some sounds are insignificant and need not be captioned.

Definitions that simplistically assume that all sounds can be captioned are not based on a deep understanding of the sound content of most movies and TV shows. In mainstream movies, captions do not "describe all the audio content," because sound is pervasive and multi-layered. In addition, the space for captions is severely limited. Action movies, especially, tend to be built on multiple layers of overlapping and competing sound tracks: dialogue, background music, sound effects, etc. Captioning style guides, even when they make or imply a distinction between significant and insignificant sounds, do not provide criteria for distinguishing the former from the latter. Sometimes such a distinction is easy to make. At other times, the captioner must decide which sounds are significant and which are not.

Let me offer an example of what I call overcaptioning, and then try to rebut the charge that any attempt by the captioner to decide which sounds should be captioned amounts to censorship. In this example, an emotional family reunion at the end of Taken, a 2008 thriller starring Liam Neeson, is disrupted by a captioned, incomplete, and ironic announcement over the airport's public address (PA) system (see http://accessiblerhetoric.com/?p=986). As the family embraces and speaks to each other in person for the first time since their daughter was kidnapped, sold into sex slavery, and then rescued by her biological father (Neeson), a partially muffled PA announcement interjects:

[Man on P.A.]
Attention travelers, you are not required—

Shall we go?

This airport does not
sponsor their activities.

In these three consecutive captions, a question from the stepfather ("Shall we go?") interrupts the PA announcement just as the announcer is about to tell viewers and listeners what is "not required." The main point of the announcement is muffled as the stepfather speaks. It is not possible for hearing viewers to make out the uncaptioned spoken words of the PA. Ironically, the movie itself is about activities that are not officially "sponsored" (i.e., kidnapping, human trafficking). In this sense, the PA announcement is relevant to the larger themes of the movie, even if the announcement is most likely not a public warning about kidnapping or slavery. But irony alone should never be enough to trump a scene's emotional intensity. Because the announcement disrupts the emotional intensity of the triumphant reunion and its main idea is inaudible (whose activities are not sponsored?), it should not have been captioned verbatim. A complete, verbatim rendering of the announcement is impossible anyway; the partial rendering only creates confusion and distraction.

When we assume that only significant sounds should be captioned (rather than starting from the mistaken assumption that all sounds can be captioned), we begin to explore significance beyond volume level. The loudest sounds are not always the most significant, just as quiet and even partly inaudible sounds are sometimes in need of captioning. Captioners must honor the narrative above all. In this scene, the stepfather whispers in his returned stepdaughter's ear, but his words cannot be made out as clearly as the scene's other spoken words (including the PA announcement). A manner caption ("[Whispers]") modulates the spoken words, "It's so good to have you back." The daughter audibly cries out while the whisper is being uttered, making it difficult for hearing viewers to determine precisely what is said without the aid of captions. Yet despite the low volume of the whisper, it needs to be captioned because it is a crucial component of the emotional reunion at the end of the film. Captioning should thus be driven by the scene's purpose. Volume level may be helpful but is not sufficient for determining which sounds should be captioned.

Based on my reading of this scene from Taken, I offer five guidelines for thinking through the question of significance:

  1. Captions should support the emotional arc of a text.
  2. A sound is significant if it contributes to the purpose of the scene.
  3. Caption space is precious. It should never be wasted on superfluous sounds that may confuse viewers or diminish their sense of identification with the protagonist(s).
  4. Sounds in the background do not necessarily need to be captioned, even if they are loud.
  5. Every caption should honor and respect the narrative. While the narrative does not have one correct reading, it does have a sequence and arc that must be nourished.

By suggesting that the captioned PA announcement at the end of Taken might be edited down to a sound description such as "PA announcement" or "crowd talks indistinctly," I realize that I open myself up to charges of censorship. Indeed, I am acutely aware that deaf, hearing, and hard-of-hearing viewers do not want dumbed-down captions. Every viewer deserves equal access. The original practice at WGBH of editing speech for comprehension (Earley, 1978) is no longer advocated today—for good reason. But what discussions of verbatim captioning leave out are 1) the differences between writing and speech, and 2) the limited space available for captions. Everything cannot be captioned. There are some pretty loud footsteps in the Taken scene, some loud but indistinct chatter, and at least one loud car horn. Someone had the good sense not to caption them. Caption space is limited. Only significant sounds should be captioned. The footsteps seem to be almost as loud as the PA announcement, but volume alone should never drive caption design. Captioning is an art. The captioner must contend with spatial and temporal constraints while being responsive to the rhetorical needs of the narrative. Countless sounds are left out of every caption file—they have to be left out. Time and space are working against the captioner, but more importantly viewers do not want to be burdened by a screen full of insignificant captions. Censorship is simply the wrong word to describe the selective and creative process of captioning. What I am describing is not censorship—far from it—but the art and rhetoric of captioning. Someone must make these decisions. Captioning is not an objective science; it is a highly interpretative practice. (By the way, it sounds as if there is a foreign-language PA announcement at the beginning of the Taken scene. It is not significant, and thankfully there is no reference to it—even a vague one like "Foreign PA announcement"—in the caption file. It is not censorship to leave it out; leaving it out reflects an attempt to be responsive to the scene's purpose.)

The flip side of overcaptioning—what I call undercaptioning—reveals the extent to which captioning is a rhetorical, purpose-driven act that cannot be objectively reduced either to a production script or to the volume level of individual sounds. In Curb Your Enthusiasm, only speech is captioned. 2 This so-called subtitle approach to captioning leaves out sound descriptions entirely, thus cutting deaf and hard-of-hearing viewers off from significant non-speech content. Descriptions of sounds, even seemingly minor or barely audible sounds, are necessary to convey the full significance of the sound track. Episode 9 of Season 1 (2000) centers on Larry's repeated attempts to fill a prescription for Cheryl that would alleviate her desire to scratch (see http://accessiblerhetoric.com/?p=1048). When Cheryl is visibly scratching, captions are usually unnecessary or can be kept to a minimum. But when Cheryl's scratching can be heard but not seen, the need to caption it increases. A subtitle (dialogue-only) approach is insufficient. Some examples of scratching in this episode are both invisible (off-screen) and barely audible. Cheryl scratches off-screen in the darkened interior of a car and, later in the episode, scratches off-screen as loud background music plays. At one point, Larry stares at the ceiling as both background music and soft scratching sounds are heard. Because the humor of this scene depends upon the soft sound of scratching in the background, it needs to be captioned. Similarly, the background music, borrowed from the soundtrack to Psycho, is rhetorically significant and needs to be captioned. In the final seconds of the episode, Larry ascends the stairs of his home to the slight sound of scratching, revealing his inability to fill Cheryl's prescription. The full significance of this final moment is not available to deaf and hard-of-hearing viewers who rely on captions to provide information about both speech and non-speech content. Even barely audible sounds such as light scratching need to be captioned if they are significant to the narrative. In this case, significance is not simply a matter of sound volume. In a second Curb video, I explore five significant non-speech sounds from season two that should have been captioned but were not (see http://accessiblerhetoric.com/?p=1343).

Captioners need to understand a movie or TV show in context. Individual scenes may need to be captioned with a view to how the narrative is unfolding at any given moment. In other words, significance is a contextual issue and can only be addressed from inside the arc of a particular narrative. The background music in the scratching scene from Curb establishes an emotional context that must be carefully captioned to convey that context to caption users. A musical note would be insufficient in this case, although it might be perfectly acceptable in others.

Captioners may also need to be responsive to context beyond the individual scene or episode, what I refer to as series awareness. When TV episodes are closed captioned in isolation, without an awareness of how individual sounds are connected intertextually to other episodes in the series, caption users may be cut off from important themes in the show. Because every TV series builds a set of relationships that connects all the episodes together, these relationships, when mediated through sound, need to be captioned. Consider recurring music that's identified with a specific character or show theme. The difference between "[Disco]" and "[The Final Countdown]" is no small difference if the show is Arrested Development and the character is Gob. Fans associate Europe's 1986 hit "The Final Countdown" with Gob. The song always accompanies Gob's magic act. It has been called "his trademark opening song" (Arrested Development Wikia). Consider a clip from episode 8 of season 3 (see http://accessiblerhetoric.com/?p=2933). As a recurring element intimately tied to a major character, the song must be captioned by title, such as ["The Final Countdown"] or [Europe's "The Final Countdown"], and accompanied by the requisite music note. A generic caption such as [Disco] will not suffice to convey the full meaning of the music in the series. While [Disco] may work in isolation, it does not work from a series perspective. In this clip, a snippet from another recurring song in the series—"It ain't easy"—is undercaptioned as [Country]. Only someone unfamiliar with the show would reduce this song to [Country]. When captions are considered from a series perspective, themes are more likely to be visible on the caption track.

When scenes or episodes are not captioned contextually, captioners may provide crucial, significant information but at the wrong narrative moment. For example, the caption track of Pirates of the Caribbean 2: Dead Man's Chest (2006) gives away the natives' big secret a full fifteen minutes before the narrative is ready to do so (see http://accessiblerhetoric.com/?p=803). The natives' cannibalism—a significant plot device that is revealed all at once in the narrative—is revealed in the captions too soon through the use of sound descriptions such as [speaks cannibals' language] and [cannibals murmur]. As a result, the captions provide too much information and thus fail to nurture suspense and surprise. Movie captions must never reveal information prematurely. To caption contextually requires a rhetorical sensibility that takes the act of captioning beyond simple transcription of speech content. In order to understand what a narrative needs from captions, the captioner must draw from a global understanding of the text. Applied to movies, the notion of series awareness directs captioners to account for how individual scenes are intimately connected and should thus not be captioned in isolation. Individual captions must work together to support and reflect the movie's emotional intensity and developmental arc. Surprise and suspense must be nurtured and captions must be situated firmly in the moment, never ahead of it.

This example from Pirates also suggests that captions have the power to manipulate time. The viewer watching with captions knows ahead of the viewer watching without captions that the natives are cannibals. On a local (single-caption) level, captions may allow experienced readers to read ahead of the speech being captioned. A short two-line call-and-response between two speakers can be read very quickly. In my house, we have laughed or otherwise responded to a captioned joke before the joke has been uttered. Captioners need to be sensitive to the ways in which captions necessarily modify time. Consider the guideline to identify the name of any speaker who is off-screen or cannot be identified visually. This guideline may come into conflict with the captioner's need to be responsive to a narrative that contains elements of mystery, suspense, or surprise. In a scene from the TV show Dollhouse (Season 1, episode 10), a man lurking in the shadows says something to Echo (played by Eliza Dushku). Unable to see who it is, Echo responds, "Who's there?" The man then comes out of the shadows, and Echo recognizes him as Nicolas (see http://accessiblerhetoric.com/?p=605).

NICOLAS:
Strange—he doesn't
like most people.

Who's there?

I have a better question.

As he speaks the line in the third caption ("I have a better question"), Nicolas comes into the light and reveals himself. But closed-caption viewers knew it was Nicolas from the moment he began speaking from the shadows, because the caption had identified him with a Speaker ID. For better or worse, captions have the potential to make time travel possible.

From the viewer's perspective, knowing the future does have a price. No longer do we ride that same wave of suspense with Echo. We know right away that Nicolas is skulking around in the shadows. Closed captioners need to consider carefully how captions affect the emotional arc and intensity of a scene or show. Captions need to be situated. The captioning convention that calls for Speaker IDs (we cannot see Nicolas, so we had better identify him in the caption) should not have been invoked in this case. The need to build suspense in this particular scene trumps the need to identify speakers who are off-screen (or obscured by that terrifying darkness). The practice of captioning should be rhetorical. Captioning, done well, demands that conventions be applied flexibly. A rhetorical approach uses purpose, genre, structure, emotion, content, and audience to determine the best course of action. For another example of the time-traveling potential of captions, see http://accessiblerhetoric.com/?p=483.
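
The Dollhouse example can also be restated as a decision rule. The following TypeScript sketch is purely illustrative; the identityWithheld flag is a hypothetical marker for scenes whose suspense depends on concealing the speaker, not a feature of any real captioning system.

```typescript
// Illustrative only: applying the Speaker ID convention flexibly. The
// "identityWithheld" flag is hypothetical; it marks scenes where the
// narrative deliberately conceals who is speaking.

type Cue = {
  text: string;
  speaker?: string;          // e.g., "NICOLAS"
  speakerOnScreen: boolean;
  identityWithheld: boolean;
};

// Convention: label off-screen speakers. Rhetorical exception: when the
// scene's suspense depends on not knowing who speaks, omit the label.
function renderCue(cue: Cue): string {
  const needsId = !cue.speakerOnScreen && cue.speaker !== undefined;
  if (needsId && !cue.identityWithheld) {
    return `${cue.speaker}:\n${cue.text}`;
  }
  return cue.text;
}

// Applying the convention mechanically would spoil the reveal:
console.log(renderCue({
  text: "Strange—he doesn't like most people.",
  speaker: "NICOLAS",
  speakerOnScreen: false,
  identityWithheld: true, // suppress the ID until he steps into the light
}));
// -> Strange—he doesn't like most people.
```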

Caption viewers may be out of sync, just slightly, with the action, or worse, stripped of the full experience of surprise and suspense. Perhaps it is not an advantage to us, after all, when captions reveal secrets before the movie is ready to share them. But the larger point—encompassing any discussion of specific advantages or disadvantages—is that no one is really talking about the rhetoric of closed captioning, the ways in which captions (and the interplay of writing and sound more broadly) create experiences for users that are different from uncaptioned experiences. Captions are not simply the text equivalent of spoken dialogue but create different opportunities for users, mediate meaning making differently, and add subtle and complex layers of meaning to video texts. By analyzing the rhetorics of closed captioning, we can offer new critiques of the limits of current thinking about caption style and, hopefully, improve caption technology and stylistic conventions. By showing how closed captions can provide different (even advantageous) viewing experiences over traditional, non-captioned experiences, we can help to bring closed captions into the mainstream.

In closing this section, let me offer two final examples to suggest how captioned viewing experiences can be notably different from uncaptioned viewing experiences, despite the best efforts of captioners to ensure equal access to information. The first example involves what I call the backchannel. The term usually refers to audience discussions at conferences that take place live and behind the scenes—e.g., on Twitter using hashtags. I want to appropriate the term here to describe background sounds that come forward when they are captioned. Every sound becomes equally "loud" when it is transferred to the caption track. The distinction between background and foreground blurs. The Happening (2008), for example, contains a number of scenes of crowds of various sizes chattering and murmuring in the background (see http://accessiblerhetoric.com/?p=1041). When the crowd's chatter is captioned as indistinct sound (e.g., "chattering"), it remains in the background. When the crowd's chattering is captioned verbatim, it comes forward and becomes clear. Examples of verbatim backchannel sounds include:

[Man] We need an extension over here.

I've never seen anything like it.

[Man] Go back to the laughing please.

Sir, we'll need to
check your suitcase.

[Man] Please have your tickets out and ready.

I know, I know you don't want to.

- Do you have a phone?
- No, no. I'm sorry. I don't.

They're not telling us anything.

[Woman] Drive. Just roll up the window.

- We have no communication.
- We don't have a cell phone.

Has anyone seen any other people on the roads?

[Man] I just walked down a quarter mile. It was clean.

[Woman] I'm here. Right here. Keep going.

In these examples, the crowd's chatter is brought forward because it is captioned. Without captions, this background chatter remains indistinct to hearing viewers. Put another way, captions equalize sounds by removing or downplaying the distinctions between loud and quiet sounds. All sounds become equally "loud" on the caption track. For hearing viewers, captions make accessible speech and other sounds that are otherwise hard to understand, not loud enough, spoken too quickly, and so on. In this way, captions clarify. Because the distinction between foreground and background breaks down on the caption track, captioners need to ensure that only significant background sounds are brought forward.
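
As a rough sketch of that editorial judgment, consider the following TypeScript fragment. The significant flag stands in for the captioner's rhetorical judgment; it is a hypothetical construct, not something software could compute from the audio.

```typescript
// Hypothetical sketch: only background lines the captioner judges
// narratively significant are rendered verbatim; the rest stay collapsed
// into an indistinct description.

type BackgroundLine = {
  text: string;
  significant: boolean; // a rhetorical judgment, not an acoustic measurement
};

function captionBackground(lines: BackgroundLine[]): string[] {
  const verbatim = lines
    .filter((line) => line.significant)
    .map((line) => line.text);
  const hasLeftovers = verbatim.length < lines.length;
  // Anything left in the background is summarized, not transcribed.
  return hasLeftovers ? [...verbatim, "[indistinct chattering]"] : verbatim;
}

console.log(captionBackground([
  { text: "[Woman] Drive. Just roll up the window.", significant: true },
  { text: "I know, I know you don't want to.", significant: false },
]));
// -> [ '[Woman] Drive. Just roll up the window.', '[indistinct chattering]' ]
```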

Captioned viewing experiences are also notably different because the affordances of writing and sound are different. A single non-speech sound can be captioned in multiple ways. Consider "breathy" sounds such as gasps, scoffs, pants, sighs, grunts, and heavy breathing. The difference between a gasp and a pant may come down to more than the objective qualities of the sound itself; facial expression and context influence how a sound should be described. Similar "breathy" sounds may warrant different captions, just as different sounds may rely on the same caption (see the sketch following the list below). The English lexicon is quite limited when it comes to offering a descriptive language for paralinguistic sounds. With this in mind, I offer a compilation of breathy, non-speech sounds from the movie Twilight (2008), a movie I only half-jokingly call the "gaspiest movie in the world" (see http://www.accessiblerhetoric.com/?p=54). Dramatic breathing, especially from Bella (Kristen Stewart), plays a recurring, visible, and captioned role in the film. Bella gasps, sighs, grunts, and pants her way through the narrative. What we learn about paralanguage and captions from Twilight is that:

  • Captions are situated in the context of the narrative. How a sound should be described depends on what that sound does—i.e., the purpose it plays in the context of the scene.
  • Captions may be arbitrary (e.g., two "gasps" may sound very different).
  • The English lexicon offers few options for captioning paralanguage. A single caption ("gasp") is called on to represent a range of sounds and emotions, including fear, surprise, desire, anger, and pain.
  • Captioning everything is not practical, feasible, or necessary. Not every "breathy" sound in the film is captioned. Space may not allow, and context may not require, the inclusion of some paralinguistic sounds as captions.
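
A small TypeScript sketch can make the lexical poverty behind the second and third points visible. The context categories and caption choices below are hypothetical illustrations drawn loosely from the Twilight examples, not a standard vocabulary.

```typescript
// Hypothetical mapping: one caption word ("gasps") is asked to carry
// several distinct emotions, while other breathy sounds get their own.

type BreathContext = "fear" | "surprise" | "desire" | "exertion";

const breathCaption: Record<BreathContext, string> = {
  fear: "[gasps]",
  surprise: "[gasps]",  // the same word serves two different emotions
  desire: "[sighs]",
  exertion: "[panting]",
};

console.log(breathCaption.fear);     // -> [gasps]
console.log(breathCaption.surprise); // -> [gasps]
```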

We do not yet have any rhetorical studies of closed captions. I hope that other scholars will find here fertile ground for exploring the intersection of multimodal composition and accessibility, and of sound and writing more broadly.

Conclusion: Naturalizing Captions

A rhetoric of closed captions, in keeping with the examples I have explored here, is rooted in the following claims:

  • Captioning is a rhetorically complex and creative act. Captioners are rhetorical agents who must, at times, make decisions about which sounds to caption and how to caption them.
  • Captions provide a different experience of the text.
  • Captions do not merely transcribe audio content but transform it.
  • Significant sounds do not exist in a vacuum. Significance is contextual, not simply a function of volume level.

At their best, captioners are rhetorical agents who choose what and how to caption based on a thorough understanding of the text, including its narrative and emotional arcs. While texts do not have single meanings, they do have relationships and contexts that must be reconstructed and nourished.

Closed captions have wide, universal appeal, and as they become more visible and common on the Web, my hope is that they will begin to seem more natural and even desirable to a wider range of viewers. When captions are seen as universally desirable, more scholars will treat them as central rather than incidental (or overlook them entirely). For example, captioning technology is being leveraged to produce richer viewing experiences on the Web through interactive transcripts, which allow users to click anywhere in a video transcript and be transported to the moment in the video where the clicked words are spoken. Ted.com provides some of the best examples of interactive transcripts. Users can listen to a video presentation in English, load captions in a second language, and read the transcript in a third language. Loading all three in English provides excellent support for deaf, hearing, and hard-of-hearing viewers. Interactive transcripts signify a leap forward for captioning technology (see Zdenek, forthcoming). Interactive transcripts, like enhanced TV episodes that rely on captioned commentary, suggest an expanded role for captions that could, one hopes, make them part of the standard topography of the typical web video.
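
For readers curious how the click-to-seek behavior works, here is a minimal browser sketch in TypeScript. It assumes an HTML5 video element with id "talk" containing a track element that points to a WebVTT caption file, plus an empty div with id "transcript"; it is illustrative only, not TED.com's actual implementation.

```typescript
// Minimal sketch of an interactive transcript: each caption cue becomes a
// clickable span that seeks the video to the cue's start time.
// Assumes <video id="talk"> with a child <track>, and <div id="transcript">.

const video = document.getElementById("talk") as HTMLVideoElement;
const transcript = document.getElementById("transcript") as HTMLDivElement;
const trackEl = video.querySelector("track") as HTMLTrackElement;

// "hidden" loads the cues without drawing them as on-screen captions.
trackEl.track.mode = "hidden";

trackEl.addEventListener("load", () => {
  for (const cue of Array.from(trackEl.track.cues ?? []) as VTTCue[]) {
    const span = document.createElement("span");
    span.textContent = cue.text + " ";
    // Clicking a cue's words transports the viewer to that moment.
    span.addEventListener("click", () => {
      video.currentTime = cue.startTime;
      video.play();
    });
    transcript.appendChild(span);
  }
});
```

The same cue data could, in principle, drive the multilingual display described above, since a second track in another language carries its own parallel list of cues.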

Captions also have a crucial role to play in certain genres of user-generated content, such as popular songs accompanied by on-screen (captioned) lyrics, and so-called "literal music videos," which are parodies that replace the original lyrics with new lyrics based on what is happening visually in the official music video. These parody videos depend for their humorous effect on open captions. Captions are an integral part of the genre. (For my own attempt at caption-based parody, see http://accessiblerhetoric.com/?p=1100). Moreover, parody provides a direct route from accessibility to multimodal composition. Consider the countless parody videos of Downfall (search YouTube for "downfall parody"), including one published anonymously in Kairos (theamishaugur, 2008). The Kairos submission is explained in a brief footnote by the journal editor in terms of the ethics of publishing anonymously (Ball, 2008). But we might just as well discuss the Downfall meme in terms of universal design. You do not need to be a hearing person to get the Downfall joke. With this seemingly simple and obvious point, we can counter the tendency in our fields to assume an able-bodied user/student. Parody is leveraged into a discussion of universal design.

Captioning research has the potential to make a number of contributions to the study of multimodal composition. Captioners and captioning researchers study the meaning and significance of sound, the relationship between writing and sound, the visual display of sound, remediations of sound, how to convert sound into writing, the sonic backchannel, how to make sound accessible, sonic intertextuality (e.g., series awareness), and the design of accessible pedagogical soundscapes. These topics complement and can potentially inform sound studies in composition and related fields (e.g., Alexander, 2008; McKee, 2006; Mueller, 2009; Shipka, 2006). But more importantly, captioning research—and disability studies more generally—calls attention to our underlying beliefs and assumptions about users and students. If we start from the assumption that our pedagogies and multimodal compositions need to be accessible, if we assume that not all of our students are able-bodied digital natives, we can develop richer, more informed, more robust, and more accessible pedagogies, tools, technologies, and texts. We limit our theories when we assume that all of our students are hearing (e.g., see Alexander, 2008), or, as McKee (2006) does in her essay on sound in multimodal texts, when we recognize the "important issue" of accessibility but simply choose not to discuss it. McKee (2006) describes accessibility in terms of discrete guidelines (e.g., "providing subtitles for all sounds" [p. 335]), but my preceding analysis has tied accessibility to a set of broader questions about sound and writing that transcend such a narrow and simplistic definition of subtitles. (As I have shown, no subtitle or caption track can ever account for "all sounds." Anyone who has given the slightest thought to captioning quickly arrives at the same conclusion.) If we wish to provide robust accounts of multimodal composition, we need to inform our understanding of sound with an accessibility-infused sensitivity to the broader questions about sound, writing, and rhetoric at the heart of this essay's analysis. We need to start with universal design, not dismiss it.

Captioning research can and should be done in concert with deaf and hard-of-hearing users and researchers. My own view of captioning is informed by the style guides' insistence that deaf and hard-of-hearing caption viewers prefer verbatim captions, and by studies such as Jensema et al.'s (2000) on the amount of time deaf and hard-of-hearing viewers spend reading captions. Gallaudet University's extensive study of 37 hours of non-speech information, coded in "deaf and hearing teams," is another example of how captioning research can be informed by the experiences and preferences of deaf and hard-of-hearing users and researchers (Harkins et al., 1996). More broadly, research in deaf studies can provide a context for understanding the complex relationships of deaf people to sound (Brueggemann, 1999; Edwards, 2010; Lane, 2010; Padden & Humphries, 1990).

Given the explosion of online video and the growing awareness of the importance of web accessibility, we ignore or simplify captioning at our peril. Future research can extend the framework offered here in a number of directions. The handful of examples analyzed in this essay is intended to be a starting point for larger studies over a wider range of multimodal texts. Recently, I have begun collecting DVD caption files, extracting non-speech captions from them, and coding these captions across a large, representative corpus of movies (see http://accessiblerhetoric.com/?p=2361). Future studies will explore large collections of captions from a rhetorical perspective. How different scenes, episodes, or movies are or should be intertextually linked on the caption layer is an ongoing question of interest for me, one that grows out of my recent work, as mentioned above, with "series awareness." Comedy and parody are also promising areas of research, if only because mainstream audiences have recognized the importance of captions for delivering parody. To explore the question of rhetorical agency, studies of captioning practices are needed (e.g., interviews with and observations of captioners). Sonic allusions and cultural literacy are promising areas of research as well (e.g., see http://accessiblerhetoric.com/?p=3116). Finally, the influence of commercial interests on the display of captions is a potential problem worth exploring (e.g., see http://accessiblerhetoric.com/?p=2695).

The promise of an increasingly captioned digital world and the growing importance of web accessibility and universal design make captioning an exciting, important, and as yet unexplored area for rhetoricians.

Works Cited

Endnotes

  1. On June 29, 2010, I used "Power Search" in iTunes to search for closed-captioned content. I enabled the checkbox to search for CC content while leaving all the other fields blank, and this search returned a list of all the movies with closed captions. Unfortunately, I was unable to determine the total number of movies available on iTunes. According to Apple (2010), the total is somewhere in the "thousands."

  2. Seasons 1 and 2 on DVD adopt a dialogue-only (subtitle) approach to captioning. I do not know whether the same is true for other seasons of Curb Your Enthusiasm.
