Jean Delisle
(Language Update, Volume 6, Number 4, 2009, page 21)

A new language profession

Do you have a keen interest in communication and the French language? Has nature blessed you with an acute sense of hearing, good diction and an even-toned voice? Do you have good elocution and concentration, a quick mind and sound judgment? Do you have a university degree in communications, translation, linguistics, history or a related field? Do you have a solid all-round education, and do you like to keep up with current affairs? Can you master arcane French grammar rules? Do you handle stress well, and can you work under pressure? Could you rephrase what you hear while operating a multi-function video-game joystick? If you can answer yes to all of these questions, you have everything it takes to be a voicewriter.

This emerging profession [Remark b] is expected to expand considerably in the coming years because of at least three factors: the aging population (nearly 25% of Canadians aged 75 or over have a hearing disability), new CRTC requirements that television broadcasters provide closed captioning for all of their programs during peak viewing hours, and a shortage of sign-language interpreters and stenotypists.[Footnote 1] For the past 15 years, the Regroupement québécois pour le sous-titrage [Footnote 2] has worked to increase the number of French closed-captioned television programs and films. Since 2003, the Cité collégiale d’Ottawa, a French-language community college, has been trying to launch a three-year certificate program in computer-assisted stenotypy to ease the shortage of French-language stenotypists. Although this is a well-paid occupation, there have not been enough registrants to start the program.

Two years earlier, French-language television network TVA allocated $500,000 to the Centre de recherche informatique de Montréal (CRIM, Montréal informatics research centre) to develop a prototype that would provide live closed captioning of news bulletins using automatic voice-recognition technology adapted to Canadian French. Since the software programs on the market had been designed to recognize European French accents, CRIM developed its own live closed-captioning system, called STDirect. This system, unique in the Francophone world, was first used in a TVA broadcast in 2004. Using encoded captions, it displays the dialogue and sound effects of a video program as text; viewers need a decoder to see the captions on their television screens.

A major challenge

The Honourable Jean-Robert Gauthier, a retired Liberal Senator, former Member of Parliament and ardent promoter of the French language who became hard of hearing after a viral infection, pressed the parliamentary authorities to have House of Commons debates closed-captioned. Since 1991, English closed captioning has been provided by stenotypists, whereas the Francophone public has been served by sign-language interpreters. Not all hard-of-hearing people understand sign language (Senator Gauthier, for one, does not); they need to be able to read the text of what is said. At the request of the House of Commons, the Translation Bureau, whose Interpretation and Parliamentary Translation Directorate provides Parliament with interpretation and translation services, took part in a live closed-captioning pilot project from 2005 to 2007. CRIM carried out the project, and the new House of Commons service went into operation in the fall of 2007. For the time being, it is limited to Question Period, which is held on every sitting day from 2:15 to 3:00 pm, Monday to Thursday, and from 11:15 am to noon on Fridays. Because of its linguistic expertise, the Translation Bureau was asked to assess the quality of the closed captioning and worked closely with CRIM, which sent it periodic performance reports.

The project was an enormous challenge: the closed captions had to appear on the Cable Public Affairs Channel (CPAC) with as little time lag as possible, whether a Member of Parliament spoke in French or spoke in English and was interpreted into French. The obstacles were overcome, and the complexity of French grammar was by no means the least of them. The House of Commons audio signal is transmitted to Montréal over a telephone line. In a soundproof CRIM studio (Fig. 1), a voicewriter seated in front of a screen watches the picture of the MP speaking, transmitted by cable link, and repeats what the speaker says. The system, which is calibrated to recognize the voicewriter’s voice, transcribes what the voicewriter says, and the resulting text is coded and sent back over the telephone line to the Line 21 encoder in the Parliament buildings. This round trip takes two seconds. Broadcasting the captions on CPAC by cable link takes about another two seconds. With a total lag of roughly four seconds, the captioning can be called “simultaneous” in the same sense as simultaneous interpretation.
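
The four-second figure is simply the sum of two delays that occur one after the other. The minimal Python sketch below reproduces that arithmetic using only the approximate values quoted above; the stage labels are descriptive and hypothetical, not names from the STDirect system, and the sketch does not model the voicewriter’s own reaction time, which is not quantified here.

```python
# Approximate delays quoted in the article, in seconds. The stage labels
# are descriptive only (hypothetical), not STDirect component names.
CAPTION_STAGES: dict[str, float] = {
    "Ottawa-Montreal round trip (audio out, captions back)": 2.0,
    "cable broadcast of the captions on CPAC": 2.0,
}


def total_lag(stages: dict[str, float]) -> float:
    """The stages run one after another, so the end-to-end lag is their sum."""
    return sum(stages.values())


if __name__ == "__main__":
    for stage, seconds in CAPTION_STAGES.items():
        print(f"{stage}: {seconds:.0f} s")
    print(f"total: {total_lag(CAPTION_STAGES):.0f} s")
```

Shaving time off either stage would lower the total, which is why the transmission links matter as much as the recognition software.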

Fig. 1 – Voicewriters providing live closed captioning in a soundproof booth.

The CRIM team did not work in isolation. Deaf and hard-of-hearing people were frequently consulted, and all of CRIM’s clients, mostly broadcasters, benefited from the successive improvements to the closed-captioning environment. According to Michel Boissonneault, a trained linguist and translator, former French teacher and manager of closed-captioning and visual interpretation services at the Translation Bureau: “We focused our efforts on the intelligibility of the closed captioning. A verb in the infinitive may be displayed as a past participle, but this grammatical error in no way jeopardizes the intelligibility of the message. The voicewriters who have been working on the project since 2005 have acquired a lot of experience and are very good.” When you see them at their workstations, wearing headphones and a microphone in a soundproof enclosure, you might take them for interpreters, but they are not transferring meaning from one language to another. They are intermediaries between two modes of expression: oral and written.

When an English-speaking parliamentarian speaks in the House, the interpreter in the French booth listens to the English and reformulates it in French. The voicewriter, on the other hand, listens to French (that of the interpreter or of a member speaking French in the House) and repeats it in French. Even so, interpreters would be ill-advised to look down on voicewriters and nickname them parrots, for in ancient Egypt it was the parrot that symbolized the interpreter’s profession. In ancient Carthage (9th century BCE), there was a privileged caste of interpreters whose heads were shaved and bore a distinctive tattoo of a parrot. The parrot had folded wings if the interpreter worked with a single foreign language and outstretched wings if he or she knew several languages.[Footnote 3]

The art of voicewriting

Calling voicewriters parrots, on the assumption that their work is mindless repetition requiring no thought, would betray ignorance of its true nature. Once interpreters have understood the message, they must break free of the verbal straitjacket and re-express its core meaning, a difficult feat that everyone agrees rightly earns general admiration. Voicewriters for the House of Commons, by contrast, are required to stay closer to what is said and reproduce the spoken words verbatim. That does not mean, however, that they are mere word “chewers,” automatic converters or human robots.

Voicewriters must understand what parliamentarians are saying and pay particular attention to how the words are expressed so that they can make any necessary adaptations. Each phonetic sequence must correspond to a lexical entry in the voice-recognition software, so as soon as voicewriters realize that the system cannot correctly process a particular segment of a statement (a foreign word or a word missing from the basic vocabulary), they have to solve the problem quickly. Thus, when the name of Kashechewan, a community in northern Ontario, came up for the first time, there was a strong chance of its being confused with the province of Saskatchewan. The voicewriter quickly intervened and skilfully substituted an equivalent paraphrase: “the Aboriginal community in northern Ontario.” Similarly, the system easily recognizes the three-word phrase la commission Gomery (the Gomery Commission), which is in its dictionary, but le rapport Gomery (the Gomery report) risks being displayed as le rapport gomme rit.
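
The rule that every phonetic sequence must match a lexical entry explains both examples. The Python sketch below is a toy illustration, not CRIM’s recognizer: it assumes a hypothetical lexicon whose entries can span several words (like la commission Gomery) and uses greedy longest-match lookup to show which words of a phrase have no entry. A real recognizer would not flag such a word; it would force it onto the closest-sounding entries, which is how Gomery can turn into gomme rit.

```python
# Toy lexicon (hypothetical). A real recognizer stores pronunciations for
# tens of thousands of entries; here each entry is a tuple of words, and
# some entries span several words, like "la commission Gomery".
LEXICON: set[tuple[str, ...]] = {
    ("la", "commission", "gomery"),
    ("la",), ("le",), ("commission",), ("rapport",),
    ("gomme",), ("rit",),
}
MAX_ENTRY_LEN = max(len(entry) for entry in LEXICON)


def segment(phrase: str) -> tuple[list[str], list[str]]:
    """Greedy longest-match lookup.

    Returns the lexical entries covering the phrase and the words that no
    entry covers (the out-of-vocabulary words a voicewriter must work around).
    """
    words = phrase.lower().split()
    entries: list[str] = []
    uncovered: list[str] = []
    i = 0
    while i < len(words):
        # Try the longest possible entry first, then shorter ones.
        for n in range(min(MAX_ENTRY_LEN, len(words) - i), 0, -1):
            candidate = tuple(words[i:i + n])
            if candidate in LEXICON:
                entries.append(" ".join(candidate))
                i += n
                break
        else:
            # No entry starts with this word: it is out of vocabulary.
            uncovered.append(words[i])
            i += 1
    return entries, uncovered


print(segment("la commission Gomery"))  # (['la commission gomery'], [])
print(segment("le rapport Gomery"))     # (['le', 'rapport'], ['gomery'])
```

Adding ("gomery",) or ("le", "rapport", "gomery") to the lexicon is the toy equivalent of entering a new term during pre-production.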

Voicewriters sometimes have to think fast and insert a generic term rather than a specific name. For example, the name of a tourist mispronounced by a parliamentarian, or inaudible because of noise in the House, might be rendered as “this man imprisoned in Mexico.” Acronyms and the names of companies and associations sometimes require similar treatment, as do English words: “Bugs Bunny” will be rendered in French as “a cartoon character.” The voicewriter had to intervene when MP Denis Coderre declared in the House, “C’est une bande de Mickey Mouse!” (“They’re a bunch of Mickey Mouse!”) and “Il se prend pour Forrest Gump avec sa boîte de chocolats” (“He thinks he’s Forrest Gump with his box of chocolates”). “We were not going to repeat something like that,” says voicewriter Sophie Leclerc. “We try to use equivalents that convey the spirit of the images evoked in the speaker’s words. It is true that we do not always reproduce the same colourful language.” As a result, the closed-caption version is sometimes more “refined” than the original and uses a more formal register. Because they listen closely and are familiar with the topic, voicewriters allow themselves to correct obvious errors. If the interpreter or speaker says “millions” of dollars when the context clearly means “billions,” the mistake is corrected. The same applies to slips of the tongue: if former Prime Minister Paul Martin were called Pierre by mistake, his correct name would be used.

Punctuation and atmosphere

Voicewriters are also expected to insert punctuation into the text scrolling in front of them and, to some extent, to recreate the atmosphere on the floor of the House of Commons. How do they do that? With a pre-programmed joystick. In addition to the main punctuation marks (? . , !), they can display various messages or “events,” such as [noise], [interpreter’s voice], [end of translation] and [incomplete sentence]. Other buttons let them clear the screen if technical problems turn the text into unintelligible scribble, or activate other functions of the application.
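
In software terms, the joystick maps each button to an edit of the caption text. The minimal Python sketch below assumes a hypothetical button layout (the actual one is not described here); the punctuation marks and event labels are the ones listed above.

```python
from enum import Enum, auto


class Button(Enum):
    """Hypothetical button layout, for illustration only."""
    QUESTION = auto()
    PERIOD = auto()
    COMMA = auto()
    EXCLAMATION = auto()
    NOISE = auto()
    INTERPRETER = auto()
    END_OF_TRANSLATION = auto()
    INCOMPLETE = auto()
    CLEAR = auto()


PUNCTUATION = {
    Button.QUESTION: "?",
    Button.PERIOD: ".",
    Button.COMMA: ",",
    Button.EXCLAMATION: "!",
}
EVENTS = {
    Button.NOISE: "[noise]",
    Button.INTERPRETER: "[interpreter's voice]",
    Button.END_OF_TRANSLATION: "[end of translation]",
    Button.INCOMPLETE: "[incomplete sentence]",
}


def press(caption: str, button: Button) -> str:
    """Return the caption text after one button press."""
    if button is Button.CLEAR:
        return ""  # wipe garbled text off the screen
    if button in PUNCTUATION:
        return caption.rstrip() + PUNCTUATION[button]
    if button in EVENTS:
        return f"{caption} {EVENTS[button]}".strip()
    raise ValueError(f"no action mapped to {button}")


text = press("Monsieur le Président", Button.NOISE)
print(text)  # Monsieur le Président [noise]
```

Keeping the mappings in two plain dictionaries makes it easy to picture how a client-specific set of events could be swapped in without touching the logic.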

Voicewriters must also cope with interpreters’ varying performances and styles. Some interpreters speak in a clear flow that is easy to follow, while others speak laboriously, in a choppy, hesitant manner. Still others take longer to reorganize the speaker’s ideas and then deliver the interpretation at rapid-fire speed. “It is not always easy to follow the speaking rhythm of some interpreters,” says Simon Dupuis, one of the first voicewriters recruited by CRIM in 2005, “and just as the interpreters have their favourite and least favourite Members of Parliament, so too the voicewriters have their favourite interpreters.” In the interpreters’ defence, however, some parliamentarians speak very quickly (over 130 words per minute). Since it is impossible to slow them down, interpreters and voicewriters have to adjust to them. That is part of their jobs, and it must be done live, in the heat of the moment.

When English-speaking parliamentarians decide to speak in French, their French is sometimes shaky, broken and riddled with mistakes, so the voicewriter rephrases their words more clearly and concisely without changing the meaning. And what do voicewriters do with all the proper nouns and rarely used terms? “There are always words that are not in the vocabulary, but the system adapts and expands the vocabulary every day,” explains Julie Brousseau, Production Director of CRIM’s Closed Captioning Bureau (Fig. 2) and a speech-recognition specialist with a master’s degree in linguistics. She worked at Dragon Systems in Boston, where she adapted the DragonDictate commercial system to Canadian French, before joining the CRIM team, where she took part in a research project to integrate voice recognition and machine translation. The remarkable speed of the system designed at CRIM is due partly to faster new microprocessors and partly to storing its information as finite-state graphs. As Julie Brousseau explains, “For a particular acoustic sequence, the system analyzes the acoustic probability and the probability of the language model, then determines a weighting between the two. The result is a voice-recognition hypothesis that is displayed on the screen.” All of this happens in a fraction of a second.
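
The weighting Julie Brousseau describes is commonly written as choosing the word sequence that maximizes log P(acoustics | words) + λ · log P(words), where λ is the language-model weight. The Python sketch below rescores two candidate transcriptions with invented probabilities to show the effect; it assumes “Gomery” has already been added to the vocabulary. A production decoder searches a finite-state graph rather than a short list, and STDirect’s actual scores and weight are not given here.

```python
# Invented scores for illustration (natural-log probabilities). Assumes
# "Gomery" has already been added to the vocabulary.
HYPOTHESES = [
    # (transcription, acoustic log-prob, language-model log-prob)
    ("le rapport Gomery", -120.0, -18.0),
    ("le rapport gomme rit", -118.0, -25.0),
]


def combined_score(acoustic: float, language: float, lm_weight: float) -> float:
    """Log-linear combination: acoustic score plus weighted language-model score."""
    return acoustic + lm_weight * language


def best_hypothesis(hypotheses, lm_weight: float) -> str:
    """Pick the transcription with the highest combined score."""
    best = max(hypotheses, key=lambda h: combined_score(h[1], h[2], lm_weight))
    return best[0]


print(best_hypothesis(HYPOTHESES, lm_weight=0.0))   # le rapport gomme rit
print(best_hypothesis(HYPOTHESES, lm_weight=10.0))  # le rapport Gomery
```

With the weight at zero, the slightly better acoustic match wins and the caption reads gomme rit; once the language model counts, the far more plausible phrase wins.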

Fig. 2 – Julie Brousseau, Production Director, Closed Captioning and Speech Recognition Services (CRIM).

It is hard to imagine the level of concentration, coordination and quick thinking that the voicewriter’s job requires, not to mention good hearing and eyesight, good speaking ability and manual dexterity. Voicewriters have to perform many operations in succession or simultaneously: listen to the message, repeat it intelligibly, insert punctuation in the text scrolling in front of them, indicate an event, correct an error as the text goes by, find an equivalent for a foreign word or a word not in the dictionary, and monitor three lines of scrolling text on screen (some applications even require the voicewriter to toggle the captions from the bottom to the top of the screen). And they must do all of this live, at the very instant the oral communication occurs, without a safety net. It takes sharp intelligence and well-developed communication skills to manage it all. Voicewriters work in pairs and, understandably, need to take turns every 20 minutes, as conference interpreters do.

The actual closed-captioning session is preceded by what CRIM calls the pre-production stage, during which the voicewriters study information on the hot topics of the day most likely to come up in the House of Commons. They enter new terms in the vocabulary and update the system before going on air. In addition, every evening an algorithm automatically extracts terms from French-language websites and adds all the new ones to the basic vocabulary (the names of crew members who died in a plane crash, for example). After each session, the voicewriter does post-production work: listening again to the recording, comparing it with the transcript and making any necessary corrections (grammatical agreement, etc.). New words are added to the dictionary.
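
The nightly extraction step can be pictured as counting words in freshly collected French text that are missing from the vocabulary. CRIM’s actual algorithm is not described here; the Python sketch below is a deliberately simple stand-in (a frequency threshold on out-of-vocabulary tokens), and the sample sentences, surnames and the min_count parameter are hypothetical.

```python
import re
from collections import Counter

# Letters (including Latin-1 accented ones such as é, à, ç), apostrophes and hyphens.
TOKEN = re.compile(r"[A-Za-zÀ-ÖØ-öø-ÿ'-]+")


def extract_new_terms(texts: list[str], vocabulary: set[str], min_count: int = 2) -> list[str]:
    """Return tokens absent from the vocabulary that occur at least min_count times."""
    counts: Counter = Counter()
    for text in texts:
        for token in TOKEN.findall(text):
            if token.lower() not in vocabulary:
                counts[token] += 1
    return sorted(term for term, n in counts.items() if n >= min_count)


articles = [  # invented sample text
    "Le pilote Tremblay et le copilote Gagnon étaient à bord.",
    "Les familles de Tremblay et de Gagnon ont été informées.",
]
vocabulary = {"le", "pilote", "et", "copilote", "étaient", "à", "bord",
              "les", "familles", "de", "ont", "été", "informées"}
print(extract_new_terms(articles, vocabulary))  # ['Gagnon', 'Tremblay']
```

The frequency threshold filters out one-off typos; a voicewriter’s post-production review would still be needed to catch names that appear only once.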

A significant advantage of the STDirect system over stenotypy is information sharing. Stenotypists build their own databases, which only they can use, whereas the STDirect databases can be used by all voicewriters, provided only that the system can recognize their voices. A parallel can be drawn between the personal card files that translators once jealously kept for their exclusive use and the large public-access terminology banks now available to thousands of users.

A promising future

L’Express magazine predicted in 1984 that by the year 2000, close to 25% of the labour force would be working in new occupations, and that these occupations would be based on new technology. The magazine’s prediction was accurate. To the list of new occupations that have come into being in recent years—sea farmer, biogeneticist, cryologist, 3D animation designer, software developer, computer graphics designer and terminologist—we can now add the new occupation of voicewriter.

A voicewriter is an intermediary in the communication chain, just like an interpreter or a translator. After passing a rigorous French examination and a dexterity test (handling of the joystick), candidates have to spend about 40 hours familiarizing themselves with the special closed-captioning environment before actually doing the work. They must spend about ten hours making audio recordings in order to calibrate the acoustic models of the voice-recognition software program to their voices.

Thereafter, the learning process is ongoing and, as in any other profession, experience is acquired over time. “The occupation of voicewriter is not a simple casual job that you can do to pay your way through school. You have to make a long-term commitment,” says Karyn Chartrand, who has been working in this occupation since 2006. Voicewriting is a viable career, and it is reasonable to expect that more and more positions will be created. The upward trend should continue as broadcasting and production companies adopt the new technology. Closed captioning may also prove to be a useful supplementary language-learning tool for new immigrants.

At first, Senator Gauthier wanted closed captioning to be provided by stenotypists, and he made no secret of his scepticism about live closed captioning based on voice recognition. His attitude changed when he saw the quality of the product. Over two years, the accuracy rate of Question Period closed captioning has improved steadily; it now exceeds 94%, an outstanding achievement. The STDirect system has earned its designers several awards: the IWAY Award (2004), the OCTAS Award (2005), the Innovation Award (2005) and the CATA Alliance Innovation Award (2005).
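
How the 94% accuracy rate is measured is not stated. A common convention in speech recognition, assumed here, is word accuracy = 1 - WER, where the word error rate (WER) is the minimum number of word substitutions, deletions and insertions needed to turn the caption into the reference transcript, divided by the number of words in the reference. A Python sketch, with an invented example sentence (“the Gomery report will be tabled tomorrow”):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # dist[i][j] = edits needed to turn ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy as 1 - WER (it can drop below 0 if a caption has many insertions)."""
    return 1.0 - word_error_rate(reference, hypothesis)


reference = "le rapport Gomery sera déposé demain"
caption = "le rapport gomme rit sera déposé demain"
print(f"{word_accuracy(reference, caption):.1%}")  # 66.7%
```

One misrecognized name costs two errors here (a substitution and an insertion), which shows how demanding a rate above 94% is over a full Question Period.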

“All clients have their own specific closed-captioning requirements,” says Julie Brousseau. “CRIM’s partnership with the Translation Bureau and the House of Commons to provide closed captioning for Question Period has led to significant technological advances because of the considerable pressure to achieve a high level of performance. The context in which the closed captioning was used was highly conducive to the development of this innovative technology.” The next step is expansion: setting up a live closed-captioning service company, which should help raise awareness of this new technology and broaden its use. Commercializing such services is not within the mandate of CRIM, which is first and foremost a research institute. However, nothing prevents the Translation Bureau from one day having a permanent team of voicewriters of its own. After all, is the House of Commons not the only institution in Canada to broadcast its debates with live closed captioning in both official languages?

