Tuesday, May 13, 2008

Transcribing Voice and Inserting Captions with Screen Capture Movies

This post is a sequel to a recent post called futzing with captioning. Here I want to talk about the “more natural” process of recording the voice first, without a script, and then transcribing that. I have the expression more natural in quotes because that’s the way it is for me. I assume it would be that way for most everyone else who has intimate and often quite implicit knowledge of a subject matter, and so can talk about it with ease (though whether that talk is intelligible to others might be an issue). If you actually sound more natural by reading a script that you produced ahead of time, by all means do that. It likely will be faster overall and give a cleaner presentation. But as I said in the earlier post, I sound mechanical that way: there is a lack of spontaneity. Recording the audio without a script is much better and I believe will give a better experience for the student. But then making that video accessible is more of a chore.

Here are a few such movies at my blog The Economics Metaphor, those under the label “Cost.” The three movies run less than 9 minutes in total, but with the ancillary spreadsheet, Word doc, and self-test in the last of the movies, I believe this content would be sufficient to replace the hour lecture on Cost that many instructors cover in Principles or even in Intermediate Microeconomics. Others can judge that for themselves. Here I want to talk about the making of these things and what I’ve learned from doing them.

First, each movie recorded a region of the screen that was 640 x 480 in dimension. The bottom 640 x 90 area was deliberately set to have a plain background so the captions could overlay that area and not block out anything of interest. Do note that this is all done so the captions can be turned off if the user wants to do that. One can produce captions that are placed “underneath” the captured video as long as one is willing for those to display all the time. But then the entire captured region is shrunk so as to squeeze in the captions, and the shrinking of the video means it will be less wide as well as shorter. So with the captions it will render with some black space on each side of the video. My approach avoids that black space on the sides. Ultimately, the produced movies are rendered somewhat smaller, at a custom size of 596 x 437, so they can fit into the Blog without appearing to overflow the column. They are still large enough to render as if at original size.
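To see where the black space on the sides comes from, here is a rough arithmetic sketch. It assumes the caption strip placed underneath the video is 90 pixels tall (the same height as my overlay area) and that the video is scaled down with its aspect ratio preserved. The function name and the strip height are my own illustration, not anything Camtasia reports, and the tool's actual layout may differ.

```python
def fit_above_caption_strip(frame_w, frame_h, strip_h):
    """Shrink a frame_w x frame_h video so it fits above a caption strip
    of height strip_h inside the original frame, keeping the aspect ratio.

    Returns (video_width, video_height, black_bar_width_per_side).
    """
    video_h = frame_h - strip_h            # height left over for the video
    scale = video_h / frame_h              # same scale applies to the width
    video_w = round(frame_w * scale)
    side_bar = (frame_w - video_w) // 2    # unused width, split left and right
    return video_w, video_h, side_bar


# Assumed numbers: a 640 x 480 capture with a 90-pixel strip underneath.
print(fit_above_caption_strip(640, 480, 90))
```

Under those assumptions the video shrinks to 520 x 390, which leaves 60 pixels of black on each side. Overlaying the captions on a plain area inside the capture avoids that.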

For the Cost Table video, I did the screen capture first and then recorded the voice while watching the movie the second time through. That’s my trick for making sure things move along and I don’t go off on too many tangents. But it is more laborious that way and so for the other two I recorded the voice while I did the capture. Even with that I had to rehearse the captures a few times to make sure I knew what I was going to do at each step. Then I set out to produce the transcript.

A lesson learned is to produce the transcript first without worrying about the timings, and then use the built-in Camtasia tool to put the timings in afterwards (more about that in a bit). After screwing around with various alternatives, I opted for each line to have at most 45 characters (which includes the blank spaces between words). I don’t really know what is optimal in the tradeoff between having the text change too often, on the one hand, and having too much in any one view on the other. I tried for a reasonable compromise. And with that, I vowed to have two lines of text per caption. Three is possible but I thought that too much.

Once I had worked out the routine, this is how I produced the transcript. I’d play a little bit of the video and then hit pause. Then I’d type a line, or two if I was lucky. Then I’d have to figure out if there needed to be a line break or not. If not, I’d play a little bit more of the video to pick up a few more words and proceed again. Some of my sentences are longer than two lines of text and some are very short. So in going from one caption to the next there had to be a way to signal that. I started each new caption with a capital letter as if it were the start of a sentence, whether that was the case or not.
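I did all of this by hand while listening, but the layout rules themselves (45 characters per line, two lines per caption, a capital letter at the start of each caption) are mechanical enough to sketch in code. The following is a minimal sketch only. It assumes the spoken words have already been typed into one plain-text string, and the function name and sample sentence are made up for illustration. It does nothing about timings, and it breaks lines purely by length rather than by sense.

```python
import textwrap


def make_captions(transcript, width=45, lines_per_caption=2):
    """Split a transcript into captions of at most `lines_per_caption` lines,
    each line at most `width` characters (spaces included)."""
    lines = textwrap.wrap(transcript, width=width)
    captions = []
    for i in range(0, len(lines), lines_per_caption):
        block = lines[i:i + lines_per_caption]
        # Signal a new caption with a capital letter, whether or not
        # it is really the start of a sentence.
        block[0] = block[0][0].upper() + block[0][1:]
        captions.append("\n".join(block))
    return captions


# Hypothetical sample text, not taken from the actual videos.
sample = ("first we fill in the cost table one row at a time and then "
          "we look at how marginal cost changes as output goes up")

for caption in make_captions(sample):
    print(caption)
    print("---")
```

A person who knows the subject would still want to adjust where the breaks fall, since a break in the middle of a technical phrase reads badly.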

Creating the transcript is laborious, and the person doing it would find life much easier if they had some idea of the subject matter, so they can judge whether to make a literal transcription or make some small changes in the text to enhance readability. Look at the captions for the Algebra video (the middle one). This will give you some idea of the issues with captioning text when there is technical content. I’m guessing that between writing the transcript and putting the timings in, it was about a half hour, this for a movie that is under 3 minutes in duration. That is a good incentive for keeping the videos brief and to the point.
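For planning purposes, that guess can be turned into a back-of-the-envelope estimate. This is only a sketch built on my one rough data point (about 30 minutes of work for a movie just under 3 minutes, so roughly ten to one). The function name and the default ratio are assumptions, and someone new to the subject or the tool would likely be slower.

```python
def estimate_captioning_minutes(video_minutes, effort_ratio=30 / 3):
    """Estimate the minutes needed to transcribe and time a video,
    assuming effort scales linearly with the video's length."""
    return video_minutes * effort_ratio


# The three Cost movies run a bit under 9 minutes in total.
print(estimate_captioning_minutes(9))
```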

The other thing I’d like to comment on here is just how colloquial the voice is. I thought my writing was informal, but au contraire, at least compared to the speaking. I think that’s necessary when talking about a subject from a technical point of view, for otherwise it will seem impenetrable. Also, my pacing is distinct from the pacing when reading text. I don’t pause much if at all at the end of a sentence. It’s almost as if a period is an excuse to accelerate the speaking. I believe the rapid speech conveys some sense that I’m interested in what I’m talking about, which is likely important to convey, for otherwise the students will find this stuff very dry. But it also means that the student might not get it on a first pass. One of the benefits of the video is that it can be replayed as often as the student would like.

To sum up, if the only concern was providing good content for the students, this would definitely be the way to go with screen capture movies: recording the audio with no script and then producing a transcript. But this is quite laborious, no doubt about it. So generating a large volume of content of this sort requires the foreknowledge that the videos will have substantial use value to a good number of students. If that condition is satisfied, then it’s likely also worthwhile to hire students (who’ve already taken the course) to create the transcripts and put in the timings.
