Andrew Weaver, Media Preservation Librarian at the University of Washington (UW) Libraries, described recent efforts at his institution to caption legacy media materials. Before the 2020 pandemic, UW Libraries had been discussing the need for a media remediation project, but with staff and students working from home throughout the year, the opportunity arose to move forward with a new workflow.
Due to the number of items that needed captioning, the Libraries opted for a workflow that incorporated automation while including manual quality-control checks and steps to add elements like sound effects and background music to the captions. Ultimately, UW Libraries settled on the following workflow:
- Automated captioning: Captions were initially produced by Microsoft Stream (a Microsoft 365 application);
- First pass: Project participants corrected the captions where speech was rendered incorrectly;
- Second pass: Project participants added sound effects, music notes, and speaker information to the captions;
- Third pass: Project participants corrected errors in caption formatting and in timing against the video;
- First review: Students selected videos they themselves had not captioned and reviewed each at least twice, once with the sound off, checking for spelling and grammar errors, timing issues, and completeness of captions;
- Final review: Staff completed final review of captioned content.
After captions were created, they were recombined with corresponding videos via the open-source tool FFmpeg. Access copies were updated in CONTENTdm and the Internet Archive, and captions were paired with relevant access and preservation files stored in UW’s digital preservation repository.
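The article does not detail the exact FFmpeg invocation UW Libraries used, but the muxing step can be sketched as follows. This is a minimal, hypothetical example: the filenames and the one-cue caption file are stand-ins, and a short test clip is generated in place of a real access copy. The captions are added as a subtitle track without re-encoding the video.

```shell
# Hypothetical stand-in for an access copy: generate a 1-second test clip
ffmpeg -y -loglevel error -f lavfi -i testsrc=duration=1:size=128x72:rate=10 access_copy.mp4

# Minimal SRT caption file (one cue) standing in for the corrected captions
printf '1\n00:00:00,000 --> 00:00:01,000\n[music] Example caption text\n' > captions.srt

# Mux the captions into the MP4 access copy: copy the video stream as-is,
# convert the SRT cues to mov_text (the MP4-native subtitle format),
# and tag the subtitle track's language
ffmpeg -y -loglevel error -i access_copy.mp4 -i captions.srt \
  -c:v copy -c:s mov_text -metadata:s:s:0 language=eng \
  captioned_copy.mp4
```

For container formats other than MP4 (e.g., MKV), the subtitle codec would differ (`-c:s srt`), and for preservation storage the caption files can simply be kept alongside the video rather than muxed in.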
For an example of a newly captioned video, see “Report from Russia” on the Internet Archive.