The Timed Text Track JavaScript API
Introduction
In the HTML5 part 1 course, we saw that <video> and <audio> elements can have <track> elements. A <track> can have a label, a kind (subtitles, captions, chapters, metadata, etc.), a language (srclang attribute), a source URL (src attribute), etc.
Here is a small example of a video with three different tracks (the dots in each URL stand in for the real address, which is too long to fit the page width):
<video id="myVideo" preload="metadata" controls crossOrigin="anonymous">
  <source src="https://…../elephants-dream-medium.mp4" type="video/mp4">
  <source src="https://…../elephants-dream-medium.webm" type="video/webm">
  <track label="English subtitles" kind="subtitles" srclang="en"
         src="https://…../elephants-dream-subtitles-en.vtt">
  <track label="Deutsch subtitles" kind="subtitles" srclang="de"
         src="https://…../elephants-dream-subtitles-de.vtt" default>
  <track label="English chapters" kind="chapters" srclang="en"
         src="https://…../elephants-dream-chapters-en.vtt">
</video>
And here is how it renders in your current browser (please play the video and try to show/hide the subtitles/captions):
Unfortunately, support for multiple tracks differs significantly from one browser to another. For further details, you can read Ian Devlin's article “HTML5 Video Captions – Current Browser Status”, written in April 2015. Here is a quick summary:
- IE 11 and Safari provide a menu you can use to choose which subtitle/caption track to display. If one of the defined text tracks has the default attribute set, then it is loaded by default. Otherwise, the default is off.
- Chrome and Opera don’t provide a menu for the user to make a choice. Instead, they load the text track that matches the browser’s language. If none of the available text tracks matches the browser’s language, they load the track with the default attribute, if there is one. Otherwise, they load none. In short, support is very incomplete.
- Firefox provides no text track menu at all, but will show the first defined text track only if it has default set. It will load all tracks in memory as soon as the page is loaded.
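To check which of these behaviours your own browser follows, you can inspect the tracks from script. The sketch below is only an illustration: describeTextTracks is a hypothetical helper name, and it assumes it is given a media element (or any object) exposing the standard textTracks list, whose entries have label, kind, language and mode properties.

```javascript
// Summarise the text tracks attached to a media element.
// `describeTextTracks` is a hypothetical helper name (not part of any API).
// `video` is assumed to expose `textTracks`, as <video> and <audio> do.
function describeTextTracks(video) {
  // textTracks is an array-like list, so Array.from can walk it.
  return Array.from(video.textTracks, (track) => ({
    label: track.label,
    kind: track.kind,
    language: track.language, // value of the srclang attribute
    mode: track.mode,         // "disabled", "hidden" or "showing"
  }));
}

// Possible use in a page containing the example video:
// console.table(describeTextTracks(document.querySelector("#myVideo")));
```

A track whose mode is "showing" is the one the browser chose to display; comparing the output across browsers makes the differences listed above visible.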
So, how can we do better? Fortunately, there is a Timed Text Track API in the HTML5/HTML5.1 specification that enables us to manipulate <track> contents from JavaScript. Do you recall that text tracks are associated with WebVTT files? As a quick reminder, let’s look at a WebVTT file:
WEBVTT

1
00:00:15.000 --> 00:00:18.000 align:start
<v Proog>On the left we can see…</v>

2
00:00:18.167 --> 00:00:20.083 align:middle
<v Proog>On the right we can see the…</v>

3
00:00:20.083 --> 00:00:22.000
<v Proog>…the <c.highlight>head-snarlers</c></v>

4
00:00:22.000 --> 00:00:24.417 align:end
<v Proog>Everything is safe. Perfectly safe.</v>

The different time segments are called “cues”. Each cue has an id (1, 2, 3 and 4 in the above example), a startTime, an endTime, and a text content that can contain HTML-like tags for styling (<b>, etc.) or be associated with a “voice”, as in the above example. In that case, the text content is wrapped inside <v name_of_speaker>…</v> elements.
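To make the cue properties concrete, here is a minimal sketch of how they could be read from JavaScript. It assumes a track object whose cues list holds objects with the id, startTime, endTime and text properties described above. The helper names (cueVoice, cuePlainText, summariseCues) are hypothetical, and the regular expressions are a simplification for illustration, not a full WebVTT parser.

```javascript
// Extract the speaker from a cue's text,
// e.g. "<v Proog>Hello</v>" gives "Proog". Returns null if there is no voice.
function cueVoice(text) {
  const match = /<v(?:\.[^\s>]*)?\s+([^>]+)>/.exec(text);
  return match ? match[1].trim() : null;
}

// Remove markup tags (<v ...>, <c.highlight>, <b>, ...) from the cue text.
function cuePlainText(text) {
  return text.replace(/<[^>]*>/g, "");
}

// Build a plain summary of every cue of a track.
function summariseCues(track) {
  // track.cues is null while the track's mode is "disabled".
  const cues = track.cues ? Array.from(track.cues) : [];
  return cues.map((cue) => ({
    id: cue.id,
    startTime: cue.startTime, // in seconds
    endTime: cue.endTime,     // in seconds
    voice: cueVoice(cue.text),
    text: cuePlainText(cue.text),
  }));
}
```

In a browser, the track passed to summariseCues could be one entry of a video element's textTracks list, once its cues have been loaded.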
It’s now time to look at the JavaScript API for manipulating tracks, cues, and events associated with their life cycle. In the following lessons, we will look at different examples which use this API to implement missing features such as:
- how to build a menu for choosing the subtitle track language to display,
- how to display a synchronized description of a video (useful for people with disabilities, for example),
- how to display a clickable transcript beside the video (similar to what the edX video player does),
- how to show chapters,
- how to use JSON encoded cue contents (useful for showing external resources in the HTML document while a video is playing),
- etc.
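As a small preview of the first item, a language menu mostly comes down to changing the mode of each track. The sketch below is one possible approach rather than the definitive one: showSubtitlesFor is a hypothetical helper name, and it assumes a media element (or any object) exposing a textTracks list whose tracks have kind, language and a writable mode property.

```javascript
// Show the first subtitle/caption track whose language matches `lang`
// and switch the other subtitle/caption tracks off.
// Tracks of other kinds (chapters, metadata, ...) are left untouched.
// Returns the track that was switched on, or null if none matched.
// `showSubtitlesFor` is a hypothetical helper name.
function showSubtitlesFor(video, lang) {
  let shown = null;
  for (const track of Array.from(video.textTracks)) {
    if (track.kind !== "subtitles" && track.kind !== "captions") continue;
    if (shown === null && track.language === lang) {
      track.mode = "showing";
      shown = track;
    } else {
      track.mode = "disabled";
    }
  }
  return shown;
}

// Possible use from a menu's change handler in a page:
// showSubtitlesFor(document.querySelector("#myVideo"), "en");
```

Calling the function with a language that no track declares turns all subtitles off, which gives a menu its “off” entry for free.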