The Missing (2014) Subtitles
If subtitles for a title are offered in a language but do not display on your device, try another device. The Netflix app may not support subtitles for some languages including Arabic, Chinese, Hebrew, Hindi, Japanese, Korean, Thai, Romanian, or Vietnamese on devices manufactured before 2014, but most newer devices do support them.
The main contributions of this paper are as follows. First, we propose a language-agnostic approach for missing subtitle block detection using VAD and AC models. Our approach removes the dependency on language-reliant systems such as automatic speech recognition (ASR) and text translation models for this task. Second, we use a VAD model explicitly trained on the DEC corpus, which makes the proposed method more robust to the various background noises present in DEC titles. Third, we present a baseline solution using the neural VAD model. Fourth, despite its robustness, the VAD system can still misidentify certain sounds as human speech. The effect of such false positives is reduced by our multiclass AC model, which identifies 121 categories of sounds and is trained on DEC and open-source corpora. Finally, we show that our model results in (a) a 10% reduction in incorrectly predicted missing subtitle timings, (b) a 2.5% improvement in identifying the correct locations of missing subtitles on a real-world dataset, (c) a 77% reduction in the false positive rate (FPR) of overextending the predicted speech timings, and (d) a 119% improvement in the predicted speech block-level precision over a VAD baseline on a real-world human-annotated dataset of missing subtitle speech blocks.
The proposed approach to identify missing subtitle speech blocks involves two steps: (1) identification of speech and non-speech durations using a VAD model and (2) refinement of these durations by removing false positives using an AC model. In this section, we describe, first, the VAD and AC model architectures; second, the datasets used to train and validate them; third, a comparison with the corresponding state-of-the-art models, which justifies our architectural design choices; and fourth, the method for missing subtitle detection using these models.
To create the training set, we divide the videos into non-overlapping clips of 800 milliseconds (ms) and label each clip as speech or non-speech using the timing information in the subtitles. This results in 1.1 M speech and non-speech clips respectively. Similarly, the validation set consists of 0.1 M speech and non-speech clips respectively. The test set consists of 18k speech clips and 27k non-speech clips, all human-validated. It is curated from 33 movies which are not part of the training and validation sets (DEC-1100).
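The clip-labelling step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the text does not say how a clip that only partly overlaps a subtitle is labelled, so the `min_overlap_ms` threshold, the function name `label_clips`, and the millisecond tuple representation are all assumptions made for the example. Subtitle blocks are assumed not to overlap one another.

```python
from typing import List, Tuple

CLIP_MS = 800  # clip length stated in the text


def label_clips(
    duration_ms: int,
    subtitle_blocks: List[Tuple[int, int]],
    min_overlap_ms: int = 400,  # assumed threshold, not given in the source
) -> List[Tuple[int, int, str]]:
    """Split [0, duration_ms) into non-overlapping 800 ms clips and label each.

    A clip is labelled "speech" when its total overlap with the subtitle
    blocks reaches min_overlap_ms, otherwise "non-speech". A trailing
    remainder shorter than one clip is dropped.
    """
    clips = []
    for start in range(0, duration_ms - CLIP_MS + 1, CLIP_MS):
        end = start + CLIP_MS
        overlap = sum(
            max(0, min(end, b_end) - max(start, b_start))
            for b_start, b_end in subtitle_blocks
        )
        label = "speech" if overlap >= min_overlap_ms else "non-speech"
        clips.append((start, end, label))
    return clips
```

For example, a 2.4 s video with one subtitle block covering the first second produces one speech clip followed by two non-speech clips.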
We compare the models against PANNs, ResNeXt, and GRU-based [32, 33] models. Comparison results for these methods can be found in Table 3. We observe that the CNNTD-large model achieves the best AUC, average recall, and top-3 accuracy among all the models. Hence, we use the CNNTD-large model as our AC model in the missing subtitle detector. In the following subsection, we describe the approach to detect missing subtitle speech blocks using the GRU-based VAD and the VGGish CNNTD-large AC model.
Our proposed method consists of three stages, as depicted in Fig. 4. First, we obtain the timings of speech/non-speech segments, or blocks, from the VAD and AC models independently. Second, we merge the two sets of timings and remove the false positives of the VAD. Finally, we compare the predicted timings with the timings in the subtitle file and identify the positions of missing speech in the file. We now describe the timing generation process using the two models.
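The three stages can be sketched with plain interval arithmetic. This is an illustrative sketch under stated assumptions, not the method as implemented by the authors: it assumes each model's output has already been reduced to a sorted list of non-overlapping `(start_ms, end_ms)` speech intervals, it models the merge as an interval intersection (a VAD region survives only where the AC model also reports speech), and the `min_ms` filter and all function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start_ms, end_ms)


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersection of two sorted, non-overlapping interval lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start, end = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def subtract(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Parts of the intervals in a that are not covered by any interval in b."""
    out, j = [], 0
    for start, end in a:
        cur = start
        # Skip intervals of b that end before this interval of a begins.
        while j < len(b) and b[j][1] <= cur:
            j += 1
        k = j
        while k < len(b) and b[k][0] < end:
            if b[k][0] > cur:
                out.append((cur, b[k][0]))
            cur = max(cur, b[k][1])
            k += 1
        if cur < end:
            out.append((cur, end))
    return out


def detect_missing(vad, ac_speech, subtitles, min_ms=500):
    """Stage 2: keep VAD speech confirmed by the AC model.
    Stage 3: report confirmed speech that no subtitle block covers."""
    confirmed = intersect(vad, ac_speech)
    gaps = subtract(confirmed, subtitles)
    return [(s, e) for s, e in gaps if e - s >= min_ms]
```

With `vad = [(0, 1000), (2000, 3000), (5000, 7000)]`, `ac_speech = [(0, 1000), (5000, 7000)]`, and `subtitles = [(0, 1000)]`, the middle VAD block is discarded as a false positive and `(5000, 7000)` is reported as speech without a subtitle.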
Owing to the lack of publicly available datasets for this problem, we use two proprietary datasets in our evaluations. First, we create a synthetic dataset of missing subtitles from 50 proprietary videos sampled from Amazon Originals. These videos have synced subtitles in English. To create the dataset, we randomly remove 10% of the subtitle speech blocks and treat them as missing subtitle blocks. Second, we use a dataset of 430 incorrectly synced DEC video-subtitle pairs that contain missing subtitle blocks, obtained through our internal Language Quality Program. We used human validation to identify 354 missing speech blocks with a duration >500 ms.
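Constructing the synthetic set amounts to sampling blocks without replacement. The sketch below is one plausible way to do it; the function name `drop_blocks`, the fixed seed, and the rounding of the block count are assumptions made for the example, not details taken from the source.

```python
import random
from typing import List, Tuple

Block = Tuple[int, int]  # (start_ms, end_ms) of one subtitle block


def drop_blocks(
    blocks: List[Block], fraction: float = 0.10, seed: int = 0
) -> Tuple[List[Block], List[Block]]:
    """Randomly remove a fraction of subtitle blocks.

    Returns (kept, missing): kept is the degraded subtitle file and
    missing is the ground truth of removed blocks, both in original order.
    """
    rng = random.Random(seed)  # seeded so the split is reproducible
    n_drop = round(len(blocks) * fraction)
    dropped = set(rng.sample(range(len(blocks)), n_drop))
    kept = [b for i, b in enumerate(blocks) if i not in dropped]
    missing = [b for i, b in enumerate(blocks) if i in dropped]
    return kept, missing
```

The removed blocks serve as the reference against which predicted missing speech blocks are scored.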
We calculate the coverage metric in two parts. First, we compute the intersection between the predicted speech blocks (hypothesis) and the missing speech blocks in the subtitle file (reference). We term predicted speech blocks with an intersection t>800 ms with the reference missing speech blocks as correctly predicted missing speech blocks (Fig. 6a,b). On the other hand, incorrectly predicted speech blocks have an intersection t>800 ms with non-speech blocks and no intersection with the missing speech blocks in the subtitle file (Fig. 6e). Second, for every correctly predicted missing speech block (hypothesis), we compute its intersection with neighboring non-speech blocks in the subtitle file (reference). The first value indicates how effectively the method predicts the time duration of the missing subtitle blocks. The second value highlights the bleeding of the predicted missing speech duration into non-speech regions.
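The two coverage values can be illustrated for a single predicted block. This is a sketch under assumptions: the text does not say whether the 800 ms threshold applies to the overlap with one reference block or to the total overlap, so the total is used here, and the names `overlap_ms` and `classify` and the "other" fallback label are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start_ms, end_ms)


def overlap_ms(a: Interval, b: Interval) -> int:
    """Length in ms of the intersection of two intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def classify(
    pred: Interval,
    missing_blocks: List[Interval],
    non_speech_blocks: List[Interval],
    min_ms: int = 800,  # threshold stated in the text
) -> Tuple[str, int, int]:
    """Return (label, hit_ms, bleed_ms) for one predicted speech block.

    hit_ms is the overlap with missing speech blocks (first coverage value);
    bleed_ms is the overlap with non-speech blocks (second coverage value).
    """
    hit = sum(overlap_ms(pred, m) for m in missing_blocks)
    bleed = sum(overlap_ms(pred, n) for n in non_speech_blocks)
    if hit > min_ms:
        return "correct", hit, bleed
    if hit == 0 and bleed > min_ms:
        return "incorrect", hit, bleed
    return "other", hit, bleed  # cases the two definitions do not cover
```

For a missing block at `(5000, 7000)` flanked by non-speech blocks, a prediction of `(4500, 7000)` is correct with 2000 ms of coverage and 500 ms of bleed into the preceding non-speech region.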
The figure depicts subtitle speech blocks in peach (overlaid on the audio track) in the middle, predicted speech blocks in blue at the top, and subtitle text at the bottom. It highlights several output cases of our algorithm: (a) the coverage of predicted speech blocks (blue) with the subtitle speech blocks (peach); (b) the subtitle non-speech block, the missing subtitle speech block (in light gray), and a predicted speech block that correctly predicts the missing subtitle location but also overlaps with the non-speech segment; (c) our algorithm is unable to predict the missing speech block; (d) our algorithm makes a prediction with coverage
These metrics quantify the efficacy of the method in detecting missing speech blocks. First, we compute the FPR, which quantifies the percentage of correctly predicted missing speech blocks that over-extend into non-speech blocks of the subtitle file. The FPR is computed in two steps: first, we count the correctly predicted missing speech blocks that also intersect with the neighboring non-speech subtitle blocks, and second, we take the ratio of this count to the total number of non-speech subtitle blocks. Next, we compute the precision as the ratio of the number of correctly predicted missing speech blocks to the total number of predicted speech blocks. Finally, we compute the recall as the ratio of the number of correctly predicted speech blocks to the total number of missing subtitle blocks.
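Written out from block counts, the three ratios look like this. The sketch takes the counts as given; the function and parameter names are illustrative, and returning 0.0 for an empty denominator is a convention chosen for the example, not something the source specifies.

```python
from typing import Dict


def block_metrics(
    n_correct: int,           # correctly predicted missing speech blocks
    n_correct_bleeding: int,  # of those, blocks that also intersect non-speech blocks
    n_predicted: int,         # all predicted speech blocks
    n_missing: int,           # all missing subtitle blocks (reference)
    n_non_speech: int,        # all non-speech subtitle blocks (reference)
) -> Dict[str, float]:
    """Block-level FPR, precision, and recall as defined in the text."""
    fpr = n_correct_bleeding / n_non_speech if n_non_speech else 0.0
    precision = n_correct / n_predicted if n_predicted else 0.0
    recall = n_correct / n_missing if n_missing else 0.0
    return {"fpr": fpr, "precision": precision, "recall": recall}
```

For instance, 8 correct predictions out of 10, with 2 of them bleeding into non-speech, against 16 missing blocks and 40 non-speech blocks, gives an FPR of 0.05, a precision of 0.8, and a recall of 0.5.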
We proposed two automated language-agnostic methods for missing subtitle detection. We showed that a VAD model can be used to detect audio segments with missing subtitle blocks. Further, combining the VAD model with an AC model improves detection by reducing the false positives of the VAD. We presented a performance comparison on two DEC missing subtitle block datasets and showed that our proposed method performs well on this task. Our proposed method is language-agnostic and achieves a true coverage of 75% on a human-annotated dataset and a configurable block-level precision of up to 0.85. The proposed approach can also be applied to other VAD methods developed for applications other than missing subtitle detection. Since our method reduces the false positives of the VAD model, it can be extended to other use cases, such as speech identification or subtitle drift detection.
The Missing is a British anthology drama television series written by brothers Harry and Jack Williams. It was first broadcast in the UK on BBC One on 28 October 2014, and in the United States on Starz on 15 November 2014. The Missing is an international co-production between the BBC and Starz. The first eight-part series, about the search for a missing boy in France, was directed by Tom Shankland. It stars Tchéky Karyo as Julien Baptiste, the French detective who leads the case, with James Nesbitt and Frances O'Connor as the boy's parents.
The second eight-part series, about a missing girl in Germany, was directed by Ben Chanan. It was broadcast in the UK, on BBC One, from 12 October 2016 and in the United States, on Starz, on 12 February 2017. Tchéky Karyo returns as Julien Baptiste, with David Morrissey and Keeley Hawes as the girl's parents.
Tony Hughes, his wife Emily and their five-year-old son Oliver are travelling from the United Kingdom to northern France for a holiday. It is the summer of 2006, during the FIFA World Cup. Soon after entering France, their car breaks down. They are forced to spend the night in the fictional small town of Châlons du Bois. That evening, Tony and Oliver visit a crowded outdoor bar, where a quarter-finals football match is being watched. Tony loses sight of his son, who goes missing. Businessman Ian Garrett offers a reward for information leading to Oliver's capture, but it later emerges that, on discovering that Garrett is a paedophile, Tony beat Garrett to death and concealed the evidence.
The story is paralleled by flashbacks to 2014 and is set near a British army garrison in Eckhausen, Germany. In 2014 police tell Sam and Gemma Webster, whose daughter Alice went missing in 2003, that Alice has reappeared and claims she had been held captive with Sophie Giroux, a French girl who disappeared about the same time. Retired French detective Julien Baptiste, who was in charge of the Giroux investigation, cannot resist becoming involved again and travels to Germany and Iraq to find answers.
Hi Tom. Once it happened to me that some captions had disappeared while others were still visible. The reason was merely that we had inadvertently changed the motion attributes (position) in the effects tab of the source panel, so the missing subtitles were now below the image's bottom line. Have you tried simply resetting the captions' position?