Prerequisites

Downloading and Installing Audacity

Audacity is free, open-source software available at https://www.audacityteam.org. It is required to open the projects that contain the enhanced audio fragments along with their transcript annotations. The annotations provide a means to select and play the corresponding fragment in the file. Another benefit is that the original timeline is preserved relative to the uncut source recording. The timeline is visible above the tracks. This allows anyone to locate a section in the original uncut file and perform their own audio analysis, either to obtain a better enhancement or to verify the words and fragments spoken.

Here is the download link for the Windows® version: https://www.audacityteam.org/download/windows. Pick the first file in the list and run it once it has downloaded. It is a standard software setup in which you can click the ‘Next’ button until it turns into a ‘Finish’ button to complete the installation. Your intervention is required only if you want to install it somewhere other than the default location.

Playback Instructions

After downloading an Audacity project, place the .zip archive in a new folder and extract it. After extraction you will find an .aup file and a folder named ‘<projectname>_data’. This folder contains the audio data for the project and belongs to it, so keep it next to the .aup file. You can open the Audacity project by double-clicking the .aup project file, or you can open it from within Audacity.
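
For readers who prefer to script the unpacking step, the following is a minimal Python sketch of it. It assumes the archive holds one .aup file with its ‘<projectname>_data’ folder beside it. The function name extract_project and the file names in the usage note are hypothetical and chosen for illustration only.

```python
import zipfile
from pathlib import Path


def extract_project(archive_path, target_dir):
    """Extract a project archive and return (aup_file, data_dir) as paths."""
    target = Path(target_dir)
    # Create the new folder the archive is extracted into.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)

    # Look for the project file anywhere below the target folder.
    aup_files = sorted(target.rglob("*.aup"))
    if not aup_files:
        raise FileNotFoundError("no .aup project file found in the archive")
    aup_file = aup_files[0]

    # The data folder is expected next to the .aup file (assumption).
    data_dir = aup_file.with_name(aup_file.stem + "_data")
    if not data_dir.is_dir():
        raise FileNotFoundError(f"missing data folder: {data_dir}")
    return aup_file, data_dir
```

Calling extract_project("myproject.zip", "myproject") would return the path of the .aup file, which can then be opened in Audacity, together with the path of its data folder.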

  • To scroll horizontally through the track:
    Hover over the track, then hold down the SHIFT key and use the mouse scroll wheel to bring the part of the track you wish to examine into view.

  • To zoom in or out:
    Hover over the track, then hold down the CTRL key and use the mouse scroll wheel to zoom in or out.

  • To play a fragment:
    In the label track, left-click the purple label of the fragment you wish to listen to. Then right-click in the audio track. The corresponding section of the audio track is now selected. Press the spacebar to play the selected audio.

Listening Tips for the Low-volume Fragments

You may have to replay a fragment multiple times to ‘tune’ your ears to it. Perception improves once you have a sense of the sounds heard. You will have to mentally filter out the noise and ‘lift’ the voices from it. Some fragments are clearer than others. It is also recommended to play the fragments over a decent stereo set rather than a headset. Your hearing is better able to distill the speech when you listen at a normal distance from the speakers.

Analysis Methods Used 

To obtain the actual words spoken in cases where they are not clearly audible, a number of analysis methods have been used. Applied together, and each repeated a number of times, they allow one to arrive at the actual words spoken, even words that are close to, or within, the noise floor. Analysis is performed on a word-by-word basis to make use of the brain’s short-term sonic memory, part of the brain’s hearing center. The methods used:

  • Syllable counting.
    This method counts the syllables in a word. It is used together with the other methods, namely vowel extraction and dictionary matching.

  • Vowel extraction.
    Vowels consist of lower, periodic frequencies. This makes them the easiest part of a word to hear. The number of vowels one can extract usually equals the syllable count, though not always.

  • Dictionary matching.
    Once the syllable count and the vowels are obtained, one can check them against a language dictionary to obtain a subset of words that match the syllables, the vowels and their positions. This dictionary is generally your own vocabulary, so having a rich vocabulary is useful. Knowledge of the cases and matters under discussion contributes directly to this. Once this set of words has been obtained, the next method is employed.

  • Pattern matching.
    Pattern matching entails listening to whether what is heard matches a given word. This exercise uses your brain’s capability to compare sounds. A word may sound like a hit merely because most of it (its syllables combined with the noise present) makes it a close match, so test all the words from your dictionary match. The actual word will suddenly sound much clearer than the other, closely matching words. This technique is the opposite of ‘noise cancelling’ in that it boosts the right word, lifting it from the noise floor. It is this process that helps recover the non-vowels from the noise floor, such as ‘s’, ‘t’, ‘f’ and ‘v’. One may still end up with several matching words that seem to lift from the noise. The next analysis method can resolve that issue.

  • Grammar.
    Together with dictionary matching and pattern matching, sentence structure, or grammar, should hint at what a word is supposed to be. This linguistic aspect further reduces the set of possible words.

  • Context flow.
    When multiple words and/or sentence parts have been distilled at various points across the timeline, context flow helps to narrow down the possible words further. People conversing usually ‘flow’ through subjects: someone makes a remark and others respond to it. This could also be called conversation flow. However, context flow focuses on the subjects talked about, as opposed to generic conversation flow.

  • Interpolation.
    Interpolation is what the brain performs when repeatedly listening to a fragment. Your brain starts to filter out the noise and unwanted sounds (you will still hear the noise, but it becomes easier to focus on the word with each pass). This happens automatically as long as one focuses on the fragment using any of the methods described above. So listen to fragments repeatedly, repeating any of the methods as often as necessary to reach a final result.
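
The first three methods are carried out by ear, but their filtering logic can be illustrated with a short Python sketch. It is only an approximation under stated assumptions: it works on spelling rather than sound, treats each run of vowel letters as one syllable (which miscounts words with silent letters), and uses a made-up six-word vocabulary. The names vowel_groups and candidates are hypothetical.

```python
import re

# Assumption: a run of vowel letters in the spelling stands in for one
# heard vowel (and thus one syllable). Real speech does not always agree.
VOWEL_RUN = re.compile(r"[aeiouy]+")

# Made-up sample vocabulary, for illustration only.
VOCABULARY = ["window", "winter", "wander", "water", "yellow", "meadow"]


def vowel_groups(word):
    """Return the runs of vowel letters in a word, in order."""
    return VOWEL_RUN.findall(word.lower())


def candidates(vocabulary, syllable_count, heard_vowels):
    """Keep the words that fit the syllable count and the heard vowels.

    heard_vowels lists one entry per syllable position; None means the
    vowel at that position could not be identified.
    """
    matches = []
    for word in vocabulary:
        groups = vowel_groups(word)
        # Syllable counting: the number of vowel runs must match.
        if len(groups) != syllable_count:
            continue
        # Vowel extraction: each identified vowel must occur at its position.
        if all(heard is None or heard in group
               for group, heard in zip(groups, heard_vowels)):
            matches.append(word)
    return matches
```

With the sample vocabulary, candidates(VOCABULARY, 2, ["i", "o"]) leaves only ‘window’, whereas candidates(VOCABULARY, 2, ["i", None]) also keeps ‘winter’. Pattern matching, grammar and context flow would then be needed to decide between them.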