
RX 11 Manual

Text Navigation

Overview

Text Navigation converts speech into a transcription that is displayed above the spectrogram and shown in sync with the corresponding audio. Transcribed text is searchable and provides reference points for what is contained in the file. This eliminates the need to audition files in order to manually place markers.

Text Navigation Note

Text Navigation is designed to be an editing navigation tool and not a transcription service. It is optimized for American English. Accuracy may vary if there is noticeable background noise or a speaker has a non-American accent.

Workflow

To get started, drag or import audio into RX and click the Speech Recognition Word Lane button at the bottom left of the spectrogram.

Speech Recognition Button

Audio transcription begins immediately and runs in the background at approximately 7 to 9 times real-time speed. As it runs, the transcription populates the tabs in the Word Lane above the spectrogram, indicating that transcription is in progress.
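
As a rough illustration of what 7 to 9 times real time means in practice, the following Python sketch estimates how long a transcription might take for a given file length. The function name and the idea of a fixed speed range are assumptions made for illustration; actual speed depends on your system and the audio material.

```python
def estimate_transcription_seconds(audio_seconds, min_speed=7.0, max_speed=9.0):
    """Return (fastest, slowest) estimated transcription time in seconds.

    min_speed and max_speed are multiples of real time, taken from the
    approximate 7 to 9 times figure quoted above.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must not be negative")
    return audio_seconds / max_speed, audio_seconds / min_speed


# Example: a one-hour recording.
fastest, slowest = estimate_transcription_seconds(60 * 60)
print(f"{fastest / 60:.1f} to {slowest / 60:.1f} minutes")  # 6.7 to 8.6 minutes
```

By this estimate, a one-hour file would take somewhere between roughly 7 and 9 minutes to transcribe.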

Word Lane

The audio must meet the following requirements:

  • Audio must be dialogue or speech – the transcription of musical lyrics is not currently supported.

  • Audio files must be at least 10 seconds long. The speech recognition button is disabled for files shorter than 10 seconds.

  • Only American English is supported at this time.
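
If you are preparing many files, the 10-second minimum can be checked before import. The sketch below is not part of RX; it uses Python's standard wave module, which reads uncompressed PCM WAV files only, and the function names are hypothetical.

```python
import wave

MIN_SECONDS = 10.0  # minimum file length stated in the requirements above


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def meets_minimum_length(path, minimum_seconds=MIN_SECONDS):
    """Return True if the file is at least minimum_seconds long."""
    return wav_duration_seconds(path) >= minimum_seconds
```

Calling meets_minimum_length("interview.wav") on a hypothetical file returns False if the file is shorter than 10 seconds. The check covers length only; it cannot tell whether the audio is speech or which language is spoken.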

Clicking the iZotope logo opens a small window that shows the progress of the transcription. At the bottom of this window are buttons for pausing or canceling the transcription.

Transcription Progress

Once the transcription is finished, you can zoom in or out of the audio and the transcription expands or contracts accordingly.

  • Mac: Command+= to zoom in, Command+- to zoom out, or swipe up or down with two fingers on a trackpad

  • Windows: Ctrl+= to zoom in, Ctrl+- to zoom out

Click on a word tab to select the corresponding audio in the spectrogram. Drag the handles on either side of a tab to select surrounding words, a phrase, or a sentence.

When you are zoomed in close enough that there is a single word per tab, double-clicking a word selects it and makes it editable. Type a correction for a misspelled word, or alter individual words to fit your editing needs.

  • Typing multiple words into a tab does not split the tab.

  • Leaving a tab empty does not delete the tab.

  • Right-click the Word Lane to rescan. Rescanning overwrites any corrections you have made.

Multiple Speaker Detection

Text Navigation automatically detects when a track contains more than one speaker and color-codes the sections of speech associated with each speaker.

Multiple speaker detection runs after the text transcription pass has finished. Up to 8 speakers can be detected.

Each speaker receives a unique identifying color, which is indicated in the speaker pane and in the corresponding tabs of the transcription.

Speaker Pane

To select all instances of a speaker, click on the speaker name in the speaker pane. This makes it easy to target specific processing by speaker.

Double-click a speaker’s name to rename it to fit the needs of a project.

Export Transcript

To access the Transcript Export menu, click the menu button at the top of the Text Navigation Pane or right-click in the Word Lane.

Transcript Export Menu

  • Copy transcript to clipboard: Copy the transcribed text and paste it into a word processing application.

  • Export transcript to file: Export the transcribed text as a .txt file.

  • Rescan speech to text: Transcribe your file again.
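
Once a transcript has been exported, it can also be searched outside RX. The following Python sketch assumes only that the exported .txt file is plain text; its exact layout (for example, whether speaker names are included) is not described here, so the sample text and function name are hypothetical.

```python
def find_in_transcript(transcript_text, query):
    """Return (line_number, line) pairs whose text contains the query, ignoring case."""
    needle = query.casefold()
    matches = []
    for line_number, line in enumerate(transcript_text.splitlines(), start=1):
        if needle in line.casefold():
            matches.append((line_number, line.strip()))
    return matches


# Hypothetical transcript text, standing in for the contents of an exported file.
sample = (
    "Welcome back to the show.\n"
    "Today we talk about noise reduction.\n"
    "Thanks for listening."
)
print(find_in_transcript(sample, "noise"))
# [(2, 'Today we talk about noise reduction.')]
```

To search a real export, read the file first, for example with pathlib.Path("transcript.txt").read_text(), where the file name is whatever you chose when exporting.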