Help & Shortcuts

🚀 Quick Start

From a file: Click Select File, choose any audio or video file, then click Transcribe it.

From a recording: Click Record to capture from your microphone. When done, click Stop to preview, or Process to transcribe immediately.

Supported formats: MP3, MP4, M4A, WAV, WebM, AAC, OGG, FLAC, OPUS, AMR, AIFF, MOV, AVI, MKV.

🌐 Language Selection

Click the Language button to open the language picker:

Auto-detect — the engine figures out the language automatically (default).
EN — force English for faster, more accurate results on English audio.
SK — force Slovak.
Other… — type any ISO 639-1 language code (e.g. de, fr, ja) and click Apply.

The button label updates to show the active language so you always know what's set.
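ISO 639-1 codes are two lowercase letters. As a rough illustration of what a valid entry in the Other… field looks like, here is a minimal sketch; `normalizeLanguageCode` is a hypothetical helper, not part of the app, and it checks only the shape of the code, not whether the transcription engine supports that language.

```typescript
// Hypothetical helper (not the app's own code): normalises text typed into
// the "Other…" field. It verifies only the two-letter shape of an
// ISO 639-1 code, not whether the code is assigned or supported.
function normalizeLanguageCode(input: string): string | null {
  const code = input.trim().toLowerCase();
  return /^[a-z]{2}$/.test(code) ? code : null;
}
```

Under this sketch, " DE " would be accepted as "de", while a three-letter code such as "deu" (an ISO 639-2 code) would be rejected.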

🎛️ Transcription Options

Tag Audio Events: Detects and labels non-speech sounds inline — e.g. [laughter], [applause], [music].

Diarize (Speaker ID): Identifies who is speaking and labels each segment (Speaker 1, Speaker 2, …). Useful for interviews or meetings.

Output Format:

TXT — plain text transcript.
SRT — subtitle file with timestamps, ready for video editors.
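For reference, an SRT file is a sequence of numbered cues, each with a time range written as `HH:MM:SS,mmm --> HH:MM:SS,mmm` followed by the cue text. The sketch below shows that layout only. The `Segment` shape and both function names are assumptions made for illustration, and this is not the app's actual exporter.

```typescript
// Hypothetical input shape: start and end are in seconds.
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Left-pads a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Formats seconds as an SRT timestamp: HH:MM:SS,mmm (comma before millis).
function toSrtTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Joins segments into SRT cues: index, time range, text, blank line between.
function toSrt(segments: Segment[]): string {
  return segments
    .map(
      (seg, i) =>
        `${i + 1}\n${toSrtTimestamp(seg.start)} --> ${toSrtTimestamp(seg.end)}\n${seg.text}\n`
    )
    .join("\n");
}
```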

Raw: Returns the full JSON response from the transcription engine, including word-level timestamps. Gemini post-processing is disabled in this mode.

🤖 Gemini it

The purple Gemini it button (and the Gemini button during recording) transcribes your audio and immediately sends the result through your Gemini AI prompt for cleanup, formatting, or summarisation — all in one step.

The Gemini system prompt and per-language post-processing prompts can be customised in Settings → Post-Processing Prompts.

You need a Gemini API key configured in Settings for this to work.

🎙️ Recording Controls

While recording is active, three buttons replace the normal controls:

Stop — stops recording and lets you preview the audio before processing.
Process — stops recording and immediately sends it for transcription.
Gemini — stops recording, then immediately transcribes it and runs Gemini cleanup.

Below those, Pause / Resume lets you pause mid-recording, and Cancel discards the recording entirely.

✨ AI Post-Processing

After a transcription completes, click AI Post-Processing. The app then:

Copies the transcript together with the configured prompt to your clipboard.
Opens your selected AI app (Gemini or ChatGPT) in a new tab.

Paste into the AI app to get a summary, cleaned-up text, or any other analysis.

Configure which AI app opens and customise the prompts (per language: EN / SK) in Settings → Post-Processing Prompts.

⚙️ Settings

API Keys: Add one or more ElevenLabs API keys. The app automatically picks the key with the most remaining quota for each job. You can view per-key quota usage here.
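One way to picture the key selection is the sketch below. It assumes each key exposes a quota limit and a usage count, and that "most remaining quota" means the largest `limit - used`. The `KeyQuota` shape and the function name are hypothetical, and the app's real selection logic may differ.

```typescript
// Hypothetical record for one configured key; field names are assumptions.
interface KeyQuota {
  label: string;
  limit: number; // total quota for the current period
  used: number;  // quota already consumed
}

// Returns the key with the largest remaining quota, or null when no key
// has any quota left (or the list is empty).
function pickKeyWithMostQuota(keys: KeyQuota[]): KeyQuota | null {
  let best: KeyQuota | null = null;
  for (const key of keys) {
    const remaining = key.limit - key.used;
    if (remaining <= 0) continue; // skip exhausted keys
    if (best === null || remaining > best.limit - best.used) {
      best = key;
    }
  }
  return best;
}
```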

Gemini API Key: Required for the Gemini it feature and Gemini post-processing.

Auth Codes: Create named access codes that other users can use to log in via ?auth_code=… without knowing the main password. Each code shows its last-used time (displayed in your local timezone).

Post-Processing Prompts: Customise the prompt sent with the transcript for EN, SK, and Gemini AI modes.

Debug Logging: Prints detailed server-side logs — useful for troubleshooting failed transcriptions.

🤖 Automation & URL Tricks

Auto-record on load: Append ?rec_now to the URL — the app starts recording the moment the page opens.

Auto-login: Append ?auth_code=YOUR_CODE to log in automatically using an auth code.

Combine both: /?auth_code=YOUR_CODE&rec_now — logs in and starts recording instantly. Perfect for a one-tap recording shortcut on mobile.
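If you generate these links programmatically (for a bookmark or a home-screen shortcut, say), a small helper along these lines could assemble them. This is only a sketch: `buildShortcutUrl` is a hypothetical function, and the origin you pass in is a placeholder for wherever your instance is hosted. It writes `rec_now` as a bare flag with no `=value`, matching the examples above.

```typescript
// Hypothetical helper: builds a shortcut URL from the two documented
// query parameters, auth_code and rec_now.
function buildShortcutUrl(
  origin: string,
  authCode?: string,
  recNow: boolean = false
): string {
  const params: string[] = [];
  if (authCode !== undefined) {
    // Encode the code so special characters survive in the query string.
    params.push(`auth_code=${encodeURIComponent(authCode)}`);
  }
  if (recNow) {
    params.push("rec_now"); // bare flag, as in the documented examples
  }
  const base = origin.replace(/\/+$/, ""); // drop trailing slashes
  return params.length > 0 ? `${base}/?${params.join("&")}` : `${base}/`;
}
```

For example, calling it with `"https://example.com"`, `"YOUR_CODE"`, and `true` yields `https://example.com/?auth_code=YOUR_CODE&rec_now`.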

💡 Tips & Tricks

Best accuracy: Clear audio with minimal background noise gives the best results. Use a headset or get close to the microphone when recording.

Forcing the language: If you know the language in advance, selecting it instead of Auto-detect is faster and often more accurate.

Dark / Light mode: Toggle with the moon/sun icon in the navigation bar. Your preference is saved across sessions.

Mobile home screen: Add this page to your Home Screen (iOS: Share → Add to Home Screen) for a full-screen, app-like experience with no browser chrome.