Transcription Time


Disclaimer: This post may contain affiliate links. Please read my disclosure for more info.

Transcription is a popular work-from-home job. In fact, plenty of companies are hiring, even complete beginners.

And the pay sounds great for such easy work.

The video conferencing service Zoom has announced Otter Live Notes, which means that calls can now be transcribed in real time. Participants can open a live transcript directly from Zoom and follow along.

One of our most popular value-added benefits is time stamping, sometimes referred to as time coding: the insertion of time markers throughout a transcription. Time stamps can be inserted at regular intervals (for example, every 60 seconds) or each time a new speaker begins talking.

Transcription may take a while depending on your internet speed, up to about the length of the audio file. Be sure to keep the Transcribe pane open while the transcription is happening, but feel free to do other work or switch browser tabs or applications and come back later.
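The two time-stamping options described above (a stamp at a regular interval, or a stamp whenever a new speaker starts) can be sketched in a few lines of Python. This is only an illustration: the function names, the `(start_seconds, speaker, text)` segment layout, and the `[HH:MM:SS]` format are assumptions made for the example, not any particular service's format.

```python
def format_timestamp(seconds):
    """Render a position in the audio as [HH:MM:SS]."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"[{hours:02d}:{minutes:02d}:{secs:02d}]"


def stamp_transcript(segments, interval=60.0, on_speaker_change=False):
    """Add time stamps to (start_seconds, speaker, text) segments.

    interval: stamp the first segment, then any segment that starts at least
        this many seconds after the last stamp. Pass None to turn this off.
    on_speaker_change: also stamp every segment where the speaker changes.
    """
    lines = []
    last_stamp = None
    last_speaker = None
    for start, speaker, text in segments:
        due = interval is not None and (
            last_stamp is None or start - last_stamp >= interval
        )
        changed = on_speaker_change and speaker != last_speaker
        if due or changed:
            lines.append(f"{format_timestamp(start)} {speaker}: {text}")
            last_stamp = start
        else:
            lines.append(f"{speaker}: {text}")
        last_speaker = speaker
    return lines


# Hypothetical sample data, made up for the example.
sample = [
    (0, "Anna", "Thanks for joining."),
    (20, "Ben", "Happy to be here."),
    (65, "Ben", "Let's get started."),
    (70, "Anna", "Sounds good."),
]

for line in stamp_transcript(sample, interval=60):
    print(line)
```

Calling `stamp_transcript(sample, interval=None, on_speaker_change=True)` instead stamps only the lines where a different person starts talking.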

Or does it?

Do you wonder if you can really make money transcribing? And is it even worth it?

Let’s find out.

How Much Does Transcription Pay?

The short answer: A beginner transcriptionist can realistically make anywhere from $3.00 an hour to $11.00 an hour.

An experienced transcriptionist can make considerably more – even over $20.00 an hour.

It’s possible to make money from home as a transcriptionist, but no matter how much experience you have, what you earn depends on these two factors:

#1 How much the pay is per audio minute


#2 How long it takes to transcribe that minute.

Let me explain.

Every transcription company I have come across pays by the audio minute (or rounded up by the audio hour) – not the working hour.

What they pay per audio minute is anywhere from a few pennies to a few dollars.

And it can take several minutes to transcribe one audio minute.

So what you earn isn’t what you think you’ll earn.

Here’s what I mean:

Every blog that spews information about entry-level online transcription jobs tells us we can earn at least $12.00 an hour. Some even tell us $24.00 an hour.

So let’s say an audio file is 30 minutes long. That’s 30 minutes of someone talking.

And the pay is $0.40 an audio minute. That’s potentially $12.00 for this 30-minute file (30 minutes times $0.40 = $12.00).

So you might think to yourself, “If I do one file an hour, that’s $12.00 an hour. And…if I do two 30-minute files in an hour, that’s $24.00 an hour!”

WRONG. Here’s why:

It will take you a lot longer than 30 minutes to transcribe that file.


We now have to take a look at what an audio minute is and how it relates to a working minute.

What Is An Audio Minute?

An audio minute is one minute of someone talking. Rounded up to an audio hour, it’s one hour of someone talking.

Transcription companies pay you for each transcribed audio minute or audio hour.

And, to be brutally honest, it can take several minutes to work a one-minute audio file.

Which brings us to this…

What Is A Working Minute?

A working minute is one minute of actually listening to an audio file and transcribing it. A working hour is an hour of doing that same work.

Several factors determine the difference between an audio minute and a working minute. Some are:

  • sound quality of the file
  • accent and dialect of the speaker
  • not understanding words
  • having to play back the file
  • proofreading your transcription

All these things have an effect on the length of time it takes you to transcribe an audio file.

Which brings us to this…how long it takes to transcribe a file.

How Long Does It Take To Transcribe 1 Hour Of Audio?

For an experienced transcriber, it can take up to 4 hours to transcribe one hour of audio.

For a beginner, it can take up to 6 hours to transcribe that same file.

We’ve established that transcribing an hour of audio is much different than listening to an hour of audio, which then changes how much you think you will earn.

This is how I figured it out:

I estimated the time it takes to transcribe one audio hour based on my personal experience, with my typing speed of about 60 words per minute. Adding in a few minutes for proofreading per audio file, I figure the following:

How long it takes to transcribe an hour of audio: EXPERIENCED TRANSCRIBERS

  • It can take 1 hour to complete 15 audio minutes
  • 2 hours for 30 audio minutes
  • 3 hours for 45 audio minutes
  • 4 hours for 1 audio hour

How long it takes to transcribe an hour of audio: BRAND NEW, INEXPERIENCED TRANSCRIBERS

  • It can take 1.5 hours to complete 15 audio minutes
  • 3 hours for 30 audio minutes
  • 4.5 hours for 45 audio minutes
  • 6 hours for 1 audio hour
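Both lists above follow the same rule: working time is audio time multiplied by a fixed ratio (4 to 1 for experienced transcribers, 6 to 1 for beginners, by my estimate). Here is a small Python sketch that reproduces them. The function and constant names are made up for the example, and the ratios are my personal estimates, not industry standards.

```python
# Working hours needed per audio hour, taken from the estimates in this post.
EXPERIENCED_RATIO = 4
BEGINNER_RATIO = 6


def working_hours(audio_minutes, ratio):
    """Hours of work needed to transcribe a file of the given audio length."""
    return audio_minutes * ratio / 60


for audio_minutes in (15, 30, 45, 60):
    experienced = working_hours(audio_minutes, EXPERIENCED_RATIO)
    beginner = working_hours(audio_minutes, BEGINNER_RATIO)
    print(
        f"{audio_minutes} audio minutes: "
        f"{experienced:g} h experienced, {beginner:g} h beginner"
    )
```

If your own ratio is different (say you time yourself and find you need 5 working hours per audio hour), pass that number in instead.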


How Much Does An Experienced Transcriptionist Make?

An experienced transcriptionist can expect to make $7.50 an hour to over $20.00 an hour, depending on whether you work for an online company that produces a lot of transcriptions or freelance, charging the rates you think you’re worth.

Let’s take our figures from above.

If a company pays $0.50 per audio minute, that’s $30.00 for an audio hour ($0.50 x 60 minutes in an hour = $30.00).

But…if it takes 4 hours to complete an audio hour, that’s $7.50 per hour pay rate ($30 for the audio hour / 4 hours to complete).
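The same arithmetic as a short Python sketch, in case you want to plug in your own numbers. The function name and parameter names are made up for the example.

```python
def effective_hourly_rate(pay_per_audio_minute, work_hours_per_audio_hour):
    """Dollars earned per hour of actual work.

    pay_per_audio_minute: what the company pays for one minute of audio.
    work_hours_per_audio_hour: hours of work needed for one hour of audio.
    """
    pay_per_audio_hour = pay_per_audio_minute * 60
    return pay_per_audio_hour / work_hours_per_audio_hour


# What you might assume: one hour of work per audio hour.
print(f"${effective_hourly_rate(0.50, 1):.2f} per working hour")  # $30.00 per working hour

# The example above: 4 hours of work per audio hour.
print(f"${effective_hourly_rate(0.50, 4):.2f} per working hour")  # $7.50 per working hour
```

The gap between the two printed numbers is the whole point: the pay per audio minute only tells you what you earn once you divide by how long the work really takes.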

Now, I know that pay is low. Really low, especially for an experienced transcriptionist.

So what can you do?

– Work for a company that pays much more than $0.50 an audio minute


– Do independent transcription work for businesses and bloggers

THEN you can make at least $20.00 an hour. In fact, here’s my post that shows you how to become a transcriptionist and start making big bucks.
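To see what "much more" means, you can run the calculation backwards: start from the hourly rate you want and work out the pay per audio minute you need. A rough Python sketch, assuming my estimate of 4 working hours per audio hour for an experienced transcriber (the function name is made up for the example):

```python
def required_pay_per_audio_minute(target_hourly_rate, work_hours_per_audio_hour):
    """Pay per audio minute needed to reach a target pay per working hour."""
    pay_per_audio_hour = target_hourly_rate * work_hours_per_audio_hour
    return pay_per_audio_hour / 60


# $20.00 per working hour at 4 working hours per audio hour.
rate = required_pay_per_audio_minute(20.00, 4)
print(f"${rate:.2f} per audio minute")  # $1.33 per audio minute
```

So under that assumption, $20.00 an hour takes roughly $1.33 per audio minute, well above the $0.50 in the example.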

How Much Does An Entry Level Transcriptionist Make?

An entry-level transcriptionist can expect to earn from around $3.00 up to $11.00 an hour, with an average pay of $5.00 an hour. This is because a beginner takes longer to transcribe an audio file than someone with more experience, and companies that hire beginners tend to pay low rates.

Again, let’s look at our figures from above, using the same scenario: a company paying $0.50 per audio minute, which equals $30.00 for an audio hour ($0.50 x 60 minutes in an hour = $30.00).

If it takes 6 hours to complete an audio hour, that is $5.00 per hour pay rate ($30 for the audio hour / 6 hours to complete).

OK, this pay is pretty bad, even for a beginner transcriptionist.


But you have to start somewhere.

So here’s what you can do:

First, get a job with GMR Transcription or GoTranscript (or one of these other beginner transcription companies)

Then, learn to become a better transcriptionist.

Is Doing Online Transcription Worth It?

Yes, doing online transcription IS worth it because it CAN pay well.

Especially if you’re good at it.

And to be honest, this is the kind of job that can be done as a side gig or a full-time career – it depends on how much time you want to devote to it.

It is worth it if you like it because the entry-level jobs give you the experience you need to be a pro.

And being a pro brings a lot more money.

Which brings us to this question:

How can you make good money doing transcription?

Here’s How To Make Good Money As A Transcriptionist

There are many opportunities to do transcription work. Entry-level transcriptionists earn anywhere from $3.00 an hour to $9.00 an hour, experienced transcriptionists average $20.00 an hour, and professionals earn up to $40.00 an hour.

For example, take a look at these pro transcriptionists:

Bloggers in particular hire experienced transcribers, the ones who make $40.00 an hour. As an example, this SEO blog has a lot of its podcasts transcribed:

You can easily become a highly paid transcriptionist by first starting out with one of these companies and then acquiring the proper training to become a pro, as I outline in this post.

Final Words

I hope you found this post interesting, and that it answered your questions on transcription salaries and whether or not it’s something you want to do.

Please leave a comment below if you found this helpful, or if you decided to pursue transcription as a career.

Either way, I’d love to hear from you.


Conversation Transcription is a speech-to-text solution that combines speech recognition, speaker identification, and sentence attribution to each speaker (also known as diarization) to provide real-time and/or asynchronous transcription of any conversation. Conversation Transcription distinguishes speakers in a conversation to determine who said what and when, and makes it easy for developers to add speech-to-text to their applications that perform multi-speaker diarization.

Key features

  • Timestamps - each speaker utterance has a timestamp, so that you can easily find when a phrase was said.
  • Readable transcripts - transcripts have formatting and punctuation added automatically to ensure the text closely matches what was being said.
  • User profiles - user profiles are generated by collecting user voice samples and sending them to signature generation.
  • Speaker identification - speakers are identified using user profiles and a speaker identifier is assigned to each.
  • Multi-speaker diarization - determine who said what by synthesizing the audio stream with each speaker identifier.
  • Real-time transcription - provide live transcripts of who is saying what and when while the conversation is happening.
  • Asynchronous transcription - provide transcripts with higher accuracy by using a multichannel audio stream.


Although Conversation Transcription does not put a limit on the number of speakers in the room, it is optimized for 2-10 speakers per session.

Get started

See the real-time conversation transcription quickstart to get started.

Use cases

Inclusive meetings

To make meetings inclusive for everyone, such as participants who are deaf or hard of hearing, it is important to have transcription in real time. Conversation Transcription in real-time mode takes meeting audio and determines who is saying what, allowing all meeting participants to follow the transcript and participate in the meeting without a delay.

Improved efficiency

Meeting participants can focus on the meeting and leave note-taking to Conversation Transcription. Participants can actively engage in the meeting and quickly follow up on next steps, using the transcript instead of taking notes and potentially missing something during the meeting.

How it works

This is a high-level overview of how Conversation Transcription works.

Expected inputs

  • Multi-channel audio stream – For specification and design details, see Microsoft Speech Device SDK Microphone. To learn more or purchase a development kit, see Get Microsoft Speech Device SDK.
  • User voice samples – Conversation Transcription needs user profiles in advance of the conversation. You will need to collect audio recordings from each user, then send the recordings to the Signature Generation Service to validate the audio and generate user profiles.


User voice samples are optional. Without this input, the transcription will still distinguish between speakers, but they will be labeled 'Speaker1', 'Speaker2', etc. instead of being recognized by their pre-enrolled names.

Real-time vs. asynchronous

Conversation Transcription offers three transcription modes:


Real-time

Audio data is processed live to return speaker identifier and transcript. Select this mode if your transcription solution requirement is to provide conversation participants a live transcript view of their ongoing conversation. For example, building an application to make meetings more accessible to deaf and hard-of-hearing participants is an ideal use case for real-time transcription.


Asynchronous

Audio data is batch processed to return speaker identifier and transcript. Select this mode if your transcription solution requirement is to provide higher accuracy without a live transcript view. For example, if you want to build an application to allow meeting participants to easily catch up on missed meetings, then use the asynchronous transcription mode to get high-accuracy transcription results.

Real-time plus asynchronous

Audio data is processed live to return speaker identifier + transcript, and, in addition, a request is created to also get a high-accuracy transcript through asynchronous processing. Select this mode if your application has a need for real-time transcription but also requires a higher accuracy transcript for use after the conversation or meeting occurred.
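The choice between the three modes comes down to two questions: whether participants need a live transcript view, and whether a higher-accuracy transcript is needed afterwards. The following Python sketch encodes that decision only. `TranscriptionMode` and `choose_mode` are hypothetical names invented for this illustration and are not part of the Speech SDK; how a mode is actually requested from the service is not shown here.

```python
from enum import Enum


class TranscriptionMode(Enum):
    """Hypothetical labels for the three modes described above."""

    REAL_TIME = "real-time"
    ASYNCHRONOUS = "asynchronous"
    REAL_TIME_PLUS_ASYNCHRONOUS = "real-time plus asynchronous"


def choose_mode(needs_live_view, needs_high_accuracy):
    """Pick a mode from the two requirements discussed in this section."""
    if needs_live_view and needs_high_accuracy:
        return TranscriptionMode.REAL_TIME_PLUS_ASYNCHRONOUS
    if needs_live_view:
        return TranscriptionMode.REAL_TIME
    # Assumption: with no live view required, batch processing is the default,
    # since it is the mode described as giving higher accuracy.
    return TranscriptionMode.ASYNCHRONOUS


# Accessibility scenario: participants follow the transcript during the meeting.
print(choose_mode(needs_live_view=True, needs_high_accuracy=False).value)

# Catch-up scenario: accuracy matters, no live view is needed.
print(choose_mode(needs_live_view=False, needs_high_accuracy=True).value)
```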

Language support


Currently, Conversation Transcription supports all speech-to-text languages in the following regions: centralus, eastasia, eastus, westeurope. If you require additional locale support, contact the Conversation Transcription Feature Crew.
