---
title: Recall.ai Meeting Bot — Transcript, Captions & Recording
slug: recall-ai-meeting-transcription
category: Integrations
summary: Send an AI notetaker bot into a Zoom/Meet/Teams call, then pull back the transcript, WebVTT captions, and the video recording — wired to a record in your app.
tags: [recall, transcription, webhooks, video, captions, meetings, hmac, integrations]
status: stable
visibility: public
source_project: crm.llamapress.ai
layers: [model, sql, controller, view]
---

# Recall.ai Meeting Bot — Transcript, Captions & Recording

> ⚠️ **Cookbook example — not live code.** Every code block below is an **example
> snippet**, **not part of the llamapress.ai codebase**, and **not running on this
> server**. This is a reference recipe for a **Leo instance (an AI coding agent) to
> implement in its own app** — read it to understand the pattern, then recreate it
> there. (An optional copy-ready install package lives in
> `app/views/cookbook/recall-ai-meeting-transcription/` — see its `INSTALL.md`.)

[Recall.ai](https://recall.ai) is a single API that sends a recording **bot** into a
Zoom / Google Meet / Microsoft Teams call. The bot joins like a participant, records
the meeting, and produces a **transcript** and a **video recording** you can pull back
into your app. This pattern wires that whole lifecycle to a record (here a
`Conversation`): paste a meeting URL → a bot joins → when the call ends you get the
transcript, downloadable WebVTT **captions**, and an mp4 you can stream or store.

> **When to use:** any app that needs meeting notes/recordings — CRM call logging,
> sales-call analysis, interview capture, support-call review.
> **When not to:** you only need a transcript of an *uploaded* file (use a plain
> speech-to-text API). If you can't expose a public HTTPS webhook endpoint, the manual
> "pull" path below still works, but you lose the automatic save-on-finish.

---

## The 80/20 in one breath

1. **One service object** (`RecallAi`) wraps the REST API: `join_meeting`,
   `get_bot`, `get_transcript`, `get_recording_url`, plus webhook verification and
   transcript→VTT formatting. Everything else just calls into it.
2. **On record create**, if a `meeting_url` is present, call `join_meeting` and stash
   the returned bot id (`bot_id`) on the record. That's what dispatches the bot.
3. **Two ways to get results back:** (a) a **webhook** (`bot.status_change` → `"done"`)
   that auto-saves the transcript, or (b) a manual **"Pull transcription"** button that
   calls `get_transcript` / `get_recording_url` on demand. Build both — the webhook for
   hands-off capture, the button for retries and backfills.
4. **Store the artifacts:** transcript text + raw JSON on the record, captions as a
   WebVTT Active Storage attachment, and the recording either as a hot `video_url` or a
   downloaded mp4 attachment.
5. **Recording/transcript URLs are presigned and expire** — treat any saved
   `video_url` as perishable and add a "refresh" action that re-fetches it from the bot.

Two env vars: `RECALL_API_KEY` (API auth) and `RECALL_WEBHOOK_SECRET` (HMAC secret).
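
A minimal sketch of a boot-time check for those two variables, assuming you would rather have a missing value fail loudly at startup than at the first API call. The `recall_env` helper and the `REQUIRED_RECALL_VARS` constant are hypothetical names, not part of the recipe.

```ruby
# Hypothetical helper: collect the two Recall settings and fail loudly if either is missing.
REQUIRED_RECALL_VARS = %w[RECALL_API_KEY RECALL_WEBHOOK_SECRET].freeze

def recall_env(env = ENV)
  # Treat unset and blank values the same way.
  missing = REQUIRED_RECALL_VARS.select { |name| env[name].to_s.strip.empty? }
  raise KeyError, "Missing Recall env vars: #{missing.join(', ')}" unless missing.empty?

  { api_key: env["RECALL_API_KEY"], webhook_secret: env["RECALL_WEBHOOK_SECRET"] }
end
```

The returned hash uses the same keyword names as the service object's constructor, so it could be splatted in as `RecallAi.new(**recall_env)`.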

---

## Layer 1 — Model & SQL

Add `bot_id` + `meeting_url` (and a hot `video_url`) to whatever record represents the
meeting, and attach the artifacts via Active Storage.

```ruby
# db/migrate/XXXXXXXXXXXXXX_add_recall_fields_to_conversations.rb
class AddRecallFieldsToConversations < ActiveRecord::Migration[7.2]
  def change
    add_column :conversations, :bot_id,      :string  # Recall bot UUID — the join key for every API call
    add_column :conversations, :meeting_url, :string  # the Zoom/Meet/Teams URL the user pastes
    add_column :conversations, :video_url,   :string  # presigned recording URL (EXPIRES — see gotchas)
    # Assumes conversations already has :text columns `transcription` and `raw_transcript`
    # (the human-readable transcript); add them here if your table does not have them.
  end
end
```

```ruby
# app/models/conversation.rb
class Conversation < ApplicationRecord
  belongs_to :contact            # whatever the meeting is "about" — adapt to your schema
  belongs_to :project, optional: true

  has_one_attached :video_file        # the downloaded mp4 (durable copy of the recording)
  has_one_attached :vtt_file          # WebVTT captions for the <video> player
  has_one_attached :transcript_json   # raw Recall transcript JSON, kept for re-generation

  # A conversation is valid if it either already has a transcript, OR is "pending"
  # (a bot was dispatched / a meeting URL was given and results will arrive later).
  validates :transcription, presence: true,
            unless: -> { bot_id.present? || meeting_url.present? }
end
```

**Why three attachments + columns?** Each serves a different consumer: `transcription`
(text) is for humans and search, `transcript_json` is the lossless source for
re-generating captions later, `vtt_file` is what the `<video>` element loads, and
`video_file` is a durable copy so you don't depend on Recall's expiring URL.

---

## Layer 2 — The service object (the whole integration lives here)

This is the only file that knows Recall's API shape. Keep controllers thin and let
them call these methods.

```ruby
# app/services/recall_ai.rb
require 'net/http'
require 'json'
require 'uri'
require 'openssl'

class RecallAi
  # Region-specific base URL. Use the region your Recall workspace is in.
  # us-west-2 (default), us-east-1, eu-central-1, etc.
  RECALL_API_URL = "https://us-west-2.recall.ai/api/v1"

  def initialize(api_key: ENV["RECALL_API_KEY"], webhook_secret: ENV["RECALL_WEBHOOK_SECRET"])
    @api_key = api_key
    @webhook_secret = webhook_secret
  end

  # === Join Meeting === dispatches a bot into a live call; returns JSON incl. the bot "id".
  def join_meeting(meeting_url, bot_name: "Recall Bot", transcribe: true,
                   transcription_mode: "prioritize_low_latency")
    uri = URI("#{RECALL_API_URL}/bot")
    req = Net::HTTP::Post.new(uri, headers)

    payload = { meeting_url: meeting_url, bot_name: bot_name }

    # NEW API shape: recording_config.transcript.provider.recallai_streaming.
    # language_code is REQUIRED for low-latency mode.
    if transcribe
      payload[:recording_config] = {
        transcript: {
          provider: {
            recallai_streaming: {
              mode: transcription_mode,   # "prioritize_low_latency" | "prioritize_accuracy"
              language_code: "en"
            }
          }
        }
      }
    end

    req.body = payload.to_json
    res = Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) { |http| http.request(req) }
    raise "Recall.ai Connection Failed: #{res.code} #{res.body}" unless res.is_a?(Net::HTTPSuccess)

    data = JSON.parse(res.body)
    raise "Recall.ai Bot Error: #{data['status_changes']}" if data["status"] == "fatal"
    data
  end

  # === Get Bot === the canonical read; everything else digs into this payload.
  def get_bot(bot_id)
    uri = URI("#{RECALL_API_URL}/bot/#{bot_id}")
    req = Net::HTTP::Get.new(uri, headers)
    res = Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) { |http| http.request(req) }
    raise "Recall.ai Fetch Failed: #{res.code} #{res.body}" unless res.is_a?(Net::HTTPSuccess)
    JSON.parse(res.body)
  end

  # === Get Transcript === downloads the transcript JSON from the bot's presigned URL.
  def get_transcript(bot_id)
    bot = get_bot(bot_id)
    transcript_url = bot.dig("recordings", 0, "media_shortcuts", "transcript", "data", "download_url")
    raise "No transcript URL found for bot #{bot_id}" unless transcript_url

    uri = URI(transcript_url)
    res = Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) { |http| http.request(Net::HTTP::Get.new(uri)) }
    raise "Recall.ai Transcript Download Failed: #{res.code} #{res.body}" unless res.is_a?(Net::HTTPSuccess)
    JSON.parse(res.body)
  end

  # === Get Recording URL === presigned mp4 URL (EXPIRES — re-fetch when stale).
  def get_recording_url(bot_id)
    get_bot(bot_id).dig("recordings", 0, "media_shortcuts", "video_mixed", "data", "download_url")
  end

  # === Status helpers ===
  def current_status(bot_id)
    get_bot(bot_id)["status_changes"]&.last&.dig("code")
  end

  def recording_ready?(bot_id)
    bot = get_bot(bot_id)
    status    = bot["status_changes"]&.last&.dig("code")
    video_url = bot.dig("recordings", 0, "media_shortcuts", "video_mixed", "data", "download_url")
    status == "done" && video_url.present?
  end

  # === Verify Incoming Webhook === HMAC-SHA256 over "timestamp.body". Call from controller.
  def verify_webhook!(payload_body, headers)
    timestamp = headers['x-recall-signature-timestamp'] || headers['webhook-timestamp']
    signature = headers['x-recall-signature']           || headers['webhook-signature']
    raise "Missing Webhook Headers" unless timestamp && signature

    signed_content = "#{timestamp}.#{payload_body}"
    expected = OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('sha256'), @webhook_secret, signed_content)

    match = signature.split(' ').any? do |sig_part|
      version, hash = sig_part.split(',', 2)
      # to_s guards against a malformed part with no hash ("v1"), which would otherwise raise.
      version == 'v1' && Rack::Utils.secure_compare(hash.to_s, expected)
    end
    raise "Invalid Webhook Signature" unless match
    true
  end

  # === Generate WebVTT (captions) === turns the transcript JSON into a caption track.
  # Breaks cues on: speaker change, silence gap > 0.5s, terminal punctuation, or ~75 chars.
  def generate_vtt(transcript_items)
    return "WEBVTT\n\n" if transcript_items.nil? || transcript_items.empty?
    vtt = "WEBVTT\n\n"

    transcript_items.each do |item|
      speaker = item.dig("participant", "name") || "Unknown"
      words = item["words"] || []
      next if words.empty?

      current, cue_start = [], nil
      words.each_with_index do |w, i|
        start_rel = w.dig("start_timestamp", "relative")
        end_rel   = w.dig("end_timestamp", "relative")
        cue_start ||= start_rel
        current << w["text"]

        nxt = words[i + 1]
        last = nxt.nil?
        pause = nxt && (nxt.dig("start_timestamp", "relative") - end_rel > 0.5)
        punct = w["text"].match?(/[.!?]$/)
        long  = current.join(" ").length > 75

        if last || pause || punct || long
          vtt += "#{format_vtt_time(cue_start)} --> #{format_vtt_time(end_rel)}\n"
          vtt += "<v #{speaker}>#{current.join(' ')}\n\n"
          current, cue_start = [], nil
        end
      end
    end
    vtt
  end

  def format_vtt_time(seconds)
    ms = (seconds * 1000).to_i
    format("%02d:%02d:%02d.%03d", ms / 3_600_000, (ms / 60_000) % 60, (ms / 1000) % 60, ms % 1000)
  end

  # === Format transcript into readable "[MM:SS] Speaker: text" ===
  def format_transcript(transcript_items)
    return "[Empty transcript]" if transcript_items.nil? || transcript_items.empty?
    transcript_items.map do |item|
      if item["participant"]
        speaker = item.dig("participant", "name") || "Unknown"
        words   = item["words"] || []
        text    = words.map { |w| w["text"] }.join(" ")
        ts      = words.first&.dig("start_timestamp", "relative")
      else
        speaker = item["speaker"] || item["participant_name"] || "Unknown"
        text    = item["text"] || item["words"]&.map { |w| w["text"] }&.join(" ") || ""
        ts      = item["start_time"] || item["timestamp"]
      end
      ts.present? ? "[#{format_time(ts)}] #{speaker}: #{text}" : "#{speaker}: #{text}"
    end.join("\n\n")
  end

  def format_time(seconds)
    return seconds.to_s unless seconds.is_a?(Numeric)
    s = seconds.to_i
    s >= 3600 ? format("%02d:%02d:%02d", s / 3600, (s % 3600) / 60, s % 60) : format("%02d:%02d", s / 60, s % 60)
  end

  private

  def headers
    # NOTE: Recall uses "Token <key>", NOT "Bearer <key>".
    { "Authorization" => "Token #{@api_key}", "Content-Type" => "application/json", "Accept" => "application/json" }
  end
end
```
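
For orientation, here is the item shape that `generate_vtt` and `format_transcript` read. It is inferred from the keys those methods dig into, not from Recall's published schema, and the sample values are invented. `cue_time` is a standalone copy of the `format_vtt_time` arithmetic so the snippet runs on its own.

```ruby
# Invented sample data; only the keys are taken from the service object above.
sample_items = [
  {
    "participant" => { "name" => "Ada" },
    "words" => [
      { "text" => "Hello",     "start_timestamp" => { "relative" => 1.0 }, "end_timestamp" => { "relative" => 1.4 } },
      { "text" => "everyone.", "start_timestamp" => { "relative" => 1.5 }, "end_timestamp" => { "relative" => 2.25 } }
    ]
  }
]

# Same arithmetic as format_vtt_time: seconds (Float) -> "HH:MM:SS.mmm".
def cue_time(seconds)
  ms = (seconds * 1000).to_i
  format("%02d:%02d:%02d.%03d", ms / 3_600_000, (ms / 60_000) % 60, (ms / 1000) % 60, ms % 1000)
end

item  = sample_items.first
words = item["words"]

# One cue spanning the first word's start to the last word's end.
cue = "#{cue_time(words.first.dig('start_timestamp', 'relative'))} --> " \
      "#{cue_time(words.last.dig('end_timestamp', 'relative'))}\n" \
      "<v #{item.dig('participant', 'name')}>#{words.map { |w| w['text'] }.join(' ')}"
puts cue
# 00:00:01.000 --> 00:00:02.250
# <v Ada>Hello everyone.
```

The two words stay in one cue because the gap between them is 0.1s (under the 0.5s pause threshold) and the first word has no terminal punctuation.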

---

## Layer 3 — Controller (dispatch the bot + pull/attach results)

```ruby
# config/routes.rb
resources :conversations do
  member do
    post :pull_transcription   # on-demand: fetch transcript + captions + recording URL
    post :refresh_recording    # re-fetch the expiring presigned video URL
    post :attach_recording     # download the mp4 into Active Storage (durable copy)
    get  :video                # dedicated captioned player page
  end
end

# Recall.ai webhook (public — no CSRF, verified by HMAC instead)
post "/webhooks/recall", to: "webhooks#recall"
```

```ruby
# app/controllers/conversations_controller.rb
require 'open-uri'
require 'stringio'

class ConversationsController < ApplicationController
  before_action :set_conversation,
                only: %i[show pull_transcription refresh_recording attach_recording video update destroy]

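  # Dispatch the bot the moment a conversation with a meeting URL is created.
  # Validate first so a bot is not sent into the call for a record that will fail to save.
  def create
    @conversation = Conversation.new(conversation_params)

    if @conversation.meeting_url.present? && @conversation.valid?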
      begin
        res = RecallAi.new.join_meeting(@conversation.meeting_url, bot_name: "🦙 Notetaker")
        @conversation.bot_id = res["id"]   # stash the bot id — it's the key to everything later
      rescue => e
        @conversation.errors.add(:meeting_url, "could not trigger Recall bot: #{e.message}")
        return render :new, status: :unprocessable_entity
      end
    end

    if @conversation.save
      redirect_to @conversation, notice: "Conversation created — the bot is joining the meeting."
    else
      render :new, status: :unprocessable_entity
    end
  end

  # On-demand pull: transcript text + raw JSON + WebVTT captions + recording URL.
  def pull_transcription
    return redirect_to(@conversation, alert: "No Bot ID for this conversation.") unless @conversation.bot_id.present?

    recall = RecallAi.new
    transcript_data = recall.get_transcript(@conversation.bot_id)
    items = transcript_data.is_a?(Array) ? transcript_data : (transcript_data["results"] || [])

    @conversation.update!(
      transcription:  recall.format_transcript(items),
      raw_transcript: recall.format_transcript(items),
      video_url:      (recall.get_recording_url(@conversation.bot_id) rescue nil)
    )

    @conversation.vtt_file.attach(
      io: StringIO.new(recall.generate_vtt(items)),
      filename: "captions_#{@conversation.id}.vtt", content_type: "text/vtt"
    )
    @conversation.transcript_json.attach(
      io: StringIO.new(transcript_data.to_json),
      filename: "transcript_#{@conversation.id}.json", content_type: "application/json"
    )

    redirect_to @conversation, notice: "Transcript, captions, and recording pulled from Recall."
  rescue => e
    redirect_to @conversation, alert: "Failed to pull transcription: #{e.message}"
  end

  # Re-fetch the expiring presigned recording URL.
  def refresh_recording
    if fetch_fresh_video_url
      redirect_to @conversation, notice: "Recording URL refreshed."
    else
      redirect_to @conversation, alert: "Recording is not ready yet."
    end
  rescue => e
    redirect_to @conversation, alert: "Failed to refresh recording: #{e.message}"
  end

  # Download the mp4 into Active Storage. Presigned URLs 403 when expired — refresh & retry once.
  def attach_recording
    return redirect_to(@conversation, alert: "No recording URL to attach.") unless @conversation.video_url.present?
    perform_attachment(@conversation.video_url)
    redirect_to @conversation, notice: "Recording downloaded and attached."
  rescue OpenURI::HTTPError => e
    if e.io.status[0] == "403" && (new_url = fetch_fresh_video_url)
      perform_attachment(new_url)
      redirect_to @conversation, notice: "Link was expired (403) — refreshed and attached."
    else
      redirect_to @conversation, alert: "Failed to attach recording: #{e.message}"
    end
  rescue => e
    redirect_to @conversation, alert: "Failed to attach recording: #{e.message}"
  end

  def video; end   # renders the captioned player (Layer 4 view)

  private

  def set_conversation
    @conversation = Conversation.find(params[:id])
  end

  def conversation_params
    params.require(:conversation).permit(:transcription, :raw_transcript, :meeting_url,
                                         :video_url, :video_file, :contact_id, :project_id, tag_ids: [])
  end

  def fetch_fresh_video_url
    return nil unless @conversation.bot_id.present?
    url = RecallAi.new.get_recording_url(@conversation.bot_id)
    url && @conversation.update!(video_url: url) && url
  end

  def perform_attachment(url)
    @conversation.video_file.attach(
      io: URI.open(url), filename: "recording_#{@conversation.id}.mp4", content_type: "video/mp4"
    )
  end
end
```
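
The 403-then-refresh handling in `attach_recording` can also be isolated into a small helper, which makes the "retry exactly once" rule easier to test. This is a sketch under assumptions: `open_with_one_refresh`, `refresher`, and `opener` are hypothetical names, and the default opener simply wraps `URI.open` as the controller does.

```ruby
require "open-uri"

# Hypothetical helper: try the stored URL; on a 403, ask for a fresh URL and retry once.
# `refresher` is any callable returning a new presigned URL (or nil if none is available).
# `opener` is injectable so the retry logic can be exercised without network access.
def open_with_one_refresh(url, refresher:, opener: ->(u) { URI.open(u) })
  opener.call(url)
rescue OpenURI::HTTPError => e
  raise unless e.io.status[0] == "403"   # only an expired link is worth a refresh

  fresh = refresher.call
  raise unless fresh                      # nothing new to try: re-raise the original 403

  opener.call(fresh)                      # a second failure propagates to the caller
end
```

In the controller, the refresher would be something like `-> { fetch_fresh_video_url }`, and the returned IO would be passed to `video_file.attach`.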

```ruby
# app/controllers/webhooks_controller.rb
class WebhooksController < ApplicationController
  skip_before_action :verify_authenticity_token   # external caller — verified by HMAC, not CSRF

  def recall
    payload_body = request.body.read   # read the body EXACTLY ONCE (it's a stream)

    begin
      RecallAi.new.verify_webhook!(payload_body, request.headers)
    rescue => e
      Rails.logger.error("Recall webhook verification failed: #{e.message}")
      return render json: { error: "Unverified" }, status: :unauthorized
    end

    event = JSON.parse(payload_body)
    handle_status_change(event["data"]) if event["event"] == "bot.status_change"
    head :ok
  end

  private

  def handle_status_change(data)
    bot_id = data["bot_id"] || data.dig("bot", "id")
    code   = data.dig("status", "code") || data["status_code"]
    return unless code == "done" && bot_id.present?

    # Do the slow transcript fetch OUTSIDE the request so the webhook returns fast.
    # PREFER an ActiveJob over Thread.new (see gotchas) — e.g.:
    SaveRecallConversationJob.perform_later(bot_id)
  end
end
```
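
`SaveRecallConversationJob` is referenced above but not shown. One way to fill it in is to move the save logic into a plain object that both the job and `pull_transcription` can call. The sketch below is an assumption about how you might structure that: `RecallResultSaver` and its injected `client` and `finder` arguments are hypothetical names, and the update and attach calls mirror `pull_transcription`.

```ruby
require "json"
require "stringio"

# Hypothetical plain-Ruby object holding the save logic shared by the webhook job and the button.
# `client` is a RecallAi-like object; `finder` maps a bot id to a Conversation-like record.
class RecallResultSaver
  def initialize(client:, finder:)
    @client = client
    @finder = finder
  end

  # Returns false when no record matches the bot id, true after a successful save.
  def call(bot_id)
    record = @finder.call(bot_id)
    return false unless record   # webhook for a bot this app never dispatched

    data  = @client.get_transcript(bot_id)
    items = data.is_a?(Array) ? data : (data["results"] || [])
    text  = @client.format_transcript(items)

    record.update!(transcription: text, raw_transcript: text,
                   video_url: @client.get_recording_url(bot_id))
    record.vtt_file.attach(io: StringIO.new(@client.generate_vtt(items)),
                           filename: "captions_#{record.id}.vtt", content_type: "text/vtt")
    record.transcript_json.attach(io: StringIO.new(data.to_json),
                                  filename: "transcript_#{record.id}.json",
                                  content_type: "application/json")
    true
  end
end
```

Inside a Rails app, the job's `perform(bot_id)` could then be a single call such as `RecallResultSaver.new(client: RecallAi.new, finder: ->(id) { Conversation.find_by(bot_id: id) }).call(bot_id)`, so an ActiveJob retry re-runs the whole save.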

---

## Layer 4 — The captioned player view

The payoff: a native `<video>` element with a WebVTT `<track>` for synced captions,
falling back from a downloaded mp4 to the live presigned URL.

```erb
<%# app/views/conversations/video.html.erb %>
<div class="card bg-base-300 shadow-2xl overflow-hidden">
  <div class="aspect-video bg-black flex items-center justify-center">
    <% if @conversation.video_file.attached? %>
      <video controls autoplay class="w-full h-full object-contain">
        <source src="<%= url_for(@conversation.video_file) %>" type="<%= @conversation.video_file.content_type %>">
        <% if @conversation.vtt_file.attached? %>
          <track label="English" kind="subtitles" srclang="en" src="<%= url_for(@conversation.vtt_file) %>" default>
        <% end %>
        Your browser does not support the video tag.
      </video>
    <% elsif @conversation.video_url.present? %>
      <video controls autoplay class="w-full h-full object-contain">
        <source src="<%= @conversation.video_url %>" type="video/mp4">
        <% if @conversation.vtt_file.attached? %>
          <track label="English" kind="subtitles" srclang="en" src="<%= url_for(@conversation.vtt_file) %>" default>
        <% end %>
      </video>
    <% else %>
      <p class="text-base-content/50">No video found for this conversation.</p>
    <% end %>
  </div>
</div>
```

And the action buttons on the detail page (`show.html.erb`) — gated on what exists:

```erb
<%# app/views/conversations/show.html.erb (excerpt) %>
<% if @conversation.bot_id.present? %>
  <%= button_to "Pull transcription", pull_transcription_conversation_path(@conversation),
        method: :post, class: "btn btn-primary", data: { turbo: false } %>
<% end %>

<% if @conversation.video_url.present? || @conversation.video_file.attached? %>
  <%= link_to "Watch", video_conversation_path(@conversation), target: "_blank", class: "btn btn-accent" %>
  <% if @conversation.video_url.present? && !@conversation.video_file.attached? %>
    <%= button_to "Download & attach", attach_recording_conversation_path(@conversation), method: :post, class: "btn" %>
  <% end %>
  <%= button_to "Refresh link", refresh_recording_conversation_path(@conversation), method: :post, class: "btn btn-square" %>
<% end %>
```

The meeting-URL field that kicks the whole thing off (in `_form.html.erb`):

```erb
<%# app/views/conversations/_form.html.erb (excerpt) %>
<div class="my-5">
  <%= form.label :meeting_url, "Recall AI Meeting URL" %>
  <%= form.text_field :meeting_url, class: "block w-full rounded-md border px-3 py-2 mt-2" %>
  <p class="text-sm text-gray-500 mt-1 italic">
    Enter a Zoom, Google Meet, or Microsoft Teams URL to invite the Recall AI bot.
  </p>
</div>
```

---

## Gotchas (the hard-won stuff)

- **Auth header is `Token <key>`, not `Bearer`.** With a `Bearer` prefix every request
  fails with a `401`.
- **Region-specific base URL.** `https://us-west-2.recall.ai/api/v1` is one region.
  EU workspaces use `eu-central-1`, etc. A key used against the wrong region shows up as
  confusing auth errors or `404`s. Make the base URL an env var if you serve multiple
  regions.
- **Recording & transcript URLs are presigned and EXPIRE.** A `video_url` you saved an
  hour ago will `403`. That's why `attach_recording` catches `OpenURI::HTTPError`,
  checks for `403`, calls `fetch_fresh_video_url` (re-reads the bot), and retries once.
  Always re-fetch from `get_bot` rather than trusting a stored URL — or download the mp4
  into Active Storage immediately for a durable copy.
- **Read the webhook body exactly once.** `request.body.read` consumes the stream. Read
  it into a variable, verify the HMAC against *that exact string*, then `JSON.parse`
  the same variable. Parsing first and re-serializing will change bytes and break the
  signature.
- **HMAC is over `"#{timestamp}.#{body}"`**, SHA-256, hex digest, compared with
  `Rack::Utils.secure_compare` (constant-time — don't use `==`). The signature header
  is space-separated `v1,<hash>` parts; check the `v1` version.
- **`skip_before_action :verify_authenticity_token`** on the webhook controller — it's
  an external POST with no CSRF token. The HMAC check *is* the authentication.
- **New transcript API shape.** Enable transcription via
  `recording_config.transcript.provider.recallai_streaming` (the current format), and
  include `language_code` — it's **required** for `prioritize_low_latency` mode.
- **Deep-dig the bot payload, defensively.** Results live at
  `recordings[0].media_shortcuts.{transcript,video_mixed}.data.download_url`. Before the
  meeting ends these are `nil` — guard every dig and surface "not ready yet" instead of
  crashing.
- **Don't block the webhook on the transcript fetch.** Downloading + formatting takes
  seconds; the webhook must return `200` fast or Recall retries. The source app used
  `Thread.new`, but **prefer an ActiveJob** (`perform_later`) — a raw thread is lost on
  deploy/restart and has no retries, so a transcript can silently never save. (The
  manual "Pull transcription" button is your backstop when a webhook is missed.)
- **Two capture paths, on purpose.** The webhook auto-saves on `"done"`; the button
  re-pulls anytime. Meetings drop, bots fail to join, webhooks get missed — the manual
  pull is what makes the feature reliable in practice.
- **`bot_id` is the join key for everything.** Persist it on create. Without it you
  can't pull a transcript, refresh a URL, or poll status.
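
To exercise the HMAC gotchas locally, you need a way to produce a signature that `verify_webhook!` would accept. The sketch below follows the scheme described in this recipe (hex HMAC-SHA256 over `"timestamp.body"`, sent as `v1,<hash>`). Both helper names are hypothetical, and the verifier uses `OpenSSL.secure_compare` (available in Ruby 3.0+) in place of `Rack::Utils.secure_compare` so it runs without Rack.

```ruby
require "openssl"

# Hypothetical test helper: builds headers in the scheme verify_webhook! expects.
def signed_recall_headers(secret, body, timestamp: Time.now.to_i.to_s)
  digest = OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("sha256"), secret, "#{timestamp}.#{body}")
  { "webhook-timestamp" => timestamp, "webhook-signature" => "v1,#{digest}" }
end

# Standalone mirror of the check: recompute the digest over the exact body string received.
def recall_signature_valid?(secret, body, headers)
  signed   = "#{headers['webhook-timestamp']}.#{body}"
  expected = OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("sha256"), secret, signed)
  headers["webhook-signature"].to_s.split(" ").any? do |part|
    version, hash = part.split(",", 2)
    version == "v1" && OpenSSL.secure_compare(hash.to_s, expected)  # constant-time compare
  end
end

body    = '{"event":"bot.status_change"}'
headers = signed_recall_headers("test-secret", body, timestamp: "1700000000")

puts recall_signature_valid?("test-secret", body, headers)                   # true
puts recall_signature_valid?("test-secret", body.sub("bot", "Bot"), headers) # false
```

The second call fails because a single changed byte in the body changes the digest, which is the same reason parsing and re-serializing the JSON before verification breaks the signature.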

---

## Files this pattern touches

```
db/migrate/XXXX_add_recall_fields_to_conversations.rb   # bot_id, meeting_url, video_url
app/models/conversation.rb                              # attachments + pending-validity rule
app/services/recall_ai.rb                               # the entire API integration
app/controllers/conversations_controller.rb            # create(join) + pull/refresh/attach/video
app/controllers/webhooks_controller.rb                 # HMAC-verified bot.status_change webhook
app/jobs/save_recall_conversation_job.rb               # (recommended) background transcript save
config/routes.rb                                        # member routes + POST /webhooks/recall
app/views/conversations/_form.html.erb                 # meeting_url field
app/views/conversations/show.html.erb                  # pull/watch/attach/refresh buttons
app/views/conversations/video.html.erb                 # captioned <video> player
```

## How to adapt to your schema

1. **Swap the host record.** `Conversation` is just "the thing a meeting is about."
   Rename to `Meeting`, `Call`, `Interview` — keep the four fields (`bot_id`,
   `meeting_url`, `video_url`, a transcript text column) and the three attachments.
2. **Set your region & secrets.** `RECALL_API_KEY`, `RECALL_WEBHOOK_SECRET`, and the
   `RECALL_API_URL` region. Register the webhook URL (`/webhooks/recall`) and the
   `bot.status_change` event in the Recall dashboard.
3. **Pick your background runner.** `SaveRecallConversationJob` is only referenced in
   this recipe, not defined. Implement it as a real job that calls `get_transcript`,
   formats the result, and attaches the files, reusing the logic from
   `pull_transcription`. Prefer that over the source app's `Thread.new`.
4. **Trim what you don't need.** If you never want a durable mp4, drop
   `attach_recording`/`video_file` and just stream `video_url` (but then refresh it on
   every view, since it expires). If you don't need captions, drop `generate_vtt` and
   the `<track>`. The minimum viable version is: `join_meeting` on create + the webhook
   (or button) calling `get_transcript` + `format_transcript`.
5. **Customize the bot.** `bot_name` is what shows in the participant list; pass
   `transcription_mode: "prioritize_accuracy"` when you care about quality over latency.
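
For step 2, a minimal sketch of deriving the base URL from a region name so it can come from the environment. The `RECALL_REGION` variable, the `recall_api_url` helper, and the `KNOWN_RECALL_REGIONS` list are assumptions for illustration; the list contains only the regions named in this recipe.

```ruby
# Hypothetical helper: build the base URL from a region name so it can come from an env var.
# The region list is limited to the ones named in this recipe; extend it for your workspace.
KNOWN_RECALL_REGIONS = %w[us-west-2 us-east-1 eu-central-1].freeze

def recall_api_url(region = ENV.fetch("RECALL_REGION", "us-west-2"))
  unless KNOWN_RECALL_REGIONS.include?(region)
    raise ArgumentError, "Unknown Recall region: #{region.inspect}"
  end

  "https://#{region}.recall.ai/api/v1"
end

puts recall_api_url("eu-central-1")  # https://eu-central-1.recall.ai/api/v1
```

Rejecting unknown regions up front turns a typo into an immediate error instead of a confusing auth failure later.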
