AI - Ops · Clinical Technology · 2025

Automating Clinical Workflows: AI Video Analysis for Declutter Health's Therapy Platform

AI that helps psychiatrists do what only humans can — by handling everything else.

Declutter Health
AI - Ops
↓ Read the story

Mental health technology built
for the therapists who use it every session.

Declutter Health is an innovative mental health platform connecting patients with psychiatrists through structured digital therapy sessions. In a discipline where the quality of clinical documentation directly affects the quality of care, the manual work of post-session documentation was creating a hidden tax on every practitioner — taking time from therapy and reducing the depth of clinical insight available for treatment decisions.

We built an AI-powered video analysis system that transforms therapy session recordings into comprehensive clinical reports automatically — generating medical-grade summaries, emotional timelines, speech pattern analysis, and accurate transcriptions in minutes. The result is psychiatrists who can see more patients, document more accurately, and track patient progress with quantitative precision that manual note-taking could never provide.

Mental HealthAIAmazon BedrockMediaPipeClinical AIHIPAAIndia
Engagement at a glance
AI StackAmazon Bedrock · Google MediaPipe · Amazon Transcribe
Processing Time2-3 Minutes Per Session
RegionIndia
FocusClinical Documentation Automation & Quantitative Insight
Year2025

Built for the therapist losing billable hours to note-taking, the clinical lead needing outcome data, and the founder bringing responsible AI to a sensitive domain.

Three people experienced the limitations of manual clinical documentation differently — a psychiatrist losing time, a patient losing depth of care, and a founder facing a scalability ceiling.

🩺
Psychiatrist · Declutter Health

Psychiatrists spent 20-30 minutes after every session writing notes from memory before they could see the next patient. Exact emotional moments, specific phrases, and subtle speech changes were consistently lost in reconstruction — and documentation fatigue accumulated across the day.

📋 20-30 minutes of manual notes after every session
💭
Therapy patient · Mumbai

Therapy patients received subjective impressions of progress rather than quantitative data showing how their emotional patterns had actually changed. The note-taking demands on psychiatrists also diverted attention away from the patient during the session itself.

📊 No quantitative tracking of emotional progress
🚀
Co-Founder · Declutter Health

Therapy platform founders faced a hard throughput ceiling: 30 minutes of post-session documentation per appointment meant that growing patient volume came directly at the cost of practitioner wellbeing. Scaling the platform without solving documentation was not viable.

📈 Documentation overhead limiting patient throughput

Declutter's therapists spent 20–30 minutes on post-session documentation that AI could process in 2.

The gap between what happened in a therapy session and what ended up in clinical notes was not a minor accuracy problem — it was a systematic loss of clinical data. Dr. Meera couldn't simultaneously conduct therapy and capture micro-expressions, speech rhythm changes, precise timestamps of emotional shifts, or exact patient phrasing. The notes she wrote were reconstructions, not recordings — and reconstructions lose the detail that drives treatment accuracy.

For Nandita's platform, this wasn't just a quality issue — it was a scalability constraint. Every hour of therapy required additional documentation time that the platform's growth model couldn't sustain indefinitely. The solution needed to eliminate documentation time entirely while improving documentation quality — capturing what manual note-taking couldn't, automatically.

The most important part of therapy happens in real time. The documentation has to happen without stealing any of it.

Complexity factors at the start
Post-session documentation time20-30 minutes per session
Micro-expression and emotional data captureImpossible manually
Speech pattern and prosody trackingSubjective only
Patient progress quantificationAbsent
Session throughput ceilingDocumentation-constrained

Amazon Bedrock for clinical summaries, MediaPipe for facial tracking, Transcribe for speaker diarization — HIPAA-compliant throughout.

Every component was built around a single principle: capture what Dr. Meera cannot capture alone, so she can focus entirely on the patient in front of her.

📄

Automated Clinical Summary Generation

Built an AI system using Amazon Bedrock that analyzes session videos and generates 'Client Summary' and 'Framework Summary' reports in medical-grade format — structured around psychiatric frameworks including CBT and DBT.

Human-Centricity
😊

Real-Time Emotion & Face Tracking

Deployed Google MediaPipe to track 478 facial landmarks throughout each session — Dr. Meera gets timestamped emotional data showing exactly when Rohan showed stress, engagement, or emotional shifts.

Human-Centricity
🎙️

Speaker-Aware Automatic Transcription

Implemented Amazon Transcribe with speaker diarization to create accurate, labeled transcripts — doctor and patient dialogue identified separately for easy review and quote extraction.

Sustainability
📊

Speech Pattern & Prosody Analysis

Integrated prosody analysis measuring speech velocity, volume variations, and rhythm patterns — quantifying what Dr. Meera could previously only describe subjectively, giving Rohan trackable progress data.

Human-Centricity
🔒

HIPAA-Compliant Secure Portal

Built a secure, encrypted web portal for session upload and report delivery — all data processed in compliance with healthcare privacy regulations, with Word/PDF reports available within minutes of upload.

Resilience

Sub-3-Minute Processing Pipeline

Optimised the full analysis pipeline — transcription, emotion tracking, speech analysis, and report generation — to complete within 2-3 minutes of a 5-minute session upload.

Resilience

2–3 minute AI processing replacing 20–30 minute manual documentation, under 1% error rate, $0.15 per session.

2–3 min
Processing time for a full clinical report per session — down from 20-30 minutes manual
Dr. Meera sees her next patient with documentation already complete
<1%
Error rate in transcription, face detection, and clinical summary generation
Rohan's clinical record is accurate — not a best-effort reconstruction
$0.15
Cloud processing cost per session
Nandita's cost model supports scaling to thousands of sessions
Quantitative emotional data — timelines, charts, and speech graphs previously impossible
Progress tracking that was never available before is now the standard

What changed for the people
on both sides of the screen.

🩺

Practitioner Freedom

Dr. Meera conducts therapy with complete focus on the patient. The documentation happens automatically — more accurately than manual notes could ever be, and ready before she opens the next patient's file.

💭

Patient Understanding

Rohan's psychiatrist now has access to objective emotional data across sessions — showing not just what was said but how engagement, stress, and speech patterns changed over time. Treatment adjustments are evidence-based.

🚀

Platform Scale Unlocked

Nandita's documentation ceiling is gone. Psychiatrists can see more patients, the platform can onboard more practitioners, and the quality of clinical insight improves with every session — not despite scale, but because of it.

📋

Clinical Data Completeness

The data that was previously lost between session and note is now permanently captured. Micro-expressions, speech rhythms, emotional timestamps — details that used to disappear from memory now exist in every patient record.

Let's automate the work that pulls practitioners away from patients

AI that does the documentation
so clinicians can do the healing.

Clinical AI systems that capture what human note-taking cannot — giving practitioners more time, better data, and the ability to focus fully on the people who need them.