Data Collection Architecture
How data flows from the camera through calculations to the report.
Pipeline
Webcam (1280x720) → MediaPipe Face Landmarker (478 landmarks + 52 blendshapes) → detect*() functions → cur object (every frame) → collectSnapshot() (1x/s) → KV Store (Cloudflare) → Report
Files
camera.js: MediaPipe init, detection, calculations, data collection
sensors.js: Microphone, TTS, volume, device info
engine.js: Session control, result assembly
report.js: Visualization, charts, cards, export
worker.js: Cloudflare Worker (API, KV storage)
Technology
Face detection: Google MediaPipe Face Landmarker (GPU)
Hand detection: Google MediaPipe Hand Landmarker (GPU)
Voice comments: Web Speech API (SpeechRecognition)
System voice: Web Speech API (SpeechSynthesis)
Runs where: 100% locally in browser, no data leaves the device without consent
How often things are measured and stored
Detection (value calculation)
requestAnimationFrame — runs as fast as the browser can handle. Typically 30–60 FPS (depends on GPU/CPU). Each frame updates all metrics in the object cur.
Storage
collectSnapshot(): every 1000 ms (1x/s), complete snapshot of all metrics
collectBlendshapes(): every 2000 ms (1x per 2 s), raw 52 blendshapes
checkMultipleFaces(): every 5000 ms (1x per 5 s), second MP instance, up to 3 faces
facePhotos: event-triggered, at session_start, each question, each reveal, session_end
Data points per second
Each snapshot contains 82 data points: 52 blendshapes + 30 derived metrics (emotions, attention, gaze, fatigue, headpose, distance, blink, smile, iris, gestures, symmetry, ...). Configuration: CONFIG.SNAPSHOT_INTERVAL = 1000
— Biometric Metrics —
🎯 Attention — Gaze (Gaze Attention)
🎯 Gaze Attention Reliable
What it measures
Where the user is looking. 100% = looking straight ahead (at screen), 0% = looking markedly sideways or up. Measures gaze direction deviation from straight ahead.
Data source
Blendshapes (muscle values) from MediaPipe: eyeLookOutLeft, eyeLookOutRight, eyeLookUpLeft, eyeLookUpRight, eyeLookDownLeft, eyeLookDownRight. These values come directly from the ML model — no manual geometry.
Function
detectGazeAttention(blendshapes) in camera.js
Algorithm
// 1. Side gaze: looking sideways
sideAvg = (eyeLookOutLeft + eyeLookOutRight) / 2
sidePenalty = min(1.0, sideAvg × 3)   // 0.33+ = full penalty

// 2. Upward gaze: looking up/away
upAvg = (eyeLookUpLeft + eyeLookUpRight) / 2
upPenalty = min(1.0, upAvg × 4)   // 0.25+ = full penalty

// 3. Extreme down: looking at lap/phone (only >0.7)
downAvg = (eyeLookDownLeft + eyeLookDownRight) / 2
downPenalty = max(0, downAvg - 0.7) × 3   // penalty only above 0.7

// Total penalty (weighted sum)
totalPenalty = min(1.0, sidePenalty×0.4 + upPenalty×0.3 + downPenalty×0.3)
gazeAttention = round((1 - totalPenalty) × 100)
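The penalty formula can be sketched as runnable JavaScript. This is a minimal illustration, not the code from camera.js: the function name gazeAttentionFromBlendshapes and the input shape (a plain object mapping blendshape names to values in 0.0–1.0) are assumptions.

```javascript
// Sketch of the gaze-attention formula. `bs` is assumed to be a plain
// object mapping blendshape names to values in 0.0–1.0.
function gazeAttentionFromBlendshapes(bs) {
  const get = (name) => bs[name] ?? 0; // a missing blendshape counts as 0

  const sideAvg = (get('eyeLookOutLeft') + get('eyeLookOutRight')) / 2;
  const upAvg = (get('eyeLookUpLeft') + get('eyeLookUpRight')) / 2;
  const downAvg = (get('eyeLookDownLeft') + get('eyeLookDownRight')) / 2;

  const sidePenalty = Math.min(1, sideAvg * 3);
  const upPenalty = Math.min(1, upAvg * 4);
  const downPenalty = Math.max(0, downAvg - 0.7) * 3; // only above 0.7

  const total = Math.min(
    1,
    sidePenalty * 0.4 + upPenalty * 0.3 + downPenalty * 0.3
  );
  return Math.round((1 - total) * 100);
}
```

A moderate downward gaze (for example downAvg = 0.5) leaves the score at 100, which matches the intent that reading the lower part of the screen is not penalized.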
Frequency
Calculation: every frame (30–60 FPS). Storage: 1x/s into collected.gazeAttention[] and into each snapshot.
Range
0–100%. Slight downward gaze (reading, lower screen) = still high value. Penalty only for extreme downward gaze (>0.7).
In report
Summary card: "Attention (gaze)", average value
Bio summary: Row with average and min–max range
Chart: Solid green curve in "Attention & Fatigue" chart
Question/Video timeline: 🎯 gaze X% for each question/step
Photo thumbnails: 🎯X% below each photo
Charts summary table: Row with avg/min/max
Why blendshapes? MediaPipe ML model extracts muscle values directly from the image. Unlike iris tracking (geometric calculation from landmarks), it does not depend on precise coordinates and works reliably even with a standard webcam.
👁 Iris Tracking
👁 Iris Tracking Experimental
What it measures
Iris position within eye apertures. Theoretically more precise than blendshapes, but requires a quality camera or dedicated eye-tracking hardware.
Data source
MediaPipe landmarks: irises lm[473] (left) and lm[468] (right), eye corners lm[33], lm[133], lm[263], lm[362].
Function
detectAttention(landmarks) in camera.js
Algorithm
// Iris position ratio relative to eye width
leW = abs(eyeOuter.x - eyeInner.x)   // left eye width (reW: same for the right eye)
lPos = (irisLeft.x - eyeOuter.x) / leW   // expected 0.0–1.0
rPos = (irisRight.x - eyeOuter.x) / reW
dev = abs(((lPos + rPos) / 2) - 0.5) × 2   // deviation from center
attention = round(max(0, min(100, (1 - dev) × 100)))
Known issue: With a standard webcam, lPos/rPos values come out as 2.5–3.7 (instead of 0.0–1.0). Reason: iris landmark coordinates (473/468) are in a different range than eye corner landmarks, likely due to camera mirror mode or coordinate normalization. Result: 54–94% of measurements = 0% attention. Therefore the metric is marked as experimental.
In report
Summary card: "Iris tracking (exp.)", reduced opacity 0.6
Chart: Dashed green curve with opacity 0.3
Timeline: 👁 iris (exp.) X% after gaze value
Stored raw data
irisLeft and irisRight in each snapshot, kept for future analysis with better hardware.
😊 Emotions (Emotion Detection)
😊 Emotion Detection Functional / bias
What it measures
Dominant emotion from facial expression. 5 categories: Happy, Surprised, Focused, Relaxed, Neutral. Each with confidence score 0–100%.
Function
detectEmotion(blendshapes) in camera.js
Algorithm (waterfall if/else)
smile = (mouthSmileLeft + mouthSmileRight)
browUp = (browOuterUpLeft + browOuterUpRight)
browDown = (browDownLeft + browDownRight)
jawOpen = jawOpen

if smile > 0.4 → Happy (conf: min(100, smile × 100))
else if browUp > 0.3 and jawOpen > 0.2 → Surprised (conf: min(100, (browUp+jawOpen) × 50))
else if browDown > 0.2 → Focused (conf: min(100, browDown × 150))
else if activity < 0.15 → Relaxed (conf: (1-activity) × 80)
otherwise → Neutral (conf: 60, fixed)
Known bias: browDown > 0.2 captures "Focused" in 53–73% of users because some people have naturally low eyebrows. Priority ordering means Focused "wins" over Relaxed/Neutral. For more precise detection, baseline face calibration at session start would be needed.
Frequency
Every frame → stored 1x/s into collected.emotions[] as { timestamp, emotion, confidence }.
In report
Summary card: "Emotions", count of different detected emotions
Emotion timeline chart: Color bars per second (Happy=green, Surprised=yellow, Focused=blue, Relaxed=purple, Neutral=gray)
Red curve in emotion chart: Smile intensity
Question/Video timeline: 😊 Emotion + confidence X% for each question
Photo thumbnails: Emotion below each photo
😴 Fatigue
😴 Fatigue Score Functional / cascade
What it measures
User fatigue level, 0–100%. Point system from 5 independent factors.
Function
detectFatigue(blendshapes) in camera.js
Algorithm
// 5 factors, each adds points, max 100
if blinkRate > 25 → +30 points   // high blink rate
if blinkRate > 20 → +15 points   // elevated blink rate
if eyeSquint > 0.3 → +25 points   // eye squinting
if browDown > 0.2 → +15 points   // lowered eyebrows
if mouthOpen → +15 points   // open mouth (yawning), jawOpen > 0.15
if attention < 50 → +15 points   // attention drop (iris-based)
fatigue = min(100, sum)
Cascade effect: The attention < 50 factor depends on iris tracking, which is experimental. With a standard webcam, iris attention is often 0%, so this factor adds +15 in 87% of measurements. Once Gaze Attention is deployed, this factor can be switched to use it instead.
Non-adaptive thresholds: eyeSquint > 0.3 and browDown > 0.2 are fixed. Some people naturally have squinted eyes or low eyebrows, which inflates the fatigue score.
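The point system can be sketched as follows. This is an illustration under stated assumptions, not the detectFatigue() code: the function name fatigueScore and the flat input object are hypothetical, and the two blink-rate tiers are treated as mutually exclusive, which is the reading under which the five factors sum to exactly 100.

```javascript
// Sketch of the fatigue point system. Input fields are assumed to be
// precomputed per frame; the name fatigueScore is hypothetical.
function fatigueScore({ blinkRate, eyeSquint, browDown, jawOpen, attention }) {
  let points = 0;
  if (blinkRate > 25) points += 30;      // high blink rate
  else if (blinkRate > 20) points += 15; // elevated (assumed exclusive tier)
  if (eyeSquint > 0.3) points += 25;     // eye squinting
  if (browDown > 0.2) points += 15;      // lowered eyebrows
  if (jawOpen > 0.15) points += 15;      // open mouth (yawning)
  if (attention < 50) points += 15;      // attention drop (iris-based)
  return Math.min(100, points);
}
```

With iris attention stuck at 0, a fully rested face still scores 15, which is the cascade effect described above.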
In report
Summary card: "Fatigue", average
Bio summary: Row with average and range
Chart: Red curve in "Attention & Fatigue" chart
Charts summary table: Row with avg/min/max
📏 Distance (Face Distance)
📏 Face Distance Functional
What it measures
Estimated face distance from camera in centimeters.
Function
detectDistance(landmarks) in camera.js
Algorithm
d = sqrt((rightEye.x - leftEye.x)² + (rightEye.y - leftEye.y)²)
// d = inter-eye distance in normalized coordinates (0–1)
// constant 0.095 ≈ average inter-eye width
distance = max(20, min(150, round((0.095 / (d + 0.001)) × 100)))
// Clamped to 20–150 cm
Accuracy: Depends on camera resolution and individual anatomy (inter-eye width varies). This is an estimate, not precise measurement. Constant 0.095 is not calibrated for individual users.
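A minimal JavaScript sketch of the distance estimate, assuming the two eye landmarks are objects with normalized x and y fields. The function name estimateDistanceCm is hypothetical.

```javascript
// Sketch of the distance estimate from the inter-eye distance.
// leftEye and rightEye are assumed to be { x, y } in normalized 0–1 coords.
function estimateDistanceCm(leftEye, rightEye) {
  const d = Math.hypot(rightEye.x - leftEye.x, rightEye.y - leftEye.y);
  // 0.095 is the uncalibrated constant from the text; +0.001 avoids
  // division by zero when both landmarks coincide.
  const raw = Math.round((0.095 / (d + 0.001)) * 100);
  return Math.max(20, Math.min(150, raw)); // clamp to 20–150 cm
}
```

Because of the clamp, a face that is very close or very far reports 20 or 150 cm rather than its true distance, so long runs of either value in the data are likely saturated readings.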
In report
Bio summary: "Distance", average and range in cm
Chart: Blue curve "Face distance from camera", Y axis: 0–150 cm
Photo modal: 📏Xcm for each photo
🔄 Head Movement (Head Pose)
🔄 Head Pose Functional
What it measures
Head rotation angles in degrees: Pitch (up/down), Yaw (left/right), Roll (side tilt).
Function
detectHeadPose(landmarks, transformMatrix) in camera.js
Algorithm — two paths
// Priority 1: Transform matrix from MediaPipe (more precise)
if (transformMatrix.data) {
  pitch = asin(-m[6]) × 57.3°
  yaw = atan2(m[2], m[10]) × 57.3°
  roll = atan2(m[4], m[5]) × 57.3°
}
// Priority 2: Geometric estimate from landmarks (fallback)
else {
  yaw = (rightDist - leftDist) / (leftDist + rightDist) × 60°
  pitch = (nosePos - 0.35) × 80°
  roll = atan2(rightEye.y - leftEye.y, rightEye.x - leftEye.x) × 57.3°
}
In report
Bio summary: "Head", average absolute angles ±P°/±Y°
Chart: "Head movement", Pitch (purple) + Yaw (pink), Y axis: -45° to +45°, center line at 0°
😊 Smile
😊 Smile Intensity Functional
Calculation
(mouthSmileLeft + mouthSmileRight) × 50 — direct conversion from blendshapes, range 0–100%.
In report
Red curve in Emotion timeline chart. Also shown in photo modal: 😊X%.
✋ Hand Gestures
Hand Gesture Detection Functional
What it measures
Hand gestures in camera view. 8 recognized gestures: 👍 Thumbs Up, 👎 Thumbs Down, ✌️ Peace, ☝️ Point, ✋ Open Palm, ✊ Fist, 👌 OK, 🤟 Rock.
Source
MediaPipe Hand Landmarker: 21 hand points. The function detectGesture(handLandmarks) compares fingertip positions against joints (tip.y vs joint.y).
Storage
The current gesture is logged once per second (1x/s), including "None" (hand not visible). This allows measuring how long the hand was visible. "None" entries are filtered out for the report.
False positives: occasionally "Point" is detected when no hand is present (nose, forehead, or background object). One user had 176× Point during a 286s session. Clusters of 3+ seconds of the same gesture are more reliable than isolated detections.
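The cluster heuristic mentioned above can be sketched as a small filter. This is an illustration only: the function name reliableGestureClusters is hypothetical, and it assumes the log holds one entry per second, in time order, with gesture and timestamp fields.

```javascript
// Sketch: keep only runs of at least `minRun` consecutive identical
// gestures. Assumes one log entry per second, ordered by time.
function reliableGestureClusters(gestures, minRun = 3) {
  const clusters = [];
  let run = [];
  const flush = () => {
    if (run.length >= minRun && run[0].gesture !== 'None') {
      clusters.push({
        gesture: run[0].gesture,
        startTs: run[0].timestamp,
        endTs: run[run.length - 1].timestamp,
        count: run.length,
      });
    }
    run = [];
  };
  for (const g of gestures) {
    // A change of gesture closes the current run.
    if (run.length > 0 && run[0].gesture !== g.gesture) flush();
    run.push(g);
  }
  flush(); // close the final run
  return clusters;
}
```

Isolated one-second "Point" detections fall below minRun and are dropped, while a sustained Thumbs Up survives as a single cluster with its time range.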
In report
Summary card: "Gestures", count (filtered, without "None")
Gestures section: Grouped by type with count, mini timeline (dots) and time range
📷 Face detection (Face Detection Rate)
📷 Face Detection Functional
What it measures
Percentage of frames where the face was successfully detected. Low values = face out of frame, covered, or poor lighting.
Calculation
faceDetected.filter(detected).length / faceDetected.length × 100%
In report
Bio summary: "Detection" X% (count/total). Displayed as first row.
👥 Multi-face Presence Detection
👥 Multi-face Check Functional
What it measures
Presence of other people in frame. Second MediaPipe instance (without blendshapes, numFaces: 3) checks every 5 seconds.
In report
Bio summary: "other people", detection count or "No"
Emotion chart: Red vertical zones where more than 1 face was detected
Biometric charts: Red vertical lines + zones
🎤 Voice Comments
🎤 Voice Comments Echo problem
What it measures
User voice comments transcribed in real-time using Web Speech API (SpeechRecognition). Each comment has text, timestamp, and confidence (0–100%).
Configuration
continuous: true, interimResults: false, language per session settings (cs-CZ / en-US). Auto-restart on end.
Known issue: Microphone runs continuously and also captures sound from video (avatar) or TTS. Pause during audio playback is not implemented — planned.
In report
Summary card: "Comments", count
Voice comments section: List with time, text, and confidence
Timeline: Comments assigned to questions/steps by time
🔊 Microphone Volume (Mic Volume)
🔊 Mic Volume Zero data
What it measures
Peak microphone volume every second (AnalyserNode from Web Audio API).
Issue: All recorded values are 0. AnalyserNode is probably not correctly connected to mic stream, or peak detection has a bug. Requires investigation.
In report
Volume timeline (if data is non-zero). Currently not displayed.
📸 Photos and Photo Quality
📸 Face Photos + Quality Score Cascade from iris
When photos are taken
Event-triggered: session_start, video_* (each video step), question_*, reveal_*, session_end. Two versions: clean (pure image) + overlay (with face mesh).
Quality score
quality = (faceDetected ? 40 : 0)
  + min(30, attention × 0.3)       // ← iris-based, max +30
  + min(20, emotion.score × 0.2)   // max +20
  + (headPose centered ? 10 : 0)   // yaw < 15° && pitch < 15°
// Maximum possible: 40 + 30 + 20 + 10 = 100
Cascade: With broken iris attention (always ~0), up to 30 points are missing. Max achievable score is ~70 instead of 100. Best photo is selected via selectBestPhoto() — compares quality and takes the highest.
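A runnable sketch of the quality score and best-photo selection. The names photoQuality and pickBestPhoto are hypothetical stand-ins (the real selection function is selectBestPhoto()), the flat input fields are assumed, and the "centered" check uses absolute angles, which is an interpretation of "yaw < 15° && pitch < 15°".

```javascript
// Sketch of the photo quality score (0–100).
function photoQuality({ faceDetected, attention, emotionScore, yaw, pitch }) {
  // Assumption: "centered" means both angles within ±15°.
  const centered = Math.abs(yaw) < 15 && Math.abs(pitch) < 15;
  return (faceDetected ? 40 : 0)
    + Math.min(30, attention * 0.3)     // iris-based, max +30
    + Math.min(20, emotionScore * 0.2)  // max +20
    + (centered ? 10 : 0);
}

// Sketch of best-photo selection: highest quality wins, null if no photos.
function pickBestPhoto(photos) {
  return photos.reduce(
    (best, p) => (best === null || p.quality > best.quality ? p : best),
    null
  );
}
```

Setting attention to 0 while everything else is ideal gives 70, which reproduces the ceiling described in the cascade note.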
In report
Face Photos section: Carousel with thumbnails; click opens modal with overlay
Each photo shows: Time, emotion, gaze/attention %, distance, smile
User main photo: Best photo (highest quality) in report header
🧬 Raw Blendshapes (Raw Blendshape Log)
🧬 Blendshape Log Collected
What it is
Complete log of all 52 blendshapes (muscle values) from MediaPipe. Every 2 seconds a raw snapshot is stored. Used for future detailed analysis and debugging.
52 blendshapes include
eyeBlinkLeft/Right, eyeLookDownLeft/Right, eyeLookInLeft/Right, eyeLookOutLeft/Right, eyeLookUpLeft/Right, eyeSquintLeft/Right, eyeWideLeft/Right, browDownLeft/Right, browInnerUp, browOuterUpLeft/Right, cheekPuff, cheekSquintLeft/Right, jawForward, jawLeft/Right, jawOpen, mouthClose, mouthDimpleLeft/Right, mouthFrownLeft/Right, mouthFunnel, mouthLeft/Right, mouthLowerDownLeft/Right, mouthPressLeft/Right, mouthPucker, mouthRollLower/Upper, mouthShrugLower/Upper, mouthSmileLeft/Right, mouthStretchLeft/Right, mouthUpperUpLeft/Right, noseSneerLeft/Right, _neutral
In report
Not displayed directly. The complete log is included in the JSON export and is intended for future AI analysis.
⚖️ Face Symmetry
⚖️ Face Symmetry Collected, not displayed
Calculation
// 4 blendshape pairs
pairs = [mouthSmileL/R, eyeBlinkL/R, browDownL/R, cheekSquintL/R]
symScore = average(1 - abs(left - right)) for each pair
faceSym = round(symScore × 100)   // 0–100%, 100 = perfectly symmetric
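The symmetry calculation can be sketched in JavaScript as below. The function name faceSymmetry and the plain-object input are assumptions; missing blendshapes are treated as 0.

```javascript
// Sketch of the face symmetry score over four left/right blendshape pairs.
function faceSymmetry(bs) {
  const pairs = [
    ['mouthSmileLeft', 'mouthSmileRight'],
    ['eyeBlinkLeft', 'eyeBlinkRight'],
    ['browDownLeft', 'browDownRight'],
    ['cheekSquintLeft', 'cheekSquintRight'],
  ];
  // Each pair contributes 1 when both sides match and 0 when they
  // differ by the full range.
  const sum = pairs.reduce(
    (acc, [l, r]) => acc + (1 - Math.abs((bs[l] ?? 0) - (bs[r] ?? 0))),
    0
  );
  return Math.round((sum / pairs.length) * 100); // 0–100%
}
```

A fully one-sided smile with the other three pairs symmetric yields 75, since only one of four pairs contributes 0.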
In report
Currently not displayed. Data is in each snapshot (faceSym). Interesting metric for future report extensions.
— Report Sections —
🔌 Sensor Status in Report
Displayed sensors
📷 Camera: Active/Inactive + frame count
🧠 MediaPipe: Active/Failed + point count + photo count + "82 pts/s"
🎙 Microphone: Active/Inactive + comment count
🔊 Speaker: Volume in %
Source
From the session_started event in sessionLog (sensors field), with missing values computed from the data (fallback).
📊 Charts in Report
All charts
Emotions over time: Color bars (1 bar = 1s), red smile curve, question markers, multi-face zones
Attention & Fatigue: Gaze (solid green), Iris (dashed green 0.3), Fatigue (red), question markers
Distance (cm): Blue curve, Y axis: 0–150
Blink Rate (/min): Yellow curve, Y axis: 0–60
Head movement: Pitch (purple) + Yaw (pink), Y axis: -45° to +45°, center line
Microphone volume: Green area chart (if data exists and is non-zero)
Chart technical details
All charts are SVG with viewBox="0 0 100 100" and preserveAspectRatio="none". Data is mapped to 0–100% axes. Curves use vector-effect="non-scaling-stroke" for consistent stroke width. Question markers are dashed vertical lines. Multi-face detections are red zones.
Curve functions
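function buildPath(arr, key, maxVal) {
  // Guard against division by zero when there is only a single point
  const last = Math.max(1, arr.length - 1);
  return arr.map((v, i) => {
    const x = (i / last) * 100;
    const y = 100 - ((v[key] ?? v.score ?? 0) / maxVal) * 100;
    return (i === 0 ? 'M' : 'L') + x + ',' + y;
  }).join(' ');
}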
🃏 Summary Cards in Report
Video scenario (platform-demo)
Video steps: videoSteps.length
Choice: closingChoice (if present)
Emotions: count of unique emotions
Comments: voiceComments.length
Attention (gaze): average gazeAttention (new metric)
Iris tracking (exp.): average attention (opacity 0.6)
Fatigue: average fatigue
Gestures: gestures filtered != None (if > 0)
Classic scenario (device-privacy-awareness)
Questions: question count
Correct: correct/total
Ø Reaction: average reaction time in seconds
+ shared cards: Emotions, Comments, Gaze, Iris (exp.), Fatigue, Gestures
Report header
Dynamic text: "During training we captured X biometric snapshots (Y data points), Z photographs, N hand gestures, and M voice comments. All in real-time." Data points count = snapshots × 82.
Chapter 2
🧠 Contextual Analysis Methodology
This chapter documents every derived indicator used in the "Session Analysis" report section. For each metric, the exact formula, data source, scientific basis, and implementation notes for replicability across any scenario are provided.
📋 Analysis Principles Foundation
Philosophy
Analysis is deterministic — no AI generation, no random elements. Every conclusion follows directly from numerical data. Interpretations are formulated as observations, not judgments. Goal: provide a contextual framework for raw data that is replicable and verifiable.
Input data
Complete session JSON export containing: gazeAttention[], fatigue[], emotions[], blinkRate[], distance[], headPose[], gestures[], blendshapeLog[], micVolume[], voiceComments[], videoSteps[], reactionTimes[], presenceChecks[], sessionLog[].
Segmentation
Data is segmented by videoSteps[] (for video scenarios) or answers[] (for classic scenarios). Each segment is defined by time range [startedAt, endedAt] and all biometric data is filtered by timestamp into the respective segment.
📊 Engagement Score
🎯 Engagement Score Derived
Purpose
Summary number 0–100 expressing overall participant engagement during the session. Displayed as a circular indicator in the report.
Formula
score = w1×gazeNorm + w2×faceNorm + w3×emotionNorm + w4×interactionNorm + w5×completionNorm
Components and weights
gazeNorm (w=0.30): avg(gazeAttention.score) / 100
faceNorm (w=0.15): count(faceDetected=true) / total
emotionNorm (w=0.20): count(emotion=Focused|Happy) / total
interactionNorm (w=0.20): (hasVoiceTips + hasGestures + hasVoiceChoice) / 3
completionNorm (w=0.15): 1.0 if session_completed, 0.5 if tab_switch end, 0.0 otherwise
Example from demo session
0.30×0.694 + 0.15×1.0 + 0.20×0.992 + 0.20×1.0 + 0.15×1.0 = 0.208 + 0.15 + 0.198 + 0.20 + 0.15 = 0.906 → stated as rounded to 80 (the formula itself gives 0.906 × 100 ≈ 91)
Note: in current implementation, the score is set manually based on expert estimation. The formula above is the proposed automation.
Implementation notes
Weights are adjustable per scenario. For classic scenarios, add component correctAnswersNorm. For platforms without voice inputs, adjust interactionNorm to use gestures and clicks.
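Since the formula is described as a proposed automation, a minimal sketch may help when implementing it. The names ENGAGEMENT_WEIGHTS and engagementScore are hypothetical, and the five normalized components are assumed to be already computed in the 0.0–1.0 range.

```javascript
// Sketch of the proposed weighted engagement score.
// Weights are taken from the component list; they sum to 1.0.
const ENGAGEMENT_WEIGHTS = {
  gaze: 0.30,
  face: 0.15,
  emotion: 0.20,
  interaction: 0.20,
  completion: 0.15,
};

// `norm` is assumed to hold one 0.0–1.0 value per weight key.
function engagementScore(norm, weights = ENGAGEMENT_WEIGHTS) {
  const raw = Object.keys(weights).reduce(
    (sum, key) => sum + weights[key] * norm[key],
    0
  );
  return Math.round(raw * 100); // scale to 0–100
}
```

Passing a different weights object is one way to support the per-scenario adjustments mentioned above, for example by adding a correctAnswers key for classic scenarios and rebalancing the rest so the weights still sum to 1.0.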
🎯 Attention per Segment
🎯 Gaze Attention per Video Step Derived
What is shown in report
Horizontal mini-bars showing average gaze attention for each video step. Color: green (≥70), orange (60–69), red (<60). Allows identifying which segment engaged the participant most and least.
Formula
segmentAvg = avg(gazeAttention.score WHERE timestamp BETWEEN step.startedAt AND step.endedAt)
For each videoSteps[] element, all records from gazeAttention[] whose timestamp falls within the time range of that step are filtered.
Implementation
function gazePerSegment(gazeAttention, videoSteps) {
  return videoSteps.map(step => {
    const scores = gazeAttention
      .filter(g => g.timestamp >= step.startedAt &&
                   g.timestamp <= (step.endedAt || step.startedAt + 60000))
      .map(g => g.score);
    return {
      videoId: step.videoId,
      avg: scores.length > 0 ? scores.reduce((a,v) => a+v, 0) / scores.length : 0,
      n: scores.length
    };
  });
}
Interpretation logic in report
1. Find segment with max(avg) → "Highest attention at [segment name]"
2. Find segment with min(avg) → "Lowest attention at [segment name]"
3. If max - min > 10 → "This suggests the topic of [max segment] was more engaging than [min segment]"
4. If max - min < 5 → "Attention was evenly distributed across all segments"
Mini-bar colors
avg >= 70: var(--beam) green, high attention
avg 60–69: var(--warning) orange, medium
avg < 60: var(--danger) red, low
Tip/question segment: var(--info) blue, differentiated from video
😴 Fatigue Trend per Segment
😴 Fatigue per Video Step + Trend Detection Derived
What is shown in report
Bar mini-chart showing average fatigue per segment. Color: green (<45%), orange (45–54%), yellow (≥55%). Below the chart are segment descriptions. Text: trend identification (rising/falling) and exceptions.
Formula
segmentFatigueAvg = avg(fatigue.score WHERE timestamp BETWEEN step.startedAt AND step.endedAt)
Same filter as for attention, applied to the fatigue[] array instead.
Interpretation logic in report
1. Calculate firstHalfAvg (average fatigue of first half of segment) and secondHalfAvg
2. If secondHalfAvg - firstHalfAvg > 10 → "Fatigue gradually increased"
3. For each segment: if segAvg[i] < segAvg[i-1] AND segAvg[i] < segAvg[i+1] → "[Segment] re-energized the participant" (local minimum)
4. Find min(segAvg) and max(segAvg) and report both with time
5. Global minimum from entire fatigue[] array → calmness peak (Math.min(...fatigue.map(f => f.score)))
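Steps 1 to 3 of the logic above can be sketched as follows. This is an illustration under assumptions: the function name fatigueTrend is hypothetical, and the two halves are taken over the list of per-segment averages, which is one possible reading of "first half".

```javascript
// Sketch of fatigue trend detection over per-segment averages.
// Assumption: "first half" and "second half" split the segAvg list.
function fatigueTrend(segAvg) {
  const mean = (xs) => xs.reduce((a, v) => a + v, 0) / xs.length;
  const mid = Math.floor(segAvg.length / 2);
  const firstHalfAvg = mid > 0 ? mean(segAvg.slice(0, mid)) : null;
  const secondHalfAvg = mid > 0 ? mean(segAvg.slice(mid)) : null;
  const rising = mid > 0 && secondHalfAvg - firstHalfAvg > 10;

  // Local minima: segments lower than both neighbors ("re-energized").
  const localMinima = [];
  for (let i = 1; i < segAvg.length - 1; i++) {
    if (segAvg[i] < segAvg[i - 1] && segAvg[i] < segAvg[i + 1]) {
      localMinima.push(i);
    }
  }
  return { firstHalfAvg, secondHalfAvg, rising, localMinima };
}
```

For the averages [30, 40, 35, 55, 60] the halves average 35 and 50, so the trend is rising, and the segment at index 2 is a local minimum.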
Bar chart colors
avg < 45%: #10b981 green, low fatigue
avg 45–54%: #fb923c orange, medium
avg >= 55%: #f59e0b yellow, high fatigue
😊 Emotion Map per Segment
😊 Emotion Distribution per Video Step Derived
What is shown in report
Text description identifying WHICH emotions appeared in WHICH segments. Correlation with gestures and other events.
Formula
function emotionsPerSegment(emotions, videoSteps) {
  return videoSteps.map(step => {
    const segEmotions = emotions
      .filter(e => e.timestamp >= step.startedAt &&
                   e.timestamp <= (step.endedAt || step.startedAt + 60000))
      .map(e => e.emotion);
    const counts = {};
    segEmotions.forEach(em => counts[em] = (counts[em] || 0) + 1);
    return { videoId: step.videoId, emotions: counts, total: segEmotions.length };
  });
}
Interpretation logic in report
1. For each non-Focused emotion, find segments where it appears → "Emotion [X] appeared exclusively during [segment]"
2. If a gesture (e.g. Thumbs Up) has a timestamp within the range of the same segment → "This precisely correlates with gesture [Y]"
3. If all segments = 100% Focused → "Entire session ran in Focused mode — stable concentration without emotional fluctuations"
4. Dominant emotion = the one with the highest total count
5. "Positive emotions" = Happy, Surprised; "Neutral" = Focused, Neutral, Relaxed; "Negative" = None (not yet supported)
Gesture correlations
For each gesture from gestures[] (filter gesture != 'None') find temporal overlap with segments:
gestureSegment = videoSteps.find(s => gesture.timestamp >= s.startedAt && gesture.timestamp <= s.endedAt)
If the segment containing the gesture is also a segment with a non-Focused emotion → strong correlation, mention it in the report.
🎙 Interaction and Reactions (Interaction Analysis)
🎙 Reaction Times & Voice Interaction Quality Derived
What is shown in report
Breakdown of voice answers (tip), their reaction times, and answer consistency analysis. For closing choice, formulation analysis.
Source data
reactionTimes[]: {questionId, time_ms}, time from input display to answer
voiceComments[]: {timestamp, text, confidence}, recognized speech
closingChoice: {detectedKeyword, attempts, detailViewed}
Derived metrics
avgReactionTime: avg(reactionTimes.time_ms)
reactionTimeVariance: max(time_ms) - min(time_ms)
consistencyScore: if variance < 1000 ms → "consistent thinking"
Interpretation logic in report
1. reactionTimeVariance < 1000 → "Very similar reaction times indicate consistent and active thinking"
2. avgReactionTime < 2000 → "Fast answers — possibly random guessing or certainty"
3. avgReactionTime 3000–7000 → "Active thinking before answer"
4. avgReactionTime > 10000 → "Long deliberation: complex question or indecision"
5. closingChoice.detectedKeyword != standard words (biometrics/hardware/adaptivity/no) → natural formulation (e.g. "chci do reportu" ["I want it in the report"] instead of "ne" ["no"]) → note about understanding of context
6. closingChoice.attempts > 0 → "[N]x warning before correct detection"
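Rules 1 to 4 can be sketched as a small classifier over reactionTimes[]. The function name describeReactionTimes and the short note labels are hypothetical; averages that fall between the listed bands (2000–3000 ms and 7000–10000 ms) produce no speed note because the text does not define one.

```javascript
// Sketch of the reaction-time rules. Returns the derived metrics plus
// short labels standing in for the report sentences.
function describeReactionTimes(reactionTimes) {
  if (reactionTimes.length === 0) return null;
  const times = reactionTimes.map((r) => r.time_ms);
  const avg = times.reduce((a, v) => a + v, 0) / times.length;
  // The document calls this "variance", but it is the max-min range.
  const spread = Math.max(...times) - Math.min(...times);

  const notes = [];
  if (spread < 1000) notes.push('consistent');
  if (avg < 2000) notes.push('fast');
  else if (avg >= 3000 && avg <= 7000) notes.push('active-thinking');
  else if (avg > 10000) notes.push('long-deliberation');

  return { avgReactionTime: avg, reactionTimeVariance: spread, notes };
}
```

Note that a session with a single answer has a spread of 0 and is therefore labeled consistent, so a minimum sample count may be worth adding.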
📏 Physical Behavior (Physical Behavior Summary)
📏 Distance + Head + Blink Combined Interpretation Derived
What is shown in report
Summary of physical metrics: distance (range, drift), head movement (average pitch/yaw), blink rate vs norm. Combined interpretation.
Source data and calculations
Distance range: min(distance.cm) to max(distance.cm)
Distance drift: avg(last 10) - avg(first 10)
Average pitch: avg(|headPose.pitch|) in degrees
Average yaw: avg(|headPose.yaw|) in degrees
Final blink rate: blinkRate[last].rate
Blink norm: 15–20/min (Bentivoglio et al., 1997)
Interpretation logic in report
1. Distance: drift > 0 → "Slight trend of moving away, a natural sign of gradual relaxation"
drift < -5 → "Approaching screen — possible eye fatigue or trying to see better"
|drift| < 2 → "Stable position throughout"

2. Head movement: avg_pitch < 3 AND avg_yaw < 5 → "Minimal head movement, high visual fixation"
avg_pitch > 5 OR avg_yaw > 8 → "Increased movement, possibly discomfort or distraction"

3. Blink rate: rate < 10 → "Below norm — combined with stable head indicates high visual fixation"
rate < 10 AND avg_pitch < 3 → "Combination of low blink rate and stable head = strong concentration signal"
Note: "Low blink rate may partly be a detection artifact"
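The distance drift used in rule 1 can be computed as sketched below. The function name distanceDrift is hypothetical; the cm field follows the distance.cm notation used in this section, and the window of 10 samples follows "avg(last 10) - avg(first 10)".

```javascript
// Sketch of the distance drift: mean of the last `window` samples minus
// the mean of the first `window` samples. Positive = moving away.
function distanceDrift(distance, window = 10) {
  if (distance.length === 0) return null;
  const mean = (xs) => xs.reduce((a, d) => a + d.cm, 0) / xs.length;
  const first = distance.slice(0, window);
  const last = distance.slice(-window);
  return mean(last) - mean(first);
}
```

For sessions shorter than the window, the first and last slices overlap completely and the drift is 0, so very short sessions always read as stable.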
📊 Overall Profile & Content Recommendations
📊 Participant Type + Content Effectiveness Score Derived
What is shown in report
Three paragraphs: (1) Participant type, (2) Highlight — most vs least engaging segment, (3) Content recommendations.
Participant type — decision tree
function classifyParticipant(data) {
  const gazeAvg = avg(data.gazeAttention, 'score');
  const fatAvg = avg(data.fatigue, 'score');
  const headMov = avgHeadMovement(data.headPose);
  const gestures = data.gestures.filter(g => g.gesture !== 'None').length;
  const voices = data.voiceComments.length;
  const drops = detectDropsAndSpikes(data.gazeAttention);

  if (gazeAvg > 60 && fatAvg < 60 && headMov < 1.0)
    return 'Focused, calm, analytical observer';
  if (gazeAvg > 60 && (gestures > 0 || voices > 0))
    return 'Active, engaged participant';
  if (fatAvg > 60)
    return 'Fatigued, with reduced performance';
  if (drops.drops.length > 5 && drops.recoveryRatio < 0.5)
    return 'Distracted, with frequent attention lapses';
  if (gazeAvg < 50 && gestures === 0 && voices === 0)
    return 'Passive observer';
  return 'Standard participant';
}
Highlights — extreme identification
1. Calculate per-segment: gazeAvg, emotionVariety (count of unique emotions != Focused), gestureCount, fatigueAvg
2. Segment with max(gazeAvg) AND max(emotionVariety) AND max(gestureCount) → "most interesting"
3. Segment with min(gazeAvg) AND max(fatigueAvg) → "least engaging"
4. Text template: "[Max segment name] triggered the strongest reaction — highest attention, [specifics]. [Min segment name] was conversely the least engaging."
Content recommendations — content effectiveness
segmentScore = gazeAvg × (1 - fatigueAvg / 100)

Logic:
• Segment with min(segmentScore) → "Consider shortening or enlivening [segment]"
• Segment with max(segmentScore) → "[Segment] serves as the main scenario hook"
• If max(segmentScore) / min(segmentScore) > 1.5 → "Significant difference in effectiveness between segments"
• If max(segmentScore) / min(segmentScore) < 1.2 → "Segments are balanced"
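The scoring and ratio checks above can be sketched like this. The function name contentEffectiveness and the balance labels are hypothetical; in particular, the label 'moderate' for ratios between 1.2 and 1.5 is an assumption, since the text defines no sentence for that band.

```javascript
// Sketch of the content effectiveness logic. Each segment is assumed to
// carry precomputed gazeAvg and fatigueAvg (both 0–100).
function contentEffectiveness(segments) {
  if (segments.length === 0) return null;
  const scored = segments.map((s) => ({
    ...s,
    segmentScore: s.gazeAvg * (1 - s.fatigueAvg / 100),
  }));
  const best = scored.reduce((a, b) => (b.segmentScore > a.segmentScore ? b : a));
  const worst = scored.reduce((a, b) => (b.segmentScore < a.segmentScore ? b : a));
  // Avoid dividing by zero when the weakest segment scores 0.
  const ratio = worst.segmentScore > 0 ? best.segmentScore / worst.segmentScore : Infinity;
  const balance =
    ratio > 1.5 ? 'significant-difference' : ratio < 1.2 ? 'balanced' : 'moderate';
  return { best, worst, ratio, balance };
}
```

For example, a segment with gaze 80 and fatigue 20 scores 64, one with gaze 60 and fatigue 50 scores 30, and the ratio of about 2.1 falls in the significant-difference band.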
Template table for text output
Type: "Type: [classifyParticipant result]. [Description]"
Highlight: "[max segment] = most interesting. [min segment] = least engaging."
Recommendation: "Consider [action] for [min segment]. [max segment] serves as hook."
Note
All text outputs in the report are generated from these templates and numerical thresholds. This is not free text — every sentence has a clear data basis. If data does not reach the threshold, the corresponding sentence is not shown in the report.
💪 Facial Muscle Analysis (FACS — Facial Action Coding System)
🔬 FACS Blendshape Aggregation Derived
Purpose
Extract patterns from 52 ARKit blendshapes matching known facial expressions per FACS (Ekman & Friesen, 1978). ARKit blendshapes map to FACS Action Units (AU).
Scientific basis
Ekman & Friesen (1978): Facial Action Coding System (FACS). Standard system for describing facial muscle movements.

ARKit → FACS mapping (Ozel, 2022, facethefacs.com):
browDownLeft/Right → AU4 (Brow Lowerer) — corrugator supercilii
eyeSquintLeft/Right → AU7 (Lid Tightener) — orbicularis oculi, palpebral
cheekSquintLeft/Right → AU6 (Cheek Raiser) — orbicularis oculi, orbital
mouthSmileLeft/Right → AU12 (Lip Corner Puller) — zygomaticus major
mouthFrownLeft/Right → AU15 (Lip Corner Depressor) — depressor anguli oris
mouthPressLeft/Right → AU24 (Lip Pressor) — orbicularis oris
jawOpen → AU26/27 (Jaw Drop) — masseter; internal pterygoid
Key patterns
Concentration: AU4 (browDown > 0.3) + AU7 (eyeSquint > 0.2)
Happiness: AU6 (cheekSquint > 0.2) + AU12 (mouthSmile > 0.3)
Deliberation: AU24 (mouthPress > 0.1) + AU4 (browDown > 0.3)
Surprise: AU5 (eyeWide > 0.3) + AU26 (jawOpen > 0.2)
Displeasure: AU15 (mouthFrown > 0.2) + AU4 (browDown > 0.4)
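A sketch of how the key patterns could be matched against one set of blendshape values. The names avgLR, FACS_PATTERNS, and matchFacsPatterns are hypothetical, and averaging the left and right values before comparing against the threshold is an assumption, since the text does not say how the two sides are combined.

```javascript
// Assumption: left/right blendshapes are averaged before thresholding.
const avgLR = (v, name) =>
  ((v[name + 'Left'] ?? 0) + (v[name + 'Right'] ?? 0)) / 2;

// Thresholds copied from the key patterns list.
const FACS_PATTERNS = {
  Concentration: (v) => avgLR(v, 'browDown') > 0.3 && avgLR(v, 'eyeSquint') > 0.2,
  Happiness: (v) => avgLR(v, 'cheekSquint') > 0.2 && avgLR(v, 'mouthSmile') > 0.3,
  Deliberation: (v) => avgLR(v, 'mouthPress') > 0.1 && avgLR(v, 'browDown') > 0.3,
  Surprise: (v) => avgLR(v, 'eyeWide') > 0.3 && (v.jawOpen ?? 0) > 0.2,
  Displeasure: (v) => avgLR(v, 'mouthFrown') > 0.2 && avgLR(v, 'browDown') > 0.4,
};

// Returns the names of all patterns matched by one blendshape record.
function matchFacsPatterns(values) {
  return Object.keys(FACS_PATTERNS).filter((name) => FACS_PATTERNS[name](values));
}
```

Several patterns share AU4, so one record can match more than one pattern, for example Concentration and Deliberation together when the lips are also pressed.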
Calculation per segment
function facsAggregation(blendshapeLog, startTs, endTs) {
  const seg = blendshapeLog.filter(b => b.timestamp >= startTs && b.timestamp <= endTs);
  if (seg.length === 0) return null;
  const keys = ['browDownLeft','browDownRight','eyeSquintLeft','eyeSquintRight',
                'mouthSmileLeft','mouthSmileRight','mouthPressLeft','mouthPressRight',
                'cheekSquintLeft','cheekSquintRight','jawOpen','eyeWideLeft','eyeWideRight'];
  const result = {};
  keys.forEach(k => {
    const vals = seg.map(s => s.values[k] || 0);
    result[k] = {
      avg: vals.reduce((a,v) => a+v, 0) / vals.length,
      max: Math.max(...vals)
    };
  });
  return result;
}
Notes
Blendshape values are 0.0–1.0. Averaging across a segment gives "basal muscle tone". Maximum shows peak intensity. For identifying brief smiles, it suffices that max(mouthSmile) > 0.5 even if avg < 0.05.
📈 Attention Variability (Gaze Attention Variability)
📊 Gaze Std per Segment Derived
Purpose
Average attention shows "how much", but standard deviation shows "how dynamically" — higher std = participant actively shifts attention = deeper cognitive processing. Low std = monotonous watching.
Formula
std = sqrt( Σ(xi - mean)² / n )
Where xi = individual gaze scores in segment, mean = segment average, n = sample count.
Interpretation
std > 8: Active processing, dynamic attention shifts
std 5–8: Normal watching
std < 5: Fixed gaze, either high concentration or monotony
Implementation
function gazeStd(gazeAttention, startTs, endTs) {
  const scores = gazeAttention
    .filter(g => g.timestamp >= startTs && g.timestamp <= endTs)
    .map(g => g.score);
  if (scores.length < 2) return 0;
  const mean = scores.reduce((a,v) => a+v, 0) / scores.length;
  const variance = scores.reduce((a,v) => a + (v - mean) ** 2, 0) / scores.length;
  return Math.sqrt(variance);
}
Note
Distinguish low std from concentration (= high avg + low std) vs. low std from disinterest (= low avg + low std). Always report std together with the average.
🔄 Head Movement Analysis (Head Movement Events)
🔄 Head Movement per Segment + Spike Detection Derived
Metric 1: Average movement per segment
avg_movement = Σ(|pitch[i]-pitch[i-1]| + |yaw[i]-yaw[i-1]|) / (2 × (n-1))
Measures frame-to-frame "jitter" — how much the head moves on average each frame.
Metric 2: Spike detection
spike = |value[i] - value[i-1]| > threshold
Threshold for yaw: 10°, threshold for pitch: 5°. Spikes indicate sudden head turns: reaction to external stimulus, position change, or surprise.
Interpretation
avg_movement < 0.3°: Very stable, fixed gaze on content
avg_movement 0.3–1.0°: Normal, slight movements during processing
avg_movement > 1.0°: Increased activity, physical reaction to content or distraction
Yaw spike > 10°: Sudden turn, external stimulus or position change
Implementation
function headMovementPerSegment(headPose, startTs, endTs) {
  const seg = headPose.filter(h => h.timestamp >= startTs && h.timestamp <= endTs);
  if (seg.length < 2) return { avgMovement: 0, spikes: [] };
  let totalDiff = 0, spikes = [];
  for (let i = 1; i < seg.length; i++) {
    const dP = Math.abs(seg[i].pitch - seg[i-1].pitch);
    const dY = Math.abs(seg[i].yaw - seg[i-1].yaw);
    totalDiff += dP + dY;
    if (dY > 10 || dP > 5) spikes.push({ ts: seg[i].timestamp, dP, dY });
  }
  return { avgMovement: totalDiff / (2 * (seg.length - 1)), spikes };
}
📊 Correlation: fatigue ↔ attention
📉 Pearson Correlation: Fatigue vs Gaze Derived
Purpose
Verify whether fatigue and attention measure a consistent phenomenon. Expected negative correlation (higher fatigue → lower attention) validates both metrics.
Formula (Pearson r)
r = Σ((xi - x̄)(yi - ȳ)) / sqrt(Σ(xi - x̄)² × Σ(yi - ȳ)²)
Where xi = gaze score, yi = fatigue score, paired by nearest timestamp (tolerance 2s).
Interpretation
r = -0.7 to -1.0: Strong negative (fatigue dominantly affects attention)
r = -0.3 to -0.7: Medium negative (fatigue is one of the factors)
r = 0 to -0.3: Weak (attention is driven primarily by content, not fatigue)
r > 0: Unexpected (possible data error or atypical participant)
Implementation
function pearsonCorrelation(gazeAttention, fatigue) {
  // reduce() without an initial value throws on an empty array
  if (fatigue.length === 0) return null;
  const pairs = [];
  gazeAttention.forEach(g => {
    // Pair each gaze sample with the fatigue sample nearest in time
    const closest = fatigue.reduce((best, f) =>
      Math.abs(f.timestamp - g.timestamp) < Math.abs(best.timestamp - g.timestamp) ? f : best);
    // Keep the pair only within the 2 s tolerance (timestamps in ms)
    if (Math.abs(closest.timestamp - g.timestamp) < 2000) pairs.push([g.score, closest.score]);
  });
  const n = pairs.length;
  if (n < 3) return null;
  const mG = pairs.reduce((a,[g]) => a+g, 0)/n;
  const mF = pairs.reduce((a,[,f]) => a+f, 0)/n;
  const cov = pairs.reduce((a,[g,f]) => a + (g-mG)*(f-mF), 0)/n;
  const sG = Math.sqrt(pairs.reduce((a,[g]) => a + (g-mG)**2, 0)/n);
  const sF = Math.sqrt(pairs.reduce((a,[,f]) => a + (f-mF)**2, 0)/n);
  return (sG > 0 && sF > 0) ? cov / (sG * sF) : 0;
}
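The interpretation bands above can be turned into a small lookup for the report. This is a sketch: the name interpretCorrelation is hypothetical, and the way boundary values (exactly -0.7, -0.3, 0) are assigned is an assumption, since the table does not say which band owns them.

```javascript
// Hypothetical helper mapping Pearson r to the documented bands.
// Assumption: each boundary value belongs to the stronger band.
function interpretCorrelation(r) {
  if (r === null || Number.isNaN(r)) return 'insufficient data';
  if (r <= -0.7) return 'strong negative';
  if (r <= -0.3) return 'medium negative';
  if (r <= 0) return 'weak';
  return 'unexpected';
}
```

A null result corresponds to pearsonCorrelation() finding fewer than 3 usable pairs.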
⚡ Distraction and Recovery Detection (Attention Drops & Spikes)
Gaze Attention Drops & Recovery Derived
Purpose
Identify precise moments where the participant lost attention (drop) and where they returned (spike). Enables correlation with external events (segment transition, sound, movement).
Formula
drop = score[i-1] - score[i] > threshold
spike = score[i] - score[i-1] > threshold
Default threshold: 15 points. Adjustable per scenario.
Derived metrics
drop_count: Total drops > threshold
spike_count: Total spikes > threshold
max_drop: Largest single drop
recovery_ratio: spike_count / drop_count (closer to 1.0 = good self-regulation)
avg_recovery_time: Average time (s) between a drop and the subsequent spike
Implementation
function detectDropsAndSpikes(gazeAttention, threshold = 15) {
  const drops = [], spikes = [];
  for (let i = 1; i < gazeAttention.length; i++) {
    // Change in score between consecutive samples
    const diff = gazeAttention[i].score - gazeAttention[i-1].score;
    if (diff < -threshold) drops.push({ ts: gazeAttention[i].timestamp, from: gazeAttention[i-1].score, to: gazeAttention[i].score });
    if (diff > threshold) spikes.push({ ts: gazeAttention[i].timestamp, from: gazeAttention[i-1].score, to: gazeAttention[i].score });
  }
  // With no drops there is nothing to recover from, so the ratio defaults to 1
  const recoveryRatio = drops.length > 0 ? spikes.length / drops.length : 1;
  return { drops, spikes, recoveryRatio };
}
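The implementation returns drops, spikes and recoveryRatio; max_drop and avg_recovery_time from the derived-metrics list can be computed from the same output. The sketch below is one possible reading, under these assumptions: the name recoveryMetrics is hypothetical, timestamps are in milliseconds, both arrays are chronological, and each drop is paired with the first spike that follows it (so several drops may share one spike).

```javascript
// Hypothetical companion to detectDropsAndSpikes().
// drops / spikes: arrays of { ts, from, to } in chronological order.
function recoveryMetrics(drops, spikes) {
  // Largest single fall in score; 0 when there were no drops
  const maxDrop = drops.reduce((m, d) => Math.max(m, d.from - d.to), 0);

  // Seconds from each drop to the first later spike, if any
  const times = [];
  for (const d of drops) {
    const next = spikes.find(s => s.ts > d.ts);
    if (next) times.push((next.ts - d.ts) / 1000);
  }
  const avgRecoveryTime = times.length
    ? times.reduce((a, t) => a + t, 0) / times.length
    : null; // no drop was followed by a spike

  return {
    dropCount: drops.length,
    spikeCount: spikes.length,
    maxDrop,
    avgRecoveryTime,
  };
}
```

Usage: const { drops, spikes } = detectDropsAndSpikes(gazeAttention); recoveryMetrics(drops, spikes).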
📐 Ergonomics and Distance (Distance Ergonomics)
📐 Distance Drift & Ergonomic Zone Derived
Purpose
Evaluate whether the participant sits in the ergonomic zone and how their distance changes over time (drift = relaxation vs. leaning forward = trying to see better).
Scientific basis
CCOHS (Canadian Centre for Occupational Health and Safety) — Resting Point of Accommodation (RPA) ≈ 80 cm. At this distance, eye muscles require no effort to focus.

OSHA eTools Computer Workstations — recommended range 50–100 cm (20–40 inches) from eyes to monitor.

Ergonomic drift interpretation: gradual distancing = relaxation, approaching = trying to see better (possible eye fatigue or small text).
Metrics
avg_distance: avg(distance.cm)
in_ergonomic_zone: 50 <= avg <= 100 → boolean
drift: avg(last 10 samples) - avg(first 10 samples)
drift_direction: drift > 0 = distancing (relaxation), drift < 0 = approaching
near_rpa: |avg - 80| < 10 → in resting focus zone
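The metrics above can be sketched as a single function. This is an illustration, not the project's code: the name distanceErgonomics and the input shape (an array of objects with a cm field) are assumptions based on the distance.cm notation.

```javascript
// Hypothetical sketch of the distance metrics.
// distance: array of { cm } samples in chronological order.
function distanceErgonomics(distance, window = 10) {
  if (distance.length === 0) return null;
  const mean = xs => xs.reduce((a, v) => a + v, 0) / xs.length;
  const cm = distance.map(d => d.cm);

  const avgDistance = mean(cm);
  // Last `window` samples minus first `window` samples.
  // With fewer than 2 * window samples the two windows overlap.
  const drift = mean(cm.slice(-window)) - mean(cm.slice(0, window));

  return {
    avgDistance,
    inErgonomicZone: avgDistance >= 50 && avgDistance <= 100, // OSHA range
    drift,
    driftDirection: drift > 0 ? 'distancing' : drift < 0 ? 'approaching' : 'stable',
    nearRpa: Math.abs(avgDistance - 80) < 10, // within 10 cm of the RPA
  };
}
```

For short sessions the overlapping windows pull drift toward 0, so it may be worth reporting the sample count alongside it.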
🎤 Voice Activity and Microphone Analysis
🎤 Voice Activity & Ambient Analysis Derived
Purpose
Distinguish active speech from ambient noise. Identify participant voice profile.
Metrics from micVolume[]
speaking_ratio: count(isSpeaking=true) / total
ambient_volume: avg(rms) where isSpeaking=false
speech_volume: avg(rms) where isSpeaking=true
snr (signal-to-noise): speech_volume / ambient_volume
dominant_freq_speech: avg(dominantFreq) where isSpeaking=true
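A minimal sketch of these metrics follows, assuming each micVolume[] entry carries the rms, isSpeaking and dominantFreq fields named above. The function name voiceActivityMetrics and the choice to return null for undefined values (no speech samples, zero ambient volume) are assumptions.

```javascript
// Hypothetical sketch: aggregates micVolume[] into the listed metrics.
// micVolume: array of { rms, isSpeaking, dominantFreq }.
function voiceActivityMetrics(micVolume) {
  // Mean of a list, or null when the list is empty
  const mean = xs => (xs.length ? xs.reduce((a, v) => a + v, 0) / xs.length : null);

  const speech = micVolume.filter(m => m.isSpeaking);
  const ambient = micVolume.filter(m => !m.isSpeaking);
  const speechVolume = mean(speech.map(m => m.rms));
  const ambientVolume = mean(ambient.map(m => m.rms));

  return {
    speakingRatio: micVolume.length ? speech.length / micVolume.length : null,
    ambientVolume,
    speechVolume,
    // Ratio of average volumes; null when either side is missing or ambient is 0
    snr: speechVolume !== null && ambientVolume > 0
      ? speechVolume / ambientVolume
      : null,
    dominantFreqSpeech: mean(speech.map(m => m.dominantFreq)),
  };
}
```

Because these values depend on isSpeaking, they inherit any miscalibration of the speaking detector.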
Frequency interpretation
85–180 Hz: Typical male voice (fundamental)
165–255 Hz: Typical female voice (fundamental)
500–1000 Hz: Harmonics (a harmonic component of speech was detected)
0 Hz: No dominant frequency (silence or noise)
Calibration note
In the demo session, the speaking detector was inconsistent: it detected 2 out of 3 actual voice inputs. Recommendation: compare voiceComments[] (speech-to-text results) with micVolume[].isSpeaking and calibrate the speaking detector threshold.
👤 Participant Profile Generation
👤 Behavioral Profile Derived
Purpose
An automatically generated text profile of the participant, based on a combination of all metrics. Intended for trainers and training managers.
Decision logic
Type: Focused, calm: gazeAvg > 60 AND fatigueAvg < 60 AND headMovement < 1.0°
Type: Active, engaged: gazeAvg > 60 AND gestureCount > 0 AND voiceComments > 0
Type: Fatigued, declining: fatigueAvg > 60 AND gazeTrend < 0 (declining trend)
Type: Distracted: dropCount > 5 AND recoveryRatio < 0.5
Type: Passive: gazeAvg < 50 AND gestureCount = 0 AND voiceComments = 0
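The rules above are not mutually exclusive (a participant can match both "Focused, calm" and "Active, engaged"), and the documentation does not state a priority order. The sketch below therefore returns every matching type. The function name behavioralProfiles and the shape of the input object are assumptions; voiceComments is taken to be a count.

```javascript
// Hypothetical sketch of the decision logic.
// m: { gazeAvg, fatigueAvg, headMovement, gestureCount, voiceComments,
//      gazeTrend, dropCount, recoveryRatio }
function behavioralProfiles(m) {
  const rules = [
    ['Focused, calm',
      m.gazeAvg > 60 && m.fatigueAvg < 60 && m.headMovement < 1.0],
    ['Active, engaged',
      m.gazeAvg > 60 && m.gestureCount > 0 && m.voiceComments > 0],
    ['Fatigued, declining',
      m.fatigueAvg > 60 && m.gazeTrend < 0],
    ['Distracted',
      m.dropCount > 5 && m.recoveryRatio < 0.5],
    ['Passive',
      m.gazeAvg < 50 && m.gestureCount === 0 && m.voiceComments === 0],
  ];
  // Keep the names of all rules whose condition holds
  return rules.filter(([, matches]) => matches).map(([name]) => name);
}
```

An empty array means no rule matched; how the report words that case is left open here.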
Content recommendations (per segment)
For each video segment, calculate segmentScore = gazeAvg × (1 - fatigueAvg/100). Segments with lowest score → candidates for shortening or reworking. Segments with highest score → confirm as effective.
Segment score example
Pillar 1 (Biometrics): 76.2 × (1 - 39.8/100) = 45.9 ← also high
Introduction: 74.2 × (1 - 34.9/100) = 48.3 ← highest
Pillar 2 (Hardware): 60.9 × (1 - 55.0/100) = 27.4 ← lowest
Pillar 3 (Adaptivity): 68.3 × (1 - 48.6/100) = 35.1 ← medium
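The segment score can be computed and ranked with a few lines of code. The sketch below uses the gaze and fatigue averages from the example above; the function names, the short segment labels and the rounding to one decimal place are assumptions made for illustration.

```javascript
// segmentScore = gazeAvg × (1 - fatigueAvg / 100), rounded to 1 decimal
function segmentScore(gazeAvg, fatigueAvg) {
  return Math.round(gazeAvg * (1 - fatigueAvg / 100) * 10) / 10;
}

// Hypothetical helper: returns segments sorted from lowest to highest score,
// so candidates for shortening or reworking come first.
function rankSegments(segments) {
  return segments
    .map(s => ({ name: s.name, score: segmentScore(s.gazeAvg, s.fatigueAvg) }))
    .sort((a, b) => a.score - b.score);
}

const ranked = rankSegments([
  { name: 'Introduction', gazeAvg: 74.2, fatigueAvg: 34.9 },
  { name: 'Pillar 1', gazeAvg: 76.2, fatigueAvg: 39.8 },
  { name: 'Pillar 2', gazeAvg: 60.9, fatigueAvg: 55.0 },
  { name: 'Pillar 3', gazeAvg: 68.3, fatigueAvg: 48.6 },
]);
console.log(ranked);
```

The first entry of the ranked list is the weakest segment and the last entry the most effective one.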
— End of Documentation —
BEAM X1 v3 · Chapter 1: Collection Metrics · Chapter 2: Contextual Analysis · April 2026