BEAM X1 — Metrics Documentation

Foundation

Data Collection Architecture

How data flows from the camera through calculations to the report.

Pipeline

Webcam 1280x720 → MediaPipe Face Landmarker → 478 landmarks + 52 blendshapes → detect*() functions → cur object (every frame) → collectSnapshot() 1x/s → KV Store (Cloudflare) → Report

Files

camera.js: MediaPipe init, detection, calculations, data collection
sensors.js: Microphone, TTS, volume, device info
engine.js: Session control, result assembly
report.js: Visualization, charts, cards, export
worker.js: Cloudflare Worker (API, KV storage)

Technology

Face detection: Google MediaPipe Face Landmarker (GPU)
Hand detection: Google MediaPipe Hand Landmarker (GPU)
Voice comments: Web Speech API (SpeechRecognition)
System voice: Web Speech API (SpeechSynthesis)
Runs where: 100% locally in browser, no data leaves the device without consent

Frequency

How often things are measured and stored

Detection (value calculation)

requestAnimationFrame — runs as fast as the browser can handle. Typically 30–60 FPS (depends on GPU/CPU). Each frame updates all metrics in the object cur.

Storage

collectSnapshot(): every 1000 ms (1x/s), complete snapshot of all metrics
collectBlendshapes(): every 2000 ms (1x per 2s), raw 52 blendshapes
checkMultipleFaces(): every 5000 ms (1x per 5s), second MP instance, up to 3 faces
facePhotos: event-triggered, at session_start, each question, each reveal, session_end

Data points per second

Each snapshot contains 82 data points: 52 blendshapes + 30 derived metrics (emotions, attention, gaze, fatigue, headpose, distance, blink, smile, iris, gestures, symmetry, ...). Configuration: CONFIG.SNAPSHOT_INTERVAL = 1000

Primary metric

🎯 Attention — Gaze (Gaze Attention)

🎯 Gaze Attention Reliable

What it measures

Where the user is looking. 100% = looking straight ahead (at screen), 0% = looking markedly sideways or up. Measures gaze direction deviation from straight ahead.

Data source

Blendshapes (muscle values) from MediaPipe: eyeLookOutLeft, eyeLookOutRight, eyeLookUpLeft, eyeLookUpRight, eyeLookDownLeft, eyeLookDownRight. These values come directly from the ML model — no manual geometry.

Function

detectGazeAttention(blendshapes) in camera.js

Algorithm

// 1. Side gaze — looking sideways
sideAvg = (eyeLookOutLeft + eyeLookOutRight) / 2
sidePenalty = min(1.0, sideAvg × 3)        // 0.33+ = full penalty

// 2. Upward gaze — looking up/away
upAvg = (eyeLookUpLeft + eyeLookUpRight) / 2
upPenalty = min(1.0, upAvg × 4)             // 0.25+ = full penalty

// 3. Extreme down: looking at lap/phone (only >0.7)
downAvg = (eyeLookDownLeft + eyeLookDownRight) / 2
downPenalty = max(0, (downAvg - 0.7)) × 3   // penalty only above 0.7

// Total penalty (weighted sum of the three penalties above)
totalPenalty = min(1.0, sidePenalty×0.4 + upPenalty×0.3 + downPenalty×0.3)
gazeAttention = round((1 - totalPenalty) × 100)
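The steps above can be sketched as a single function. This is a minimal sketch, not the camera.js source: it assumes the blendshapes have already been converted to a plain object keyed by name (MediaPipe itself returns a list of categories), and the name gazeAttentionSketch is hypothetical.

```javascript
// Sketch of the gaze-attention penalty model.
// `bs` is assumed to map blendshape names to scores in 0..1 (hypothetical shape).
function gazeAttentionSketch(bs) {
  const get = (name) => bs[name] ?? 0; // missing blendshapes count as 0

  const sideAvg = (get('eyeLookOutLeft') + get('eyeLookOutRight')) / 2;
  const upAvg = (get('eyeLookUpLeft') + get('eyeLookUpRight')) / 2;
  const downAvg = (get('eyeLookDownLeft') + get('eyeLookDownRight')) / 2;

  const sidePenalty = Math.min(1, sideAvg * 3);       // full penalty from 0.33
  const upPenalty = Math.min(1, upAvg * 4);           // full penalty from 0.25
  const downPenalty = Math.max(0, downAvg - 0.7) * 3; // only above 0.7

  const totalPenalty = Math.min(
    1,
    sidePenalty * 0.4 + upPenalty * 0.3 + downPenalty * 0.3
  );
  return Math.round((1 - totalPenalty) * 100);
}
```

With these weights, a pure sideways glance can lower the score to 60 at most, and a moderate downward gaze (for example 0.5) carries no penalty at all.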

Frequency

Calculation: every frame (30–60 FPS). Storage: 1x/s into collected.gazeAttention[] and into each snapshot.

Range

0–100%. Slight downward gaze (reading, lower screen) = still high value. Penalty only for extreme downward gaze (>0.7).

In report

Summary card: "Attention (gaze)", average value
Bio summary: Row with average and min–max range
Chart: Solid green curve in "Attention & Fatigue" chart
Question/Video timeline: 🎯 gaze X% for each question/step
Photo thumbnails: 🎯X% below each photo
Charts summary table: Row with avg/min/max

Why blendshapes? MediaPipe ML model extracts muscle values directly from the image. Unlike iris tracking (geometric calculation from landmarks), it does not depend on precise coordinates and works reliably even with a standard webcam.

Experimental

👁 Iris Tracking

👁 Iris Tracking Experimental

What it measures

Iris position within eye apertures. Theoretically more precise than blendshapes, but requires a quality camera or dedicated eye-tracking hardware.

Data source

MediaPipe landmarks: irises lm[473] (left) and lm[468] (right), eye corners lm[33], lm[133], lm[263], lm[362].

Function

detectAttention(landmarks) in camera.js

Algorithm

// Iris position ratio relative to eye width
leW = abs(eyeOuter.x - eyeInner.x)   // left eye width
// reW: right eye width, computed the same way from the right-eye corners
lPos = (irisLeft.x - eyeOuter.x) / leW  // expected 0.0–1.0
rPos = (irisRight.x - eyeOuter.x) / reW

dev = abs(((lPos + rPos) / 2) - 0.5) × 2  // deviation from center
attention = round(max(0, min(100, (1 - dev) × 100)))

Known issue: With a standard webcam, lPos/rPos values come out as 2.5–3.7 (instead of 0.0–1.0). Reason: iris landmark coordinates (473/468) are in a different range than eye corner landmarks, likely due to camera mirror mode or coordinate normalization. Result: 54–94% of measurements = 0% attention. Therefore the metric is marked as experimental.

In report

Summary card: "Iris tracking (exp.)", reduced opacity 0.6
Chart: Dashed green curve with opacity 0.3
Timeline: 👁 iris (exp.) X% after gaze value

Stored raw data

irisLeft and irisRight in each snapshot, kept for future analysis with better hardware.

😊 Emotions (Emotion Detection)

😊 Emotion Detection Functional / bias

What it measures

Dominant emotion from facial expression. 5 categories: Happy, Surprised, Focused, Relaxed, Neutral. Each with confidence score 0–100%.

Function

detectEmotion(blendshapes) in camera.js

Algorithm (waterfall if/else)

smile = (mouthSmileLeft + mouthSmileRight)
browUp = (browOuterUpLeft + browOuterUpRight)
browDown = (browDownLeft + browDownRight)
jawOpen = jawOpen

if smile > 0.4                        → Happy     (conf: min(100, smile × 100))
else if browUp > 0.3 && jawOpen > 0.2 → Surprised (conf: min(100, (browUp+jawOpen) × 50))
else if browDown > 0.2                → Focused   (conf: min(100, browDown × 150))
else if activity < 0.15               → Relaxed   (conf: (1-activity) × 80)
otherwise                             → Neutral   (conf: 60, fixed)

Known bias: browDown > 0.2 captures "Focused" in 53–73% of users because some people have naturally low eyebrows. Priority ordering means Focused "wins" over Relaxed/Neutral. For more precise detection, baseline face calibration at session start would be needed.

Frequency

Every frame → stored 1x/s into collected.emotions[] as { timestamp, emotion, confidence }.

In report

Summary card: "Emotions", count of different detected emotions
Emotion timeline chart: Color bars per second (Happy=green, Surprised=yellow, Focused=blue, Relaxed=purple, Neutral=gray)
Red curve in emotion chart: Smile intensity
Question/Video timeline: 😊 Emotion + confidence X% for each question
Photo thumbnails: Emotion below each photo

😴 Fatigue

😴 Fatigue Score Functional / cascade

What it measures

User fatigue level, 0–100%. Point system from 5 independent factors.

Function

detectFatigue(blendshapes) in camera.js

Algorithm

// 5 factors, each adds points, max 100
if blinkRate > 25   → +30 points  // high blink rate
if blinkRate > 20   → +15 points  // elevated blink rate
if eyeSquint > 0.3  → +25 points  // eye squinting
if browDown > 0.2   → +15 points  // lowered eyebrows
if mouthOpen        → +15 points  // open mouth (yawning), jawOpen > 0.15
if attention < 50   → +15 points  // attention drop (iris-based)
fatigue = min(100, sum)

Cascade effect: The attention < 50 factor depends on iris tracking, which is experimental. With a standard webcam, iris attention is often 0%, so this factor adds +15 in 87% of measurements. Once Gaze Attention is deployed, the factor can be switched to use it instead.

Non-adaptive thresholds: eyeSquint > 0.3 and browDown > 0.2 are fixed. Some people naturally have squinted eyes or low eyebrows, which inflates the fatigue score.

In report

Summary card: "Fatigue", average
Bio summary: Row with average and range
Chart: Red curve in "Attention & Fatigue" chart
Charts summary table: Row with avg/min/max

👀 Blink Rate

👀 Blink Rate Non-adaptive threshold

What it measures

Blinks per minute. Normal range: 15–20/min. Higher values may indicate fatigue.

Function

detectBlink(blendshapes) in camera.js

Algorithm

avg = (eyeBlinkLeft + eyeBlinkRight) / 2
eyeOpen = avg < 0.5          // threshold: < 0.5 = open, >= 0.5 = closed

// Blink detection: transition open → closed → open within 15 frames
if transition from open to closed → start counting frames
if back to open && frames < 15 → record blink

blinkHistory = blinks in the last 60 seconds (cumulative)
blinkRate = blinkHistory.length  // count per minute
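The open → closed → open transition above can be sketched as a small state machine. This is an illustrative sketch under stated assumptions, not the detectBlink() source: the factory name createBlinkCounter, its option names, and the explicit timestamp argument are hypothetical.

```javascript
// Returns an update function that is called once per frame.
// threshold, maxFrames and windowMs mirror the values described in the text.
function createBlinkCounter({ threshold = 0.5, maxFrames = 15, windowMs = 60000 } = {}) {
  let closedFrames = 0;  // consecutive frames with eyes closed
  let wasOpen = true;    // eye state on the previous frame
  const blinkTimes = []; // timestamps (ms) of recorded blinks

  return function update(eyeBlinkLeft, eyeBlinkRight, nowMs) {
    const isOpen = (eyeBlinkLeft + eyeBlinkRight) / 2 < threshold;
    if (!isOpen) {
      closedFrames += 1;
    } else {
      // Eyes reopened: count a blink only if the closure was short enough.
      if (!wasOpen && closedFrames < maxFrames) blinkTimes.push(nowMs);
      closedFrames = 0;
    }
    wasOpen = isOpen;

    // Drop blinks older than the 60-second window.
    while (blinkTimes.length > 0 && nowMs - blinkTimes[0] > windowMs) {
      blinkTimes.shift();
    }
    return blinkTimes.length; // blinks in the last minute
  };
}
```

Making the threshold an option rather than a constant is a design choice of this sketch; it leaves room for a per-user baseline, which the fixed value 0.5 does not provide.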

Known issue: Fixed threshold 0.5 is not adaptive. Some people have naturally higher baseline eyeBlink (heavier eyelids) and their blinks never reach 0.5. Example: user had max eyeBlink 0.489 — entire session 0 blinks.

In report

Bio summary: "Blink rate", last recorded value (rate/min)
Chart: Yellow curve in "Blink Rate (/min)" chart, Y axis: 0–60

📏 Distance (Face Distance)

📏 Face Distance Functional

What it measures

Estimated face distance from camera in centimeters.

Function

detectDistance(landmarks) in camera.js

Algorithm

d = sqrt((rightEye.x - leftEye.x)² + (rightEye.y - leftEye.y)²)
// d = inter-eye distance in normalized coordinates (0–1)
// constant 0.095 ≈ average inter-eye width
distance = max(20, min(150, round((0.095 / (d + 0.001)) × 100)))
// Clamped to 20–150 cm
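As a runnable illustration of the formula above, the following sketch takes two eye landmarks in normalized coordinates. The function name estimateDistanceCm and the choice of which landmarks to pass in are assumptions, not taken from camera.js.

```javascript
// leftEye / rightEye: { x, y } in normalized image coordinates (0..1).
function estimateDistanceCm(leftEye, rightEye) {
  // Inter-eye distance in normalized units.
  const d = Math.hypot(rightEye.x - leftEye.x, rightEye.y - leftEye.y);
  // 0.095 is the uncalibrated constant from the text; 0.001 avoids division by zero.
  const cm = Math.round((0.095 / (d + 0.001)) * 100);
  return Math.max(20, Math.min(150, cm)); // clamp to 20–150 cm
}
```

For example, eyes 0.2 apart in normalized units give an estimate of 47 cm, while very small or very large separations hit the 150 cm and 20 cm clamps.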

Accuracy: Depends on camera resolution and individual anatomy (inter-eye width varies). This is an estimate, not precise measurement. Constant 0.095 is not calibrated for individual users.

In report

Bio summary: "Distance", average and range in cm
Chart: Blue curve "Face distance from camera", Y axis: 0–150 cm
Photo modal: 📏Xcm for each photo

🔄 Head Movement (Head Pose)

🔄 Head Pose Functional

What it measures

Head rotation angles in degrees: Pitch (up/down), Yaw (left/right), Roll (side tilt).

Function

detectHeadPose(landmarks, transformMatrix) in camera.js

Algorithm — two paths

// Priority 1: Transform matrix from MediaPipe (more precise)
if (transformMatrix.data) {
  pitch = asin(-m[6]) × 57.3°
  yaw = atan2(m[2], m[10]) × 57.3°
  roll = atan2(m[4], m[5]) × 57.3°
}

// Priority 2: Geometric estimate from landmarks (fallback)
else {
  yaw = (rightDist - leftDist) / (leftDist + rightDist) × 60°
  pitch = (nosePos - 0.35) × 80°
  roll = atan2(rightEye.y - leftEye.y, rightEye.x - leftEye.x) × 57.3°
}

In report

Bio summary: "Head", average absolute angles ±P°/±Y°
Chart: "Head movement", Pitch (purple) + Yaw (pink), Y axis: -45° to +45°, center line at 0°

😊 Smile

😊 Smile Intensity Functional

Calculation

(mouthSmileLeft + mouthSmileRight) × 50 — direct conversion from blendshapes, range 0–100%.

In report

Red curve in Emotion timeline chart. Also shown in photo modal: 😊X%.

✋ Hand Gestures

✋ Hand Gesture Detection Functional

What it measures

Hand gestures in camera view. 8 recognized gestures: 👍 Thumbs Up, 👎 Thumbs Down, ✌️ Peace, ☝️ Point, ✋ Open Palm, ✊ Fist, 👌 OK, 🤟 Rock.

Source

MediaPipe Hand Landmarker: 21 hand landmarks. The function detectGesture(handLandmarks) compares fingertip positions against joints (tip.y vs joint.y).

Storage

One detection per second (1x/s) is logged, including "None" (hand not visible). This allows measuring how long the hand was visible. "None" entries are filtered out for the report.

False positives: occasionally "Point" is detected when no hand is present (nose, forehead, or background object). One user had 176× Point during a 286s session. Clusters of 3+ seconds of the same gesture are more reliable than isolated detections.

In report

Summary card: "Gestures", count (filtered, without "None")
Gestures section: Grouped by type with count, mini timeline (dots) and time range

📷 Face detection (Face Detection Rate)

📷 Face Detection Functional

What it measures

Percentage of frames where the face was successfully detected. Low values = face out of frame, covered, or poor lighting.

Calculation

faceDetected.filter(detected).length / faceDetected.length × 100%

In report

Bio summary: "Detection" X% (count/total). Displayed as first row.

👥 Multi-face Presence Detection

👥 Multi-face Check Functional

What it measures

Presence of other people in frame. Second MediaPipe instance (without blendshapes, numFaces: 3) checks every 5 seconds.

In report

Bio summary: "other people", detection count or "No"
Emotion chart: Red vertical zones where more than 1 face was detected
Biometric charts: Red vertical lines + zones

🎤 Voice Comments

🎤 Voice Comments Echo problem

What it measures

User voice comments transcribed in real-time using Web Speech API (SpeechRecognition). Each comment has text, timestamp, and confidence (0–100%).

Configuration

continuous: true, interimResults: false, language per session settings (cs-CZ / en-US). Auto-restart on end.

Known issue: Microphone runs continuously and also captures sound from video (avatar) or TTS. Pause during audio playback is not implemented — planned.

In report

Summary card: "Comments", count
Voice comments section: List with time, text, and confidence
Timeline: Comments assigned to questions/steps by time

🔊 Microphone Volume (Mic Volume)

🔊 Mic Volume Zero data

What it measures

Peak microphone volume every second (AnalyserNode from Web Audio API).

Issue: All recorded values are 0. AnalyserNode is probably not correctly connected to mic stream, or peak detection has a bug. Requires investigation.

In report

Volume timeline (if data is non-zero). Currently not displayed.

📸 Photos and Photo Quality

📸 Face Photos + Quality Score Cascade from iris

When photos are taken

Event-triggered: session_start, video_* (each video step), question_*, reveal_*, session_end. Two versions: clean (pure image) + overlay (with face mesh).

Quality score

quality = (faceDetected ? 40 : 0)
        + min(30, attention × 0.3)    // ← iris-based, max +30
        + min(20, emotion.score × 0.2) // max +20
        + (headPose centered ? 10 : 0)  // yaw < 15° && pitch < 15°
// Maximum possible: 40 + 30 + 20 + 10 = 100
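The scoring rule above can be written as a small pure function. This is a sketch under assumptions: the name photoQuality and the flat input object are hypothetical, and absolute values are used for yaw and pitch so that the "centered" check treats left and right turns alike.

```javascript
// attention and emotionScore are percentages in 0..100; yaw and pitch are degrees.
function photoQuality({ faceDetected, attention, emotionScore, yaw, pitch }) {
  const centered = Math.abs(yaw) < 15 && Math.abs(pitch) < 15;
  return (
    (faceDetected ? 40 : 0) +
    Math.min(30, attention * 0.3) +    // iris-based attention, max +30
    Math.min(20, emotionScore * 0.2) + // emotion confidence, max +20
    (centered ? 10 : 0)
  );
}
```

With attention fixed at 0, a detected, centered face with full emotion confidence reaches 70, which is the ceiling described for the iris cascade.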

Cascade: With broken iris attention (always ~0), up to 30 points are missing. Max achievable score is ~70 instead of 100. Best photo is selected via selectBestPhoto() — compares quality and takes the highest.

In report

Face Photos section: Carousel with thumbnails, click opens modal with overlay
Each photo shows: Time, emotion, gaze/attention %, distance, smile
User main photo: Best photo (highest quality) in report header

🧬 Raw Blendshapes (Raw Blendshape Log)

🧬 Blendshape Log Collected

What it is

Complete log of all 52 blendshapes (muscle values) from MediaPipe. Every 2 seconds a raw snapshot is stored. Used for future detailed analysis and debugging.

52 blendshapes include

eyeBlinkLeft/Right, eyeLookDownLeft/Right, eyeLookInLeft/Right, eyeLookOutLeft/Right, eyeLookUpLeft/Right, eyeSquintLeft/Right, eyeWideLeft/Right, browDownLeft/Right, browInnerUp, browOuterUpLeft/Right, cheekPuff, cheekSquintLeft/Right, jawForward, jawLeft/Right, jawOpen, mouthClose, mouthDimpleLeft/Right, mouthFrownLeft/Right, mouthFunnel, mouthLeft/Right, mouthLowerDownLeft/Right, mouthPressLeft/Right, mouthPucker, mouthRollLower/Upper, mouthShrugLower/Upper, mouthSmileLeft/Right, mouthStretchLeft/Right, mouthUpperUpLeft/Right, noseSneerLeft/Right, _neutral

In report

Not displayed directly — JSON export contains the complete log. Used for JSON export and future AI analysis.

⚖️ Face Symmetry

⚖️ Face Symmetry Collected, not displayed

Calculation

// 4 blendshape pairs
pairs = [mouthSmileL/R, eyeBlinkL/R, browDownL/R, cheekSquintL/R]
symScore = average(1 - abs(left - right)) for each pair
faceSym = round(symScore × 100)   // 0–100%, 100 = perfectly symmetric
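A minimal sketch of this calculation follows, assuming the blendshapes are available as a plain object keyed by their full names; the function name faceSymmetry is hypothetical.

```javascript
// bs: object mapping blendshape names to scores in 0..1 (assumed shape).
function faceSymmetry(bs) {
  const pairs = [
    ['mouthSmileLeft', 'mouthSmileRight'],
    ['eyeBlinkLeft', 'eyeBlinkRight'],
    ['browDownLeft', 'browDownRight'],
    ['cheekSquintLeft', 'cheekSquintRight'],
  ];
  // Each pair contributes 1 when both sides match and 0 when they differ fully.
  const sum = pairs.reduce(
    (acc, [left, right]) => acc + (1 - Math.abs((bs[left] ?? 0) - (bs[right] ?? 0))),
    0
  );
  return Math.round((sum / pairs.length) * 100);
}
```

Because the score is an average over four pairs, a completely one-sided smile with everything else symmetric still yields 75.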

In report

Currently not displayed. Data is in each snapshot (faceSym). Interesting metric for future report extensions.

🔌 Sensor Status in Report

Displayed sensors

📷 Camera: Active/Inactive + frame count
🧠 MediaPipe: Active/Failed + point count + photo count + "82 pts/s"
🎙 Microphone: Active/Inactive + comment count
🔊 Speaker: Volume in %

Source

From the sessionLog event session_started (sensors field), with missing values computed from the collected data as a fallback.

📊 Charts in Report

All charts

Emotions over time: Color bars (1 bar = 1s), red smile curve, question markers, multi-face zones
Attention & Fatigue: Gaze (solid green), Iris (dashed green 0.3), Fatigue (red), question markers
Distance (cm): Blue curve, Y axis: 0–150
Blink Rate (/min): Yellow curve, Y axis: 0–60
Head movement: Pitch (purple) + Yaw (pink), Y axis: -45° to +45°, center line
Microphone volume: Green area chart (if data exists and is non-zero)

Chart technical details

All charts are SVG with viewBox="0 0 100 100" and preserveAspectRatio="none". Data is mapped to 0–100% axes. Curves use vector-effect="non-scaling-stroke" for consistent stroke width. Question markers are dashed vertical lines. Multi-face detections are red zones.

Curve functions

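function buildPath(arr, key, maxVal) {
  return arr.map((v, i) => {
    // x spreads points evenly over 0–100; a single point sits at x = 0
    const x = arr.length > 1 ? (i / (arr.length - 1)) * 100 : 0;
    // y is inverted because the SVG origin is the top-left corner
    const y = 100 - ((v[key] ?? v.score ?? 0) / maxVal) * 100;
    return (i === 0 ? 'M' : 'L') + x + ',' + y;
  }).join(' ');
}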

🃏 Summary Cards in Report

Video scenario (platform-demo)

Video steps: videoSteps.length
Choice: closingChoice (if present)
Emotions: count of unique emotions
Comments: voiceComments.length
Attention (gaze): average gazeAttention (new metric)
Iris tracking (exp.): average attention (opacity 0.6)
Fatigue: average fatigue
Gestures: gestures filtered != None (if > 0)

Classic scenario (device-privacy-awareness)

Questions: question count
Correct: correct/total
Ø Reaction: average reaction time in seconds
+ shared cards: Emotions, Comments, Gaze, Iris (exp.), Fatigue, Gestures

Report header

Dynamic text: "During training we captured X biometric snapshots (Y data points), Z photographs, N hand gestures, and M voice comments. All in real-time." Data points count = snapshots × 82.

Session Analysis

🧠 Contextual Analysis Methodology

This chapter documents every derived indicator used in the "Session Analysis" report section. For each metric, the exact formula, data source, scientific basis, and implementation notes for replicability across any scenario are provided.

📋 Analysis Principles Foundation

Philosophy

Analysis is deterministic — no AI generation, no random elements. Every conclusion follows directly from numerical data. Interpretations are formulated as observations, not judgments. Goal: provide a contextual framework for raw data that is replicable and verifiable.

Input data

Complete session JSON export containing: gazeAttention[], fatigue[], emotions[], blinkRate[], distance[], headPose[], gestures[], blendshapeLog[], micVolume[], voiceComments[], videoSteps[], reactionTimes[], presenceChecks[], sessionLog[].

Segmentation

Data is segmented by videoSteps[] (for video scenarios) or answers[] (for classic scenarios). Each segment is defined by time range [startedAt, endedAt] and all biometric data is filtered by timestamp into the respective segment.

📊 Engagement Score

🎯 Engagement Score Derived

Purpose

Summary number 0–100 expressing overall participant engagement during the session. Displayed as a circular indicator in the report.

Formula

score = w1×gazeNorm + w2×faceNorm + w3×emotionNorm + w4×interactionNorm + w5×completionNorm

Components and weights

gazeNorm (w=0.30): avg(gazeAttention.score) / 100
faceNorm (w=0.15): count(faceDetected=true) / total
emotionNorm (w=0.20): count(emotion=Focused|Happy) / total
interactionNorm (w=0.20): (hasVoiceTips + hasGestures + hasVoiceChoice) / 3
completionNorm (w=0.15): 1.0 if session_completed, 0.5 if tab_switch end, 0.0 otherwise

Example from demo session

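0.30×0.694 + 0.15×1.0 + 0.20×0.992 + 0.20×1.0 + 0.15×1.0 = 0.208 + 0.15 + 0.198 + 0.20 + 0.15 = 0.906, which is 91 on the 0–100 scale. The demo report shows 80 rather than 91, presumably because the score is currently set manually (see the note below).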

Note: in current implementation, the score is set manually based on expert estimation. The formula above is the proposed automation.

Implementation notes

Weights are adjustable per scenario. For classic scenarios, add component correctAnswersNorm. For platforms without voice inputs, adjust interactionNorm to use gestures and clicks.
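The proposed automation can be sketched as a weighted sum over the five normalized components. This is a sketch of the proposal only: the names DEFAULT_WEIGHTS and engagementScore and the shape of the components object are assumptions, and deriving each component from the session export is left out.

```javascript
// Default weights from the component table; they sum to 1.
const DEFAULT_WEIGHTS = {
  gazeNorm: 0.30,
  faceNorm: 0.15,
  emotionNorm: 0.20,
  interactionNorm: 0.20,
  completionNorm: 0.15,
};

// components: the five normalized inputs, each expected in 0..1.
// weights can be overridden per scenario, as the implementation notes suggest.
function engagementScore(components, weights = DEFAULT_WEIGHTS) {
  const raw = Object.keys(weights).reduce(
    (sum, key) => sum + weights[key] * (components[key] ?? 0),
    0
  );
  // Clamp to 0..1 before scaling in case custom weights do not sum to 1.
  return Math.round(Math.min(1, Math.max(0, raw)) * 100);
}

const demoScore = engagementScore({
  gazeNorm: 0.694,
  faceNorm: 1.0,
  emotionNorm: 0.992,
  interactionNorm: 1.0,
  completionNorm: 1.0,
});
```

For the demo-session inputs listed in the example, this sketch returns 91.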

Report insight #1

🎯 Attention per Segment

🎯 Gaze Attention per Video Step Derived

What is shown in report

Horizontal mini-bars showing average gaze attention for each video step. Color: green (≥70), orange (60–69), red (<60). Allows identifying which segment engaged the participant most and least.

Formula

segmentAvg = avg(gazeAttention.score WHERE timestamp BETWEEN step.startedAt AND step.endedAt)
For each videoSteps[] element, all records from gazeAttention[] whose timestamp falls within the time range of that step are filtered.

Implementation
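A minimal sketch of the per-segment average and the interpretation rules listed in this section. It assumes gazeAttention[] records of the form { timestamp, score } and videoSteps[] entries with startedAt and endedAt as in the formula; the name field on a step and both function names are hypothetical.

```javascript
const mean = (xs) => xs.reduce((a, v) => a + v, 0) / xs.length;

// Average score per step; steps without any samples are skipped.
function segmentAverages(records, steps) {
  return steps
    .map((step) => {
      const scores = records
        .filter((r) => r.timestamp >= step.startedAt && r.timestamp <= step.endedAt)
        .map((r) => r.score);
      return { name: step.name, avg: scores.length > 0 ? mean(scores) : null };
    })
    .filter((s) => s.avg !== null);
}

// Turns segment averages into the report sentences.
function interpretAttention(segments) {
  if (segments.length === 0) return [];
  const max = segments.reduce((a, b) => (b.avg > a.avg ? b : a));
  const min = segments.reduce((a, b) => (b.avg < a.avg ? b : a));
  const lines = [
    `Highest attention at ${max.name}`,
    `Lowest attention at ${min.name}`,
  ];
  const spread = max.avg - min.avg;
  if (spread > 10) {
    lines.push(`This suggests the topic of ${max.name} was more engaging than ${min.name}`);
  } else if (spread < 5) {
    lines.push('Attention was evenly distributed across all segments');
  }
  return lines; // a spread between 5 and 10 adds no third sentence
}
```

The same segmentAverages helper works for any array of timestamped score records, so it can be reused for fatigue[].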

Interpretation logic in report

1. Find segment with max(avg) → "Highest attention at [segment name]"
2. Find segment with min(avg) → "Lowest attention at [segment name]"
3. If max - min > 10 → "This suggests the topic of [max segment] was more engaging than [min segment]"
4. If max - min < 5 → "Attention was evenly distributed across all segments"

Mini-bar colors

avg >= 70: var(--beam) green, high attention
avg 60–69: var(--warning) orange, medium
avg < 60: var(--danger) red, low
Tip/question segment: var(--info) blue, differentiated from video

Report insight #2

😴 Fatigue Trend per Segment

😴 Fatigue per Video Step + Trend Detection Derived

What is shown in report

Bar mini-chart showing average fatigue per segment. Color: green (<45%), orange (45–54%), yellow (≥55%). Below the chart are segment descriptions. Text: trend identification (rising/falling) and exceptions.

Formula

segmentFatigueAvg = avg(fatigue.score WHERE timestamp BETWEEN step.startedAt AND step.endedAt)
Same filter as for attention, only applied to the fatigue[] array.

Interpretation logic in report

1. Calculate firstHalfAvg (average fatigue of first half of segment) and secondHalfAvg
2. If secondHalfAvg - firstHalfAvg > 10 → "Fatigue gradually increased"
3. For each segment: if segAvg[i] < segAvg[i-1] AND segAvg[i] < segAvg[i+1] → "[Segment] re-energized the participant" (local minimum)
4. Find min(segAvg) and max(segAvg) and report both with time
5. Global minimum from entire fatigue[] array → calmness peak (Math.min(...fatigue.map(f => f.score)))

Bar chart colors

avg < 45%: #10b981 green, low fatigue
avg 45–54%: #fb923c orange, medium
avg >= 55%: #f59e0b yellow, high fatigue

Report insight #3

😊 Emotion Map per Segment

😊 Emotion Distribution per Video Step Derived

What is shown in report

Text description identifying WHICH emotions appeared in WHICH segments. Correlation with gestures and other events.

Formula

Interpretation logic in report

1. For each non-Focused emotion, find segments where it appears → "Emotion [X] appeared exclusively during [segment]"
2. If a gesture (e.g. Thumbs Up) has a timestamp within the same segment → "This precisely correlates with gesture [Y]"
3. If all segments = 100% Focused → "Entire session ran in Focused mode: stable concentration without emotional fluctuations"
4. Dominant emotion = the one with the highest total count
5. "Positive emotions" = Happy, Surprised; "Neutral" = Focused, Neutral, Relaxed; "Negative" = None (not yet supported)

Gesture correlations

For each gesture from gestures[] (filter gesture != 'None') find temporal overlap with segments:
gestureSegment = videoSteps.find(s => gesture.timestamp >= s.startedAt && gesture.timestamp <= s.endedAt)
If the segment containing the gesture is also a segment with a non-Focused emotion → strong correlation, mention it in the report.

Report insight #4

🎙 Interaction and Reactions (Interaction Analysis)

🎙 Reaction Times & Voice Interaction Quality Derived

What is shown in report

Breakdown of voice answers (tip), their reaction times, and answer consistency analysis. For closing choice, formulation analysis.

Source data

reactionTimes[]: {questionId, time_ms}, time from input display to answer
voiceComments[]: {timestamp, text, confidence}, recognized speech
closingChoice: {detectedKeyword, attempts, detailViewed}

Derived metrics

avgReactionTime: avg(reactionTimes.time_ms)
reactionTimeVariance: max(time_ms) - min(time_ms) (a range, not a statistical variance)
consistencyScore: if reactionTimeVariance < 1000 ms → "consistent thinking"

Interpretation logic in report

1. reactionTimeVariance < 1000 → "Very similar reaction times indicate consistent and active thinking"
2. avgReactionTime < 2000 → "Fast answers — possibly random guessing or certainty"
3. avgReactionTime 3000–7000 → "Active thinking before answer"
4. avgReactionTime > 10000 → "Long deliberation: complex question or indecision"
5. closingChoice.detectedKeyword != standard words (biometrics/hardware/adaptivity/no) → natural formulation (e.g. "chci do reportu", "I want it in the report", instead of "ne", "no") → note about understanding of context
6. closingChoice.attempts > 0 → "[N]x warnings before correct detection"

Report insight #5

📏 Physical Behavior (Physical Behavior Summary)

📏 Distance + Head + Blink Combined Interpretation Derived

What is shown in report

Summary of physical metrics: distance (range, drift), head movement (average pitch/yaw), blink rate vs norm. Combined interpretation.

Source data and calculations

Distance range: min(distance.cm) – max(distance.cm)
Distance drift: avg(last 10) - avg(first 10)
Average pitch: avg(|headPose.pitch|) in degrees
Average yaw: avg(|headPose.yaw|) in degrees
Final blink rate: blinkRate[last].rate
Blink norm: 15–20/min (Bentivoglio et al., 1997)

Interpretation logic in report

1. Distance: drift > 0 → "Slight trend of moving away, a natural sign of gradual relaxation"
drift < -5 → "Approaching screen: possible eye fatigue or trying to see better"
|drift| < 2 → "Stable position throughout"

2. Head movement: avg_pitch < 3 AND avg_yaw < 5 → "Minimal head movement, high visual fixation"
avg_pitch > 5 OR avg_yaw > 8 → "Increased movement, possibly discomfort or distraction"

3. Blink rate: rate < 10 → "Below norm: combined with stable head indicates high visual fixation"
rate < 10 AND avg_pitch < 3 → "Combination of low blink rate and stable head = strong concentration signal"
Note: "Low blink rate may partly be a detection artifact"

Report insight #6

📊 Overall Profile & Content Recommendations

📊 Participant Type + Content Effectiveness Score Derived

What is shown in report

Three paragraphs: (1) Participant type, (2) Highlight — most vs least engaging segment, (3) Content recommendations.

Participant type — decision tree

function classifyParticipant(data) {
  const gazeAvg = avg(data.gazeAttention, 'score');
  const fatAvg = avg(data.fatigue, 'score');
  const headMov = avgHeadMovement(data.headPose);
  const gestures = data.gestures.filter(g => g.gesture !== 'None').length;
  const voices = data.voiceComments.length;
  const drops = detectDropsAndSpikes(data.gazeAttention);

  // Rules are checked in order; the first match wins.
  if (gazeAvg > 60 && fatAvg < 60 && headMov < 1.0)
    return 'Focused, calm, analytical observer';
  if (gazeAvg > 60 && (gestures > 0 || voices > 0))
    return 'Active, engaged participant';
  if (fatAvg > 60)
    return 'Fatigued, with reduced results';
  if (drops.drops.length > 5 && drops.recoveryRatio < 0.5)
    return 'Distracted, with frequent attention lapses';
  if (gazeAvg < 50 && gestures === 0 && voices === 0)
    return 'Passive observer';
  return 'Standard participant';
}

Highlights — extreme identification

1. Calculate per-segment: gazeAvg, emotionVariety (count of unique emotions != Focused), gestureCount, fatigueAvg
2. Segment with max(gazeAvg) AND max(emotionVariety) AND max(gestureCount) → "most interesting"
3. Segment with min(gazeAvg) AND max(fatigueAvg) → "least engaging"
4. Text template: "[Max segment name] triggered the strongest reaction: highest attention, [specifics]. [Min segment name] was conversely the least engaging."

Content recommendations — content effectiveness

segmentScore = gazeAvg × (1 - fatigueAvg / 100)

Logic:
• Segment with min(segmentScore) → "Consider shortening or enlivening [segment]"
• Segment with max(segmentScore) → "[Segment] serves as the main scenario hook"
• If max(segmentScore) / min(segmentScore) > 1.5 → "Significant difference in effectiveness between segments"
• If max(segmentScore) / min(segmentScore) < 1.2 → "Segments are balanced"
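The scoring and the ratio rules above can be sketched as follows. The input shape (name, gazeAvg, fatigueAvg per segment) and the function name contentEffectiveness are assumptions, and the English sentence wording is a translation of the templates.

```javascript
// segments: [{ name, gazeAvg, fatigueAvg }], both averages in 0..100.
function contentEffectiveness(segments) {
  if (segments.length === 0) return null;

  const scored = segments.map((s) => ({
    name: s.name,
    score: s.gazeAvg * (1 - s.fatigueAvg / 100),
  }));
  const best = scored.reduce((a, b) => (b.score > a.score ? b : a));
  const worst = scored.reduce((a, b) => (b.score < a.score ? b : a));

  // Guard against division by zero when the weakest segment scores 0.
  const ratio = worst.score > 0 ? best.score / worst.score : Infinity;

  let balance = null; // ratios between 1.2 and 1.5 produce no sentence
  if (ratio > 1.5) balance = 'Significant difference in effectiveness between segments';
  else if (ratio < 1.2) balance = 'Segments are balanced';

  return { best, worst, ratio, balance };
}
```

Returning null for balance in the middle band follows the rule that a sentence is omitted when no threshold is reached.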

Template table for text output

Type: "Type: [classifyParticipant result]. [Description]"
Interest: "[max segment] = most interesting. [min segment] = least engaging."
Recommendation: "Consider [action] for [min segment]. [max segment] serves as hook."

Note

All text outputs in the report are generated from these templates and numerical thresholds. This is not free text — every sentence has a clear data basis. If data does not reach the threshold, the corresponding sentence is not shown in the report.

👁 Cognitive Load from Blinking (Blink Suppression)

🧠 Blink Suppression Index Derived

Purpose

Detect cognitive load levels based on spontaneous blink suppression. Lower blink rate than normal = higher visual engagement.

Scientific basis

Magliacano et al. (2020) — „Eye blink rate increases as a function of cognitive load during an auditory oddball paradigm." Neuroscience Letters, Vol. 736, doi:10.1016/j.neulet.2020.135293. Finding: EBR increases during active non-visual tasks vs. rest.

Holland & Tarlow (1972) — „Blinking and mental load." Psychological Reports, 31, 119–127. Finding: blink rate and cognitive load are inversely proportional during visual tasks.

Frontiers in Human Neuroscience (2017) — „What Does Eye-Blink Rate Variability Dynamics Tell Us About Cognitive Performance?" doi:10.3389/fnhum.2017.00620. Finding: BRV dynamics predict cognitive performance.

Nakano et al. (2019) — „Rapid serial blinks: An index of temporally increased cognitive load." PLOS ONE, doi:10.1371/journal.pone.0225897. Finding: rapid serial blinks (RSB) indicate local cognitive load increase.

Norm: spontaneous EBR in adults = ~15–20 blinks/min (Bentivoglio et al., 1997; Ponder & Kennedy, 1927).

Metric 1: Suppression ratio

suppression = 1 - (measured_blink_rate / baseline_norm)
Where baseline_norm = 17.5 (midpoint of 15–20 range).
Example: 1 - (5 / 17.5) = 0.71 → 71% suppression = high cognitive load.

Metric 2: Blink clustering per segment

For each video segment, count blink events (= moments when blinkRate[i].rate > blinkRate[i-1].rate).
segment_blink_events = count(rate_increases) within [stepStart, stepEnd]
Segments with 0 blink events = maximum visual fixation.

Implementation
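Both metrics can be sketched in a few lines. This is a minimal sketch assuming blinkRate[] records of the form { timestamp, rate }; the function names and the BASELINE_NORM constant name are hypothetical.

```javascript
const BASELINE_NORM = 17.5; // midpoint of the 15–20 blinks/min range

// Metric 1: share of the normal blink rate that is suppressed.
// Negative values mean the measured rate is above the norm.
function suppressionRatio(measuredRate, baseline = BASELINE_NORM) {
  return 1 - measuredRate / baseline;
}

// Metric 2: number of rate increases whose timestamp falls within the step.
function blinkEventsInSegment(blinkRate, stepStart, stepEnd) {
  let events = 0;
  for (let i = 1; i < blinkRate.length; i++) {
    const current = blinkRate[i];
    const inStep = current.timestamp >= stepStart && current.timestamp <= stepEnd;
    if (inStep && current.rate > blinkRate[i - 1].rate) events += 1;
  }
  return events;
}
```

Because the rate is a rolling 60-second count, it also drops when old blinks leave the window; counting only increases keeps those drops from being mistaken for events.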

💪 Facial Muscle Analysis (FACS — Facial Action Coding System)

🔬 FACS Blendshape Aggregation Derived

Purpose

Extract patterns from 52 ARKit blendshapes matching known facial expressions per FACS (Ekman & Friesen, 1978). ARKit blendshapes map to FACS Action Units (AU).

Scientific basis

Ekman & Friesen (1978) — Facial Action Coding System (FACS). Standard system for describing facial muscle movements.

ARKit → FACS mapping (Ozel, 2022, facethefacs.com):
browDownLeft/Right → AU4 (Brow Lowerer) — corrugator supercilii
eyeSquintLeft/Right → AU7 (Lid Tightener) — orbicularis oculi, palpebral
cheekSquintLeft/Right → AU6 (Cheek Raiser) — orbicularis oculi, orbital
mouthSmileLeft/Right → AU12 (Lip Corner Puller) — zygomaticus major
mouthFrownLeft/Right → AU15 (Lip Corner Depressor) — depressor anguli oris
mouthPressLeft/Right → AU24 (Lip Pressor) — orbicularis oris
jawOpen → AU26/27 (Jaw Drop) — masseter; internal pterygoid

Key patterns

Concentration: AU4 (browDown > 0.3) + AU7 (eyeSquint > 0.2)
Happiness: AU6 (cheekSquint > 0.2) + AU12 (mouthSmile > 0.3)
Deliberation: AU24 (mouthPress > 0.1) + AU4 (browDown > 0.3)
Surprise: AU5 (eyeWide > 0.3) + AU26 (jawOpen > 0.2)
Displeasure: AU15 (mouthFrown > 0.2) + AU4 (browDown > 0.4)

Calculation per segment
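A possible per-segment aggregation is sketched below. The entry shape of blendshapeLog[] is an assumption ({ timestamp, values: { name: score } }), as are the function names; the sketch also assumes that left and right blendshapes are averaged and that a pattern is tested against segment maxima, which the source does not specify.

```javascript
// Average and maximum of the mean of `names` within [startTs, endTs].
function blendshapeStats(blendshapeLog, names, startTs, endTs) {
  const vals = blendshapeLog
    .filter((e) => e.timestamp >= startTs && e.timestamp <= endTs)
    .map((e) => names.reduce((s, n) => s + (e.values[n] ?? 0), 0) / names.length);
  if (vals.length === 0) return null; // no samples in this segment
  return {
    avg: vals.reduce((a, v) => a + v, 0) / vals.length,
    max: Math.max(...vals),
  };
}

// Concentration pattern: AU4 (browDown > 0.3) + AU7 (eyeSquint > 0.2).
function hasConcentrationPattern(blendshapeLog, startTs, endTs) {
  const brow = blendshapeStats(blendshapeLog, ['browDownLeft', 'browDownRight'], startTs, endTs);
  const squint = blendshapeStats(blendshapeLog, ['eyeSquintLeft', 'eyeSquintRight'], startTs, endTs);
  return brow !== null && squint !== null && brow.max > 0.3 && squint.max > 0.2;
}
```

The other key patterns follow the same shape with different blendshape names and thresholds.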

Notes

Blendshape values are 0.0–1.0. Averaging across a segment gives "basal muscle tone". Maximum shows peak intensity. For identifying brief smiles, it suffices that max(mouthSmile) > 0.5 even if avg < 0.05.

📈 Attention Variability (Gaze Attention Variability)

📊 Gaze Std per Segment Derived

Purpose

Average attention shows "how much", but standard deviation shows "how dynamically" — higher std = participant actively shifts attention = deeper cognitive processing. Low std = monotonous watching.

Formula

std = sqrt( Σ(xi - mean)² / n )
Where xi = individual gaze scores in segment, mean = segment average, n = sample count.

Interpretation

std > 8: Active processing, dynamic attention shifts
std 5–8: Normal watching
std < 5: Fixed gaze, either high concentration or monotony

Implementation

function gazeStd(gazeAttention, startTs, endTs) {
  const scores = gazeAttention
    .filter(g => g.timestamp >= startTs && g.timestamp <= endTs)
    .map(g => g.score);
  if (scores.length < 2) return 0;
  const mean = scores.reduce((a, v) => a + v, 0) / scores.length;
  const variance = scores.reduce((a, v) => a + (v - mean) ** 2, 0) / scores.length;
  return Math.sqrt(variance);
}

Note

Distinguish low std from concentration (= high avg + low std) vs. low std from disinterest (= low avg + low std). Always report std together with the average.

🔄 Head Movement Analysis (Head Movement Events)

🔄 Head Movement per Segment + Spike Detection Derived

Metric 1: Average movement per segment

avg_movement = Σ(|pitch[i]-pitch[i-1]| + |yaw[i]-yaw[i-1]|) / (2 × (n-1))
Measures frame-to-frame "jitter" — how much the head moves on average each frame.

Metric 2: Spike detection

spike = |value[i] - value[i-1]| > threshold
Threshold for yaw: 10°; threshold for pitch: 5°. Spikes indicate sudden head turns: a reaction to an external stimulus, a position change, or surprise.

Interpretation

avg_movement < 0.3° | Very stable: fixed gaze on content
avg_movement 0.3–1.0° | Normal: slight movements during processing
avg_movement > 1.0° | Increased activity: physical reaction to content or distraction
Yaw spike > 10° | Sudden turn: external stimulus or position change

Implementation
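A minimal sketch of both metrics, written from the two formulas in this section. It assumes head pose samples shaped as `{ timestamp, pitch, yaw }` with angles in degrees; that shape and the name `headMovement` are hypothetical.

```javascript
// Assumed sample shape (hypothetical): { timestamp: ms, pitch: deg, yaw: deg }
function headMovement(samples, startTs, endTs, yawThreshold = 10, pitchThreshold = 5) {
  const seg = samples.filter(s => s.timestamp >= startTs && s.timestamp <= endTs);
  if (seg.length < 2) return { avgMovement: 0, spikes: [] };

  let total = 0;
  const spikes = [];
  for (let i = 1; i < seg.length; i++) {
    const dPitch = Math.abs(seg[i].pitch - seg[i - 1].pitch);
    const dYaw = Math.abs(seg[i].yaw - seg[i - 1].yaw);
    total += dPitch + dYaw;
    // Metric 2: a spike is a sample-to-sample change above the axis threshold.
    if (dYaw > yawThreshold) spikes.push({ timestamp: seg[i].timestamp, axis: "yaw", delta: dYaw });
    if (dPitch > pitchThreshold) spikes.push({ timestamp: seg[i].timestamp, axis: "pitch", delta: dPitch });
  }
  // Metric 1: divide by 2 × (n - 1), i.e. two axes and n - 1 differences.
  return { avgMovement: total / (2 * (seg.length - 1)), spikes };
}
```

Both metrics depend on the sampling interval: the same head turn produces a larger difference between 1-second snapshots than between consecutive video frames, so the thresholds should be read relative to the data actually stored.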

📊 Correlation: fatigue ↔ attention

📉 Pearson Correlation: Fatigue vs Gaze Derived

Purpose

Verify whether fatigue and attention measure a consistent phenomenon. Expected negative correlation (higher fatigue → lower attention) validates both metrics.

Formula (Pearson r)

r = Σ((xi - x̄)(yi - ȳ)) / sqrt(Σ(xi - x̄)² × Σ(yi - ȳ)²)
Where xi = gaze score, yi = fatigue score, paired by nearest timestamp (tolerance 2s).

Interpretation

r = -0.7 to -1.0 | Strong negative: fatigue dominantly affects attention
r = -0.3 to -0.7 | Medium negative: fatigue is one of the factors
r = 0 to -0.3 | Weak: attention is driven primarily by content, not fatigue
r > 0 | Unexpected: possible data error or atypical participant

Implementation

function pearsonCorrelation(gazeAttention, fatigue) {
  // Without fatigue samples nothing can be paired (reduce() on an empty array would throw)
  if (fatigue.length === 0) return null;
  // Pair each gaze sample with the nearest fatigue sample, within 2 s
  const pairs = [];
  gazeAttention.forEach(g => {
    const closest = fatigue.reduce((best, f) =>
      Math.abs(f.timestamp - g.timestamp) < Math.abs(best.timestamp - g.timestamp) ? f : best);
    if (Math.abs(closest.timestamp - g.timestamp) < 2000) pairs.push([g.score, closest.score]);
  });
  const n = pairs.length;
  if (n < 3) return null;
  const mG = pairs.reduce((a, [g]) => a + g, 0) / n;
  const mF = pairs.reduce((a, [, f]) => a + f, 0) / n;
  const cov = pairs.reduce((a, [g, f]) => a + (g - mG) * (f - mF), 0) / n;
  const sG = Math.sqrt(pairs.reduce((a, [g]) => a + (g - mG) ** 2, 0) / n);
  const sF = Math.sqrt(pairs.reduce((a, [, f]) => a + (f - mF) ** 2, 0) / n);
  // r is undefined when either series is constant; 0 is returned in that case
  return (sG > 0 && sF > 0) ? cov / (sG * sF) : 0;
}

⚡ Distraction and Recovery Detection (Attention Drops & Spikes)

⚡ Gaze Attention Drops & Recovery Derived

Purpose

Identify precise moments where the participant lost attention (drop) and where they returned (spike). Enables correlation with external events (segment transition, sound, movement).

Formula

drop = score[i-1] - score[i] > threshold
spike = score[i] - score[i-1] > threshold
Default threshold: 15 points. Adjustable per scenario.

Derived metrics

drop_count | Total drops > threshold
spike_count | Total spikes > threshold
max_drop | Largest single drop
recovery_ratio | spike_count / drop_count; closer to 1.0 = good self-regulation
avg_recovery_time | Average time (s) between a drop and the subsequent spike

Implementation
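A minimal sketch of the drop and spike formulas and the derived metrics. It assumes `gazeAttention` holds `{ timestamp, score }` samples sorted by time, as in the other gaze functions; the function name `attentionDropsAndSpikes` and the pairing rule for recovery time (each drop is matched with the first later spike) are assumptions.

```javascript
// Assumed input: [{ timestamp: ms, score: 0-100 }, ...] sorted by timestamp.
function attentionDropsAndSpikes(gazeAttention, threshold = 15) {
  const drops = [];
  const spikes = [];
  for (let i = 1; i < gazeAttention.length; i++) {
    const delta = gazeAttention[i].score - gazeAttention[i - 1].score;
    const ts = gazeAttention[i].timestamp;
    if (-delta > threshold) drops.push({ timestamp: ts, size: -delta });
    if (delta > threshold) spikes.push({ timestamp: ts, size: delta });
  }

  // Recovery time (assumed rule): from each drop to the first spike after it.
  const recoveryTimes = [];
  for (const d of drops) {
    const next = spikes.find(s => s.timestamp > d.timestamp);
    if (next) recoveryTimes.push((next.timestamp - d.timestamp) / 1000);
  }

  return {
    dropCount: drops.length,
    spikeCount: spikes.length,
    maxDrop: drops.reduce((m, d) => Math.max(m, d.size), 0),
    // null instead of a division by zero when there were no drops
    recoveryRatio: drops.length > 0 ? spikes.length / drops.length : null,
    avgRecoveryTime: recoveryTimes.length > 0
      ? recoveryTimes.reduce((a, v) => a + v, 0) / recoveryTimes.length
      : null,
  };
}
```

The drop and spike timestamps are what make correlation with external events possible, so a real implementation would likely return the two lists as well as the counts.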

📐 Ergonomics and Distance (Distance Ergonomics)

📐 Distance Drift & Ergonomic Zone Derived

Purpose

Evaluate whether the participant sits in the ergonomic zone and how their distance changes over time (drift = relaxation vs. leaning forward = trying to see better).

Scientific basis

CCOHS (Canadian Centre for Occupational Health and Safety) — Resting Point of Accommodation (RPA) ≈ 80 cm. At this distance, eye muscles require no effort to focus.

OSHA eTools Computer Workstations — recommended range 50–100 cm (20–40 inches) from eyes to monitor.

Ergonomic drift interpretation: gradual distancing = relaxation, approaching = trying to see better (possible eye fatigue or small text).

Metrics

avg_distance | avg(distance.cm)
in_ergonomic_zone | 50 <= avg <= 100 → boolean
drift | avg(last 10 samples) - avg(first 10 samples)
drift_direction | drift > 0 = distancing (relaxation), drift < 0 = approaching
near_rpa | |avg - 80| < 10 → in resting focus zone
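The metrics above can be sketched as a single function. The input shape `{ timestamp, cm }`, the name `distanceErgonomics`, and the extra "stable" label for a drift of exactly 0 are assumptions, not part of the documented behavior.

```javascript
// Assumed input (hypothetical shape): [{ timestamp: ms, cm: number }, ...] in time order.
function distanceErgonomics(distance, windowSize = 10) {
  const cm = distance.map(d => d.cm);
  if (cm.length === 0) return null;
  const avgOf = xs => xs.reduce((a, v) => a + v, 0) / xs.length;

  const avgDistance = avgOf(cm);
  // With fewer than 2 × windowSize samples the first and last windows overlap.
  const drift = avgOf(cm.slice(-windowSize)) - avgOf(cm.slice(0, windowSize));

  return {
    avgDistance,
    inErgonomicZone: avgDistance >= 50 && avgDistance <= 100, // OSHA range in cm
    drift,
    // "stable" is an added label for drift === 0 (assumption)
    driftDirection: drift > 0 ? "distancing" : drift < 0 ? "approaching" : "stable",
    nearRpa: Math.abs(avgDistance - 80) < 10, // within 10 cm of the 80 cm RPA
  };
}
```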

🎤 Voice Activity and Microphone Analysis

🎤 Voice Activity & Ambient Analysis Derived

Purpose

Distinguish active speech from ambient noise. Identify participant voice profile.

Metrics from micVolume[]

speaking_ratio | count(isSpeaking=true) / total
ambient_volume | avg(rms) where isSpeaking=false
speech_volume | avg(rms) where isSpeaking=true
snr (signal-to-noise) | speech_volume / ambient_volume
dominant_freq_speech | avg(dominantFreq) where isSpeaking=true
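A minimal sketch of these metrics, assuming each `micVolume[]` entry carries the `isSpeaking`, `rms`, and `dominantFreq` fields named above. The function name `voiceActivity` and the `null` results for empty groups are assumptions.

```javascript
// Assumed entry shape: { isSpeaking: boolean, rms: number, dominantFreq: Hz }
function voiceActivity(micVolume) {
  if (micVolume.length === 0) return null;
  const avgOf = xs => (xs.length ? xs.reduce((a, v) => a + v, 0) / xs.length : null);

  const speech = micVolume.filter(m => m.isSpeaking);
  const ambient = micVolume.filter(m => !m.isSpeaking);
  const speechVolume = avgOf(speech.map(m => m.rms));
  const ambientVolume = avgOf(ambient.map(m => m.rms));

  return {
    speakingRatio: speech.length / micVolume.length,
    ambientVolume,
    speechVolume,
    // Linear ratio, not decibels; null when a group is empty or ambient is 0.
    snr: speechVolume !== null && ambientVolume ? speechVolume / ambientVolume : null,
    dominantFreqSpeech: avgOf(speech.map(m => m.dominantFreq)),
  };
}
```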

Frequency interpretation

85–180 Hz | Typical male voice (fundamental)
165–255 Hz | Typical female voice (fundamental)
500–1000 Hz | Harmonics: speech harmonic component detection
0 Hz | No dominant frequency: silence or noise

Calibration note

In the demo session, the speaking detector was inconsistent: it detected 2 of 3 actual voice inputs. Recommendation: compare voiceComments[] (speech-to-text results) with micVolume[].isSpeaking and calibrate the speaking detector threshold.

👤 Participant Profile Generation

👤 Behavioral Profile Derived

Purpose

An automatically generated text profile of the participant, based on a combination of all metrics. Intended for trainers and training managers.

Decision logic

Type: Focused, calm | gazeAvg > 60 AND fatigueAvg < 60 AND headMovement < 1.0°
Type: Active, engaged | gazeAvg > 60 AND gestureCount > 0 AND voiceComments > 0
Type: Fatigued, declining | fatigueAvg > 60 AND gazeTrend < 0 (declining trend)
Type: Distracted | dropCount > 5 AND recoveryRatio < 0.5
Type: Passive | gazeAvg < 50 AND gestureCount = 0 AND voiceComments = 0
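The decision rules can be sketched as a plain rule list. The rules are not mutually exclusive (a participant can satisfy both "Focused, calm" and "Active, engaged"), and the source does not state a priority order, so this sketch returns every matching type. The input object shape and the name `classifyProfile` are assumptions.

```javascript
// Field names mirror the decision table; the object shape is an assumption.
// voiceComments and gestureCount are treated as counts.
function classifyProfile(m) {
  const types = [];
  if (m.gazeAvg > 60 && m.fatigueAvg < 60 && m.headMovement < 1.0) types.push("Focused, calm");
  if (m.gazeAvg > 60 && m.gestureCount > 0 && m.voiceComments > 0) types.push("Active, engaged");
  if (m.fatigueAvg > 60 && m.gazeTrend < 0) types.push("Fatigued, declining");
  if (m.dropCount > 5 && m.recoveryRatio < 0.5) types.push("Distracted");
  if (m.gazeAvg < 50 && m.gestureCount === 0 && m.voiceComments === 0) types.push("Passive");
  return types; // empty when no rule matches
}
```

Some inputs match no rule at all (for example gazeAvg between 50 and 60 with moderate fatigue), so a caller needs a fallback label for the empty result.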

Content recommendations (per segment)

For each video segment, calculate segmentScore = gazeAvg × (1 - fatigueAvg/100). Segments with lowest score → candidates for shortening or reworking. Segments with highest score → confirm as effective.

Segment score example

Pillar 1: Biometrics | 76.2 × (1 - 39.8/100) = 45.9 ← also high
Introduction | 74.2 × (1 - 34.9/100) = 48.3 ← highest
Pillar 2: Hardware | 60.9 × (1 - 55.0/100) = 27.4 ← lowest
Pillar 3: Adaptivity | 68.3 × (1 - 48.6/100) = 35.1 ← medium
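The segment score and the resulting ranking can be sketched as follows, using the figures from the example. The input shape `{ name, gazeAvg, fatigueAvg }` and the function name `rankSegments` are assumptions.

```javascript
// segmentScore = gazeAvg × (1 - fatigueAvg / 100), sorted from best to worst.
function rankSegments(segments) {
  return segments
    .map(s => ({ ...s, segmentScore: s.gazeAvg * (1 - s.fatigueAvg / 100) }))
    .sort((a, b) => b.segmentScore - a.segmentScore);
}

const ranked = rankSegments([
  { name: "Pillar 1: Biometrics", gazeAvg: 76.2, fatigueAvg: 39.8 },
  { name: "Introduction", gazeAvg: 74.2, fatigueAvg: 34.9 },
  { name: "Pillar 2: Hardware", gazeAvg: 60.9, fatigueAvg: 55.0 },
  { name: "Pillar 3: Adaptivity", gazeAvg: 68.3, fatigueAvg: 48.6 },
]);

// First entry = most effective segment, last entry = candidate for reworking.
for (const s of ranked) {
  console.log(s.name, s.segmentScore.toFixed(1));
}
```

The last entries of the ranking are the candidates for shortening or reworking; the first entries are the segments to confirm as effective.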