The highest gaze attention was measured during Pillar 1: Biometrics (76.2 %) and the Introduction (74.2 %); the lowest was during Pillar 2: Hardware (60.9 %). This suggests that the biometrics topic engaged the participant considerably more than the technical hardware description.
Fatigue increased gradually from 35 % at the start to 55 % during Pillar 2: Hardware, the longest video segment after the Introduction. Interestingly, during Pillar 3: Adaptivity fatigue dropped slightly to 48.6 %, suggesting that this topic re-engaged the participant. The session minimum (15 %) appeared as a brief fluctuation around t = 182 s.
The emotions Happy (4 frames) and Neutral (2 frames) appeared exclusively during Pillar 1: Biometrics (41–69 s). This coincides with the Thumbs Up gesture captured at 58–62 s, making this segment the source of the strongest positive reaction of the entire session.
The rest of the scenario ran in Focused mode, with stable concentration and no notable emotional fluctuations. This is consistent with the information-dense content of the remaining pillars.
The participant answered both tip questions by voice:
Tip 1 (data points/s): answer "22", reaction time 4 977 ms
Tip 2 (device types): answer "50", reaction time 4 755 ms
The nearly identical reaction times (~5 s) suggest deliberate thinking before answering rather than random guessing. For the final choice, the participant said "chci do reportu" ("I want to go to the report"). This is a natural formulation rather than a bare "no", which indicates that they understood the scenario context.
Screen distance ranged from 81 to 92 cm, with an average of 84 cm. A slight distancing trend (from 83 cm at the start to 92 cm at the end) is consistent with gradual relaxation during a longer session.
Head movement was minimal, averaging ±3° in pitch and ±4° in yaw. Combined with a low blink rate (5/min vs. a typical 15–20/min), this indicates strong visual fixation on the screen. The low blink rate may partly be a detection artifact, but it is also consistent with a state of concentration.
Type: Focused, calm, analytical observer. The participant watched the content without notable distraction, responded thoughtfully to interactive elements, and showed no signs of impatience or loss of interest.
Highlight: Biometrics (Pillar 1) was clearly the most engaging topic. It had the highest attention, the only occurrence of positive emotions, and an approval gesture. Hardware (Pillar 2), by contrast, was the least engaging segment, with the highest fatigue and the lowest attention.
Content recommendation: Consider shortening or revitalizing the hardware segment. The biometrics section serves as the main "hook" of the scenario.
The measured blink rate of 5/min is well below the typical resting rate of about 15–20 blinks/min. Research (Magliacano et al., 2020; Frontiers in Human Neuroscience, 2017) shows that lower blink frequency during visual tasks correlates with higher cognitive load, because blinking is suppressed to minimize the loss of visual input.
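Of the 8 detected blinks, 5 occurred during Pillar 1 (53–66 s). No further blinks occurred until Pillar 3 (182 s, 186 s) and the conclusion (222 s). This clustering is consistent with the tendency of spontaneous blinks to concentrate at natural breakpoints, such as moments of lower cognitive load or transitions between segments. This should not be confused with the "attentional blink", a temporal-attention effect in rapid visual sequences that is unrelated to eyelid movement.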
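A per-segment blink count like the one above can be reproduced by sorting blink timestamps into segment intervals. The following is a minimal sketch. Only the Pillar 1 boundaries (41–69 s) come from this report. The other segment start times, the segment labels, and the individual blink times within 53–66 s are hypothetical placeholders, not values from the session log.

```python
from bisect import bisect_right


def blinks_per_segment(blink_times, segment_starts, segment_names):
    """Count blinks in each segment.

    Segment i covers [segment_starts[i], segment_starts[i + 1]); the last
    segment is open-ended. segment_starts must be sorted ascending.
    """
    counts = {name: 0 for name in segment_names}
    for t in blink_times:
        idx = bisect_right(segment_starts, t) - 1
        if idx >= 0:  # blinks before the first segment start are ignored
            counts[segment_names[idx]] += 1
    return counts


# Hypothetical boundaries and blink times, for illustration only.
starts = [0, 41, 69, 160, 210]
names = ["Introduction", "Pillar 1", "Middle", "Pillar 3", "Conclusion"]
blinks = [53, 56, 60, 63, 66, 182, 186, 222]
print(blinks_per_segment(blinks, starts, names))
# {'Introduction': 0, 'Pillar 1': 5, 'Middle': 0, 'Pillar 3': 2, 'Conclusion': 1}
```

Binary search via `bisect_right` keeps the lookup cheap even for long sessions, and it handles a blink that falls exactly on a boundary by assigning it to the segment that starts there.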
Ref: Magliacano et al. (2020), Neuroscience Letters; Frontiers in Human Neuroscience (2017), doi:10.3389/fnhum.2017.00620
Of the 52 ARKit blendshapes, the following showed consistently elevated values:
browDown L/R (AU4): avg 0.54 / 0.58 — lowered eyebrows, typical concentration marker
eyeSquint L/R (AU7): avg 0.32 / 0.33 — narrowed eyes, focused gaze
mouthPress L/R (AU24): avg 0.12 / 0.11 — pressed lips, tension during thinking
mouthSmile L/R (AU12): avg 0.016, max 0.891 — one brief but intense smile
The AU4 + AU7 combination is a classic FACS (Facial Action Coding System) pattern commonly associated with concentrated examination. The high maximum mouthSmile value (0.891) is consistent with the moment of joy during Pillar 1.
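The summary values and the AU4 + AU7 co-activation can be computed from per-frame blendshape coefficients roughly as follows. This sketch makes several assumptions. Each frame is a dict whose keys are modeled on ARKit's blendshape location names (for example `browDownLeft`). The two activation thresholds are hypothetical illustration values, not FACS criteria and not the values used in this report.

```python
from statistics import mean

# Hypothetical thresholds chosen for illustration only.
BROW_DOWN_MIN = 0.4    # AU4 proxy (browDown)
EYE_SQUINT_MIN = 0.25  # AU7 proxy (eyeSquint)


def stat(frames, key):
    """Mean and maximum of one blendshape coefficient across frames."""
    values = [frame[key] for frame in frames]
    return mean(values), max(values)


def lr_mean(frame, base):
    """Average of the Left/Right variants of a blendshape in one frame."""
    return (frame[base + "Left"] + frame[base + "Right"]) / 2


def concentration_share(frames):
    """Fraction of frames where both the AU4 and AU7 proxies exceed their thresholds."""
    hits = sum(
        1
        for frame in frames
        if lr_mean(frame, "browDown") >= BROW_DOWN_MIN
        and lr_mean(frame, "eyeSquint") >= EYE_SQUINT_MIN
    )
    return hits / len(frames)


# Three hypothetical frames.
frames = [
    {"browDownLeft": 0.5, "browDownRight": 0.6, "eyeSquintLeft": 0.3, "eyeSquintRight": 0.3, "mouthSmileLeft": 0.0},
    {"browDownLeft": 0.2, "browDownRight": 0.2, "eyeSquintLeft": 0.1, "eyeSquintRight": 0.1, "mouthSmileLeft": 0.9},
    {"browDownLeft": 0.6, "browDownRight": 0.6, "eyeSquintLeft": 0.35, "eyeSquintRight": 0.35, "mouthSmileLeft": 0.0},
]
print(stat(frames, "mouthSmileLeft"))
print(concentration_share(frames))
```

The example shows why a low average and a high maximum can coexist. A single intense smile frame raises the maximum to 0.9, while the mean stays near 0.3.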
The standard deviation of gaze attention is used here as a proxy for engagement quality. A higher std means more dynamic shifting of attention, which is interpreted as more active processing of the content:
Pillar 1 Biometrics: std = 8.2 (range 59–91), the most dynamic of the pillars, indicating active processing
Answer 1: std = 11.0 (range 51–85), the highest variability overall, consistent with processing new information
Pillar 2 Hardware: std = 5.2 (range 51–84), the lowest variability among the pillars, indicating monotonous viewing
Tip 1 (question): std = 5.0, stable focus during answer deliberation
The low variability during Pillar 2 supports the conclusion that, unlike biometrics, the hardware topic did not trigger active cognitive processing.
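Per-segment attention statistics like those above can be computed by grouping samples by segment. This minimal sketch assumes that attention samples arrive as `(segment_name, value)` pairs on a 0–100 scale. The report does not state whether it uses the population or the sample standard deviation. Using `pstdev` here is an assumption; `statistics.stdev` would give the sample version instead.

```python
from statistics import mean, pstdev


def segment_stats(samples):
    """Group (segment, attention) pairs and summarize each segment."""
    groups = {}
    for segment, value in samples:
        groups.setdefault(segment, []).append(value)
    return {
        segment: {
            "mean": mean(values),
            "std": pstdev(values),  # population standard deviation
            "range": (min(values), max(values)),
        }
        for segment, values in groups.items()
    }


# Hypothetical samples for illustration.
samples = [("Pillar 1", 60), ("Pillar 1", 80), ("Pillar 2", 70), ("Pillar 2", 70)]
print(segment_stats(samples))
```

Reporting the range next to the std, as the list above does, helps flag segments where one outlier dominates an otherwise flat series.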
For most of the session, the head moved within ±1–2°. However, two notable deviations reveal interesting moments:
t = 92.9 s: yaw spike to +22°, a sudden head turn to the right at the transition from Pillar 1 to Pillar 2. This was possibly a reaction to a sound in the room or an adjustment of sitting position.
t = 154.9–155.9 s: yaw +8°, a shorter turn at the end of the Pillar 2 reveal, again at a segment transition.
Average movement per segment: Introduction 0.2° (nearly motionless) vs. Answer 1 2.17° (about 10× more). This suggests that processing the answer triggered a physical reaction.
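Head-turn episodes such as the two above can be isolated by finding contiguous runs of samples whose absolute yaw exceeds a threshold. This sketch assumes that yaw is given in degrees at the listed timestamps. The 5° threshold is a hypothetical choice, and the example series is invented to resemble the two reported events.

```python
def head_turn_events(times, yaw_deg, threshold_deg=5.0):
    """Return (start_time, end_time, peak_yaw) for each run of |yaw| >= threshold."""
    events = []
    start = end = None
    peak = 0.0
    for t, yaw in zip(times, yaw_deg):
        if abs(yaw) >= threshold_deg:
            if start is None:
                start, peak = t, yaw  # a new excursion begins
            elif abs(yaw) > abs(peak):
                peak = yaw  # keep the largest deviation, preserving its sign
            end = t
        elif start is not None:
            events.append((start, end, peak))  # excursion ended
            start = None
    if start is not None:  # excursion still open at the end of the series
        events.append((start, end, peak))
    return events


times = [90.9, 91.9, 92.9, 93.9, 153.9, 154.9, 155.9, 156.9]
yaw = [1, 2, 22, 1, 0, 8, 8, 1]  # hypothetical yaw values in degrees
print(head_turn_events(times, yaw))
# [(92.9, 92.9, 22), (154.9, 155.9, 8)]
```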
Pearson correlation between fatigue and gaze attention: r = −0.448 (n = 245). This is a moderate negative correlation: as fatigue increases, attention decreases, which is the expected physiological pattern.
The two indicators therefore behave consistently and support each other's validity. However, the correlation is far from perfect (|r| ≈ 0.45 vs. 1.0). Since r² ≈ 0.20, fatigue accounts for only about 20 % of the variance in attention. This indicates that fatigue is not the only factor affecting attention; content attractiveness also plays a role (see the differences between pillars).
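For reference, Pearson's r is the covariance of the two series divided by the product of their standard deviations, and r² gives the share of variance explained. Below is a minimal pure-Python sketch. On Python 3.10+ `statistics.correlation` computes the same coefficient. The input series are toy values, not the session data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length n >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


fatigue = [35, 40, 45, 50, 55]    # toy values
attention = [76, 70, 74, 62, 61]  # toy values
r = pearson_r(fatigue, attention)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")

# Share of variance explained by the reported coefficient:
print(f"{(-0.448) ** 2:.2f}")  # 0.20
```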
The algorithm identified 9 attention drops of more than 15 points and 10 recovery spikes within the session. The most interesting patterns are:
t = 92.9 s: 32-point drop (the largest), coinciding with the sudden head turn (yaw +22°)
t = 126.9 s and 136.9 s: 23-point drops, both in Pillar 2 Hardware, consistent with lower engagement
t = 218.9 s: 22-point drop followed immediately by a +27-point recovery at t = 219 s, a momentary distraction with rapid self-regulation
The rapid recoveries after drops suggest good self-regulation: the participant consistently returned to the content quickly.
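The report does not specify the detection algorithm. One simple interpretation, assumed here, flags any sample-to-sample change larger than 15 points: a fall counts as a drop and a rise counts as a recovery spike. The series below is hypothetical and only mimics the pattern reported around 218.9 s.

```python
def attention_events(times, attention, threshold=15):
    """Return (drops, recoveries) as lists of (time, delta) pairs.

    A drop is a fall of more than `threshold` points from the previous
    sample; a recovery spike is a rise of more than `threshold` points.
    """
    drops, recoveries = [], []
    for i in range(1, len(attention)):
        delta = attention[i] - attention[i - 1]
        if delta < -threshold:
            drops.append((times[i], delta))
        elif delta > threshold:
            recoveries.append((times[i], delta))
    return drops, recoveries


times = [216, 217, 218, 219]  # hypothetical timestamps (s)
attention = [80, 82, 60, 87]  # hypothetical attention values
print(attention_events(times, attention))
# ([(218, -22)], [(219, 27)])
```

A first-difference rule like this is sensitive to single-sample noise. A smoothed series or a minimum-duration rule would be natural refinements if the raw signal is jittery.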
The average distance of 84 cm is close to the Resting Point of Accommodation (~80 cm per CCOHS), the distance at which the eyes need minimal focusing effort. OSHA ergonomic guidance recommends a viewing distance of 50–100 cm.
Session drift: +1.6 cm (from 84 to 85.6 cm). This minimal distancing is consistent with relaxation rather than loss of interest. At the end of the session (from t = 236 s), the distance increased to 87–92 cm, coinciding with the final video phase and the transition to the report.
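The following minimal sketch summarizes a distance series against the 50–100 cm range cited above. It assumes that drift is the mean of the last `window` samples minus the mean of the first `window` samples. The report does not state how its drift figure was computed, so this definition, the window size, and the sample values are all assumptions.

```python
from statistics import mean

RECOMMENDED_CM = (50.0, 100.0)  # viewing-distance range cited from OSHA eTools


def distance_summary(distances_cm, window=30):
    """Summarize eye-to-screen distance samples in centimetres."""
    if len(distances_cm) < window:
        raise ValueError("series shorter than the drift window")
    low, high = RECOMMENDED_CM
    inside = sum(low <= d <= high for d in distances_cm)
    return {
        "mean": mean(distances_cm),
        "min": min(distances_cm),
        "max": max(distances_cm),
        # Drift: mean of the final window minus mean of the first window.
        "drift": mean(distances_cm[-window:]) - mean(distances_cm[:window]),
        "share_in_range": inside / len(distances_cm),
    }


samples = [83, 84, 84, 85, 92]  # hypothetical distances (cm)
print(distance_summary(samples, window=2))
```

Averaging over a window rather than comparing the first and last single samples makes the drift figure less sensitive to momentary leaning.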
Ref: CCOHS Monitor Positioning Guidelines; OSHA eTools Computer Workstations
Of 246 microphone samples, speech was detected in only 2 (t = 79–80 s, during the answer "22"), while non-zero volume was recorded in 37. This indicates a quiet environment with occasional ambient noise.
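Dominant frequency during speech: 750 Hz. This lies above the typical fundamental frequency of adult voices (roughly 85–180 Hz for men, 165–255 Hz for women). It therefore most likely reflects a harmonic or formant rather than the fundamental itself. It is consistent with harmonics of a male voice but is not, on its own, a reliable indicator of the speaker. RMS during speech (124) vs. ambient (5–6) shows a clear separation of signal from noise, an amplitude ratio of roughly 20–25×.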
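Expressed in decibels, the amplitude ratio between speech and ambient RMS gives a single signal-to-noise figure. This short sketch assumes that the reported RMS values share the same linear amplitude scale; the report gives them without units.

```python
import math


def snr_db(signal_rms, noise_rms):
    """Amplitude ratio in decibels: 20 * log10(signal / noise)."""
    if signal_rms <= 0 or noise_rms <= 0:
        raise ValueError("RMS values must be positive")
    return 20.0 * math.log10(signal_rms / noise_rms)


for noise in (5, 6):
    print(f"noise RMS {noise}: {snr_db(124, noise):.1f} dB")
# noise RMS 5: 27.9 dB
# noise RMS 6: 26.3 dB
```

The factor 20 (rather than 10) applies because RMS is an amplitude quantity, not a power quantity.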
Notably, speech-to-text recognized 3 voice comments (confidence 93 %), but the speaking detector captured only the first. This suggests that the speaking-detection threshold needs calibration.
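A minimal energy-based speech detector illustrates why threshold calibration matters. Samples above an RMS threshold count as speech, and a short hangover bridges brief pauses. This is a hypothetical sketch, not the app's actual detector; the RMS series and both thresholds are invented for the example.

```python
def detect_speech(rms, threshold, hangover=1):
    """Flag each sample as speech if its RMS exceeds threshold.

    After a loud sample, the next `hangover` samples stay flagged even if
    they are quiet, so short pauses do not split an utterance.
    """
    flags = []
    countdown = 0
    for value in rms:
        if value > threshold:
            countdown = hangover
            flags.append(True)
        elif countdown > 0:
            countdown -= 1
            flags.append(True)
        else:
            flags.append(False)
    return flags


def count_utterances(flags):
    """Number of contiguous runs of speech-flagged samples."""
    return sum(1 for i, f in enumerate(flags) if f and (i == 0 or not flags[i - 1]))


rms = [5, 124, 110, 6, 5, 40, 5, 6, 35, 5]  # hypothetical RMS series
for threshold in (100, 20):
    print(threshold, count_utterances(detect_speech(rms, threshold)))
# 100 1
# 20 3
```

A threshold tuned only to the loud first answer misses quieter follow-up comments. One option is to set it relative to the measured ambient level (for example, a multiple of the ambient RMS).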