[Figure: valence-arousal circumplex scatter plot. Axes: valence (−V to +V) and arousal (−A to +A). Markers: human-reported ratings and model predictions. Color indicates the designed quadrant: Tense / Anxious, Excited / Joyful, Calm / Serene, Sad / Gloomy, or Transitional.]

Spectral Correlates by Quadrant

Tense / Anxious: high zero-crossing rate, high spectral flatness, irregular amplitude envelope, dense upper partials
Excited / Joyful: high RMS energy, high spectral rolloff, bright spectral centroid, fast attack transients
Sad / Gloomy: low RMS energy, low spectral centroid, slow amplitude decay, narrow spectral spread
Calm / Serene: low zero-crossing rate, smooth rolloff curve, even harmonic partials, stable amplitude envelope

Model Performance

Valence: R² = 0.945  |  Arousal: R² = 0.944 (both in-sample)

Random Forest trained on 68 features per window: 24 audio statistics (spectral and energy descriptors) plus 44 biosignal statistics (EDA, PPG, ECG, respiration). The five most important features for both targets were all spectral rather than harmonic, which supports the spectralist hypothesis from a machine-learning standpoint.

Defining Valence and Arousal

The two-dimensional model of affect formalized by Russell (1980) provides the theoretical backbone of this research. Each piece of music is located in the plane defined by two orthogonal axes, valence and arousal.

Valence

Valence is the hedonic quality of an emotional experience, the degree to which a stimulus is perceived as pleasant or unpleasant. It is the primary axis of the circumplex model, spanning from highly negative affect (aversive, dysphoric) at one extreme to highly positive affect (appetitive, euphoric) at the other.

In the Affective Circumplex (Russell, 1980; Barrett & Russell, 1999), valence is operationally independent of arousal: a piece can be simultaneously high-arousal and negative-valence (tense, anxious) or high-arousal and positive-valence (excited, joyful).

In this study: Participants rated valence on a continuous 0–100 slider labeled "unpleasant" to "pleasant" following each 60-second excerpt. The scale was anchored at 0 (maximally aversive), 50 (emotionally neutral), and 100 (maximally pleasant).

V ∈ [0, 100] → normalized to [−1, +1] by (V/50) − 1

Arousal

Arousal is the activation or energy dimension of emotion, capturing how wakeful, energized, or activated an experience feels regardless of whether it is pleasant. High-arousal states include excitement, anxiety, and agitation; low-arousal states include calm, sadness, and serenity.

Arousal is physiologically grounded: it correlates directly with sympathetic nervous system activity, including elevated heart rate, reduced heart rate variability, increased electrodermal response (skin conductance), and faster respiration. This makes arousal the more physiologically accessible of the two dimensions, and explains why biosignals improved model performance substantially over audio features alone.

In this study: Participants rated arousal on a continuous 0–100 slider labeled "calm" to "activated" following each 60-second excerpt.

A ∈ [0, 100] → normalized to [−1, +1] by (A/50) − 1
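
The same rescaling applies to both sliders, so it can be written once. The following is a minimal sketch of the (x/50) − 1 mapping given above; the function names are illustrative and are not taken from the study's code.

```python
def normalize_rating(rating: float) -> float:
    """Map a 0-100 slider rating to [-1, +1] using (rating / 50) - 1."""
    if not 0.0 <= rating <= 100.0:
        raise ValueError(f"rating must lie in [0, 100], got {rating}")
    return rating / 50.0 - 1.0


def denormalize_rating(value: float) -> float:
    """Inverse mapping from [-1, +1] back to the 0-100 slider scale."""
    if not -1.0 <= value <= 1.0:
        raise ValueError(f"value must lie in [-1, +1], got {value}")
    return (value + 1.0) * 50.0


# The slider midpoint (50, emotionally neutral) maps to the origin of the plane.
neutral = normalize_rating(50.0)
```

Under this mapping the three valence anchors 0, 50, and 100 land at −1, 0, and +1 respectively.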

Russell's Circumplex Model of Affect (1980)

Russell proposed that all emotional experiences, including those evoked by music, can be located in a two-dimensional continuous space defined by valence and arousal. Categorical emotion labels (happy, sad, tense, calm) are regions within this space rather than discrete categories. This model is now the dominant framework in music emotion research (Eerola & Vuoskoski, 2011; Thayer, 1986) because it accommodates the gradual, ambiguous, and multi-dimensional character of musical affect, particularly in post-tonal and spectralist repertoire where harmonic function is suspended.

The four quadrant labels used throughout this site reflect the dominant emotional character of each region: Excited / Joyful (+V, +A), Tense / Anxious (−V, +A), Calm / Serene (+V, −A), and Sad / Gloomy (−V, −A). Contemporary classical music often occupies ambiguous inter-quadrant positions, which is a key motivation for a continuous rather than categorical measurement model.
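
The quadrant assignment described above can be sketched as a sign test on the normalized coordinates. This is an illustration only: the function name is hypothetical, and returning None for points lying exactly on an axis is an assumption, since the text does not say how boundary cases are handled.

```python
from typing import Optional


def quadrant_label(valence: float, arousal: float) -> Optional[str]:
    """Return the quadrant label for normalized (valence, arousal) in [-1, +1].

    Points exactly on an axis belong to no single quadrant, so this sketch
    returns None for them (an assumption, not the study's rule).
    """
    if valence == 0 or arousal == 0:
        return None
    if valence > 0:
        return "Excited / Joyful" if arousal > 0 else "Calm / Serene"
    return "Tense / Anxious" if arousal > 0 else "Sad / Gloomy"


label = quadrant_label(-0.4, 0.7)  # negative valence, positive arousal
```

A hard label like this discards how far a piece sits from the axes, which is why the continuous coordinates remain the primary measurement.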

References:
Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39(6), 1161–1178.
Barrett, L. F., & Russell, J. A. (1999). The structure of current affect. Current Directions in Psychological Science, 8(1), 10–14.

Dataset and Methods

All model training, evaluation, and the circumplex coordinates above derive from a controlled listening study with simultaneous biosignal capture.

Labeled windows (5 s each): 2,773
Music clips analyzed: 100+
Features per window: 68
Feature extraction rate: 10 Hz
Valence R²: 0.945
Arousal R²: 0.944

Study Protocol

Participants listened to each 60-second excerpt while wearing a BITalino biosignal device capturing four simultaneous physiological streams: electrodermal activity (EDA), photoplethysmography (PPG), electrocardiography (ECG), and a 3-axis accelerometer. Signals were sampled at 1000 Hz and downsampled to 10 Hz for windowed feature extraction.
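
The text does not say how the 1000 Hz signals were reduced to 10 Hz. One simple possibility, shown here purely as an assumption, is block averaging: each run of 100 consecutive samples is replaced by its mean. A production pipeline might instead apply a dedicated anti-aliasing filter before decimation.

```python
from typing import List, Sequence


def downsample_by_mean(samples: Sequence[float],
                       src_hz: int = 1000,
                       dst_hz: int = 10) -> List[float]:
    """Downsample by averaging non-overlapping blocks of src_hz // dst_hz samples.

    Block averaging is an assumed method, not the one documented by the study.
    A trailing partial block is dropped.
    """
    if dst_hz <= 0 or src_hz % dst_hz != 0:
        raise ValueError("src_hz must be a positive integer multiple of dst_hz")
    factor = src_hz // dst_hz
    n_blocks = len(samples) // factor
    return [
        sum(samples[i * factor:(i + 1) * factor]) / factor
        for i in range(n_blocks)
    ]


# A 60-second excerpt at 1000 Hz (60,000 samples) becomes 600 samples at 10 Hz.
one_excerpt = downsample_by_mean([0.0] * 60_000)
```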

Following each clip, participants rated both valence and arousal on continuous 0–100 sliders without time pressure. Windows were extracted by segmenting each clip into overlapping 5-second frames with a 0.1-second stride. Each window received the participant's post-clip ratings as its target labels, aligning self-report with the contemporaneous physiological signal captured during listening.
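
The windowing step above can be sketched as follows. At 10 Hz a 5-second window spans 50 samples and a 0.1-second stride advances by one sample, so a 60-second excerpt (600 samples) yields 551 overlapping windows. The function names are illustrative, and the sketch assumes only complete windows are kept.

```python
from typing import List, Sequence, Tuple


def sliding_windows(signal: Sequence[float],
                    rate_hz: int = 10,
                    window_s: float = 5.0,
                    stride_s: float = 0.1) -> List[Sequence[float]]:
    """Split a signal into overlapping windows; incomplete tails are dropped."""
    win = round(window_s * rate_hz)    # 50 samples at 10 Hz
    step = round(stride_s * rate_hz)   # 1 sample at 10 Hz
    if win < 1 or step < 1:
        raise ValueError("window and stride must each cover at least one sample")
    return [signal[start:start + win]
            for start in range(0, len(signal) - win + 1, step)]


def label_windows(windows: List[Sequence[float]],
                  valence: float,
                  arousal: float) -> List[Tuple[Sequence[float], float, float]]:
    """Attach the single post-clip rating pair to every window of that clip."""
    return [(w, valence, arousal) for w in windows]


clip = list(range(600))                      # stand-in for 60 s at 10 Hz
windows = sliding_windows(clip)
labeled = label_windows(windows, valence=0.2, arousal=-0.6)
```

Because neighboring windows share 49 of their 50 samples and all windows from one clip carry the same label, windows from the same clip are far from independent observations.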

Feature Schema — 68 Features per Window

Feature group                                  Statistics           Count  Type
RMS Energy                                     mean, std, min, max  4      Audio
Spectral Centroid (Hz)                         mean, std, min, max  4      Audio
Spectral Rolloff (Hz)                          mean, std, min, max  4      Audio
Spectral Bandwidth (Hz)                        mean, std, min, max  4      Audio
Spectral Flatness                              mean, std, min, max  4      Audio
Zero-Crossing Rate                             mean, std, min, max  4      Audio
EDA Tonic (skin conductance level)             mean, std, min, max  4      Biosignal
EDA Phasic (galvanic skin response amplitude)  mean, std, min, max  4      Biosignal
EDA SCR Rate (responses per minute)            mean, std, min, max  4      Biosignal
Respiration Rate (breaths/min)                 mean, std, min, max  4      Biosignal
Respiration Amplitude                          mean, std, min, max  4      Biosignal
Respiration Depth                              mean, std, min, max  4      Biosignal
ECG Heart Rate (bpm)                           mean, std, min, max  4      Biosignal
ECG HR Trend                                   mean, std, min, max  4      Biosignal
ECG RMSSD (HRV index)                          mean, std, min, max  4      Biosignal
PPG Heart Rate (bpm)                           mean, std, min, max  4      Biosignal
PPG HRV (pulse-derived variability)            mean, std, min, max  4      Biosignal
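
The schema amounts to four summary statistics for each of 17 per-window series (6 audio, 11 biosignal), giving 17 × 4 = 68 features. A minimal sketch of that expansion follows. The snake_case group names are hypothetical identifiers invented for this example, and the use of the population standard deviation is an assumption, since the schema only says "std".

```python
from statistics import fmean, pstdev
from typing import Dict, Sequence

# Hypothetical identifiers for the 17 feature groups listed in the schema.
FEATURE_GROUPS = [
    "rms_energy", "spectral_centroid", "spectral_rolloff",
    "spectral_bandwidth", "spectral_flatness", "zero_crossing_rate",
    "eda_tonic", "eda_phasic", "eda_scr_rate",
    "resp_rate", "resp_amplitude", "resp_depth",
    "ecg_heart_rate", "ecg_hr_trend", "ecg_rmssd",
    "ppg_heart_rate", "ppg_hrv",
]


def window_stats(name: str, values: Sequence[float]) -> Dict[str, float]:
    """Summarize one series within a window as mean, std, min, and max."""
    return {
        f"{name}_mean": fmean(values),
        f"{name}_std": pstdev(values),   # population std: an assumption
        f"{name}_min": min(values),
        f"{name}_max": max(values),
    }


def feature_vector(window: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Build the flat feature dictionary for one window."""
    features: Dict[str, float] = {}
    for name in FEATURE_GROUPS:
        features.update(window_stats(name, window[name]))
    return features


# Toy window: every group holds the same three values, for illustration only.
toy_window = {name: [1.0, 2.0, 3.0] for name in FEATURE_GROUPS}
toy_features = feature_vector(toy_window)
```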

Model and Key Finding

Algorithm: Random Forest Regressor (100 trees). Features were median-imputed for missing biosignal windows and z-score standardized before training. Separate models were trained for valence and arousal.
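
The preprocessing and model described above map naturally onto a scikit-learn pipeline. The sketch below assumes scikit-learn was the toolkit (the text does not name one) and runs on synthetic data. The random seed, the 5% missing-value rate, and all variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_model(seed: int = 0):
    """Median imputation, z-score standardization, then a 100-tree forest."""
    return make_pipeline(
        SimpleImputer(strategy="median"),
        StandardScaler(),
        RandomForestRegressor(n_estimators=100, random_state=seed),
    )


# Synthetic stand-in for the real feature matrix (200 windows x 68 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 68))
X[rng.random(X.shape) < 0.05] = np.nan       # simulate missing biosignal values
valence = rng.uniform(-1.0, 1.0, size=200)
arousal = rng.uniform(-1.0, 1.0, size=200)

# One independent model per target, as described in the text.
valence_model = build_model().fit(X, valence)
arousal_model = build_model().fit(X, arousal)
valence_pred = valence_model.predict(X[:5])
```

Tree ensembles are insensitive to monotonic feature scaling, so the standardization step mainly keeps the preprocessing reusable if a scale-sensitive model is substituted later.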

Performance: In-sample R² on 2,773 windows: valence = 0.945, arousal = 0.944. The circumplex scatter plot above compares participant-reported ratings (filled dots) against Random Forest predictions (outline dots) for the 10 contemporary pieces, with dashed connecting lines showing per-piece agreement. Shorter lines indicate tighter model-to-human alignment.
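
For reference, the coefficient of determination compares the model's squared error with that of always predicting the mean rating. "In-sample" means the score is computed on the same windows the model was fitted to, so it describes fit to the training data rather than accuracy on unseen clips. A minimal pure-Python sketch, with an illustrative function name:

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """R^2 = 1 - SS_res / SS_tot for paired true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


perfect = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])    # predictions match exactly
baseline = r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])   # always predicts the mean
```

A perfect fit scores 1 and the predict-the-mean baseline scores 0, so values near 0.945 indicate that most of the rating variance in the fitted windows is reproduced.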

Key finding: When feature importances were ranked by mean decrease in impurity, the top-5 predictors for both valence and arousal were exclusively spectral audio features (spectral centroid, rolloff, bandwidth, flatness, and RMS energy), not harmonic or melodic features. This computationally corroborates the spectralist hypothesis articulated by Grisey, Murail, and Saariaho: timbre is the primary carrier of emotional information in contemporary music, whereas harmony, the organizing principle of tonal music, is insufficient to account for emotional response in post-tonal repertoire.
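
Ranking by mean decrease in impurity can be sketched with scikit-learn, whose fitted forests expose impurity-based importances through the `feature_importances_` attribute. The data, feature names, and helper function below are synthetic and hypothetical. The toy target is built to depend almost entirely on one feature so the ranking is easy to check.

```python
from typing import List, Sequence, Tuple

import numpy as np
from sklearn.ensemble import RandomForestRegressor


def top_k_features(model: RandomForestRegressor,
                   names: Sequence[str],
                   k: int = 5) -> List[Tuple[str, float]]:
    """Return the k features with the largest impurity-based importance."""
    importances = model.feature_importances_
    order = np.argsort(importances)[::-1][:k]   # indices, most important first
    return [(names[i], float(importances[i])) for i in order]


# Synthetic example: the target is driven almost entirely by feature_2.
rng = np.random.default_rng(0)
names = [f"feature_{i}" for i in range(8)]
X = rng.normal(size=(300, 8))
y = 3.0 * X[:, 2] + 0.1 * rng.normal(size=300)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
top5 = top_k_features(model, names)
```

Impurity-based importances are computed on the training data and can be spread across correlated features, so a ranking like this is usually read alongside a check such as permutation importance.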