The two-axis emotional model underlying this research. Each dot in the scatter plot represents one piece.
In-sample fit: valence R² = 0.945 | arousal R² = 0.944
Random Forest trained on 68 audio and biosignal features (24 audio statistics plus 44 biosignal statistics from EDA, PPG, ECG, and respiration). The top five predictive features for both targets were all spectral audio features, a result consistent with the spectralist hypothesis.
The two-dimensional model of affect formalized by Russell (1980) provides the theoretical backbone of this research. Each piece of music is located in the plane defined by two orthogonal axes, valence and arousal.
Valence is the hedonic quality of an emotional experience, the degree to which a stimulus is perceived as pleasant or unpleasant. It is the primary axis of the circumplex model, spanning from highly negative affect (aversive, dysphoric) at one extreme to highly positive affect (appetitive, euphoric) at the other.
In the Affective Circumplex (Russell, 1980; Barrett & Russell, 1999), valence is operationally independent of arousal: a piece can be simultaneously high-arousal and negative-valence (tense, anxious) or high-arousal and positive-valence (excited, joyful).
In this study: Participants rated valence on a continuous 0–100 slider labeled "unpleasant" to "pleasant" following each 60-second excerpt. The scale was anchored at 0 (maximally aversive), 50 (emotionally neutral), and 100 (maximally pleasant).
Arousal is the activation or energy dimension of emotion, capturing how wakeful, energized, or activated an experience feels regardless of whether it is pleasant. High-arousal states include excitement, anxiety, and agitation; low-arousal states include calm, sadness, and serenity.
Arousal is physiologically grounded: it correlates with sympathetic nervous system activity, including elevated heart rate, reduced heart rate variability, increased electrodermal response (skin conductance), and faster respiration. This makes arousal the more physiologically accessible of the two dimensions, and is a plausible reason why adding biosignals improved model performance substantially over audio features alone.
In this study: Participants rated arousal on a continuous 0–100 slider labeled "calm" to "activated" following each 60-second excerpt.
Russell proposed that all emotional experiences, including those evoked by music, can be located in a two-dimensional continuous space defined by valence and arousal. Categorical emotion labels (happy, sad, tense, calm) are regions within this space rather than discrete categories. This model is now the dominant framework in music emotion research (Eerola & Vuoskoski, 2011; Thayer, 1986) because it accommodates the gradual, ambiguous, and multi-dimensional character of musical affect, particularly in post-tonal and spectralist repertoire where harmonic function is suspended.
The four quadrant labels used throughout this site reflect the dominant emotional character of each region: Excited / Joyful (+V, +A), Tense / Anxious (−V, +A), Calm / Serene (+V, −A), and Sad / Gloomy (−V, −A). Contemporary classical music often occupies ambiguous inter-quadrant positions, which is a key motivation for a continuous rather than categorical measurement model.
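The quadrant labels can be read off a rating pair with a simple threshold. The sketch below is illustrative only: it assumes the arousal slider shares the valence slider's neutral midpoint of 50, and the function name `quadrant` and its tie-breaking rule (a rating exactly at the midpoint counts as positive or activated) are hypothetical choices, not part of the study.

```python
def quadrant(valence: float, arousal: float, midpoint: float = 50.0) -> str:
    """Map a (valence, arousal) pair on the 0-100 sliders to a quadrant label.

    Assumption: both axes are split at `midpoint`; values equal to the
    midpoint are treated as positive valence / high arousal.
    """
    if not (0 <= valence <= 100 and 0 <= arousal <= 100):
        raise ValueError("ratings must lie in [0, 100]")
    positive = valence >= midpoint
    activated = arousal >= midpoint
    if positive and activated:
        return "Excited / Joyful"   # (+V, +A)
    if activated:
        return "Tense / Anxious"    # (-V, +A)
    if positive:
        return "Calm / Serene"      # (+V, -A)
    return "Sad / Gloomy"           # (-V, -A)


print(quadrant(80, 75))
print(quadrant(20, 30))
```

A hard threshold discards exactly the inter-quadrant ambiguity described above, so a mapping like this is best treated as a reading aid for the plot rather than as a measurement.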
References:
Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39(6), 1161–1178.
Barrett, L. F., & Russell, J. A. (1999). The structure of current affect. Current Directions in Psychological Science, 8(1), 10–14.
All model training, evaluation, and the circumplex coordinates above derive from a controlled listening study with simultaneous biosignal capture.
Participants listened to each 60-second excerpt while wearing a BITalino biosignal device capturing four simultaneous sensor streams: electrodermal activity (EDA), photoplethysmography (PPG), electrocardiography (ECG), and 3-axis accelerometry. Signals were sampled at 1000 Hz and downsampled to 10 Hz for windowed feature extraction.
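The text does not say how the 1000 Hz recordings were reduced to 10 Hz. As a minimal sketch of one possibility, the function below averages consecutive blocks of 100 samples, which also acts as a crude low-pass filter. The block-averaging method and the name `downsample_block_mean` are assumptions made for illustration.

```python
import numpy as np

RAW_HZ = 1000     # acquisition rate stated in the text
TARGET_HZ = 10    # rate used for windowed feature extraction


def downsample_block_mean(signal, raw_hz=RAW_HZ, target_hz=TARGET_HZ):
    """Downsample by averaging non-overlapping blocks (assumed method)."""
    if raw_hz % target_hz != 0:
        raise ValueError("raw rate must be an integer multiple of the target rate")
    factor = raw_hz // target_hz
    signal = np.asarray(signal, dtype=float)
    usable = (signal.size // factor) * factor   # drop a trailing partial block
    return signal[:usable].reshape(-1, factor).mean(axis=1)


raw = np.arange(60 * RAW_HZ, dtype=float)       # synthetic 60-second stream
low = downsample_block_mean(raw)
print(low.shape)  # (600,)
```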
Following each clip, participants rated both valence and arousal on continuous 0–100 sliders without time pressure. Each clip's recording was segmented into overlapping 5-second windows with a 0.1-second stride. Every window received the participant's post-clip ratings as its target labels, so all windows from one listening share the same valence and arousal targets, aligning self-report with the physiological signal captured during listening.
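The windowing and labeling step can be sketched as follows, assuming a single stream already at 10 Hz, so that a 5-second window spans 50 samples and a 0.1-second stride is one sample. The helper names `window_stats` and `label_windows` and the synthetic clip are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

FS_HZ = 10        # sampling rate after downsampling
WINDOW_S = 5.0    # window length in seconds
STRIDE_S = 0.1    # stride in seconds


def window_stats(signal, fs=FS_HZ):
    """Return one row of (mean, std, min, max) per overlapping window."""
    win = int(round(WINDOW_S * fs))
    step = max(1, int(round(STRIDE_S * fs)))
    frames = sliding_window_view(np.asarray(signal, dtype=float), win)[::step]
    return np.column_stack(
        [frames.mean(axis=1), frames.std(axis=1),
         frames.min(axis=1), frames.max(axis=1)]
    )


def label_windows(features, valence, arousal):
    """Give every window of a clip the same post-clip ratings as targets."""
    return np.tile([valence, arousal], (features.shape[0], 1))


clip = np.sin(np.linspace(0, 20, 60 * FS_HZ))   # synthetic 60-second stream
X = window_stats(clip)
y = label_windows(X, valence=62.0, arousal=35.0)
print(X.shape, y.shape)  # (551, 4) (551, 2)
```

Because adjacent windows overlap by 4.9 seconds and share one label, windows from the same listening are highly correlated with one another.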
| Feature group | Statistics | Count | Type |
|---|---|---|---|
| RMS Energy | mean, std, min, max | 4 | Audio |
| Spectral Centroid (Hz) | mean, std, min, max | 4 | Audio |
| Spectral Rolloff (Hz) | mean, std, min, max | 4 | Audio |
| Spectral Bandwidth (Hz) | mean, std, min, max | 4 | Audio |
| Spectral Flatness | mean, std, min, max | 4 | Audio |
| Zero-Crossing Rate | mean, std, min, max | 4 | Audio |
| EDA Tonic (skin conductance level) | mean, std, min, max | 4 | Biosignal |
| EDA Phasic (galvanic skin response amplitude) | mean, std, min, max | 4 | Biosignal |
| EDA SCR Rate (responses per minute) | mean, std, min, max | 4 | Biosignal |
| Respiration Rate (breaths/min) | mean, std, min, max | 4 | Biosignal |
| Respiration Amplitude | mean, std, min, max | 4 | Biosignal |
| Respiration Depth | mean, std, min, max | 4 | Biosignal |
| ECG Heart Rate (bpm) | mean, std, min, max | 4 | Biosignal |
| ECG HR Trend | mean, std, min, max | 4 | Biosignal |
| ECG RMSSD (HRV index) | mean, std, min, max | 4 | Biosignal |
| PPG Heart Rate (bpm) | mean, std, min, max | 4 | Biosignal |
| PPG HRV (pulse-derived variability) | mean, std, min, max | 4 | Biosignal |
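
The feature count follows directly from the table: 17 feature groups with four statistics each. The sketch below enumerates the resulting columns; the snake_case names are hypothetical labels chosen for illustration, not the study's actual column names.

```python
AUDIO_GROUPS = [
    "rms_energy", "spectral_centroid", "spectral_rolloff",
    "spectral_bandwidth", "spectral_flatness", "zero_crossing_rate",
]
BIOSIGNAL_GROUPS = [
    "eda_tonic", "eda_phasic", "eda_scr_rate",
    "resp_rate", "resp_amplitude", "resp_depth",
    "ecg_hr", "ecg_hr_trend", "ecg_rmssd",
    "ppg_hr", "ppg_hrv",
]
STATS = ["mean", "std", "min", "max"]


def feature_names(groups):
    """Expand each feature group into its four per-window statistics."""
    return [f"{group}_{stat}" for group in groups for stat in STATS]


audio_cols = feature_names(AUDIO_GROUPS)
bio_cols = feature_names(BIOSIGNAL_GROUPS)
all_cols = audio_cols + bio_cols
print(len(audio_cols), len(bio_cols), len(all_cols))  # 24 44 68
```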
Algorithm: Random Forest Regressor (100 trees). Missing values in biosignal windows were imputed with the feature median, and all features were z-score standardized before training. Separate models were trained for valence and arousal.
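A minimal sketch of this setup using scikit-learn is shown here. The use of scikit-learn, the `build_model` helper, the random seed, and the synthetic data are all assumptions; the source specifies only the algorithm, the tree count, and the two preprocessing steps.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_model(seed=0):
    """Median imputation, z-score scaling, then a 100-tree Random Forest."""
    return make_pipeline(
        SimpleImputer(strategy="median"),
        StandardScaler(),
        RandomForestRegressor(n_estimators=100, random_state=seed),
    )


# Synthetic stand-in for the 68-column window matrix, with ~5% missing values.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 68))
X[rng.random(X.shape) < 0.05] = np.nan
valence = 50 + 10 * np.nan_to_num(X[:, 0]) + rng.normal(scale=2, size=300)
arousal = 50 + 10 * np.nan_to_num(X[:, 1]) + rng.normal(scale=2, size=300)

# One independent model per target, as described in the text.
valence_model = build_model().fit(X, valence)
arousal_model = build_model().fit(X, arousal)
print(valence_model.score(X, valence), arousal_model.score(X, arousal))
```

Wrapping the preprocessing in a pipeline keeps the imputation medians and scaling parameters tied to the data the model was fitted on. Tree ensembles are insensitive to monotonic feature scaling, so the standardization step mainly matters for consistency with other model types.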
Performance: In-sample R² on 2,773 windows: valence = 0.945, arousal = 0.944. The circumplex scatter plot above compares participant-reported ratings (filled dots) against Random Forest predictions (outline dots) for the 10 contemporary pieces, with dashed connecting lines showing per-piece agreement. Shorter lines indicate tighter model-to-human alignment.
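The two quantities in this paragraph can be made concrete with a small sketch: R² as one minus the ratio of residual to total sum of squares, and the length of each connecting line as the Euclidean distance between a piece's reported and predicted position. Averaging window-level values per piece is an assumption about how the plotted coordinates were obtained, and the piece identifiers and numbers are invented for illustration.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def piece_gaps(piece_ids, reported, predicted):
    """Distance between mean reported and mean predicted (V, A) per piece."""
    piece_ids = np.asarray(piece_ids)
    reported = np.asarray(reported, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    gaps = {}
    for pid in np.unique(piece_ids):
        mask = piece_ids == pid
        delta = reported[mask].mean(axis=0) - predicted[mask].mean(axis=0)
        gaps[str(pid)] = float(np.hypot(delta[0], delta[1]))
    return gaps


ids = ["piece_a", "piece_a", "piece_b", "piece_b"]
reported = [[70, 40], [70, 40], [30, 80], [30, 80]]
predicted = [[67, 44], [67, 44], [30, 80], [30, 80]]
gaps = piece_gaps(ids, reported, predicted)
print(gaps)  # piece_a: 5.0, piece_b: 0.0
print(r_squared([70, 70, 30, 30], [67, 67, 30, 30]))
```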
Key finding: When feature importances were ranked by mean decrease in impurity, the top five predictors for both valence and arousal were exclusively audio features (spectral centroid, rolloff, bandwidth, flatness, and RMS energy); no biosignal feature ranked among them. Because the 68-feature set contains no harmonic or melodic descriptors, this ranking does not compare timbre against harmony directly. It is nonetheless consistent with the spectralist hypothesis articulated by Grisey, Murail, and Saariaho: that timbre is the primary carrier of emotional information in contemporary music, and that harmony, the organizing principle of tonal music, is insufficient to account for emotional response in post-tonal repertoire.
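Ranking by mean decrease in impurity can be sketched as follows, assuming a scikit-learn forest, whose `feature_importances_` attribute holds the impurity-based importances. The `top_features` helper, the generic feature names, and the synthetic data (in which one feature drives the target by construction) are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def top_features(model, names, k=5):
    """Return the k features with the largest impurity-based importance."""
    importances = model.feature_importances_
    order = np.argsort(importances)[::-1][:k]
    return [(names[i], float(importances[i])) for i in order]


# Synthetic data: the target depends almost entirely on feature_3.
rng = np.random.default_rng(0)
names = [f"feature_{i}" for i in range(10)]
X = rng.normal(size=(400, 10))
y = 5 * X[:, 3] + rng.normal(scale=0.1, size=400)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
ranking = top_features(model, names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Impurity-based importances are computed on the training data and sum to one across features, so they describe how the fitted trees split rather than how well each feature generalizes.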