Valence / Arousal — Neuroaesthetic Music

Legend: filled dots = human reported; outline dots = model predicted. Color = designed quadrant: Tense / Anxious, Excited / Joyful, Calm / Serene, Sad / Gloomy, Transitional.

Spectral Correlates by Quadrant

Tense / Anxious

High zero-crossing rate
High spectral flatness
Irregular amplitude envelope
Dense upper partials

Excited / Joyful

High RMS energy
High spectral rolloff
Bright spectral centroid
Fast attack transients

Sad / Gloomy

Low RMS energy
Low spectral centroid
Slow amplitude decay
Narrow spectral spread

Calm / Serene

Low zero-crossing rate
Smooth rolloff curve
Even harmonic partials
Stable amplitude envelope

Model Performance

Valence: R² = 0.945 | Arousal: R² = 0.944 (both in-sample)

Random Forest trained on 68 audio and biosignal features: 24 spectral and energy statistics from the audio, plus 44 statistics from EDA, ECG, PPG, and respiration. The top-5 predictive features for both targets were all spectral audio descriptors rather than harmonic ones, a result consistent with the spectralist hypothesis at the level of machine learning.

Defining Valence and Arousal

The two-dimensional model of affect formalized by Russell (1980) provides the theoretical backbone of this research. Each piece of music is located in the plane defined by two orthogonal axes, valence and arousal.

Valence

Valence is the hedonic quality of an emotional experience, the degree to which a stimulus is perceived as pleasant or unpleasant. It is the primary axis of the circumplex model, spanning from highly negative affect (aversive, dysphoric) at one extreme to highly positive affect (appetitive, euphoric) at the other.

In the Affective Circumplex (Russell, 1980; Barrett & Russell, 1999), valence is operationally independent of arousal: a piece can be simultaneously high-arousal and negative-valence (tense, anxious) or high-arousal and positive-valence (excited, joyful).

In this study: Participants rated valence on a continuous 0–100 slider labeled "unpleasant" to "pleasant" following each 60-second excerpt. The scale was anchored at 0 (maximally aversive), 50 (emotionally neutral), and 100 (maximally pleasant).

V ∈ [0, 100] → normalized to [−1, +1] by (V/50) − 1

Arousal

Arousal is the activation or energy dimension of emotion, capturing how wakeful, energized, or activated an experience feels regardless of whether it is pleasant. High-arousal states include excitement, anxiety, and agitation; low-arousal states include calm, sadness, and serenity.

Arousal is physiologically grounded: it correlates directly with sympathetic nervous system activity, including elevated heart rate, reduced heart rate variability, increased electrodermal response (skin conductance), and faster respiration. This makes arousal the more physiologically accessible of the two dimensions, and explains why biosignals improved model performance substantially over audio features alone.

In this study: Participants rated arousal on a continuous 0–100 slider labeled "calm" to "activated" following each 60-second excerpt.

A ∈ [0, 100] → normalized to [−1, +1] by (A/50) − 1
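Both rating scales use the same linear mapping, so one small function covers valence and arousal. The sketch below is illustrative only; the name `normalize_rating` is an assumption and does not come from the study's code.

```python
def normalize_rating(raw: float) -> float:
    """Map a 0-100 slider rating onto [-1, +1] via (raw / 50) - 1."""
    if not 0.0 <= raw <= 100.0:
        raise ValueError(f"rating must lie in [0, 100], got {raw}")
    return raw / 50.0 - 1.0
```

The neutral midpoint of the slider (50) maps to 0, so the sign of the normalized value indicates which side of the axis a rating falls on.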

Russell's Circumplex Model of Affect (1980)

Russell proposed that all emotional experiences, including those evoked by music, can be located in a two-dimensional continuous space defined by valence and arousal. Categorical emotion labels (happy, sad, tense, calm) are regions within this space rather than discrete categories. This model is now the dominant framework in music emotion research (Eerola & Vuoskoski, 2011; Thayer, 1986) because it accommodates the gradual, ambiguous, and multi-dimensional character of musical affect, particularly in post-tonal and spectralist repertoire where harmonic function is suspended.

The four quadrant labels used throughout this site reflect the dominant emotional character of each region: Excited / Joyful (+V, +A), Tense / Anxious (−V, +A), Calm / Serene (+V, −A), and Sad / Gloomy (−V, −A). Contemporary classical music often occupies ambiguous inter-quadrant positions, which is a key motivation for a continuous rather than categorical measurement model.
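The sign convention for the four quadrants can be sketched as a lookup on normalized coordinates in [−1, +1]. The function name `quadrant` and the `margin` parameter are hypothetical: the study does not state how the "Transitional" label in the chart legend is assigned, so the sketch simply assumes it applies to points lying within `margin` of either axis.

```python
def quadrant(valence: float, arousal: float, margin: float = 0.0) -> str:
    """Return the quadrant label for normalized (valence, arousal) coordinates.

    Assumption: points within `margin` of either axis are "Transitional".
    With the default margin of 0, only points exactly on an axis qualify.
    """
    if abs(valence) <= margin or abs(arousal) <= margin:
        return "Transitional"
    if arousal > 0:
        return "Excited / Joyful" if valence > 0 else "Tense / Anxious"
    return "Calm / Serene" if valence > 0 else "Sad / Gloomy"
```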

References:
Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39(6), 1161–1178.
Barrett, L. F., & Russell, J. A. (1999). The structure of current affect. Current Directions in Psychological Science, 8(1), 10–14.

Dataset and Methods

All model training, evaluation, and the circumplex coordinates above derive from a controlled listening study with simultaneous biosignal capture.

2,773 labeled windows (5 s each)
100+ music clips analyzed
68 features per window
10 Hz feature extraction rate
Valence R² = 0.945 (in-sample)
Arousal R² = 0.944 (in-sample)

Study Protocol

Participants listened to each 60-second excerpt while wearing a BITalino biosignal device capturing four simultaneous streams: electrodermal activity (EDA), photoplethysmography (PPG), electrocardiography (ECG), and a 3-axis accelerometer. Signals were sampled at 1000 Hz and downsampled to 10 Hz for windowed feature extraction.
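The page does not say how the 1000 Hz signals were reduced to 10 Hz. One simple option, shown here purely as an assumption, is block averaging: every 100 consecutive samples are replaced by their mean. The function name `downsample_block_mean` is hypothetical.

```python
from statistics import fmean


def downsample_block_mean(signal: list[float], factor: int = 100) -> list[float]:
    """Reduce the sampling rate by averaging consecutive blocks of `factor` samples.

    Assumption: block averaging stands in for whatever resampling the study used.
    A trailing partial block is dropped.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    n_blocks = len(signal) // factor
    return [fmean(signal[i * factor:(i + 1) * factor]) for i in range(n_blocks)]
```

With `factor=100`, one second of 1000 Hz data becomes ten 10 Hz samples.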

Following each clip, participants rated both valence and arousal on continuous 0–100 sliders without time pressure. Windows were extracted by segmenting each clip into overlapping 5-second frames with a 0.1-second stride. Each window received the participant's post-clip ratings as its target labels, aligning self-report with the contemporaneous physiological signal captured during listening.
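The windowing and labeling steps described above can be sketched as follows, assuming the signals have already been brought to 10 Hz. The function names and the dictionary layout are illustrative assumptions, not the study's code.

```python
def sliding_windows(n_samples: int, rate_hz: float = 10.0,
                    window_s: float = 5.0, stride_s: float = 0.1) -> list[tuple[int, int]]:
    """Return (start, stop) sample-index pairs for overlapping windows."""
    win = round(window_s * rate_hz)    # 5 s at 10 Hz -> 50 samples
    step = round(stride_s * rate_hz)   # 0.1 s at 10 Hz -> 1 sample
    if win < 1 or step < 1:
        raise ValueError("window and stride must each span at least one sample")
    return [(start, start + win) for start in range(0, n_samples - win + 1, step)]


def label_windows(windows: list[tuple[int, int]],
                  valence: float, arousal: float) -> list[dict]:
    """Attach the single post-clip rating pair to every window of that clip."""
    return [{"start": a, "stop": b, "valence": valence, "arousal": arousal}
            for a, b in windows]
```

Because the ratings are collected once per clip, every window cut from a given clip carries the same pair of target labels.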

Feature Schema — 68 Features per Window

Feature group	Statistics	Count	Type
RMS Energy	mean, std, min, max	4	Audio
Spectral Centroid (Hz)	mean, std, min, max	4	Audio
Spectral Rolloff (Hz)	mean, std, min, max	4	Audio
Spectral Bandwidth (Hz)	mean, std, min, max	4	Audio
Spectral Flatness	mean, std, min, max	4	Audio
Zero-Crossing Rate	mean, std, min, max	4	Audio
EDA Tonic (skin conductance level)	mean, std, min, max	4	Biosignal
EDA Phasic (galvanic skin response amplitude)	mean, std, min, max	4	Biosignal
EDA SCR Rate (responses per minute)	mean, std, min, max	4	Biosignal
Respiration Rate (breaths/min)	mean, std, min, max	4	Biosignal
Respiration Amplitude	mean, std, min, max	4	Biosignal
Respiration Depth	mean, std, min, max	4	Biosignal
ECG Heart Rate (bpm)	mean, std, min, max	4	Biosignal
ECG HR Trend	mean, std, min, max	4	Biosignal
ECG RMSSD (HRV index)	mean, std, min, max	4	Biosignal
PPG Heart Rate (bpm)	mean, std, min, max	4	Biosignal
PPG HRV (pulse-derived variability)	mean, std, min, max	4	Biosignal
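The schema above amounts to four summary statistics for each of 17 signal groups (6 audio, 11 biosignal), giving 17 × 4 = 68 features. A minimal sketch of that computation follows; the group identifiers, the function name, and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import fmean, pstdev

# Hypothetical identifiers for the 17 feature groups in the table.
AUDIO_GROUPS = ["rms", "spectral_centroid", "spectral_rolloff",
                "spectral_bandwidth", "spectral_flatness", "zcr"]
BIOSIGNAL_GROUPS = ["eda_tonic", "eda_phasic", "eda_scr_rate",
                    "resp_rate", "resp_amplitude", "resp_depth",
                    "ecg_hr", "ecg_hr_trend", "ecg_rmssd",
                    "ppg_hr", "ppg_hrv"]

# Assumption: "std" is the population standard deviation.
STATS = {"mean": fmean, "std": pstdev, "min": min, "max": max}


def window_features(streams: dict[str, list[float]]) -> dict[str, float]:
    """Summarize each stream in one window with mean, std, min, and max."""
    return {f"{group}_{stat}": fn(values)
            for group, values in streams.items()
            for stat, fn in STATS.items()}
```

Passing one list of samples per group for a single 5-second window yields a flat dictionary with 68 entries when all 17 groups are present.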

Model and Key Finding

Algorithm: Random Forest Regressor (100 trees). Features were median-imputed for missing biosignal windows and z-score standardized before training. Separate models were trained for valence and arousal.
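The two preprocessing steps named above, median imputation followed by z-score standardization, can be sketched for a single feature column using only the standard library. The function name is hypothetical, and the choice to represent missing values as `None` or NaN is an assumption; in practice a library such as scikit-learn would typically handle both steps.

```python
import math
from statistics import fmean, median, pstdev


def impute_and_standardize(column: list) -> list[float]:
    """Fill missing entries with the column median, then z-score the column."""
    def missing(x) -> bool:
        return x is None or math.isnan(x)

    observed = [x for x in column if not missing(x)]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    fill = median(observed)
    filled = [fill if missing(x) else x for x in column]

    mu, sigma = fmean(filled), pstdev(filled)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0] * len(filled)
    return [(x - mu) / sigma for x in filled]
```

After this transformation each column has zero mean and unit variance, and windows with missing biosignal readings sit at the median of the observed values before scaling.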

Performance: In-sample R² (evaluated on the same 2,773 windows used for training): valence = 0.945, arousal = 0.944. The circumplex scatter plot above compares participant-reported ratings (filled dots) against Random Forest predictions (outline dots) for the 10 contemporary pieces, with dashed connecting lines showing per-piece agreement. Shorter lines indicate tighter model-to-human alignment.
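For reference, the two quantities discussed here can be written out directly: the coefficient of determination, and the length of the line connecting a reported point to its predicted point in the valence/arousal plane. This is a minimal sketch with hypothetical function names; it assumes the connector length is the ordinary Euclidean distance.

```python
import math


def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all targets are identical")
    return 1.0 - ss_res / ss_tot


def connector_length(reported: tuple[float, float],
                     predicted: tuple[float, float]) -> float:
    """Euclidean distance between reported and predicted (valence, arousal)."""
    return math.dist(reported, predicted)
```

A model that always predicts the mean of the targets scores R² = 0, and a perfect model scores 1.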

Key finding: When feature importances were ranked by mean decrease in impurity, the top-5 predictors for both valence and arousal were exclusively spectral audio features (spectral centroid, rolloff, bandwidth, flatness, and RMS energy), not harmonic or melodic features. This result is consistent with the spectralist hypothesis articulated by Grisey, Murail, and Saariaho: that timbre is the primary carrier of emotional information in contemporary music, and that harmony, the organizing principle of tonal music, is insufficient to account for emotional response in post-tonal repertoire.
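Ranking by mean decrease in impurity can be sketched with scikit-learn, whose `RandomForestRegressor` exposes these scores as `feature_importances_` after fitting. The sketch below assumes NumPy and scikit-learn are installed and uses synthetic data, not the study's windows; the helper name `top_features`, the feature names, and the seed are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def top_features(X, y, names, k=5, seed=0):
    """Fit a 100-tree forest and return the k features with highest impurity-based importance."""
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X, y)
    importances = model.feature_importances_  # mean decrease in impurity, sums to 1
    order = np.argsort(importances)[::-1][:k]
    return [(names[i], float(importances[i])) for i in order]


# Synthetic demonstration: the target depends almost entirely on column 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 2] + 0.1 * rng.normal(size=300)
ranking = top_features(X, y, ["f0", "f1", "f2", "f3"], k=2)
```

Since separate models were trained for valence and arousal, the same ranking would be run once per target.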
