Features · Chapter 1 of 6
Feature Vectors
Every data point is a vector — and the choice of features determines what the algorithm can see
Spotify's recommendation engine does not listen to music — it computes distances between feature vectors. When it recommends a song, it is finding the nearest neighbours to your listening history in a space defined by tempo, energy, danceability, and dozens of other measurements.
- 1. Define a feature vector and identify its components in a described dataset.
- 2. Explain how feature choice determines what relationships an algorithm can and cannot detect.
- 3. Construct a feature vector for a described entity, choosing appropriate features and justifying each inclusion.
- 4. Given two feature vectors, explain what their similarity or difference means in the context of the real-world scenario.
From Songs to Vectors
A music streaming platform has 50,000 songs in its library. A user finishes listening to a jazz piano piece: calm, acoustic, slow tempo, low energy. The platform needs to find similar songs in the next 200 milliseconds. It cannot listen to all 50,000 songs. It cannot read their lyrics or understand their cultural context. It needs a way to describe each song as a compact mathematical object that captures its character, something that can be compared instantly. That object is a feature vector.
What a Feature Vector Is
A feature vector is an ordered list of numbers describing one entity. For a song in the streaming platform’s catalogue, the feature vector might record tempo, energy level, danceability, acousticness, and loudness, in that order:
[tempo, energy, danceability, acousticness, loudness]
Each number is one component — one measurable property of the song. A slow acoustic jazz piece might be represented as [72, 0.21, 0.34, 0.89, −18.4]. An upbeat pop track might be [128, 0.88, 0.91, 0.05, −4.2]. These two vectors are very different — and that difference is exactly what the recommendation engine measures. Songs with similar vectors are close together in mathematical space. Songs with different vectors sit far apart.
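As a minimal sketch, the two example songs can be written as plain Python lists and compared component by component. The names `FEATURES`, `jazz`, and `pop` are illustrative choices for this example, not part of any real API.

```python
# Feature order used in this chapter (position matters).
FEATURES = ["tempo", "energy", "danceability", "acousticness", "loudness"]

# The two example songs from the text, one number per feature.
jazz = [72, 0.21, 0.34, 0.89, -18.4]
pop = [128, 0.88, 0.91, 0.05, -4.2]

# Absolute difference on each component, paired with its feature name.
differences = {
    name: abs(a - b) for name, a, b in zip(FEATURES, jazz, pop)
}

for name, diff in differences.items():
    print(f"{name:>13}: {diff:.2f}")
```

The tempo gap (56) is numerically much larger than the gaps on energy, danceability, and acousticness, simply because tempo is measured in beats per minute while those three values stay between 0 and 1 in this example.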
Every entity in the dataset must be described by the same set of features in the same order. A song cannot swap its energy and danceability columns relative to other songs — the vector position must mean the same thing across the entire dataset. Position 3 is always danceability. Position 4 is always acousticness. This consistency is what makes comparison possible.
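One way to enforce a shared order is to build every vector from a single list of feature names. The sketch below assumes each song arrives as a dictionary; `FEATURE_ORDER` and `to_vector` are hypothetical names introduced for illustration.

```python
# The single source of truth for what each vector position means.
FEATURE_ORDER = ("tempo", "energy", "danceability", "acousticness", "loudness")

def to_vector(song: dict) -> list:
    """Return the song's features as a list in the shared FEATURE_ORDER.

    Raises KeyError if a feature is missing, so an incomplete record
    cannot silently produce a shorter or shifted vector.
    """
    return [song[name] for name in FEATURE_ORDER]

# Two records whose keys were written in different orders.
song_a = {"tempo": 72, "energy": 0.21, "danceability": 0.34,
          "acousticness": 0.89, "loudness": -18.4}
song_b = {"loudness": -4.2, "acousticness": 0.05, "danceability": 0.91,
          "energy": 0.88, "tempo": 128}

vec_a = to_vector(song_a)
vec_b = to_vector(song_b)
print(vec_a)  # [72, 0.21, 0.34, 0.89, -18.4]
print(vec_b)  # [128, 0.88, 0.91, 0.05, -4.2]
```

However the input records are laid out, position 3 of every resulting vector is danceability and position 4 is acousticness.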
Feature Space
When every song is a vector, the entire catalogue becomes a collection of points in a mathematical space. Two features define a 2D space. Three features define a 3D space. Fifty features define a 50D space. In this space, songs that sound similar cluster together. Songs that sound different sit far apart. The recommendation engine’s job is to find the nearest neighbours to the song you just played: the points sitting closest to your current position in feature space.
The geometry of feature space directly encodes musical similarity when the features are well chosen. A jazz piece at [0.32, 0.80] in energy-acousticness space sits close to other jazz pieces at [0.28, 0.85] and [0.38, 0.74]. It sits far from the pop cluster at [0.82, 0.08]. Distance in feature space is the recommendation engine’s definition of musical similarity — nothing more.
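The coordinates above can be turned into a small nearest-neighbour search. This is a sketch that scans every song and uses Euclidean distance. The song names and the `nearest` helper are hypothetical, while the coordinates come from the text.

```python
import math

# Points in (energy, acousticness) space, taken from the text.
# The song names are hypothetical labels added for readability.
catalogue = {
    "jazz_b": (0.28, 0.85),
    "jazz_c": (0.38, 0.74),
    "pop_a": (0.82, 0.08),
}
query = (0.32, 0.80)  # the jazz piece just played

def nearest(query, catalogue, k):
    """Return the names of the k catalogue entries closest to query, nearest first."""
    ranked = sorted(catalogue, key=lambda name: math.dist(query, catalogue[name]))
    return ranked[:k]

for name in nearest(query, catalogue, k=3):
    print(name, round(math.dist(query, catalogue[name]), 3))
```

The two jazz points come back first, at distances below 0.1, and the pop point comes back last, at a distance of roughly 0.88.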
Why Feature Choice Matters
The features you include determine what relationships the algorithm can see. Include acousticness and the algorithm can distinguish acoustic from electronic music. Exclude it and that distinction is invisible: an acoustic jazz track and an electronic jazz track that match on every other feature become the same point as far as the algorithm can tell.
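A small sketch makes the effect concrete. The two songs below are made up for illustration: they agree on energy and danceability and differ only in acousticness. `drop_feature` is a hypothetical helper.

```python
import math

# Hypothetical songs described as (energy, danceability, acousticness).
# They agree on energy and danceability and differ only in acousticness.
acoustic_jazz = (0.30, 0.40, 0.90)
electronic_jazz = (0.30, 0.40, 0.10)

def drop_feature(vector, index):
    """Return a copy of vector without the component at index."""
    return vector[:index] + vector[index + 1:]

with_acousticness = math.dist(acoustic_jazz, electronic_jazz)
without_acousticness = math.dist(
    drop_feature(acoustic_jazz, 2), drop_feature(electronic_jazz, 2)
)

print(with_acousticness)     # about 0.8
print(without_acousticness)  # 0.0
```

With acousticness removed, the distance collapses to zero, so any distance-based method would treat the two songs as the same song.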
This is not a minor implementation detail. It is arguably the most important design decision in an ML system. An algorithm cannot learn a pattern from data that does not capture that pattern. A fraud detection system that does not include transaction timing cannot learn that fraud is more common at 3 a.m. A hiring system that excludes salary history cannot correct for historical pay gaps. A medical diagnosis model that omits patient age cannot account for age-related risk factors.
Spotify’s audio features API exposes 13 measurements per track: acousticness, danceability, energy, instrumentalness, liveness, loudness, speechiness, tempo, valence, time signature, key, mode, and duration. Each one was chosen because it carries information about musical similarity that listeners respond to. Features that did not improve recommendation quality were dropped. The 13 retained features represent years of engineering work to find measurements that correlate with how humans perceive musical similarity.
A common mistake is including every available feature on the assumption that more information is always better. A feature that varies randomly with no relationship to the pattern of interest adds noise without adding signal. In high-dimensional feature spaces, noise features dilute the contribution of informative ones: distance calculations are pulled toward random variation instead of meaningful differences. Feature selection is the practice of choosing carefully rather than exhaustively. The right set of features is usually much smaller than the available set.
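The sketch below illustrates how uninformative features can change which song counts as the nearest neighbour. The energy and acousticness values come from this chapter's examples. The four extra "noise" values per song are made up and hand-picked to make the effect visible, so this is an illustration, not a measurement.

```python
import math

# Informative features (energy, acousticness) from the chapter's examples.
query = [0.32, 0.80]
jazz = [0.28, 0.85]
pop = [0.82, 0.08]

# Four made-up noise features per song, unrelated to how the songs sound.
query_noise = [0.1, 0.9, 0.5, 0.2]
jazz_noise = [0.9, 0.1, 0.4, 0.8]
pop_noise = [0.2, 0.8, 0.5, 0.3]

def closest(query, candidates):
    """Return the name of the candidate nearest to query (Euclidean distance)."""
    return min(candidates, key=lambda name: math.dist(query, candidates[name]))

# Using only the two informative features.
clean = closest(query, {"jazz": jazz, "pop": pop})

# Using the informative features plus the four noise features.
noisy = closest(
    query + query_noise,
    {"jazz": jazz + jazz_noise, "pop": pop + pop_noise},
)
print(clean)  # jazz
print(noisy)  # pop
```

With two informative features the jazz song is clearly closest. Once the noise columns are appended, the random differences outweigh the meaningful ones and the pop song becomes the nearest neighbour.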
Feature Vectors Connect to Mathematical Vectors
Every song’s feature vector is exactly the kind of vector covered in Domain 1: Mathematical Foundations. The same operations apply. The Euclidean distance between two feature vectors measures how different two songs are. The dot product measures how closely two vectors point in the same direction, scaled by their magnitudes. The vector operations are identical. Only the context has changed, from abstract coordinates to musical measurements.
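As a sketch of that claim, the functions below are written once and then applied both to a 2D geometry example and to the 5D song vectors from earlier in the chapter. The function names are illustrative.

```python
import math

def dot(u, v):
    """Sum of component-wise products."""
    return sum(a * b for a, b in zip(u, v))

def magnitude(v):
    """Length of the vector: square root of its dot product with itself."""
    return math.sqrt(dot(v, v))

def euclidean_distance(u, v):
    """Magnitude of the difference vector u - v."""
    return magnitude([a - b for a, b in zip(u, v)])

# A 2D geometry example: the classic 3-4-5 right triangle.
print(magnitude([3, 4]))                   # 5.0
print(euclidean_distance([0, 0], [3, 4]))  # 5.0

# The same functions, unchanged, on the 5D song vectors from the text.
jazz = [72, 0.21, 0.34, 0.89, -18.4]
pop = [128, 0.88, 0.91, 0.05, -4.2]
print(round(euclidean_distance(jazz, pop), 2))
```

Nothing in the functions depends on the number of components, which is why the same code would work for 50 features.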
The entire mathematical machinery of vectors — magnitude, direction, distance, dot product — applies directly to feature vectors. A music recommendation system is, at its core, a nearest-neighbour search in a vector space. This is why mathematical foundations came first: the algebra of vectors in two or three dimensions generalises directly to the 50-dimensional feature spaces that power real recommendation engines.
Bridge to the Playground
The playground shows 120 songs from the platform’s catalogue plotted as points in 2D feature space. Each dot is one song, coloured by genre — jazz, pop, classical, rock. Four select controls let you choose which two features form the horizontal and vertical axes: energy, acousticness, tempo, danceability, or loudness. Switching the axes repositions all 120 dots — some feature pairs cleanly separate genres into distinct clusters, while others mix them completely. A toggle removes genre colours entirely, revealing the clustering structure that exists independently of labels. A second toggle highlights one reference jazz song and draws lines to its three nearest neighbours in the current feature space, changing completely as you switch feature pairs.
Every repositioning answers the same question: does this pair of features carry enough information about musical similarity to be useful to a recommendation engine?
What is the difference between a feature and a label?
A feature is a measurement used as input to an algorithm — tempo, energy, danceability. A label is what the algorithm is trying to predict or find — genre, whether a user will like the song, whether a transaction is fraudulent. Features are the inputs. Labels are the outputs. In unsupervised learning there are no labels — only features.
How many features should a feature vector have?
As many as genuinely help the algorithm find the pattern, and no more. More features are not always better. Irrelevant features add noise. Redundant features waste computation. The curse of dimensionality means that as feature count grows, data becomes increasingly sparse and distance measures become unreliable. Feature selection is the practice of finding the right subset.
Does the order of features in a vector matter?
Yes, technically — feature 3 must always be the same measurement across all vectors in the dataset. But the order itself is arbitrary as long as it is consistent. A song described as [tempo, energy, danceability] and another described as [energy, tempo, danceability] are not comparable unless they use the same order.
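The sketch below shows what goes wrong when two vectors use different orders. It takes the tempo, energy, and danceability values of the jazz example and writes the same song twice, once in each order mentioned above. The variable names are illustrative.

```python
import math

# The same song written with two different feature orders.
as_tempo_energy_dance = [72, 0.21, 0.34]  # [tempo, energy, danceability]
as_energy_tempo_dance = [0.21, 72, 0.34]  # [energy, tempo, danceability]

# Compared position by position, the song looks far away from itself.
mismatched = math.dist(as_tempo_energy_dance, as_energy_tempo_dance)

# Reordering the second vector to the shared convention fixes the comparison.
energy, tempo, dance = as_energy_tempo_dance
realigned = math.dist(as_tempo_energy_dance, [tempo, energy, dance])

print(round(mismatched, 1))
print(realigned)  # 0.0
```

The mismatched comparison reports a distance of more than 100 between a song and itself. Once both vectors follow the same order, the distance is zero.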
The playground is about to show you 120 songs plotted using two audio features — energy on the horizontal axis and acousticness on the vertical axis. Before you switch between feature pairs: which two measurements do you think would best separate rock music from classical music? What properties do those two genres have that differ most from each other — and which of the five available features (energy, acousticness, tempo, danceability, loudness) would capture those differences most clearly?
Key Ideas
A music streaming platform with 50,000 songs represents each track as an ordered list of numbers: tempo, energy, danceability, acousticness, loudness. These five numbers are its feature vector. A slow acoustic jazz piece might be [72, 0.21, 0.34, 0.89, −18.4]. An upbeat pop track might be [128, 0.88, 0.91, 0.05, −4.2]. The distance between these two vectors (56 BPM apart, 0.67 apart on energy, 0.57 apart on danceability, 0.84 apart on acousticness, 14.2 apart on loudness) is the recommendation engine’s measure of how different these songs sound. It never listens to either track.
When every song is a vector, the entire catalogue becomes a collection of points in a mathematical space. Songs with similar musical character sit close together. Songs with different character sit far apart. Two features define a 2D space that can be drawn on a page. Fifty features define a 50D space that cannot be visualised but obeys the same geometry — distances, angles, and nearest neighbours all work identically. The recommendation engine finds songs near your current listening history by computing distances in this space and returning the closest points.
Feature choice is the most important design decision in this system. Plotting 120 songs using energy and acousticness reveals two clean clusters: acoustic instruments in one corner, amplified production in the other. Plotting the same 120 songs using loudness and tempo dissolves those clusters — genres mix because these measurements carry little information about acoustic character. The songs did not change. Only which properties were used to position them changed. Any algorithm trained on the loudness-tempo representation will give worse recommendations than one trained on the energy-acousticness representation — not because of algorithmic sophistication, but because of what the features measure.
The mathematics in this chapter is identical to the vector mathematics covered in Domain 1. A song’s feature vector is a point in a vector space. The Euclidean distance between two feature vectors measures how different two songs are. The dot product measures how closely two vectors point in the same direction, scaled by their magnitudes. The algebraic operations from high-school geometry and linear algebra (magnitude, distance, projection) apply without modification in 50 dimensions. What changed is the interpretation: coordinates that once represented physical position now represent measured musical properties.
The practical consequence is that feature selection often matters more than algorithm selection. Given a good feature representation, simple algorithms work well. Given a poor feature representation, sophisticated algorithms struggle to find patterns that the features do not contain. Choosing which measurements to include — and which to exclude — is where most of the real engineering work happens.
One category of feature has not been addressed yet: genre itself. Placing the string “jazz” or “classical” directly into a feature vector is not possible — algorithms require numbers, and text values cannot be used in distance calculations. Categorical features like genre labels, artist names, country of origin, and instrument type must be converted to numbers before they can enter a feature vector. Converting categories to numbers without introducing false orderings or spurious distances is the subject of the next chapter.
A feature vector is an ordered list of numbers describing one entity — one song, one patient, one email. Each number is one measurable property. Together they locate that entity as a point in feature space.
Feature choice determines what relationships an algorithm can detect. A feature the algorithm cannot see is a pattern it cannot find — a music recommendation system without acousticness cannot distinguish acoustic from electronic music, regardless of how sophisticated its algorithm is.
Similar entities cluster together in feature space. Recommendation, classification, and clustering all exploit this geometry — finding nearby points, drawing boundaries between clusters, or grouping points by proximity.
More features are not always better. Uninformative features add noise and dilute the signal from informative ones. In high dimensions this effect compounds — feature selection matters as much as feature engineering.
Check Your Understanding
Four questions on feature vectors, feature space, and feature selection.
A recommendation system for films describes each film using: runtime (minutes), release year, average user rating, and budget (millions). A data scientist wants to add 'director name' as a feature. What must happen before this feature can be included in the feature vector?
True or false: two songs with very different feature vectors will always sound different to a human listener.
A spam detection system uses feature vectors with 500 features — one for each of the 500 most common words in emails (1 if the word appears, 0 if not). A data scientist suggests adding 10,000 more rare words as features. What is the likely effect on model performance?
Order these steps in building a music recommendation feature vector, from first to last:
- 1. Choose which audio properties to measure (tempo, energy, acousticness…)
- 2. Encode any non-numerical features (genre labels, artist names) as numbers
- 3. Collect raw audio files and metadata for each song in the catalogue
- 4. Validate that similar-sounding songs cluster together in the resulting feature space