InterpretCognates Research

Probing Conceptual Universals in a 200-Language Neural Translator

Does a neural machine translation model trained on 200 languages develop a language-neutral conceptual store? We probe NLLB-200's encoder geometry using Swadesh core vocabulary, phylogenetic distance matrices, colexification patterns, and semantic offset invariance to find out.

February 2026 · InterpretCognates Project · Kyle Mathewson

1. Introduction

What does it mean for a neural network to "understand" a concept across languages? When Meta's NLLB-200 translates "water" from English into Yoruba, Hindi, and Japanese, does its encoder construct a unified, language-neutral representation of the concept — or does it merely learn language-labeled clusters that look universal from the outside?

This question sits at the intersection of two literatures that have largely developed in parallel. In NLP interpretability, Chang, Tu, and Bergen (2022) demonstrated that multilingual model embeddings can be decomposed into language-sensitive axes (encoding vocabulary, script, and language identity) and language-neutral axes (encoding part-of-speech, token position, and universal syntactic structure). In cognitive science, Correia et al. (2014) used fMRI multivariate decoding to show that the human anterior temporal lobe supports a language-independent conceptual hub — a classifier trained to recognize Dutch words could decode which English word a bilingual had heard.

InterpretCognates bridges these literatures by probing NLLB-200's internal geometry with tools drawn from both traditions. The project occupies a genuinely novel scientific niche: prior mechanistic interpretability work on multilingual models covers at most 88 languages (Chang et al., 2022); we operate on 200, with translation supervision rather than masked LM pretraining, using an encoder-decoder architecture that the geometry literature has almost entirely overlooked. The external validation resources we bring to bear — ASJP v20 phonetic distances, CLICS³ colexification data, Berlin & Kay color universals — have never been tested against NLLB's representation space.

The foundational architecture in the psycholinguistics literature is Dijkstra and van Heuven's BIA+ model (2002), which proposes that bilinguals maintain a single integrated lexicon accessed in a language-nonselective, parallel fashion. NLLB's shared-encoder architecture is a direct computational analogue of this proposal: the same weights process all 200 languages simultaneously, with language selection deferred to the decoder's forced BOS token. This architectural mapping provides the interpretive framework for every encoder-level analysis we run.

The core questions we address:

  1. Does NLLB's encoder contain a geometrically isolable, language-neutral conceptual store?
  2. Does NLLB's representation space implicitly encode the phylogenetic tree of human languages?
  3. Does the embedding space reflect universal conceptual associations independent of language?
  4. Are semantic relationships between concepts (man→woman, big→small) translationally invariant across 135 languages?

2. Model & Method

We use facebook/nllb-200-distilled-600M, the 600M-parameter distilled dense variant of NLLB-200 (NLLB Team et al., 2022). The distillation from a sparse mixture-of-experts to a dense model is significant: it means every attention head and every FFN neuron participates in processing every language, making the model particularly amenable to mechanistic analysis. The model is a shared-encoder, encoder-decoder transformer with 12 encoder layers and 16 attention heads per layer, covering 200 languages across dozens of scripts.

Swadesh Stimulus Corpus

Our primary stimulus set is the Swadesh 100-item list — 101 core vocabulary concepts (body parts, basic natural phenomena, low numerals, kinship terms) with translations in 135 NLLB-supported languages spanning 19 language families. Swadesh (1952) predicted that these items are culturally stable and resistant to borrowing, making them markers of genealogical relationship across millennia. Jäger (2018) formalized this into the ASJP database covering ~7,000 languages.

Embedding Extraction

For each word in each language, we use a word-in-context extraction strategy following standard transformer probing methodology (Hewitt & Manning, 2019; Tenney et al., 2019). Each target word is placed inside a fixed carrier sentence — "I saw a {word} near the river." — which is translated into the target language and run through the NLLB encoder with the appropriate source-language BCP-47 code set via the tokenizer (tokenizer.src_lang = lang_code). From the encoder's last hidden states, we extract only the activations corresponding to the target word's subword tokens and mean-pool them, yielding a contextually disambiguated 1024-dimensional embedding. This avoids the degenerate representations that bare-word inputs can produce in encoder-decoder models.
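The pooling step can be made concrete with a small sketch. The function name, array shapes, and mock data below are illustrative only; in the real pipeline the hidden states come from the NLLB encoder (with tokenizer.src_lang set) and the target span from the tokenizer's offset mapping.

```python
import numpy as np

def pool_target_span(hidden_states, target_span):
    """Mean-pool the encoder activations for the target word's subword tokens.

    hidden_states: (seq_len, d_model) array of last-layer encoder states.
    target_span:   (start, end) token indices covering the target word.
    """
    start, end = target_span
    return hidden_states[start:end].mean(axis=0)

# Mock example: a 6-token sentence with 4-dim states; the target word
# occupies tokens 2-3 (e.g. two subword pieces).
rng = np.random.default_rng(0)
H = rng.normal(size=(6, 4))
vec = pool_target_span(H, (2, 4))
assert vec.shape == (4,)
assert np.allclose(vec, (H[2] + H[3]) / 2)
```

Restricting the pool to the target span (rather than averaging the whole sentence) is what keeps the carrier sentence from dominating the representation.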

Isotropy Correction (ABTT)

Rajaee and Pilehvar (2022) showed that multilingual transformer embedding spaces are severely anisotropic — representations cluster in a narrow cone, biasing cosine similarity toward frequent token patterns rather than genuine semantic proximity. We apply All-But-The-Top (ABTT) post-processing: subtract the global mean, then remove the top-k principal components (with k=3) before computing cosine similarity. Every quantitative similarity claim in this report uses isotropy-corrected vectors unless otherwise noted.
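A minimal sketch of the ABTT correction on synthetic data (the shared offset stands in for the anisotropic cone; the helper names are ours, not a library API):

```python
import numpy as np

def abtt(X, k=3):
    """All-But-The-Top: subtract the global mean, then remove the
    top-k principal components before any cosine comparison."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    top = Vt[:k]                              # (k, d) dominant directions
    return Xc - Xc @ top.T @ top              # project them out

def mean_cosine(M):
    """Mean pairwise cosine similarity over all row pairs."""
    U = M / np.linalg.norm(M, axis=1, keepdims=True)
    iu = np.triu_indices(len(M), k=1)
    return (U @ U.T)[iu].mean()

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 16)) + 5.0           # shared offset -> narrow cone
assert mean_cosine(X) > 0.9                   # anisotropic: everything looks alike
assert abs(mean_cosine(abtt(X))) < 0.5        # corrected: the cone opens up
```

The two assertions illustrate the failure mode Rajaee and Pilehvar describe: before correction, cosine similarity is uniformly high regardless of content.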

Language Mean-Centering

Chang et al. (2022) demonstrated that raw embeddings are dominated by language-sensitive variance. The language-neutral axes — which carry actual semantic content — only emerge after per-language mean subtraction. We subtract each language's centroid vector from all of that language's embeddings before PCA projection, transforming the visualization from language-family clusters to concept-meaning clusters. The ratio of between-concept to within-concept distance before and after centering quantifies the strength of the conceptual store.
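The centering operation and the between/within ratio can be sketched on toy data. The data below is synthetic, with an exaggerated language-identity offset standing in for the language-sensitive variance; the function names are ours.

```python
import numpy as np

def mean_center_by_language(emb):
    """emb: dict lang -> (n_concepts, d) array, rows aligned across languages.
    Subtract each language's centroid from its own embeddings."""
    return {lang: X - X.mean(axis=0) for lang, X in emb.items()}

def store_ratio(emb):
    """Between-concept / within-concept mean distance."""
    stacked = np.stack(list(emb.values()))     # (n_lang, n_concepts, d)
    centroids = stacked.mean(axis=0)           # per-concept centroid
    within = np.linalg.norm(stacked - centroids, axis=-1).mean()
    n = centroids.shape[0]
    diffs = centroids[:, None, :] - centroids[None, :, :]
    between = np.linalg.norm(diffs, axis=-1)[~np.eye(n, dtype=bool)].mean()
    return between / within

# Toy data: shared concept vectors plus a large language-identity offset.
rng = np.random.default_rng(2)
concepts = rng.normal(size=(5, 8))
emb = {lang: concepts
             + rng.normal(size=(1, 8)) * 4     # language-identity offset
             + rng.normal(size=(5, 8)) * 0.1   # small per-word noise
       for lang in ["eng", "fra", "jpn"]}
raw = store_ratio(emb)
centered = store_ratio(mean_center_by_language(emb))
assert centered > 2 * raw   # centering sharpens the conceptual store signal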


3. The Conceptual Manifold

To build intuition, we begin with a single concept: "water". The carrier sentence "I drink {water}." places the target word in sentence-final position with high cloze probability, then the word is replaced by its translation in each of 29 typologically diverse languages. For each translation, the full sentence is encoded, but only the hidden states corresponding to the target word's subword tokens are extracted and mean-pooled. These word-in-context embeddings are then projected into 3D via PCA. The result is a conceptual manifold — the geometric surface in NLLB's 1024-dimensional space along which the concept "water" is represented across languages.
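The projection itself is plain PCA. A minimal sketch under random stand-in data (in the real analysis the input rows are the 29 word-in-context embeddings):

```python
import numpy as np

def pca_project(X, k=3):
    """Project row vectors onto their top-k principal components."""
    Xc = X - X.mean(axis=0)                      # center the cloud
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                         # coordinates in PC space

rng = np.random.default_rng(0)
X = rng.normal(size=(29, 1024))                  # 29 languages x encoder width
P = pca_project(X, k=3)
assert P.shape == (29, 3)
# Components come out ordered by explained variance.
assert P[:, 0].var() >= P[:, 1].var() >= P[:, 2].var()
```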


Embedding Geometry

Each target word is embedded inside the carrier sentence shown below, with {word} replaced by the translation in each language. The 3D PCA scatter plot shows each language's embedding of "water" as a point. Points cluster by language family — Romance languages huddle together, as do Slavic, Turkic, and Niger-Congo languages — confirming the language-sensitive variance that Chang et al. (2022) identified.


Figure 1. 3D PCA of NLLB encoder embeddings for "water" across 29 languages. Each language's translation is embedded inside the carrier sentence above and projected into 3D. Clustering by language family reflects the model's language-sensitive encoding.

Embedding Geometry Statistics

The embedding cloud for a single concept reveals key geometric properties of the manifold. We quantify the distribution of pairwise cosine similarities across all 29 language embeddings to characterize how tightly the concept is represented. Higher mean similarity and lower variance indicate stronger conceptual convergence.


Cross-Language Similarity

The cosine similarity matrix below quantifies how similar each pair of language embeddings is for the concept "water." High similarity between typologically distant languages (e.g., Finnish and Japanese) is evidence that the model represents the concept at a level deeper than surface form.


Figure 2. Isotropy-corrected cosine similarity matrix for "water" across 29 languages. Darker colors indicate higher semantic similarity in NLLB's encoder space.


4. Swadesh Convergence Ranking

With single-concept intuition established, we scale up to the full Swadesh 100-item list. For each of the 101 core vocabulary concepts, we embed every translation across all 135 languages (producing over 13,000 embeddings total), then compute the mean pairwise cosine similarity across all language pairs. Concepts with high mean similarity converge strongly across languages — their representations are close in NLLB's embedding space regardless of which language expresses them.
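The per-concept convergence score is just the mean pairwise cosine similarity over the concept's language embeddings. A minimal sketch with synthetic stand-in vectors (a tight cluster versus random directions):

```python
import numpy as np

def convergence(V):
    """Mean pairwise cosine similarity over one concept's language embeddings.
    V: (n_languages, d) matrix, one row per language."""
    U = V / np.linalg.norm(V, axis=1, keepdims=True)
    iu = np.triu_indices(len(V), k=1)            # unique language pairs only
    return (U @ U.T)[iu].mean()

rng = np.random.default_rng(3)
base = rng.normal(size=8)
tight = base + rng.normal(size=(10, 8)) * 0.05   # strongly converging concept
loose = rng.normal(size=(10, 8))                 # non-converging concept
assert convergence(tight) > convergence(loose)
```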

Swadesh's cultural stability hypothesis (1952) predicts that these core vocabulary items should converge more than culturally specific vocabulary. The scatter plot below tests this prediction by placing every concept along two axes: how phonetically similar its translations are across Latin-script languages (x-axis) and how strongly it converges in NLLB's embedding space (y-axis). Concepts that converge strongly despite low phonetic similarity are the strongest candidates for genuine conceptual universals.


Figure 3. Every Swadesh concept plotted by mean cross-lingual phonetic similarity (x-axis) vs. embedding convergence (y-axis, isotropy-corrected). Points colored by semantic category. Concepts in the upper-left quadrant converge strongly in embedding space despite low surface-form similarity — the strongest candidates for genuine conceptual universals.

Category Summary

Grouping Swadesh concepts by semantic category reveals which types of concepts are most universally represented. For each category the chart below shows NLLB embedding convergence alongside orthographic and phonetic similarity among Latin-script translations. Categories where embedding convergence clearly exceeds surface-form similarity provide the strongest evidence that the model's universality reflects semantic structure rather than shared word form.


Figure 3c. Mean scores by Swadesh semantic category, comparing three dimensions: NLLB embedding convergence (colored bars), orthographic similarity among Latin-script translations (light grey), and approximate phonetic similarity (light blue). Categories whose embedding convergence substantially exceeds their surface-form similarity show the strongest evidence for language-neutral conceptual representation.

The Polysemy Confound

Inspection of the bottom-ranked items reveals a pattern: concepts with the lowest cross-lingual convergence are disproportionately polysemous. "Bark" (tree bark vs. dog bark vs. to bark), "lie" (to recline vs. to tell a falsehood), "see" (visual perception vs. to understand) — low convergence for these items may reflect translation ambiguity rather than cultural instability. The word-in-context extraction strategy partially mitigates this: the fixed carrier sentence provides a mild noun-sense bias, and the encoder contextualizes each word against its sentential frame. However, a single carrier sentence cannot fully disambiguate all senses, so polysemy remains a confound. Controlling for it (e.g., via WordNet sense counts as a covariate) is essential before interpreting the bottom of the ranking as evidence against conceptual universality.

Variance Decomposition: Surface Form vs. Semantic Universals

A key methodological risk: high cross-lingual convergence could reflect cognate overlap (shared surface forms from common etymological origins) rather than genuine conceptual universality. To disentangle these, we regress embedding convergence on orthographic similarity (one minus the mean pairwise normalized Levenshtein distance across 21 Latin-script languages) and approximate phonetic similarity (the same metric on phonetically normalized forms).

The residual for each concept — its semantic surplus — measures the degree to which it converges beyond what shared surface form would predict. Concepts above the regression line are the strongest candidates for genuine conceptual universals. The complement of R² gives the variance attributable to deeper semantic structure, providing a principled answer to: is this real semantics, or just shared spelling?
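The surplus computation is an ordinary least-squares residual. A sketch on toy values (the five orthographic/convergence numbers are invented for illustration):

```python
import numpy as np

def semantic_surplus(conv, ortho):
    """Residuals from regressing embedding convergence on orthographic
    similarity; positive surplus = converges beyond what form predicts."""
    slope, intercept = np.polyfit(ortho, conv, 1)
    resid = np.asarray(conv) - (slope * np.asarray(ortho) + intercept)
    ss_res = (resid ** 2).sum()
    ss_tot = ((conv - np.mean(conv)) ** 2).sum()
    r2 = 1 - ss_res / ss_tot           # 1 - r2 = variance beyond surface form
    return resid, r2

ortho = np.array([0.1, 0.2, 0.4, 0.6, 0.8])
conv  = np.array([0.50, 0.52, 0.58, 0.70, 0.72])
resid, r2 = semantic_surplus(conv, ortho)
assert abs(resid.sum()) < 1e-8         # OLS residuals sum to ~0
assert 0 <= r2 <= 1
```

Concepts with positive residuals sit above the regression line; the complement of R² is the "semantic share" quoted in the figure caption.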


Figure 4. Variance decomposition of Swadesh embedding convergence against surface-form similarity. Left: embedding convergence vs. orthographic similarity (mean pairwise Levenshtein, 21 Latin-script languages). Right: embedding convergence vs. approximate phonetic similarity. Each point is a Swadesh concept colored by semantic category. Points above the regression line show positive semantic surplus — convergence unexplained by shared word form. The complement of R² quantifies how much variance is attributable to genuine conceptual structure.

Methodology Notes

Embedding Convergence

For each concept, we embed its translation in all 135 languages using word-in-context extraction: the target word is placed in a carrier sentence, encoded by NLLB-200, and only the target word's subword token activations are mean-pooled. The convergence score is the mean pairwise cosine similarity across all language pairs (over 9,000 pairs per concept). Higher values indicate the model represents the concept similarly regardless of language.

Orthographic Similarity

Computed as 1 minus normalized Levenshtein distance between word forms, restricted to the 21 Latin-script languages (210 pairs) to avoid meaningless cross-script comparisons. Measures surface-form overlap from shared etymological origins.
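A self-contained sketch of the metric (a standard dynamic-programming edit distance; the word pairs below are illustrative cognates, not drawn from our dataset):

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j] + 1,                      # deletion
                         cur[j - 1] + 1,                   # insertion
                         prev[j - 1] + (a[i-1] != b[j-1])) # substitution
        prev = cur
    return prev[n]

def ortho_sim(a, b):
    """1 minus normalized Levenshtein distance between two word forms."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))

assert levenshtein("water", "wasser") == 2
assert abs(ortho_sim("agua", "acqua") - 0.6) < 1e-9
```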

Semantic Surplus

The residual from regressing embedding convergence on orthographic similarity. Positive surplus means the concept converges more than surface form would predict — evidence for language-universal conceptual structure.

Phonetic Dimension

A third decomposition axis using ASJP phonetic transcriptions separates phonetic convergence from both orthographic and purely semantic components. Per-word ASJP data integration is planned for the full analysis.


5. Phylogenetic Structure

If NLLB's encoder genuinely captures something about the relationships between languages, its embedding distance matrix should correlate with independently established measures of linguistic relatedness. We test this by computing a language pairwise cosine distance matrix averaged over all Swadesh concepts, then examining whether the resulting structure recovers known phylogenetic relationships.

The phylogenetic correlation experiment bridges three literatures simultaneously. If NLLB's cosine distance matrix over Swadesh vocabulary correlates significantly with ASJP phonetic distances, the result speaks to NLP researchers (geometry of multilingual models), cognitive scientists (universal language network), and historical linguists (computational phylogenetics). No single prior paper has attempted this. We use a Mantel test — correlating the upper triangles of two distance matrices via Spearman rank correlation with a permutation test for significance.
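The Mantel procedure itself fits in a few lines. A sketch with toy distance matrices (points on a line, plus tiny symmetric noise; the function name is ours, not a library call):

```python
import numpy as np
from scipy.stats import spearmanr

def mantel(D1, D2, n_perm=999, seed=0):
    """Mantel test: Spearman rho between the upper triangles of two distance
    matrices, with significance from random row/column relabelings of D2."""
    n = D1.shape[0]
    iu = np.triu_indices(n, k=1)
    rho_obs = spearmanr(D1[iu], D2[iu])[0]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        if spearmanr(D1[iu], D2[np.ix_(p, p)][iu])[0] >= rho_obs:
            hits += 1
    return rho_obs, (hits + 1) / (n_perm + 1)

# Toy check: concordant matrices should give high rho, low p.
x = np.arange(10, dtype=float)
D1 = np.abs(x[:, None] - x[None, :])
rng = np.random.default_rng(1)
noise = rng.normal(size=D1.shape) * 0.01
D2 = D1 + (noise + noise.T) / 2
np.fill_diagonal(D2, 0.0)
rho, p = mantel(D1, D2, n_perm=199)
assert rho > 0.9 and p < 0.05
```

The permutation scheme relabels whole languages (rows and columns together) rather than shuffling individual distances, which is what makes the test valid for non-independent matrix entries.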

Hypothesis 3 — The Phylogenetic Correlation

NLLB's cosine distance matrix, averaged over the 101 Swadesh-list concepts across 135 languages, will yield a statistically significant positive Mantel correlation (Spearman ρ > 0.3, p < 0.01) with the ASJP pairwise phonetic distance matrix — demonstrating that a neural translation model has implicitly learned the phylogenetic structure of world languages from translation co-occurrence statistics alone.

3D PCA Projection

Principal Component Analysis projects the language centroid vectors into three dimensions, preserving the dominant axes of variation. If the model has learned phylogenetic structure, languages from the same family should cluster together — and the relative positions of families should roughly mirror their known genealogical relationships. Toggle between raw embeddings (showing language-family clustering) and mean-centered embeddings (revealing deeper structural relationships).


Figure 5. 3D PCA projection of NLLB language centroids across 80+ languages. Points colored by language family. Proximity reflects average conceptual similarity over the full Swadesh vocabulary. Toggle between raw (language-family clustering) and mean-centered (concept-structure) views.

Hierarchical Clustering

The dendrogram below shows hierarchical clustering of languages by their NLLB embedding distances. The tree structure can be compared against the known phylogenetic tree of human languages to assess how well the model's learned geometry recapitulates established genealogical relationships.


Figure 6. Hierarchical clustering dendrogram based on NLLB embedding distances averaged over Swadesh vocabulary. Leaf labels colored by language family. Scroll horizontally to see all leaves.

Mantel Test

The Mantel test asks a simple question: do languages that sound alike also get encoded alike? For every pair of the 88 languages with both NLLB and ASJP data, we have two numbers:

  1. X-axis — ASJP Phonetic Distance: how different the two languages' core vocabulary sounds, based on Automated Similarity Judgment Program (ASJP) transcriptions. Higher = more phonetically distant.
  2. Y-axis — NLLB Embedding Distance: how far apart the two languages' Swadesh-word embeddings are inside NLLB's encoder (cosine distance, averaged over all concepts). Higher = the model represents them more differently.

If the model has implicitly learned genealogical relationships, we expect a positive correlation: closely related languages (low ASJP distance) should also be close in embedding space (low embedding distance).

Reading the scatter plot

ASJP distances are computed from real ASJP v20 word lists (Wichmann et al., 2022): for each pair of languages, we calculate the mean normalized Levenshtein distance across all shared items of the 40-item ASJP word list in phonetic transcription. This produces a continuous distribution reflecting genuine phonetic similarity.

A positive correlation across the full range confirms that NLLB's encoder has implicitly learned phonetic/genealogical relationships from translation co-occurrence statistics alone.


Figure 7a. Mantel test scatter plot with per-cluster regression lines and Spearman correlations. Each point is one language pair. Dashed lines show the linear trend within each cluster. Toggle views to explore geographic and family-level structure.


Figure 7b. Distribution of NLLB embedding distances by geographic relationship type. Same-family pairs have the tightest (lowest) embedding distances. Cross-region pairs are most distant but show the widest spread — the model distinguishes some cross-family pairs much more than others, suggesting it captures areal and typological similarities beyond strict genealogy.

Conceptual Maps by Language Family

Each point in the concept map represents a Swadesh concept positioned by its average embedding across all languages in a selected family. PCA is fitted on overall centroids so that per-family maps share the same coordinate space, enabling direct visual comparison of how different language families organize the same conceptual inventory.

If all families produce similar concept maps, the conceptual structure is truly language-universal. Systematic differences between families may reveal culture-specific or typology-specific semantic biases encoded by the model.


Figure 7c. 2D PCA concept map of Swadesh vocabulary. Select a language family or “All Families” to compare how different language families organize the conceptual space. Color by semantic category (body parts, nature, actions…) or by language family. Grey dots show the overall average positioning for reference.


6. Validation Tests

The experiments above establish that NLLB's encoder contains structure that correlates with known linguistic and cognitive universals. The following validation tests probe this structure from multiple independent angles, each testing a specific hypothesis against external benchmarks from cognitive science and historical linguistics.

6.1 Isotropy Correction Test

Does the Swadesh convergence ranking hold under isotropy (ABTT) correction? Rajaee and Pilehvar (2022) showed that multilingual transformer embeddings are severely anisotropic, meaning raw cosine similarity is dominated by frequency artifacts rather than genuine semantic proximity. If our convergence ranking changes dramatically after isotropy correction, the signal is an artifact; if it remains stable (high Spearman ρ), the structure is genuinely geometric.


Figure 8. Left: Per-concept comparison of raw vs. isotropy-corrected convergence scores. Points near the diagonal indicate stable ranking. Right: Top 20 concepts under isotropy correction with comparison bars.

6.2 Swadesh vs. Non-Swadesh Comparison

Swadesh's cultural stability hypothesis predicts that core vocabulary converges more strongly across languages than culturally specific vocabulary. We test this by comparing 101 Swadesh concepts against 60 non-Swadesh concepts (e.g., "government," "university," "airport," "democracy") — modern, abstract, and culturally contingent terms. Both sets are embedded across the same languages, and a one-sided Mann-Whitney U test evaluates whether Swadesh concepts show significantly higher mean cross-lingual similarity.
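The statistical test is a standard one-sided Mann-Whitney U. A sketch on synthetic stand-in scores (the means, spreads, and group sizes below are invented for illustration; real values come from the embedding pipeline):

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(4)
# Synthetic per-concept convergence scores, one group per vocabulary set.
swadesh     = rng.normal(loc=0.62, scale=0.05, size=101)  # core vocabulary
non_swadesh = rng.normal(loc=0.55, scale=0.05, size=60)   # culturally specific

# One-sided: are Swadesh scores stochastically greater?
stat, p = mannwhitneyu(swadesh, non_swadesh, alternative="greater")
assert p < 0.01
```

The rank-based test is preferred here because convergence scores are bounded and not obviously normal.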


Figure 9. Left: Box plot comparison of convergence scores for Swadesh (core) vs. non-Swadesh (culturally specific) vocabulary. Right: Overlapping histograms of both distributions. A significant rightward shift of the Swadesh distribution supports the cultural stability hypothesis.

6.3 CLICS³ Colexification Test

Rzymski, Tresoldi et al.'s CLICS³ (2019) database identifies concept pairs that are lexified by a single word across many unrelated languages — revealing universal conceptual adjacencies. We extract colexification pairs directly from the CLICS³ SQLite database, filtering to pairs attested in ≥3 language families. If NLLB assigns higher cosine similarity to frequently colexified pairs (e.g., "foot"–"leg" at 52 families, "hand"–"arm" at 48 families) than to never-colexified pairs, this bridges typological linguistics and neural representation analysis in a way no prior paper has attempted.
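The filtering step reduces to a simple SQL query. CLICS³ does ship as a SQLite database, but the table and column names below are hypothetical stand-ins built in memory to illustrate the ≥3-family threshold; the "foot"–"leg" and "hand"–"arm" counts come from the text above, while the third row is invented as a below-threshold example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE colexifications
               (concept_a TEXT, concept_b TEXT, family_count INTEGER)""")
con.executemany("INSERT INTO colexifications VALUES (?, ?, ?)",
                [("foot", "leg", 52),      # counts from the text above
                 ("hand", "arm", 48),
                 ("smoke", "cloud", 2)])   # invented below-threshold pair
# Keep only pairs attested in at least 3 language families.
pairs = con.execute("""SELECT concept_a, concept_b FROM colexifications
                       WHERE family_count >= 3
                       ORDER BY family_count DESC""").fetchall()
assert pairs == [("foot", "leg"), ("hand", "arm")]
```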


Figure 10. Distribution of cross-lingual cosine similarity for colexified vs. non-colexified concept pairs. Higher similarity for colexified pairs indicates the model has internalized universal conceptual associations.

6.4 Conceptual Store Metric

The conceptual store hypothesis (Correia et al., 2014) predicts that after removing language-identity information, the ratio of between-concept distance to within-concept distance should increase substantially. We compute this ratio on raw embeddings and after per-language mean-centering. The prediction from the executive summary: at least a 2× improvement factor after centering.

Hypothesis 2 — The Mean-Centering Test

Subtracting per-language mean vectors before PCA will reorganize the embedding space from language-family clustering to concept-meaning clustering. The ratio of between-concept distance to within-concept distance will increase by at least a factor of two after mean-centering — the first geometric operationalization of the conceptual store hypothesis in a neural translation model.


6.5 Semantic Offset Invariance

This experiment tests a question that goes beyond first-order convergence (do translations of the same concept cluster together?) to second-order structure: are relationships between concepts preserved across languages? The hypothesis is that semantically fundamental relationships — man→woman, big→small, I→we, eat→drink — produce offset vectors that are translationally invariant: the direction from "man" to "woman" in English embedding space should be parallel to the direction from "homme" to "femme" in French embedding space.

Mean consistency close to 1.0 indicates high cross-lingual invariance — the relationship is a universal property of the conceptual space. The per-family breakdown is analytically important: invariance that holds across unrelated families (Indo-European, Sino-Tibetan, Niger-Congo) is strong evidence for language-universal conceptual structure, while invariance restricted to related families may reflect shared etymology. No prior paper has tested semantic offset invariance across 135 languages in a translation model. The closest precedent is Mikolov et al.'s (2013) word2vec analogy work, which operated within a single language.
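The consistency metric is the mean pairwise cosine similarity between per-language offset vectors. A sketch on synthetic data, where a shared direction stands in for a universal relationship (the function name is ours):

```python
import numpy as np

def offset_consistency(A, B):
    """Cross-lingual consistency of a semantic offset (e.g. man -> woman).
    A, B: (n_languages, d) embeddings of the two concepts, rows aligned.
    Returns mean pairwise cosine similarity between per-language offsets."""
    off = B - A
    off = off / np.linalg.norm(off, axis=1, keepdims=True)
    iu = np.triu_indices(len(off), k=1)
    return (off @ off.T)[iu].mean()

rng = np.random.default_rng(5)
n, d = 12, 16
shared_dir = rng.normal(size=d)
A = rng.normal(size=(n, d))
B_inv  = A + shared_dir + rng.normal(size=(n, d)) * 0.1  # shared offset
B_rand = A + rng.normal(size=(n, d))                     # unrelated offsets
assert offset_consistency(A, B_inv) > offset_consistency(A, B_rand)
```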


Figure 11a. Left: Joint PCA of concept embeddings for the 5 most translationally invariant pairs across 135 languages. Faint arrows show individual language vectors; bold arrows show cross-lingual centroids. Concept clusters are visible as grey clouds. Right: The same offset vectors translated to a common origin, revealing that each semantic relationship produces a tight, language-universal direction despite surface-form diversity.


Figure 11b. Cross-lingual consistency of all 22 semantic offset pairs. Top-5 pairs are color-coded to match the PCA panels above. Error bars show standard deviation across languages. High consistency (>0.8) indicates a language-universal relationship; lower values suggest culture- or family-specific encoding.


Per-Family Invariance Heatmap

The heatmap below decomposes offset invariance by language family. Each cell shows the mean consistency of a semantic relationship within a specific language family. Uniform warmth across families indicates the relationship is truly universal; cold spots in specific families reveal where the model's encoding diverges from the cross-lingual norm.


Figure 12. Per-family breakdown of semantic offset invariance. Rows are concept pairs; columns are language families. Warmer colors indicate higher consistency. Universally warm rows (e.g., man→woman) represent relationships that are invariant across all language families.

6.6 Berlin & Kay Color Circle

Berlin and Kay (1969) established that human color categorization follows universal constraints: languages add color terms in a predictable order, and perceptually adjacent colors (red/orange, blue/green) are categorized similarly across cultures. If NLLB's embedding space captures these universals, the arrangement of color term centroids (averaged across 30+ languages) should recover the perceptual color circle — red adjacent to orange and purple, green adjacent to yellow and blue — regardless of which languages contributed the terms.


Figure 13. 2D PCA projection of Berlin & Kay color terms across 140+ languages. Each small dot is one language’s embedding of a color term; large dots show the cross-lingual mean. Transparency reveals where languages agree (dense clusters) vs. diverge (scattered points). Toggle overlays to explore family-level structure and the distribution of color term representations.

6.6b Layer-wise Color Circle Evolution

The color circle in Figure 13 uses final-layer (layer 11) representations. But how does this structure emerge across the encoder stack? The animation below shows the same color term embeddings extracted from each of the 12 encoder layers, all projected into the same PCA coordinate space (fit on the final-layer centroids). Use the slider to scrub through layers manually, or press Animate to watch the representational space self-organize in real time — scattered noise in early layers coalescing into the tight, perceptually ordered color circle by the final layer.
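Fitting the basis once on the final layer and reusing it for every frame is what makes the animation comparable across layers. A sketch with random stand-in activations (the helper name and dimensions are illustrative):

```python
import numpy as np

def project_all_layers(layers, k=2):
    """Fit a PCA basis on the final layer, then project every layer's
    embeddings into that shared coordinate space."""
    final = layers[-1]
    mu = final.mean(axis=0)
    _, _, Vt = np.linalg.svd(final - mu, full_matrices=False)
    basis = Vt[:k]                                 # (k, d) fixed axes
    return [(L - mu) @ basis.T for L in layers]

rng = np.random.default_rng(7)
layers = [rng.normal(size=(11, 64)) for _ in range(12)]  # 11 color terms/layer
frames = project_all_layers(layers)
assert len(frames) == 12 and frames[0].shape == (11, 2)
```

Projecting each layer through its own PCA would instead let the axes rotate arbitrarily between frames, making the apparent "self-organization" an artifact of the visualization.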


Figure 13b. Layer-wise evolution of Berlin & Kay color term representations. All layers are projected into the same PCA space (fit on final-layer centroids). Early layers show dispersed, unstructured points; by the final layer, color terms self-organize into the perceptual color circle. The animation reveals a phase transition around layers 5–7 where categorical structure snaps into place — mirroring the conceptual store metric transition in Figure 15a.

6.7 Carrier Sentence Robustness

All preceding analyses embed target words inside a fixed carrier sentence. Does this carrier drive our convergence findings, or would bare words produce the same ranking? We repeat the full embedding extraction and convergence analysis using decontextualized embeddings — target words embedded in isolation, without any surrounding context.

If the carrier sentence is merely providing a consistent frame (as intended) rather than imposing structure, the contextualized and decontextualized convergence rankings should be highly correlated. A low correlation would suggest that the carrier sentence, not semantics, drives the observed patterns.


Figure 14. Left: Scatter of contextualized vs. decontextualized convergence scores. Points near the identity line indicate concepts whose convergence is insensitive to the carrier sentence. Right: Top-20 concepts under each condition, showing minimal reranking of the highest-convergence items.

6.8 Layer-wise Emergence of Semantic Structure

All preceding analyses use representations from the final (12th) encoder layer. But where in the encoder stack does cross-lingual semantic structure emerge? We repeat the convergence analysis at each of the 12 encoder layers on a diverse 39-language subset.

If semantic convergence builds gradually across layers — with surface-level features dominating early layers and abstract semantics dominating later ones — this would parallel the "NLP pipeline" effect documented by Tenney et al. (2019) and provide a developmental lens on the conceptual store hypothesis.


Figure 15a. Left: Mean Swadesh convergence by encoder layer. Convergence increases monotonically with a sharp rise around layer 1. Right: Conceptual Store Metric (raw and mean-centered) by layer, showing a phase transition at layer 6.


Figure 15b. Per-concept convergence heatmap across encoder layers. Rows are Swadesh concepts (sorted by final-layer convergence); columns are encoder layers 0–11. Concrete, perceptually grounded concepts (top) achieve high convergence earlier than abstract or polysemous concepts (bottom).

6.9 Controlled Non-Swadesh Comparison

The original Swadesh vs. non-Swadesh comparison (Section 6.2) is confounded by loanword bias: modern terms like telephone, democracy, and hotel share surface forms across dozens of languages due to cultural borrowing, inflating their apparent convergence. To obtain a properly controlled comparison, we constructed a second non-Swadesh set of frequency-matched, non-loanword concrete nouns with cross-linguistic orthographic diversity comparable to the Swadesh set.

Under this controlled baseline, Swadesh concepts exhibit convergence commensurate with or exceeding that of the matched controls — consistent with the hypothesis that culturally stable, universally attested concepts develop robust cross-lingual representations despite maximal surface-form diversity.


Figure 16. Controlled Swadesh vs. non-Swadesh comparison using frequency-matched, non-loanword concrete nouns. Left: Box plot. Right: Overlapping histograms. Unlike the loanword-confounded comparison, this controlled baseline shows Swadesh concepts converging at least as strongly as matched controls.


7. Discussion

Taken together, these results constitute a coherent empirical story: NLLB-200's encoder contains a language-neutral conceptual store, its geometry reflects universal conceptual structure, and its representation of core vocabulary correlates with cross-cultural cognitive universals. The BIA+ two-system architecture maps directly onto NLLB's encoder-decoder split — the encoder functions as the language-nonselective "word identification system," while the forced BOS token mechanism functions as the "task-decision system" that imposes language constraints.

The mean-centering result is perhaps the most revelatory: subtracting per-language mean vectors reorganizes the PCA from language-family clusters to concept-meaning clusters, providing the first direct geometric test of whether a neural translation model develops something functionally analogous to the anterior temporal lobe's conceptual hub. The improvement in between/within concept distance ratio quantifies how much of the embedding space is dedicated to language-identity information versus genuine conceptual content.

The phylogenetic correlation, if confirmed at statistically significant levels, would be a landmark finding: a neural machine translation model has implicitly learned the genealogical tree of human language from translation co-occurrence statistics alone, without any training signal about linguistic phylogeny. This result speaks simultaneously to the NLP community (the geometry encodes more than task-relevant features), to cognitive scientists (computational models can develop language-network structure paralleling the human brain), and to historical linguists (neural embedding distances may complement traditional phonetic distance measures).

Limitations & Cautions

Layer-wise Developmental Parallels

The layer-wise trajectory analysis adds a developmental dimension to these structural parallels. Semantic convergence increases monotonically across the 12 encoder layers, with a sharp rise around layer 1 and the Conceptual Store Metric exhibiting a phase transition at layer 6. This mirrors hierarchical processing in the human language network, where primary auditory cortex encodes acoustic features, posterior temporal regions encode phonological and lexical information, and the anterior temporal lobe hub integrates meaning across modalities and languages. Concrete, perceptually grounded concepts achieve high cross-lingual convergence earlier in the encoder stack than abstract or polysemous concepts, suggesting a hierarchy of representational abstraction.

Future Directions

Several promising directions emerge from this work: per-head cross-attention decomposition (finding universal semantic alignment heads), resource-level asymmetry testing (the RHM prediction that L1→L2 translation proceeds via conceptual mediation while L2→L1 can bypass it), and scaling analyses across model sizes (from 600M to the full 3.3B NLLB-200). The core empirical story — Swadesh convergence, variance decomposition, phylogenetic correlation, colexification, offset invariance, layer-wise emergence, and carrier-sentence robustness — provides a comprehensive probe of the encoder's conceptual geometry.


Interactive Concept Explorer

Explore any concept's cross-lingual geometry in real time. Enter a word, select target languages, and see how NLLB-200 represents it across language families.