Free Language Embeddings: 66.5% on Google Analogies with 1/3 the Data

2026-03-20 · AI/ML
word-embeddings, word2vec, NLP, free-software, DFSG, machine-learning, embeddings, open-source

Free Language Embeddings (V34)

300-dimensional word vectors trained from scratch on ~2B tokens of freely licensed text using a single RTX 3090.

66.5% accuracy on the Google analogy test set: 5.5 points above the original word2vec (61% on 6B tokens), using about one third of the training data.
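
For readers unfamiliar with the benchmark, analogy accuracy is commonly scored with vector arithmetic: for a question "a is to b as c is to ?", the predicted word is the one whose vector is closest (by cosine similarity) to b - a + c, with the three question words excluded. The sketch below illustrates that scoring on made-up 2-dimensional toy vectors. It is not the project's evaluation script, and the names `solve_analogy`, `analogy_accuracy`, `toy_vectors`, and `toy_questions` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' by maximizing cos(d, b - a + c).

    The three question words are excluded from the candidates,
    as is conventional for this benchmark.
    """
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


def analogy_accuracy(vectors, questions):
    """Fraction of (a, b, c, expected) questions answered correctly."""
    correct = sum(solve_analogy(vectors, a, b, c) == expected
                  for a, b, c, expected in questions)
    return correct / len(questions)


# Toy data (assumption: invented for illustration, not real embeddings).
toy_vectors = {
    "man": [1.0, 0.0],
    "woman": [1.0, 1.0],
    "king": [3.0, 0.0],
    "queen": [3.0, 1.0],
    "apple": [0.0, 3.0],
}
toy_questions = [
    ("man", "woman", "king", "queen"),
    ("king", "queen", "man", "woman"),
]

print(analogy_accuracy(toy_vectors, toy_questions))
```

With real 300-dimensional vectors, the same loop would run over the full question set, and out-of-vocabulary questions would need an explicit policy (skip or count as wrong), which affects the reported percentage.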

Interactive Demos

Explore the embeddings yourself:

Citation

@misc{hamner2026fle,
title={Free Language Embeddings: Dynamic Masking Word2Vec on DFSG-Compliant Data},
author={David Hamner},
year={2026},
url={https://github.com/ruapotato/Free-Language-Embeddings}
}

GPL-3.0 — Built by David Hamner.
