Free Language Embeddings: 66.5% on Google Analogies with 1/3 the Data
300-dimensional word vectors trained from scratch on ~2B tokens of freely licensed text. Beats the original word2vec by 5.5 percentage points on Google Analogies using 1/3 the data. Every dataset DFSG-compliant, every weight reproducible.
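
For readers unfamiliar with the benchmark, here is a minimal sketch of how an analogy accuracy figure like the one above is commonly computed: for each question "a is to b as c is to ?", take the word whose vector is closest by cosine similarity to b - a + c, excluding the three question words. The function names and the toy 3-dimensional vectors are hypothetical and made up for illustration. They are not this project's weights or its evaluation script, which may differ in details such as vocabulary restrictions.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def solve_analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' with the vector-offset method.

    Returns the word whose unit vector has the highest cosine similarity
    to (b - a + c), skipping the three question words themselves.
    """
    unit = {w: normalize(v) for w, v in vectors.items()}
    target = normalize(
        [vb - va + vc for va, vb, vc in zip(unit[a], unit[b], unit[c])]
    )
    best_word, best_score = None, -math.inf
    for word, vec in unit.items():
        if word in (a, b, c):
            continue
        score = sum(x * y for x, y in zip(vec, target))
        if score > best_score:
            best_word, best_score = word, score
    return best_word


def analogy_accuracy(vectors, questions):
    """Fraction of (a, b, c, expected) questions answered correctly."""
    correct = sum(
        solve_analogy(vectors, a, b, c) == expected
        for a, b, c, expected in questions
    )
    return correct / len(questions)


# Toy vectors (hypothetical, NOT real embeddings): the axes loosely stand
# for "royalty", "gender", and "fruit".
toy_vectors = {
    "man": [0.1, -1.0, 0.0],
    "woman": [0.1, 1.0, 0.0],
    "king": [1.0, -1.0, 0.0],
    "queen": [1.0, 1.0, 0.0],
    "apple": [0.0, 0.0, 1.0],
}
toy_questions = [
    ("man", "woman", "king", "queen"),
    ("king", "queen", "man", "woman"),
]
accuracy = analogy_accuracy(toy_vectors, toy_questions)
print(f"analogy accuracy: {accuracy:.1%}")
```

On the real benchmark the same loop would run over the full question set with the trained 300-dimensional vectors in place of the toy dictionary.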