TODO: Organize these somehow, add one-line blurbs

Organize by usage? (classification, recommendation etc.)

Collections of Collections

Categorization Data

Recommendation Data

Multilingual Data

  • 308,000 subtitle files covering about 18,900 movies in 59 languages (July 2006 numbers). This is a curated collection of subtitles from an aggregation site. The original site has since grown to 1.6 million subtitle files.
  • Statistical Machine Translation - a site devoted to all things language translation. Includes multilingual corpora of European and Canadian legal tomes.



General Resources