Ellogon
What is Ellogon?
Ellogon is a multilingual, cross-platform, general-purpose language engineering environment, developed to support both researchers in computational linguistics and companies that produce and deliver language engineering systems. As a language engineering platform, Ellogon offers an extensive set of facilities, including tools for processing and visualising textual, HTML and XML data and the associated linguistic information; support for lexical resources, such as creating and embedding lexicons; and tools for creating annotated corpora, accessing databases, comparing annotated data, and transforming linguistic information into vectors for use with various machine learning algorithms.

Europarl Parallel Corpus
The Europarl parallel corpus is extracted from the proceedings of the European Parliament. It includes versions in 11 European languages: Romance (French, Italian, Spanish, Portuguese), Germanic (English, Dutch, German, Danish, Swedish), Greek and Finnish.

The goal of the extraction and processing was to generate sentence-aligned text for statistical machine translation systems. For this purpose, we extracted matching items and labeled them with the corresponding document IDs. Using a preprocessor, we identified sentence boundaries. We then sentence-aligned the data using a tool based on the Church and Gale algorithm.
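The length-based alignment step can be illustrated with a small dynamic-programming sketch in the spirit of Gale and Church. It is not the tool used to build Europarl; it is a minimal illustration assuming that sentence lengths (in characters) are already available. It uses the bead priors and the parameters c = 1 and s² = 6.8 commonly quoted from Gale & Church (1993), and scales the variance by the mean of the two lengths, a variant some implementations use so that 1-0 and 0-1 beads never divide by zero. The names `align`, `length_cost` and `PRIORS` are hypothetical.

```python
import math

# Prior probability of each bead type (source sentences, target sentences),
# as commonly quoted from Gale & Church (1993).
PRIORS = {
    (1, 1): 0.89,
    (1, 0): 0.0099,
    (0, 1): 0.0099,
    (2, 1): 0.089,
    (1, 2): 0.089,
    (2, 2): 0.011,
}
C = 1.0   # expected target characters per source character
S2 = 6.8  # variance of the length ratio


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def length_cost(l1, l2):
    """-log P(length difference) under a two-tailed normal test."""
    if l1 == 0 and l2 == 0:
        return 0.0
    mean = (l1 + l2 / C) / 2.0  # averaged length: never zero for 1-0 / 0-1 beads
    delta = (l2 - l1 * C) / math.sqrt(mean * S2)
    p = 2.0 * (1.0 - norm_cdf(abs(delta)))
    return -math.log(max(p, 1e-300))  # clamp to avoid log(0) on extreme mismatches


def align(src_lens, tgt_lens):
    """Return (source indices, target indices) beads in document order."""
    n, m = len(src_lens), len(tgt_lens)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order visits each cell only after every cell that can reach it.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), prior in PRIORS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = (cost[i][j] - math.log(prior)
                     + length_cost(sum(src_lens[i:ni]), sum(tgt_lens[j:nj])))
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Follow back-pointers from the end to recover the bead sequence.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads
```

For example, `align([10, 10], [20])` returns `[([0, 1], [0])]`: the two short source sentences are paired with the single target sentence as one 2-1 bead, because matching 20 characters to 20 costs far less than a 1-1 bead with mismatched lengths followed by an unlikely 1-0 bead.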

Forvo: the pronunciation guide.
All the words in the world pronounced by native speakers.
Forvo is the largest pronunciation guide in the world. Ever wondered how a word is pronounced? Ask for that word or name, and another user will pronounce it for you. You can also help others by recording your pronunciations in your own language.

semanticvectors
semanticvectors builds Semantic Vector indexes by applying a Random Projection algorithm to term-document matrices created using Apache Lucene. The package was created as part of a project by the University of Pittsburgh Office of Technology Management to explore the potential for automatically matching related concepts in the technology management domain, e.g., mapping new technologies to potentially interested licensors.
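The random-projection idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the semanticvectors API (the package itself is written in Java and builds its matrices with Lucene). Each document is assigned a sparse random "elemental" vector with a handful of +1/-1 entries, and each term vector is the count-weighted sum of the elemental vectors of the documents containing that term; this is equivalent to multiplying the term-document matrix by a sparse random matrix. Whitespace splitting stands in for Lucene's tokenizer, and the function names and parameters (`dim`, `nonzero`, `seed`) are hypothetical.

```python
import math
import random
from collections import Counter


def elemental_vector(dim, nonzero, rng):
    """Sparse ternary random vector: `nonzero` positions set to +1 or -1."""
    vec = [0.0] * dim
    for idx in rng.sample(range(dim), nonzero):
        vec[idx] = rng.choice((1.0, -1.0))
    return vec


def term_vectors(docs, dim=200, nonzero=10, seed=0):
    """Project each term's row of the term-document matrix into `dim` dimensions."""
    rng = random.Random(seed)
    doc_vecs = [elemental_vector(dim, nonzero, rng) for _ in docs]
    terms = {}
    for doc, dvec in zip(docs, doc_vecs):
        # Term frequencies for this document (naive whitespace tokenization).
        for term, count in Counter(doc.lower().split()).items():
            tvec = terms.setdefault(term, [0.0] * dim)
            for k, v in enumerate(dvec):
                if v:
                    tvec[k] += count * v
    return terms


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


if __name__ == "__main__":
    docs = [
        "patent license technology",
        "technology license agreement",
        "protein folding enzyme",
        "enzyme protein assay",
    ]
    vecs = term_vectors(docs)
    print(cosine(vecs["license"], vecs["technology"]))  # terms from the same documents
    print(cosine(vecs["license"], vecs["enzyme"]))      # terms from disjoint documents
```

Terms that occur in the same documents end up with similar vectors, while terms from unrelated documents are nearly orthogonal, because independent sparse random vectors in a high-dimensional space rarely overlap.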

Confusing Words
Confusing Words is a collection of 3210 words that are troublesome to readers and writers. Words are grouped according to the way they are most often confused or misused.