Magical Letter Page
Category:
Magical Letter Page - Linguistic Iconism, Sound Symbolism, Phonosemantics
Sound Symbolism, Phonosemantics, Phonetic Symbolism, Mimologics, Iconism, Cratylus, Ideophones, Synaesthesia, The Alphabet, The Word

Ellogon
Category:
What is Ellogon?
Ellogon is a multilingual, cross-platform, general-purpose language engineering environment, developed to aid both researchers in computational linguistics and companies that produce and deliver language engineering systems. As a language engineering platform, Ellogon offers an extensive set of facilities, including tools for processing and visualising textual/HTML/XML data and associated linguistic information, support for lexical resources (such as creating and embedding lexicons), and tools for creating annotated corpora, accessing databases, comparing annotated data, and transforming linguistic information into vectors for use with various machine learning algorithms.

Europarl Parallel Corpus
Category:
Europarl Parallel Corpus
The Europarl parallel corpus is extracted from the proceedings of the European Parliament. It includes versions in 11 European languages: Romance (French, Italian, Spanish, Portuguese), Germanic (English, Dutch, German, Danish, Swedish), Greek, and Finnish.

The goal of the extraction and processing was to generate sentence-aligned text for statistical machine translation systems. For this purpose we extracted matching items and labeled them with corresponding document IDs. Using a preprocessor, we identified sentence boundaries. We then sentence-aligned the data using a tool based on the Gale and Church algorithm.
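
As a rough illustration of length-based sentence alignment, here is a minimal Python sketch. It is not the tool used to build Europarl and it does not implement the probabilistic cost of the Gale and Church algorithm. It only keeps the core idea: a dynamic program that pairs sentences whose character lengths are similar. The function name `align`, the set of allowed groupings, and the penalty values are all assumptions chosen for this example.

```python
def align(src, tgt):
    """Pair up sentences from two lists by comparing character lengths.

    Simplified sketch: the cost of a grouping is the absolute difference in
    character length plus a fixed, hand-picked penalty (an assumption here,
    not the statistical cost used by real aligners).
    """
    # (source sentences consumed, target sentences consumed, penalty)
    beads = [(1, 1, 0), (1, 0, 50), (0, 1, 50), (2, 1, 10), (1, 2, 10)]
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0

    # Fill the table in row-major order; every transition moves forward.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj, penalty in beads:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                len_s = sum(len(s) for s in src[i:ni])
                len_t = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + abs(len_s - len_t) + penalty
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)

    # Walk the back-pointers from the end to recover the aligned groups.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    source = ["aaaa", "bbbbbbbbbbbb"]
    target = ["aaaa", "bbbbbb", "bbbbbb"]
    for s, t in align(source, target):
        print(s, "<->", t)
```

Each returned pair holds a group of source sentences and the group of target sentences matched to it, so one long sentence can be aligned with two shorter ones.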

Porter Stemming Algorithm
Category:
Porter Stemming Algorithm
The Porter stemming algorithm (or 'Porter stemmer') is a process for removing the commoner morphological and inflexional endings from words in English. Its main use is as part of a term normalisation process that is usually done when setting up Information Retrieval systems.
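
To give a feel for the kind of suffix stripping involved, here is a small Python sketch of only the first plural-handling step (Step 1a) of the algorithm. The full stemmer has several further steps and conditions that are not shown, and the function name `porter_step_1a` is chosen for this example.

```python
def porter_step_1a(word):
    """Apply only Step 1a of the Porter stemmer (plural endings).

    Rules, tried in order, longest suffix first:
        SSES -> SS
        IES  -> I
        SS   -> SS
        S    -> (removed)
    """
    if word.endswith("sses"):
        return word[:-2]   # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # ponies -> poni
    if word.endswith("ss"):
        return word        # caress -> caress
    if word.endswith("s"):
        return word[:-1]   # cats -> cat
    return word


if __name__ == "__main__":
    for w in ["caresses", "ponies", "ties", "caress", "cats"]:
        print(w, "->", porter_step_1a(w))
```

Stems such as "poni" are not meant to be dictionary words. For term normalisation it only matters that related forms map to the same string.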

OpenCyc.org
Category:
OpenCyc.org
OpenCyc is the open source version of the Cyc technology, the world's largest and most complete general knowledge base and commonsense reasoning engine. OpenCyc can be used as the basis of a wide variety of intelligent applications such as:
* rapid development of an ontology in a vertical area
* email prioritizing, routing, summarization, and annotating
* expert systems
* games

Natural Language Toolkit
Category:
Natural Language Toolkit
Open source Python modules, linguistic data, and documentation for research and development in natural language processing and text analytics, with distributions for Windows, Mac OS X, and Linux.

History of the English Language
Category:
History of the English Language
The English language, like all languages, traces its ultimate ancestry to a time predating the written word. Since history relies heavily on written documents as records of the past, it follows logically that the roots of language must be prehistoric. This fact makes it much more difficult to pin down the development of English's earliest linguistic ancestors. However, thanks to some stunning work by philologists and linguists, we can actually trace the history of languages in Europe far into the remote past--possibly as far back as 5,000 BCE. This background will lead to the growth of what we call "Anglo-Saxon English" in the fifth century CE, which in turn will become Middle English after the Norman Invasion of 1066, and then give us Modern English in the Renaissance.

simon listens: Home
Category:
simon listens: Home
# ... is an open-source speech recognition program that replaces the mouse and keyboard.
# ... is designed to be very flexible and allows customization for any application where speech recognition is needed.
# ... is a potential European "e-inclusion" project because of its language-independent programming.
# ... is being developed to give physically disabled people the ability to chat, write e-mails, surf the internet, do internet banking, and much more.

Forvo: the pronunciation guide.
Category:
Forvo: the pronunciation guide. All the words in the world pronounced by native speakers
Forvo is the largest pronunciation guide in the world. Ever wondered how a word is pronounced? Ask for that word or name, and another user will pronounce it for you. You can also help others by recording your pronunciations in your own language.

Poliqarp
Category:
Poliqarp | freshmeat.net
Poliqarp is a universal suite of utilities for processing large corpora. It includes a concordancer that works on binary corpora compiled for efficient searching and a corpus builder. It supports positional tagsets, ambiguities in the texts, and Unicode.

Tool for natural language processing