Grabbing parallel corpora from the web

Bibliographic Details
Main Author:	Almeida, J. J. (author)
Other Authors:	Simões, Alberto (author), Castro, José Alves de (author)
Format:	article
Language:	eng
Published:	2002
Subjects:	Parallel corpora (Corpora paralelos); Web-mining
Online Access:	http://hdl.handle.net/1822/599
Country:	Portugal
Oai:	oai:repositorium.sdum.uminho.pt:1822/599

Description
Summary:	Multilingual resources are useful for linguistic studies, translation, and many other tasks. Unfortunately, these resources are difficult to obtain and organize. In this document we describe a set of tools designed to help mine bilingual resources from the web, from a specific site, from a file system, from a list of URLs, or from a translation memory. As a design goal, we intend to build tools that can be used both cooperatively (in a pipeline) and independently.
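The design goal of tools that work both in a pipeline and on their own can be illustrated with a minimal sketch of one stage: it reads a list of URLs and emits candidate bilingual pairs. The pairing heuristic (two URLs that differ only in a language marker such as `pt` or `en` between separators) is an assumption made for illustration, not the method described in the article, and the names `candidate_pairs` and `main` are hypothetical. The same function can be imported and called directly, or the script can sit in a shell pipeline, reading URLs and writing tab-separated pairs.

```python
import fileinput
import re
import sys
from typing import Iterable, Iterator, TextIO, Tuple


def candidate_pairs(urls: Iterable[str], lang_a: str = "pt",
                    lang_b: str = "en") -> Iterator[Tuple[str, str]]:
    """Yield (url_a, url_b) where url_b equals url_a with the lang_a marker
    replaced by lang_b. Assumption: languages are marked in the URL as a
    token delimited by '/', '_', '.', '-' or '=' (hypothetical heuristic)."""
    url_set = {u.strip() for u in urls if u.strip()}
    # Match lang_a only when preceded by a separator and followed by a
    # separator or the end of the URL, so "opt" or "html" do not match.
    marker = re.compile(r"(?<=[/_.\-=])" + re.escape(lang_a) + r"(?=[/_.\-]|$)")
    for url in sorted(url_set):
        if marker.search(url):
            twin = marker.sub(lang_b, url)
            if twin in url_set:
                yield url, twin


def main(lines: Iterable[str], out: TextIO = sys.stdout) -> None:
    """Pipeline mode: URLs in (one per line), tab-separated pairs out."""
    for a, b in candidate_pairs(lines):
        print(f"{a}\t{b}", file=out)


if __name__ == "__main__" and len(sys.argv) > 1:
    # fileinput reads the named files; the name "-" stands for stdin.
    main(fileinput.input())
```

Used as a pipeline stage, this could be run as `cat urls.txt | python pairs.py - > pairs.tsv` (file names hypothetical); used independently, another program can simply import `candidate_pairs` and consume its pairs. Keeping each stage on plain line-oriented text is what lets it be composed with other tools or run on its own.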
