Grabbing parallel corpora from the web

Bibliographic Details
Main Author: Almeida, J. J. (author)
Other Authors: Simões, Alberto (author), Castro, José Alves de (author)
Format: article
Language: eng
Published: 2002
Subjects:
Online Access: http://hdl.handle.net/1822/599
Country: Portugal
Oai: oai:repositorium.sdum.uminho.pt:1822/599
Description
Summary: Multilingual resources are useful for linguistic studies, translation, and many other tasks. Unfortunately, these resources are difficult to obtain and organize. In this document we describe a set of tools designed to help in the task of mining bilingual resources from the web, from a specific site, from a file system, from a list of URLs, or from a translation memory. As a design goal, we intend to build tools that can be used both cooperatively (in a pipeline) and independently.