A large Portuguese corpus on-line: cleaning and preprocessing
Main author: | |
---|---|
Other authors: | |
Format: | conferenceObject |
Language: | eng |
Published in: | 2019 |
Subjects: | |
Full text: | http://hdl.handle.net/10451/37430 |
Country: | Portugal |
OAI: | oai:repositorio.ul.pt:10451/37430 |
Abstract: | We present a newly available on-line resource for Portuguese, a corpus of 310 million words, a new version of the Reference Corpus of Contemporary Portuguese, now searchable via a user-friendly web interface. Here we report on work carried out on the corpus prior to its publication on-line. We focus on the processes and tools involved in the cleaning, preparation and annotation that make the corpus suitable for linguistic inquiries. |