- Timestamp: Jan 17, 2017, 12:52:15 PM
- Author: xbaisa
- Comment: --
| v28 | v29 | |
| 6 | 6 | The system includes selected corpus processing tools and the following HaBiT corpora: |
| 7 | 7 | |
| 8 | | * [http://corpora.fi.muni.cz/habit/run.cgi/first?corpname=wic&reload=1 Amharic WIC corpus], 200 thousand tokens |
| | 8 | * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=wic&reload=1 Amharic WIC corpus], 200 thousand tokens |
| 9 | 9 | |
| 10 | 10 | Amharic WIC corpus (News from Walta Information Center), manually tagged. |
| 11 | 11 | |
| 12 | | * [http://corpora.fi.muni.cz/habit/run.cgi/first?corpname=amwac16&reload=1 Amharic WaC corpus], 17 million tokens |
| | 12 | * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=amwac16&reload=1 Amharic WaC corpus], 17 million tokens |
| 13 | 13 | |
| 14 | 14 | Amharic web corpus. Crawled by !SpiderLing in August 2013 and October 2015. Encoded in UTF-8, cleaned, deduplicated. Automatically tagged by !TreeTagger trained on the Amharic WIC corpus. |
| 15 | 15 | |
| 16 | | * [http://corpora.fi.muni.cz/habit/run.cgi/first?corpname=or_spoken Oromo spoken corpus], 7,500 tokens. |
| | 16 | * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=or_spoken Oromo spoken corpus], 7,500 tokens. |
| 17 | 17 | |
| 18 | 18 | Oromo spoken corpus containing 1205 utterances. Built by Text Laboratory, University of Oslo. |
| 19 | | * [http://corpora.fi.muni.cz/habit/run.cgi/first?corpname=orwac16 Oromo WaC corpus], 5.1 million tokens. |
| | 19 | * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=orwac16 Oromo WaC corpus], 5.1 million tokens. |
| 20 | 20 | |
| 21 | 21 | Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated. |
| 22 | | * [http://corpora.fi.muni.cz/habit/run.cgi/first?corpname=sowac16 Somali WaC corpus], 80 million tokens. |
| | 22 | * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=sowac16 Somali WaC corpus], 80 million tokens. |
| 23 | 23 | |
| 24 | 24 | Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated. |
| 25 | | * [http://corpora.fi.muni.cz/habit/run.cgi/first?corpname=tiwac16 Tigrinya WaC corpus], 2.5 million tokens. |
| | 25 | * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=tiwac16 Tigrinya WaC corpus], 2.5 million tokens. |
| 26 | 26 | |
| 27 | 27 | Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated. |
| | 28 | |
| | 29 | * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=czech_norwegian_opus__norwegian Czech-Norwegian parallel corpus], 4 million aligned segments. |
| | 30 | |
| | 31 | Czech-Norwegian parallel corpus from subtitles, OpenSubtitles2016 subcorpus of OPUS2, filtered for Czech and Norwegian. |
| 28 | 32 | |
| 29 | 33 | == Publications == |