FrequencyIndexCreator (LanguageTool 6.4-SNAPSHOT API)

java.lang.Object
- org.languagetool.dev.bigdata.FrequencyIndexCreator

```
public class FrequencyIndexCreator
extends Object
```
Indexes *.gz files from Google's ngram corpus into a Lucene index ('lucene' mode) or aggregates them into plain text files ('text' mode). Index time (1 doc = 1 ngram and its count; the per-year counts are aggregated into one number): 130µs/doc (on either an external USB hard disk or an internal SSD), i.e. about 7,700 docs/s.
The bottleneck is not Lucene but the aggregation work, or simply the large amount of data: indexing every line takes only 3µs/doc, i.e. Lucene can index about 333,000 docs/s.
Also see https://dev.languagetool.org/finding-errors-using-n-gram-data.
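
The aggregation step described above (collapsing the per-year counts of an ngram into one number) can be sketched as follows. This is a minimal illustration, not the code used by FrequencyIndexCreator: the class name `NgramAggregator`, the method `aggregate`, and the assumed line layout (ngram, year, and match count separated by tabs, with any further columns ignored) are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of LanguageTool.
class NgramAggregator {

  // Sums the per-year match counts for each ngram.
  // Assumed line layout: ngram TAB year TAB matchCount [TAB further columns]
  static Map<String, Long> aggregate(Iterable<String> lines) {
    Map<String, Long> totals = new LinkedHashMap<>();
    for (String line : lines) {
      String[] parts = line.split("\t");
      if (parts.length < 3) {
        continue;  // skip lines that do not match the assumed layout
      }
      long matchCount = Long.parseLong(parts[2]);
      // The year in parts[1] is deliberately ignored: all years collapse into one total.
      totals.merge(parts[0], matchCount, Long::sum);
    }
    return totals;
  }

  public static void main(String[] args) {
    List<String> lines = List.of(
        "the house\t1999\t10\t3",
        "the house\t2000\t5\t2",
        "a cat\t2000\t7\t1");
    // Prints one "ngram TAB total" line per distinct ngram, in first-seen order.
    aggregate(lines).forEach((ngram, total) -> System.out.println(ngram + "\t" + total));
  }
}
```

Each entry of the resulting map corresponds to what the description calls one doc: an ngram and its single aggregated count. Holding the whole map in memory is only reasonable for small inputs; for the full corpus a streaming approach that exploits sorted input would likely be needed.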

Since: 2.7

- Constructor Summary
  
  Constructors
  Constructor and Description
  
  FrequencyIndexCreator(org.languagetool.dev.bigdata.FrequencyIndexCreator.Mode mode)
- Method Summary
  
  All Methods Static Methods Concrete Methods
  Modifier and Type Method and Description
  
  static void main(String[] args)
  - Methods inherited from class java.lang.Object
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail

FrequencyIndexCreator

public FrequencyIndexCreator(org.languagetool.dev.bigdata.FrequencyIndexCreator.Mode mode)

Method Detail

main

public static void main(String[] args)
                 throws Exception

Throws: Exception
