Weka package providing a loader and a saver for common CSV formats, using the Apache Commons CSV library.
Requires Weka 3.9.5+.
Supported formats:
- DEFAULT - Standard Comma Separated Value format, as for RFC4180 but allowing empty lines.
- EXCEL - The Microsoft Excel CSV format.
- INFORMIX_UNLOAD - Informix UNLOAD format used by the UNLOAD TO file_name operation.
- INFORMIX_UNLOAD_CSV - Informix CSV UNLOAD format used by the UNLOAD TO file_name operation (escaping is disabled).
- MYSQL - The MySQL CSV format.
- ORACLE - Default Oracle format used by the SQL*Loader utility.
- POSTGRESSQL_CSV - Default PostgreSQL CSV format used by the COPY operation.
- POSTGRESSQL_TEXT - Default PostgreSQL text format used by the COPY operation.
- RFC-4180 - The format defined by RFC 4180.
- TDF - A tab delimited format.
The loader:
CommonCSVLoader <file.csv | file.tsv | file.txt> [options]
-decimal <num>
The maximum number of digits to print after the decimal
place for numeric values (default: 6)
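To illustrate what a maximum of N digits after the decimal place means for output, here is a minimal stdlib-only sketch. The class `DecimalRounding` is hypothetical and is not the package's or Weka's own formatting code, which may round or pad differently; it rounds half-up to at most `maxDigits` places and drops trailing zeros.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper, for illustration only (not the package's implementation).
class DecimalRounding {
    // Rounds to at most maxDigits digits after the decimal point (half-up)
    // and strips trailing zeros. NaN and infinite values are not handled here.
    static String format(double value, int maxDigits) {
        return BigDecimal.valueOf(value)
            .setScale(maxDigits, RoundingMode.HALF_UP)
            .stripTrailingZeros()
            .toPlainString();
    }
}
```

With the default of 6, `3.14159265` becomes `3.141593`, while `2.5` stays `2.5`.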
The CSV format to use
(default: DEFAULT)
Whether to use custom field separator
(default: no)
-custom-field-separator <separator-char>
The custom field separator
(default: ,)
Whether to use custom quote character
(default: no)
-custom-quote-character <quote-char>
The custom quote character
(default: ")
Whether to use custom quote mode
(default: no)
The custom quote mode
(default: MINIMAL)
Whether to use custom escape character
(default: no)
-custom-escape-character <escape-char>
The custom escape character
(default: )
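The custom field separator, quote character and escape character play distinct roles when a line is split into fields. The following is a deliberately simplified, stdlib-only sketch of those roles (single-line records only). The class `FieldSplitter` is hypothetical; the package itself delegates parsing to Apache Commons CSV, whose rules are more complete.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified splitter; not Apache Commons CSV's parser.
class FieldSplitter {
    // Splits one line into fields. Pass escape = null to disable escaping.
    static List<String> split(String line, char separator, char quote, Character escape) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (escape != null && c == escape && i + 1 < line.length()) {
                // Escape character: take the next character literally.
                current.append(line.charAt(++i));
            } else if (c == quote) {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == quote) {
                    // Doubled quote inside a quoted field stands for one literal quote.
                    current.append(quote);
                    i++;
                } else {
                    // Opening or closing quote: toggles quoted mode.
                    inQuotes = !inQuotes;
                }
            } else if (c == separator && !inQuotes) {
                // Unquoted separator ends the current field.
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }
}
```

For example, with `;` as separator and `'` as quote character, `a;'b;c';d` yields the three fields `a`, `b;c` and `d`.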
Whether the CSV file has no header row
(default: assumes header row present)
-nominal <range>
The attribute range to treat as nominal
(default: none)
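The `<range>` arguments (here and for `-string` and `-date`) are attribute ranges. The sketch below assumes the common Weka-style syntax of 1-based indices, comma-separated items, `a-b` spans and the keywords `first`/`last`; it omits features such as range inversion. `RangeParser` is a hypothetical class, not Weka's `Range` implementation.

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical, simplified parser for Weka-style attribute ranges.
class RangeParser {
    // Resolves a 1-based token ("first", "last" or a number) to a 0-based index.
    static int resolve(String token, int numAttributes) {
        String t = token.trim();
        int index;
        if (t.equalsIgnoreCase("first")) {
            index = 1;
        } else if (t.equalsIgnoreCase("last")) {
            index = numAttributes;
        } else {
            index = Integer.parseInt(t);
        }
        if (index < 1 || index > numAttributes) {
            throw new IllegalArgumentException("Index out of bounds: " + t);
        }
        return index - 1;
    }

    // Parses e.g. "1,2,4-6" or "first-last" into sorted 0-based indices.
    static SortedSet<Integer> parse(String range, int numAttributes) {
        SortedSet<Integer> result = new TreeSet<>();
        for (String part : range.split(",")) {
            String[] bounds = part.split("-", 2);
            int from = resolve(bounds[0], numAttributes);
            int to = bounds.length == 2 ? resolve(bounds[1], numAttributes) : from;
            for (int i = from; i <= to; i++) {
                result.add(i);
            }
        }
        return result;
    }
}
```

For a dataset with 10 attributes, `1,2,4-6` selects the 0-based indices 0, 1, 3, 4 and 5.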
-nominal-label-spec <nominal label spec>
Optional specification of legal labels for nominal
attributes. May be specified multiple times.
The spec contains two parts separated by a ":".
The first part can be a range of attribute indexes or
a comma-separated list of attribute names;
the second part is a comma-separated list of labels. E.g.:
"1,2,4-6:red,green,blue" or "att1,att2:red,green,blue"
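The two-part label spec can be decomposed as sketched below. This is an illustration under stated assumptions, not the package's parser: it splits at the first `:`, trims labels, and treats the attribute part as a range if it consists only of numbers, `first`/`last` and `-`/`,` (otherwise as attribute names). `NominalLabelSpec` is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical decomposition of a "-nominal-label-spec" value.
class NominalLabelSpec {
    // Assumption: a range is made of numbers/first/last, joined by '-' and ','.
    private static final String ITEM = "(first|last|\\d+)(-(first|last|\\d+))?";
    private static final String RANGE_REGEX = ITEM + "(," + ITEM + ")*";

    final String attributes;   // e.g. "1,2,4-6" or "att1,att2"
    final boolean isRange;     // true if attributes looks like an index range
    final List<String> labels; // e.g. [red, green, blue]

    private NominalLabelSpec(String attributes, boolean isRange, List<String> labels) {
        this.attributes = attributes;
        this.isRange = isRange;
        this.labels = labels;
    }

    static NominalLabelSpec parse(String spec) {
        int pos = spec.indexOf(':'); // assumption: split at the first ':'
        if (pos < 1 || pos == spec.length() - 1) {
            throw new IllegalArgumentException("Expected <attributes>:<labels>, got: " + spec);
        }
        String attributes = spec.substring(0, pos).trim();
        List<String> labels = new ArrayList<>();
        for (String label : spec.substring(pos + 1).split(",")) {
            labels.add(label.trim());
        }
        return new NominalLabelSpec(attributes, attributes.matches(RANGE_REGEX), labels);
    }
}
```

Under these assumptions, `1,2,4-6:red,green,blue` is classified as a range and `att1,att2:red,green,blue` as a list of names, both with the labels `red`, `green` and `blue`.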
-string <range>
The attribute range to treat as string
(default: none)
-date <range>
The attribute range to treat as date
(default: none)
-date-format <format>
The format to use for parsing the date attribute(s)
see: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/time/format/DateTimeFormatter.html
(default: yyyy-MM-dd'T'HH:mm:ss)
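Since the date format follows the linked `java.time.format.DateTimeFormatter` pattern syntax, a pattern can be checked against sample values before passing it to the loader. A minimal sketch using only the JDK; `DateFormatDemo` is a hypothetical name.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for trying out -date-format patterns.
class DateFormatDemo {
    // The loader's documented default pattern.
    static final String DEFAULT_FORMAT = "yyyy-MM-dd'T'HH:mm:ss";

    // Parses a value with the given DateTimeFormatter pattern;
    // throws DateTimeParseException if the value does not match.
    static LocalDateTime parse(String value, String pattern) {
        return LocalDateTime.parse(value, DateTimeFormatter.ofPattern(pattern));
    }
}
```

With the default pattern, `2021-03-15T12:30:00` parses to 15 March 2021, 12:30:00; a value such as `15/03/2021 12:30` would instead need a pattern like `dd/MM/yyyy HH:mm`.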
-missing-value <string>
The string to interpret as missing value
(default: '')
-num-rows-type-detection <int>
The number of rows to use for detecting numeric attributes
(default: '100')
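One plausible way row-limited type detection can work is sketched below: a column counts as numeric if every non-missing value among its first N rows parses as a number. This heuristic is an assumption for illustration, not the package's documented algorithm; in particular, treating an all-missing column as non-numeric is a choice made here. `TypeDetection` is a hypothetical class.

```java
import java.util.List;

// Hypothetical heuristic; the package's actual detection logic may differ.
class TypeDetection {
    // Returns true if all non-missing values in the first numRows entries parse as doubles
    // and at least one such value exists.
    static boolean looksNumeric(List<String> column, int numRows, String missingValue) {
        int limit = Math.min(numRows, column.size());
        boolean sawValue = false;
        for (int i = 0; i < limit; i++) {
            String v = column.get(i);
            if (v.equals(missingValue)) {
                continue; // missing values neither confirm nor refute "numeric"
            }
            try {
                Double.parseDouble(v.trim()); // note: also accepts e.g. "NaN" and "1e3"
                sawValue = true;
            } catch (NumberFormatException e) {
                return false;
            }
        }
        return sawValue;
    }
}
```

A consequence of limiting detection to the first N rows is that a non-numeric value appearing only after row N is not considered, so the column would still be treated as numeric.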
The saver:
CommonCSVSaver options:
-i <the input file>
The input file
-o <the output file>
The output file
The CSV format to use
(default: DEFAULT)
Whether to use custom field separator
(default: no)
-custom-field-separator <separator-char>
The custom field separator
(default: ,)
Whether to use custom quote character
(default: no)
-custom-quote-character <quote-char>
The custom quote character
(default: ")
Whether to use custom quote mode
(default: no)
The custom quote mode
(default: MINIMAL)
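The default quote mode, MINIMAL, quotes a field only when necessary. The sketch below shows a simplified version of that idea for output: quote a field if it contains the separator, the quote character or a line break, and double any embedded quote characters. It is not Apache Commons CSV's exact `QuoteMode.MINIMAL` logic, which handles further edge cases; `MinimalQuoting` is a hypothetical class.

```java
// Hypothetical, simplified minimal-quoting writer; not Commons CSV's implementation.
class MinimalQuoting {
    // Returns the field as written to the output, quoted only if needed.
    static String format(String field, char separator, char quote) {
        boolean needsQuotes = field.indexOf(separator) >= 0
            || field.indexOf(quote) >= 0
            || field.indexOf('\n') >= 0
            || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        String q = String.valueOf(quote);
        // Embedded quote characters are doubled inside the quoted field.
        return q + field.replace(q, q + q) + q;
    }
}
```

For example, `plain` is written unchanged, `a,b` becomes `"a,b"`, and `say "hi"` becomes `"say ""hi"""`.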
Whether to use custom escape character
(default: no)
-custom-escape-character <escape-char>
The custom escape character
(default: )
Whether to suppress output of header row
(default: outputs header)
Use the following dependency in your pom.xml
For more information on how to install the package, see: