-
Notifications
You must be signed in to change notification settings - Fork 2
git mirror of C# port of Mozilla Universal Charset Detector
License
linquize/ude
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
DESCRIPTION =========== Ude is a C# port of Mozilla Universal Charset Detector. The original source code is available at: http://mxr.mozilla.org/mozilla/source/extensions/universalchardet/src/ The article "A composite approach to language/encoding detection" describes the algorithms of Universal Charset Detector and is available at: http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html http://mxr.mozilla.org/mozilla-central/source/extensions/universalchardet/doc/UniversalCharsetDetection.doc Some data-structures used into this port have been adapted from the Java port "juniversalchardet", available at: http://code.google.com/p/juniversalchardet/ Another port I know of is "chardet" (in Python) available at: http://chardet.feedparser.org/ USAGE ======= Import the library: using Ude; and feed a stream or a byte array to the detector. Call DataEnd to notify the detector that you want back the result: ICharsetDetector cdet = new CharsetDetector(); byte[] buff = new byte[1024]; int read; while ((read = stream.Read(buff, 0, buff.Length)) > 0 && !done) { cdet.Feed(buff, 0, read); } cdet.DataEnd(); Console.WriteLine("Charset: {0}, confidence: {1}, cdet.Charset, cdet.Confidence); Alternatively, you can feed a Stream to the detector: using (FileStream fs = File.OpenRead(filename)) { ICharsetDetector cdet = new CharsetDetector(); cdet.Feed(fs); cdet.DataEnd(); Console.WriteLine("Charset: {0}, confidence: {1}, cdet.Charset, cdet.Confidence); } Or you can provide an alternative implementation of the interface - Ude.ICharsetDetector - that wraps the original nsUniversalDetector API. LICENSE ======= This library is subject to the Mozilla Public License Version 1.1 (the "License"). Alternatively, the contents of this file may be used under the terms of either the GNU General Public License Version 2 or later (the "GPL"), or the GNU Lesser General Public License Version 2.1 or later (the "LGPL")
About
git mirror of C# port of Mozilla Universal Charset Detector
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published