astrunc （Used C++11）

English-Description

astrunc: A library that performs sentence break processing of utf-8 encoded text segments in natural language processing.

Make-sample

git clone https://github.com/Joke-Shi/astrunc.git
cd astrunc
make

cd sample
ls

API

/**
 * @Brief: Split utf-8 encoded text segment, and then output sentence string vector.
 *
 * @Param: __vs,     Output split text segment result, which sentence string vector.
 * @Param: __seg,    Input utf-8 encoded text segment.
 * @Param: __lang,   Specify in the astrunc.h file.
 * @Param: __nchars, Max a sentence chars size.
 *
 * @Return: Ok->0, Other->-1.
 **/
int astrunc::access::split( std::vector< std::string > &__vs, const std::string &__seg, astrunc::access::lang_t __lang, int __nchars);

SAMPLE

#include "astrunc.h"


int main( int argc, const char **argv)
{         
  std::string s_en = "He said: Chang suggested organizing a canine patrol squad at the "
                     "museum and, in 1987, dogs came back. At 1 year old, each dog is "
                     "given a tailored training course. ";
                     
  std::vector< std::string> vec_s;
  int rc = astrunc::access::split( vec_s, s_en, astrunc::access::EN, 32);
  if ( 0 == rc) {
    for ( auto const &__s : vec_s) {
      std::cout << __s << std::endl;
      std::cout << "------------------------------------------------------------------\n";
    }
  } else { std::cerr << "[Error]: Split error\n"; }


  return rc;
}

中文说明

astrunc：用在自然语言处理中对utf-8编码的文本段进行断句处理的库。

Make-实例

git clone https://github.com/Joke-Shi/astrunc.git
cd astrunc
make

cd sample
ls

使用接口

/**
 * @Brief: Split utf-8 encoded text segment, and then output sentence string vector.
 *
 * @Param: __vs,     分割utf-8编码文本段过后的句子列表
 * @Param: __seg,    输入的utf-8编码的文本段；
 * @Param: __lang,   语言简称，具体在astrunc.h文件中描述；
 * @Param: __nchars, 每个句子限定的字符个数。
 *
 * @Return: Ok->0, Other->-1.
 **/
int astrunc::access::split( std::vector< std::string > &__vs, const std::string &__seg, astrunc::access::lang_t __lang, int __nchars);

使用示例

#include "astrunc.h"


int main( int argc, const char **argv)
{
  std::string s_zh = "Prometheus，它的价值在于可靠性，甚至在很恶劣的环境下，你都可以随时访问它和查"
                     "看系统服务各种指标的统计信息。 如果你对统计数据需要100%的精确，它并不适用，例"
                     "如：它不适用于实时计费系统。";
                     
  std::vector< std::string> vec_s;
  int rc = astrunc::access::split( vec_s, s_zh, astrunc::access::ZH, 32);
  if ( 0 == rc) {
    for ( auto const &__s : vec_s) {
      std::cout << __s << std::endl;
      std::cout << "------------------------------------------------------------------\n";
    }
  } else { std::cerr << "[Error]: Split error\n"; }


  return rc;
}

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
include		include
sample		sample
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

astrunc （Used C++11）

English-Description

Make-sample

API

SAMPLE

中文说明

Make-实例

使用接口

使用示例

About

Releases

Packages

Languages

License

Joke-Shi/astrunc

Folders and files

Latest commit

History

Repository files navigation

astrunc （Used C++11）

English-Description

Make-sample

API

SAMPLE

中文说明

Make-实例

使用接口

使用示例

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages