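Another engineering-oriented post. In UTF-8, Chinese characters are encoded in 3 bytes, at least those in the CJK Unified Ideographs block. So how do you tell whether a 3-byte string is a Chinese character?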
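To make the 3-byte layout concrete, here is a minimal sketch of the forward direction. It assumes the input is a code point in U+0800–U+FFFF, which is exactly the range UTF-8 encodes in three bytes. The function name `encode_utf8_3byte` is hypothetical and does not come from any library.

```c
#include <assert.h>

/* Hypothetical helper: encode a code point in U+0800..U+FFFF as the
 * three-byte UTF-8 pattern 1110xxxx 10xxxxxx 10xxxxxx.
 * Surrogates (U+D800..U+DFFF) are not rejected in this sketch. */
void encode_utf8_3byte(unsigned int cp, unsigned char out[3])
{
    assert(cp >= 0x800 && cp <= 0xFFFF);
    out[0] = (unsigned char)(0xE0 | (cp >> 12));          /* top 4 bits    */
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));  /* middle 6 bits */
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));         /* low 6 bits    */
}
```

For example, 中 is U+4E2D, which is 0100 111000 101101 in binary when split 4 + 6 + 6, so it encodes as E4 B8 AD.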
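#include <string.h>

int is_utf8_zh_basic(const char *str)
{
    /* read bytes as unsigned: plain char is signed on some platforms, unsigned on others */
    const unsigned char *s = (const unsigned char *)str;
    if (strlen(str) < 3) return 0;

    /* basic check: str is 1110xxxx 10xxxxxx 10xxxxxx */
    if ((s[0] & 0xF0) != 0xE0) return 0;
    if ((s[1] & 0xC0) != 0x80) return 0;
    if ((s[2] & 0xC0) != 0x80) return 0;

    /* reassemble the code point from its 4 + 6 + 6 payload bits */
    int code = ((s[0] & 0x0F) << 12) | ((s[1] & 0x3F) << 6) | (s[2] & 0x3F);
    /* range used in this post; the CJK Unified Ideographs block itself ends at U+9FFF */
    return (code >= 0x4E00) && (code <= 0x9FBF);
}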
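The check above only looks at the first three bytes. To apply the same idea to a whole string, one option is to walk it byte by byte, as in the minimal sketch below. It assumes NUL-terminated, valid UTF-8 input; `count_cjk_basic` is a hypothetical name, and the range U+4E00–U+9FBF is copied from this post.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count code points in U+4E00..U+9FBF
 * in a NUL-terminated UTF-8 string. */
size_t count_cjk_basic(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t count = 0;

    assert(str != NULL);
    while (*s != '\0') {
        /* && stops at the first failing byte, so a NUL is never passed */
        if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80 &&
            (s[2] & 0xC0) == 0x80) {
            unsigned int code = ((unsigned int)(s[0] & 0x0F) << 12) |
                                ((unsigned int)(s[1] & 0x3F) << 6) |
                                (unsigned int)(s[2] & 0x3F);
            if (code >= 0x4E00 && code <= 0x9FBF)
                count++;
            s += 3;   /* skip the whole 3-byte sequence */
        } else {
            s += 1;   /* ASCII, 2- or 4-byte leads, continuation bytes */
        }
    }
    return count;
}
```

Advancing one byte at a time when the pattern does not match is enough for valid UTF-8: a 1110xxxx byte can only be a lead byte, so continuation bytes of 2- and 4-byte sequences are simply stepped over.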


One Response to Unicode (UTF-8) CJK Unified Ideographs (U+4E00–U+9FBF) Detection Program

  1. Susan says:

    CJK Unified Ideographs: it is a very good idea. There are great difficulties in realizing it, but if it comes true, how great the progress would be in world communication and world culture, a great contribution to humankind!
