C#: Converting Chinese and English Punctuation Marks to Unicode Code Points

亮术网 2020-08-11 Original article

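To tell whether a character is Chinese or English, you check which Unicode range it falls in. Chinese characters and the 26 Latin letters each occupy fixed, contiguous ranges, but punctuation marks do not: their code points are scattered, so a range test works poorly and it is easier to enumerate them. Lists of Chinese punctuation code points are easy to find online, but none of them is complete. Lists for English punctuation are almost impossible to find, and looking the marks up one by one in a Unicode code chart is tedious and time-consuming. Is there a better way?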
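The contrast between range checks and enumeration described above can be sketched as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, "Chinese" is narrowed to the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), and the punctuation set is a tiny sample rather than a complete list.

```csharp
using System;
using System.Collections.Generic;

static class CharKindSketch
{
    // Assumption: only the basic CJK Unified Ideographs block counts as "Chinese" here.
    public static bool IsBasicCjkIdeograph(char c) => c >= '\u4e00' && c <= '\u9fff';

    // The 26 Latin letters form two contiguous ranges (upper and lower case).
    public static bool IsAsciiLetter(char c) =>
        (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');

    // Punctuation has no single contiguous range, so it is enumerated instead.
    // This sample set is deliberately incomplete.
    private static readonly HashSet<char> SamplePunctuation =
        new HashSet<char> { '。', '，', '、', ',', '.', '!' };

    public static bool IsSamplePunctuation(char c) => SamplePunctuation.Contains(c);

    static void Main()
    {
        Console.WriteLine(IsBasicCjkIdeograph('中')); // True
        Console.WriteLine(IsAsciiLetter('q'));        // True
        Console.WriteLine(IsSamplePunctuation('、')); // True
        Console.WriteLine(IsSamplePunctuation('中')); // False
    }
}
```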

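Since we can write code, why not collect the punctuation marks first (that part is easy) and then convert them to Unicode code points programmatically? The conversion code is simple, it outputs the code points of whatever punctuation you pass in, and it is flexible: you get output for exactly the marks you want.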

Implementation: converting Chinese and English punctuation to Unicode code points:

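/// <summary>
/// Converts each punctuation character in a string to its Unicode code point (C#).
/// </summary>
/// <param name="text">The punctuation characters</param>
/// <returns>Comma-separated hexadecimal code points, e.g. "0x3002,0xff1f"</returns>
public string PuncationToUnicode(string text)
{
  // Start from an empty string so that empty input yields "" rather than null.
  string temp = string.Empty;
  foreach (char c in text)
  {
    // Separate entries with a comma, but not before the first one.
    if (temp.Length > 0)
      temp += ",";

    // (int)c is the UTF-16 code unit; for the BMP characters used here
    // it is the same as the Unicode code point.
    temp += "0x" + ((int)c).ToString("x");
  }
  return temp;
}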
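As a usage sketch, the same conversion can be written as a static helper and printed to the console. The class name `PunctuationDemo` and the method name `ToHexList` are hypothetical, and the LINQ form is just one possible way to express the loop above, not the article's own code.

```csharp
using System;
using System.Linq;

static class PunctuationDemo
{
    // Same idea as PuncationToUnicode: one "0x..." entry per character,
    // joined with commas. A string is enumerated as a sequence of chars.
    public static string ToHexList(string text) =>
        string.Join(",", text.Select(c => "0x" + ((int)c).ToString("x")));

    static void Main()
    {
        Console.WriteLine(ToHexList("。？！")); // 0x3002,0xff1f,0xff01
    }
}
```

Because `string.Join` handles the separators, no flag is needed to track whether the current character is the first one.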

1. Converting Chinese punctuation to Unicode code points:

string cnPuncation = "。？！，、；：‘’“”（）〔〕【】「」『』—…–．《》〈〉";

PuncationToUnicode(cnPuncation);

Output (Unicode code points of the Chinese punctuation):

0x3002,0xff1f,0xff01,0xff0c,0x3001,0xff1b,0xff1a,0x2018,0x2019,0x201c,0x201d,0xff08,0xff09,
0x3014,0x3015,0x3010,0x3011,0x300c,0x300d,0x300e,0x300f,0x2014,0x2026,0x2013,0xff0e,
0x300a,0x300b,0x3008,0x3009
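Once the code points are known, they can feed the enumeration-style check mentioned at the start of the article. The following is a minimal sketch, assuming the list above is the set you care about. The class name `ChinesePunctuation` and its `Contains` method are hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class ChinesePunctuation
{
    // Code points copied from the output listed above.
    private static readonly HashSet<char> Codes = new HashSet<char>
    {
        '\u3002', '\uff1f', '\uff01', '\uff0c', '\u3001', '\uff1b', '\uff1a',
        '\u2018', '\u2019', '\u201c', '\u201d', '\uff08', '\uff09',
        '\u3014', '\u3015', '\u3010', '\u3011', '\u300c', '\u300d',
        '\u300e', '\u300f', '\u2014', '\u2026', '\u2013', '\uff0e',
        '\u300a', '\u300b', '\u3008', '\u3009'
    };

    // True if c is one of the enumerated Chinese punctuation marks.
    public static bool Contains(char c) => Codes.Contains(c);

    static void Main()
    {
        Console.WriteLine(Contains('，')); // True  (fullwidth comma, U+FF0C)
        Console.WriteLine(Contains(','));  // False (ASCII comma, U+002C)
    }
}
```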

2. Converting English punctuation to Unicode code points:

string enPuncation = ",.:;?!\"'...-()()[]{}/~¨";

PuncationToUnicode(enPuncation);

Output (Unicode code points of the English punctuation):

0x2c,0x2e,0x3a,0x3b,0x3f,0x21,0x22,0x27,0x2e,0x2e,0x2e,0x2d,0x28,0x29,0x28,0x29,0x5b,
0x5d,0x7b,0x7d,0x2f,0x7e,0xa8
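For comparison, .NET also has a built-in `char.IsPunctuation` check based on Unicode general categories. It is not part of the method above. The sketch below only illustrates why an explicit list can still be useful: some characters that people commonly treat as punctuation, such as `~`, are classified as symbols, so the built-in check returns false for them.

```csharp
using System;

static class BuiltInCheckSketch
{
    static void Main()
    {
        // Both commas-and-stops style marks are in Unicode punctuation categories.
        Console.WriteLine(char.IsPunctuation(','));  // True
        Console.WriteLine(char.IsPunctuation('。')); // True

        // The tilde is a math symbol in Unicode, not punctuation.
        Console.WriteLine(char.IsPunctuation('~'));  // False
    }
}
```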

Tags: punctuation, Unicode, encoding, C#
