C# convert punctuation marks to Unicode encoding

Lionsure 2020-08-11 Original by the website

When judging whether text is DBCS (double-byte characters, such as Chinese) or English, you check each character against the corresponding Unicode range. DBCS characters and the 26 English letters occupy fixed, contiguous Unicode ranges, but punctuation marks are spread over non-contiguous ranges, so judging them by range is difficult; they can, however, be judged by enumeration. Lists of punctuation Unicode codes are hard to find on the Internet, and looking them up one by one in a Unicode code chart is tedious. Is there a better way to solve this problem?
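A minimal sketch of the two approaches, assuming the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) as the example of a contiguous DBCS range. The class `TextClassifier` and its methods are hypothetical names for illustration: letters and ideographs are tested by range, while punctuation is tested by enumeration against an explicit set.

```csharp
using System.Collections.Generic;

// Hypothetical helper class; names are for illustration only.
public static class TextClassifier
{
    // Range check: English letters form two contiguous ranges, A-Z and a-z.
    public static bool IsEnglishLetter(char c) =>
        (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');

    // Range check: CJK Unified Ideographs basic block, U+4E00..U+9FFF
    // (assumed here as the example DBCS range).
    public static bool IsCjkIdeograph(char c) =>
        c >= '\u4E00' && c <= '\u9FFF';

    // Enumeration: punctuation is scattered across the code space,
    // so the accepted marks are listed explicitly in a lookup set.
    private static readonly HashSet<char> Punctuation =
        new HashSet<char>(",.:;?!\"'-()[]{}/~");

    public static bool IsListedPunctuation(char c) => Punctuation.Contains(c);
}
```

A `HashSet<char>` keeps each enumeration lookup constant-time no matter how many marks are listed.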

Since I can write programs, why not first collect the punctuation marks (which is easy) and then convert them to Unicode with code? The conversion code is simple and automatically outputs the Unicode code of each specified punctuation mark. It is also flexible: it outputs the codes of whatever punctuation marks you pass in.


The method of converting punctuation marks to Unicode encoding:

using System.Text; // at the top of the file

/// <summary>
/// C# convert punctuation marks to Unicode encoding
/// </summary>
/// <param name="text">Punctuation marks</param>
/// <returns>Comma-separated hexadecimal Unicode codes, e.g. "0x2c,0x2e"</returns>
public string PuncationToUnicode(string text)
{
    var sb = new StringBuilder();
    foreach (char c in text)
    {
        // Put a comma before every entry except the first
        if (sb.Length > 0)
            sb.Append(',');
        // Append the UTF-16 code unit as lowercase hex with a "0x" prefix
        sb.Append("0x").Append(((int)c).ToString("x"));
    }
    return sb.ToString(); // an empty input gives "" instead of null
}
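The method above iterates over `char` values, which in .NET are UTF-16 code units, so a character outside the Basic Multilingual Plane (such as an emoji) comes out as two surrogate values. Common punctuation marks lie in the BMP, so this rarely matters here; if it does, the following sketch combines surrogate pairs with `char.IsSurrogatePair` and `char.ConvertToUtf32` to emit one entry per code point in the same `0x..` format. The class `CodePointFormatter` and method `ToCodePoints` are hypothetical names.

```csharp
using System.Text;

// Hypothetical variant of PuncationToUnicode that emits one entry
// per Unicode code point instead of per UTF-16 code unit.
public static class CodePointFormatter
{
    public static string ToCodePoints(string text)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < text.Length; i++)
        {
            int codePoint;
            if (char.IsSurrogatePair(text, i))
            {
                // Combine high + low surrogate into one code point,
                // then skip the low surrogate.
                codePoint = char.ConvertToUtf32(text, i);
                i++;
            }
            else
            {
                // BMP character (or a lone surrogate): use the code unit as-is.
                codePoint = text[i];
            }
            if (sb.Length > 0)
                sb.Append(',');
            sb.Append("0x").Append(codePoint.ToString("x"));
        }
        return sb.ToString();
    }
}
```

For example, `CodePointFormatter.ToCodePoints("a\U0001F600")` returns `"0x61,0x1f600"`, whereas `PuncationToUnicode` would return `"0x61,0xd83d,0xde00"`.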


Convert English punctuation marks to Unicode encoding:

string enPuncation = ",.:;?!\"'...-()()[]{}/~¨";

string unicode = PuncationToUnicode(enPuncation);


Output (the Unicode encoding of the English punctuation marks):

0x2c,0x2e,0x3a,0x3b,0x3f,0x21,0x22,0x27,0x2e,0x2e,0x2e,0x2d,0x28,0x29,0x28,0x29,0x5b,0x5d,0x7b,0x7d,0x2f,0x7e,0xa8
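To use this output for enumeration-based judging, the list can be pasted into code and turned back into a lookup set. A minimal sketch, assuming the list is kept as one comma-separated string; `PunctuationTable.FromCodeList` is a hypothetical helper. Duplicate entries (such as the three `0x2e` produced by `...`) collapse into a single set member, and whitespace or line breaks around entries are trimmed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: parses output such as "0x2c,0x2e" back into characters.
public static class PunctuationTable
{
    public static HashSet<char> FromCodeList(string codes)
    {
        var set = new HashSet<char>();
        foreach (string token in codes.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string hex = token.Trim();
            // Strip the "0x" prefix written by PuncationToUnicode.
            if (hex.StartsWith("0x", StringComparison.OrdinalIgnoreCase))
                hex = hex.Substring(2);
            set.Add((char)Convert.ToInt32(hex, 16));
        }
        return set;
    }
}
```

Usage: `PunctuationTable.FromCodeList("0x2c,0x2e,0x3a").Contains(',')` is `true`. .NET also offers `char.IsPunctuation`, but it follows Unicode general categories, under which `~` (U+007E) is a math symbol and `¨` (U+00A8) a modifier symbol, so it would not match this hand-picked list.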