程式寫的好,要飯要到老: [VC] Converting between Big5, Unicode, and UTF-8

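I've been working on QR codes lately. I had always assumed that to show Chinese text in a QR code, all I had to do was convert the Big5 bytes to Unicode and put those in. As it turns out, the data has to be UTF-8 before a reader can decode it.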

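So is there a difference between Unicode and UTF-8?
The answer: they were never the same thing! Unicode assigns each character a number (its code point); UTF-8 is one way of encoding those numbers as bytes.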

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

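At first I wanted to get the two characters "中文" to show up in a QR code.
In Big5 they are: 0xA4 0xA4 0xA4 0xE5
In Unicode they are: 0x4E 0x2D 0x65 0x87
I put 0x4E 0x2D 0x65 0x87 into the QR code and, sure enough, no amount of scanning would bring back "中文", because those bytes are really just the code point values, not UTF-8~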
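
To see why those four bytes cannot work, here is a small sketch of a structural UTF-8 check. The function name `FirstInvalidUtf8Byte` is my own invention for illustration, and the check is deliberately simplified: it only looks at lead bytes and continuation bytes and does not reject every ill-formed sequence the Unicode standard forbids. I am also only assuming that the QR reader treats the payload as UTF-8; how a particular reader reacts to bad bytes may differ.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from any library): returns the index of the first
// byte that breaks the UTF-8 lead/continuation structure, or s.size() if the
// whole string passes this simplified check.
std::size_t FirstInvalidUtf8Byte(const std::string& s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;  // number of continuation bytes expected
        if (b <= 0x7F) {
            extra = 0;          // ASCII, one byte
        } else if (b >= 0xC2 && b <= 0xDF) {
            extra = 1;          // lead byte of a 2-byte sequence
        } else if (b >= 0xE0 && b <= 0xEF) {
            extra = 2;          // lead byte of a 3-byte sequence
        } else if (b >= 0xF0 && b <= 0xF4) {
            extra = 3;          // lead byte of a 4-byte sequence
        } else {
            return i;           // stray continuation byte or invalid lead byte
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            if (i + k >= s.size()) {
                return i;       // sequence is cut off
            }
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) {
                return i;       // not a 10xxxxxx continuation byte
            }
        }
        i += extra + 1;
    }
    return s.size();
}
```

Read as UTF-8, 0x4E 0x2D 0x65 are simply the ASCII characters `N`, `-`, `e`, and 0x87 is a continuation byte with no lead byte in front of it, so the check stops at index 3.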

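Only after reading the article above did I realize those values mean U+4E2D U+6587.
Looking them up in a Unicode table gives the corresponding UTF-8 bytes: 0xE4 0xB8 0xAD (中) 0xE6 0x96 0x87 (文)
Once I put those UTF-8 bytes into the QR code, the decoder finally read "中文" correctly!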
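
Instead of looking the bytes up in a table, the mapping from code point to UTF-8 can also be computed directly. Below is a minimal sketch in portable C++; `EncodeUtf8` is a name I made up for this example, not a library function, and it handles one code point at a time.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
// Returns an empty string for surrogates (U+D800..U+DFFF) and for
// values above U+10FFFF, which have no UTF-8 encoding.
std::string EncodeUtf8(std::uint32_t cp)
{
    std::string out;
    if (cp <= 0x7F) {
        // 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {
        // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return out;  // surrogate, not encodable
        }
        // 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Both characters here fall in the three-byte range: `EncodeUtf8(0x4E2D)` gives 0xE4 0xB8 0xAD and `EncodeUtf8(0x6587)` gives 0xE6 0x96 0x87, matching the table lookup.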

<code>

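/*********** Convert Big5 to UTF-8 ***********/

#include <windows.h>

int main()
{
    // Assumes this literal is compiled as Big5 (source file saved in code page 950).
    const char *szData = "中文";
    char *sendbuf_utf8 = NULL;
    wchar_t *sendbuf_Unicode = NULL;

    // Big5 -> UTF-16 (what the Win32 API calls "Unicode").
    // CP_ACP is the system ANSI code page, so this only works where that is Big5;
    // pass 950 instead of CP_ACP to force Big5.
    // With a length of -1, the returned count already includes the terminating null.
    int nDataLen = MultiByteToWideChar(CP_ACP, 0, szData, -1, NULL, 0);
    if (nDataLen == 0) return 1; // conversion failed
    sendbuf_Unicode = new wchar_t[nDataLen];
    MultiByteToWideChar(CP_ACP, 0, szData, -1, sendbuf_Unicode, nDataLen);

    // UTF-16 -> UTF-8
    nDataLen = WideCharToMultiByte(CP_UTF8, 0, sendbuf_Unicode, -1, NULL, 0, NULL, NULL);
    sendbuf_utf8 = new char[nDataLen];
    WideCharToMultiByte(CP_UTF8, 0, sendbuf_Unicode, -1, sendbuf_utf8, nDataLen, NULL, NULL);
    // At this point sendbuf_utf8 holds the UTF-8 encoding of "中文"
    // (E4 B8 AD E6 96 87, followed by a null terminator)~

    // Free the buffers allocated with new
    delete[] sendbuf_utf8;
    sendbuf_utf8 = NULL;
    delete[] sendbuf_Unicode;
    sendbuf_Unicode = NULL;
    return 0;
}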

</code>

2011/08/16
