Character Count By Bytes

Validating user input by string length is the most common way to restrict the length of text data. The problem is that a fullwidth character takes up twice as much space as a halfwidth one, yet each counts as 1. If you want to limit user input by how much space it takes up when displayed, the solution is to count characters by bytes.
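
To see the problem concretely, JavaScript's `String.prototype.length` returns the same value for three halfwidth Latin letters and three fullwidth hiragana, even though the hiragana take roughly twice the width on screen. A quick check:

```javascript
// String.prototype.length counts UTF-16 code units, not display width.
const halfwidthText = "abc";    // three halfwidth Latin letters
const fullwidthText = "あいう"; // three fullwidth hiragana

console.log(halfwidthText.length); // 3
console.log(fullwidthText.length); // 3
```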

How to tell fullwidth from halfwidth characters

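In Shift_JIS, a common Japanese encoding, halfwidth characters (ASCII and halfwidth katakana) take 1 byte while fullwidth characters take 2 bytes, so a character's byte count matches its display width. UTF-8 does not line up as neatly: ASCII takes 1 byte, but halfwidth katakana and most fullwidth characters both take 3 bytes. Taking advantage of the Shift_JIS rule, this function counts a halfwidth character as 1/2 and a fullwidth character as 1.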

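function countByByte(char) {
    // Halfwidth: ASCII (U+0001-U+007E) and halfwidth katakana (U+FF61-U+FF9F),
    // the characters Shift_JIS stores in 1 byte (0x01-0x7E and 0xA1-0xDF).
    // JavaScript strings are UTF-16, so the katakana range must be written as
    // \uFF61-\uFF9F; \xA1-\xDF would match Latin-1 characters such as "¿" instead.
    const isHalfwidth = /^[\x01-\x7E\uFF61-\uFF9F]$/.test(char);
    return isHalfwidth ? 0.5 : 1;
}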

Thanks for reading.

Hope you enjoyed the article. If you have any questions or opinions to share, feel free to leave a comment.