Character Codes and (Special) Tag Characters in HTML5 - dummies

By Ed Tittel, Chris Minnick

Encodings for the ISO Latin-1 character set are supplied by default in all modern web browsers. (Search for “ISO Latin-1 character set” to find a complete table of values.) Thus, the character entities in that set may be used directly in HTML markup without going through any special contortions.
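As a rough illustration of what "the character entities in that set" means, the following sketch uses Python's standard `html` module to look up a few named entities and confirm that each one's code point fits in the single byte that ISO Latin-1 allows. The use of Python, the sample entity names, and the helper name `latin1_code_points` are assumptions made for this example; the article itself works only in HTML.

```python
import html
from html.entities import name2codepoint

# A few named entities from the ISO Latin-1 range (sample picked for illustration).
SAMPLE_ENTITIES = ["copy", "eacute", "nbsp", "yen"]

def latin1_code_points(names):
    """Map each entity name to its code point, keeping only Latin-1 ones (0-255)."""
    return {name: name2codepoint[name] for name in names if name2codepoint[name] < 256}

code_points = latin1_code_points(SAMPLE_ENTITIES)

# html.unescape turns an entity reference into the character it stands for.
decoded = html.unescape("caf&eacute; &copy;")

for name, cp in code_points.items():
    print(f"&{name}; -> U+{cp:04X} -> {chr(cp)!r}")
print(decoded)
```

Each of these characters occupies exactly one byte in ISO Latin-1, which is why the set is small enough to list in a single table.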

However, using other encodings requires inclusion of special markup to tell the browser to interpret Unicode character codes. (Unicode is an international standard, ISO/IEC 10646 in fact, that embraces enough codes to handle most human alphabets, plus plenty of symbols and non-alphabetic characters, too.) This special markup takes this form:

<meta charset="UTF-8">

Because the charset value reads UTF-8, you can reference all common Unicode values. (UTF-8 stands for UCS Transformation Format 8-bit, an encoding format that represents all Unicode characters. Search for “Unicode UTF-8 character table” to skim over its one-million-plus character codes.)
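To see why the charset declaration matters, here is a minimal Python sketch of how UTF-8 stores characters and what happens when the same bytes are read with the wrong encoding. The sample characters and the helper name `utf8_lengths` are assumptions chosen for illustration, not part of any browser's behavior.

```python
# Sample characters (chosen for illustration): A, e-acute, euro sign, an emoji.
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001F600"]

def utf8_lengths(chars):
    """Return how many bytes UTF-8 uses to store each character."""
    return {ch: len(ch.encode("utf-8")) for ch in chars}

lengths = utf8_lengths(SAMPLES)

# Reading UTF-8 bytes as ISO Latin-1 yields garbled text, which is roughly
# what a reader sees when a page's declared charset does not match its bytes.
garbled = "\u00e9".encode("utf-8").decode("latin-1")

for ch, n in lengths.items():
    print(f"U+{ord(ch):04X} takes {n} byte(s) in UTF-8")
print(repr(garbled))
```

UTF-8 uses between one and four bytes per character, so plain ASCII text stays compact while every other Unicode character remains available.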

Today's browsers support UTF-8 more or less universally, and UTF-8 can represent every Unicode character, including non-Roman scripts such as Arabic, katakana (a Japanese syllabary), and Hangul (the Korean alphabet). Browsers can also decode UTF-16, but the HTML standard calls for UTF-8 in new documents. When a browser struggles to render such scripts correctly and completely, the usual cause is a missing font or a wrong encoding declaration rather than a limit of UTF-8 itself.
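The scripts named above are all part of Unicode, so either encoding can carry them; only the byte layout differs. The Python sketch below encodes one sample word per script both ways and checks that each round-trips unchanged. The sample words and the helper name `byte_sizes` are assumptions picked for illustration.

```python
# One sample word per script, written as escapes so the file itself stays ASCII.
SAMPLES = {
    "Arabic": "\u0633\u0644\u0627\u0645",
    "katakana": "\u30ab\u30bf\u30ab\u30ca",
    "Hangul": "\ud55c\uae00",
}

def byte_sizes(text):
    """Return (UTF-8 size, UTF-16 size) in bytes after checking both round-trip."""
    utf8 = text.encode("utf-8")
    utf16 = text.encode("utf-16-le")  # little-endian, no byte-order mark
    assert utf8.decode("utf-8") == text
    assert utf16.decode("utf-16-le") == text
    return len(utf8), len(utf16)

sizes = {script: byte_sizes(word) for script, word in SAMPLES.items()}

for script, (n8, n16) in sizes.items():
    print(f"{script}: {n8} bytes in UTF-8, {n16} bytes in UTF-16")
```

The byte counts differ from script to script, but the text that comes back out is identical in both encodings.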

HTML-savvy software assumes that certain HTML characters, such as the left and right angle brackets (less-than and greater-than signs in math notation), are part of the markup and are meant to be hidden rather than displayed on your finished web pages. If you actually want to display these characters on your pages, you must make your wishes clear to the browser.

These entities enable display of characters that are normally part of hidden HTML markup:

  • left angle bracket (<): &lt;

  • right angle bracket (>): &gt;

  • ampersand (&): &amp;

If you need these symbols to appear, include their entities in your markup like this:

<p>The paragraph element identifies some text as a paragraph:</p>
<p>&lt;p&gt;This is a paragraph&lt;/p&gt;</p>

This figure shows how these entities appear inside a browser window.
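If markup like this is generated by a program rather than typed by hand, the entity substitution can be automated. The following is a minimal sketch using Python's standard `html.escape` function; the helper name `show_as_text` is hypothetical, and Python is just one of many languages that offer such a function.

```python
import html

def show_as_text(markup):
    """Escape &, <, and > so a browser displays the markup instead of interpreting it."""
    # quote=False leaves quotation marks alone; only the three characters
    # listed in this article are replaced.
    return html.escape(markup, quote=False)

snippet = "<p>This is a paragraph</p>"
escaped = show_as_text(snippet)

# Wrap the escaped text in a real paragraph element, as in the example above.
print(f"<p>{escaped}</p>")
```

Note that the ampersand is escaped as well, so text such as AT&T becomes AT&amp;amp;T in the source and still displays correctly.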