This page shows what happens when a page is encoded with one charset and displayed with another.
The browser gets the encoding from either a meta tag in the page, or from HTTP headers sent by the server.
The meta tag takes the form:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
This tag indicates the page is encoded using UTF-8.
An HTTP header delivered by the server takes the form:
content-type: text/html;charset=utf-8
When both are present, the HTTP header overrides the meta tag in the page.
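The precedence described above can be sketched in Python. This is only an illustration, not how any particular browser is implemented: the function names (`charset_from_header`, `charset_from_meta`, `effective_charset`) are hypothetical, the meta tag is found with a simple regular expression rather than a real HTML parser, and other signals a browser may consider (such as a byte order mark) are ignored.

```python
import re
from email.message import Message

# Simplified pattern: finds charset=... inside a <meta ...> tag.
META_CHARSET = re.compile(rb"<meta[^>]*charset=[\"']?([A-Za-z0-9_-]+)", re.IGNORECASE)

def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

def charset_from_meta(html_bytes):
    """Return the charset declared in a meta tag near the top of the page, or None."""
    match = META_CHARSET.search(html_bytes[:1024])
    return match.group(1).decode("ascii").lower() if match else None

def effective_charset(content_type, html_bytes):
    """The HTTP header wins; the meta tag is used only if the header names no charset."""
    return charset_from_header(content_type) or charset_from_meta(html_bytes)

page = b'<meta http-equiv="Content-Type" content="text/html; charset=windows-1252" />'
print(effective_charset("text/html;charset=utf-8", page))
print(effective_charset("text/html", page))
```

With the sample page, the first call reports the header's charset and the second falls back to the one in the meta tag.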
The table below is taken from the page Symbols, Entities and Numeric References.
- Display 1 - As it should be: encoded and displayed as UTF-8
- Display 2 - Again as it should be, but encoded and displayed as Windows 1252
- Display 3 - Encoded as UTF-8 and displayed as Windows 1252
- Display 4 - Encoded as Windows 1252 and displayed as UTF-8
Notice how some characters (the symbols) are morphed into other characters in displays 3 and 4. This is because the page has been encoded in one charset and displayed using another (for example, encoded in UTF-8 and displayed in windows-1252).
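The effect in displays 3 and 4 can be reproduced with a few lines of Python. This is a minimal sketch: `display_as` is a hypothetical helper, `cp1252` is Python's codec name for windows-1252, and invalid bytes are shown as the replacement character (U+FFFD), which is an assumption about how the mismatch is rendered.

```python
def display_as(text, encoded_with, displayed_with):
    """Encode text with one charset, then decode the bytes with another."""
    raw = text.encode(encoded_with)
    return raw.decode(displayed_with, errors="replace")

# Display 3: encoded as UTF-8, displayed as windows-1252.
print(display_as("é", "utf-8", "cp1252"))   # Ã©
print(display_as("©", "utf-8", "cp1252"))   # Â©
print(display_as("€", "utf-8", "cp1252"))   # â‚¬

# Display 4: encoded as windows-1252, displayed as UTF-8.
# The single byte 0xE9 is not valid UTF-8 on its own, so it is replaced.
print(display_as("café", "cp1252", "utf-8"))  # caf followed by U+FFFD
```

In the first direction each symbol turns into two or three unrelated characters, because every byte of the UTF-8 sequence is read as a separate windows-1252 character. In the second direction the bytes are not valid UTF-8 at all.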
Display 1
Causes
There are (at least) two possible causes:
- The page consists of a main page encoded in, say, windows-1252, but includes (using SSI or ASP) a page saved with a different encoding (say, UTF-8).
- The page is correctly saved with windows-1252 encoding, but the server adds an HTTP header that overrides the <meta> tag in the page and tells the browser to use UTF-8 instead.
The answer to this problem is to make sure that:
- All pages are encoded to the same charset.
- Page encoding is the same as the encoding in the server headers.
- If using ASP.NET, ensure the pages are encoded to UTF-8. ASP.NET encodes controls to that charset regardless of the page encoding.
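One rough way to check the first point is to test whether the main page and every included file decode cleanly as the charset you intend to serve. The sketch below assumes UTF-8 as the target, and the helper names and file names are hypothetical. The check is only reliable in this direction: bytes saved as windows-1252 with non-ASCII characters usually fail to decode as UTF-8, while almost any byte sequence will decode as windows-1252 without error.

```python
def decodes_as(data, charset):
    """Return True if the bytes decode under the given charset without errors."""
    try:
        data.decode(charset)
    except UnicodeDecodeError:
        return False
    return True

def fragments_not_matching(fragments, charset="utf-8"):
    """Return the names of fragments whose bytes do not decode as the charset."""
    return [name for name, data in fragments.items() if not decodes_as(data, charset)]

# Hypothetical example: a main page saved as UTF-8 and an include saved as windows-1252.
fragments = {
    "main.html": "café".encode("utf-8"),
    "footer.inc": "café".encode("cp1252"),
}
print(fragments_not_matching(fragments))  # ['footer.inc']
```

In practice the byte strings would be read from the files on disk in binary mode, and any file reported here would need to be re-saved in the target encoding.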