I recently transfered my website to ARVIXE, which includes a forum powered by an MSSQL database.
The forum is in the Greek language and thus, the posts contained latin and non-latin (Greek) characters. While the forum displayed nicely on my former web hosting company, the Greek text was garbled on ARVIXE. This was a major problem since the forum constituted some 75% of the total traffic of the site (around 700K pageviews/month).
When my programming experience doesn’t serve me, I resort to common sense. Since I had not made any code changes before and after transfering the site to the new host, it was obvious that this was a matter of a server / database configuration. The two companies (both located in the US) had applied different settings to the server.
At first I went down the collation road. I tried changing the collation of the database from the default Latin1_General_CI_AI to Greek_CI_AI, but nothing changed. Then I deleted the db, asked support to create it again using the correct collation, and restored it for one more time from my .bak file. As it turned out this was hopeless since:
If you restore a database, RESTORE uses the collation of the source database that was recorded in the backup file. The restored database has the same collation as the original database that was backed up. Individual objects within the database that have different collations also retain their original collation. The database can be restored even if the instance on which you run RESTORE has a default collation different from the instance on which BACKUP was run.
By restoring my database, I was also changing the default collation anyway. It was also stupid, because changing the collation does not change the data in the database:
Each SQL Server collation specifies three properties:
- The sort order to use for Unicode data types (nchar, nvarchar, and ntext). A sort order defines the sequence in which characters are sorted, and the way characters are evaluated in comparison operations.
- The sort order to use for non-Unicode character data types (char, varchar, and text).
- The code page used to store non-Unicode character data .(Note You cannot specify the equivalent of a code page for the Unicode data types (nchar, nvarchar, and ntext). The double-byte bit patterns used for Unicode characters are defined by the Unicode standard and cannot be changed.)
Since my database was using nvarchar fields to store the text, this made no difference whatsoever.
Clearly, the collation road was a dead-end. I had to chose a different approach. I started investigating the problem and noticed something really important. It was not only the database characters garbled, but also the forum labels and messages, such as the Greek equivalent of “Active users”, “Your last visit was on”, “Birtdays” etc. I completely overlooked this because my site was already down for 5 days and I was in a state of panic. Obviously, the problem was not with the database, but with the page itself.
I played around with the <meta http-equiv=”Content-Type” content=”text/html; charset=windows-1253″ /> metatag, although I was fairly certain it wouldn’t make a difference. And then it struck me. The site was using classic ASP and by adding the preferred codepage on top, the problem could go away.
Sure enough, I entered the line <% @ LANGUAGE=VBScript CODEPAGE=1252 %> on top of all the publishing pages and the problem disappeared.