Weird characters when saving UTF-8

Mar 7, 2011 at 8:55 AM

Hello there,

When I save the XML, I have no problem viewing it with Internet Explorer or any other XML editor. But on Unix systems the XML has weird characters at the start of the header, as shown below. I have three questions about this problem.

  1. Can I save the XML without declaring an encoding at the top? If I use a StreamWriter without an encoding, it makes it utf-8 again.
  2. Can I save the XML with UTF-8 encoding, but have "UTF-8" written in capital letters? <?xml version="1.0" encoding="UTF-8"?>
  3. Why do I get weird characters like this when I save the XML as utf-8?

<?xml version="1.0" encoding="utf-8"?>

Mar 7, 2011 at 11:05 AM

1) The text at the top is written by the XmlWriter, not the StreamWriter, so the encoding named in the XML declaration and the actual encoding of the file are two separate things.

So, use the XmlWriter overload of XTypedElement.Save(). XmlWriter is just an abstract class, so create a concrete one: either an XmlTextWriter, or call XmlWriter.Create() and pass it an XmlWriterSettings instance (the writer's Settings property is read-only, so the settings have to go in at creation). XmlWriterSettings has an Encoding property, which tells the XmlWriter what to write in the XML declaration. However, it looks like it takes its encoding from the underlying TextWriter anyway when you give it one (I'm reading the docs as I write this). So using the StreamWriter without an encoding defaults to UTF-8.
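To make the XmlWriterSettings route concrete, here is a minimal sketch. It assumes plain System.Xml only; the class and method names (XmlWriterSettingsDemo, WriteSample) are made up for illustration, and the "root" element stands in for whatever you would hand to the Save(XmlWriter) overload.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical demo class; not part of the toolkit.
public static class XmlWriterSettingsDemo
{
    // Writes a tiny document to memory and returns the raw bytes.
    public static byte[] WriteSample()
    {
        var settings = new XmlWriterSettings
        {
            // UTF8Encoding(false) means "UTF-8 without a BOM".
            Encoding = new UTF8Encoding(false),
            Indent = true
        };

        using (var ms = new MemoryStream())
        {
            // The settings are passed in at creation time.
            using (XmlWriter writer = XmlWriter.Create(ms, settings))
            {
                writer.WriteStartDocument();
                writer.WriteStartElement("root");
                writer.WriteEndElement();
                writer.WriteEndDocument();
            }
            return ms.ToArray();
        }
    }
}
```

The returned bytes should start directly with the `<?xml` declaration rather than with BOM bytes, because the settings carry an encoding that has no preamble.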

2) No idea. Cheat, write it to a string first and use a Regex?
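A rough sketch of that cheat, assuming the XML is available as an XDocument (for a typed element that would presumably be built from its Untyped element). The names UpperCaseEncodingDemo and ToXmlWithUpperCaseEncoding are invented for this example. It serializes through a UTF-8 StreamWriter rather than a StringWriter, because a StringWriter would produce a utf-16 declaration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper; not part of the toolkit.
public static class UpperCaseEncodingDemo
{
    // Serializes the document as UTF-8 text (no BOM), then rewrites only the
    // encoding name inside the XML declaration to upper case.
    public static string ToXmlWithUpperCaseEncoding(XDocument doc)
    {
        var utf8NoBom = new UTF8Encoding(false);
        string xml;
        using (var ms = new MemoryStream())
        {
            using (var sw = new StreamWriter(ms, utf8NoBom))
            {
                doc.Save(sw);
            }
            // ToArray still works after the stream has been closed.
            xml = utf8NoBom.GetString(ms.ToArray());
        }

        // Only touch the declaration at the very start of the text.
        return Regex.Replace(
            xml,
            @"^(<\?xml[^>]*encoding="")utf-8("")",
            "${1}UTF-8${2}");
    }
}
```

The result can then be written out with File.WriteAllText(path, xml, new UTF8Encoding(false)) so the file itself also stays BOM-free. XML encoding names are case-insensitive, so this is purely cosmetic for parsers that follow the spec.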

3) That's a BOM, a Byte Order Mark; in your case it's 0xEF, 0xBB, 0xBF. UTF-8 doesn't need one, and it's recommended not to have one because it breaks UTF-8/ASCII backwards compatibility.

From the docs:

"StreamWriter defaults to using an instance of UTF8Encoding unless specified otherwise. This instance of UTF8Encoding is constructed without a byte order mark (BOM), so its GetPreamble method returns an empty byte array. To create a StreamWriter using UTF-8 encoding and a BOM, consider using a constructor that specifies encoding, such as StreamWriter(String, Boolean, Encoding)."


Looks like it's the constructor you're using? Guess you'll have to figure out how to switch the BOM off.
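For what it's worth, the BOM is controlled by the encoding instance handed to the StreamWriter: new UTF8Encoding(true) and the shared Encoding.UTF8 emit it, new UTF8Encoding(false) does not. A small sketch to see the difference in the raw bytes (BomDemo and Write are made-up names for this example):

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo class for comparing output bytes.
public static class BomDemo
{
    // Writes text through a StreamWriter with the given encoding
    // and returns exactly the bytes that reached the stream.
    public static byte[] Write(string text, Encoding encoding)
    {
        using (var ms = new MemoryStream())
        {
            using (var sw = new StreamWriter(ms, encoding))
            {
                sw.Write(text);
            }
            // ToArray still works after the writer has closed the stream.
            return ms.ToArray();
        }
    }
}
```

Passing the result to BitConverter.ToString, Write("<a/>", new UTF8Encoding(true)) gives EF-BB-BF-3C-61-2F-3E, while Write("<a/>", new UTF8Encoding(false)) gives 3C-61-2F-3E.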

Here's a bunch of stuff to read about:

BOM:

http://en.wikipedia.org/wiki/Byte_order_mark#UTF-8

Stream Writers and BOMs:

http://msdn.microsoft.com/en-us/library/system.io.streamwriter.aspx

XmlTextWriter:

http://msdn.microsoft.com/en-us/library/system.xml.xmltextwriter.aspx

XmlWriterSettings:

http://msdn.microsoft.com/en-us/library/system.xml.xmlwritersettings.aspx

J.

Mar 7, 2011 at 12:41 PM

I tried the code below. But how can I verify that the XML was saved without a BOM?

        // append: false overwrites an existing file instead of appending a second document to it.
        // UTF8Encoding(false) writes no BOM.
        using (var streamWriter = new StreamWriter(fileName, false, new UTF8Encoding(false)))
        {
            xml.Save(streamWriter);
        }
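One way to check for a BOM is to look at the first three bytes of the saved file. The sketch below is only an illustration; BomCheck and HasUtf8Bom are made-up names, and it assumes a regular local file.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for inspecting a saved file.
public static class BomCheck
{
    // Returns true if the file starts with the UTF-8 BOM bytes EF BB BF.
    public static bool HasUtf8Bom(string path)
    {
        byte[] head = new byte[3];
        using (FileStream fs = File.OpenRead(path))
        {
            int read = fs.Read(head, 0, head.Length);
            return read == 3
                && head[0] == 0xEF
                && head[1] == 0xBB
                && head[2] == 0xBF;
        }
    }
}
```

If HasUtf8Bom(fileName) returns false after saving, the file starts directly with the XML declaration.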

Mar 7, 2011 at 7:43 PM
Edited Mar 7, 2011 at 7:44 PM

I'm pretty sure the code below works fine. Thank you roboj1m for the great information!

        xml.Save(fileName, new UTF8Encoding(false));

        public void Save(string xmlFile, Encoding encoding)
        {
            using (XmlWriter writer = new XmlTextWriter(xmlFile, encoding))
            {
                XTypedServices.Save(writer, Untyped);
                writer.Flush();
            }
        }

Mar 8, 2011 at 11:53 AM

Nice!

I expressed that method as an extension method for people who don't want to compile their own DLL or just want a nicer way of adding toolkit methods.

I also added the three other normal Save() overloads, because out of the box you can only save instances of classes based on elements, not complex types (saving a complex type on its own produces XML that is invalid against the XSD).

But if you do want to anyway, these methods add Save() to any XTypedElement.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Xml.Schema.Linq;
using System.Xml.Linq;
using System.Xml.Schema;
using System.IO;
using System.Xml;

namespace Test
{
    public static class XTypedElementExtensions
    {
        // Saves with an explicit encoding; pass new UTF8Encoding(false) for UTF-8 without a BOM.
        public static void Save(this XTypedElement xte, string filename, Encoding encoding)
        {
            using (StreamWriter sw = new StreamWriter(File.Create(filename), encoding))
            {
                XTypedServices.Save(sw, xte.Untyped);
                sw.Flush();
            }
        }

        public static void Save(this XTypedElement xte, string filename)
        {
            XTypedServices.Save(filename, xte.Untyped);
        }

        public static void Save(this XTypedElement xte, XmlWriter xmlWriter)
        {
            XTypedServices.Save(xmlWriter, xte.Untyped);
        }

        public static void Save(this XTypedElement xte, TextWriter textWriter)
        {
            XTypedServices.Save(textWriter, xte.Untyped);
        }
    }
}


J.

Mar 8, 2011 at 1:44 PM

Nicely done m8. That's how I implemented it in my program as well. Thank you for sharing.