
Why can't XmlWriter replace characters unsupported by the current encoding with equivalent character entities?

Stack Overflow user
Asked 2022-07-07 13:19:49
2 answers · 88 views · 0 followers · score -1

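I am trying to write XML to a file with an XmlWriter using an encoding other than UTF-8 (in this case Encoding.ASCII), and to have the XmlWriter automatically replace characters the encoding does not support with equivalent character entities. To do this I used the sample code from https://stackoverflow.com/q/72348095/3744182. For my XML, however, instead of replacing the characters with character entities, it throws an exception like this:

Unable to translate Unicode character \u2018 at index 5852 to specified code page.Encode_Save

What causes this exception? Why aren't the unsupported characters escaped as expected?

Code used:

My serialization code: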

using (var stream = new FileStream(clsGlobal.outputXMLPath, FileMode.OpenOrCreate))
{
    clsGlobal.XMLDoc.Save(stream, indent: false, encoding: Encoding.ASCII, omitXmlDeclaration: false);
}

The Save() code from the linked question:

public static class XmlSerializationHelper
{
    public static string GetOuterXml(this XmlNode node, bool indent = false, Encoding encoding = null, bool omitXmlDeclaration = false)
    {
        if (node == null)
            return null;
        
        var stream = new MemoryStream();

        node.Save(stream, indent: indent, encoding: encoding, omitXmlDeclaration: omitXmlDeclaration, closeOutput: false);
        stream.Position = 0;

        var reader = new StreamReader(stream);
        return reader.ReadToEnd();
      
    }

    public static void Save(this XmlNode node, Stream stream, bool indent = false, Encoding encoding = null, bool omitXmlDeclaration = false, bool closeOutput = true) =>
        node.Save(stream, new XmlWriterSettings
        {
            Indent = indent,
            Encoding = encoding,               
            OmitXmlDeclaration = omitXmlDeclaration,
            CloseOutput = closeOutput,    
        });

    public static void Save(this XmlNode node, Stream stream, XmlWriterSettings settings)
    {
        try
        {
            using (var xmlWriter = XmlWriter.Create(stream, settings))
            {
                node.WriteTo(xmlWriter);
            }
        }
        catch (Exception ex)
        {
            clsGlobal.globalErrCount++;
            clsGlobal.WriteLog(ex.Message + "Encode_Save");
        }
    }
}

The input XML data, stored in clsGlobal.XMLDoc:

<?xml version="1.0" encoding="UTF-8"?>
<article dtd="RSCART3.8">
   <art-admin>
      <ms-id>BK9781839161964-00123</ms-id>
      <doi>10.1039/9781839165580-00123</doi>
   </art-admin>
   <published type="book">
      <journalref>
         <title>DNA Photodamage: From Light Absorption to Cellular Responses and Skin Cancer</title>
         <sercode>BK</sercode>
         <publisher>
            <orgname>
               <nameelt>Royal Society of Chemistry</nameelt>
            </orgname>
         </publisher>
         <issn type="isbn" />
         <cpyrt>© European Society for Photobiology 2022</cpyrt>
      </journalref>
      <volumeref>
         <link />
      </volumeref>
      <pubfront>
         <fpage>0</fpage>
         <lpage>0</lpage>
         <no-of-pages>0</no-of-pages>
         <date>
            <year>2022</year>
         </date>
      </pubfront>
   </published>
   <art-front>
      <titlegrp>
         <title>Chapter 2</title>
         <title>In Silico Tools to Assess Chemical Hazard</title>
      </titlegrp>
      <abstract>
         <p>
            Fundamentally, chemical hazard is a function of structure, and the quickest and cheapest way to predict toxicity is to do so from structure alone. Currently, there are many tools available to predict absorption, distribution, metabolism, and excretion (ADME), as well as some key endpoints, such as LD
            <inf>50</inf>
            (the minimal dose necessary to kill half the animals exposed), mutagenicity, skin sensitization, and ecotoxicity. While quantitative structure–activity relationships (QSARS) and read-across are well established, the field is rapidly changing with the advent of larger data sets and more sophisticated machine learning approaches. As computational power increases, 3D models may become widely available. However, virtually all models have blind spots, and some endpoints (such as developmental toxicity and endocrine disruption) have proven difficult to predict from structure alone – in these cases, it is necessary to use toxicity tests that capture the complexity of a biological system.
         </p>
      </abstract>
   </art-front>
   <art-body>
      <section>
         <no>0.0</no>
         <title>2.1 Introduction</title>
         <p>
            “It is obvious that there must exist a relation between the chemical constitution and the physiological action of a substance, but as yet scarcely any attempts have been made to discover what this relation is. . . .”
            <citref idrefs="cit1">1</citref>
            This was written in 1865 by Alexander Crum Brown, a chemist who worked in tandem with a medical student, and represents the very first conjecture of the basic principle that is the foundation of
            <it>in silico</it>
            toxicology: that, fundamentally, chemical hazard is a function of chemical structure. In theory, then, the quickest and cheapest way to predict toxicity is to do so from structure alone. In practice, as we shall see, this is often challenging – but understanding what we can and cannot predict from structure alone is a good way to understand how chemicals affect biological systems.
         </p>
         <p>
            At its most basic, a chemical can be said to be hazardous when it has the potential to interact with a biological system in a way that causes harm – or to use the regulatory term, “an adverse outcome.” Sometimes the negative effect is because a chemical is a mutagen –
            <it>e.g</it>
            . an electrophilic chemical might cause alkylation of DNA, which is nucleophilic, resulting in an error in the genetic code and, potentially, cancer. Or, a chemical might have a structure that so closely mimics a biological molecule that it can interact with a receptor for the endogenous molecule – as happens when chemicals that are large and coplanar, such as diethylstilbestrol, bind to the estrogen receptor and therefore prevent normal endocrine signaling. Similar mechanisms are thought to underlie many of the chemicals that are considered potential endocrine disruptors. A chemical can displace something essential –
            <it>e.g</it>
            . carbon monoxide (CO) binds more strongly to hemoglobin than oxygen, and in sufficient quantities, it will deprive tissues of oxygen, resulting in cellular death and eventually asphyxiation.
         </p>
         <p>
            Sometimes hazard is a straightforward result of the chemical properties of a molecule – most strong acids or bases will cause skin and eye irritation. Other times there are several steps –
            <it>e.g</it>
            . 2,4-dinitrochlorobenzene can easily be absorbed through the skin barrier, and then bind with many proteins in the dermal layer. These altered proteins (“haptens”) are then recognized by the immune system as “foreign material” – and because your immune system is always on the lookout for foreign proteins, it activates immune cells that respond to the hapten, creating an allergic reaction that will persist. In some cases, the chemical itself is not a problem, but once inside the body, it can be metabolized into something problematic, as in the case of acetaminophen.
         </p>
         <p>
            There are two main components to predicting toxicity. Toxicokinetics refers to how the xenobiotic is absorbed, distributed, metabolized, and excreted. Fundamentally, the balance of these factors determines the biologically effective dose – the amount of a xenobiotic that can cause harm. Toxicodynamics refers to how the chemical reacts in a negative way with biological molecules – proteins, DNA, or the cell membrane. Ultimately, the dose and the manner in which a compound causes harm determines whether there are effects at the cellular level. Severe enough effects at the cellular level eventually cause organ damage – the harmful outcome referred to as an “adverse effect.” (
            <figref idrefs="fig1">Figure 2.1</figref>
            ).
         </p>
      </section>      
      <section>
         <no>0.0</no>
         <title>2.5 Conclusion</title>
         <p>
            Skin sensitization is the one endpoint that also has multiple
            <it>in silico</it>
            tools models available, ranging from SAR approaches such as ToxTree to more sophisticated QSARs:
            <it>e.g.</it>
            PredSkin,
            <citref idrefs="cit47">47</citref>
            which is based on human data and available
            <it>via</it>
            the web, and the OECD QSAR Toolbox,
            <citref idrefs="cit48">48</citref>
            which has an automated workflow for skin sensitization. In general, most of these models perform well (with the OECD QSAR Toolbox having 80% balanced accuracy) although the models differ in their sensitivity and specificity
            <!--AQ27-->
            . The value of 80% might seem disappointing, but the reality is that the animal test these models are built off of – the LLNA test – is only ≈80% reproducible,
            <citref idrefs="cit46">46</citref>
            and although figures vary, it only predicts human sensitization with a similar level of accuracy.
            <citref idrefs="cit49">49</citref>
            As yet, these models typically predict binary sensitization status, instead of potency, which is a significant drawback – many chemicals that are very weak sensitizers are often predicted as sensitizers although their actual hazard under most exposure conditions might be small. However, the
            <it>in silico</it>
            models are, at this point, performing about as well as can be expected given the limitations of the data. Because skin sensitization represents an instance where the toxicodynamics are well understood – something we will discuss in Chapter 3 – it also offers an instance where
            <it>in vitro</it>
            data can be used as an effective supplement in
            <it>in silico</it>
            models. Further improvement will likely require new ways to think about combining
            <it>in silico</it>
            ,
            <it>in chemico</it>
            , and
            <it>in vitro</it>
            data.
         </p>
         <p>
            Currently, there are many tools available to predict ADME, as well as some key endpoints, such as LD
            <inf>50</inf>
            , mutagenicity, skin sensitization, and ecotoxicity. We can predict some important endpoints based on others –
            <it>e.g.</it>
            it does not take a great leap of imagination to understand that most skin irritants will also be eye irritants, even though the reverse is not always true. Skin sensitization should raise a concern for respiratory sensitization, although not conclusively as there are differences in bioavailability and mechanism that means this is not a universal rule.
            <citref idrefs="cit50">50</citref>
            A chemical that interferes in DNA replication is likely to cause developmental effects should it go through the fetal–placental barrier, but there are many mechanisms by which a chemical can cause developmental effects, and there are no validated models that are considered robust enough for regulatory acceptance. In theory, read-across and QSARs can be used in a well-defined chemical class if the mechanism is known. In practice, given the well-known difficulty of connecting structure to developmental toxicity, this remains an endpoint that requires an
            <it>in vivo</it>
            study for clarity.
         </p>
         <p>
            Of course, no model is perfect and there are several caveats that apply to all models broadly. A model is only as good as the data that goes into it, and in many instances the data will have a great deal of noise as well as missing data. Most data sets assembled for predictive models will not cover a diverse area of the chemical space, and are often biased towards positives, for the simple reason that people tend not to gather data on chemicals that are largely biologically inert. However, this can be problematic:
            <it>e.g.</it>
            if a data set consists of 100 chemicals, and 80% of them are considered skin sensitizers, a model that simply declares every chemical a sensitizer will have 80% accuracy. Therefore, when judging model performance, always look to the sensitivity, specificity, and balanced accuracy. Many models, like structural alerts and read-across, are better at identifying toxic compounds than establishing the absence of toxicity. While this is useful for screening-level approaches that are oriented towards being precautionary, it is problematic when trying to decide between chemical candidates in the R&amp;D phase.
         </p>
         <p>Passive diffusion is relatively easy to predict, because it depends solely on chemical properties, and because of this we have models that will predict diffusion across skin, intestine, and lung tissue. We can also predict whether a chemical will likely passively diffuse across the blood–brain barrier, but have few models that can identify transporter-mediated absorption. With the exception of the relatively well-studied PGP transporter, this has proven very difficult to model because of the diversity of transporters. The probability of a chemical being metabolized by a Phase I enzyme can also be predicted, even if the prediction of the metabolite is more difficult. Finally, based on physical chemical properties, we can estimate overall distribution, excretion, and half-life.</p>
         <p>
            In terms of toxicodynamics – predicting biological targets of chemicals and the downstream effects –the search space is more complicated both because of the diversity of targets and the biological variability of the subsequent events. Endpoints with a straightforward connection to chemical structure –
            <it>e.g.</it>
            mutagenicity and skin sensitization, which are both related to electrophilicity – can be proactively identified with structural alerts, and modeled with QSARs. More complicated endpoints can be predicted with limited success, and most such models should be treated with caution. If you do not truly understand the relationship between chemical structure and toxicity, read-across or QSARs will necessarily be limited – you can never know whether two similar molecules are in fact an activity cliff. Moreover, virtually all models will have some blindspots that will reflect the era in which they were developed as well as the data available, and if not updated will tend to become increasingly outdated.
         </p>
         <p>
            Finally,
            <it>in silico</it>
            approaches can only be used on discrete, organic structures
            <!--AQ28-->
            . By a rough estimate, however, that means that 50% of the chemicals within commerce cannot be evaluated with
            <it>in silico</it>
            tools, as they are mixtures (called UVCBs), metal compounds, or salts, in addition to containing impurities – and even small amounts of impurities can give rise to adverse events (
            <it>e.g.</it>
            sensitization or mutagenicity
            <!--AQ29-->
            ). Such chemicals are likely to increase as many bio-based chemicals are UVCBs, polymers, and engineered nanomaterials, which cannot be handled easily by existing
            <it>in silico</it>
            tools.
         </p>
         <p>Glossary</p>
         <figure id="fig1" xsrc="BK9781839161964-00123-f1.tif" pos="float">
            <title>
               Toxicokinetics and toxicodynamics together determine whether a xenobiotic will cause a disease. Adapted from ref.
               <citref idrefs="cit51">51</citref>
               , https://doi.org/10.14573/altex.1610101, under the terms of the CC BY 4.0 license,
               <url url="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</url>
               .
            </title>
         </figure>
         <figure id="fig2" xsrc="BK9781839161964-00123-f2.tif" pos="float">
            <title>Phase I metabolism involves either oxidation or hydrolysis, typically resulting in a more reactive intermediate. Phase II conjugates the compounds either with glutathione, in the case of electrophiles, or sulfation, acetylation, or glucuronidation to make a compound more water soluble.</title>
         </figure>
         <figure id="fig3" xsrc="BK9781839161964-00123-f3.tif" pos="float">
            <title>
               ADME is determined by absorption (ingestion, inhalation, or dermal), distribution primarily
               <it>via</it>
               The blood and lymph, and excretion
               <!--AQ87-->
               .
            </title>
         </figure>
         <figure id="fig4" xsrc="BK9781839161964-00123-f4.tif" pos="float">
            <title>
               Paracetamol metabolism. Paracetamol can be immediately glucuronidated or sulfated without being metabolized by a Phase I enzyme. However, some will be oxidized
               <it>via</it>
               CYP2E1 into a reactive intermediate.
            </title>
         </figure>
         <figure id="fig5" xsrc="BK9781839161964-00123-f5.tif" pos="float">
            <title>Phorbol ester structure, from PubChem.</title>
         </figure>
         <figure id="fig6" xsrc="BK9781839161964-00123-f6.tif" pos="float">
            <title>The ultimate rat carcinogen. Reproduced from Ref. 52, DOI:10.2788/6234, under the terms of the CC BY 4.0 license https://creativecommons.org/licenses/by/4.0/.</title>
         </figure>
         <figure id="fig7" xsrc="BK9781839161964-00123-f7.tif" pos="float">
            <title>Structural analogs for Bisphenol A as selected by GenRA. One the left is ToxPrints, on the right Morgan fingerprints.</title>
         </figure>         
         <table-entry id="tab4">
            <title>Table 2.4 Non-commercial read-across and QSAR</title>
            <table frame="topbot">
               <tgroup cols="3" align="left" colsep="1" rowsep="1" />
               <colspec colnum="1" colname="c1" />
               <colspec colnum="2" colname="c2" />
               <colspec colnum="3" colname="c3" />
               <thead />
               <tbody>
                  <row>
                     <entry>
                        <bo>
                           <it>Software</it>
                        </bo>
                     </entry>
                     <entry>
                        <bo>
                           <it>Models available</it>
                        </bo>
                     </entry>
                     <entry>
                        <bo>
                           <it>Platform</it>
                        </bo>
                     </entry>
                  </row>
                  <row>
                     <entry>OECD QSAR Toolbox</entry>
                     <entry>Read-across, QSARs, QSPR for multiple endpoints</entry>
                     <entry>Requires Windows</entry>
                  </row>
                  <row>
                     <entry>GenRA</entry>
                     <entry>Read-across</entry>
                     <entry>
                        Available
                        <it>via</it>
                        Web at the EPA Comptox Dashboard
                     </entry>
                  </row>
                  <row>
                     <entry align="char" char=".">T.E.S.T.</entry>
                     <entry>Global QSAR for acute toxicity, estrogen receptor binding, developmental toxicity, ecotoxicology endpoints</entry>
                     <entry>
                        Available
                        <it>via</it>
                        web at the EPA Comptox Dashboard and as stand-alone software
                     </entry>
                  </row>
                  <row>
                     <entry>ECOSAR</entry>
                     <entry>Ecotoxicology endpoints</entry>
                     <entry>Available as stand-alone software</entry>
                  </row>
                  <row>
                     <entry>VEGA</entry>
                     <entry>ADME, Read-across, and QSAR for multiple endpoints</entry>
                     <entry>Java application for Mac\Linux\</entry>
                  </row>
                  <row>
                     <entry>Danish QSAR Database</entry>
                     <entry>Global QSARs based on existing models for multiple endpoints; applicability domain indicated</entry>
                     <entry>
                        Available
                        <it>via</it>
                        the web
                     </entry>
                  </row>
               </tbody>
            </table>
         </table-entry>
      </section>
   </art-body>
   <art-back>
      <biblist title="References">
         <citgroup id="cit1">
            <journalcit>
               <citauth>
                  <fname>A. C.</fname>
                  <surname>Brown</surname>
               </citauth>
               <citauth>
                  <fname>T. R.</fname>
                  <surname>Fraser</surname>
               </citauth>
               <arttitle>On the Connection between Chemical Constitution and Physiological Action; with special reference to the Physiological Action of the Salts of the Ammonium Bases derived from Strychnia, Brucia, Thebaia, Codeia, Morphia, and Nicotia</arttitle>
               <title>J. Anat. Physiol.</title>
               <year>1868</year>
               <volumeno>2</volumeno>
               <pages>
                  <fpage>224</fpage>
                  <lpage>242</lpage>
               </pages>
            </journalcit>
         </citgroup>
         <citgroup id="cit2">
            <journalcit>
               <citauth>
                  <fname>C.</fname>
                  <surname>Lynch</surname>
               </citauth>
               <title>Anesth. Analg.</title>
               <year>2008</year>
               <volumeno>107</volumeno>
               <pages>
                  <fpage>864</fpage>
                  <lpage>867</lpage>
               </pages>
            </journalcit>
         </citgroup>
         <citgroup id="cit3">
            <journalcit>
               <citauth>
                  <fname>C. A.</fname>
                  <surname>Lipinski</surname>
               </citauth>
               <arttitle>Lead- and drug-like compounds: the rule-of-five revolution</arttitle>
               <title>Drug Discov. Today Technol.</title>
               <year>2004</year>
               <volumeno>1</volumeno>
               <pages>
                  <fpage>337</fpage>
                  <lpage>341</lpage>
               </pages>
            </journalcit>
         </citgroup>
         <citgroup id="cit4">
            <journalcit>
               <citauth>
                  <fname>D.</fname>
                  <surname>Epel</surname>
               </citauth>
               <citauth>
                  <fname>T.</fname>
                  <surname>Luckenbach</surname>
               </citauth>
               <citauth>
                  <fname>C. N.</fname>
                  <surname>Stevenson</surname>
               </citauth>
               <citauth>
                  <fname>L. A.</fname>
                  <surname>Macmanus-Spencer</surname>
               </citauth>
               <citauth>
                  <fname>A.</fname>
                  <surname>Hamdoun</surname>
               </citauth>
               <citauth>
                  <fname>T.</fname>
                  <surname>Smital</surname>
               </citauth>
               <arttitle>Efflux transporters: newly appreciated roles in protection against pollutants</arttitle>
               <title>Environ. Sci. Technol.</title>
               <year>2008</year>
               <volumeno>42</volumeno>
               <pages>
                  <fpage>3914</fpage>
                  <lpage>3920</lpage>
               </pages>
            </journalcit>
         </citgroup>
         <citgroup id="cit5">
            <journalcit>
               <citauth>
                  <fname>L.-A.</fname>
                  <surname>Clerbaux</surname>
               </citauth>
               <citauth>
                  <fname>A.</fname>
                  <surname>Paini</surname>
               </citauth>
               <citauth>
                  <fname>A.</fname>
                  <surname>Lumen</surname>
               </citauth>
               <citauth>
                  <fname>H.</fname>
                  <surname>Osman-Ponchet</surname>
               </citauth>
               <citauth>
                  <fname>A. P.</fname>
                  <surname>Worth</surname>
               </citauth>
               <citauth>
                  <fname>O.</fname>
                  <surname>Fardel</surname>
               </citauth>
               <arttitle>Membrane transporter data to support kinetically-informed chemical risk assessment using non-animal methods: Scientific and regulatory perspectives</arttitle>
               <title>Environ. Int.</title>
               <year>2019</year>
               <volumeno>126</volumeno>
               <pages>
                  <fpage>659</fpage>
                  <lpage>671</lpage>
               </pages>
            </journalcit>
         </citgroup>         
         <citgroup id="cit54">
            <journalcit>
               <citauth>
                  <surname>Oecd</surname>
               </citauth>
               <arttitle>Data from: EChemPortal: Global portal to information on chemical substances</arttitle>
               <title>OECD Obs.</title>
            </journalcit>
         </citgroup>
      </biblist>
      <compoundgrp />
      <annotationgrp />
      <datagrp />
      <resourcegrp />
   </art-back>
   <!--MAQ1: AQ: Please insert the expansion for the acronym ‘PGP’ if appropriate for the reader.-->
   <!--MAQ2: CE: The sentence beginning ‘The PGP transporter is expressed.’ has been altered for clarity, please check that the meaning is correct.-->
   <!--MAQ3: <AQ>The sentence beginning ‘The fraction unbound in plasma.’ has been altered for clarity, please check that the meaning is correct.</AQ>-->
   <!--MAQ5: <AQ>In the sentence beginning ‘In pharmacology studies.’ a word or phrase appears to be missing after ‘is known and the remaining’. Please check this carefully and indicate any changes required here.</AQ>-->   
</article>

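2 Answers

Stack Overflow user

Answered 2022-07-07 13:25:06

You may need to provide more information (e.g. sample code, a sample input file) to get an accurate answer.

If you are trying to encode the \u2018 character with ISO-8859-1 or a similar encoding, the exception ultimately occurs because that character does not exist in that encoding. ISO-8859-1 is an 8-bit encoding that cannot represent most Unicode characters, including yours. You need to write it as a character reference: &#x2018;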
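To see the underlying failure in isolation, here is a minimal sketch (not taken from the question's code) that asks for an ISO-8859-1 encoder with an exception fallback and tries to encode the two curly quotes. The sample string and the variable names are assumptions chosen for illustration; the `Encoding.GetEncoding` overload and `EncoderFallbackException` are standard .NET APIs.

```csharp
using System;
using System.Text;

// Strict ISO-8859-1: throw instead of silently substituting unsupported characters.
var strictLatin1 = Encoding.GetEncoding(
    "iso-8859-1",
    EncoderFallback.ExceptionFallback,
    DecoderFallback.ExceptionFallback);

string message;
try
{
    // U+2018 and U+2019 (curly single quotes) have no ISO-8859-1 code point.
    strictLatin1.GetBytes("\u2018PGP\u2019");
    message = "encoded without error";
}
catch (EncoderFallbackException ex)
{
    // CharUnknown and Index identify the first character that could not be encoded.
    message = $"cannot encode U+{(int)ex.CharUnknown:X4} at index {ex.Index}";
}

Console.WriteLine(message);
```

The exception carries the same kind of information (offending character and index) as the message logged in the question.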

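Score: 1

Stack Overflow user

Answered 2022-07-09 18:50:21

Your problem is that you are trying to write Unicode characters that the current ASCII encoding does not support inside an XML comment, specifically the left and right single quotation marks in this comment: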

<!--MAQ1: AQ: Please insert the expansion for the acronym ‘PGP’ if appropriate for the reader.-->

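Because these characters cannot be encoded inside an XML comment, XmlWriter throws the exception you are seeing.

But why can't these characters be replaced with character entities as a fallback? As explained in the answer to the linked question https://stackoverflow.com/q/72348095/3744182, the writer returned by XmlWriter.Create(stream, new XmlWriterSettings { Encoding = encoding }) automatically replaces Unicode characters in text content and attribute values that the specified encoding does not support with equivalent character entities. Thus, if you write the XML <Root>‘</Root> using Encoding.ASCII, you get <Root>&#x2018;</Root>: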

var xmlDoc = new XmlDocument();
xmlDoc.LoadXml("<Root>‘</Root>");

// Output to XML and escape all non-ASCII characters.
var xml = xmlDoc.GetOuterXml(encoding : Encoding.ASCII, omitXmlDeclaration : true);

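Demo fiddle #1 here.

But what about unsupported characters in XML comments? As the XML specification explains, comments are not actually part of the document's character data:

[Definition: Comments may appear anywhere in a document outside other markup; in addition, they may appear within the document type declaration at places allowed by the grammar. They are not part of the document's character data; an XML processor may, but need not, make it possible for an application to retrieve the text of comments.]

[15] Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

Furthermore, as the formal grammar shows, comment text does not support character entity substitution: the Comment production is built only from literal Char values and has no alternative for references, so &#x2018; inside a comment would be read literally rather than as a quotation mark. Thus XmlWriter cannot replace the unsupported characters with any equivalent and throws an exception instead: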

var xmlDoc = new XmlDocument();
xmlDoc.LoadXml("<Root><!--‘--></Root>");

var xml = xmlDoc.GetOuterXml(encoding : Encoding.ASCII, omitXmlDeclaration : true); // Fails and throws an exception

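Demo fiddle #2 here.

So, what are your possible workarounds?

First, you could strip all comments before writing. Comments are not really part of the document's content and are normally ignored. To remove comments, see https://stackoverflow.com/q/1874132/3744182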
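One way to do that with XmlDocument is sketched below. It is not taken from the linked answer; it assumes the document fits in memory and that an XPath query over all comment nodes is acceptable. The sample XML and variable names are illustrative only.

```csharp
using System;
using System.Linq;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml("<Root>\u2018<!--\u2018--><Child><!-- note --></Child></Root>");

// Take a snapshot of the comment nodes first, so that removing them
// does not interfere with enumeration of the XPath result.
var comments = doc.SelectNodes("//comment()").Cast<XmlNode>().ToList();
foreach (var comment in comments)
    comment.ParentNode?.RemoveChild(comment);

// The remaining text content can now be saved with Encoding.ASCII;
// the writer escapes U+2018 in element text as a character entity.
Console.WriteLine(doc.OuterXml);
```

After this step the document can be passed to the Save() extension from the question unchanged.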

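Second, you could create a custom XmlWriter decorator that, as each comment is written, replaces its unsupported characters with whatever fallback the incoming encoding specifies, as follows: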

public static class XmlSerializationHelper
{
    public static string GetOuterXml(this XmlNode node, bool indent = false, Encoding encoding = null, bool omitXmlDeclaration = false)
    {
        if (node == null)
            return null;
        using var stream = new MemoryStream();
        node.Save(stream, indent : indent, encoding : encoding, omitXmlDeclaration : omitXmlDeclaration, closeOutput : false);
        stream.Position = 0;
        using var reader = new StreamReader(stream);
        return reader.ReadToEnd();
    }

    public static void Save(this XmlNode node, Stream stream, bool indent = false, Encoding encoding = null, bool omitXmlDeclaration = false, bool closeOutput = true) =>
        node.Save(stream, new XmlWriterSettings
                  {
                      Indent = indent,
                      Encoding = encoding ?? Encoding.UTF8,
                      OmitXmlDeclaration = omitXmlDeclaration,
                      CloseOutput = closeOutput,
                  });

    public static void Save(this XmlNode node, Stream stream, XmlWriterSettings settings)
    {
        using var xmlWriter = XmlWriter.Create(stream, settings);
        using var outerWriter = (settings?.Encoding != null && settings?.Encoding?.CodePage != Encoding.UTF8.CodePage) ? new TolerantCommentEncodingXmlWriter(xmlWriter, settings.Encoding) : null;
        node.WriteTo(outerWriter ?? xmlWriter);
    }
}

public class TolerantCommentEncodingXmlWriter : XmlWriterDecorator
{
    Encoding CommentEncoding { get; }

    public TolerantCommentEncodingXmlWriter(XmlWriter baseWriter, Encoding commentEncoding) : base(baseWriter) => this.CommentEncoding = commentEncoding;

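    // Round-trip the comment text through the target encoding so that every unsupported
    // character is replaced by that encoding's own fallback before the base writer sees it.
    public override void WriteComment(string text) =>
        base.WriteComment(CommentEncoding?.GetString(CommentEncoding?.GetBytes(text)) ?? text);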
}

public class XmlWriterDecorator : XmlWriter
{
    // Taken from this answer https://stackoverflow.com/a/32150990/3744182
    // by https://stackoverflow.com/users/3744182/dbc
    // To https://stackoverflow.com/questions/32149676/custom-xmlwriter-to-skip-a-certain-element
    // NOTE: async methods not implemented
    readonly XmlWriter baseWriter;

    public XmlWriterDecorator(XmlWriter baseWriter) => this.baseWriter = baseWriter ?? throw new ArgumentNullException();

    protected virtual bool IsSuspended { get { return false; } }

    public override WriteState WriteState => baseWriter.WriteState;
    public override XmlWriterSettings Settings => baseWriter.Settings;
    public override XmlSpace XmlSpace => baseWriter.XmlSpace;
    public override string XmlLang => baseWriter.XmlLang;

    public override void Close() => baseWriter.Close();

    public override void Flush() => baseWriter.Flush();

    public override string LookupPrefix(string ns) => baseWriter.LookupPrefix(ns);

    public override void WriteBase64(byte[] buffer, int index, int count)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteBase64(buffer, index, count);
    }

    public override void WriteCData(string text)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteCData(text);
    }

    public override void WriteCharEntity(char ch)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteCharEntity(ch);
    }

    public override void WriteChars(char[] buffer, int index, int count)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteChars(buffer, index, count);
    }

    public override void WriteComment(string text)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteComment(text);
    }

    public override void WriteDocType(string name, string pubid, string sysid, string subset)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteDocType(name, pubid, sysid, subset);
    }

    public override void WriteEndAttribute()
    {
        if (IsSuspended)
            return;
        baseWriter.WriteEndAttribute();
    }

    public override void WriteEndDocument()
    {
        if (IsSuspended)
            return;
        baseWriter.WriteEndDocument();
    }

    public override void WriteEndElement()
    {
        if (IsSuspended)
            return;
        baseWriter.WriteEndElement();
    }

    public override void WriteEntityRef(string name)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteEntityRef(name);
    }

    public override void WriteFullEndElement()
    {
        if (IsSuspended)
            return;
        baseWriter.WriteFullEndElement();
    }

    public override void WriteProcessingInstruction(string name, string text)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteProcessingInstruction(name, text);
    }

    public override void WriteRaw(string data)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteRaw(data);
    }

    public override void WriteRaw(char[] buffer, int index, int count)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteRaw(buffer, index, count);
    }

    public override void WriteStartAttribute(string prefix, string localName, string ns)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteStartAttribute(prefix, localName, ns);
    }

    public override void WriteStartDocument(bool standalone) => baseWriter.WriteStartDocument(standalone);

    public override void WriteStartDocument() => baseWriter.WriteStartDocument();

    public override void WriteStartElement(string prefix, string localName, string ns)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteStartElement(prefix, localName, ns);
    }

    public override void WriteString(string text)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteString(text);
    }

    public override void WriteSurrogateCharEntity(char lowChar, char highChar)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteSurrogateCharEntity(lowChar, highChar);
    }

    public override void WriteWhitespace(string ws)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteWhitespace(ws);
    }
}   

Then, for the XML <Root>‘<!--‘--></Root>, with Encoding.ASCII the unsupported character in the comment is replaced by ?:

<Root>&#x2018;<!--?--></Root>

With Encoding.Latin1, it is replaced by ' instead:

<Root>&#x2018;<!--'--></Root>

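Demo fiddle #3 here. Demo fiddle #4, which shows the original XML being written, here.

Note that Latin1 uses a slightly better fallback than ASCII. This is discussed in the documentation page How to use character encoding classes in .NET: Choosing a Fallback Strategy.

Best-Fit Fallback
When a character does not have an exact match in the target encoding, the encoder can try to map it to a similar character. (Best-fit fallback is mostly an encoding rather than a decoding issue. There are very few code pages that contain characters that cannot be successfully mapped to Unicode.) Best-fit fallback is the default for code page and double-byte character set encodings that are retrieved by the Encoding.GetEncoding(Int32) and Encoding.GetEncoding(String) overloads.
...
Replacement Fallback
When a character does not have an exact match in the target scheme, but there is no appropriate character that it can be mapped to, the application can specify a replacement character or string. This is also the default behavior of the ASCIIEncoding class, which replaces each character that it cannot encode or decode with a question mark.

Whichever fallback is chosen, though, if you write comment text containing characters that the current encoding does not support, the unsupported characters will be lost or remapped in some way.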
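The replacement fallback can be observed directly with the same encode-then-decode round trip that TolerantCommentEncodingXmlWriter applies to comment text. This is a small sketch; the sample string and the "*" replacement are assumptions chosen for illustration, not something the answer prescribes.

```csharp
using System;
using System.Text;

string original = "\u2018PGP\u2019";

// Encoding.ASCII uses a replacement fallback: each unsupported character becomes '?'.
string asciiRoundTrip = Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(original));

// A different replacement string can be requested explicitly.
var starAscii = Encoding.GetEncoding(
    "us-ascii",
    new EncoderReplacementFallback("*"),
    new DecoderReplacementFallback("*"));
string starred = starAscii.GetString(starAscii.GetBytes(original));

Console.WriteLine(asciiRoundTrip); // ?PGP?
Console.WriteLine(starred);        // *PGP*
```

Passing an encoding built this way in XmlWriterSettings.Encoding would presumably let the decorator use the custom replacement for comments, though that combination is not shown in the answer and has not been verified here.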

Score: 0

Original page content provided by Stack Overflow; translation support provided by Tencent Cloud's IT-domain engine.
Original link:

https://stackoverflow.com/questions/72898584
