Which xml version to use
However I see there is also a version 1. If you need more info just let me know.
PeeHaa

Use version 1. Rationale and list of changes for XML 1.
Guffa
The support for XML 1. NET framework for example won't read it. I don't understand why they had to change the version number suddenly. The first XML 1. It has undergone several revisions since then, without being given a new version number.
Michael Kay

I think I fall in the category of the 99, : (PeeHaa)
Note that this also only applies to identifiers. You can still use those characters in content. Only people need it? I suspect the number of people who need to use obsolete Abyssinian characters in the names of elements and attributes is a lot lower than
The following are forbidden, and constitute fatal errors:

When an entity reference appears in an attribute value, or a parameter entity reference appears in a literal entity value, its replacement text MUST be processed in place of the reference itself as though it were part of the document at the location the reference was recognized, except that a single or double quote character in the replacement text MUST always be treated as a normal data character and MUST NOT terminate the literal.
For example, this is well-formed:

When a general entity reference appears in the EntityValue in an entity declaration, it MUST be bypassed and left as is.

Just as with external parsed entities, parameter entities need only be included if validating. When a parameter-entity reference is recognized in the DTD and included, its replacement text MUST be enlarged by the attachment of one leading and one following space (#x20) character; the intent is to constrain the replacement text of parameter entities to contain an integral number of grammatical tokens in the DTD.
It is an error for a reference to an unparsed entity to appear in the EntityValue in an entity declaration. In discussing the treatment of entities, it is useful to distinguish two forms of the entity's value. The literal entity value as given in an internal entity declaration EntityValue may contain character, parameter-entity, and general-entity references. Such references MUST be contained entirely within the literal entity value.
The actual replacement text that is included (or included in literal) as described above MUST contain the replacement text of any parameter entities referred to, and MUST contain the character referred to, in place of any character references in the literal entity value; however, general-entity references MUST be left as-is, unexpanded.

For example, given the following declarations:

These simple rules may have complex interactions; for a detailed discussion of a difficult example, see Appendix C, Expansion of Entity and Character References.
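The distinction between the literal entity value and the replacement text can be sketched in a few lines of Python. This is a minimal illustration, not a parser: the function name `replacement_text`, the regular expression, and the sample declarations (a parameter entity `pub` and a general entity `rights`) are assumptions introduced here for demonstration. Character references and parameter-entity references are expanded, while general-entity references are left untouched.

```python
import re

# Matches hex character references, decimal character references,
# and parameter-entity references. General-entity references (&name;)
# are deliberately not matched, so they stay as-is.
_REF = re.compile(
    r"&#x(?P<hex>[0-9A-Fa-f]+);|&#(?P<dec>[0-9]+);|%(?P<pe>[^;%&\s]+);"
)

def replacement_text(literal_value, parameter_entities):
    """Return the replacement text for a literal entity value.

    parameter_entities maps a parameter-entity name to its own
    (already computed) replacement text. Hypothetical helper.
    """
    def expand(match):
        if match.group("hex") is not None:
            return chr(int(match.group("hex"), 16))
        if match.group("dec") is not None:
            return chr(int(match.group("dec")))
        return parameter_entities[match.group("pe")]

    return _REF.sub(expand, literal_value)

# Assumed sample declarations, written as literal entity values.
pub = replacement_text("&#xc9;ditions Gallimard", {})
book = replacement_text(
    "La Peste: Albert Camus, &#xA9; 1947 %pub;. &rights;",
    {"pub": pub},
)
```

In this sketch `book` still contains the text `&rights;`, because a general-entity reference is only expanded later, when the entity `book` is itself referenced in content.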
A set of general entities (amp, lt, gt, apos, quot) is specified for this purpose. If the entities lt or amp are declared, they MUST be declared as internal entities whose replacement text is a character reference to the respective character (less-than sign or ampersand) being escaped; the double escaping is REQUIRED for these entities so that references to them produce a well-formed result. If the entities gt, apos, or quot are declared, they MUST be declared as internal entities whose replacement text is the single character being escaped (or a character reference to that character); the double escaping here is OPTIONAL but harmless.
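As a quick check of the five predefined entities, the following Python sketch parses references to them without any declaration and escapes a string for output. It uses the standard library's `xml.etree.ElementTree` and `xml.sax.saxutils.escape`; the variable names and the sample strings are illustrative only.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# No DTD is present: the five entities are recognized regardless.
decoded = ET.fromstring("<t>&lt;&amp;&gt;&apos;&quot;</t>").text

# Going the other way: escape markup-significant characters in content.
encoded = escape("a < b & c")
```

Here `decoded` holds the five literal characters and `encoded` holds the escaped form of the sample string.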
Validity constraint: Unique Notation Name. XML processors MUST provide applications with the name and external identifier(s) of any notation declared and referred to in an attribute value, attribute definition, or entity declaration. They MAY additionally resolve the external identifier into the system identifier, file name, or other information needed to allow the application to call a processor for data in the notation described.
It is not an error, however, for XML documents to declare and refer to notations for which notation-specific applications are not available on the system where the XML processor or application is running.
Conforming XML processors fall into two classes: validating and non-validating. Validating and non-validating processors alike MUST report violations of this specification's well-formedness constraints in the content of the document entity and any other parsed entities that they read. Note that when processing invalid documents with a non-validating processor the application may not be presented with consistent information.
For example, several requirements for uniqueness within the document may not be met, including more than one element with the same id, duplicate declarations of elements or notations with the same name, etc. In these cases the behavior of the parser with respect to reporting such information to the application is undefined. The behavior of a validating XML processor is highly predictable; it must read every piece of a document and report all well-formedness and validity violations.
Less is required of a non-validating processor; it need not read any part of the document other than the document entity.
This has two effects that may be important to users of XML processors:

Certain well-formedness errors, specifically those that require reading external entities, may fail to be detected by a non-validating processor. Examples include the constraints entitled Entity Declared, Parsed Entity, and No Recursion, as well as some of the cases described as forbidden in 4.

The information passed from the processor to the application may vary, depending on whether the processor reads parameter and external entities. For example, a non-validating processor may fail to normalize attribute values, include the replacement text of internal entities, or supply default attribute values, where doing so depends on having read declarations in external or parameter entities.
Applications which require DTD facilities not related to validation (such as the declaration of default attributes and internal entities that are or may be specified in external entities) SHOULD use validating XML processors.
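The earlier remark that a non-validating processor may let duplicate ids through can be observed with Python's standard `xml.etree.ElementTree`, which is built on the non-validating expat parser. The sample document below is hypothetical; it declares `id` as type ID and then uses the same value twice, which is a validity error but not a well-formedness error.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: invalid (duplicate ID value) but well-formed.
DOC = """<!DOCTYPE r [
  <!ELEMENT r (e*)>
  <!ELEMENT e EMPTY>
  <!ATTLIST e id ID #IMPLIED>
]>
<r><e id="a"/><e id="a"/></r>"""

root = ET.fromstring(DOC)            # parses without raising
ids = [child.get("id") for child in root]
```

The parse succeeds and both elements are delivered to the application, so any uniqueness check has to be done by the application or by a validating processor.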
Each rule in the grammar defines one symbol, in the form. Symbols are written with an initial capital letter if they are the start symbol of a regular language, otherwise with an initial lowercase letter. Literal strings are quoted.
Within the expression on the right-hand side of a rule, the following expressions are used to match strings of one or more characters:

The number of leading zeros in the #xN form is insignificant.

Enumerations and ranges can be mixed in one set of brackets. Enumerations and ranges of forbidden values can be mixed in one set of brackets.

These symbols may be combined to match more complex patterns as follows, where A and B represent simple expressions:

This appendix contains the necessary definitions for character normalization. For additional background information and examples, see [Charmod].
This appendix contains some examples illustrating the sequence of entity- and character-reference recognition and expansion, as specified in 4. A more complex example will illustrate the rules and their effects fully. In the following example, the line numbers are solely for reference.
Since the replacement text is not rescanned, the reference to parameter entity "zz" is not recognized. And it would be an error if it were, since "zz" is not yet declared. The general entity "tricky" has now been declared, with the replacement text "error-prone".

As noted in 3. For example, the content model ((b, c) | (b, d)) is non-deterministic, because given an initial b the XML processor cannot know which b in the model is being matched without looking ahead to see which element follows the b.
In this case, the two references to b can be collapsed into a single reference, making the model read (b, (c | d)).
An initial b now clearly matches only a single name in the content model. The processor doesn't need to look ahead to see what follows; either c or d would be accepted. More formally: a finite state automaton may be constructed from the content model using the standard algorithms, e. In many such algorithms, a follow set is constructed for each position in the regular expression i.
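The b example can be illustrated with a deliberately narrow check in Python. This is not the general automaton construction mentioned above: it only handles a choice between plain sequences of element names and reports names that begin more than one alternative. The function name and the list-of-lists representation are assumptions made for this sketch.

```python
def ambiguous_first_names(alternatives):
    """Return the names that start more than one alternative.

    alternatives is a list of sequences of element names, standing in
    for a choice group such as ((b, c) | (b, d)). Hypothetical helper.
    """
    seen = set()
    repeated = set()
    for sequence in alternatives:
        first = sequence[0]
        if first in seen:
            repeated.add(first)
        seen.add(first)
    return repeated

# ((b, c) | (b, d)): both alternatives start with b.
before = ambiguous_first_names([["b", "c"], ["b", "d"]])

# (b, (c | d)): after the single b, the inner choice is between c and d.
after = ambiguous_first_names([["c"], ["d"]])
```

A non-empty result means the processor would need lookahead to decide which alternative an initial element belongs to; an empty result for the rewritten model matches the observation that either c or d is simply accepted.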
The XML encoding declaration functions as an internal label on each entity, indicating which character encoding is in use. Before an XML processor can read the internal label, however, it apparently has to know what character encoding is in use — which is what the internal label is trying to indicate.
In the general case, this is a hopeless situation. It is not entirely hopeless in XML, however, because XML limits the general case in two ways: each implementation is assumed to support only a finite set of character encodings, and the XML encoding declaration is restricted in position and content in order to make it feasible to autodetect the character encoding in use in each entity in normal cases. Also, in many cases other sources of information are available in addition to the XML data stream itself.
Two cases may be distinguished, depending on whether the XML entity is presented to the processor without, or with, any accompanying external information. We consider the first case first.

The notation ## is used to denote any byte value except that two consecutive ##s cannot be both 00.

In cases above which do not require reading the encoding declaration to determine the encoding, section 4. Also, it is possible that new character encodings will be invented that will make it necessary to use the encoding declaration to determine the encoding, in cases where this is not required at present.
This level of autodetection is enough to read the XML encoding declaration and parse the character-encoding identifier, which is still necessary to distinguish the individual members of each family of encodings e.
Because the contents of the encoding declaration are restricted to characters from the ASCII repertoire however encoded , a processor can reliably read the entire encoding declaration as soon as it has detected which family of encodings is in use. Since in practice, all widely used character encodings fall into one of the categories above, the XML encoding declaration allows reasonably reliable in-band labeling of character encodings, even when external sources of information at the operating-system or transport-protocol level are unreliable.
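The first stage of autodetection, guessing the encoding family from the leading bytes, might look like the Python sketch below. The byte patterns are taken from the autodetection table in the XML Recommendation as best recalled, not from the text above, and the function name and the returned labels (such as "ascii-compatible") are hypothetical. A real processor would go on to read the encoding declaration to pick the exact member of the family.

```python
def sniff_encoding_family(data):
    """Guess the encoding family of an XML entity from its first bytes."""
    prefix = data[:4]

    # Byte-order-mark cases. Four-byte UCS-4 patterns are tested before
    # the two-byte UTF-16 marks because they share leading bytes.
    if prefix == b"\x00\x00\xfe\xff":
        return "ucs-4-be"
    if prefix == b"\xff\xfe\x00\x00":
        return "ucs-4-le"
    if prefix in (b"\x00\x00\xff\xfe", b"\xfe\xff\x00\x00"):
        return "ucs-4-unusual-order"
    if prefix[:2] == b"\xfe\xff":
        return "utf-16-be"
    if prefix[:2] == b"\xff\xfe":
        return "utf-16-le"
    if prefix[:3] == b"\xef\xbb\xbf":
        return "utf-8"

    # No byte order mark: look for '<' or '<?' in the various layouts.
    if prefix == b"\x00\x00\x00\x3c":
        return "ucs-4-be"
    if prefix == b"\x3c\x00\x00\x00":
        return "ucs-4-le"
    if prefix == b"\x00\x3c\x00\x3f":
        return "utf-16-be"
    if prefix == b"\x3c\x00\x3f\x00":
        return "utf-16-le"
    if prefix == b"\x3c\x3f\x78\x6d":
        # '<?xm' in UTF-8, ASCII, ISO 8859 and similar encodings;
        # the encoding declaration decides which one.
        return "ascii-compatible"
    if prefix == b"\x4c\x6f\xa7\x94":
        return "ebcdic"

    # Anything else: UTF-8 without an encoding declaration (or an error).
    return "utf-8"
```

For instance, `sniff_encoding_family("<?xml version='1.0'?>".encode("utf-16-be"))` falls into the `00 3C 00 3F` case, since that codec writes no byte order mark.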
Once the processor has detected the character encoding in use, it can act appropriately, whether by invoking a separate input routine for each case, or by calling the proper conversion function on each character of input. Like any self-labeling system, the XML encoding declaration will not work if any software changes the entity's character set or encoding without updating the encoding declaration.
Implementors of character-encoding routines should be careful to ensure the accuracy of the internal and external information used to label the entity. The second possible case occurs when the XML entity is accompanied by encoding information, as in some file systems and some network protocols. When multiple sources of information are available, their relative priority and the preferred method of handling conflict should be specified as part of the higher-level protocol used to deliver XML.
In the interests of interoperability, however, the following rule is recommended. If an XML entity is in a file, the Byte-Order Mark and encoding declaration are used if present to determine the character encoding. WG approval of this specification does not necessarily imply that all WG participants voted for its approval.
The following suggestions define what is believed to be best practice in the construction of XML names used as element names, attribute names, processing instruction targets, entity names, notation names, and the values of attributes of type ID, and are intended as guidance for document authors and schema designers.
All references to Unicode are understood with respect to a particular version of the Unicode Standard greater than or equal to 3. The first two suggestions are directly derived from the rules given for identifiers in the Unicode Standard, version 3. The other suggestions are mostly derived from [XML Since Cf characters are not directly visible, they should be employed with caution and only when necessary, to avoid creating names which are distinct to XML processors but look the same to human beings.
Combining characters meant for use with symbols only (including those in the ranges [#x20D0-#x20EF] and [#x1D165-#x1D1AD]) should not be used in names. Names which are nonsensical, unpronounceable, hard to read, or easily confusable with other names should not be employed.
XML shall support a wide variety of applications. It shall be easy to write programs which process XML documents. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
XML documents should be human-legible and reasonably clear. The XML design should be prepared quickly. The design of XML shall be formal and concise. XML documents shall be easy to create.
Terseness in XML markup is of minimal importance.

In addition, the terms defined in the following list are used in building those definitions and in describing the actions of an XML processor:

error: A violation of the rules of this specification; results are undefined.

It meets all the well-formedness constraints given in this specification.
Note: Document authors are encouraged to avoid "compatibility characters", as defined in Unicode [Unicode]. Note: The Names and Nmtokens productions are used to define the validity of tokenized attribute values after normalization see 3.
Validity constraint: Standalone Document Declaration

The standalone document declaration MUST have the value "no" if any external markup declarations contain declarations of:
- attributes with default values, if elements to which these attributes apply appear in the document without specifications of values for these attributes, or
- entities (other than amp, lt, gt, apos, quot), if references to those entities appear in the document, or
- attributes with tokenized types, where the attribute appears in the document with a value such that normalization will produce a different value from that which would be produced in the absence of the declaration, or
- element types with element content, if white space occurs directly within any instance of those types.
To simplify the tasks of applications, the XML processor MUST behave as if it normalized all line breaks in external parsed entities (including the document entity) on input, before parsing, by translating all of the following to a single #xA character:
- the two-character sequence #xD #xA
- the two-character sequence #xD #x85
- the single character #x85
- the single character #x2028
- any #xD character that is not immediately followed by #xA or #x85

Productions 33 through 38 have been removed.
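The line-break translation can be expressed as a single regular-expression substitution. The Python sketch below assumes the XML 1.1 set of line ends (CR LF, CR NEL, NEL as U+0085, LINE SEPARATOR as U+2028, and a lone CR); the function name is hypothetical. The two-character sequences are listed first so that they are consumed as a unit rather than producing two line feeds.

```python
import re

# Two-character sequences first, then the single characters.
_LINE_END = re.compile("\r\n|\r\x85|\x85|\u2028|\r")

def normalize_line_ends(text):
    """Translate every XML 1.1 line end in text to a single line feed."""
    return _LINE_END.sub("\n", text)

sample = "a\r\nb\r\x85c\x85d\u2028e\rf"
normalized = normalize_line_ends(sample)
```

In the sample, each of the five different line ends becomes exactly one line feed, so `normalized` has six one-letter lines.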
Note: Language information may also be provided by external transport protocols e. Note: The composing characters are all Unicode characters of non-zero combining class, plus a small number of class-zero characters that nevertheless take part as a non-initial character in certain Unicode canonical decompositions.
Validity constraint: Element Valid An element is valid if there is a declaration matching elementdecl where the Name matches the element type, and one of the following holds: The declaration matches EMPTY and the element has no content not even entity references, comments, PIs or white space.
Attribute-list declarations may be used:
- To define the set of attributes pertaining to a given element type.
- To establish type constraints for these attributes.
- To provide default values for attributes.

Begin with a normalized value consisting of the empty string. For each character, entity reference, or character reference in the unnormalized attribute value, beginning with the first and continuing to the last, do the following:
- For a character reference, append the referenced character to the normalized value.
- For another character, append the character to the normalized value.

Following are examples of attribute normalization.
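A rough Python sketch of attribute-value normalization follows. Two steps in it are assumptions drawn from the XML Recommendation rather than from the text above: an entity reference is handled by recursively processing the entity's replacement text, and a literal white space character is appended as a single space. For non-CDATA attribute types, leading and trailing spaces are then dropped and runs of spaces collapsed. The function name, the `entities` mapping, and the `cdata` flag are hypothetical.

```python
import re

# One token per match: hex character reference, decimal character
# reference, entity reference, or any other single character.
_TOKEN = re.compile(r"&#x([0-9A-Fa-f]+);|&#([0-9]+);|&([^;&\s]+);|([\s\S])")

def normalize_attribute(raw, entities=None, cdata=True):
    """Normalize an attribute value as written in the document.

    entities maps general-entity names to their replacement text.
    """
    entities = entities or {}
    pieces = []

    def walk(text):
        for hex_ref, dec_ref, name, char in _TOKEN.findall(text):
            if hex_ref:
                pieces.append(chr(int(hex_ref, 16)))   # character reference
            elif dec_ref:
                pieces.append(chr(int(dec_ref)))       # character reference
            elif name:
                walk(entities[name])                   # assumed: recurse
            elif char in " \t\r\n":
                pieces.append(" ")                     # assumed: white space
            else:
                pieces.append(char)                    # any other character

    walk(raw)
    value = "".join(pieces)
    if not cdata:
        # Non-CDATA types: trim spaces and collapse runs of spaces.
        value = re.sub(" +", " ", value.strip(" "))
    return value

# Hypothetical entities whose replacement text is CR, LF, and CR LF.
ENTITIES = {"d": "\r", "a": "\n", "da": "\r\n"}
as_cdata = normalize_attribute("&d;&d;A&a;&#x20;&a;B&da;", ENTITIES)
as_tokens = normalize_attribute("&d;&d;A&a;&#x20;&a;B&da;", ENTITIES, cdata=False)
```

Note the asymmetry the sketch makes visible: a carriage return reached through an entity becomes a space, while one written as a character reference such as `&#xd;` is kept as a carriage return.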
An internal entity is a parsed entity.

Note: Only parsed entities that are referenced directly or indirectly within the document are required to be well-formed.

The labels in the leftmost column describe the recognition context:
- Reference in Content: as a reference anywhere after the start-tag and before the end-tag of an element; corresponds to the nonterminal content.
- Reference in Attribute Value: as a reference within either the value of an attribute in a start-tag, or a default value in an attribute declaration; corresponds to the nonterminal AttValue.
- Reference in Entity Value: as a reference within a parameter or internal entity's literal entity value in the entity's declaration; corresponds to the nonterminal EntityValue.
Keld Simonsen et al. Scott Bradner, Berners-Lee, R. Fielding, L.
Unicode The Unicode Consortium. The Unicode Standard, Version 4. Reading, Mass. Tim Bray, Jean Paoli, C. Compilers: Principles, Techniques, and Tools. Reading: Addison-Wesley, , rpt. Formal Models in Document Processing. Faculty of Mathematics at the University of Freiburg, Deterministic Regular Languages. Extended abstract in A. Finkel, M. Jantzen, Hrsg. Springer-Verlag, Berlin Lecture Notes in Computer Science Charmod W3C Working Draft.
Character Model for the World Wide Web 1. Martin J. Clark James Clark. Murata, S. Laurent, D. Hoffman, F. ISO E. Code for the representation of names of languages. Codes for the representation of names of countries and their subdivisions — Part 1: Country codes [Geneva]: International Organization for Standardization, First edition — Extended Facilities Annexe. ISO TC2. Information technology — Document Description and Processing Languages. Namespaces in XML. Textuality, Hewlett-Packard, and Microsoft.
World Wide Web Consortium,