MHTML Draft Agenda for the IETF Meeting in Munich 1997, Version 3

By Jacob Palme, e-mail: jpalme@dsv.su.se, at the research group for CMC (Computer Mediated Communication) in the Department of Computer and Systems Sciences at Stockholm University and KTH.

MHTML is the IETF working group for developing standards for sending HTML-formatted text in e-mail.

Table of contents:


Documents to be discussed during the meeting
Issue 1: Exact matches in section 8.2
Issue 2: Precedence of Content-Base and Content-Location in section 5
Issue 3: Use of Content-Base and Content-Location for information
Issue 4: Allow Content-Base, Content-Location outside Multipart/related?
Issue 5: Allow same Content-Location on two body parts in section 7
Issue 6: Content-Base in one part, not in another in section 8.2
Issue 7: Robustness Principle in general

Issue 8: Robustness Principles, one by one

Issue 8.1: Content of the type parameter (section 13.1)
Issue 8.2: Quoting of the type parameter (section 13.2)
Issue 8.3: Quoting of the start parameter (section 13.3)
Issue 8.4: Content-Base/Location in multipart headings (section 13.4)

Issue 9: Allow Content-Base, Content-Location to be valid for object parts?
Issue 10: Examples in chapter 9
Issue 11: Revised proposed standard or draft standard
Issue 12: Publishing of the info document
Issue 13: Charter and status of the working group
Issue 14: Value of start parameter to multipart/related


Documents to be discussed during the meeting

2112 PS
E. Levinson, "The MIME Multipart/Related Content-type", 03/12/1997. (Pages=9) (Format=.txt) (Obsoletes RFC1872)
2111 PS
E. Levinson, "Content-ID and Message-ID Uniform Resource Locators", 03/12/1997. (Pages=5) (Format=.txt)
2110 PS
J. Palme, A. Hopman, "MIME E-mail Encapsulation of Aggregate Documents, such as HTML (MHTML)", 03/12/1997. (Pages=19) (Format=.txt)
draft-ietf-mhtml-rev-01.txt
J. Palme, A. Hopman, "MIME E-mail Encapsulation of Aggregate Documents, such as HTML (MHTML)"
ftp://ftp.dsv.su.se/users/jpalme/draft-ietf-mhtml-spec-07.txt
The same document is available in Microsoft Word format at URL:
ftp://ftp.dsv.su.se/users/jpalme/draft-ietf-mhtml-spec-07.doc
The differences from RFC 2110 are available in Microsoft Word 6 format at URL:
ftp://ftp.dsv.su.se/users/jpalme/draft-ietf-mhtml-spec-07dif.doc
Note that the official IETF file name of the new draft is
"draft-ietf-mhtml-rev-01.txt" and not "draft-ietf-mhtml-spec-07.txt".
draft-ietf-mhtml-info-06.txt
J. Palme, "Sending HTML in E-mail, an informational supplement to RFC 2110: MIME E-mail Encapsulation of Aggregate HTML Documents (MHTML)", version 06
It can be retrieved by anonymous FTP from URL:
ftp://ftp.dsv.su.se/users/jpalme/draft-ietf-mhtml-info-06.txt

Issue 1: Exact matches in section 8.2

5.1 By "exact match", do we mean a case-sensitive match with no decoding, so that "file%20name" is not treated as equal to "file name"? This should not be a problem if the standards are adhered to, since spaces are not legal in URLs. However, it is accepted practice for Web browsers to accept many kinds of illegal URLs, and the two most widely used products both accept spaces in URLs in hyperlinks in HTML documents. How should such a URL be handled in the Content-Location header? Should the space be converted to %20 (in which case the text about exact matching in chapter 8.2.2 of mhtml-spec must be changed), or should the URL be put in the same illegal format in the Content-Location header, too?

The MHTML proposed standard (RFC 2110) at present says that URLs in e-mail headers are to be encoded using the encoding method of RFC 2017, and RFC 2017 refers to RFC 1738, which specifies that illegal characters in URLs are to be encoded using the % method; for example, a space is encoded as %20. Ed Levinson has proposed that the encoding method of RFC 2047 should be used instead in the special case where RFC 1738 encoding would make the exact match required by RFC 2110 impossible. The advantage of this is that reversing the RFC 2047 encoding gives back exactly the original string, so the exact match can be made. If RFC 2017/RFC 1738 encoding is used, decoding may undo too much (a "%20" that was already present in the original is also turned into a space), so that the exact match will not work.
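The reversibility argument can be illustrated with Python's standard library. This is a sketch under stated assumptions, not the wording of either RFC: the function names are hypothetical, `urllib.parse.quote` with `%` marked as safe is assumed to model an encoder that escapes only illegal characters in the RFC 1738 manner, and `email.header.Header` is used to produce an RFC 2047 encoded-word.

```python
from email.header import Header, decode_header
from urllib.parse import quote, unquote


def rfc1738_escape(url: str) -> str:
    # Assumption: escape illegal characters (such as space) with the % method,
    # but leave "%" alone, as an encoder that only escapes illegal characters would.
    return quote(url, safe="/:%")


def rfc2047_encode(url: str) -> str:
    # Wrap the whole string in an RFC 2047 encoded-word.
    return Header(url, charset="utf-8").encode()


def rfc2047_decode(value: str) -> str:
    # decode_header yields (bytes, charset) pairs for encoded-words
    # and (str, None) for plain text.
    return "".join(
        chunk.decode(charset or "ascii") if isinstance(chunk, bytes) else chunk
        for chunk, charset in decode_header(value)
    )


for original in ["file name.gif", "file%20name.gif"]:
    escaped = rfc1738_escape(original)
    # Undoing the % escaping turns both originals into "file name.gif".
    print(original, "->", escaped, "->", unquote(escaped))
    # The RFC 2047 round trip returns each original unchanged.
    assert rfc2047_decode(rfc2047_encode(original)) == original
```

Both originals escape to the same "file%20name.gif", so % decoding cannot tell which of them the HTML contained, while the RFC 2047 round trip preserves the difference.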

5.2 Does this apply only to relative Content-Locations without any Content-Base? Should we say something about exactness of matching when URLs are resolved using a Content-Base? If so, what?

5.3 What about the case where the URL is relative and unresolvable in the header, but absolute in the HTML text? The present spec does not say what should be done in that case.

Here is an example which explains some of the choices:

Assume you have an HTML document which contains the following element:
   <IMG SRC="file name.gif">
and the owner of this HTML document requests that it be sent by e-mail.
What should the e-mail look like in this case?

(a)

     Content-Type: Text/HTML
 
     <IMG SRC="file name.gif">
 
     Content-Type: Image/GIF
     Content-Location: "file name.gif"

(b)

     Content-Type: Text/HTML
 
     <IMG SRC="file%20name.gif">
 
     Content-Type: Image/GIF
     Content-Location: "file%20name.gif"

(c)

     Content-Type: Text/HTML
 
     <IMG SRC="file name.gif">
 
     Content-Type: Image/GIF
     Content-Location: "file%20name.gif"

(a) is not in agreement with RFC 2017, which RFC 2110 refers to, so if we choose (a), RFC 2110 or RFC 2017 must be changed.

(b) means you have to edit the HTML text before sending it, which is not so nice, since it opens a big can of worms: which errors in faulty HTML should be corrected before sending it by e-mail?

(c) requires a change in the text about "exact match" in RFC 2110.
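The effect of the three choices on matching can be checked mechanically. In this sketch the function names are hypothetical, `exact_match` is a literal case-sensitive comparison, and `escaped_match` is an assumed relaxed rule (escape the HTML reference with `urllib.parse.quote` before comparing), one possible form of the change that (c) would need.

```python
from urllib.parse import quote


def header_value(raw: str) -> str:
    # The examples write the Content-Location value in double quotes;
    # strip them before comparing.
    raw = raw.strip()
    if len(raw) >= 2 and raw[0] == raw[-1] == '"':
        return raw[1:-1]
    return raw


def exact_match(src: str, location: str) -> bool:
    # Literal, case-sensitive comparison with no decoding.
    return src == header_value(location)


def escaped_match(src: str, location: str) -> bool:
    # Hypothetical relaxed rule: %-escape illegal characters in the
    # HTML reference, then compare.
    return quote(src, safe="/:%") == header_value(location)


# (IMG SRC in the HTML, Content-Location header value) for each choice.
options = {
    "a": ("file name.gif", '"file name.gif"'),
    "b": ("file%20name.gif", '"file%20name.gif"'),
    "c": ("file name.gif", '"file%20name.gif"'),
}

for name, (src, location) in options.items():
    print(name, exact_match(src, location), escaped_match(src, location))
```

Under exact matching only (a) and (b) link up; (c) matches only with the relaxed rule, which in turn would stop (a) from matching.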

Issue 2: Precedence of Content-Base and Content-Location in section 5

If there is both a Content-Base and a Content-Location header, which of them should take precedence in resolving URL-s in the HTML content?
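A small sketch shows why the choice matters: when the two headers point to different places, the same relative link resolves to different absolute URLs. The header values are invented for illustration, and Python's `urllib.parse.urljoin` is assumed to stand in for the relative URL resolution rules; neither policy below is taken from the standard.

```python
from urllib.parse import urljoin

# Hypothetical header values for one text/html body part.
content_base = "http://foo.net/docs/"
content_location = "http://bar.net/pages/index.html"
link = "picture.gif"  # relative reference in the HTML text

# Policy 1: Content-Base takes precedence.
base_first = urljoin(content_base, link)
# Policy 2: Content-Location takes precedence.
location_first = urljoin(content_location, link)

print(base_first)      # http://foo.net/docs/picture.gif
print(location_first)  # http://bar.net/pages/picture.gif
```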

Issue 3: Use of Content-Base and Content-Location for information

Should the Content-Base and Content-Location be allowed in cases where they do not influence functionality, as a way of informing the reader that a body part was taken from a certain web location?

Issue 4: Allow Content-Base, Content-Location outside Multipart/related?

Is there any reason to remove this passage in RFC 2110 section 4.1:

These two headers may occur both inside and outside of a multipart/related part.

JP comment: The statement is true. The specific usage of Content-Base and Content-Location described in RFC 2110 SHOULD only occur inside Multipart/related, but these two headers can also occur as information to the reader that the body part is also available at a certain URL. Since Text/html can occur outside of Multipart/related (Multipart/related is only needed when the Text/html contains links to other body parts in the same message), Content-Base and Content-Location can also occur outside of Multipart/related, and in my opinion this text should not be removed. Possibly we could change the paragraph to the following:

These two headers may occur both inside and outside of a multipart/related part, but their usage for handling HTML links between body parts in a message SHOULD only occur inside Multipart/related.
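Under the wording proposed above, a receiver could classify each Content-Location by where it occurs. The sketch below assumes Python's standard `email` package and an invented sample message; the function name `locations` is hypothetical. It walks the MIME structure and reports, for each leaf body part, whether its Content-Location lies inside a multipart/related (usable for links) or outside one (information only).

```python
import email

RAW = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="outer"

--outer
Content-Type: text/html
Content-Location: http://foo.net/info.html

<P>Also available on the web.</P>
--outer
Content-Type: multipart/related; boundary="inner"; type="text/html"

--inner
Content-Type: text/html
Content-Location: http://foo.net/page.html

<IMG SRC="picture.gif">
--inner
Content-Type: image/gif
Content-Location: http://foo.net/picture.gif

(image data)
--inner--
--outer--
"""


def locations(part, inside_related=False):
    """Return (Content-Location, inside multipart/related?) for each leaf part."""
    if part.is_multipart():
        related = inside_related or part.get_content_type() == "multipart/related"
        found = []
        for child in part.get_payload():
            found.extend(locations(child, related))
        return found
    loc = part.get("Content-Location")
    return [(loc, inside_related)] if loc else []


for loc, linked in locations(email.message_from_string(RAW)):
    print(loc, "->", "usable for links" if linked else "information only")
```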

Issue 5: Allow same Content-Location on two body parts in section 7

Should we allow the same Content-Location on two body parts if they resolve to different URLs (last paragraph of section 7 in mhtml-spec)?

Suggestion: Yes.
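The case in question can be sketched as follows, assuming each body part carries its own Content-Base and that Python's `urllib.parse.urljoin` stands in for URL resolution. The header values are hypothetical. Identical relative Content-Location values do not collide once each is resolved against its own base.

```python
from urllib.parse import urljoin

# Two hypothetical body parts in one multipart/related: each has its own
# Content-Base, and both carry the same relative Content-Location.
parts = [
    {"Content-Base": "http://foo.net/a/", "Content-Location": "logo.gif"},
    {"Content-Base": "http://foo.net/b/", "Content-Location": "logo.gif"},
]

resolved = [urljoin(p["Content-Base"], p["Content-Location"]) for p in parts]
print(resolved)  # ['http://foo.net/a/logo.gif', 'http://foo.net/b/logo.gif']

# Uniqueness is checked on the resolved URLs, not on the raw header values.
assert len(set(resolved)) == len(resolved)
```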

Issue 6: Content-Base in one part, not in another in section 8.2

Suppose there are two body parts in a multipart/related. One of them has a Content-Base statement; the other does not.

Example:

     Part 1:
 
     Content-Type: Text/html
     Content-Base: http://foo.net
 
     <IMG SRC="picture.gif">
 
     Part 2:
 
     Content-Type: Image/gif
     Content-Location: picture.gif

In this case, should relative-to-absolute conversion take place on "picture.gif" in Part 1, so that it will not match the relative URL in Part 2?
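The two possible answers can be compared in a few lines. This is only a sketch: Python's `urllib.parse.urljoin` is assumed to stand in for relative-to-absolute conversion, and the variable names are hypothetical.

```python
from urllib.parse import urljoin

# Part 1: the HTML reference and the Content-Base of that part.
part1_base = "http://foo.net"
img_src = "picture.gif"

# Part 2: a Content-Location with no Content-Base, so it stays relative.
part2_location = "picture.gif"

# Answer "yes": resolve the reference in Part 1 against its Content-Base.
resolved_src = urljoin(part1_base, img_src)
print(resolved_src, resolved_src == part2_location)  # http://foo.net/picture.gif False

# Answer "no": compare the unresolved reference with the relative location.
print(img_src == part2_location)  # True
```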

Issue 7: Robustness Principle in general

Should the standard include the new chapter 13, "Robustness Principle", as suggested in draft-ietf-mhtml-spec-07, or should this chapter be put into the informational draft draft-ietf-mhtml-info, or not be published at all?

Note: Compare the present work in the IETF DRUMS working group, where this kind of information, under the title "4. Obsolete Syntax", is included in the standard-to-be draft-ietf-drums-msg-fmt.

Issue 8: Robustness Principles, one by one

Every subsection of chapter 13, "Robustness Principle", is controversial, and we should decide for or against each one (this applies whether the chapter goes into the standard or into the informational document).

Issue 8.1: Content of the type parameter (section 13.1)

Should liberal implementations accept input where the type parameter is wrong or omitted?
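One possible liberal behaviour is sketched below. It is an assumption for discussion, not text from section 13.1: the function name and the fallback rule (trust the root part's own Content-Type when the type parameter is omitted or wrong) are hypothetical. Comparison is case-insensitive because MIME type names are.

```python
def effective_root_type(declared, actual, liberal=True):
    """Decide which content type to treat as the root of a multipart/related.

    declared: value of the type parameter, or None if it was omitted.
    actual:   Content-Type of the root body part itself.
    """
    actual = actual.strip().lower()  # MIME type names are case-insensitive
    if declared is not None and declared.strip().lower() == actual:
        return actual  # correct parameter: strict and liberal readers agree
    if liberal:
        # Omitted or wrong parameter: fall back to the root part's own type.
        return actual
    raise ValueError(f"type parameter {declared!r} does not match root part {actual!r}")


print(effective_root_type("Text/HTML", "text/html"))  # text/html
print(effective_root_type(None, "text/html"))         # text/html
```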

Issue 8.2: Quoting of the type parameter (section 13.2)

Should liberal implementations accept input where the type parameter is not quoted?

Issue 8.3: Quoting of the start parameter (section 13.3)

Should liberal implementations accept input where the start parameter is not quoted with angle brackets?
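Issues 8.2 and 8.3 can be illustrated with a small liberal parser. This hand-written sketch is hypothetical and deliberately simple (it does not handle backslash-escaped quotes); it accepts parameter values with or without double quotes and a start value with or without angle brackets, so a correctly quoted header and a sloppy one give the same result.

```python
def split_params(value):
    """Split a header value on ";" characters that are outside double quotes."""
    parts, current, in_quotes = [], [], False
    for ch in value:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == ";" and not in_quotes:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def parse_content_type(value):
    """Liberal parse: accept parameter values with or without double quotes."""
    media_type, *items = split_params(value)
    params = {}
    for item in items:
        name, sep, val = item.partition("=")
        if not sep:
            continue  # ignore fragments without "="
        val = val.strip()
        if len(val) >= 2 and val[0] == val[-1] == '"':
            val = val[1:-1]
        params[name.strip().lower()] = val
    return media_type.strip().lower(), params


def normalize_start(start):
    """Liberal: accept a start value with or without angle brackets."""
    start = start.strip()
    if not (start.startswith("<") and start.endswith(">")):
        start = "<" + start + ">"
    return start


strict = 'Multipart/Related; boundary="b1"; type="Text/HTML"; start="<foo@bar.net>"'
sloppy = 'Multipart/Related; boundary="b1"; type=Text/HTML; start=foo@bar.net'
for header in (strict, sloppy):
    media_type, params = parse_content_type(header)
    print(media_type, params["type"], normalize_start(params["start"]))
```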

Issue 8.4: Content-Base/Location in multipart headings (section 13.4)

Should liberal implementations accept, and if necessary try to use, Content-Base and Content-Location headers in multipart headings?
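The difference between the strict rule quoted from RFC 2110 section 4.1 and a liberal reading can be sketched as follows. The inheritance rule in liberal mode (a body part without its own Content-Base takes the one from the enclosing multipart heading) is an assumption for discussion, as are the function names; `urllib.parse.urljoin` stands in for URL resolution.

```python
from urllib.parse import urljoin


def effective_base(multipart_base, part_base, liberal):
    # A body part's own Content-Base always wins.
    if part_base is not None:
        return part_base
    # Strict reading: a Content-Base in a multipart heading is meaningless.
    # Liberal reading (hypothetical): inherit it from the enclosing heading.
    return multipart_base if liberal else None


def resolve(link, base):
    # Without a base, a relative link stays relative.
    return urljoin(base, link) if base else link


heading_base = "http://foo.net/docs/"  # hypothetical multipart heading value
print(resolve("picture.gif", effective_base(heading_base, None, liberal=False)))
print(resolve("picture.gif", effective_base(heading_base, None, liberal=True)))
```

The first call prints the unresolved "picture.gif"; the second prints "http://foo.net/docs/picture.gif".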

Issue 9: Allow Content-Base, Content-Location to be valid for object parts?

Is there any reason to change this passage in RFC 2110 section 4.1:

These two headers are valid only for exactly the content heading or message heading where they occur and its text. They are thus not valid for the parts inside multipart headings, and are thus meaningless in multipart headings.

Issue 10: Examples in chapter 9

Can some of the implementors who have executable code that can check examples provide better examples? By better examples I mean examples that are both correct and clarify the controversial points.

Issue 11: Revised proposed standard or draft standard

Are we aiming at revising RFC 2110 into a revised proposed standard or into a draft standard?

Issue 12: Publishing of the info document

Is it time now to publish draft-ietf-mhtml-info-06.txt as an informational RFC?

Issue 13: Charter and status of the working group

Is there any need for a discussion about the charter of the working group, and about whether the working group should be designated as "active" or "inactive"?

Issue 14: Value of start parameter to multipart/related

The present MHTML standard (RFC 2110 and RFC 2112) says that if the root body part of a multipart/related is of type multipart/alternative, then the type parameter of multipart/related should be "multipart/alternative". It has been suggested that this be changed, so that the type parameter tells what the main part of the multipart/alternative is. One solution might be to change the syntax of the type parameter so that it can, for example, have the value "multipart/alternative;text/html" to indicate that the root is a multipart/alternative whose primary alternative is of type text/html.
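A receiver could read the suggested syntax as sketched below. The parser is hypothetical and only illustrates the proposal. Note that since ";" and "/" are special characters in MIME header parameters, the value would have to be quoted in the header, e.g. type="multipart/alternative;text/html".

```python
def parse_related_type(value):
    # Hypothetical parser for the suggested syntax: the part before ";" is
    # the root type, the optional part after it is the primary alternative.
    root, _, primary = value.strip().strip('"').partition(";")
    return root.strip().lower(), (primary.strip().lower() or None)


print(parse_related_type('"multipart/alternative;text/html"'))
print(parse_related_type("text/html"))  # old-style value, no primary part
```

An old-style value without ";" still parses, with no primary alternative, so existing messages would keep their meaning under this reading.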