Talk:Email system/Draft

From Citizendium
Revision as of 08:38, 17 August 2009 by imported>David MacQuigg (→‎Adding a machine-level explanation)
This article has a Citable Version.
Definition: General overview of how the Internet electronic mail system works.
Workgroup category: Computers
Talk archive: none
English language variant: American English

This article is intended to be the most basic article on how the Internet email system works (as opposed to history, applications of email, etc.). Our target audience includes non-technical professionals such as lawyers and administrators who make policy involving email systems. We will defer details such as message formats and transfer protocols to subtopics. Topics relating to email security are also proper subtopics, because it is much easier to discuss email security once you understand how the system works. Email abuse (spam, phishing, etc.) is a related topic rather than a subtopic, because it does not expand on or depend on this article.

Here is our current thinking on how this hierarchy of topics should be developed:

Email System
 Parents
   Computers > Networks > Applications > Email
                        > Internet > Email
 Subtopics
   Email Processes and Protocols
     SMTP           (RFC-5321)
     POP            (RFC-1939)
     IMAP           (RFC-3501)
     Submission     (RFC-4409)  port 587 
   Message Formats  (RFC-5322)
     Multipurpose Internet Mail Extensions (RFC-2045..2049)
     Message Headers (RFC-5322)
   Authentication Methods
     SPF
     SenderID
     DKIM
     CSV
     
 Other
   TCP
   DNS
   PGP
   Kerberos
   History
   Abuse
   Email User Programs (Webmail)

Progressing the article

First, I'm sorry; I think I missed the retitling.

From a procedural standpoint, I'm going to make suggestions on the talk page rather than directly edit them into the article. By doing so, I will be able, eventually, to Approve it on my own. If I made substantial changes, we'd need several Computers editors to approve.

Let's begin with the "lede". Right now, it's a bit too narrative and outside the CZ opening paragraph convention. An opening sentence, unless it just won't work grammatically, should restate the title in bold and explain briefly what the term means. Material about the purpose and context follows, but isn't the role of the first sentence.

I'd avoid just saying "see textbooks" in the introduction. Very short definitions in the opening paragraphs can be appropriate, and then use wikilinks, Related Pages, Bibliography, and External Links.

Since I know you are using "Actor" as a term of art, a brief explanation would help.

Next, start a subhead for "Architecture".

You have a lot of terms with external footnote definition, such as Transmitter, Relays, MDA (not defined except in the graphic; do think of the reader using text-to-speech), etc. External references, and even footnote definition, are often our last preference.

You can define some as subtopics in the article. For example, I'd define Transmitter and Relay under subheadings, perhaps as second-level subheads, and internally wikilink using a structure such as [[#Relays|Relays]]. Certainly, that subhead can be brief and then, preferably, link to at least a stub article, where the external references can be heavier. I personally dislike having much beyond citations and abbreviation expansions in footnotes; if an explanation, as for Relay, is important, it should be in the main article.

"Let's follow a message from start to finish." should form yet another top-level section, with appropriate subheads for readability.

Consider a section introducing administration, error handling, and defense, again that primarily links to other articles.

Rev. 2

I've posted the second revision of this article at User:David_MacQuigg/Sandbox/Email_System following the suggestions above. Feedback from the editor on the second rev tells me maybe we want to add a few footnotes back in. I'll wait for more on that. --David MacQuigg 01:04, 20 May 2009 (UTC)

Rev. 3

Resynchronizing...should I be looking at the page here or the sandbox?

If the page here, several first comments with a first cup of coffee.

If our goal is to be the best self-contained reference, I wouldn't refer to hard copy texts in the lede, even as introductions. Instead, I'd even redlink to tutorials subpages, or at least link to online presentations.

Good elementary discussions of these topics can also be found in most texts on computer networks.[1]

<ref name=PnD>{{citation | author = L. Peterson, B. Davie | title = Computer Networks: A Systems Approach | edition = 4th | year = 2007 | contribution = Sect. 9.1.1 Electronic Mail}}</ref> And yes, Bruce Davie is good people.

Next, we do have the usual CZ lede conventions, such as bolding the article title, or as close as grammatically possible, in the first sentence. As far as the title, maybe it's me, but I still wince at "email" rather than "electronic mail" as a formal title.

For the layman, however, I do think we need a very basic definition of the problem: passing "envelopes" around a "postal system". I've added a few words. Howard C. Berkowitz 12:59, 6 July 2009 (UTC)


Email Message Transfer

This is a table of definitions that can be worked into a subtopic article that delves into more technical details later on... -- Eric Gearhart

Term  Definition
MTA   Mail Transfer Agent; the software on the server side for moving email messages around and forwarding them to other email server hosts
MDA   Mail Delivery Agent; the server that accepts mail for a user from a remote MTA and holds it until the user's mail client (their MUA) downloads the message
MUA   Mail User Agent; a fancy name for an email client such as Mozilla Thunderbird or Microsoft Outlook. Nowadays a MUA can actually reside in a web browser or in a mobile phone as well
SMTP  Simple Mail Transfer Protocol; the protocol used to transfer mail from one mail system to another. Uses port 25 or 587 for unencrypted message transfer.
POP   Post Office Protocol; a protocol where a client connects, downloads mail from the server and then deletes that mail from the server. Mail that is downloaded then "sticks" on the computer the user retrieves their mail from. Contrast with IMAP.
IMAP  Internet Message Access Protocol; IMAP differs from POP in that messages are left on the server; this allows a user to "float" between different clients at different locations but still have access to all their mail
Mutter...I have other names for Microsoft Outlook...Howard C. Berkowitz 20:10, 6 July 2009 (UTC)
I've added the table, and expanded it in the article "Email processes and protocols". --David MacQuigg 18:15, 16 August 2009 (UTC)
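
The POP and IMAP rows of the table can be illustrated with a toy model. This is only a sketch of the behaviour described in the table, not of either protocol: the `Mailstore` class and the two fetch functions are hypothetical names, and no network traffic is involved.

```python
class Mailstore:
    """Hypothetical server-side mailbox holding a user's messages."""

    def __init__(self, messages):
        self.messages = list(messages)


def pop_style_fetch(store):
    """Download everything, then delete it from the server (POP as described above)."""
    downloaded = list(store.messages)
    store.messages.clear()
    return downloaded


def imap_style_fetch(store):
    """Read the messages but leave them on the server (IMAP as described above)."""
    return list(store.messages)


store = Mailstore(["msg-1", "msg-2"])

# Two different clients using IMAP-style access both see all the mail.
laptop_view = imap_style_fetch(store)
phone_view = imap_style_fetch(store)

# After a POP-style download, the mail "sticks" on that one client only.
desktop_copy = pop_style_fetch(store)
later_view = imap_style_fetch(store)

print(laptop_view == phone_view)  # both clients saw the same messages
print(desktop_copy)               # the POP client now holds the messages
print(later_view)                 # nothing is left on the server
```

The point of the sketch is the state left on the server after each kind of access, which is what lets an IMAP user "float" between clients.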

Rev. 03dmq

I've got a new rev at User:David_MacQuigg/Sandbox/Email_System_03. This is basically the same as Rev.2, but I've completely re-written the introductory paragraphs. The new introduction puts major emphasis on the distinction between machine-level and administrative-level entities. The previous text was clear to non-expert readers (most students), but readers with some experience in email systems were getting confused. I think the problem might be that the experts were skimming the introduction, and reading the article thinking we were still talking about machine-level entities. This new introduction should get through to all but the most narrowly-focused experts. There will always be a few who say things like an agent can't be a person. It's a little like arguing over the meaning of the word 'hacker'.

I thought about Eric's suggestion of adding a sidebar with definitions, but I think the best place for that level of detail will be in the subtopic on Email Message Transfer. In this top article, I would like to let words like actor and agent (now lower case) have their plain-English meanings (individuals or organizations), and introduce the special meaning (computer processes) in the more technically detailed subtopic. CS students already have plenty of experience using words like actor to mean an object in a computer program.

Non-technical readers will probably never get to the subtopics. I think it is OK that they won't learn acronyms like MTA. Think of an administrator in a government agency, or a staffer on some congressional subcommittee. If they understand this top article, we will have accomplished a lot.

Other issues from the comments above:

- Definition of roles such as Transmitter, Receiver, etc.
  I have added a subtopic, Email Agents, defining these roles
  more precisely in terms of their responsibilities, e.g.
    Transmitter
    - Spam Prevention
      - rate limits, content analysis, alerts
      - respond to spam reports
      - maintain reputation
    - Authentication
      - RFC compliance
      - IP authorization (SPF, SID, CSV, ...)
      - signatures & key management (DKIM ...)
      - Return Address validation code
 See http://open-mail.org/MHSmodels.html for more examples.
 
- Administration, error handling, and defense.
  See Related Articles: Message Transfer and Email Abuse.  I'm not sure
  what you have in mind for an introductory section in the main article.
  Unless we can really add value, I would keep the main article short.

- Email vs Electronic Mail.
  My preference for the title is Email, but either way is OK with me.
  I recall reading a discussion on Wikipedia, in which the decision was
  email, over e-mail and electronic mail.  If we are writing for the next
  generation, email is the word they grew up with.
 
- Postal System Analogies
  Most of these are too superficial to add any value, and may actually add
  confusion.  The intermediate post office is more like a router than an
  SMTP relay.  SMTP relays are like workstations within a post office that
  perform special functions, such as reading zip codes.  The one analogy I
  have found useful is between the MAIL FROM address in an email session,
  and the Return Address on a postal envelope.  This will fit nicely in the
  subtopics on Message Transfer and Simple Mail Transfer Protocol.
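
The Return Address analogy can be made concrete with a small sketch. It assumes Python's standard `email.message.EmailMessage` for the message headers; the `Envelope` class, the `bounce_destination` helper, and all addresses are hypothetical and serve only to show that the MAIL FROM address is separate from the From header the recipient sees.

```python
from dataclasses import dataclass
from email.message import EmailMessage


@dataclass
class Envelope:
    """Hypothetical stand-in for the addresses exchanged in an SMTP session."""
    mail_from: str        # argument of the MAIL FROM command
    rcpt_to: list[str]    # arguments of the RCPT TO commands


def build_example():
    # Header addresses are what the recipient's mail program displays.
    msg = EmailMessage()
    msg["From"] = "newsletter@example.com"
    msg["To"] = "alice@example.org"
    msg["Subject"] = "Monthly update"
    msg.set_content("Hello Alice.")

    # Envelope addresses are used only during transfer and may differ
    # from the header addresses.
    env = Envelope(mail_from="bounces@example.com",
                   rcpt_to=["alice@example.org"])
    return env, msg


def bounce_destination(env: Envelope) -> str:
    # A delivery failure notice goes to the envelope return address,
    # much as undeliverable postal mail goes back to the address
    # written on the outside of the envelope.
    return env.mail_from


env, msg = build_example()
print("Displayed sender:", msg["From"])
print("Return address:  ", bounce_destination(env))
```

Keeping the envelope in its own structure mirrors the postal case: the letter inside can carry any letterhead, while the handling of failures depends only on what is written outside.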
 

--David MacQuigg 07:29, 8 July 2009 (UTC)

Someone really must write MTA, as defined by the Kingston Trio. The song actually is a good description of a loop. Howard C. Berkowitz 19:07, 12 August 2009 (UTC)

Adding a machine-level explanation

After struggling with how to be more clear on the different model levels (administrative vs machine-level), I decided to add a whole new subtopic Email Processes and Protocols, which basically repeats the simple explanation in the main article, but this time using a machine-level model, pretty much the same as the model in RFC-5598. It also includes an expanded version of Eric's table of acronyms. The earlier machine-level subtopic Email Message Transfer now has some overlap with the new subtopic, but it is still worth keeping, since it provides a detailed example of an SMTP session. Maybe we should rename this subtopic as SMTP Session, and link to it from other articles, like the main SMTP article.

--David MacQuigg 18:59, 12 August 2009 (UTC)

Let me review it a little later. Lower-case editorial observation: unless an article title is a proper name, such as Simple Mail Transfer Protocol, it should not have other than initial caps (i.e., Email processes and protocols).
Understood. The article itself follows the correct style. I was using caps to "delimit" the title in the discussion on this Talk page. I'll use double quotes in the future ( "Email processes and protocols" ). --David MacQuigg 20:00, 12 August 2009 (UTC)
Now, as a lower layers and operating system sort of person, are we truly talking about a machine level, or a host level? It would seem to me that everything you are describing as "machine" could be a virtualized host process, moving among multicore processors, and even among clustered or distributed machines, invisible to the mail process itself. Howard C. Berkowitz 19:06, 12 August 2009 (UTC)
I make a distinction between machines, programs, and processes, a distinction that unfortunately is blurred in terms like MTA. When I don't clarify, I'm usually talking about a process. Let me know if there are any places in the article needing clarification (e.g. A key responsibility of the MSA is to ... ). --David MacQuigg 20:00, 12 August 2009 (UTC)
I'd emphasize process over machine. What if your mail server is physically four boxes, each with a quad-core processor? Howard C. Berkowitz 21:05, 12 August 2009 (UTC)
I agree. Processes are the elements, the smallest unit, in the "machine-level" model. The diagram in Email processes and protocols shows these processes grouped by machine. This makes the figure a little more concrete, and easy for students to understand. Other than that, we don't need to discuss machines. The only change in the figure would be to separate the Relay and Delivery processes on the Mailstore machine, and give each a separate connection to the Mailstore disks. --David MacQuigg 14:38, 17 August 2009 (UTC)
Using "network layer" in the second paragraph is also confusing, especially when you later say routers are out of scope. What you are describing, I think, is a message transfer sublayer of the application layer. Howard C. Berkowitz 03:01, 17 August 2009 (UTC)
We could also call it an "overlay network", and be more consistent with textbook models (e.g. Section 9.4 Overlay Networks, Peterson & Davie 4th ed.). "Sublayer" seems to imply something entirely enclosed within a layer. The nodes and links in this "Administrative Layer" are organizations and their relationships (e.g. the Recipient's network might include his Receiver, one or more Forwarders, and his Delivery Agent). Each of these administrative nodes might include any number of SMTP Relays and other processes on the Application Layer. The nodes and links on the Application Layer are the mail-handling processes and the TCP connections between them. The nodes and links on the layer below that are the routers and physical links between them.
The essential thing we have done here is define an Administrative Layer, and invoke the principle of Separation of Concerns to simplify the discussion of each layer. Perhaps I should put the link to the Separation of Concerns article right here where we first talk about network layers, instead of in the "processes and protocols" subtopic. The explanation above could then go in the subtopic where being concise is not so critical. --David MacQuigg 14:38, 17 August 2009 (UTC)
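
One way to picture the relationship between the two layers is a sketch like the following. The role names (Transmitter, Receiver, Forwarder, Delivery Agent) come from the discussion above; the process names and the `OWNER` mapping are invented for illustration, and the sketch assumes each process belongs to exactly one administrative node.

```python
from itertools import groupby

# Hypothetical mapping: application-layer process -> administrative node.
OWNER = {
    "relay-1": "Transmitter",
    "relay-2": "Transmitter",
    "mx-1": "Receiver",
    "filter-1": "Receiver",
    "fwd-1": "Forwarder",
    "delivery-1": "Delivery Agent",
}


def administrative_path(process_path):
    """Collapse a hop-by-hop process path into the administrative nodes it crosses."""
    owners = (OWNER[p] for p in process_path)
    # groupby merges consecutive hops that stay inside the same organization.
    return [name for name, _ in groupby(owners)]


# Application Layer view: every mail-handling process the message visits.
process_path = ["relay-1", "relay-2", "mx-1", "filter-1", "fwd-1", "delivery-1"]

# Administrative Layer view: only the responsible organizations remain.
print(administrative_path(process_path))
# → ['Transmitter', 'Receiver', 'Forwarder', 'Delivery Agent']
```

Six application-layer hops reduce to four administrative nodes, which is the simplification that Separation of Concerns is meant to buy for the top-level article.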
  1. See Bibliography [PnD07] and [Stevens04].