1 \input texinfo
2 @setfilename ../../info/url.info
3 @settitle URL Programmer's Manual
4 @include docstyle.texi
5
6 @iftex
7 @c @finalout
8 @end iftex
9 @c @setchapternewpage odd
10 @c @smallbook
11
12 @tex
13 \overfullrule=0pt
14 %\global\baselineskip 30pt % for printing in double space
15 @end tex
16 @dircategory Emacs lisp libraries
17 @direntry
18 * URL: (url). URL loading package.
19 @end direntry
20
21 @copying
22 This is the manual for the @code{url} Emacs Lisp library.
23
24 Copyright @copyright{} 1993--1999, 2002, 2004--2015 Free Software
25 Foundation, Inc.
26
27 @quotation
28 Permission is granted to copy, distribute and/or modify this document
29 under the terms of the GNU Free Documentation License, Version 1.3 or
30 any later version published by the Free Software Foundation; with no
31 Invariant Sections, with the Front-Cover Texts being ``A GNU Manual,''
32 and with the Back-Cover Texts as in (a) below. A copy of the license
33 is included in the section entitled ``GNU Free Documentation License''.
34
35 (a) The FSF's Back-Cover Text is: ``You have the freedom to copy and
36 modify this GNU manual.''
37 @end quotation
38 @end copying
39
40 @c
41 @titlepage
42 @title URL Programmer's Manual
43 @subtitle First Edition, URL Version 2.0
44 @author William M. Perry @email{wmperry@@gnu.org}
45 @author David Love @email{fx@@gnu.org}
46 @page
47 @vskip 0pt plus 1filll
48 @insertcopying
49 @end titlepage
50
51 @contents
52
53 @node Top
54 @top URL
55
56 @ifnottex
57 @insertcopying
58 @end ifnottex
59
60 @menu
61 * Introduction:: About the @code{url} library.
62 * URI Parsing:: Parsing (and unparsing) URIs.
63 * Retrieving URLs:: How to use this package to retrieve a URL.
64 * Supported URL Types:: Descriptions of URL types currently supported.
65 * General Facilities:: URLs can be cached, accessed via a gateway
66 and tracked in a history list.
67 * Customization:: Variables you can alter.
68 * GNU Free Documentation License:: The license for this documentation.
69 * Function Index::
70 * Variable Index::
71 * Concept Index::
72 @end menu
73
74 @node Introduction
75 @chapter Introduction
76 @cindex URL
77 @cindex URI
78 @cindex uniform resource identifier
79 @cindex uniform resource locator
80
81 A @dfn{Uniform Resource Identifier} (URI) is a specially-formatted
82 name, such as an Internet address, that identifies some name or
83 resource. The format of URIs is described in RFC 3986, which updates
84 and replaces the earlier RFCs 2732, 2396, 1808, and 1738. A
85 @dfn{Uniform Resource Locator} (URL) is an older but still-common
86 term, which basically refers to a URI corresponding to a resource that
87 can be accessed (usually over a network) in a specific way.
88
89 Here are some examples of URIs (taken from RFC 3986):
90
91 @example
92 ftp://ftp.is.co.za/rfc/rfc1808.txt
93 http://www.ietf.org/rfc/rfc2396.txt
94 ldap://[2001:db8::7]/c=GB?objectClass?one
95 mailto:John.Doe@@example.com
96 news:comp.infosystems.www.servers.unix
97 tel:+1-816-555-1212
98 telnet://192.0.2.16:80/
99 urn:oasis:names:specification:docbook:dtd:xml:4.1.2
100 @end example
101
102 This manual describes the @code{url} library, an Emacs Lisp library
103 for parsing URIs and retrieving the resources to which they refer.
104 (The library is so-named for historical reasons; nowadays, the ``URI''
105 terminology is regarded as the more general one, and ``URL'' is
106 technically obsolete despite its widespread vernacular usage.)
107
108 @node URI Parsing
109 @chapter URI Parsing
110
111 A URI consists of several @dfn{components}, each having a different
112 meaning. For example, the URI
113
114 @example
115 http://www.gnu.org/software/emacs/
116 @end example
117
118 @noindent
119 specifies the scheme component @samp{http}, the hostname component
120 @samp{www.gnu.org}, and the path component @samp{/software/emacs/}.
121
122 @cindex parsed URIs
123 The format of URIs is specified by RFC 3986. The @code{url} library
124 provides the Lisp function @code{url-generic-parse-url}, a (mostly)
125 standard-compliant URI parser, as well as the function
126 @code{url-recreate-url}, which converts a parsed URI back into a URI
127 string.
128
129 @defun url-generic-parse-url uri-string
130 This function returns a parsed version of the string @var{uri-string}.
131 @end defun
132
133 @defun url-recreate-url uri-obj
134 @cindex unparsing URLs
135 Given a parsed URI, this function returns the corresponding URI string.
136 @end defun
137
138 @cindex parsed URI
139 The return value of @code{url-generic-parse-url}, and the argument
140 expected by @code{url-recreate-url}, is a @dfn{parsed URI}: a CL
141 structure whose slots hold the various components of the URI@.
142 @xref{Top,the CL Manual,,cl,GNU Emacs Common Lisp Emulation}, for
143 details about CL structures. Most of the other functions in the
144 @code{url} library act on parsed URIs.
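
As a quick check of the two functions, parsing the URI shown at the start of this chapter and then recreating it should give back the same string. This minimal sketch assumes only that @file{url-parse}, the file that defines both functions, can be loaded with @code{require}:

@example
(require 'url-parse)

;; Parse a URI string into a structure, then turn the structure
;; back into a string.
(url-recreate-url
 (url-generic-parse-url "http://www.gnu.org/software/emacs/"))
     @result{} "http://www.gnu.org/software/emacs/"
@end example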
145
146 @menu
147 * Parsed URIs:: Format of parsed URI structures.
148 * URI Encoding:: Non-@acronym{ASCII} characters in URIs.
149 @end menu
150
151 @node Parsed URIs
152 @section Parsed URI structures
153
154 Each parsed URI structure contains the following slots:
155
156 @table @code
157 @item type
158 The URI scheme (a string, e.g., @code{http}). @xref{Supported URL
159 Types}, for a list of schemes that the @code{url} library knows how to
160 process. This slot can also be @code{nil}, if the URI is not fully
161 specified.
162
163 @item user
164 The user name (a string), or @code{nil}.
165
166 @item password
167 The user password (a string), or @code{nil}. The use of this URI
168 component is strongly discouraged; nowadays, passwords are transmitted
169 by other means, not as part of a URI.
170
171 @item host
172 The host name (a string), or @code{nil}. If present, this is
173 typically a domain name or IP address.
174
175 @item port
176 The port number (an integer), or @code{nil}. Omitting this component
177 usually means to use the ``standard'' port associated with the URI
178 scheme.
179
180 @item filename
181 The combination of the ``path'' and ``query'' components of the URI (a
182 string), or @code{nil}. If the query component is present, it is the
183 substring following the first @samp{?} character, and the path
184 component is the substring before the @samp{?}. The meaning of these
185 components is scheme-dependent; they do not necessarily refer to a
186 file on a disk.
187
188 @item target
189 The fragment component (a string), or @code{nil}. The fragment
190 component specifies a ``secondary resource'', such as a section of a
191 webpage.
192
193 @item fullness
194 This is @code{t} if the URI is fully specified, i.e., the
195 hierarchical components of the URI (the hostname and/or username
196 and/or password) are preceded by @samp{//}.
197 @end table
198
199 @findex url-type
200 @findex url-user
201 @findex url-password
202 @findex url-host
203 @findex url-port
204 @findex url-filename
205 @findex url-target
206 @findex url-attributes
207 @findex url-fullness
208 These slots have accessors named @code{url-@var{part}}, where
209 @var{part} is the slot name. For example, the accessor for the
210 @code{host} slot is the function @code{url-host}. The @code{url-port}
211 accessor returns the default port for the URI scheme if the parsed
212 URI's @code{port} slot is @code{nil}.
213
214 The slots can be set using @code{setf}. For example:
215
216 @example
217 (setf (url-port url) 80)
218 @end example
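
The sketch below reads several slots of a parsed URI and then sets its port. The results follow from the rules just described: the @code{port} slot is @code{nil} in the first URI, so @code{url-port} reports the default port of the @samp{http} scheme, and a non-default port set with @code{setf} shows up when the URI is recreated.

@example
(require 'url-parse)

(let ((url (url-generic-parse-url
            "http://www.gnu.org/software/emacs/?lang=en#top")))
  (list (url-type url) (url-host url) (url-port url)
        (url-filename url) (url-target url)))
     @result{} ("http" "www.gnu.org" 80 "/software/emacs/?lang=en" "top")

(let ((url (url-generic-parse-url "http://www.gnu.org/software/emacs/")))
  ;; A port other than the scheme's default is kept in the string.
  (setf (url-port url) 8080)
  (url-recreate-url url))
     @result{} "http://www.gnu.org:8080/software/emacs/"
@end example

Note that the @code{filename} slot holds the query as well as the path.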
219
220 @node URI Encoding
221 @section URI Encoding
222
223 @cindex percent encoding
224 The @code{url-generic-parse-url} parser does not obey RFC 3986 in
225 one respect: it allows non-@acronym{ASCII} characters in URI strings.
226
227 Strictly speaking, RFC 3986 compatible URIs may only consist of
228 @acronym{ASCII} characters; non-@acronym{ASCII} characters are
229 represented by converting them to UTF-8 byte sequences, and performing
230 @dfn{percent encoding} on the bytes. For example, the o-umlaut
231 character (@samp{@"o}, U+00F6) is converted to the UTF-8 byte sequence @samp{\xC3\xB6},
232 then percent encoded to @samp{%C3%B6}. (Certain ``reserved''
233 @acronym{ASCII} characters must also be percent encoded when they
234 appear in URI components.)
235
236 The function @code{url-encode-url} can be used to convert a URI
237 string containing arbitrary characters to one that is properly
238 percent-encoded in accordance with RFC 3986.
239
240 @defun url-encode-url url-string
241 This function returns a properly URI-encoded version of
242 @var{url-string}. It also performs @dfn{URI normalization},
243 e.g., converting the scheme component to lowercase if it was
244 previously uppercase.
245 @end defun
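
For example, in the following call the space in the path is percent-encoded and the upper-case scheme is normalized to lower case. The sketch assumes @code{url-encode-url} is loaded from @file{url-util}, where it is defined:

@example
(require 'url-util)

(url-encode-url "HTTP://www.example.com/a b")
     @result{} "http://www.example.com/a%20b"
@end example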
246
247 To convert between a string containing arbitrary characters and a
248 percent-encoded all-@acronym{ASCII} string, use the functions
249 @code{url-hexify-string} and @code{url-unhex-string}:
250
251 @defun url-hexify-string string &optional allowed-chars
252 This function performs percent-encoding on @var{string}, and returns
253 the result.
254
255 If @var{string} is multibyte, it is first converted to a UTF-8 byte
256 string. Each byte corresponding to an allowed character is left
257 as-is, while all other bytes are converted to a three-character
258 sequence: @samp{%} followed by two upper-case hex digits.
259
260 @vindex url-unreserved-chars
261 @cindex unreserved characters
262 The allowed characters are specified by @var{allowed-chars}. If this
263 argument is @code{nil}, the allowed characters are those specified as
264 @dfn{unreserved characters} by RFC 3986 (see the variable
265 @code{url-unreserved-chars}). Otherwise, @var{allowed-chars} should
266 be a vector whose @var{n}-th element is non-@code{nil} if character
267 @var{n} is allowed.
268 @end defun
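
A few sample calls follow. The last one builds an @var{allowed-chars} vector by hand that also permits @samp{/}; it assumes that @code{url-unreserved-chars} is a list of characters, as it is in current versions of the library:

@example
(require 'url-util)

(url-hexify-string "a b/c")
     @result{} "a%20b%2Fc"

;; Multibyte text is encoded as UTF-8 before percent-encoding.
(url-hexify-string "\u00F6")
     @result{} "%C3%B6"

;; Allow "/" as well as the unreserved characters.  The vector
;; is indexed by byte value, so it needs 256 elements.
(let ((allowed (make-vector 256 nil)))
  (dolist (char (cons ?/ url-unreserved-chars))
    (aset allowed char t))
  (url-hexify-string "a b/c" allowed))
     @result{} "a%20b/c"
@end example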
269
270 @defun url-unhex-string string &optional allow-newlines
271 This function replaces percent-encoding sequences in @var{string} with
272 their character equivalents, and returns the resulting string.
273
274 If @var{allow-newlines} is non-@code{nil}, it allows the decoding of
275 carriage returns and line feeds, which are normally forbidden in URIs.
276 @end defun
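
Decoding reverses the examples for @code{url-hexify-string}. The decoded result is a string of bytes, so in this sketch UTF-8 text is turned back into characters with @code{decode-coding-string}:

@example
(require 'url-util)

(url-unhex-string "a%20b%2Fc")
     @result{} "a b/c"

;; The bytes C3 B6 are the UTF-8 encoding of o-umlaut; decoding
;; them gives a one-character string.
(decode-coding-string (url-unhex-string "%C3%B6") 'utf-8)
@end example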
277
278 @node Retrieving URLs
279 @chapter Retrieving URLs
280
281 The @code{url} library defines the following three functions for
282 retrieving the data specified by a URL@. The actual retrieval protocol
283 depends on the URL's URI scheme, and is performed by lower-level
284 scheme-specific functions. (Those lower-level functions are not
285 documented here, and generally should not be called directly.)
286
287 In each of these functions, the @var{url} argument can be either a
288 string or a parsed URL structure. If it is a string, that string is
289 passed through @code{url-encode-url} before using it, to ensure that
290 it is properly URI-encoded (@pxref{URI Encoding}).
291
292 @defun url-retrieve-synchronously url
293 This function synchronously retrieves the data specified by @var{url},
294 and returns a buffer containing the data. The return value is
295 @code{nil} if there is no data associated with the URL (as is the case
296 for @code{dired}, @code{info}, and @code{mailto} URLs).
297 @end defun
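
As a sketch of typical use, the hypothetical helper @code{my-url-body} below retrieves an HTTP URL and returns just the document body. It assumes that the headers at the start of the buffer end at the first empty line (@pxref{Dealing with HTTP documents}), and it kills the buffer afterwards, since the caller is responsible for it:

@example
(require 'url)

(defun my-url-body (url)
  "Return the body of the HTTP document at URL, or nil."
  (let ((buffer (url-retrieve-synchronously url)))
    (when buffer
      (unwind-protect
          (with-current-buffer buffer
            (goto-char (point-min))
            ;; Skip the headers, which end at the first empty line.
            (when (re-search-forward "^$" nil t)
              (forward-line 1)
              (buffer-substring (point) (point-max))))
        ;; Nothing else kills the buffer, so do it here.
        (kill-buffer buffer)))))
@end example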
298
299 @defun url-retrieve url callback &optional cbargs silent no-cookies
300 This function retrieves @var{url} asynchronously, calling the function
301 @var{callback} when the object has been completely retrieved. The
302 return value is the buffer into which the data will be inserted, or
303 @code{nil} if the process has already completed.
304
305 The callback function is called this way:
306
307 @example
308 (apply @var{callback} @var{status} @var{cbargs})
309 @end example
310
311 @noindent
312 where @var{status} is a plist representing what happened during the
313 retrieval, with most recent events first, or an empty list if no
314 events have occurred. Each pair in the plist is one of:
315
316 @table @code
317 @item (:redirect @var{redirected-to})
318 This means that the request was redirected to the URL
319 @var{redirected-to}.
320
321 @item (:error (@var{error-symbol} . @var{data}))
322 This means that an error occurred. If so desired, the error can be
323 signaled with @code{(signal @var{error-symbol} @var{data})}.
324 @end table
325
326 When the callback function is called, the current buffer is the one
327 containing the retrieved data (if any). The buffer also contains any
328 MIME headers associated with the data retrieval.
329
330 If the optional argument @var{silent} is non-@code{nil}, progress
331 messages are suppressed. If the optional argument @var{no-cookies} is
332 non-@code{nil}, cookies are not stored or sent.
333 @end defun
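
Here is a sketch of an asynchronous retrieval. The callback @code{my-url-report} is a hypothetical name; it receives the @var{status} plist described above, followed by the elements of @var{cbargs} (here a single label string), and it kills the data buffer when it is done:

@example
(require 'url)

(defun my-url-report (status label)
  "Report how an asynchronous retrieval ended.
STATUS is the plist passed by `url-retrieve'; LABEL comes from CBARGS."
  (let ((err (plist-get status :error))
        (redirect (plist-get status :redirect)))
    (cond (err
           (message "%s: failed: %S" label err))
          (t
           (when redirect
             (message "%s: redirected to %s" label redirect))
           (message "%s: received %d characters" label (buffer-size))))
    ;; The current buffer holds the retrieved data; free it.
    (kill-buffer (current-buffer))))

(url-retrieve "http://www.gnu.org/" #'my-url-report '("gnu.org"))
@end example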
334
335 @defun url-queue-retrieve url callback &optional cbargs silent no-cookies
336 This function acts like @code{url-retrieve}, but with limits on the
337 number of concurrently-running network processes. The option
338 @code{url-queue-parallel-processes} controls the number of concurrent
339 processes, and the option @code{url-queue-timeout} sets a timeout in
340 seconds.
341
342 To use this function, you must @code{(require 'url-queue)}.
343 @end defun
344
345 @vindex url-queue-parallel-processes
346 @defopt url-queue-parallel-processes
347 The value of this option is an integer specifying the maximum number
348 of concurrent @code{url-queue-retrieve} network processes. If the
349 number of @code{url-queue-retrieve} calls is larger than this number,
350 later ones are queued until earlier ones are finished.
351 @end defopt
352
353 @vindex url-queue-timeout
354 @defopt url-queue-timeout
355 The value of this option is a number specifying the maximum lifetime,
356 in seconds, of a @code{url-queue-retrieve} network process, once it is started.
357 If a process is not finished by then, it is killed and removed from
358 the queue.
359 @end defopt
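
A sketch of queued retrieval, fetching three pages with at most two running at once. The limits and URLs are arbitrary choices for illustration; the callback follows the same convention as for @code{url-retrieve}, and the fourth argument @code{t} is @var{silent}:

@example
(require 'url-queue)

;; At most two concurrent processes, each allowed 10 seconds.
(setq url-queue-parallel-processes 2
      url-queue-timeout 10)

(dolist (url '("http://www.gnu.org/"
               "http://www.fsf.org/"
               "http://savannah.gnu.org/"))
  (url-queue-retrieve url
                      (lambda (status target)
                        (message "%s: %s" target
                                 (if (plist-get status :error)
                                     "failed"
                                   "done"))
                        (kill-buffer (current-buffer)))
                      (list url)
                      t))
@end example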
360
361 @node Supported URL Types
362 @chapter Supported URL Types
363
364 This chapter describes functions and variables affecting URL retrieval
365 for specific schemes.
366
367 @menu
368 * http/https:: Hypertext Transfer Protocol.
369 * file/ftp:: Local files and FTP archives.
370 * info:: Emacs "Info" pages.
371 * mailto:: Sending email.
372 * news/nntp/snews:: Usenet news.
373 * rlogin/telnet/tn3270:: Remote host connectivity.
374 * irc:: Internet Relay Chat.
375 * data:: Embedded data URLs.
376 * nfs:: Network File System.
377 * ldap:: Lightweight Directory Access Protocol.
378 * man:: Unix man pages.
379 @end menu
380
381 @node http/https
382 @section @code{http} and @code{https}
383
384 The @code{http} scheme refers to the Hypertext Transfer Protocol. The
385 @code{url} library supports HTTP version 1.1, specified in RFC 2616.
386 Its default port is 80.
387
388 The @code{https} scheme is a secure version of @code{http}, with
389 transmission via SSL@. It is defined in RFC 2818, and its default port
390 is 443. When using @code{https}, the @code{url} library performs SSL
391 encryption via the @code{ssl} library, by forcing the @code{ssl}
392 gateway method to be used. @xref{Gateways in general}.
393
394 @defopt url-honor-refresh-requests
395 If this option is @code{t} (the default), the @code{url} library
396 honors the HTTP @samp{Refresh} header, which is used by servers to
397 direct clients to reload documents from the same URL or a different
398 one. If the value is @code{nil}, the @samp{Refresh} header is
399 ignored; any other value means to ask the user on each request.
400 @end defopt
401
402 @menu
403 * Cookies::
404 * HTTP language/coding::
405 * HTTP URL Options::
406 * Dealing with HTTP documents::
407 @end menu
408
409 @node Cookies
410 @subsection Cookies
411
412 @findex url-cookie-delete
413 @defun url-cookie-list
414 This command creates a @file{*url cookies*} buffer listing the current
415 cookies, if there are any. You can remove a cookie using the
416 @kbd{C-k} (@code{url-cookie-delete}) command.
417 @end defun
418
419 @defopt url-cookie-file
420 The file in which cookies are stored, defaulting to @file{cookies} in
421 the directory specified by @code{url-configuration-directory}.
422 @end defopt
423
424 @defopt url-cookie-confirmation
425 Specifies whether confirmation is required to accept cookies.
426 @end defopt
427
428 @defopt url-cookie-multiple-line
429 Specifies whether to put all cookies for the server on one line in the
430 HTTP request to satisfy broken servers like
431 @url{http://www.hotmail.com}.
432 @end defopt
433
434 @defopt url-cookie-trusted-urls
435 A list of regular expressions matching URLs from which to always
436 accept cookies.
437 @end defopt
438
439 @defopt url-cookie-untrusted-urls
440 A list of regular expressions matching URLs from which to always
441 reject cookies.
442 @end defopt
443
444 @defopt url-cookie-save-interval
445 The number of seconds between automatic saves of cookies to disk.
446 Default is one hour.
447 @end defopt
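
For example, a configuration along the following lines always accepts cookies from GNU hosts, always rejects them from @samp{example.com}, and requires confirmation for other sites. It is a sketch: the host names are placeholders, and it assumes the regular expressions are matched against the full URL of the page that sets the cookie.

@example
(setq url-cookie-trusted-urls
      '("^https?://\\([^/]*\\.\\)?gnu\\.org/")
      url-cookie-untrusted-urls
      '("^https?://\\([^/]*\\.\\)?example\\.com/")
      url-cookie-confirmation t)
@end example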
448
449
450 @node HTTP language/coding
451 @subsection Language and Encoding Preferences
452
453 HTTP allows clients to express preferences for the language and
454 encoding of documents which servers may honor. For each of these
455 variables, the value is a string; it can specify a single choice, or
456 it can be a comma-separated list.
457
458 Normally, this list is ordered by descending preference. However, each
459 element can be followed by @samp{;q=@var{priority}} to specify its
460 preference level, a decimal number from 0 to 1; e.g., for
461 @code{url-mime-language-string}, @w{@code{"de, en-gb;q=0.8,
462 en;q=0.7"}}. An element that has no @samp{;q} specification has
463 preference level 1.
464
465 @defopt url-mime-charset-string
466 @cindex character sets
467 @cindex coding systems
468 This variable specifies a preference for character sets when documents
469 can be served in more than one encoding.
470
471 HTTP allows specifying a series of MIME charsets which indicate your
472 preferred character set encodings, e.g., Latin-9 or Big5, and these
473 can be weighted. The default series is generated automatically from
474 the associated MIME types of all defined coding systems, sorted by the
475 coding system priority specified in Emacs. @xref{Recognize Coding, ,
476 Recognizing Coding Systems, emacs, The GNU Emacs Manual}.
477 @end defopt
478
479 @defopt url-mime-language-string
480 @cindex language preferences
481 A string specifying the preferred language when servers can serve
482 files in several languages. Use RFC 1766 abbreviations, e.g.,
483 @samp{en} for English, @samp{de} for German.
484
485 The string can be @code{"*"} to get the first available language (as
486 opposed to the default).
487 @end defopt
488
489 @node HTTP URL Options
490 @subsection HTTP URL Options
491
492 HTTP supports an @samp{OPTIONS} method describing things supported by
493 the URL@.
494
495 @defun url-http-options url
496 This function returns a property list describing the options available for @var{url}. The
497 property list members are:
498
499 @table @code
500 @item methods
501 A list of symbols specifying what HTTP methods the resource
502 supports.
503
504 @item dav
505 @cindex DAV
506 A list of numbers specifying what DAV protocol/schema versions are
507 supported.
508
509 @item dasl
510 @cindex DASL
511 A list of the DASL search types supported (in string form).
512
513 @item ranges
514 A list of the units available for use in partial document fetches.
515
516 @item p3p
517 @cindex P3P
518 The @dfn{Platform for Privacy Preferences} description for the resource.
519 Currently this is just the raw header contents.
520 @end table
521
522 @end defun
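
A sketch of a query, assuming the server answers the @samp{OPTIONS} request; @code{plist-get} then extracts individual members, whose keys are the symbols listed above:

@example
(require 'url-http)

(let ((options (url-http-options "http://www.gnu.org/")))
  ;; For example, the list of methods the resource supports.
  (plist-get options 'methods))
@end example

The result depends entirely on the server; if the request does not succeed, the property list may be @code{nil}.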
523
524 @node Dealing with HTTP documents
525 @subsection Dealing with HTTP documents
526
527 HTTP URLs are retrieved into a buffer containing the HTTP headers
528 followed by the body. Since the headers are quasi-MIME, they may be
529 processed using the MIME library. @xref{Top,, Emacs MIME,
530 emacs-mime, The Emacs MIME Manual}.
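
For simple cases, the header block can also be read with @code{mail-fetch-field} from @file{mail-utils}, instead of the full MIME library. The hypothetical helper @code{my-url-content-type} below narrows the retrieval buffer to the headers, assuming they end at the first empty line, and returns the @samp{Content-Type} field:

@example
(require 'url)
(require 'mail-utils)

(defun my-url-content-type (url)
  "Return the Content-Type header field sent for URL, or nil."
  (let ((buffer (url-retrieve-synchronously url)))
    (when buffer
      (unwind-protect
          (with-current-buffer buffer
            (goto-char (point-min))
            (save-restriction
              ;; The headers end at the first empty line.
              (narrow-to-region (point-min)
                                (if (re-search-forward "^$" nil t)
                                    (point)
                                  (point-max)))
              (mail-fetch-field "Content-Type")))
        (kill-buffer buffer)))))
@end example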
531
532 @node file/ftp
533 @section file and ftp
534 @cindex files
535 @cindex FTP
536 @cindex File Transfer Protocol
537 @cindex compressed files
538 @cindex dired
539
540 The @code{ftp} and @code{file} schemes are defined in RFC 1738. The
541 @code{url} library treats @samp{ftp:} and @samp{file:} as synonymous.
542 Such URLs have the form
543
544 @example
545 ftp://@var{user}:@var{password}@@@var{host}:@var{port}/@var{file}
546 file://@var{user}:@var{password}@@@var{host}:@var{port}/@var{file}
547 @end example
548
549 @noindent
550 If the URL specifies a local file, it is retrieved by reading the file
551 contents in the usual way. If it specifies a remote file, it is
552 retrieved using the Ange-FTP package. @xref{Remote Files,,, emacs,
553 The GNU Emacs Manual}.
554
555 When retrieving a compressed file, it is automatically uncompressed
556 if it has the file suffix @file{.z}, @file{.gz}, @file{.Z},
557 @file{.bz2}, or @file{.xz}. (The list of supported suffixes is
558 hard-coded, and cannot be altered by customizing
559 @code{jka-compr-compression-info-list}.)
560
561 @defopt url-directory-index-file
562 This option specifies the filename to look for when a @code{file} or
563 @code{ftp} URL specifies a directory. The default is
564 @file{index.html}. If this file exists and is readable, it is viewed.
565 Otherwise, Emacs visits the directory using Dired.
566 @end defopt
567
568 @node info
569 @section info
570 @cindex Info
571 @cindex Texinfo
572 @findex Info-goto-node
573
574 The @code{info} scheme is non-standard. Such URLs have the form
575
576 @example
577 info:@var{file}#@var{node}
578 @end example
579
580 @noindent
581 and are retrieved by invoking @code{Info-goto-node} with argument
582 @samp{(@var{file})@var{node}}. If @samp{#@var{node}} is omitted, the
583 @samp{Top} node is opened.
584
585 @node mailto
586 @section mailto
587
588 @cindex mailto
589 @cindex email
590 A @code{mailto} URL specifies an email message to be sent to a given
591 email address. For example, @samp{mailto:foo@@bar.com} specifies
592 sending a message to @samp{foo@@bar.com}. The ``retrieval method''
593 for such URLs is to open a mail composition buffer in which the
594 appropriate content (e.g., the recipient address) has been filled in.
595
596 As defined in RFC 2368, a @code{mailto} URL has the form
597
598 @example
599 @samp{mailto:@var{mailbox}[?@var{header}=@var{contents}[&@var{header}=@var{contents}]]}
600 @end example
601
602 @noindent
603 where an arbitrary number of @var{header}s can be added. If the
604 @var{header} is @samp{body}, then @var{contents} is put in the message
605 body; otherwise, a @var{header} header field is created with
606 @var{contents} as its contents. Note that the @code{url} library does
607 not perform any checking of @var{header} or @var{contents}, so you
608 should check them before sending the message.
609
610 @defopt url-mail-command
611 @vindex mail-user-agent
612 The value of this variable is the function called whenever @code{url} needs
613 to send mail. This should normally be left at its default, which is the
614 standard mail-composition command @code{compose-mail}. @xref{Sending
615 Mail,,, emacs, The GNU Emacs Manual}.
616 @end defopt
617
618 If the document containing the @code{mailto} URL itself possessed a
619 known URL, Emacs automatically inserts an @samp{X-Url-From} header
620 field into the mail buffer, specifying that URL.
621
622 @node news/nntp/snews
623 @section @code{news}, @code{nntp} and @code{snews}
624 @cindex news
625 @cindex network news
626 @cindex usenet
627 @cindex NNTP
628 @cindex snews
629
630 The @code{news}, @code{nntp}, and @code{snews} schemes, defined in RFC
631 1738, are used for reading Usenet newsgroups. For compatibility with
632 non-standard-compliant news clients, the @code{url} library allows
633 host and port fields to be included in @code{news} URLs, even though
634 they are properly only allowed for @code{nntp} and @code{snews}.
635
636 @code{news} and @code{nntp} URLs have the following forms:
637
638 @table @samp
639 @item news:@var{newsgroup}
640 Retrieves a list of messages in @var{newsgroup};
641 @item news:@var{message-id}
642 Retrieves the message with the given @var{message-id};
643 @item news:*
644 Retrieves a list of all available newsgroups;
645 @item nntp://@var{host}:@var{port}/@var{newsgroup}
646 @itemx nntp://@var{host}:@var{port}/@var{message-id}
647 @itemx nntp://@var{host}:@var{port}/*
648 Similar to the @samp{news} versions.
649 @end table
650
651 The default port for @code{nntp} (and @code{news}) is 119. The
652 difference between an @code{nntp} URL and a @code{news} URL is that an
653 @code{nntp} URL may specify an article by its number. The
654 @samp{snews} scheme is the same as @samp{nntp}, except that it is
655 tunneled through SSL and has default port 563.
656
657 These URLs are retrieved via the Gnus package.
658
659 @cindex environment variable
660 @vindex NNTPSERVER
661 @defopt url-news-server
662 This variable specifies the default news server from which to fetch
663 news, if no server was specified in the URL@. The default value,
664 @code{nil}, means to use the server specified by the standard
665 environment variable @samp{NNTPSERVER}, or @samp{news} if that
666 environment variable is unset.
667 @end defopt
668
669 @node rlogin/telnet/tn3270
670 @section rlogin, telnet and tn3270
671 @cindex rlogin
672 @cindex telnet
673 @cindex tn3270
674 @cindex terminal emulation
675 @findex terminal-emulator
676
677 These URL schemes are defined in RFC 1738, and are used for logging in
678 via a terminal emulator. They have the form
679
680 @example
681 telnet://@var{user}:@var{password}@@@var{host}:@var{port}
682 @end example
683
684 @noindent
685 but the @var{password} component is ignored.
686
687 To handle rlogin, telnet and tn3270 URLs, an @code{rlogin},
688 @code{telnet} or @code{tn3270} (the program names and arguments are
689 hardcoded) session is run in a @code{terminal-emulator} buffer.
690 Well-known ports are used if the URL does not specify a port.
691
692 @node irc
693 @section irc
694 @cindex IRC
695 @cindex Internet Relay Chat
696 @cindex ZEN IRC
697 @cindex ERC
698 @cindex rcirc
699
700 The @code{irc} scheme is defined in the Internet Draft at
701 @url{http://www.w3.org/Addressing/draft-mirashi-url-irc-01.txt} (which
702 was never approved as an RFC). Such URLs have the form
703
704 @example
705 irc://@var{host}:@var{port}/@var{target},@var{needpass}
706 @end example
707
708 @noindent
709 and are retrieved by opening an @acronym{IRC} session using the
710 function specified by @code{url-irc-function}.
711
712 @defopt url-irc-function
713 The value of this option is a function, which is called to open an IRC
714 connection for @code{irc} URLs. This function must take five
715 arguments, @var{host}, @var{port}, @var{channel}, @var{user} and
716 @var{password}. The @var{channel} argument specifies the channel to
717 join immediately, and may be @code{nil}.
718
719 The default is @code{url-irc-rcirc}, which uses the Rcirc package.
720 Other options are @code{url-irc-erc} (which uses ERC) and
721 @code{url-irc-zenirc} (which uses ZenIRC).
722 @end defopt
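
For instance, a custom function can wrap the default. The hypothetical @code{my-url-irc-logging-rcirc} below logs the connection and then delegates to @code{url-irc-rcirc}, passing along the five arguments described above:

@example
(require 'url-irc)

(defun my-url-irc-logging-rcirc (host port channel user password)
  "Log an IRC URL connection, then open it with Rcirc."
  (message "IRC: %s:%s, channel %s" host port (or channel "none"))
  (url-irc-rcirc host port channel user password))

(setq url-irc-function #'my-url-irc-logging-rcirc)
@end example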
723
724 @node data
725 @section data
726 @cindex data URLs
727
728 The @code{data} scheme, defined in RFC 2397, contains MIME data in
729 the URL itself. Such URLs have the form
730
731 @example
732 data:@r{[}@var{media-type}@r{]}@r{[};@var{base64}@r{]},@var{data}
733 @end example
734
735 @noindent
736 @var{media-type} is a MIME @samp{Content-Type} string, possibly
737 including parameters. It defaults to
738 @samp{text/plain;charset=US-ASCII}. The @samp{text/plain} part can be
739 omitted while still supplying a charset parameter. If @samp{;base64} is
740 present, the @var{data} are base64-encoded.
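
To make the format concrete, the hypothetical helper @code{my-parse-data-url} below splits a @code{data} URL into its media type and decoded data. It is a sketch of the syntax above under simplifying assumptions (for example, the media type is not validated), not the parser used by the @code{url} library:

@example
(require 'url-util)

(defun my-parse-data-url (url)
  "Return (MEDIA-TYPE . DATA) for the data URL string URL.
An illustration of the data URL syntax only."
  (let ((comma (string-match "," url)))
    (unless (and comma (string-prefix-p "data:" url))
      (error "Not a data URL: %s" url))
    (let* ((params (substring url 5 comma)) ; text after "data:"
           (data (substring url (1+ comma)))
           (base64 (string-match ";base64\\'" params))
           (media-type (if base64 (substring params 0 base64) params)))
      (cons (cond ((string= media-type "")
                   "text/plain;charset=US-ASCII")
                  ;; Only parameters given: text/plain is implied.
                  ((string-prefix-p ";" media-type)
                   (concat "text/plain" media-type))
                  (t media-type))
            (if base64
                (base64-decode-string data)
              (url-unhex-string data))))))

(my-parse-data-url "data:;base64,SGVsbG8sIFdvcmxkIQ==")
     @result{} ("text/plain;charset=US-ASCII" . "Hello, World!")
@end example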
741
742 @node nfs
743 @section nfs
744 @cindex NFS
745 @cindex Network File System
746 @cindex automounter
747
748 The @code{nfs} scheme, defined in RFC 2224, is similar to @code{ftp}
749 except that it points to a file on a remote host that is handled by an
750 NFS automounter on the local host. Such URLs have the form
751
752 @example
753 nfs://@var{user}:@var{password}@@@var{host}:@var{port}/@var{file}
754 @end example
755
756 @defvar url-nfs-automounter-directory-spec
757 A string saying how to invoke the NFS automounter. Certain @samp{%}
758 sequences are recognized:
759
760 @table @samp
761 @item %h
762 The hostname of the NFS server;
763 @item %n
764 The port number of the NFS server;
765 @item %u
766 The username to use to authenticate;
767 @item %p
768 The password to use to authenticate;
769 @item %f
770 The filename on the remote server;
771 @item %%
772 A literal @samp{%}.
773 @end table
774
775 Each can be used any number of times.
776 @end defvar
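
The substitution rules in the table can be illustrated with the standard @code{format-spec} function. This is only an analogy with a made-up host and file name; it is not how the @code{url} library itself expands @code{url-nfs-automounter-directory-spec}, and the spec string shown is not necessarily its default value.

@example
(require 'format-spec)

;; %h and %f replaced by a made-up host and remote file name.
(format-spec "/net/%h%f"
             '((?h . "fileserver.example.com")
               (?f . "/export/home/README")))
     @result{} "/net/fileserver.example.com/export/home/README"
@end example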
777
778 @node ldap
779 @section ldap
780 @cindex LDAP
781 @cindex Lightweight Directory Access Protocol
782
783 The LDAP scheme is defined in RFC 2255.
784
785 @node man
786 @section man
787 @cindex @command{man}
788 @cindex Unix man pages
789 @findex man
790
791 The @code{man} scheme is a non-standard one. Such URLs have the form
792
793 @example
794 @samp{man:@var{page-spec}}
795 @end example
796
797 @noindent
798 and are retrieved by passing @var{page-spec} to the Lisp function
799 @code{man}.
800
801 @node General Facilities
802 @chapter General Facilities
803
804 @menu
805 * Disk Caching::
806 * Proxies::
807 * Gateways in general::
808 * History::
809 @end menu
810
811 @node Disk Caching
812 @section Disk Caching
813 @cindex Caching
814 @cindex Persistent Cache
815 @cindex Disk Cache
816
817 The disk cache stores retrieved documents locally, whence they can be
818 retrieved more quickly. When requesting a URL that is in the cache,
819 the library checks to see if the page has changed since it was last
820 retrieved from the remote machine. If not, the local copy is used,
821 saving the transmission over the network.
822 @cindex Cleaning the cache
823 @cindex Clearing the cache
824 @cindex Cache cleaning
825 Currently the cache isn't cleared automatically.
826 @c Running the @code{clean-cache} shell script
827 @c first is recommended, to allow for future cleaning of the cache. This
828 @c shell script will remove all files that have not been accessed since it
829 @c was last run. To keep the cache pared down, it is recommended that this
830 @c script be run from @i{at} or @i{cron} (see the manual pages for
831 @c crontab(5) or at(1) for more information)
832
833 @defopt url-automatic-caching
834 Setting this variable non-@code{nil} causes documents to be cached
835 automatically.
836 @end defopt
837
838 @defopt url-cache-directory
839 This variable specifies the directory in which to store the cache
840 files. It defaults to the subdirectory @file{cache} of
841 @code{url-configuration-directory}.
842 @end defopt
843
844 @defopt url-cache-creation-function
845 The cache relies on a scheme for mapping URLs to files in the cache.
846 This variable names a function which sets the type of cache to use.
847 It takes a URL as argument and returns the absolute file name of the
848 corresponding cache file. The two supplied possibilities are
849 @code{url-cache-create-filename-using-md5} and
850 @code{url-cache-create-filename-human-readable}.
851 @end defopt
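
For example, to cache documents automatically under human-readable file names, a configuration like the following could be used; the trade-off between the two naming functions is described below.

@example
;; Cache documents automatically, under human-readable names.
(setq url-automatic-caching t
      url-cache-creation-function
      #'url-cache-create-filename-human-readable)
@end example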
852
853 @defun url-cache-create-filename-using-md5 url
854 Creates a cache file name from @var{url} using MD5 hashing.
855 This creates entries with very few cache collisions and is fast.
856 @cindex MD5
857 @smallexample
858 (url-cache-create-filename-using-md5 "http://www.example.com/foo/bar")
859 @result{} "/home/fx/.url/cache/fx/http/com/example/www/b8a35774ad20db71c7c3409a5410e74f"
860 @end smallexample
861 @end defun

@defun url-cache-create-filename-human-readable url
Creates a cache file name from @var{url} that is more obviously
connected to @var{url} than the name produced by
@code{url-cache-create-filename-using-md5}, but more likely to
conflict with other files.
@smallexample
(url-cache-create-filename-human-readable "http://www.example.com/foo/bar")
@result{} "/home/fx/.url/cache/fx/http/com/example/www/foo/bar"
@end smallexample
@end defun

@defun url-cache-expired url &optional expire-time
This function returns non-@code{nil} if the cache entry for @var{url}
has expired (or is absent). The optional argument @var{expire-time}
is the expiration delay in seconds; it defaults to
@code{url-cache-expire-time}.
@end defun

@defopt url-cache-expire-time
This variable is the default number of seconds to use for the
@var{expire-time} argument of the function @code{url-cache-expired}.
@end defopt

@defun url-fetch-from-cache url
This function returns a buffer containing the data cached for
@var{url}.
@end defun
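
As an illustration, @code{url-cache-expired} and @code{url-fetch-from-cache} can be combined so that a document is read from the cache only while its entry is still fresh. The helper @code{my-url-cached-buffer} is hypothetical; it returns @code{nil} when the caller should fetch the document from the network instead.

@example
(require 'url-cache)

(defun my-url-cached-buffer (url &optional max-age)
  "Hypothetical helper: return a buffer with the cached data for URL, or nil.
The cache is used only if its entry is younger than MAX-AGE seconds
\(default `url-cache-expire-time')."
  ;; `url-cache-expired' is also non-nil when URL was never cached.
  (unless (url-cache-expired url (or max-age url-cache-expire-time))
    (url-fetch-from-cache url)))
@end example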

@c Fixme: never actually used currently?
@c @defopt url-standalone-mode
@c @cindex Relying on cache
@c @cindex Cache only mode
@c @cindex Standalone mode
@c If this variable is non-@code{nil}, the library relies solely on the
@c cache for fetching documents and avoids checking if they have changed
@c on remote servers.
@c @end defopt

@c With a large cache of documents on the local disk, it can be very handy
@c when traveling, or any other time the network connection is not active
@c (a laptop with a dial-on-demand PPP connection, etc.). Emacs/W3 can rely
@c solely on its cache, and avoid checking to see if the page has changed
@c on the remote server. In the case of a dial-on-demand PPP connection,
@c this will keep the phone line free as long as possible, only bringing up
@c the PPP connection when asking for a page that is not located in the
@c cache. This is very useful for demonstrations as well.

@node Proxies
@section Proxies and Gatewaying

@c fixme: check/document url-ns stuff
@cindex proxy servers
@cindex proxies
@cindex environment variables
@vindex HTTP_PROXY
Proxy servers are commonly used to provide gateways through firewalls
or as caches serving some more-or-less local network. Each protocol
(HTTP, FTP, etc.)@: can have a different gateway server. By
convention, many programs share their proxy configuration through
environment variables of the form @code{@var{protocol}_proxy}, where
@var{protocol} is one of the supported network protocols (@code{http},
@code{ftp}, etc.). The library recognizes such variables in either
upper or lower case. Their values take one of the following forms:
@itemize @bullet
@item @code{@var{host}:@var{port}};
@item a full URL;
@item simply a host name.
@end itemize

@vindex NO_PROXY
The @code{NO_PROXY} environment variable specifies hosts that should
be excluded from proxying, i.e., servers that should be contacted
directly. This should be a comma-separated list of hostnames, domain
names, or a mixture of both. Asterisks can be used as wildcards, but
other clients may not support that. Domain names may be indicated by
a leading dot. For example:
@example
NO_PROXY="*.aventail.com,home.com,.seanet.com"
@end example
@noindent
says to contact all machines in the @samp{aventail.com} and
@samp{seanet.com} domains directly, as well as the machine named
@samp{home.com}. If @code{NO_PROXY} isn't defined, @code{no_PROXY}
and @code{no_proxy} are also tried, in that order.
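
To make the @code{NO_PROXY} format concrete, here is a sketch of how a program might test a host name against such a value, treating @samp{*} as a wildcard, a leading dot as a domain, and anything else as an exact host name. The function @code{my-no-proxy-match-p} is hypothetical and is not how the library itself works: the library turns the variable into a regexp stored under the @code{"no_proxy"} key of @code{url-proxy-services}.

@example
(require 'cl-lib)

(defun my-no-proxy-match-p (host no-proxy)
  "Hypothetical helper: non-nil if HOST should be contacted directly.
NO-PROXY is a comma-separated string in the format of NO_PROXY."
  (cl-some
   (lambda (entry)
     (cond
      ;; "*.example.com": each `*' matches any sequence of characters.
      ((string-match-p "\\*" entry)
       (string-match-p
        (concat "\\`"
                (mapconcat #'regexp-quote (split-string entry "\\*") ".*")
                "\\'")
        host))
      ;; ".example.com": any host inside that domain.
      ((string-prefix-p "." entry)
       (string-suffix-p entry host))
      ;; Anything else must equal the host name exactly.
      (t (string= entry host))))
   (split-string no-proxy "," t "[ \t]+")))
@end example

With this reading, @samp{*.aventail.com} does not match the bare host @samp{aventail.com}; other clients may treat such patterns differently.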

Proxies may also be specified directly in Lisp.

@defopt url-proxy-services
This variable is an alist of URL schemes and the proxy servers that
gateway them. Each item has the form @w{@code{(@var{scheme}
. @var{host}:@var{portnumber})}}, meaning that URLs of type
@var{scheme} are gatewayed through port @var{portnumber} on the
specified @var{host}. An exception is the pseudo scheme
@code{"no_proxy"}, which is paired with a regexp matching host names
not to be proxied. This variable is initialized from the environment
as above.

@example
(setq url-proxy-services
      '(("http"     . "proxy.aventail.com:80")
        ("no_proxy" . "^.*\\(aventail\\|seanet\\)\\.com")))
@end example
@end defopt
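
When proxy settings are computed in Lisp rather than taken from the environment, a small helper can build a value of the form described above. The function @code{my-url-proxy-alist} and the host names in its usage comment are hypothetical; the helper proxies only @code{"http"} and anchors the @code{"no_proxy"} regexp so that it matches whole host names.

@example
(require 'url-vars)

(defun my-url-proxy-alist (host port &optional direct-hosts)
  "Hypothetical helper: build a value for `url-proxy-services'.
HTTP is gatewayed through HOST on PORT; the host names in the
list DIRECT-HOSTS are contacted directly."
  (append
   (list (cons "http" (format "%s:%d" host port)))
   (when direct-hosts
     ;; Anchor the alternation so that only complete host names match.
     (list (cons "no_proxy"
                 (concat "\\`" (regexp-opt direct-hosts) "\\'"))))))

;; Usage, with placeholder host names:
;; (setq url-proxy-services
;;       (my-url-proxy-alist "proxy.example.com" 8080
;;                           '("localhost" "intranet.example.com")))
@end example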

@node Gateways in general
@section Gateways in General
@cindex gateways
@cindex firewalls

The library provides a general gateway layer through which all
networking passes. It can both control access to the network and
provide access through gateways in firewalls. Depending on the
configuration, it makes direct connections in some cases and passes
through some sort of gateway in others.@footnote{Proxies (which only
operate over HTTP) are implemented using this.} The library's basic
function responsible for making connections is @code{url-open-stream}.

@defun url-open-stream name buffer host service
@cindex opening a stream
@cindex stream, opening
Opens a stream to @var{host}, possibly via a gateway. The other
arguments are as for @code{open-network-stream}. No connection is
made if @code{url-gateway-unplugged} is non-@code{nil}.
@end defun
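
A minimal sketch of calling @code{url-open-stream} directly follows. The helper @code{my-url-open-connection} is hypothetical; it assumes that @code{url-open-stream} returns the network process on success and @code{nil} when @code{url-gateway-unplugged} prevents the connection, and it kills its scratch buffer in the latter case.

@example
(require 'url-gw)

(defun my-url-open-connection (host port)
  "Hypothetical helper: open a stream to HOST on PORT via the gateway layer.
Return the process, or nil if no connection was made."
  (let* ((buf (generate-new-buffer (format " *my-conn %s*" host)))
         (proc (url-open-stream (format "my-conn-%s" host) buf host port)))
    ;; Without a process nothing will use the buffer, so discard it.
    (unless proc
      (kill-buffer buf))
    proc))
@end example

Since the connection may be opened asynchronously, a caller that receives a process may need to check @code{process-status} before sending a request.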

@defvar url-gateway-local-host-regexp
This is a regular expression that matches local hosts that do not
require the use of a gateway. If @code{nil}, all connections are made
through the gateway.
@end defvar
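
For example, to send only off-site traffic through the gateway, the variable could be set to a regexp matching @samp{localhost}, unqualified host names, and hosts in a local domain. The domain @samp{example.com} below is a placeholder for your own.

@example
(require 'url-gw)

;; Contact localhost, unqualified names and hosts in the placeholder
;; domain example.com directly; everything else uses the gateway.
(setq url-gateway-local-host-regexp
      "\\`\\(?:localhost\\|[^.]+\\|.*\\.example\\.com\\)\\'")
@end example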

@defvar url-gateway-method
This variable controls which gateway method is used. It may be useful
to bind it temporarily in some applications. Its value is one of the
following symbols:

@table @code
@item telnet
@cindex @command{telnet}
Use this method if you must first telnet and log into a gateway host,
and then run telnet from that host to connect to outside machines.

@item rlogin
@cindex @command{rlogin}
This method is identical to @code{telnet}, but uses @command{rlogin}
to log into the remote machine without having to send the username and
password over the wire every time.

@item socks
@cindex @sc{socks}
Use this method if the firewall has a @sc{socks} gateway running on
it. The @sc{socks} v5 protocol is defined in RFC 1928.

@c @item ssl
@c This probably shouldn't be documented
@c Fixme: why not? -- fx

@item native
This method uses Emacs's builtin networking directly. This is the
default. It can be used only if there is no firewall blocking access.
@end table
@end defvar
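
The suggestion to bind @code{url-gateway-method} temporarily can be sketched as follows. The helper @code{my-url-retrieve-via} is hypothetical, and it assumes that the gateway method is consulted when the connection is opened, which for a synchronous retrieval happens inside the @code{let}. Requiring @code{url-gw} first makes the variable special, so the binding is dynamic even in a file that uses lexical binding.

@example
(require 'url)
(require 'url-gw)

(defun my-url-retrieve-via (method url)
  "Hypothetical helper: retrieve URL synchronously using gateway METHOD.
METHOD is one of the symbols accepted by `url-gateway-method'."
  (let ((url-gateway-method method))
    (url-retrieve-synchronously url)))

;; Usage (requires a working SOCKS server):
;; (my-url-retrieve-via 'socks "http://www.example.com/")
@end example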

The following variables control the gateway methods.

@defopt url-gateway-telnet-host
The gateway host to telnet to. Once logged in there, you then telnet
out to the hosts you want to connect to.
@end defopt
@defopt url-gateway-telnet-parameters
This should be a list of parameters to pass to the @command{telnet} program.
@end defopt
@defopt url-gateway-telnet-password-prompt
This is a regular expression that matches the password prompt when
logging in.
@end defopt
@defopt url-gateway-telnet-login-prompt
This is a regular expression that matches the username prompt when
logging in.
@end defopt
@defopt url-gateway-telnet-user-name
The username to log in with.
@end defopt
@defopt url-gateway-telnet-password
The password to send when logging in.
@end defopt
@defopt url-gateway-prompt-pattern
This is a regular expression that matches the shell prompt.
@end defopt
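
A configuration for the @code{telnet} method might look like the following sketch. The host and user names are placeholders, and the other variables above are left at their default values.

@example
(require 'url-gw)

;; Placeholder values; substitute your own gateway host and account.
(setq url-gateway-method 'telnet
      url-gateway-telnet-host "gateway.example.com"
      url-gateway-telnet-user-name "jdoe")
@end example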

@defopt url-gateway-rlogin-host
Host to @command{rlogin} to before telnetting out.
@end defopt
@defopt url-gateway-rlogin-parameters
Parameters to pass to @command{rsh}.
@end defopt
@defopt url-gateway-rlogin-user-name
User name to use when logging in to the gateway.
@end defopt

@defopt socks-server
This specifies the default @sc{socks} server; it takes the form
@w{@code{("Default server" @var{server} @var{port} @var{version})}}
where @var{version} can be either 4 or 5.
@end defopt
@defvar socks-password
If this is @code{nil}, you are asked for the password; otherwise it
is used as the password for authenticating you to the @sc{socks}
server.
@end defvar
@defvar socks-username
This is the username to use when authenticating yourself to the
@sc{socks} server. By default this is your login name.
@end defvar
@defvar socks-timeout
This controls how long, in seconds, to wait for responses from the
@sc{socks} server; it is 5 by default.
@end defvar
@c fixme: these have been effectively commented-out in the code
@c @defopt socks-server-aliases
@c This is a list of server aliases. It is a list of aliases of the form
@c @var{(alias hostname port version)}.
@c @end defopt
@c @defopt socks-network-aliases
@c This is a list of network aliases. Each entry in the list takes the form
@c @var{(alias (network))} where @var{alias} is a string that names the
@c @var{network}. The networks can contain a pair (not a dotted pair) of
@c @sc{ip} addresses which specify a range of @sc{ip} addresses, an @sc{ip}
@c address and a netmask, a domain name or a unique hostname or @sc{ip}
@c address.
@c @end defopt
@c @defopt socks-redirection-rules
@c This is a list of redirection rules. Each rule takes the form
@c @var{(Destination network Connection type)} where @var{Destination
@c network} is a network alias from @code{socks-network-aliases} and
@c @var{Connection type} can be @code{nil} in which case a direct
@c connection is used, or it can be an alias from
@c @code{socks-server-aliases} in which case that server is used as a
@c proxy.
@c @end defopt
@defopt socks-nslookup-program
@cindex @command{nslookup}
This is the name of the @command{nslookup} program. It is
@code{"nslookup"} by default.
@end defopt
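
To route connections through a @sc{socks} server, set the gateway method and the default server. In this sketch the server name and port are placeholders, and @var{version} is 5.

@example
(require 'url-gw)
(require 'socks)

;; Placeholder server "socks.example.com" on port 1080, speaking SOCKS v5.
(setq url-gateway-method 'socks
      socks-server '("Default server" "socks.example.com" 1080 5))
@end example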

@menu
* Suppressing network connections::
@end menu
@c * Broken hostname resolution::

@node Suppressing network connections
@subsection Suppressing Network Connections

@cindex network connections, suppressing
@cindex suppressing network connections
@cindex bugs, HTML
@cindex HTML ``bugs''
In some circumstances it is desirable to suppress making network
connections. A typical case is when rendering HTML in a mail user
agent, when external URLs should not be activated, particularly to
avoid ``bugs'' which ``call home'' by fetching single-pixel images and
the like. To arrange this, bind the following variable for the
duration of such processing.

@defvar url-gateway-unplugged
If this variable is non-@code{nil}, new network connections are never
opened by the URL library.
@end defvar
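
For instance, a mail reader could wrap its HTML rendering in a binding like the one below. The wrapper @code{my-call-unplugged} is hypothetical; requiring @code{url-gw} first makes @code{url-gateway-unplugged} a special variable, so code called inside the @code{let} sees the new value even when the wrapper is defined in a lexically scoped file.

@example
(require 'url-gw)

(defun my-call-unplugged (fn &rest args)
  "Hypothetical wrapper: call FN with ARGS while network access is off."
  (let ((url-gateway-unplugged t))
    (apply fn args)))

;; Usage, e.g. around an HTML renderer:
;; (my-call-unplugged #'shr-render-region (point-min) (point-max))
@end example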

@c @node Broken hostname resolution
@c @subsection Broken Hostname Resolution

@c @cindex hostname resolver
@c @cindex resolver, hostname
@c Some C libraries do not include the hostname resolver routines in
@c their static libraries. If Emacs was linked statically, and was not
@c linked with the resolver libraries, it will not be able to get to any
@c machines off the local network. This is characterized by being able
@c to reach someplace with a raw ip number, but not its hostname
@c (@url{http://129.79.254.191/} works, but
@c @url{http://www.cs.indiana.edu/} doesn't). This used to happen on
@c SunOS4 and Ultrix, but is probably rare now. If Emacs can't be
@c rebuilt linked against the resolver library, it can use the external
@c @command{nslookup} program instead.

@c @defopt url-gateway-broken-resolution
@c @cindex @code{nslookup} program
@c @cindex program, @code{nslookup}
@c If non-@code{nil}, this variable says to use the program specified by
@c @code{url-gateway-nslookup-program} to do hostname resolution.
@c @end defopt

@c @defopt url-gateway-nslookup-program
@c The name of the program to do hostname lookup if Emacs can't do it
@c directly. This program should expect a single argument on the command
@c line---the hostname to resolve---and should produce output similar to
@c the standard Unix @command{nslookup} program:
@c @example
@c Name: www.cs.indiana.edu
@c Address: 129.79.254.191
@c @end example
@c @end defopt

@node History
@section History

@findex url-do-setup
The library can maintain a global history list tracking URLs accessed.
URL completion can be done from it. The history mechanism is set up
automatically via @code{url-do-setup} when it is configured to be on.
Note that the size of the history list is currently not limited.

@vindex url-history-hash-table
The history ``list'' is actually a hash table,
@code{url-history-hash-table}. It contains access times keyed by URL
strings. The times are in the format returned by @code{current-time}.

@defun url-history-update-url url time
This function updates the history table with an entry for @var{url}
accessed at the given @var{time}.
@end defun
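
Because the history is an ordinary hash table, it can be inspected with the standard hash table functions. The helper @code{my-url-history-recent} below is a hypothetical sketch that returns the most recently accessed URLs, assuming every value is a time in @code{current-time} format as described above.

@example
(require 'cl-lib)
(require 'url-history)

(defun my-url-history-recent (n)
  "Hypothetical helper: return up to N URLs, most recently accessed first."
  (let (entries)
    (maphash (lambda (url time) (push (cons url time) entries))
             url-history-hash-table)
    ;; Newest first: compare the access times in reverse order.
    (setq entries (sort entries (lambda (a b) (time-less-p (cdr b) (cdr a)))))
    (mapcar #'car (cl-subseq entries 0 (min n (length entries))))))

;; Usage:
;; (url-history-update-url "http://www.gnu.org/" (current-time))
;; (my-url-history-recent 1)
@end example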

@defopt url-history-track
If non-@code{nil}, the library will keep track of all the URLs
accessed. If it is @code{t}, the list is saved to disk at the end of
each Emacs session. The default is @code{nil}.
@end defopt

@defopt url-history-file
The file storing the history list between sessions. It defaults to
@file{history} in @code{url-configuration-directory}.
@end defopt

@defopt url-history-save-interval
@findex url-history-setup-save-timer
The number of seconds between automatic saves of the history list.
The default is one hour. Note that if you change this variable
directly, rather than using Custom, after @code{url-do-setup} has been
run, you need to run the function @code{url-history-setup-save-timer}.
@end defopt
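
For example, to keep the history across sessions and save it every ten minutes from Lisp code, set the options and then restart the save timer, as the note on @code{url-history-save-interval} requires after a plain @code{setq}. Setting the interval with @code{customize-set-variable}, which goes through Custom, should make the last call unnecessary.

@example
(require 'url-history)

;; Keep the history and save it to `url-history-file' between sessions.
(setq url-history-track t)
;; Save every ten minutes instead of every hour.
(setq url-history-save-interval 600)
;; After a plain `setq' of the interval, restart the save timer.
(url-history-setup-save-timer)
@end example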

@defun url-history-parse-history &optional fname
Parses the history file @var{fname} (default @code{url-history-file})
and sets up the history list.
@end defun

@defun url-history-save-history &optional fname
Saves the current history to file @var{fname} (default
@code{url-history-file}).
@end defun
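
The two functions can be used together to write the history to another file and read it back. In this sketch, @code{my-url-history-round-trip} is hypothetical, the file is a temporary one created with @code{make-temp-file}, and the history table is bound locally so the user's own history is untouched. It assumes, as described above, that parsing a saved file re-creates its entries in @code{url-history-hash-table}.

@example
(require 'url-history)

(defun my-url-history-round-trip (url)
  "Hypothetical check: save a history containing URL, then reload it.
Return the access time recorded for URL after reloading."
  (let ((file (make-temp-file "url-history-"))
        (url-history-hash-table (make-hash-table :test 'equal)))
    (unwind-protect
        (progn
          (url-history-update-url url (current-time))
          (url-history-save-history file)
          ;; Start from an empty table so only the file's contents count.
          (clrhash url-history-hash-table)
          (url-history-parse-history file)
          (gethash url url-history-hash-table))
      (delete-file file))))
@end example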

@defun url-completion-function string predicate function
You can use this function to complete URLs from the history.
@end defun
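
The argument list of @code{url-completion-function} matches that of a programmed completion function (@pxref{Programmed Completion,,, elisp, The Emacs Lisp Reference Manual}), so it can be passed directly as the collection argument of @code{completing-read}. The helper @code{my-read-url-from-history} is a hypothetical sketch of that use.

@example
(require 'url-history)

(defun my-read-url-from-history (prompt)
  "Hypothetical helper: read a URL with completion from the history."
  ;; `url-completion-function' serves as a programmed completion table.
  (completing-read prompt #'url-completion-function))
@end example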

@node Customization
@chapter Customization

@cindex environment variables
The following environment variables affect the @code{url} library's
operation at startup.

@table @code
@item TMPDIR
@vindex TMPDIR
@vindex url-temporary-directory
If this is defined, @code{url-temporary-directory} is initialized from
it.
@end table

The following user options affect the general operation of the
@code{url} library.

@defopt url-configuration-directory
@cindex configuration files
The value of this variable specifies the name of the directory where
the @code{url} library stores its various configuration files, cache
files, etc.

The default value specifies a subdirectory named @file{url/} in the
standard Emacs user data directory specified by the variable
@code{user-emacs-directory} (normally @file{~/.emacs.d}). However,
the old default was @file{~/.url}, and this directory is used instead
if it exists.
@end defopt

@defopt url-debug
@cindex debugging
Specifies the types of debug messages which are logged to
the @file{*URL-DEBUG*} buffer.
@code{t} means log all messages.
A number means log all messages and show them with @code{message}.
It may also be a list of the types of messages to be logged.
@end defopt
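
For example, to log only messages of one type while investigating a problem, set the variable to a list and then look at the @file{*URL-DEBUG*} buffer. The tag @code{http} below is an assumption about the names the library uses, and @code{my-url-debug-log} is a hypothetical helper for reading the log.

@example
(require 'url-util)

;; Log only messages tagged `http' (an assumed tag name).
(setq url-debug '(http))

(defun my-url-debug-log ()
  "Hypothetical helper: return the text logged so far in *URL-DEBUG*."
  (let ((buf (get-buffer "*URL-DEBUG*")))
    (if buf
        (with-current-buffer buf (buffer-string))
      "")))
@end example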
@defopt url-personal-mail-address
@end defopt
@defopt url-privacy-level
@end defopt
@defopt url-uncompressor-alist
@end defopt
@defopt url-passwd-entry-func
@end defopt
@defopt url-standalone-mode
@end defopt
@defopt url-bad-port-list
@end defopt
@defopt url-max-password-attempts
@end defopt
@defopt url-temporary-directory
@end defopt
@defopt url-show-status
@end defopt
@defopt url-confirmation-func
The function to use for asking yes-or-no questions. This is normally
either @code{y-or-n-p} or @code{yes-or-no-p}, but could be another
function taking a single argument (the prompt) and returning @code{t}
only if an affirmative answer is given.
@end defopt
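
In non-interactive use, such as a batch job, a function that never asks may be preferable. The function @code{my-url-decline} below is a hypothetical example of the required shape: it takes the prompt as its single argument and returns @code{nil}, so every question is answered negatively.

@example
(require 'url-vars)

(defun my-url-decline (prompt)
  "Hypothetical confirmation function: log PROMPT and answer no."
  (message "Declined: %s" prompt)
  nil)

(setq url-confirmation-func #'my-url-decline)
@end example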
@defopt url-gateway-method
@c fixme: describe gatewaying
A symbol specifying the type of gateway support to use for connections
from the local machine. The supported methods are:

@table @code
@item telnet
Run telnet in a subprocess to connect;
@item rlogin
Rlogin to another machine to connect;
@item socks
Connect through a @sc{socks} server;
@item ssl
Connect with SSL;
@item native
Connect directly.
@end table
@end defopt

@node GNU Free Documentation License
@appendix GNU Free Documentation License
@include doclicense.texi

@node Function Index
@unnumbered Command and Function Index
@printindex fn

@node Variable Index
@unnumbered Variable Index
@printindex vr

@node Concept Index
@unnumbered Concept Index
@printindex cp

@bye