@c -*-texinfo-*-
@c This is part of the GNU Emacs Lisp Reference Manual.
-@c Copyright (C) 1990-1995, 1998-1999, 2001-2012
-@c Free Software Foundation, Inc.
+@c Copyright (C) 1990-1995, 1998-1999, 2001-2016 Free Software
+@c Foundation, Inc.
@c See the file elisp.texi for copying conditions.
@node Syntax Tables
@chapter Syntax Tables
Usually, this designator character is one that is often assigned that
class; however, its meaning as a designator is unvarying and
independent of what syntax that character currently has. Thus,
-@samp{\} as a designator character always means ``escape character''
+@samp{\} as a designator character always stands for escape character
syntax, regardless of whether the @samp{\} character actually has that
syntax in the current syntax table.
@ifnottex
The first character in a syntax descriptor must be a syntax class
designator character. The second character, if present, specifies a
-matching character (e.g.@: in Lisp, the matching character for
+matching character (e.g., in Lisp, the matching character for
@samp{(} is @samp{)}); a space specifies that there is no matching
character. Then come characters specifying additional syntax
properties (@pxref{Syntax Flags}).
punctuation, matching character slot unused, first character of a
comment-starter, second character of a comment-ender).
+ Emacs also defines @dfn{raw syntax descriptors}, which are used to
+describe syntax classes at a lower level. @xref{Syntax Table
+Internals}.
+
@menu
* Syntax Class Table:: Table of syntax classes.
* Syntax Flags:: Additional flags each character can have.
@node Syntax Class Table
@subsection Table of Syntax Classes
+@cindex syntax class table
Here is a table of syntax classes, the characters that designate
them, their meanings, and examples of their use.
The Lisp modes have two string quote characters: double-quote (@samp{"})
and vertical bar (@samp{|}). @samp{|} is not used in Emacs Lisp, but it
is used in Common Lisp. C also has two string quote characters:
-double-quote for strings, and single-quote (@samp{'}) for character
+double-quote for strings, and apostrophe (@samp{'}) for character
constants.
Human text has no string quote characters. We do not want quotation
comment delimiter, @samp{n} on either character makes it
nestable.
+@cindex comment style
Emacs supports several comment styles simultaneously in any one syntax
table. A comment style is a set of flags @samp{b}, @samp{c}, and
@samp{n}, so there can be up to 8 different comment styles.
@end table
@item
-@samp{p} identifies an additional ``prefix character'' for Lisp syntax.
+@samp{p} identifies an additional prefix character for Lisp syntax.
These characters are treated as whitespace when they appear between
expressions. When they appear within an expression, they are handled
according to their usual syntax classes.
otherwise, the parent is the standard syntax table.
In the new syntax table, all characters are initially given the
-``inherit'' (@samp{@@}) syntax class, i.e.@: their syntax is inherited
+``inherit'' (@samp{@@}) syntax class, i.e., their syntax is inherited
from the parent table (@pxref{Syntax Class Table}).
@end defun
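
As a minimal sketch of this inheritance (assuming the parent is the
standard syntax table and that it still gives @samp{a} word syntax), a
freshly made table reports the parent's class for a character that has
no entry of its own:

@example
@group
;; No entry has been set in the new table, so the syntax of
;; @samp{a} is looked up in the parent, the standard syntax table.
(with-syntax-table (make-syntax-table)
  (string (char-syntax ?a)))
     @result{} "w"
@end group
@end example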
@end defun
@deffn Command modify-syntax-entry char syntax-descriptor &optional table
+@cindex syntax entry, setting
This function sets the syntax entry for @var{char} according to
@var{syntax-descriptor}. @var{char} must be a character, or a cons
cell of the form @code{(@var{min} . @var{max})}; in the latter case,
The syntax is changed only for @var{table}, which defaults to the
current buffer's syntax table, and not in any other syntax table.
-The argument @var{syntax-descriptor} is a syntax descriptor, i.e.@: a
+The argument @var{syntax-descriptor} is a syntax descriptor, i.e., a
string whose first character is a syntax class designator and whose
second and subsequent characters optionally specify a matching
character and syntax flags. @xref{Syntax Descriptors}. An error is
@end group
@group
-;; Forward slash characters have punctuation syntax. Note that this
-;; @code{char-syntax} call does not reveal that it is also part of
-;; comment-start and -end sequences.
+;; Forward slash characters have punctuation syntax.
+;; Note that this @code{char-syntax} call does not reveal
+;; that it is also part of comment-start and -end sequences.
(string (char-syntax ?/))
@result{} "."
@end group
@group
-;; Open parenthesis characters have open parenthesis syntax. Note
-;; that this @code{char-syntax} call does not reveal that it has a
-;; matching character, @samp{)}.
+;; Open parenthesis characters have open parenthesis syntax.
+;; Note that this @code{char-syntax} call does not reveal that
+;; it has a matching character, @samp{)}.
(string (char-syntax ?\())
@result{} "("
@end group
the current buffer.
@end defun
-@defmac with-syntax-table @var{table} @var{body}@dots{}
+@deffn Command describe-syntax &optional buffer
+This command displays the contents of the syntax table of
+@var{buffer} (by default, the current buffer) in a help buffer.
+@end deffn
+
+@defmac with-syntax-table table body@dots{}
This macro executes @var{body} using @var{table} as the current syntax
table. It returns the value of the last form in @var{body}, after
restoring the old current syntax table.
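
For instance, the following sketch (the choice of @samp{_} and of word
syntax is only illustrative) changes an entry in a temporary table
without touching the buffer's own syntax table:

@example
@group
;; @code{modify-syntax-entry} acts on the current syntax table,
;; which is the temporary one while the body runs.
(with-syntax-table (make-syntax-table)
  (modify-syntax-entry ?_ "w")
  (string (char-syntax ?_)))
     @result{} "w"
@end group
@end example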
underlying text character.
@item @code{(@var{syntax-code} . @var{matching-char})}
-A cons cell of this format specifies the syntax for the underlying
-text character. (@pxref{Syntax Table Internals})
+A cons cell of this format is a raw syntax descriptor (@pxref{Syntax
+Table Internals}), which directly specifies a syntax class for the
+underlying text character.
@item @code{nil}
If the property is @code{nil}, the character's syntax is determined from
@node Motion and Syntax
@section Motion and Syntax
+@cindex moving across syntax classes
+@cindex skipping characters of certain syntax
This section describes functions for moving across characters that
have certain syntax classes.
@node Parsing Expressions
@section Parsing Expressions
+@cindex parsing expressions
+@cindex scanning expressions
This section describes functions for parsing and scanning balanced
expressions. We will refer to such expressions as @dfn{sexps},
following the terminology of Lisp, even though these functions can act
on languages other than Lisp. Basically, a sexp is either a balanced
-parenthetical grouping, a string, or a ``symbol'' (i.e.@: a sequence
+parenthetical grouping, a string, or a symbol (i.e., a sequence
of characters whose syntax is either word constituent or symbol
constituent). However, characters in the expression prefix syntax
class (@pxref{Syntax Class Table}) are treated as part of the sexp if
A character's syntax controls how it changes the state of the
parser, rather than describing the state itself. For example, a
string delimiter character toggles the parser state between
-``in-string'' and ``in-code'', but the syntax of characters does not
+in-string and in-code, but the syntax of characters does not
directly say whether they are inside a string. For example (note that
15 is the syntax code for generic string delimiters),
@node Motion via Parsing
@subsection Motion Commands Based on Parsing
+@cindex motion based on parsing
This section describes simple point-motion functions that operate
based on parsing expressions.
expected, with nothing except whitespace between them, it returns
@code{t}; otherwise it returns @code{nil}.
-This function cannot tell whether the ``comments'' it traverses are
+This function cannot tell whether the comments it traverses are
embedded within a string. If they look like comments, it treats them
as comments.
@node Position Parse
@subsection Finding the Parse State for a Position
+@cindex parse state for a position
For syntactic analysis, such as in indentation, often the useful
thing is to compute the syntactic state corresponding to a given buffer
@var{start}, not scanning past @var{limit}. It stops at position
@var{limit} or when certain criteria described below are met, and sets
point to the location where parsing stops. It returns a parser state
+@ifinfo
+(@pxref{Parser State})
+@end ifinfo
describing the status of the parse at the point where it stops.
@cindex parenthesis depth
@node Control Parsing
@subsection Parameters to Control Parsing
+@cindex parsing, control parameters
@defvar multibyte-syntax-as-symbol
If this variable is non-@code{nil}, @code{scan-sexps} treats all
The behavior of @code{parse-partial-sexp} is also affected by
@code{parse-sexp-lookup-properties} (@pxref{Syntax Properties}).
+@defvar comment-end-can-be-escaped
+If this buffer-local variable is non-@code{nil}, a single character
+which usually terminates a comment doesn't do so when that character
+is escaped. This is used in C and C++ Modes, where line comments
+starting with @samp{//} can be continued onto the next line by
+escaping the newline with @samp{\}.
+@end defvar
+
You can use @code{forward-comment} to move forward or backward over
one comment or several comments.
as syntax properties (@pxref{Syntax Properties}).
@cindex syntax code
- Each entry in a syntax table is a cons cell of the form
-@code{(@var{syntax-code} . @var{matching-char})}. @var{syntax-code}
-is an integer that encodes the syntax class and syntax flags,
-according to the table below. @var{matching-char}, if non-@code{nil},
-specifies a matching character (similar to the second character in a
-syntax descriptor).
+@cindex raw syntax descriptor
+ Each entry in a syntax table is a @dfn{raw syntax descriptor}: a
+cons cell of the form @code{(@var{syntax-code}
+. @var{matching-char})}. @var{syntax-code} is an integer which
+encodes the syntax class and syntax flags, according to the table
+below. @var{matching-char}, if non-@code{nil}, specifies a matching
+character (similar to the second character in a syntax descriptor).
+
+ Here are the syntax codes corresponding to the various syntax
+classes:
@multitable @columnfractions .2 .3 .2 .3
@item
-@i{Syntax code} @tab @i{Class} @tab @i{Syntax code} @tab @i{Class}
+@i{Code} @tab @i{Class} @tab @i{Code} @tab @i{Class}
@item
0 @tab whitespace @tab 8 @tab paired delimiter
@item
@noindent
For example, in the standard syntax table, the entry for @samp{(} is
-@code{(4 . 41)}. (41 is the character code for @samp{)}.)
+@code{(4 . 41)}. 41 is the character code for @samp{)}.
Syntax flags are encoded in higher-order bits, starting 16 bits from
the least significant bit. This table gives the power of two which
@samp{4} @tab @code{(lsh 1 19)}
@end multitable
-@defun string-to-syntax @var{desc}
-Given a syntax descriptor @var{desc}, this function returns the
-corresponding internal form, a cons cell @code{(@var{syntax-code}
-. @var{matching-char})}.
+@defun string-to-syntax desc
+Given a syntax descriptor @var{desc} (a string), this function returns
+the corresponding raw syntax descriptor.
@end defun
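
As an illustration (a sketch that relies only on the class codes and
flag bits tabulated above), the descriptor for @samp{(} and a
punctuation descriptor carrying the @samp{1} and @samp{4} flags convert
as follows:

@example
@group
(string-to-syntax "()")
     @result{} (4 . 41)
@end group

@group
;; Punctuation is code 1; flags 1 and 4 set bits 16 and 19.
;; The space means there is no matching character.
(equal (string-to-syntax ". 14")
       (list (logior 1 (lsh 1 16) (lsh 1 19))))
     @result{} t
@end group
@end example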
@defun syntax-after pos
-This function returns the syntax code of the character in the buffer
-after position @var{pos}, taking account of syntax properties as well
-as the syntax table. If @var{pos} is outside the buffer's accessible
-portion (@pxref{Narrowing, accessible portion}), this function returns
-@code{nil}.
+This function returns the raw syntax descriptor for the character in
+the buffer after position @var{pos}, taking account of syntax
+properties as well as the syntax table. If @var{pos} is outside the
+buffer's accessible portion (@pxref{Narrowing, accessible portion}),
+the return value is @code{nil}.
@end defun
@defun syntax-class syntax
-This function returns the syntax class of the syntax code
-@var{syntax}. (It masks off the high 16 bits that hold the flags
-encoded in the syntax descriptor.) If @var{syntax} is @code{nil}, it
-returns @code{nil}; this is so evaluating the expression
+This function returns the syntax code for the raw syntax descriptor
+@var{syntax}. More precisely, it takes the raw syntax descriptor's
+@var{syntax-code} component, masks off the high 16 bits which record
+the syntax flags, and returns the resulting integer.
+
+If @var{syntax} is @code{nil}, the return value is @code{nil}.
+This is so that the expression
@example
(syntax-class (syntax-after pos))
@end example
@noindent
-where @code{pos} is outside the buffer's accessible portion, will
-yield @code{nil} without throwing errors or producing wrong syntax
-class codes.
+evaluates to @code{nil} if @code{pos} is outside the buffer's
+accessible portion, without throwing errors or returning an incorrect
+code.
@end defun
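
The following sketch combines @code{syntax-after} and
@code{syntax-class} in a temporary buffer. It assumes that the
standard syntax table is unmodified and that
@code{parse-sexp-lookup-properties} is @code{nil}:

@example
@group
(with-temp-buffer
  (insert "(")
  (with-syntax-table (standard-syntax-table)
    (list (syntax-after 1)                   ; raw syntax descriptor
          (syntax-class (syntax-after 1))    ; class code only
          (syntax-class (syntax-after 5))))) ; outside the buffer
     @result{} ((4 . 41) 4 nil)
@end group
@end example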
@node Categories
the range @w{@samp{ }} to @samp{~}. You specify the name of a category
when you define it with @code{define-category}.
+@cindex category set
The category table is actually a char-table (@pxref{Char-Tables}).
The element of the category table at index @var{c} is a @dfn{category
set}---a bool-vector---that indicates which categories character @var{c}
Here's an example of defining a new category for characters that have
strong right-to-left directionality (@pxref{Bidirectional Display})
-and using it in a special category table:
+and using it in a special category table. To obtain the information
+about the directionality of characters, the example code uses the
+@samp{bidi-class} Unicode property (@pxref{Character Properties,
+bidi-class}).
@example
(defvar special-category-table-for-bidi
+ ;; Make an empty category-table.
(let ((category-table (make-category-table))
- (uniprop-table (unicode-property-table-internal 'bidi-class)))
+ ;; Create a char-table which gives the 'bidi-class' Unicode
+ ;; property for each character.
+ (uniprop-table (unicode-property-table-internal 'bidi-class)))
(define-category ?R "Characters of bidi-class R, AL, or RLO"
category-table)
+ ;; Modify the category entry of each character whose 'bidi-class'
+ ;; Unicode property is R, AL, or RLO -- these have a
+ ;; right-to-left directionality.
(map-char-table
#'(lambda (key val)
- (if (memq val '(R AL RLO))
- (modify-category-entry key ?R category-table)))
+ (if (memq val '(R AL RLO))
+ (modify-category-entry key ?R category-table)))
uniprop-table)
category-table))
@end example
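
As a hedged usage sketch, the table defined above could be installed
in a buffer and queried as follows. The character U+05D0 (HEBREW
LETTER ALEF) is chosen only as an illustration of a character whose
@samp{bidi-class} is @samp{R}; the returned string lists the mnemonics
of the categories that character belongs to in this table.

@example
(with-temp-buffer
  ;; Make the special table the category table of this buffer.
  (set-category-table special-category-table-for-bidi)
  ;; Return the mnemonics of the categories of U+05D0.
  (category-set-mnemonics (char-category-set ?\x5d0)))
@end example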