Programming in Emacs Lisp: Digression into C

8.4 Digression into C

The copy-region-as-kill function (see copy-region-as-kill) uses the filter-buffer-substring function, which in turn uses the delete-and-extract-region function. That last function removes the contents of a region, and you cannot get them back.

Unlike the other code discussed here, the delete-and-extract-region function is not written in Emacs Lisp; it is written in C and is one of the primitives of the GNU Emacs system. Since it is very simple, I will digress briefly from Lisp and describe it here.

Like many of the other Emacs primitives, delete-and-extract-region is written as an instance of a C macro, a macro being a template for code. The complete macro looks like this:

DEFUN ("delete-and-extract-region", Fdelete_and_extract_region,
       Sdelete_and_extract_region, 2, 2, 0,
       doc: /* Delete the text between START and END and return it.  */)
       (Lisp_Object start, Lisp_Object end)
{
  validate_region (&start, &end);
  if (XINT (start) == XINT (end))
    return empty_unibyte_string;
  return del_range_1 (XINT (start), XINT (end), 1, 1);
}

Without going into the details of the macro writing process, let me point out that this macro starts with the word DEFUN. The word DEFUN was chosen since the code serves the same purpose as defun does in Lisp. (The DEFUN C macro is defined in emacs/src/lisp.h.)

The word DEFUN is followed by seven parts inside of parentheses:

The first part is the name given to the function in Lisp, delete-and-extract-region.
The second part is the name of the function in C, Fdelete_and_extract_region. By convention, it starts with ‘F’. Since C does not use hyphens in names, underscores are used instead.
The third part is the name for the C constant structure that records information on this function for internal use. It is the name of the function in C but begins with an ‘S’ instead of an ‘F’.
The fourth and fifth parts specify the minimum and maximum number of arguments the function can have. This function demands exactly 2 arguments.
The sixth part is nearly like the argument that follows the interactive declaration in a function written in Lisp: a letter followed, perhaps, by a prompt. The only difference from Lisp is when the macro is called with no arguments. Then you write a 0 (which is a ‘null string’), as in this macro.
If you were to specify arguments, you would place them between quotation marks. The C macro for goto-char includes "NGoto char: " in this position to indicate that the function expects a numeric prefix argument or, if none is given, a number read from the minibuffer (in this case, a numerical location in a buffer), and provides a prompt.
The seventh part is a documentation string, just like the one for a function written in Emacs Lisp. This is written as a C comment. (When you build Emacs, the program lib-src/make-docfile extracts these comments and uses them to make the “real” documentation.)
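
To make the seven parts more concrete, here is a minimal sketch of how a macro of this general shape could work. It is not the DEFUN from emacs/src/lisp.h: TOY_DEFUN, struct toy_subr, toy_arity_ok, and the function toy-region-length are all hypothetical names invented for illustration, and plain C integers stand in for Lisp objects.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the record named by the third part.  */
struct toy_subr
{
  const char *lisp_name;    /* part 1: the name seen from Lisp */
  int min_args;             /* part 4: minimum number of arguments */
  int max_args;             /* part 5: maximum number of arguments */
  const char *interactive;  /* part 6: 0 means "no interactive spec" */
};

/* A simplified imitation of DEFUN (an assumption, not the real macro).
   It declares the C function, defines the record, and then begins the
   function definition, so that the parameter list and the body can
   follow the macro call.  The seventh argument is accepted and ignored,
   since the documentation is only a C comment.  */
#define TOY_DEFUN(lname, fnname, sname, minargs, maxargs, intspec, doc) \
  int fnname (int, int);                                                \
  const struct toy_subr sname = { lname, minargs, maxargs, intspec };   \
  int fnname

TOY_DEFUN ("toy-region-length", Ftoy_region_length,
           Stoy_region_length, 2, 2, 0,
           doc: /* Return the number of characters between START and END.  */)
  (int start, int end)
{
  return end - start;
}

/* Check an argument count against the limits stored in the record.  */
int
toy_arity_ok (const struct toy_subr *subr, int nargs)
{
  assert (subr != NULL);
  return subr->min_args <= nargs && nargs <= subr->max_args;
}
```

The point of the sketch is only that one macro call can produce both an ordinary C function (the ‘F’ name) and a small constant record describing it (the ‘S’ name); the real macro does considerably more.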

In a C macro, the formal parameters come next, with a statement of what kind of object they are, followed by what might be called the ‘body’ of the macro. For delete-and-extract-region the ‘body’ consists of the following four lines:

validate_region (&start, &end);
if (XINT (start) == XINT (end))
  return empty_unibyte_string;
return del_range_1 (XINT (start), XINT (end), 1, 1);

The validate_region function checks whether the values passed as the beginning and end of the region are the proper type and are within range. If the beginning and end positions are the same, the function returns an empty string.
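
The call passes &start and &end, that is, pointers, which suggests that the checking function is allowed to modify the two positions. The following sketch shows one way such a check could look. It is an assumption, not the Emacs source: toy_validate_region is a hypothetical name, positions are plain long integers, the buffer limits are passed in explicitly, and the function reports failure by returning false instead of signaling a Lisp error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Put *START and *END in ascending order, then report whether both
   lie between POINT_MIN and POINT_MAX inclusive.  */
bool
toy_validate_region (long *start, long *end, long point_min, long point_max)
{
  assert (start != NULL && end != NULL);

  /* Accept the two positions in either order.  */
  if (*start > *end)
    {
      long tmp = *start;
      *start = *end;
      *end = tmp;
    }

  return point_min <= *start && *end <= point_max;
}
```

Because the positions are reordered through the pointers, code that runs after the check can assume that start is not greater than end.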

The del_range_1 function actually deletes the text. It is a complex function that we will not look into; among other things, it updates the buffer. However, it is worth looking at the first two arguments passed to del_range_1. These are XINT (start) and XINT (end).

As far as the C language is concerned, start and end are two integers that mark the beginning and end of the region to be deleted¹⁰.

In early versions of Emacs, these two numbers were thirty-two bits long, but the code is slowly being generalized to handle other lengths. Three of the available bits are used to specify the type of information; the remaining bits are used as ‘content’.

‘XINT’ is a C macro that extracts the relevant number from the longer collection of bits; the three other bits are discarded.
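
The idea of keeping three type bits and a number in one word can be sketched as follows. This is an illustration under stated assumptions, not the layout Emacs uses: the type bits are placed in the three lowest bits, the word is a 32-bit unsigned integer, only non-negative numbers are handled, and the names toy_object, toy_make_int, toy_type, toy_xint, and the tag value 2 are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_TAG_BITS 3
#define TOY_TAG_MASK ((1u << TOY_TAG_BITS) - 1)  /* binary 111 */
#define TOY_TAG_INT  2u                          /* arbitrary tag for "integer" */

typedef uint32_t toy_object;

/* Pack a small non-negative number together with the integer tag.  */
toy_object
toy_make_int (uint32_t n)
{
  assert (n < (1u << (32 - TOY_TAG_BITS)));  /* must fit in 29 bits */
  return (n << TOY_TAG_BITS) | TOY_TAG_INT;
}

/* Read the three type bits.  */
unsigned
toy_type (toy_object obj)
{
  return obj & TOY_TAG_MASK;
}

/* Discard the three type bits and return the number, as XINT does
   in spirit.  */
uint32_t
toy_xint (toy_object obj)
{
  return obj >> TOY_TAG_BITS;
}
```

With this layout, a 32-bit word leaves 29 bits for the number itself, which is why the ‘content’ is smaller than the full word.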

The command in delete-and-extract-region looks like this:

del_range_1 (XINT (start), XINT (end), 1, 1);

It deletes the region between the beginning position, start, and the ending position, end.
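
What "delete the text and return it" amounts to can be shown on an ordinary C string. The sketch below is only an analogy: toy_delete_and_extract is a hypothetical function that works on a NUL-terminated character array, whereas del_range_1 operates on an Emacs buffer and does a great deal more. As in Emacs, positions are counted from 1 and the end position is not included in the region.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Remove the characters at 1-based positions START up to, but not
   including, END from TEXT, in place.  Return the removed characters
   as a newly allocated string that the caller must free, or NULL if
   memory cannot be allocated.  */
char *
toy_delete_and_extract (char *text, size_t start, size_t end)
{
  size_t len = strlen (text);
  assert (1 <= start && start <= end && end <= len + 1);

  size_t n = end - start;
  char *removed = malloc (n + 1);
  if (removed == NULL)
    return NULL;

  /* Save a copy of the region before it disappears.  */
  memcpy (removed, text + start - 1, n);
  removed[n] = '\0';

  /* Close the gap, moving the rest of the text and its final NUL.  */
  memmove (text + start - 1, text + end - 1, len - (end - 1) + 1);
  return removed;
}
```

When start and end are equal, the function returns an empty string and leaves the text unchanged, which mirrors the early return in delete-and-extract-region.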

From the point of view of the person writing Lisp, Emacs is all very simple; but hidden underneath is a great deal of complexity to make it all work.

Footnotes

(10)

More precisely, and requiring more expert knowledge to understand, the two integers are of type ‘Lisp_Object’, which can also be a C union instead of an integer type.