cortav: Diff

Differences From Artifact [4ed3bc7476]:

File cortav.ct — part of check-in [9215a9c850] at 2021-12-27 13:20:36 on branch trunk — add beginnings of groff renderer, document more planned syntaxes (user: lexi, size: 85225)

To Artifact [7b2c420807]:

File cortav.ct — part of check-in [7ba2577283] at 2021-12-29 12:19:20 on branch trunk — continue iterating on groff renderer; add headings, basic formatting, beginnings of a footnote and link system, colors (user: lexi, size: 89410)

* [*paragraphs] ([`.] [` ¶] [`❡]): a paragraph is a simple block of text. the period control sequence is only necessary if the paragraph text starts with text that would be interpreted as a control sequence otherwise
* newlines [` \\]: inserts a line break into previous paragraph and attaches the following text. mostly useful for poetry or lyrics
* [*section starts] [`#] [`§]: starts a new section. all sections have an associated depth, determined by the number of sequence repetitions (e.g. "###" indicates depth three). sections may have headers and IDs; both are optional. IDs, if present, are a sequence of raw-text immediately following the hash marks. if the line has one or more space characters followed by styled-text, a header will be attached. the character immediately following the hashes can specify a particular type of section. e.g.:
** [`#] is a simple section break.
** [`#anchor] opens a new section with the ID [`anchor].
** [`# header] opens a new section with the title "header".
** [`#anchor header] opens a new section with both the ID [`anchor] and the title "header".
** [`#>conversation] opens a blockquote section named [`conversation] without a header.
** [`#&id mime] opens a new inline object [`id] of type [`mime]. useful for embedding SVGs. the ID and mime type must be specified.
* [*nonprinting sections] ([`^]): sometimes, you'll want to create a namespace without actually adding a visible new section to the document. you can achieve this by creating a [!nonprinting section] and defining resources within it. nonprinting sections can also be used to store comments, notes, or other information that is useful to have in the source file without it becoming a part of the output
* [*resource] ([`@]): defines a [!resource]. a resource is a file or object that exists outside of the document but which will be included in the document somehow. common examples of resources include images, videos, iframes, or headers/footers. see [>rsrc resources] for more information.
* [*lists] ([`*] [`:]): these are like paragraph nodes, but list nodes that occur next to each other will be arranged so as to show they compose a sequence. depth is determined by the number of stars/colons. like headers, a list entry may have an ID that can be used to refer back to it; it is indicated in the same way. if colons are used, this indicates that the order of the items is significant. :-lists and *-lists may be intermixed; however, note that only the last character in the sequence actually controls the depth type.
* [*directives] ([`%]): a directive issues a hint to the renderer in the form of an arbitrary string. directives are normally ignored if they are not supported, but you may cause a warning to be emitted where the directive is not supported with [`%!] or mark a directive critical with [`%!!] so that rendering will entirely fail if it cannot be parsed.
* [*comments] ([`%%]): a comment is a line of text that is simply ignored by the renderer.
* [*asides] ([`!]): indicates text that diverges from the narrative, and can be skipped without interrupting it. think of it like block-level parentheses. asides which follow one another are merged as paragraphs of the same aside, usually represented as a sort of box. if the first line of an aside contains a colon, the stretch of styled-text from the beginning of the aside to the colon will be treated as a "type heading," e.g. "Warning:"
* [*code] ([`~~~]): a line beginning with ~~~ begins or terminates a block of code. code blocks are by default not parsed, but parsing can be activated by preceding the code block with an [`%[*expand]] directive. the opening line should look like one of the below
................................................................................
* strikeout {obj ~|styled-text}: indicates that its text should be struck through or otherwise indicated for deletion
* insertion {obj +|styled-text}: indicates that its text should be indicated as a new addition to the text body.
** consider using a macro definition [`\edit: [~[#1]][+[#2]]] to save typing if you are doing editing work
* link \[>[!ref] [!styled-text]\]: produces a hyperlink or cross-reference denoted by [$ref], which may be either a URL specified with a reference or the name of an object like an image or section elsewhere in the document. the unicode characters [`→] and [`🔗] can also be used instead of [`>] to denote a link.
* footnote {span ^|ref|[$styled-text]}: annotates the text with a defined footnote. in interactive output media [`\[^citations.qtheo Quantum Theosophy: A Neophyte's Catechism]] will insert a link with the text [`Quantum Theosophy: A Neophyte's Catechism] that, when clicked, causes a footnote to pop up on the screen. for static output media, the text will simply have a superscript integer after it denoting where the footnote is to be found.
* superscript {obj '|[$styled-text]}
* subscript {obj ,|[$styled-text]}
* raw {obj \\ |[$raw-text]}: causes all characters within to be interpreted literally, without expansion. the only special characters are square brackets, which must have a matching closing bracket
* raw literal \[$\\[!raw-text]\]: shorthand for [\[$[\…]]]
* macro [`\{[!name] [!arguments]\}]: invokes a [>ex.mac macro], specified with a reference
* argument {obj #|var}: in macros only, inserts the [$var]-th argument. otherwise, inserts a context variable provided by the renderer.
* raw argument {obj ##|var}: like above, but does not evaluate [$var].
* term {obj &|name}, {span &|name|[$expansion]}: quotes a defined term with a link to its definition, optionally with a custom expansion of the term (for instance, to expand the first use of an acronym)
* inline image {obj &@|name}: shows a small image or other object inline. the unicode character [`🖼] can also be used instead of [`&@].
* unicode codepoint {obj U+|hex-integer}: inserts an arbitrary UCS codepoint in the output, specified by [$hex-integer]. lowercase [`u] is also legal.
................................................................................
<p>here is the resource in span context: <span class="res-smiley"></span></p>
<p>and here it is in block context:</p>
<div class="res-smiley"></div>
~~~

note that empty elements with CSS classes are used in the output, to avoid repeating long image definitions (especially base64 inline encoded ones!)

### supported parameters
* [`src] (all): specifies where to find the file, what it is, and how to embed it. each line of [`src] should consist of three whitespace-separated words: embed method, MIME type, and URI.
** embed methods
*** [`local]: loads the resource at build time and embeds it into the output file. not all implementations may allow loading remote network resources at build time.
*** [`remote]: only embeds a reference to the location of the resource. use this for e.g. live iframes, dynamic images, or images hosted by a CDN.
*** [`auto]: embeds the resource in file formats where that's practical, and uses a remote reference otherwise.
** MIME types: which file types are supported depends on the individual implementation and renderer backend; additionally, extensions can add support for extra types. MIME-types that have no available handler will, where possible, result in an attachment that can be extracted by the user, usually by clicking on a link. however, the following should be usable with all compliant implementations
................................................................................
*** [`video/*] (interactive outputs only)
*** [`image/svg+xml] is handled specially for HTML files, and may or may not be compatible with other renderer backends.
*** [`font/*] can be used with the HTML backend to reference a web font
*** [`font/woff2] can be used with the HTML backend to reference a web font
*** [`text/plain] (will be inserted as a preformatted text block)
*** [`text/css] (can be used when producing HTML files to link in an extra stylesheet, either by embedding it or referencing it from the header)
*** [`text/x-cortav] (will be parsed and inserted as a formatted text block; context variables can be passed to the file with [`ctx.[$var]] parameters)
*** any MIME-type that matches the type of file being generated by the renderer can be used to include a block of data that will be passed directly to the renderer.
** URI types: additional URI types can be added by extensions or different implementations, but every compliant implementation must support these URIs.
*** [`http], [`https]: accesses resources over HTTP. add a [`file] fallback if possible for the benefit of renderers/viewers that do not have internet access abilities.
*** [`file]: references local files. absolute paths should begin [`file:/]; the slash should be omitted for relative paths. note that this doesn't have quite the same meaning as in HTML -- [`file] can (and usually should) be used with HTML outputs to refer to resources that reside on the same server. a cortav URI of [`file:/etc/passwd] will actually result in the link [`/etc/passwd], not [`file:///etc/passwd] when converted to HTML. generally, you should only use [`http] when you're referring to a resource that exists on a different domain.
*** [`name]: a special URI used generally for referencing resources that are already installed on a target system and do not need to be embedded or linked, the name and type are enough for a renderer on another machine to locate the correct resource. this is useful mostly for [>fonts fonts], where it's more typical to refer to fonts that are installed on your system rather than providing paths to font files.
*** [`gemini]: accesses resources over the gemini protocol. currently you should really only use this for [`local] resources unless you're using the gemtext renderer backend, since nothing but gemini browsers are liable to support this protocol.
* [`desc]: supplies a narrative description of the resources, for use as an "alt-text" when the image cannot be loaded and for screenreaders.
* [`detail]: supplies extra narrative commentary that is displayed contextually, e.g. when the user hovers her mouse cursor over the embedded object.

note that in certain cases, full MIME types do not need to be used. say you're defining a font with the [`name] URI -- you can't necessarily know what file type the system fonts on another computer are going to be. in this case, you can just write [`font] instead of [`font/ttf] or [`font/woff2] or similar. all cortav needs to know in this case is what abstract kind of object you're referencing.


##ctxvar context variables
context variables are provided so that cortav renderers can process templates. certain context variables are provided for by the standard. you can test for the presence of a context variable with the directive [`%[*when] ctx [$var]]. context variables are accessed with the [` \[#[$name]\]] span.

* {def cortav.file} the name of the file currently being rendered
* {def cortav.path} the absolute path of the file currently being rendered
................................................................................
	font-family: "fontdef-sans";
	src: local("Alegreya Sans"),
		local("Open Sans"),
		local("sans-serif");
}
~~~

there are two things that aren't super clear from the CSS, however. notice how we used [`auto] on a couple of those specs? this means it's up to the renderer to decide whether to link or embed the font. for html, a font specified by name can't really be embedded, but for some file formats, it can be. [`auto] lets us produce valid HTML while still taking advantage of font embedding in other formats.

now that we have our font families defined, we can use their identifiers with the [`%[*font]] directive to control the font stack. the first thing we need to do is push a new font context. there's two ways we can do this:
	fnd: [`%[*font] [#1]]
* {fnd dup} will create a copy of the current font context, allowing us to make some changes and then revert later with the {fnd pop} command. this isn't useful in our case, however, because right now the stack is empty; there's nothing to duplicate.
* {fnd new} will create a brand new empty context for us to work with and push it to the stack. this can also be used to temporarily revert to the system default fonts, and then switch back with {fnd pop}.
* {fnd set} changes one or more entries in the current font context. it can take a space-separated list of arguments in the form [`[$entry]=[$font-id]]. the supported entries are:
** [`body]: the fallback font. if only this is set in a given font context, it will be used for everything
................................................................................
~~~cortav
%% let's pretend we've also defined the fonts 'title', 'cursive', and 'thin'

%font new
%font set body=sans header=serif
%font dup
%font set header=title
# lorem ipsum dolor
%font pop

%% we've now set up a default font context, created a new context for the title of the
%% document, and then popped it back off after the title was inserted so that our
%% first font context is active again. everything after that last '%font pop' will
%% be printed in sans, except for headers, which will be printed in 'serif'

lorem ipsum dolor sit amet, sed consectetur adipiscing elit…

%font dup
%font set body=cursive
> sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
> Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut
%font pop

%% above we created a blockquote whose text is printed in a cursive font; afterwards,
%% we simply remove this new context—

and everything is back the way it was at "lorem ipsum"

%% the font mechanism is at its most powerful when used with multiline macros:

	cursive-quote: %font dup
		%font set body=cursive
		> [#1]
		%font pop

%% now, whenever we want a block with a cursive body, we can simply invoke

&$cursive-quote Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident

%% without affecting the overall font context. in fact, since 'cursive-quote' creates
%% its context using 'dup', it would import all font specifications besides 'body'
%% from the environment it is invoked in
~~~
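
the push/pop behaviour walked through above can be modelled in a few lines. the following python is only a sketch of the semantics, not how any cortav implementation actually does it: the class [`FontStack] and its method names are invented for illustration, and it assumes that a [`dup] on an empty stack simply pushes an empty context, which the text above does not specify.

~~~python
class FontStack:
    """toy model of the %font context stack (all names here are hypothetical)."""

    def __init__(self):
        # each context maps an entry such as 'body' to a font identifier
        self.contexts = []

    def new(self):
        # '%font new': push a brand new empty context
        self.contexts.append({})

    def dup(self):
        # '%font dup': push a copy of the current context
        # (assumption: on an empty stack this pushes an empty context)
        top = self.contexts[-1] if self.contexts else {}
        self.contexts.append(dict(top))

    def set(self, **entries):
        # '%font set entry=font-id ...': change entries in the current context
        self.contexts[-1].update(entries)

    def pop(self):
        # '%font pop': discard the current context and revert to the previous one
        self.contexts.pop()

    def resolve(self, entry):
        # 'body' is the fallback when an entry is not set in the current context
        ctx = self.contexts[-1] if self.contexts else {}
        return ctx.get(entry, ctx.get("body"))


fonts = FontStack()
fonts.new()
fonts.set(body="sans", header="serif")
fonts.dup()
fonts.set(header="title")
title_font = fonts.resolve("header")   # context active while the title is set
fonts.pop()
header_font = fonts.resolve("header")  # default context is active again
~~~

because [`dup] copies the whole context, anything not overridden between the [`dup] and the [`pop] is inherited from the surrounding environment, which is the property the [`cursive-quote] macro relies on.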

you may have noticed the rather odd bit at the end of our font definition, with the [`dit] URI. the reasons for this are tragic. groff, while delightful, has a thoroughly antiquated understanding of fonts, and doesn't support normal font formats like truetype. groff ships with a limited number of fonts in its own format, identified by obscurantist letter code ([`HBI] is "Helvetica Bold Italic", for instance) and lacking normal metadata. for this reason, you'll have to tell cortav how you want your fonts translated.
................................................................................
+ encoding-data-ucs-url | where to download UnicodeData.txt from, if encoding-data-ucs is not changed. defaults to the unicode consortium website

#### deterministic builds
some operating systems, like NixOS, require packages that can be built in reproducible ways. this implies that all data, all [!state] that goes into producing a package needs to be accounted for before the build proper begins. the [`cortav] build process needs to be slightly altered to support such a build process.

while the cortav specification itself does not concern itself with matters like whether a particular character is a numeral or a letter, optimal typesetting in some cases requires such information. this is the case for the equation span- and block-types, which need to be able to distinguish between literals, variables, and mathematical symbols in [^alas-math the equations they format]. the ASCII charset is small enough that exhaustive character class information can be manually hardcoded into a cortav implementation; the various encodings of Unicode most certainly are not.

	alas-math: sadly, i was not at any point consulted by any of the generations of mathematicians stretching back into antiquity who devised their notations without any regard for machine-readability. [!for shame!]

for this reason, the reference implementation of cortav embeds the file [`UnicodeData.txt], a database maintained by the Unicode Consortium. this is a rather large file that updates for each new Unicode version, so it is downloaded as part of the build process. to build on NixOS, you'll need to either disable the features that rely on this database (not recommended), or download the database yourself and tell the build script where to find it. this is the approach the official nix expression will take when i can be bothered to write it. see the examples below for how to conduct a deterministic build

~~~ deterministic build with unicode database [sh] ~~~
/src $ mkdir cortav && cd cortav
/src/cortav $ fossil clone https://c.hale.su/cortav .fossil && fossil open .fossil
/src/cortav $ curl https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt > /tmp/unicode.txt
................................................................................
~~~ [sh] deterministic build [!without] unicode database ~~~
/src $ mkdir cortav && cd cortav
/src/cortav $ fossil clone https://c.hale.su/cortav .fossil && fossil open .fossil
/src/cortav $ make build/cortav encoding-data=
~~~

! while most of the data used is taken directly from UnicodeData.txt, the database generated by [`tools/ucs.lua] splices in some extra character information before generating a database. this is partly because certain characters may not be classified in a useful way and need to be manually overwritten. however, the reference implementation also seeks to provide accurate data for certain character sets that are not part of unicode proper and can be expressed in UTF only through its private use areas.
! currently, only the [>corran Corran] script is supported in this fashion, but i intend to add [>tengwar Tengwar] as well. if there is a con-script or any other informally encoded script you would like supported by the reference implementation, please open an issue.

[*do note] that no cortav implementation needs to concern itself with character class data. this functionality is provided in the reference implementation strictly as an (optional) extension to the spec to improve usability, not as a normative requirement.

	corran: http://ʞ.cc/fic/spirals/society
	tengwar: https://en.wikipedia.org/wiki/Tengwar

###refimpl-switches switches
................................................................................
* [`@[*fg]]: resolves to a color expression denoting the selected foreground color. equivalent to [`[*tone](1)]
* [`@[*bg]]: resolves to a color expression denoting the selected background color. equivalent to [`[*tone](0)]
* [`@[*tone]\[/[$alpha]\]([$fac] \[[$shift] \[[$saturate]\]\] )]: resolves to a color expression. [$fac] is a floating-point value scaling from the background color to the foreground color. [$shift] is a value in degrees controlling how far the hue will shift relative to the accent. [$saturate] is a floating-point value controlling how saturated the color is.

###refimpl-rend-groff groff
the [`groff] backend produces a text file suitable for supplying to a [`groff] compiler. [`groff] is the GNU implementation of a venerable typesetting system from the early days of UNIX

as a convenience, the groff backend supports two modes of operation: it can write a [`groff] file directly to disk, or it can automatically launch a [`groff] process with the appropriate command line options and environment variables. this second mode is recommended unless you're rendering very large files to multiple formats, as [`groff] invocation is nontrivial and it's best to let the renderer handle that for you.

####refimpl-rend-groff-modes modes
[`groff] supports the following modes:

* string [`groff:annotate] controls how footnotes will be handled.
** [`footnote] places footnotes at the end of the page they are referenced on. if the same footnote is used on multiple pages, it will be duplicated on each.
** [`secnote] places footnotes at the end of each section. footnotes used in multiple sections will be duplicated for each
** [`endnote] places all footnotes at the end of the rendered document.
* string [`groff:dev] names an output device (such as [`dvi] or [`pdf]). if this mode is present, [`groff] will be automatically invoked
* string [`groff:title-page] takes an identifier that names a section. this section will be treated as the title page for the document.


### directives
* [`%[*pragma] title-page [$id]] sets the title page to section [$id]. this causes it to be specially formatted, with a large, centered title and subtitle.

### quirks
if the [`toc] extension is active but no [`%[*toc]] directive is provided, the table of contents will be given its own section at the start of the document (after the title page, if any).









* [*paragraphs] ([`.] [` ¶] [`❡]): a paragraph is a simple block of text. the period control sequence is only necessary if the paragraph text starts with text that would be interpreted as a control sequence otherwise
* newlines [` \\]: inserts a line break into previous paragraph and attaches the following text. mostly useful for poetry or lyrics
* [*section starts] [`#] [`§]: starts a new section. all sections have an associated depth, determined by the number of sequence repetitions (e.g. "###" indicates depth three). sections may have headers and IDs; both are optional. IDs, if present, are a sequence of raw-text immediately following the hash marks. if the line has one or more space characters followed by styled-text, a header will be attached. the character immediately following the hashes can specify a particular type of section. e.g.:
** [`#] is a simple section break.
** [`#anchor] opens a new section with the ID [`anchor].
** [`# header] opens a new section with the title "header".
** [`#anchor header] opens a new section with both the ID [`anchor] and the title "header".
* [*nonprinting sections] ([`^]): sometimes, you'll want to create a namespace without actually adding a visible new section to the document. you can achieve this by creating a [!nonprinting section] and defining resources within it. nonprinting sections can also be used to store comments, notes, or other information that is useful to have in the source file without it becoming a part of the output
* [*resource] ([`@]): defines a [!resource]. a resource is a file or object that exists outside of the document but which will be included in the document somehow. common examples of resources include images, videos, iframes, or headers/footers. see [>rsrc resources] for more information.
* [*lists] ([`*] [`:]): these are like paragraph nodes, but list nodes that occur next to each other will be arranged so as to show they compose a sequence. depth is determined by the number of stars/colons. like headers, a list entry may have an ID that can be used to refer back to it; it is indicated in the same way. if colons are used, this indicates that the order of the items is significant. :-lists and *-lists may be intermixed; however, note that only the last character in the sequence actually controls the depth type.
* [*directives] ([`%]): a directive issues a hint to the renderer in the form of an arbitrary string. directives are normally ignored if they are not supported, but you may cause a warning to be emitted where the directive is not supported with [`%!] or mark a directive critical with [`%!!] so that rendering will entirely fail if it cannot be parsed.
* [*comments] ([`%%]): a comment is a line of text that is simply ignored by the renderer.
* [*asides] ([`!]): indicates text that diverges from the narrative, and can be skipped without interrupting it. think of it like block-level parentheses. asides which follow one another are merged as paragraphs of the same aside, usually represented as a sort of box. if the first line of an aside contains a colon, the stretch of styled-text from the beginning of the aside to the colon will be treated as a "type heading," e.g. "Warning:"
* [*code] ([`~~~]): a line beginning with ~~~ begins or terminates a block of code. code blocks are by default not parsed, but parsing can be activated by preceding the code block with an [`%[*expand]] directive. the opening line should look like one of the below
................................................................................
* strikeout {obj ~|styled-text}: indicates that its text should be struck through or otherwise indicated for deletion
* insertion {obj +|styled-text}: indicates that its text should be indicated as a new addition to the text body.
** consider using a macro definition [`\edit: [~[#1]][+[#2]]] to save typing if you are doing editing work
* link \[>[!ref] [!styled-text]\]: produces a hyperlink or cross-reference denoted by [$ref], which may be either a URL specified with a reference or the name of an object like an image or section elsewhere in the document. the unicode characters [`→] and [`🔗] can also be used instead of [`>] to denote a link.
* footnote {span ^|ref|[$styled-text]}: annotates the text with a defined footnote. in interactive output media [`\[^citations.qtheo Quantum Theosophy: A Neophyte's Catechism]] will insert a link with the text [`Quantum Theosophy: A Neophyte's Catechism] that, when clicked, causes a footnote to pop up on the screen. for static output media, the text will simply have a superscript integer after it denoting where the footnote is to be found.
* superscript {obj '|[$styled-text]}
* subscript {obj ,|[$styled-text]}
* raw {obj \\ |[$raw-text]}: causes all characters within to be interpreted literally, without expansion. the only special characters are square brackets, which must have a matching closing bracket, and backslashes.
* raw literal \[$\\[!raw-text]\]: shorthand for [\[$[\…]]]
* macro [`\{[!name] [!arguments]\}]: invokes a [>ex.mac macro], specified with a reference
* argument {obj #|var}: in macros only, inserts the [$var]-th argument. otherwise, inserts a context variable provided by the renderer.
* raw argument {obj ##|var}: like above, but does not evaluate [$var].
* term {obj &|name}, {span &|name|[$expansion]}: quotes a defined term with a link to its definition, optionally with a custom expansion of the term (for instance, to expand the first use of an acronym)
* inline image {obj &@|name}: shows a small image or other object inline. the unicode character [`🖼] can also be used instead of [`&@].
* unicode codepoint {obj U+|hex-integer}: inserts an arbitrary UCS codepoint in the output, specified by [$hex-integer]. lowercase [`u] is also legal.
................................................................................
<p>here is the resource in span context: <span class="res-smiley"></span></p>
<p>and here it is in block context:</p>
<div class="res-smiley"></div>
~~~

note that empty elements with CSS classes are used in the output, to avoid repeating long image definitions (especially base64 inline encoded ones!)

inline resources are defined a bit differently:

~~~cortav
@smiling-man-business-card text/plain {
	THE SMILING MAN  | tel. 0-Ω00-666█
	if you can read this | email: nameless@smiles.gov
	it is already too late | address: right behind you
}
@smiling-man-business-card image/png;base64 {
	%% incomprehensible gibbering redacted
}
~~~

for an inline resource, the identifier is followed by a MIME type and an opening bracket. the opening bracket may be any of the characters [`\{][`\[][`(][`<], and can optionally be followed by additional characters to help disambiguate the closing bracket. the closing bracket is determined by "flipping" the opening bracket, producing bracket pairs like the following:
* [`\{:][`:}]
* [`<!--] [`--!>]
* [`(*<][`>*)]
* [`<>][`<>] [!(disables nesting!)]
if the open and closing brackets are distinguishable, they will nest appropriately, meaning that [`{][`}] alone is very likely to be a safe choice to escape a syntactically correct C program (that doesn't abuse macros too badly). brackets are searched for during parsing; encoded resources are not decoded until a later stage, so a closing bracket character in a base64-encoded text file cannot break out of its escaping.
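
the "flipping" rule can be read as: reverse the opening sequence and mirror each bracket character. here is a minimal python sketch of that reading, checked against the pairs listed above. the function name [`flip_bracket] is made up for this example, and the reference implementation may well derive the closing bracket differently.

~~~python
# mirror table for the four bracket families an inline resource may open with
MIRROR = {
    "{": "}", "}": "{",
    "[": "]", "]": "[",
    "(": ")", ")": "(",
    "<": ">", ">": "<",
}


def flip_bracket(opening: str) -> str:
    """return the closing sequence for an opening sequence (hypothetical helper).

    the characters are taken in reverse order and every bracket is swapped
    for its mirror image; any other character is kept as it is.
    """
    return "".join(MIRROR.get(ch, ch) for ch in reversed(opening))


def can_nest(opening: str) -> bool:
    # nesting only works when the two sequences can be told apart
    return flip_bracket(opening) != opening
~~~

under this reading [`<>] flips to itself, which is why that pair cannot nest.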

as a convenience, if the first line of the resource definition begins with a single tab, one tab will be dropped from every following line in order to allow legible indentation. similarly, if an opening bracket is followed immediately by a newline, this newline is discarded.
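
one possible reading of these two conveniences, as a python sketch. [`normalize_body] is a hypothetical name, and two details are assumptions rather than anything stated above: the tab is also dropped from the first line itself, and lines that do not start with a tab are left untouched.

~~~python
def normalize_body(body: str) -> str:
    """clean up the raw text between the brackets of an inline resource."""
    # a newline immediately after the opening bracket is discarded
    if body.startswith("\n"):
        body = body[1:]
    lines = body.split("\n")
    # if the first line is indented with a tab, drop one leading tab per line
    # (assumption: lines without a leading tab pass through unchanged)
    if lines and lines[0].startswith("\t"):
        lines = [ln[1:] if ln.startswith("\t") else ln for ln in lines]
    return "\n".join(lines)
~~~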

text within a resource definition body is not expanded unless the resource definition is preceded with an [`%[*expand]] directive. if an expand directive is found, the MIME type will be used to try and determine an appropriate type of formatting, potentially invoking a separate renderer. for example, [`text/html] will invoke the [`html] backend, and [`application/x-troff] will invoke the [`groff] backend. if no suitable renderer is available, expansions will generate only plain text.

two suffixes are accepted: [`;base64] and [`;hex]. the former decodes the presented strings using the base64 algorithm to obtain the resource's data; the latter ignores all characters but ASCII hexadecimal digits and derives the resource data byte-by-byte by reading in hexadecimal pairs. for instance, the following sections are equivalent:

~~~
@propaganda text/plain {
	WORLDGOV SAYS
	“don't waste time with unproductive thoughts
	 your wages will be docked accordingly”
}
~~~
~~~
@propaganda text/plain;hex {
	574f 524c 4447 4f56 2053 4159 530a e280 9c64 6f6e 2774 2077 6173
	7465 2074 696d 6520 7769 7468 2075 6e70 726f 6475 6374 6976 6520
	7468 6f75 6768 7473 0a20 796f 7572 2077 6167 6573 2077 696c 6c20
	6265 2064 6f63 6b65 6420 6163 636f 7264 696e 676c 79e2 809d 0a
}
~~~
~~~
@propaganda text/plain;base64 {
	V09STERHT1YgU0FZUwrigJxkb24ndCB3YXN0ZSB0aW1lIHdpdGggdW5wcm9kdWN0aXZlIHRob3Vn
	aHRzCiB5b3VyIHdhZ2VzIHdpbGwgYmUgZG9ja2VkIGFjY29yZGluZ2x54oCdCg==
}
~~~
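
the two decodings are easy to reproduce outside of cortav. the python below is a sketch using only the standard library; the helper names [`decode_hex] and [`decode_base64] are invented here, and a deliberately tiny payload is used in place of the example above.

~~~python
import base64
import string


def decode_hex(body: str) -> bytes:
    # ignore everything but ASCII hexadecimal digits, then read them in pairs
    digits = "".join(ch for ch in body if ch in string.hexdigits)
    return bytes.fromhex(digits)


def decode_base64(body: str) -> bytes:
    # strip indentation and line breaks before handing the text to the decoder
    return base64.b64decode("".join(body.split()))


plain = "hi\n".encode("utf-8")
from_hex = decode_hex("\t6869 0a\n")   # grouping and whitespace do not matter
from_b64 = decode_base64("\taGkK\n")
~~~

both calls yield the same bytes as the plain-text form, which is the sense in which the three definitions are equivalent.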

inline resources can also be (ab)used for multiline macros:
~~~
@def text/x-cortav {
	* [*[#1]] [!([#2])]
	*: [#3]
}
&def nuclear bunker|n|that which will not protect you from the Smiling Man
~~~
to make this usage simpler, resources with a type of [`text/x-cortav] can omit the MIME type field.

### supported parameters
* [`src] (all): specifies where to find the file, what it is, and how to embed it. each line of [`src] should consist of three whitespace-separated words: embed method, MIME type, and URI.
** embed methods
*** [`local]: loads the resource at build time and embeds it into the output file. not all implementations may allow loading remote network resources at build time.
*** [`remote]: only embeds a reference to the location of the resource. use this for e.g. live iframes, dynamic images, or images hosted by a CDN.
*** [`auto]: embeds the resource in file formats where that's practical, and uses a remote reference otherwise.
** MIME types: which file types are supported depends on the individual implementation and renderer backend; additionally, extensions can add support for extra types. MIME-types that have no available handler will, where possible, result in an attachment that can be extracted by the user, usually by clicking on a link. however, the following should be usable with all compliant implementations
................................................................................
*** [`video/*] (interactive outputs only)
*** [`image/svg+xml] is handled specially for HTML files, and may or may not be compatible with other renderer backends.
*** [`font/*] can be used with the HTML backend to reference a web font
*** [`font/woff2] can be used with the HTML backend to reference a web font
*** [`text/plain] (will be inserted as a preformatted text block)
*** [`text/css] (can be used when producing HTML files to link in an extra stylesheet, either by embedding it or referencing it from the header)
*** [`text/x-cortav] (will be parsed and inserted as a formatted text block; context variables can be passed to the file with [`ctx.[$var]] parameters)
*** [`application/x-troff] can be used to supply sections of text written in raw [`groff] syntax. these are ignored by other renderers.
*** [`text/html] can be used to supply sections of text written in raw HTML.
*** any MIME-type that matches the type of file being generated by the renderer can be used to include a block of data that will be passed directly to the renderer.
** URI types: additional URI types can be added by extensions or different implementations, but every compliant implementation must support these URIs.
*** [`http], [`https]: accesses resources over HTTP. add a [`file] fallback if possible for the benefit of renderers/viewers that do not have internet access abilities.
*** [`file]: references local files. absolute paths should begin with [`file:/]; the slash should be omitted for relative paths. note that this doesn't have quite the same meaning as in HTML -- [`file] can (and usually should) be used with HTML outputs to refer to resources that reside on the same server. a cortav URI of [`file:/etc/passwd] will actually result in the link [`/etc/passwd], not [`file:///etc/passwd] when converted to HTML. generally, you should only use [`http] when you're referring to a resource that exists on a different domain.
*** [`name]: a special URI used generally for referencing resources that are already installed on a target system and do not need to be embedded or linked; the name and type are enough for a renderer on another machine to locate the correct resource. this is useful mostly for [>fonts fonts], where it's more typical to refer to fonts that are installed on your system rather than providing paths to font files.
*** [`gemini]: accesses resources over the gemini protocol. currently you should really only use this for [`local] resources unless you're using the gemtext renderer backend, since nothing but gemini browsers are liable to support this protocol.
* [`desc]: supplies a narrative description of the resource, for use as an "alt-text" when the image cannot be loaded and for screenreaders.
* [`detail]: supplies extra narrative commentary that is displayed contextually, e.g. when the user hovers her mouse cursor over the embedded object. also used for [`desc] if [`desc] is not supplied.

note that in certain cases, full MIME types do not need to be used. say you're defining a font with the [`name] URI -- you can't necessarily know what file type the system fonts on another computer are going to be. in this case, you can just write [`font] instead of [`font/ttf] or [`font/woff2] or similar. all cortav needs to know in this case is what abstract kind of object you're referencing. [`groff] fonts (referenced with the [`dit] URI) don't have a specific MIME type either.
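
to make the shape of a [`src] line concrete, here is a small python sketch that splits one into its three words and shows the [`file] to HTML link rule described above. the names [`Source], [`parse_src_line] and [`html_link] are invented for this example, the sample paths are hypothetical, and treating everything after the second word as the URI is an assumption.

```python
from typing import NamedTuple

EMBED_METHODS = {"local", "remote", "auto"}


class Source(NamedTuple):
    method: str  # local, remote, or auto
    mime: str    # full MIME type, or an abstract kind such as 'font'
    uri: str     # e.g. file:..., https:..., name:...


def parse_src_line(line: str) -> Source:
    """Split one line of a src parameter into method, MIME type, and URI."""
    words = line.split(None, 2)
    if len(words) != 3:
        raise ValueError(f"expected three words, got {len(words)}: {line!r}")
    method, mime, uri = words
    if method not in EMBED_METHODS:
        raise ValueError(f"unknown embed method: {method!r}")
    return Source(method, mime, uri.strip())


def html_link(uri: str) -> str:
    """Turn a cortav file: URI into the link an HTML output would carry.

    file:/etc/passwd becomes /etc/passwd, and a relative file:img/a.png
    becomes img/a.png; other URI types are passed through unchanged.
    """
    scheme, sep, rest = uri.partition(":")
    if sep and scheme == "file":
        return rest
    return uri


if __name__ == "__main__":
    src = parse_src_line("local image/png file:img/logo.png")
    print(src.method, src.mime, html_link(src.uri))
```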


##ctxvar context variables
context variables are provided so that cortav renderers can process templates. certain context variables are provided for by the standard. you can test for the presence of a context variable with the directive [`%[*when] ctx [$var]]. context variables are accessed with the [` \[#[$name]\]] span.

* {def cortav.file} the name of the file currently being rendered
* {def cortav.path} the absolute path of the file currently being rendered
................................................................................
	font-family: "fontdef-sans";
	src: local("Alegreya Sans"),
		local("Open Sans"),
		local("sans-serif");
}
~~~

there are two things that aren't super clear from the CSS, however. notice how we used [`auto] on a couple of those specs? this means it's up to the renderer to decide whether to link or embed the font. in HTML, a font specified by name can't really be embedded, but for some file formats, it can be. [`auto] lets us produce valid HTML while still taking advantage of font embedding in other formats.

now that we have our font families defined, we can use their identifiers with the [`%[*font]] directive to control the font stack. the first thing we need to do is push a new font context. there are two ways we can do this:
	fnd: [`%[*font] [#1]]
* {fnd dup} will create a copy of the current font context, allowing us to make some changes and then revert later with the {fnd pop} command. this isn't useful in our case, however, because right now the stack is empty; there's nothing to duplicate.
* {fnd new} will create a brand new empty context for us to work with and push it to the stack. this can also be used to temporarily revert to the system default fonts, and then switch back with {fnd pop}.
* {fnd set} changes one or more entries in the current font context. it can take a space-separated list of arguments in the form [`[$entry]=[$font-id]]. the supported entries are:
** [`body]: the fallback font. if only this is set in a given font context, it will be used for everything
................................................................................
~~~cortav
%% let's pretend we've also defined the fonts 'title', 'cursive', and 'thin'

%font new
%font set body=sans header=serif
%font dup
%font set header=title
# WorldGov  announcement
%font pop

%% we've now set up a default font context, created a new context for the title of the
%% document, and then popped it back off after the title was inserted so that our
%% first font context is active again. everything after that last '%font pop' will
%% be printed in sans, except for headers, which will be printed in 'serif'

WorldGov would like to congratulate 2274's Employee of the Year, [*The Smiling Man]! The Smiling Man had a few words of encouragement for the weary proles of the world when he graciously accepted his award at this year's ceremonial bloodletting:

%font dup
%font set body=cursive
> It is very important for you to understand that your dreams are the intellectual property of the WorldGov organization.
> Laborers who fail more than one duplicity check per workcycle will receive extra Pit Time.
%font pop

%% above we created a blockquote whose text is printed in a cursive font; afterwards,
%% we simply remove this new context, and everything is back the way it was at "WorldGov would like"

In addition to his 227th consecutive Employee of the Year Award, The Smiling Man has been nominated for a WorldGov Lifetime Achievement Award by the Hyperion Entity in recognition of his exceptional leadership in the Department Which Has No Name. Chief Ritual Officer Mr. Winthrop had this to say:

%% the font mechanism is at its most powerful when used with multiline macros:

	cursive-quote: %font dup
		%font set body=cursive
		> [#1]
		%font pop

%% now, whenever we want a block with a cursive body, we can simply invoke

&$cursive-quote A sea of blood yet lies between us and the Destination. It won't impede me. And I'm so very proud to say that, apparently, it won't impede the Smiling Man either, if the Svalbard contract was any indication! [pause for laughter]

%% without affecting the overall font context. in fact, since 'cursive-quote' creates
%% its context using 'dup', it would import all font specifications besides 'body'
%% from the environment it is invoked in
~~~
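
the stack behaviour walked through in the example can be modelled in a few lines of python. this is a sketch of the semantics as described in this section, not of the reference implementation: the [`FontStack] class is hypothetical, and letting [`dup] on an empty stack push an empty context is an assumption.

```python
class FontStack:
    """Toy model of the %font directive: a stack of entry -> font-id maps."""

    def __init__(self):
        self.contexts = []

    def new(self):
        # push a brand new, empty context (system default fonts)
        self.contexts.append({})

    def dup(self):
        # push a copy of the current context so changes can be reverted
        top = self.contexts[-1] if self.contexts else {}
        self.contexts.append(dict(top))

    def set(self, **entries):
        # change one or more entries in the current context
        if not self.contexts:
            raise RuntimeError("no font context to modify")
        self.contexts[-1].update(entries)

    def pop(self):
        # discard the current context, restoring the previous one
        self.contexts.pop()

    def resolve(self, entry):
        # 'body' is the fallback for any entry that is not set explicitly
        if not self.contexts:
            return None
        top = self.contexts[-1]
        return top.get(entry, top.get("body"))


if __name__ == "__main__":
    fonts = FontStack()
    fonts.new()
    fonts.set(body="sans", header="serif")
    fonts.dup()
    fonts.set(body="cursive")
    print(fonts.resolve("body"), fonts.resolve("header"))
    fonts.pop()
    print(fonts.resolve("body"), fonts.resolve("header"))
```

note how the duplicated context keeps [`header=serif] while overriding [`body]; that is the property the [`cursive-quote] macro relies on.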

you may have noticed the rather odd bit at the end of our font definition, with the [`dit] URI. the reasons for this are tragic. groff, while delightful, has a thoroughly antiquated understanding of fonts, and doesn't support normal font formats like truetype. groff ships with a limited number of fonts in its own format, identified by obscurantist letter codes ([`HBI] is "Helvetica Bold Italic", for instance) and lacking normal metadata. for this reason, you'll have to tell cortav how you want your fonts translated.
................................................................................
+ encoding-data-ucs-url | where to download UnicodeData.txt from, if encoding-data-ucs is not changed. defaults to the unicode consortium website

#### deterministic builds
some operating systems, like NixOS, require packages that can be built in reproducible ways. this implies that all data, all [!state] that goes into producing a package needs to be accounted for before the build proper begins. the [`cortav] build process needs to be slightly altered to support such a build process.

while the cortav specification itself does not concern itself with matters like whether a particular character is a numeral or a letter, optimal typesetting in some cases requires such information. this is the case for the equation span- and block-types, which need to be able to distinguish between literals, variables, and mathematical symbols in [^alas-math the equations they format]. the ASCII charset is small enough that exhaustive character class information can be manually hardcoded into a cortav implementation; Unicode most certainly is not.

	alas-math: sadly, i was not at any point consulted by any of the generations of mathematicians stretching back into antiquity, who as a direct consequence devised their notations without [*any] regard for machine-readability. [!for shame!]

for this reason, the reference implementation of cortav embeds the file [`UnicodeData.txt], a database maintained by the Unicode Consortium. this is a rather large file that updates for each new Unicode version, so it is downloaded as part of the build process. to build on NixOS, you'll need to either disable the features that rely on this database (not recommended), or download the database yourself and tell the build script where to find it. the latter is the approach the official nix expression will take when i can be bothered to write it. see the examples below for how to conduct a deterministic build.

~~~ deterministic build with unicode database [sh] ~~~
/src $ mkdir cortav && cd cortav
/src/cortav $ fossil clone https://c.hale.su/cortav .fossil && fossil open .fossil
/src/cortav $ curl https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt > /tmp/unicode.txt
................................................................................
~~~ [sh] deterministic build [!without] unicode database ~~~
/src $ mkdir cortav && cd cortav
/src/cortav $ fossil clone https://c.hale.su/cortav .fossil && fossil open .fossil
/src/cortav $ make build/cortav encoding-data=
~~~

! while most of the data used is taken directly from UnicodeData.txt, [`tools/ucs.lua] splices in some extra character information before generating the database. this is partly because certain characters may not be classified in a useful way and need to be manually overwritten. however, the reference implementation also seeks to provide accurate data for certain character sets that are not part of unicode proper and can be expressed in UTF only through its private use areas.
! currently, only the [>corran Corran] script is supported in this fashion, but i intend to add [>tengwar Tengwar] as well. if there is a con-script or any other informally encoded script you would like supported by the reference implementation, please open an issue.

[*do note] that no cortav implementation needs to concern itself with character class data. this functionality is provided in the reference implementation strictly as an (optional) extension to the spec to improve usability, not as a normative requirement.

	corran: http://ʞ.cc/fic/spirals/society
	tengwar: https://en.wikipedia.org/wiki/Tengwar

###refimpl-switches switches
................................................................................
* [`@[*fg]]: resolves to a color expression denoting the selected foreground color. equivalent to [`[*tone](1)]
* [`@[*bg]]: resolves to a color expression denoting the selected background color. equivalent to [`[*tone](0)]
* [`@[*tone]\[/[$alpha]\]([$fac] \[[$shift] \[[$saturate]\]\] )]: resolves to a color expression. [$fac] is a floating-point value scaling from the background color to the foreground color. [$shift] is a value in degrees controlling how far the hue will shift relative to the accent. [$saturate] is a floating-point value controlling how saturated the color is.

###refimpl-rend-groff groff
the [`groff] backend produces a text file suitable for supplying to a [`groff] compiler. [`groff] is the GNU implementation of a venerable typesetting system from the early days of UNIX.

you can produce a final output directly by piping from the [`cortav] driver into [`groff]. if your document uses an encoding other than ASCII, you'll need to notify [`groff] of this with the [`-K] flag. for example, to render a UTF8 cortav file to PDF:

~~~
$ cortav input.ct -m render:format groff | groff -Tpdf -Kutf8 > output.pdf
~~~

in the future, it is planned to enable the driver to operate groff automatically and directly produce the desired output format when the binary wrapper is in use. doing so securely and hygienically is not possible in pure lua, however.

####refimpl-rend-groff-modes modes
[`groff] supports the following modes:

* string [`groff:annotate] controls how footnotes will be handled.
** [`footnote] places footnotes at the end of the page they are referenced on. if the same footnote is used on multiple pages, it will be duplicated on each.
** [`secnote] places footnotes at the end of each section. footnotes used in multiple sections will be duplicated for each.
** [`endnote] places all footnotes at the end of the rendered document.

* string [`groff:title-page] takes an identifier that names a section. this section will be treated as the title page for the document.
* string [`groff:title] sets a specific title to be used in headers instead of relying on header heuristics

### directives
* [`%[*pragma] title-page [$id]] sets the title page to section [$id]. this causes it to be specially formatted, with a large, centered title and subtitle.

### quirks
if the [`toc] extension is active but no [`%[*toc]] directive is provided, the table of contents will be given its own section at the start of the document (after the title page, if any).
