cortav  Check-in [dd5c3bfcb9]

Overview
Comment: various improvements
SHA3-256: dd5c3bfcb91d4ab9aa0c1579329e6e81dd5ce90cf53111b078a729b07a1cd405
User & Date: lexi on 2022-09-14 12:22:00
Context
2022-09-30 18:57  "fix" macro bullshit  check-in: 84b6c875fb  user: lexi  tags: trunk
2022-09-14 12:22  various improvements  check-in: dd5c3bfcb9  user: lexi  tags: trunk
2022-09-10 19:14  tweak docs  check-in: 5a78370f0f  user: lexi  tags: trunk
Changes

Modified cli.lua from [6b3b695c7e] to [9a8697d2cc].

    30     30   	end
    31     31   
    32     32   	if not mode['render:format'] then
    33     33   		error 'what output format should i translate the input to?'
    34     34   	end
    35     35   	if mode['render:format'] == 'none' then return 0 end
    36     36   	if not ct.render[mode['render:format']] then
    37         -		ct.exns.unimpl('output format “%s” unsupported', mode['render:format']):throw()
           37  +		if (not ct.render.html) and not _G.native then
           38  +			-- we may be running uncompiled; otherwise something is seriously broken
           39  +			require('render.' .. mode['render:format'])
           40  +		else
           41  +			ct.exns.unimpl('output format “%s” unsupported', mode['render:format']):throw()
           42  +		end
    38     43   	end
    39     44   	
    40     45   	local render_opts = ss.kmap(function(k,v)
    41     46   		return k:sub(2+#mode['render:format'])
    42     47   	end, ss.kfilter(mode, function(m)
    43     48   		return ss.str.begins(m, mode['render:format']..':')
    44     49   	end))
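
The new branch in cli.lua falls back to loading a renderer module on demand when no renderers are registered and the build is not native. A minimal standalone sketch of that lookup follows. The names `resolve_renderer`, `registry`, `native`, and `loader` are hypothetical stand-ins for `ct.render`, `_G.native`, and `require`, and the re-check after loading is an assumption of this sketch rather than something the diff shows.

```lua
-- hypothetical sketch of the renderer lookup; not taken from cli.lua
local function resolve_renderer(registry, format, native, loader)
	-- already registered: nothing to do
	if registry[format] then return registry[format] end
	if (not registry.html) and not native then
		-- no html renderer and not a native build: we are probably
		-- running uncompiled, so try loading the module on demand
		loader('render.' .. format)
		-- assumption: loading the module registers it in `registry`
		if registry[format] then return registry[format] end
	end
	error(('output format "%s" unsupported'):format(format), 0)
end
```

Passing the loader in as a parameter keeps the sketch testable without touching the real module path; in the check-in itself the call is simply `require('render.' .. mode['render:format'])`.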

Modified cortav.ct from [5152d1b5b1] to [607ec9b33c].

            1  +%% this is the reference specification that i used to initially cobble together my
            2  +%% spec for the language i was going to implement, and which i then expanded
            3  +%% as i added features to the reference implementation. it's a mess and it
            4  +%% urgently needs to be rewritten into a more accessible and navigable
            5  +%% document for new users. TODO
            6  +
     1      7   # cortav specification
     2         -[*cortav] is a markup language designed to be a simpler, but more capable alternative to markdown. its name derives from the [>dict Ranuir words] [!cor] "writing" and [!tav] "document", translating to something like "(plain) text document".
     3         -
            8  +[*cortav] is a markup language designed to be a simpler, [!well-specified], and more capable alternative to markdown. its name derives from the [>dict Ranuir words] [!cor] "writing" and [!tav] "document", translating to something like "(plain) text document".
     4      9   	dict: http://ʞ.cc/fic/spirals/glossary
     5     10   
     6     11   the cortav [!format] can be called [!cortavgil], or [!gil cortavi], to differentiate it from the reference implementation [!cortavsir] or [!sir cortavi].
     7     12   
     8     13   %toc
     9     14   
    10     15   ## cortav vs. markdown
................................................................................
    70     75   * level 1: [*styling]. simple inline formatting sequences like strong, emphatic, literal, links, etc. math equation styling need not be supported. paragraphs, lists, and references are the only block elements supported. suitable for styling tweets and other very short content.
    71     76   * level 2: [*layout]. implements header, paragraph, newline, directive, and reference block elements. supports resources at least for remote or attached images. suitable for longer social media posts.
    72     77   * level 3: [*publishing]. implements all currently standardized core behavior, including zero or more extensions.
    73     78   * level 4: [*reference]. implements all currently standardized behavior, including [!all] standardized extensions.
    74     79   
    75     80   ! note that which translators are implemented is not specified by level, as this is, naturally, implementation-dependent. (it would make rather little sense for the blurb parser of a cortav-enabled blog engine to support generating PDFs, after all.) level encodes only which features of the cortav [!language] are supported.
    76     81   
    77         -##onblocks structure
           82  +##onblocks structure (block elements)
    78     83   cortav is based on an HTML-like block model, where a document consists of sections, which are made up of blocks, which may contain a sequence of spans. flows of text are automatically conjoined into spans, and blocks are separated by one or more newlines. this means that, unlike in markdown, a single logical paragraph [*cannot] span multiple ASCII lines. the primary purpose of this was to ensure ease of parsing, but also, both markdown and cortav are supposed to be readable from within a plain text editor. this is the 21st century. every reasonable text editor supports soft word wrap, and if yours doesn't, that's entirely your own damn fault. hard-wrapping lines is incredibly user-hostile, especially to users on mobile devices with small screens. cortav does not allow it.
    79     84   
    80     85   the first character(s) of every line (the "control sequence") indicates the role of that line. if no control sequence is recognized, the line is treated as a paragraph. the currently supported control sequences are listed below. some control sequences have alternate forms, in order to support modern, readable unicode characters as well as plain ascii text.
    81     86   
    82     87   * [*paragraphs] ([`.] [` ¶] [`❡]): a paragraph is a simple block of text. the period control sequence is only necessary if the paragraph text starts with text that would be interpreted as a control sequence otherwise
    83     88   * [*newlines] [` \\]: inserts a line break into previous paragraph and attaches the following text. mostly useful for poetry or lyrics
    84     89   * [*section starts] [`#] [`§]: starts a new section. all sections have an associated depth, determined by the number of sequence repetitions (e.g. "###" indicates depth three). sections may have headers and IDs; both are optional. IDs, if present, are a sequence of raw-text immediately following the hash marks. if the line has one or more space character followed by styled-text, a header will be attached. the character immediately following the hashes can specify a particular type of section. e.g.:
    85     90   ** [`#] is a simple section break.
    86     91   ** [`#anchor] opens a new section with the ID [`anchor].
    87     92   ** [`# header] opens a new section with the title "header".
    88     93   ** [`#anchor header] opens a new section with both the ID [`anchor] and the title "header".
    89         -* [*nonprinting sections] ([`^]): sometimes, you'll want to create a namespace without actually adding a visible new section to the document. you can achieve this by creating a [!nonprinting section] and defining resources within it. nonprinting sections can also be used to store comments, notes, to-dos, or other meta-information that is useful to have in the source file without it becoming a part of the output. nonprinting sections can be used for a sort of "literate markup," where resource and reference definitions can intermingle with human-readable narrative about those definitions.
    90         -* [*resource] ([`@]): defines a [!resource]. a resource is a file or object that is to be embedded in the document somehow. common examples of resources include images, videos, iframes, or headers/footers. resources can be defined inline, or reference external objects. see [>rsrc resources] for more information.
           94  +* [*nonprinting sections] ([`^]): sometimes, you'll want to create a namespace without actually adding a visible new section to the document. you can achieve this by creating a [!nonprinting section] and defining resources within it. nonprinting sections can also be used to store comments, notes, to-dos, or other meta-information that is useful to have in the source file without it becoming a part of the output. nonprinting sections can be used for a sort of "literate markup," where resource and reference definitions can intermingle with human-readable narrative about those definitions. note that unlike comments, nonprinting sections are still parsed and can still affect other sections by means of definitions and pragmata.
           95  +* [*resource] ([`@]): defines a [!resource]. a resource is a file or object that is to be embedded in the document somehow. common examples of resources include images, videos, iframes, or headers/footers. resources can be defined inline, or reference external objects that are read in either at compile-time or view-time. see [>rsrc resources] for more information.
    91     96   * [*lists] ([`*] [`:]): these are like paragraph nodes, but list nodes that occur next to each other will be arranged so as to show they compose a sequence. depth is determined by the number of stars/colons. like headers, a list entry may have an ID that can be used to refer back to it; it is indicated in the same way. if colons are used, this indicates that the order of the items is significant. [`:]-lists and [`*]-lists may be intermixed; however, note that only the last character in the sequence actually controls the type. a blank line terminates the current list.
    92     97   * [*directives] ([`%]): a directive issues a hint to the renderer in the form of an arbitrary string. directives are normally ignored if they are not supported, but you may cause a warning to be emitted where the directive is not supported with [`%!] or mark a directive critical with [`%!!] so that rendering will entirely fail if it cannot be obeyed.
    93     98   * [*comments] ([`%%]): a comment is a line of text that is simply ignored by the renderer.
    94     99   * [*asides] ([`!]): indicates text that diverges from the narrative, and can be skipped without interrupting it. think of it like block-level parentheses. asides which follow one another are merged as paragraphs of the same aside, usually represented as a sort of box. if the first line of an aside contains a colon, the stretch of styled-text from the beginning to the aside to the colon will be treated as a "type heading," e.g. "Warning:"
    95    100   * [*code] ([`~~~]): a line beginning with ~~~ begins or terminates a block of code. code blocks are by default not parsed, but parsing can be activated by preceding the code block with an [`%[*expand]] directive. the opening line should look like one of the below
    96    101   ** [`~~~]
    97    102   ** [`~~~ language] (markdown-style shorthand syntax)
................................................................................
   103    108   ** [`~~~ title \[language\] #id ~~~]
   104    109   *[*definition] ([^def-ex tab]): a line [^def-tab-enc beginning with a tab] is a multipurpose metadata syntax. the tab may be followed by an identifier, a colon, and a value string, in which case it opens a new definition; alternatively, a second tab character turns the line into a [*definition continuation], adding the remaining characters as a new line to the definition value on the previous line.  when a new definition is opened on a line immediately following certain kinds of objects, such as resources, embeds, or multiline macro expansions, it attaches key-value metadata to that object. when a definition is not preceded by such an object, an independent [*reference] is created instead.
   105    110   ** a [*reference] is a general mechanism for out-of-line metadata, and references are used in many different ways -- e.g. to specify link destinations, footnote contents, abbreviations, or macro bodies. to ensure that a definition is interpreted as a reference, rather than as metadata for an object, precede it with a blank line.
   106    111   	def-tab-enc: in encodings without tab characters, a definition is opened by a line beginning with two blanks, and continued by a line beginning with four blanks.
   107    112   	def-ex: [*open a new reference]: [`[!\\t][$key]: [$value]]
   108    113   		[*continue a reference]: [`[!\\t\\t][$value]]
   109    114   * [*quotation] ([`<]): a line of the form [`<[$name]> [$quote]] denotes an utterance by [$name].
   110         -* [*blockquote] ([`>]): alternate blockquote syntax. can be nested by repeating the [`>] character.
   111         -* [*subtitle/caption] (["--]): attaches a subtitle to the previous header, or caption to the previous object
          115  +* [*blockquote] ([`>[$id] [$body]]): "inline" blockquote syntax. can be nested by repeating the [`>] character. the [$id] is optional, but the [`>] character must be immediately followed by whitespace if the block is not to have an ID.
          116  +* [*subtitle/caption] (["--]): attaches a subtitle to the previous header, or caption to the previous object. after a blockquote, attaches an attribution line
   112    117   * [*embed] ([`&]): embeds a referenced object. can be used to show images or repeat previously defined objects like lists or tables, optionally with a caption. an embed line can be followed immediately by a sequence of [*definitions] in the same way that resource definitions can, to override resource properties on a per-instance basis. note that only presentation-related properties like [$desc] can be meaningfully overridden, as embed does not trigger a re-render of the parse tree; if you want to override e.g. context variables, use a multiline macro invocation instead.
   113    118   ** [`&[$image]] embeds an image or other block-level object. [!image] can be a reference with a url or file path, or it can be an embed section (e.g. for SVG files)
   114    119   ***[`&myimg All that remained of the unfortunate blood magic pageant contestants and audience (police photo)]
   115    120   ** [`&-[$ident] [$styled-text]] embeds a closed disclosure element containing the text of the named object (a nonprinting section or cortav resource should usually be used to store the content; it can also name an image or video, of course). in interactive outputs, this will display as a block which can be clicked on to view the full contents of the referenced object [$ident]; if [$styled-text] is present, it overrides the title of the section you are embedding (if any). in static outputs, the disclosure object will display as an enclosed box with [$styled-text] as the title text
   116    121   *** [`&-ex-a Prosecution Exhibit A (GRAPHIC CONTENT)]
   117    122   ** [`&+[$section] [$styled-text]] is like the above, but the disclosure element is open by default
   118    123   * [`$[$macro] [$arg1]|[$arg2]|[$argn]…] invokes a block-level macro with the supplied arguments, and can be followed by a property override definition list the same way embed and resource lines can. note that while both [`$[$id]] and [`&[$id]] can be used to instantiate resources of type [`text/x.cortav], there is a critical difference: [`$[$id]] renders out the sub-document separately each time it is named, allowing for parameter expansion and for context variables to be overridden for each invocation. by contrast, [`&[$id]] can only insert copies of the same render; no parameters can be passed and context variables will be expanded to their value at the time the resource was defined. only [`&[$id]] can instantiate resources of types other than [`text/x.cortav]. there is also a semantic distinction: resources interpreted as macros are inserted "in-band", on an equal basis with nearby elements; resources interpreted as embeds are set off to clearly indicate that they are a sub-document, and on interactive outputs may have their own independently-scrolling viewport.
................................................................................
   122    127   * [*table cells] ([`+ |]): see [>ex.tab table examples].
   123    128   * [*equations] ([`=]): block-level equations can be inserted with the [`=] sequence
   124    129   * [*cross-references] ([`=>] [`⇒]): inserts a block-level link. has two forms for the sake of gemtext compatibility. [$styled-text] is a descriptive text of the destination. especially useful for menus and gemtext output.
   125    130   ** the cortav syntax is [`=>[$ident] [$styled-text]], where [$ident] is an identifier; links to the same destination as [`\[>[$ident] [$styled-text]\]] would
   126    131   ** the compatibility syntax is [`=> [$uri] [$styled-text]] (note the space before [$uri]!). instead of taking an identifier for an object in the document, it directly accepts a URI. note that this is not formally equivalent to gemtext's link syntax, which also allows paths in place of URIs; [`cortav] does not. the gemtext line ["=> /somewhere] would need to be expressed as ["=> file:/somewhere], and ["=> /somewhere?key=val] as ["http:/somewhere?key=val] (or ["gemini:/somewhere?key=val], if the result is to be served over a gemini server).
   127    132   * [*empty lines] (that is, lines consisting of nothing but whitespace) constitute a [!break], which terminates multiline objects that do not have a dedicated termination sequence, for example lists and asides.
   128    133   
   129         -##onspans styled text
          134  +##onspans styled text (span elements)
   130    135   most blocks contain a sequence of spans. these spans are produced by interpreting a stream of [*styled-text] following the control sequence. styled-text is a sequence of codepoints potentially interspersed with escapes. an escape is formed by an open square bracket [`\[] followed by a [*span control sequence], and arguments for that sequence like more styled-text. escapes can be nested.
   131    136   
   132    137   * strong {obj *|styled-text}: causes its text to stand out from the narrative, generally rendered as bold or a brighter color.
   133    138   * emphatic {obj !|styled-text}: indicates that its text should be spoken with emphasis, generally rendered as italics
   134    139   * custom style {span .|id|[$styled-text]}: applies a specially defined font style. for example, if you have defined [`caution] to mean "demibold italic underline", cortav will try to apply the proper weight and styling within the constraints of the current font to the span [$styled-text]. see the [>fonts-sty fonts section] for more information about this mechanism.
   135    140   * literal {obj `|styled-text}: indicates that its text is a reference to a literal sequence of characters or other discrete token. generally rendered in monospace
   136    141   * variable {obj $|styled-text}: indicates to the reader that its text is a placeholder, rather than a literal representation. generally rendered in italic monospace, ideally of a different color
................................................................................
   145    150   * raw {obj \\ |[$raw-text]}: causes all characters within to be interpreted literally, without expansion. the only special characters are square brackets, which must have a matching closing bracket, and backslashes.
   146    151   * raw literal [` \["[$raw-text]\]]: shorthand for a raw inside a literal, that is ["[`[\\…]]]
   147    152   * macro [` \{[$name] [$arguments]}]: invokes a [>ex.mac macro] inline, specified with a reference. if the result of macro expansion contains newlines, they will be treated as line breaks, rather than paragraph breaks as they would be in a multiline context.
   148    153   * argument {obj #|var}: in macros only, inserts the [$var]-th argument. otherwise, inserts a context variable provided by the renderer.
   149    154   * raw argument {obj ##|var}: like above, but does not evaluate [$var].
   150    155   * term {obj &|name}, {span &|name|[$expansion]}: quotes a defined term with a link to its definition, optionally with a custom expansion of the term (for instance, to expand the first use of an acronym)
   151    156   * inline image {obj &@|name}: shows a small image or other object inline. the unicode character [`🖼] can also be used instead of [`&@].
   152         -* unicode codepoint {obj U+|hex-integer}: inserts an arbitrary UCS codepoint in the output, specified by [$hex-integer]. lowercase [`u] is also legal.
          157  +* unicode codepoint {obj U|hex-integer}: inserts an arbitrary UCS codepoint in the output, specified by [$hex-integer]. lowercase [`u] is also legal, as are [`U+] and [`u+].
   153    158   * math mode {obj =|equation}: activates additional transformations on the span to format it as a mathematical equation; e.g. [`*] becomes [`×] and [`/] --> [`÷].
   154    159   * extension {span %|ext|…}: invokes extension named in [$ext]. [$ext] will usually be an extension name followed by a symbol (often a period) and then an extension-specific directive, although for some simple extensions it may just be the plain extension name. further syntax and semantics depend on the extension. this syntax can also be used to apply formatting specific to certain renderers, such as assigning a CSS class in the [`html] renderer (["[%html.myclass my [!styled] text]]).
   155         -* critical extension {span %!|ext|…}: like [!extension], but will trigger an error if the requested extension is not available
          160  +* important extension {span %!|ext|…}: like [!extension], but will issue a warning if the requested extension is not available
          161  +* critical extension {span %!!|ext|…}: like [!important extension], but will trigger an error and abort compilation if the requested extension is not available
   156    162   * extension text {span %:|ext|[$styled-text]}: like [!extension], but when the requested extension is not present, [$styled-text] will be emitted as-is. this is a better way to apply CSS classes, as the text will still be visible when rendered to formats other than HTML.
   157    163   * inline comment {obj %%|...}: ignored. useful for editorial annotations not intended to be part of the rendered product.
   158    164   
   159    165   	span: [` \[[*[#1]][$[#2]] [#3]\]]
   160    166   	obj: [` \[[*[#1]][$[#2]]\]]
   161    167   
   162    168   ##tabs tables
................................................................................
   581    587   ***: [*heading]: the section can occur on the same page as text and  headings from other sections
   582    588   ** {d pragma accent} specifies an accent hue (in degrees around the color wheel) for renderers which support colorized output
   583    589   ** {d pragma accent-spread} is a factor that controls the "spread" of hues used in the document. if 0, only the accent color will be used; if larger, other hues will be used in addition to the primary accent color.
   584    590   ** {d pragma dark-on-light on\|off} controls whether the color scheme used should be light-on-dark or dark-on-light
   585    591   ** {d pragma page-width} indicates how wide the pages should be
   586    592   ** {d pragma title-page} specifies a section to use as a title page, for renderer backends that support pagination
   587    593   
   588         -! note on pragmata: particularly when working with collections of documents, you should not keep formatting metadata in the documents themselves! the best thing to do is to have a makefile for compiling the documents using whatever tools you want to support, and encoding the rendering options in this file (for the reference implementation this currently means as command line arguments, but eventually it will support intent files as well) so they can all be changed in one place; pragmas should instead be used for per-document [*overrides] of default settings.
          594  +! note on pragmata: particularly when working with collections of documents, you should not keep shared formatting metadata duplicated across the documents themselves! the best thing to do is to have a makefile for compiling the documents using whatever tools you want to support, and encoding the rendering options in this file (for the reference implementation this currently means as command line arguments, but eventually it will support intent files as well) so they can all be changed in one place; pragmas should instead be used for per-document [*overrides] of default settings.
   589    595   ! a workaround for the lack of intent files in the reference implementation is to have a single pseudo-stylesheet that contains only {d pragma} statements, and then import this file from each individual source file using the {d include} directive. this is suboptimal and recommended only when you need to ensure compatibility between different implementations.
   590    596   ! when creating HTML files, an even better alternative may be to turn off style generation entirely and link in an external, hand-written CSS stylesheet. this is generally the way you should compile sources for existing websites if you aren't going to write your own extension.
   591    597   
   592    598   ##ex examples
   593    599   
   594    600   ~~~ blockquotes #bq [cortav] ~~~
   595    601   the following excerpts of text were recovered from a partially erased hard drive found in the Hawthorne manor in the weeks after the Incident. context is unknown.
   596    602   
   597         -#>
   598         -—spoke to the man under the bridge again, the one who likes to bite the heads off the fish, and he suggested i take a brief sabbatical and journey to the Wandering Oak (where all paths meet) in search of inspiration and the forsaken sword of Pirate Queen Granuaile. a capital idea! i shall depart upon the morrow, having honored the Lord Odin and poisoned my accursed minstrels as is tradition—
   599         -—can't smell my soul anymore, but that's beside the point entirely—
   600         -—that second moon (always have wondered why nobody else seems to notice the damn fool thing except on Michaelmas day). alas, my luck did not endure, and i was soon to find myself knee-deep in—
   601         -—just have to see about that, won't we!—
   602         -#
          603  +> —spoke to the man under the bridge again, the one who likes to bite the heads off the fish, and he suggested i take a brief sabbatical and journey to the Wandering Oak (where all paths meet) in search of inspiration and the forsaken sword of Pirate Queen Granuaile. a capital idea! i shall depart upon the morrow, having honored the Lord Odin and poisoned my accursed minstrels as is tradition—
          604  +> —can't smell my soul anymore, but that's beside the point entirely—
          605  +> —that second moon (always have wondered why nobody else seems to notice the damn fool thing except on Michaelmas day). alas, my luck did not endure, and i was soon to find myself knee-deep in—
          606  +> —just have to see about that, won't we!—
   603    607   
   604    608   the nearest surviving relative of Lord Hawthorne is believed to be a wandering beggar with a small pet meerkat who sells cursed wooden trinkets to unwary children. she will not be contacted, as the officers of the Yard fear her.
   605    609   ~~~
   606    610   
   607    611   ~~~links & notes #lnr [cortav] ~~~
   608    612   this sentence contains a [>zombo link] to zombo com. you can do anything[^any] at zombo com.
   609    613   	zombo: https://zombo.com
................................................................................
   656    660   	.civil: (unknown)
   657    661   	.roe: Monitor; do not engage
   658    662   	.danger: (unknown)
   659    663   
   660    664   $agent ZUCCHINI PARABLE
   661    665   	.civil: Zephram "Rolodex" Goldberg
   662    666   	.danger: Category Scarlet
   663         -$agent RHADAMANTH EXQUISITE
          667  +$agent RHADAMANTH EXCISE
   664    668   	.roe: Eliminate with extreme prejudice; CBRN deployment authorized
   665    669   	.danger: [*Unquantifiable]
   666    670   ~~~
   667    671   
   668    672   ~~~ tables #tab [cortav] ~~~
   669    673   here is a glossary table.
   670    674   
................................................................................
   762    766   the interpreter should provide a [`cortav] table with the objects:
   763    767   * [`ctx]: contains context variables
   764    768   
   765    769   used files should return a table with the following members
   766    770   * [`macros]: an array of functions that return strings or arrays of strings when invoked. these will be injected into the global macro namespace.
   767    771   
   768    772   ###ts ts
   769         -the [*ts] extension allows documents to be marked up for basic classification constraints and automatically redacted. if you are seriously relying on [`ts] for confidentiality, make damn sure you start the file with [$%[*requires] ts], so that rendering will fail with an error if the extension isn't supported.
          773  +the [*ts] extension allows documents to be marked up for basic classification constraints and automatically redacted. if you are seriously relying on [`ts] for confidentiality, make damn sure you start the file with [$%!![*needs] ts], so that rendering will fail with an error if the extension isn't supported.
   770    774   
   771    775   [`ts] currently has no support for misinformation.
   772    776   
   773    777   [`ts] enables the directives:
   774    778   * [`%[*ts] class [$scope level] ([$styled-text])]: indicates a classification level for either the whole document (scope [$doc]) or the next section (scope [$sec]). if the ts level is below [$level], the section will be redacted or rendering will fail with an error, as appropriate. if styled-text is included, this will be treated as the name of the classification level.
   775    779   * [`%[*ts] word [$scope word] ([$styled-text])]: indicates a codeword clearance that must be present for the text to render. if styled-text is present, this will be used to render the name of the codeword instead of [$word].
   776    780   * [`%[*when] ts level [$level]]
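
The spec changes above describe three severity levels for directives and extensions: plain (`%`) is ignored when unsupported, important (`%!`) produces a warning, and critical (`%!!`) aborts rendering. A minimal sketch of how a renderer might tell them apart is below. The function names and the `warn` callback are hypothetical and do not come from cortav.lua.

```lua
-- hypothetical sketch; none of these names come from the reference implementation
local function directive_level(line)
	-- check the longer sequences first so '%!!' is not mistaken for '%!'
	if line:sub(1, 2) == '%%'  then return 'comment'   end
	if line:sub(1, 3) == '%!!' then return 'critical'  end
	if line:sub(1, 2) == '%!'  then return 'important' end
	if line:sub(1, 1) == '%'   then return 'normal'    end
	return nil -- not a directive line
end

-- called when a directive or extension `name` is not supported
local function on_unsupported(level, name, warn)
	if level == 'critical' then
		error(('"%s" is not supported'):format(name), 0)
	elseif level == 'important' then
		warn(('"%s" is not supported; ignoring'):format(name))
	end
	-- plain directives are silently ignored
	return false
end
```

Under this reading, the `%!!needs ts` line recommended in the ts section would take the `critical` path and stop rendering when the extension is missing.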

Modified cortav.lua from [940c3efd41] to [2184d83e7b].

     1      1   -- [ʞ] cortav.lua
     2      2   --  ~ lexi hale <lexi@hale.su>
     3      3   --  © AGPLv3
     4      4   --  ? reference implementation of the cortav document language
            5  +--
            6  +--  ! TODO refactor encoding logic. it's a complete
            7  +--         mess and i seem to have repeatedly gotten
            8  +--         confused about how it's supposed to work.
            9  +--         the whole shitshow needs to be replaced
           10  +--         with a clean, simple paradigm: documents
           11  +--         are translated to UTF8 on the way in, and
           12  +--         translate back out on the way out. trying
           13  +--         to cope with multiple simultaneous
           14  +--         encodings in memory is a disaster zone.
     5     15   
     6     16   local ss = require 'sirsem'
     7     17   -- aliases for commonly used sirsem funcs
     8     18   local startswith = ss.str.begins
     9     19   local dump = ss.dump
    10     20   local declare = ss.declare
    11     21   
................................................................................
   732    742   			spans = {{
   733    743   				kind = 'raw';
   734    744   				spans = {str};
   735    745   				origin = o;
   736    746   			}};
   737    747   			origin = o;
   738    748   		}
          749  +	end
          750  +	local function unicodepoint(s,c)
          751  +		local cp = tonumber(s, 16)
          752  +		return {
          753  +			kind = 'codepoint';
          754  +			code = cp;
          755  +		}
   739    756   	end
   740    757   	ct.spanctls = {
   741    758   		{seq = '!', parse = formatter 'emph'};
   742    759   		{seq = '*', parse = formatter 'strong'};
   743    760   		{seq = '~', parse = formatter 'strike'};
   744    761   		{seq = '+', parse = formatter 'insert'};
   745    762   		{seq = '\\', parse = function(s, c) -- raw
................................................................................
   794    811   			}
   795    812   		end};
   796    813   		{seq = '>', parse = insert_link};
   797    814   		{seq = '→', parse = insert_link};
   798    815   		{seq = '🔗', parse = insert_link};
   799    816   		{seq = '##', parse = insert_var_ref(true)};
   800    817   		{seq = '#', parse = insert_var_ref(false)};
          818  +
          819  +		{seq = 'U+', parse = unicodepoint};
          820  +		{seq = 'u+', parse = unicodepoint};
          821  +		{seq = 'U',  parse = unicodepoint};
          822  +		{seq = 'u',  parse = unicodepoint};
          823  +
   801    824   		{seq = '%%', parse = function (s,c)
   802    825   			local com = s:match '^%%%%%s*(.*)$'
   803    826   			return {
   804    827   				kind = 'comment';
   805    828   				comment = com;
   806    829   			}
   807    830   		end};
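
Note on the new `unicodepoint` parser: it passes `s` straight to `tonumber(s, 16)`, which returns nil if any non-hex text (such as the `U+` cue itself) is still attached. This diff does not show whether the parse callback receives the span text with or without its cue, so the following is only a minimal sketch under the assumption that the cue may still be present. `parse_codepoint` is a hypothetical name, not part of cortav.

```lua
-- Hypothetical helper (not cortav API): accept 'U+2014', 'u+2014',
-- 'U2014' or 'u2014' and return a codepoint span, or nil on bad input.
local function parse_codepoint(s)
	-- optional cue letter and '+', followed by hex digits only
	local hex = s:match '^[Uu]%+?(%x+)$'
	if not hex or #hex > 6 then return nil end
	local cp = tonumber(hex, 16)
	-- reject values past the Unicode range and UTF-16 surrogates
	if cp > 0x10FFFF or (cp >= 0xD800 and cp <= 0xDFFF) then
		return nil
	end
	return {
		kind = 'codepoint';
		code = cp;
	}
end
```

Returning nil on malformed input leaves the decision about how to report the error to the caller; the real parser may prefer to throw instead.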

Modified desk/cortav.xml from [a3601c692e] to [2b77e7703b].

    49     49   		<contexts>
    50     50   			<context name='init' attribute='Normal Text' lineEndContext='#pop' fallthroughContext='text'>
    51     51   				<RegExpr String='\\.' attribute='Escaped Char'/>
    52     52   				<RegExpr attribute='Section Cue' context='sec-ident' String='(#|§)+' firstNonSpace='true' />
    53     53   				<StringDetect String='~~~' attribute='Literal Block Cue' firstNonSpace='true' context='literal-block-cue'/>
    54     54   				<RegExpr attribute='List' String='[\*:]+' firstNonSpace='true' context='text' />
    55     55   				<Detect2Chars char='%' char1='%' attribute='Comment' context='comment'/>
    56         -				<Detect2Chars char='%' char1='!' attribute='Critical Directive Cue' context='directive'/>
           56  +				<StringDetect String='%!!' attribute='Critical Directive Cue' firstNonSpace='true' context='directive'/>
           57  +				<Detect2Chars char='%' char1='!' attribute='Important Directive Cue' context='directive'/>
    57     58   				<DetectChar char='%' attribute='Directive Cue' context='directive'/>
    58     59   				<DetectChar char='@' attribute='Resource Cue' context='resource-ident'/>
    59     60   				<DetectChar char='$' attribute='Deref Cue' context='block-macro-ident'/>
    60     61   				<DetectChar char='&amp;' attribute='Deref Cue' context='block-deref-ident'/>
    61     62   				<Detect2Chars char='&#9;' char1='&#9;' context='refdef'/>
    62     63   				<DetectChar char='&#9;' context='refdef-id'/>
    63     64   
................................................................................
   178    179   				<DetectChar   attribute='Span Cue' char='>' context='#pop!ref' />
   179    180   				<DetectChar   attribute='Span Cue' char='^' context='#pop!ref' />
   180    181   				<DetectChar   attribute='Span Cue' char='&amp;' context='#pop!ref' />
   181    182   				<DetectChar   attribute='Span Cue' char='#' context='#pop!var-ref' />
   182    183   				<DetectChar   attribute='Span Cue' char='\' context='#pop!flat-span' />
   183    184   				<DetectChar   attribute='Span Cue' char='=' context='#pop!inline-math' />
   184    185   				<Detect2Chars attribute='Comment' char='%' char1='%' context='#pop!inline-comment' />
   185         -				<Detect2Chars attribute='Critical Directive Cue' char='%' char1='!' context='#pop!inline-directive' />
          186  +				<StringDetect String='%!!' attribute='Critical Directive Cue' firstNonSpace='true' context='#pop!inline-directive'/>
          187  +				<Detect2Chars attribute='Important Directive Cue' char='%' char1='!' context='#pop!inline-directive' />
   186    188   				<DetectChar   attribute='Directive Cue' char='%' context='#pop!inline-directive' />
   187    189   			</context>
   188    190   
   189    191   			<context name='flat-span' attribute='Unstyled Text' lineEndContext='#pop'>
   190    192   				<DetectChar attribute='Unstyled Text' context='flat-span' char='['/>
   191    193   				<Detect2Chars attribute='Escaped Char' context='#stay' char='\' char1=']'/>
   192    194   				<DetectChar attribute='Span Delimiter' context='#pop' char=']'/>
................................................................................
   235    237   		</contexts>
   236    238   		<itemDatas>
   237    239   			<itemData name='Normal Text' defStyleNum='dsNormal'/>
   238    240   			<itemData name='Styled Text' defStyleNum='dsNormal'/>
   239    241   			<itemData name='Emphatic Text' defStyleNum='dsNormal' italic='true'/>
   240    242   			<itemData name='Strong Text' defStyleNum='dsNormal' bold='true'/>
   241    243   			<itemData name='Deleted Text' defStyleNum='dsNormal' strikeOut='true'/>
   242         -				
          244  +
   243    245   			<itemData name='Section Cue' defStyleNum='dsKeyword' bold='true'/>
   244    246   			<itemData name='Deref Cue' defStyleNum='dsKeyword' bold='true'/>
   245    247   			<itemData name='Header' defStyleNum='dsControlFlow' underline='true'/>
   246    248   			<itemData name='Identifier' defStyleNum='dsVariable'/>
   247    249   
   248    250   			<itemData name='Unstyled Text' defStyleNum='dsVerbatimString'/>
   249    251   			<itemData name='Escaped Char' defStyleNum='dsSpecialChar'/>
................................................................................
   251    253   			<itemData name='Span Cue' defStyleNum='dsKeyword' bold='true'/>
   252    254   			<itemData name='Resource Cue' defStyleNum='dsKeyword' bold='true'/>
   253    255   			<itemData name='Resource Identifier' defStyleNum='dsVariable' bold='true'/>
   254    256   			<itemData name='Span Delimiter' defStyleNum='dsKeyword'/>
   255    257   			<itemData name='Directive' defStyleNum='dsAttribute' bold='true'/>
   256    258   			<itemData name='Directive Cue' defStyleNum='dsAttribute'/>
   257    259   			<itemData name='Critical Directive Cue' defStyleNum='dsImport' bold='true'/>
          260  +			<itemData name='Important Directive Cue' defStyleNum='dsImport' bold='true'/>
   258    261   			<itemData name='Extension Directive' defStyleNum='dsImport' bold='true'/>
   259    262   			<itemData name='Renderer Directive' defStyleNum='dsExtension' bold='true'/>
   260    263   			<itemData name='Standard Namespace' defStyleNum='dsBuiltIn' bold='true'/>
   261    264   			<itemData name='Comment' defStyleNum='dsComment'/>
   262    265   			<itemData name='Error' defStyleNum='dsError'/>
   263    266   			<itemData name='Macro' defStyleNum='dsPreprocessor' bold='true'/>
   264    267   			<itemData name='Macro Delimiter' defStyleNum='dsPreprocessor'/>

Modified makefile from [10a67640c2] to [97fc0b1491].

    32     32   #    doesn't accept options like -l -x -o then you'll have to build
    33     33   #    the binary by hand, sorry. but if you want to contribute a build
    34     34   #    script to the repository, i'll happily take merge requests :)
    35     35   
    36     36   lua != which lua
    37     37   luac != which luac
    38     38   sh != which sh
           39  +
           40  +#sterilize the operating theatre
           41  +lua += -E
    39     42   
    40     43   extens = $(wildcard ext/*.lua)
    41     44   extens-names ?= $(basename $(notdir $(extens)))
    42     45   rendrs = $(wildcard render/*.lua)
    43     46   rendrs-names ?= $(basename $(notdir $(rendrs)))
    44     47   binds = $(wildcard bind/*.c)
    45     48   binds-names ?= $(basename $(notdir $(binds)))
................................................................................
    57     60   ifneq ($(filter net,$(binds-names)),)
    58     61       lua-bindeps += -lcurl
    59     62   endif
    60     63   
    61     64   dbg-flags-luac = $(if $(debug),,-s)
    62     65   dbg-flags-cc = $(if $(debug),-g,-s)
    63     66   
    64         -# sterilize the operating theatre
    65         -export LUA_PATH=./?.lua;./?.lc
    66         -export LUA_PATH_5_3=./?.lc;./?.lua
    67         -export LUA_PATH_5_4=./?.lc;./?.lua
    68         -export LUA_INIT=
    69         -export LUA_INIT_5_3=
    70         -export LUA_INIT_5_4=
    71     67   
    72     68   # by default, we fetch and parse information about encodings we
    73     69   # support so that cortav can do fancy things like format math
    74     70   # equations by character class (e.g. italicizing variables)
    75     71   # this is not necessary for parsing the format, and can be
    76     72   # disabled by blanking the encoding-data list when building
    77     73   # ($ make encoding-data=)
................................................................................
   107    103   
   108    104   .PHONY: syncdoc
   109    105   syncdoc: $(build)/cortav.html
   110    106   	fossil uv add $< --as cortav.html
   111    107   	fossil uv sync --all
   112    108   
   113    109   # clean is written in overly cautious fashion to minimize damage,
   114         -# just in case it ever gets invoked in a bad way
          110  +# just in case it ever gets invoked in a bad way (e.g. build=/)
   115    111   .PHONY: clean
   116    112   clean:
   117    113   	rm -f $(build)/*.{html,lc,sh,txt,desktop} \
   118         -	      $(build)/$(executable){,.bin}
          114  +	      $(build)/$(executable){,.bin} \
          115  +	      $(build)/bind
   119    116   	rmdir $(build)
   120    117   
   121    118   $(build)/%.sh: desk/%.sh | $(build)/
   122    119   	echo >$@ "#!$(sh)"
   123    120   	echo >>$@ 'cortav_exec="$(bin-prefix)/$(executable)"'
   124    121   	echo >>$@ 'cortav_flags="$${ct_format_flags-$(default-format-flags)}"'
   125    122   	cat $< >> $@

Modified render/groff.lua from [3a20caca9a] to [810c155515].

   343    343   			rs.macAdd 'strike'
   344    344   			rcc.prop.color = 'del'
   345    345   		elseif s.style == 'insert' then
   346    346   			rs.macAdd 'insert'
   347    347   			rcc.prop.color = 'new'
   348    348   		end
   349    349   		rs.renderSpans(rcc, s.spans, b, sec)
   350         -	end;
          350  +	end
          351  +
          352  +	function spanRenderers.codepoint(rc, s, b, sec)
          353  +		rc:span(utf8.char(s.code))
          354  +	end
   351    355   
   352    356   	function spanRenderers.link(rc, l, b, sec)
   353    357   		rs.renderSpans(rc, l.spans, b, sec)
   354    358   		rs.linkctr = rs.linkctr + 1
   355    359   		rs.macAdd 'footnote'
   356    360   		local p = rc:span(string.format('[%u]', rs.linkctr))
   357    361   		if type(l.ref) == 'string' then

Modified render/html.lua from [28584fb71a] to [f4ce452531].

   694    694   			elseif sp.style == 'strike' or sp.style == 'insert' then
   695    695   				addStyle 'editors_markup'
   696    696   			elseif sp.style == 'variable' then
   697    697   				addStyle 'var'
   698    698   			end
   699    699   			return tag(tags[sp.style],nil,htmlSpan(sp.spans,...))
   700    700   		end
          701  +
          702  +		function span_renderers.codepoint(t,b,s)
          703  +			-- is this a UTF8 output?
          704  +			return utf8.char(t.code)
          705  +			-- else
          706  +			-- return string.format("&#%u;", t.code)
          707  +		end
   701    708   
   702    709   		function span_renderers.deref(t,b,s)
   703    710   			local r = b.origin:ref(t.ref)
   704    711   			local name = t.ref
   705    712   			if name:find'%.' then name = name:match '^[^.]*%.(.+)$' end
   706    713   			if type(r) == 'string' then
   707    714   				addStyle 'abbr'

Modified sirsem.lua from [2b64c7033a] to [c72ad0a8fc].

    17     17   		end
    18     18   		return pkg
    19     19   	end
    20     20   	ss = namespace 'sirsem'
    21     21   	ss.namespace = namespace
    22     22   end
    23     23   
           24  +-- the C shim provides extra functionality that cannot
           25  +-- be implemented in pure Lua. this functionality is
           26  +-- accessed through the _G.native namespace. native
           27  +-- functions should not be called directly; rather,
           28  +-- they should be called from sirsem.lua wrappers that
           29  +-- can provide alternative implementations or error
           30  +-- messages when cortav is built in plain lua mode
    24     31   local native = _G.native
    25     32   
    26     33   function ss.map(fn, lst)
    27     34   	local new = {}
    28     35   	for k,v in pairs(lst) do
    29     36   		table.insert(new, fn(v,k))
    30     37   	end
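
Note on the `_G.native` comment: the wrapper pattern it describes (call the C shim when present, otherwise fall back or fail with a clear message) could look roughly like the sketch below. `wrap_native` and the function names passed to it are hypothetical; the diff does not show which functions the shim actually exports.

```lua
-- nil when running as plain Lua without the C shim
local native = rawget(_G, 'native')

-- Hypothetical wrapper builder (not sirsem API): returns a function
-- that prefers native[name], then the pure-Lua fallback, and raises
-- an error if neither is available.
local function wrap_native(name, fallback)
	return function(...)
		if native and native[name] then
			return native[name](...)
		elseif fallback then
			return fallback(...)
		end
		error(string.format('"%s" requires a native build', name), 2)
	end
end
```

Callers then only ever see the wrapper, so the same code path works in both compiled and plain Lua builds.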