cortav  Diff

Differences From Artifact [5152d1b5b1]:

To Artifact [607ec9b33c]:


            1  +%% this is the reference specification that i used to initially cobble together my
            2  +%% spec for the language i was going to implement, and which i then expanded
            3  +%% as i added features to the reference implementation. it's a mess and it
            4  +%% urgently needs to be rewritten into a more accessible and navigable
            5  +%% document for new users. TODO
            6  +
     1      7   # cortav specification
     2         -[*cortav] is a markup language designed to be a simpler, but more capable alternative to markdown. its name derives from the [>dict Ranuir words] [!cor] "writing" and [!tav] "document", translating to something like "(plain) text document".
     3         -
            8  +[*cortav] is a markup language designed to be a simpler, [!well-specified], and more capable alternative to markdown. its name derives from the [>dict Ranuir words] [!cor] "writing" and [!tav] "document", translating to something like "(plain) text document".
     4      9   	dict: http://ʞ.cc/fic/spirals/glossary
     5     10   
     6     11   the cortav [!format] can be called [!cortavgil], or [!gil cortavi], to differentiate it from the reference implementation [!cortavsir] or [!sir cortavi].
     7     12   
     8     13   %toc
     9     14   
    10     15   ## cortav vs. markdown
................................................................................
    70     75   * level 1: [*styling]. simple inline formatting sequences like strong, emphatic, literal, links, etc. math equation styling need not be supported. paragraphs, lists, and references are the only block elements supported. suitable for styling tweets and other very short content.
    71     76   * level 2: [*layout]. implements header, paragraph, newline, directive, and reference block elements. supports resources at least for remote or attached images. suitable for longer social media posts.
    72     77   * level 3: [*publishing]. implements all currently standardized core behavior, including zero or more extensions.
    73     78   * level 4: [*reference]. implements all currently standardized behavior, including [!all] standardized extensions.
    74     79   
    75     80   ! note that which translators are implemented is not specified by level, as this is, naturally, implementation-dependent. (it would make rather little sense for the blurb parser of a cortav-enabled blog engine to support generating PDFs, after all.) level encodes only which features of the cortav [!language] are supported.
    76     81   
    77         -##onblocks structure
           82  +##onblocks structure (block elements)
    78     83   cortav is based on an HTML-like block model, where a document consists of sections, which are made up of blocks, which may contain a sequence of spans. flows of text are automatically conjoined into spans, and blocks are separated by one or more newlines. this means that, unlike in markdown, a single logical paragraph [*cannot] span multiple ASCII lines. the primary purpose of this was to ensure ease of parsing, but also, both markdown and cortav are supposed to be readable from within a plain text editor. this is the 21st century. every reasonable text editor supports soft word wrap, and if yours doesn't, that's entirely your own damn fault. hard-wrapping lines is incredibly user-hostile, especially to users on mobile devices with small screens. cortav does not allow it.
    79     84   
    80     85   the first character(s) of every line (the "control sequence") indicates the role of that line. if no control sequence is recognized, the line is treated as a paragraph. the currently supported control sequences are listed below. some control sequences have alternate forms, in order to support modern, readable unicode characters as well as plain ascii text.
    81     86   
    82     87   * [*paragraphs] ([`.] [` ¶] [`❡]): a paragraph is a simple block of text. the period control sequence is only necessary if the paragraph text starts with text that would be interpreted as a control sequence otherwise
    83     88   * [*newlines] [` \\]: inserts a line break into previous paragraph and attaches the following text. mostly useful for poetry or lyrics
    84     89   * [*section starts] [`#] [`§]: starts a new section. all sections have an associated depth, determined by the number of sequence repetitions (e.g. "###" indicates depth three). sections may have headers and IDs; both are optional. IDs, if present, are a sequence of raw-text immediately following the hash marks. if the line has one or more space character followed by styled-text, a header will be attached. the character immediately following the hashes can specify a particular type of section. e.g.:
    85     90   ** [`#] is a simple section break.
    86     91   ** [`#anchor] opens a new section with the ID [`anchor].
    87     92   ** [`# header] opens a new section with the title "header".
    88     93   ** [`#anchor header] opens a new section with both the ID [`anchor] and the title "header".
    89         -* [*nonprinting sections] ([`^]): sometimes, you'll want to create a namespace without actually adding a visible new section to the document. you can achieve this by creating a [!nonprinting section] and defining resources within it. nonprinting sections can also be used to store comments, notes, to-dos, or other meta-information that is useful to have in the source file without it becoming a part of the output. nonprinting sections can be used for a sort of "literate markup," where resource and reference definitions can intermingle with human-readable narrative about those definitions.
    90         -* [*resource] ([`@]): defines a [!resource]. a resource is a file or object that is to be embedded in the document somehow. common examples of resources include images, videos, iframes, or headers/footers. resources can be defined inline, or reference external objects. see [>rsrc resources] for more information.
           94  +* [*nonprinting sections] ([`^]): sometimes, you'll want to create a namespace without actually adding a visible new section to the document. you can achieve this by creating a [!nonprinting section] and defining resources within it. nonprinting sections can also be used to store comments, notes, to-dos, or other meta-information that is useful to have in the source file without it becoming a part of the output. nonprinting sections can be used for a sort of "literate markup," where resource and reference definitions can intermingle with human-readable narrative about those definitions. note that unlike comments, nonprinting sections are still parsed and can still affect other sections by means of definitions and pragmata.
           95  +* [*resource] ([`@]): defines a [!resource]. a resource is a file or object that is to be embedded in the document somehow. common examples of resources include images, videos, iframes, or headers/footers. resources can be defined inline, or reference external objects that are read in either at compile-time or view-time. see [>rsrc resources] for more information.
    91     96   * [*lists] ([`*] [`:]): these are like paragraph nodes, but list nodes that occur next to each other will be arranged so as to show they compose a sequence. depth is determined by the number of stars/colons. like headers, a list entry may have an ID that can be used to refer back to it; it is indicated in the same way. if colons are used, this indicates that the order of the items is signifiant. [`:]-lists and [`*]-lists may be intermixed; however, note than only the last character in the sequence actually controls the type. a blank line terminates the current list.
    92     97   * [*directives] ([`%]): a directive issues a hint to the renderer in the form of an arbitrary string. directives are normally ignored if they are not supported, but you may cause a warning to be emitted where the directive is not supported with [`%!] or mark a directive critical with [`%!!] so that rendering will entirely fail if it cannot be obeyed.
    93     98   * [*comments] ([`%%]): a comment is a line of text that is simply ignored by the renderer.
    94     99   * [*asides] ([`!]): indicates text that diverges from the narrative, and can be skipped without interrupting it. think of it like block-level parentheses. asides which follow one another are merged as paragraphs of the same aside, usually represented as a sort of box. if the first line of an aside contains a colon, the stretch of styled-text from the beginning to the aside to the colon will be treated as a "type heading," e.g. "Warning:"
    95    100   * [*code] ([`~~~]): a line beginning with ~~~ begins or terminates a block of code. code blocks are by default not parsed, but parsing can be activated by preceding the code block with an [`%[*expand]] directive. the opening line should look like one of the below
    96    101   ** [`~~~]
    97    102   ** [`~~~ language] (markdown-style shorthand syntax)
................................................................................
   103    108   ** [`~~~ title \[language\] #id ~~~]
   104    109   *[*definition] ([^def-ex tab]): a line [^def-tab-enc beginning with a tab] is a multipurpose metadata syntax. the tab may be followed by an identifier, a colon, and a value string, in which case it opens a new definition; alternatively, a second tab character turns the line into a [*definition continuation], adding the remaining characters as a new line to the definition value on the previous line.  when a new definition is opened on a line immediately following certain kinds of objects, such as resources, embeds, or multiline macro expansions, it attaches key-value metadata to that object. when a definition is not preceded by such an object, an independent [*reference] is created instad.
   105    110   ** a [*reference] is a general mechanism for out-of-line metadata, and references are used in many different ways -- e.g. to specify link destinations, footnote contents, abbreviations, or macro bodies. to ensure that a definition is interpreted as a reference, rather than as metadata for an object, precede it with a blank line.
   106    111   	def-tab-enc: in encodings without tab characters, a definition is opened by a line beginning with two blanks, and continued by a line beginning with four blanks.
   107    112   	def-ex: [*open a new reference]: [`[!\\t][$key]: [$value]]
   108    113   		[*continue a reference]: [`[!\\t\\t][$value]]
   109    114   * [*quotation] ([`<]): a line of the form [`<[$name]> [$quote]] denotes an utterance by [$name].
   110         -* [*blockquote] ([`>]): alternate blockquote syntax. can be nested by repeating the [`>] character.
   111         -* [*subtitle/caption] (["--]): attaches a subtitle to the previous header, or caption to the previous object
          115  +* [*blockquote] ([`>[$id] [$body]]): "inline" blockquote syntax. can be nested by repeating the [`>] character. the [$id] is optional, but the [`>] character must be immediately followed by whitespace if the block is not to have an ID.
          116  +* [*subtitle/caption] (["--]): attaches a subtitle to the previous header, or caption to the previous object. after a blockquote, attaches an attribution line
   112    117   * [*embed] ([`&]): embeds a referenced object. can be used to show images or repeat previously defined objects like lists or tables, optionally with a caption. an embed line can be followed immediately by a sequence of [*definitions] in the same way that resource definitions can, to override resource properties on a per-instance basis. note that only presentation-related properties like [$desc] can be meaningful overridden, as embed does not trigger a re-render of the parse tree; if you want to override e.g. context variables, use a multiline macro invocation instead.
   113    118   ** [`&[$image]] embeds an image or other block-level object. [!image] can be a reference with a url or file path, or it can be an embed section (e.g. for SVG files)
   114    119   ***[`&myimg All that remained of the unfortunate blood magic pageant contestants and audience (police photo)]
   115    120   ** [`&-[$ident] [$styled-text]] embeds a closed disclosure element containing the text of the named object (a nonprinting section or cortav resource should usually be used to store the content; it can also name an image or video, of course). in interactive outputs, this will display as a block which can be clicked on to view the full contents of the referenced object [$ident]; if [$styled-text] is present, it overrides the title of the section you are embedding (if any). in static outputs, the disclosure object will display as an enclosed box with [$styled-text] as the title text
   116    121   *** [`&-ex-a Prosecution Exhibit A (GRAPHIC CONTENT)]
   117    122   ** [`&+[$section] [$styled-text]] is like the above, but the disclosure element is open by default
   118    123   * [`$[$macro] [$arg1]|[$arg2]|[$argn]…] invokes a block-level macro with the supplied arguments, and can be followed by a property override definition list the same way embed and resource lines can. note that while both [`$[$id]] and [`&[$id]] can be used to instantiate resources of type [`text/x.cortav], there is a critical difference: [`$[$id]] renders out the sub-document separately each time it is named, allowing for parameter expansion and for context variables to be overridden for each invocation. by contrast, [`&[$id]] can only insert copies of the same render; no parameters can be passed and context variables will be expanded to their value at the time the resource was defined. only [`&[$id]] can instantiate resources of types other than [`text/x.cortav]. there is also a semantic distinction: resources interpreted as macros are inserted "in-band", on an equal basis with nearby elements; resources interpreted as embeds are set off to clearly indicate that they are a sub-document, and on interactive outputs may have their own independently-scrolling viewport.
................................................................................
   122    127   * [*table cells] ([`+ |]): see [>ex.tab table examples].
   123    128   * [*equations] ([`=]): block-level equations can be inserted with the [`=] sequence
   124    129   * [*cross-references] ([`=>] [`⇒]): inserts a block-level link. has two forms for the sake of gemtext compatibility. [$styled-text] is a descriptive text of the destination. especially useful for menus and gemtext output.
   125    130   ** the cortav syntax is [`=>[$ident] [$styled-text]], where [$ident] is an identifier; links to the same destination as [`\[>[$ident] [$styled-text]\]] would
   126    131   ** the compatibility syntax is [`=> [$uri] [$styled-text]] (note the space before [$uri]!). instead of taking an identifier for an object in the document, it directly accepts a URI. note that this is not formally equivalent to gemtext's link syntax, which also allows paths in place of URIs; [`cortav] does not. the gemtext line ["=> /somewhere] would need to be expressed as ["=> file:/somewhere], and ["=> /somewhere?key=val] as ["http:/somewhere?key=val] (or ["gemini:/somewhere?key=val], if the result is to be served over a gemini server).
   127    132   * [*empty lines] (that is, lines consisting of nothing but whitespace) constitute a [!break], which terminates multiline objects that do not have a dedicated termination sequence, for example lists and asides.
   128    133   
   129         -##onspans styled text
          134  +##onspans styled text (span elements)
   130    135   most blocks contain a sequence of spans. these spans are produced by interpreting a stream of [*styled-text] following the control sequence. styled-text is a sequence of codepoints potentially interspersed with escapes. an escape is formed by an open square bracket [`\[] followed by a [*span control sequence], and arguments for that sequence like more styled-text. escapes can be nested.
   131    136   
   132    137   * strong {obj *|styled-text}: causes its text to stand out from the narrative, generally rendered as bold or a brighter color.
   133    138   * emphatic {obj !|styled-text}: indicates that its text should be spoken with emphasis, generally rendered as italics
   134    139   * custom style {span .|id|[$styled-text]}: applies a specially defined font style. for example, if you have defined [`caution] to mean "demibold italic underline", cortav will try to apply the proper weight and styling within the constraints of the current font to the span [$styled-text]. see the [>fonts-sty fonts section] for more information about this mechanism.
   135    140   * literal {obj `|styled-text}: indicates that its text is a reference to a literal sequence of characters or other discrete token. generally rendered in monospace
   136    141   * variable {obj $|styled-text}: indicates to the reader that its text is a placeholder, rather than a literal representation. generally rendered in italic monospace, ideally of a different color
................................................................................
   145    150   * raw {obj \\ |[$raw-text]}: causes all characters within to be interpreted literally, without expansion. the only special characters are square brackets, which must have a matching closing bracket, and backslashes.
   146    151   * raw literal [` \["[$raw-text]\]]: shorthand for a raw inside a literal, that is ["[`[\\…]]]
   147    152   * macro [` \{[$name] [$arguments]}]: invokes a [>ex.mac macro] inline, specified with a reference. if the result of macro expansion contains newlines, they will be treated as line breaks, rather than paragraph breaks as they would be in a multiline context.
   148    153   * argument {obj #|var}: in macros only, inserts the [$var]-th argument. otherwise, inserts a context variable provided by the renderer.
   149    154   * raw argument {obj ##|var}: like above, but does not evaluate [$var].
   150    155   * term {obj &|name}, {span &|name|[$expansion]}: quotes a defined term with a link to its definition, optionally with a custom expansion of the term (for instance, to expand the first use of an acronym)
   151    156   * inline image {obj &@|name}: shows a small image or other object inline. the unicode character [`🖼] can also be used instead of [`&@].
   152         -* unicode codepoint {obj U+|hex-integer}: inserts an arbitrary UCS codepoint in the output, specified by [$hex-integer]. lowercase [`u] is also legal.
          157  +* unicode codepoint {obj U|hex-integer}: inserts an arbitrary UCS codepoint in the output, specified by [$hex-integer]. lowercase [`u] is also legal, as are [`U+] and [`u+].
   153    158   * math mode {obj =|equation}: activates additional transformations on the span to format it as a mathematical equation; e.g. [`*] becomes [`×] and [`/] --> [`÷].
   154    159   * extension {span %|ext|…}: invokes extension named in [$ext]. [$ext] will usually be an extension name followed by a symbol (often a period) and then an extension-specific directive, although for some simple extensions it may just be the plain extension name. further syntax and semantics depend on the extension. this syntax can also be used to apply formatting specific to certain renderers, such as assigning a CSS class in the [`html] renderer (["[%html.myclass my [!styled] text]]).
   155         -* critical extension {span %!|ext|…}: like [!extension], but will trigger an error if the requested extension is not available
          160  +* important extension {span %!|ext|…}: like [!extension], but will issue a warning if the requested extension is not available
          161  +* critical extension {span %!!|ext|…}: like [!important extension], but will trigger an error and abort compilation if the requested extension is not available
   156    162   * extension text {span %:|ext|[$styled-text]}: like [!extension], but when the requested extension is not present, [$styled-text] wlil be emitted as-is. this is a better way to apply CSS classes, as the text will still be visible when rendered to formats other than HTML.
   157    163   * inline comment {obj %%|...}: ignored. useful for editorial annotations not intended to be part of the rendered product.
   158    164   
   159    165   	span: [` \[[*[#1]][$[#2]] [#3]\]]
   160    166   	obj: [` \[[*[#1]][$[#2]]\]]
   161    167   
   162    168   ##tabs tables
................................................................................
   581    587   ***: [*heading]: the section can occur on the same page as text and  headings from other sections
   582    588   ** {d pragma accent} specifies an accent hue (in degrees around the color wheel) for renderers which support colorized output
   583    589   ** {d pragma accent-spread} is a factor that controls the "spread" of hues used in the document. if 0, only the accent color will be used; if larger, other hues will be used in addition to the primary accent color.
   584    590   ** {d pragma dark-on-light on\|off} controls whether the color scheme used should be light-on-dark or dark-on-light
   585    591   ** {d pragma page-width} indicates how wide the pages should be
   586    592   ** {d pragma title-page} specifies a section to use as a title page, for renderer backends that support pagination
   587    593   
   588         -! note on pragmata: particularly when working with collections of documents, you should not keep formatting metadata in the documents themselves! the best thing to do is to have a makefile for compiling the documents using whatever tools you want to support, and encoding the rendering options in this file (for the reference implementation this currently means as command line arguments, but eventually it will support intent files as well) so they can all be changed in one place; pragmas should instead be used for per-document [*overrides] of default settings.
          594  +! note on pragmata: particularly when working with collections of documents, you should not keep shared formatting metadata duplicated across the documents themselves! the best thing to do is to have a makefile for compiling the documents using whatever tools you want to support, and encoding the rendering options in this file (for the reference implementation this currently means as command line arguments, but eventually it will support intent files as well) so they can all be changed in one place; pragmas should instead be used for per-document [*overrides] of default settings.
   589    595   ! a workaround for the lack of intent files in the reference implementation is to have a single pseudo-stylesheet that contains only {d pragma} statements, and then import this file from each individual source file using the {d include} directive. this is suboptimal and recommended only when you need to ensure compatibility between different implementations.
   590    596   ! when creating HTML files, an even better alternative may be to turn off style generation entirely and link in an external, hand-written CSS stylesheet. this is generally the way you should compile sources for existing websites if you aren't going to write your own extension.
   591    597   
   592    598   ##ex examples
   593    599   
   594    600   ~~~ blockquotes #bq [cortav] ~~~
   595    601   the following excerpts of text were recovered from a partially erased hard drive found in the Hawthorne manor in the weeks after the Incident. context is unknown.
   596    602   
   597         -#>
   598         -—spoke to the man under the bridge again, the one who likes to bite the heads off the fish, and he suggested i take a brief sabbatical and journey to the Wandering Oak (where all paths meet) in search of inspiration and the forsaken sword of Pirate Queen Granuaile. a capital idea! i shall depart upon the morrow, having honored the Lord Odin and poisoned my accursed minstrels as is tradition—
   599         -—can't smell my soul anymore, but that's beside the point entirely—
   600         -—that second moon (always have wondered why nobody else seems to notice the damn fool thing except on Michaelmas day). alas, my luck did not endure, and i was soon to find myself knee-deep in—
   601         -—just have to see about that, won't we!—
   602         -#
          603  +> —spoke to the man under the bridge again, the one who likes to bite the heads off the fish, and he suggested i take a brief sabbatical and journey to the Wandering Oak (where all paths meet) in search of inspiration and the forsaken sword of Pirate Queen Granuaile. a capital idea! i shall depart upon the morrow, having honored the Lord Odin and poisoned my accursed minstrels as is tradition—
          604  +> —can't smell my soul anymore, but that's beside the point entirely—
          605  +> —that second moon (always have wondered why nobody else seems to notice the damn fool thing except on Michaelmas day). alas, my luck did not endure, and i was soon to find myself knee-deep in—
          606  +> —just have to see about that, won't we!—
   603    607   
   604    608   the nearest surviving relative of Lord Hawthorne is believed to be a wandering beggar with a small pet meerkat who sells cursed wooden trinkets to unwary children. she will not be contacted, as the officers of the Yard fear her.
   605    609   ~~~
   606    610   
   607    611   ~~~links & notes #lnr [cortav] ~~~
   608    612   this sentence contains a [>zombo link] to zombo com. you can do anything[^any] at zombo com.
   609    613   	zombo: https://zombo.com
................................................................................
   656    660   	.civil: (unknown)
   657    661   	.roe: Monitor; do not engage
   658    662   	.danger: (unknown)
   659    663   
   660    664   $agent ZUCCHINI PARABLE
   661    665   	.civil: Zephram "Rolodex" Goldberg
   662    666   	.danger: Category Scarlet
   663         -$agent RHADAMANTH EXQUISITE
          667  +$agent RHADAMANTH EXCISE
   664    668   	.roe: Eliminate with extreme prejudice; CBRN deployment authorized
   665    669   	.danger: [*Unquantifiable]
   666    670   ~~~
   667    671   
   668    672   ~~~ tables #tab [cortav] ~~~
   669    673   here is a glossary table.
   670    674   
................................................................................
   762    766   the interpreter should provide a [`cortav] table with the objects:
   763    767   * [`ctx]: contains context variables
   764    768   
   765    769   used files should return a table with the following members
   766    770   * [`macros]: an array of functions that return strings or arrays of strings when invoked. these will be injected into the global macro namespace.
   767    771   
   768    772   ###ts ts
   769         -the [*ts] extension allows documents to be marked up for basic classification constraints and automatically redacted. if you are seriously relying on [`ts] for confidentiality, make damn sure you start the file with [$%[*requires] ts], so that rendering will fail with an error if the extension isn't supported.
          773  +the [*ts] extension allows documents to be marked up for basic classification constraints and automatically redacted. if you are seriously relying on [`ts] for confidentiality, make damn sure you start the file with [$%!![*needs] ts], so that rendering will fail with an error if the extension isn't supported.
   770    774   
   771    775   [`ts] currently has no support for misinformation.
   772    776   
   773    777   [`ts] enables the directives:
   774    778   * [`%[*ts] class [$scope level] ([$styled-text])]: indicates a classification level for either the whole document (scope [$doc]) or the next section (scope [$sec]). if the ts level is below [$level], the section will be redacted or rendering will fail with an error, as appropriate. if styled-text is included, this will be treated as the name of the classification level.
   775    779   * [`%[*ts] word [$scope word] ([$styled-text])]: indicates a codeword clearance that must be present for the text to render. if styled-text is present, this will be used to render the name of the codeword instead of [$word].
   776    780   * [`%[*when] ts level [$level]]