cortav: Check-in [87fed4ec34]

Overview

Comment:	add rudimentary syntax hiliting for kate/kwrite/kdepart
SHA3-256:	87fed4ec344b47289992a1eb59eada5ae23749cda50d0dff5eb4d731e22e0bae
User & Date:	lexi on 2021-12-19 18:12:38

Context

2021-12-20
00:09		split cortav into modules, enable use as library, create extension mechanism stub, fix up docs check-in: 9c67b0312c user: lexi tags: trunk
2021-12-19
18:12		add rudimentary syntax hiliting for kate/kwrite/kdepart check-in: 87fed4ec34 user: lexi tags: trunk
05:25		further develop html renderer and document it, many doc fixes, fix misc bugs check-in: 2e37b523b5 user: lexi tags: trunk

Changes


Modified cortav.ct from [6a93030d29] to [fcb217abd6].


## structure
cortav is based on an HTML-like block model, where a document consists of sections, which are made up of blocks, which may contain a sequence of spans. flows of text are automatically conjoined into spans, and blocks are separated by one or more newlines. this means that, unlike in markdown, a single logical paragraph [*cannot] span multiple ASCII lines. the primary purpose of this was to ensure ease of parsing, but also, both markdown and cortav are supposed to be readable from within a plain text editor. this is the 21st century. every reasonable text editor supports soft word wrap, and if yours doesn't, that's entirely your own damn fault.

the first character(s) of every line (the "control sequence") indicates the role of that line. if no control sequence is recognized, the sequence [$.] is implied instead. the standard line classes and their associated control sequences are listed below. some control sequences have alternate forms, in order to support modern, readable unicode characters as well as plain ascii text.

* paragraphs (. ¶ ❡): a paragraph is a simple block of text. the period control sequence is only necessary if the paragraph text begins with something that would otherwise be interpreted as a control sequence.
* newlines (\): inserts a line break into previous paragraph and attaches the following text. mostly useful for poetry or lyrics.
* section starts (# §): starts a new section. all sections have an associated depth, determined by the number of sequence repetitions (e.g. "###" indicates depth three). sections may have headers and IDs; both are optional. IDs, if present, are a sequence of raw-text immediately following the hash marks. if the line has one or more space characters followed by styled-text, a header will be attached. the character immediately following the hashes can specify a particular type of section. e.g.:
** [$#] is a simple section break.
** [$#anchor] opens a new section with the ID [$anchor].
** [$# header] opens a new section with the title "header".
** [$#anchor header] opens a new section with both the ID [$anchor] and the title "header".
** [$#>conversation] opens a blockquote section named [$conversation] without a header.
** [$#^id] opens a footnote section for the multiline footnote [$id]. the ID must be specified.
................................................................................

## styled text
most blocks contain a sequence of spans. these spans are produced by interpreting a stream of [*styled-text] following the control sequence. styled-text is a sequence of codepoints potentially interspersed with escapes. an escape is formed by an open square bracket [$\[] followed by a [*span control sequence], and arguments for that sequence like more styled-text. escapes can be nested.

* strong \[*[!styled-text]\]: causes its text to stand out from the narrative, generally rendered as bold or a brighter color.
* emphatic \[![!styled-text]\]: indicates that its text should be spoken with emphasis, generally rendered as italics
* literal \[$[!styled-text]\]: indicates that its text is a reference to a literal sequence of characters, variable name, or other discrete token. generally rendered in monospace



* link \[>[!ref] [!styled-text]\]: produces a hyperlink or cross-reference denoted by [$ref], which may be either a URL specified with a reference or the name of an object like an image or section elsewhere in the document. the unicode characters [$→] and [$🔗] can also be used instead of [$>] to denote a link.
* footnote \[^[!ref] [!styled-text]\]: annotates the text with a defined footnote
* raw \[\\[!raw-text]\]: causes all characters within to be interpreted literally, without expansion. the only special characters are square brackets, which must have a matching closing bracket
* raw literal \[$\\[!raw-text]\]: shorthand for [\[$[\…]]]
* macro \{[!name] [!arguments]}: invokes a [>ex.mac macro], specified with a reference
* argument \[#[!var]\]: in macros only, inserts the [$var]-th argument. otherwise, inserts a context variable provided by the renderer.
* raw argument \[##[!var]\]: like above, but does not evaluate [$var].
* term \[&[!name] ([!label])\]: quotes a defined term with a link to its definition
* inline image \[&@[!name]\]: shows a small image or other object inline. the unicode character [$🖼] can also be used instead of [$&@].

## identifiers
any identifier (including a reference) that is defined within a named section must be referred to from outside that section as [$[!sec].[!obj]], where [$sec] is the ID of the containing section and [$obj] is the ID of the object one wishes to reference.
................................................................................
ts enables the spans:
* [$\[🔒#[!level] [!styled-text]\]]: redacts the span if the security level is below that specified.
* [$\[🔒.[!word] [!styled-text]\]]: redacts the span if the specified codeword clearance is not enabled.
(the padlock emoji is shorthand for [$%ts].)

ts redacts spans securely; that is, they are simply replaced with an indicator that they have been redacted, without visually leaking the length of the redacted text.

~~~ts-example example ~~~ cortav
%ts word doc sorrowful-pines SORROWFUL PINES

# intercept R1440 TCT S3
this communication between the ambassador of [*POLITY DOORMAT CRIMSON] "Socialist League world Glory" and an unknown noble of [*POLITY ROSE] "the Empire of a Thousand Suns" was intercepted by [*SYSTEM SUPINE WARBLE].

## involved individuals
* (A) [*DOORMAT CRIMSON] Ambassador [🔒.morose-frenzy Hyacinth Autumn-Lotus] (confidence 1.0)
................................................................................
| [$--version]             :|:[$-V]:| display the interpreter version             |

###refimpl-mode modes
most of [$cortav.lua]'s implementation-specific behavior is controlled by use of [!modes]. these are namespaced options which may have a boolean, string, or numeric value. boolean modes are set with the [$-y] [$-n] flags; other modes use the [$-m] flags.

most modes are defined by the renderer backend. the following modes affect the behavior of the frontend:

+ ID              + type   + effect
|   [$render:format]:| string | selects the [>refimpl-rend renderer] (default [$html])
| [$parse:show-tree]:| flag   | dumps the parse tree to the log after parsing completes

##refimpl-rend renderers
[$cortav.lua] implements a frontend-backend architecture, separating the parsing stage from the rendering stage. this means new renderers can be added to [$cortav.lua] relatively easily. currently, only an [>refimpl-rend-html HTML renderer] is included; however, a [$groff] backend is planned at some point in the future, so that PDFs and manpages can be generated from cortav files.

###refimpl-rend-html html
................................................................................
	-m render:format html \
	-m html:width 40em \
	-m html:accent 80 \
	-m html:hue-spread 35 \
	-y html:dark-on-light # could also be written as:
$ cortav readme.ct -ommmmy readme.html render:format html html:width 40em html:accent 80 html:hue-spread 35 html:dark-on-light
~~~


## structure
cortav is based on an HTML-like block model, where a document consists of sections, which are made up of blocks, which may contain a sequence of spans. flows of text are automatically conjoined into spans, and blocks are separated by one or more newlines. this means that, unlike in markdown, a single logical paragraph [*cannot] span multiple ASCII lines. the primary purpose of this was to ensure ease of parsing, but also, both markdown and cortav are supposed to be readable from within a plain text editor. this is the 21st century. every reasonable text editor supports soft word wrap, and if yours doesn't, that's entirely your own damn fault.

the first character(s) of every line (the "control sequence") indicates the role of that line. if no control sequence is recognized, the sequence [$.] is implied instead. the standard line classes and their associated control sequences are listed below. some control sequences have alternate forms, in order to support modern, readable unicode characters as well as plain ascii text.

* paragraphs (. ¶ ❡): a paragraph is a simple block of text. the period control sequence is only necessary if the paragraph text begins with something that would otherwise be interpreted as a control sequence.
* newlines (\\): inserts a line break into previous paragraph and attaches the following text. mostly useful for poetry or lyrics.
* section starts (# §): starts a new section. all sections have an associated depth, determined by the number of sequence repetitions (e.g. "###" indicates depth three). sections may have headers and IDs; both are optional. IDs, if present, are a sequence of raw-text immediately following the hash marks. if the line has one or more space characters followed by styled-text, a header will be attached. the character immediately following the hashes can specify a particular type of section. e.g.:
** [$#] is a simple section break.
** [$#anchor] opens a new section with the ID [$anchor].
** [$# header] opens a new section with the title "header".
** [$#anchor header] opens a new section with both the ID [$anchor] and the title "header".
** [$#>conversation] opens a blockquote section named [$conversation] without a header.
** [$#^id] opens a footnote section for the multiline footnote [$id]. the ID must be specified.
................................................................................

## styled text
most blocks contain a sequence of spans. these spans are produced by interpreting a stream of [*styled-text] following the control sequence. styled-text is a sequence of codepoints potentially interspersed with escapes. an escape is formed by an open square bracket [$\[] followed by a [*span control sequence], and arguments for that sequence like more styled-text. escapes can be nested.

* strong \[*[!styled-text]\]: causes its text to stand out from the narrative, generally rendered as bold or a brighter color.
* emphatic \[![!styled-text]\]: indicates that its text should be spoken with emphasis, generally rendered as italics
* literal \[$[!styled-text]\]: indicates that its text is a reference to a literal sequence of characters, variable name, or other discrete token. generally rendered in monospace
* strikeout \[~[!styled-text]\]: indicates that its text should be struck through or otherwise indicated for deletion
* insertion \[+[!styled-text]\]: indicates that its text should be indicated as a new addition to the text body.
** consider using a macro definition [$\edit: [~[#1]][+[#2]]] to save typing if you are doing editing work
* link \[>[!ref] [!styled-text]\]: produces a hyperlink or cross-reference denoted by [$ref], which may be either a URL specified with a reference or the name of an object like an image or section elsewhere in the document. the unicode characters [$→] and [$🔗] can also be used instead of [$>] to denote a link.
* footnote \[^[!ref] [!styled-text]\]: annotates the text with a defined footnote
* raw \[\\[!raw-text]\]: causes all characters within to be interpreted literally, without expansion. the only special characters are square brackets, which must have a matching closing bracket
* raw literal \[$\\[!raw-text]\]: shorthand for [\[$[\…]]]
* macro \{[!name] [!arguments]\}: invokes a [>ex.mac macro], specified with a reference
* argument \[#[!var]\]: in macros only, inserts the [$var]-th argument. otherwise, inserts a context variable provided by the renderer.
* raw argument \[##[!var]\]: like above, but does not evaluate [$var].
* term \[&[!name] ([!label])\]: quotes a defined term with a link to its definition
* inline image \[&@[!name]\]: shows a small image or other object inline. the unicode character [$🖼] can also be used instead of [$&@].

## identifiers
any identifier (including a reference) that is defined within a named section must be referred to from outside that section as [$[!sec].[!obj]], where [$sec] is the ID of the containing section and [$obj] is the ID of the object one wishes to reference.
................................................................................
ts enables the spans:
* [$\[🔒#[!level] [!styled-text]\]]: redacts the span if the security level is below that specified.
* [$\[🔒.[!word] [!styled-text]\]]: redacts the span if the specified codeword clearance is not enabled.
(the padlock emoji is shorthand for [$%ts].)

ts redacts spans securely; that is, they are simply replaced with an indicator that they have been redacted, without visually leaking the length of the redacted text.

~~~#ts-example example [cortav] ~~~
%ts word doc sorrowful-pines SORROWFUL PINES

# intercept R1440 TCT S3
this communication between the ambassador of [*POLITY DOORMAT CRIMSON] "Socialist League world Glory" and an unknown noble of [*POLITY ROSE] "the Empire of a Thousand Suns" was intercepted by [*SYSTEM SUPINE WARBLE].

## involved individuals
* (A) [*DOORMAT CRIMSON] Ambassador [🔒.morose-frenzy Hyacinth Autumn-Lotus] (confidence 1.0)
................................................................................
| [$--version]             :|:[$-V]:| display the interpreter version             |

###refimpl-mode modes
most of [$cortav.lua]'s implementation-specific behavior is controlled by use of [!modes]. these are namespaced options which may have a boolean, string, or numeric value. boolean modes are set with the [$-y] [$-n] flags; other modes use the [$-m] flags.

most modes are defined by the renderer backend. the following modes affect the behavior of the frontend:

+ ID                 + type   + effect
|   [$render:format]:| string | selects the [>refimpl-rend renderer] (default [$html])
| [$parse:show-tree]:| flag   | dumps the parse tree to the log after parsing completes

##refimpl-rend renderers
[$cortav.lua] implements a frontend-backend architecture, separating the parsing stage from the rendering stage. this means new renderers can be added to [$cortav.lua] relatively easily. currently, only an [>refimpl-rend-html HTML renderer] is included; however, a [$groff] backend is planned at some point in the future, so that PDFs and manpages can be generated from cortav files.

###refimpl-rend-html html
................................................................................
	-m render:format html \
	-m html:width 40em \
	-m html:accent 80 \
	-m html:hue-spread 35 \
	-y html:dark-on-light # could also be written as:
$ cortav readme.ct -ommmmy readme.html render:format html html:width 40em html:accent 80 html:hue-spread 35 html:dark-on-light
~~~
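
The section-start rules described in the structure section of cortav.ct (depth from the number of cue repetitions, an optional type character, an ID touching the cue, a header after one or more spaces) can be sketched in Lua roughly as follows. This is an illustration only: `parse_section` is a hypothetical helper, not a function from cortav.lua, and it assumes that only the ASCII `#` cue needs handling (the `§` alternate is ignored) and does not enforce that footnote sections carry an ID.

```lua
-- hypothetical helper, NOT part of cortav.lua: splits a section-start line
-- into depth, kind, ID and title, following the rules given in cortav.ct.
-- simplification: only the ASCII '#' cue is recognized, not '§'.
local function parse_section(line)
	local hashes, rest = line:match('^(#+)(.*)$')
	if not hashes then return nil end -- not a section start

	local sec = { depth = #hashes, kind = 'normal' }

	-- an optional type character may directly follow the hashes
	local kind_char = rest:sub(1, 1)
	if kind_char == '>' then
		sec.kind = 'blockquote'
		rest = rest:sub(2)
	elseif kind_char == '^' then
		sec.kind = 'footnote'
		rest = rest:sub(2)
	end

	-- the ID is the run of non-space characters touching the cue;
	-- whatever follows one or more spaces is the header text
	local id, title = rest:match('^(%S*)%s*(.-)%s*$')
	if id ~= '' then sec.id = id end
	if title ~= '' then sec.title = title end
	return sec
end

-- usage: print the fields recovered from a few of the documented forms
for _, line in ipairs { '#', '#anchor header', '#>conversation', '###refimpl-mode modes' } do
	local s = parse_section(line)
	print(line, s.depth, s.kind, s.id, s.title)
end
```

Fields that are absent on a given line (for example the title of `#anchor`) are simply left as `nil` in the returned table.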

Modified cortav.lua from [1d4d9e0a4b] to [a950584594].

	local styles = {}
	if opts.width then
		table.insert(styles, string.format([[body {padding:0 1em;margin:auto;max-width:%s}]], opts.width))
	end
	if opts.accent then
		table.insert(styles, string.format(':root {--accent:%s}', opts.accent))
	end
	if opts.accent or (not opts['dark-on-light']) then
		stylesNeeded.accent = true
	end


	for k in pairs(stylesNeeded) do
		if not stylesets[k] then ct.exns.unimpl('styleset %s not implemented (!)',  k):throw() end
		table.insert(styles, prepcss(stylesets[k]))

	local styles = {}
	if opts.width then
		table.insert(styles, string.format([[body {padding:0 1em;margin:auto;max-width:%s}]], opts.width))
	end
	if opts.accent then
		table.insert(styles, string.format(':root {--accent:%s}', opts.accent))
	end
	if opts.accent or (not opts['dark-on-light']) and (not opts['fossil-uv']) then
		stylesNeeded.accent = true
	end


	for k in pairs(stylesNeeded) do
		if not stylesets[k] then ct.exns.unimpl('styleset %s not implemented (!)',  k):throw() end
		table.insert(styles, prepcss(stylesets[k]))
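
A note on how the changed condition is read: in Lua, `and` binds more tightly than `or`, so `a or b and c` is parsed as `a or (b and c)`. The sketch below isolates that condition to show which option combinations would request the accent styleset. The option names come from the diff; the function names `needs_accent` and `needs_accent_explicit` are hypothetical and exist only for this illustration.

```lua
-- hypothetical wrapper around the condition from the new revision
local function needs_accent(opts)
	if opts.accent or (not opts['dark-on-light']) and (not opts['fossil-uv']) then
		return true
	end
	return false
end

-- the same condition with the implicit grouping written out
local function needs_accent_explicit(opts)
	if opts.accent or ((not opts['dark-on-light']) and (not opts['fossil-uv'])) then
		return true
	end
	return false
end

-- check that both spellings agree on every combination of the three options
for _, accent in ipairs { false, '80' } do
	for _, dol in ipairs { false, true } do
		for _, uv in ipairs { false, true } do
			local opts = {
				accent = accent or nil,
				['dark-on-light'] = dol,
				['fossil-uv'] = uv,
			}
			assert(needs_accent(opts) == needs_accent_explicit(opts))
		end
	end
end
```

Under this reading, an explicit `accent` value always requests the styleset, while `fossil-uv` only suppresses it when no accent was given.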

Added cortav.xml version [8189edad17].

<?xml version='1.0' encoding='UTF-8'?>
<!--
 [ʞ] cortav.xml
  ~ lexi hale <lexi@hale.su>
  © AGPLv3
  ? Kate/kwrite-compatible syntax definition for the cortav markup format
  > ln cortav.xml $HOME/.local/share/org.kde.syntax-highlighting/syntax/
-->
<language name='Cortav' version='1' kateversion='2.4' section='Markup' extensions='*.ct'>
	<highlighting>
		<list name='extension-directives'>
			<item>uses</item>
			<item>needs</item>
			<item>inhibits</item>
		</list>
		<list name='renderer-directives'>
			<item>html</item>
			<item>groff</item>
			<item>ps</item>
			<item>tex</item>
			<item>plaintext</item>
			<item>rtf</item>
			<item>svg</item>
		</list>
		<contexts>
			<context name='init' attribute='Normal Text' lineEndContext='#pop' fallthroughContext='text'>
				<RegExpr String='\\.' attribute='Escaped Char'/>
				<RegExpr attribute='Section Cue' context='sec-ident' String='(#|§)+' firstNonSpace='true' />
				<StringDetect String='~~~' attribute='Literal Block Cue' firstNonSpace='true' context='literal-block-cue'/>
				<RegExpr attribute='List' String='[\*:]+' firstNonSpace='true' context='text' />
				<Detect2Chars char='%' char1='!' attribute='Critical Directive Cue' context='directive'/>
				<DetectChar char='%' attribute='Directive Cue' context='directive'/>
				<DetectChar char='&#9;' attribute='Normal Text' context='refdef-id'/>
			</context>

			<context name='sec-ident' attribute='Identifier' lineEndContext='#pop'>
				<DetectSpaces context='#pop!sec' attribute='Normal Text'/>
			</context>

			<context name='sec' attribute='Header' lineEndContext='#pop'>
				<IncludeRules context='text'/>
			</context>

			<context name='refdef-id' attribute='Identifier' lineEndContext='#pop'>
				<DetectChar char=':' attribute='Normal Text' context='#pop!refdef'/>
			</context>
			<context name='refdef' attribute='Styled Text' lineEndContext='#pop'>
			</context>

			<context name='directive' attribute='Directive' lineEndContext='#pop'>
				<keyword attribute='Extension Directive' String='extension-directives'/>
				<keyword attribute='Renderer Directive' String='renderer-directives'/>
			</context>

			<context name='text' attribute='Normal Text' lineEndContext='#pop'>
				<RegExpr String='\\.' attribute='Escaped Char'/>
				<DetectChar attribute='Span Delimiter' context='span-cue' char='['/>
				<DetectChar attribute='Macro Delimiter' context='macro' char='{'/>
			</context>

			<context name='span' attribute='Styled Text' lineEndContext='#pop'>
				<IncludeRules context='text'/>
				<DetectChar attribute='Span Delimiter' context='#pop' char=']'/>
			</context>

			<context name='macro' attribute='Macro' lineEndContext='#pop'>
				<DetectSpaces context='#pop!macro-body'/>
				<DetectChar attribute='Macro Delimiter' char='}' context='#pop'/>
			</context>

			<context name='macro-body' attribute='Styled Text' lineEndContext='#pop'>
				<RegExpr String='\\.' attribute='Escaped Char'/>
				<DetectChar attribute='Field Delimiter' char='|'/>
				<DetectChar attribute='Macro Delimiter' char='}' context='#pop'/>
				<IncludeRules context='span'/>
			</context>

			<context name='span-emph' attribute='Emphatic Text' lineEndContext='#pop'>
				<IncludeRules context='span'/>
			</context>

			<context name='span-strong' attribute='Strong Text' lineEndContext='#pop'>
				<IncludeRules context='span'/>
			</context>

			<context name='span-del' attribute='Deleted Text' lineEndContext='#pop'>
				<IncludeRules context='span'/>
			</context>

			<context name='span-cue' attribute='Span Cue' lineEndContext='#pop'>
				<StringDetect attribute='Span Cue' String='$\' context='#pop!flat-span' />

				<DetectChar   attribute='Span Cue' char='!' context='#pop!span-emph' />
				<DetectChar   attribute='Span Cue' char='*' context='#pop!span-strong' />
				<DetectChar   attribute='Span Cue' char='~' context='#pop!span-del' />

				<AnyChar      attribute='Span Cue' String='$+🔒' context='#pop!span' />
				<StringDetect attribute='Span Cue' String='→' context='#pop!ref' />
				<StringDetect attribute='Span Cue' String='🔗' context='#pop!ref' />
				<DetectChar   attribute='Span Cue' char='>' context='#pop!ref' />
				<DetectChar   attribute='Span Cue' char='&amp;' context='#pop!ref' />
				<DetectChar   attribute='Span Cue' char='#' context='#pop!var-ref' />
				<DetectChar   attribute='Span Cue' char='\' context='#pop!flat-span' />
			</context>

			<context name='flat-span' attribute='Unstyled Text' lineEndContext='#pop'>
				<Detect2Chars attribute='Escaped Char' context='#stay' char='\' char1=']'/>
				<DetectChar attribute='Span Delimiter' context='#pop' char=']'/>
			</context>

			<context name='ref' attribute='Reference' lineEndContext='#pop'>
				<DetectSpaces context='#pop!span'/>
			</context>

			<context name='var-ref' attribute='Reference' lineEndContext='#pop'>
				<WordDetect String="cortav" attribute='Standard Namespace'/>
				<WordDetect String="env" attribute='Standard Namespace'/>
				<DetectChar attribute='Span Delimiter' context='#pop' char=']'/>
			</context>

			<context name='literal-block-cue' attribute='Literal Block Cue' lineEndContext='#pop!literal-block'>
				<RegExpr String='\[[^\]]+\]' attribute='External Reference'/>
				<RegExpr String='#[^\s]+' attribute='Identifier'/>
				<RegExpr String='~~~$' attribute='Literal Block Cue'/>
				<RegExpr String='[^\s]+' attribute='Header'/>
			</context>
			<context name='literal-block' attribute='Literal Block' lineEndContext='#stay'>
				<RegExpr String='~~~$' attribute='Literal Block Cue' firstNonSpace='true' context='#pop'/>
			</context>
		</contexts>
		<itemDatas>
			<itemData name='Normal Text' defStyleNum='dsNormal'/>
			<itemData name='Styled Text' defStyleNum='dsNormal'/>
			<itemData name='Emphatic Text' defStyleNum='dsNormal' italic='true'/>
			<itemData name='Strong Text' defStyleNum='dsNormal' bold='true'/>
			<itemData name='Deleted Text' defStyleNum='dsNormal' strikeout='true'/>
				
			<itemData name='Section Cue' defStyleNum='dsKeyword' bold='true'/>
			<itemData name='Header' defStyleNum='dsControlFlow' underline='true'/>
			<itemData name='Identifier' defStyleNum='dsVariable'/>

			<itemData name='Unstyled Text' defStyleNum='dsVerbatimString'/>
			<itemData name='Escaped Char' defStyleNum='dsSpecialChar'/>
			<itemData name='Reference' defStyleNum='dsControlFlow' underline='true'/>
			<itemData name='Span Cue' defStyleNum='dsKeyword' bold='true'/>
			<itemData name='Span Delimiter' defStyleNum='dsKeyword'/>
			<itemData name='Directive' defStyleNum='dsAttribute' bold='true'/>
			<itemData name='Directive Cue' defStyleNum='dsAttribute'/>
			<itemData name='Critical Directive Cue' defStyleNum='dsImport' bold='true'/>
			<itemData name='Extension Directive' defStyleNum='dsImport' bold='true'/>
			<itemData name='Renderer Directive' defStyleNum='dsExtension' bold='true'/>
			<itemData name='Standard Namespace' defStyleNum='dsBuiltIn' bold='true'/>
			<itemData name='Comment' defStyleNum='dsComment'/>
			<itemData name='Macro' defStyleNum='dsPreprocessor' bold='true'/>
			<itemData name='Macro Delimiter' defStyleNum='dsPreprocessor'/>
			<itemData name='Field Delimiter' defStyleNum='dsPreprocessor' bold='true'/>
			<itemData name='List' defStyleNum='dsOperator'/>

			<itemData name='Literal Block' defStyleNum='dsSpecialString'/>
			<itemData name='Literal Block Cue' defStyleNum='dsPreprocessor' bold='true'/>

			<itemData name='External Reference' defStyleNum='dsImport'/>
		</itemDatas>
	</highlighting>
	<general>
		<comments>
			<comment name='singleLine' start='%%' />
		</comments>
		<keywords weakDeliminator='-+:/' />
	</general>
</language>
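
The rule order in the `span-cue` context matters: the two-character cue `$\` has to be tried before the bare `$`, otherwise a raw literal would be highlighted as an ordinary span. The Lua sketch below mirrors that first-match-wins dispatch as a plain ordered table. It is only a model of the rules listed in the XML above, not code from the highlighter or from cortav.lua; `span_cues` and `span_context` are hypothetical names.

```lua
-- hypothetical model of the 'span-cue' context: rules are tried in order and
-- the first cue that prefixes the text decides the next context.
-- cues are compared bytewise, so multi-byte UTF-8 cues work unchanged.
local span_cues = {
	{ '$\\', 'flat-span' }, -- must precede the bare '$' rule
	{ '!',   'span-emph' },
	{ '*',   'span-strong' },
	{ '~',   'span-del' },
	{ '$',   'span' },
	{ '+',   'span' },
	{ '🔒',  'span' },
	{ '→',   'ref' },
	{ '🔗',  'ref' },
	{ '>',   'ref' },
	{ '&',   'ref' },
	{ '#',   'var-ref' },
	{ '\\',  'flat-span' },
}

-- takes the text that follows an opening '[' and returns the context name
-- and the matched cue, or nil if no rule applies
local function span_context(after_bracket)
	for _, rule in ipairs(span_cues) do
		local cue, ctx = rule[1], rule[2]
		if after_bracket:sub(1, #cue) == cue then
			return ctx, cue
		end
	end
	return nil
end

-- usage: classify a few span openings
for _, text in ipairs { '*strong]', '$literal]', '$\\raw]', '>ref label]' } do
	print(text, span_context(text))
end
```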
