File parsav.md artifact d83ecc5b3e part of check-in 576487f566


parsav

parsav is a lightweight social media server written in terra, intended to integrate to some degree with the fediverse. it is named for the Ranuir words par "speech, communication" and sav "unity, togetherness, solidarity".

backends

parsav is designed to be storage-agnostic, and can draw data from multiple backends at a time. backends can be enabled or disabled at compile time to avoid unnecessary dependencies.

  • postgresql (backend pgsql)

dependencies

  • runtime
    • mongoose
    • json-c
    • mbedtls
    • postgresql backend:
      • postgresql-libs
  • compile-time
    • cmark (commonmark implementation), for transforming the help files, whose source is written in commonmark. the online documentation renders them to html and embeds them in the binary; cmark is also used to produce the troff source from which the offline documentation is built. disable with parsav_online_documentation=no parsav_offline_documentation=no
    • troff implementation (tested with groff but as far as i know we don't need any groff-specific extensions) to produce PDFs and manpages from the cmark-generated intermediate forms. disable with parsav_offline_documentation=no

additional preconfigure dependencies are necessary if you are building directly from trunk, rather than from a release tarball that includes certain build artifacts which need to be embedded in the binary:

  • inkscape, for rendering out some of the UI graphics that can't be represented with standard svg
  • cwebp (libwebp package), for transforming inkscape PNGs to webp
  • sassc, for compiling the SCSS stylesheet into its final CSS

all builds require terra, which unfortunately requires installing an older version of llvm: v9 at the latest (the version i develop parsav under). with any luck, your distro will be clever enough to package terra and its dependencies properly (it's trivial on nix, tho you'll need to tweak the terra expression to select a more recent llvm package if you want v9; this isn't necessary to successfully build parsav, however). Arch Linux is one of the distros that is not so clever, and its (AUR) terra package is totally broken. due to these unfortunate circumstances, terra is distributed not just in source form, but also in the form of LLVM IR and x86-64 assembly + object code.

i've noticed that terra (at least with llvm 6 and 9) seems to get a bit cantankerous and trigger llvm to fail with bizarre errors when you try to cross-compile parsav from x86-64 to any other platform, even x86-32. i don't know if this problem exists on other architectures or in what form. as a workaround, i've tried generating LLVM IR (putatively for x86-64, though this is an ostensibly architecture-independent language) and then compiling that down to an object file with llc. it doesn't work: the generated binaries seem to run, but they crash with bizarre errors and are impossible to debug, as llc refuses to include debug symbols. for these reasons, parsav will (almost certainly) not run on any architecture besides x86-64, at least until terra and/or llvm are fixed. there is a very small possibility, however, that compiling natively on an ARM or x86-32 host might succeed. if you can pull it off, please let me know and i'll update the docs.

also note that, while parsav has a flag to build with ASAN, ASAN has proven unusable for most purposes, as it routinely reports false-positive heap-buffer-overflows. if you figure out how to defuckulate this, i will be overjoyed.

building

first, either install any missing dependencies as shared libraries, or build them as static libraries with the command make dep.$LIBRARY. as a shortcut, make dep will build all dependencies as static libraries. if the build system finds a static version of a library in the lib/ folder, it will use that instead of any system library.

these commands require GNU make (it may be installed as gmake on your system), although this is a fairly soft dependency: if you really need to build with BSD make, you can probably translate the makefile with a minute or so of work; you'll just have to do the work of the various gmake functions manually. this may be worthwhile if you're packaging for a BSD.

postgresql-libs must be installed systemwide, as parsav does not currently support building and linking it statically.

if you use nixos and wish to build the pdf documentation, you're going to have to do a bit of extra work (but you're used to that, aren't you). for some incomprehensible reason, the groff package on nix is split up, seemingly at random, with many crucial output devices relegated to the "perl" output of the package, which is not installed by default (and nix-env -iA nixos.groff.perl doesn't work either; i don't know why). you'll have to instantiate and install the outputs directly by store path, e.g. nix-env -i /nix/store/*groff*/, to get everything you need into your profile. alas, the battle is not over: you also need to set the environment variables GROFF_FONT_PATH and GROFF_TMAC_PATH to point at the font and tmac subdirectories of ~/.nix-profile/share/groff/$groff_version/. once this is done, invoking groff -Tpdf will work as expected.

unfortunately, the produced daemon binary is rather large, weighing in around 600K at the time of writing. you can, however, reduce this significantly by stripping the binary, and further still by compiling without debug functionality (i.e. no debug symbols and no debug log level, both of which insert a large number of strings into the resulting object code).

configuration

the parsav configuration consists of two components: the backends list and the config store. the backends list is a simple text file that tells parsav which data sources to draw from. the config store is a key-value store which contains the rest of the server's configuration, and is loaded from the backends. the config store can be spread across the backends; backends are checked for configuration keys in the order in which they are listed. changes to the config store affect parsav in real time; you only need to restart the server if you change the backends list.
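to make the lookup order concrete, here is a minimal python sketch (not parsav's actual code, which is written in terra) of resolving a configuration key against several backends. it assumes each backend's config store can be treated as a plain string-to-string mapping; the function name conf_get and the key example-key are hypothetical.

```python
from typing import Mapping, Optional, Sequence, Tuple

# Hypothetical representation: each backend is (backend id, its config store),
# listed in the same order as the lines of the backends list.
Backend = Tuple[str, Mapping[str, str]]


def conf_get(backends: Sequence[Backend], key: str) -> Optional[str]:
    """Return the value of `key` from the first backend that defines it."""
    for _backend_id, store in backends:
        if key in store:
            return store[key]
    return None  # no backend defines this key


backends = [
    ("master", {"bind": "[::1]:10917"}),
    ("tweets", {"bind": "0.0.0.0:80", "example-key": "from tweets"}),
]
print(conf_get(backends, "bind"))         # [::1]:10917 (master is listed first)
print(conf_get(backends, "example-key"))  # from tweets
```

a key defined in several backends is shadowed by the earliest one, so reordering the backends list can change the effective configuration.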

you can directly modify the store from the command line with the parsav conf command; see parsav conf -h for more information.

by default, parsav looks for a file called backend.conf in the current directory when it is launched. you can override this default with the parsav_backend_file environment variable or with the -b/--backend-file flag. backend.conf lists one backend per line, in the form id type confstring. for instance, if you had two postgresql databases, you might write a backend file like

master   pgsql   host=localhost dbname=parsav
tweets   pgsql   host=420.69.dread.cloud dbname=content

the form the configuration string takes depends on the specific backend. for postgres, it's just the standard postgres connection string, and supports all the usual properties, as it's passed directly to the client library unmodified.
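the backend file format is simple enough to parse in a few lines. the python sketch below splits each line into at most three whitespace-separated fields, so the confstring keeps its internal spaces. it is only an illustration: the precedence of the -b flag over parsav_backend_file and the skipping of blank lines are assumptions, not documented behavior, and the names BackendSpec, resolve_backend_file and parse_backend_conf are hypothetical.

```python
import os
from typing import List, NamedTuple, Optional


class BackendSpec(NamedTuple):
    id: str          # backend id, e.g. "master"
    type: str        # backend type, e.g. "pgsql"
    confstring: str  # backend-specific configuration string


def resolve_backend_file(flag_value: Optional[str] = None) -> str:
    """Pick the backend file: -b/--backend-file, then the env var, then the default.

    Assumption: the command-line flag takes precedence over the environment.
    """
    if flag_value:
        return flag_value
    return os.environ.get("parsav_backend_file", "backend.conf")


def parse_backend_conf(text: str) -> List[BackendSpec]:
    specs = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        # maxsplit=2 keeps everything after the type as one confstring
        fields = line.split(None, 2)
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 'id type confstring'")
        specs.append(BackendSpec(*fields))
    return specs


example = """\
master   pgsql   host=localhost dbname=parsav
tweets   pgsql   host=420.69.dread.cloud dbname=content
"""
for spec in parse_backend_conf(example):
    print(spec.id, spec.type, repr(spec.confstring))
```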

once you've set up a backend and confirmed parsav can connect successfully to it, you can initialize the database with the command parsav db init <domain>, where <domain> is the domain name you will be hosting parsav from. this will install all necessary structures and functions in the target and create all necessary files. it will not, however, create any users. you can create an initial administrative user with the parsav mkroot <handle> command, where <handle> is the handle you want to use on the server. this will also assign the user a temporary password if possible. you should now be able to log in and administer the server.

if something goes awry with your administrative account, don't fret! you can restore its powers with the command parsav user <handle> grant all, and if you're having difficulties logging in, the command parsav user <handle> auth pw reset will give you a fresh password. if all else fails, you can always run mkroot again to create a new root account and try to repair the damage from there.

by default, parsav binds to [::1]:10917. if you want to change this (to run it on a different port, or make it directly accessible to other servers on the network), you can use the command parsav conf set bind <address>, where <address> is a binding specification like 0.0.0.0:80. the binding can also be changed with the online configuration UI. it is recommended, however, that parsavd be kept accessible only from localhost, with connections forwarded to it from nginx, haproxy, or a similar reverse proxy.
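if you script your deployment, it can be handy to check that a bind specification stays on loopback, as recommended above. the python sketch below uses only the standard ipaddress module and accepts only literal IP addresses in the two forms shown here ([v6]:port and v4:port); whether parsav itself accepts hostnames in bind is not something this sketch assumes either way, and the helper names are hypothetical.

```python
import ipaddress
from typing import Tuple, Union

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def parse_bind(spec: str) -> Tuple[IPAddress, int]:
    """Split a bind spec like "[::1]:10917" or "0.0.0.0:80" into (address, port)."""
    if spec.startswith("["):
        host, sep, port = spec[1:].partition("]:")  # IPv6 literal in brackets
    else:
        host, sep, port = spec.rpartition(":")      # IPv4 literal
    if not sep or not port.isdigit():
        raise ValueError(f"malformed bind spec: {spec!r}")
    port_num = int(port)
    if not 0 < port_num < 65536:
        raise ValueError(f"port out of range: {port_num}")
    return ipaddress.ip_address(host), port_num


def is_local_only(spec: str) -> bool:
    """True if the bind spec only accepts connections from this machine."""
    address, _port = parse_bind(spec)
    return address.is_loopback


for spec in ("[::1]:10917", "127.0.0.1:10917", "0.0.0.0:80"):
    print(spec, "local only" if is_local_only(spec) else "exposed")
```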

postgresql backend

a database will need to be created for parsav's use before parsav db init will work. this can be accomplished with a command like $ createdb parsav. you'll also of course need to set up some way for parsavd to authenticate itself to postgres. peer auth is the most secure option, and this is what you should use if postgres and parsavd are running on the same box. specify the database name to the backend the usual way, with a clause like dbname=parsav in your connection string.
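since the confstring is handed to libpq unmodified, the usual libpq quoting rules apply: a value that is empty or contains whitespace must be wrapped in single quotes, and single quotes and backslashes inside a value must be escaped with a backslash. here's a small python helper (the name conninfo is hypothetical) that builds such a string. peer auth only works over a unix-domain socket, so you'd either leave host out or set it to the socket directory; /run/postgresql below is just an example, as the directory varies between distros.

```python
def conninfo(**params: str) -> str:
    """Build a libpq keyword/value connection string from keyword arguments."""
    parts = []
    for key, value in params.items():
        # Escape backslashes first, then single quotes.
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        # Quote empty values, values with whitespace, and anything we escaped.
        if escaped == "" or escaped != value or any(c.isspace() for c in value):
            escaped = f"'{escaped}'"
        parts.append(f"{key}={escaped}")
    return " ".join(parts)


# Peer auth over the unix socket (the socket directory varies by distro):
print(conninfo(host="/run/postgresql", dbname="parsav"))
# A password containing a space and a quote gets quoted and escaped:
print(conninfo(dbname="parsav", password="it's secret"))
```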

the postgresql backend has some extra features that enable it to be integrated with existing authentication databases you may have. when you initialize the database, a table parsav_auth will be created to hold the credentials of the instance users and the authentication mode will be set to "managed", which will enable parsav's built-in credential administration tools. if you would prefer to use your own source of credentials, you'll need to set parsav to "unmanaged" mode with the command parsav be pgsql setup-auth unmanaged.

this command will reconfigure parsav and remove the parsav_auth table, making room for you to create a view with the same name. if you want to go back to managed mode at any time, just run parsav be pgsql setup-auth managed; be aware that this will delete your auth view!

parsav_auth has the following schema:

create table parsav_auth (
	aid bigint primary key,
	uid bigint,
	newname text,
	kind text not null,
	cred bytea not null,
	restrict text[],
	netmask cidr,
	blacklist bool
);

aid is a unique value identifying the authentication method. it must be deterministic: values based on time of creation or a hash of uid+kind+cred are ideal. uid is the identifier of the user the row specifies credentials for. kind is a string indicating the credential type, and cred is the content of that credential. for the meaning of these fields and use of this structure, see authentication below.
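as an illustration of the hash-based option, here's a python sketch of a deterministic aid derived from uid, kind and cred. the exact construction (SHA-256 over length-prefixed fields, truncated to 63 bits so the result always fits in a non-negative bigint) and the name derive_aid are assumptions of this example, not something parsav prescribes.

```python
import hashlib
import struct


def derive_aid(uid: int, kind: str, cred: bytes) -> int:
    """Deterministically derive an aid from uid + kind + cred.

    Each field is length-prefixed so that different field boundaries can't
    produce the same byte stream (e.g. kind="ab", cred=b"c" vs kind="a", cred=b"bc").
    """
    h = hashlib.sha256()
    for field in (struct.pack(">q", uid), kind.encode("utf-8"), cred):
        h.update(struct.pack(">I", len(field)))
        h.update(field)
    # Keep 63 bits so the value is a non-negative signed 64-bit integer (bigint).
    return int.from_bytes(h.digest()[:8], "big") & 0x7FFF_FFFF_FFFF_FFFF


aid = derive_aid(123, "pw-sha512", bytes.fromhex("12bf90"))
print(aid)
```

re-inserting the same credential always yields the same aid, which is what makes such a value safe to use as the primary key.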

in the most basic case, an authentication record would be something like {uid = 123, kind = "pw-sha512", cred = "\x12bf90…a10e"::bytea}. but parsav is not restricted to username-password authentication, and in addition to various hashing styles, it will also support more esoteric forms of authentication. any individual user can have as many auth rows as she likes. there is also a restrict field, which is normally null, but can be specified in order to restrict a particular credential to certain operations, such as posting tweets or updating a bio. blacklist indicates that any attempt to authenticate that matches this row will be denied, regardless of whether it matches other rows. if netmask is present, this authentication will only succeed if it comes from the specified IP mask.

uid can also be 0 (emphatically not null, which causes the rule to match any user!), indicating that there is not yet a record in parsav_actors for this account. in this case, newname must contain the handle of the account to be created when someone attempts to log in with this credential. whether newname is used in the authentication process depends on whether the authentication method accepts a username. all rows with the same uid must have the same newname.
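the matching rules above can be summed up in a short python sketch. this is a model of the documented semantics, not parsav's implementation: the verify callback (standing in for whatever checks the presented secret against cred for a given kind), the operation names, and the choice to return the first matching non-blacklisted row are all assumptions, and AuthRow, row_matches and authenticate are hypothetical names.

```python
import ipaddress
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class AuthRow:
    """One parsav_auth row; None stands in for SQL NULL."""
    aid: int
    uid: Optional[int]                  # None (NULL) matches any user; 0 = no actor yet
    kind: str
    cred: bytes
    newname: Optional[str] = None       # handle to create when uid == 0
    restrict: Optional[Sequence[str]] = None
    netmask: Optional[str] = None
    blacklist: bool = False


def row_matches(row, uid, kind, source_ip, operation, verify, handle=None) -> bool:
    if row.uid is not None and row.uid != uid:
        return False
    if row.uid == 0 and handle is not None and row.newname != handle:
        return False  # pending account: the supplied handle must match newname
    if row.kind != kind:
        return False
    if row.netmask is not None:
        net = ipaddress.ip_network(row.netmask, strict=False)
        if ipaddress.ip_address(source_ip) not in net:
            return False
    if row.restrict is not None and operation not in row.restrict:
        return False
    return verify(row)  # hypothetical per-kind credential check


def authenticate(rows, uid, kind, source_ip, operation,
                 verify: Callable[[AuthRow], bool], handle=None) -> Optional[AuthRow]:
    """Return the row that authorizes this attempt, or None if it is denied."""
    matching = [r for r in rows
                if row_matches(r, uid, kind, source_ip, operation, verify, handle)]
    if any(r.blacklist for r in matching):
        return None  # a matching blacklist row overrides every other match
    return matching[0] if matching else None
```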

invoking

the build process generates two binaries, parsav and parsavd. parsav is a driver tool that can be used to set up and start a parsav instance, as well as administer it from the command line. it accesses databases directly and uses the same backend configuration file as parsavd, but can also send IPC messages directly to running parsavd instances.

as a convenience, the parsav start command can be used to start and daemonize a parsav instance. additionally, the -l option to parsav start can be used to redirect parsavd's logging output to a file; without -l, logging output will be discarded and can be viewed only by connecting to the running instance with parsav attach. parsav start passes its arguments on to parsavd; you can use this to pass options by separating parsav's arguments from parsavd's with --. if you launch an instance with parsav start -- -i chungus, you can then stop that instance with parsav -i chungus stop. parsav stop can be used on its own if only one parsavd instance is running; otherwise, parsav -a stop will cleanly terminate all running instances.

you generally should not invoke parsavd directly except for debugging purposes, or in the context of an init daemon (particularly systemd). if you launch parsavd directly it will not fork to the background.

authentication

below is a full list of authentication types we intend/hope to one day support. contributors should consider this a to-do list. a checked box indicates the scheme has been implemented.

  • ☑ pw-sha{512,384,256,224}: an ordinary password, hashed with the appropriate algorithm
  • ☐ pw-{sha1,md5,clear} (insecure, must be manually enabled at compile time with the config variable parsav_let_me_be_a_dumbass="i know what i'm doing")
  • ☐ pw-pbkdf2-hmac-sha{…}: a password hashed with the Password-Based Key Derivation Function 2 instead of plain SHA2
  • ☐ pw-extern-ldap: try to authenticate by binding against an LDAP server
  • ☐ pw-extern-cyrus: try to authenticate against saslauthd
  • ☐ pw-extern-dovecot: try to authenticate against a dovecot SASL socket
  • ☐ pw-extern-krb5: abuse MIT kerberos as a password verifier
  • ☐ pw-extern-imap: abuse an email server as a password verifier
  • (extra credit) ☐ pw-extern-radius: verify a user against a radius server
  • ☐ http-oauth: automatically created when a user grants access to an oauth application, consisting of a series of TLVs. these generally should not be created or fiddled with manually
  • ☐ http-gssapi: log in with a kerberos principal through the HTTP "Negotiate" authentication mechanism (WWW-Authenticate: Negotiate). do any browsers actually support this??
  • ☐ http-extern-header: a value of H=V where H is a header passed by an app server such as nginx, and V is the required value. could be used to e.g. tie parsav into an existing client certificate verification infrastructure with minimal effort.
  • ☐ api-digest-sha{…}: a value that can be hashed with the current epoch to derive a temporary access key without logging in. these are used for API calls, sent in the header X-API-Key.
  • ☐ api-token-sha{…}: a password (ideally a very long, randomly generated one) that can be sent in the headers to automatically authenticate the user. far less secure than api-digest-*!
  • ☐ otp-time-sha1: a TOTP PSK: the first two bytes represent the step, the third byte the OTP length, and the remaining ten bytes the secret key
  • ☐ tls-cert-fp: a fingerprint of a client certificate
  • ☐ tls-cert-ca: a value of the form fp/key=value where a client certificate with the property key=value (e.g. uid=cyberlord19) signed by a certificate authority matching the given fingerprint fp can authenticate the user
  • ☐ challenge-rsa: an RSA public key. the user is presented with a challenge and must sign it with the corresponding private key using any one of the supported hash algorithms, ideally SHA512 or -256.
  • ☐ challenge-ecc: a Curve25519 public key. the user is presented with a challenge and must sign it with the corresponding private key using a supported hash algorithm.
  • ☐ challenge-ecc448: a Curve448 public key. the user is presented with a challenge and must sign it with the corresponding private key using a supported hash algorithm.
  • ☑ trust: authentication always succeeds (or fails, if blacklisted). only use in combination with netmask!!!
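since otp-time-sha1 is still unchecked, here's a python sketch of how a code could be computed from the 13-byte PSK layout described in the list above, using standard TOTP (RFC 6238) with HMAC-SHA1 and the unix epoch as the start time. the big-endian byte order of the step field and the function names are assumptions; the layout description doesn't specify them.

```python
import hashlib
import hmac
import struct
from typing import Tuple


def parse_totp_psk(psk: bytes) -> Tuple[int, int, bytes]:
    """Split a PSK into (step in seconds, OTP length, secret key)."""
    if len(psk) != 13:
        raise ValueError("expected 2 + 1 + 10 = 13 bytes")
    step = int.from_bytes(psk[0:2], "big")  # assumption: big-endian
    digits = psk[2]
    secret = psk[3:]
    if step == 0 or not 1 <= digits <= 10:
        raise ValueError("invalid step or OTP length")
    return step, digits, secret


def hotp(secret: bytes, counter: int, digits: int) -> str:
    """HOTP (RFC 4226): HMAC-SHA1 over the counter, then dynamic truncation."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp_from_psk(psk: bytes, unix_time: int) -> str:
    """TOTP (RFC 6238): HOTP with counter = seconds since the epoch // step."""
    step, digits, secret = parse_totp_psk(psk)
    return hotp(secret, unix_time // step, digits)


# A 30-second, 6-digit PSK with an example 10-byte secret:
psk = (30).to_bytes(2, "big") + bytes([6]) + bytes(range(10))
print(totp_from_psk(psk, 1_700_000_000))
```

a real verifier would typically also accept the code for the previous step to tolerate clock skew and network delay, as RFC 6238 suggests.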

we should also look into support for various kinds of hardware auth. we already have TPM support through RSA auth, but external devices like security keys should be supported as well.

legal

parsav is released under the terms of the EUPL v1.2. copies of this license are included in the repository. by contributing any intellectual property to this project, you reassign ownership and all attendant rights over that intellectual property to the current maintainer. this is to ensure that the project can be relicensed without difficulty in the unlikely event that it is necessary.

code of conduct

when hacking on parsav, it is absolutely mandatory to wear a wizard hat and burgundy silk summoning cloak. this code of conduct is enforced capriciously by the Fair Folk, and violations are punishable by dancing hex.

future direction

parsav needs more storage backends, as it currently supports only postgres. some possibilities, in order of priority, are:

  • plain text/filesystem storage
  • lmdb
  • sqlite3
  • generic odbc
  • lua
  • ldap for auth (and maybe actors?)
  • cdb (for static content, maybe? does this make sense?)
  • mariadb/mysql
  • the various nosql horrors, e.g. redis, mongo, and so on

parsav urgently needs an internationalization framework as well. right now everything is just hardcoded in english. yuck.

parsav could be significantly improved by adjusting its memory management strategy. instead of allocating everything with lib.mem.heapa (which currently maps to malloc on all platforms), we can allocate a static buffer for the server overlord object which can simply be cleared and reused for each http request, and enlarged with realloc when necessary. the entire region could be mlocked for better performance, and it would no longer be necessary to track and free memory, as the entire buffer would simply be discarded after use (similar to PHP's original memory management strategy). this would remove what is possibly the largest source of latency in the codebase, as parsav is regrettably quite heavy on malloc, performing numerous allocations for each page rendered.

update: this is now in progress, and much of the UI code has been converted. the database code will also need to be converted, but this will be too time-consuming to be worth tackling any time soon. in the meantime, new functions should be written to use the memory pooling strategy.
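to illustrate the per-request pooling idea, here's a python sketch of a bump arena. a bytearray stands in for the static buffer, mlocking is left out, and the class RequestArena is hypothetical rather than a model of parsav's actual pool code. allocations hand out offsets rather than references, because growing the buffer (like realloc) may move it and invalidate any pointers into it.

```python
class RequestArena:
    """A bump allocator: allocations are never freed individually, only reset."""

    def __init__(self, initial_size: int = 4096) -> None:
        self.buf = bytearray(max(initial_size, 1))
        self.top = 0  # first free byte

    def alloc(self, size: int, align: int = 8) -> int:
        """Reserve `size` bytes and return their offset into the buffer."""
        if align <= 0 or align & (align - 1):
            raise ValueError("align must be a power of two")
        start = (self.top + align - 1) & ~(align - 1)  # round up to alignment
        end = start + size
        if end > len(self.buf):
            new_size = len(self.buf)
            while new_size < end:
                new_size *= 2  # grow geometrically, as a realloc-doubling pool would
            self.buf.extend(bytes(new_size - len(self.buf)))
        self.top = end
        return start

    def write(self, offset: int, data: bytes) -> None:
        if offset + len(data) > self.top:
            raise ValueError("write past the end of the allocated region")
        self.buf[offset:offset + len(data)] = data

    def read(self, offset: int, size: int) -> bytes:
        return bytes(self.buf[offset:offset + size])

    def reset(self) -> None:
        """Discard everything at once at the end of a request; keep the capacity."""
        self.top = 0


arena = RequestArena(16)
title = arena.alloc(10)
arena.write(title, b"hello page")
body = arena.alloc(10)  # starts at offset 16 (8-byte aligned), grows the buffer to 32 bytes
arena.reset()           # the next request starts from offset 0 again
```

since reset keeps the grown capacity, a busy server would quickly settle on a buffer large enough for its typical request and stop reallocating altogether.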