Cyber

About This Site


There is good reason to create a semi-permanent residence on the web which you maintain some sovereignty over, though it is more effortful than outsourcing publishing to a social media platform. This site is published using a growing raft of optimisations to make updating and adding to the site easy. I’m not a programmer and I don’t know how to code, but I have made a few simple and inelegant tools work that streamline the production of this site.

Programs used to produce this site

  • Obsidian (writing, staging, organising notes)
    • Deployment folder, includes plaintext markdown body of each live page
    • Staging folder, includes unfinished pages and fragments
  • Pandoc (templating and conversion from markdown -> HTML)
    • Template files for various unique pages on the site (such as the Vault or the Kitchen), as well as generic template files for pages like this one
    • Commands with specific options chosen for each page, stored in shell script files to be called by a simple command from the terminal
  • MacOS Terminal (running scripts to automate production)
    • Scripts to automate various processes: stage site, publish site, update vault contents, etc.
  • Git (pushing changes to serverside repository)
    • YAML metadata hook file stored in remote directory serverside to send all files onward to public_html
  • Sublime Text (HTML/CSS editing, writing template files)

Rendering Vault with zsh Script and Pandoc

The Vault page of this site is spawned from a folder of PDFs in the site’s root directory. It doesn’t require me to manually add links to each file, and I can deal exclusively with files on my desktop, and have them made accessible via the vault automatically at every git push.

In brief, this script takes a list of the filenames from a local directory and converts them to valid HTML links, then converts that list to a freestanding html file linked to this site’s stylesheet, resulting in the vault webpage that is served.

Directory Structure

  • 📂 desktop
    • 📂 site-root (git repo)
      • HTML pages
      • 📂 vault-contents
        • file1.pdf
        • file2.pdf
        • file3.pdf
      • 📂 images
        • image1.jpg
        • image2.jpg
        • image3.jpg
      • stylesheet.css
      • meta.yaml

Script:

cd ~/desktop/site-root/vault-contents && ls > ~/desktop/site-root/vault1.txt && cd ~/desktop/site-root/vault-contents && ls > ~/desktop/site-root/vault2.txt && sed -i '' 's,$," class="vault_item">,g' ~/desktop/site-root/vault1.txt ; sed -i '' 's,^,<a href="vault-contents/,g' ~/desktop/site-root/vault1.txt && sed -i '' 's,$,</a>,g' ~/desktop/site-root/vault2.txt && cd ~/desktop/site-root && paste -d '\0' vault1.txt vault2.txt > vault_nostyle.html && rm vault1.txt vault2.txt && pandoc -f html -t html vault_nostyle.html -o vault.html --template=vault-template.html

Breakdown:

Open the directory of PDFs that forms the source material of the vault:

cd ~/desktop/site-root/vault-contents

List the filenames of the folder contents and redirect that list into a text file called vault1.txt (in the directory level above):

ls > ~/desktop/site-root/vault1.txt

Change into the same directory again (redundant, since the shell is already there, but harmless):

cd ~/desktop/site-root/vault-contents

List the contents of the folder again, into another .txt file in the same directory as the first. You now have 2 identical text files which are just lists of the contents of the directory, with file extensions, one filename per line.

ls > ~/desktop/site-root/vault2.txt

sed -i '' 's,$," class="vault_item">,g' ~/desktop/site-root/vault1.txt

The purpose of this command is to use sed to append " class="vault_item"> to the end of each new line of vault1.txt.

This is a multi-part command which requires further decomposition:

Note: it’s maintained by many that sed and regex more generally should never be used to parse an HTML string, since the characters of the string will likely carry a meaning in regex/sed, and be picked up as live input rather than verbatim.

It’s true that I ran into trouble escaping some characters in this string, but fixing this was a matter of using a delimiter which did not also appear in the string (a comma: ,), and so couldn’t interfere.

sed opens the sed utility. -i '' is the ‘in-place’ option: sed edits the file directly instead of printing the result to the terminal. On macOS (BSD sed) this option takes a backup suffix as its argument, and the empty '' means that no backup copy of the original file is kept.

's,$, signifies the pattern to be searched for as a point to insert the text. The singlequote opens the command, the s signifies that this is a substitution command, and as previously mentioned, the commas are used as delimiters around a $, which is sed’s syntax for the end of a line.

What follows is the text to be inserted at the end of each line: " class="vault_item">

,g' is a comma delimiter, the g flag, and a singlequote to close the command off. The g flag tells sed to replace every match on a line rather than only the first; since each line has only one end, it makes no difference here. Lastly, the path of the file to be edited: ~/desktop/site-root/vault1.txt
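To try this substitution without touching a real file, the same expression can be run on a couple of sample lines piped through sed. This is only a small sketch: the function name append_class and the sample filenames are made up for illustration, and -i is left out so that nothing is edited in place.

```shell
# append_class: read filenames on stdin and add the tail of the opening
# <a> tag to the end of every line.
# (Hypothetical helper name; the substitution is the one from the script,
# minus the g flag, which changes nothing when matching the end of a line.)
append_class() {
  sed 's,$," class="vault_item">,'
}

printf '%s\n' 'file1.pdf' 'file2.pdf' | append_class
# prints:
# file1.pdf" class="vault_item">
# file2.pdf" class="vault_item">
```

Because the delimiter is a comma, the quotes and angle bracket in the replacement need no escaping.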


; sed -i '' 's,^,<a href="vault-contents/,g' ~/desktop/site-root/vault1.txt

This command is similar, preceded by a semicolon to signify a new command. It prepends text to the beginning of every line, signified in sed syntax by ^. Here it will add <a href="vault-contents/ to the beginning of each line.

So the file is now a list of items that look like:

<a href="vault-contents/FILENAME.pdf" class="vault_item">

That is, each item on the list is now a filename wrapped in the opening half of an HTML <a> tag with the class ‘vault_item’. This class appears in the stylesheet linked from the final HTML file for the rendered vault, so the styling of the links can be controlled from that stylesheet.
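The two substitutions can also be given to a single sed call with -e, which makes it easy to check the combined result on one sample line. A rough sketch, assuming a made-up helper name (open_tag) and reading from stdin rather than editing vault1.txt:

```shell
# open_tag: turn each filename on stdin into the opening half of a link.
# (Hypothetical helper; both expressions are taken from the script above.)
open_tag() {
  sed -e 's,^,<a href="vault-contents/,' \
      -e 's,$," class="vault_item">,'
}

printf '%s\n' 'FILENAME.pdf' | open_tag
# prints: <a href="vault-contents/FILENAME.pdf" class="vault_item">
```

Note that the filename is copied into the href as-is, so this sketch assumes filenames that do not need URL-encoding.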


This command uses the exact same syntax as the first sed command, but adds </a> to the end of each item in the second text file, vault2.txt.

sed -i '' 's,$,</a>,g' ~/desktop/site-root/vault2.txt


All that is left to do is to complete the <a> tag by putting the two halves together.

Return to the site’s root directory:

cd ~/desktop/site-root

paste -d '\0' vault1.txt vault2.txt > vault_nostyle.html

This command uses the paste utility to merge the two files horizontally, rather than appending a string to the end of a document. This is very handy, since it automates processes that would otherwise require copying small parts of large documents to the end of every line.

In short, this paste command adds the list of items formatted as <a href="vault-contents/FILENAME.pdf" class="vault_item"> to the list formatted as FILENAME.PDF</a>, forming a list of full <a> links with clickable text.

Paste syntax is relatively simple: after calling the utility, -d chooses a delimiter, with \0 being paste’s way of specifying an empty delimiter, i.e. there will be no space between the pasted lines. Then the files to be pasted are named, followed by a > redirect that writes the output into a new HTML file: vault_nostyle.html, a list of live links to each item in the vault.
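The effect of the empty delimiter is easiest to see on two one-line files. The following is a small sketch that works in a throwaway temporary folder; the helper name join_halves is invented for the example, while the paste call itself is the one from the script.

```shell
# join_halves: merge two files line by line with nothing between the halves.
# (Hypothetical helper name wrapping the paste call from the script.)
join_halves() {
  paste -d '\0' "$1" "$2"
}

demo=$(mktemp -d)   # scratch folder so the real site files are untouched
printf '%s\n' '<a href="vault-contents/file1.pdf" class="vault_item">' > "$demo/vault1.txt"
printf '%s\n' 'file1.pdf</a>' > "$demo/vault2.txt"

join_halves "$demo/vault1.txt" "$demo/vault2.txt"
# prints: <a href="vault-contents/file1.pdf" class="vault_item">file1.pdf</a>

rm -r "$demo"
```

Without -d '\0', paste would put a tab between the two halves of each line.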


Remove the two partial files to clean up

rm vault1.txt vault2.txt


pandoc -f html -t html vault_nostyle.html -o vault.html --template=vault-template.html

Use pandoc to convert the list of links into a freestanding HTML document, based on the template vault-template. Pandoc is an extremely handy tool which I use often to pass plaintext files through a template to form a webpage.

The template is a local HTML file that includes variables, which pandoc writes between dollar signs (such as $body$, or ${body} with curly brackets), and which it populates with data from the input file. That means that I can edit a plaintext document, dealing with no HTML syntax, and still output a freestanding webpage with one command.
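For illustration, a very small template can be written out from the shell with a heredoc. This is a guess at a minimal shape, not the actual vault-template.html: the function name write_vault_template, the hard-coded page title and the stylesheet path are assumptions, and the only pandoc feature it relies on is the $body$ variable, which pandoc replaces with the converted input.

```shell
# write_vault_template: write a minimal pandoc HTML template to the given path.
# (Hypothetical helper; the real template on this site is not shown.)
write_vault_template() {
  # The quoted 'EOF' stops the shell from expanding $body$ itself.
  cat > "$1" <<'EOF'
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>Vault</title>
  <link rel="stylesheet" href="stylesheet.css">
</head>
<body>
$body$
</body>
</html>
EOF
}

demo=$(mktemp -d)
write_vault_template "$demo/vault-template.html"
cat "$demo/vault-template.html"
rm -r "$demo"
```

Everything outside $body$ is copied to the output unchanged, which is why the page furniture can live entirely in the template.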

Everything on the vault page that is not an unstyled list of links is rendered by this template.

-f signifies the format to convert from, -t the target format, then the file to be converted, and the output file signposted by the -o option, and the template to use, vault-template.html.
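For comparison, the list of links can also be built in one pass with a loop and printf, without the two intermediate text files. This is an alternative sketch rather than the script used for this site; the function name vault_links is made up, and it assumes the same vault-contents/ link prefix and vault_item class as above.

```shell
# vault_links: print one complete <a> link per file in the given folder.
# (Hypothetical alternative to the ls + sed + paste chain.)
vault_links() {
  vault_dir=$1
  for f in "$vault_dir"/*; do
    [ -f "$f" ] || continue      # skip anything that is not a regular file
    name=${f##*/}                # strip the folder part, keep the filename
    printf '<a href="vault-contents/%s" class="vault_item">%s</a>\n' "$name" "$name"
  done
}

demo=$(mktemp -d)
touch "$demo/file1.pdf" "$demo/file2.pdf"
vault_links "$demo"
# prints:
# <a href="vault-contents/file1.pdf" class="vault_item">file1.pdf</a>
# <a href="vault-contents/file2.pdf" class="vault_item">file2.pdf</a>
rm -r "$demo"
```

The output of vault_links could be redirected into vault_nostyle.html and passed to the same pandoc command. One caveat: zsh reports an error when the folder is empty and the * matches nothing, whereas sh and bash simply print no links.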


Rendering Pages from Plaintext with Pandoc

Bounties and Future Projects

Rendering a Tweet Archive as a webpage

  • Extracting only tweet data from archive file
  • Tweets as auto-ID’d div-blocks filterable by topic

Hand-Writing Content for a Webpage

Elements:

  • OCR handwriting recognition
  • Conversion to markdown
  • Recognition of plaintext syntax (modified for handwriting?)
  • Scanning with phone camera
  • Symbol system

Example process:

Handwrite article. Scan page with app on phone, which automatically sends as PDF to specified folder on PC. When new files appear in this folder, OCR API is called automatically from command line to extract plain text from PDF, save to .txt file with title extracted from text using (markdown? yaml?) title signifier. With no title specified, date is default. txt file is passed thru pandoc (or other) template and output into new HTML file in specified folder within git repo.
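The title step of this process could look something like the sketch below. It assumes the title signifier is a markdown-style line beginning with "# ", which is only one of the options floated above, and the function name page_title is invented. With no such line in the file, it falls back to a date: either one passed in, or today's.

```shell
# page_title FILE [FALLBACK]
# Print the text of the first "# " heading in FILE; if there is none,
# print FALLBACK, or today's date when no fallback is given.
# (Hypothetical helper; the "# " signifier is an assumption.)
page_title() {
  title=$(sed -n 's/^# //p' "$1" | head -n 1)
  if [ -n "$title" ]; then
    printf '%s\n' "$title"
  else
    printf '%s\n' "${2:-$(date +%Y-%m-%d)}"
  fi
}

demo=$(mktemp -d)
printf '%s\n' '# Notes on the Kitchen' 'Body text from the OCR step.' > "$demo/scan.txt"
page_title "$demo/scan.txt"
# prints: Notes on the Kitchen
rm -r "$demo"
```

The result could then be used to name the .txt file or handed to pandoc as the page title.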

Index of posts can be created by a similar script as furnishes the Vault: a list of directory contents is wrapped in <a> tags and placed into a file to be used as a {variable} by pandoc or jekyll/liquid.

Note: it’s possible to get access to OCR API free for personal/small scale use.

STEAL button

Automator quick action / shell script, lift plaintext and assets of an article, re-host it styled with your own CSS, add a link on a specified page, add backlinked subtitle to stolen page: “this page originally appeared at (link)”

Download directory as PDF

Archives of the greatest posters

Automatically maintained ‘Latest Articles/Posts’ Section on homepage

A shell script lists the names of all HTML files in a directory, wraps them in tags/classes, and runs them through a pandoc template to produce a styled list. That list is then linked from the index page, or perhaps code is run to edit the index directly.
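A first rough sketch of the listing step, under several assumptions: the function name latest_links and the class latest_item are invented, "latest" is taken to mean most recently modified, and index.html is left out of its own list. It relies on ls -t, which sorts newest first; that is fine for ordinary filenames but would mis-handle names containing newlines.

```shell
# latest_links DIR [COUNT]
# Print links to the COUNT most recently modified .html files in DIR
# (default 5), newest first, skipping index.html.
# (Hypothetical helper; class name and defaults are assumptions.)
latest_links() {
  ( cd "$1" && ls -t -- *.html 2>/dev/null ) |
    grep -vxF 'index.html' |
    head -n "${2:-5}" |
    sed 's,.*,<a href="&" class="latest_item">&</a>,'
}

demo=$(mktemp -d)
touch -t 202001010000 "$demo/old-post.html"
touch -t 202101010000 "$demo/new-post.html"
latest_links "$demo"
# prints:
# <a href="new-post.html" class="latest_item">new-post.html</a>
# <a href="old-post.html" class="latest_item">old-post.html</a>
rm -r "$demo"
```

In the sed replacement, & stands for the whole matched line, so each filename is used both as the link target and as the link text.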