<h1>Classic Systems Papers: Notes for CSE 221</h1><p>Yuan Fu, 2024-01-05</p><p>During my time at UCSD, I enjoyed their
systems courses greatly. It’d be a shame to let those wonderful things I
learnt fade away from my memory. So I compiled my notes into this
article. I hope this can be helpful to future me and entertaining for
others.</p><p><span class="oldstyle-num">CSE
221</span> is the entry course, introducing students to reading
papers and the essential systems papers. The cast of papers has been pretty
stable over the years. Here is the syllabus of a <span
class="oldstyle-num">CSE 221</span> offering similar to the one I took;
you can find links to the papers and extended readings there: <a
href="https://archive.casouri.cc/note/2024/cse221/https:/cseweb.ucsd.edu/classes/wi21/cse221-a/readings.html"><em>CSE
221: Reading List and Schedule, Winter
2021</em></a>.</p><h2 id="THE%20System"
class="section">THE System</h2><p><em>The Structure
of the “THE”-Multiprogramming System</em>, <span
class="oldstyle-num">1968</span>, by none other than Edsger W.
Dijkstra.</p><p>The main takeaway is “central abstraction in
a hierarchy”. The central abstraction is the sequential process, and the
hierarchy is basically “layers”. The benefit of layers is that it’s easy
to verify soundness and prove correctness for each individual layer,
which is essential to handle complexity.</p><p>To Dijkstra,
if a designer structures their system well, the possible test cases for
the system at each level would be so few that it’s easy to cover
every possible case.</p><p>He then mentioned that “industrial
software makers” have mixed feelings about this methodology: they agree that
it’s the right thing to do, but doubt whether it’s applicable in the real
world, away from the shelter of academia. Dijkstra’s stance is that the
larger the project, the more essential the structuring. This stance is
apparent in <a id="footref:ewd1041" class="footref-anchor
obviously-a-link" aria-label="Jump to footnote"
href="#footdef%3Aewd1041">his other writings<sup
class="inline-footref">1</sup></a>.</p><div
id="footdef:ewd1041" class="footdef"><div class="def-footref
obviously-a-link"><a aria-label="Jump back to main text"
href="#footref%3Aewd1041">1</a></div><div
class="def-footdef"><a
href="https://archive.casouri.cc/note/2024/cse221/https:/www.cs.utexas.edu/users/EWD/transcriptions/EWD10xx/EWD1041.html"><em>EWD
1041</em></a>. Now, I don’t think it is realistic to write
proofs for every system you design (and how do you ensure the proof is
correct?), but good structuring and designing with testing in mind are
certainly essential.</div></div><p>The downside of
layering is, of course, the potential loss of efficiency, due to either
the overhead added by layering, or the lack of details hidden by lower
layers. For example, the graphics subsystem in Win<span
class="oldstyle-num">32</span> was moved into the kernel in
<span class="oldstyle-num">NT4</span>, because there were too
many boundary crossings.</p><p>And sometimes it’s hard to
separate the system into layers at all, eg, due to circular dependency,
etc. For example, in Linux, memory used by the scheduler is pinned and
never paged out.</p><p>We also learned some interesting
terminology used at the time: “harmonious cooperation” means no deadlock,
and “deadly embrace” means deadlock.</p><h2 id="Nucleus"
class="section">Nucleus</h2><p><em>The Nucleus of a
Multiprogramming System</em>, <span
class="oldstyle-num">1970</span>.</p><p>Basically
they want a “nucleus” (a small kernel) that supports multiple simultaneous
operating system implementations. So the user can have their OS however
they want. (Another example of “mechanism instead of policy”, sort of.)
This school of thought would later reappear in exokernels and
microkernels.</p><p>The nucleus provides a scheduler (for processes
and I/O), communication (message passing), and primitives for controlling
processes (create, start, stop, remove).</p><p>In their
design, the parent process is basically the OS of its child processes,
controlling allocation of resources for them: starting/stopping them,
allocating memory and storage to them, etc.</p><p>However,
the parent process doesn’t have full control over its children: it
doesn’t control scheduling for them. The nucleus handles scheduling;
it divides computing time by round-robin scheduling among all active
processes.</p><p>A more “complete” abstraction would be
having the nucleus schedule the top-level processes and let those processes
schedule their children themselves. Perhaps it would be too inconvenient
if you had to implement a scheduler for every “OS” you wanted to
run.</p><h2 id="HYDRA"
class="section">HYDRA</h2><p><em>HYDRA: The Kernel
of a Multiprocessor Operating System</em>, <span
class="oldstyle-num">1974</span>.</p><p>The authors
have several design goals for HYDRA: a) separation of mechanism and
policy; b) reject strict hierarchy layering for access control, because
they consider access control more of a mesh than layers; c) an emphasis
on protection—not only comprehensive protection, but also flexible
protection. They provide a protection mechanism that can be used not
only for regular things like I/O, but also for arbitrary things that a
higher-level program wants to protect/control. It surely would be nice if
UNIX had something similar to offer.</p><p>HYDRA structures
protection around <em>capabilities</em>. A capability is
basically the right to use some resource—a key to a door. Possessing the
capability means you have the right of using whatever resource it
references. For example, file descriptors in UNIX are capabilities: when
you open a file, the kernel checks if you are allowed to read/write that
file, and if the check passes, you get a file descriptor. Then you are
free to read/write to that file as long as you hold the file descriptor;
no need to go through access checks every time.</p><p>In
general, in an access-controlled OS, there are resources, like data or a
device; execution domains, like “execute as this user” or “execute as
this group”; and access control, controlling which domain can access
which resource.</p><p>In HYDRA, there are
<em>procedures</em>, LNSes, and <em>processes</em>.
A procedure is an executable program or a subroutine. An LNS (local name space)
is the execution domain. Conceptually it is a collection of capabilities;
it determines what you can and cannot do. Each invocation of a procedure
has an LNS attached to it. To explain it in UNIX terms, when a user Alice
runs a program <code>ls</code>, the capabilities Alice has are
the LNS, and <code>ls</code> is the procedure. Finally, a
process is conceptually a (call) stack of procedures with their
accompanying LNS.</p><p>Since each invocation of a procedure
has an accompanying LNS, the callee’s LNS could have more or different
capabilities from its caller, so HYDRA can support <em>right
amplification</em>.</p><p>Right amplification is when
the callee has more privileges/capabilities than the caller. For example, in
UNIX, when a program uses a syscall, that syscall executed by the kernel
has far more privilege than the caller. For another example, when Alice
runs <code>passwd</code> to change her password, that program
can modify the password file, which Alice has no access to, because
<code>passwd</code> runs with an effective user id (euid) of
higher privilege.</p><p>Another concept often used in
security is ACL (access control list). It’s basically a table recording
who has access to what. ACL and capabilities each have their pros and
cons. To use an ACL, you need to know the user; with capabilities, anyone
with the capability has access; you don’t need to know the
particular user. Capabilities are easier to check, and useful for
distributed systems or very large systems, where storing information of
all users/entities is not possible.</p><p>However,
capabilities are irrevocable, ie, you can’t take them back. Maybe you can
make them expire, but that’s more complexity. Capabilities can also be
duplicated and given away, which has its own virtues and
vices.</p><p>Since ACLs are easy to store and manage, and
capabilities are easy to check, they are often used together. In UNIX,
opening a file warrants a check against the ACL, and the file descriptor
returned to you is a capability.</p><p>It’s interesting to
think of the access control systems used around us. Windows certainly has
a more sophisticated ACL than UNIX. What about Google Docs, eh? On top of
the regular features, they also support “accessible through links”, “can
comment but not edit”, etc.</p><h2 id="TENEX"
class="section">TENEX</h2><p><em>TENEX, a Paged Time
Sharing System for the PDP-10</em>, <span
class="oldstyle-num">1972</span>.</p><p>TENEX borrowed many
ideas from MULTICS, which in turn influenced UNIX. It runs
on the <span class="oldstyle-num">PDP-10</span>, a machine very
popular at the time: used by Harvard, MIT, CMU, to name a few. The <span
class="oldstyle-num">PDP-10</span> was manufactured by DEC; TENEX
itself was developed by BBN, a military contractor at the time. The machine
is micro-coded, meaning its
instructions are programmable.</p><p>In BBN’s pager, each
page is <span class="oldstyle-num">512</span> words, and the TLB
is called the “associative register”. Their virtual memory supports <span
class="oldstyle-num">256</span>K words and copy-on-write. A
process in TENEX always has exactly one superior (parent) process and
any number of inferior (child) processes. Processes communicate through
a) sharing memory, b) direct control (parent to child only), and c)
pseudo (software-simulated) interrupts. These are also the only means of
IPC we have today in UNIX. It would be nice if we had message passing
built into the OS. But maybe D-Bus is even better, since it can be
portable.</p><p>TENEX can run binary programs compiled for
<span class="oldstyle-num">DEC 10/50</span>, the vendor OS
for the <span class="oldstyle-num">PDP-10</span>. All the
TENEX syscalls “were implemented with the JSYS instruction, reserving all
old monitor [OS/kernel] calls for their previous use”. They also
implemented all of the <span class="oldstyle-num">DEC
10/50</span> syscalls as a compatibility package. The first time a
program calls a <span class="oldstyle-num">DEC 10/50</span>
syscall, that package is mapped “to a remote portion of the process
address space, an area not usually available on a <span
class="oldstyle-num">10/50</span>
system”.</p><p>TENEX uses balanced-set scheduling to reduce
page faults. A balanced set is a set of the highest-priority processes whose
total working set fits in memory. And the working set of a process is the
set of pages this process references.<br/>Guess what an
“executive command language interpreter” is? They described it as “...which
provides direct access to a large variety of small, commonly used system
functions, and access to and control over all other subsystems and user
programs”. It’s a shell!</p><p>Some other interesting facts:
TENEX supports at most 5 levels in file paths; the paper mentions file
extensions; files in TENEX are versioned: a new version is created every
time you write to a file, old versions are automatically garbage
collected by the system over time; TENEX has five access rights:
directory listing, read, write, execute, and append; TENEX also has a
debugger residing in the core memory alongside the
kernel.</p><p>File operations are the same as in UNIX:
opening a file gives you a file descriptor, called a JFN (job file number),
and you can read or write the file. The effect of the write is seen
immediately by readers (so I guess no caching or buffering). They even
have “unthawed access”, meaning only one writer is allowed while multiple
readers can read from the file at the same time. UNIX really cut a lot of
corners, didn’t
it?</p><details><summary>Their
conclusion section is also
interesting…</summary><blockquote><p>One of
the most valuable results of our work was the knowledge we gained of how
to organize a hardware/software project of this size. Virtually all of
the work on TENEX from initial inception to a usable system was done over
a two year period. There were a total of six people principally involved
in the design and implementation. An <span
class="oldstyle-num">18</span> month part-time study, hardware
design and implementation culminated in a series of documents which
describe in considerable detail each of the important modules of the
system. These documents were carefully and closely followed during the
actual coding of the system. The first stage of coding was completed in
<span class="oldstyle-num">6</span> months; at this point the
system was operating and capable of sustaining use by nonsystem users for
work on their individual projects. The key design document, the JSYS
Manual (extended machine code), was kept updated by a person who devoted
full time to insuring its consistency and coherence; and in retrospect,
it is our judgment that this contributed significantly to the overall
integrity of the system.</p><p>We felt it was extremely
important to optimize the size of the tasks and the number of people
working on the project. We felt that too many people working on a
particular task or too great an overlap of people on separate tasks would
result in serious inefficiency. Therefore, tasks given to each person
were as large as could reasonably be handled by that person, and insofar
as possible, tasks were independent or related in ways that were well
defined and documented. We believe that this procedure was a major factor
in the demonstrated integrity of the system as well as in the speed with
which it was
implemented.</p></blockquote></details><h2
id="MULTICS"
class="section">MULTICS</h2><p><em>Protection and
the Control of Information Sharing in Multics</em>, <span
class="oldstyle-num">1974</span>.</p><p>The almighty
MULTICS, running on the equally powerful Honeywell <span
class="oldstyle-num">6180</span>. There are multiple papers on
MULTICS, and this one is about its protection
system.</p><p>Their design principles
are</p><ol><li>Permission rather than exclusion (ie,
default is no permission)</li><li><a id="footref:selinux"
class="footref-anchor obviously-a-link" aria-label="Jump to footnote"
href="#footdef%3Aselinux">Check every access to every object<sup
class="inline-footref">2</sup></a></li><li>The
design is not secret (ie, security not by
obscurity)</li><li>Principle of least
privilege</li><li>Ease of use and understanding (human
interface) is important to reduce human
mistakes</li></ol><div id="footdef:selinux"
class="footdef"><div class="def-footref obviously-a-link"><a
aria-label="Jump back to main text"
href="#footref%3Aselinux">2</a></div><div
class="def-footdef">Early Linux didn’t do this, which led to
SELinux, which was merged into mainline Linux long
ago.</div></div><p>MULTICS has a concept of
<em>descriptor segments</em>. The virtual memory is made of
segments, and each segment has a descriptor, which contains
access-control information: access rights, protection domain, etc. This
way, MULTICS can access-control memory. The access checks are done by
hardware for performance. (This means MULTICS depends on the hardware
and isn’t portable like UNIX.)</p><p>MULTICS uses a regular
ACL for file-access-control. When opening a file, the kernel checks for
access rights, creates a segment descriptor, and maps the whole file into
virtual memory as a segment. In the paper, the ACL is described as the
first level of access control, and the hardware-based access control as the
second. Note that in MULTICS, you can’t read a file as a stream: the
whole file is mmaped into memory, essentially.</p><p>MULTICS
also has <em>protected subsystems</em>. A protected subsystem is a collection of
procedures and data that can only be used through designated entry points
called “gates” (think of an API). To me, it’s like modules
(public/private functions and variables) in programming languages, but in
an OS. All subsystems are put in a hierarchy: Every subsystem within a
process gets a number; lower-numbered subsystems can use descriptors
containing higher-numbered subsystems. And the protection is guaranteed
by the hardware. They call it “rings of
protection”.</p><p>Speaking of rings, <span
class="oldstyle-num">x86</span> supports four ring levels; this
is how the kernel protects itself from userspace programs. Traditionally
userspace is on ring <span class="oldstyle-num">3</span> and
the kernel is on ring <span class="oldstyle-num">0</span>.
With paravirtualized virtual machines, the guest OS kernel is put on ring
<span class="oldstyle-num">1</span>.</p><h2 id="Protection"
class="section">Protection</h2><p><em>Protection</em>,
<span
class="oldstyle-num">1974</span>.</p><p>This paper
by Butler Lampson gives an overview of protection in systems, and
introduces a couple of useful concepts.</p><p>A
<em>protection domain</em> is anything that has certain
rights to do something and has some protection from other things, eg,
kernel or userspace, a process, a user. A lot of words are used to
describe it: protection context, environment, state, sphere, capability
list, ring, domain. Then there are <em>objects</em>, things
that need to be protected. Domains themselves can be
objects.</p><p>The relationships between domains and objects
form a matrix, the <em>access matrix</em>. Each relationship
between a domain and an object can be a list of <em>access
attributes</em>, like owner, control, call, read, write,
etc.</p><p>When implementing the access matrix, the system
might want to attach the list of accessible objects of a domain to that
domain. Each element of this list is essentially a
capability.</p><p>Alternatively, the system can attach a list
of domains that can access an object to that object. An object would have
a procedure that takes a domain’s name as input and returns its access
rights to this object. The domain’s name shouldn’t be forgeable. One
idea is to use a capability as the domain identifier: a domain would ask
the supervisor (kernel) for an identifier (so it can’t be forged), and
pass it to objects’ access-control procedures. An arbitrary procedure is
often overkill, and an <em>access lock list</em> is used
instead.</p><p>Many systems use a hybrid implementation in
which a domain first accesses an object by access key to obtain a
capability, which is used for subsequent accesses. (Eg, opening a file and
getting a file descriptor.)</p><h2 id="UNIX"
class="section">UNIX</h2><p><em>The UNIX
Time-Sharing System</em>, <span
class="oldstyle-num">1974</span>.</p><p>The good ol’
UNIX! This paper describes the “modern” UNIX written in C, running on
<span
class="oldstyle-num">PDP-11</span>.</p><p>Compared
to systems like TENEX and MULTICS, UNIX has a simpler design and does not
require special hardware support, since it has always been designed for
rather limited machines, and for its creators’ own use
only.</p><p>The paper spends a major portion describing the
file system, something we tend to take for granted from an operating
system and <a id="footref:filesystem" class="footref-anchor
obviously-a-link" aria-label="Jump to footnote"
href="#footdef%3Afilesystem">view as swappable nowadays<sup
class="inline-footref">3</sup></a>. We are all too
familiar with “everything as a file”. UNIX treats files as a linear
sequence of bytes, but that’s not the only possible way. IBM filesystems
have the notion of “records” <a id="footref:fs-database"
class="footref-anchor obviously-a-link" aria-label="Jump to footnote"
href="#footdef%3Afs-database">like in a database<sup
class="inline-footref">4</sup></a>. And on MULTICS, as
we’ve seen, the whole file is <a id="footref:fs-mmap"
class="footref-anchor obviously-a-link" aria-label="Jump to footnote"
href="#footdef%3Afs-mmap">mmaped to the memory<sup
class="inline-footref">5</sup></a>.</p><div
id="footdef:filesystem" class="footdef"><div class="def-footref
obviously-a-link"><a aria-label="Jump back to main text"
href="#footref%3Afilesystem">3</a></div><div
class="def-footdef">Because most filesystems we use expose the same
interface, namely the POSIX standard. They all have read, write, open,
close, seek, mkdir, etc. I wish in the future we could plug custom
filesystems into the OS and expose new interfaces for programs to use. For
example, a network filesystem that can tell the program “I’m downloading
this file from remote, the progress is xx%”. Right now network
filesystems can only choose between blocking and erroring out
immediately.</div></div><div id="footdef:fs-database"
class="footdef"><div class="def-footref obviously-a-link"><a
aria-label="Jump back to main text"
href="#footref%3Afs-database">4</a></div><div
class="def-footdef">As with every idea in CS, this might be coming back in
another form. For example, Android stores much of its application data in a
(modified) SQLite.</div></div><div id="footdef:fs-mmap"
class="footdef"><div class="def-footref obviously-a-link"><a
aria-label="Jump back to main text"
href="#footref%3Afs-mmap">5</a></div><div
class="def-footdef">Again, this might be coming back, in the form of
persistent memory.</div></div><p>UNIX uses mounting to
integrate multiple devices into a single namespace. On the other hand, MS-DOS
uses drive letters and special filenames to represent devices.</p><p>This version
of UNIX only has seven protection bits, one of which switches
set-user-id, so there is no permission for “group”. Set-user-id is
essentially the effective-user-id (euid) mechanism.</p><p>The paper talks
about the shell in detail, for example the <code>| <
> ; &</code> operators. Judging from the example,
the <code><</code> and
<code>></code> are clearly intended to be prefixes
rather than operators (that was one of the mysteries for me before
reading this paper):</p><pre class="code-block">ls >temp1
pr -2 <temp1 >temp2
opr <temp2</pre><h2 id="Plan%209" class="section">Plan
9</h2><p><em>Plan 9 From Bell Labs</em>, <span
class="oldstyle-num">1995</span>.</p><p>According to
the paper, by the mid <span
class="oldstyle-num">1980</span>’s, people had moved away from
centralized, powerful timesharing systems (on mainframes and
mini-computers) to small personal micro-computers. But a network of
machines has difficulty serving users as seamlessly as the old timesharing
system. They want to build a system that feels like the old timesharing
system, but is made of a bunch of micro-computers. Instead of having a
single powerful computer that does everything, they will have individual
micro-computers for each task: a computing (CPU) server, a file server,
routers, terminals, etc.</p><p>The central idea is to expose
every service as files. Each user can compose their own private
namespace, mapping files, devices, and services (as files) into a single
hierarchy. Finally, all communication is done through a single protocol,
<span class="oldstyle-num">9P</span>. Compared to what we
have now, where the interface is essentially the C ABI plus web APIs, it
certainly sounds nice. But on the other hand, using text streams as the
sole interface for everything feels a bit shaky.</p><p>Their
file server has an interesting storage device called WORM (write-once, read
many); it’s basically a time machine. Every day at <span
class="oldstyle-num">5</span> AM, a snapshot of all the disks is
taken and put into the WORM storage. People can get back old versions of
their files by simply reading the WORM storage. Nowadays WORM snapshots are
often used to defend against ransomware attacks.</p><h2 id="Medusa"
class="section">Medusa</h2><p><em>Medusa: An
Experiment in Distributed Operating Systems Structure</em>,
<span class="oldstyle-num">1980</span>.</p><p>A
distributed system made at CMU, to closely match and maximally exploit
its hardware: the distributed-processor Cm* system (Computer
Modules).</p><p>On distributed-processor hardware, they can
place the kernel code in memory in three
ways:</p><ol><li>Replicate the kernel on every
node</li><li>Kernel code on one node, other nodes’ processors
execute code remotely</li><li>Split the kernel onto multiple
nodes</li></ol><p>They chose the third approach: divide
the kernel into <em>utilities</em> (kernel modules) and
distribute them among all the processors. When a running program needs to
invoke a certain utility (basically some syscall provided by some kernel
module), it migrates to the processor that has that utility. Different
processors can have the same utility, so programs don’t have to fight for
a single popular utility.</p><p>The design is primarily
influenced by efficiency given their particular hardware, not structural
purity, but some nice structural properties nonetheless arose. Boundaries
between utilities are rigidly enforced, since utilities can only send
messages to each other and can’t modify each other’s memory. This improves
security and robustness. For example, an error in one utility won’t affect
other utilities.</p><p>One problem that might occur when you
split the kernel into modules is circular dependencies and deadlocks. If
the filesystem utility calls into the memory manager utility (eg, get a
buffer), and the memory manager utility calls into the filesystem utility
(eg, swap pages), you have a circular dependency. Mix in locks and you
might get a deadlock.</p><p>To be deadlock-free, Medusa
further divides each utility into <em>service classes</em>
such that service classes don’t have circular dependencies between each
other. It also makes sure each utility uses separate and statically
allocated resources.</p><p>Programs written to run on Medusa
are mostly concurrent in nature. Instead of conventional processes,
program execution is carried out by <em>task forces</em>,
each a collection of <em>activities</em>. Each activity
is like a thread, but different activities run on different
processors.</p><p>Activities access kernel objects (resources
like memory pages, pipes, files, etc) through descriptors. Each activity has
a <em>private descriptor list</em> (PDL), and all activities
in a task force share a <em>shared descriptor list</em>
(SDL). There is also a <em>utility descriptor list</em> (UDL)
for utility entry points (syscalls), and an <em>external descriptor
list</em> (XDL) referencing remote UDLs and PDLs. Both the UDL and XDL
are processor-specific.</p><p>The task force notion is useful
for scheduling: Medusa schedules activities that are in the same task
force to run at the same time. This is often referred to as <em>gang
scheduling</em> or <em>coscheduling</em>, where you
schedule inter-communicating processes to run together, just like working
sets in paging. In addition, Medusa does not schedule out an activity
immediately when it starts waiting, and instead spin-waits for a short
while (<em>pause time</em>), in the hope that the wait is
short (shorter than a context switch).</p><p>Utilities store
information for an activity alongside the activity, instead of storing it
on the utility’s processor. This way, if a utility fails, another
utility can come in, read the information, and carry on the work. The
utility <em>seals</em> the information stored with the
activity, so user programs can’t meddle with it. Only other utilities can
unseal and use that information. Implementation-wise, unsealing means
mapping the kernel object into the XDL of the processor running the
utility; sealing it means removing it from the
XDL.</p><p>Medusa’s kernel also provides some handy utilities
like the exception reporter and a debugger/tracer. When an exception
occurs, the kernel on the processor sends exception data to the reporter,
which sends that information to other activities (<em>buddy
activities</em>) to handle. And you can use the debugger/tracer to
online-debug programs. <a id="footref:common-lisp"
class="footref-anchor obviously-a-link" aria-label="Jump to footnote"
href="#footdef%3Acommon-lisp">Must be nice if the kernel drops you
into a debugger when your program segfaults, no?<sup
class="inline-footref">6</sup></a> I feel that Ken
Thompson being too good a programmer negatively impacted the capability
of computing devices we have today. If he weren’t that good, perhaps they
would have added a kernel debugger to UNIX ;-)</p><div
id="footdef:common-lisp" class="footdef"><div class="def-footref
obviously-a-link"><a aria-label="Jump back to main text"
href="#footref%3Acommon-lisp">6</a></div><div
class="def-footdef">Common Lisp can do that, just
sayin.</div></div><h2 id="Pilot"
class="section">Pilot</h2><p><em>Pilot: An Operating
System for a Personal Computer</em>, <span
class="oldstyle-num">1980</span>.</p><p>A system
developed by Xerox PARC for their personal workstations. Since it is
intended for personal computing, they made some interesting design
choices. The kernel doesn’t worry about fairness in allocating resources,
and can take advice from userspace. For example, userspace programs can
mark some process as high priority for <a
id="footref:pilot-scheduling" class="footref-anchor obviously-a-link"
aria-label="Jump to footnote"
href="#footdef%3Apilot-scheduling">scheduling<sup
class="inline-footref">7</sup></a>, or pin some pages in
memory so they’re never swapped out. (These are just examples; I don’t
know for sure if you can do these things in Pilot.)</p><div
id="footdef:pilot-scheduling" class="footdef"><div
class="def-footref obviously-a-link"><a aria-label="Jump back to
main text"
href="#footref%3Apilot-scheduling">7</a></div><div
class="def-footdef">Recently we have started to see big/small cores in Apple
M1 and Intel 12th gen, and “quality of service” in
macOS.</div></div><p>Pilot uses the same language,
Mesa, for the operating system and user programs. As a result, the OS and user
programs are tightly coupled.</p><p>Pilot provides defense
(against errors) but not <a id="footref:absolute-protection"
class="footref-anchor obviously-a-link" aria-label="Jump to footnote"
href="#footdef%3Aabsolute-protection">absolute protection<sup
class="inline-footref">8</sup></a>. And protection is
language-based, provided by (and only by) type-checking in
Mesa.</p><div id="footdef:absolute-protection"
class="footdef"><div class="def-footref obviously-a-link"><a
aria-label="Jump back to main text"
href="#footref%3Aabsolute-protection">8</a></div><div
class="def-footdef">This was before the Internet, and malicious programs
weren’t a thing yet, I think?</div></div><p>Lastly, Pilot
has integrated support for networks. It is designed to be used in a
network (of Pilots). In fact, the first distributed email system was
created on Pilot.</p><p>The device on which Pilot runs is
also worth noting. ’Twas a powerful machine, with high-resolution bitmap
display, keyboard, and a “pointing device”. Xerox PARC basically invented
the personal computer, plus the GUI and mouse.</p><p>The filesystem
is flat (no directory hierarchy), though higher-level software is free
to implement additional structure. Files are accessed by mapping their
pages (blocks) into virtual memory. Files and volumes (devices) are named
by a 64-bit unique id (uid), which means files created anywhere anytime
can be uniquely identified across different machines (and thus across the
network). They used a classic trick, unique serial number plus real-time
clock, to guarantee uniqueness.</p><p>A file can be marked
immutable. An immutable file can’t be modified ever again, and can be
shared across machines without changing its uid. This is useful for, eg,
sharing programs.</p><h2 id="Monitor"
class="section">Monitor</h2><p><em>Monitors: An
Operating System Structuring Concept</em>, <span
class="oldstyle-num">1974</span>, by C. A. R.
Hoare.</p><p><em>Experience with Processes and Monitors
in Mesa</em>, <span
class="oldstyle-num">1980</span>.</p><p><em>Monitor</em>
is a synchronization concept. Think of it as a class that manages some
resource and synchronizes automatically. In C, you would manually create
a mutex and lock/unlock it; in Java, you just add the <code>synchronized</code>
keyword to a method and the runtime creates and manages the lock for you—that’s
a monitor.</p><p>The Hoare paper introduced the concept and
gave a bunch of examples. The Mesa paper describes how they implemented
and used monitors in Mesa. If you recall, Mesa is the system and
application language for Pilot.</p><p>Pilot uses monitors
provided by Mesa to implement synchronization in the kernel, another
example of the tight coupling of Pilot and Mesa.</p><p>I have
some notes on the differences between Mesa’s monitors and Hoare’s
monitors, but they aren’t very interesting. Basically Mesa folks needed
to figure out a lot of details for using monitors for Pilot, like nested
wait, creating monitor, handling exceptions in monitor, scheduling, class
level vs instance level, etc.</p><p>Pilot didn’t share
monitors between devices. If two devices with orders-of-magnitude
difference in processing speed share a monitor, the fast device could be
slowed down by waiting for the slower device to finish its critical
section.</p><h2 id="V%20Kernel" class="section">V
Kernel</h2><p><em>The Distributed V Kernel and its
Performance for Diskless Workstations</em>, <span
class="oldstyle-num">1983</span>.</p><p>Back in the
day, professors and their grad students worked together to build awesome
and cutting-edge systems, and journals invited them to write down their
thoughts and experiences. Papers we’ve read up to this point are mostly
describing the system the authors built, and sharing their experiences
and lessons learned.</p><p>This paper is a bit different—it
presents performance measurements and uses them to argue a claim. You see,
the conventional approach to building a distributed workstation is to use a
small local disk for caching, and these systems usually use specialized
protocols. This paper tries to build a distributed workstation without
local disks (diskless), using only generic message-based IPC. The
authors argue that the overhead added by these two decisions is
acceptable.</p><p>The paper introduced the V message. It’s synchronous
(request and response), has a small message size (<span
class="oldstyle-num">32</span> bytes), and keeps control and
data messages separate. Though they also have a “control+data message”
(<code>ReplyWithSegment</code>), presumably to squeeze out
some performance.</p><p>They used various measures to reduce
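<p>A toy sketch of such a fixed-size message in Python—the field layout here (a 4-byte operation code, a 4-byte sender id, and 24 bytes of inline data) is my own invention, not the actual V format:</p>

```python
import struct

# Hypothetical layout for a fixed 32-byte V-style message:
# 4-byte operation code, 4-byte sender id, 24 bytes of inline data.
MSG_FORMAT = "<II24s"
MSG_SIZE = struct.calcsize(MSG_FORMAT)  # 32 bytes

def pack_message(op, sender, data=b""):
    assert len(data) <= 24, "inline data must fit in the fixed-size message"
    return struct.pack(MSG_FORMAT, op, sender, data.ljust(24, b"\x00"))

def unpack_message(raw):
    op, sender, data = struct.unpack(MSG_FORMAT, raw)
    return op, sender, data.rstrip(b"\x00")
```

<p>A fixed size keeps buffer management trivial on both ends—there is never a question of how much to allocate or receive.</p>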
the overhead. They put everything into the kernel, including the file
server. They didn’t use TCP but used Ethernet frames directly. There is
no separate ACK message; instead, ACK is implied by a
response.</p><p>The paper analyzed what the network penalty
consists of. When you send a message from one host to another, it goes
from RAM to the network interface, then it’s transferred on wire to the
destination interface, then it’s copied into RAM. Their argument is that
the message layer doesn’t add much overhead compared to the base network
penalty—copying between RAM and network interface, and waiting in the
interface before going onto the wire. They also argued that remote file
access adds little overhead compared to already-slow disk
access.</p><p>Overall, their argument does have some cracks.
For example, they argue that there is no need for a specialized message
protocol, but their protocol ends up specialized. They also argued that
no streaming is needed, but large data packets are effectively
streaming.</p><h2 id="Sprite"
class="section">Sprite</h2><p><em>The Sprite Network
Operating System</em>, <span
class="oldstyle-num">1988</span>.</p><p>Sprite is
another distributed system. It tries to use a large memory cache to improve
file access, and to do it transparently, giving the user the illusion of a
local system. It also has a very cool process migration feature. Sadly,
process migration never caught on in the
industry.</p><p>Several trends at the time influenced
Sprite’s design. Distributed systems were popular (at least in academia);
memories were getting larger and larger; and more and more systems were
featuring multiple processors.</p><p>To present the illusion
of a local file system, Sprite uses <em>prefix tables</em>.
Here, prefix means path prefix. When userspace accesses a file, the
kernel looks for a prefix of the path that’s in the prefix table. In the
prefix table, the prefix can either point to the local filesystem or a
remote filesystem. If it points to a remote filesystem, the kernel makes
RPC calls to the remote host, which then accesses the local filesystem of
that remote host.</p><p>Prefix tables aren’t only useful for
distributed systems. In general, an OS that uses file paths usually caches the
file paths it reads in a prefix table, because resolving a file path is
very slow. When the OS resolves a file path, it needs to read each
directory in the path to find the next directory.</p><p>With
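<p>A minimal sketch of a prefix-table lookup in Python (the table contents and server names are made up):</p>

```python
# A toy prefix table mapping path prefixes to servers; "local" marks
# the local filesystem. The longest matching prefix wins, as in Sprite.
PREFIX_TABLE = {
    "/": "local",
    "/users": "server-a",
    "/users/alice": "server-b",
}

def resolve(path):
    """Return (server, remainder) for the longest prefix of `path`
    present in the table."""
    best = "/"
    for prefix in PREFIX_TABLE:
        if (path == prefix or path.startswith(prefix.rstrip("/") + "/")) \
                and len(prefix) > len(best):
            best = prefix
    remainder = path[len(best):] or "/"
    return PREFIX_TABLE[best], remainder
```

<p>For example, <code>resolve("/users/alice/notes.txt")</code> matches the <code>/users/alice</code> entry rather than <code>/users</code>, so the request goes to the server holding that subtree.</p>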
cache, the biggest issue is consistency: if two clients get a file and
stored it in their cache, and both write to their cache, you have a
problem. Sprite’s solution is to allow only one writer at a time and
track the current writer of every file. When a client needs to read a
file, it finds the current writer and requests the file from it. This is
<em>sequential write-sharing</em>.</p><p>If
multiple clients need to write the same file (<em>concurrent
write-sharing</em>), Sprite just turns off caching. This is rare
enough that it’s not worth complicating the system for. (And you probably need a
substantially more complicated system to handle this.)</p><h2
id="Grapevine"
class="section">Grapevine</h2><p><em>Experience with
Grapevine: The Growth of a Distributed System</em>, <span
class="oldstyle-num">1984</span>.</p><p>A classic
paper in distributed systems, even considered the MULTICS of distributed
systems by some. Grapevine is a distributed email delivery and management
system; it provides message delivery, naming, authentication, resource
location, access control—you name it.</p><p>The main takeaway
is the experience they got from running Grapevine. To support scaling,
the cost of any computation/operation should not grow as the size of the
system grows. But on the other hand, sometimes you can afford to have
complete information—maybe that information can never get too large,
regardless of how large the system grows.</p><p>Grapevine
generally tries to hide the distributed nature of the system, but that
caused some problems. First of all, they can’t really hide everything:
updates in the system take time to propagate, and sometimes users get
duplicated messages, all of which are confusing for someone accustomed to
the mail service on time-sharing systems.</p><p>More
importantly, users sometimes need to know more about the
underlying system to understand what’s going on: when stuff doesn’t work,
people want to know why. For example, removing an inbox is an expensive
operation, and removing a lot of them at the same time could overload the
system. System administrators need to understand this, and to understand
this they need to understand roughly how the system works under the
hood.</p><p>The lesson is, complete transparency is usually
not possible, and often not what you want anyway. When you design a
system, it is important to decide what to make transparent and what not
to.</p><p>Finally, the paper mentioned some considerations
about managing the system. Maintaining a geographically dispersed system
involves on-site operators and system experts. On-site operators carry
out operations on-site, but have little to no understanding of the
underlying system. System experts have deep understanding of the system,
but are in short supply and are almost always remote from the servers
they need to work on. Grapevine has remote monitoring and debugging
features to help an expert to diagnose and repair a server
remotely.</p><figure><img alt="The system structure of
Grapevine."
src="https://archive.casouri.cc/note/2024/cse221/grapevine.jpg"/>
<figcaption>The system structure of
Grapevine.</figcaption></figure><h2 id="Global%20memory"
class="section">Global
memory</h2><p><em>Implementing Global Memory Management
in a Workstation Cluster</em>, <span
class="oldstyle-num">1995</span>.</p><p>This paper
is purely academic, but pretty cool nonetheless. They built a cluster
that shares physical memory at a very low level, below VM, paging,
file-mapping, etc. This allows the system to utilize the physical memory
much better and allows more file-caching. More file caching is nice
because CPUs were becoming much faster than the
disk.</p><p>Each node in the cluster divides its memory
into local memory and global memory. Local memory stores pages requested
by local processes; global memory stores pages on behalf of other nodes
in the cluster.</p><p>When a fault occurs on a node P, one of
four things could happen.</p><ol><li>If the requested
page is in the global memory of another node Q, P uses a random page in
its global memory to trade the desired page with Q. (See illustration 1.)
</li><li>If the requested page is in the global memory of
another node Q, but P doesn’t have any page in its global memory, P uses
the least-recently used (LRU) local page to trade with
Q.</li><li>If the requested page is on local disk, P reads it
into its local memory, and evicts the oldest page in the <em>entire
cluster</em> to make room for the new page. If the oldest page is
on P, evict that; if the oldest page is on a node Q, evict the page on Q,
and send a page of P to Q. This page is either a random global page on P,
or the LRU local page of P if P has no global pages. (See illustration
2.)</li><li>If the requested page is a local page of another
node Q, duplicate that page into the local memory of P, and evict the
oldest page in the entire cluster. Again, if the oldest page is on
another node R, send one of P’s global pages or P’s LRU page to trade
with R.</li></ol><figure><img alt="Illustration of
page exchange in case 1."
src="https://archive.casouri.cc/note/2024/cse221/global-memory-1.jpg"/>
<figcaption>Illustration 1: Page exchange in case
1.</figcaption></figure><figure><img
alt="Illustration of page exchange in case 3."
src="https://archive.casouri.cc/note/2024/cse221/global-memory-2.jpg"/>
<figcaption>Illustration 2: Page exchange in case
3.</figcaption></figure><p>This whole dance can improve
performance of memory-intensive tasks because fetching a page from remote
memory is about two to ten times faster than disk access. However, local
hit is over three orders of magnitude faster than fetching remote memory, so the
algorithm has to be very careful not to evict the wrong
page.</p><p>The description above omits a crucial problem:
how does memory management code running on each node know which page is
the oldest page in the entire cluster?</p><p>Consider the
naive solution, where the system is managed by a single entity, a central
controller. The controller keeps track of every single page’s age and
tells each node which page to evict. Of course, this is impractical
because it’s way too slow: the controller would have to run at a much
faster speed than the other nodes, and the communication between
nodes would have to be very fast.</p><p>Instead, each node must make
local independent decisions that combine to achieve a global goal (evict
the oldest page). The difficulty is that local nodes usually don’t have
complete, up-to-date information.</p><p>A beautiful approach
to this kind of problem is a probability-based algorithm. We don’t aim to
make the optimal decision for every single case, but use probability to
approximate the optimal outcome.</p><p>We divide time into
epochs; in each epoch, the cluster expects to replace the
<em>m</em> oldest pages. (<em>m</em> is predicted
from data from previous epochs.) At the beginning of each epoch, every
node sends a summary of its pages and their age to an <em>initiator
node</em> (central controller). The initiator node sorts all the
pages by their age, and finds the set of <em>m</em> oldest
pages in the cluster (call it <em>W</em>). Then, it assigns
each node <em>i</em> a weight
<em>w<sub>i</sub></em>, where
<em>w<sub>i</sub></em>
is</p><p><img alt="A math expression: the number of old
pages in W that are in node i, divided by the size of W."
src="https://archive.casouri.cc/note/2024/cse221/global-memory-frac.png"/></p><p>Basically,
<em>w<sub>i</sub></em> means “among the
<em>m</em> oldest pages in the cluster, how many of them are
in node <em>i</em>”.</p><p>The initiator node
tells every node all the weights, and when a node P encounters case
3 or 4 and wants to evict “the oldest page in the cluster”, it randomly
picks a node by each node’s weight, and tells that node to evict its
oldest page.</p><p>That takes care of finding which node to
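<p>The epoch scheme can be sketched in Python (a simplified model: ages are plain numbers, and larger means older):</p>

```python
import random
from collections import Counter

def epoch_weights(page_ages, m):
    """page_ages maps node -> list of page ages. Find the m oldest pages
    cluster-wide (the set W) and weight each node by its share of W."""
    all_pages = sorted(
        ((age, node) for node, ages in page_ages.items() for age in ages),
        reverse=True)  # oldest (largest age) first
    owners = [node for _, node in all_pages[:m]]
    counts = Counter(owners)
    return {node: counts[node] / m for node in page_ages}

def pick_eviction_node(weights):
    """Pick a victim node at random, biased by the epoch weights, so
    evictions approximate global-LRU without global coordination."""
    nodes = list(weights)
    return random.choices(nodes, weights=[weights[n] for n in nodes])[0]
```

<p>A node holding two thirds of the epoch’s oldest pages receives two thirds of the evictions in expectation, which is exactly the approximation the paper is after.</p>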
evict pages from, but tracking page age isn’t easy either. For one, in an
mmapped file, memory access bypasses the pagefault handler and goes straight
to the TLB. More importantly, the OS uses FIFO second-chance page caching
and hides many page requests/evictions from their memory manager, because
the memory manager runs at a lower level (presumably in pagefault
handlers).</p><p>The authors resorted to hacking the TLB
handler of the machine with PALcode (microcode). This would’ve been
impossible on x86—its TLB is handled purely in
hardware.</p><p>Probability-based algorithms sometimes feel
outright magical—they seem to just bypass trade-offs. In reality, they
usually just add a new dimension to the trade-off. We’ll see this again
later in lottery scheduling.</p><h2 id="%CE%BC-kernel"
class="section">μ-kernel</h2><p><em>The Performance
of μ-Kernel-Based Systems</em>, <span
class="oldstyle-num">1997</span>.</p><p>This paper
is another measurement paper. It uses benchmarks to argue that a)
microkernels can deliver comparable performance, and b) the performance doesn’t
depend on a particular hardware architecture.</p><p>The
authors built a micro kernel L4, and ported Linux to run on it (called
L⁴Linux). Then they ported L4 itself from Pentium to both Alpha and MIPS
architecture—to show that L4 is architecture-independent. They also
conducted some experiment to show L4’s extensibility and
performance.</p><p>The paper considers micro kernels like
Mach and Chorus to be first-generation, evolved out of earlier monolithic
kernels. It considers later kernels like L4 and QNX to be
second-generation, in that they are designed more rigorously from
scratch, ie, more “pure”.</p><p>L4 allows user programs to
control memory allocation like Nucleus did: the kernel manages top-level
tasks’ memory, and top-level tasks manage their children’s memory. And
scheduling? Hard priorities with round-robin scheduling per priority, not
unlike Nucleus.</p><p>L⁴Linux only modifies the
architecture-dependent part of Linux, leaving the machine-independent
parts of Linux untouched. The authors also restricted themselves to not making any
Linux-specific change to L4, as a test for the design of L4. The result
is not bad: in micro benchmarks, L⁴Linux is <span
class="oldstyle-num">2.4</span> times slower than native Linux;
in macro benchmarks, L⁴Linux is about <span
class="oldstyle-num">5–10%</span> slower than native Linux.
Moreover, L⁴Linux is much faster than running Linux on top of other micro
kernels, like MkLinux (Linux + Mach 3.0).</p><p>The paper
also mentions supporting tagged TLBs. A normal TLB needs to be flushed on
context switch, which is a big reason why context switches are expensive.
But if you tag each entry in the TLB to associate that entry
with a specific process, you wouldn’t need to flush the TLB anymore. The
downside is that a tagged TLB needs some form of software-managed TLB, so
not all architectures can support it. For example, x86 doesn’t support
software-managed TLB.</p><p>The benefit of micro kernels is
of course the extensibility. For example, when a page is swapped out,
instead of writing to disk, we can swap to a remote machine, or encrypt
the page and write to disk, or compress the page and write to disk. A
database program could bypass the filesystem and file cache, and control
the layout of data on physical disk for optimization; it can control
caching and keep pages in memory and not swapped
out.</p><p>All of these are very nice perks, and the
performance doesn’t seem too bad, so why did microkernels never catch on?
Here’s our professor’s take: big companies can just hire kernel
developers to <a id="footref:mu-kernel-linux" class="footref-anchor
obviously-a-link" aria-label="Jump to footnote"
href="#footdef%3Amu-kernel-linux">modify Linux to their need<sup
class="inline-footref">9</sup></a>; smaller companies
don’t have special requirements and can just use Linux. That leaves only
the companies in the middle: have special requirements, but don’t want to
modify Linux. (Professor’s take ends here.) However, extending a
microkernel is still work; it might be easier than modifying Linux, but how
much easier? Plus, since there are a lot of Linux kernel developers, perhaps
modifying Linux is easier after all.</p><div
id="footdef:mu-kernel-linux" class="footdef"><div
class="def-footref obviously-a-link"><a aria-label="Jump back to
main text"
href="#footref%3Amu-kernel-linux">9</a></div><div
class="def-footdef">And they did. Since the paper has been written,
Linux has gained many features of L4 described in the
paper.</div></div><p>If we look at “we need a custom
OS” scenarios today, the Nintendo Switch and PlayStation use modified BSD,
Steam Deck is built on top of Linux. And I’m sure most data centers run
some form of Linux.</p><p>Beyond monolithic and microkernel,
there are many other kernel designs: hybrid, exokernel, even virtual
machines. Hybrid kernels include Windows NT, NetWare, BeOS, etc. A hybrid
kernel leaves some modules in the kernel, like IPC, drivers, VM, and
scheduling, and puts others in userspace, like the
filesystem.</p><h2 id="Exokernel"
class="section">Exokernel</h2><p><em>Exokernel: An
Operating System Architecture for Application-Level Resource
Management</em>, <span
class="oldstyle-num">1997</span>.</p><p>The idea is
to go one step further than microkernels and turn the kernel into a
library. The kernel exposes hardware resources, provides multiplexing and
protection, but leaves management to the application. The motivation is
that traditional kernel abstraction hides key information and obstructs
application-specific optimizations.</p><p>This idea can be
nicely applied to single-purpose appliances, where the whole purpose of a
machine is to run a single application, eg, a database, a web server, or
an embedded program. In this case, things that a traditional kernel
provides like users, permissions, fairness, are all unnecessary overhead.
(<a
href="https://archive.casouri.cc/note/2024/cse221/https:/dl.acm.org/doi/10.1145/2490301.2451167">Unikernel</a>
explored exactly this use-case.)</p><p>Exokernel exports
hardware resources and protection, and leaves management to the
(untrusted) application. Applications can request resources and
handle events. Applications cooperatively share the limited resources
by participating in a <em>resource revocation</em> protocol.
Eg, the exokernel might tell an application to release some resources for
others to use. Finally, the exokernel can forcibly retract resources held
by uncooperative applications by the <em>abort
protocol</em>.</p><p>Exokernel doesn’t provide many of
the traditional abstractions, like VM or IPC, those are left for the
application to implement.</p><p>The protection provided by an
exokernel is inevitably weaker: an application error could corrupt
on-disk data; and because the kernel and application run in the same VM,
application error could corrupt kernel memory!</p><p>The
existence of abort protocol kind of breaks the “no management”
principle—retracting resources from an application
<em>is</em> management.</p><p>Finally, their
benchmark isn’t very convincing: there are only micro benchmarks and no
macro benchmarks; they only benchmarked mechanisms (context switch,
exception handler, etc) and have no benchmarks for
applications.</p><h2 id="Xen"
class="section">Xen</h2><p><em>Xen and the Art of
Virtualization</em>, <span
class="oldstyle-num">2003</span>.</p><p>Xen is a
virtual machine monitor (VMM), also called hypervisor—the thing that sits
between an OS and the hardware. The goal of Xen is to be able to run
hundreds of guest OS’s at the same time.</p><p>Xen provides a
virtual machine abstraction (<em>paravirtualization</em>)
rather than a full virtual hardware (<em>full
virtualization</em>). Paravirtualization has better performance and
gives the VMM more control, but requires modification to the guest OS. On
the other hand, a full-virtualization VMM, for example VMWare, can work
with unmodified guest OSes.</p><p>Nowadays there are a plethora
of virtual machine solutions, like VMWare, Hyper-V, VirtualBox, KVM, Xen.
On top of that, there are containers like LXC, docker, etc. The whole
stack contains OS, VMM/container engine, guest OS, and guest app. These
solutions all have different configurations: The VMM can sit on the host
OS or directly on the hardware; you can run one guest OS per app, or run
a single guest OS for multiple apps; on the old IBM and VMS systems, the
VMM supports both a batch processing OS and an interactive
OS.</p><p>Let’s look at how Xen virtualizes and how
it compares to VMWare.</p><p>Scheduling virtualization: Xen
uses the Borrowed Virtual Time (BVT) algorithm. This algorithm allows a
guest OS to borrow future execution time to respond to latency-critical
tasks.</p><p>Instruction virtualization: Boring instructions
like <code>add</code> can just pass through to the hardware,
but privileged instructions (like memory access) need intervention from
the monitor.</p><p>In Xen, the guest OS is modified so that
it is aware of the VMM, and instead of doing privileged tasks by itself,
the guest OS delegates the work to the VMM by
<em>hypercalls</em>. In VMWare, since they can’t modify the
guest OS, privileged instructions simply trap into VMM. If you remember,
we talked about rings in the MULTICS section. On <span
class="oldstyle-num">x86</span>, the CPU will trap if it’s asked
to execute a privileged instruction when in a low ring
level.</p><p>Memory virtualization: The guest OS isn’t
managing physical memory anymore, though we still call it physical
memory. The VMM has real access to the physical memory, often called machine
memory.</p><p>Then, how is the virtual memory address in the
guest OS translated into machine memory address?</p><p>In
Xen, the guest OS is aware of the virtualization. Its page table can map
directly from virtual address to machine address, and the MMU can just read
off of the guest OS’s page table. The VMM just needs to verify writes to the
page table to enforce protection.</p><p>In VMWare, however,
the guest OS is unaware of the VMM, and its page table maps from virtual
address to physical address. Also, the guest OS writes to its page table
without bothering to notify anyone. The VMM maintains a shadow page table
that maps virtual address to actual machine address. It also uses dirty
bits to make sure that whenever the guest OS writes to the page table, it is
notified and can update its shadow page table accordingly. (I forgot
exactly how.) And the MMU reads off the shadow page table. (Presumably by
trapping to the VMM when the guest OS tries to modify the <span
class="oldstyle-num">CR3</span> register, and letting the VMM point
<span class="oldstyle-num">CR3</span> to its shadow page
table?)</p><figure><img alt="Diagram illustrating Xen and
VMWare’s memory remapping approach."
src="https://archive.casouri.cc/note/2024/cse221/xen.jpg"/>
<figcaption>Illustration of Xen and VMWare’s memory
virtualization.</figcaption></figure><p>Note that
VMWare needs all these complications only because <span
class="oldstyle-num">x86</span>’s memory management is
completely hardware-based—the kernel can only point the MMU to the page
table and has no other control over the MMU. Other “higher-end”
architectures usually support software-managed and tagged
TLB.</p><p>A clever trick that Xen uses is <em>balloon
driver</em>. It’s a driver whose whole purpose is to take up memory.
When the VMM wants to retract memory from the guest OS, it enlarges the
“balloon”, so the guest OS relinquishes memory to the
host.</p><h2 id="VMS"
class="section">VMS</h2><p><em>Virtual Memory
Management in VAX/VMS</em>, <span
class="oldstyle-num">1982</span>.</p><p>This paper
mainly concerns the implementation of virtual memory for VMS. VMS
has to run on a variety of low-end hardware with small memory and slow
CPU; it also needs to support drastically different use-cases: real time,
timeshared, and batch. These requirements all affected the design of
VMS.</p><p>VMS’s virtual memory has three regions: program
region, control region and system region. The highest two bits of an
address indicate the region; after that are the regular stuff: <span
class="oldstyle-num">21</span> bits of virtual page number and
<span class="oldstyle-num">9</span> bits of byte offset. The
system region (think of it as kernel space) is shared by all processes;
program and control region are process-specific.</p><p>The
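<p>A quick sketch of the address decomposition in Python, assuming the 512-byte pages used by VMS (so 2 region bits + 21 VPN bits + 9 offset bits = 32 bits):</p>

```python
def split_vax_address(addr):
    """Split a 32-bit VAX virtual address into (region, vpn, offset),
    assuming 512-byte pages: 2 region bits, 21 VPN bits, 9 offset bits."""
    assert 0 <= addr < 2**32
    offset = addr & 0x1FF           # low 9 bits: byte within a 512-byte page
    vpn = (addr >> 9) & 0x1FFFFF    # next 21 bits: virtual page number
    region = addr >> 30             # top 2 bits select the region
    return region, vpn, offset
```

<p>So an address with the top bit set (region 2 or 3) falls in the shared system region, while regions 0 and 1 are per-process.</p>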
paper mentions a trick they used: they mark the first page in the VM as
no access, so that an uninitialized pointer (pointing to
<code>0x0</code>) causes an exception. I think Linux does the
same.</p><p>VMS uses a process-local page replacement policy.
When a process requests memory that needs to be paged in, the kernel
swaps out a page from this process’s resident set—the set of pages
currently used by that process. This way a heavily paging process can
only slow down itself.</p><p>When a page is removed from the
resident set, it doesn’t leave memory immediately; instead, it’s
appended to one of two lists. It goes to the free page list if it hasn’t
been modified; otherwise it goes to the modified page list. When the kernel
needs a fresh page to swap data in, it takes a page from the head of the
free list. When the kernel decides to write pages back to the paging file (swap
file), it takes the page from the head of the modified
list.</p><p>So a page is appended to the end of the list, and
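<p>The two-list scheme can be sketched in Python (a toy model where pages are just labels):</p>

```python
from collections import deque

free_list = deque()      # clean pages evicted from resident sets
modified_list = deque()  # dirty pages awaiting write-back

def evict_from_resident_set(page, dirty):
    """A page leaving a resident set joins one of two FIFO lists."""
    (modified_list if dirty else free_list).append(page)

def take_free_page():
    """The kernel reuses the clean page that has waited the longest."""
    return free_list.popleft()

def reclaim(page):
    """A page still on a list can be pulled back into its resident set
    (the 'second chance') before it is reused or written out."""
    for lst in (free_list, modified_list):
        if page in lst:
            lst.remove(page)
            return True
    return False
```
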
gradually moves to the head, until it’s consumed. But if the page is
requested again by the process while still in the list, it is pulled out
and put back into the process’s resident set. This is basically
second chance caching: we keep the page in the memory for a while before
really discarding it, in case it is used again
soon.</p><p>Because VMS uses a relatively small <a
id="footref:vms-page" class="footref-anchor obviously-a-link"
aria-label="Jump to footnote" href="#footdef%3Avms-page"><span
class="oldstyle-num">512</span> byte page size<sup
class="inline-footref">10</sup></a>, paging causes a lot of
I/O, which is obviously not good. To reduce the number of disk
operations, they try to read and write several pages at once (they call
this clustering).</p><div id="footdef:vms-page"
class="footdef"><div class="def-footref obviously-a-link"><a
aria-label="Jump back to main text"
href="#footref%3Avms-page">10</a></div><div
class="def-footdef">To be compatible with <span
class="oldstyle-num">PDP-11</span> and because of the promise of
low-latency semiconductor disk technologies (which obviously didn’t
materialize on time).</div></div><p>The paper also
mentions some other nice features, like on-demand zeroed pages and
copy-on-reference pages. An on-demand zeroed page is only allocated and
zeroed when it’s actually referenced. Similarly, a copy-on-reference page
is only copied when it’s actually referenced. I wonder why they didn’t
make it copy-on-write, though; they say it’s used for sharing executable
files.</p><p>Quiz time: does the kernel know about every memory
access?</p><p>…The answer is no. The kernel only gets to know
about memory use when there’s a pagefault, which runs the pagefault
handler provided by the kernel. If there’s no pagefault, memory access is
handled silently by the MMU.</p><h2 id="Mach"
class="section">Mach</h2><p><em>Machine-Independent
Virtual Memory Management for Paged Uniprocessor and Multiprocessor
Architectures</em>, <span
class="oldstyle-num">1987</span>.</p><p>Mach was a
popular research OS. In fact, our professor, Dr. Zhou, did her PhD on
Mach’s virtual memory. Mach actually influenced both Windows and Mac: one
of the prominent Mach researchers went to Microsoft and worked on Windows
NT, and Mac OS X was Mach plus BSD plus NeXTSTEP.</p><p>The
main topic of this paper is machine-independent VM. The idea is to treat
hardware information (machine-dependent, like TLB) as a cache of
machine-independent information.</p><p>Mach’s page table is a
sorted doubly linked list of <em>virtual regions</em>. Each
virtual region stores some machine-independent info like address range,
inheritance, protection, and some cache for the machine-dependent info.
The machine-dependent part is a cache because it can be re-constructed
from the machine-independent info. Also, since Mach uses a doubly linked
list, it can support sparse address spaces (VMS can’t).</p><p>Each
virtual region maps a virtual address range to a range in a
<em>memory object</em>. A memory object is an abstraction
over some data; it can be a piece of memory, secondary storage, or even
remote data, I think?</p><p>A memory object is associated
with a pager, which handles pagefault and page-out requests. This pager
is outside of the kernel and is customizable. And we can make it do
interesting things like encrypting memory, remote memory,
etc.</p><p>When performing a copy-on-write, Mach creates a
shadow memory object which only contains pages that have been modified.
Access to the unmodified page will be redirected to the original memory
object. Since shadow memory objects themselves can be shadowed,
large chains of shadow objects can build up. Mach has to
garbage collect intermediate shadow objects when the chain gets long.
Reading the paper, this seems to be tricky to implement and was quite an
annoyance to the designers.</p><p>When a task inherits memory
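<p>A toy Python model of shadow objects (the class and method names are mine, not Mach’s):</p>

```python
class MemoryObject:
    """Simplified memory object: maps page number -> data."""
    def __init__(self, pages=None, shadow_of=None):
        self.pages = dict(pages or {})
        self.backing = shadow_of  # the object this shadow was created over

    def read(self, page_num):
        """Walk the shadow chain: a shadow holds only modified pages;
        unmodified pages come from the object it shadows."""
        obj = self
        while obj is not None:
            if page_num in obj.pages:
                return obj.pages[page_num]
            obj = obj.backing
        raise KeyError(page_num)

    def write(self, page_num, data):
        self.pages[page_num] = data  # the modification stays in this shadow

    def copy_on_write(self):
        """A copy shares everything via a new, initially empty shadow."""
        return MemoryObject(shadow_of=self)
```

<p>Each <code>copy_on_write</code> lengthens the chain that <code>read</code> must walk, which is exactly why Mach needs to garbage-collect intermediate shadows.</p>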
from its parent task, the parent can set the inheritance flag of any page
to either <em>shared</em> (read-write),
<em>copy</em> (copy-on-write), or <em>none</em>
(no access). To me, this would be very helpful for
sandboxing.</p><h2 id="FFS"
class="section">FFS</h2><p><em>A Fast File System
for UNIX</em>, <span
class="oldstyle-num">1984</span>.</p><p>This paper
literally describes a faster file system they implemented for UNIX. It
was widely adopted.</p><p>The authors identify a series of
shortcomings of the default file system of UNIX:</p><p>The
free list (a linked list of all free blocks) starts out ordered, but over
time becomes random, so when the file system allocates blocks for files,
those blocks are not physically contiguous but rather scattered
around.</p><p>The inodes are stored in one place, and the
data (blocks) another. File operations (list directory, open, read,
write) involve editing meta information interleaved with writing data,
causing long seeks between the inodes and the
blocks.</p><p>The default block size of 512 bytes is too
small and creates indirection and fragmentation. A smaller block size also
means it takes more disk transactions to transfer the same amount of
data.</p><p>With all these combined, the default file system
can only produce <span class="oldstyle-num">2%</span> of the
full bandwidth.</p><p>FFS improves performance by creating
locality as much as possible. It divides a disk partition into
<em>cylinder groups</em>. Each cylinder group has its own
copy of the superblock, its own inodes, and a free list implemented with
a bitmap. This way inodes and data blocks are reasonably close to each
other. Each cylinder group has a fixed number of inodes.</p><p>FFS
uses a smart allocation policy for allocating blocks for files and
directories. It tries to place inodes of files in the same directory in
the same cylinder group; it places new directories in a cylinder group
that has more free inodes and fewer existing directories; it tries to
place all the data blocks of a file in the same cylinder group.
Basically, anything that improves locality.</p><p>FFS uses a
larger block size since 512 bytes is too small. But larger block size
wastes space—most UNIX systems are composed of many small files that
would be smaller than a larger block size. FFS allows a block to be
split into <em>fragments</em>. A block can be broken into
2, 4, or 8 fragments. At the end, the author claims that FFS with
4096-byte blocks and 512-byte fragments has about the same disk
utilization as the old 512-byte block file system.</p><p>FFS
requires a certain percentage of free space to maintain its performance. When
the disk is too full, it’s hard for FFS to keep the blocks of a file
localized. FFS performs best with around <span
class="oldstyle-num">10%</span> free space. This applies to
most modern filesystems too.</p><p>To maximally optimize the
file system, FFS is parameterized so it can be tuned according to the
physical property of the disk (number of blocks on a track, spin speed),
processor speed (speed of interrupt and disk transfer),
etc.</p><p>Here’s one example of how this information can
improve performance. Two physically consecutive blocks on the disk can’t
be read consecutively, because it takes some time for the processor to
process the data after reading a block. FFS can calculate the number of
blocks to skip according to the processor speed and spin speed, such that
when the OS has finished reading one block, the next block of the file comes
into position right under the disk head.</p><h2 id="LFS"
class="section">LFS</h2><p><em>The Design and
Implementation of a Log-Structured File System</em>, <span
class="oldstyle-num">1991</span>.</p><p>When this
paper came out, it stirred quite some controversy over LFS vs extent-based
FFS. Compared to FFS, LFS has much faster writes, but slower reads,
and needs garbage collection.</p><p>The main idea is this:
since machines now have large RAMs, the file cache should ensure reads are
fast; so the filesystem should optimize for write speed. To optimize
write speed, we can buffer writes in the file cache and write them all at
once sequentially.</p><p>This approach solves several
shortcomings of FFS. In FFS, even though inodes are close to the data,
they are still separate and require seeking when writing. And the same
goes for directories and files. The typical workload of the filesystem
alternates between writing metadata and data, producing a lot of separate
small writes. Further, most of the files are small, so most writes are
really writing metadata. Writing metadata is much slower than writing
files, because the filesystem has to do synchronous write for metadata,
to ensure consistency in case of unexpected failure (power outage,
etc).</p><p>On the other hand, LFS treats the whole disk as
an append-only log. When writing a file, the filesystem just appends what
it wants to write to the end of the log, followed by the new inodes
pointing to the newly written blocks, followed by the new inode map
pointing to the newly written inodes. The inode map is additionally
copied in the memory for fast access.</p><p>To read, LFS
looks into the inode map (always at the end of the log), finds the
inodes, reads the inode to find the blocks, and pieces together the parts
it wants to read.</p><p>When LFS has used the entire disk up,
how does it keep appending new blocks? LFS divides the disk into
<em>segments</em>, each consisting of a number of blocks.
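</p><p>The append-and-read path described above can be sketched as a
toy model (our simplification, not LFS’s actual on-disk
format):</p><pre class="code-block">log = []            # the whole "disk", an append-only list
inode_map = {}      # in-memory copy: inode number -> log position

def append(record):
    log.append(record)
    return len(log) - 1               # position of the record in the log

def write_file(inum, blocks):
    positions = [append(("data", b)) for b in blocks]
    inode_pos = append(("inode", inum, positions))
    inode_map[inum] = inode_pos
    append(("inode_map", dict(inode_map)))   # newest map sits at the tail

def read_file(inum):
    _kind, _inum, positions = log[inode_map[inum]]
    return [log[p][1] for p in positions]

write_file(7, ["hello", "world"])
read_file(7)     # ['hello', 'world'], pieced back together from the log</pre><p>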
Some of the blocks are still being referenced (live blocks), some are
free to be reused (free blocks). LFS will regularly perform garbage
collection and create segments that only contain free blocks—during
garbage collection, LFS copies all the live blocks in a segment to the
end of the log, then this segment becomes a free segment. Finally, when
LFS needs to write new logs, it writes them in free
segments.</p><p>The challenge of garbage collection is to
choose the best segment to clean. The authors first tried to clean the
least utilized segment first, ie, clean the segment with the least amount of
live data. This didn’t go well, because segments don’t get cleaned until
they cross the threshold, and a lot of segments linger around the
threshold, never get cleaned, and hold up a lot of
space.</p><p>The authors found that it’s best to categorize
segments into hot and cold segments. Hot segments are the ones that are
actively updated, where blocks are actively marked free. Cleaning hot
segments isn’t very valuable, because even if we don’t clean it, more and
more of its blocks will become free by themselves. On the other hand,
cold segments are valuable to clean, since they are unlikely to free up
blocks by themselves.</p><p>The authors also describe some crash
recovery and checkpointing mechanisms in the paper.</p><h2
id="Soft%20update" class="section">Soft
update</h2><p><em>Soft Updates: A Solution to the
Metadata Update Problem in File Systems</em>, <span
class="oldstyle-num">2000</span>.</p><p>In LFS we
mentioned that metadata edits require synchronous writes. That’s because
you want to ensure the data on disk (or any persistent storage) is always
consistent. If the system writes only part of the data it wishes to
write, then crashes, the disk should be in a consistent or at least
recoverable state. For example, when adding a file to a directory, adding
the new inode must happen before adding the file entry to the
directory.</p><p>Folks have long sought to improve the
performance of metadata updates; this paper lists several existing
solutions.</p><dl><dt>Nonvolatile RAM
(NVRAM)</dt><dd>Use NVRAM to store metadata. Updating
metadata is as fast as accessing RAM, and it
persists.</dd><dt>Write-ahead logging</dt><dd>Ie,
journaling. The filesystem first logs the operation it’s about to perform,
then performs it. If a crash happens, the filesystem can recover using the
log.</dd><dt>Scheduler-enforced
ordering</dt><dd>Modify the disk request scheduler to enforce
the ordering of metadata writes. Meanwhile, the filesystem is free to edit
metadata asynchronously (since the disk request scheduler will take care
of it).</dd><dt>Interbuffer
dependencies</dt><dd>Use write cache, and let the cache
write-back code enforce metadata
ordering.</dd></dl><p>Soft update is similar to
“interbuffer dependencies”. It maintains a log of metadata updates,
tracks dependencies at a fine granularity (per field or pointer), and can
reorder operations to avoid circular dependencies. Then
it can group some updates together and make fewer writes.</p><h2
id="Rio" class="section">Rio</h2><p><em>The Rio File
Cache: Surviving Operating System Crashes</em>, <span
class="oldstyle-num">1996</span>.</p><p>The main
point of Rio (RAM/IO) is to make memory survive crashes; then the OS
doesn’t have to constantly write the cache to persistent
storage.</p><p>Power outages can be handled by a battery-backed
power supply that dumps memory to persistent storage when an outage
occurs. Alternatively, we can just use persistent memory. Then, during
reboot, the OS goes through the dumped memory file to recover data (file
cache). The authors call this “warm reboot”.</p><p>System
crash is the main challenge, because a kernel crash can corrupt the memory.
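</p><p>As a user-level analogy (not Rio’s kernel mechanism, which
protects pages inside the kernel), a read-only mapping turns a stray
write into a hard error instead of silent corruption:</p><pre class="code-block">import mmap

# a read-only anonymous mapping, standing in for a protected file cache
cache = mmap.mmap(-1, 4096, access=mmap.ACCESS_READ)
try:
    cache[0] = 1            # a buggy write...
    corrupted = True
except TypeError:
    corrupted = False       # ...is rejected rather than silently applied</pre><p>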
The authors argue that the reason why people consider persistent storage
to be reliable and memory to be unreliable is because of their interface:
writing to disk needs drivers and explicit procedures, etc, while writing
to memory only takes a <code>mov</code>
instruction.</p><p>Then, protecting the file cache is just a
matter of write-protecting the memory. And there are a myriad of
techniques for that already. For example, you can use the protection that
virtual memory already provides. Just turn off the write-permission bits
in the page table for file cache pages. However, some systems allow the
kernel to bypass virtual memory protection. The authors resorted to
disabling the processor’s ability to bypass the TLB. This is of course
architecture-dependent.</p><p>Another way is to install
checks for every kernel memory access, but that’s a heavy performance
penalty.</p><p>What’s more interesting is perhaps the
effect of having a reliable memory on the filesystem. First, you can turn
off synchronous writes (this is the motivation for this paper in the
first place). But also, since memory is now permanent, metadata updates
must be ordered, so that a crash in the middle of an operation doesn’t
create an inconsistent state.</p><p>Nowadays, persistent
memory is getting larger and cheaper to the point that it seems possible
to use it to improve IO performance in datacenters. Problem is, every
update has to be ordered, and you can’t control L1/2/3 cache. They can
decide to write to memory in a different order than you
intended.</p><p>Currently there are two approaches: treat the
persistent memory as a super fast SSD, and slap a filesystem on it, the
filesystem will take care of the dirty work. Others don’t want to pay for
the overhead of a filesystem, and want to use it as memory. To go this
route, the programmer has to deal with the complications of
consistency/ordering.</p><h2 id="Scheduler%20activation"
class="section">Scheduler
activation</h2><p><em>Scheduler Activations: Effective
Kernel Support for the User-level Management of Parallelism</em>,
<span
class="oldstyle-num">1991</span>.</p><p>Threading
can be implemented in either kernel or userspace. However, both have
their problems. If implemented in userspace, it has bad integration with
kernel scheduler—the userspace thread scheduler has no way to know when a
thread is going to run, and for how long. If implemented in kernel,
thread management now requires a context-switch into kernel, which is
very slow. Plus, like anything else that goes into the kernel, there won’t be
much customizability.</p><p>The authors present a new
abstraction as the solution—scheduler activation. The idea is to allow
more cooperation between kernel and userspace. The kernel allocates
processors, and notifies userspace when it gives processors or takes
processors away. The userspace decides what to run on the provided
processors. Finally, the userspace can request or relinquish
processors.</p><p>This way we get the best of both worlds:
the userspace thread scheduler has more information to make decisions,
and userspace can do its own scheduling, requiring fewer
context-switches.</p><p>When the kernel notifies userspace of a
change, it “activates” the userspace thread scheduler (that’s where the
name “scheduler activation” comes from). A scheduler activation is like
an empty kernel thread. When the kernel wants to notify userspace of
something, it creates a “scheduler activation”, assigns it a processor,
and runs userspace scheduler in this “scheduler activation”. The
userspace scheduler makes decisions based on the information the kernel
provides in the scheduler activation, then proceeds to run some thread on
this scheduler activation.</p><p>The difference between a
scheduler activation and normal kernel thread is that, when the kernel
stops a scheduler activation (maybe due to I/O), the kernel will create
another scheduler activation to notify the userspace that the other
scheduler activation has stopped; then the userspace scheduler can decide
which thread to run on this scheduler activation. When the original
scheduler activation is to be resumed (I/O completes), kernel blocks a
running scheduler activation and creates a new scheduler activation, and
lets the userspace scheduler decide which to run on this new scheduler
activation.</p><p>For normal kernel threads, the kernel stops
and resumes the thread without notifying userspace, and the kernel selects
what to run.</p><p>Critical sections (where the executing
program holds some locks) are a bit tricky in scheduler activation. If a
thread is blocked or preempted while in a critical section,
performance might take a hit (no one else can run), or a deadlock might
even appear. The solution is to let the thread run a little bit until it
exits the critical section.</p><p>Scheduler activation is
basically the N:M threading we’re taught in undergrad OS classes.
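</p><p>The upcall protocol above can be sketched as a toy simulation
(our own names, not the paper’s interface): the kernel never resumes a
thread on its own; every event becomes an upcall, and the user-level
scheduler picks what runs:</p><pre class="code-block">class UserScheduler:
    def __init__(self, threads):
        self.ready = list(threads)   # runnable user-level threads
        self.log = []

    # Upcalls made by the "kernel"; each runs on a fresh activation.
    def new_processor(self):
        self.log.append(("run", self.pick()))
    def activation_blocked(self, thread):
        # thread blocked in the kernel (e.g. I/O): run something else
        self.log.append(("run", self.pick()))
    def activation_unblocked(self, thread):
        # the blocked thread may resume; user space decides when
        self.ready.append(thread)
        self.log.append(("run", self.pick()))

    def pick(self):
        return self.ready.pop(0) if self.ready else "idle"

sched = UserScheduler(["A", "B"])
sched.new_processor()            # kernel grants a processor: A runs
sched.activation_blocked("A")    # A blocks on I/O: B runs instead
sched.activation_unblocked("A")  # I/O done: user space runs A again</pre><p>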
Evidently it isn’t very widely used, maybe because the performance
improvement isn’t worth the complexity.</p><h2
id="Lottery%20scheduling" class="section">Lottery
scheduling</h2><p><em>Lottery Scheduling: Flexible
Proportional-Share Resource Management</em>, <span
class="oldstyle-num">1994</span>.</p><p>Lottery
scheduling is another probability-based algorithm; it uses a simple
mechanism to solve an otherwise difficult problem. <a
id="footref:google-random" class="footref-anchor obviously-a-link"
aria-label="Jump to footnote" href="#footdef%3Agoogle-random">I really
like probability-based algorithms in general.<sup
class="inline-footref">11</sup></a></p><div
id="footdef:google-random" class="footdef"><div class="def-footref
obviously-a-link"><a aria-label="Jump back to main text"
href="#footref%3Agoogle-random">11</a></div><div
class="def-footdef">Another great example is Google’s HyperLogLog:
<a
href="https://archive.casouri.cc/note/2024/cse221/https:/www.youtube.com/watch?v=lJYufx0bfpw"><em>A
problem so hard even Google relies on Random
Chance</em></a></div></div><p>Scheduling is
hard. There are so many requirements to consider: fairness, overhead,
starvation, priority, <a id="footref:priority-inversion"
class="footref-anchor obviously-a-link" aria-label="Jump to footnote"
href="#footdef%3Apriority-inversion">priority inversion<sup
class="inline-footref">12</sup></a>. However, lottery
scheduling seemingly can have its cake and eat it too, solving all of the
above simultaneously (with a catch, of course). Even better, lottery
scheduling allows flexible distribution of resources, while normal
priority-based schedulers only have coarse control over processes: <a
id="footref:fair-share" class="footref-anchor obviously-a-link"
aria-label="Jump to footnote" href="#footdef%3Afair-share">higher
priority always wins<sup
class="inline-footref">13</sup></a>. It’s also general
enough to apply to sharing other resources, like network bandwidth or
memory.</p><div id="footdef:priority-inversion"
class="footdef"><div class="def-footref obviously-a-link"><a
aria-label="Jump back to main text"
href="#footref%3Apriority-inversion">12</a></div><div
class="def-footdef">Priority inversion is when a preempted
lower-priority process/thread holds a lock which the higher priority
process/thread needs to acquire in order to progress. In this case, the
lower-priority process is blocking the higher priority process,
effectively inverting the priority.</div></div><div
id="footdef:fair-share" class="footdef"><div class="def-footref
obviously-a-link"><a aria-label="Jump back to main text"
href="#footref%3Afair-share">13</a></div><div
class="def-footdef">Some schedulers have static priorities, and some
allow dynamically adjusting priorities. And fair-share
schedulers need to monitor CPU usage over time and adjust priorities
accordingly.</div></div><p>Here’s how it works. Suppose
we have some processes and want to allocate some proportion of execution
time to each. We create <span class="oldstyle-num">100</span>
tickets, and assign each process tickets based on its allocated
proportion. Eg, if we want to allocate <span
class="oldstyle-num">30%</span> of the execution time to process
A, we assign it <span class="oldstyle-num">30</span>
tickets.</p><p>Then, we divide time into epochs. At the start
of each epoch, we randomly draw a ticket out of the <span
class="oldstyle-num">100</span>, and run the process that owns
this ticket. Over a period of time, the total execution time of each
process should match the assigned proportion.</p><p>Lottery
scheduling is probabilistically fair. The shorter the epoch, and the
longer the measured duration, the more accurate and fair the
scheduling. To ensure fairness, when a process wins the lottery and executes
in an epoch, only to be blocked by I/O midway, the scheduler gives
it more tickets in the next epoch to compensate.</p><p>Lottery
scheduling doesn’t have starvation. As long as a process has some tickets,
the probability of it getting executed is not
zero.</p><p>Lottery scheduling is very responsive to changes
in configuration, because any change in the allocation proportion is
immediately reflected in the next epoch. Some schedulers, like the
fair-share scheduler mentioned earlier, might take longer to adjust
priorities.</p><p>Lottery scheduling has very low overhead.
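</p><p>A single draw can be sketched in a few lines (a toy version;
the paper walks a ticket-sorted list instead):</p><pre class="code-block">import random

tickets = {"A": 30, "B": 50, "C": 20}    # 100 tickets total

def draw(rng=random):
    procs = list(tickets)
    # the winner is picked in proportion to the tickets it holds
    return rng.choices(procs, weights=[tickets[p] for p in procs])[0]

wins = {p: 0 for p in tickets}
for _ in range(10_000):
    wins[draw()] += 1
# wins lands near a 30/50/20 split over many draws</pre><p>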
It just needs to generate a random number and find the process that owns
it. It takes <span class="oldstyle-num">~1000</span>
instructions to run scheduling; it takes <span
class="oldstyle-num">~10</span> for generating a random number,
and the rest for finding the process. The processes are stored in a
linked list, ordered by the number of tickets
held.</p><p>Lottery scheduling handles priority inversion by
allowing processes to transfer tickets to other processes. Traditional
schedulers would use priority inheritance: the higher priority process
elevates the lower priority process temporarily to execute and release
the lock that the higher priority process needs. It’s the same principle,
but instead of elevating priority, a process lends its
tickets.</p><p>Of course, there’s always a catch. Lottery
scheduling isn’t very good at immediate, strict control over resources.
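</p><p>The catch is easy to quantify (hypothetical numbers): even a
task holding 90 of the 100 tickets can lose several draws in a
row:</p><pre class="code-block">p = 0.9                  # task holds 90 of 100 tickets
k = 3                    # consecutive epochs it loses
print((1 - p) ** k)      # about 0.001: small, but never zero</pre><p>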
Eg, in a real-time system, a very high priority task has to be executed
immediately when it comes up. Lottery scheduling can’t run it immediately
(epoch), and it can’t guarantee to run it
(randomized).</p><p>Also, the simple lottery scheduling can’t
express response time (maybe something needs to run immediately but won’t
take a lot of CPU time). We can add another parameter to represent
response time, in addition to CPU time allocation. Not exactly sure how
that works though.</p><p>Nowadays, lottery scheduling isn’t
used so much for CPU scheduling, but is widely used in
networking.</p><h2 id="Epilogue"
class="section">Epilogue</h2><p>That was the last classic
paper. For the rest of the course, we went through some more recent
literature like Android, GFS, MapReduce, Haystack. Those are no less
filled with interesting ideas, but this article is already so long and I
want to stop here.</p><p>Incidentally, as I’m writing this,
there are only two days left in <span
class="oldstyle-num">2023</span>. Judging from the tags, I
started this article on February <span
class="oldstyle-num">15</span> this year. Back then I didn’t
know it would take a whole year to finish; halfway through I thought I’d
never finish it. But look where we are now! Persistence really does get
things done eventually.</p><p>I also started a programming
project at around the same time, and that project (after much
head-scratching and late-night typing) is also coming to fruition
around this time. Looking back, I can’t believe that I actually pulled
both of these off in <span class="oldstyle-num">2023</span>,
oh my!</p>Remap modifiers in Linux Desktop and Alacrittyurn:uuid:19af9006-b4d9-11ed-aec6-9fe7ebf475382023-02-24T22:53:00.00-05:00<p>I’m used to macOS’s key bindings, which means
for a desktop environment, I want three
things:</p><ol><li>Caps lock acts as
Control</li><li>System bindings are on the Command key (ie,
the Windows key), specifically, Command+C/V for
copy/paste</li><li>In the terminal emulator, Command+C/V
works as usual, and Ctrl+C/V sends respective control codes, as
usual</li></ol><p>I’m a simple man, and this is all I
want, but Thy Voice From Above hath spoken: <em>“thou shalt not have
comfort!!”</em></p><h2
id="Command+C/V%20for%20copy%20and%20paste"
class="section">Command+C/V for copy and
paste</h2><p>Remapping Caps lock to Control is easy and there
are plenty of tutorials online for it. However, there is
<em>absolutely no way</em> to change the default bindings of
copy/paste on a Linux desktop reliably, because there is simply no
unified configuration for the keybindings of copy & paste. Qt
supports rebinding copy & paste and Gtk straight up <a
id="footref:gtk" class="footref-anchor obviously-a-link" aria-label="Jump
to footnote" href="#footdef%3Agtk">doesn’t support it<sup
class="inline-footref">1</sup></a>. On top of that,
applications bind their own keys and completely disregard the toolkit’s
setting, except in some toolkit widgets they use; so you end up with
different bindings within the same application.</p><div id="footdef:gtk"
class="footdef"><div class="def-footref obviously-a-link"><a
aria-label="Jump back to main text"
href="#footref%3Agtk">1</a></div><div
class="def-footdef">Gtk 3 seems to support it through CSS themes,
which is removed in Gtk 4. Anyway, I never got it to
work.</div></div><p>The whole situation is pretty
laughable, but life must go on. There are things like <a
id="footref:xkeysnail" class="footref-anchor obviously-a-link"
aria-label="Jump to footnote"
href="#footdef%3Axkeysnail">xkeysnail<sup
class="inline-footref">2</sup></a> that literally
intercepts every keystroke you type and translates it into other keys
depending on the application currently in focus. It requires some
nontrivial configuration and may or may not work reliably on X11, <a
id="footref:wayland" class="footref-anchor obviously-a-link"
aria-label="Jump to footnote" href="#footdef%3Awayland">definitely
doesn’t work on Wayland<sup
class="inline-footref">3</sup></a>, and I don’t know how
I feel about a Python program running as root, intercepting and
translating every key I type. There are Rust alternatives, but I didn’t
have much luck with those either.</p><div id="footdef:xkeysnail"
class="footdef"><div class="def-footref obviously-a-link"><a
aria-label="Jump back to main text"
href="#footref%3Axkeysnail">2</a></div><div
class="def-footdef"><a
href="https://archive.casouri.cc/note/2023/alacritty-modifier/https:/github.com/mooz/xkeysnail">xkeysnail</a>.
There are also projects like <a
href="https://archive.casouri.cc/note/2023/alacritty-modifier/https:/github.com/rbreaves/kinto">kinto.sh</a>
that pre-configures it for you on both Linux and Windows. (On Windows it
uses AutoHotkey.)</div></div><div id="footdef:wayland"
class="footdef"><div class="def-footref obviously-a-link"><a
aria-label="Jump back to main text"
href="#footref%3Awayland">3</a></div><div
class="def-footdef">These types of programs use the X11 protocol, and
Wayland just doesn’t support one program intercepting and translating another
program’s input.</div></div><p>The real way, the only
good way, to do it is to just swap Control with Super (ie, Command) at
X11 level. (Wayland picks it up so it works on Wayland too, or so I’m
told). Since we also want to swap Caps lock and Control, we actually do a
three-way swap:</p><ul><li>Super →
Control</li><li>Control → Caps lock</li><li>Caps
lock → Super</li></ul><p>So now when you press
Command+C, the application gets Control+C.</p><p>To actually
swap the modifiers, we edit</p><p><span
class="mono">/usr/share/X11/xkb/keycodes/evdev</span></p><p>and
reboot—no adding commands to X init or some config file or some other
crap. You edit the file, reboot, and it works, and keeps working. I
learned this from a <a
href="https://archive.casouri.cc/note/2023/alacritty-modifier/https:/askubuntu.com/questions/929744/how-to-remap-key-in-ubuntu-17-10-wayland-up-key-to-shift">StackExchange
question</a>.</p><p>Below are the exact edits you need
to make in that file, and their effects:</p><p>To map Left
Control (keycode 37) to Caps lock:<br/>Change
<code><CAPS> = 66</code> to
<code><CAPS> = 37</code></p><p>To
map Left Super (keycode 133) to Control:<br/>Change
<code><LCTL> = 37</code> to
<code><LCTL> = 133</code></p><p>To
map Caps lock (keycode 66) to Left Super:<br/>Change
<code><LWIN> = 133</code> to
<code><LWIN> = 66</code></p><p>If
you use Emacs, you need to swap Super and Control back. Add this to your
<span class="mono">init.el</span>:</p><pre
class="code-block">(setq x-super-keysym 'ctrl)
(setq x-ctrl-keysym 'super)</pre><h2 id="Command+C/V%20in%20terminal"
class="section">Command+C/V in terminal</h2><p>Now
Command+C/V works in normal applications, but in terminal, Caps lock+C/V
(appears as Super+C/V) will not send control keys and Command+C/V
(appears as Control+C/V) will not do what you want—again, you need to
swap Super and Control back, as we did for Emacs.</p><p>I
looked at every terminal emulator on Linux, and <a
href="https://archive.casouri.cc/note/2023/alacritty-modifier/https:/github.com/alacritty/alacritty">Alacritty</a>
is the only one that allows remapping modifier keys, has sane
configuration so that I can actually configure the remap, and has sane
dependencies.</p><p>You want to remap all
Control+<em>x</em> keys to simply <em>x</em>,
except for Control+C/V/F, etc, which are bound to actions like Copy,
Paste, SearchForward. And you want to remap all
Super+<em>x</em> keys to Control+<em>x</em>. In
effect, you have:</p><ul><li>Command+C/V → Control+C/V
→ Copy/Paste</li><li>Caps lock+C/V → Super+C/V →
Control+C/V</li></ul><p>To do that, add this to the
beginning of <span
class="mono">~/.config/alacritty/alacritty.yml</span>:</p><pre
class="code-block">key_bindings:
  - { key: At, mods: Control, chars: "@" }
  - { key: A, mods: Control, chars: "a" }
  - { key: B, mods: Control, chars: "b" }
  - { key: C, mods: Control, action: Copy }
  - { key: D, mods: Control, chars: "d" }
  - { key: E, mods: Control, chars: "e" }
  - { key: F, mods: Control, action: SearchForward }
  - { key: F, mods: Control, mode: ~Search, action: SearchForward }
  - { key: F, mods: Control|Shift, action: SearchBackward }
  - { key: F, mods: Control|Shift, mode: ~Search, action: SearchBackward }
  - { key: G, mods: Control, chars: "g" }
  - { key: H, mods: Control, chars: "h" }
  - { key: I, mods: Control, chars: "i" }
  - { key: J, mods: Control, chars: "j" }
  - { key: K, mods: Control, chars: "k" }
  - { key: L, mods: Control, chars: "l" }
  - { key: M, mods: Control, chars: "m" }
  - { key: N, mods: Control, action: CreateNewWindow }
  - { key: O, mods: Control, chars: "o" }
  - { key: P, mods: Control, chars: "p" }
  - { key: Q, mods: Control, action: Quit }
  - { key: R, mods: Control, chars: "r" }
  - { key: S, mods: Control, chars: "s" }
  - { key: T, mods: Control, chars: "t" }
  - { key: U, mods: Control, chars: "u" }
  - { key: V, mods: Control, action: Paste }
  - { key: W, mods: Control, action: Quit }
  - { key: X, mods: Control, chars: "x" }
  - { key: Y, mods: Control, chars: "y" }
  - { key: Z, mods: Control, chars: "z" }
  - { key: LBracket, mods: Control, chars: "[" }
  - { key: Backslash, mods: Control, chars: "\\" }
  - { key: RBracket, mods: Control, chars: "]" }
  - { key: Grave, mods: Control, chars: "^" }
  - { key: Underline, mods: Control, chars: "_" }
  - { key: At, mods: Super, chars: "\x00" }
  - { key: A, mods: Super, chars: "\x01" }
  - { key: B, mods: Super, chars: "\x02" }
  - { key: C, mods: Super, chars: "\x03" }
  - { key: D, mods: Super, chars: "\x04" }
  - { key: E, mods: Super, chars: "\x05" }
  - { key: F, mods: Super, chars: "\x06" }
  - { key: G, mods: Super, chars: "\x07" }
  - { key: H, mods: Super, chars: "\x08" }
  - { key: I, mods: Super, chars: "\x09" }
  - { key: J, mods: Super, chars: "\x0a" }
  - { key: K, mods: Super, chars: "\x0b" }
  - { key: L, mods: Super, chars: "\x0c" }
  - { key: M, mods: Super, chars: "\x0d" }
  - { key: N, mods: Super, chars: "\x0e" }
  - { key: O, mods: Super, chars: "\x0f" }
  - { key: P, mods: Super, chars: "\x10" }
  - { key: Q, mods: Super, chars: "\x11" }
  - { key: R, mods: Super, chars: "\x12" }
  - { key: S, mods: Super, chars: "\x13" }
  - { key: T, mods: Super, chars: "\x14" }
  - { key: U, mods: Super, chars: "\x15" }
  - { key: V, mods: Super, chars: "\x16" }
  - { key: W, mods: Super, chars: "\x17" }
  - { key: X, mods: Super, chars: "\x18" }
  - { key: Y, mods: Super, chars: "\x19" }
  - { key: Z, mods: Super, chars: "\x1a" }
  - { key: LBracket, mods: Super, chars: "\x1b" }
  - { key: Backslash, mods: Super, chars: "\x1c" }
  - { key: RBracket, mods: Super, chars: "\x1d" }
  - { key: Grave, mods: Super, chars: "\x1e" }
  - { key: Underline, mods: Super, chars: "\x1f" }</pre><p>This configuration remaps <a id="footref:ascii"
class="footref-anchor obviously-a-link" aria-label="Jump to footnote"
href="#footdef%3Aascii">all possible modifier keybindings available in
a terminal environment<sup
class="inline-footref">4</sup></a>.</p><div
id="footdef:ascii" class="footdef"><div class="def-footref
obviously-a-link"><a aria-label="Jump back to main text"
href="#footref%3Aascii">4</a></div><div
class="def-footdef">See this <a
href="https://archive.casouri.cc/note/2023/alacritty-modifier/https:/www.physics.udel.edu/~watson/scen103/ascii.html">ASCII
table</a>.</div></div><h2 id="Conclusion"
class="section">Conclusion</h2><p>At this point you should
be able to copy & paste with Command+C/V in every application and
terminal, and use Caps lock as Control in Emacs and terminal,
<em>as it should be</em>.</p>Bonjour Crash Courseurn:uuid:4a6f4716-b355-11ed-8033-5f84d04364942023-02-23T00:37:00.00-05:00<p>Bonjour is Apple’s implementation of
zeroconf networking. With Bonjour, you can plug a printer into the
local network and expect it to show up on computers in the network,
without manually configuring anything. Linux’s implementation is
Avahi.</p><p>I recently needed to use Bonjour for some
project and read some documentation. This is an article summarizing some
concepts one needs to know in order to use a Bonjour library. This
article assumes some basic network knowledge (TCP/IP, DHCP, DNS,
multicast, unicast, network layers, etc).</p><p>Everything in
this article is based on Apple’s documentation at <a
href="https://archive.casouri.cc/note/2023/bonjour/https:/developer.apple.com/library/archive/documentation/Cocoa/Conceptual/NetServices/Introduction.html#%2F%2Fapple_ref%2Fdoc%2Fuid%2FTP40002445-SW1"><em>Bonjour
Overview</em></a>. (If you want to read it, I recommend
starting with the “Bonjour Operations”
section.)</p><p>Bonjour operates in the link-local network
and provides three operations:</p><ol><li>Registering
services</li><li>Discovering available
services</li><li>Resolving a service instance name to an
address and port</li></ol><h2
id="Registering%20a%20service" class="section">Registering a
service</h2><p>When registering/publishing a service, you (or
rather the library) create an mDNS (multicast DNS) responder with three
records: a service (SRV) record, a pointer (PTR) record, and a text (TXT)
record. The text record is for providing additional information and is
usually empty.</p><h3 id="Service%20records"
class="subsection">Service records</h3><p>A service record
maps the service name to the host name and the port of the service. It
uses a host name rather than an IP address so that the service can be on
multiple IP addresses at the same time (eg, on both IPv4 and
IPv6).</p><p>The full name of a service is made of three
parts, the <em>service instance name</em>, the
<em>service type</em>, and the <em>domain</em>,
in the form of:</p><pre class="code-block"><service instance name>.<service type>.<domain></pre><p>The
<em>service instance name</em> is a human-readable string
shown to end-users, encoded in <span
class="oldstyle-num">utf-8</span>, and can be up to <span
class="oldstyle-num">63</span> bytes long.</p><p>The
<em>service type</em> is made of the <em>application
protocol</em> and the <em>transport protocol</em>, in the
form of <code>_type._protocol.</code>. Eg,
<code>_ftp._tcp.</code>. The underscore prefix is to
distinguish from domain names. Bonjour basically uses the format
described in <a
href="https://archive.casouri.cc/note/2023/bonjour/https:/www.ietf.org/rfc/rfc2782.txt"><span
class="oldstyle-num">RFC
2782</span></a>.</p><p>Technically, both the type
and the protocol are standardized. If you want to add a service type, you
need to register it with <a
href="https://archive.casouri.cc/note/2023/bonjour/https:/www.iana.org/form/ports-services">IANA</a>.</p><p>The
<em>domain name</em> is just like an Internet domain name,
eg, <code>www.apple.com.</code>. In addition, there is a
pseudo domain, <code>local.</code>, which refers to the
link-local network. (So you have Bonjour to thank when you ssh to LAN
hosts with
<code><host>.local</code>.)</p><p>Service
instance name, service type, and domain name together make up the full
name of a service instance. For example,</p><pre
class="code-block">Alice’s Music library._music._tcp.local.</pre><h3 id="Pointer%20records"
class="subsection">Pointer records</h3><p>A pointer record
basically maps service types to full service names. Ie, it
maps</p><pre class="code-block"><service type>.<domain></pre><p>to</p><pre class="code-block"><service instance name>.<service type>.<domain></pre><p>This way you can
search for a type of service and get a list of available service
instances.</p><h3 id="Publishing%20(advertising)"
class="subsection">Publishing (advertising)</h3><p>When
publishing a service, a host will first make sure the intended service
instance name is not already taken by someone else, by broadcasting a request for
that service instance name: if there is a response, the name is taken. If
someone else has taken it, the host will append a number to the service
instance name and increment the number until it gets a name that no one
is using.</p><p>If you use a library, this part is taken care
of for you. But it’s good to know how Bonjour avoids name
conflicts.</p><h2 id="Discovering%20services"
class="section">Discovering services</h2><p>To discover
service instances, you first request PTR records by mDNS, and get back a
list of service instance names. …And that’s it. The host will save those
names, and resolve a service name into actual address and port every time
it needs to use the service.</p><h2
id="Resolving%20service%20names" class="section">Resolving service
names</h2><p>By the discovery step, we collected some service
instance names that are available for us in the local network. The next
step is to pick one, resolve it into an actual address and connect to
it.</p><p>The host will send out an mDNS request for the
service instance name, and get back a host name and a port. It then sends
out an mDNS request for the host name and gets back an IP address. Now it can
connect to the address on the port and start using the
service.</p>Tree-sitter Starter Guideurn:uuid:afb76ba2-8a39-11ed-998c-8f06c8e638dc2023-01-15T00:00:00.00-05:00<p>This guide gives you a starting point on
writing a tree-sitter major mode. Remember, don’t panic and check your
manuals!</p><h2 id="Build%20Emacs%20with%20tree-sitter"
class="section">Build Emacs with tree-sitter</h2><p>You
can either install tree-sitter by your package manager, or
from source:</p><pre class="code-block">git clone https://github.com/tree-sitter/tree-sitter.git
cd tree-sitter
make
make install</pre><p>To build and run Emacs 29:</p><pre
class="code-block">git clone https://git.savannah.gnu.org/git/emacs.git -b emacs-29
cd emacs
./autogen.sh
./configure
make
src/emacs</pre><p>Require the
tree-sitter package with <code>(require 'treesit)</code>.
Note that tree-sitter always appears as <code>treesit</code>
in symbols. Now check if Emacs is successfully built with tree-sitter
library by evaluating
<code>(treesit-available-p)</code>.</p><p>Tree-sitter
stuff in Emacs can be categorized into two parts: the tree-sitter API
itself, and integration with fontification, indentation, Imenu, etc. You
can use shortdoc to glance over all the tree-sitter API functions by
typing <code>M-x shortdoc RET treesit RET</code>. The
integrations are described in the rest of the post.</p><h2
id="Install%20language%20definitions" class="section">Install language
definitions</h2><p>Tree-sitter by itself doesn’t know how to
parse any particular language. It needs the language grammar (a dynamic
library) for a language to be able to parse it.</p><p>First,
find the repository for the language grammar, eg, <a
href="https://archive.casouri.cc/note/2023/tree-sitter-starter-guide/https:/github.com/tree-sitter/tree-sitter-python">tree-sitter-python</a>.
Take note of the Git clone URL of it, eg,
<code>https://github.com/tree-sitter/tree-sitter-python.git</code>.
Now check where the parser.c file is in that repository; usually it’s in
<code>src</code>.</p><p>Make sure you have Git and a C and
C++ compiler, then run the
<code>treesit-install-language-grammar</code> command. It will prompt
for the URL and the directory of parser.c; leave the other prompts at default
unless you know what you are doing.</p><p>You can also
manually clone the repository and compile it, and put the dynamic library
at a standard library location. Emacs will be able to find it. If you
wish to put it somewhere else, set
<code>treesit-extra-load-path</code> so Emacs can find
it.</p><h2 id="Tree-sitter%20major%20modes"
class="section">Tree-sitter major modes</h2><p>Tree-sitter
modes should be separate major modes, usually named
<code>xxx-ts-mode</code>. I know I said tree-sitter always
appears as <code>treesit</code> in symbols; this is the only
exception.</p><p>If the tree-sitter mode and the “native”
mode could share some setup code, you can create a “base mode”, which
only contains the common setup. For example, there is python-base-mode
(shared), and both python-mode (native) and python-ts-mode (tree-sitter)
derive from it.</p><p>In the tree-sitter mode, check if we
can use tree-sitter with <code>treesit-ready-p</code>, it
will emit a warning if tree-sitter is not ready (tree-sitter not built
with Emacs, can’t find the language grammar, buffer too large,
etc).</p><h2 id="Fontification"
class="section">Fontification</h2><p>Tree-sitter works
like this: It parses the buffer and produces a <a
href="https://archive.casouri.cc/note/2023/tree-sitter-starter-guide/https:/en.wikipedia.org/wiki/Parse_tree"><em>parse
tree</em></a>. You provide a query made of patterns and
capture names, tree-sitter finds the nodes that match these patterns, tags
the corresponding capture names onto the nodes, and returns them to you.
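</p><p>For example, here is a sketch of running a query by hand in a
buffer containing Python code, assuming the Python grammar is installed
(<code>treesit-parser-create</code>,
<code>treesit-parser-root-node</code>, and
<code>treesit-query-capture</code> are all part of the built-in treesit
API):</p><pre class="code-block">(let* ((parser (treesit-parser-create 'python))
       (root (treesit-parser-root-node parser)))
  ;; Capture the name node of every function definition;
  ;; returns a list of (name . NODE) pairs.
  (treesit-query-capture
   root '((function_definition name: (identifier) @name))))</pre><p>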
The query function returns a list of <code>(capture-name .
node)</code>.</p><p>For fontification, we simply use
face names as capture names. Each captured node is then fontified with
the face its capture name refers to.</p><p>The capture name could
also be a function, in which case <code>(NODE OVERRIDE START
END)</code> is passed to the function for fontification.
<code>START</code> and <code>END</code> are the
start and end of the region to be fontified. The function should only
fontify within that region. The function should also allow more optional
arguments with <code>&rest _</code>, for future
extensibility. For <code>OVERRIDE</code> check out the
docstring of
<code>treesit-font-lock-rules</code>.</p><h3
id="Query%20syntax" class="subsection">Query
syntax</h3><p>There are two types of nodes: “named nodes”,
like <code>(identifier)</code>,
<code>(function_definition)</code>, and “anonymous nodes”,
like <code>"return"</code>, <code>"def"</code>,
<code>"("</code>, <code>";"</code>. Parent-child
relationship is expressed as</p><pre
class="code-block">(parent (child) (child) (child (grand_child)))</pre><p>Eg, an argument list <code>(1,
"3", 1)</code> would be:</p><pre
class="code-block">(argument_list "(" (number) (string) (number) ")")</pre><p>Children could have field
names:</p><pre class="code-block">(function_definition name: (identifier) type: (identifier))</pre><p>To match any one in
the list:</p><pre class="code-block">["true" "false" "none"]</pre><p>Capture names can come after any node in the
pattern:</p><pre class="code-block">(parent (child) @child) @parent</pre><p>The query above captures both the parent and
the child.</p><p>The query below captures all the keywords
with capture name <code>"keyword"</code>:</p><pre class="code-block">["return" "continue" "break"] @keyword</pre><p>These are the most common constructs; check out the
full syntax in the manual: <a
href="https://archive.casouri.cc/note/2023/tree-sitter-starter-guide/html-manual/Pattern-Matching.html">Pattern
Matching</a>.</p><h3 id="Query%20references"
class="subsection">Query references</h3><p>But how does one
come up with the queries? Take Python for example: open any Python
source file and type <code>M-x treesit-explore-mode RET</code>.
You should see the parse tree in a separate window, automatically updated
as you select text or edit the buffer. Besides this, you can consult the
grammar of the language definition. For example, Python’s grammar file is
at</p><p><a
href="https://archive.casouri.cc/note/2023/tree-sitter-starter-guide/https:/github.com/tree-sitter/tree-sitter-python/blob/master/grammar.js">https://github.com/tree-sitter/tree-sitter-python/blob/master/grammar.js</a></p><p>Neovim
also has a bunch of <a
href="https://archive.casouri.cc/note/2023/tree-sitter-starter-guide/https:/github.com/nvim-treesitter/nvim-treesitter/tree/master/queries">queries
to reference from</a>.</p><p>The manual explains how to
read grammar files at the bottom of <a
href="https://archive.casouri.cc/note/2023/tree-sitter-starter-guide/html-manual/Language-Grammar.html">Language
Grammar</a>.</p><h3 id="Debugging%20queries"
class="subsection">Debugging queries</h3><p>If your query
has problems, use <code>treesit-query-validate</code> to
debug the query. It will pop up a buffer containing the query (in text
format) and mark the offending part in red. Set
<code>treesit--font-lock-verbose</code> to
<code>t</code> if you want the font-lock function to report
what it’s doing.</p><h3 id="Set%20up%20font-lock"
class="subsection">Set up font-lock</h3><p>To enable
tree-sitter font-lock, set
<code>treesit-font-lock-settings</code> and
<code>treesit-font-lock-feature-list</code> buffer-locally
and call <code>treesit-major-mode-setup</code>. For example,
see <code>python--treesit-settings</code> in python.el. Below
is a snippet of it.</p><p>Note that like the current
font-lock system, if the to-be-fontified region already has a face (ie,
an earlier match fontified part/all of the region), the new face is
discarded rather than applied. If you want later matches always override
earlier matches, use the <code>:override</code>
keyword.</p><p>Each rule should have a
<code>:feature</code>, like
<code>function-name</code>,
<code>string-interpolation</code>,
<code>builtin</code>, etc. This way users can enable/disable
each feature individually.</p><p>Read the manual section
<a
href="https://archive.casouri.cc/note/2023/tree-sitter-starter-guide/html-manual/Parser_002dbased-Font-Lock.html">Parser-based
Font-Lock</a> for more detail.</p><p>Example from
python.el:</p><pre class="code-block">(defvar python--treesit-settings
  (treesit-font-lock-rules
   :feature 'comment
   :language 'python
   '((comment) @font-lock-comment-face)

   :feature 'string
   :language 'python
   '((string) @python--treesit-fontify-string)

   :feature 'string-interpolation
   :language 'python
   :override t
   '((interpolation (identifier) @font-lock-variable-name-face))
   ...))</pre><p>In
<code>python-ts-mode</code>:</p><pre class="code-block">(treesit-parser-create 'python)
(setq-local treesit-font-lock-settings python--treesit-settings)
(setq-local treesit-font-lock-feature-list
            '(( comment definition)
              ( keyword string type)
              ( assignment builtin constant decorator escape-sequence
                number property string-interpolation)
              ( bracket delimiter function operator variable)))
...
(treesit-major-mode-setup)</pre><p>Concretely, something like
this:</p><pre class="code-block">(define-derived-mode python-ts-mode python-base-mode "Python"
  "Major mode for editing Python files, using tree-sitter library.

\\{python-ts-mode-map}"
  :syntax-table python-mode-syntax-table
  (when (treesit-ready-p 'python)
    (treesit-parser-create 'python)
    (setq-local treesit-font-lock-feature-list
                '(( comment definition)
                  ( keyword string type)
                  ( assignment builtin constant decorator escape-sequence
                    number property string-interpolation)
                  ( bracket delimiter function operator variable)))
    (setq-local treesit-font-lock-settings python--treesit-settings)
    (setq-local imenu-create-index-function
                #'python-imenu-treesit-create-index)
    (setq-local treesit-defun-type-regexp
                (rx (or "function" "class") "_definition"))
    (setq-local treesit-defun-name-function #'python--treesit-defun-name)
    (treesit-major-mode-setup)
    (when python-indent-guess-indent-offset
      (python-indent-guess-indent-offset))))</pre><h2 id="Indentation"
class="section">Indentation</h2><p>Indentation works like
this: We have a bunch of rules that look like</p><pre
class="code-block">(MATCHER ANCHOR OFFSET)</pre><p>When
indenting a line, let <code>NODE</code> be the node at
the beginning of the current line, we pass this node to the
<code>MATCHER</code> of each rule, one of them will match the
node (eg, “this node is a closing bracket!”). Then we pass the node to
the <code>ANCHOR</code>, which returns a point (eg, the
beginning of <code>NODE</code>’s parent). We find the column
number of that point (eg, 4), add <code>OFFSET</code> to it
(eg, 0), and that is the column we want to indent the current line to (4
+ 0 = 4).</p><p>Matchers and anchors are functions that take
<code>(NODE PARENT BOL &rest _)</code>. Matchers
return nil/non-nil for no match/match, and anchors return the anchor
point. An offset is usually a number or a variable, but it can also be a
function. Below are some convenient builtin matchers and
anchors.</p><p>For <code>MATCHER</code> we
have</p><pre class="code-block">(parent-is TYPE) => matches if PARENT’s type matches TYPE as regexp
(node-is TYPE)   => matches NODE’s type as regexp
(query QUERY)    => matches if querying PARENT with QUERY captures NODE
(match NODE-TYPE PARENT-TYPE NODE-FIELD NODE-INDEX-MIN NODE-INDEX-MAX)
  => checks everything; if an argument is nil, don’t match on it.
     Eg, (match nil TYPE) is the same as (parent-is TYPE)</pre><p>For <code>ANCHOR</code>
we have</p><pre class="code-block">first-sibling => start of the first sibling
parent        => start of parent
parent-bol    => BOL of the line parent is on
prev-sibling  => start of previous sibling
no-indent     => current position (don’t indent)
prev-line     => start of previous line</pre><p>There is
also a manual section for indent: <a
href="https://archive.casouri.cc/note/2023/tree-sitter-starter-guide/html-manual/Parser_002dbased-Indentation.html">Parser-based
Indentation</a>.</p><p>When writing indent rules, you
can use <code>treesit-check-indent</code> to check
if your indentation is correct. To debug what went wrong, set
<code>treesit--indent-verbose</code> to
<code>t</code>. Then when you indent, Emacs tells
you which rule is applied in the echo area.</p><p>Here is an
example:</p><pre class="code-block">(defvar typescript-mode-indent-rules
  (let ((offset 'typescript-indent-offset))
    `((typescript
       ;; This rule matches if node at point is ")", ANCHOR is the
       ;; parent node’s BOL, and offset is 0.
       ((node-is ")") parent-bol 0)
       ((node-is "]") parent-bol 0)
       ((node-is ">") parent-bol 0)
       ((node-is "\\.") parent-bol ,offset)
       ((parent-is "ternary_expression") parent-bol ,offset)
       ((parent-is "named_imports") parent-bol ,offset)
       ((parent-is "statement_block") parent-bol ,offset)
       ((parent-is "type_arguments") parent-bol ,offset)
       ((parent-is "variable_declarator") parent-bol ,offset)
       ((parent-is "arguments") parent-bol ,offset)
       ((parent-is "array") parent-bol ,offset)
       ((parent-is "formal_parameters") parent-bol ,offset)
       ((parent-is "template_substitution") parent-bol ,offset)
       ((parent-is "object_pattern") parent-bol ,offset)
       ((parent-is "object") parent-bol ,offset)
       ((parent-is "object_type") parent-bol ,offset)
       ((parent-is "enum_body") parent-bol ,offset)
       ((parent-is "arrow_function") parent-bol ,offset)
       ((parent-is "parenthesized_expression") parent-bol ,offset)
       ...))))</pre><p>Then you set
<code>treesit-simple-indent-rules</code> to your rules, and
call <code>treesit-major-mode-setup</code>.</p><h2
id="Imenu" class="section">Imenu</h2><p>Set
<code>treesit-simple-imenu-settings</code> and call
<code>treesit-major-mode-setup</code>.</p><h2
id="Navigation" class="section">Navigation</h2><p>Set
<code>treesit-defun-type-regexp</code>,
<code>treesit-defun-name-function</code>, and call
<code>treesit-major-mode-setup</code>.</p><h2
id="C-like%20languages" class="section">C-like
languages</h2><p>[Update: Common functions described in this
section have been moved from c-ts-mode.el to c-ts-common.el. I also made
some changes to the functions and variables
themselves.]</p><p>c-ts-common.el has some goodies for
handling indenting and filling block comments.</p><p>These
two rules should take care of indenting block comments.</p><pre
class="code-block">((and (parent-is "comment") c-ts-common-looking-at-star)
 c-ts-common-comment-start-after-first-star -1)
((parent-is "comment") prev-adaptive-prefix 0)</pre><p><code>standalone-parent</code> should
be enough for most of the cases where you want to "indent one level
further", for example, a statement inside a block. Normally
<code>standalone-parent</code> returns the parent’s start
position as the anchor, but if the parent doesn’t start on its own line,
it returns the parent’s parent instead, and so on and so forth. This
works pretty well in practice. For example, indentation rules for
statements and brackets would look like:</p><pre
class="code-block">;; Statements in {} block.
((parent-is "compound_statement") standalone-parent x-mode-indent-offset)
;; Closing bracket.
((node-is "}") standalone-parent x-mode-indent-offset)
;; Opening bracket.
((node-is "compound_statement") standalone-parent x-mode-indent-offset)</pre><p>You’ll need additional rules
for “brackless” if/for/while statements, eg</p><pre
class="code-block">if (true)
  return 0;
else
  return 1;</pre><p>You need rules like these:</p><pre
class="code-block">((parent-is "if_statement") standalone-parent
x-mode-indent-offset)</pre><p>Finally,
<code>c-ts-common-comment-setup</code> will set up comment
and filling for you.</p><h2 id="Multi-language%20modes"
class="section">Multi-language modes</h2><p>Refer to the
manual: <a
href="https://archive.casouri.cc/note/2023/tree-sitter-starter-guide/html-manual/Multiple-Languages.html">Multiple
Languages</a>.</p><h2 id="Common%20Tasks"
class="section">Common Tasks</h2><p><code>M-x
shortdoc RET treesit RET</code> will give you a complete
list.</p><p>How to...</p><p><b>Get the
buffer text corresponding to a node?</b></p><pre
class="code-block">(treesit-node-text node)</pre><p>Don’t
confuse this with
<code>treesit-node-string</code>.</p><p><b>Scan
the whole tree for stuff?</b></p><pre
class="code-block">(treesit-search-subtree)
(treesit-search-forward)
(treesit-induce-sparse-tree)</pre><p><b>Find/move to the
next node that...?</b></p><pre
class="code-block">(treesit-search-forward node ...)
(treesit-search-forward-goto node ...)</pre><p><b>Get
the root node?</b></p><pre
class="code-block">(treesit-buffer-root-node)</pre><p><b>Get
the node at point?</b></p><pre
class="code-block">(treesit-node-at (point))</pre>Tree-sitter in Emacs 29 and Beyondurn:uuid:fac62c4a-8599-11ed-a0db-5f97535421d32023-01-15T00:00:00.00-05:00<p>Emacs’ release branch is now on complete
feature freeze, meaning absolutely only bug fixes can happen on it. Now
is a good time to talk about the state of <a
href="https://archive.casouri.cc/note/2023/tree-sitter-in-emacs-29/https:/tree-sitter.github.io/tree-sitter">tree-sitter</a>
in Emacs: what you get in Emacs 29, what you don’t, and what will
happen going forward.</p><h2
id="What%E2%80%99s%20in%20Emacs%2029" class="section">What’s in Emacs
29</h2><p>From a pure user’s perspective, Emacs 29 just adds
some new built-in major modes which look more-or-less identical to the
old ones. There aren’t any flashy cool features either. That sounds
disappointing, but there is a lot of new stuff under the hood, a solid
base upon which exciting things can emerge.</p><p>If Emacs 29
is built with the tree-sitter library, you have access to most of the
functions in its C API, including creating parsers, parsing text,
retrieving nodes from the parse tree, finding the parent/child/sibling
node, pattern matching nodes with a DSL, etc. You also get a bunch of
convenient functions built upon the primitive functions, like searching
for a particular node in the parse tree, cherry picking nodes and
building a sparse tree out of the parse tree, getting the node at point,
etc. You can type <code>M-x shortdoc RET treesit RET</code>
to view a list of tree-sitter functions. And because it’s Emacs, there is
comprehensive manual coverage for everything you need to know. It’s in
“Section 37, Parsing Program Source” of Emacs Lisp Reference
Manual.</p><p>Emacs 29 has built-in tree-sitter major modes
for C, C++, C#, Java, Rust, Go, Python, JavaScript, TypeScript, JSON,
YAML, TOML, CSS, Bash, Dockerfile, and CMake. We tried to extend
existing modes with tree-sitter at first but it didn’t work out too well,
so now tree-sitter lives in separate major modes. The tree-sitter modes
are usually called <code>xxx-ts-mode</code>, like
<code>c-ts-mode</code> and
<code>python-ts-mode</code>. The simplest way to enable them
is to use <code>major-mode-remap-alist</code>. For
example,</p><pre class="code-block">(add-to-list 'major-mode-remap-alist
             '(c-mode . c-ts-mode))</pre><p>The
built-in tree-sitter major modes have support for font-lock (syntax
highlight), indentation, Imenu, which-func, and defun
navigation.</p><p>For major mode developers, Emacs 29
includes integration for these features for tree-sitter, so major modes
only need to supply language-specific information, and Emacs takes care
of plugging tree-sitter into font-lock, indent, Imenu,
etc.</p><h3 id="Fontification"
class="subsection">Fontification</h3><p>In tree-sitter
major modes, fontification is categorized into “features”, like
“builtin”, “function”, “variable”, “operator”, etc. You can choose what
“features” to enable for a mode. If you are feeling adventurous, it is
also possible to add your own fontification rules.</p><p>To
add/remove features for a major mode, use
<code>treesit-font-lock-recompute-features</code> in its mode
hook. For example,</p><pre class="code-block">(defun c-ts-mode-setup ()
  (treesit-font-lock-recompute-features
   '(function variable) '(definition)))
(add-hook 'c-ts-mode-hook #'c-ts-mode-setup)</pre><p>Features are grouped into
decoration levels, right now there are 4 levels and the default level is
3. If you want to program in skittles, set
<code>treesit-font-lock-level</code> to 4 ;-)</p><h3
id="Language%20grammars" class="subsection">Language
grammars</h3><p>Tree-sitter major modes need a corresponding
language grammar to work. These grammars come in the form of dynamic
libraries. Ideally the package manager will build them when building
Emacs, like with any other dynamic libraries. But they can’t cover every
language grammar out there, so you probably need to build them yourself
from time to time. Emacs has a command for it:
<code>treesit-install-language-grammar</code>. It asks you
for the Git repository and other stuff and builds the dynamic library.
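</p><p>Once a grammar is installed, you can verify that Emacs can load
it. A quick sketch (both functions are part of the built-in treesit
API):</p><pre class="code-block">(treesit-language-available-p 'python) ;; can the Python grammar be loaded?
(treesit-ready-p 'python t)            ;; also checks other conditions, quietly</pre><p>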
Third-party major modes can instruct their users to add the recipe for
building a language grammar like this:</p><pre
class="code-block">(add-to-list 'treesit-language-source-alist
             '(python "https://github.com/tree-sitter/tree-sitter-python.git"))</pre><p>Then
typing <code>M-x treesit-install-language-grammar RET
python</code> builds the language grammar without
user input.</p><h3 id="Other%20features"
class="subsection">Other features</h3><p>Things like
indentation, Imenu, navigation, etc, should just
work.</p><p>There is no code-folding, selection expansion,
or structural navigation (except for defun) in Emacs 29. Folding and
expansion should be trivial to implement in existing third-party
packages. Structural navigation needs careful design and nontrivial
changes to existing commands (ie, more work). So not in 29,
unfortunately.</p><h2 id="Future%20plans"
class="section">Future plans</h2><h3 id="Navigation"
class="subsection">Navigation</h3><p>The tree-sitter
integration is far from complete. As mentioned earlier, structural
navigation is still in the works. Right now Emacs allows you to define a
“thing” by a regexp that matches node types, plus optionally a filter
function that filters out nodes that match the regexp but aren’t really
the “thing”. Given the definition of a “thing”, Emacs has functions for
finding the “things” around point
(<code>treesit--things-around</code>), finding the “thing” at
point (<code>treesit--thing-at-point</code>), and navigating
around “things” (<code>treesit--navigate-thing</code>).
Besides moving around, these functions should also be useful for other
things like folding blocks. Beware that, as the double dash suggests,
these functions are experimental and could change.</p><p>I
also have an idea for “abstract list elements”. Basically an abstract
list element is anything repeatable in a grammar: defun, statement,
arguments in an argument list, etc. These things appear at every level of
the grammar and seem like a very good unit for
navigation.</p><h3 id="Context%20extraction"
class="subsection">Context extraction</h3><p>There is also
potential for language-agnostic “context extraction” (for the lack of a
better term) with tree-sitter. Right now we can get the name and span of
the defun at point, but it doesn’t have to stop there, we can also get
the parameter list, the type of the return value, the class/trait of the
function, etc. Because it’s language agnostic, any tool using this
feature will work on many languages all at once.</p><p>In
fact, you can already extract useful things, to some degree, with the
fontification queries written by major modes: using the query intended
for the <code>variable</code> feature, I can get all the
variable nodes in a given range.</p><p>There are some
unanswered questions though: (1) What would be the best function
interface and data structure for such a feature? Should it use a plist
like <code>(:name ... :params ...)</code>, or a cl-struct?
(2) If a language is different enough from the “common pattern”, how
useful does this feature remain? For example, there isn’t a clear
parameter list in Haskell, and there could be several defun bodies that
define the same function. (3) Is this feature genuinely useful, or is it
just something that looks cool? Only time and experiments can tell; I’m
looking forward to seeing what people will do with tree-sitter in the wild
:-)</p><h3 id="Major%20mode%20fallback"
class="subsection">Major mode fallback</h3><p>Right now
there is no automatic falling back from tree-sitter major modes to
“native” major modes when the tree-sitter library or language grammar is
missing. Doing it right requires some change to the auto-mode facility.
Hopefully we’ll see a good solution for it in Emacs 30. Right now, if you
need automatic fallback, try something like this:</p><pre
class="code-block">(define-derived-mode python-auto-mode prog-mode "Python Auto"
  "Automatically decide which Python mode to use."
  (if (treesit-ready-p 'python t)
      (python-ts-mode)
    (python-mode)))</pre><h3 id="Other%20plans"
class="subsection">Other plans</h3><p>Existing tree-sitter
major modes are pretty basic and don’t have many bells and whistles,
and I’m sure there are rough corners here and there. Of course, these
things will improve over time.</p><p>Tree-sitter is very
different and very new, and touches many parts of Emacs, so no one has
experience with it and no one knows exactly how it should look.
Emacs 29 will give us valuable experience and feedback, and we can make
it better and better in the future.</p><p>If you are
interested, get involved! Read <a
href="https://archive.casouri.cc/note/2023/tree-sitter-in-emacs-29/note/2020/contributing-to-emacs/index.html">Contributing
to Emacs</a> for some tips in getting involved with the Emacs
development. Read <a
href="https://archive.casouri.cc/note/2023/tree-sitter-in-emacs-29/note/2023/tree-sitter-starter-guide/index.html">Tree-sitter
Starter Guide</a> if you want to write a major mode using
tree-sitter. And of course, docstrings and the manual are always your
friends. If you have questions, you can ask on Reddit, or comment in this
post’s public inbox (see the footer).</p>This Site is Changing its Domainurn:uuid:ff14d65c-6726-11ed-b927-c76ea0aa44992022-11-18T00:00:00.00-05:00<p>Right now this site resides on <a
href="https://archive.casouri.cc/note/2022/domain-change/https:/archive.casouri.cat">archive.casouri.cat</a>,
I really love the .cat TLD. Alas, .cat was never meant for generic use
and my site doesn't comply with its requirements, which is to use and
promote Catalan language and culture. I don’t want to wake up one day
receiving a takedown notice, however slim the possibility is. Plus, the
longer this site uses this domain, the more backlinks to it, the harder
to move on.</p><p>Moving forward, this site will be on <a
href="https://archive.casouri.cc/note/2022/domain-change/https:/archive.casouri.cc">archive.casouri.cc</a>.
I'll keep the .cat domain around for a few years. In the meantime the
.cat domain will redirect to the .cc domain by 301 redirect. The whole
site is archived on the Wayback Machine. Hopefully someone in the future
clicking on my .cat link knows about Wayback Machine and can view the
page.</p><p>If you have a link to the .cat domain, you might
want to edit the link to point to the new domain. Sorry for the
inconvenience! Fortunately there are so few links to my site
:-)</p>NAT traversal: STUN, TURN, ICE, what do they actually do?urn:uuid:84e3d816-a8c6-11ec-bcbd-af8069bc83612022-03-20T20:26:00.00-05:00<p>When searching for NAT traversal I found
all these protocols, but no one could tell me what they essentially do to
traverse NAT (surely not by magic?). Turns out it’s conceptually very
simple.</p><p>What NAT traversal does is not really “punching
holes” in the NAT, or delivering messages through some tunnel, or some
demonic portal, but simply finding the public address:port that can
reach <em>me</em>.</p><p>If I’m behind a NAT or
even multiple NATs, my packets are relayed by these NATs and
appear on the public Internet at the outermost NAT’s address and port. And
reply packets going to that address:port are relayed back to me. So, in
some sense, I still have a public address:port that can reach me on the
public Internet. The purpose of NAT traversal is to find that public
address:port.</p><p>That’s basically what the initial/classic
STUN (<a
href="https://archive.casouri.cc/note/2022/nat-what-do-they-do/https:/datatracker.ietf.org/doc/html/rfc3489"><span
class="oldstyle-num">RFC 3489</span></a>) does. You send a
STUN server a message, the STUN server looks at the source IP address and
port of the IP packet, and replies that back to you. Voilà, you know your
public <code>address:port</code>!</p><p>Sometimes
having that address:port isn’t enough, because many NATs pose <a
id="footref:extra" class="footref-anchor obviously-a-link"
aria-label="Jump to footnote" href="#footdef%3Aextra">extra
restrictions<sup class="inline-footref">1</sup></a>.
Then we have to resort to having a public-visible relay server in the
middle, which is what TURN (<a
href="https://archive.casouri.cc/note/2022/nat-what-do-they-do/https:/datatracker.ietf.org/doc/html/rfc5766"><span
class="oldstyle-num">RFC 5766</span></a>)
does.</p><div id="footdef:extra" class="footdef"><div
class="def-footref obviously-a-link"><a aria-label="Jump back to
main text" href="#footref%3Aextra">1</a></div><div
class="def-footdef">Some NATs won’t let a packet from an external
host through if the host inside never sent a packet to that external host
before. There are many ways a NAT could make your life difficult, check
out “full cone”, “restricted cone”, “symmetric NAT”,
etc.</div></div><p>As time went by, STUN and TURN turned
out to be still not enough. For one, you can usually find multiple
addresses that could possibly work, but then which one to use? Eg, maybe a
host has an IP assigned by a VPN; if the other host is also in the VPN,
we should use this IP over the others; similarly, if the other host is in
the same LAN, we should use the local IP; even over NAT, there could be
multiple IPs that can reach us.</p><p>ICE fills that gap. It
gathers a bunch of <code>address:port</code>’s that possibly
work (through STUN messages with STUN servers), sorts them by
preference, and <a id="footref:trickle" class="footref-anchor
obviously-a-link" aria-label="Jump to footnote"
href="#footdef%3Atrickle">tries them one-by-one according to some
algorithm<sup class="inline-footref">2</sup></a>, and
reports to you the best one. If none works, it tries to establish a relay
through TURN.</p><div id="footdef:trickle"
class="footdef"><div class="def-footref obviously-a-link"><a
aria-label="Jump back to main text"
href="#footref%3Atrickle">2</a></div><div
class="def-footdef">Or even gather candidates and try them out at the
same time, instead of waiting for the full candidate list before trying each
out. This speeds up connection establishment and is called trickle
ICE.</div></div><p>And here is where the new STUN comes
in. People threw away the algorithm for finding
<code>address:port</code> in classic STUN, and kept and
extended the STUN message format. Now ICE runs a more thorough algorithm
that uses STUN messages to communicate with STUN servers. And the new
STUN (<a
href="https://archive.casouri.cc/note/2022/nat-what-do-they-do/https:/datatracker.ietf.org/doc/html/rfc5389"><span
class="oldstyle-num">RFC 5389</span></a>) just defines the
STUN message format. There is an even newer version (<a
href="https://archive.casouri.cc/note/2022/nat-what-do-they-do/https:/datatracker.ietf.org/doc/html/rfc8489"><span
class="oldstyle-num">RFC 8489</span></a>) that updated
<span class="oldstyle-num">RFC 5389</span> slightly, but with
no fundamental changes.</p><p>Similarly, TURN was updated in
<a
href="https://archive.casouri.cc/note/2022/nat-what-do-they-do/https:/datatracker.ietf.org/doc/html/rfc8656"><span
class="oldstyle-num">RFC 8656</span></a> and is now a
message protocol used by ICE rather than a standalone
solution.</p>Using Fontsets in Emacsurn:uuid:336b8c2c-4d8b-11ec-967b-17e83717b0eb2021-11-24T17:01:00.00-05:00<h2 id="Fontset"
class="section">Fontset?</h2><p>Fontset is a feature of
Emacs that allows you to bundle multiple fonts together and use them as a
single font, so that the bundle covers more characters than any single font
could. For example, you can combine a Latin font, a Greek font and a
Chinese font together.</p><p>With fontsets, we can use
different Unicode fonts for different faces. For example, serif Latin and
Chinese font for a “serif” face, and sans serif Latin and Chinese font
for a “sans” face. Without fontsets, we can only set different Latin
fonts to faces and use a single fall-back Chinese
font.</p><p><img class="half" alt="A graph showing
different fonts with different faces"
src="https://archive.casouri.cc/note/2021/fontset/fonts%20&%20faces.svg"/></p><h2
id="Create%20a%20fontset" class="section">Create a
fontset</h2><p>A fontset is recognized by its name. Each
fontset has two names, one short and one long. The short name looks like
<code>fontset-xxx</code>. The long name is an <a
href="https://archive.casouri.cc/note/2021/fontset/https:/wiki.archlinux.org/title/X_Logical_Font_Description">X
Logical Font Description</a> with the last two fields being
<code>fontset</code> and <code>xxx</code>. For
example,</p><pre class="code-block">-*-ibm plex mono-medium-*-*-*-13-*-*-*-*-*-fontset-my fontset</pre><p>Emacs comes with three fontsets by default:
<code>fontset-startup</code>,
<code>fontset-standard</code> and
<code>fontset-default</code>. We only care about
<code>fontset-default</code>; it is the ultimate fall-back
when Emacs cannot find a font to display a character. But more on that
later.</p><p>To create a fontset, you can use
<code>create-fontset-from-fontset-spec</code> and pass it a
bunch of X Logical Font Descriptions, each for a font you want to
include. I find that tedious. Instead, I like to create a fontset with a
single ASCII font and use <code>set-fontset-font</code> to
add other fonts later, like this:</p><pre
class="code-block">(create-fontset-from-fontset-spec
 (font-xlfd-name
  (font-spec :family "IBM Plex Mono" :size 13
             :registry "fontset-my fontset")))</pre><p>Make sure you put the short fontset name
under the <code>:registry</code> spec. The code above creates
the fontset, and returns its long name,</p><pre
class="code-block">-*-ibm plex mono-*-*-*-*-13-*-*-*-*-*-fontset-my fontset</pre><p>Now we can add a Chinese font and a Greek
font:</p><pre class="code-block">(set-fontset-font "fontset-my fontset" 'han
                  (font-spec :family "Source Han Serif" :size 12))
(set-fontset-font "fontset-my fontset" 'greek
                  (font-spec :family "Academica"))</pre><p>If you are not familiar with
<code>set-fontset-font</code>, <a
href="https://archive.casouri.cc/note/2021/fontset/http:/idiocy.org/emacs-fonts-and-fontsets.html"><em>Emacs,
fonts and fontsets</em></a> is a good read.</p><h2
id="Apply%20a%20fonset" class="section">Apply a
fontset</h2><p>Although the manual says we can use a fontset
wherever a font is appropriate, it is not entirely true. If you pass your
fontset through the <code>:font</code> attribute in
<code>set-face-attribute</code>, <a
id="footref:takes-ascii" class="footref-anchor obviously-a-link"
aria-label="Jump to footnote" href="#footdef%3Atakes-ascii">Emacs
takes the ASCII font from the fontset and only uses the ASCII font for
the face<sup class="inline-footref">1</sup></a>. The
real way to do it is to use the undocumented
<code>:fontset</code> attribute:</p><pre
class="code-block">(set-face-attribute 'some-face nil :fontset
"fontset-my fontset")</pre><p>That’s not all. While the above
code works for most faces, setting <code>:fontset</code> for
<code>default</code> will not work as expected, because
Emacs again <a id="footref:default" class="footref-anchor
obviously-a-link" aria-label="Jump to footnote"
href="#footdef%3Adefault">only takes the ASCII font, even if you use
the <code>fontset</code> attribute<sup
class="inline-footref">2</sup></a>. So don’t set the
fontset for the <code>default</code> face; instead, just
modify <code>fontset-default</code> (it’s the ultimate
fall-back fontset we mentioned earlier) for Unicode fonts, and use
whatever method you like for ASCII font. If you read <a
href="https://archive.casouri.cc/note/2021/fontset/http:/idiocy.org/emacs-fonts-and-fontsets.html"><em>Emacs,
fonts and fontsets</em></a>, you’ll know we can modify
<code>fontset-default</code> by either</p><pre
class="code-block">(set-fontset-font "fontset-default" ...)</pre><p>or</p><pre
class="code-block">(set-fontset-font t ...)</pre><p>Technically you could set the
<code>font</code> attribute of a frame to a fontset with
<code>set-frame-font</code>, and it works fine. But as soon as
you change any font-related attributes in
<code>default</code> face, like font size, your fontset in
the frame attribute will be overwritten by the font derived from
<code>default</code> face. So the best way is still to just
modify <code>fontset-default</code>.</p><div
id="footdef:takes-ascii" class="footdef"><div class="def-footref
obviously-a-link"><a aria-label="Jump back to main text"
href="#footref%3Atakes-ascii">1</a></div><div
class="def-footdef">According to <a
href="https://archive.casouri.cc/note/2021/fontset/https:/github.com/emacs-mirror/emacs/blob/11e5c7d8ca58cc946930048b5c88c8f582d4d5d8/src/xfaces.c#L3391">the
source</a>.</div></div><div id="footdef:default"
class="footdef"><div class="def-footref obviously-a-link"><a
aria-label="Jump back to main text"
href="#footref%3Adefault">2</a></div><div
class="def-footdef">Basically, if the face is
<code>default</code>,
<code>set-face-attribute</code> calls
<code>set_font_frame_param</code> (<a
href="https://archive.casouri.cc/note/2021/fontset/https:/github.com/emacs-mirror/emacs/blob/11e5c7d8ca58cc946930048b5c88c8f582d4d5d8/src/xfaces.c#L3514">source</a>),
which only looks at the <code>:font</code> attribute (<a
href="https://archive.casouri.cc/note/2021/fontset/https:/github.com/emacs-mirror/emacs/blob/11e5c7d8ca58cc946930048b5c88c8f582d4d5d8/src/xfaces.c#L3685">source</a>).</div></div><h2
id="Further%20reading" class="section">Further
reading</h2><ul><li>Command
<code>list-fontsets</code> lists all the defined
fontsets.</li><li>Command
<code>describe-fontset</code> shows which font is each
character assigned to in a fontset.</li><li>Manual page:
<a
href="https://archive.casouri.cc/note/2021/fontset/https:/www.gnu.org/software/emacs/manual/html_node/emacs/Fontsets.html"><em>Fontsets,
Emacs User Manual</em></a></li><li>Another manual
page: <a
href="https://archive.casouri.cc/note/2021/fontset/https:/www.gnu.org/software/emacs/manual/html_node/elisp/Fontsets.html"><em>Fontsets,
Emacs Lisp Manual</em></a></li></ul>Code Page 437urn:uuid:97279de0-28c5-11ec-b9e1-e3d1f57bb2952021-10-23T00:11:00.00-05:00<p>So I was installing a new OS on my desktop
machine, and for some technical reasons I needed to install the OS
manually. That means typing in a console. I couldn’t help but wonder:
what font is it showing?</p><figure><img
alt="Screenshot of the console"
src="https://archive.casouri.cc/note/2021/code-page-437/console.jpeg"/>
<figcaption>I was typing in
this</figcaption></figure><p>Turns out the typeface
isn’t even a typeface. It is an encoding that extends ASCII, mapping 8-bit
patterns to characters. For example, <code>10000110</code>
corresponds to “å”. According to Wikipedia, it is the “standard character
set of the original IBM PC”, and it “remains the primary set in the core
of any EGA and VGA-compatible graphic cards”. Basically this is the most
basic font on a personal computer, stored directly in
hardware.</p><p>This character set is supposed to contain
many characters including fancy ones like “⌠”, “☺”, “§”, etc. But my
graphics card is missing most of the non-basic characters. (How
disappointing!)</p><figure><img alt="A screenshot
specimen"
src="https://archive.casouri.cc/note/2021/code-page-437/specimen.jpeg"/>
<figcaption>Many characters are
missing</figcaption></figure><p>I don’t think this font
is pretty or anything. What makes it so interesting to me is that it is
so ubiquitous, yet most people never notice it. Next time <a
id="footref:PC" class="footref-anchor obviously-a-link" aria-label="Jump
to footnote" href="#footdef%3APC">when your PC starts up or
crashes<sup class="inline-footref">1</sup></a>, see if
you can spot any message printed in this font.</p><p>You can
even download the font file for this font: <a
href="https://archive.casouri.cc/note/2021/code-page-437/https:/cp437.github.io"><em>Code
Page 437</em></a>.</p><p>PS. this makes me wonder
if Mac has something similar, and sure enough, there is. <a
href="https://archive.casouri.cc/note/2021/code-page-437/https:/apple.stackexchange.com/questions/157038/what-font-is-used-during-verbose-boot-mode">Someone
asked about it on StackExchange</a>. I bet even fewer people know
about this one. I, for one, have never seen it despite using a MacBook for
years. (That’s probably a good thing, as one only sees it when something
goes hopelessly wrong.)</p><p>PPS. On Linux, you can drop
yourself into a console by typing Ctrl+Alt+F1/F2/etc. Usually that screen
is printed in <a
href="https://archive.casouri.cc/note/2021/code-page-437/http:/terminus-font.sourceforge.net">Terminus</a>.</p><div
id="footdef:PC" class="footdef"><div class="def-footref
obviously-a-link"><a aria-label="Jump back to main text"
href="#footref%3APC">1</a></div><div
class="def-footdef">You still use a PC, do
you?</div></div>Dutch 801 Headlineurn:uuid:78a32c96-227f-11ec-9fc2-63ecfe8296922021-09-30T23:18:00.00-05:00<p>Today’s typeface isn’t really interesting
in itself, but in the way I came across it. It’s a long story, are you
ready? Ok, so I was reading assigned papers for my OS class, and I
started on this one:</p><p><img alt="A clip of the paper"
src="https://archive.casouri.cc/note/2021/dutch-801/hydra.jpeg"/></p><p>The
title immediately caught my attention: it’s an elegant, graceful font. So
I clipped an image and searched on myfonts.com, and it turned out to be ...
Dutch 801 Headline.</p><p>The end.</p><p>I don’t
know about you, but isn’t it a rather strange name for a typeface? Why
Dutch? Why 801? I still don’t know the answer. Anyway, I think it’s a cool
name. Maybe the one who named it thought the
same.</p><p>Despite its interesting name, information about
this typeface is quite scarce. I only know it is Bitstream’s version of
Times New Roman (i.e., a clone). I was kind of surprised when I found out,
because I never associated Times New Roman with elegance. Maybe enlarging
a font naturally releases it from its humble form, and brings out its
gracefulness.</p><figure><img alt="A larger clip of the
title"
src="https://archive.casouri.cc/note/2021/dutch-801/hydra-large.jpeg"/>
<figcaption>The title in its full
glory</figcaption></figure><figure><img alt="Clip
for another title"
src="https://archive.casouri.cc/note/2021/dutch-801/tenex.jpeg"/>
<figcaption>Another article
title</figcaption></figure><p>As for the body text, I
can only assume it to be Dutch 801 Text. I didn’t bother to check
though.</p><p><img alt="A clip of the body text"
src="https://archive.casouri.cc/note/2021/dutch-801/body.jpeg"/></p>Academicaurn:uuid:8a5678de-2181-11ec-9b77-8b26fea6a6932021-09-29T17:01:00.00-05:00<p><a
href="https://archive.casouri.cc/note/2021/academica/https:/www.stormtype.com/families/academica">Academica</a>
is a typeface I found out when reading <a
href="https://archive.casouri.cc/note/2021/academica/https:/aeon.co"><em>aeon</em></a>
(a digital magazine in Science and Humanities). Academica is designed by
Josef Týfa for scientific texts. The original design was cut and cast in
metal in 1968, and in 2003, Týfa and František Štorm worked together to
rework it for digital printing.</p><p>Academica shares some
similarities with Charter in its tall x‑height and emphasis on legibility,
but the similarity pretty much ends there. Compared to Charter,
Academica is considerably blacker. And compared to Charter’s <a
id="footref:stoic" class="footref-anchor obviously-a-link"
aria-label="Jump to footnote" href="#footdef%3Astoic">stoic stint on
curves<sup class="inline-footref">1</sup></a>,
Academica is lavishly rounded, tapered, bent, squished and stretched. In
fact, I don’t even know why I’m comparing it to Charter; Academica
reminds me more of another typeface (that I dig), <a
href="https://archive.casouri.cc/note/2021/academica/https:/en.wikipedia.org/wiki/Cooper_Black">Cooper
Black</a>.</p><p><img alt="A specimen for some
lower-case Latin letters"
src="https://archive.casouri.cc/note/2021/academica/specimen1.png"/></p><p>The
alien-looking 0 is perhaps the most salient character (pun intended) in
Academica. Instead of simply narrowing 0 to distinguish it from capital
O, Academica “flipped” it such that the horizontal stroke is thicker than
the vertical. The 0 is really the culmination of the overall vibe of
Academica—little roundish goofiness here and there, slightly throwing the
reader off; but when you zoom away, you see a legible, realistic academic
typeface.</p><p><img alt="A specimen for text “2001”"
src="https://archive.casouri.cc/note/2021/academica/specimen2.png"/></p><p>I
love the color of Academica, it’s thiccc ;-) Use it for body text, and
the dense, full color is beautiful. Looking at a block of Academica, you
can almost feel the energy of life imbued in every corner. Also, the tall
x-height means you can pack more lines into a page, increasing the
information density.</p><p>Overall, Academica feels humane to
me. It is a practical typeface for serious scientific publications, but
at the same time has its very own quirky character. I’m very fond of it.
It isn’t that expensive either. If you buy it on <a
href="https://archive.casouri.cc/note/2021/academica/https:/www.myfonts.com">myfonts.com</a>,
each font costs $44 (at the time of writing). <a id="footref:need"
class="footref-anchor obviously-a-link" aria-label="Jump to footnote"
href="#footdef%3Aneed">So regular, italic and bold<sup
class="inline-footref">2</sup></a> combined cost $132.
That’s more than a cup of coffee, but still less than 20 cups (I
think?)</p><p>Some more
specimens:</p><figure><img alt="A specimen for body text"
src="https://archive.casouri.cc/note/2021/academica/specimen3.png"/>
<figcaption>Academica Text (Regular) in body
text</figcaption></figure><figure><img alt="A
specimen for Light/Book weight"
src="https://archive.casouri.cc/note/2021/academica/specimen4.png"/>
<figcaption>Academica Book (Light) in slightly larger
size</figcaption></figure><p>Further
reading:</p><ul><li><a
href="https://archive.casouri.cc/note/2021/academica/https:/fontsinuse.com/typefaces/13032/academica"><em>Fonts
in Use: Academica</em></a></li><li>The <a
href="https://archive.casouri.cc/note/2021/academica/StormType-AcademicaSpecimenA4.pdf">Official
specimen</a></li></ul><div id="footdef:stoic"
class="footdef"><div class="def-footref obviously-a-link"><a
aria-label="Jump back to main text"
href="#footref%3Astoic">1</a></div><div
class="def-footdef">Of course, there is nothing bad about stoic
outlines. Matthew Carter designed Charter for low-resolution laser
printers (which will muddle any delicate detail on the character), and
aimed for economic use of curves to accommodate low-memory computers and
printers. Moreover, the crisp, direct, clean outline is actually one of
Charter’s virtues. (Edit: Actually, economic use of curves turned out to
be a premature optimization, but Carter liked the style anyway and kept
the design.)</div></div><div id="footdef:need"
class="footdef"><div class="def-footref obviously-a-link"><a
aria-label="Jump back to main text"
href="#footref%3Aneed">2</a></div><div
class="def-footdef">Regular, italic and bold are really all you need,
sometimes you don’t even need bold. If you want to use Academica in a
larger size (larger than 13pt), then Light, Light italic and Regular are
also a good combination.</div></div>RFC: Emacs tree-sitter integrationurn:uuid:484e573e-207f-11ec-bd91-975a51a5f3f12021-09-28T10:12:00.00-05:00<p><a
href="https://archive.casouri.cc/note/2021/emacs-tree-sitter/https:/tree-sitter.github.io/tree-sitter">Tree-sitter</a>
is an incremental parser that can provide a concrete syntax tree for
source code and is fast enough to parse on each key press. It supports
a wide range of languages, and support for more languages is on
the way.</p><p>I’ve been working on an integration of
the tree-sitter library into Emacs’ core. The integration consists of two
parts: first, a direct translation of tree-sitter’s API; second, the
integration with Emacs’ font-lock and indent systems. The first part is
completed and is rather uncontentious. I’d appreciate comments on the
second: Is the interface easy to understand? Is it easy to use? Is it
flexible enough for every language?</p><p>Whether you are a
major mode author or just an interested Emacs user, I invite you to try
hacking with this tree-sitter integration—recreate existing major mode
features (font-lock, indent), create new features (structured editing,
etc)—and tell me how well it works. Better yet, provide some suggestions
on improving the interface.</p><h2
id="Building%20Emacs%20with%20tree-sitter%20support"
class="section">Building Emacs with tree-sitter
support</h2><h3 id="Install%20tree-sittter"
class="subsection">Install tree-sitter</h3><p>First,
install libtree-sitter, either by a package manager, or from
source:</p><pre class="code-block">git clone https://github.com/tree-sitter/tree-sitter.git
cd tree-sitter
make
make install</pre><p>This should install libtree-sitter in a
standard location.</p><h3 id="Build%20Emacs"
class="subsection">Build Emacs</h3><p>Then, build Emacs
from my GitHub repository. Make sure you clone the
<code>ts</code> branch.</p><pre
class="code-block">git clone https://github.com/casouri/emacs.git --branch ts
cd emacs
./autogen.sh
./configure
make</pre><p>No need for
special configure flags, tree-sitter is enabled automatically if
libtree-sitter is present on the system. Now Emacs can be started
by</p><pre class="code-block">src/emacs</pre><h3
id="Get%20language%20definitions" class="subsection">Get language
definitions</h3><p>To use tree-sitter features in any
meaningful way, we also need the language definition, e.g.,
libtree-sitter-c for C. I wrote a script for automatically retrieving and
compiling some of the libraries. The following commands</p><pre
class="code-block">git clone https://github.com/casouri/tree-sitter-module.git
cd tree-sitter-module
./batch-new.sh</pre><p>should produce libraries for C, JSON,
Go, HTML, JavaScript, CSS and Python and store them in
<code>dist</code> directory. From there you can copy these
libraries to a standard path, or add that directory to
<code>LD_LIBRARY_PATH</code>.</p><p>You can also
find pre-built libraries in the release page: <a
href="https://archive.casouri.cc/note/2021/emacs-tree-sitter/https:/github.com/casouri/tree-sitter-module/releases/tag/v2,0"><em>tree-sitter-module
release v2.0</em></a>.</p><h2
id="Basic%20tree-sitter%20features" class="section">Basic tree-sitter
features</h2><p>I suggest reading the tree-sitter node in the
manual first; it covers how to create a parser, how to retrieve a node,
how to pattern match nodes, and more. You can access the manual by
typing</p><pre class="code-block">C-h i m elisp RET g Parsing Program Source RET</pre><p>The command above opens the
Info reader, goes to <em>Elisp Reference Manual</em>, and
opens the “Parsing Program Source” node, which contains manual for
tree-sitter. Alternatively, you can read <a
href="https://archive.casouri.cc/note/2021/emacs-tree-sitter/Parsing-Program-Source.html">the
tree-sitter node</a> that I clipped from the HTML
manual.</p><p>Once you’ve read the manual, you can
<code>(require 'tree-sitter)</code> and hack
away!</p><p>The manual only documents basic features of
tree-sitter, leaving out font-lock and indent integration, because I
expect the latter to change. They are instead documented
below.</p><h2 id="Font-lock%20interface"
class="section">Font-lock interface</h2><p>(From now on, I
assume you have read the manual and I will use concepts introduced in the
manual without explanation.)</p><p>If you are familiar with
font-lock in Emacs, you know it is primarily configured by
<code>font-lock-defaults</code>: major mode sets this
variable with language-specific configuration, font-lock takes that
variable and populates <code>font-lock-keywords</code>, which
directly defines the pattern to fontify.</p><h3
id="tree-sitter-font-lock-settings"
class="subsection"><code>tree-sitter-font-lock-settings</code></h3><p><a
id="footref:ts-name" class="footref-anchor obviously-a-link"
aria-label="Jump to footnote"
href="#footdef%3Ats-name">Tree-sitter<sup
class="inline-footref">1</sup></a> provides two analogous
variables, <code>tree-sitter-font-lock-defaults</code> and
<code>tree-sitter-font-lock-settings</code>.
<code>tree-sitter-font-lock-settings</code> is a list of
<code>SETTING</code>s where each
<code>SETTING</code> looks like</p><pre
class="code-block">(LANGUAGE QUERY)</pre><p><code>LANGUAGE</code> is the
language this setting should use, and <code>QUERY</code> is
either a string or a sexp query. Each capture name in
<code>QUERY</code> is either a face name, in which case the
captured node is fontified in that face, or a function name, in which
case the captured node is passed to the function for fontification.
Specifically, the function is passed three arguments <code>(BEG END
NODE)</code>, where <code>BEG</code> and
<code>END</code> are the beginning and end positions of the
node in the buffer, provided for convenience.</p><p>An example
<code>SETTING</code> for C is</p><pre
class="code-block">(tree-sitter-c ; LANGUAGE
 ((null) @font-lock-constant-face
  (true) @font-lock-constant-face
  (false) @font-lock-constant-face)) ; QUERY</pre><div
id="footdef:ts-name" class="footdef"><div class="def-footref
obviously-a-link"><a aria-label="Jump back to main text"
href="#footref%3Ats-name">1</a></div><div
class="def-footdef">From now on, “tree-sitter” refers to the Emacs
integration of tree-sitter.</div></div><h3
id="tree-sitter-font-lock-defaults"
class="subsection"><code>tree-sitter-font-lock-defaults</code></h3><p>Tree-sitter
font-lock, like font-lock, supports fontification at different levels of
decoration (controlled by
<code>font-lock-maximum-decoration</code>). And this is the
primary purpose of
<code>tree-sitter-font-lock-defaults</code>. Its value is a
list of</p><pre class="code-block">(DEFAULT :KEYWORD VALUE...)</pre><p>where each <code>DEFAULT</code>
may be a symbol or a list of symbols. The symbol should be either a
variable containing <code>(LANGUAGE QUERY)</code>, or a
function that returns that. If <code>DEFAULT</code> is a
list, each symbol corresponds to a decoration level. For example, if I
want to implement three levels of decoration for C, I would populate
<code>tree-sitter-font-lock-defaults</code>
with</p><pre class="code-block">(((c-font-lock-settings-1
   c-font-lock-settings-2
   c-font-lock-settings-3)
  :KEYWORD VALUE...))</pre><p>where
<code>c-font-lock-settings-1</code> would contain,
say,</p><pre class="code-block">(tree-sitter-c
 ((null) @font-lock-constant-face
  (true) @font-lock-constant-face
  (false) @font-lock-constant-face))</pre><p>for those who need no
more. And the other two levels could be for the rest of us mortals. As for
<code>:KEYWORD</code> and <code>VALUE</code>,
they are analogous to those in
<code>font-lock-defaults</code>, used for specifying other
configurations. Currently they are not used for tree-sitter
font-lock.</p><p>To enable tree-sitter font-lock, a major
mode should first assign
<code>tree-sitter-font-lock-defaults</code>, then call
<code>tree-sitter-enable-font-lock</code>. For
example,</p><pre class="code-block">(define-derived-mode ts-c-mode prog-mode "tree-sitter C"
  (setq-local tree-sitter-font-lock-defaults
              '((ts-c-tree-sitter-settings-1)))
  (tree-sitter-enable-font-lock))</pre><h2 id="Indentation"
class="section">Indentation</h2><p>In Emacs, indentation
is provided by <code>indent-line-function</code>. Tree-sitter
provides a convenient system,
<em>tree-sitter-simple-indent</em>, to simplify the
implementation of an indentation function. To use it, bind
<code>indent-line-function</code> to
<code>tree-sitter-indent</code>, and fill in indentation
configurations in
<code>tree-sitter-simple-indent-rules</code>.</p><p><code>tree-sitter-simple-indent-rules</code>
is a list of rules, and each rule looks like</p><pre
class="code-block">(MATCHER ANCHOR OFFSET)</pre><p>When
indenting, <em>tree-sitter-simple-indent</em> finds the
largest node that starts at the beginning of the current line, and
matches it against each <code>MATCHER</code> in
<code>tree-sitter-simple-indent-rules</code>. If
<code>MATCHER</code> matches that node,
<code>ANCHOR</code> and <code>OFFSET</code>
determine how to indent—find the column of
<code>ANCHOR</code> (which represents a point), and add
<code>OFFSET</code> to it.</p><p>By now you must
be wondering what the heck <code>MATCHER</code> is. It is a
function that takes <code>(NODE PARENT BOL &rest
_)</code> as arguments; if the rule should apply to
<code>NODE</code>, it returns non-nil.
<code>PARENT</code> and <code>BOL</code>
(position of beginning of line) are provided just for convenience. The
“<code>&rest _</code>” part is required to leave room
for extending the interface in the future.</p><p>This
function can do anything: check the type of that node, check the type of
its parent, check whether this node is the first child node of its
parent, etc. <code>ANCHOR</code> is also a function that
takes these arguments, but returns a point, the “anchor”. If the rule
determines that the node should be indented two columns inward compared
to its parent, <code>ANCHOR</code> should return the start of
the parent node, and <code>OFFSET</code> should be
2.</p><p>For example, the following rule matches any line
that starts with the <code>null</code> keyword, and indents
the line inwards by two columns against the
<code>null</code>’s parent node.</p><pre
class="code-block">((lambda (n p bol &rest _)                  ; MATCHER
   (equal (tree-sitter-node-type n) "null"))
 (lambda (n p bol &rest _)                  ; ANCHOR
   (tree-sitter-node-start (tree-sitter-node-parent n)))
 2)                                         ; OFFSET</pre><p>Of course, it is terribly tedious
to write out every <code>MATCHER</code> and
<code>ANCHOR</code> explicitly.
<em>tree-sitter-simple-indent</em> provides some predefined
<code>MATCHER</code> and <code>ANCHOR</code>
functions. Most of them are higher-order functions: they take an
argument and return a
function.</p><p><code>MATCHER</code>
presets:</p><dl><dt><code>(parent-is
TYPE)</code></dt><dd>Check that the parent has type
<code>TYPE</code>.</dd><dt><code>(node-is
TYPE)</code></dt><dd>Check that node has type
<code>TYPE</code>.</dd><dt><code>(match
NODE-TYPE PARENT-TYPE NODE-FIELD NODE-INDEX-MIN
NODE-INDEX-MAX)</code></dt><dd><code>NODE-TYPE</code>
checks for node’s type, <code>PARENT-TYPE</code> checks for
parent’s type, <code>NODE-FIELD</code> checks the field
name of the node in the parent, and <code>NODE-INDEX-MIN</code> and
<code>NODE-INDEX-MAX</code> limit the node’s index in the
parent. Any argument left as nil is not checked. For example, to match
the node that is the first child and has a parent of type
<code>argument_list</code>, use<br/><code>(match
nil "argument_list" nil nil 0
0)</code></dd><dt><code>(query
QUERY)</code></dt><dd>Queries the parent with
<code>QUERY</code>. Matches if the node is captured by any
capture
name.</dd><dt><code>no-node</code></dt><dd>Matches
the null node. When the current line is empty, there is no node at the
beginning, so the node is
nil.</dd></dl><p><code>ANCHOR</code>
presets:</p><dl><dt><code>first-child</code></dt><dd>Finds
the first sibling of the node, i.e., the first child of the
parent.</dd><dt><code>parent</code></dt><dd>Finds
the parent
node.</dd><dt><code>prev-sibling</code></dt><dd>Finds
the node’s previous
sibling.</dd><dt><code>no-indent</code></dt><dd>Do
nothing, don’t indent. This is useful for indenting a line inside a
multiline string, where masterful inactivity is most
preferred.</dd><dt><code>prev-line</code></dt><dd>Finds
the named node on the previous line. This can be used when indenting an
empty line: just indent like the previous
node.</dd></dl><h2 id="Some%20handy%20tools"
class="section">Some handy tools</h2><p>I have two handy
tools for you to work with tree-sitter more easily: first,
<code>tree-sitter-inspect-mode</code> will show the relevant
information of the node at point in the mode-line; second,
<code>tree-sitter-check-indent</code> can check the indent
result against a stock major mode. Check out their docstrings for more
detail.</p><h2 id="Feedback"
class="section">Feedback</h2><p>You can send a message to
<a
href="https://archive.casouri.cc/note/2021/emacs-tree-sitter/https:/lists.gnu.org/mailman/listinfo/emacs-devel"><em>emacs-devel</em></a>,
or open an issue on the <a
href="https://archive.casouri.cc/note/2021/emacs-tree-sitter/https:/github.com/casouri/emacs">GitHub
repository</a>.</p><h2 id="An%20example"
class="section">An example</h2><p>All these must be pretty
confusing without seeing a concrete example, so here it is. This example
code is for a demo C major mode, <code>ts-c-mode</code>,
defined in the “<code>;;; Lab</code>” section in
<code>tree-sitter.el</code>. (Here is a <a
href="https://archive.casouri.cc/note/2021/emacs-tree-sitter/https:/github.com/casouri/emacs/blob/350ae9cc19e478f08468443843f63bdf005d9d92/lisp/tree-sitter.el#L640">link
to the file on
GitHub</a>.)</p><p>Indent:</p><pre
class="code-block">(defvar ts-c-tree-sitter-indent-rules
  `((tree-sitter-c
     ;; Empty line.
     (no-node prev-line 0)
     ;; Function/struct definition body {}.
     ((match nil "function_definition" "body") parent 0)
     ((node-is "field_declaration_list") parent 0)
     ;; Call expression.
     ((parent-is "call_expression") parent 2)
     ;; If-else.
     ((match nil "if_statement" "condition") parent 2)
     ((match nil "if_statement" "consequence") parent 2)
     ((match nil "if_statement" "alternative") parent 2)
     ((match nil "switch_statement" "condition") parent 2)
     ((node-is "else") parent 0)
     ;; Switch case.
     ((parent-is "case_statement") parent 2)
     ((node-is "case_statement") parent 0)
     ;; { and }.
     ((node-is "compound_statement") parent 2)
     ((node-is "}") parent 0)
     ;; Multi-line string.
     ((parent-is "string_literal") no-indent 0)
     ;; List.
     ,@(cl-loop for type in '("compound_statement"
                              "initializer_list"
                              "argument_list"
                              "parameter_list"
                              "field_declaration_list")
                collect `((match nil ,type nil 0 0) parent 2)
                collect `((match nil ,type nil 1) first-sibling 0)))))</pre><p>Font-lock:</p><pre
class="code-block">(defvar ts-c-tree-sitter-settings-1
  '(tree-sitter-c
    ((null) @font-lock-constant-face
     (true) @font-lock-constant-face
     (false) @font-lock-constant-face
     (comment) @font-lock-comment-face
     (system_lib_string) @ts-c-fontify-system-lib
     (unary_expression operator: _ @font-lock-negation-char-face)
     (string_literal) @font-lock-string-face
     (char_literal) @font-lock-string-face
     (function_definition declarator:
      (identifier) @font-lock-function-name-face)
     (declaration declarator:
      (identifier) @font-lock-function-name-face)
     (function_declarator declarator:
      (identifier) @font-lock-function-name-face)
     (init_declarator declarator:
      (identifier) @font-lock-variable-name-face)
     (parameter_declaration declarator:
      (identifier) @font-lock-variable-name-face)
     (preproc_def name: (identifier) @font-lock-variable-name-face)
     (enumerator name: (identifier) @font-lock-variable-name-face)
     (field_identifier) @font-lock-variable-name-face
     (parameter_list
      (parameter_declaration
       (identifier) @font-lock-variable-name-face))
     (pointer_declarator declarator:
      (identifier) @font-lock-variable-name-face)
     (array_declarator declarator:
      (identifier) @font-lock-variable-name-face)
     (preproc_function_def name:
      (identifier) @font-lock-variable-name-face
      parameters: (preproc_params
                   (identifier) @font-lock-variable-name-face))
     (type_identifier) @font-lock-type-face
     (primitive_type) @font-lock-type-face
     "auto" @font-lock-keyword-face "break" @font-lock-keyword-face
     "case" @font-lock-keyword-face "const" @font-lock-keyword-face
     "continue" @font-lock-keyword-face "default" @font-lock-keyword-face
     "do" @font-lock-keyword-face "else" @font-lock-keyword-face
     "enum" @font-lock-keyword-face "extern" @font-lock-keyword-face
     "for" @font-lock-keyword-face "goto" @font-lock-keyword-face
     "if" @font-lock-keyword-face "register" @font-lock-keyword-face
     "return" @font-lock-keyword-face "sizeof" @font-lock-keyword-face
     "static" @font-lock-keyword-face "struct" @font-lock-keyword-face
     "switch" @font-lock-keyword-face "typedef" @font-lock-keyword-face
     "union" @font-lock-keyword-face "volatile" @font-lock-keyword-face
     "while" @font-lock-keyword-face
     "long" @font-lock-type-face "short" @font-lock-type-face
     "signed" @font-lock-type-face "unsigned" @font-lock-type-face
     "#include" @font-lock-preprocessor-face
     "#define" @font-lock-preprocessor-face
     "#ifdef" @font-lock-preprocessor-face
     "#ifndef" @font-lock-preprocessor-face
     "#endif" @font-lock-preprocessor-face
     "#else" @font-lock-preprocessor-face
     "#elif" @font-lock-preprocessor-face
)))</pre>Don’t Use Rubber Pin Backings on Backpacksurn:uuid:093e8b2e-1515-11ec-b47c-a70e3cf20eb92021-09-13T22:01:00.00-05:00<p>If you like enamel pins, you know there are
three types of backings: rubber, butterfly, and “secure/locking backing”.
I grew up with butterfly backings, but nowadays, when you buy an enamel
pin, more often than not, it comes with rubber
backings.</p><p>Rubber backings feel insecure at first sight,
but the difficulty of removing one from its packaging might give you a
false sense of security. Let me tell you: don’t trust them. That
misplaced trust cost me a pin on my backpack. Thankfully I only lost one; the
others are at most loose or missing one of two backings. Still, that’s
enough proof that rubber backings are not suitable for surfaces that see
a lot of movement, for example, a backpack.</p><p>Butterfly
backings have their own problems: under stress or repeated use, the
little metal butterfly wings can come off. On top of that, they aren’t
much more secure than rubber backings: they won’t gradually loosen
over time like rubber backings do, but they can come loose or break
under force.</p><p>That leaves the “secure/locking” backings.
They can be a bit of a pain to put on and take off; you need to get
a feel for them. But they are secure. Buy a box of them from Amazon for a
couple of bucks, and your precious pins won’t go missing from your
backpack again.</p>Automatically Handling Full-width Quotes and Punctuation Squeezing on Web Pagesurn:uuid:05c234e6-d6b3-11eb-b625-f744d82912722021-09-03T13:15:00.00-05:00<h2 id="%E5%85%A8%E8%A7%92%E5%BC%95%E5%8F%B7"
class="section">Full-width Quotes</h2><p>In Unicode, the
question mark, the exclamation mark, and all kinds of brackets come in
both full-width and half-width versions, each with its own codepoint;
but for some inexplicable reason, the most commonly used quotation marks
are not among them. When mixing Chinese and English, getting quotes to
display correctly as full-width or half-width is a headache; if you get
it wrong, a half-width quote in Chinese text doesn’t look too jarring,
but a full-width quote popping up in English text is plain
ugly.</p><p>CSS cannot automatically tell when to use a
full-width quote and when to use a half-width one; you have to rely on
markup. Fortunately, it isn’t so complicated that manual markup is
needed: a program only has to check whether the characters around a
quote are Chinese or English and mark the quote as full-width or
half-width accordingly, and it will almost never get it wrong. My
current setup looks like this; the default font stack still lists the
English font first and the Chinese font second:</p><pre
class="code-block">body { font-family: Charter, Source Han Serif CN,
serif; }</pre><p>Quotes that need to be full-width are wrapped
in a <code>span</code> tag:</p><pre
class="code-block"><span
class="full-width-quote">“</span></pre><p>Then
CSS assigns them a Chinese font:</p><pre
class="code-block">span.full-width-quote { font-family: Source Han
Serif CN, serif; }</pre><p>How do we decide whether a quote
should be full-width or half-width? I use a simple test: if the
character immediately before or after the quote is Chinese, make it
full-width; if neither neighbor is Chinese, make it half-width. So far I
haven’t found a case where this simple test falls short. This in turn
requires deciding whether a character is Chinese, and the easiest way is
to check whether its Unicode codepoint falls in the Chinese ranges.
Common Chinese characters and punctuation live in the two ranges
<code>0x4E00</code>–<code>0x9FFF</code> and
<code>0x3000</code>–<code>0x303F</code>; checking
these two is enough, since the other ranges contain only rare
characters.</p><h2
id="%E6%A0%87%E7%82%B9%E6%8C%A4%E5%8E%8B"
class="section">Punctuation Squeezing</h2><p>With full-width
quotes working, I got greedy and wanted punctuation squeezing too.
Without squeezing, several punctuation marks in a row do look rather
loose:</p><figure><img
class="half300" alt="Without punctuation squeezing"
src="https://archive.casouri.cc/note/2021/full-width-quote/%E4%BE%8B%E5%AD%901.png"/>
<figcaption><a
href="https://archive.casouri.cc/note/2021/full-width-quote/https:/archive.casouri.cat/rock/day/day-48/index.html">余日摇滚第48期</a></figcaption></figure><p>After
squeezing, the gaps are gone:</p><figure><img
class="half300" alt="With punctuation squeezing"
src="https://archive.casouri.cc/note/2021/full-width-quote/%E4%BE%8B%E5%AD%902.png"/>
<figcaption><a
href="https://archive.casouri.cc/note/2021/full-width-quote/https:/archive.casouri.cat/rock/day/day-48/index.html">余日摇滚第48期</a></figcaption></figure><p>The
trick is the CSS property <code>font-feature-settings: "halt"</code>,
which enables the OpenType <code>halt</code> feature. As with
full-width quotes, a program automatically identifies the punctuation
that should be squeezed and wraps it in a <code>span</code>
tag. Note that the font you use must actually provide the
<code>halt</code> feature; Source Han Serif, the font I use,
does.</p><p>As for exactly which punctuation marks to squeeze,
I couldn’t find an existing standard or algorithm, so below is my own
method. It is not complete and only covers the common cases, but it has
been enough for me. If you know a better algorithm, please do tell
me.</p><p>First, the squeezable punctuation marks fall into
three classes: left-aligned, right-aligned, and
centered:</p><figure><img
alt="The classes of punctuation marks"
src="https://archive.casouri.cc/note/2021/full-width-quote/%E5%90%84%E7%B1%BB%E7%AC%A6%E5%8F%B7.png"/>
<figcaption><a
href="https://archive.casouri.cc/note/2021/full-width-quote/https:/www.w3.org/TR/2020/WD-clreq-20201101"><em>Requirements
for Chinese Text Layout (中文排版需求)</em>, W3C Working Draft 01
November 2020, 3.1.6 Adjustment of punctuation width;
modified</a></figcaption></figure><p>We can ignore
the centered marks, because Simplified Chinese generally doesn’t use
them, and I write in Simplified Chinese. The program walks over the
characters from beginning to end and decides, for each character,
whether to squeeze it. The decision depends on the character itself and
its neighbors; in pseudocode:</p><pre
class="code-block">for each character:
    if this character is left-aligned punctuation
       and the next character is punctuation:
        squeeze this character
    if this character is right-aligned punctuation
       and the previous character is right-aligned punctuation:
        squeeze this character</pre><p>Running this algorithm
produces results like this<span class="squeeze
full-width-mark">:</span><span
class="full-width-mark">(</span><span class="squeeze
full-width-mark">(</span>文字<span class="squeeze
full-width-mark">)</span><span class="squeeze
full-width-mark">)</span><span class="squeeze
full-width-mark">,</span><span
class="full-width-mark">(</span>文<span class="squeeze
full-width-mark">)</span><span
class="full-width-mark">「</span>字<span class="squeeze
full-width-mark">」</span><span
class="full-width-mark">。</span></p><p><a
id="footref:subset" class="footref-anchor obviously-a-link"
aria-label="Jump to footnote" href="#footdef%3Asubset">If you have
compressed your font files with <code>pyftsubset</code><sup
class="inline-footref">1</sup></a>, note that by default it
strips OpenType features like <code>halt</code>, in which case
the squeeze markup has no effect. Pass the
<code>--layout-features='*'</code> option when subsetting to
keep all OpenType features, or use
<code>--layout-features='halt'</code> to keep only the
<code>halt</code> feature.</p><div
id="footdef:subset" class="footdef"><div class="def-footref
obviously-a-link"><a aria-label="Jump back to main text"
href="#footref%3Asubset">1</a></div><div
class="def-footdef">See <a
href="https://archive.casouri.cc/note/2019/reduce-font-loading-time-in-my-blog/index.html"><em>Reduce
Font Loading Time in My Blog</em></a>.</div></div><h2
id="%E7%A0%B4%E6%8A%98%E5%8F%B7"
class="section">The Dash</h2><p>I also noticed that the
Chinese dash (破折号) sometimes renders as a plain em dash (because in
Unicode the dash really is just an em dash). The fix is the same as for
full-width quotes: wrap the dash in a full-width
<code>span</code> tag<span
class="full-width-mark">——</span>and, as this very dash shows,
it displays correctly.</p>
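<p>The recipe described above (classify characters by Unicode codepoint,
decide quote width from the neighboring characters, and mark squeezable
punctuation) can be sketched in a few lines of Python. This is a minimal
illustration under my reading of the rules, not the actual script used
for this blog; the function names and the exact membership of the
left/right punctuation classes are my own assumptions:</p>

```python
# Hypothetical sketch of the rules from the post; not the blog's
# actual build script.

def is_chinese(ch):
    """True if the character's codepoint falls in the two ranges that
    cover common Chinese characters and punctuation."""
    cp = ord(ch)
    return 0x4E00 <= cp <= 0x9FFF or 0x3000 <= cp <= 0x303F

def quote_is_full_width(text, i):
    """A quote at index i is full-width if the character immediately
    before or after it is Chinese; otherwise it is half-width."""
    before = text[i - 1] if i > 0 else ""
    after = text[i + 1] if i + 1 < len(text) else ""
    return bool((before and is_chinese(before)) or
                (after and is_chinese(after)))

# Assumed class membership: "left-aligned" marks have their ink on the
# left of the em square (blank on the right), "right-aligned" marks the
# opposite. Centered marks are ignored, as in the post.
LEFT = set("，。、；：）」』》")
RIGHT = set("（「『《")

def squeeze_positions(text):
    """Indices of marks that should get the squeeze span, following
    the pseudocode in the post: squeeze a left-aligned mark followed
    by any punctuation, and a right-aligned mark preceded by another
    right-aligned mark."""
    result = []
    for i, ch in enumerate(text):
        prev = text[i - 1] if i > 0 else ""
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in LEFT and (nxt in LEFT or nxt in RIGHT):
            result.append(i)
        elif ch in RIGHT and prev in RIGHT:
            result.append(i)
    return result
```

<p>A script built on these predicates would then wrap the flagged
characters in the corresponding <code>full-width-quote</code> and
<code>squeeze</code> spans. On the example from the post,
<code>squeeze_positions("（（文字）），（文）「字」。")</code> flags
exactly the marks shown squeezed above.</p>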