<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Richard's Blog]]></title><description><![CDATA[For dull technical discussions.]]></description><link>https://blog.r6l7.com/</link><image><url>https://blog.r6l7.com/favicon.png</url><title>Richard&apos;s Blog</title><link>https://blog.r6l7.com/</link></image><generator>Ghost 4.48</generator><lastBuildDate>Sat, 31 Jan 2026 05:55:44 GMT</lastBuildDate><atom:link href="https://blog.r6l7.com/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Selected Sequence CRDTs]]></title><description><![CDATA[<!--kg-card-begin: markdown--><p>When reading <a href="http://shop.oreilly.com/product/0636920032175.do">Designing Data-Intensive Applications</a>, I was intrigued by the brief mention of Conflict-free replicated Datatypes, or <a href="https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type">CRDT</a>s.  These data structures can be copied to many computers in a network, where they undergo concurrent modification, and later be merged back together into a result that accurately reflects all of</p>]]></description><link>https://blog.r6l7.com/selected-sequence-crdts/</link><guid isPermaLink="false">5e79791f10124a08b73b72f0</guid><category><![CDATA[algorithms]]></category><dc:creator><![CDATA[Richard Larocque]]></dc:creator><pubDate>Sun, 15 Apr 2018 10:05:31 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: markdown--><p>When reading <a href="http://shop.oreilly.com/product/0636920032175.do">Designing Data-Intensive Applications</a>, I was intrigued by the brief mention of Conflict-free replicated Datatypes, or <a href="https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type">CRDT</a>s.  
These data structures can be copied to many computers in a network, where they undergo concurrent modification, and later be merged back together into a result that accurately reflects all of those concurrent modifications.  The simplest such type is an incrementing counter, but there exist CRDTs for more complicated structures like lists and maps.</p>
<p>This reminded me of an interesting problem I worked on long ago.  Abstractly, the problem was to allow concurrent edits to an ordered list in an environment where the computers making those changes were sometimes disconnected from the network, making coordination impossible.  We came up with <a href="https://github.com/chromium/chromium/blob/master/components/sync/base/unique_position.h#L21">a solution</a>, but didn&apos;t really have a name for it.  I&apos;ve long wondered whether or not there existed any well-known solutions to this problem.</p>
<p>It turns out that our solution was actually a &quot;Sequence CRDT&quot;.  These CRDTs are a topic of active research and have applications in collaborative document editing.  If you can allow distributed edits to a sequence of characters, then you&apos;ve solved one of the hairier problems of building a &quot;Google Docs&quot;-like collaborative editor.</p>
<p>Now that I know where to look, I can compare our approach to more well-known academic solutions to the same problem.</p>
<h3 id="introductions">Introductions</h3>
<p>All three of the solutions I&apos;ll be discussing have certain ideas in common.</p>
<p>Every scheme involves some sort of &quot;position identifier&quot; to represent a single element&apos;s position within a sequence.  These identifiers must be unique and <a href="https://en.wikipedia.org/wiki/Dense_order">densely ordered</a>.</p>
<p>Somewhat informally, given two positions $x$ and $y$ representing two distinct elements in some sequence:</p>
<ul>
<li>$x \neq y$ (uniqueness); and</li>
<li>assuming $x &lt; y$, there exists some $z$ where $x &lt; z &lt; y$ (density).</li>
</ul>
<p>The use of position identifiers allows the position of an element to be specified entirely within the element itself, thus avoiding many of the perils of conflicting changes being made without coordination.</p>
<p>If we had used a linked-list structure, tracking successors or predecessors to define the sequence ordering, we might find ourselves in trouble if that predecessor or successor had been moved by another agent while we were making an offline change.</p>
<p>If we had used a non-dense set of identifiers, like the set of integers, we might eventually find ourselves with $x$ and $y$ having values 23 and 24 respectively, and we would be unable to find any integer value between them.</p>
<p>If we have density but lack uniqueness guarantees, user A and user B might take similar actions at the same time and create positions $x$ and $y$ where $x = y$; then we face the uncomfortable fact that there is no $z$ where $x &lt; z &lt; y$.</p>
<p>With these constraints in place, it is possible to write functions that generate positions for inserting elements between any two existing elements within the list.  These new elements can be merged into other replicas of the sequence that may have undergone different modifications, and their relative positioning will remain unchanged.</p>
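<p>As a toy illustration of these two requirements (my own, not taken from any of the schemes below), rational numbers paired with a site identifier satisfy both: the rationals are dense, and the site id guarantees uniqueness.</p>

```ruby
# Toy dense, unique positions: a Rational for ordering plus a site id
# for uniqueness.  (Illustration only -- not one of the schemes below.)
Position = Struct.new(:value, :site) do
  include Comparable
  def <=>(other)
    [value, site] <=> [other.value, other.site]
  end
end

# A position strictly between a and b (assumes a < b): the midpoint
# always exists because the rationals are dense.
def between(a, b, site)
  Position.new((a.value + b.value) / 2, site)
end

x = Position.new(Rational(1, 2), "site-a")
y = Position.new(Rational(3, 4), "site-b")
z = between(x, y, "site-c")
raise unless x < z && z < y
```

<p>Real schemes avoid arbitrary-precision rationals, but they expose the same two operations: compare two positions, and generate a new position between two existing ones.</p>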
<h3 id="logoot">Logoot</h3>
<p>The paper introducing Logoot<sup class="footnote-ref"><a href="#fn1" id="fnref1">[1]</a></sup> includes a good explanation of the underlying theory and the applications to collaborative document editing.  My description of it will involve lots of non-rigorous hand-waving, so please refer to the paper if you want a more complete explanation.</p>
<p>In Logoot, the position consists of a list of <code>(Integer, SiteId)</code> tuples.  The integer values are chosen for sequence ordering purposes, while the <code>SiteId</code> ensures that every position value is unique.</p>
<p>For example, we might have positions $x$ and $y$ that look like this:</p>
<p>$$<br>
x=[(10,s_1),(5,s_2),(7,s_3)]<br>
$$</p>
<p>$$<br>
y = [(10, s_4), (6, s_6)]<br>
$$</p>
<p>(where $s_n$ is some arbitrary site identifier)</p>
<p>Comparison of position identifiers is implemented by comparing the integer values at each position in the list from left to right.  The <code>SiteId</code> values are used as a tie-breaker if the integer values are equal.  The <code>SiteId</code> is generated when the element is created by combining a client-specific unique identifier with a client-local logical clock that is incremented with every new position brought into existence on that client; this guarantees that no two <code>SiteId</code> values will ever be equal.</p>
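<p>Ruby&apos;s lexicographic <code>Array</code> comparison captures this ordering directly.  A minimal sketch (my own illustration, and simplified: real Logoot identifiers also carry the logical clock, and a new position copies its prefix from an existing neighbour, which is why the first pairs below share a site id):</p>

```ruby
# Logoot-style positions: lists of [integer, site_id] pairs, compared
# lexicographically -- integers first, site ids as tie-breakers.
# Ruby's Array#<=> implements exactly this ordering.
x = [[10, 1], [5, 2], [7, 3]]
y = [[10, 1], [6, 4]]

raise unless (x <=> y) == -1                  # 5 < 6 decides at the second pair
raise unless ([[10, 1], [5, 2]] <=> x) == -1  # a strict prefix sorts first
raise unless ([[10, 1], [5, 9]] <=> x) == 1   # site id 9 > 2 breaks the tie
```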
<p>If we wanted to insert a position <code>z</code> between the two positions, it might look something like this:<br>
$$<br>
z = [(10, s_1), (5, s_2), (8, s_9)]<br>
$$<br>
(reusing the prefix of $x$ and choosing a final integer greater than $7$; alternatively, the last element could be dropped by replacing $(5, s_2)$ with $(5, s_8)$ for some $s_8 &gt; s_2$.)</p>
<p>This is one of many possible $z$ values.  It turns out that the algorithm used to choose among the many possible in-between values can have a big impact on performance.  If the choice is made poorly, or a worst-case scenario is encountered, the length of these lists can grow without bound.</p>
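<p>To make the growth concrete, here is a deliberately naive left-boundary allocator (my own toy, not Logoot&apos;s actual algorithm, with site ids omitted for brevity).  Each front insertion takes the next smaller integer until a level is exhausted, at which point the identifier grows by one element:</p>

```ruby
BASE = 32

# Allocate a position just before `pos` (a list of integers).
# When there is no room left at a level, we drop to the level's
# floor and open a fresh level -- so the identifier keeps growing.
def before(pos)
  head, *rest = pos
  if head > 1
    [head - 1]               # room remains at this level
  elsif rest.empty?
    [head - 1, BASE - 1]     # exhausted: drop to 0 and open a new level
  else
    [head] + before(rest)
  end
end

ident = [16]
1000.times { ident = before(ident) }
ident.length  # => 33 after 1,000 front insertions: growth is unbounded
```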
<p>That problem forms the motivation for looking at the next algorithm.</p>
<h3 id="lseq">LSEQ</h3>
<p>The LSEQ paper<sup class="footnote-ref"><a href="#fn2" id="fnref2">[2]</a></sup> includes a thorough discussion of allocation strategies and their consequences.</p>
<p>It refers to one of the allocation strategies recommended by the Logoot paper as the &quot;boundary&quot; allocation strategy, in that it tended to stick to the &quot;leftward&quot; edge of possible position choices.  That strategy is very effective when new elements are generally inserted at the end of the sequence, as is typical in document editing.  But it performs poorly when new elements are added to the beginning of the sequence.</p>
<p>LSEQ uses two allocation strategies: &quot;boundary+&quot; and &quot;boundary-&quot;.    The former is a re-branding of Logoot&apos;s original &quot;boundary&quot; strategy and the latter is its opposite.  The former favors right-insertion and the latter favors left-insertion.</p>
<p>LSEQ also introduces the concept of different &quot;bases&quot; at each level of the list.  Logoot uniformly used &quot;Integers&quot; in its list elements (presumably meaning <code>uint32_t</code> or <code>uint64_t</code>).  LSEQ doubles the available position-space with every list element.  So one could have the first element in the range (0..8), the second in range (0..16), third in range (0..32), and so on.</p>
<p>This is combined with the clever idea of selecting different allocation strategies at each level.  This ensures that, even if a particular edit-pattern is a worst-case at one level, the position IDs being created by that pattern will soon overflow to the next level, where that same editing pattern may well be a best-case scenario.</p>
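<p>A sketch of these two ideas together (my own simplification; in the paper, each depth&apos;s strategy is chosen at random and then fixed):</p>

```ruby
BOUNDARY = 4   # how far from the edge an allocation may land

# LSEQ doubles the slot space at every depth of the identifier.
def base(depth)
  2**(3 + depth)   # depth 0: 8 slots, depth 1: 16, depth 2: 32, ...
end

# boundary+ allocates just after the left bound (cheap right-insertion);
# boundary- allocates just before the right bound (cheap left-insertion).
# Assumes right - left > 1, i.e. there is room at this depth.
def allocate(left, right, strategy)
  step = [right - left - 1, BOUNDARY].min
  strategy == :plus ? left + 1 + rand(step) : right - 1 - rand(step)
end

# Per-depth strategies (randomly assigned in the paper, fixed here):
strategies = [:plus, :minus, :plus, :minus]

# Insert at the edges of a fresh depth-1 level (bounds 0 and 16):
v = allocate(0, base(1), strategies[1])
v.between?(12, 15)  # => true: boundary- stays near the right edge
```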
<p>The paper ends with some benchmarks on real-world data (Wikipedia edits) that confirm this strategy does in fact perform better than Logoot in many cases.</p>
<h3 id="uniquepositions">UniquePositions</h3>
<p>UniquePositions are a completely different implementation of the sequence CRDT concept.</p>
<p>The position identifiers are intended to be stored and compared as C strings.  This has the benefit of making them compatible with existing systems that may not support more complex comparison algorithms.  And since <code>strcmp(3)</code> is highly optimized on most systems, it&apos;s likely the comparison will be reasonably fast, too.</p>
<p>The uniqueness comes from a guaranteed-unique suffix that forms the last 28 bytes of the string.  These are generated when the position identifier first comes into existence and depend on a large number of random bits to ensure uniqueness.</p>
<p>The <a href="https://github.com/chromium/chromium/blob/master/components/sync/base/unique_position.cc#L290">algorithm</a> for inserting between two existing positions is rather convoluted because of the complexity of keeping that unique suffix around while also trying to minimize ID growth.</p>
<p>All of this adds up to an algorithm that&apos;s kind of similar in behavior to Logoot, in that it suffers from unbounded identifier growth.  We&apos;ll deal with that in the next section.</p>
<h4 id="compressinguniquepositions">Compressing UniquePositions</h4>
<p>If we look at the positions that result from worst-case edit patterns, we see what looks like a compression problem.  The strings tend to grow to have large prefixes of <code>0xff, 0xff, 0xff, ...</code> or <code>0x00, 0x00, 0x00, ...</code> depending on whether the insertion is repeated on the right or left.  This suggests that some form of <a href="https://en.wikipedia.org/wiki/Run-length_encoding">run-length encoding</a> might be appropriate.</p>
<p>So we start with textbook run-length encoding.  When we see a run of at least four characters of the same value <code>x</code> and certain alignment criteria are met, we can encode it as <code>[x, x, x, x] ++ &lt;32-bit big-endian count of x&gt;</code>.  This is a wasteful encoding if there are only 4-7 such characters present, but it&apos;s quite effective at compressing runs of repeated characters up to 2^32 characters long.</p>
<p>But this means we&apos;d lose the nice property we had earlier where our comparison function was simple string comparison.  What we&apos;d like is some compression function <code>C</code> that comes with a guarantee that <code>x &lt; y</code> implies <code>C(x) &lt; C(y)</code>.</p>
<p>Taking some key ideas from a paper called &quot;Order-Preserving Key Compression&quot;<sup class="footnote-ref"><a href="#fn3" id="fnref3">[3]</a></sup>, we allow two different ways to encode the same count.  Rather than having a simple 32-bit unsigned count, we allow counts of up to 31 bits to be encoded either as <code>c</code>, the count itself, or as <code>2^32 - c</code>.  For decompression purposes either encoding represents the same count.</p>
<p>The choice of encoding is not arbitrary.  It is chosen to ensure the compressed strings have the same sort-ordering as their uncompressed equivalents.  From the <a href="https://github.com/chromium/chromium/blob/master/components/sync/base/unique_position.cc#L440">source code</a>:</p>
<blockquote>
<p>When producing a repeated character block, the count encoding must be chosen in such a way that the sort ordering is maintained.  The choice is best illustrated by way of example:</p>
<blockquote>
<p>When comparing two strings, the first of which begins with 8 instances of the letter &apos;B&apos; and the second with 10 instances of the letter &apos;B&apos;, which of the two should compare lower?  The result depends on the 9th character of the first string, since it will be compared against the 9th &apos;B&apos; in the second string.  If that character is an &apos;A&apos;, then the first string will compare lower.  If it is a &apos;C&apos;, then the first string will compare higher.</p>
</blockquote>
<p>The key insight is that the comparison value of a repeated character block depends on the value of the character that follows it.  If the character that follows the repeated character has a value greater than the repeated character itself, then a shorter run length should translate to a higher comparison value.  Therefore, we encode its count using the low encoding.  Similarly, if the following character is lower, we use the high encoding.</p>
</blockquote>
<p>In the end, we get the desired shortening of long-repeated strings of <code>0xff</code>s caused by right-insertion and <code>0x00</code>s caused by left-insertion.</p>
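<p>Here is a sketch of the scheme as I read it (heavily simplified: Chromium&apos;s real implementation adds alignment criteria, and the unique suffix is ignored entirely).  A run followed by a higher character stores its count inverted, so that shorter runs compare higher; a run followed by a lower character, or by the end of the string, stores its count directly:</p>

```ruby
# Order-preserving run-length encoding sketch (my simplification).
def encode_count(count, follower, run_char)
  # A run followed by a higher character must compare higher when it is
  # shorter, so its count is stored inverted (2^32 - c); otherwise the
  # count is stored directly.
  n = (follower && follower > run_char) ? 2**32 - count : count
  [n].pack("N")  # 4-byte big-endian
end

def compress(s)
  out = "".b
  i = 0
  while i < s.length
    j = i
    j += 1 while j < s.length && s[j] == s[i]
    run = j - i
    if run >= 4
      out << s[i] * 4 << encode_count(run, s[j], s[i])
    else
      out << s[i, run]   # short runs are copied verbatim
    end
    i = j
  end
  out
end

# Order is preserved through compression:
a = "B" * 8 + "C"   # compares higher than b, because 'C' > 'B'
b = "B" * 10
raise unless (a <=> b) == (compress(a) <=> compress(b))
```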
<h3 id="chartsandconclusions">Charts and Conclusions</h3>
<p>I built some <a href="https://github.com/richardlarocque/sequence-crdt">micro-benchmarks</a> so I could test out UniquePosition against Logoot and LSEQ.  Here are the results:</p>
<p><img src="https://blog.r6l7.com/content/images/2020/03/chart.png" alt="right-insertion benchmarks" loading="lazy"><br>
<img src="https://blog.r6l7.com/content/images/2020/03/chart--1-.png" alt="left-insertion benchmarks" loading="lazy"></p>
<p>The uncompressed variants of UniquePosition grow almost as quickly as Logoot, which is not good.  Those 10KB position identifiers would likely create challenges for any normal use case.  But we can also see that LSEQ and <em>compressed</em> unique positions both do a very good job of keeping the identifier size growth under control.</p>
<p>That said, I would definitely prefer LSEQ to UniquePosition in new projects.  The current implementation of unique positions requires decompressed positions when performing inserts.  Although the decompression is only temporary, the additional allocations and O(n) comparisons on these long positions would be rather expensive.  LSEQ has no such problem.</p>
<h3 id="references">References</h3>
<hr class="footnotes-sep">
<section class="footnotes">
<ol class="footnotes-list">
<li id="fn1" class="footnote-item"><p><a href="https://hal.inria.fr/inria-00432368">https://hal.inria.fr/inria-00432368</a> &quot;St&#xE9;phane Weiss,  Pascal Urso,  Pascal Molli.   Logoot:   A Scalable Optimistic Replication Algorithm for Collaborative Editing on P2P Networks.    29th IEEE International Conference on Distributed Computing Systems - ICDCS 2009, Jun 2009, Montreal, Canada.  IEEE, pp.404-412, 2009, 2009 29th IEEE International Conference on Distributed Computing Systems.  &lt; <a href="http://www.computer.org/portal/web/csdl/doi/10.1109/ICDCS.2009.75">http://www.computer.org/portal/web/csdl/doi/10.1109/ICDCS.2009.75</a> &gt; &lt; 10.1109/ICDCS.2009.75 &gt; &lt; inria-00432368 &gt; <a href="#fnref1" class="footnote-backref">&#x21A9;&#xFE0E;</a></p>
</li>
<li id="fn2" class="footnote-item"><p><a href="https://hal.archives-ouvertes.fr/hal-00921633">https://hal.archives-ouvertes.fr/hal-00921633</a> &quot;Brice N&#xE9;delec, Pascal Molli, Achour Mostefaoui, Emmanuel Desmontils. LSEQ: an Adaptive Structure for Sequences in Distributed Collaborative Editing. 13th ACM Symposium on Document Engineering (DocEng), Sep 2013, Florence, Italy. pp.37&#x2013;46, 2013, &lt; 10.1145/2494266.2494278 &gt;. &lt; hal-00921633 &gt;&quot; <a href="#fnref2" class="footnote-backref">&#x21A9;&#xFE0E;</a></p>
</li>
<li id="fn3" class="footnote-item"><p><a href="http://bitsavers.trailing-edge.com/pdf/dec/tech_reports/CRL-94-3.pdf">http://bitsavers.trailing-edge.com/pdf/dec/tech_reports/CRL-94-3.pdf</a> &quot;Gennady Antoshenkov, David B. Lomet, and James Murray. 1996. Order Preserving Compression. In Proceedings of the Twelfth International Conference on Data Engineering (ICDE &apos;96), Stanley Y. W. Su (Ed.). IEEE Computer Society, Washington, DC, USA, 655-663. &quot; <a href="#fnref3" class="footnote-backref">&#x21A9;&#xFE0E;</a></p>
</li>
</ol>
</section>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Web Service Concurrency Shoot-out]]></title><description><![CDATA[<!--kg-card-begin: markdown--><p>I work mainly on web services, mainly in Ruby. Our team is pretty happy with it. We have no intention of rewriting existing Ruby code into other languages.</p>
<p>However, we&apos;re running into cases where Ruby is not the right tool for the job. That has led us to</p>]]></description><link>https://blog.r6l7.com/web-service-concurrency-shoot-out/</link><guid isPermaLink="false">5e79791f10124a08b73b72ef</guid><category><![CDATA[Tech]]></category><dc:creator><![CDATA[Richard Larocque]]></dc:creator><pubDate>Sun, 07 Jan 2018 08:58:14 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: markdown--><p>I work mainly on web services, mainly in Ruby. Our team is pretty happy with it. We have no intention of rewriting existing Ruby code into other languages.</p>
<p>However, we&apos;re running into cases where Ruby is not the right tool for the job. That has led us to search for a general purpose &quot;fast&quot; language that we can use as an alternative.</p>
<p>This post focuses on a particular aspect of &quot;fast&quot;: support for threads and concurrent execution. Ruby and Python are sometimes criticized for their global interpreter locks that prevent them from using more than one CPU at a time. But how does this matter in practice for web services, and how much better is the competition?</p>
<p>Before we discuss the details, let&apos;s look more closely at our use case.</p>
<h3 id="webservices">Web Services</h3>
<p>As a web service, our number one metric for throughput is requests serviced per second. To optimize throughput, we can either serve individual requests more quickly (i.e. reduce latency), or serve more requests concurrently.</p>
<p>Reducing latency is clearly the better option. Lower request time usually translates to happier users and decreased server costs. But this approach tends to hit diminishing returns pretty quickly.</p>
<p>Reducing latency in a typical web service is difficult because most of the request time is not dependent on the server. The server will do some processing, translate the request into a database query, forward it along, wait for a result, then re-encode the result and pass it back to the user. Most of the request time is spent in the &quot;waiting for database result&quot; step. If waiting for the database makes up 90% of the request latency, then optimizing the non-database, web-server-dependent parts will at most result in a 10% latency speedup (by Amdahl&apos;s Law).</p>
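<p>The arithmetic behind that 10% figure is Amdahl&apos;s Law, which is easy to sanity-check:</p>

```ruby
# Amdahl's Law: overall speedup when a fraction p of total time
# is accelerated by a factor s.
def speedup(p, s)
  1.0 / ((1.0 - p) + p / s)
end

# If the server-side work is only 10% of request latency, making it
# infinitely fast still yields about a 1.11x overall speedup,
# i.e. roughly a 10% latency cut:
speedup(0.10, Float::INFINITY)
```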
<p>So we turn our attention to the second option: concurrency. Here we see a big opportunity to make an impact. Since a request that&apos;s blocked waiting for a response from a remote service requires no CPU time, we could in theory support hundreds of them at once. Even tens of thousands might not be outside the realm of possibility.</p>
<p>This is an exciting possibility. If we could process thousands of requests in parallel (compared to the 40-ish concurrent requests per CPU that our Rails servers are currently configured to support) we could serve the same amount of traffic with far fewer server instances.</p>
<p>But it turns out that serving ten thousand connections simultaneously is a pretty hard problem. In fact, it even has a name: the c10k problem.</p>
<p>Let&apos;s look at some common technologies to see how they&apos;ve approached this problem and how well their approaches scale.</p>
<h3 id="mriruby">MRI Ruby</h3>
<h4 id="ruby18">Ruby 1.8</h4>
<p>In Ruby 1.8, you could call Thread.new as often as you liked, and top would still (correctly) show no new OS threads. Threads in Ruby 1.8 were more like &quot;green threads&quot; or &quot;fibers&quot;. The OS had no idea how many of these Ruby 1.8 threads your program was running at a time.</p>
<p>This model has some drawbacks. Normally, when a thread makes a blocking call like sleep(2) or read(2), the OS will put that thread to sleep so it can give CPU time to some other thread that isn&apos;t sleeping. This ensures the CPU is always doing work, so long as there is work to be done. But the OS can&apos;t block and resume threads it doesn&apos;t know about.</p>
<p>The Ruby 1.8 VM uses some pretty clever tricks to work around this limitation. In that version of Ruby, Kernel#sleep did not call sleep(2). Instead, it coordinated with the VM&apos;s scheduler to yield the CPU to another Ruby thread. Same thing with File#read. The Ruby 1.8 VM effectively included logic equivalent to an OS process scheduler, including a form of preemption.</p>
<p>Unfortunately, Ruby 1.8 was not as good at this sort of thing as a typical OS. Certain gems, including an early version of the MySQL gem, would still make OS calls that blocked every thread in the process. In fact, writing gems the &quot;wrong&quot; way was significantly easier than writing them the right way. And even without poorly written gems, the VM&apos;s threading logic was not especially good or fast. (See this blog post from Joe Damato for all the gory details.)</p>
<p>So in Ruby 1.9 they did things differently.</p>
<h4 id="ruby19">Ruby 1.9</h4>
<p>In Ruby 1.9, Thread.new created real OS threads. The VM still did not allow threads to run in parallel; a Ruby process was still limited to using at most one CPU at a time, thanks to the interpreter lock. But at least the OS could see the Ruby threads as threads.</p>
<p>For most use cases, this was a big improvement. The Ruby VM no longer had to implement its own process scheduling and preemption logic. They could use the OS to schedule threads, and the OS generally did this faster and better than Ruby 1.8 used to. Gems could be written that would call out to C libraries and make blocking OS calls without blocking other threads.</p>
<p>But there are drawbacks to this approach, too. The next section will demonstrate one of them.</p>
<h4 id="microbenchmarkruby18vsruby2x">Micro-benchmark: Ruby 1.8 vs. Ruby 2.x</h4>
<p>Here is a tiny benchmark meant to test some form of scalability. The idea is to create thousands of threads and have them all go to sleep.</p>
<pre><code>NUM_THREADS = 100000  
ts = (1..NUM_THREADS).map do |i|  
  Thread.new(i) do |x|
    sleep 10
    puts x
  end
end

ts.each(&amp;:join)  
puts &apos;done&apos;
</code></pre>
<p>I compiled a non-optimized copy of Ruby 1.8.7 and compared it with the Ruby 2.3 that I obtained through my package manager. Here&apos;s how long this benchmark took for each of them.</p>
<p>At 10,000 threads:<br>
Ruby 1.8: 24 seconds<br>
Ruby 2.3: 18 seconds</p>
<p>At 100,000 threads:<br>
Ruby 1.8: 3 minutes<br>
Ruby 2.3: crashed</p>
<p>The lesson I would take from this benchmark is that OS threads come with a price. They&apos;re really good at what they do up until a point. Beyond that point, the scheduling and memory overheads start to get really expensive.</p>
<p>This matters because a web service is not far removed from this benchmark. From a scheduling point of view, a thread that&apos;s sleeping looks a lot like a thread that&apos;s waiting for a response from a database. Our ideal highly-concurrent web service would be able to support many thousands of these requests simultaneously.</p>
<h3 id="go">Go</h3>
<p>Go has made a name for itself in part because of its unique support for concurrency. Its goroutines are notable for their low memory usage and minimal scheduling overhead.</p>
<p>Goroutines are a kind of userspace thread. Go spawns OS threads in proportion to the number of CPUs available (or the GOMAXPROCS setting). The Go runtime schedules goroutines to run on these threads. The OS knows about the threads, but not the goroutines.</p>
<p>It&apos;s also pretty smart about scheduling. Like Ruby 1.8, it implements time.Sleep not as a sleep(2) system call, but as an indication to the scheduler not to schedule this goroutine for a while. That means sleeping is really fast.</p>
<p>I rewrote the 100,000 sleeping thread benchmark from the Ruby section in Go:</p>
<pre><code>package main

import &quot;fmt&quot;  
import &quot;time&quot;  
import &quot;sync&quot;

func main() {  
    const NumSleepers = 100000

    var wg sync.WaitGroup
    wg.Add(NumSleepers)

    for i := 0; i &lt; NumSleepers; i++ {
        go func(index int) {
            time.Sleep(10 * time.Second)
            fmt.Println(&quot;Done: &quot;, index)
            wg.Done()
        }(i)
    }
    wg.Wait()
}
</code></pre>
<p>Go crushed it. This program took less than 12 seconds to run. The number of OS threads remained constant throughout. Compared to Ruby, which took 3 minutes or crashed, this is quite an improvement.</p>
<p>But it turns out this is a case of gaming the benchmark. Or at least a demonstration that the benchmark is poorly written. While the go runtime does have a cheap implementation of sleep, other kinds of blocking calls are not so efficient. Here&apos;s another demonstration:</p>
<pre><code>package main

import &quot;fmt&quot;  
import &quot;time&quot;  
import &quot;bufio&quot;  
import &quot;os&quot;

func main() {  
    const NumSleepers = 100

    // Spawn |NumSleepers| goroutines that block on STDIN.
    for i := 0; i &lt; NumSleepers; i++ {
        go func(index int) {
            for {
                bio := bufio.NewReader(os.Stdin)
                bio.ReadLine() // Block waiting for input.
            }
        }(i)
    }
    // Block forever.
    x := make(chan struct{})
    &lt;-x
}
</code></pre>
<p>In this case, Go actually creates 100 OS threads, in proportion to the 100 blocked goroutines. Unlike time.Sleep, the runtime can&apos;t avoid the syscall implicit in bio.ReadLine() quite so easily. Crank up NumSleepers to 100,000 and this program starts to panic and crash just as much as the Ruby 2.3 benchmark did in the previous section.</p>
<p>Unfortunately for us, the blocking I/O microbenchmark is a better simulation of a web server than the sleep test. Our web server&apos;s requests are not spending most of their time waiting for a wall-clock. They&apos;re performing read(2) and write(2) calls. For blocking descriptors like the one above, Go is forced to spawn OS threads to handle the I/O. If it spawns too many threads, it crashes.</p>
<p>What we really need to scale this web service is event-driven I/O, like what Ruby 1.8 had. But we want it to be faster and more consistently implemented than that. Which brings us to the next candidate on our list.</p>
<h3 id="nodejs">Node.js</h3>
<p>Enter Node.js. It uses only one OS thread, but on that thread there are multiple contexts of execution. When one context is blocked because it&apos;s waiting on a timer or blocking I/O, the runtime just context-switches to the next one.</p>
<p>If you read through the c10k link from earlier, you already know that a typical good solution to this problem is to use non-blocking I/O with an event loop based around a select(2)-like interface, probably wrapped in a library like libevent or libuv for convenience.</p>
<p>Node.js is a JavaScript wrapper around a c10k solution.</p>
<p>Like Go, it aces the 100,000 sleeping threads test:</p>
<pre><code>#! /usr/bin/env node

var numSleepers = 100000;  
for (var i = 0; i &lt; numSleepers; i++) {  
  var delay = 10000;
  setTimeout(function(x) {
    console.log(&apos;Done &apos; + x);
  }, delay, i);
}
</code></pre>
<p>This runs in 11.5 seconds. That&apos;s actually a little bit faster than Go, possibly due to having a simpler (and less fair) scheduling algorithm.</p>
<p>Unlike Go, it aces the blocked-on-file variant of the test, too:</p>
<pre><code>#! /usr/bin/env node

var fs = require(&apos;fs&apos;);

var numSleepers = 100000;  
for (var i = 0; i &lt; numSleepers; i++) {  
  var buf = new Buffer(1024);
  fs.read(process.stdin.fd, buf, 0, 1024, null,
          function(x) {
            return function() {
              console.log(&quot;Done&quot; + x);
            }
          }(i));
};
</code></pre>
<p>Each of those callbacks captures its own context. And yet this program spawns a constant number of OS threads no matter how many concurrent I/O operations are in progress. It&apos;s fast, too: when I pipe /bin/yes into that program it outputs 100,000 &quot;Done&quot; lines and terminates in under three seconds.</p>
<p>So it seems like the JavaScript community got something right. Although Node.js can&apos;t make use of multiple cores as easily as some other languages, and the non-preemptive scheduling might be problematic in CPU-bound workloads, for an I/O heavy server this model is pretty nice.</p>
<h3 id="conclusionsandcaveats">Conclusions and Caveats</h3>
<p>So Node.js is the winner, right?</p>
<p>Well, that depends on context. I&apos;ve made a few assumptions along the way that led to this conclusion. Your context may differ.</p>
<p>First, I assume that your web server isn&apos;t doing a lot of CPU-intensive work, or if it does, that this work could easily be farmed out to a separate process. If you insist on doing lots of computation in the same process that handles web requests, you might want to choose something else. A server like that is likely to hit CPU bottlenecks long before it hits 10,000 concurrent requests anyway.</p>
<p>Second, I assume that cooperative multitasking is good enough for your use case. If your service supports even one kind of CPU-intensive request, then the Node.js cooperative multitasking model should scare you. In Node.js, a single bad request could hog the CPU and the event loop, affecting the latency of every other request in the system. Preemptive multitasking models, like OS threads or goroutines, would preempt such a request before it disrupts the rest of the system.</p>
<p>Finally, I&apos;ve implicitly rejected lots of other viable solutions because I assume that ecosystem support is important. The c10k solutions found in other languages tend to involve non-idiomatic tricks that make the vast majority of their language&apos;s package ecosystem unavailable. Compare, for example, this IMAP library for EventMachine and the Net::IMAP implementation that ships as part of Ruby&apos;s standard library. Which of these would you expect has the larger community? But if you&apos;re attached to a language other than JavaScript and don&apos;t mind working with a smaller ecosystem, then you can take your pick of non-JavaScript event-driven frameworks. Ruby&apos;s EventMachine, Python&apos;s Twisted, and Java&apos;s Netty all seem to be perfectly fine choices, though I admit to having little first-hand experience with them.</p>
<p>That said, if your use case requires strong open-source ecosystem support, your workload is heavily dependent on I/O, and you want to minimize the number of CPU cores needed to solve the problem, then yes, Node.js is a good choice.</p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Announcing Vox]]></title><description><![CDATA[<!--kg-card-begin: markdown--><p><a href="http://vox.r6l7.com">Vox</a> is a login-free voice and text chat platform.  I wrote and published it months ago, but haven&apos;t had a soapbox on which to talk about it until recently.</p>
<h3 id="howtouseit">How to use it</h3>
<p>Vox is a low-friction way to talk to a group of people.  There are no</p>]]></description><link>https://blog.r6l7.com/announcing-vox/</link><guid isPermaLink="false">5e79791f10124a08b73b72ea</guid><category><![CDATA[Tech]]></category><category><![CDATA[vox]]></category><category><![CDATA[web]]></category><dc:creator><![CDATA[Richard Larocque]]></dc:creator><pubDate>Sun, 25 Jan 2015 09:07:56 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: markdown--><p><a href="http://vox.r6l7.com">Vox</a> is a login-free voice and text chat platform.  I wrote and published it months ago, but haven&apos;t had a soapbox on which to talk about it until recently.</p>
<h3 id="howtouseit">How to use it</h3>
<p>Vox is a low-friction way to talk to a group of people.  There are no installs, no sign-ups, and no prompts except for the browser&apos;s request for microphone access.</p>
<p>The basic concept is similar to IRC.  If you visit <a href="http://vox.r6l7.com/r/demo1">http://vox.r6l7.com/r/demo1</a>, you&apos;ll be brought to the &quot;demo1&quot; room.  Anyone else who visits that URL will be brought to the same room.  This makes it possible to set up a common meeting place based on a topic or shared interest.</p>
<p>If you&apos;d prefer to avoid random strangers, you can pick a room name at random.  Or you can visit <a href="http://vox.r6l7.com/">http://vox.r6l7.com/</a> directly without specifying a room name, which will redirect you to a new, randomly generated room.  Then you can share the link with whoever you want to talk to by other means.</p>
<p>This makes it easy to upgrade an existing text conversation into a voice conversation.  All you have to do is visit the site, then share the link with someone who has a browser, and you&apos;re good to go.  Modern mobile browsers should work just as well as desktops.</p>
<p>In short: No more exchanging Skype IDs!</p>
<h3 id="underlyingtechnologies">Underlying Technologies</h3>
<p>The main technical ingredient is <a href="http://www.webrtc.org/">WebRTC</a>.  There&apos;s really no other way I could have written it, so keep that in mind as I rant about the drawbacks.</p>
<p>Support is <a href="http://iswebrtcreadyyet.com/">mixed</a>.  Apparently it doesn&apos;t work on Safari or IE.  I didn&apos;t bother to test those browsers since I don&apos;t use Windows or iOS, but I acknowledge a lot of others are still stuck on those platforms.</p>
<p>Even where it is supported, there are issues.  I spent hours trying to implement a volume slider, but kept getting only silence in Chrome.  Eventually I learned that combining WebRTC with WebAudio is currently not supported in Chrome.  That&apos;s <a href="https://crbug.com/121673">bug 121673</a>.  Expect to see a volume slider added some time after that bug is fixed.</p>
<p>All in all, it&apos;s very much a web technology.  It&apos;s easy to build and use, except for the part where the browser vendors decided to each go their own way.</p>
<h3 id="limitations">Limitations</h3>
<p>The audio traffic is peer-to-peer.  This makes the server very scalable.  All it has to do is pass around a few control-plane messages and handle the text chat traffic.  Clients, on the other hand, would probably have a bad time if the room had any more than a few people.</p>
<p>Horizontal scaling on the server side is pretty bad, too.  I&apos;ve been mainly focused on basic functionality, so I haven&apos;t gotten around to implementing <a href="http://12factor.net/">twelve-factor</a> principles.  The room state really should be stored in <a href="http://redis.io">Redis</a> or some other equivalent.  For now, the assumption of &quot;one service, one process, one machine&quot; is baked into the code.</p>
<h3 id="seealso">See Also</h3>
<p>If you like the concept but don&apos;t like my implementation of it, check out <a href="https://talky.io">Talky.io</a>.  I didn&apos;t know about them when I started the project, but it seems they&apos;ve built something quite similar, and did it first.  They&apos;re also much better at web design.</p>
<p>In fact, you should probably just use them instead of Vox.</p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[It Works: Running on NixOS]]></title><description><![CDATA[<!--kg-card-begin: markdown--><p>Well, that was an ordeal.  It took the better part of a week, but I have a server again.</p>
<p>This blogging platform was not part of the problem.  It was a joy to work with.  Hats off to the people at <a href="https://ghost.org">Ghost</a></p>
<p>The reason it took so long to setup</p>]]></description><link>https://blog.r6l7.com/it-works-nix/</link><guid isPermaLink="false">5e79791f10124a08b73b72e9</guid><category><![CDATA[NixOS]]></category><category><![CDATA[SysAdmin]]></category><category><![CDATA[Tech]]></category><dc:creator><![CDATA[Richard Larocque]]></dc:creator><pubDate>Mon, 12 Jan 2015 12:16:05 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: markdown--><p>Well, that was an ordeal.  It took the better part of a week, but I have a server again.</p>
<p>This blogging platform was not part of the problem.  It was a joy to work with.  Hats off to the people at <a href="https://ghost.org">Ghost</a>.</p>
<p>The reason it took so long to set up is that it&apos;s running on an OS from the future.  It&apos;s called <a href="https://nixos.org">NixOS</a>, and some of the time it&apos;s really great.</p>
<h2 id="nix">Nix</h2>
<p>The headline feature for this operating system is that it uses Nix, the purely functional package manager.  With Nix, all packages are given unique names that include a hash of all their inputs, including not just their sources but also the compilers used, the build scripts, and the rest of the environment.  Inputs that are not explicitly specified are not present for the package build process, so it&apos;s really hard to create hidden dependencies.  It&apos;s kind of like <code>deb</code> or <code>rpm</code>, but a little better about tracking dependencies.</p>
<p>Its approach to deploying those packages is unique.  Rather than install packages in place, it stashes them all in the Nix store (located at <code>/nix/store</code>) and uses a forest of symlinks and <code>$PATH</code> entries to put them together into a useful combination.  Want to use a new version of Python for a little while, without reconfiguring your system?  Just invoke <code>nix-shell</code> with the proper incantations and it will build you a new environment and start a shell in it.  With a system like this, package rollbacks are pretty easy, too.</p>
<p>If you&apos;re a Haskell programmer and you haven&apos;t heard of Nix before, you may be salivating right now.  That language has an unfortunate combination of being finicky about package versions and a tooling system that&apos;s not very good at installing two versions of the same library side by side.  Nix&apos;s features will be useful for a lot of people, but Haskell programmers have more reasons to be excited about it.</p>
<h2 id="nixos">NixOS</h2>
<p>All that&apos;s pretty neat, and an operating system based on Nix would be interesting in and of itself.  But NixOS goes beyond that starting point.</p>
<p>NixOS puts itself in charge of building <code>/etc</code> and other config paths based on a single <code>configuration.nix</code> file.  It does this using the same domain-specific language as the Nix package manager.  The language has all the usual features, like functions, variables, and importing other files.  It turns out this is a great idea.</p>
<p>I never noticed it before, but there&apos;s a lot of redundant work in configuring a typical Linux box.  Ports and user names wind up in several different config files, and the system doesn&apos;t work if they don&apos;t line up.  It&apos;s much easier to manage these things when a port number or user name can be specified as a variable which can then be pulled into several different config files.</p>
<p>Although the end result is an <code>/etc</code> that should look familiar to most admins, the work of formatting the Nix configs into the usual file formats is well hidden.  That&apos;s another benefit of using a programming language for this; all the logic for writing these files can be handled in a library.  The admin sees a single uniform interface for configuring everything.</p>
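<p>Here&apos;s a hypothetical <code>configuration.nix</code> fragment illustrating the idea (the service name, host name, and port are invented; the option names come from the NixOS module system): the port is defined exactly once, so the app and its reverse proxy can&apos;t fall out of sync.</p>

```nix
let
  appPort = 2368;  # the port is defined exactly once...
in {
  # ...and every consumer pulls from that one definition.
  systemd.services.myblog.environment.PORT = toString appPort;

  services.nginx.enable = true;
  services.nginx.virtualHosts."blog.example.com".locations."/".proxyPass =
    "http://127.0.0.1:${toString appPort}";
}
```

<p>NixOS renders the actual <code>nginx.conf</code> and systemd unit files from this; the admin never touches the generated files directly.</p>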
<p>Almost everything is written to be easy to use.  Here&apos;s one of my favorites:</p>
<pre><code>ghostContentPath = &quot;/var/ghost/content&quot;;
services.tarsnap.enable = true;
services.tarsnap.config.ghost.directories = [ ghostContentPath ];
</code></pre>
<p>Those three lines enable daily tarsnap backups for this blog!</p>
<p>There are lots more goodies like this in the <a href="https://nixos.org/nixos/manual/">NixOS Manual&apos;s</a> <a href="https://nixos.org/nixos/manual/ch-options.html">Appendix</a>.</p>
<h2 id="nixops">NixOps</h2>
<p>Finally, there&apos;s NixOps, a tool that ties this all into the cloud.  It makes it easy to spin up an EC2 or GCE instance that matches a given machine specification.  Or you can build a template and stamp out a dozen different instances, each parameterized slightly differently.  When machine specifications are just expressions in a programming language, they can be copied, modified, and instantiated easily.</p>
<p>Deploying changes is easy, too.  Just update the config files and run <code>nixops deploy</code>.  The program will take care of starting or stopping machines, installing packages, and restarting services.</p>
<p>At this point, it starts to encroach on Puppet and Chef&apos;s territory.  Which is a good thing, since those tools seem to be regarded as (at best) a necessary evil.</p>
<h2 id="myexperience">My Experience</h2>
<p>The downside is that actually making these packages can be a bit of a pain.  It&apos;s a young project.  There&apos;s very little documentation and there are few examples, so you&apos;re on your own once you&apos;ve strayed from the well-traveled path.  The Nix interpreter&apos;s error messages are often confusing.  Perhaps worst of all, most editors don&apos;t have syntax highlighting for <code>.nix</code> files yet.</p>
<p>The reason it took me so long to build this site is that I had to build a bunch of my packages from scratch without much help from documentation.</p>
<p>Most of my work is too crude to be upstreamed into the nixpkgs tree, but I can still post it online.  Maybe this can save someone else some time.</p>
<p><a href="https://github.com/richardlarocque/nix-configs">https://github.com/richardlarocque/nix-configs</a></p>
<!--kg-card-end: markdown-->]]></content:encoded></item></channel></rss>