WTF does a Distributed Systems Engineer do?

Interview with Paul Cruickshank, Distributed Systems Engineer at Ably

What do you do?

I’m a Distributed Systems Engineer.

What’s your day to day like?

Mainly, I write code. My morning would start with checking Slack, checking any pull requests I’m supposed to review for other people [pull requests are requests to have code peer reviewed], checking my own pull requests, faffing around a bit, reading Hacker News and then having morning stand-up. Then, it depends - for example, today I had a content piece to finish writing [The Mysterious Gotcha of gRPC Stream Performance], and then I was coding the rest of the day. I also write software tests, create pull requests and talk ideas through with colleagues. Most of my time is spent coding; I don’t have too many meetings.

What kind of coding do you do, and what problems do you solve with your code?

I do back-end systems engineering in Go [Go, or Golang, is a popular open-source back-end programming language] for a company called Ably. Ably builds a real-time Pub/Sub network, which is very high throughput, reliable, distributed and scalable.

Pub/Sub (Publish/Subscribe) is a messaging pattern for how different computers send messages to each other. The idea is that traditional network communications are one to one, so if you want to send a message to 100 people, you need to send it 100 times, once to each of those individuals, and you need to know each of those individuals’ network addresses, and you have to keep track of any delivery failures. The idea of Pub/Sub is to separate your publishers (who send the messages) and your subscribers (who receive the messages).

In Pub/Sub, your messages go through a Message Broker. A publisher will just send a single message on a topic to the Message Broker, and all the subscribers for that topic will receive the message without the publisher having to care about who all of them are or how many there are. You can also have multiple publishers for a topic, in which case a subscriber would be receiving messages from different publishers. It lets you have multiple senders and multiple receivers for messages, decoupled from each other. This is great for the internet, because when you have millions of subscribers and a lot of messages in flight, it’s much simpler for app and service developers to be able to outsource the communications. They can just go: “Well, we know this message will get there if I just plug it into the Message Broker,” and all the relevant subscribers will get the message. That's what Ably does at its core - we are a Message Broker and we also write client libraries to make it easier for customers to integrate with our protocols.
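To make the pattern concrete, here is a minimal in-memory sketch in Go of the publish/subscribe idea described above. It is an illustration only, not Ably's implementation: the `Broker` type, its `Subscribe` and `Publish` methods, and the decision to drop a message when a subscriber's buffer is full are all assumptions made for this example. A real broker also has to deal with networking, persistence and delivery guarantees.

```go
package main

import (
	"fmt"
	"sync"
)

// Broker is a hypothetical in-memory message broker used only to
// illustrate the Pub/Sub pattern.
type Broker struct {
	mu   sync.RWMutex
	subs map[string][]chan string // topic -> subscriber channels
}

// NewBroker returns an empty broker.
func NewBroker() *Broker {
	return &Broker{subs: make(map[string][]chan string)}
}

// Subscribe registers interest in a topic and returns a buffered
// channel on which messages for that topic arrive.
func (b *Broker) Subscribe(topic string, buffer int) <-chan string {
	ch := make(chan string, buffer)
	b.mu.Lock()
	defer b.mu.Unlock()
	b.subs[topic] = append(b.subs[topic], ch)
	return ch
}

// Publish sends msg once; the broker fans it out to every subscriber
// of the topic. The publisher never learns who the subscribers are.
// It returns how many subscribers accepted the message. As a
// simplification, a subscriber whose buffer is full misses the message
// instead of blocking the publisher.
func (b *Broker) Publish(topic, msg string) int {
	b.mu.RLock()
	defer b.mu.RUnlock()
	delivered := 0
	for _, ch := range b.subs[topic] {
		select {
		case ch <- msg:
			delivered++
		default:
			// Buffer full: skip this subscriber.
		}
	}
	return delivered
}

func main() {
	b := NewBroker()
	alice := b.Subscribe("scores", 4)
	bob := b.Subscribe("scores", 4)

	// One publish call reaches both subscribers.
	n := b.Publish("scores", "home 1 - 0 away")
	fmt.Println("delivered to", n, "subscribers")
	fmt.Println("alice got:", <-alice)
	fmt.Println("bob got:", <-bob)
}
```

The publisher makes one call regardless of how many subscribers exist, which is the decoupling the pattern is meant to provide.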

A Message Broker is easy to build; the difficult part is making it reliable and scalable. First of all, there are many different topics to deliver at the same time to many different people. That inherently leads to concurrency problems, because they're all competing for the same resources on a server. To manage that, Ably’s systems are elastic. So when there is a spike in traffic, we need to quickly scale and add more servers to our clusters. The workload gets rebalanced, so you need to reach consensus within the cluster on which servers own which topics and messages. Scaling back down has its issues too. It's very easy for a small problem on one server to cause problems everywhere. You have to write quite simple designs that handle all these complicated problems. There's a whole school of Software Engineering around it.
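The interview does not say how Ably decides which server owns which topic. As one generic illustration of the rebalancing problem, the sketch below uses rendezvous (highest-random-weight) hashing: every server that sees the same membership list computes the same owner for a topic without talking to the others, and adding a server only moves the topics that the new server wins. The `score` and `Owner` functions and the server names are hypothetical, and the sketch assumes the servers already agree on the membership list, which is the hard consensus part it does not solve.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// score hashes a (server, topic) pair to a pseudo-random weight.
// A zero byte separates the two strings so that ("ab", "c") and
// ("a", "bc") hash differently.
func score(server, topic string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(server))
	h.Write([]byte{0})
	h.Write([]byte(topic))
	return h.Sum64()
}

// Owner returns the server with the highest score for the topic.
// Ties are broken by server name so the result does not depend on
// the order of the servers slice. It returns "" for an empty slice.
func Owner(servers []string, topic string) string {
	var best string
	var bestScore uint64
	for i, s := range servers {
		sc := score(s, topic)
		if i == 0 || sc > bestScore || (sc == bestScore && s < best) {
			best, bestScore = s, sc
		}
	}
	return best
}

func main() {
	servers := []string{"server-a", "server-b", "server-c"}
	// Simulate scaling up by one server during a traffic spike.
	grown := append(append([]string{}, servers...), "server-d")

	for _, topic := range []string{"scores", "chat", "prices", "alerts"} {
		fmt.Printf("%-7s before: %s  after: %s\n",
			topic, Owner(servers, topic), Owner(grown, topic))
	}
}
```

In this scheme a topic either stays where it was or moves to the newly added server, so a scale-up disturbs only part of the workload.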

On the networking side, we have independent server clusters in data centers all around the world, and all these servers are talking to each other. It’s important I understand how they're actually communicating, what communication protocols are used, how data is encoded. It determines how efficient the whole system is. Essentially, you're trying to get as many messages through as efficiently and reliably as possible.

All that means is there's a lot of inherent complexity in making the system reliable and scalable. So my time as a Back-end Engineer is spent ensuring the system works. It’s fun!

We've got the good problem at the moment where we're scaling quickly. We have lots of new customers signing up and existing customers growing and using us more. So the platform is getting used at higher and higher scale. That means a lot of time is spent worrying about performance bottlenecks - it's a good problem to have, but it means my time can easily be taken over by fire-fighting issues. The biggest problem right now is scaling up the engineering team. There's just a lag to it, so I’m also interviewing candidates. If you are interested in what we do, check out the Ably careers page: we are hiring for a Distributed Systems Engineer and 10+ more roles.

How did you become a Distributed Systems Engineer?

I did some coding at university, but not a huge amount. After graduating, I taught myself Python, worked on a few projects at home, and then used that to get an entry-level job as a Software Developer at a company called MetaSwitch, where I spent a few years. The company specialised in telecoms - so I worked on a lot of software networking and got a really good handle on networking protocols. It was quite a large company and I had a stable job, but after a while I wanted more excitement, so I joined a small startup. That was fun, if a bit chaotic, but the business ran its course, and I started looking for a new job. I joined Ably because it's got all the networking and system design aspects that I like.

That's how I got where I am now. Ably hired me because of my networking experience and the fact that I like digging into specs, digging into code, working out how things work. I like understanding all the details as well as the big picture design.

What do you find to be the most meaningful part of your job?

I’m probably supposed to say something about helping customers, but it's actually solving deadlocks in concurrent systems. It's just really fun. It’s a nice puzzle to work out.

What are you most proud of in your career?

I’m probably the most proud of moving out to New Zealand and doing a full time job as soon as I got out there, while also finding myself somewhere to live and getting everything set up at the same time within a week. It’s slightly outside of work, but it was for work.

Do you ever feel imposter syndrome?

Not really.

Sometimes a little bit, but it's usually when other people are bullshitting about themselves and make themselves bigger than they actually are. Certainly at the moment, and for the last while, I’m confident in my abilities. I know I'm quite good at what I do. I don't have a huge amount to prove, it’s a nice position to be in.

What would you recommend to somebody who wants to get into Distributed Systems Engineering?

I'd say learn about distributed systems. There are lots of books on it. It’s also really good to get comfortable with digging through network specs. It’s important to be able to ask questions and talk to people. I don't quite understand how people get into Distributed Systems Engineering other than by chance. I feel like everyone I work with has not aimed to do it, but has fallen into it.
