This is my reading list from the past few days. I decided to put it here as it might be helpful to someone else. It was deeply inspired by the HighScalability blog, a source I’ve been consuming for years.
Microsoft all over the place
Microsoft keeps pushing to become a major player in the Open Source community. Let's take a look at the majestic presence they have had in the media recently.
At Microsoft, 47,000 developers generate nearly 30 thousand bugs a month. These items get stored across over 100 AzureDevOps and GitHub repositories. To better label and prioritize bugs at that scale, we couldn’t just apply more people to the problem.
Microsoft was on the wrong side of history when open source exploded at the beginning of the century, and I can say that about me personally.
A case study will be written on how Microsoft allowed Zoom to eat their lunch. They spent millions on subterfuge trying to paint Slack as an inferior enemy when MSFT Teams actually can't do what Slack does and Teams' real competitor was Zoom. Now Zoom has 300M Daily Users. Lol.
Rust/WinRT lets you call any WinRT API past, present, and future using code generated on the fly directly from the metadata describing the API and right into your Rust package where you can call them as if they were just another Rust module.
Rust on the Radar
While we're on the subject of Rust, it seems it's not only Microsoft investing time and effort in it.
There are many benefits a standardized ABI would bring to Rust. A stable ABI enables dynamic linking between Rust crates, which would allow for Rust programs to support dynamically loaded plugins (a feature common in C/C++). Dynamic linking would result in shorter compile-times and lower disk-space use for projects, as multiple projects could link to the same dylib. For example, imagine having multiple CLIs all link to the same core library crate.
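To make the plugin point concrete: today this kind of dynamic loading typically goes through the C ABI, precisely because Rust-to-Rust ABI is unstable — which is exactly the gap a standardized ABI would close. A minimal sketch using the `libloading` crate (the crate name, symbol, and library path are invented for the example):

```rust
// --- plugin crate, built separately with crate-type = ["cdylib"] ---
// We have to drop down to `extern "C"`: without a stable Rust ABI,
// the C ABI is the only reliable boundary between separately built crates.
#[no_mangle]
pub extern "C" fn plugin_add_one(x: i32) -> i32 {
    x + 1
}

// --- host crate ---
use libloading::{Library, Symbol}; // libloading = "0.7"

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // The path is hypothetical; .so on Linux, .dylib on macOS, .dll on Windows.
    let lib = unsafe { Library::new("./libmy_plugin.so")? };
    let add_one: Symbol<unsafe extern "C" fn(i32) -> i32> =
        unsafe { lib.get(b"plugin_add_one")? };
    println!("{}", unsafe { add_one(41) }); // 42
    Ok(())
}
```

With a stable ABI, the same thing could be plain Rust types and plain Rust function signatures across the boundary, no `unsafe` C shim required.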
Programming is hard.
Not because our hardware is complex, but simply because we’re all humans. Our attention span is limited, our memory is volatile — in other words, we tend to make mistakes.
The deno_core crate is a very bare bones version of Deno. It does not have dependencies on TypeScript nor on Tokio. It simply provides our Op and Resource infrastructure. That is, it provides an organized way of binding Rust futures to JavaScript promises. The CLI is of course built entirely on top of deno_core.
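To get a feel for just how bare-bones that is, here is a minimal sketch. Note that deno_core's API has churned a lot between releases, so the exact names (`JsRuntime`, `execute_script`, `Deno.core.print`) are assumptions tied to a particular version range rather than a stable contract:

```rust
// Cargo.toml: deno_core = "..." (version elided; the API surface moves quickly)
use deno_core::{JsRuntime, RuntimeOptions};

fn main() {
    // A plain V8 isolate plus Deno's op/resource plumbing:
    // no TypeScript compiler, no Tokio event loop.
    let mut runtime = JsRuntime::new(RuntimeOptions::default());
    runtime
        .execute_script("<usage>", "Deno.core.print('hello from deno_core\\n')")
        .expect("script failed");
}
```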
Fowler and Friends
It looks like a busy week for Martin Fowler and his friends. A new ThoughtWorks Radar was released, a few blog entries have been updated, and the man himself has carved out another set of terms to add to his legacy in software engineering.
This division of development into lines of work that split and merge is central to the workflow of software development teams, and several patterns have evolved to help us keep a handle on all this activity. Like most software patterns, few of them are gold standards that all teams should follow.
For this Radar, we decided to call out again infrastructure as code as well as pipelines as code, and we also had a number of conversations about infrastructure configurations, ML pipelines and other related areas. We find that the teams who commonly own these areas do not embrace enduring engineering practices such as applying software design principles, automation, continuous integration, testing, and so on.
Coming to understand the threat model for your system is not simple. There are an unlimited number of threats you can imagine to any system, and many of them could be likely. [...] Cyber threats chain in unexpected, unpredictable and even chaotic ways. Factors to do with culture, process and technology all contribute. This complexity and uncertainty is at the root of the cyber security problem. This is why security requirements are so hard for software development teams to agree upon.
Other relevant quotes
Hmm, let me see... What else should be mentioned?
Zoom scaled from 20 million to 300 million users virtually overnight. What's incredible is from the outside they've shown little in the way of apparent growing pains, though on the inside it's a good bet a lot of craziness is going on.
Besides being an interesting approach to a very common problem, their discussion of Piranha also provides some very interesting insights into an organization that's *heavily* invested in feature flagging....
Deferring integration can increase the risk of merge conflicts, which causes you to move more slowly as you spend more energy addressing those conflicts. Slow change can sometimes be more risky than you expect because of the costs of extra work needed to reconcile conflicts, as well as the technical debt that results from bypassing the normal process to fix critical errors.
Simply put, testing in production means testing your features in the environment where your features will live. So what if a feature works in staging? That's great, but what you should care about is whether the feature works in production; that's what matters.
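A common way to make that palatable is a gradual rollout: deterministically bucket users so only a small, stable slice of production traffic hits the new path at first. A minimal sketch (the feature name and bucketing scheme are illustrative, not any particular vendor's flag API):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Deterministically bucket a user into 0..100 so the same user
/// always sees the same variant across requests.
fn rollout_bucket(user_id: &str, feature: &str) -> u64 {
    let mut h = DefaultHasher::new();
    (user_id, feature).hash(&mut h);
    h.finish() % 100
}

fn use_new_checkout(user_id: &str, percent_enabled: u64) -> bool {
    rollout_bucket(user_id, "new-checkout") < percent_enabled
}

fn main() {
    // Start at 5% of production traffic, watch the metrics, then widen.
    for user in ["alice", "bob", "carol"] {
        println!("{user}: {}", use_new_checkout(user, 5));
    }
}
```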
[...] when I was asked to reduce the resource requirements of a large MongoDB cluster, I reached the conclusion that the most obvious target - attribute names - wouldn’t lead to the kind of impact I wanted.
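The intuition behind that conclusion is easy to check: field names are stored inline in every BSON document, so shortening them saves a fixed few bytes per document — often dwarfed by the values themselves. A quick sketch with the Rust `bson` crate (the document shape is invented for the example):

```rust
use bson::doc; // the bson crate's document macro

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let verbose = doc! { "customerFirstName": "Ada", "customerLastName": "Lovelace" };
    let terse = doc! { "cfn": "Ada", "cln": "Lovelace" };

    // Serialize each document and compare the raw BSON sizes.
    let (mut a, mut b) = (Vec::new(), Vec::new());
    verbose.to_writer(&mut a)?;
    terse.to_writer(&mut b)?;
    println!("verbose: {} bytes, terse: {} bytes", a.len(), b.len());
    Ok(())
}
```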
The most considerable impact I see is in regards to velocity. The team can focus on other business-impactful projects, rather than EKS and Kubernetes maintenance -- the undifferentiated heavy lifting is eliminated. The same reason people move from physical data centers to the cloud, or from EC2 to Serverless: offloading that effort to AWS is a very good proposition.
Did you know that http://pypi.org serves 800 million requests and delivers 200 million packages totalling 400 terabytes ... a day? No. Exactly. You want it to just work. Every day, rain or shine. To keep it that way: sponsor them
We recently migrated a few small systems to CockroachDB (as a stepping stone). Overall, the experience was positive. The hassle-free HA is a huge peace of mind. I know people say this is easy to do in PG. I have recently set up 2ndQuadrant's pglogical for another system. That was also easy (though the documentation was pretty bad). The end result is quite different though and CockroachDB is just simpler to reason about and manage and, I think, more generally applicable.
Our actual use-case is a little complex to go into in tweets. But suffice to say, the PUT costs alone to S3 if we did 1-to-1 would end up being just under half our total running costs when factoring in DDB, Lambda, SQS, APIG, etc.
Need operational analytics in #NoSQL? Maintain time bound rollups in @DynamoDB with Streams/Lambda then query relevant items by date range and aggregate client side for fast reporting on scaled out data. Turn complex ad hoc queries into simple select statements and save $$$
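In other words: a Streams-triggered Lambda keeps per-period counter items up to date as writes happen, and readers query only those rollup items by date range and sum them client-side. A sketch of the client-side half, with the DynamoDB fetch elided and the item shape invented for the example:

```rust
/// One pre-aggregated counter item, e.g. partition key "order-count",
/// sort key an ISO date, maintained by a Streams-triggered Lambda.
struct Rollup {
    date: String,
    count: u64,
}

/// Aggregate client-side over the (small) set of rollup items returned
/// by a date-range query, instead of scanning the raw scaled-out data.
fn total_in_range(rollups: &[Rollup], from: &str, to: &str) -> u64 {
    rollups
        .iter()
        .filter(|r| r.date.as_str() >= from && r.date.as_str() <= to)
        .map(|r| r.count)
        .sum()
}

fn main() {
    let rollups = vec![
        Rollup { date: "2020-05-01".into(), count: 120 },
        Rollup { date: "2020-05-02".into(), count: 98 },
        Rollup { date: "2020-05-03".into(), count: 143 },
    ];
    // ISO dates compare correctly as plain strings, so the
    // date-range filter is just `>=` / `<=` on the sort key.
    println!("{}", total_in_range(&rollups, "2020-05-01", "2020-05-02")); // 218
}
```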
Another part of the solution is GPU acceleration using grCUDA — an open-source language binding that allows developers to share data between NVIDIA GPUs and GraalVM languages (R, Python, JavaScript), and also launch GPU kernels. The team implemented the performance critical components in CUDA for the GPU, and used grCUDA from Python to exchange data with the GPU and to invoke the GPU kernels.
Although event-driven architecture has existed for more than 15 years, only recently has it gained massive popularity, and there is a reason for that. Most companies are going through a “digital transformation” phase, and with that, crazy requirements occur. The complexity of these requirements forces engineers to adopt new ways of designing software, ones that incur less coupling between services and lower maintenance overhead. EDA is one solution to these problems but it is not the only one.
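The decoupling at the heart of EDA fits in a toy example: producers emit events without knowing who consumes them. Here a plain channel stands in for the broker (which would be Kafka, SNS/SQS, and the like in a real system):

```rust
use std::sync::mpsc;
use std::thread;

// A domain event. Producers only know this type, never the consumers.
#[derive(Debug)]
enum Event {
    OrderPlaced { order_id: u32 },
}

fn main() {
    let (tx, rx) = mpsc::channel::<Event>();

    // Consumer: reacts to events; can be added or changed
    // without touching producer code.
    let consumer = thread::spawn(move || {
        for event in rx {
            println!("handling {:?}", event);
        }
    });

    // Producer: emits the event and moves on; no direct call into a consumer.
    tx.send(Event::OrderPlaced { order_id: 42 }).unwrap();
    drop(tx); // close the channel so the consumer loop ends

    consumer.join().unwrap();
}
```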
So, let's look at the resulting context of moving to microservices with entity services:
- Performance analysis and debugging is more difficult. Tracing tools such as Zipkin are necessary.
- Additional overhead of marshalling and parsing requests and replies consumes some of our precious latency budget.
- Individual units of code are smaller.
- Each team can deploy on its own cadence.
- Semantic coupling requires cross-team negotiation.
- Features mainly accrue in "nexuses" such as API, aggregator, or UI servers.
- Entity services are invoked on nearly every request, so they will become heavily loaded.
- Overall availability is coupled to many different services, even though we expect individual services to be deployed frequently. (A deployment looks exactly like an outage to callers!)
By moving all the “what does the world around me look like?” side effects to the beginning of the program, and all the “change the world around me!” side effects to the end of the program, we achieve maximum testability of program logic. And minimum convolution. And separation of concerns: one module makes the decisions, another one carries them out. Consider this possibility the next time you find yourself in testing pain.
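In code, that shape is: gather inputs first, run a pure decision function you can unit-test without mocks, then perform effects last. A minimal sketch with an invented domain:

```rust
use std::env;

/// Pure core: all decisions, no side effects. Trivially unit-testable.
fn plan_greeting(name: &str, shouting: bool) -> String {
    let greeting = format!("hello, {name}");
    if shouting { greeting.to_uppercase() } else { greeting }
}

fn main() {
    // 1. "What does the world around me look like?" — read inputs up front.
    let name = env::args().nth(1).unwrap_or_else(|| "world".to_string());
    let shouting = env::var("SHOUT").is_ok();

    // 2. Decide, purely.
    let output = plan_greeting(&name, shouting);

    // 3. "Change the world around me!" — side effects at the very end.
    println!("{output}");
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn shouts_when_asked() {
        assert_eq!(plan_greeting("ada", true), "HELLO, ADA");
    }
}
```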