✂️ Cut your QA cycles down to minutes with QA Wolf (Sponsored)

If slow QA processes bottleneck you or your software engineering team and you’re releasing slower because of it, you need to check out QA Wolf. QA Wolf’s AI-native service supports web and mobile apps, delivering 80% automated test coverage in weeks and helping teams ship 5x faster by reducing QA cycles to minutes.

QA Wolf takes testing off your plate. They can get you:
The benefit? No more manual E2E testing. No more slow QA cycles. No more bugs reaching production. With QA Wolf, Drata’s team of 80+ engineers achieved 4x more test cases and 86% faster QA cycles.

This week’s system design refresher:
CPU vs GPU vs TPU

Why does the same code run fast on a GPU, slow on a CPU, and leave both behind on a TPU? The answer is architecture. CPUs, GPUs, and TPUs are designed for different workloads.

CPU (Central Processing Unit): The CPU handles general-purpose computing. It's built for low latency and complex control flow: branching logic, system calls, interrupts, and decision-heavy code. Operating systems, databases, and most applications run on the CPU because they need that flexibility.

GPU (Graphics Processing Unit): GPUs work differently. Instead of relying on a few powerful cores, they spread the work across thousands of simpler cores that execute the same instruction across huge datasets (SIMT/SIMD-style). If your workload is repetitive, like matrix math, pixel shading, or tensor operations, GPUs handle it quickly.

TPU (Tensor Processing Unit): TPUs are specialized hardware. The architecture is designed around matrix multiplication using systolic arrays, with compiler-controlled dataflow and on-chip buffers for weights and activations. They are fast at neural network training and inference, as long as the workload fits the hardware well.

Over to you: When designing systems today, how do you decide what runs on CPU vs GPU vs specialized accelerators?

How OAuth 2 Works

Authorization Code Flow (+ PKCE) - for user login:
PKCE prevents an intercepted authorization code from being redeemed by anyone who lacks the original code verifier. That’s why it’s the modern default for web and mobile apps.

Client Credentials Flow - for service-to-service:
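Here is a minimal Python sketch of the token request a service might build for this flow, assuming HTTP Basic client authentication. The function name is hypothetical, and the token URL, client ID, secret, and scope in the usage note are placeholders. The request is only constructed, never sent.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(
    token_url: str, client_id: str, client_secret: str, scope: str = ""
) -> urllib.request.Request:
    """Build (but do not send) a client-credentials token request."""
    form = {"grant_type": "client_credentials"}
    if scope:
        form["scope"] = scope
    body = urllib.parse.urlencode(form).encode("ascii")

    # HTTP Basic client authentication: base64("client_id:client_secret").
    # Simplification: assumes the ID and secret need no extra escaping.
    pair = f"{client_id}:{client_secret}".encode("utf-8")
    basic = base64.b64encode(pair).decode("ascii")

    return urllib.request.Request(
        token_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

For example, `build_token_request("https://auth.example.com/oauth/token", "billing-service", "s3cret", scope="invoices.read")` returns a POST request whose body carries only the grant type and scope. The service authenticates as itself, so no user login or redirect is involved.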
No user. Just machine identity.

How Distributed Tracing Works at a High Level
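One common way to tie the work of several services into a single trace is to pass a shared trace ID and the caller's span ID along with each request. The Python sketch below assumes the W3C `traceparent` header layout (`version-traceid-parentid-flags`). The class and function names are hypothetical, and the sampled flag is hard-coded to `01` for simplicity.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpanContext:
    trace_id: str                    # 32 hex chars, shared by every span in a trace
    span_id: str                     # 16 hex chars, unique per unit of work
    parent_id: Optional[str] = None  # span that caused this one, if any


def start_trace() -> SpanContext:
    """Create the root span for a new incoming request."""
    return SpanContext(secrets.token_hex(16), secrets.token_hex(8))


def child_of(parent: SpanContext) -> SpanContext:
    """Create a span that inherits the trace ID and points back at its parent."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.span_id)


def to_traceparent(ctx: SpanContext) -> str:
    """Serialize the context for an outgoing request header."""
    return f"00-{ctx.trace_id}-{ctx.span_id}-01"


def from_traceparent(header: str) -> SpanContext:
    """Parse the caller's context from an incoming request header."""
    parts = header.split("-")
    if len(parts) != 4:
        raise ValueError("traceparent must have four fields")
    version, trace_id, span_id, flags = parts
    if version != "00" or len(trace_id) != 32 or len(span_id) != 16 or len(flags) != 2:
        raise ValueError("unsupported or malformed traceparent")
    return SpanContext(trace_id, span_id)
```

A downstream service would call `child_of(from_traceparent(header))` for its own work, so every span it records carries the same trace ID and names the upstream span as its parent.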
Over to you: What else would you add to better understand distributed tracing?

How GPUs Work at a High Level

When people say GPUs are powerful, what they really mean is this: GPUs are built for massive parallelism from the ground up. Let’s break down what’s happening under the hood.

At the top level, a GPU chip is made up of many Streaming Multiprocessors (SMs). Think of SMs as mini parallel engines replicated across the chip. Instead of one big brain, you get dozens of smaller ones working simultaneously.

Inside each SM:
Each SM works independently, but they’re connected through an on-chip interconnect. Below that sits the L2 Cache, shared across all SMs. This is the coordination layer. If one SM misses in L1, it checks L2 before going to global memory. Then come the Memory Controllers, which interface with Global Memory. This is where things get interesting:
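The lookup order described above (L1 first, then the shared L2, then global memory) can be sketched as a toy Python model. This is an illustration only: it assumes one private L1 per SM, tracks single addresses instead of cache lines, and never evicts anything. The class name is hypothetical.

```python
class ToyGpuMemory:
    """Toy model of which level serves a load; not a real cache simulator."""

    def __init__(self, num_sms: int) -> None:
        self.l1 = [set() for _ in range(num_sms)]  # assumed: one private L1 per SM
        self.l2 = set()                            # shared across all SMs

    def load(self, sm: int, address: int) -> str:
        """Return the level that served the load, filling caches on the way."""
        if address in self.l1[sm]:
            return "L1 hit"
        self.l1[sm].add(address)
        if address in self.l2:
            return "L2 hit"
        self.l2.add(address)
        return "global memory"


mem = ToyGpuMemory(num_sms=2)
print(mem.load(0, 0x100))  # global memory (first touch anywhere)
print(mem.load(0, 0x100))  # L1 hit (SM 0 already cached it)
print(mem.load(1, 0x100))  # L2 hit (SM 1 misses L1, finds it in shared L2)
```

The third load shows the role of the shared L2: data fetched on behalf of one SM can serve another without a second trip to global memory.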
That’s why GPUs rely on massive parallelism. While some threads wait on memory, thousands of others keep executing.

Top 4 API Gateway Use Cases

An API Gateway sits between your clients and your services, and it does a lot more than just routing. Here are four use cases where it actually matters:
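As a baseline for the routing role mentioned above, here is a minimal Python sketch of longest-prefix routing. The route table, upstream URLs, and function name are hypothetical, and a real gateway would layer its other responsibilities on top of this lookup.

```python
from typing import Dict, Optional

# Hypothetical route table: path prefix -> upstream service base URL.
ROUTES: Dict[str, str] = {
    "/users": "http://users.internal:8080",
    "/orders": "http://orders.internal:8080",
    "/orders/refunds": "http://refunds.internal:8080",
}


def resolve_upstream(path: str, routes: Dict[str, str] = ROUTES) -> Optional[str]:
    """Return the upstream for the longest matching prefix, or None."""
    # Match whole path segments only, so "/ordersX" does not match "/orders".
    matches = [p for p in routes if path == p or path.startswith(p + "/")]
    if not matches:
        return None
    return routes[max(matches, key=len)]


print(resolve_upstream("/orders/42"))         # http://orders.internal:8080
print(resolve_upstream("/orders/refunds/7"))  # http://refunds.internal:8080
print(resolve_upstream("/health"))            # None
```

Picking the longest prefix lets a more specific route such as `/orders/refunds` override a broader one without depending on the order of the table.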
Over to you: Are you running an API gateway in production? What's the biggest win it gave you?
EP205: CPU vs GPU vs TPU
Saturday, 7 March 2026