<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[ZeroCopy]]></title><description><![CDATA[ZeroCopy, by Streambased, explores the systems, formats, and engines that power modern data infrastructure. From real-time pipelines to table formats, we cover the bleeding edge and the deep history. One clear, technical, and thoughtful post at a time.]]></description><link>https://blog.streambased.io</link><image><url>https://substackcdn.com/image/fetch/$s_!e_7r!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3f0570b-e2a6-40e5-b9df-1f0572d7f999_1280x1280.png</url><title>ZeroCopy</title><link>https://blog.streambased.io</link></image><generator>Substack</generator><lastBuildDate>Tue, 21 Apr 2026 10:43:24 GMT</lastBuildDate><atom:link href="https://blog.streambased.io/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Tom Scott]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[streambased@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[streambased@substack.com]]></itunes:email><itunes:name><![CDATA[Tom Scott]]></itunes:name></itunes:owner><itunes:author><![CDATA[Tom Scott]]></itunes:author><googleplay:owner><![CDATA[streambased@substack.com]]></googleplay:owner><googleplay:email><![CDATA[streambased@substack.com]]></googleplay:email><googleplay:author><![CDATA[Tom Scott]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[From Pipelines to Composable Data: Rethinking CDC]]></title><description><![CDATA[Stop writing every change: combining logs and tables for a simpler, real-time CDC 
model]]></description><link>https://blog.streambased.io/p/from-pipelines-to-composable-data</link><guid isPermaLink="false">https://blog.streambased.io/p/from-pipelines-to-composable-data</guid><dc:creator><![CDATA[Tom Scott]]></dc:creator><pubDate>Fri, 03 Apr 2026 19:19:04 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!uIvJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1669cb5-6b3a-4255-88b1-bdad63ffa842_900x800.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uIvJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1669cb5-6b3a-4255-88b1-bdad63ffa842_900x800.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uIvJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1669cb5-6b3a-4255-88b1-bdad63ffa842_900x800.png 424w, https://substackcdn.com/image/fetch/$s_!uIvJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1669cb5-6b3a-4255-88b1-bdad63ffa842_900x800.png 848w, https://substackcdn.com/image/fetch/$s_!uIvJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1669cb5-6b3a-4255-88b1-bdad63ffa842_900x800.png 1272w, https://substackcdn.com/image/fetch/$s_!uIvJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1669cb5-6b3a-4255-88b1-bdad63ffa842_900x800.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!uIvJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1669cb5-6b3a-4255-88b1-bdad63ffa842_900x800.png" width="900" height="800" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a1669cb5-6b3a-4255-88b1-bdad63ffa842_900x800.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:900,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:909495,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.streambased.io/i/193104259?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1669cb5-6b3a-4255-88b1-bdad63ffa842_900x800.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uIvJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1669cb5-6b3a-4255-88b1-bdad63ffa842_900x800.png 424w, https://substackcdn.com/image/fetch/$s_!uIvJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1669cb5-6b3a-4255-88b1-bdad63ffa842_900x800.png 848w, https://substackcdn.com/image/fetch/$s_!uIvJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1669cb5-6b3a-4255-88b1-bdad63ffa842_900x800.png 1272w, https://substackcdn.com/image/fetch/$s_!uIvJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1669cb5-6b3a-4255-88b1-bdad63ffa842_900x800.png 1456w" sizes="100vw" 
fetchpriority="high"></picture></div></a></figure></div><p>Change Data Capture (CDC) has become one of the foundational patterns of modern data systems. At a high level, it&#8217;s a simple idea: instead of periodically copying entire datasets, we track changes as they happen. Databases emit inserts, updates, and deletes as a continuous stream of events, and downstream systems replay those events to reconstruct the latest state. 
In practice, this pattern fits naturally with streaming systems like Kafka, which act as an ordered log of changes, enabling real-time processing and distribution.</p><p>As architectures have evolved toward lakehouses, however, CDC&#8217;s reach has expanded. Apache Iceberg, for example, plays a very different role from the database copies that were the initial remit of CDC, but the concept of a continuously updated data lake is too attractive to pass up. To emphasize the point, Iceberg is currently doing to disparate and fragile ETL interfaces what Kafka did to operational interfaces a decade ago. Having CDC data in Iceberg makes it available to an entire organisation, not just an isolated database endpoint.</p><p>Iceberg, however, is designed as an analytical storage layer, optimized for large-scale reads over immutable files in object storage. The traditional pipeline (capturing changes from a database, streaming them through Kafka, and materialising them into Iceberg) quickly runs into problems that expose the limits of Iceberg&#8217;s design for fast-moving data.</p><p>The root of the problem lies in a mismatch of granularity. CDC operates at the level of individual rows, capturing each change as a discrete event. Iceberg, by contrast, operates at the level of files. Once a file is written, it is effectively immutable. This means that even the smallest update cannot be applied in place. Instead, the system must locate the file containing the affected row, mark the original version as deleted, write a new version of the row into a new file, and then commit an updated snapshot of the table. 
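</p><p>The four steps above can be sketched with a toy, in-memory model. Everything here (<code>ToyTable</code>, <code>DataFile</code>, the file naming) is hypothetical and deliberately simplified; it is not the Iceberg API, only an illustration of why a one-row update fans out into several file-level operations.</p><pre>
```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataFile:
    """An immutable data file: once written, its rows never change."""
    name: str
    rows: tuple  # tuple of (key, value) pairs


@dataclass
class ToyTable:
    """A toy file-based table (illustrative only, not the Iceberg API)."""
    data_files: list = field(default_factory=list)
    deletes: set = field(default_factory=set)      # (file name, key) delete markers
    snapshots: list = field(default_factory=list)  # one entry per commit

    def commit(self):
        # Every commit records a new snapshot of the table's file list.
        self.snapshots.append(tuple(f.name for f in self.data_files))

    def append(self, rows):
        name = f"data-{len(self.data_files)}.parquet"
        self.data_files.append(DataFile(name, tuple(rows)))
        self.commit()

    def update(self, key, value):
        # 1. Locate the file that holds the live version of the row.
        for f in self.data_files:
            if any(k == key for k, _ in f.rows) and (f.name, key) not in self.deletes:
                # 2. Mark the old version as deleted (the file itself is untouched).
                self.deletes.add((f.name, key))
        # 3. Write the new version into a brand-new file, and 4. commit.
        self.append([(key, value)])

    def scan(self):
        # Readers must apply the delete markers while scanning every file.
        return {k: v for f in self.data_files for k, v in f.rows
                if (f.name, k) not in self.deletes}


table = ToyTable()
table.append([("a", 1), ("b", 2)])
table.update("a", 10)
print(table.scan())  # {'b': 2, 'a': 10}
```
</pre><p>In this toy model, a single logical update leaves behind two data files, one delete marker, and two snapshots, and every reader has to reconcile them.</p><p>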
What appears to be a trivial change at the logical level becomes a surprisingly heavy operation at the storage layer.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ktgj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1385664e-4821-45e0-88a7-cfa0b61e0bfd_1460x1140.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ktgj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1385664e-4821-45e0-88a7-cfa0b61e0bfd_1460x1140.png 424w, https://substackcdn.com/image/fetch/$s_!Ktgj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1385664e-4821-45e0-88a7-cfa0b61e0bfd_1460x1140.png 848w, https://substackcdn.com/image/fetch/$s_!Ktgj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1385664e-4821-45e0-88a7-cfa0b61e0bfd_1460x1140.png 1272w, https://substackcdn.com/image/fetch/$s_!Ktgj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1385664e-4821-45e0-88a7-cfa0b61e0bfd_1460x1140.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ktgj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1385664e-4821-45e0-88a7-cfa0b61e0bfd_1460x1140.png" width="1456" height="1137" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1385664e-4821-45e0-88a7-cfa0b61e0bfd_1460x1140.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1137,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ktgj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1385664e-4821-45e0-88a7-cfa0b61e0bfd_1460x1140.png 424w, https://substackcdn.com/image/fetch/$s_!Ktgj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1385664e-4821-45e0-88a7-cfa0b61e0bfd_1460x1140.png 848w, https://substackcdn.com/image/fetch/$s_!Ktgj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1385664e-4821-45e0-88a7-cfa0b61e0bfd_1460x1140.png 1272w, https://substackcdn.com/image/fetch/$s_!Ktgj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1385664e-4821-45e0-88a7-cfa0b61e0bfd_1460x1140.png 1456w" sizes="100vw"></picture></div></a></figure></div><p>Over time, this mismatch creates compounding issues. Small updates trigger disproportionately large amounts of work, leading to a proliferation of small files and metadata. Deletes and updates are not spared: they accumulate as separate delete files, which increase the cost of query planning. None of these problems is insurmountable. Iceberg has compaction and deduplication processes that can restore efficient data and metadata layouts, but these are heavyweight and their execution must be carefully managed. In short, the operational overhead is large enough to raise the question of whether the result is really worth it.</p><p>Streambased&#8217;s approach starts by embracing the distinction between log-based event streams and materialised tables instead of fighting it. Rather than insisting that all data must be materialised into Iceberg before it can be queried, it splits the dataset into two complementary parts. 
One part (the &#8220;coldset&#8221;) is fully materialised and lives in Iceberg, representing the stable, finalised, historical view of the data. The other part (the &#8220;hotset&#8221;) remains in Kafka as an ephemeral stream of recent changes. Queries operate across both, combining the durable cold data with the live hot data to produce a complete and up-to-date view. The two parts are composable, and their relative sizes can be balanced according to the workload they serve. A simple (if very naive) policy might use a large hotset (say, the last 6 hours) for workloads that expect many updates and deletes, to avoid accruing metadata in Iceberg, and a much smaller hotset (say, the last 15 minutes) for workloads that mostly insert new rows, which is a much less expensive operation in a well-partitioned Iceberg table.</p><p>This shift has a subtle but powerful effect. By allowing recent data to remain in Kafka, the system avoids the constant pressure to translate every small change into file-level operations. In this sense, materialisation becomes a controlled, deliberate step rather than a continuous background process. 
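</p><p>As a rough illustration of the hotset sizing described above, the sketch below splits a list of change events around a time cutoff. The workload names, the event shape, and the <code>split_hot_cold</code> helper are assumptions made for this example; the 6 hour and 15 minute windows are the naive values from the text, not recommendations, and this is not Streambased&#8217;s implementation.</p><pre>
```python
from datetime import datetime, timedelta, timezone

# Hotset windows taken from the examples in the text; the workload names are made up.
HOTSET_WINDOWS = {
    "update_heavy": timedelta(hours=6),
    "insert_mostly": timedelta(minutes=15),
}


def split_hot_cold(events, workload, now):
    """Split change events into (cold, hot) around the hotset boundary.

    events: list of dicts with a timezone-aware "ts" field (hypothetical shape).
    Cold events are old enough to materialise; hot events stay in the log.
    """
    cutoff = now - HOTSET_WINDOWS[workload]
    cold = [e for e in events if cutoff >= e["ts"]]
    hot = [e for e in events if e["ts"] > cutoff]
    return cold, hot


now = datetime(2026, 4, 3, 12, 0, tzinfo=timezone.utc)
events = [
    {"key": "a", "ts": now - timedelta(hours=8)},
    {"key": "b", "ts": now - timedelta(hours=2)},
    {"key": "c", "ts": now - timedelta(minutes=5)},
]

for workload in HOTSET_WINDOWS:
    cold, hot = split_hot_cold(events, workload, now)
    print(workload, [e["key"] for e in cold], [e["key"] for e in hot])
```
</pre><p>With the wider window only the oldest event is eligible for materialisation; with the narrower one, everything but the most recent event is.</p><p>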
Operators are free to make informed decisions about how and when materialisation happens and can accept the right tradeoffs for their use cases.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CMtN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32acc368-beb5-4056-a17d-15b87d91ab20_1580x785.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CMtN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32acc368-beb5-4056-a17d-15b87d91ab20_1580x785.png 424w, https://substackcdn.com/image/fetch/$s_!CMtN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32acc368-beb5-4056-a17d-15b87d91ab20_1580x785.png 848w, https://substackcdn.com/image/fetch/$s_!CMtN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32acc368-beb5-4056-a17d-15b87d91ab20_1580x785.png 1272w, https://substackcdn.com/image/fetch/$s_!CMtN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32acc368-beb5-4056-a17d-15b87d91ab20_1580x785.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CMtN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32acc368-beb5-4056-a17d-15b87d91ab20_1580x785.png" width="1456" height="723" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/32acc368-beb5-4056-a17d-15b87d91ab20_1580x785.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:723,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CMtN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32acc368-beb5-4056-a17d-15b87d91ab20_1580x785.png 424w, https://substackcdn.com/image/fetch/$s_!CMtN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32acc368-beb5-4056-a17d-15b87d91ab20_1580x785.png 848w, https://substackcdn.com/image/fetch/$s_!CMtN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32acc368-beb5-4056-a17d-15b87d91ab20_1580x785.png 1272w, https://substackcdn.com/image/fetch/$s_!CMtN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32acc368-beb5-4056-a17d-15b87d91ab20_1580x785.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This model also aligns closely with how data behaves in real systems. Most datasets exhibit a clear distinction between recent, high-churn data and older, relatively static data. The last few hours or days tend to be latency-sensitive and subject to frequent updates, while older data is rarely modified and is primarily accessed for analysis. Kafka is well suited to handling the former, providing low-latency access to recent events, while Iceberg excels at storing the latter efficiently and cheaply over long periods.</p><p>By splitting the dataset along this natural boundary, the system can take advantage of both technologies without forcing either to operate outside its strengths. Kafka retains its role as the system of record for recent changes, enabling real-time querying and processing, while Iceberg provides a scalable, cost-effective store for long-term data. Importantly, this also reduces the need for heavy operational machinery. 
Many of the maintenance steps that plague traditional CDC pipelines (compaction, snapshot cleanup, repartitioning) become either less frequent or unnecessary, because the system avoids generating excessive intermediate state in the first place.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_QMQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60657a2e-97d2-4ce2-a05a-054a424c7da2_1885x1307.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_QMQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60657a2e-97d2-4ce2-a05a-054a424c7da2_1885x1307.png 424w, https://substackcdn.com/image/fetch/$s_!_QMQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60657a2e-97d2-4ce2-a05a-054a424c7da2_1885x1307.png 848w, https://substackcdn.com/image/fetch/$s_!_QMQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60657a2e-97d2-4ce2-a05a-054a424c7da2_1885x1307.png 1272w, https://substackcdn.com/image/fetch/$s_!_QMQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60657a2e-97d2-4ce2-a05a-054a424c7da2_1885x1307.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_QMQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60657a2e-97d2-4ce2-a05a-054a424c7da2_1885x1307.png" width="1456" height="1010" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/60657a2e-97d2-4ce2-a05a-054a424c7da2_1885x1307.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1010,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_QMQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60657a2e-97d2-4ce2-a05a-054a424c7da2_1885x1307.png 424w, https://substackcdn.com/image/fetch/$s_!_QMQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60657a2e-97d2-4ce2-a05a-054a424c7da2_1885x1307.png 848w, https://substackcdn.com/image/fetch/$s_!_QMQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60657a2e-97d2-4ce2-a05a-054a424c7da2_1885x1307.png 1272w, https://substackcdn.com/image/fetch/$s_!_QMQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60657a2e-97d2-4ce2-a05a-054a424c7da2_1885x1307.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Another consequence of this approach is that it restores the immediacy that CDC promises. In a fully materialised pipeline, there is always a delay between when an event occurs and when it becomes visible in Iceberg. Queries are effectively bounded by the last successful write. In contrast, when queries incorporate the live Kafka stream, newly arrived events are immediately reflected in results. The system is no longer querying a slightly outdated snapshot but is instead computing state directly from the most recent data available. The cost of this &#8220;zero latency&#8221; is a small amount of extra work at query time, where the system reconciles the Iceberg snapshot with the latest events in Kafka. Instead of reading a fully precomputed table, the query &#8220;plays forward&#8221; the recent portion of the log to produce the current state.</p><p>In practice, this is a favourable trade. 
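</p><p>To make the query-time cost concrete, the play-forward step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Streambased&#8217;s implementation: the snapshot is a plain dict standing in for the Iceberg table, the events are tuples standing in for Kafka records, and <code>play_forward</code>, the event shape, and the offsets are all hypothetical.</p><pre>
```python
# Cold state as last materialised, keyed by primary key (stand-in for Iceberg).
COLD_SNAPSHOT = {1: {"status": "shipped"}, 2: {"status": "pending"}}
COLD_OFFSET = 41  # last log offset already materialised (made-up value)

# Hot change events in log order: (offset, op, key, row) (stand-in for Kafka).
HOT_EVENTS = [
    (41, "upsert", 2, {"status": "pending"}),  # already part of the snapshot
    (42, "upsert", 2, {"status": "shipped"}),
    (43, "delete", 1, None),
    (44, "upsert", 3, {"status": "pending"}),
]


def play_forward(snapshot, snapshot_offset, events):
    """Apply events newer than the snapshot on top of a copy of the snapshot."""
    state = dict(snapshot)  # the cold data itself is never mutated
    for offset, op, key, row in events:
        if offset > snapshot_offset:  # skip what the snapshot already reflects
            if op == "delete":
                state.pop(key, None)
            else:
                state[key] = row
    return state


current = play_forward(COLD_SNAPSHOT, COLD_OFFSET, HOT_EVENTS)
print(current)  # {2: {'status': 'shipped'}, 3: {'status': 'pending'}}
```
</pre><p>The loop touches only the events in the hot window, so the work grows with the size of the hotset rather than with the size of the table.</p><p>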
The additional compute is limited to the hot data window, while the system avoids the much larger ongoing costs of constant writes, compaction, and maintenance in Iceberg. Rather than paying continuously for freshness, you pay only when you query, and only over a small slice of data.</p><p>Ultimately, this leads to a different way of thinking about CDC in the lakehouse. Instead of viewing it as a process of continuously writing changes into tables, it becomes a matter of composing views over a combination of log and storage. Kafka holds the evolving, mutable edge of the dataset, while Iceberg holds the stable, immutable core. The full state emerges from the interaction between the two, rather than being fully materialised in either system at all times.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PmZk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F894b5a25-0d5f-4769-8c56-729900d3b190_2048x1978.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PmZk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F894b5a25-0d5f-4769-8c56-729900d3b190_2048x1978.png 424w, https://substackcdn.com/image/fetch/$s_!PmZk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F894b5a25-0d5f-4769-8c56-729900d3b190_2048x1978.png 848w, https://substackcdn.com/image/fetch/$s_!PmZk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F894b5a25-0d5f-4769-8c56-729900d3b190_2048x1978.png 1272w, 
https://substackcdn.com/image/fetch/$s_!PmZk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F894b5a25-0d5f-4769-8c56-729900d3b190_2048x1978.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PmZk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F894b5a25-0d5f-4769-8c56-729900d3b190_2048x1978.png" width="1456" height="1406" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/894b5a25-0d5f-4769-8c56-729900d3b190_2048x1978.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1406,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PmZk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F894b5a25-0d5f-4769-8c56-729900d3b190_2048x1978.png 424w, https://substackcdn.com/image/fetch/$s_!PmZk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F894b5a25-0d5f-4769-8c56-729900d3b190_2048x1978.png 848w, https://substackcdn.com/image/fetch/$s_!PmZk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F894b5a25-0d5f-4769-8c56-729900d3b190_2048x1978.png 1272w, 
https://substackcdn.com/image/fetch/$s_!PmZk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F894b5a25-0d5f-4769-8c56-729900d3b190_2048x1978.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The real issue isn&#8217;t that CDC is hard in Iceberg; it&#8217;s that we&#8217;re solving the wrong problem.</p><p>Fully materialising every change is an assumption carried over from older architectures. In a log-first world, it&#8217;s unnecessary. 
The log already contains the truth; Iceberg only needs to store the parts of that truth that have stabilised.</p><p>Once you accept that, the solution becomes obvious: stop writing everything, all the time. Let Kafka handle change, let Iceberg handle history, and draw from both as required and only when you need to.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rECx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8be2c7b-a0cb-4f99-ba02-8bc8ee92e7ea_2048x1016.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rECx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8be2c7b-a0cb-4f99-ba02-8bc8ee92e7ea_2048x1016.png 424w, https://substackcdn.com/image/fetch/$s_!rECx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8be2c7b-a0cb-4f99-ba02-8bc8ee92e7ea_2048x1016.png 848w, https://substackcdn.com/image/fetch/$s_!rECx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8be2c7b-a0cb-4f99-ba02-8bc8ee92e7ea_2048x1016.png 1272w, https://substackcdn.com/image/fetch/$s_!rECx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8be2c7b-a0cb-4f99-ba02-8bc8ee92e7ea_2048x1016.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rECx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8be2c7b-a0cb-4f99-ba02-8bc8ee92e7ea_2048x1016.png" width="1456" height="722" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e8be2c7b-a0cb-4f99-ba02-8bc8ee92e7ea_2048x1016.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:722,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rECx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8be2c7b-a0cb-4f99-ba02-8bc8ee92e7ea_2048x1016.png 424w, https://substackcdn.com/image/fetch/$s_!rECx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8be2c7b-a0cb-4f99-ba02-8bc8ee92e7ea_2048x1016.png 848w, https://substackcdn.com/image/fetch/$s_!rECx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8be2c7b-a0cb-4f99-ba02-8bc8ee92e7ea_2048x1016.png 1272w, https://substackcdn.com/image/fetch/$s_!rECx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8be2c7b-a0cb-4f99-ba02-8bc8ee92e7ea_2048x1016.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div>]]></content:encoded></item><item><title><![CDATA[The 5 Ways to Move Data Iceberg -> Kafka]]></title><description><![CDATA[An in-depth comparison of today's solutions to the expanding Iceberg to Kafka market.]]></description><link>https://blog.streambased.io/p/the-5-ways-to-move-data-iceberg-kafka</link><guid isPermaLink="false">https://blog.streambased.io/p/the-5-ways-to-move-data-iceberg-kafka</guid><dc:creator><![CDATA[Tom Scott]]></dc:creator><pubDate>Tue, 24 Feb 2026 13:00:45 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!iYkC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6be2325-03cb-4019-a878-9c3a08cf3e9e_1600x900.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!iYkC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6be2325-03cb-4019-a878-9c3a08cf3e9e_1600x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iYkC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6be2325-03cb-4019-a878-9c3a08cf3e9e_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!iYkC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6be2325-03cb-4019-a878-9c3a08cf3e9e_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!iYkC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6be2325-03cb-4019-a878-9c3a08cf3e9e_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!iYkC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6be2325-03cb-4019-a878-9c3a08cf3e9e_1600x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iYkC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6be2325-03cb-4019-a878-9c3a08cf3e9e_1600x900.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c6be2325-03cb-4019-a878-9c3a08cf3e9e_1600x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1776725,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.streambased.io/i/189000710?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6be2325-03cb-4019-a878-9c3a08cf3e9e_1600x900.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iYkC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6be2325-03cb-4019-a878-9c3a08cf3e9e_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!iYkC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6be2325-03cb-4019-a878-9c3a08cf3e9e_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!iYkC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6be2325-03cb-4019-a878-9c3a08cf3e9e_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!iYkC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6be2325-03cb-4019-a878-9c3a08cf3e9e_1600x900.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>3 months ago I posted: <em>The 9 Ways to Move Data from Kafka to Iceberg</em>, an in depth look at a growing ecosystem of tools and architectural patterns for landing streaming data into an Iceberg lakehouse. I mapped out everything from &#8220;classic&#8221; connectors to newer table-native and managed approaches. If you haven&#8217;t already, check it out here:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;b775463a-7418-4284-95be-ed9b151912cb&quot;,&quot;caption&quot;:&quot;If you were to intentionally design two protocols/formats with poor interopability, you would be hard pushed to create something worse than Kafka protocol and Apache Iceberg. 
Both standards were developed and optimised for entirely different purposes and share almost&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;The 9 Ways to Move Data Kafka -> Iceberg&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:196116110,&quot;name&quot;:&quot;Tom Scott&quot;,&quot;bio&quot;:&quot;Lover of all things distributed data&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e5c1fa01-0f98-476b-a55f-7b145928420d_1040x1040.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-10-20T13:11:04.366Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/71507fdd-a153-48a7-936e-3ec8d0a6a91b_1600x900.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://blog.streambased.io/p/the-9-ways-to-move-data-kafka-iceberg&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:176216257,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:11,&quot;comment_count&quot;:3,&quot;publication_id&quot;:5572872,&quot;publication_name&quot;:&quot;ZeroCopy&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!e_7r!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3f0570b-e2a6-40e5-b9df-1f0572d7f999_1280x1280.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>But data systems rarely flow in only one direction. Once Iceberg becomes the system of record, the next question is how to get that data <em>back out, </em>into Kafka, so that downstream services can react to failures, rehydrate and counter data corruption. 
All of these scenarios require reintroducing lakehouse data into the event-driven world (and beyond). In this follow-up, I&#8217;ll walk through three practical approaches to moving data from Iceberg to Kafka: building the flow with a connector, leaning on Kafka-compatible services, or using complementary services that sit on top of an existing Kafka/Iceberg estate.</p><p>I&#8217;ll focus on 3 main reasons for inverting the typical Kafka -&gt; Lakehouse flow:</p><ol><li><p><strong>To bootstrap new workloads</strong> - When a new Kafka workload comes online it often needs <em>history</em> before it can do anything useful. You can&#8217;t compute aggregates, build search indexes or warm caches from data that only starts right now. What&#8217;s more, Iceberg retention is generally much longer than Kafka retention, giving a far richer context to draw from.</p></li><li><p><strong>To recover from outages/poison pills</strong> - Production Kafka streams fail in messy ways: bad messages, schema breaks, consumer bugs, downstream outages. When this happens, teams usually need to &#8220;rewind&#8221; and replay from the point of failure to recover. The tricky part is that poison messages can go undetected for days, weeks or months, so the amount of data you need to replay can be huge. Iceberg is ideal here: it retains large volumes cheaply, supports row-level updates and deletes to fix bad data, and makes it practical to re-emit a correct stream without needing Kafka to hold months of retention.</p></li><li><p><strong>To save costs in exceptional circumstances</strong> - Kafka retention is expensive at scale. If a consumer falls behind (deploy issues, downstream throttling, sudden traffic spikes), the default solution is to keep enough Kafka retention to cover the worst case. But that means paying for expensive &#8220;just in case&#8221; storage 24/7 and, worse, because the data has already been transferred to the lake &#8220;ahead of time&#8221;, duplicating your real-time dataset there. 
A cheaper pattern is to let Kafka stay optimized for real-time retention, and use Iceberg as the long-term buffer.</p></li></ol><div><hr></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3-UF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddc6ede5-b8e3-417e-9c85-1c55ab182749_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3-UF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddc6ede5-b8e3-417e-9c85-1c55ab182749_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!3-UF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddc6ede5-b8e3-417e-9c85-1c55ab182749_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!3-UF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddc6ede5-b8e3-417e-9c85-1c55ab182749_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!3-UF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddc6ede5-b8e3-417e-9c85-1c55ab182749_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3-UF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddc6ede5-b8e3-417e-9c85-1c55ab182749_1536x1024.png" width="1456" height="971" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ddc6ede5-b8e3-417e-9c85-1c55ab182749_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3-UF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddc6ede5-b8e3-417e-9c85-1c55ab182749_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!3-UF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddc6ede5-b8e3-417e-9c85-1c55ab182749_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!3-UF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddc6ede5-b8e3-417e-9c85-1c55ab182749_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!3-UF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddc6ede5-b8e3-417e-9c85-1c55ab182749_1536x1024.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2><strong>Section 1:  Using a Connector</strong></h2><p>At the time of writing, there isn&#8217;t a true &#8220;Iceberg source connector&#8221; you can just drop into Kafka Connect and call it a day.</p><p>But you <em>can</em> still build a connector-based Iceberg &#8594; Kafka pipeline today by leaning on your favourite Iceberg query engine. 
For example: run Trino on top of Iceberg, then use the Kafka Connect JDBC Source Connector to query Trino and publish results into Kafka.</p><p>At that point, the pattern looks like any other database connector:</p><ol><li><p>Poll on an interval</p></li><li><p>Run a query</p></li><li><p>Publish the resulting rows into Kafka topics</p></li></ol><p>The main upside of this approach is that it&#8217;s straightforward and uses well understood, battle-hardened components: Kafka Connect plus a JDBC source connector is a pattern most teams already know how to run, scale, and observe, and it lets you reuse whichever Iceberg query engine you already have (Trino, Spark, Dremio, etc.).</p><p>The tradeoff is that there&#8217;s no end-to-end linkage between the Kafka events that originally landed in Iceberg and the records you&#8217;re now emitting back into Kafka. That lack of continuity makes it a poor fit for &#8220;Iceberg as Kafka retention&#8221; use cases like backfill and cost-saving replay, where you need stronger guarantees about exactly what range of data is being re-emitted.</p><div><hr></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YESM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe6fbf72-bd1f-4d72-9183-5cc546252cca_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YESM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe6fbf72-bd1f-4d72-9183-5cc546252cca_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!YESM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe6fbf72-bd1f-4d72-9183-5cc546252cca_1536x1024.png 848w, 
https://substackcdn.com/image/fetch/$s_!YESM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe6fbf72-bd1f-4d72-9183-5cc546252cca_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!YESM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe6fbf72-bd1f-4d72-9183-5cc546252cca_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YESM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe6fbf72-bd1f-4d72-9183-5cc546252cca_1536x1024.png" width="1456" height="971" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fe6fbf72-bd1f-4d72-9183-5cc546252cca_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YESM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe6fbf72-bd1f-4d72-9183-5cc546252cca_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!YESM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe6fbf72-bd1f-4d72-9183-5cc546252cca_1536x1024.png 848w, 
https://substackcdn.com/image/fetch/$s_!YESM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe6fbf72-bd1f-4d72-9183-5cc546252cca_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!YESM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe6fbf72-bd1f-4d72-9183-5cc546252cca_1536x1024.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2><strong>Section 2: Kafka Compatible</strong></h2><p>Kafka compatible services preserve the Kafka client experience (protocol, 
offsets, consumer groups) but use different engines to provide the underlying storage and metadata semantics. In the world of Iceberg this usually means the storage layer differs significantly from the custom log format and disk-based approach used by Apache Kafka.</p><h3><strong>StreamNative Ursa</strong></h3><p>Ursa enables Kafka clients to work directly with Apache Iceberg data by replacing the traditional Kafka broker model with a compatible streaming engine. All streamed events become part of a unified lakehouse dataset that is eventually stored in open table formats like Iceberg: Ursa&#8217;s engine first ingests data into a streaming-focused layer before persisting it to a durable Iceberg layer.</p><p>Ursa adds a Kafka-compatible read path that maps Kafka&#8217;s offset-based consumption model onto Iceberg&#8217;s file-based table layout. When a Kafka consumer issues fetch requests, Ursa resolves the requested offset range into the relevant Iceberg snapshot and then plans reads over the underlying Iceberg files, returning records through the standard Kafka protocol as if they came from a normal Kafka log.</p><p>This approach has a much cleaner operational model than the connector approach (StreamNative takes care of stitching real-time and analytical datasets together) and lends itself to a more cloud-native deployment (StreamNative offers Ursa in their cloud product suite).</p><p>In short, Ursa combines streaming log storage and Iceberg, translating Kafka offset-based fetches into planned reads over Iceberg table snapshots/files before returning results through the Kafka protocol as if Iceberg were a regular collection of topics.</p><p>Unfortunately, Ursa brings with it all the usual issues with compatible systems: at the end of the day your underlying data layer is not really Kafka, and so it is subject to feature drift and unexpected differences in behaviour (for instance, transactions and topic compaction are not fully supported yet). 
Additionally, there are vendor lock-in concerns: the Iceberg -&gt; Kafka path may not be relevant for all workloads and yet, with Ursa, the entire streaming stack is locked in.</p><h3><strong>Bufstream</strong></h3><p>Bufstream&#8217;s Kafka-compatible engine writes Kafka topic data directly into Parquet files and Iceberg metadata, skipping the intermediate streaming layer used in StreamNative&#8217;s approach. It then uses those same Parquet files to serve both Kafka consumers and Iceberg clients.</p><p>The advantage of this approach is its simplicity. With only a single storage layer it is easy to keep the two views consistent and to use cloud-native storage options for cost reduction. Unfortunately, this comes at a cost: object storage sits on the write path and adds producer latency. Bufstream typically experiences 3-5x higher end-to-end latency (P99) than traditional Kafka.</p><p>Bufstream takes a simple &#8220;single storage layer&#8221; approach by writing Kafka topic data directly into Iceberg and serving both Kafka consumers and Iceberg readers from the same files. Because there is only one storage layer, the Iceberg to Kafka flow is a core function of the service. 
In Bufstream both Kafka readers and writers interact with Iceberg data translating formats on the fly as required.</p><div><hr></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wL2K!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a24e994-54a8-4180-838c-58b2a2b52ef9_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wL2K!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a24e994-54a8-4180-838c-58b2a2b52ef9_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!wL2K!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a24e994-54a8-4180-838c-58b2a2b52ef9_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!wL2K!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a24e994-54a8-4180-838c-58b2a2b52ef9_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!wL2K!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a24e994-54a8-4180-838c-58b2a2b52ef9_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wL2K!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a24e994-54a8-4180-838c-58b2a2b52ef9_1536x1024.png" width="1456" height="971" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6a24e994-54a8-4180-838c-58b2a2b52ef9_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wL2K!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a24e994-54a8-4180-838c-58b2a2b52ef9_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!wL2K!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a24e994-54a8-4180-838c-58b2a2b52ef9_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!wL2K!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a24e994-54a8-4180-838c-58b2a2b52ef9_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!wL2K!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a24e994-54a8-4180-838c-58b2a2b52ef9_1536x1024.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><h2><strong>Section 3: Complementary</strong></h2><p>This category covers systems that augment an existing Kafka + Iceberg estate rather than replacing either layer. These solutions typically sit alongside Kafka brokers and Iceberg tables as add-ons. The main benefit is that they don&#8217;t require any migration and do not lock in a user&#8217;s whole streaming stack.</p><h3><strong>Aiven Iceberg Topics</strong></h3><p>Aiven&#8217;s Iceberg Topics concept sits on top of an existing Apache Kafka deployment and leverages a custom Remote Storage Manager (RSM) plugin for Kafka tiered storage to read/write data to Iceberg. As Kafka log segments roll, their data is transferred to Iceberg and tracked inside the Kafka cluster. 
On fetch, the broker can reconstruct valid Kafka batches from those same files, effectively allowing Kafka consumers to replay data that lives in Iceberg storage without a separate layer or copying process.</p><p>The semantics of all Kafka client operations are preserved because the core idea is &#8220;just Kafka&#8221;, but you gain an analytical view on the data that can be consumed by Iceberg engines. This approach elegantly leverages existing Kafka tiered storage functionality but, because of this, requires back-end control of the cluster and for the cluster to be running the Apache Kafka flavour. For this reason Iceberg Topics cannot be used with managed Kafka deployments or &#8220;Kafka compatible&#8221; systems. On top of this, the GitHub project that backs this feature appears stale and very much for testing only.</p><h3><strong>Streambased</strong></h3><p>Streambased functions as an abstraction layer above pre-existing Kafka and Iceberg deployments. Streambased composes a dataset made up of real-time focused data from Kafka and analytically prepared data from Iceberg and serves this to either Kafka clients or Iceberg clients in the format they are expecting.</p><p>Streambased exposes 3 datasets in this way:</p><ol><li><p>The hotset - data that resides in Kafka alone</p></li><li><p>The coldset - data that resides in Iceberg alone</p></li><li><p>The mergedset - a seamless joining of the hotset and coldset</p></li></ol><p>By exposing these 3 datasets as Iceberg, users can easily move data between hotset and coldset, structuring the mergedset in a way that is optimal for the workloads it must enable.</p><p>Under the hood, Streambased serves Kafka data in a similar manner to StreamNative&#8217;s Ursa above. Streambased intercepts fetch requests, computes the location of the relevant data (in Kafka, Iceberg or a combination of the two) and serves it back via the Kafka protocol. Where Streambased differs is that it is agnostic to the underlying data systems. 
Ursa requires that data be written to it before it is accessible whereas Streambased plugs into <strong>any system</strong> that implements the Kafka protocol (Apache Kafka, managed providers and compatible systems).</p><p>This makes Streambased a complementary layer that upgrades an existing Kafka + Iceberg estate with unified access and consistent semantics, rather than replacing core infrastructure.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Lx5E!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cf3aac5-0b24-4316-b454-8ac02cb5af7f_2048x994.png" data-component-name="Image2ToDOM"><div class="image2-inset image2-full-screen"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Lx5E!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cf3aac5-0b24-4316-b454-8ac02cb5af7f_2048x994.png 424w, https://substackcdn.com/image/fetch/$s_!Lx5E!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cf3aac5-0b24-4316-b454-8ac02cb5af7f_2048x994.png 848w, https://substackcdn.com/image/fetch/$s_!Lx5E!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cf3aac5-0b24-4316-b454-8ac02cb5af7f_2048x994.png 1272w, https://substackcdn.com/image/fetch/$s_!Lx5E!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cf3aac5-0b24-4316-b454-8ac02cb5af7f_2048x994.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Lx5E!,w_5760,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cf3aac5-0b24-4316-b454-8ac02cb5af7f_2048x994.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8cf3aac5-0b24-4316-b454-8ac02cb5af7f_2048x994.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;full&quot;,&quot;height&quot;:707,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-fullscreen" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!Lx5E!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cf3aac5-0b24-4316-b454-8ac02cb5af7f_2048x994.png 424w, https://substackcdn.com/image/fetch/$s_!Lx5E!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cf3aac5-0b24-4316-b454-8ac02cb5af7f_2048x994.png 848w, https://substackcdn.com/image/fetch/$s_!Lx5E!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cf3aac5-0b24-4316-b454-8ac02cb5af7f_2048x994.png 1272w, https://substackcdn.com/image/fetch/$s_!Lx5E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cf3aac5-0b24-4316-b454-8ac02cb5af7f_2048x994.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><h2><strong>Conclusion</strong></h2><p>Iceberg is rapidly becoming the durable system of record for streaming data. However, the moment you treat it that way, you also need a clean path back into Kafka. With this established, applications can replay history, recover safely, and keep real-time systems lean.</p><p>The Iceberg &#8594; Kafka pattern you will choose depends on the existing environment and workload constraints. Connectors offer a pragmatic starting point, Kafka-compatible engines rethink the storage model entirely, and complementary approaches unlock replay and unified access without replacing your existing stack. 
The right choice depends on what you&#8217;re optimizing for but the direction is clear: the lakehouse isn&#8217;t just the destination anymore, it&#8217;s becoming part of the streaming runtime.</p><p><strong>Subscribe to follow how Iceberg and Kafka are converging and what that means for your architecture.</strong></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.streambased.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.streambased.io/subscribe?"><span>Subscribe now</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[Turning the database inside out again]]></title><description><![CDATA[How Kafka and Iceberg are rebuilding database primitives. indexing, caching, and deduplication outside the database engine.]]></description><link>https://blog.streambased.io/p/turning-the-database-inside-out-again</link><guid isPermaLink="false">https://blog.streambased.io/p/turning-the-database-inside-out-again</guid><dc:creator><![CDATA[Tom Scott]]></dc:creator><pubDate>Thu, 29 Jan 2026 10:24:28 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!9XCl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95519172-b169-4a22-bbd4-28163403a797_692x504.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9XCl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95519172-b169-4a22-bbd4-28163403a797_692x504.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!9XCl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95519172-b169-4a22-bbd4-28163403a797_692x504.png 424w, https://substackcdn.com/image/fetch/$s_!9XCl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95519172-b169-4a22-bbd4-28163403a797_692x504.png 848w, https://substackcdn.com/image/fetch/$s_!9XCl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95519172-b169-4a22-bbd4-28163403a797_692x504.png 1272w, https://substackcdn.com/image/fetch/$s_!9XCl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95519172-b169-4a22-bbd4-28163403a797_692x504.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9XCl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95519172-b169-4a22-bbd4-28163403a797_692x504.png" width="692" height="504" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/95519172-b169-4a22-bbd4-28163403a797_692x504.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:504,&quot;width&quot;:692,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:633297,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.streambased.io/i/186116808?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95519172-b169-4a22-bbd4-28163403a797_692x504.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!9XCl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95519172-b169-4a22-bbd4-28163403a797_692x504.png 424w, https://substackcdn.com/image/fetch/$s_!9XCl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95519172-b169-4a22-bbd4-28163403a797_692x504.png 848w, https://substackcdn.com/image/fetch/$s_!9XCl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95519172-b169-4a22-bbd4-28163403a797_692x504.png 1272w, https://substackcdn.com/image/fetch/$s_!9XCl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95519172-b169-4a22-bbd4-28163403a797_692x504.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><p></p><p>Over a decade ago Martin Kleppmann predicted we&#8217;d turn the database inside out and that Kafka would be in the middle of it.</p><p><a href="https://martin.kleppmann.com/2015/03/04/turning-the-database-inside-out.html">https://martin.kleppmann.com/2015/03/04/turning-the-database-inside-out.html</a></p><p>Modern enterprises that embraced event streaming have essentially turned their monolithic databases inside out into distributed components that resemble a database when looked at holistically. Martin was right but Kafka is not the only star. Open table formats are the turning point where we finally turned the database inside out - they allow organizations to build their own Snowflake.</p><p>Martin talked about 4 things:</p><ul><li><p>Log Replication</p></li><li><p>Materialized Views</p></li><li><p>Secondary Indexes</p></li><li><p>Caching</p></li></ul><p>Of these four, two (Replication and Materialized Views) have successfully transitioned from internal database implementation details to external core components of stream processing:</p><ul><li><p>Kafka and the rise of event streaming have featured ingestion of events into an externally facing, durably replicated log</p></li><li><p>Materialisation of these events in a downstream view (e.g. state in Streams/Flink or DB tables via Kafka Connect).</p></li></ul><p>But what about the other two? It&#8217;s been rare to hear about secondary indexes or caching in our day-to-day life as event streaming developers. These remain opaque, duplicated and fragmented inside the downstream systems fed by event streams.</p><p>For instance, it was not until you materialized a stream into e.g. 
Postgres that you could apply indexes and perform fast point lookups on it.</p><p>Streaming pipelines generate immense volumes of immutable data, but efficient ad-hoc querying or joining across that data remains impractical without secondary indexes.</p><p>Similarly, from the beginning, caching was crucial for performance when data was repeatedly accessed and transformed. Kafka had basic caching inside it but nothing for external systems which required format conversion. For example, in retail use cases transaction data was landed in Kafka. This data was then transformed and loaded into Logistics (for stock levels) and Marketing (for recommendations) domains via Connect. A cache avoided the two flows processing the same data in the same way twice (why burn that extra CPU?).</p><p>What&#8217;s more, turning the database inside out didn&#8217;t stop with Kafka. With the advent of open table formats such as Apache Iceberg, more and more of the internals of data storage and processing were exposed. Iceberg exposed previously internal concepts like partitioning, versioning, logical deletion and statistics to allow processing engines to work with data at previously impossible levels of flexibility and performance.</p><p>In this article I&#8217;m going to explore these database concepts (and more) through the lens of Kafka and Iceberg to challenge how we think about the boundary between streaming systems and data lakes and how Iceberg might just be the bridge that finally unites them.</p><h1>Part 1: Exposing the WAL again</h1><p>Inside a modern database, new data is written to a Write-Ahead Log (WAL) before being persisted to long-term storage (like a diary the database keeps, noting down everything that happens so it can retrace its steps if needed). 
A decade ago, Martin Kleppmann championed the idea of making the WAL externally visible, streaming it through systems like Kafka, while pushing long-term storage into lakes via ETL such as Kafka Connect.</p><p>This separation unlocked real-time architectures, but it also created a new problem: data became split into a hotset (WAL) and a coldset (data lake), and recombining them became the application&#8217;s burden.</p><p>A database owns both layers and so can seamlessly merge &#8220;just-written bytes&#8221; from the WAL with &#8220;years-old pages&#8221; on disk and present them as one coherent table.</p><p>Caching, buffer pools etc. make this fast and transparent. But once we turn the database inside out, we lose this built-in functionality.</p><p>In the inverted architecture, Kafka becomes the WAL and as such is a short-term buffer in front of more permanent storage. Kafka was never designed for long retention and keeping large retention windows is expensive and impractical.</p><p>For the long-term storage destination, the answer today is clear: Apache Iceberg.</p><p>It provides:</p><ul><li><p>Cheap durable object storage</p></li><li><p>More complex access patterns (SQL)</p></li><li><p>Schema and partition evolution</p></li></ul><p>Unfortunately, streaming ingestion into Iceberg means frequent small commits, each adding new files and snapshots, which can cause a metadata explosion that is hard to manage.</p><p>Fundamentally, the WAL (Kafka) and the lake (Iceberg) speak different languages: records vs. files, offsets vs. snapshots. Each is highly attuned to its specific role.</p><p>The missing piece: a layer that projects Kafka into Iceberg. We call this Streambased I.S.K. (Iceberg Service for Kafka). I.S.K. 
is a database-like fusion of WAL + long-term storage: a cache above Kafka that:</p><ul><li><p>Mirrors the structure of Iceberg tables, not the structure of Kafka topics</p></li><li><p>Presents the hottest window of data remaining in Kafka + cold data already in Iceberg</p></li><li><p>Dynamically translates Kafka records into Iceberg-shaped files/metadata</p></li></ul><p>This allows Iceberg clients to see one logical table: clients query a single table but, under the hood, the cache is doing the work of DB internals.</p><p><strong>&#8220;But Kafka already caches data!&#8221;</strong></p><p>Yes, Kafka relies heavily on the Linux page cache. This is what makes it fast for consumers: recent reads and writes are served directly from memory.</p><p>But this cache:</p><ul><li><p>Is incidental and OS-managed, not explicitly controlled</p></li><li><p>Is record-oriented, not table-oriented</p></li><li><p>Does not mirror the layout or semantics of Iceberg tables</p></li></ul><p>Great for stream consumption, but not the caching layer you need to unify stream + lake data.</p><p>A true &#8220;inverted database&#8221; needs more than a visible WAL and scalable lake storage; it needs a caching layer that reassembles them into a single, coherent table the way a traditional database always has.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!n1Gd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69ebca43-a48a-45cf-addd-a93e13d98c77_2048x1239.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!n1Gd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69ebca43-a48a-45cf-addd-a93e13d98c77_2048x1239.png 424w, 
https://substackcdn.com/image/fetch/$s_!n1Gd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69ebca43-a48a-45cf-addd-a93e13d98c77_2048x1239.png 848w, https://substackcdn.com/image/fetch/$s_!n1Gd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69ebca43-a48a-45cf-addd-a93e13d98c77_2048x1239.png 1272w, https://substackcdn.com/image/fetch/$s_!n1Gd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69ebca43-a48a-45cf-addd-a93e13d98c77_2048x1239.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!n1Gd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69ebca43-a48a-45cf-addd-a93e13d98c77_2048x1239.png" width="1456" height="881" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/69ebca43-a48a-45cf-addd-a93e13d98c77_2048x1239.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:881,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!n1Gd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69ebca43-a48a-45cf-addd-a93e13d98c77_2048x1239.png 424w, 
https://substackcdn.com/image/fetch/$s_!n1Gd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69ebca43-a48a-45cf-addd-a93e13d98c77_2048x1239.png 848w, https://substackcdn.com/image/fetch/$s_!n1Gd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69ebca43-a48a-45cf-addd-a93e13d98c77_2048x1239.png 1272w, https://substackcdn.com/image/fetch/$s_!n1Gd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69ebca43-a48a-45cf-addd-a93e13d98c77_2048x1239.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><h1>Part 2: Indexing</h1><p><strong>Does Kafka have indexes?</strong></p><p>I asked this question in a recent talk at Confluent&#8217;s Current Conference and only 2 people thought it did. Even engineers that have worked on Kafka for years aren&#8217;t aware of Kafka&#8217;s built-in indexing.</p><p>Kafka actually maintains two indexes:</p><ul><li><p>An offset index maps message offsets to physical byte positions in the Kafka data files - this index is used to quickly find the correct position in the data to read from.</p></li><li><p>A time index maps message timestamps to message offsets - this allows you to seek to a particular time in a topic quickly and easily.</p></li></ul><p>Don&#8217;t take my word for it: if you open the data directory in your Kafka cluster you will see files ending in &#8220;.index&#8221; and &#8220;.timeindex&#8221; alongside the log data.</p><p>These indexes don&#8217;t physically reorder data to answer queries faster. Instead, they layer lightweight additional structures that map a question (&#8220;what happened at time x&#8221;) onto the original data layout.</p><p>The same idea can be applied in Iceberg where data is physically laid out in partitions. A partition is a collection of data files that share common attributes. For instance, partitioning may group financial transactions by account no., making it faster to calculate the balance for an account.</p><p>Unfortunately, because partitioning affects the physical data layout, it&#8217;s difficult to optimise for multiple query patterns. If we wanted to search our financial data by transaction id, partitioning by account doesn&#8217;t help and the engine would need to visit every account partition to find the correct data.</p><p>We can address this problem in the same way Kafka addresses querying by timestamp: by layering indexes on top of each other to resolve the correct data. 
With a new index that maps transaction id -&gt; account no., our Iceberg engine can look up by transaction id with the same performance boost as if it were looking up by account no. The flow is this:</p><ol><li><p>Find the transaction id in the index and read the associated account no(s)</p></li><li><p>Query the table for the account no., pruning away unneeded partitions</p></li><li><p>Filter the (much smaller) result set for the transaction id.</p></li></ol><p>Iceberg already provides the tools for this. An Iceberg table is the ideal place to store the index and applying the index is a case of adjusting the query:</p><pre><code>SELECT *
FROM transactions_by_account_no tab
JOIN transaction_id_to_account_no idx
  ON tab.account_no = idx.account_no
WHERE idx.transaction_id = '1234'
  -- step 3: keep only the matching row (assumes the table also stores transaction_id)
  AND tab.transaction_id = idx.transaction_id</code></pre><p>Given the above, Iceberg engines (Spark/Trino etc.) can follow the join, where supported through runtime partition pruning such as Spark&#8217;s dynamic partition pruning or Trino&#8217;s dynamic filtering, and execute the query faster.</p><p>The principle is the same: Kafka translates timestamps &#8594; offsets &#8594; bytes; Iceberg translates predicates &#8594; partitions &#8594; files, both without changing the layout.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4hGR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69546d53-f4d3-4bb4-a449-0e8206f13b05_800x857.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4hGR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69546d53-f4d3-4bb4-a449-0e8206f13b05_800x857.png 424w, https://substackcdn.com/image/fetch/$s_!4hGR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69546d53-f4d3-4bb4-a449-0e8206f13b05_800x857.png 848w, https://substackcdn.com/image/fetch/$s_!4hGR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69546d53-f4d3-4bb4-a449-0e8206f13b05_800x857.png 1272w, https://substackcdn.com/image/fetch/$s_!4hGR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69546d53-f4d3-4bb4-a449-0e8206f13b05_800x857.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4hGR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69546d53-f4d3-4bb4-a449-0e8206f13b05_800x857.png" width="800" height="857" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/69546d53-f4d3-4bb4-a449-0e8206f13b05_800x857.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:857,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4hGR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69546d53-f4d3-4bb4-a449-0e8206f13b05_800x857.png 424w, https://substackcdn.com/image/fetch/$s_!4hGR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69546d53-f4d3-4bb4-a449-0e8206f13b05_800x857.png 848w, https://substackcdn.com/image/fetch/$s_!4hGR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69546d53-f4d3-4bb4-a449-0e8206f13b05_800x857.png 1272w, https://substackcdn.com/image/fetch/$s_!4hGR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69546d53-f4d3-4bb4-a449-0e8206f13b05_800x857.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This is not a silver bullet, however. For a translation layer to work well, the data it addresses should be naturally clustered (spoiler alert: a LOT of data is naturally clustered) and high cardinality. Timestamps are a great example. In most cases Kafka data is roughly time ordered anyway, so we can expect records to be loosely clustered by timestamp. Because that ordering is not guaranteed, though, selecting a time range in Kafka without an index involves a full scan. Kafka&#8217;s time index takes advantage of this naturally clustered layout to quickly find the clusters we are interested in and save a lot of scanning.</p><p>Another great example is a sessionId. Sessions are generally short-lived, so you have high certainty that events for a given session are grouped close together. 
This means:</p><ul><li><p>An index on sessionId allows lookups to jump to one or a few small contiguous regions</p></li><li><p>Reads on the underlying data stay mostly sequential and cheap</p></li><li><p>The amount of data to read per session is predictable and bounded, regardless of total topic size</p></li></ul><p>That last point is important because it highlights why this scales: the cost of the lookup grows with the session size, not with the log size.</p><p>You see the same pattern with request IDs, trace IDs and transaction IDs: common fields that already have great locality.</p><p>Not all data can leverage translation layers though. Consider something like a product category: a low-cardinality field whose values are usually spread evenly across the entire offset range. If we apply our translation technique here we end up with the following roadblocks:</p><ul><li><p>Every value maps to offsets across the entire topic</p></li><li><p>Index lookups fan out into many small, non-sequential reads</p></li><li><p>The end result is close to a full scan of the underlying data</p></li></ul><p>Instead of looking closer at the data, we often reach for practices that expensively rewrite the underlying data (repartitioning, compaction etc.) or move it to entirely new storage layers, all the while ignoring the fact that the existing layout could already support fast access with a thin translation layer on top. 
This is what we&#8217;re building in Streambased.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BMi_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bf4352d-22f8-4050-bb66-94bb78c62299_800x852.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BMi_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bf4352d-22f8-4050-bb66-94bb78c62299_800x852.png 424w, https://substackcdn.com/image/fetch/$s_!BMi_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bf4352d-22f8-4050-bb66-94bb78c62299_800x852.png 848w, https://substackcdn.com/image/fetch/$s_!BMi_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bf4352d-22f8-4050-bb66-94bb78c62299_800x852.png 1272w, https://substackcdn.com/image/fetch/$s_!BMi_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bf4352d-22f8-4050-bb66-94bb78c62299_800x852.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BMi_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bf4352d-22f8-4050-bb66-94bb78c62299_800x852.png" width="800" height="852" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6bf4352d-22f8-4050-bb66-94bb78c62299_800x852.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:852,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BMi_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bf4352d-22f8-4050-bb66-94bb78c62299_800x852.png 424w, https://substackcdn.com/image/fetch/$s_!BMi_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bf4352d-22f8-4050-bb66-94bb78c62299_800x852.png 848w, https://substackcdn.com/image/fetch/$s_!BMi_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bf4352d-22f8-4050-bb66-94bb78c62299_800x852.png 1272w, https://substackcdn.com/image/fetch/$s_!BMi_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bf4352d-22f8-4050-bb66-94bb78c62299_800x852.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h1>Part 3: Too much indexing</h1><p>In the previous section we talked about what&#8217;s possible with indexing via translation structures. However, in data streaming today, indexing typically means materialisation of events into a state store. Streaming data is read, re-organised and stored a second time in this separate store, in a layout more suitable for access by future operations. For instance, taking a stream of financial transactions and using them to construct a table organised by the account number each transaction relates to.</p><p>Traditionally, these materialisations are built directly into the streaming applications that use them. Applications individually load source Kafka topics and materialise stores for their use and their use only. If you have three applications that need the same store, that&#8217;s three duplicate copies of the data. This is in direct contrast to database-style indexing, where indexes are built once and available to all. 
Platforms like Flink and KSQL went some way to address this inefficiency by treating all streaming flows as a single distributed application, but it is still not common practice to reuse the materialisations of one flow in another.</p><p>Unlike in the database (where a database engine operates over all indexes), there is no component in streaming that ensures materialised tables are consistent, not duplicated and optimized for the tasks they perform across all applications. We call the result of this missing component &#8220;index fan out&#8221;: a new pipeline, a new sink and a new copy of the same data for every materialisation, with no consistency guarantees between them.</p><p>How can we address this limitation? By centralising the index: creating a single authority for what is indexed where. All we need is a central place to store this indexing data: enter Iceberg, the ideal open and central place for applications to source from.</p><p>Fan-out indexing isn&#8217;t evil; it&#8217;s a workaround for missing primitives and a missing central management component. 
As logs and tables converge, the opportunity arises to stop duplicating data and start indexing it properly.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!O1R_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07972588-f5fc-4842-bd63-22e26d0cfde7_800x877.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!O1R_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07972588-f5fc-4842-bd63-22e26d0cfde7_800x877.png 424w, https://substackcdn.com/image/fetch/$s_!O1R_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07972588-f5fc-4842-bd63-22e26d0cfde7_800x877.png 848w, https://substackcdn.com/image/fetch/$s_!O1R_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07972588-f5fc-4842-bd63-22e26d0cfde7_800x877.png 1272w, https://substackcdn.com/image/fetch/$s_!O1R_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07972588-f5fc-4842-bd63-22e26d0cfde7_800x877.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!O1R_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07972588-f5fc-4842-bd63-22e26d0cfde7_800x877.png" width="800" height="877" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/07972588-f5fc-4842-bd63-22e26d0cfde7_800x877.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:877,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!O1R_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07972588-f5fc-4842-bd63-22e26d0cfde7_800x877.png 424w, https://substackcdn.com/image/fetch/$s_!O1R_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07972588-f5fc-4842-bd63-22e26d0cfde7_800x877.png 848w, https://substackcdn.com/image/fetch/$s_!O1R_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07972588-f5fc-4842-bd63-22e26d0cfde7_800x877.png 1272w, https://substackcdn.com/image/fetch/$s_!O1R_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07972588-f5fc-4842-bd63-22e26d0cfde7_800x877.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h1>Part 4: Deduplication</h1><p>On our path towards turning the database inside out there&#8217;s an elephant in the room that it&#8217;s finally time to confront: data duplication between streaming and analytical systems.</p><p>In a typical modern architecture, data lands in a streaming system, then gets ETL&#8217;d into an analytical store. Streaming applications read one copy, analytical applications read another, and a multi-billion-dollar industry has grown up around transferring data between the two.</p><p>However, by taking this approach we&#8217;re not just moving bytes around, we&#8217;re creating two (or many more) versions of reality. One might be a few seconds behind. One might have a slightly different schema. One might silently drop a field or re-order events. 
And now every downstream team has to ask the same uncomfortable question: which one is correct?</p><p><strong>What if the problem isn&#8217;t ETL, but the boundary we&#8217;ve chosen?</strong></p><p>Instead of treating streaming and analytics as two separate worlds that must be synchronized after the fact, imagine storage being composed of two parts: one optimized for streaming access and one optimized for analytical access. On top of that, a shared abstraction lets applications interact with the data without caring where a particular read or write is physically served from. This is the principle Streambased is founded on.</p><p>Earlier in this article I wrote about consuming the WAL and long-term storage elements of data architectures, and it is these elements that create this composed dataset. In our inverted architecture Kafka provides the WAL element and Iceberg the long-term storage. These are joined for analytical purposes by Streambased I.S.K. This lets analytical systems query live streaming topics as if they were just another set of tables, giving teams immediate access to the freshest data alongside full history in one place, all without ETL, duplication, or lag.</p><p>But what about streaming? This idea works in both directions: if we can compose storage for analytical workloads, why can&#8217;t we do the same for streaming workloads? That&#8217;s where Streambased K.S.I. comes in. Instead of forcing Kafka clients to live only in the Kafka data and leaving the cold, analytical data unreachable, K.S.I. 
maps Iceberg&#8217;s rich historical store back into a Kafka-like interface so streaming systems can see the full timeline as a single, logically continuous topic.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9fAx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41b02ce1-04c2-4f9c-925f-eda1a02dc55f_2048x1290.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9fAx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41b02ce1-04c2-4f9c-925f-eda1a02dc55f_2048x1290.png 424w, https://substackcdn.com/image/fetch/$s_!9fAx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41b02ce1-04c2-4f9c-925f-eda1a02dc55f_2048x1290.png 848w, https://substackcdn.com/image/fetch/$s_!9fAx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41b02ce1-04c2-4f9c-925f-eda1a02dc55f_2048x1290.png 1272w, https://substackcdn.com/image/fetch/$s_!9fAx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41b02ce1-04c2-4f9c-925f-eda1a02dc55f_2048x1290.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9fAx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41b02ce1-04c2-4f9c-925f-eda1a02dc55f_2048x1290.png" width="1456" height="917" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/41b02ce1-04c2-4f9c-925f-eda1a02dc55f_2048x1290.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:917,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9fAx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41b02ce1-04c2-4f9c-925f-eda1a02dc55f_2048x1290.png 424w, https://substackcdn.com/image/fetch/$s_!9fAx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41b02ce1-04c2-4f9c-925f-eda1a02dc55f_2048x1290.png 848w, https://substackcdn.com/image/fetch/$s_!9fAx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41b02ce1-04c2-4f9c-925f-eda1a02dc55f_2048x1290.png 1272w, https://substackcdn.com/image/fetch/$s_!9fAx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41b02ce1-04c2-4f9c-925f-eda1a02dc55f_2048x1290.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h1>Turning the database inside out again</h1><p>To really &#8220;turn the database inside out&#8221; we need to stop thinking of Kafka and Iceberg as two separate destinations, and start treating them as two layers of the same logical system. Kafka is the hot edge: a fast, append-only WAL optimized for fan-out and low-latency consumption. Iceberg is the durable core: a table abstraction over cheap object storage that makes history queryable, evolvable, and governable.</p><p>The mistake the industry made was assuming the boundary between these two layers should be crossed with ETL, rather than bridged with database primitives.</p><p>Once you accept that, indexing and caching stop being &#8220;features of downstream systems&#8221; and become shared infrastructure again. 
Meaning:</p><ul><li><p>A translation layer can project the log into the table, preserve locality, and expose secondary indexes as first-class assets rather than bespoke application code.</p></li><li><p>Materializations can be built once and reused everywhere, with consistency guarantees that feel more like a database than a pipeline zoo.</p></li><li><p>A single source-of-truth dataset can be consumed by operational and analytical clients in the formats they expect.</p></li></ul><p>In that world, the question isn&#8217;t whether Kafka replaces the database or Iceberg replaces the warehouse. The real shift is that the database becomes a set of open, interoperable components (log, cache, indexes, and tables) and we get to assemble them without giving up correctness or performance. Kafka was the start of the inversion; adding Iceberg makes it permanent. And the missing bridge between them is where the next decade of data infrastructure will be built.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Tcuo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b9512ea-55d6-4ee1-af23-a6829a17c519_2048x1353.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Tcuo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b9512ea-55d6-4ee1-af23-a6829a17c519_2048x1353.png 424w, https://substackcdn.com/image/fetch/$s_!Tcuo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b9512ea-55d6-4ee1-af23-a6829a17c519_2048x1353.png 848w, 
https://substackcdn.com/image/fetch/$s_!Tcuo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b9512ea-55d6-4ee1-af23-a6829a17c519_2048x1353.png 1272w, https://substackcdn.com/image/fetch/$s_!Tcuo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b9512ea-55d6-4ee1-af23-a6829a17c519_2048x1353.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Tcuo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b9512ea-55d6-4ee1-af23-a6829a17c519_2048x1353.png" width="1456" height="962" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6b9512ea-55d6-4ee1-af23-a6829a17c519_2048x1353.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:962,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Tcuo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b9512ea-55d6-4ee1-af23-a6829a17c519_2048x1353.png 424w, https://substackcdn.com/image/fetch/$s_!Tcuo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b9512ea-55d6-4ee1-af23-a6829a17c519_2048x1353.png 848w, 
https://substackcdn.com/image/fetch/$s_!Tcuo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b9512ea-55d6-4ee1-af23-a6829a17c519_2048x1353.png 1272w, https://substackcdn.com/image/fetch/$s_!Tcuo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b9512ea-55d6-4ee1-af23-a6829a17c519_2048x1353.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div>
]]></content:encoded></item><item><title><![CDATA[Making Iceberg Real-time]]></title><description><![CDATA[Apache Iceberg is the latest entry in the never ending quest for a simple, single way to store and process data at scale.]]></description><link>https://blog.streambased.io/p/making-iceberg-real-time</link><guid isPermaLink="false">https://blog.streambased.io/p/making-iceberg-real-time</guid><dc:creator><![CDATA[Tom Scott]]></dc:creator><pubDate>Wed, 19 Nov 2025 14:00:06 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7A0Y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48e3d59f-7e08-4149-bbf1-80bd7fb5377e_1600x900.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7A0Y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48e3d59f-7e08-4149-bbf1-80bd7fb5377e_1600x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!7A0Y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48e3d59f-7e08-4149-bbf1-80bd7fb5377e_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!7A0Y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48e3d59f-7e08-4149-bbf1-80bd7fb5377e_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!7A0Y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48e3d59f-7e08-4149-bbf1-80bd7fb5377e_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!7A0Y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48e3d59f-7e08-4149-bbf1-80bd7fb5377e_1600x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7A0Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48e3d59f-7e08-4149-bbf1-80bd7fb5377e_1600x900.png" width="728" height="409.5" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/48e3d59f-7e08-4149-bbf1-80bd7fb5377e_1600x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:1972193,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.streambased.io/i/179247021?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48e3d59f-7e08-4149-bbf1-80bd7fb5377e_1600x900.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7A0Y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48e3d59f-7e08-4149-bbf1-80bd7fb5377e_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!7A0Y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48e3d59f-7e08-4149-bbf1-80bd7fb5377e_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!7A0Y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48e3d59f-7e08-4149-bbf1-80bd7fb5377e_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!7A0Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48e3d59f-7e08-4149-bbf1-80bd7fb5377e_1600x900.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Apache Iceberg is the latest entry in the never-ending quest for a simple, single way to store and process data at scale. 
Taking over from the enormously successful Apache Hive, it was created to:</p><ol><li><p>Provide a single store for large amounts of raw data in a very cost-effective way. In the modern cloud-native world this means object storage.</p></li><li><p>Allow consumption of this data using a universal language (SQL).</p></li></ol><p>Iceberg was initially built to solve Hive&#8217;s problems:</p><ol><li><p>Slow query planning at scale (Hive required listing every partition)</p></li><li><p>Lack of ACID guarantees (concurrent writes could corrupt tables)</p></li><li><p>Schema evolution required full table rewrites</p></li><li><p>No time travel or snapshot isolation</p></li></ol><p>And, perhaps more importantly, to:</p><ol><li><p>Provide an open, well-documented standard for data serving that can be implemented in many engines (no vendor lock-in).</p></li><li><p>Scale to handle the largest possible datasets (essentially infinite).</p></li></ol><p>As Iceberg adoption grew, people quickly saw the opportunity it afforded and began looking beyond simple batch workloads to more modern data ingestion approaches. As usual, Postgres is ahead of the game: the team at Mooncake Labs built <a href="https://www.mooncake.dev/moonlink/">Moonlink</a>, a real-time ingestion engine that moves data from Postgres directly into the open table format Apache Iceberg. Mooncake Labs was recently acquired by Databricks and, similarly, Snowflake&#8217;s acquisition of Crunchy Data produced pg_lake, with a similar goal of unifying Postgres and downstream Iceberg.</p><p>Data streaming needs to catch up! Unfortunately, real-time Kafka workloads are not a good fit for Iceberg, where every new write must:</p><ol><li><p>Generate data files (e.g. 
Parquet files)</p></li><li><p>Write manifest files describing those data files</p></li><li><p>Atomically update table metadata</p></li></ol><p>Each commit carries this fixed overhead, so committing every event individually quickly becomes impractical. Today an intermediate system is needed to ingest Kafka data and provide an analytical interface to it. Many powerful options exist (ClickHouse, Pinot, Druid, etc.) but they all lack the &#8220;single store&#8221; benefits offered by a lake approach. If you wish to query both real-time and batch data (from Iceberg) today, <a href="https://blog.streambased.io/p/the-9-ways-to-move-data-kafka-iceberg">complex extra steps</a> (e.g. Confluent Flink snapshot queries) are required to combine the two views of the world.</p><h2><strong>What does great look like?</strong></h2><p>The above is far from optimal, but what would great look like? The answer is simple: a seamless dataset that stretches all the way from the newest real-time data point (written 1 nanosecond ago) to the oldest point available (written 5 years ago). Users would be able to interact with this view using the tools and techniques they use today (SQL) without adopting any special procedures or additional technologies. 
In fact, the surest sign that this optimal solution has been achieved is that its users do not even know it exists.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9kZy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26e6cd5b-942c-439f-9ab8-92247d7706a5_2781x2360.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9kZy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26e6cd5b-942c-439f-9ab8-92247d7706a5_2781x2360.png 424w, https://substackcdn.com/image/fetch/$s_!9kZy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26e6cd5b-942c-439f-9ab8-92247d7706a5_2781x2360.png 848w, https://substackcdn.com/image/fetch/$s_!9kZy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26e6cd5b-942c-439f-9ab8-92247d7706a5_2781x2360.png 1272w, https://substackcdn.com/image/fetch/$s_!9kZy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26e6cd5b-942c-439f-9ab8-92247d7706a5_2781x2360.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9kZy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26e6cd5b-942c-439f-9ab8-92247d7706a5_2781x2360.png" width="2781" height="2360" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/26e6cd5b-942c-439f-9ab8-92247d7706a5_2781x2360.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2360,&quot;width&quot;:2781,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:310048,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.streambased.io/i/179247021?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30ca05cc-542f-4b7e-ac47-045767db99a0_3032x2683.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9kZy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26e6cd5b-942c-439f-9ab8-92247d7706a5_2781x2360.png 424w, https://substackcdn.com/image/fetch/$s_!9kZy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26e6cd5b-942c-439f-9ab8-92247d7706a5_2781x2360.png 848w, https://substackcdn.com/image/fetch/$s_!9kZy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26e6cd5b-942c-439f-9ab8-92247d7706a5_2781x2360.png 1272w, https://substackcdn.com/image/fetch/$s_!9kZy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26e6cd5b-942c-439f-9ab8-92247d7706a5_2781x2360.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2><strong>What if everything was Iceberg? What if everything was Kafka?</strong></h2><p>Achieving the above is actually remarkably simple: we just take the interfaces offered by each system and apply them globally. We need something akin to a proxy layer that serves Kafka data as if it were Iceberg and Iceberg data as if it were Kafka. Users could then use the proxy layer to fetch unified data, or skip it and access an underlying system directly when that system alone satisfies their requirements.</p><p>Because the proxy layer would need to understand both Kafka and Iceberg, it can also trivially automate the movement of real-time data to the lake. 
This is exactly what we created at Streambased!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2VVG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9f17968-b93d-40d0-91cb-0407959d8e23_4481x3027.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2VVG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9f17968-b93d-40d0-91cb-0407959d8e23_4481x3027.png 424w, https://substackcdn.com/image/fetch/$s_!2VVG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9f17968-b93d-40d0-91cb-0407959d8e23_4481x3027.png 848w, https://substackcdn.com/image/fetch/$s_!2VVG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9f17968-b93d-40d0-91cb-0407959d8e23_4481x3027.png 1272w, https://substackcdn.com/image/fetch/$s_!2VVG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9f17968-b93d-40d0-91cb-0407959d8e23_4481x3027.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2VVG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9f17968-b93d-40d0-91cb-0407959d8e23_4481x3027.png" width="4481" height="3027" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c9f17968-b93d-40d0-91cb-0407959d8e23_4481x3027.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:3027,&quot;width&quot;:4481,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:620449,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.streambased.io/i/179247021?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbca61a5b-70b9-4a1a-8038-ac06e624919d_4889x3300.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2VVG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9f17968-b93d-40d0-91cb-0407959d8e23_4481x3027.png 424w, https://substackcdn.com/image/fetch/$s_!2VVG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9f17968-b93d-40d0-91cb-0407959d8e23_4481x3027.png 848w, https://substackcdn.com/image/fetch/$s_!2VVG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9f17968-b93d-40d0-91cb-0407959d8e23_4481x3027.png 1272w, https://substackcdn.com/image/fetch/$s_!2VVG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9f17968-b93d-40d0-91cb-0407959d8e23_4481x3027.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2><strong>The How&#8230;</strong></h2><p>First, some ground rules:</p><ol><li><p>Building a seamless real-time/batch view must not compromise the performance of either system in isolation.</p></li><li><p>The solution should not involve data movement (no ETL); the real-time and batch systems should always remain the source of truth for their data.</p></li></ol><p>To achieve this, we have two products: I.S.K. (Iceberg API on Kafka) and K.S.I. (Kafka API on Iceberg).</p><h3>One Table Across All Time</h3><p><strong>Streambased I.S.K.</strong> presents a set of Iceberg tables in which the underlying storage can be composed of a section of real-time data from Kafka (the hotset) and a section of physical Iceberg data (the coldset). Tables in I.S.K. 
combine these two datasets in a way that is completely transparent to any clients interacting with them (each just looks like a regular Iceberg table).</p><p>For example, with I.S.K., <code>SELECT * FROM transactions</code> executed from Snowflake would retrieve records stored in both the transactions Iceberg table and the transactions Kafka topic and combine them into one seamless table.</p><p>The I.S.K. architecture consists of the following components:</p><ol><li><p>A storage gateway - Iceberg expects files, so I.S.K. must provide a file-based interface to engines. I.S.K. presents an Amazon S3-compatible API that can serve both metadata and data files, with the data sourced from Kafka.</p></li><li><p>An Iceberg catalog - I.S.K. presents a simple, read-only catalog for Kafka data. This is the entry point for Iceberg engines.</p></li><li><p>A cache - To reduce impact on the Kafka cluster and improve Iceberg performance, I.S.K. caches files served by the storage gateway. These files represent sections of the immutable Kafka log and so can be cached and invalidated at will.</p></li><li><p>An indexing engine - Most Iceberg queries will not address the entire dataset, but the Kafka API does not offer access patterns that easily address subsets of data. To address this, I.S.K. 
maintains indexes that map Iceberg partitions -&gt; Kafka offsets, making Iceberg engines able to prune away the Kafka data they do not need.</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JhAP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51587ca6-ed7a-4115-9b7e-1dc032ad4b75_4480x3869.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JhAP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51587ca6-ed7a-4115-9b7e-1dc032ad4b75_4480x3869.png 424w, https://substackcdn.com/image/fetch/$s_!JhAP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51587ca6-ed7a-4115-9b7e-1dc032ad4b75_4480x3869.png 848w, https://substackcdn.com/image/fetch/$s_!JhAP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51587ca6-ed7a-4115-9b7e-1dc032ad4b75_4480x3869.png 1272w, https://substackcdn.com/image/fetch/$s_!JhAP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51587ca6-ed7a-4115-9b7e-1dc032ad4b75_4480x3869.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JhAP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51587ca6-ed7a-4115-9b7e-1dc032ad4b75_4480x3869.png" width="4480" height="3869" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/51587ca6-ed7a-4115-9b7e-1dc032ad4b75_4480x3869.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:3869,&quot;width&quot;:4480,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:684571,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.streambased.io/i/179247021?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefbf4cc4-7e85-4944-a712-3a35e2620f32_4947x4027.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JhAP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51587ca6-ed7a-4115-9b7e-1dc032ad4b75_4480x3869.png 424w, https://substackcdn.com/image/fetch/$s_!JhAP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51587ca6-ed7a-4115-9b7e-1dc032ad4b75_4480x3869.png 848w, https://substackcdn.com/image/fetch/$s_!JhAP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51587ca6-ed7a-4115-9b7e-1dc032ad4b75_4480x3869.png 1272w, https://substackcdn.com/image/fetch/$s_!JhAP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51587ca6-ed7a-4115-9b7e-1dc032ad4b75_4480x3869.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3><strong>One Stream Across All Time</strong></h3><p><strong>Streambased K.S.I.</strong> presents a number of Kafka topics in which the underlying storage can be composed of the same &#8220;hotset&#8221; section of data served directly from Kafka and &#8220;coldset&#8221; section served from Iceberg that I.S.K. uses above. K.S.I. maps columns from the Iceberg tables to Kafka&#8217;s partition and offset concepts, allowing Kafka clients to interact with them as if they were Kafka topics.</p><p>For example, with K.S.I., a console consumer invocation like:</p><p><code>kafka-avro-console-consumer --topic transactions --bootstrap-server ksi:9192 --from-beginning</code></p><p>would start by reading older records from the transactions Iceberg table before progressing to newer records in the transactions Kafka topic.</p><p>The K.S.I. 
architecture is simpler than that of I.S.K. because the access patterns are far more limited. It consists of:</p><ol><li><p>An Iceberg engine - to fetch table-formatted data.</p></li><li><p>A row processor - to convert Iceberg table rows into Kafka messages. This component reformats the column-oriented Iceberg data into the key/value-based messages Kafka clients expect. Governance steps like Schema Registry integration are applied here too.</p></li><li><p>A proxy (we use the open source Kroxylicious) - to serve Kafka clients. Most requests/responses will be passed through to the underlying Kafka cluster, but fetch requests that reference cold-stored Iceberg data will be served by K.S.I. and not the underlying cluster.</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!B64d!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9bfbec78-e46f-4915-a17a-6ca9473d332b_3736x4426.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!B64d!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9bfbec78-e46f-4915-a17a-6ca9473d332b_3736x4426.png 424w, https://substackcdn.com/image/fetch/$s_!B64d!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9bfbec78-e46f-4915-a17a-6ca9473d332b_3736x4426.png 848w, https://substackcdn.com/image/fetch/$s_!B64d!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9bfbec78-e46f-4915-a17a-6ca9473d332b_3736x4426.png 1272w, 
https://substackcdn.com/image/fetch/$s_!B64d!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9bfbec78-e46f-4915-a17a-6ca9473d332b_3736x4426.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!B64d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9bfbec78-e46f-4915-a17a-6ca9473d332b_3736x4426.png" width="3736" height="4426" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9bfbec78-e46f-4915-a17a-6ca9473d332b_3736x4426.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:4426,&quot;width&quot;:3736,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:718887,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.streambased.io/i/179247021?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad71a8ec-7021-46b1-b60b-89a204640f2e_4109x4672.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!B64d!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9bfbec78-e46f-4915-a17a-6ca9473d332b_3736x4426.png 424w, https://substackcdn.com/image/fetch/$s_!B64d!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9bfbec78-e46f-4915-a17a-6ca9473d332b_3736x4426.png 848w, 
https://substackcdn.com/image/fetch/$s_!B64d!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9bfbec78-e46f-4915-a17a-6ca9473d332b_3736x4426.png 1272w, https://substackcdn.com/image/fetch/$s_!B64d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9bfbec78-e46f-4915-a17a-6ca9473d332b_3736x4426.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>With I.S.K. 
and K.S.I., users can query and consume data continuously from the freshest event to the oldest record without ever worrying about synchronization or data duplication.</p><p><strong>TL;DR:</strong> Streambased makes real-time and historical data behave like a single Iceberg table or a single Kafka topic, eliminating ETL, preserving performance, and unifying streaming + batch workloads on one logical layer.</p><p><strong>Stop stitching systems together. Start using a real-time lakehouse that just works.</strong></p><p>If you&#8217;d like to read about what we learned building this: </p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.streambased.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.streambased.io/subscribe?"><span>Subscribe now</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[The 9 Ways to Move Data Kafka -> Iceberg]]></title><description><![CDATA[Exploring the trade-offs between zero-copy and copy architectures]]></description><link>https://blog.streambased.io/p/the-9-ways-to-move-data-kafka-iceberg</link><guid isPermaLink="false">https://blog.streambased.io/p/the-9-ways-to-move-data-kafka-iceberg</guid><dc:creator><![CDATA[Tom Scott]]></dc:creator><pubDate>Mon, 20 Oct 2025 13:11:04 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/71507fdd-a153-48a7-936e-3ec8d0a6a91b_1600x900.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lbt3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3313e234-bf50-4a2a-8a4f-577ef20395cd_1600x900.png" data-component-name="Image2ToDOM"><div 
class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lbt3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3313e234-bf50-4a2a-8a4f-577ef20395cd_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!lbt3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3313e234-bf50-4a2a-8a4f-577ef20395cd_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!lbt3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3313e234-bf50-4a2a-8a4f-577ef20395cd_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!lbt3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3313e234-bf50-4a2a-8a4f-577ef20395cd_1600x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lbt3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3313e234-bf50-4a2a-8a4f-577ef20395cd_1600x900.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3313e234-bf50-4a2a-8a4f-577ef20395cd_1600x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2327199,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.streambased.io/i/176216257?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3313e234-bf50-4a2a-8a4f-577ef20395cd_1600x900.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lbt3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3313e234-bf50-4a2a-8a4f-577ef20395cd_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!lbt3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3313e234-bf50-4a2a-8a4f-577ef20395cd_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!lbt3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3313e234-bf50-4a2a-8a4f-577ef20395cd_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!lbt3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3313e234-bf50-4a2a-8a4f-577ef20395cd_1600x900.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>If you were to intentionally design two protocols/formats with poor interoperability, you would be hard pushed to create something <strong>worse</strong> than the Kafka protocol and Apache Iceberg. 
Both standards were developed and optimised for entirely different purposes and share almost <strong>nothing</strong> in common:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IeFf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f8681fb-7b0b-4809-b98f-bec2882c48de_1240x492.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IeFf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f8681fb-7b0b-4809-b98f-bec2882c48de_1240x492.png 424w, https://substackcdn.com/image/fetch/$s_!IeFf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f8681fb-7b0b-4809-b98f-bec2882c48de_1240x492.png 848w, https://substackcdn.com/image/fetch/$s_!IeFf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f8681fb-7b0b-4809-b98f-bec2882c48de_1240x492.png 1272w, https://substackcdn.com/image/fetch/$s_!IeFf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f8681fb-7b0b-4809-b98f-bec2882c48de_1240x492.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IeFf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f8681fb-7b0b-4809-b98f-bec2882c48de_1240x492.png" width="1240" height="492" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7f8681fb-7b0b-4809-b98f-bec2882c48de_1240x492.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:492,&quot;width&quot;:1240,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IeFf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f8681fb-7b0b-4809-b98f-bec2882c48de_1240x492.png 424w, https://substackcdn.com/image/fetch/$s_!IeFf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f8681fb-7b0b-4809-b98f-bec2882c48de_1240x492.png 848w, https://substackcdn.com/image/fetch/$s_!IeFf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f8681fb-7b0b-4809-b98f-bec2882c48de_1240x492.png 1272w, https://substackcdn.com/image/fetch/$s_!IeFf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f8681fb-7b0b-4809-b98f-bec2882c48de_1240x492.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 
7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>That said, the promise of unifying these two projects is too tempting to pass up. If we were able to combine Iceberg and Kafka, we&#8217;d have a single continuous view of data from right this very millisecond all the way back to the beginning of time. No data inconsistency, no mental overhead in figuring out where to pull data from.</p><p>For this reason almost all vendors are currently building towards this goal. In flows of this type Kafka is the entry point, providing an easy place to land data for real-time serving to the more latency-sensitive use cases: things like microservices, web apps and observability platforms that need to give you answers in seconds, not hours. Iceberg, on the other hand, provides long-term storage for tasks that can wait for hours or days: things like reporting, strategic decision making and AI/ML tasks.</p><p>Most importantly, any single data point has its place in both systems. 
It may be processed in your microservice for your web app, and then recorded for longer-term analysis by your CEO. This creates a natural flow of data from Kafka to Iceberg, but this flow is not without its twists and turns. Let&#8217;s look at the four big issues:</p><h3><em><strong>Issue 1. Data Freshness </strong></em><strong>&#128338;</strong></h3><p>Kafka likes to pass messages around in small batches (~16 KB). Conversely, Apache Iceberg works best when data is written in large chunks (~512 MB). This mismatch is usually addressed by adding a step in the transfer flow that accumulates small messages into a larger chunk before writing to Iceberg, with the flush usually triggered by a total size in bytes or a time period.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Gi09!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f5bac52-b732-4028-a06a-70e540d79fef_1600x932.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Gi09!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f5bac52-b732-4028-a06a-70e540d79fef_1600x932.png 424w, https://substackcdn.com/image/fetch/$s_!Gi09!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f5bac52-b732-4028-a06a-70e540d79fef_1600x932.png 848w, https://substackcdn.com/image/fetch/$s_!Gi09!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f5bac52-b732-4028-a06a-70e540d79fef_1600x932.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Gi09!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f5bac52-b732-4028-a06a-70e540d79fef_1600x932.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Gi09!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f5bac52-b732-4028-a06a-70e540d79fef_1600x932.png" width="1456" height="848" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8f5bac52-b732-4028-a06a-70e540d79fef_1600x932.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:848,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Gi09!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f5bac52-b732-4028-a06a-70e540d79fef_1600x932.png 424w, https://substackcdn.com/image/fetch/$s_!Gi09!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f5bac52-b732-4028-a06a-70e540d79fef_1600x932.png 848w, https://substackcdn.com/image/fetch/$s_!Gi09!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f5bac52-b732-4028-a06a-70e540d79fef_1600x932.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Gi09!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f5bac52-b732-4028-a06a-70e540d79fef_1600x932.png 1456w" sizes="100vw"></picture></div></a></figure></div><p>This inevitably results in time spent <em><strong>waiting</strong></em>. The downside is reduced freshness of the data available in Iceberg. The accumulation period means no Iceberg ingestion happens until the size or time threshold is hit. This creates a minimum lag time (&gt; 15 mins). 
In practice, this means you&#8217;re never able to use your beloved SQL to describe something that is happening right now.</p><h3><em><strong>Issue 2. Table Maintenance &#129529;</strong></em></h3><p>Apache Iceberg requires that a number of recurring maintenance procedures are executed on any dataset stored within it. The most common of these are:</p><p>1. Compaction - Iceberg is a file-based format and, as mentioned above, it likes these files to be large. Even with a wait time for accumulation, hitting the right size in time isn&#8217;t always possible. It&#8217;s common for Iceberg deployments to end up holding files that are smaller than optimal, which hurts query performance. Compaction is Iceberg&#8217;s mechanism for fixing this - it combines the small files into larger ones.</p><p>2. Snapshot expiration - Snapshots in Iceberg represent the state of a table at a given time and are required for Iceberg&#8217;s time travel feature (time travel allows you to view a table as it was at an earlier point in its history). Whenever new data is written to an Iceberg table, a snapshot is created so that time travel can revert to the table state before the write. Over time, these snapshots build up and make querying increasingly slow and compute-intensive. The snapshot expiration mechanism was created to delete older snapshots and restore performance.</p><h3><em><strong>Issue 3. Lacking a single source of truth </strong></em><strong>&#129517;</strong></h3><p>Any copy of data introduces the possibility of inconsistency. Corruption, duplication and schema skew are but a few examples of what can happen when the same data point is written across many destinations. There are many ways to slice the Kafka -&gt; Iceberg flow to reduce or promote duplication between the two systems. Each offers different trade-offs and hence must be evaluated carefully.</p><h3><em><strong>Issue 4. 
Partitioning </strong></em><strong>&#128450;&#65039;</strong></h3><p>Both Kafka and Iceberg have the concept of a partition, but the two are vastly different in meaning.</p><ul><li><p><strong>In Kafka:</strong> a partition is a separate log. It acts as a unit of concurrency used only to parallelise processing.</p></li><li><p><strong>In Iceberg</strong>: a partition is an organisational structure imposed on data to allow efficient querying. Iceberg partitions group similar data together and label it, so that a query referencing that data can tell which groups are relevant and which are not.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!D6KD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef59b4b0-f28a-47a4-a7d3-75ce6468aff5_1545x1600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!D6KD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef59b4b0-f28a-47a4-a7d3-75ce6468aff5_1545x1600.png 424w, https://substackcdn.com/image/fetch/$s_!D6KD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef59b4b0-f28a-47a4-a7d3-75ce6468aff5_1545x1600.png 848w, https://substackcdn.com/image/fetch/$s_!D6KD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef59b4b0-f28a-47a4-a7d3-75ce6468aff5_1545x1600.png 1272w, https://substackcdn.com/image/fetch/$s_!D6KD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef59b4b0-f28a-47a4-a7d3-75ce6468aff5_1545x1600.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!D6KD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef59b4b0-f28a-47a4-a7d3-75ce6468aff5_1545x1600.png" width="1456" height="1508" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ef59b4b0-f28a-47a4-a7d3-75ce6468aff5_1545x1600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1508,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!D6KD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef59b4b0-f28a-47a4-a7d3-75ce6468aff5_1545x1600.png 424w, https://substackcdn.com/image/fetch/$s_!D6KD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef59b4b0-f28a-47a4-a7d3-75ce6468aff5_1545x1600.png 848w, https://substackcdn.com/image/fetch/$s_!D6KD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef59b4b0-f28a-47a4-a7d3-75ce6468aff5_1545x1600.png 1272w, https://substackcdn.com/image/fetch/$s_!D6KD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef59b4b0-f28a-47a4-a7d3-75ce6468aff5_1545x1600.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset 
pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Effective partitioning in Iceberg means the difference between reading every stored file to satisfy a query and reading only a few relevant ones. It can improve query performance by orders of magnitude, so it must be considered carefully.</p><h2><strong>Solutions</strong></h2><p>OK, enough problems. On to the solutions. 
There are two general approaches to the Kafka -&gt; Iceberg flow today:</p><ul><li><p><strong>copy</strong>: those that simply copy data.</p></li><li><p><strong>zero copy</strong>: those that integrate the two systems more tightly.</p></li></ul><p>First I will look at the former: systems that treat Kafka as the source for an analytically focused Iceberg copy.</p><div><hr></div><h2>&#129534; Copy-based solutions</h2><p>In general these have the following pros and cons:</p><p>&#9989; Pros:</p><ul><li><p>It is easy and generally not impactful to reorganise data in a copy - copies can have different partitioning, materialisation and other operations applied during the flow that make them better for analytical purposes without affecting the source Kafka topic.</p></li><li><p>Simplicity - Copies in Iceberg are separate from the original data in Kafka, making it easy to understand the flow and resulting data sets.</p></li></ul><p>&#9888;&#65039; Cons:</p><ul><li><p>Every copy of a dataset requires extra storage resources (and associated costs) and introduces the possibility of inconsistency.</p></li><li><p>Copies involve the transfer of data from one system to another and so, due to the laws of physics, the destination system (Iceberg) must lag behind the source system (Kafka). With Iceberg in particular, reducing this lag increases table maintenance, so a careful balance must be struck to achieve optimal resource usage and availability.</p></li><li><p>Copy-based solutions usually carry a large maintenance overhead from the external systems and practices that work alongside the main data transfer flow.</p></li></ul><h3><strong>1. 
Kafka Connect</strong></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rwqk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b4b43dc-70ff-4a70-bf5b-72c75fef1815_700x301.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rwqk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b4b43dc-70ff-4a70-bf5b-72c75fef1815_700x301.png 424w, https://substackcdn.com/image/fetch/$s_!rwqk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b4b43dc-70ff-4a70-bf5b-72c75fef1815_700x301.png 848w, https://substackcdn.com/image/fetch/$s_!rwqk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b4b43dc-70ff-4a70-bf5b-72c75fef1815_700x301.png 1272w, https://substackcdn.com/image/fetch/$s_!rwqk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b4b43dc-70ff-4a70-bf5b-72c75fef1815_700x301.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rwqk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b4b43dc-70ff-4a70-bf5b-72c75fef1815_700x301.png" width="700" height="301" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4b4b43dc-70ff-4a70-bf5b-72c75fef1815_700x301.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:301,&quot;width&quot;:700,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rwqk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b4b43dc-70ff-4a70-bf5b-72c75fef1815_700x301.png 424w, https://substackcdn.com/image/fetch/$s_!rwqk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b4b43dc-70ff-4a70-bf5b-72c75fef1815_700x301.png 848w, https://substackcdn.com/image/fetch/$s_!rwqk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b4b43dc-70ff-4a70-bf5b-72c75fef1815_700x301.png 1272w, https://substackcdn.com/image/fetch/$s_!rwqk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b4b43dc-70ff-4a70-bf5b-72c75fef1815_700x301.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><a href="https://medium.com/@shahsoumil519/stream-real-time-data-to-aws-s3-tables-using-kafka-iceberg-sink-connector-hands-on-labs-fe43f6869ba7">credit</a></figcaption></figure></div><p>The Iceberg Kafka Connect Sink connector enables you to ingest records from Kafka topics and write them into Iceberg tables (on various storage backends) using the Iceberg table abstraction. As a core piece of the Apache Iceberg project, it is open source, mature and feature-rich, supporting routing, schema evolution, table partitioning and external Iceberg catalogs (Hive, Glue, Nessie, etc.).</p><p>Its key advantage is integration with the Kafka Connect ecosystem. If you are already running Kafka Connect, configuration is little more than a new library and some JSON. With that you get Connect&#8217;s scaling, fault tolerance, checkpointing, transforms (via SMTs) and many other benefits. 
This is great if you have an existing Connect estate. If you do not, however, there is a steep learning curve in building this platform before you can use the connector.</p><p>Unfortunately, like most connectors, the Iceberg connector considers its job to be merely transporting data. This means that table maintenance operations such as compaction and snapshot expiration must be handled separately. Also, because the connector focuses on efficient data transfer, there can be a significant lag between data creation in Kafka and availability in Iceberg (this is a common theme in copy solutions).</p><h3><strong>2. Redpanda Iceberg Topics</strong></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jUKn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4460bb0-d0df-487d-bd57-ad914be2facc_1200x542.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jUKn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4460bb0-d0df-487d-bd57-ad914be2facc_1200x542.png 424w, https://substackcdn.com/image/fetch/$s_!jUKn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4460bb0-d0df-487d-bd57-ad914be2facc_1200x542.png 848w, https://substackcdn.com/image/fetch/$s_!jUKn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4460bb0-d0df-487d-bd57-ad914be2facc_1200x542.png 1272w, https://substackcdn.com/image/fetch/$s_!jUKn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4460bb0-d0df-487d-bd57-ad914be2facc_1200x542.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jUKn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4460bb0-d0df-487d-bd57-ad914be2facc_1200x542.png" width="1200" height="542" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e4460bb0-d0df-487d-bd57-ad914be2facc_1200x542.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:542,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;About Iceberg Topics | Redpanda Self-Managed&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="About Iceberg Topics | Redpanda Self-Managed" title="About Iceberg Topics | Redpanda Self-Managed" srcset="https://substackcdn.com/image/fetch/$s_!jUKn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4460bb0-d0df-487d-bd57-ad914be2facc_1200x542.png 424w, https://substackcdn.com/image/fetch/$s_!jUKn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4460bb0-d0df-487d-bd57-ad914be2facc_1200x542.png 848w, https://substackcdn.com/image/fetch/$s_!jUKn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4460bb0-d0df-487d-bd57-ad914be2facc_1200x542.png 1272w, https://substackcdn.com/image/fetch/$s_!jUKn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4460bb0-d0df-487d-bd57-ad914be2facc_1200x542.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a><figcaption class="image-caption"><a href="https://docs.redpanda.com/current/manage/iceberg/about-iceberg-topics/">credit</a></figcaption></figure></div><p>Iceberg Topics adds a second flow in Redpanda, alongside the regular log storage, that persists topic data directly in the Iceberg table format. Unlike the Kafka Connect approach above, this is handled within the broker, without needing a separate connector/ETL process.</p><p>This closer integration means that Redpanda goes deeper into the downstream Iceberg tables, providing features such as automated snapshot expiration and custom partitioning. 
Redpanda has also focused on the wider ecosystem, offering native integration with the largest Iceberg consumers (Snowflake/Databricks/AWS Glue) and their catalogs.</p><p>One current limitation of the Redpanda solution is the inability to backfill an Iceberg table from an existing Kafka topic. For instance, if your Kafka topic has a retention of 7 days and you enable Iceberg integration, it will be a full week before the Iceberg table and Kafka topic are consistent. However, I spoke with the folks at Redpanda, who are actively developing a solution to this restriction; it&#8217;s due to be addressed in the near future.</p><p>Redpanda also provides other enterprise features needed to run an effective Kafka -&gt; Iceberg solution, such as a DLQ table for records that are incompatible with the destination Iceberg table and support for storage on all 3 major cloud providers.</p><p>The killer feature here is time to value: Redpanda has resolved many of the barriers to an effective Kafka -&gt; Iceberg flow, meaning you can be up and running in seconds. However, with enterprise features comes enterprise licensing, and you will need a license to use this feature.</p><h3><strong>3. 
Confluent TableFlow</strong></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8pav!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52d7db41-3761-4559-9a3d-670abb0db962_960x540.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8pav!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52d7db41-3761-4559-9a3d-670abb0db962_960x540.jpeg 424w, https://substackcdn.com/image/fetch/$s_!8pav!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52d7db41-3761-4559-9a3d-670abb0db962_960x540.jpeg 848w, https://substackcdn.com/image/fetch/$s_!8pav!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52d7db41-3761-4559-9a3d-670abb0db962_960x540.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!8pav!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52d7db41-3761-4559-9a3d-670abb0db962_960x540.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8pav!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52d7db41-3761-4559-9a3d-670abb0db962_960x540.jpeg" width="960" height="540" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/52d7db41-3761-4559-9a3d-670abb0db962_960x540.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:540,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Tableflow is now generally available&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Tableflow is now generally available" title="Tableflow is now generally available" srcset="https://substackcdn.com/image/fetch/$s_!8pav!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52d7db41-3761-4559-9a3d-670abb0db962_960x540.jpeg 424w, https://substackcdn.com/image/fetch/$s_!8pav!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52d7db41-3761-4559-9a3d-670abb0db962_960x540.jpeg 848w, https://substackcdn.com/image/fetch/$s_!8pav!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52d7db41-3761-4559-9a3d-670abb0db962_960x540.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!8pav!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52d7db41-3761-4559-9a3d-670abb0db962_960x540.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><a href="https://www.confluent.io/blog/tableflow-is-now-generally-available/">credit</a></figcaption></figure></div><p>Tableflow is a cloud-based, managed Kafka -&gt; Iceberg flow from Confluent Cloud, similar to Redpanda&#8217;s above. If you are an existing Confluent Cloud user, you can materialize Kafka topics into open table formats such as Apache Iceberg or Delta Lake with a simple topic configuration.</p><p>Tableflow automates many of the common tasks needed when creating a Kafka -&gt; Iceberg (or Delta Lake) pipeline, including schema mapping and evolution, type conversions, compaction of small files, publishing metadata to catalogs and more. 
Like Redpanda, Confluent supports all major external catalogs (AWS Glue, Snowflake Polaris, Apache Polaris, Unity Catalog) but also provides the option to use its own catalog and storage for a true one-click, up-and-running experience.</p><p>Confluent also provides other enterprise managed table features, including compaction and snapshot expiration, to keep tables clean and up to date. Furthermore, you can instruct Tableflow to do more than just make one-to-one copies of Kafka data: it supports Upsert/CDC modes that create materialisations based on keys/instructions.</p><p>The advantage of this solution is the maturity of the Confluent Cloud ecosystem. Confluent are able to leverage their mature cloud solution to create a &#8220;turn-key&#8221; experience that is trivial for existing Confluent Cloud customers. Unfortunately, this experience comes at a cost. Tableflow is significantly more expensive than its competitors today and locks you into the Confluent ecosystem.</p><h3><strong>4. WarpStream Tableflow</strong></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FMuf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1ff6a83-09c4-4de7-b5d2-1bb81ea66fe6_1162x598.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FMuf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1ff6a83-09c4-4de7-b5d2-1bb81ea66fe6_1162x598.png 424w, https://substackcdn.com/image/fetch/$s_!FMuf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1ff6a83-09c4-4de7-b5d2-1bb81ea66fe6_1162x598.png 848w, 
https://substackcdn.com/image/fetch/$s_!FMuf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1ff6a83-09c4-4de7-b5d2-1bb81ea66fe6_1162x598.png 1272w, https://substackcdn.com/image/fetch/$s_!FMuf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1ff6a83-09c4-4de7-b5d2-1bb81ea66fe6_1162x598.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FMuf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1ff6a83-09c4-4de7-b5d2-1bb81ea66fe6_1162x598.png" width="1162" height="598" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e1ff6a83-09c4-4de7-b5d2-1bb81ea66fe6_1162x598.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:598,&quot;width&quot;:1162,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:270230,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.streambased.io/i/176216257?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1ff6a83-09c4-4de7-b5d2-1bb81ea66fe6_1162x598.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FMuf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1ff6a83-09c4-4de7-b5d2-1bb81ea66fe6_1162x598.png 424w, 
https://substackcdn.com/image/fetch/$s_!FMuf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1ff6a83-09c4-4de7-b5d2-1bb81ea66fe6_1162x598.png 848w, https://substackcdn.com/image/fetch/$s_!FMuf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1ff6a83-09c4-4de7-b5d2-1bb81ea66fe6_1162x598.png 1272w, https://substackcdn.com/image/fetch/$s_!FMuf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1ff6a83-09c4-4de7-b5d2-1bb81ea66fe6_1162x598.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><a href="https://www.warpstream.com/tableflow">credit</a></figcaption></figure></div><p>The newest entry into the Kafka -&gt; Iceberg arena, WarpStream Tableflow is, like its namesake, a fully packaged solution for materialising Iceberg tables from Kafka data. Also like its Confluent counterpart, Iceberg tables created by WarpStream are fully managed, covering data commits, compaction, snapshot expiration and metadata management.</p><p>Where WarpStream&#8217;s solution differs is in its integration beyond the Confluent ecosystem. WarpStream Tableflow is a true BYOC system where you define source clusters (Kafka or Kafka-compatible from any provider) and designate where the resulting table data is stored (object storage buckets from any cloud provider, either inside or outside WarpStream).</p><p>It supports integration with external catalogs and query engines. For example, you can register tables in catalogs (e.g. Glue, Snowflake, others), and WarpStream provides guides on how to access them from the usual Iceberg engines (Trino, Spark, ClickHouse etc.).</p><p>WarpStream Tableflow also provides fine-grained data retention controls, allowing users to configure how long data is preserved at both the stream and table levels. Retention policies automatically manage snapshot expiration and file cleanup in object storage, ensuring that Iceberg tables remain compact, cost-efficient, and compliant with organizational data lifecycle requirements.</p><p>In summary, WarpStream&#8217;s Tableflow emphasizes simplicity, openness, and low-cost Iceberg materialization, while Confluent&#8217;s focuses on enterprise-grade governance and observability. There is significant overlap, and it&#8217;s interesting to see such potentially competing products come under the same company umbrella.</p><h3><strong>5. 
AutoMQ Table Topics</strong></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Q5oO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8574360-2906-4b94-a04a-ac33f7043fbd_2560x1040.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Q5oO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8574360-2906-4b94-a04a-ac33f7043fbd_2560x1040.png 424w, https://substackcdn.com/image/fetch/$s_!Q5oO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8574360-2906-4b94-a04a-ac33f7043fbd_2560x1040.png 848w, https://substackcdn.com/image/fetch/$s_!Q5oO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8574360-2906-4b94-a04a-ac33f7043fbd_2560x1040.png 1272w, https://substackcdn.com/image/fetch/$s_!Q5oO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8574360-2906-4b94-a04a-ac33f7043fbd_2560x1040.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Q5oO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8574360-2906-4b94-a04a-ac33f7043fbd_2560x1040.png" width="1456" height="592" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d8574360-2906-4b94-a04a-ac33f7043fbd_2560x1040.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:592,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Introducing AutoMQ Table Topic: Seamless Integration with S3 Tables and  Iceberg&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Introducing AutoMQ Table Topic: Seamless Integration with S3 Tables and  Iceberg" title="Introducing AutoMQ Table Topic: Seamless Integration with S3 Tables and  Iceberg" srcset="https://substackcdn.com/image/fetch/$s_!Q5oO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8574360-2906-4b94-a04a-ac33f7043fbd_2560x1040.png 424w, https://substackcdn.com/image/fetch/$s_!Q5oO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8574360-2906-4b94-a04a-ac33f7043fbd_2560x1040.png 848w, https://substackcdn.com/image/fetch/$s_!Q5oO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8574360-2906-4b94-a04a-ac33f7043fbd_2560x1040.png 1272w, https://substackcdn.com/image/fetch/$s_!Q5oO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8574360-2906-4b94-a04a-ac33f7043fbd_2560x1040.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset 
pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><a href="https://www.automq.com/blog/automq-table-topic-seamless-integration-with-s3-tables-and-iceberg">credit</a></figcaption></figure></div><p>Table Topics is a built-in AutoMQ feature that lets data sent to certain Kafka topics be automatically materialized into Apache Iceberg. As with Redpanda, you don&#8217;t need a separate connector or external streaming engine, and you work with separate copies of the data for Iceberg and the Kafka log.</p><p>As above, AutoMQ handles the full Iceberg flow, including file writing and metadata commits, using internally managed services. All of these operations are coordinated inside the broker by services that piggyback on AutoMQ&#8217;s stateless, elastic and leaderless brokers. 
In AWS, AutoMQ is the only solution here that can integrate with the newer &#8220;S3 Tables&#8221; (a cloud-provider-managed data + catalog construct) so that the materialized tables use Amazon&#8217;s catalog and maintenance features (compaction, snapshot removal) and are queryable via Athena, etc. This is not yet available f</p><p>One early gotcha, however, is that it must be enabled at cluster deployment time and cannot be enabled retroactively, as it can with Tableflow and Redpanda. This means integration must be scheduled and managed in a much more intrusive way.</p><p>Like Tableflow, AutoMQ&#8217;s implementation allows you to materialise tables using CDC/Upsert semantics and, in a similar way, the data structures required for this are Schema Registry driven (usually Confluent Schema Registry or Aiven Karapace). Unlike Tableflow, AutoMQ is available outside of managed cloud services. The code behind its Table Topics (and all other features) is open source (Apache 2.0) and available to run on premise.</p><p>Ultimately, AutoMQ&#8217;s Table Topics is a powerful way to unify streaming and analytics, turning Kafka topics directly into queryable Iceberg tables without the need for separate pipelines or orchestration layers.</p><h2>Summary of copy based solutions</h2><p>We&#8217;re at the halfway point &#127937; so let&#8217;s recap what we&#8217;ve seen so far:</p><p><strong>Kafka Connect Iceberg Sink</strong> &#8594; Open source and very flexible, if you can stomach its higher ops overhead.</p><p><strong>Redpanda Iceberg Topics</strong> &#8594; Super easy setup for new topics/flows but cannot be imposed on existing data.</p><p><strong>Confluent Tableflow</strong> &#8594; Enterprise-friendly but cloud only and vendor locked.</p><p><strong>AutoMQ Table Topic</strong> &#8594; Best for tight cloud-native integration (e.g. 
S3 tables) and open-source posture, but newer/less battle-tested.</p><div><hr></div><h2>&#9889;Zero Copy solutions</h2><p>Up until now we have focused on solutions that maintain a clear separation between Kafka data and its Iceberg counterpart. These solutions generally force you to make an ugly choice between data freshness and Iceberg storage efficiency:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!R-TK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f56954a-90bb-41e7-9856-49d8aa2606df_1562x1600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!R-TK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f56954a-90bb-41e7-9856-49d8aa2606df_1562x1600.png 424w, https://substackcdn.com/image/fetch/$s_!R-TK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f56954a-90bb-41e7-9856-49d8aa2606df_1562x1600.png 848w, https://substackcdn.com/image/fetch/$s_!R-TK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f56954a-90bb-41e7-9856-49d8aa2606df_1562x1600.png 1272w, https://substackcdn.com/image/fetch/$s_!R-TK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f56954a-90bb-41e7-9856-49d8aa2606df_1562x1600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!R-TK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f56954a-90bb-41e7-9856-49d8aa2606df_1562x1600.png" width="1456" height="1491" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9f56954a-90bb-41e7-9856-49d8aa2606df_1562x1600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1491,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:&quot;Iceberg Hurts diagrams (7).png&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="Iceberg Hurts diagrams (7).png" srcset="https://substackcdn.com/image/fetch/$s_!R-TK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f56954a-90bb-41e7-9856-49d8aa2606df_1562x1600.png 424w, https://substackcdn.com/image/fetch/$s_!R-TK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f56954a-90bb-41e7-9856-49d8aa2606df_1562x1600.png 848w, https://substackcdn.com/image/fetch/$s_!R-TK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f56954a-90bb-41e7-9856-49d8aa2606df_1562x1600.png 1272w, https://substackcdn.com/image/fetch/$s_!R-TK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f56954a-90bb-41e7-9856-49d8aa2606df_1562x1600.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" 
stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>There is another class of solutions that address this trade off by integrating Kafka and Iceberg more closely. These typically share storage between Iceberg and Kafka in some way and eliminate copies. This provides reduced cost and improved consistency.</p><p>In general these have the following pros and cons:</p><p>&#9989; Pros:</p><ul><li><p>Reduced lag - Zero copy solutions streamline the path of data from ingest to Iceberg by sharing storage between the two.</p></li><li><p>A Single Source of Truth - Any data written into a shared Kafka/Iceberg solution will be stored only once and accessed by both Kafka and Iceberg clients. 
This ensures total consistency between the two systems.</p></li><li><p>Cost reduction - Only one copy of the data is stored, removing not only duplicated storage costs but management and reformatting costs too.</p></li></ul><p>&#9888;&#65039; Cons:</p><ul><li><p>Restrictions - A tighter integration means that either side of the Kafka/Iceberg divide must take on aspects of the other. A good example of this is Iceberg partitioning. If a system is to read Iceberg data efficiently through the Kafka protocol, the data should be partitioned by Kafka partition and offset, and not in a way more suited to analytical queries.</p></li><li><p>Complexity - Integrated solutions require much closer management of the underlying data and associated read/write paths. This adds complexity and hidden behaviour. Usually this complexity is internal, however, and these solutions often appear simpler to the wider world.</p></li></ul><h3><strong>6. Bufstream</strong></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!y7Vw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd1f134d-03bf-4d96-a347-e05a1d9a54aa_1456x1040.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!y7Vw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd1f134d-03bf-4d96-a347-e05a1d9a54aa_1456x1040.jpeg 424w, https://substackcdn.com/image/fetch/$s_!y7Vw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd1f134d-03bf-4d96-a347-e05a1d9a54aa_1456x1040.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!y7Vw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd1f134d-03bf-4d96-a347-e05a1d9a54aa_1456x1040.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!y7Vw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd1f134d-03bf-4d96-a347-e05a1d9a54aa_1456x1040.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!y7Vw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd1f134d-03bf-4d96-a347-e05a1d9a54aa_1456x1040.jpeg" width="1456" height="1040" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bd1f134d-03bf-4d96-a347-e05a1d9a54aa_1456x1040.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1040,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!y7Vw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd1f134d-03bf-4d96-a347-e05a1d9a54aa_1456x1040.jpeg 424w, https://substackcdn.com/image/fetch/$s_!y7Vw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd1f134d-03bf-4d96-a347-e05a1d9a54aa_1456x1040.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!y7Vw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd1f134d-03bf-4d96-a347-e05a1d9a54aa_1456x1040.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!y7Vw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd1f134d-03bf-4d96-a347-e05a1d9a54aa_1456x1040.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><a 
href="https://blog.dataengineerthings.org/bufstream-stream-kafka-messages-to-iceberg-tables-in-minutes-6c60c470e67f">credit</a></figcaption></figure></div><p>Bufstream is a Kafka-compatible streaming platform that can stream data directly from Kafka topics into Iceberg tables (i.e. broker-side materialization). Bufstream was developed with cloud-native object storage as the primary storage and with a focus on schemas right from the beginning, both excellent building blocks for an Iceberg solution. In Bufstream, Parquet files are written to object storage by the broker on produce and shared by both Iceberg and Kafka readers, which avoids duplicating data and creates a true zero-copy layout. The trade-off is that object storage is on the write path and adds producer latency: Bufstream typically experiences 3-5x higher end-to-end latency (P99) than traditional Kafka.</p><p>Bufstream handles the associated Iceberg metadata, and maintenance operations are taken care of in line with both Kafka (retention etc.) and Iceberg (compaction, snapshot expiration etc.) requirements. Bufstream also supports the various external Iceberg catalog types (REST catalogs, AWS Glue, Google BigQuery Metastore); however, there are constraints here because the data is shared. The most important of these is that Iceberg data is read-only and does not support operations that rewrite the underlying data files. 
Shared solutions must serve Iceberg data back to Kafka, so rewrites of that data by external engines can have unexpected consequences.</p><p>As schemas are first-class citizens in Bufstream, the integration is schema-aware: it interacts with a schema registry (especially for Protobuf schemas) to derive Iceberg schemas, enforce schema compatibility, and perform semantic validation.</p><p>In the end, Bufstream&#8217;s design brings streaming and lakehouse paradigms together under a unified schema model, providing consistent semantics across ingestion, storage, and consumption, at the cost of higher latency.</p><h3><strong>7. Aiven Iceberg Topics</strong></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2q5c!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd69152a-6f12-4598-a521-5e9bc9ba1c01_5259x3181.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2q5c!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd69152a-6f12-4598-a521-5e9bc9ba1c01_5259x3181.png 424w, https://substackcdn.com/image/fetch/$s_!2q5c!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd69152a-6f12-4598-a521-5e9bc9ba1c01_5259x3181.png 848w, https://substackcdn.com/image/fetch/$s_!2q5c!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd69152a-6f12-4598-a521-5e9bc9ba1c01_5259x3181.png 1272w, https://substackcdn.com/image/fetch/$s_!2q5c!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd69152a-6f12-4598-a521-5e9bc9ba1c01_5259x3181.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2q5c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd69152a-6f12-4598-a521-5e9bc9ba1c01_5259x3181.png" width="1456" height="881" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cd69152a-6f12-4598-a521-5e9bc9ba1c01_5259x3181.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:881,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Getting Started with Iceberg Topics for Apache Kafka&#174;: A Beginner's Guide&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Getting Started with Iceberg Topics for Apache Kafka&#174;: A Beginner's Guide" title="Getting Started with Iceberg Topics for Apache Kafka&#174;: A Beginner's Guide" srcset="https://substackcdn.com/image/fetch/$s_!2q5c!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd69152a-6f12-4598-a521-5e9bc9ba1c01_5259x3181.png 424w, https://substackcdn.com/image/fetch/$s_!2q5c!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd69152a-6f12-4598-a521-5e9bc9ba1c01_5259x3181.png 848w, https://substackcdn.com/image/fetch/$s_!2q5c!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd69152a-6f12-4598-a521-5e9bc9ba1c01_5259x3181.png 1272w, 
https://substackcdn.com/image/fetch/$s_!2q5c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd69152a-6f12-4598-a521-5e9bc9ba1c01_5259x3181.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><a href="https://aiven.io/blog/getting-started-with-iceberg-topics-for-apache-kafkar-a-beginners-guide">credit</a></figcaption></figure></div><p>Aiven&#8217;s solution adds native Iceberg support into Apache Kafka itself via a custom implementation of Kafka&#8217;s pre-existing tiered storage mechanism. 
This splits Kafka data into two sections: a &#8220;hotset&#8221; stored in Kafka format that holds the most recent data and serves most Kafka clients, and a &#8220;coldset&#8221; of older data that is transferred to object storage and persisted as Iceberg. This architecture is zero-copy because the two sets are interoperable, with the &#8220;coldset&#8221; able to be recalled to Kafka duty as required. In effect, it takes Kafka&#8217;s existing tiered storage flows (Kafka &lt;--&gt; S3) and changes the format (Kafka &lt;--&gt; Iceberg).</p><p>Unfortunately, the reliance on tiered storage introduces some lag into the Iceberg path. Tiered storage was designed for cheap storage of rarely utilised data, so it typically allows a configurable but usually significant amount of data (more than 24 hours&#8217; worth) to accumulate in the &#8220;hotset&#8221; before it is tiered. The time taken for this accumulation is also how far behind real time the Iceberg data will be.</p><p>By design, Iceberg Topics establishes a boundary between what is Kafka&#8217;s responsibility and what is Iceberg&#8217;s. Unfortunately, this means that compaction, snapshot expiration and other Iceberg maintenance operations are beyond the scope of the plugin and must be handled externally.</p><p>Tiered storage is one of the few pluggable areas of the Apache Kafka project, and Aiven have added Iceberg features by extending their industry-standard open-source tiered storage plugin. If you&#8217;re already using Aiven tiered storage, adopting Iceberg is just a simple update and config switch.</p><p>Aiven&#8217;s Iceberg Topics plugin elegantly leverages Kafka&#8217;s tiered storage to unify streaming and analytical use cases without duplicating data, maintaining zero-copy interoperability between Kafka and Iceberg.</p><h3><strong>8. 
StreamNative Ursa</strong></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eVL1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddaa17cb-9781-4725-b442-218e4942633d_1510x1096.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eVL1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddaa17cb-9781-4725-b442-218e4942633d_1510x1096.png 424w, https://substackcdn.com/image/fetch/$s_!eVL1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddaa17cb-9781-4725-b442-218e4942633d_1510x1096.png 848w, https://substackcdn.com/image/fetch/$s_!eVL1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddaa17cb-9781-4725-b442-218e4942633d_1510x1096.png 1272w, https://substackcdn.com/image/fetch/$s_!eVL1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddaa17cb-9781-4725-b442-218e4942633d_1510x1096.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eVL1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddaa17cb-9781-4725-b442-218e4942633d_1510x1096.png" width="1456" height="1057" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ddaa17cb-9781-4725-b442-218e4942633d_1510x1096.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1057,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;StreamNative Perspective: Connecting Real-Time Streaming with Data Catalogs  for AI&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="StreamNative Perspective: Connecting Real-Time Streaming with Data Catalogs  for AI" title="StreamNative Perspective: Connecting Real-Time Streaming with Data Catalogs  for AI" srcset="https://substackcdn.com/image/fetch/$s_!eVL1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddaa17cb-9781-4725-b442-218e4942633d_1510x1096.png 424w, https://substackcdn.com/image/fetch/$s_!eVL1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddaa17cb-9781-4725-b442-218e4942633d_1510x1096.png 848w, https://substackcdn.com/image/fetch/$s_!eVL1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddaa17cb-9781-4725-b442-218e4942633d_1510x1096.png 1272w, https://substackcdn.com/image/fetch/$s_!eVL1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddaa17cb-9781-4725-b442-218e4942633d_1510x1096.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><a href="https://streamnative.io/blog/streamnative-perspective-connecting-real-time-streaming-with-data-catalogs-for-ai">credit</a></figcaption></figure></div><p>Ursa is a &#8220;lakehouse-native&#8221; streaming engine from StreamNative that provides Kafka API compatibility but writes data to open table formats (Iceberg, Delta Lake) on object storage. It uses a leaderless, stateless architecture taken from Apache Pulsar rather than Kafka&#8217;s traditional brokers with leaders and replicas. 
This architecture is similar to the recent diskless topics addition to the Kafka world, but it has been in Pulsar from the beginning.</p><p>As a recently developed Kafka-compatible engine, Ursa is missing some features on the Kafka side, most notably topic compaction and transactions. Both are commonly used, so their absence could affect Ursa&#8217;s adoption rate.</p><p>Ursa combines stream storage and columnar storage layers: it writes to a write-ahead log (WAL) for fast ingestion and then commits data into Parquet files (archive files) for analytic reads. It integrates with the usual set of external catalogs and some more specialised ones (e.g. S3 Tables).</p><p>The use of a WAL introduces a lag problem similar to that of Aiven&#8217;s solution and, with similar boundary definitions, Ursa will not automatically handle compaction and snapshot expiration.</p><p>Also in a similar vein to Aiven&#8217;s solution, Ursa can use the Iceberg layer to serve Kafka clients (a true zero-copy design). It even has a clever indexing system that spans both WAL and Parquet data to make this more efficient.</p><p>Ursa combines streaming and lakehouse concepts through a Pulsar-inspired, stateless design that writes data directly into open table formats without traditional brokers. Its dual-layer storage model enables zero-copy interoperability while maintaining strong performance for both streaming and batch workloads.</p><h3><strong>9. 
Streambased</strong></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lH35!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31ed3015-0e31-4d1f-bbbe-54f718d9cf8e_2048x1180.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lH35!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31ed3015-0e31-4d1f-bbbe-54f718d9cf8e_2048x1180.png 424w, https://substackcdn.com/image/fetch/$s_!lH35!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31ed3015-0e31-4d1f-bbbe-54f718d9cf8e_2048x1180.png 848w, https://substackcdn.com/image/fetch/$s_!lH35!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31ed3015-0e31-4d1f-bbbe-54f718d9cf8e_2048x1180.png 1272w, https://substackcdn.com/image/fetch/$s_!lH35!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31ed3015-0e31-4d1f-bbbe-54f718d9cf8e_2048x1180.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lH35!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31ed3015-0e31-4d1f-bbbe-54f718d9cf8e_2048x1180.png" width="1456" height="839" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/31ed3015-0e31-4d1f-bbbe-54f718d9cf8e_2048x1180.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:839,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:&quot;Blank diagram (1).png&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="Blank diagram (1).png" srcset="https://substackcdn.com/image/fetch/$s_!lH35!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31ed3015-0e31-4d1f-bbbe-54f718d9cf8e_2048x1180.png 424w, https://substackcdn.com/image/fetch/$s_!lH35!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31ed3015-0e31-4d1f-bbbe-54f718d9cf8e_2048x1180.png 848w, https://substackcdn.com/image/fetch/$s_!lH35!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31ed3015-0e31-4d1f-bbbe-54f718d9cf8e_2048x1180.png 1272w, https://substackcdn.com/image/fetch/$s_!lH35!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31ed3015-0e31-4d1f-bbbe-54f718d9cf8e_2048x1180.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Streambased takes a completely original approach to this problem. Like tiered storage, it maintains the concepts of a &#8220;hotset&#8221; stored in Kafka and a &#8220;coldset&#8221; stored in Iceberg. Where it differs, however, is that Streambased surfaces the hotset via protocol translation without requiring any data movement. At query time, the latest hotset data is fetched from Kafka, transformed to Iceberg and combined with the coldset (already Iceberg) before being surfaced to the client. In this way, clients see the entire dataset, from data produced a millisecond ago to the oldest data stored in Iceberg.</p><p>The big advantage of this approach is that there is, by definition, zero lag. 
Because the hotset data is fetched on demand at query time rather than surfaced via an asynchronous background process, Streambased can guarantee that every Iceberg query works with the very latest data available.</p><p>This approach changes resource usage too. Streambased decouples the pure Kafka load (produce) from the load incurred converting from Kafka to Iceberg. Other solutions mentioned here run background processes to do this conversion, meaning that when Kafka load spikes, so does conversion load. With Streambased, the conversion load is coupled to the Iceberg query load, not the Kafka write load, isolating the two.</p><p>Like Aiven, Bufstream and Ursa, Streambased can also serve Kafka data from Iceberg. Streambased K.S.I. (Kafka Service for Iceberg) is a Kafka proxy that federates incoming fetch requests between the hotset in Kafka and the coldset in Iceberg. Data from either is surfaced to Kafka clients as appropriate.</p><p>In deployment, Streambased is also a significant departure from the other solutions in this article. Streambased does not seek to replace or modify an existing Kafka instance; instead, it sits as a layer above, requiring only the public consumer and admin protocols to function. This means Streambased can be deployed rapidly on top of managed Kafka services like Confluent or Bufstream, as well as on-prem open-source distributions.</p><p>Streambased&#8217;s architecture redefines zero-copy by translating Kafka data into Iceberg format on demand, merging data at query time for truly real-time analytics. This eliminates the lag between Kafka and Iceberg entirely.</p><h2>Solution Summary</h2><p>As this overview shows, all solutions in this space have the building blocks required to create an effective Kafka to Iceberg solution. The decision as to which technology to adopt will likely be driven by external factors rather than any particular feature. 
For this reason, I&#8217;ve outlined below which solution you might choose given some of these factors:</p><p><strong>For open-source, DIY control</strong> &#8594; Kafka Connect Sink is flexible but ops-heavy. Aiven and AutoMQ provide open-source versions for customization.</p><p><strong>For vendor simplicity</strong> &#8594; Redpanda Iceberg Topics and Confluent TableFlow offer simple, enterprise-style integration if you can accept the vendor lock-in.</p><p><strong>For maximum data freshness</strong> &#8594; Streambased&#8217;s query-time translation guarantees zero lag between Kafka and Iceberg, ideal for real-time analytics and observability workloads.</p><p><strong>For maximum compatibility</strong> &#8594; Streambased requires only the Kafka wire protocol, so it can function with a wide range of existing Kafka deployments.</p><p><strong>For schema-first organisations</strong> &#8594; Bufstream&#8217;s schema-aware design ensures consistency and easier evolution across streaming and lakehouse layers.</p><p><strong>For cost-efficient scalability</strong> &#8594; The stateless brokers and shared object storage of Ursa, Bufstream and AutoMQ reduce infrastructure costs while maintaining Kafka-compatible performance at scale.</p><h2>Conclusion</h2><p>The convergence of Kafka and Apache Iceberg represents one of the most exciting frontiers in modern data infrastructure. While each technology was born to solve fundamentally different problems, the demand for real-time decision-making has forced these worlds together. The appetite for a unified data view is strong, as evidenced by the widespread support that has arrived so quickly after Iceberg won the table format wars.</p><p>The industry&#8217;s range of approaches, from traditional copy-based pipelines to innovative zero-copy architectures, underscores that there is no single &#8220;right&#8221; solution, only a spectrum of tradeoffs between cost, complexity, freshness, and control. 
Copy-based systems like Kafka Connect and TableFlow deliver predictable, well-understood architectures but introduce lag and duplication, while zero-copy systems like Streambased, Bufstream, and Ursa push the limits of what&#8217;s possible with shared storage and dynamic federation.</p><p>In this evolving landscape, the unification of streaming and analytical data systems is no longer a distant vision; it&#8217;s quickly becoming the default expectation for modern data infrastructure.</p><p>The summary table below shows all solutions side by side:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WTAZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03754d23-4ee4-4d22-b672-7ad0b6979cd1_3240x2056.png" data-component-name="Image2ToDOM"><div class="image2-inset image2-full-screen"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WTAZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03754d23-4ee4-4d22-b672-7ad0b6979cd1_3240x2056.png 424w, https://substackcdn.com/image/fetch/$s_!WTAZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03754d23-4ee4-4d22-b672-7ad0b6979cd1_3240x2056.png 848w, https://substackcdn.com/image/fetch/$s_!WTAZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03754d23-4ee4-4d22-b672-7ad0b6979cd1_3240x2056.png 1272w, https://substackcdn.com/image/fetch/$s_!WTAZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03754d23-4ee4-4d22-b672-7ad0b6979cd1_3240x2056.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!WTAZ!,w_5760,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03754d23-4ee4-4d22-b672-7ad0b6979cd1_3240x2056.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/03754d23-4ee4-4d22-b672-7ad0b6979cd1_3240x2056.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;full&quot;,&quot;height&quot;:924,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:775513,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.streambased.io/i/176216257?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03754d23-4ee4-4d22-b672-7ad0b6979cd1_3240x2056.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-fullscreen" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WTAZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03754d23-4ee4-4d22-b672-7ad0b6979cd1_3240x2056.png 424w, https://substackcdn.com/image/fetch/$s_!WTAZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03754d23-4ee4-4d22-b672-7ad0b6979cd1_3240x2056.png 848w, https://substackcdn.com/image/fetch/$s_!WTAZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03754d23-4ee4-4d22-b672-7ad0b6979cd1_3240x2056.png 1272w, https://substackcdn.com/image/fetch/$s_!WTAZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03754d23-4ee4-4d22-b672-7ad0b6979cd1_3240x2056.png 
1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p><div><hr></div><p></p>]]></content:encoded></item><item><title><![CDATA[Why Kafka + Iceberg Will Define the Next Decade of Data Infrastructure]]></title><description><![CDATA[How two open standards can replace pipeline sprawl with a unified, low-cost, and future-proof data architecture.]]></description><link>https://blog.streambased.io/p/why-kafka-iceberg-will-define-the</link><guid isPermaLink="false">https://blog.streambased.io/p/why-kafka-iceberg-will-define-the</guid><dc:creator><![CDATA[Tom Scott]]></dc:creator><pubDate>Wed, 27 Aug 2025 13:35:20 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/2bf46bf8-77ac-41e4-bcdd-09b1556bad19_4000x2266.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Today&#8217;s data architectures are complex, but do they have to be this way? In this post I&#8217;m going to show how we can replace pipeline sprawl <strong>with two open standards</strong> to reduce cost, complexity and fragility to near zero.</p><p>Architectures grow organically: technologies and processes are chosen to suit a specific first use case and, if successful, expand. The problem is that this is happening (and has been happening for the last 20 years) concurrently in teams across the entire business. 
The result: many disparate technologies connected by brittle pipelines.</p><p>For example, a simple retail company may have:</p><ul><li><p>Started with a MySQL database for its point-of-sale system</p></li><li><p>Later added a SaaS CRM to track customer interactions</p></li><li><p>Then adopted a cloud data warehouse to support online analytics</p></li><li><p>Then marketing spun up its own data pipeline into a separate BI tool</p></li><li><p>SREs added metrics to one system and logs to another</p></li><li><p>The CTO insisted on adopting the next big thing</p></li><li><p>The data science team set up their own Jupyter cluster with custom ETL jobs pulling from production databases</p></li><li><p>Shadow IT introduced spreadsheets and Google Sheets integrations as "glue" for missing workflows</p></li><li><p>[These are all things that I have experienced; add your own in the comments ;-)]</p></li></ul><p>Meanwhile:</p><ul><li><p>Operations engineers maintain bash scripts that shuttle CSVs between systems</p></li><li><p>The warehouse has nightly jobs failing silently</p></li><li><p>Marketing dashboards rely on stale data</p></li><li><p>Customer service can&#8217;t reconcile records between CRM and order history</p></li><li><p>Data schemas become rigid and stale because of the impact of changes on downstream systems</p></li><li><p>&#8230;</p></li></ul><p>This story has played out time and time again, and it always reaches the same conclusion: data infrastructure eventually buckles under the weight of scale, regulation, and business complexity.</p><p><strong>Standardization</strong> is one of the key levers for regaining control when architectures have grown in an organic, fragmented way. 
Consolidating around standards breaks down silos, reduces complexity and increases scalability.</p><p>This is not a novel idea: most cutting-edge operational teams have already standardized around Kafka and, similarly, leading analytical teams have already standardized around Iceberg.</p><p>Clear but isolated standards solve only half the problem, however, leaving the fundamental disconnect between operational and analytical systems intact.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ivo9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55434c00-8ff4-43c3-af85-77e3aee8a546_1600x678.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ivo9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55434c00-8ff4-43c3-af85-77e3aee8a546_1600x678.png 424w, https://substackcdn.com/image/fetch/$s_!ivo9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55434c00-8ff4-43c3-af85-77e3aee8a546_1600x678.png 848w, https://substackcdn.com/image/fetch/$s_!ivo9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55434c00-8ff4-43c3-af85-77e3aee8a546_1600x678.png 1272w, https://substackcdn.com/image/fetch/$s_!ivo9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55434c00-8ff4-43c3-af85-77e3aee8a546_1600x678.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!ivo9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55434c00-8ff4-43c3-af85-77e3aee8a546_1600x678.png" width="1456" height="617" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/55434c00-8ff4-43c3-af85-77e3aee8a546_1600x678.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:617,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ivo9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55434c00-8ff4-43c3-af85-77e3aee8a546_1600x678.png 424w, https://substackcdn.com/image/fetch/$s_!ivo9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55434c00-8ff4-43c3-af85-77e3aee8a546_1600x678.png 848w, https://substackcdn.com/image/fetch/$s_!ivo9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55434c00-8ff4-43c3-af85-77e3aee8a546_1600x678.png 1272w, https://substackcdn.com/image/fetch/$s_!ivo9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55434c00-8ff4-43c3-af85-77e3aee8a546_1600x678.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The combination of Kafka and Iceberg, on the other hand, is incredibly powerful because it finally resolves the decades-long tension between real-time operations and durable analytics. Used separately they still leave us with duplicated datasets, costly ETL jobs, schema mismatches, and latency compromises. Together, they create a unified architecture where streaming and batch pull together rather than against each other. An architecture much more powerful than the sum of its parts.</p><p>Let&#8217;s explore an architecture based exclusively on Kafka and Iceberg with the aim of achieving the gold standard in both operational and analytical arenas. 
That includes:</p><ul><li><p><strong>Cheap long term retention</strong> - Store data for as long as it&#8217;s required without having to worry about mounting storage costs and maintenance overheads</p></li><li><p><strong>A single source of truth</strong> - Achieve total consistency across the entire data estate</p></li><li><p><strong>&lt;100ms access</strong> - Clients on both sides of the operational/analytical divide should work with the latest data available</p></li><li><p><strong>Easy evolution</strong> - Data should be able to evolve with the business, adapting to volume changes, schema evolution and changing business needs at will</p></li></ul><h2><strong>The new (old) architecture</strong></h2><p>There is an architecture that fulfils these requirements and it&#8217;s actually a well-understood, production-proven one: <strong>the Lambda architecture</strong>.</p><p>For many (including me) the Lambda architecture (proposed by Nathan Marz in 2011) was a first introduction to real-time data, stream processing and concepts beyond the high-volume, batch-based infrastructures that we all grew up on.</p><p>As a quick recap, it consists of a batch layer that provides a long-term (months/years), high-lag (hours/days) view, topped up with a speed layer that provides short-term (&lt;7 days), low-lag (sub-second) data. 
A complete picture of the data can be built up by combining data from one or more of these layers.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aiha!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7050cf3-60b1-49cf-97bc-03377c006901_1600x710.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aiha!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7050cf3-60b1-49cf-97bc-03377c006901_1600x710.png 424w, https://substackcdn.com/image/fetch/$s_!aiha!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7050cf3-60b1-49cf-97bc-03377c006901_1600x710.png 848w, https://substackcdn.com/image/fetch/$s_!aiha!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7050cf3-60b1-49cf-97bc-03377c006901_1600x710.png 1272w, https://substackcdn.com/image/fetch/$s_!aiha!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7050cf3-60b1-49cf-97bc-03377c006901_1600x710.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!aiha!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7050cf3-60b1-49cf-97bc-03377c006901_1600x710.png" width="1456" height="646" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d7050cf3-60b1-49cf-97bc-03377c006901_1600x710.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:646,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!aiha!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7050cf3-60b1-49cf-97bc-03377c006901_1600x710.png 424w, https://substackcdn.com/image/fetch/$s_!aiha!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7050cf3-60b1-49cf-97bc-03377c006901_1600x710.png 848w, https://substackcdn.com/image/fetch/$s_!aiha!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7050cf3-60b1-49cf-97bc-03377c006901_1600x710.png 1272w, https://substackcdn.com/image/fetch/$s_!aiha!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7050cf3-60b1-49cf-97bc-03377c006901_1600x710.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The issue with this architecture is that Speed and Batch layers typically have vastly different access patterns and concepts. This makes Mixed Applications (where the most power lies) complex to develop, manage and evolve. For example, Speed Layer applications must routinely address situations like late arriving, out of order data etc., concepts that simply aren&#8217;t present in the Batch layer.</p><p>The fundamental problem here is that batch and speed speak very different languages, but what if they didn&#8217;t? 
What if they both spoke Iceberg?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Qjp0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda3aa8b2-2481-4526-bf67-02bd9c588b62_1600x646.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Qjp0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda3aa8b2-2481-4526-bf67-02bd9c588b62_1600x646.png 424w, https://substackcdn.com/image/fetch/$s_!Qjp0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda3aa8b2-2481-4526-bf67-02bd9c588b62_1600x646.png 848w, https://substackcdn.com/image/fetch/$s_!Qjp0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda3aa8b2-2481-4526-bf67-02bd9c588b62_1600x646.png 1272w, https://substackcdn.com/image/fetch/$s_!Qjp0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda3aa8b2-2481-4526-bf67-02bd9c588b62_1600x646.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Qjp0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda3aa8b2-2481-4526-bf67-02bd9c588b62_1600x646.png" width="1456" height="588" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/da3aa8b2-2481-4526-bf67-02bd9c588b62_1600x646.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:588,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Qjp0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda3aa8b2-2481-4526-bf67-02bd9c588b62_1600x646.png 424w, https://substackcdn.com/image/fetch/$s_!Qjp0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda3aa8b2-2481-4526-bf67-02bd9c588b62_1600x646.png 848w, https://substackcdn.com/image/fetch/$s_!Qjp0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda3aa8b2-2481-4526-bf67-02bd9c588b62_1600x646.png 1272w, https://substackcdn.com/image/fetch/$s_!Qjp0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda3aa8b2-2481-4526-bf67-02bd9c588b62_1600x646.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>Suddenly both our batch and speed layers speak Iceberg and unifying both data flows becomes a trivial problem</strong></p><p>In our new architecture both speed layer and batch layers are represented as Iceberg tables and the combination of them both can be achieved with a simple union:</p><pre><code>SELECT *
FROM
  speed_layer
UNION ALL
SELECT *
FROM
  batch_layer</code></pre><p>Traditionally, this union was done by a complex client system (Apache Beam, anyone?), but recently speed-critical OLAP databases like <a href="https://startree.ai/resources/low-latency-serving-on-iceberg-with-apache-pinot-in-startree-cloud">Apache Pinot</a> are filling this role. However, having everything in Iceberg means no commitment to a particular client system is needed, and any processing engine that supports Iceberg (Spark, Dremio, Trino, etc.) can be used.</p><p>So how do we implement this?</p><h2><strong>#NoETL</strong></h2><p>The traditional way to solve this problem was to have an ETL process (Kafka Connect is a great one) that sits between Kafka and Iceberg to copy the data across into a second, analytical dataset. Unfortunately, this approach involves compromising on the <strong>Single Source of Truth</strong> and <strong>&lt;100ms</strong> goals:</p><ul><li><p><strong>Copies of the data</strong> - Any ETL process copies data from the operational layer (Kafka) to the analytical layer (Iceberg). Since these layers evolve independently, inconsistencies can arise. For example, if Kafka records are accidentally duplicated in Iceberg, can Iceberg still be trusted? Or if Iceberg enforces business rules (e.g., amount &#8805; 0) that Kafka doesn&#8217;t, can Kafka still be trusted?</p></li><li><p><strong>High latency</strong> - Copy-based approaches also add delay. To store data efficiently in Iceberg, records must be batched into larger chunks. This batching often takes minutes to hours (typically ~15 minutes), during which the data is unavailable in Iceberg. 
For more detail, see my LinkedIn post on this <a href="https://www.linkedin.com/posts/tom-scott-82718114_whats-stopping-you-from-using-iceberg-for-activity-7363556808231133185-pl8E">here</a>.</p></li></ul><p>Not only that, but it forces you to contend with some troubling side effects:</p><ol><li><p><strong>High maintenance</strong> - ETL generally writes small, inefficient packets of data to Iceberg continuously, and the maintenance overhead that comes with this is large (I&#8217;ve written previously on this <a href="https://streambased.substack.com/p/kafka-iceberg-hurts-the-hidden-cost">here</a>). On top of that, the operational and analytical realms often have different concepts, making governance processes like schema evolution challenging.</p></li><li><p><strong>High storage costs</strong> - ETL requires keeping two full copies of the data: one in Kafka for operations, and another in Iceberg for analytics. This duplication inflates storage bills dramatically when volumes are high.</p></li></ol><h2><strong>It has to be a view</strong></h2><p>Surfacing the speed layer as Iceberg is not as simple as it seems. This process must retain the ultra-low-latency access expected from the speed layer, but Iceberg is not optimised for this at all.</p><p>ETL processes cannot achieve this. Instead, we can rely on a trusted mainstay of database systems: logical views.</p><p>Logical views compute their result sets at query time rather than in advance, so they are guaranteed to work on the latest available data and can achieve the latency goals we are looking for. What&#8217;s more, they needn&#8217;t be any more complex than their ETL counterparts: it&#8217;s the same processing, just executed at a different point.</p><p>This creates an Iceberg-like layer over Kafka that can be accessed by processors in the same way as any other Iceberg store. 
When an Iceberg query is executed, the view calculates the data from Kafka required to satisfy the query, fetches it, transforms it to an Iceberg format and returns it. This all happens on demand, with the latest available data from Kafka. For more information see my recent LinkedIn post on this <a href="https://www.linkedin.com/posts/tom-scott-82718114_kafka-sink-connectors-are-just-etl-pipelines-activity-7361716993562972160-Vp60">here</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pjiR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c9ca6af-b927-4451-b290-95be69e40389_1600x740.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pjiR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c9ca6af-b927-4451-b290-95be69e40389_1600x740.png 424w, https://substackcdn.com/image/fetch/$s_!pjiR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c9ca6af-b927-4451-b290-95be69e40389_1600x740.png 848w, https://substackcdn.com/image/fetch/$s_!pjiR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c9ca6af-b927-4451-b290-95be69e40389_1600x740.png 1272w, https://substackcdn.com/image/fetch/$s_!pjiR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c9ca6af-b927-4451-b290-95be69e40389_1600x740.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!pjiR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c9ca6af-b927-4451-b290-95be69e40389_1600x740.png" width="1456" height="673" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3c9ca6af-b927-4451-b290-95be69e40389_1600x740.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:673,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pjiR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c9ca6af-b927-4451-b290-95be69e40389_1600x740.png 424w, https://substackcdn.com/image/fetch/$s_!pjiR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c9ca6af-b927-4451-b290-95be69e40389_1600x740.png 848w, https://substackcdn.com/image/fetch/$s_!pjiR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c9ca6af-b927-4451-b290-95be69e40389_1600x740.png 1272w, https://substackcdn.com/image/fetch/$s_!pjiR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c9ca6af-b927-4451-b290-95be69e40389_1600x740.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This comes with a number of advantages over ETL:</p><ul><li><p><strong>Always up to date</strong> - In a logical view, data is fetched at query time, giving access to the complete data set up to the latest millisecond. Connectors and materialized views involve copies of the data that may lag behind the source.</p></li><li><p><strong>Evolves with the source data</strong> - A logical view is guaranteed to be consistent with the source data, meaning that any changes in the source data (schema evolutions, repartitioning, etc.) are immediately reflected in the view. Other approaches require an expensive and time-consuming replay of older data before evolutions are reflected. 
I&#8217;ve written on this previously <a href="https://www.linkedin.com/posts/tom-scott-82718114_apacheiceberg-queryoptimization-dataengineering-activity-7355945244531519488-UsBG">here</a></p></li><li><p><strong>Zero Copy</strong> - only a single copy of the data need be stored in Kafka, no separate Iceberg copies.</p></li></ul><h2><strong>Putting it all together</strong></h2><p>The final step towards our complete architecture is to create a path to transition data from speed layer to batch layer. As both layers are represented by Iceberg tables in this architecture this can be as simple as an insert statement:</p><pre><code>INSERT INTO
  batch_layer
SELECT *
FROM
  speed_layer</code></pre><p>Care must be taken to balance the sizes of Batch and Speed layers to maintain efficiency. However, as the transition process is just another Iceberg operation, it can be invoked at any point and there is no down time incurred. The transition process is as follows:</p><ol><li><p>Initially the majority of data exists in the batch layer with only a small &#8220;head&#8221; held in the speed layer:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4kOX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c5c40a9-d477-413f-8bad-1257975d437a_1428x517.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4kOX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c5c40a9-d477-413f-8bad-1257975d437a_1428x517.png 424w, https://substackcdn.com/image/fetch/$s_!4kOX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c5c40a9-d477-413f-8bad-1257975d437a_1428x517.png 848w, https://substackcdn.com/image/fetch/$s_!4kOX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c5c40a9-d477-413f-8bad-1257975d437a_1428x517.png 1272w, https://substackcdn.com/image/fetch/$s_!4kOX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c5c40a9-d477-413f-8bad-1257975d437a_1428x517.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!4kOX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c5c40a9-d477-413f-8bad-1257975d437a_1428x517.png" width="1428" height="517" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4c5c40a9-d477-413f-8bad-1257975d437a_1428x517.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:517,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:24320,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.streambased.io/i/172081226?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c5c40a9-d477-413f-8bad-1257975d437a_1428x517.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4kOX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c5c40a9-d477-413f-8bad-1257975d437a_1428x517.png 424w, https://substackcdn.com/image/fetch/$s_!4kOX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c5c40a9-d477-413f-8bad-1257975d437a_1428x517.png 848w, https://substackcdn.com/image/fetch/$s_!4kOX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c5c40a9-d477-413f-8bad-1257975d437a_1428x517.png 1272w, https://substackcdn.com/image/fetch/$s_!4kOX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c5c40a9-d477-413f-8bad-1257975d437a_1428x517.png 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li></ol><ol start="2"><li><p>New data is written to the speed layer only, increasing the share of data it is responsible for serving:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zWof!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc805e5d-4b32-4ae1-85a0-fd1e145aa780_1428x520.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!zWof!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc805e5d-4b32-4ae1-85a0-fd1e145aa780_1428x520.png 424w, https://substackcdn.com/image/fetch/$s_!zWof!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc805e5d-4b32-4ae1-85a0-fd1e145aa780_1428x520.png 848w, https://substackcdn.com/image/fetch/$s_!zWof!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc805e5d-4b32-4ae1-85a0-fd1e145aa780_1428x520.png 1272w, https://substackcdn.com/image/fetch/$s_!zWof!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc805e5d-4b32-4ae1-85a0-fd1e145aa780_1428x520.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zWof!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc805e5d-4b32-4ae1-85a0-fd1e145aa780_1428x520.png" width="1428" height="520" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dc805e5d-4b32-4ae1-85a0-fd1e145aa780_1428x520.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:520,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:24574,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.streambased.io/i/172081226?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc805e5d-4b32-4ae1-85a0-fd1e145aa780_1428x520.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" 
alt="" srcset="https://substackcdn.com/image/fetch/$s_!zWof!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc805e5d-4b32-4ae1-85a0-fd1e145aa780_1428x520.png 424w, https://substackcdn.com/image/fetch/$s_!zWof!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc805e5d-4b32-4ae1-85a0-fd1e145aa780_1428x520.png 848w, https://substackcdn.com/image/fetch/$s_!zWof!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc805e5d-4b32-4ae1-85a0-fd1e145aa780_1428x520.png 1272w, https://substackcdn.com/image/fetch/$s_!zWof!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc805e5d-4b32-4ae1-85a0-fd1e145aa780_1428x520.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li></ol><ol start="3"><li><p>When a threshold is reached (usually driven by Kafka topic retention) a move process is triggered to move data from speed layer to batch:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2z5A!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd0eb24c-8998-4b14-99bf-cb60918a099d_1428x578.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2z5A!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd0eb24c-8998-4b14-99bf-cb60918a099d_1428x578.png 424w, https://substackcdn.com/image/fetch/$s_!2z5A!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd0eb24c-8998-4b14-99bf-cb60918a099d_1428x578.png 848w, https://substackcdn.com/image/fetch/$s_!2z5A!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd0eb24c-8998-4b14-99bf-cb60918a099d_1428x578.png 1272w, https://substackcdn.com/image/fetch/$s_!2z5A!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd0eb24c-8998-4b14-99bf-cb60918a099d_1428x578.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!2z5A!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd0eb24c-8998-4b14-99bf-cb60918a099d_1428x578.png" width="1428" height="578" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fd0eb24c-8998-4b14-99bf-cb60918a099d_1428x578.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:578,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:34097,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.streambased.io/i/172081226?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd0eb24c-8998-4b14-99bf-cb60918a099d_1428x578.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2z5A!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd0eb24c-8998-4b14-99bf-cb60918a099d_1428x578.png 424w, https://substackcdn.com/image/fetch/$s_!2z5A!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd0eb24c-8998-4b14-99bf-cb60918a099d_1428x578.png 848w, https://substackcdn.com/image/fetch/$s_!2z5A!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd0eb24c-8998-4b14-99bf-cb60918a099d_1428x578.png 1272w, https://substackcdn.com/image/fetch/$s_!2z5A!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd0eb24c-8998-4b14-99bf-cb60918a099d_1428x578.png 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>Note that this transition moves a large amount of speed_layer data in a few large, optimized chunks, avoiding the small-file (compaction) and snapshot expiration issues usually associated with streaming to Iceberg.</p></li><li><p>With this process complete, we return to the initial state, with the batch layer serving the majority of data, and the cycle repeats:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!gDvo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9033b5ba-19b3-41ca-9f25-50e5acea7341_1428x517.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gDvo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9033b5ba-19b3-41ca-9f25-50e5acea7341_1428x517.png 424w, https://substackcdn.com/image/fetch/$s_!gDvo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9033b5ba-19b3-41ca-9f25-50e5acea7341_1428x517.png 848w, https://substackcdn.com/image/fetch/$s_!gDvo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9033b5ba-19b3-41ca-9f25-50e5acea7341_1428x517.png 1272w, https://substackcdn.com/image/fetch/$s_!gDvo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9033b5ba-19b3-41ca-9f25-50e5acea7341_1428x517.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gDvo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9033b5ba-19b3-41ca-9f25-50e5acea7341_1428x517.png" width="1428" height="517" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9033b5ba-19b3-41ca-9f25-50e5acea7341_1428x517.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:517,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:24320,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.streambased.io/i/172081226?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9033b5ba-19b3-41ca-9f25-50e5acea7341_1428x517.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gDvo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9033b5ba-19b3-41ca-9f25-50e5acea7341_1428x517.png 424w, https://substackcdn.com/image/fetch/$s_!gDvo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9033b5ba-19b3-41ca-9f25-50e5acea7341_1428x517.png 848w, https://substackcdn.com/image/fetch/$s_!gDvo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9033b5ba-19b3-41ca-9f25-50e5acea7341_1428x517.png 1272w, https://substackcdn.com/image/fetch/$s_!gDvo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9033b5ba-19b3-41ca-9f25-50e5acea7341_1428x517.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p></li></ol><h2><strong>Conclusion</strong></h2><p>By unifying Kafka and Iceberg through a logical, view-based architecture, we can finally dissolve the long-standing divide between operational and analytical systems. Instead of maintaining parallel infrastructures, duplicating data, and fighting constant trade-offs in latency and consistency, organizations can achieve a single, coherent data layer that is both real-time and durable.</p><p>This model not only simplifies architectures and reduces cost, but also empowers teams to build richer, mixed applications on top of a unified source of truth. 
In many ways, it delivers on the original promise of the Lambda architecture: speed and batch working together, but with the elegance and efficiency of a single language, Iceberg.</p><p>We&#8217;ve spent the past two years building this architecture at Streambased. Want to try it out? Drop me a DM:</p><div class="directMessage button" data-attrs="{&quot;userId&quot;:196116110,&quot;userName&quot;:&quot;Tom Scott&quot;,&quot;canDm&quot;:null,&quot;dmUpgradeOptions&quot;:null,&quot;isEditorNode&quot;:true}" data-component-name="DirectMessageToDOM"></div><p></p>]]></content:encoded></item><item><title><![CDATA[Kafka -> Iceberg Hurts: The Hidden Cost of Table Format Victory]]></title><description><![CDATA[Iceberg won the table format wars! And with victory come the spoils]]></description><link>https://blog.streambased.io/p/kafka-iceberg-hurts-the-hidden-cost</link><guid isPermaLink="false">https://blog.streambased.io/p/kafka-iceberg-hurts-the-hidden-cost</guid><dc:creator><![CDATA[Tom Scott]]></dc:creator><pubDate>Tue, 05 Aug 2025 12:46:24 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/43dc3a17-8523-49d7-850d-6d5c6b4bf3af_1281x723.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Iceberg won the table format wars! And with victory come the spoils. 
The question on everyone&#8217;s lips now is: <em><strong>&#8220;How do we get our Kafka data into Iceberg?&#8221;</strong></em>.</p><p>Answers have ranged from a super simple Kafka connector to multi-stage, Flink-based pipelines, but that&#8217;s not what this post is about.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.streambased.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading ZeroCopy! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>Instead, we&#8217;re going to address the elephant in the room and ask the real burning question: <em><strong>&#8220;What happens when your Kafka data gets there?&#8221;</strong></em>.</p><p>If you&#8217;re a mature Iceberg user, chances are you are butting up against some fundamental Iceberg challenges that are only exacerbated by streaming data sources.</p><p>In this post we&#8217;re going to look at 3 of the most common and explore a novel approach to address them.</p><ol><li><p><strong>Snapshot Expiration</strong></p></li></ol><p>Time travel is a flagship feature of Apache Iceberg that allows queries to be executed against a table state at any point in the table&#8217;s history. It&#8217;s powered by snapshots: each snapshot is a unique piece of metadata that points to all the data in the table at a given point in time.</p><p>Imagine a situation where I insert 2 rows and then, 30 seconds later, insert a further 2. 
This creates <strong>2 snapshots</strong>, the first pointing to a table with 2 rows and the second pointing to a table with 4.</p><p>Each snapshot is given a unique id, allowing you to time travel back to the table state at that snapshot:</p><pre><code>SELECT * FROM someTable FOR VERSION AS OF 2188465307835585443; -- returns 2 rows
SELECT * FROM someTable FOR VERSION AS OF 4583289324735846932; -- returns 4 rows</code></pre><p>Imagine, for instance, that you wrote every message from an event stream individually to an Iceberg table via an &#8220;INSERT INTO&#8221; statement (a really bad idea). By default, the number of snapshots would be equal to the number of messages (potentially millions per second)!</p><p>Thankfully, snapshots are metadata only, so the cost of creating one is not huge. However, frequent writes create a lot of snapshots and hence a lot of metadata that must be managed. If snapshots are allowed to grow uncontrolled, you can suffer some serious side effects:</p><ul><li><p><strong>Delayed query planning</strong> - Processing metadata is the first stage in any Iceberg query; the more metadata, the longer this process takes.</p></li><li><p><strong>Slower rollbacks</strong> - Iceberg rollbacks must traverse the snapshot list to find the correct snapshot to roll back to. The larger this list, the slower this operation. Performance can be further hit if orphaned snapshot metadata is pruned during the rollback.</p></li><li><p><strong>Degraded maintenance operations</strong> (e.g. 
compaction)</p></li><li><p><strong>Increased resource consumption</strong></p></li></ul><p>The solution to this problem is Snapshot Expiration, a manually triggered Iceberg process where older snapshots are removed.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jwxF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba2a1ee6-ac38-4771-abe4-5e30c01d7878_1600x825.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jwxF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba2a1ee6-ac38-4771-abe4-5e30c01d7878_1600x825.png 424w, https://substackcdn.com/image/fetch/$s_!jwxF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba2a1ee6-ac38-4771-abe4-5e30c01d7878_1600x825.png 848w, https://substackcdn.com/image/fetch/$s_!jwxF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba2a1ee6-ac38-4771-abe4-5e30c01d7878_1600x825.png 1272w, https://substackcdn.com/image/fetch/$s_!jwxF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba2a1ee6-ac38-4771-abe4-5e30c01d7878_1600x825.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jwxF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba2a1ee6-ac38-4771-abe4-5e30c01d7878_1600x825.png" width="1456" height="751" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ba2a1ee6-ac38-4771-abe4-5e30c01d7878_1600x825.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:751,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jwxF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba2a1ee6-ac38-4771-abe4-5e30c01d7878_1600x825.png 424w, https://substackcdn.com/image/fetch/$s_!jwxF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba2a1ee6-ac38-4771-abe4-5e30c01d7878_1600x825.png 848w, https://substackcdn.com/image/fetch/$s_!jwxF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba2a1ee6-ac38-4771-abe4-5e30c01d7878_1600x825.png 1272w, https://substackcdn.com/image/fetch/$s_!jwxF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba2a1ee6-ac38-4771-abe4-5e30c01d7878_1600x825.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Remember, however, that <em>Iceberg is only a format, so it cannot do the expiration work by itself.</em></p><p>Instead, it must rely on an external engine such as Spark or Trino (or any number of others) to perform it. Furthermore, this is not an automated background process you can switch on and forget about; it&#8217;s a recurring batch job that must be scheduled and managed separately from read/write jobs.</p><p>Expiring a snapshot can create its own problems! 
Once a snapshot is expired, Iceberg can no longer time travel to it, so any query that targets that snapshot will fail.</p><p>In other words, your Iceberg administrator has to manually balance metadata size against time travel capability in a never-ending compromise.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Nusz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F094e4e2a-b567-4372-abdd-58fb22eff886_1600x1290.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Nusz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F094e4e2a-b567-4372-abdd-58fb22eff886_1600x1290.png 424w, https://substackcdn.com/image/fetch/$s_!Nusz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F094e4e2a-b567-4372-abdd-58fb22eff886_1600x1290.png 848w, https://substackcdn.com/image/fetch/$s_!Nusz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F094e4e2a-b567-4372-abdd-58fb22eff886_1600x1290.png 1272w, https://substackcdn.com/image/fetch/$s_!Nusz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F094e4e2a-b567-4372-abdd-58fb22eff886_1600x1290.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Nusz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F094e4e2a-b567-4372-abdd-58fb22eff886_1600x1290.png" width="1456" height="1174" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/094e4e2a-b567-4372-abdd-58fb22eff886_1600x1290.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1174,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Nusz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F094e4e2a-b567-4372-abdd-58fb22eff886_1600x1290.png 424w, https://substackcdn.com/image/fetch/$s_!Nusz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F094e4e2a-b567-4372-abdd-58fb22eff886_1600x1290.png 848w, https://substackcdn.com/image/fetch/$s_!Nusz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F094e4e2a-b567-4372-abdd-58fb22eff886_1600x1290.png 1272w, https://substackcdn.com/image/fetch/$s_!Nusz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F094e4e2a-b567-4372-abdd-58fb22eff886_1600x1290.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ol start="2"><li><p><strong>Compaction</strong></p></li></ol><p>Snapshotting is not the only feature that suffers in the presence of small inserts in Iceberg. Such write patterns can also create many small files that wreck your general Iceberg query performance.</p><p>Standard Iceberg behavior dictates that <strong>every completed insert statement creates at least one new file</strong>.</p><p>Returning to our previous scenario, inserting 2 rows and then a further 2 rows 30 seconds later would create <strong>2 new Parquet files</strong>, one for each insert.</p><p>Technically we still live in the world of Big Data, and a file containing 2 rows is very small. The last thing you want is a table made up of millions of tiny files that:</p><ul><li><p><strong>Reduce read performance</strong> - To scan the data in the table, every file must be opened, read and closed. 
This can increase query times by a massive amount.</p></li><li><p><strong>Increase storage requirements</strong> - Files typically contain extra metadata. Repeating it across many small files results in bloated, inefficient use of disk space.</p></li><li><p><strong>Worsen compression</strong> - Columnar encoding and compression work best over long runs of similar values, and a file holding a handful of rows gives them very little to work with.</p></li><li><p><strong>Increase costs</strong> via more separate access requests - If your storage layer charges per request (as Amazon S3 does), more files mean more requests and more of these costs.</p></li><li><p><strong>Complicate parallelization</strong> - Small files mean processing engines must make intelligent decisions about how they combine files for parallelism. It is harder to define the optimum use of resources with this restriction.</p></li></ul><p>The solution is to accumulate data and write it to Iceberg in bigger chunks, but this can incur significant delay. To put this in perspective, the Parquet project (Parquet being Iceberg&#8217;s most popular file format) recommends file sizes between 128MB and 1GB. Kafka messages are typically around 1KB to 10KB. Imagine a Kafka topic with 20 partitions receiving 5MB/s. This would map to 250KB/s per file (1 file per partition), so filling a 1GB Parquet file takes roughly 4,000 seconds, or just over <em><strong>66 minutes</strong></em>. 
Not very real time, is it?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NlKq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9515d22a-e103-4c44-a0dc-36e62cd4dd73_1600x1382.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NlKq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9515d22a-e103-4c44-a0dc-36e62cd4dd73_1600x1382.png 424w, https://substackcdn.com/image/fetch/$s_!NlKq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9515d22a-e103-4c44-a0dc-36e62cd4dd73_1600x1382.png 848w, https://substackcdn.com/image/fetch/$s_!NlKq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9515d22a-e103-4c44-a0dc-36e62cd4dd73_1600x1382.png 1272w, https://substackcdn.com/image/fetch/$s_!NlKq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9515d22a-e103-4c44-a0dc-36e62cd4dd73_1600x1382.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NlKq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9515d22a-e103-4c44-a0dc-36e62cd4dd73_1600x1382.png" width="1456" height="1258" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9515d22a-e103-4c44-a0dc-36e62cd4dd73_1600x1382.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1258,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NlKq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9515d22a-e103-4c44-a0dc-36e62cd4dd73_1600x1382.png 424w, https://substackcdn.com/image/fetch/$s_!NlKq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9515d22a-e103-4c44-a0dc-36e62cd4dd73_1600x1382.png 848w, https://substackcdn.com/image/fetch/$s_!NlKq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9515d22a-e103-4c44-a0dc-36e62cd4dd73_1600x1382.png 1272w, https://substackcdn.com/image/fetch/$s_!NlKq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9515d22a-e103-4c44-a0dc-36e62cd4dd73_1600x1382.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>To counter this, streaming pipelines typically write inefficient small files into Iceberg first and combine them into larger files later, a process called Compaction. This is an essential housekeeping process that is manually triggered and monitored by your Iceberg administrator, who must determine and orchestrate the data accumulation needed to achieve the required balance between freshness and efficiency.</p><p>But again! <em>Remember that Iceberg is only a format so cannot do the work by itself. 
&#128161;</em></p><p>As with snapshot expiration, it must rely on an external engine to perform compaction, and this is not an automated background process.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NLDe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab59a698-0182-4d3e-85a5-06a565224fb9_1600x1220.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NLDe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab59a698-0182-4d3e-85a5-06a565224fb9_1600x1220.png 424w, https://substackcdn.com/image/fetch/$s_!NLDe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab59a698-0182-4d3e-85a5-06a565224fb9_1600x1220.png 848w, https://substackcdn.com/image/fetch/$s_!NLDe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab59a698-0182-4d3e-85a5-06a565224fb9_1600x1220.png 1272w, https://substackcdn.com/image/fetch/$s_!NLDe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab59a698-0182-4d3e-85a5-06a565224fb9_1600x1220.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NLDe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab59a698-0182-4d3e-85a5-06a565224fb9_1600x1220.png" width="1456" height="1110" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ab59a698-0182-4d3e-85a5-06a565224fb9_1600x1220.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1110,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NLDe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab59a698-0182-4d3e-85a5-06a565224fb9_1600x1220.png 424w, https://substackcdn.com/image/fetch/$s_!NLDe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab59a698-0182-4d3e-85a5-06a565224fb9_1600x1220.png 848w, https://substackcdn.com/image/fetch/$s_!NLDe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab59a698-0182-4d3e-85a5-06a565224fb9_1600x1220.png 1272w, https://substackcdn.com/image/fetch/$s_!NLDe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab59a698-0182-4d3e-85a5-06a565224fb9_1600x1220.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Compaction determines the balance between <strong>freshness, performance, and cost</strong> and must be constantly maintained. Administrators must keep reassessing what the optimal file size should be and scheduling compaction when resources allow, a non-trivial overhead.</p><ol start="3"><li><p><strong>Partition evolution</strong></p></li></ol><p>Partitioning is often the fastest way to improve your Iceberg query performance, sometimes by 100x or more.</p><p>It involves the organisation of data into subsets based on one or more column values. 
Query engines can take advantage of this organisation to only read the subsets relevant to the queries being executed.</p><p>For instance, given a table named events that is partitioned by timestamp and the query:</p><pre><code>SELECT * FROM events WHERE timestamp = '2025-07-01';</code></pre><p>The query engine can &#8220;prune&#8221; away any other timestamps, greatly reducing the amount of data read to complete the query (and making it go a lot faster!).</p><p>As the example above shows, partitioning needs to be matched to the query to work effectively. If this is not the case it can actually reduce performance and cause serious problems including:</p><ul><li><p><strong>Reduced parallelism</strong> - Processing engines can parallelize by partition, so fewer partitions can lower the maximum parallelization possible.</p></li><li><p><strong>Under-utilization</strong> of system resources</p></li><li><p><strong>Poor join performance</strong> - Joins can often be optimized by working with partitions rather than the full table. With poor partitioning this is not possible.</p></li></ul><p>But what happens when your query patterns change? To maintain performance, partitioning must change along with them.</p><p>Iceberg supports &#8220;Partition Evolution&#8221; for this. Using a simple set of commands you can change the partitioned columns of your table to better match your query profiles.</p><p>Like most Iceberg maintenance operations, this is a metadata-only operation and so does not require the underlying data to be rewritten&#8230; yet.</p><p>Partition evolution applies only to data written after the partition spec has changed. 
Any earlier data still has the earlier spec applied and, whilst still available to queries, will not benefit from the increased performance promised by evolution.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nIbN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a91a439-4902-421e-b90e-e67d6c721a0b_1600x747.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nIbN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a91a439-4902-421e-b90e-e67d6c721a0b_1600x747.png 424w, https://substackcdn.com/image/fetch/$s_!nIbN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a91a439-4902-421e-b90e-e67d6c721a0b_1600x747.png 848w, https://substackcdn.com/image/fetch/$s_!nIbN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a91a439-4902-421e-b90e-e67d6c721a0b_1600x747.png 1272w, https://substackcdn.com/image/fetch/$s_!nIbN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a91a439-4902-421e-b90e-e67d6c721a0b_1600x747.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nIbN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a91a439-4902-421e-b90e-e67d6c721a0b_1600x747.png" width="1456" height="680" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5a91a439-4902-421e-b90e-e67d6c721a0b_1600x747.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:680,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nIbN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a91a439-4902-421e-b90e-e67d6c721a0b_1600x747.png 424w, https://substackcdn.com/image/fetch/$s_!nIbN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a91a439-4902-421e-b90e-e67d6c721a0b_1600x747.png 848w, https://substackcdn.com/image/fetch/$s_!nIbN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a91a439-4902-421e-b90e-e67d6c721a0b_1600x747.png 1272w, https://substackcdn.com/image/fetch/$s_!nIbN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a91a439-4902-421e-b90e-e67d6c721a0b_1600x747.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The solution to this problem is to rewrite old data to match the latest spec.</p><p>Once more however, <em>remember that Iceberg is only a format so cannot do the work by itself. &#128161;</em></p><p>It must rely on an external engine to perform the required data rewrite. 
But this is a resource intensive process that can take a long time (think hours) to perform.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6ucG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc636c4ae-aa42-4099-8926-ac125137d446_1600x1114.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6ucG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc636c4ae-aa42-4099-8926-ac125137d446_1600x1114.png 424w, https://substackcdn.com/image/fetch/$s_!6ucG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc636c4ae-aa42-4099-8926-ac125137d446_1600x1114.png 848w, https://substackcdn.com/image/fetch/$s_!6ucG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc636c4ae-aa42-4099-8926-ac125137d446_1600x1114.png 1272w, https://substackcdn.com/image/fetch/$s_!6ucG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc636c4ae-aa42-4099-8926-ac125137d446_1600x1114.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6ucG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc636c4ae-aa42-4099-8926-ac125137d446_1600x1114.png" width="1456" height="1014" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c636c4ae-aa42-4099-8926-ac125137d446_1600x1114.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1014,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6ucG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc636c4ae-aa42-4099-8926-ac125137d446_1600x1114.png 424w, https://substackcdn.com/image/fetch/$s_!6ucG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc636c4ae-aa42-4099-8926-ac125137d446_1600x1114.png 848w, https://substackcdn.com/image/fetch/$s_!6ucG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc636c4ae-aa42-4099-8926-ac125137d446_1600x1114.png 1272w, https://substackcdn.com/image/fetch/$s_!6ucG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc636c4ae-aa42-4099-8926-ac125137d446_1600x1114.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Iceberg administrators must pay attention to partitioning. At first this appears simple but, as volumes increase and query patterns change, it can become complex, resource-intensive and unpredictable.</p><p><strong>Streambased to the Rescue</strong></p><p>Up until now we&#8217;ve laid out a small selection of the administrative issues that plague Iceberg tables. It&#8217;s these issues that prompted us at Streambased to create I.S.K. (Iceberg Service for Kafka), an altogether simpler approach to streaming data via Iceberg.</p><p>Streambased performs an end run around all of this by surfacing Iceberg tables as a <strong>logical projection</strong> of Kafka data rather than doing any physical movement of data.</p><p>In the Streambased approach the data that backs your Iceberg table remains in Kafka and ephemeral metadata is created on top of it. 
To clients, it appears as a normal Iceberg table but, when queries are executed, the data is read from Kafka, mapped and presented in the format clients are expecting.</p><p>No data movement ahead of query time means no small files, no snapshot expiration and no uncomfortable partitioning decisions. Instead, slice your data the way you want to at the point at which it is read. The need for Compaction is completely negated as the source system (Kafka) is responsible for the data layout, not Iceberg.</p><p>To maintain performance, Streambased indexes the data in Kafka as it is written or read for operational purposes. These indexes allow Streambased to mimic Iceberg features such as:</p><ul><li><p><strong>Partitioning:</strong> Using indexes, Streambased can target sections of Kafka data that are relevant to particular queries, creating a similar outcome to Iceberg&#8217;s traditional partitioning. The advantage of taking this approach is that indexing is not tied to the physical layout of files on disk and so requires no rewrite and no delays.</p></li><li><p><strong>Snapshots:</strong> Snapshots in Streambased are also related to indexing. Kafka data is already indexed by timestamp and other insert markers, so identifying the sections relevant to a particular time period is an easy task. 
This means Streambased can impose any number of snapshots on the data as required by the clients, and these imposed snapshots cannot and need not ever be expired.</p></li></ul><p>Streambased repositions the source system (Kafka) as the source of truth for both operational and analytical applications, removing the need for intermediate data stores and the expensive maintenance operations that come along with them.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EyLw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca3e94f7-ccd3-4933-9a5f-1aaad820f921_1600x1380.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EyLw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca3e94f7-ccd3-4933-9a5f-1aaad820f921_1600x1380.png 424w, https://substackcdn.com/image/fetch/$s_!EyLw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca3e94f7-ccd3-4933-9a5f-1aaad820f921_1600x1380.png 848w, https://substackcdn.com/image/fetch/$s_!EyLw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca3e94f7-ccd3-4933-9a5f-1aaad820f921_1600x1380.png 1272w, https://substackcdn.com/image/fetch/$s_!EyLw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca3e94f7-ccd3-4933-9a5f-1aaad820f921_1600x1380.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!EyLw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca3e94f7-ccd3-4933-9a5f-1aaad820f921_1600x1380.png" width="1456" height="1256" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ca3e94f7-ccd3-4933-9a5f-1aaad820f921_1600x1380.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1256,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EyLw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca3e94f7-ccd3-4933-9a5f-1aaad820f921_1600x1380.png 424w, https://substackcdn.com/image/fetch/$s_!EyLw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca3e94f7-ccd3-4933-9a5f-1aaad820f921_1600x1380.png 848w, https://substackcdn.com/image/fetch/$s_!EyLw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca3e94f7-ccd3-4933-9a5f-1aaad820f921_1600x1380.png 1272w, https://substackcdn.com/image/fetch/$s_!EyLw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca3e94f7-ccd3-4933-9a5f-1aaad820f921_1600x1380.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset 
pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Conclusion</strong></p><p>Ultimately, as organizations continue to embrace streaming architectures and real-time analytics in their lakehouse, the operational complexities of managing Iceberg in its traditional form become increasingly apparent.</p><p>Streambased offers a refreshing rethink, one that aligns better with the dynamism of modern data flows by abstracting away the burdens of compaction, snapshot expiration, and partition evolution.</p><p>With Streambased, not only is your Kafka data instantly accessible to any engine that supports Iceberg, but you also don&#8217;t have to deal with any of the complexities of managing it.</p>
]]></content:encoded></item><item><title><![CDATA[Coming soon]]></title><description><![CDATA[This is ZeroCopy.]]></description><link>https://blog.streambased.io/p/coming-soon</link><guid isPermaLink="false">https://blog.streambased.io/p/coming-soon</guid><dc:creator><![CDATA[Tom Scott]]></dc:creator><pubDate>Mon, 07 Jul 2025 15:18:31 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/a2d5acd6-6e4b-451c-9139-79ebf28ca693_1281x723.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This is ZeroCopy.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.streambased.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.streambased.io/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item></channel></rss>