Nice analysis.
There is just one addition from StreamNative.
There is an option to use Lakehouse Tiered Storage [1]. It works with both the Pulsar API and the Kafka API, so it can cover Transactions and Compaction with Kafka.
BTW compaction is possible with URSA clusters. [2]
[1] - https://streamnative.io/blog/ursa-everywhere-lakehouse-native-future-data-streaming#ursa-storage-extension-for-classic-pulsar-lakehouse-for-all-deployments
[2] - https://docs.streamnative.io/cloud/build/kafka-clients/compatibility/kafka-protocol-and-features#supported-topic-configs
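Since URSA speaks the standard Kafka protocol, compaction would be enabled the usual Kafka way, through topic-level configs. A minimal sketch using the stock Kafka CLI tools (the bootstrap server address and topic name are hypothetical placeholders):

```shell
# Create a compacted topic against a Kafka-protocol endpoint
# (broker address and topic name below are placeholders)
kafka-topics.sh --bootstrap-server broker.example.com:9092 \
  --create --topic user-profiles \
  --config cleanup.policy=compact

# Or switch an existing topic over to compaction
kafka-configs.sh --bootstrap-server broker.example.com:9092 \
  --alter --entity-type topics --entity-name user-profiles \
  --add-config cleanup.policy=compact
```

Whether a given managed endpoint honors `cleanup.policy=compact` is exactly what the supported-topic-configs page in [2] documents.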
What about Apache Flink and Apache Spark, which have direct support for Apache Iceberg?
This is a really comprehensive breakdown, thanks for putting this together. The zero-copy vs. copy trade-offs you laid out really clarify why there isn't a one-size-fits-all solution here. I'm particularly intrigued by the Streambased approach of doing the conversion at query time rather than asynchronously; that seems like it could be a game changer for use cases where data freshness is critical. The comparison table at the end is super useful too, it makes it easy to weigh the options based on your specific constraints. Great stuff!