
Content provided by Confluent, founded by the original creators of Apache Kafka®. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Confluent, founded by the original creators of Apache Kafka®, or its podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://ja.player.fm/legal

How to use Data Contracts for Long-Term Schema Management

57:28
 

Have you ever struggled with managing data long term, especially as the schema changes over time? In order to manage and leverage data across an organization, it’s essential to have well-defined guidelines and standards in place around data quality, enforcement, and data transfer. To get started, Abraham Leal (Customer Success Technical Architect, Confluent) suggests that organizations associate their Apache Kafka® data with a data contract (schema). A data contract is an agreement between a service provider and data consumers. It defines the management and intended usage of data within an organization. In this episode, Abraham talks to Kris about how to use data contracts and schema enforcement to ensure long-term data management.
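As a rough illustration of what such a contract can look like, the sketch below writes a schema in Avro's JSON schema syntax and checks records against it with a small hand-rolled function. The `Order` record, its fields, and the `conforms` helper are all hypothetical, made up for this example; a real deployment would register the schema in a schema registry and rely on an Avro library rather than this check.

```python
# Hypothetical data contract written in Avro's JSON schema syntax.
ORDER_SCHEMA = {
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "order_id", "type": "long"},
        {"name": "customer", "type": "string"},
        {"name": "quantity", "type": "int"},
    ],
}

# Maps the Avro primitive type names used above to Python types.
PYTHON_TYPES = {"long": int, "int": int, "string": str}


def conforms(record: dict, schema: dict) -> bool:
    """Return True if the record has exactly the schema's fields with matching types."""
    fields = {f["name"]: f["type"] for f in schema["fields"]}
    if set(record) != set(fields):
        return False
    return all(isinstance(record[name], PYTHON_TYPES[t]) for name, t in fields.items())


good = {"order_id": 1, "customer": "acme", "quantity": 3}
bad = {"order_id": "1", "customer": "acme"}  # wrong type and a missing field

print(conforms(good, ORDER_SCHEMA))
print(conforms(bad, ORDER_SCHEMA))
```

The point of the sketch is only that producers and consumers agree on field names and types before any data is written, so a non-conforming record can be rejected at the edge instead of surfacing later as a downstream failure.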
When an organization sends and stores critical data in Kafka, it usually wants to use that data in several ways across multiple business units. Kafka is well suited to this use case, but problems can surface later if governance rules aren’t established up front.
With schema registry, schema evolution is easy due to its robust security guarantees. When managing data pipelines, you can also use GitOps automation features for an extra control layer. Together, these let you be creative with topic versioning, upcasting and downcasting the collected data, and adding quality assurance steps at the end of each run so that your project remains reliable.
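To make the upcasting idea concrete, here is a minimal sketch, assuming upcasting means converting a record written under an older schema version into the shape of a newer one when it is read, and downcasting means the reverse. The `version` field, the `currency` field, and its default value are hypothetical; the episode description does not prescribe a particular implementation.

```python
def upcast_v1_to_v2(event: dict) -> dict:
    """Convert a hypothetical v1 payment event into the v2 shape."""
    upgraded = dict(event)  # copy so the stored record is not mutated
    upgraded["version"] = 2
    # v2 adds a field; older records get an assumed default.
    upgraded.setdefault("currency", "USD")
    return upgraded


def downcast_v2_to_v1(event: dict) -> dict:
    """Drop the v2-only field so a v1 consumer can read the event."""
    downgraded = {k: v for k, v in event.items() if k != "currency"}
    downgraded["version"] = 1
    return downgraded


def read_as_v2(event: dict) -> dict:
    """Give consumers a v2 view regardless of which version was written."""
    return upcast_v1_to_v2(event) if event.get("version", 1) == 1 else event


old_event = {"version": 1, "payment_id": 42, "amount": 10.5}
new_event = read_as_v2(old_event)
print(new_event)
```

Keeping the conversion in one function means consumers only ever handle the newest shape, while old records stay untouched in the topic.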
Abraham explains that Protobuf and Avro are better choices than XML or JSON because they are built to handle schema evolution. They also have much lower per-record overhead, so adopting them can save bandwidth and data storage costs.
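The per-record overhead is easy to see in a small sketch. JSON repeats every field name in every record, whereas a schema-based binary format can write only the values because the schema is agreed on separately. The example below uses Python's `struct` module as a stand-in for such an encoding; it is not the actual Avro or Protobuf wire format, and the record and its fields are hypothetical.

```python
import json
import struct

# Hypothetical record; field names and values are illustrative only.
record = {"order_id": 1234567, "quantity": 3, "price": 19.99}

# JSON carries the field names inside every single record.
json_bytes = json.dumps(record).encode("utf-8")

# With the layout agreed out of band, only the values are written.
# "<qid" = little-endian int64, int32, float64, with no padding.
binary_bytes = struct.pack(
    "<qid", record["order_id"], record["quantity"], record["price"]
)

print(len(json_bytes), len(binary_bytes))

# The reader needs the same layout (the "schema") to decode the values.
order_id, quantity, price = struct.unpack("<qid", binary_bytes)
```

Multiplied across millions of records, the bytes saved by not repeating field names are where the bandwidth and storage savings come from.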
There’s so much more to consider, but if you are thinking about implementing or integrating with your data quality team, Abraham suggests that you use schema registry heavily from the beginning.
If you have more questions, Kris invites you to join the conversation. You can also watch the KOR Financial Current talk Abraham mentions or take Danica Fine’s free course on how to use schema registry on Confluent Developer.

1. Intro (00:00:00)

2. What is a data contract? (00:02:02)

3. What are the problems with using JSON Blobs? (00:03:27)

4. What are the advantages of using Avro and Protobuf formats? (00:08:38)

5. What are schema references? (00:17:00)

6. What support is available for changing the data format? (00:19:38)

7. What are forwards, backwards, and full compatibility? (00:22:33)

8. What should you do if you have two different formats? (00:31:17)

9. What are the tradeoffs of doing topic versioning? (00:35:28)

10. What are upcasters and downcasters? (00:43:13)

11. Are there any recommended tools for making data discoverability easier? (00:52:45)

12. It's a wrap! (00:56:02)

265 episodes
