Schema evolution: although not recommended in a production environment, this feature allows to change the schema over time.Schema is enforced to prevent bad data and help enforcing data quality.Delta tables can be filles using both streaming and batch processing.Parquet file format, an open-source format, is leveraged for its compression and encoding schemes.It helps when we need to revert a specific version of the data or audit it. Time travel: data is snapshotted, and every change made to it can be viewed and used by your pipelines.ACID transactions: Ensure data integrity.Delta Sharing: enable data sharing from Delta tables with other systems without having to copy it before.It brings the following benefits to a data lake: Databricks introduced the Delta Lake project and open sourced it in early 2019. Major vendors support data lakehouse features such as Databricks, and Microsoft Synapse Analytics. Log files from IoT systems, Json and XML, web pages, etc.ĭatabase, Excel spreadsheets, delimited files, etc. ![]() ![]() Media filles like music or videos.ĭata have some structured element but without formal or strict organization of them. Text documents like PDF files, word processing and image files.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |