Data Engineering


A community for discussion about data engineering

Icon base by Delapouite under CC BY 3.0 with modifications to add a gradient

founded 1 year ago

cross-posted from: https://programming.dev/post/2656516

What are your real-world applications of this versatile data structure?

They are useful for optimization in databases like SQLite and query engines like Apache Spark. Application developers can also use them as a compact representation of per-user data, for example to filter out items a user has already seen.

The linked site gives a short introduction to bloom filters along with some links to further reading:

A Bloom filter is a data structure designed to tell you, rapidly and memory-efficiently, whether an element is present in a set. The price paid for this efficiency is that a Bloom filter is a probabilistic data structure: it tells us that the element either definitely is not in the set or may be in the set.
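For anyone who wants to see the idea in code, here is a minimal Python sketch of the "filter previously seen items" use case. It is not taken from the linked site: the `BloomFilter` class, the `might_contain` method name, and the choice of SHA-256 with double hashing are all assumptions made for illustration. The sizing uses the standard formulas for bit count and hash count given an expected item count and target false-positive rate.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter: a bit array plus k hash positions per item."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        n, p = expected_items, false_positive_rate
        self.num_bits = max(1, math.ceil(-n * math.log(p) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Assumption: derive k positions from one SHA-256 digest via double
        # hashing, position_i = (h1 + i * h2) mod m.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force h2 odd so it is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit position derived from the item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely not in the set"; True means "may be in the set".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


# Hypothetical usage: remember which post IDs a user has already seen.
seen = BloomFilter(expected_items=1000, false_positive_rate=0.01)
seen.add("post-2656516")
print(seen.might_contain("post-2656516"))  # an added item is always reported as possibly present
```

The trade-off is visible in `might_contain`: an item that was added always comes back as possibly present, while an item that was never added is usually, but not always, rejected. In practice you would confirm a positive against the real data store when a false positive matters.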


Anyone else using Hollow (written by Netflix) in production?

Hollow is a Java library and toolset for disseminating in-memory datasets from a single producer to many consumers for high-performance read-only access.


This article helped define the “data engineer” role, so I’d say it belongs here!

Although some time has passed, I still find it very relevant: SQL is used more than ever, graphical ETL tools that don’t output code are rare, and vendors are still trying to convince executives to entrust all their data to proprietary data warehouses.

The author, Maxime Beauchemin, also wrote Airflow and Superset, so they have some experience worth listening to.


Is it starting to pick back up? I know it's been rough for a few people these past few months. Are salaries dropping pretty heavily?
