Ofili Lewis
Windowing in Stream Processing
Windowing is a critical concept in stream processing, as it allows data to be processed in small, manageable chunks over a specified…
5 min read · Mar 17, 2023

Ofili Lewis
Taming the Spark Shuffle: Optimizing Shuffle Operations in PySpark
PySpark’s distributed nature empowers you to tackle massive datasets efficiently. However, shuffling data across executors can become a…
3 min read · 3 days ago

Ofili Lewis in Dev Genius
Optimizing PySpark Data: Partitioning vs. Bucketing
Partitioning and bucketing are two key techniques that can significantly enhance query performance and data management within PySpark…
4 min read · 4 days ago

Ofili Lewis
Crafting Compelling Connections: A Guide to Good API Design
An API’s design has a significant impact on its effectiveness and user experience.
3 min read · Feb 24, 2024

Ofili Lewis
Unlocking the Power of Knowledge Graphs
Representing, organizing, and querying complex data.
3 min read · Apr 20, 2023

Ofili Lewis
Modern Data Engineering Technologies to Learn in 2023
In today’s rapidly evolving world, staying current with modern data engineering technologies is essential for any data professional. In…
3 min read · Jan 27, 2023

Ofili Lewis
Data Normalization
Data normalization is the process of organizing data in a consistent and uniform format to ensure that it is accurate, reliable, and easy…
10 min read · Jan 26, 2023

Ofili Lewis
Data Visualization
Data visualization is the process of representing data in a graphical or pictorial format. It is a powerful tool that allows data…
3 min read · Jan 26, 2023

Ofili Lewis
Data Cleaning and Preprocessing
Data cleaning and preprocessing is an essential step in the data science process. It involves identifying and correcting any errors…
3 min read · Jan 26, 2023