Sony Pictures increasingly required near real-time data capabilities, but their existing system was unable to meet this need without an upgrade to a next-generation data pipeline.
Sony Pictures Entertainment was shifting toward a business model that was both data-driven and agile. However, their existing data management infrastructure, consisting mainly of custom scripts, scheduled jobs, and conventional ETL processes, was proving inadequate. To align with this new direction, they required an upgraded data architecture: one that was scalable, adaptable, and able to evolve with the company's needs over time. Sony Pictures also aimed to improve their content recommendation capabilities, which called for a pipeline capable of handling raw usage data for training machine learning models.
Sony Pictures Entertainment's existing data management platform for their streaming services, reliant on a slow ETL process, was proving insufficient. Hosted across multiple SQL servers on Azure in the United States, it depended on jobs scheduled to extract data every 24 hours. The system's data transformation logic, determined years ago by business analysts, resulted in the neglect of potentially valuable raw data.
The need was clear: Sony Pictures required a more dynamic solution that could swiftly capture raw data, allowing for versatile transformation and reporting. This new system needed the flexibility to adapt to future changes, accommodating both the persistence of current data sources and the integration of new ones, without prior knowledge of which would be relevant moving forward.
I developed a solution using Azure's modern data tools and a lakehouse architecture, designed to stream data from diverse sources. The system used Azure Data Factory to connect and manage data sources, and Azure Data Lake to establish the lakehouse framework. Tools such as Power BI, Excel, and Tableau were integrated for use by various downstream organizations.
To ensure data relevance and timeliness, we implemented threshold-based extraction routines tailored to the specific needs of each data source. For instance, we developed a heartbeat microservice that utilized Azure Queues to store and process events from video players. Using Azure Data Factory, we subscribed to this queue, extracting the raw JSON data from each message and subsequently storing it in the data lake.
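The queue-to-lake step described above can be sketched roughly as follows. This is a minimal illustration rather than the production pipeline: it uses a local directory in place of Azure Data Lake and plain strings in place of Azure Queue messages. The class name `ThresholdBatcher`, the `raw/<source>/YYYY/MM/DD` path layout, and the threshold values are all assumptions made for the example.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path
from typing import List, Optional


@dataclass
class ThresholdBatcher:
    """Buffers raw JSON messages and writes them out once a threshold is hit.

    Hypothetical helper: a batch is flushed when it reaches `max_messages`
    or when its oldest message is at least `max_age_seconds` old.
    """

    lake_root: Path               # local directory standing in for the lake
    source: str                   # e.g. "heartbeat"
    max_messages: int = 500       # assumed count threshold
    max_age_seconds: float = 60   # assumed age threshold
    _buffer: List[str] = field(default_factory=list, init=False)
    _opened_at: Optional[datetime] = field(default=None, init=False)
    _flushes: int = field(default=0, init=False)

    def add(self, raw_message: str, now: datetime) -> Optional[Path]:
        """Buffer one message; return the written file path if a flush happened."""
        # Parse and re-serialize so each record occupies exactly one line.
        # The payload itself is kept untransformed.
        record = json.dumps(json.loads(raw_message), separators=(",", ":"))
        if not self._buffer:
            self._opened_at = now
        self._buffer.append(record)
        age = (now - self._opened_at).total_seconds()
        if len(self._buffer) >= self.max_messages or age >= self.max_age_seconds:
            return self.flush(now)
        return None

    def flush(self, now: datetime) -> Optional[Path]:
        """Write the buffered records to a date-partitioned JSON Lines file."""
        if not self._buffer:
            return None
        partition = self.lake_root / "raw" / self.source / now.strftime("%Y/%m/%d")
        partition.mkdir(parents=True, exist_ok=True)
        self._flushes += 1
        name = f"batch-{now.strftime('%H%M%S')}-{self._flushes:04d}.jsonl"
        path = partition / name
        path.write_text("\n".join(self._buffer) + "\n", encoding="utf-8")
        self._buffer.clear()
        return path
```

Keeping the payload untransformed at this stage leaves later transformation and reporting choices open, which matches the goal of capturing raw data first.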
Additionally, we collaborated with Tealium, a client-side data aggregator, to facilitate integration with Sony's front-end applications. Tealium served as a unified platform for client-side developers to capture event data from client applications, which was then relayed to downstream partners like advertising engines and analytics services, including Google Analytics. With Tealium maintaining a central repository of all usage data collected through these application events, we set up regular extracts to transfer this data into our lake, ensuring a comprehensive and up-to-date data repository.
We also developed a suite of backend reporting tools tailored to various stakeholders. This system offered preconfigured report views that let users pivot and analyze data by specific attributes, such as time or geography. For more comprehensive and detailed reporting, we used Tableau and Power BI. We also embedded select Power BI report views directly in the reporting user interface, making them easier for users to reach.
A significant challenge was that certain microservices were still under development while we were trying to integrate them for data extraction. This created a 'chicken-and-egg' dilemma in which our integration work often outpaced the infrastructure it depended on. The architecture of our solution did give us the flexibility to build adapters proactively, but we could not connect and test those adapters until the corresponding source services were fully operational. We navigated this with patience and adaptability, accepting that such hurdles are part of working in a dynamic, evolving technological landscape.
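To illustrate the kind of adapter boundary described above, here is a minimal sketch in which the extraction step depends only on a small interface, so an adapter can be written before its source service exists. The names `SourceAdapter`, `StubSource`, and `extract` are hypothetical, and the stub is an assumption for illustration: the original project does not describe using one, and canned data is no substitute for testing against the live service.

```python
from typing import Dict, Iterable, List, Protocol


class SourceAdapter(Protocol):
    """Anything that can name itself and yield raw event records."""

    name: str

    def fetch(self) -> Iterable[Dict]:
        ...


class StubSource:
    """Stand-in for a service that is not yet deployed; replays canned events."""

    def __init__(self, name: str, canned_events: Iterable[Dict]) -> None:
        self.name = name
        self._events = list(canned_events)

    def fetch(self) -> Iterable[Dict]:
        return iter(self._events)


def extract(adapter: SourceAdapter) -> List[Dict]:
    """Tag each raw record with its source; the payload is left untouched."""
    return [{"source": adapter.name, "payload": event} for event in adapter.fetch()]
```

A real adapter would implement the same `fetch` method against the live service, so swapping it in would not require changes to `extract`.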