As part of the Corporate Decision Sciences organization at NBCUniversal, the Decision Sciences Engineering team helps drive advanced analytics by building and integrating pragmatic data science principles and products throughout the Company. The Team focuses primarily on decision sciences analyses that support and inform strategy and key business decisions, and on building advanced data products that enable more data-enhanced decision support. The Lead Data Engineer will be directly responsible for data/ML pipelines, data management, code development, operationalizing machine learning models, and scaling out platforms and products.

Job Duties

-Partner with various NBCU Technology teams in the design and execution of an overall Corporate Data Syndication Strategy for Nielsen and Alternative Measurement Data
-Process structured and unstructured data into a form suitable for analysis and reporting, empowering state-of-the-art analytics and machine learning environments for business analysts, data scientists and engineers
-Apply expert software development skills to a wide range of ML-related coding projects
-Operationalize data science models and products in a cluster-computing environment
-Evangelize a very high standard of quality, reliability and performance for data models and algorithms that can be streamlined into the engineering and machine learning workflows
-Adapt standard machine learning methods to best exploit modern parallel environments (e.g. distributed clusters)
-Build data pipelines to automate high-volume and real-time data delivery
-Work directly with Product Owners and clients to deliver data products in a collaborative and agile environment
-Work with data scientists to understand their processes and support them in developing and building features (feature engineering) for their models
-Grasp new technologies rapidly as needed to progress varied initiatives
-Consult and collaborate on projects involving interdisciplinary team members

Experience / Skills


-Minimum 5 years of experience with a programming language such as R, Python or Java, and experience writing reusable and efficient code to automate analyses and data processes
-Minimum 5 years of experience processing large amounts of structured and unstructured data in a cluster-computing environment or similar experience in academia
-Experience forming well-reasoned opinions on how to construct data processing systems, and good knowledge of the principles of distributed systems at scale, using tools such as Apache Spark (Databricks), Airflow, Docker and Redis
-Experience with AWS, Azure and other cloud technologies, including AWS services such as Athena, Glue, S3, Lambda, Elastic Beanstalk, API Gateway, ECR, ECS and SageMaker
-Experience building and maintaining production data pipelines

Desired Characteristics

-Experience with open source and Enterprise software
-Familiarity with relational databases, SQL, NoSQL and Graph databases
-Team-oriented and collaborative approach with a demonstrated aptitude and willingness to learn new methods and tools
-Ability to communicate insights and findings through data visualization tools such as Tableau, DOMO, Shiny
-Experience with machine learning software packages (e.g., scikit-learn, TensorFlow, Caffe, Theano, Torch)
-Experience in the media and entertainment industry a plus
-Experience with television ratings and digital measurement tools (Nielsen, Rentrak, comScore, Omniture, etc.)
-Experience with large-scale video assets
-Experience with computer vision and metadata generation from video
-Master’s Degree with a specialization in Computer Science, Engineering, Physics or other quantitative field, or equivalent