12 months
100% Remote
Rate is DOE (depends on experience)
———————————
Responsibilities:
• Own the core company data pipeline and be responsible for scaling data processing to keep pace with rapid data growth
• Evolve the data model and schemas based on business and engineering needs
• Implement systems to track data quality and consistency
• Develop tools that support self-service data pipeline (ETL) management
• Tune SQL and Spark jobs to improve data processing performance
Experience:
• Strong experience with Python and Spark
• Experience with workflow management tools such as Airflow (preferred), Oozie, Azkaban, or UC4
• Experience with the Hadoop ecosystem or similar: Spark and Presto preferred; also YARN, HDFS, Hive, Pig, HBase, Parquet
• Solid understanding of SQL engines and the ability to perform advanced performance tuning
• Proficiency in at least one SQL dialect (Redshift and Oracle preferred)
• Comfortable working directly with data analytics teams to bridge business goals and data engineering