We are looking for a Python & PySpark Data Developer to join our partner team in a hybrid setup (2-3 days per week on-site), bringing strong data engineering expertise and hands-on experience in building and maintaining ETL pipelines.
What You Will Do:
- Identify and select relevant data sources to be ingested into the data lake based on business and analytical requirements.
- Design and manage the structure and organization of data within the data lake to ensure accessibility, scalability, and optimal performance.
- Develop and maintain ETL pipelines using PySpark for data cleaning, transformation, and integration from multiple sources.
- Configure ETL components such as data formatting, deduplication, volumetric analysis, and enrichment, ensuring all processes are thoroughly documented.
- Contribute to defining and designing new use cases by identifying relevant data, transforming it, and preparing it for analysis.
- Develop and maintain interactive dashboards that visualize key metrics and insights derived from processed data.
- Monitor and troubleshoot data workflows to maintain reliability, scalability, and accuracy in production environments.
- Document pipeline logic, data sources, transformation rules, and operational flows for transparency and maintainability.
Technical Skills:
- Proficiency in Python and PySpark for data processing and pipeline development.
- Strong SQL skills and experience working with relational databases.
- Familiarity with data warehousing platforms (e.g., Cloudera).
- Understanding of data modeling, data lakes, and data governance principles.
- Experience with dashboarding tools (Power BI, Tableau, etc.) is a plus.
Preferred Qualifications:
- Bachelor's or Master's degree in Computer Science, Data Science, or a related field.
- Fluent English is required; French is a plus but not mandatory.
- Knowledge of business intelligence and data storytelling principles.