You will be responsible for:
- Collaborating with cross-functional teams to understand data requirements and designing efficient, scalable, and reliable ETL processes using Python and Databricks.
- Developing and deploying ETL jobs that extract data from various sources and transform it to meet business needs.
- Taking ownership of the end-to-end engineering lifecycle, including data extraction, cleansing, transformation, and loading, ensuring accuracy and consistency.
- Creating and managing data pipelines, ensuring proper error handling, monitoring, and performance optimization.
- Working in an agile environment, participating in sprint planning, daily stand-ups, and retrospectives.
- Conducting code reviews, providing constructive feedback, and enforcing coding standards to maintain high quality.
- Developing and maintaining tooling and automation scripts to streamline repetitive tasks.
- Implementing unit, integration, and other testing methodologies to ensure the reliability of the ETL processes.
- Utilizing REST APIs and other integration techniques to connect various data sources.
- Maintaining documentation, including data flow diagrams, technical specifications, and processes.
You Have
- Proficiency in Python programming, including experience in writing efficient and maintainable code.
- Hands-on experience with cloud services, especially Databricks, for building and managing scalable data pipelines
- Proficiency in working with Snowflake or similar cloud-based data warehousing solutions
- Solid understanding of ETL principles, data modelling, data warehousing concepts, and data integration best practices
- Familiarity with agile methodologies and the ability to work collaboratively in a fast-paced, dynamic environment.
- Experience with code versioning tools (e.g., Git)
- Meticulous attention to detail and a passion for problem-solving
- Knowledge of Linux operating systems
- Familiarity with REST APIs and integration techniques
You Might Also Have
- Familiarity with data visualization tools and libraries (e.g., Power BI)
- Background in database administration or performance tuning
- Familiarity with data orchestration tools, such as Apache Airflow
- Previous exposure to big data technologies (e.g., Hadoop, Spark) for large-scale data processing
- Experience with ServiceNow integration