03 Apr | Zorba Consulting India | Pune
Apply on Kit Job: kitjob.in/job/45maga
Job Summary
We are looking for experienced Data Engineers with strong expertise in PySpark and Cloudera Data Platform to design, develop, and optimize scalable data pipelines. The ideal candidate should have hands-on experience with distributed data systems, cloud platforms (AWS), and modern data architecture, along with a strong understanding of data governance and cataloging tools.
Key Responsibilities
- Design, build, and maintain scalable batch and real-time data pipelines using PySpark
- Work with Cloudera Data Platform (CDP) components such as Cloudera Data Engineering (CDE), Cloudera Data Warehouse (CDW), Ozone, and Airflow
- Manage and optimize data workflows, ensuring high performance and reliability
- Implement data governance, security, and access control using Apache Ranger
- Develop and maintain data models, Hive Metastore, and large-scale distributed datasets
- Collaborate with cross-functional teams to deliver data solutions for analytics and reporting
- Work with AWS services such as EMR, S3, MWAA, Glue Catalog, and Lake Formation
- Ensure proper data partitioning, bucketing, and optimization using table and file formats such as Iceberg and Parquet
- Integrate data cataloging and lineage using Atlan
Required Skills & Qualifications
- 6+ years of experience in Data Engineering
- Solid hands-on experience with PySpark
- Deep understanding of modern data platforms and distributed data systems
- Experience with Cloudera Data Platform (CDP) ecosystem
- Proficiency in SQL and data modeling concepts
- Experience with AWS data services (EMR, S3, MWAA, Glue, Lake Formation)
- Strong knowledge of Hive Metastore and big data architectures
- Experience with table and file formats (Iceberg, Parquet) and related optimization techniques
- Familiarity with data governance, cataloging, and lineage tools (Atlan)
📌 Data Engineer (PySpark Cloudera) (Pune)