29 Mar | Idexcel | Nellore
Apply on Kit Job: kitjob.in/job/44k7i4
Job Description for Senior Data Engineer
Experience: 4 to 8 years
Required Skills: AWS, Python, PySpark, Databricks
Notice Period: Immediate to 15 days
Databricks (Spark)
· Develop scalable ETL/ELT pipelines using PySpark (RDD/DataFrame APIs), Delta Lake, Auto Loader (cloudFiles), and Structured Streaming; a minimal Auto Loader sketch follows this list.
· Optimize jobs: partitioning, bucketing, Z-Ordering, OPTIMIZE + VACUUM, broadcast joins, AQE, checkpointing.
· Manage Unity Catalog: catalogs/schemas/tables, data lineage, permissions, secrets, tokens, and cluster policies.
· CI/CD for Databricks assets: notebooks, Jobs, Repos, MLflow artifacts.
· Build a Medallion Architecture (Bronze/Silver/Gold) with Delta Live Tables (DLT) and expectations for data quality.
· Event-driven ingestion: Kafka/Kinesis → Databricks Structured Streaming.
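For illustration, a minimal sketch of the Auto Loader (cloudFiles) ingestion pattern named above, streaming raw files into a Bronze Delta table on a Databricks runtime. The bucket, schema/checkpoint paths, and table name are hypothetical placeholders, not part of the posting.

```python
# Minimal Auto Loader sketch: incrementally ingest raw JSON files from S3
# into a Bronze Delta table. All paths and table names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # provided by the Databricks runtime

bronze_stream = (
    spark.readStream
    .format("cloudFiles")                       # Auto Loader source
    .option("cloudFiles.format", "json")        # raw file format
    .option("cloudFiles.schemaLocation", "s3://example-bucket/_schemas/orders")
    .load("s3://example-bucket/raw/orders/")
)

(
    bronze_stream.writeStream
    .format("delta")
    .option("checkpointLocation", "s3://example-bucket/_checkpoints/orders_bronze")
    .trigger(availableNow=True)                 # process pending files, then stop
    .toTable("main.bronze.orders")              # Unity Catalog three-level name
)
```

`trigger(availableNow=True)` drains all pending files and then stops, which suits scheduled batch-style runs of the same streaming code.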
Snowflake (DW & ELT)
· Model and implement star/snowflake schemas, data marts, and secure views.
· Performance tuning: clustering keys, micro-partitions, result caching, warehouse sizing, query profile analysis.
· Implement Task/Stream patterns for CDC; external tables over data lakes (S3); Snowpipe for near-real-time ingestion.
· Python/Snowpark for transformations and UDFs; SQL best practices (CTEs, window functions); a minimal Snowpark sketch follows this list.
· Security: Row Level Security (RLS), Column Masking, OAuth/SCIM, network policies, data sharing (reader accounts).
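As a sketch of the Snowpark (Python) transformation work described above, the following deduplicates a staging table by keeping the latest row per key, a common step in CDC-style ELT. All connection parameters, table names, and columns are hypothetical placeholders.

```python
# Minimal Snowpark sketch: windowed dedup from a staging table into a
# curated table. Credentials and object names below are placeholders.
from snowflake.snowpark import Session, Window
from snowflake.snowpark.functions import col, row_number

session = Session.builder.configs({
    "account": "example_account",   # placeholder connection settings
    "user": "example_user",
    "password": "***",
    "warehouse": "TRANSFORM_WH",
    "database": "ANALYTICS",
    "schema": "STAGING",
}).create()

# Keep only the most recent record per order_id.
w = Window.partition_by("order_id").order_by(col("updated_at").desc())
latest = (
    session.table("STAGING.ORDERS_RAW")
    .with_column("rn", row_number().over(w))
    .filter(col("rn") == 1)
    .drop("rn")
)
latest.write.save_as_table("ANALYTICS.CURATED.ORDERS", mode="overwrite")
```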
AWS Data Engineering
· Storage & compute: S3 (lifecycle, encryption, partitioning), EMR (if needed), Lambda, Glue (ETL/Schema Registry), Athena, Kinesis (Data Streams/Firehose), RDS/Aurora, Step Functions.
· Orchestration: MWAA/Airflow or Step Functions (error handling, retries, backfills, SLA alerts); a minimal Airflow sketch follows this list.
· Infra-as-code: Terraform/CloudFormation for reproducible environments (Databricks workspace, IAM, S3, networking).
· Security/compliance: IAM least privilege, KMS, VPC endpoints/PrivateLink, Secrets Manager, CloudTrail/CloudWatch, GuardDuty.
· Observability: CloudWatch metrics/logs, structured logging, Datadog/Prometheus (optional), cost monitoring (tags/budgets).
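A minimal Airflow 2.x sketch of the orchestration pattern above: a daily DAG with retries and an SLA that waits for a raw file in S3 before running a transform. The bucket, key pattern, and callable are hypothetical placeholders.

```python
# Minimal Airflow 2.x DAG sketch: S3 sensor -> transform, with retries and
# an SLA alert threshold. Bucket, key, and callable are placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

def run_transform(**context):
    # Placeholder for the real job trigger (e.g., a Databricks Jobs API call).
    print("transform started for", context["ds"])

with DAG(
    dag_id="orders_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={
        "retries": 3,                        # automatic retry on failure
        "retry_delay": timedelta(minutes=5),
        "sla": timedelta(hours=2),           # SLA alert threshold
    },
) as dag:
    wait_for_file = S3KeySensor(
        task_id="wait_for_raw_file",
        bucket_name="example-bucket",
        bucket_key="raw/orders/{{ ds }}/*.json",
        wildcard_match=True,
    )
    transform = PythonOperator(task_id="transform", python_callable=run_transform)

    wait_for_file >> transform
```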
Data Quality, Governance & Security
· Implement unit/integration tests for pipelines (e.g., pytest + Great Expectations + DLT expectations); a minimal pytest sketch follows this list.
· Data contracts and schema evolution; monitor SLAs/SLOs; DQ dashboards (missingness, drift, freshness, completeness).
· PII handling: tokenization/pseudonymization, field-level encryption, adherence to KYB/KYC data-flow requirements; audit trails.
· Cataloging & lineage through Unity Catalog and/or OpenLineage/Purview (if applicable).
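As one way to unit-test a pipeline transformation with pytest, the sketch below checks that a dedup function keeps only the newest row per key; `dedup_latest` is a hypothetical stand-in for a real pipeline function.

```python
# Minimal pytest sketch: verify a (hypothetical) dedup transformation keeps
# only the latest row per key.
import pandas as pd

def dedup_latest(df: pd.DataFrame, key: str, ts: str) -> pd.DataFrame:
    """Keep the most recent row per key (illustrative transformation)."""
    return (
        df.sort_values(ts)
          .drop_duplicates(subset=[key], keep="last")
          .reset_index(drop=True)
    )

def test_dedup_latest_keeps_newest_row():
    raw = pd.DataFrame({
        "order_id": [1, 1, 2],
        "status": ["created", "shipped", "created"],
        "updated_at": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-01"]),
    })
    out = dedup_latest(raw, key="order_id", ts="updated_at")
    assert len(out) == 2
    assert out.loc[out.order_id == 1, "status"].item() == "shipped"
```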
DevOps & CI/CD
· Git workflows (branching, PR reviews), Databricks CLI/Terraform modules for jobs/clusters/UC, Snowflake DevOps (object versioning via schemachange or SQL-based migrations).
· Automated testing in pipelines; feature flags and canary releases for data jobs; rollback strategies.
Client-Facing PoCs & Delivery
· Rapid PoC builds: clearly defined success metrics, cost/performance benchmarks, and a transition plan to production.
· Present architectural decisions, trade-offs (Spark vs. Snowflake ELT), and cost projections (Databricks DBUs, Snowflake credits, storage egress).
· Produce runbooks, operational playbooks, and knowledge-transfer documents for client teams.
Required Technical Skillset
· Databricks: PySpark, Delta Lake, Auto Loader, DLT, Jobs, Unity Catalog, MLflow basics.
· Snowflake: SQL, Snowpipe, Tasks/Streams, Snowpark (Python), warehouse sizing, performance tuning, security policies.
· Python: strong in DE packages (pandas, pyarrow, pytest), robust error handling, typing, and packaging.
· Orchestration: Airflow DAGs (Sensors, Operators, XCom), Step Functions state machines.
· Streaming & CDC: Kafka/Kinesis, Debezium (nice-to-have), CDC patterns into Delta/Snowflake.
· AWS: S3, Glue, Lambda, Kinesis, IAM/KMS, VPC, CloudWatch; Terraform/CloudFormation.
· Data Modeling: 3NF/dimensional, slowly changing dimensions (SCD Type 2), surrogate vs. natural key trade-offs; a minimal SCD Type 2 sketch follows this list.
· Security & Compliance: encryption at rest/in transit, tokenization, key rotation, audit logging, governance controls.
· Performance & Cost: Spark job tuning, Snowflake warehouse right-sizing, partitioning/clustering, object storage best practices.
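To make the SCD Type 2 point concrete, here is a minimal Delta Lake MERGE sketch that expires the current dimension row when a tracked attribute changes and appends the new version. Table and column names (dim_customer, customer_changes, email, change_ts) are hypothetical placeholders.

```python
# Minimal SCD Type 2 sketch on Delta Lake: expire superseded current rows,
# then append new versions. All table/column names are placeholders.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()  # Delta-enabled runtime assumed

dim = DeltaTable.forName(spark, "main.gold.dim_customer")
current = spark.table("main.gold.dim_customer").where("is_current = true")
updates = spark.table("main.silver.customer_changes")

# New keys, or rows whose tracked attribute changed since the current version.
changed = (
    updates.alias("s")
    .join(current.alias("t"),
          F.col("s.customer_id") == F.col("t.customer_id"), "left")
    .where("t.customer_id IS NULL OR t.email <> s.email")
    .select("s.*")
)

# Step 1: close out the superseded current rows.
(
    dim.alias("t")
    .merge(changed.alias("s"),
           "t.customer_id = s.customer_id AND t.is_current = true")
    .whenMatchedUpdate(set={"is_current": "false", "valid_to": "s.change_ts"})
    .execute()
)

# Step 2: append the new versions as open (current) rows.
(
    changed.select(
        "customer_id",
        "email",
        F.col("change_ts").alias("valid_from"),
        F.lit(None).cast("timestamp").alias("valid_to"),
        F.lit(True).alias("is_current"),
    )
    .write.format("delta").mode("append").saveAsTable("main.gold.dim_customer")
)
```

Production pipelines often fold both steps into a single MERGE over a staged union of inserts and updates; the two-step form above keeps the mechanics visible.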
Nice-to-Have:
· dbt (Snowflake) with tests & exposures; Great Expectations.
· Databricks SQL Warehouses and BI connectivity; Photon engine awareness.
· Lakehouse Federation (UC external locations); Delta Sharing; Iceberg experience.
· Kafka Connect/Debezium, NiFi, or MuleSoft (for data integrations).
· Experience in financial services.
· Exposure to ISO/IEC 27001 controls in data platforms.
Education & Certifications
· Bachelor’s/Master’s in CS/IT/EE or related.
· Certifications (a plus): Databricks Data Engineer Associate/Professional, Snowflake SnowPro Core/Advanced, AWS Solutions Architect/Big Data/DP.