24 Mar | PropStream | Nellore
Apply on Kit Job: kitjob.in/job/42t6xj
Role Overview
We are looking for a hands-on, senior Databricks Architect to design, build, and govern our Lakehouse data platform from the ground up. You will own the end-to-end architecture of our data infrastructure — from raw ingestion through the Medallion layers to serving — and establish the engineering standards that will guide the entire data organization.
This is a highly strategic and technical role focused on driving adoption of Databricks, Unity Catalog, and up-to-date Lakehouse patterns across all data products and pipelines.
Key Responsibilities
Lakehouse Architecture & Design
- Design and implement a production-grade Medallion Architecture (Bronze / Silver / Gold) across all data pipelines.
- Establish best practices for Delta Lake table design, partitioning strategies, Z-ordering, and optimization across large-scale datasets.
- Define data modeling standards and schema evolution policies across the Lakehouse.
- Architect end-to-end data flows from ingestion (streaming and batch) through transformation and serving layers.
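The Medallion and Delta Lake practices above can be sketched in Databricks SQL. Catalog, schema, table, and column names here (`lakehouse`, `bronze`, `silver`, `events`) are illustrative placeholders, not part of the posting:

```sql
-- Bronze: raw ingested events, append-only.
CREATE TABLE IF NOT EXISTS lakehouse.bronze.events_raw (
  event_id    STRING,
  payload     STRING,
  ingested_at TIMESTAMP
) USING DELTA;

-- Silver: cleaned, deduplicated records, partitioned for pruning.
CREATE TABLE IF NOT EXISTS lakehouse.silver.events (
  event_id   STRING,
  event_type STRING,
  event_ts   TIMESTAMP
) USING DELTA
PARTITIONED BY (event_date DATE);

-- Periodic maintenance on large tables: compaction plus Z-order data skipping,
-- then removal of stale files beyond the 7-day time-travel retention window.
OPTIMIZE lakehouse.silver.events ZORDER BY (event_id);
VACUUM lakehouse.silver.events RETAIN 168 HOURS;
```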
Unity Catalog & Data Governance
- Lead the setup, configuration, and rollout of Unity Catalog as the centralized governance layer for all data assets.
- Design metastore hierarchy, catalog/schema/table organization, and tagging standards.
- Implement fine-grained access control (row-level, column-level), data masking policies, and audit logging.
- Establish data lineage tracking and ensure end-to-end visibility across all pipelines.
- Define and enforce data classification and sensitivity frameworks for PII and regulated data assets.
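The fine-grained access controls described above map to Unity Catalog SQL. Group names and object names below (`analysts`, `pii_readers`, `lakehouse.gold.customers`) are hypothetical examples:

```sql
-- Grants against the three-level namespace: catalog.schema.table.
GRANT USE CATALOG ON CATALOG lakehouse TO `data_engineers`;
GRANT SELECT ON TABLE lakehouse.gold.customer_metrics TO `analysts`;

-- Column masking: hide raw email addresses from non-privileged readers.
CREATE OR REPLACE FUNCTION lakehouse.gold.mask_email(email STRING)
RETURNS STRING
RETURN CASE WHEN is_account_group_member('pii_readers') THEN email ELSE '***' END;

ALTER TABLE lakehouse.gold.customers
  ALTER COLUMN email SET MASK lakehouse.gold.mask_email;

-- Row-level security: each reader sees only rows for regions they belong to.
CREATE OR REPLACE FUNCTION lakehouse.gold.region_filter(region STRING)
RETURNS BOOLEAN
RETURN is_account_group_member(concat('region_', region));

ALTER TABLE lakehouse.gold.customers
  SET ROW FILTER lakehouse.gold.region_filter ON (region);
```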
Pipeline Development & Orchestration
- Build and maintain production-grade data pipelines using PySpark, Delta Live Tables (DLT), and Databricks Workflows / Jobs.
- Design modular, reusable pipeline patterns including incremental ingestion, CDC (Change Data Capture), and full-refresh strategies.
- Implement robust pipeline observability: logging, alerting, lineage tracking, and SLA monitoring.
- Leverage Databricks Repos for CI/CD integration, managing code promotion across dev / staging / production environments.
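The CDC ingestion pattern mentioned above can be illustrated in plain Python. This is a minimal sketch of the upsert/delete fold semantics only; in production the same logic would be a Delta `MERGE` or a DLT `apply_changes()` call, and all names here (`CdcEvent`, `apply_cdc`) are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CdcEvent:
    """A single change-data-capture record from a source system."""
    key: str              # primary key of the affected row
    op: str               # "upsert" or "delete"
    data: Optional[dict]  # new row values for upserts, None for deletes
    seq: int              # monotonically increasing change sequence

def apply_cdc(target: dict, events: list) -> dict:
    """Fold CDC events into a keyed target table (dict keyed by primary key).

    Events are applied in sequence order, so out-of-order batches still
    converge on the latest state per key -- the semantics a Delta MERGE
    or DLT apply_changes() pipeline provides at scale.
    """
    for ev in sorted(events, key=lambda e: e.seq):
        if ev.op == "delete":
            target.pop(ev.key, None)
        elif ev.op == "upsert":
            target[ev.key] = ev.data
        else:
            raise ValueError(f"unknown CDC op: {ev.op}")
    return target
```

Note that the delete for key `b` below arrives before the final upsert for `a` in sequence order, so sorting by `seq` is what makes the result deterministic:

```python
table = apply_cdc({}, [
    CdcEvent("a", "upsert", {"v": 1}, seq=1),
    CdcEvent("b", "upsert", {"v": 2}, seq=2),
    CdcEvent("a", "upsert", {"v": 3}, seq=4),
    CdcEvent("b", "delete", None, seq=3),
])
# table == {"a": {"v": 3}}
```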
Performance & Compute Optimization
n
- Optimize Spark execution plans and identify and resolve performance bottlenecks across large-scale distributed workloads.
- Right-size cluster configurations: serverless warehouses, auto-scaling job clusters, and Photon-enabled SQL warehouses.
- Leverage Serverless Warehouses and SQL Warehouses for BI and ad hoc analytics workloads, minimizing cost and cold-start latency.
- Manage cost governance for compute, storage, and DBU consumption across workspaces.
Developer Experience & Standards
- Set up and maintain Databricks Repos with standardized project structures and Git integration.
- Define Python coding standards, notebook best practices, and modular library patterns for the data engineering team.
- Build reusable Python utility libraries for common patterns: schema validation, data quality checks, Delta operations, and logging.
- Establish unit testing and integration testing frameworks for Spark pipelines.
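A reusable schema-validation utility of the kind listed above might look like the following. This is a pure-Python sketch of the check logic (the function name `validate_schema` and the dict-based row representation are assumptions; a production version would inspect a Spark DataFrame schema):

```python
def validate_schema(row: dict, expected: dict) -> list:
    """Return a list of violations for one record against an expected schema.

    `expected` maps column names to Python types; an empty result means
    the record passes. None values are allowed for any column.
    """
    errors = []
    for col, typ in expected.items():
        if col not in row:
            errors.append(f"missing column: {col}")
        elif row[col] is not None and not isinstance(row[col], typ):
            errors.append(
                f"column {col}: expected {typ.__name__}, "
                f"got {type(row[col]).__name__}"
            )
    for col in row:
        if col not in expected:
            errors.append(f"unexpected column: {col}")
    return errors
```

Because the validator returns a plain list of violation strings rather than raising, pipelines can route bad records to a quarantine table while letting clean rows through:

```python
expected = {"id": int, "email": str}
validate_schema({"id": 1, "email": "a@b.c"}, expected)   # [] -> passes
validate_schema({"id": "x"}, expected)                    # type + missing-column errors
```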
Security, Compliance & Networking
- Configure workspace-level and account-level security: Private Link, IP access lists, secrets management via Databricks Secrets or AWS Secrets Manager.
- Design and enforce network isolation for sensitive data workloads.
- Ensure compliance with data residency and access control requirements for customer data.
Collaboration & Enablement
- Partner with data engineers, data scientists, and analytics engineers to ensure the platform meets diverse workload needs.
- Mentor the engineering team on Databricks, Spark optimization, and Lakehouse best practices.
- Produce architectural documentation, runbooks, and internal knowledge bases.
- Evaluate and recommend new Databricks features and third-party integrations relevant to the organization's data roadmap.
Required Qualifications
Core Databricks & Lakehouse
- 5+ years of hands-on experience with Databricks, with at least 2 years in an architect or senior lead role.
- Deep expertise in Unity Catalog: metastore setup, three-level namespace, ACL design, and data governance workflows.
- Strong mastery of the Medallion Architecture and Delta Lake: ACID transactions, time travel, compaction, and OPTIMIZE/VACUUM strategies.
- Proven experience designing and deploying production pipelines with Databricks Jobs and Workflows, including multi-task job DAGs, retry logic, and notifications.
- Hands-on experience with Da
📌 Data Engineer (Nellore)
🏢 PropStream
📍 Nellore