Technical Engineer

Collabera
Hartford, Connecticut
Full Time

Position Summary

The Databricks Architect/Admin is a senior individual contributor responsible for the design, implementation, and continuous optimization of the enterprise Databricks platform.
This role serves as the technical authority for all aspects of the Databricks environment including workspace governance, Unity Catalog, cluster and compute strategy, data pipeline architecture, and cost management.
The Architect works in close partnership with data engineering, analytics, and infrastructure teams, and operates within a broader multi-platform data ecosystem that includes Ab Initio and Fivetran.
A strong background in Unix/Linux systems administration and scripting is essential, as the role requires deep engagement with the underlying compute infrastructure supporting the platform.

Key Responsibilities
Platform Architecture & Design
• Architect and govern the enterprise Databricks environment, including workspace topology, Unity Catalog structure, and access control frameworks.
• Define and enforce standards for cluster configuration, runtime versions, instance pool utilization, and auto-scaling policies.
• Design scalable, performant data pipeline patterns using Delta Live Tables, Databricks Workflows, and structured streaming.
• Establish architectural standards for Delta Lake including table formats, partitioning strategies, Z-ordering, and OPTIMIZE/VACUUM scheduling.
• Lead platform integration design with upstream ingestion tools including Fivetran and Ab Initio, ensuring reliable, governed data delivery.

Unix/Linux Infrastructure & Operations
• Administer and troubleshoot Unix/Linux environments underpinning Databricks compute nodes, init scripts, and cluster lifecycle management.
• Develop and maintain shell scripts (Bash) and Python automation for platform operations, monitoring, log aggregation, and maintenance tasks.
• Manage file system operations, permission structures, and data movement tasks in Linux-based storage and compute environments.
• Support EC2/VM-level diagnostics and tuning in coordination with infrastructure and cloud engineering teams.

Cost Management & Optimization
• Own DBU consumption tracking and reporting; proactively identify optimization opportunities across jobs, interactive clusters, and SQL warehouses.
• Implement and maintain cost attribution models to support chargeback or showback reporting by team, product, or LOB.
• Partner with the Senior Director on capacity planning, contract utilization forecasting, and multi-year commitment management.

Governance, Security & Compliance
• Design and implement data governance frameworks within Unity Catalog, including lineage, tagging, and access auditing.
• Collaborate with Cybersecurity to ensure platform configurations satisfy enterprise security controls, including secrets management, network isolation, and encryption.
• Support audit and compliance activities by maintaining documentation of platform configurations, access policies, and data classification standards.

Automation & Artificial Intelligence
• Design and implement end-to-end automation frameworks for platform operations, including cluster lifecycle management, job scheduling, alerting, and self-healing workflows.
• Leverage Databricks AutoML, MLflow, and Model Serving capabilities to support the operationalization of machine learning models within the enterprise data platform.
• Integrate AI-assisted development tooling (e.g., Databricks Assistant, GitHub Copilot) into engineering workflows to accelerate pipeline development and reduce manual effort.
• Identify and drive automation opportunities across ingestion, transformation, data quality, and governance processes, reducing toil and improving platform reliability.
• Collaborate with data science and advanced analytics teams to architect scalable feature engineering pipelines and model deployment patterns on Databricks.
• Evaluate and recommend emerging AI/ML platform capabilities, including generative AI integrations and LLM-backed data workflows, in alignment with enterprise strategy.
• Serve as the primary technical escalation point for Databricks platform issues across data engineering and analytics teams.
• Contribute to sprint planning and project tracking within Jira; manage platform change requests and incidents through ServiceNow.
• Produce and maintain architectural documentation, runbooks, and onboarding materials for platform consumers.
• Evaluate and recommend new Databricks features, partner integrations, and tooling investments in support of the platform roadmap.

Required Qualifications
• 7+ years of experience in data engineering or data platform roles, with a minimum of 4 years of hands-on Databricks implementation experience.
• Demonstrated expertise with Databricks platform capabilities: Unity Catalog, Delta Lake, Databricks Workflows, Delta Live Tables, and SQL Warehouses.
• Strong Unix/Linux proficiency: shell scripting, process management, file system operations, cron scheduling, and environment configuration.
• Proficiency in Python and PySpark for distributed data processing, pipeline development, and platform automation.
• Experience with cloud infrastructure (AWS, Azure, or GCP), including compute, storage, networking, and IAM/security constructs.
• Demonstrated ability to design for scale, cost efficiency, and operational reliability in an enterprise data environment.
• Demonstrated experience designing automation frameworks for data platform operations, including job orchestration, monitoring, alerting, and pipeline self-healing.
• Familiarity with AI/ML concepts and tooling within the Databricks ecosystem, including MLflow, AutoML, and Model Serving; exposure to generative AI or LLM-integrated workflows is a plus.
• Experience with Oracle database environments, including SQL development, schema design, and integration patterns for data extraction and pipeline sourcing.
• Proficiency in Git-based version control: branching strategies, pull request workflows, repository management, and CI/CD pipeline integration for data platform code.
• Experience working within ITSM and project delivery frameworks such as ServiceNow and Jira.
• Strong written and verbal communication skills, with the ability to convey complex architectural concepts to both technical and non-technical audiences.

Preferred Qualifications
• Hands-on experience with MLflow experiment tracking, model registry, and deployment patterns within Databricks.
• Exposure to generative AI frameworks (LangChain, LlamaIndex) or experience building LLM-integrated data pipelines and retrieval-augmented generation (RAG) workflows.
• Experience with workflow automation tools such as Apache Airflow, Databricks Workflows, or comparable orchestration platforms at enterprise scale.
• Experience integrating Databricks with ETL/ELT platforms, including Fivetran or Ab Initio; hands-on Ab Initio development or administration experience is a strong plus.
• Familiarity with enterprise data governance frameworks and catalog tools (e.g., Collibra, Alation, or Unity Catalog advanced features).
• Experience supporting Databricks in regulated industries (financial services, insurance) with associated audit and compliance requirements.
• Working knowledge of Infrastructure-as-Code tooling (Terraform, Ansible) for platform provisioning and configuration management.
• Background in disaster recovery design and resiliency planning for cloud-hosted data platforms.