Data Engineer P-135
SMASH: Who we are
We believe in long-lasting relationships with our talent. We invest time in getting to know them and understanding what they are looking for in their next professional step.
We aim to find the perfect match. As agents, we pair our talent with our US clients based not only on technical skills but also on cultural fit. Our core competency is finding the right talent fast.
This position is remote within the United States. You must have U.S. citizenship or a valid U.S. work permit to apply for this role.
Role summary
You will design and operate scalable ETL and streaming pipelines that process contracts and invoices at high volume with strong data quality guarantees. This role focuses on building reliable data platforms that power analytics, ROI reporting, and compliance through robust governance, validation, and observability.
Responsibilities
Design and maintain ETL pipelines to ingest contracts and invoices from PDF, DOCX, CSV, Excel, and webhook sources.
Build scalable workflows for historical data migrations (10K+ invoices per customer).
Implement real-time streaming pipelines for event-driven integrations.
Develop and manage an analytics data warehouse to support reporting, metrics, and trend analysis.
Model customer-specific datasets for ROI, savings, and exception reporting.
Implement data validation checks for completeness, accuracy, and consistency.
Build data quality monitoring, alerting, and dead-letter queue handling.
Implement PII/PHI detection, masking, and data retention policies (5-year audit trail).
Track data lineage from source through transformation to consumption.
Optimize SQL queries and data models for performance and scalability.
Collaborate with product, engineering, and analytics teams to evolve data requirements.
Requirements – Must-haves
Strong experience building and operating ETL pipelines in production environments.
Proficiency in Python for data processing (pandas, NumPy, PySpark).
Advanced SQL skills with PostgreSQL, including data modeling and query optimization.
Hands-on experience with workflow orchestration tools (Airflow, Prefect, or similar).
Experience designing and operating data warehouses (Redshift, BigQuery, or Snowflake).
Familiarity with streaming platforms such as Kafka or Kinesis.
Experience implementing data quality frameworks (Great Expectations or similar).
Strong understanding of data validation, error handling, and monitoring best practices.
Ability to design scalable systems that handle large datasets and complex schemas.
Nice-to-haves (optional)
Experience processing semi-structured or unstructured data (PDF parsing, OCR).
Exposure to healthcare, financial, or compliance-driven data environments.
Experience with data governance, lineage tools, or cataloging platforms.
Familiarity with cloud-native architectures and infrastructure-as-code.
Experience building metrics pipelines for operational or customer-facing analytics.