Bain & Company Inc

  • Data Engineer

    Job Location US-CA-San Francisco | US-CA-Palo Alto | US-CA-Los Angeles | US-MA-Boston | US-NY-New York | US-TX-Dallas
    Advanced Analytics
    Regular Full-Time
  • Company Overview

    Bain & Company is the management consulting firm that the world’s business leaders come to when they want results. Bain advises clients on strategy, operations, information technology, organization, private equity, digital transformation, and mergers and acquisitions, developing practical insights that clients act on and transferring skills that make change stick. The firm aligns its incentives with clients by linking its fees to their results. Bain clients have outperformed the stock market 4 to 1. Founded in 1973, Bain has 57 offices in 36 countries, and its deep expertise and client roster cross every industry and economic sector.


    Department Overview 

    Bain’s Advanced Analytics Group is a team of high-impact quantitative technology specialists who solve statistical, machine learning, and data engineering challenges that we encounter in client engagements. AAG team members hold advanced degrees in subjects ranging across statistics, mathematics, computer science, and other quantitative disciplines, and have backgrounds in a variety of fields including data science, marketing analytics, and academia.


    Position Summary

    You will solve cutting-edge problems for a variety of industries as a software engineer specializing in Data Engineering. As a member of a diverse engineering team, you will participate in the full engineering life cycle which includes designing, developing, optimizing, and deploying new machine learning solutions and infrastructure at the production scale of the world’s largest companies.

    Core Responsibilities and Requirements

    • Partner with Data Science, Machine Learning, and Platform Engineering teams to develop and deploy production quality code
    • Develop and champion modern Data Engineering concepts with technical audiences and business stakeholders
    • Implement new and innovative deployment techniques, tooling, and infrastructure automation within Bain and for our clients
    • This position will be located in Palo Alto, Los Angeles, Boston, Dallas, Austin, Seattle, or remotely
    • Travel is required (~20%)

    Scope, architect, design, develop, build, and release robust and scalable Data Engineering solutions for structured and unstructured data


    • Build large-scale batch and real-time cloud-based distributed data systems to provide low-latency delivery of high-quality data.
      • Enable real-time and batch-processed machine learning solutions
      • Enable users to access and interact with their data by providing APIs, microservices, and applications.
    • Translate business requirements into technical requirements and implementation details
      • Design data lake, data warehouse, or data mart solutions.
      • Build data models that are flexible and easy to understand and that enable data insights.
    • Champion next generation data architecture strategies in data pipeline, analysis, and storage solutions.

    Develop infrastructure and deployment platform to enable production data science and machine learning engineering development

    • Participate in the full software development life cycle including designing distributed systems, writing documentation and unit/integration tests, and conducting code reviews.
    • Develop and improve infrastructure including CI/CD, microservice frameworks, distributed computing, and cloud infrastructure needed to support this platform.
    • Design and develop frameworks to automate data ingestion, analysis, visualization, and integration of structured and unstructured data from a variety of data sources.
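    As a rough sketch of the extract/transform/load pattern behind the ingestion frameworks described above (plain Python; the feed contents and function names are hypothetical, and a real pipeline would run under an orchestration framework):

```python
import csv
import io
import json

def extract(csv_text, json_text):
    """Pull raw records from two hypothetical sources: a CSV feed and a JSON feed."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows += json.loads(json_text)
    return rows

def transform(rows):
    """Normalize records from both sources into a single typed schema."""
    return [{"id": int(r["id"]), "amount": float(r["amount"])} for r in rows]

def load(rows, store):
    """Upsert normalized records into a keyed store (stand-in for a warehouse table)."""
    for r in rows:
        store[r["id"]] = r
    return store

# Two toy sources with different formats but a shared logical schema.
csv_feed = "id,amount\n1,9.5\n2,3.0\n"
json_feed = '[{"id": "3", "amount": "7.25"}]'

warehouse = load(transform(extract(csv_feed, json_feed)), {})
```

    The same three-stage shape scales up when each stage becomes a task in a scheduler such as Airflow rather than a plain function call.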

    Provide technical guidance to external clients and internal stakeholders at Bain


    • Explore new technical innovations in machine learning and data engineering to improve client results.
    • Advise and coach engineering teams on technology stack best practices and operational models to raise their data engineering capabilities.




    Required Qualifications

    • Bachelor’s in Computer Science or a related technical field.
    • 4+ years of experience programming with Python, Scala, C/C++, Java, C#, Go, or similar programming language.
    • 4+ years of experience with SQL or NoSQL databases: PostgreSQL, SQL Server, Oracle, MySQL, Redis, MongoDB, Elasticsearch, Hive, HBase, Teradata, Cassandra, Amazon Redshift, Snowflake.
    • 2+ years of experience working on public cloud environments (AWS, GCP, or Azure), and associated deep understanding of failover, high-availability, and high scalability.
    • 2+ years of experience working with Docker containers.
    • Experience scaling and optimizing schemas and performance-tuning SQL and ETL pipelines in data lake and data warehouse environments.
    • Expert in SQL and in at least one of the programming languages listed above.
    • Strong computer science fundamentals in data structures, algorithms, automated testing, object-oriented programming, performance complexity, and the implications of computer architecture on software performance.
    • Experience with data ingestion using one or more modern ETL compute and orchestration frameworks (e.g., Apache Airflow, Luigi, Spark, Apache NiFi, or Apache Beam).
    • Proficiency with version control and Git workflows.
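    For illustration, the schema and SQL tuning work named above can be sketched with an in-memory SQLite database standing in for the warehouses listed (the table name and rows are hypothetical):

```python
import sqlite3

# In-memory SQLite as a stand-in for a warehouse such as Redshift or Snowflake.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("west", 10.0), ("west", 5.0), ("east", 2.5)],
)
# Indexing the filter/group column is the kind of schema tuning the bullet refers to.
conn.execute("CREATE INDEX idx_orders_region ON orders(region)")

# Aggregate in SQL, then hand the result to application code.
totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region").fetchall()
)
```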


    Preferred Qualifications

    • Master’s in Computer Science or a related technical field.
    • Open-source distributed computing and database frameworks such as Apache Flink, Ignite, Presto, Apex, Cassandra, and HBase.
    • Real-time streaming distributed data processing using Apache Flink, Storm, Amazon Kinesis, Kafka, Spark Streaming, or Apache Beam.
    • Deployment best practices using CI/CD tools and infrastructure as code (Jenkins, Docker, Kubernetes, and Terraform).
    • Experience with administering and managing Kubernetes clusters (EKS, GKE, or AKS) and Helm.
    • Strong interpersonal and communication skills, including the ability to explain and discuss technical concepts and methodologies with colleagues and clients from other disciplines.
    • Experience with Agile development methodologies.
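    The sliding-window aggregation at the heart of the stream processors listed above can be sketched in plain Python (a toy stand-in to show the idea, not how Flink or Spark Streaming are actually invoked):

```python
from collections import deque

def windowed_sums(stream, window=3):
    """Emit a sliding-window sum for each arriving event - the basic
    stateful operation that real stream processors distribute and scale."""
    buf = deque(maxlen=window)  # bounded buffer: old events fall out automatically
    out = []
    for event in stream:
        buf.append(event)
        out.append(sum(buf))
    return out

sums = windowed_sums([1, 2, 3, 4, 5])
```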


    • Engineering distributed systems and database internals (including consensus, availability, and distributed query processing).
    • Deploying end-to-end logging solutions such as the EFK stack (Elasticsearch, Fluentd, Kibana).
    • Building Grafana dashboards.
    • Elements of the PyData ecosystem, including Cython, NumPy, Numba, Pandas, and Dask.
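    As one small example of that stack, the vectorized NumPy idiom that Cython, Numba, and Dask build on or accelerate (the array contents are purely illustrative):

```python
import numpy as np

# Standardize a batch of measurements in one vectorized expression,
# with no explicit Python loop over elements.
readings = np.array([1.0, 2.0, 3.0, 4.0])
normalized = (readings - readings.mean()) / readings.std()
```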


