Job description
Equivalent Experience

Description:
The Big Data Lead Software Engineer is responsible for owning and driving technical innovation with big data technologies. The individual is a subject-matter-expert technologist with strong Python experience and deep hands-on experience building data pipelines on the Hadoop platform as well as Google Cloud. This person will be part of successful Big Data implementations for large data integration initiatives. Candidates for this role must be willing to push the limits of the traditional development paradigms typically found in a data-centric organization while embracing the opportunity to gain subject matter expertise in the cyber security domain.
In this role you will:
Lead the design and development of sophisticated, resilient, and secure engineering solutions for modernizing our data ecosystem that typically involve multiple disciplines, including big data architecture, data pipelines, data management, and data modeling specific to consumer use cases.
Provide technical expertise for the design, implementation, maintenance, and control of data management services – especially end-to-end, scale-out data pipelines.
Develop self-service, multitenant capabilities on the cyber security data lake, including custom/off-the-shelf services integrated with the Hadoop platform and Google Cloud; use APIs and messaging to communicate across services; integrate with distributed data processing frameworks and data access engines built on the cluster; integrate with enterprise services for security, data governance, and automated data controls; and implement policies to enforce fine-grained data access.
Build, certify, and deploy highly automated services and features for data management (registering, classifying, collecting, loading, formatting, cleansing, structuring, transforming, reformatting, distributing, and archiving/purging) through the Data Ingestion, Processing, and Consumption stages of the analytical data lifecycle (a minimal sketch of these stages appears after this list).
Provide the highest level of technical leadership in the design, engineering, deployment, and maintenance of solutions through collaborative efforts with the team and third-party vendors.
Design, code, test, debug, and document programs using Agile development practices.
Review and analyze complex data management technologies that require in-depth evaluation of multiple factors, including intangibles or unprecedented factors.
Assist in production deployments, including troubleshooting and problem resolution.
Collaborate with enterprise, data platform, data delivery, and other product teams to provide strategic solutions, influencing long-range internal and enterprise-level data architecture and change management strategies.
Provide technical leadership and recommendation into the future direction of data management technology and custom engineering designs.
Collaborate and consult with peers, colleagues, and managers to resolve issues and achieve goals.
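To make the Ingestion, Processing, and Consumption stages referenced above concrete, here is a minimal PySpark sketch. All paths, column names, and the de-duplication key are hypothetical placeholders, not this team's actual pipeline.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("lifecycle_sketch").getOrCreate()

# Ingestion: collect and load raw source data (path is illustrative).
raw = spark.read.json("gs://example-bucket/raw/events/")

# Processing: cleanse, structure, and transform with built-in controls.
curated = (
    raw.dropDuplicates(["event_id"])                     # de-duplicate
       .filter(F.col("event_ts").isNotNull())            # basic DQ control
       .withColumn("event_date", F.to_date("event_ts"))  # derive partition key
)

# Consumption: distribute as partitioned Parquet for analytics/DS/ML users.
(curated.write.mode("overwrite")
        .partitionBy("event_date")
        .parquet("gs://example-bucket/curated/events/"))
```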
- 10+ years of Big Data platform (data lake) and data warehouse engineering experience, preferably with the Hadoop stack: HDFS, Hive, SQL, Spark, Spark Streaming, Spark SQL, HBase, Kafka, Sqoop, Atlas, Flink, Cloudera Manager, Airflow, Impala, Tez, Hue, and a variety of source data connectors. We are looking for a solid hands-on software engineer who can design and code big data pipeline frameworks as a software product (ideally on Cloudera) – not just a "data engineer" implementing Spark jobs, or a team lead for data engineers. This includes building self-service data pipelines that automate the controls for ingesting data into the ecosystem (data lake) and transforming it for different consumption patterns; supporting GCP and Hadoop on-premises; bringing in massive volumes of cyber security data; and validating data and data quality. Consumption spans reporting, advanced analytics, data science, and ML.
- 3+ years of hands-on experience designing and building modern, resilient, and secure data pipelines, including movement, collection, integration, and transformation of structured/unstructured data with built-in automated data controls, built-in logging/monitoring/alerting, and pipeline orchestration managed to operational SLAs. Preferably using Airflow custom operators (at least 1 year of experience customizing within Airflow), DAGs, and connector plugins. Python, Spark, and PySpark, working with APIs to integrate different services; Google big data services such as Cloud Dataproc, Datastore, BigQuery, and Cloud Composer; on-premises, Apache Airflow as the core orchestrator, with Kafka for streaming services – sourcing data and then processing it with Spark Streaming (a minimal operator sketch appears after this list).
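Below is a minimal sketch of the kind of Airflow custom operator the qualifications above describe. The operator name, the DQ rule, and the _count_rows helper are hypothetical illustrations, not a prescribed design.

```python
from airflow.models.baseoperator import BaseOperator


class ValidateAndLoadOperator(BaseOperator):
    """Run a simple data-quality gate on a staged dataset, then load it."""

    def __init__(self, dataset_path: str, min_row_count: int = 1, **kwargs):
        super().__init__(**kwargs)
        self.dataset_path = dataset_path
        self.min_row_count = min_row_count

    def execute(self, context):
        # Gate the load on a basic row-count check; a real operator would
        # also verify schema, freshness, and referential rules here.
        rows = self._count_rows(self.dataset_path)
        if rows < self.min_row_count:
            raise ValueError(f"DQ gate failed: {rows} rows in {self.dataset_path}")
        self.log.info("DQ gate passed (%d rows); loading %s", rows, self.dataset_path)
        # The load step would go here (e.g. a Spark submit or BigQuery load job).

    def _count_rows(self, path: str) -> int:
        # Stub value for illustration; swap in a real metadata/Spark lookup.
        return 42
```

Inside a DAG, such an operator would be instantiated like any built-in one, e.g. ValidateAndLoadOperator(task_id="dq_gate", dataset_path="gs://example-bucket/staged/events/").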
Top Skills Details:
Data, Spark, Data warehouse, Hadoop, Kafka, PySpark, Airflow, Cloudera
Additional Skills & Qualifications:
Additional skills to look for in any/all the above candidates as a plus: GCP, Kafka/Kafka Connect, Hive DB development
Experience with Google Cloud data services such as Cloud Storage, Dataproc, Dataflow, and BigQuery. Google Cloud big data specialty – ideally hands-on experience, not just a certification
Hands-on experience developing and managing technical and business metadata
Experience creating/managing Time-Series data from full data snapshots or incremental data changes
Hands-on experience with implementing fine-grained access controls such as Attribute Based Access Controls using Apache Ranger
Experience automating DQ validation in the data pipelines
Experience implementing automated data change management including code and schema, versioning, QA, CI/CD, rollback processing
Experience with automating end to end data lifecycle on the big data ecosystem
Experience with managing automated schema evolution within data pipelines
Experience implementing masking and/or other forms of obfuscating data (a brief sketch appears after this list)
Experience designing and building microservices, APIs, and MySQL
Advanced understanding of SQL and NoSQL DB schemas
Advanced understanding of partitioned Parquet, ORC, Avro, and various compression formats
Experience developing containerized microservices and APIs
Familiarity with key concepts implemented by Apache Hudi or Iceberg, or Databricks Delta Lake (bonus)
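For the masking/obfuscation item above, here is a minimal PySpark sketch of two common approaches, assuming hypothetical email and card_number columns: an irreversible hash, and partial redaction that preserves the last four digits.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("masking_sketch").getOrCreate()
df = spark.createDataFrame(
    [("alice@example.com", "4111111111111111")],
    ["email", "card_number"],
)

masked = (
    df.withColumn("email", F.sha2(F.col("email"), 256))  # irreversible hash
      .withColumn(
          "card_number",  # partial redaction: keep only the last 4 digits
          F.concat(F.lit("************"), F.substring("card_number", -4, 4)),
      )
)
masked.show(truncate=False)
```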
Job expectations:
Ability to occasionally work nights and/or weekends as needed for on-call/production issue resolution
Ability to occasionally work nights and/or weekends for off-hours system maintenance
About TEKsystems:
We're partners in transformation. We help clients activate ideas and solutions to take advantage of a new world of opportunity. We are a team of 80,000 strong, working with over 6,000 clients, including 80% of the Fortune 500, across North America, Europe and Asia. As an industry leader in Full-Stack Technology Services, Talent Services, and real-world application, we work with progressive leaders to drive change. That's the power of true partnership. TEKsystems is an Allegis Group company.
The company is an equal opportunity employer and will consider all applications without regard to race, sex, age, color, religion, national origin, veteran status, disability, sexual orientation, gender identity, genetic information, or any characteristic protected by law.