SRE for Data

Full Time
California
Job description

This is a remote position.

HGS Digital has an opportunity for a Site Reliability Engineer (SRE) for the Ads Data ecosystem.


Qualified candidates must be self-motivated and eager to learn. Recent and extensive knowledge of Unix/Linux operating system internals and networking is required, along with Linux troubleshooting experience. Candidates must have experience with Java, Go, Python, or a similar language, and expertise in designing, analyzing, and troubleshooting large-scale distributed systems (Redis, Elasticsearch, Kafka, Druid, Hadoop, Flink, or comparable solutions), relational databases, caching solutions, and web service frameworks. Experience with algorithms, data structures, complexity analysis, and software design is required, as is experience developing tools and APIs that reduce manual interaction with systems and applications, using a variety of coding and scripting standards. Candidates should be curious, have excellent communication skills, and be able to work without constant supervision.

Location: Temporarily remote. Preferred locations: San Francisco / LA / Seattle, WA. Candidates outside these areas must be willing to relocate.


Responsibilities:

  • Design, write, and deliver software daily to support and improve the availability, scalability, reliability, resiliency, monitoring, alerting, latency, and efficiency of the Ads Data ecosystem
  • Manage day-to-day operations of data services and of near-real-time and batch data pipelines
  • Be the point of contact (POC) for data integrity within datastores and perform root cause analysis on issues
  • Work towards elimination of toil
  • Influence and create new designs, architectures, standards, and methods for large-scale distributed systems
  • Engage in service capacity planning, demand forecasting, software performance analysis and system tuning
  • Work as part of a team serving multiple stakeholders, balancing priorities while communicating status with internal customers
  • Work as part of an on-call rotation to ensure the production systems are operating smoothly
  • Meet service-level agreements (SLAs) or service-level objectives (SLOs) by measuring and monitoring service availability, performance, and overall system health
