Site Reliability Engineer Specialist (J00125641)
Site Reliability Engineering (SRE) is a discipline that combines software and systems engineering to build and run large-scale, distributed, fault-tolerant systems. SRE ensures that internal and external services meet or exceed reliability and performance expectations while adhering to Equifax engineering principles.
SRE is also an engineering approach to building and running production systems: we engineer solutions to operational problems. Because SREs are responsible for overall system operation, we use a breadth of tools and approaches to solve a broad set of problems. We rely on practices such as limiting time spent on operational work, conducting blameless postmortems, and proactively identifying and preventing potential outages.
Our SRE culture of diversity, intellectual curiosity, problem solving, and openness is key to our success. Equifax brings together people with a wide variety of backgrounds, experiences, and perspectives. We encourage them to collaborate, think big, and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to build an environment that provides the support and mentorship needed to learn, grow, and take pride in our work.
Who is Equifax?
At Equifax, we believe knowledge drives progress. As a global data, analytics and technology company, we play an essential role in the global economy by helping employers, employees, financial institutions and government agencies make critical decisions with greater confidence.
We work to help create seamless and positive experiences during life’s pivotal moments: applying for jobs or a mortgage, financing an education, or buying a car. Our impact is real. To accomplish our goals, we focus on nurturing our people’s career advancement, learning, and development; supporting our next generation of leaders; maintaining an inclusive and diverse work environment; and regularly engaging and recognizing our employees. Regardless of location or role, the individual and collective work of our employees makes a difference, and we are looking for talented team players to join us as we help people live their financial best.
The Perks of Being an Equifax Employee
- We offer excellent compensation packages with market-competitive pay, comprehensive healthcare packages, 401k matching, schedule flexibility, work from home opportunities, paid time off, and organizational growth potential.
- Grow at your own pace through online courses at Learning @ Equifax
What You’ll Do
- Participate in release cycles of our offerings, deploying code to integration, staging and production environments, integrating with continuous integration (CI) and continuous delivery (CD) tools, monitoring, and change management
- Build automation and work with Agile development teams to ensure smooth promotion of code, configuration, and Docker images to production
- As we transition to the public cloud (AWS or Google Cloud), design new build and deployment patterns
- Oversee and adapt monitoring and alerting systems. Interact with automated monitoring and healing infrastructure to ensure healthy environments
- Develop automation to auto-correct or completely prevent issues in our solutions
- Perform software updates, peer code reviews, testing, and Common Vulnerabilities and Exposures (CVE) analysis; respond to security threats
- Identify single points of failure and other high-risk architecture issues; propose and implement more resilient solutions
- Create and maintain standard operating procedures (SOPs) for performing maintenance tasks, applying configuration changes, and remediating problems in our environment
- Identify potential process improvements across the entire engineering organization
- Define and drive architectural enhancements into our systems to mitigate potential failure points
- Provide impact assessments and mitigation plans for changes going into the production environment
- Investigate the root causes of severe and systemic outages and identify corrective actions
What Experience You Need
- 6+ years’ experience with continuous integration tools (experience with Jenkins, SonarQube, JIRA, Nexus, Confluence, Git/Bitbucket, Maven, Gradle, or Rundeck is a plus)
- 2+ years with configuration management tools (Ansible, Puppet, Chef, or Salt), or 2+ years deploying and managing infrastructure on public clouds (AWS, GCP, Azure, or Pivotal) or Kubernetes
Extra Points for any of the Following
- Bachelor’s degree in Computer Science, Information Management, or another STEM major
- Knowledge of TCP/IP networking, load balancers, high-availability architecture, and zero-downtime production deployments; comfortable with network troubleshooting (tcpdump, routing, proxies, firewalls, load balancers, etc.)
- Demonstrated ability to automate repeatable tasks with scripts (Go, Ruby, Python, Bash)
- Experience with large scale cluster management systems (Mesos, Kubernetes)
- Experience with Docker-based containers
- Experience in Linux environments (CentOS)
- Experience working with Nginx, Tomcat, HAProxy, Redis, Elasticsearch, MongoDB, RabbitMQ, Kafka, and ZooKeeper
Success Attributes of the Technology Organization: Does This Describe You?
- Think and act differently
- Ownership (build it, own it, run it)
We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.