MinneapolisRecruiter Since 2001
the smart solution for Minneapolis jobs

Site Reliability Engineer, Sr

Company: Merrill Corporation
Location: Minneapolis
Posted on: June 13, 2021

Job Description:

Datasite is the industry leader in technology solutions that enable mergers, acquisitions, initial public offerings, restructuring and other critical capital transactions in more than 170 countries. We provide the world's leading investment banks, private equity firms, law firms and corporations with tools to simplify, streamline and accelerate the due diligence process, helping them close more deals, faster. We are a global team of high-energy, passionate people. We have strong individual voices but we work as a team, bringing out the best in each other. We thrive under pressure and always keep the customer at the heart of everything we do.

Sr. Site Reliability Engineer

The Sr. SRE works with other team members to implement our organization's operational observability practices, solving problems before they happen. We are searching for an engineer who is skilled at finding the root cause of enterprise incidents in our customer-facing enterprise platform.

The SRE is responsible for learning the various areas of our product, tying them back to back-end components or services, and working with teams to prioritize, diagnose and fix issues. The SRE works as part of a talented team of engineers focused on delivering mission-critical expertise and support to ensure the highest levels of availability and performance.

Essential Duties and Responsibilities:

  • Partner with the manager and other more senior team members to identify and resolve a variety of system failures; maintain and continue training in order to be prepared for all possible issues
  • Leverage knowledge of Datasite's systems and services to troubleshoot
  • Implement instrumentation for our platform to deepen Datasite's insights and analysis into the quality of experience for our customers and the quality of service of our platform
  • Maintain the production monitoring systems that cover our platform from end to end
  • Automate identified workflows and processes internal to Datasite's SRE team, and the systems our organization uses to manage our platform
  • Assist the organization with applying monitoring, alerting, and configuration management strategies
  • Participate in the Platform Services on-call rotation to assist with production issues in Datasite's platform
  • Prepare after-incident reviews and help the organization implement the identified outcomes into our applications and systems to improve our client experience

Education / Experience:

  • Bachelor's Degree in Computer Science, Math, Engineering, or related field required; 2 to 5 years of relevant experience
  • Demonstrated proficiency in one of the following: Java, JavaScript, Python, Kotlin (or other mainstream programming language)
  • Hands-on/administrative experience with tools such as Kubernetes, Prometheus/Grafana, Splunk, Instana/AppDynamics/Dynatrace

Who You Are

You love using your knowledge of operational excellence and service design gained by working with complex systems to dig into problems or processes, define the moving pieces, and automate a solution (particularly using Python!). You provide detailed analysis and problem-solving, coupled with strong communication skills and a sense of ownership and drive. Your insatiable need to understand a system's inner workings motivates you to find flaws in implementation or design and collaborate with others to improve understanding, performance, and reliability.

  • 5-7 years of technical experience
  • Demonstrated proficiency in one of the following: Java, JavaScript, Python, Kotlin
  • Experienced in a subset of these tools or their equivalent: Python, Kubernetes, Docker, Java, Spring Boot, Azure, Instana, AppDynamics, Splunk, Catchpoint, Grafana, Prometheus, Pivotal Cloud Foundry, Node.js, Angular, Jenkins

Who We Are

Come join Datasite's growing Site Reliability Engineering team, where we make our SaaS offering reliable, resilient, and performant. We are dedicated to analyzing and understanding Datasite's entire production stack, from clients' perspectives all the way through backend infrastructure. We provide systems and workflows that uniformly measure the stability, reliability and use of our platform. We emphasize building tools over manual processes. We create, not operate. Things go from repeatable to automated quickly. And most importantly, we are the people who take charge when no one else knows what to do.

What We'll Do Together

  • Continually train to be prepared to identify and resolve a variety of system failures
  • Conduct blameless after-incident retrospectives and drive the identified outcomes into our applications and systems to improve our client experience
  • Develop deep insights and analysis into the quality of experience for our customers, and quality of service of our platform
  • Implement improvements to service resiliency
  • Gracefully scale systems ahead of need using fact-based metrics combined with deep knowledge of our platform, and mature systems by pushing for changes that improve reliability and velocity.
  • Design, create, and maintain production monitoring systems that cover our platform from end to end.
  • Automate everything.
