Growing a Data Science team (Part II)

The previous post in this series explored how to hire excellent data scientists who fit the framework of what you are trying to build at your organization.  You now have the perfect team in place, and as leaders you know this is only half the battle.  The second half is far more exciting and even more rewarding – setting the strategy for what the team will accomplish.  Done intentionally, this can be a smooth process that generates a virtuous cycle: high value projects, team visibility and growth, and the ability to take on projects of still higher value with new modeling techniques and approaches along the way.  We will focus on how our team identifies areas of opportunity, and how we work with our business partners to achieve success.


Ensuring you are focusing your efforts on the areas of opportunity that add the most value is a daunting task, especially because there is often no one right answer.  Our team concentrates on practical applications rather than pure R&D, and as such we rely heavily on our business partners to help us understand their biggest needs.  We start by asking department heads to spend time with their leaders and look for opportunities within their functional areas.  We go through this process with them so we can help identify where data science can add the most value and guide their top three to four priorities. 

We try to keep business insight questions within the function-specific data analytics teams and focus on projects that will ultimately make automated decisions.  While meeting with the business, we look for projects and ideas that have clearly defined objectives and outcomes, a specific and measurable KPI of significant value, and an exit strategy.  Here an exit strategy does not mean we will ever completely wash our hands of a project – there will always be pitfalls to watch out for and models to maintain.  Rather, it means the end product will be integrated into an existing service or piece of software.  We also want to ensure the data for the project is available and that IT and QA resources will exist for any future integration needs.


After each department identifies the top three to four projects in which they feel data science can add value, we gather the department heads (our Senior Leadership Team, or SLT) to evaluate their individual priorities in the context of the organization’s priorities.  We devote significant time to debate and discussion – if you cannot devote the time these conversations need, then you should question the need to take on the proposed projects.  A data science team with a strong business perspective is imperative in these conversations to help connect the available data to the business problem.  The leader of the data science team should spend a significant portion of their time learning all aspects of the enterprise.  Given the scope of data science projects – often touching the work of multiple teams – understanding the ramifications of downstream effects from any model implementation is crucial.  By learning how the pieces fit together, the leader will be able to guide the conversation in these meetings – integrating ideas between functional areas and connecting the dots.  This in turn enables her to influence business decisions and take the team to the next level.  It is through these crucial conversations that the SLT comes to an overall consensus on where the Data Science team will focus our efforts. 

With this said, there are of course opportunities for quick wins we must identify and execute.  These are key to growth and to demonstrating value, because much data science work lends itself to a long arc: from understanding the business problem, to gathering and cleaning data, modeling and validation, deployment, and ongoing model monitoring and maintenance.  And while we might live by the mantra, “just because we have the skill set and know-how to answer a business problem doesn’t mean we should be working on it”, there are always exceptions. 

There exist many grey areas within the business where we can achieve these quick wins and add value.  The key to success is understanding which of them to take on.  Your team’s time is valuable, and it is easy to get pulled down the rabbit hole of day-to-day business questions that rarely provide long term value and instead scratch the itch of a curious stakeholder.  These projects may involve a predictive model that will not be integrated into an existing system, validation of a vendor’s work, or a process another team largely owns.  To this end we try to reserve a small, variable portion of the team’s time for quick wins.  Oftentimes quick wins do not involve our model deployment tool, and instead leverage vendor partnerships used by our Data Analytics teams – for example, using Alteryx to make an API call to a DataRobot model we provide. 
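To make that concrete, a quick win of this shape can be as simple as posting rows to a deployed model’s prediction endpoint.  The sketch below is in Python rather than Alteryx, and the endpoint path, header names, and parameters are assumptions based on common prediction-server conventions, not our production code – verify them against your own DataRobot instance’s documentation:

```python
import json
from urllib import request


def build_prediction_request(host, deployment_id, api_token, datarobot_key, records):
    """Assemble (url, headers, body) for scoring records against a deployment.

    The path and header names are assumptions for illustration; check your
    prediction server's documentation for the exact values it expects.
    """
    url = f"https://{host}/predApi/v1.0/deployments/{deployment_id}/predictions"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_token}",
        "DataRobot-Key": datarobot_key,  # per-instance key; hypothetical here
    }
    body = json.dumps(records).encode("utf-8")
    return url, headers, body


def score(host, deployment_id, api_token, datarobot_key, records):
    """POST the records and return the parsed predictions (network call)."""
    url, headers, body = build_prediction_request(
        host, deployment_id, api_token, datarobot_key, records
    )
    req = request.Request(url, data=body, headers=headers, method="POST")
    with request.urlopen(req) as resp:  # will fail without network access
        return json.loads(resp.read())
```

The point is less the specific vendor than the shape of the work: a few dozen lines that hand a stakeholder a scored file without touching our deployment pipeline.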

Which quick ideas ultimately get worked on is currently at the discretion of the data science manager.  Charged with speed to market, quick win opportunities do not last long enough to go through the formal intake process.  Furthermore, what is considered a quick win has the funny ability to change depending on the team’s availability and the competing priorities of the moment.  The data science leader, having already invested significant time in understanding the business and the team’s strengths and weaknesses, is in the best position to make this call. 


Once our priorities are set, it is time to officially kick off the project.  We roughly follow the CRISP-DM structure, since data science does not follow a classic linear path and is in fact cyclical.  James Taylor, CEO of Decision Management Solutions, has discussed at length the problems with traditional CRISP-DM, including the “lack of clarity, mindless rework, blind hand-offs to IT, and a failure to iterate”.

We admittedly have opportunities to improve our process here, but the general structure adds significant value.  We initiate all projects by meeting with the relevant business owners and subject matter experts – we firmly believe you cannot start investigating a data set until you fully understand the business problem you are trying to solve.  We approach this meeting from a consultant’s perspective: investigating, uncovering, and extracting as much business knowledge as we can from these domain experts.  If you truly understand the challenge your internal customers are facing, and are in tune with the business as a whole, you can often propose a more efficient solution to their problem.  We also use this initial meeting as an opportunity to discover what types of data may be available, and what the business owners think may be related to or predictive of the metric they are targeting.

Our documentation begins here.  Any team that has ever lost a team member knows significant information can be lost without proper documentation.  We use a project template in Confluence to document every step of our process, from Business Understanding through Deployment.  This documentation is integrated with JIRA and Bitbucket to ensure any relevant code is committed: be it a Python script pulling together data and building a model, or engineers working tickets to track proof of concepts and the services they are building.  Because the same minimum questions are answered for each project, any new team member is able to read this documentation and know where all relevant code and data live, understand the data and modeling approach, and of course gain a thorough understanding of the business problem we are trying to solve. 


You are no doubt familiar with the rest of the steps, so we will not bore you here.  Do not be afraid to view data science in its true cyclical form, even though this can frustrate management: with minimal linear progress, meaningful checkpoints can be few and far between.  We have learned that explaining the unique complexities of deploying production models is our biggest opportunity.  Everyone can learn to “manage up” to some extent: individual contributors and managers alike should keep their leaders and business owners updated with progress and setbacks, level-setting timelines and expectations when appropriate; and the “manager of managers” can shield the team from the many ad hoc project requests coming across her desk, accepting only the quick wins that add the most value. 

With data science projects touching so many aspects of the business, clear communication is essential to operating effectively.  We have bi-weekly check-ins with our friends in DevOps, IT Security, and the other “Data” teams to stay in touch about upcoming levels of effort and touch base on long term projects we are partnering on.  In these and other project check-in meetings, opportunities will present themselves to educate your business partners on the complexity of this work – always take full advantage.  In their book Power Your Career: The Art of Tactful Self-Promotion at Work, Nancy Burke and Richard Dodson advise readers to “find out what others think is important [and] think about how you might be of use to them”.  And they are certainly not alone in highlighting the benefits of this approach: in his classic book How to Win Friends and Influence People, Dale Carnegie also stresses the importance of listening to others.  Next time you meet with your business owners, come to the table with an open mind, ask good questions, and listen – you will be amazed at how far this goes in building relationships.

Whatever your specific project process may be, embracing a test and learn philosophy is critical to success – early and always.  The Billy Beane mentality of “getting on first” is vital to the growth of a data science team.  We cannot be afraid to make mistakes, provided we learn from them.  We need to fail quickly and fall forward, seeing failures not as setbacks but as learning opportunities and challenges to be overcome. 


Bringing our engineers into the process early – as the scientists work through business understanding and data understanding – has paid dividends ten-fold.  The ultimate solution to any business problem will differ based on specific needs: batch or real-time, the data sources being used, latency requirements.  Each of these considerations can completely change the approach.  The engineers will have a perspective the scientists do not, and will add immense value.  Ensure close, constant collaboration by having engineers and scientists meet regularly (whatever “meeting” looks like in your organization), not only for ongoing project efforts but for long term strategizing as well.  Team leads (be they technical leads, managers, or directors, depending on the size of your organization) of the scientists and engineers should spend time thinking about and planning which standardized tools the team will use to make integration smoother.  A collaborative system architecture design ensures the tooling the engineers build is easy for the scientists to use. 


To flourish in production, data science requires cloud expertise, and ensuring a secure environment is of utmost importance.  We cannot do this alone, and our success hinges on our partnership with The General’s DevOps team.  Building trust in this relationship is required to win – and it does not happen overnight; but rest assured the investment in this joint venture will yield gold.  More than anything else, this rapport builds with constant communication and openness. 

As with many organizations, in our early days we prided ourselves on adding value quickly and being scrappy.  Team members were tinkering and seeking out cloud resources from the only others in the organization working in this environment: members of the DevOps team.  While being proactive is a cardinal quality, we missed an opportunity to involve team leaders from the beginning.  Unfortunately, we lacked the foresight at the time to see what this relationship could and would become.  We were certainly called out as “Shadow IT” and “rogue developers”, as you may have been; do not throw in the towel!  If you have not done so already, bring leadership in at the highest levels and educate them on what this relationship can become and what it should look like in your organization.

We see ourselves as ground breakers, agents of change, and pioneers.  And arguments for having DevOps capabilities within the data science team are warranted: data science is charged with speed to market, and production models involved in business decisions demand dedicated DevOps resources.  Moreover, DevOps for data science necessitates a mindset and tooling different from traditional DevOps.  But in many organizations, The General included, this is not feasible: data science and DevOps already exist(ed) as separate entities, rolling up through different organizational structures and having competing priorities.  Ensuring proper prioritization and resource allocation requires alignment on short and long-term strategy from the top down, with team leaders checking in often to discuss upcoming priorities and needs.

At The General, among its many duties DevOps is responsible for building continuous deployment pipelines for all of our services and for provisioning the underlying infrastructure.  This includes containerization and service orchestration, centralized logging and log shipping pipelines for our services, load balancing, and horizontal and vertical scaling.  Our DevOps resources are also accountable for uptime, security, and disaster recovery through a variety of methods, including IaC (Infrastructure as Code) and Multi-AZ and region failover.

While we reserve the right to change course, viewing DevOps not as a competitor but as a business partner has allowed us to thrive.  It is as true today as it has always been – anything can be achieved if no one cares who gets the credit.  With the understanding that we succeed and fail together, DevOps getting stronger makes data science stronger, and vice versa.  Through success on smaller scale projects, we continued to earn buy-in and influence resource allocation, which allowed our DevOps team to grow and support our needs.  Instead of having just one dedicated DevOps resource within the Data Science team – who would no doubt be out of the office when a production system errored – we have forty dedicated resource hours from a team of highly skilled engineers.  Opportunities to cross-train exist everywhere and should be fully taken advantage of.  This teaching and learning can only benefit the long-term prospects of your organization.

Partnering with DevOps has allowed our engineers to stay focused on setting up web services for our models, as well as writing tooling to create a consistent pattern for our services to follow.  This eliminates re-writing common functionality, and ensures consistency across all services, speed to market, and ease of knowledge transfer for new developers. 
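To illustrate what a “consistent pattern” can look like in practice, here is a minimal sketch – our assumption of one reasonable approach, not our actual tooling – in which a shared decorator wraps every model’s scoring function so that all services return the same response envelope.  The names (`service_response`, `score_risk`, `MODEL_VERSION`) and the toy scoring logic are hypothetical:

```python
import functools
import time

# Hypothetical constant; in practice this would come from a model registry.
MODEL_VERSION = "1.0.0"


def service_response(predict_fn):
    """Wrap a scoring function so every service returns the same envelope.

    Centralizing the response shape and error handling in one shared
    decorator is one way to avoid re-writing common functionality
    across services.
    """
    @functools.wraps(predict_fn)
    def wrapper(payload):
        start = time.perf_counter()
        try:
            prediction = predict_fn(payload)
            status, error = "ok", None
        except Exception as exc:  # surface failures in a consistent shape
            prediction, status, error = None, "error", str(exc)
        return {
            "status": status,
            "prediction": prediction,
            "error": error,
            "model_version": MODEL_VERSION,
            "latency_ms": round((time.perf_counter() - start) * 1000, 2),
        }
    return wrapper


@service_response
def score_risk(payload):
    # Stand-in for a real model's scoring logic.
    return 0.5 * payload["miles_driven"] / 10_000
```

Because every service shares this envelope, monitoring, logging, and new-developer onboarding all look the same regardless of which model sits behind the endpoint.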


We are continually looking to improve all our processes, from intake to production.  Learning from experience allows progress, and we are currently investing resources in paying down technical debt and putting standards around how scientists document their business and data understanding, along with other project documentation.  We are thrilled about our current trajectory and the projects we are focused on. 

The final installment of this series will look to the future and our vision for the team; and will reflect on our early mistakes and what we would have done differently knowing what we know now.

I will be presenting on this topic at the Nashville Analytics Summit on Monday, September 9th. Hope to see you there!

Chris Morgan