Ready to start writing your Data Science statement of purpose? Well, it’s your lucky day. This article isn’t just a “how to” guide — it’s an object lesson in so many of the common anxieties grad applicants face every year:
- How do I write about overcoming obstacles (health, low grades, family death) as an undergraduate?
- How do I describe my career path (I’ve been away from school for a while)?
- Should I mention MOOCs in my SOP?
- Can I get admitted with no research experience?
- Can I get admitted if I wasn’t a Data Science or CS major in undergrad?
The sample essay you’re about to read and model is a perfect answer to these questions. Why? Because the applicant, despite his circuitous background and previous academic struggles, earned admission to 6 of the best MSDS programs in the US.
To protect the author’s privacy, we won’t name the schools. But rest assured, it’s a “Who’s Who” list of sterling, fancy-pants universities that you’re definitely also considering. Thus, this isn’t just a brilliant data science statement of purpose — it’s a brilliant SOP in general. The author employed this framework successfully for both DS and CS programs, and honestly, ANY applicant in ANY field can use this essay as inspiration…
…and hopefully achieve the same wild success as my fascinating friend, Bennett.
As an applicant, Bennett ticked a lot of boxes:
- First-gen college student and child of immigrants;
- Undergrad Cognitive Science major at elite state university;
- Modest, less-than-perfect GPA;
- Multiple DS certifications with supplemental CS coursework (essentially self-taught);
- Online Executive MBA graduate;
- 4 years of post-undergrad work experience;
- Extensive work experience during undergrad;
- ZERO research experience.
Some aspects of Bennett’s profile were fascinating. (He was an NSA analyst in undergrad!) Other parts were fairly normal. (No research, average GPA.)
What then made Bennett and his SOP so special? What made top MSDS programs excited to admit him?
The Structure of a Successful Data Science SOP
It goes without saying that Bennett used the SOP Starter Kit to outline his essay. That means he structured the paragraphs as follows:
- Introduction Frame Narrative – 1 paragraph (12% of word count)
- Why This Program – 2 paragraphs (23% of word count)
- Why I’m Qualified – 4 paragraphs (58% of word count…extremely long, but more on this later)
- Concluding Frame Narrative – 1 paragraph (7% of word count)
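To turn those percentages into concrete targets, here is a minimal sketch in Python. The section shares come from the outline above. The 1,000-word limit and the `word_budget` helper are assumptions for illustration only, since Bennett's actual word count isn't given here.

```python
# Section shares (in percent) taken from the outline above.
SECTION_SHARES = {
    "Introduction Frame Narrative": 12,
    "Why This Program": 23,
    "Why I'm Qualified": 58,
    "Concluding Frame Narrative": 7,
}


def word_budget(total_words, shares=SECTION_SHARES):
    """Return an approximate word target per section for a given total length."""
    return {
        section: round(total_words * percent / 100)
        for section, percent in shares.items()
    }


if __name__ == "__main__":
    # Hypothetical limit: swap in the word limit your target program sets.
    for section, words in word_budget(1000).items():
        print(f"{section}: about {words} words")
```

Treat the output as a rough guide rather than a rule. As discussed later, most applicants should shift words away from “Why I’m Qualified.”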
Before we read the actual essay, let’s examine these sections and see how you can mirror Bennett’s example in your own SOP.
- Introduction Frame Narrative
In the intro, Bennett describes his work as a software engineer. He gets specific. He tells us exactly what he does, and the company he does it for. Most importantly, he describes a moment when he discovered a new intellectual purpose at work:
“Thus, for the first time, I was able to personalize parameters in the pipeline for unaccounted customers. Learning the importance of context for efficient yet equitable automation, I found myself incredibly curious about data-modeling methodologies that can truly represent real-world situations.”
You should do the same as Bennett. Your intro should have some color, some life. It should allow us to see a real human being in there. But it MUST also introduce the sub-niche intellectual problems you hope to tackle in grad school. Chances are, these problems and this sub-niche will define your professional career afterward. They’re the hinge of your whole candidacy.
Common Question #1: “What if I don’t know which sub-niche I want to specialize in?”
Find one. (The SOP Starter Kit has an exercise that will help you figure this out.) Otherwise, you won’t be as competitive as you could be.
Common Question #2: “What if I don’t have an interesting moment (or moments) to write about?”
Stop lying to yourself. No matter where you are, no matter what you’ve done, there was a moment when you decided you needed a graduate degree. There is absolutely a subfield of data science that’s most interesting to you. There are undoubtedly specific applications, in specific industries, you want to work on in the future. How did you discover them?
Bennett wants to study representation of data minorities in ML models for the healthcare industry. That’s the work he wants to do in the future. What kind of work do you want to do in the future? When did you realize this?
That’s the story you tell in your Introduction.
- Why This Program
Either at the end of your introduction, or in the beginning of this new section, you’re going to include a Sentence of Purpose. It’s a thesis statement for your essay. Bennett’s looks like this:
“Through Gotham University’s Master’s program in Data Science, I hope to further explore how to enhance representation of data minorities in ML models, and thus ensure inclusive healthcare access for the customers I serve.”
The “Why This Program” section of your SOP provides all the evidence for how you’ll pursue this goal in grad school. It should take about 2 paragraphs. Which classes will you take? Which professors do you hope to work with? What will you study in your capstone project?
Let’s make this easy. Just complete the exercise in this article: How to Dominate Your SOP’s Why This Program Section. Trust me, it’s that easy! Then, you’re halfway done with your essay.
- Why I’m Qualified
This section of your SOP is the easiest to write. It’s your “greatest hits” list – all the proof that you’re a smart student. Everything you write here should support the argument that you’re going to succeed in grad school: your GPA, advanced classes you’ve taken, research experience, etc. It doesn’t have to be long, and shouldn’t include every menial detail of every project you’ve ever done. (That’s what the CV is for.)
Yet, if it shouldn’t be long…why did Bennett write 4 paragraphs?!
Typically, I’d yell and scream at an applicant who spends half the SOP talking about his past credentials. That’s what most applicants do, and why most get rejected.
But Bennett had a unique situation.
His career was wild and fascinating. He’d never formally studied Data Science. He’d even done an MBA. But he had taken lots of MOOCs and online certification courses (seriously, like 10+), he did have amazing experience as a software engineer, and he also had one bad undergrad semester he felt he needed to explain. Thus, he’s a very atypical applicant, and his background required a lot of explaining.
Unless you too have an MBA, 10+ MOOCs, and a completely unrelated major, I suggest you keep your “Why I’m Qualified” section much shorter: 2 paragraphs is enough.
- Concluding Frame Narrative
If you have the previous sections in order, this final paragraph should write itself. Make sure to reemphasize the topical problem (your hopeful subfield) from the Intro Frame Narrative. Consider including a career goals statement. But in the end, this section should be easy to write.
That’s it. Four sections, tightly interwoven, all supporting the argument that you are going to be an A+ data science grad student. Now, let’s see how Bennett brought it all together, so you can attempt to do the same.
A Brilliant Data Science Statement of Purpose
As a software engineer with WayneHealth Group, I maintain data pipelines and batch processing in the modernization team. In 2020, following a health check on existing infrastructure, I discovered that pipelines were delivering data too slowly to clients. After comparing our runtimes to industry standards, I pitched a project using open-source Apache Airflow to help automate pipelines and centralize patient data into a single workflow. However, when considering how to automate 35% of the data, I learned from the billing team how frequently bills are refinanced in our long-term elderly care programs. Thus, for the first time, I was able to personalize parameters in the pipeline for unaccounted customers. Learning the importance of context for efficient yet equitable automation, I found myself incredibly curious about data-modeling methodologies that can truly represent real-world situations.
Through Gotham University’s Master’s program in Data Science, I hope to further explore how to enhance representation of data minorities in ML models, and thus ensure inclusive healthcare access for the customers I serve. Earning my MBA at Metropolis University taught me how to coordinate the need for quantitative reasoning and human intuition through A/B testing, and I believe the MSDS program will build on that foundation. Mathematical Foundations in Computer Science, for example, will help me build real-time analytics dashboards that account for insurance claim data-entry errors through discrete probabilistic models. In the same vein, elective offerings such as Big Data Analytics and Artificial Intelligence will enable me to choose predictive models and evaluate their accuracy when applied to large data sets — particularly useful when predicting whether an insurance claim will necessitate revisions.
Resources like the IGNITE competition will also offer opportunities to collaborate on flexible models that solve real-world situations. Having worked on Apache Airflow implementation in WayneHealth, I understand how collaboration can play a key role in implementing a new idea. Having my IGNITE team’s project evaluated by MSDS professors, with their expertise in modular design and user experience, will only help me evaluate my own performance as I translate my education into functional healthcare applications. Thus, I am certain that Gotham’s MSDS program will prepare me to succeed in a team setting that balances many developer roles, while equipping me to better deliver sales pitches to investors.
Upon graduating, I endeavor to apply my education toward applied healthcare projects that focus on providing easy access to preventative care. Transparency is an integral part of healthcare access because it reduces the expenses and time necessary to find patient care. To help facilitate this transparency, I plan to transition into a Senior Data Scientist role in the Emerging Technologies Collaborative (ETC) at WayneHealth, hopefully working on projects that implement data-driven recommendations for our automated batch processes and servers. When storing vast amounts of patient data across different platforms, vulnerability patches and triage alerts often lead to reactive outcomes that can create downtime for end users. As a result, I seek to implement agentless server monitoring to, first, predict unscheduled outages for our billing and medical coverage systems, and second, recognize patterns in server behavior. Helping recognize outage patterns will not only help me identify problems beforehand, but also decipher the causes of live servers crashing. However, projects outside of WayneHealth excite me as well, including Amgen’s Crystal Bone algorithm which uses AI and machine-learning models to detect bones at risk for osteoporotic fractures. This project was the first tool I have ever seen that uses diagnostic codes sourced from WayneHealth electronic health records, and it inspired me to create my own model using EHR data. In the future, I hope to use EHR diagnostic codes to predict the cost of treatment for those prone to risk, as indicated by the algorithm.
Not only has my position as a software engineer equipped me with strong technical skills, but it has also given me the discipline to continuously learn what I do not yet know. On a project named Karra, an optical-character recognition engine which scans personal information from faxed hospital claim forms, I learned how to develop my own algorithms to calculate the coordinates of form fields to parse data. The technical skills I have gained, in tandem with the unwavering tenacity I developed in this position, will allow me to face any challenge that arises during the MSDS program.
To further prepare for the rigors of the MSDS program, I completed University of Pennsylvania Engineering MOOCs on Coursera, including the Introduction to Python and Java specialization taught by Brandon Krakowsky and the course Computational Thinking for Problem Solving by Susan Davidson. These MOOCs helped me comprehend important programming paradigms such as unit testing and debugging, which will help me test edge cases in MSDS course projects. Also, MOOCs from UC San Diego, such as Python for Data Science and Probability and Statistics in Data Science Using Python, enabled me to optimize data-cleansing techniques for better runtimes. The MSDS program’s Big Data Analytics course will culminate this self-learning effort, providing a solid theoretical understanding of the tools and techniques used to extract insights from large datasets.
While I have taken on a breadth of challenging problems in computer science and implemented solutions at WayneHealth, my prior undergraduate performance did not always reflect my best ability. Between Spring 2016 and Spring 2017, I experienced a personal health challenge that required substantial time away from the UC Coast City campus. I was further distracted by the realities of personally financing my education – working full-time for the National Security Agency (NSA) – while also suffering the loss of a close family member. Even as I struggled I knew the importance of higher education, and, advocating for my own success, I persisted. To strengthen my educational background, I enrolled in online courses and built coping mechanisms, such as managing my time between online courses and on-campus courses efficiently. In the end, these efforts helped me graduate early in the fall of 2018, and I plan to apply the same level of resilience throughout the rest of my academic and professional career.
As I grow increasingly aware of the intersection between ML and social computing, I am determined to study learning techniques such as principal component analysis, and to perform research in data organization/completeness. With my strong self-guided background in applied computer science, and my professional experience with ML and software development in the healthcare insurance industry, the practical knowledge I build at Gotham will help me make voices heard in the data we interact with in our daily lives.
What Makes This SOP Truly Special?
Some might argue that Bennett’s essay doesn’t fit the template described in the SOP Starter Kit. I disagree. The virtuosity of Bennett’s writing shows that the model is adaptable to all kinds of intellectual demands.
(In fact, he’s pointed out himself that the framework helped beautifully with his Computer Science SOPs, which should give confidence to anyone who may be still deciding between DS and CS.)
Personally, I love how Bennett began his Why I’m Qualified section with an expanded Career Goals Statement. It shows us, in painstaking detail, exactly what he’s going to achieve if the school admits him:
“Upon graduating, I endeavor to apply my education toward applied healthcare projects that focus on providing easy access to preventative care.”
There are real data problems in the healthcare insurance industry. Bennett is all too familiar with them. Few if any other applicants will ever be able to solve these problems the way he will. We know this because he tells us exactly what he’s going to do in his career afterward:
- Pursue a Senior Data Scientist role in his company;
- Automate batch processes and servers to predict unscheduled outages in medical coverage systems;
- And use EHR diagnostic codes to predict treatment costs for high-risk patients.
In this way, Bennett’s expansive, thoughtful SOP makes certain that he isn’t just a boring applicant looking to acquire base knowledge in data science. He already has it! He got it for free from Coursera!
Instead, it shows that he’s deadly focused on his unique sub-niche — solving real data problems in the healthcare insurance industry — and will do everything it takes to succeed. Thus, when Bennett discusses the many obstacles he overcame in the past, we don’t worry about them. We have tremendous confidence in Bennett because he’s already succeeded. He’s already acquired great expertise. And he knows exactly what he needs to do to make an impact in the future.
Though the middle paragraphs are somewhat long, they never feel boring or clunky. They feel intelligent and interesting. Finally, when we get to the last paragraph, we can’t help feeling certain of one thing: “Wow, this guy is unstoppable.”
As you start planning your own data science statement of purpose, there’s one aspect of Bennett’s essay you should mimic. It’s not the MOOCs, his atypical background, or the obstacles he overcame. It’s this:
Bennett mapped out the intellectual problems he wants to study in grad school, and how he will address them pragmatically in his career afterward. It’s not a complex argument:
- In the last few years, I’ve grown fascinated with Problem X in Industry Z;
- At Gotham University, I plan to study Problem X in these specific ways;
- After graduating, I will be able to solve Problem X for companies in Industry Z;
- I know I’m capable of this because of my skills and record of success;
- Admission to Gotham is my immediate and necessary next step, so I hope we can begin solving these problems together.
I offer endless gratitude to Bennett for allowing me to share his story, his brilliant essay, and his resounding success. Data Science, Analytics, and Applied Statistics have become insanely competitive. But if you take the time to follow his example, you too can become a champion in the field, and start your journey toward solving unique problems that the world desperately needs you to solve.
Which data science problems do you plan to solve in grad school and beyond?