How to turn a mass of text and emails into data?
A few months ago, I was firmly convinced that data was a neatly organized set of numerical and categorical values arranged in a table.
Yeah, I have to confess: I was completely wrong, and I changed my mind…
Before my data science training program, I had heard about organizing text and even pictures and using them as a source of data. But it all sounded like super complicated rocket science. Quantum chemistry looked easier to understand than extracting data from thousands of random emails.
The truth is that it is no more complicated than writing my first VBA code 15 years ago. All you need is a mythical zoo and the courage to start. From idea to chart, it took 30 minutes to write the code (under 250 words) and see some funny statistics about the content of my spam folder (yeah, I used the ads and spam emails). And the highlight: the top-performing word is “Important”, but “Please” only came in fourth, behind “Shop”. Not too bad.
Step 1: Check your text, emails, notes and reports
Have you got a range of notes, reports and email communication?
That’s more than enough! It’s all about being creative at the beginning and experimenting with the information you have. Ask the IT team to download the emails from the engineering team, and start the fun part.
Want to take it to the next level?
Create an email template (standardize your communication).
Data Privacy and Legal Disclosure
Always check with the HR department what your team’s employment contracts say. They should contain some information about how the company may use employee data (including emails). Your project must comply with this; otherwise, you need to have an open, honest and positive discussion about the goals and the scope of the data project.
Find the balance
Don’t forget to tell your team what purpose you will use their emails for. If there is a major concern, brainstorm together how the emails can be used, and agree on a cut-off date from which they are happy to contribute. You can also agree on an ID word for “private” or “off-business” messages to put in the subject line, which can then be used as a negative filter. Encourage open and transparent communication.
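If your emails are exported as standard email files, that negative filter and the cut-off date could be applied with a few lines of Python. This is a minimal sketch, not the code behind this article: it uses only the standard library `email` package, and the subject tags `[PRIVATE]` and `[OFF-BUSINESS]`, the cut-off date and the `keep_message` helper are hypothetical placeholders for whatever you agree on with your team.

```python
from datetime import datetime, timezone
from email import message_from_string
from email.message import EmailMessage
from email.policy import default
from email.utils import parsedate_to_datetime

# Hypothetical subject tags agreed with the team as a negative filter
EXCLUDE_TAGS = ("[PRIVATE]", "[OFF-BUSINESS]")
# Hypothetical cut-off date: only emails sent on or after it are used
CUTOFF = datetime(2024, 1, 1, tzinfo=timezone.utc)


def keep_message(msg: EmailMessage) -> bool:
    """Return True if an email may be used in the data project."""
    # Upper-case the subject so the tag match is case-insensitive
    subject = str(msg["Subject"] or "").upper()
    if any(tag in subject for tag in EXCLUDE_TAGS):
        return False  # the sender flagged it as private or off-business
    if msg["Date"] is None:
        return False  # no Date header, so the cut-off cannot be checked
    sent = parsedate_to_datetime(str(msg["Date"]))
    if sent.tzinfo is None:
        sent = sent.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
    return sent >= CUTOFF


raw = "Subject: [Private] Lunch on Friday?\nDate: Mon, 15 Jan 2024 10:00:00 +0000\n\nSee you!"
print(keep_message(message_from_string(raw, policy=default)))  # False
```

Because the subject is upper-cased before matching, a sender who types “[Private]” is still filtered out.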
Step 2: Get your code
I have Anaconda, where I used Jupyter to write my Python code with Pandas and BeautifulSoup. It felt like a day out at the zoo, LOL.
It is indeed far less complicated than it sounds… just import, clean and parse the data, then display the results.
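As an illustration of the import, clean and parse step, here is a minimal sketch that turns one email into plain text. It is not the author’s notebook: to stay dependency-free it swaps BeautifulSoup for the standard library’s `html.parser`, and the helper names `html_to_text` and `email_body_text` are made up for this example. With BeautifulSoup installed, `BeautifulSoup(html, "html.parser").get_text(" ")` does roughly the same job as `html_to_text`.

```python
import re
from email import message_from_string
from email.message import EmailMessage
from email.policy import default
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the visible text of an HTML fragment, skipping <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Strip the tags from an HTML string and keep only its text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def email_body_text(msg: EmailMessage) -> str:
    """Return the body of an email as clean, single-spaced plain text."""
    # get_body() picks the best body part; prefer plain text over HTML
    part = msg.get_body(preferencelist=("plain", "html"))
    if part is None:
        return ""
    content = part.get_content()
    if part.get_content_subtype() == "html":
        content = html_to_text(content)
    # Collapse runs of whitespace left over from line breaks and markup
    return re.sub(r"\s+", " ", content).strip()
```

`email_body_text` uses the `text/plain` version of a message when one exists and only strips the HTML version as a fallback, which matters for ads and newsletters that often come in both formats.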
The most important steps to consider:
- Find a platform and programming language that your team already knows or is willing to learn the basics of.
- Dedicate a folder to your raw data and download it there: reports, emails, etc.
- Write the code according to your needs: you can create a web-based dashboard to see the visualization, or simply a script to run on demand. But most importantly, define the output you wish to see.
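For the dedicated-folder step, a sketch like the following could load all the raw emails at once. It assumes, hypothetically, that each email was exported as a separate `.eml` file into one folder; other export formats (for example Outlook `.msg` files or a single mbox file) need different tools, such as the standard library’s `mailbox` module for mbox. The `load_emails` helper is made up for illustration.

```python
from email import policy
from email.parser import BytesParser
from pathlib import Path


def load_emails(folder: str) -> list:
    """Parse every .eml file in a folder into an EmailMessage object."""
    messages = []
    # Sort the paths so the result order is stable between runs
    for path in sorted(Path(folder).glob("*.eml")):
        with path.open("rb") as fh:
            # policy.default returns EmailMessage objects with the modern API
            messages.append(BytesParser(policy=policy.default).parse(fh))
    return messages
```

For example, `load_emails("raw_data/emails")` (a hypothetical folder name) returns the messages sorted by file name and ignores any other files in the folder.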
Step 3: Understand your data
Here we are: drawing conclusions, understanding the results, and accepting them if they are not what you expected.
In this case, I visualized the results in a word cloud. It is a great tool for a weekly report of the keywords: the hot topics. You can also retrieve a list of the top 50 words and their number of occurrences.
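A top-50 list like this can be produced with nothing more than a word counter. The sketch below is a minimal, assumption-based version, not the author’s exact code: the tiny stop-word list, the rule that drops words of one or two letters, and the helper name `top_words` are illustrative choices you would tune for your own data.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; extend it for your own language and domain
STOP_WORDS = {
    "the", "a", "an", "and", "or", "to", "of", "in", "on", "for", "is", "are",
    "you", "your", "we", "our", "this", "that", "with", "it", "be", "at", "by",
}


def top_words(texts, n=50):
    """Count words across all texts and return the n most common (word, count) pairs."""
    counts = Counter()
    for text in texts:
        # Lower-case and keep runs of letters only (digits and punctuation are dropped)
        words = re.findall(r"[^\W\d_]+", text.lower())
        # Skip stop words and very short words (one or two letters)
        counts.update(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return counts.most_common(n)
```

If you draw the word cloud with the third-party `wordcloud` package, its `WordCloud.generate_from_frequencies()` method accepts a dictionary of counts, so `dict(top_words(texts))` can be passed to it directly.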
Already standardized your reports? Great: create data tables and analyze the data without time-consuming copy-pasting. Dedicating well-defined “fields” in your emails to specific information can be a game changer:
- Faster to report an event
- More clarity in the communication
- Super easy data analysis
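To illustrate, here is a minimal Python sketch assuming a hypothetical template in which each email contains one `Field: value` line per field (the names `Event`, `Site` and `Severity` are made up for the example). It pulls those fields out with a regular expression and writes one CSV row per email, a table that Pandas can then load with `pandas.read_csv`.

```python
import csv
import io
import re

# Hypothetical field names of a standardized email template
FIELDS = ("Event", "Site", "Severity")
# One "Field: value" per line; MULTILINE lets ^ and $ match at each line
FIELD_RE = re.compile(
    r"^(%s):[ \t]*(.*?)[ \t\r]*$" % "|".join(map(re.escape, FIELDS)), re.MULTILINE
)


def parse_fields(body: str) -> dict:
    """Extract the template fields from an email body; missing fields stay empty."""
    row = dict.fromkeys(FIELDS, "")
    for name, value in FIELD_RE.findall(body):
        row[name] = value
    return row


def to_csv(bodies) -> str:
    """Turn many email bodies into one CSV table, one row per email."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for body in bodies:
        writer.writerow(parse_fields(body))
    return buffer.getvalue()
```

Greetings, signatures and other free text around the fields are simply ignored, so people can still write naturally as long as the field lines follow the template.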
Do you have a question or need help?
Here to back you up! Have a question or facing a challenge? Send us your question, and we will do our best to answer promptly. Need more than a quick chat for your complex issue? Reach out for a complimentary discovery meeting, where we can explore how Clover Consulting can assist you.