Machine learning vs data mining and harvesting – know what’s right for your web app!


Welcome to today’s episode of Buzzword Bingo! Or, put another way, let’s demystify some confusing terminology that some developers, and certainly the big consultancies, will use to convince you that they practice secret and impenetrable dark arts that justify a huge development team and an inflated bill.

Alongside IoT, the terms Machine Learning (Artificial Intelligence, Deep Learning), Data Analytics, Data Science, Data Mining, Big Data, Data Pipelining and Data Harvesting are thrown around with wild abandon, often interchangeably, often by salesmen, and often by salesmen who are parroting other salesmen. They typically have no real idea what they are talking about, or that these terms, although they belong to related fields, each mean something tangibly different.

If you have read other articles in this series, you may remember my BIG SECRET about IoT (TLDR – Most IoT devices are not directly connected to the Internet); well here is another BIG SECRET, not as short, but equally pithy.

A lot of people in technical sales are B.S. artists.

Their primary talents are relationship management and negotiation. Let’s not disparage these; they are proper and highly valuable. However, the group of people who combine strong interpersonal skills, strong commercial negotiation skills and strong technical skills is tiny (that’s a little bit of data science right there).

Nothing strikes terror into the heart of a technical salesman like a customer who knows more about the fundamentals than they do (trust me, I have sat on both sides of the table). And the amount of background knowledge required to move them from a position of comfortable B.S. to abject, gibbering, defensive truth is typically no more than a few hours of study (the secret of my career: always know a little bit more). If you are kind, you move them early, rather than letting them hang themselves first.

Try this if you want to see them twitch: “You keep referring to this as a classification study, but we are talking about regression, aren’t we? And by the way, that curve looks badly overfitted; how did you select your validation set?”

It may well be that the meeting ends early and the very next meeting will include a proper specialist who actually knows enough to give an honest view of the subject, warts and all.

So let’s draw back the curtain on a few of the terms you may hear flying across the table or the Zoom session. We are going to look briefly at Data Harvesting, Data Mining, and Machine Learning. To put all of these in context we should also look at Data Pipelines, but that is a subject for another day. Let’s build some foundations.


Machine Learning

This is a subset of the greater Artificial Intelligence domain and refers to algorithms that improve their performance automatically through exposure to (typically large) sets of data, which are labeled in the kind of example described here. Or, put another way, computer programs that learn by experience. As an example, you may want a video camera that can tell the difference between cats and dogs (this is a classic problem in AI courses). For a human to write a program that accurately discriminates between cats and dogs would be HARD. However, if we have a big library of pictures of cats and dogs (which are labeled accurately as “Cat” or “Dog”), we can show a Neural Network each of these pictures in turn and end up with a Trained Classifier that can reliably perform this task. Embed this classifier into your camera and off you go.
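
To make “learning by experience” concrete, here is a minimal Python sketch. It is a toy built on loud assumptions: instead of a Neural Network looking at pictures, it swaps in a much simpler nearest-centroid classifier, and the two features (weight and snout length), the numbers and the function names are all invented for illustration. The shape of the process is the same, though: labeled examples go in, a trained classifier comes out.

```python
from math import dist

# Hypothetical labeled training data: (weight_kg, snout_length_cm) -> label.
# A real system would learn from image pixels; these two hand-picked
# features are an assumption made only to keep the sketch small.
TRAINING_DATA = [
    ((4.0, 2.0), "Cat"),
    ((3.5, 1.8), "Cat"),
    ((5.0, 2.2), "Cat"),
    ((20.0, 9.0), "Dog"),
    ((30.0, 11.0), "Dog"),
    ((12.0, 7.0), "Dog"),
]


def train(examples):
    """'Training': compute one centroid (mean feature vector) per label."""
    sums = {}
    counts = {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: tuple(total / counts[label] for total in acc)
        for label, acc in sums.items()
    }


def classify(model, features):
    """'Inference': return the label whose centroid is nearest."""
    return min(model, key=lambda label: dist(model[label], features))


model = train(TRAINING_DATA)
print(classify(model, (4.2, 2.1)))    # a small animal with a short snout
print(classify(model, (25.0, 10.0)))  # a large animal with a long snout
```

Note that nobody wrote a rule such as “dogs are heavier than 10 kg”. The boundary between the two classes comes entirely from the labeled examples, which is the point of the exercise.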

Fun fact: Training a classifier is often VERY compute-intensive, whereas running a trained classifier can often be done by low-performance devices on the edge. Welcome to the Intelligent Internet of Things.


Data Harvesting

This is the process of acquiring data, often for training a machine learning model but occasionally for other purposes. Typically, data harvesting means extracting data from one or more third-party websites using web scraping techniques.
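
As a rough illustration of what web scraping boils down to, the sketch below pulls prices out of a fragment of HTML using only Python’s standard library. The HTML, the `price` class name and the `PriceScraper` class are all hypothetical; a real harvester would first download the page from the target site (and should respect that site’s terms of use), whereas here the page is hard-coded so the example runs offline.

```python
from html.parser import HTMLParser

# Hypothetical page fragment standing in for a downloaded third-party page.
SAMPLE_HTML = """
<ul>
  <li class="product">Widget <span class="price">9.99</span></li>
  <li class="product">Gadget <span class="price">24.50</span></li>
</ul>
"""


class PriceScraper(HTMLParser):
    """Collects the text of every <span class="price"> element as a float."""

    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "span" and ("class", "price") in attrs:
            self.in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_price = False

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(float(data.strip()))


scraper = PriceScraper()
scraper.feed(SAMPLE_HTML)
print(scraper.prices)
```

The harvested list of prices is raw material. What you do with it next is where Data Mining or Machine Learning comes in.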

Data Mining

This is another loosely defined term, but it typically means analyzing large data sets (for instance, those that may have been built using a Data Harvesting process) to extract useful information. The term may also be conflated with “Analytics.” For example: “This customer data set contains lots of zip codes. I wonder if we have any geographic clusters of customers whom we could target with an advertising campaign.”
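
The zip code question can be sketched in a few lines of Python. This is a deliberately crude version under stated assumptions: the customer list is made up, the `zip_clusters` function is hypothetical, and sharing the first three digits of a zip code is treated as a rough proxy for being in the same area. A serious study would use proper geographic data and a real clustering method.

```python
from collections import Counter

# Hypothetical customer zip codes; real data would come from your own records.
CUSTOMER_ZIPS = [
    "94103", "94107", "94110", "10001",
    "94117", "60614", "10003", "94102",
]


def zip_clusters(zip_codes, prefix_len=3, min_customers=2):
    """Group zip codes by prefix and keep groups with enough customers.

    Assumption: a shared prefix means roughly the same geographic area.
    """
    counts = Counter(code[:prefix_len] for code in zip_codes)
    return [
        (prefix, n)
        for prefix, n in counts.most_common()
        if n >= min_customers
    ]


clusters = zip_clusters(CUSTOMER_ZIPS)
print(clusters)
```

Each pair in the result is a zip prefix and the number of customers who share it, largest group first. Groups near the top of the list are the candidates for a targeted campaign.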

The field of data, from acquisition through to analysis, can appear confusing, and again there are many charlatans out there. A trustworthy and experienced guide can help you navigate the minefield and provide good-value services that extract the maximum value from your data assets.



About Umbric Data Services

Forget knowledge; data is power – especially when hooked up to custom web applications leveraging the latest in big data, machine learning, and AI to deliver profitable results for business.

Umbric Data Services combines the latest in tech with good old-fashioned customer service to deliver innovative, efficient software that drives productive business insight and revenues at the click of a button. Isn’t it time you worked smart, not hard? Find out more about how we help businesses to grow – visit today.
