Why Programmatic Labelling?
Programmatic labelling is a fast way to dramatically increase the size and quality of your datasets, thus improving your models.

Programmatic supervision overcomes cold-starts

The first labels on a new project are often the hardest to get; this is the stage with the most uncertainty and experimentation. Programmatic supervision lets you quickly build a decent first dataset, which you can then use with tools like active learning to rapidly train and improve models.

A big weakly-supervised dataset is just as good as a small labelled dataset

One of the neat results proved in the original weak labelling paper is that the performance of weak labelling improves with more unlabelled data at the same rate that supervised learning improves with more labelled data!
Once you have a good set of labelling functions, adding more unlabelled data keeps improving performance. Compare that to supervised learning, where getting better performance means labelling more data, which may require expert time.
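To make this concrete, here is a minimal sketch of what labelling functions can look like, using the open-source Snorkel library that grew out of the original weak labelling paper (the sentiment task, function names, and keywords are illustrative assumptions, not from the paper):

```python
from snorkel.labeling import labeling_function

# Label constants: a labelling function can also abstain when its
# heuristic doesn't apply to an example.
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

@labeling_function()
def lf_contains_great(x):
    # Heuristic: reviews that mention "great" are probably positive.
    return POSITIVE if "great" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_mentions_refund(x):
    # Heuristic: asking for a refund suggests a negative review.
    return NEGATIVE if "refund" in x.text.lower() else ABSTAIN
```

Each function is cheap to write and individually noisy; the power comes from combining many of them over a large pool of unlabelled data.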
Normally, when you train a machine learning model like logistic regression, you can expect your test-set error to shrink as you collect more labelled data. You can say even more than this: the test-set error shrinks at a rate of O(1/n) for n labelled data points. So if you want to reduce your test error you have to label a lot of data, and each extra data point (selected at random) improves the model less and less.
When you train a machine learning model with weak supervision, your expected test-set error still shrinks as O(1/n), but now n is the number of unlabelled data points! This is fantastic because you can use a small amount of expert annotator time to come up with good rules, and then apply those rules to a huge dataset.
Intuitively, a small but very clean dataset is equivalent to a large but noisy one. Weak supervision gives you a very large, slightly noisy dataset.
A large noisy dataset can match the performance of a small clean dataset.
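Continuing the sketch above, one way to turn labelling functions into that large, slightly noisy dataset is Snorkel's label model, which combines the functions' votes into probabilistic labels (the dataframe and its text column are hypothetical placeholders):

```python
import pandas as pd
from snorkel.labeling import PandasLFApplier
from snorkel.labeling.model import LabelModel

# Hypothetical unlabelled corpus; in practice this is where scale comes
# from: the more rows, the better the resulting training set.
df_unlabelled = pd.DataFrame(
    {"text": ["great product, works well", "i want a refund", "arrived on time"]}
)

# Apply the labelling functions defined earlier to every example,
# producing a (num_examples, num_lfs) matrix of votes.
applier = PandasLFApplier(lfs=[lf_contains_great, lf_mentions_refund])
L_train = applier.apply(df=df_unlabelled)

# The label model estimates each function's accuracy from the votes alone
# (no ground truth needed) and emits one probabilistic label per example.
label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train=L_train, n_epochs=500, seed=123)
probs = label_model.predict_proba(L=L_train)
```

The probabilistic labels can then be used to train any downstream model, just like hand-labelled data.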

Programmatic supervision makes much better use of expert time

In NLP you often need domain expertise to provide the annotations, and it's also often the case that you simply can't get much of a domain expert's time.
For example, if you want to extract information from legal documents, you somehow need to persuade lawyers to annotate data. Often the domain is so specific that you don't just need any lawyer, but a lawyer from a specific practice area.
You need a way to get more labels out of a fixed amount of expert time, and weak supervision provides one: in a short session with a domain expert, you can brainstorm a lot of heuristic rules and then rely on the availability of unlabelled data to solve your problem.
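For example, the output of such a session might be nothing more than a per-class keyword list, which you can turn into labelling functions programmatically (a sketch using Snorkel's LabelingFunction factory; the legal keywords are invented for illustration):

```python
from snorkel.labeling import LabelingFunction

RELEVANT, ABSTAIN = 1, -1

# Hypothetical output of a short brainstorming session with a lawyer:
# terms whose presence suggests a clause concerns liability.
expert_keywords = ["indemnify", "liable", "negligence", "hold harmless"]

def keyword_lookup(x, keyword):
    return RELEVANT if keyword in x.text.lower() else ABSTAIN

def make_keyword_lf(keyword):
    return LabelingFunction(
        name=f"lf_keyword_{keyword.replace(' ', '_')}",
        f=keyword_lookup,
        resources={"keyword": keyword},
    )

# A few minutes of expert time becomes one labelling function per keyword.
lfs = [make_keyword_lf(k) for k in expert_keywords]
```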
Similar issues with expert availability come up in the medical domain, where again the only people who can do the annotation may be working full-time jobs. If you can only get a few hours a week of someone's time, weak labelling is a better use of that time than pure manual annotation.
This study showed that, with weak supervision, a single expert given one day was able to match the performance of a team of labellers working over several weeks.