Good evening everyone, and thank you for joining us for tonight's BCVA webinar. My name is Sarah Peterson from the BCVA board, and I'll be chairing the webinar tonight. Our speaker, Chris, is happy to remain online for questions, so please type any that you may have in the Q&A box during the webinar, and I'll save your questions for the end of the presentation.
If you have any technical difficulties, then we've got Peter from The Webinar Vet on hand, so please let us know by using the Q&A box and we'll do our best to assist you. If you can't see the Q&A box, then if you move your mouse, the taskbar should become visible at the bottom of the screen. So it's my pleasure to introduce Chris Hudson as tonight's speaker.
Chris qualified from Bristol Vet School in 2002 and has worked in vet practice and academia ever since. He holds the RCVS Certificate and Diploma in Cattle Health and Production, and is also an RCVS recognised specialist in this area, as well as holding a PhD for work on factors associated with fertility in UK dairy herds. He's got many years of experience of delivering consultancy to dairy herds across the UK around all aspects of herd health and efficiency.
Particular interests include measuring and monitoring dairy herd performance. His research interests focus on deriving maximum value from routinely recorded farm data, and on using simulation to aid decision making. So that makes Chris perfectly placed to talk about our topic tonight, which is the big data revolution and the cattle vet.
Over to you, Chris. Thanks, Sarah. Thanks very much for the kind intro.
You did a very good job of cutting and pasting my biography into your intro. I like it very much. So, good evening, everyone.
Thanks very much for joining us. I seem to spend an awful lot of my time on video conference now, but at least you're spared having to look at my face and you can just focus on the slides instead. So this evening's talk is around big data, and how that affects us as cattle practitioners.
So this is really just some thoughts about where we are at the moment, how we've got here, and where I think some of this stuff might be leading us in the future. I think in many ways, a lot of the things that people think of as being big data, or that in other parts of society are very tied up with big data, are actually things that we're quite used to doing as cattle vets. So you're probably already using far more big data skills than you realise.
So we'll talk about some of those things. I'll hopefully give you a little bit of insight into some of the stuff that happens behind the scenes in big data.
So how the data itself gets turned into information, how sometimes that works well and sometimes it doesn't. So how do we get value from data? And at the end, we'll think about where the future value is to us as cattle vets in big data.
So what's in big data for us and for our farming clients? So just a few slides of introduction to start off with. What is big data is kind of a key initial question to address.
It's a perpetual disagreement between me and Martin Green in our group whether this should be "what is big data" or "what are big data". Martin's a grammatical purist and thinks that data should always be plural. "What are big data" just sounds bizarre to me.
So when people first came up with the concept of big data, a very early kind of landmark paper in this field defined what they called the four Vs that they considered to define something as being big data. They kind of pair up in pairs for me. So the first two Vs are volume and velocity.
These are very related to each other: volume is just the absolute size of the data set, and velocity is how quickly it accumulates. So obviously, they're quite strongly related to each other.
So I think in the cattle world, we would definitely have to acknowledge that by most modern definitions, our data isn't that big in terms of volume. Even the most heavily recorded herds are very, very small in terms of raw data compared to things that come under the heading of big data in other aspects of life. So it's rare for farms to have more than a few gigabytes' worth of data, even going back over the whole history of the business and even including lots of sensor data and other things that take up quite a lot of space.
And yet in bigger-picture, big data terms, that's really very small. So compared to data generated by social media traffic and interactions, big image analysis data sets, and big bioinformatic genetic-type data sets, the data that we work with is actually not that big in absolute terms. But people realised fairly early on in the days of big data that trying to define it by the absolute number of gigabytes or terabytes or petabytes that you're looking at is actually not always that helpful, because inherently there are some industries that just do not generate that much data. There aren't that many data points. There aren't that many things you can collect about them.
I guess beef and sheep compared to dairy would be an example there: in beef, there are just fewer occasions on which you can usefully collect information about an animal's performance compared to dairy. People also quickly came to realise that volume is kind of mutable. So what was big data 20 years ago is absolutely not big data now, and a lot of the early definitions of volume were really around the computing capacity to deal with it.
So initially, people thought of this as being kind of "it's big data if it's too big for your computer to deal with"; again, that changes massively over time. So using volume and velocity as really defining features of big data has become really problematic, and now there isn't really a volume barrier to calling something big data.
The velocity thing: that's just an example up there to give you an idea. So even a very heavily monitored cow, a cow with sensors recording several different aspects of what she's doing every few seconds through the day, is only going to produce a few megabytes' worth of raw sensor data per week. So these are not huge volumes.
And for us as clinicians, that's actually really helpful: not having to use some of the methods for dealing with and analysing genuinely big data makes this stuff much more accessible and much easier for us to do.
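As a rough sanity check on that "few megabytes per week" figure, a back-of-envelope calculation looks something like this. The sampling interval and bytes-per-reading are assumed, purely illustrative numbers, not figures from any real sensor system:

```python
# Back-of-envelope check on "a few megabytes of raw sensor data per week".
# Assumptions (illustrative only): one reading every 5 seconds, and
# roughly 20 bytes per reading across a handful of sensor channels.
SECONDS_PER_WEEK = 7 * 24 * 3600
READING_INTERVAL_S = 5
BYTES_PER_READING = 20

readings_per_week = SECONDS_PER_WEEK // READING_INTERVAL_S
megabytes_per_week = readings_per_week * BYTES_PER_READING / 1e6

print(readings_per_week)    # 120960 readings
print(megabytes_per_week)   # about 2.4 MB: "a few megabytes", as the talk says
```

Even with a much higher sampling rate the weekly total stays far below anything that would strain a modern computer, which is the point being made about volume.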
The other two of the original four Vs also kind of pair up quite nicely. These, I think, are really in many ways classic challenges of doing herd health work as a cattle vet. So veracity refers to data quality. Data's got to be accurate and reliable if it's going to be useful and if it's going to unlock value.
We know that we've been dealing with this for a very, very long time in the cattle industry. So classic data quality examples would be things like missing insemination records, missing milk recording data in some herds, and patchy recording of clinical events. These are things that I guess all of us probably struggle with to some extent from day to day.
And the final V is variety. Again, this is something I imagine many of us will definitely recognise. Data for one herd is commonly stored in several locations.
So for a typical dairy herd, five locations would be very common. There may be stuff on dedicated herd management software; that's probably the herd's kind of go-to place for making lists and working out which cows are eligible to serve and that sort of stuff. There will often be software associated with the milking plant as well, maybe recording slightly separate things, so probably yield and some other characteristics of each milking. The milk recording company will probably hold some data on that farm, and again, that will overlap with but not be the same as what's held in the other places.
BCMS will obviously hold data about every farm, but fairly basic data. And in many cases, there can be stuff that's recorded on paper somewhere, in the diary or on the calendar, that never makes it into any of those data sets. So dealing with very varied sources of data is, again, a challenge we've been wrestling with for a long time.
Anyone who's ever struggled to match up ID numbers between different sources of data will probably recognise that one. So, on the principle that four Vs are a good way to define big data, people rapidly set out to discover as many more Vs as they could possibly add into the picture. So there are some other things that people have suggested are useful for the definition: the need to use kind of different or novel visualisations for bigger data, for instance.
So this is the concept that you can't just sit and look at a table of raw data; you can't kind of "eyeball", in inverted commas, this data very easily, or draw conclusions from it in any meaningful way at all. You're going to need some way of turning it into a visualisation, or some other way of analysing it, to make it useful.
Value is a fairly obvious one. It's got to generate value both for the person that generated the data and for the person using the data, if they're not the same. Volatility is normally taken to refer to the time period for which the data is useful.
It's maybe not so relevant to lots of things in our world. Variability refers to the potential for data to change in nature over time. And vulnerability actually probably is an important one for all of us.
So this normally means organisational vulnerability: the risks to an organisation of holding and storing data, particularly personal data relating to other businesses or other individuals. So there's been a kind of ongoing evolution of people's perception of what counts as big data, and it's almost got to the point where there's quite a big overlap between the use of the phrase "big data" and any sort of situation in which decision making is based on data.
So you hear this phrase, data-driven decisions, quite a lot. Again, this is something we've been doing for a long time in the dairy industry. And actually, in many ways, the dairy industry was quite a long way ahead of the curve here.
We'll talk about precision agriculture and smart farming in a minute, but when people talk about precision, they very commonly think of arable-related examples, whereas in fact milking robots were absolutely an example of precision farming that was around a long time before anything very meaningful happened in terms of precision ag in the arable sector. So there is a lot of overlap here with things that we're very used to working with.
So a key kind of concept in the world of big data is the idea of a data pipeline. So this is trying to set up a flow of data where data moves from one place to another and enables something useful to happen. And this is a relatively simple example of a data pipeline.
So this is the idea that you can take a copy of a herd's herd management software data, maybe alongside that add some information that comes from their milk recording company, and alongside that perhaps add some extra information that only exists in written or other formats, and merge these together to produce some analysis that's useful to you.
So again, this is something we've probably all been doing for a long time. There are some big challenges with this, but the basic idea is that we take the backup of the herd data and import it into a system like TotalVet, which is designed to do this.
We add in milk recording data if that doesn't happen to be stored in the herd's management software, and we maybe add some other stuff that we've manually entered off paper records on top of that. And between those three sources of data, we come up with something that's useful and analysable and generates some value for the farmer. So this is absolutely something we're all kind of very used to.
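In code, that kind of manual three-source merge can be sketched very simply. Everything below is hypothetical: the ear numbers, the field names, and the assumption that each source is already keyed by ear number:

```python
# Three toy data sources for the same herd (all values invented):
# a herd management software backup, a milk recording export, and
# some notes keyed in from paper records.
herd = {"UK1001": {"calving_date": "2021-01-10"},
        "UK1002": {"calving_date": "2021-02-03"}}
milk_recording = {"UK1001": {"scc": 150, "yield_kg": 32.1},
                  "UK1002": {"scc": 420, "yield_kg": 28.4}}
paper_notes = {"UK1002": {"note": "lame, right hind"}}

# Merge keyed on ear number, keeping every cow in the master herd file
# even where the other sources have nothing for her.
merged = {ear: {**rec,
                **milk_recording.get(ear, {}),
                **paper_notes.get(ear, {})}
          for ear, rec in herd.items()}
```

In practice the hard part is exactly what the talk mentions: getting the ID numbers in the different sources to match up in the first place.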
I guess it's kind of interesting to think about where this goes next. This is not a very automated data pipeline: typically pipelines are set up to kind of carry on working by themselves once you've built them. Stuff continues being collected automatically and new analysis gets generated automatically, whereas this system would be an example of a very kind of manual, hands-on data pipeline.
But the principle's the same. Just before we finish the introduction section, I thought it might be good just to unpack a little bit of the jargon that people use in this field, because big data is a notoriously jargon-filled field, and you'll hear most of these things quite a lot outside of the livestock sector. So all of these, maybe with the exception of precision ag, are things that you will come across in wider life, talked about in other parts of the media as well.
So with artificial intelligence, I've tried to just think about what I think is a simple and relevant definition here. These are not the definitions that you'll find if you go and look in computer science books. They're not even the definitions you'll find if you go and look for review articles on big data in farming.
I've put a reference in to one that I think is really nice at the end. If you're interested in more technical ideas around these, then definitely chase those things up. So for me, artificial intelligence is really just a branch of computer science that looks at trying to apply mechanisms from the way humans think, process thoughts and make decisions to solve problems using computers.
So it's just a branch of computing science that tries to take aspects of human thinking and use them to address computer science problems. Within that, it's probably fair to say machine learning sits somewhere within artificial intelligence. So it's kind of a tool within the artificial intelligence toolbox, if you like, that's mostly used for trying to tackle classification, pattern recognition or prediction problems.
So very classical examples of machine learning would be things like image recognition. So Google's algorithms that can look at a photograph and come up with a list of the things that are in the photograph: those are kind of classic machine learning algorithms.
If anyone's used the one of those that automatically captions your photos in PowerPoint, it comes up with some very bad guesses and some very amusing misinterpretations of cow and dairy farm-related photos. So algorithms are something you'll hear talked about a lot, and an algorithm is often the output of a machine learning process. So when people refer to machine learning algorithms, they normally mean an algorithm that's been developed by some sort of machine learning process.
So an algorithm is really just a set of sequential steps that convert one or more inputs to an output. If we're thinking about a cow example, a very obvious one would be the algorithms that convert raw activity monitor sensor data into a nice neat list of cows that might be in heat today. So those will be taking things like this cow's average amount of activity over the last 24 hours, and might feed in some background activity from other members of the herd, so what her average herd mate has done in the last 24 hours. And then it might come to a decision point where it says: OK, if this value is more than 25, she goes on the alert list, and if it's less than 25, she doesn't.
So that would be a very simple algorithm. You start with your activity number, maybe divide by the average change in the herd over the last 24 hours, and set a threshold to say: above this, she's bulling, and below this, she's not. Obviously, the algorithms that actually drive activity monitor heat detection systems are wildly more complex than that suggests, but the basic principles are exactly the same.
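That simple decision rule can be written down in a few lines. To be clear, this is only the toy version just described: real heat detection algorithms are far more sophisticated, and the threshold of 25 is just the illustrative number from the talk:

```python
def heat_alert(cow_activity, herdmate_activities, threshold=25):
    """Toy heat-detection rule: put the cow on the alert list if her
    activity, expressed relative to the average of her herd mates
    over the last 24 hours, exceeds a fixed threshold.
    All inputs and the threshold are illustrative only."""
    herd_average = sum(herdmate_activities) / len(herdmate_activities)
    relative_activity = cow_activity / herd_average * 100
    return relative_activity > threshold

# A cow three times as active as her herd mates trips the alert;
# a cow well below the herd average does not.
print(heat_alert(300, [95, 105, 100]))  # True
print(heat_alert(10, [95, 105, 100]))   # False
```

Commercial systems layer many more inputs (lying time, rumination, time-of-day patterns) and learned thresholds on top of this basic idea, but the input-steps-output shape is the same.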
The Internet of Things is something quite different. It's actually really hard to define what the Internet of Things means, although it's very widely talked about. I think probably the best way to think of it is just that it means the connection of devices to the internet that aren't primarily designed for connecting to the internet.
So for devices like the PC or laptop you might be watching on now, your iPad, your phone, connecting to the internet is kind of one of their main tasks. That's mostly what they're designed to do. The Internet of Things is really the concept of connecting up lots of stuff that isn't designed for that.
So it would encompass things like connected cars, connected tractors, connected toasters, connected fridges: anything that is primarily designed for a different task but which you can connect to the internet can be thought of as part of this Internet of Things concept. Precision agriculture is another kind of related topic.
Again, a really nice simple definition of that is just trying to improve agricultural outcomes by making data-driven decisions at a much more fine-detailed, granular level. So the kind of canonical example of precision agriculture, as I mentioned a second ago, probably comes from arable. And this is the idea that instead of saying, OK, well, that 15-hectare field didn't yield particularly well last year, we'll apply more chemical to it this year in an attempt to address that and improve yield, you instead use data that comes from a much more granular level. So that could be data collected by the combine as last year's crop is harvested, measuring density of production as it moves across the field.
So you can use that to produce a kind of productivity map. There might be some drone-based image analysis in there as well. And you're using that to be able to apply chemical in a much more precise way.
So it might not be that the whole field needs more chemical; it might be that specific areas do, and you can use less in other places. Disruptive technology is the last thing I thought would be interesting and relevant to mention here. So again, a nice simple definition of this is any technological innovation that comes along in a field and forces the other players in that sector to change their offer.
So whether that means they have to add new features or do something different or better to catch up, or whether it means that something comes along and offers similar performance for a much lower price, so the other competitors have to reduce their price: it's anything that causes a step change big enough to disrupt an existing market. And you can definitely imagine where some of these could come from.
So examples of disruptive tech from other bits of the world would be things like Airbnb, something that's come along and completely radically altered the way people think about the accommodation market. Ride-hailing apps, Uber and so on, would be another really good example.
OK, so that's it for the jargon. Martin's preference for plurality wins out in this section, clearly. So I thought before we move on to dig into some more big data-related stuff, it might just be interesting to think about where data typically lives for a typical cattle farm.
So I guess this is "where are the data", in the sense of where the multiple sources of data are. OK, so on the left-hand side of this picture, I've tried to collect together stuff that applies to both dairy and beef. On the right-hand side, there are a couple of extra things that are fairly specific to dairy farms.
So there's obviously a couple of things in here that are mandatory legal requirements, so they will exist for any dairy or beef herd. Those would be BCMS movement data (births, deaths and movements) and medicine records.
Those will exist for every farm. BCMS data is generally relatively accessible, but obviously has a relatively limited scope.
So there is some very crude analysis you can do using BCMS data, but by itself, that doesn't give you an awful lot of insight. Actually, if you have both of those mandatory things, so you've also got medicine records for a herd, there starts to be some quite nice stuff you can then do, I guess particularly for beef herds, which are particularly unlikely to have much else.
There is definitely scope for setting up medicines records, say using a standardised template for recording medicines used in Excel. You can then merge that with BCMS data and look at things like treatment rates, mortality rates, and so on. If you're lucky, your herd might have some herd management software.
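As a sketch of that idea, suppose we have a BCMS-derived animal list and a medicines log keyed in from a standardised Excel template. All ear numbers, dates and products below are invented:

```python
# Animals on the holding (e.g. derived from BCMS data).
animals = {"UK2001", "UK2002", "UK2003", "UK2004"}

# Medicines log, e.g. keyed in from a standardised Excel template.
treatments = [
    {"ear_no": "UK2001", "date": "2021-03-01", "product": "antibiotic A"},
    {"ear_no": "UK2001", "date": "2021-04-12", "product": "NSAID B"},
    {"ear_no": "UK2003", "date": "2021-05-20", "product": "antibiotic A"},
]

# Proportion of animals treated at least once in the period.
treated = {t["ear_no"] for t in treatments}
treatment_rate = len(treated & animals) / len(animals)

print(treatment_rate)  # 0.5: two of the four animals were treated
```

The same merge extends naturally to mortality rate (deaths from BCMS over animals at risk) once the two sources share a consistent ID.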
So that's a piece of software that they've purchased specifically to do the job of helping them keep track of and manage their herd. In dairy, this is becoming increasingly common; there are fewer and fewer bigger dairy herds that don't use any herd management software.
In the beef suckler sector, it's still very common for people not to use any. The processor will also have some information. So in dairying, that would be whoever buys the farmer's milk.
The information they have is often not enormously useful to you in that sense. So they will have kind of collection volumes and particularly milk quality-related things. That may be your only source of information for Bactoscan monitoring, but generally speaking, the farmer will see and tell you if they have a problem with Bactoscan.
But beyond that, I don't routinely use processor data much for dairy. Whereas in contrast, in beef, the processors often will have some quite useful information, because by the nature of the business it's collected at individual animal level, and you will generally at least have dead carcass weight and hopefully some conformation information and condemnations. I guess also under the processor heading we could lump in other dedicated supply chains, so supermarkets will also have useful data about the dairy farms, and sometimes beef farms as well, in their dedicated supply pools if they have those.
Other ad hoc data: medicines records may well fall under this. They could be in the herd management software, or they could be somewhere else. And weigh data is something else that's often collected, particularly in beef businesses, but doesn't end up going into herd management software.
So it's often something else that might exist that you might not know about. And we've then got sensor data, which broadly you could split into on-animal sensor data: activity monitors, rumination sensors and maybe rumen temperature boluses, probably the three most common.
And environmental sensors as well. So a surprising number of farms, if you ask, will have temperature and humidity monitoring in buildings, largely because that's now very cheap: a USB data logger for temperature and humidity is really, really cheap on Amazon.
It's surprising how many farmers have those off their own bat, and you can often get hold of that data fairly easily. And then in dairy herds, you're likely to have some data held with their milk recording organisation, and usually some data coming from the milking plant, or the robots if it's a robot herd. So here's a slightly more complicated example of a data pipeline: this is just an example of the way that data might flow around a relatively typical dairy farm.
So this wouldn't be a particularly large number of different systems to exist on one farm. This diagram is possibly a bit more joined up than these examples often are. But for example, you might have a herd with management software where that's kind of their standard go-to place.
That's their main central master database. Often you can enter data into those via a smartphone app, and actually that data goes both ways, so you can usually also look at data for individual animals on the app as you wander around them.
So that will feed into the herd management software. That might also receive some processed data from a sensor system. So if it's dedicated herd management software, it's probably not directly receiving raw data from sensors and interpreting it.
That sensor data is probably collected from the sensors, processed and interpreted by another piece of software, sometimes on a different computer, and then some outcomes of that, things like heat alerts or rumination alerts, might get passed back to the master herd management software. There would often be milking data imported from the milking plant too. Again, that's commonly another piece of software running on the same or a different computer; it communicates with the herd management software, and also kind of runs and manages the milking plant.
So typically the herd management software would import some milking data from that system, and some event data could move in either direction. If it's a herd that milk records, they will probably be exporting their data from the management software to the milk recording company once a month, to give the milk recording company kind of a scaffolding to hang their milk recording results on. And hopefully the resulting milk recording results, so each cow's cell count, butterfat, protein and yield, will come back and go back into the herd management software.
Most of these systems will also have some interaction with BCMS, and most of them will be able to automatically upload identity and movement information and fulfil that statutory requirement. And then we can obviously come along and do some analysis based on that data. So the two kind of routes to that are either that you use your own analysis software on your own system.
So something along the lines of TotalVet, where you can take data from a variety of different sources and do analytics on it that look kind of consistent. So you get the same graphs to look at each time; it's calculating things in the same way, it's using its own definitions, those sorts of things.
Or, more often, you can use the herd management software to look at some analytics as well. So any herd management software system will have some capability to analyse the data that it stores. They are designed to do that to varying degrees, I would say.
So for some herd management softwares, analytics is really not primarily what they were built for, and they're fairly basic. For others, they do a pretty good job. I think for us as cattle vets, the thing that starts to get difficult quite quickly here is keeping abreast of all the different herd management softwares.
As a clinician, this is probably the biggest part of my job. I don't do any proper hands-on cow vetting these days; I only do herd health-type stuff, and I do quite a lot of data-related herd health.
And I would say there are only 3 or 4 of the herd management software systems that I'm pretty familiar and fluent with; there are at least as many again that I come across on farm where I wouldn't be particularly confident to sit down, do some analysis, and know exactly what it was I was looking at. And even outside of that familiarity thing, there are definitely some advantages to exporting data and looking at it in your own software, not least that you have consistent definitions of things, so things are calculated in the same way.
The 21-day pregnancy rate, a fantastically useful fertility metric, would be a very good example of something that can come out quite differently if you calculate it using a different software system. And that's because it's quite sensitive to how you handle things like cows that are eligible for only a small part of the 21-day period. OK.
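One common way of handling that partial eligibility is to count eligible cow-days rather than whole cows, and it's exactly this kind of choice that makes different software systems disagree. The function and the numbers below are an illustrative sketch, not any particular software's definition:

```python
def preg_rate_21d(eligible_cow_days, pregnancies):
    """21-day pregnancy rate sketch: pregnancies achieved per 21-day
    period of eligible cow-time. Counting partial eligibility as
    cow-days (rather than counting whole cows in or out) is one of
    the choices that makes different software give different answers."""
    periods_at_risk = eligible_cow_days / 21
    return pregnancies / periods_at_risk

# 2,100 eligible cow-days is 100 cow-periods at risk; 20 pregnancies
# in that time gives a 20% 21-day pregnancy rate.
print(preg_rate_21d(2100, 20))  # 0.2
```

A system that instead rounds each cow up or down to a whole period would give a slightly different denominator, and therefore a slightly different rate, from the same herd data.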
So just while we're talking about where data lives in the UK, it would be fairly silly not to talk about the Livestock Information Programme. So this is kind of the replacement for BCMS that's being designed and built at the moment, with input from the traceability design user group. There is lots of info on the internet to find out about the Livestock Information Programme.
The idea is that the resulting product, the Livestock Information Service, or LIS, will do everything that BCMS does, but will leave the door open to do a lot more as well. So probably out of the box it will have some additional functionality that's not already there in BCMS, and it will join up some more sources of data that aren't currently joined.
The thing that they're very keen to highlight is that their vision is that it stays extensible, and they absolutely leave the door open for more information to come into the Livestock Information Service. So for me, I'd be surprised if this is an immediate game changer for us as cattle vets, but there are definitely some interesting possibilities with where this could go in the future.
As with many aspects of data, permissions and consent and data protection are an absolute minefield in this area. So I think if I had to guess one thing that would hold this project back, it could well be that.
OK, so a few thoughts around how big data can help us. With this section, I'm just going to highlight a few odds and ends of things that are interesting to think about and know about here. Fairly obviously, data, particularly big data, is no use without techniques for analysing it.
So as we said earlier, this is not a world in which you can just sit down, look at a big Excel spreadsheet by eye and work out what's happening. You can derive a lot of value from data with good visualisations, so being able to come up with good ways to represent data graphically can be really useful, and often that is largely down to software companies to develop. There's a nice article from Neil Hosins and co. in In Practice, a year or two ago now, that looks at some issues around graphical representation of dairy herd data. But very often, the more classically big data use cases relate to detection or prediction of an outcome.
In this sort of area, the statistical models we've been used to using to analyse dairy herd data — the sorts of things I was using back when I first started in Nottingham 10 years ago, and most of the statistical models I used in my PhD — focus on interpretability and on understanding what's happening. So they're relatively straightforward to understand, and fairly transparent in terms of how they generate predictions and results.
And you can relatively straightforwardly say what they mean. They support statements like: for every additional 1,000 litres of yield a cow produces in the lactation, her chance of getting pregnant per 21 days goes down by 10%. So they're designed to be interpretable, but it turns out they don't tend to perform particularly well when it comes to predicting outcomes, particularly on data they've not seen before.
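As an aside, the arithmetic behind a statement like that is simple enough to sketch. Here's a minimal Python illustration — the coefficient is a hypothetical, made-up value, not a figure from any real analysis — of how a logistic regression coefficient converts into that kind of odds-based statement:

```python
import math

# Hypothetical fitted coefficient from a logistic regression for
# "pregnant within 21 days", per additional 1000 litres of lactation yield.
coef_per_1000_litres = -0.105  # illustrative value only

# The interpretable quantity is the odds ratio: exp(coefficient).
odds_ratio = math.exp(coef_per_1000_litres)
print(f"odds ratio per 1000 litres: {odds_ratio:.2f}")
# An odds ratio of about 0.90 reads as "roughly a 10% drop in the odds
# of pregnancy per 21 days for each extra 1000 litres of yield".
```

This is exactly why such models are easy to communicate: each coefficient maps onto one plain-English sentence.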
Often such models can predict back their own data pretty well, but they commonly don't perform brilliantly when you try to use them to predict outcomes on new data. So if we're trying to use sensor technology, for example, as a way of predicting or detecting something, we're going to need something that is predictive outside the data it has seen before.
So there's this idea of a trade-off between interpretability and predictive performance. Linear and logistic regression are the standard statistical ways to take multiple different inputs and predict an output; they have the advantages I've just mentioned, but often don't perform particularly well in real-world prediction situations.
On the flip side are a variety of machine learning techniques, which tend to perform much better at prediction but are much more difficult to interpret. Certainly by the time you get to things like neural networks and deep learning, those are essentially black boxes.
Well, there are ways of trying to understand them, but there's not really any way of knowing exactly how your algorithm is coming to its predictions — you just get the predictions. So it's really important to test that they work.
The classic example here is that your machine learning algorithm is trying to detect features within your data that are likely to predict the outcome. So if you're trying to design a machine learning algorithm that determines whether a photograph shows a tiger or not, your algorithm might include features like: does the thing in the photograph have stripes? Is it in an aggressive posture?
Obviously, that doesn't get you the right answer 100% of the time, so it's really important to test how well your machine learning algorithm works. The basic way you build these is that you start with a set of what people call training data.
People like to use human-thought-related terms for this, not always for good reason. You use your training data to "train" your models, which really just means analysing the data and allowing the machine learning process to pick out the features of your data that best predict an outcome.
That produces an algorithm for you. Really importantly, you then give your algorithm some new data to look at — data that is separate and not part of the training data set.
Your algorithm hasn't "seen" it — in heavy inverted commas — before, and you assess how well your algorithm predicts outcomes based on this new test data. There are lots of different ways to do this, which probably aren't very interesting or exciting for you to know about. Once you've got to that stage, you can generally tweak a large number of things about your machine learning algorithm to help it perform and predict better.
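That train-then-test workflow can be sketched in a few lines of Python. This is a toy illustration, not any real system: the "model" is just a single activity threshold chosen on the training data, standing in for a proper machine learning process, and all the numbers are simulated.

```python
import random

random.seed(42)

# Toy data: each record is (activity_units, in_heat). Cows in heat are
# simulated as more active on average; the numbers are entirely made up.
data = [(random.gauss(30 if in_heat else 10, 5), in_heat)
        for in_heat in [0] * 180 + [1] * 20]
random.shuffle(data)

# Hold back 30% of the data that the "training" step never sees.
split = int(0.7 * len(data))
train, test = data[:split], data[split:]

def fit_threshold(rows):
    """'Train' by picking the cut-off that maximises training accuracy."""
    candidates = sorted(x for x, _ in rows)
    return max(candidates,
               key=lambda t: sum((x >= t) == y for x, y in rows))

threshold = fit_threshold(train)

# Crucially, performance is assessed on data the model has not "seen".
test_accuracy = sum((x >= threshold) == y for x, y in test) / len(test)
print(f"threshold={threshold:.1f}, held-out accuracy={test_accuracy:.2f}")
```

The key design point is the held-out test set: reporting accuracy on the training data alone would tell you very little about how the algorithm behaves on new cows.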
You can then deploy your algorithm and roll it out however you want to — normally that means incorporating it in some software, or maybe using a web-type interface to get it out there. So it's really important to understand how well an algorithm works.
Again, there are quite a number of metrics that can be used to measure how well a model performs, and some of these will be very familiar, I imagine. Things like sensitivity and negative predictive value (NPV) are broadly about how good a model is at detecting positives, while specificity and positive predictive value (PPV) are largely about how many of those positives are false positives.
It's probably worth saying that their relative importance may differ between situations. If you're thinking about a standard detection-type algorithm — heat detection activity monitors, yet again, would be a good example —
the characteristics that are probably most important to you are sensitivity and PPV: the positive-focused characteristics, if you like. Sensitivity is what proportion of heats are being detected, and PPV is what proportion of heat alerts are actually correct. Those are fairly intuitively the things you want to know.
The difficulty is that those pairs of metrics — sensitivity and NPV, and specificity and PPV — measure different aspects, and how well your algorithm or model performs is a combination of both. The area under the ROC curve (top right there) and Kappa are just two different ways of trying to combine those aspects into a single value that encompasses both.
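To pin down what those metrics actually compute, here's a short Python sketch working through all of them, plus Kappa, from a single hypothetical confusion matrix (the counts are invented for illustration):

```python
# Hypothetical confusion matrix for a heat-detection algorithm.
tp, fp, fn, tn = 80, 20, 20, 880  # counts are illustrative only

sensitivity = tp / (tp + fn)  # proportion of true heats detected
specificity = tn / (tn + fp)  # proportion of non-heats correctly ignored
ppv = tp / (tp + fp)          # proportion of alerts that are real heats
npv = tn / (tn + fn)          # proportion of "no alert" calls that are right

# Cohen's Kappa combines both aspects: agreement beyond chance.
n = tp + fp + fn + tn
po = (tp + tn) / n                                            # observed agreement
pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2   # chance agreement
kappa = (po - pe) / (1 - pe)

print(f"Se={sensitivity:.2f} Sp={specificity:.2f} "
      f"PPV={ppv:.2f} NPV={npv:.2f} kappa={kappa:.2f}")
```

Notice how Kappa discounts the agreement you would expect just from the outcome being rare, which is exactly the problem with raw accuracy discussed next.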
But very commonly, the metric reported as a headline is accuracy. Accuracy is worth unpacking a little more, because you will often come across claims about how accurate something is. Accuracy as a metric means something very simple.
It's just the proportion of data points which are correctly predicted by the model. So it's pretty simple to understand, and you can absolutely see why it's such a widely reported metric — and often it's fine.
But just be very aware that accuracy can be quite misleading where the outcome you're predicting is unbalanced — very common or very rare. If most of your data points are ones, or most of them are zeros, accuracy can actually be quite a bad measure of how well your algorithm is working. So if we go back yet again to the example of an automated heat detection system with an accuracy of 90%, on the face of it that sounds pretty good.
But think about what proportion of time cows are in heat for — for the sake of argument, say 2 days out of a 21-day cycle. If we had a model that just said "no, this cow's not in heat" all the time — an algorithm that completely ignored all of the data and information it has access to — that algorithm would actually be correct around 90% of the time, because most cows are not in heat at least 90% of the time.
So accuracy by itself isn't always a particularly good measure. The other metrics I mentioned on the last slide, like Kappa, and balanced accuracy — a relatively simple correction of accuracy to account for this problem — are very important to look for, but very commonly, particularly in marketing material, accuracy is the only thing reported.
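The "always say not in heat" example can be checked directly. A short Python sketch using an illustrative 10% heat rate over 1,000 cow-days:

```python
# Illustrative data: 1000 cow-days, cows in heat ~10% of the time.
truth = [1] * 100 + [0] * 900

# A useless "model" that ignores its inputs and always says "not in heat".
predictions = [0] * len(truth)

accuracy = sum(p == t for p, t in zip(predictions, truth)) / len(truth)

# Balanced accuracy averages sensitivity and specificity, exposing the problem.
sensitivity = sum(1 for p, t in zip(predictions, truth) if t == 1 and p == 1) / 100
specificity = sum(1 for p, t in zip(predictions, truth) if t == 0 and p == 0) / 900
balanced_accuracy = (sensitivity + specificity) / 2

print(f"accuracy={accuracy:.0%}, balanced accuracy={balanced_accuracy:.0%}")
# Accuracy comes out at 90% despite the model detecting no heats at all;
# balanced accuracy is 50%, i.e. no better than chance.
```

So a 90% accuracy claim on its own, for a rare outcome, tells you almost nothing about detection performance.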
Just to give you an example of how that works in practice, this is an example of a more typical data pipeline that a data scientist would probably recognise. We might start with some data that comes from a sensor — this could be raw activity data, recorded on 3 axes.
That will generally get pre-processed, so the raw readings are converted into some sort of activity unit: a simple numeric measure of relatively how active this cow is, hour by hour.
You might then do something like smoothing using a rolling mean. A cow's activity inevitably goes up and down quite a lot, so you might smooth it out over the last 10 hours or so to give a figure more reflective of what the cow's been up to over some hopefully biologically meaningful timescale.
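A rolling mean like that is a one-liner in most analytics packages; as a plain-Python sketch of the idea (the activity numbers are made up):

```python
# Hypothetical hourly activity units for one cow (numbers invented).
activity = [12, 15, 11, 40, 38, 14, 13, 45, 42, 39, 41, 16, 12, 14]

def rolling_mean(values, window=10):
    """Smooth noisy hourly readings over the previous `window` hours."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

smoothed = rolling_mean(activity, window=10)
print([round(v, 1) for v in smoothed])
```

The window length is the tuning decision: too short and the noise comes straight through; too long and a genuine oestrus-related activity spike gets averaged away.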
You might then get some ability to visualise that in a simplified format — a bar chart of activity units for a particular cow, day by day, for example. That data will then also get fed into some sort of machine learning-based algorithm.
It's a machine learning algorithm in the sense that some sort of machine learning process was used to create it. It will generally also have additional data going into it, so the activity data for the rest of the herd might influence the prediction for this cow.
That accounts for things like moving cows around paddocks: if they all walk further between one milking and the next, most systems will realise that the baseline activity of the whole herd has gone up, and they won't give you an alert for every cow. Sometimes that happens, but it's generally not supposed to. And there might well be other sources of data feeding into this as well.
By merging and aggregating different sources of data, you can often make your predictions more accurate. So for oestrus detection, you might also be feeding in things like milk yield, milk temperature, rumination, or potentially the cow's fertility event history in that lactation — is she 21 days since her last oestrus?
That will then generate an alert list of cows that might be in heat today — the ones that meet the threshold. So this is very much existing, mature tech.
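Just to make the idea of combining sources into an alert list concrete, here's a deliberately simplified Python sketch. Real systems use learned algorithms; this hand-built scoring rule, the cow IDs and all the thresholds are purely illustrative:

```python
# Illustrative per-cow data from three hypothetical sources for one day.
activity_score = {"150": 2.8, "207": 0.4, "315": 1.9}    # smoothed, herd-adjusted
rumination_drop = {"150": True, "207": False, "315": True}
days_since_last_heat = {"150": 21, "207": 9, "315": 20}

def heat_alert(cow):
    """Hand-built stand-in for a learned algorithm: combine the sources
    and alert if the combined evidence crosses a threshold."""
    score = activity_score[cow]
    if rumination_drop[cow]:
        score += 0.5
    if 18 <= days_since_last_heat[cow] <= 24:  # near a 21-day interval
        score += 0.5
    return score >= 2.5

alert_list = [cow for cow in activity_score if heat_alert(cow)]
print(alert_list)  # -> ['150', '315']
```

Note how cow 315 only makes the list because the other data sources back up a middling activity score — that is the value of aggregation in miniature.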
These systems have been around for a long time, although I would say they're still quite rapidly evolving — they change quickly, and we'll maybe come back to that idea in a minute. At the moment, I think it's fair to say we're still quite a long way off "flattening the curve" — unfortunate terminology at the moment — that is, a long way off the point where the incremental improvements in our detection algorithms start to get smaller.
So we've probably still got a fair bit of progress to make in terms of making the best use of that sensor data. Where might this all fit together into adding value for us and our clients in the future? Well, sensor tech and automation is a very obvious one, and to some extent it's already out there.
I would just reiterate what I said: this is going to continue to get better and better as time goes by, and I suspect most of us have seen that. Results from heat detection systems tend to get better and better with newer systems over time — I've worked with plenty of herds that have put in a newer version of the same thing, and it's providing much better results for them.
Data hubs, I think, are something that's going to become more of a feature: the idea that there is some sort of automated data pipeline that aggregates data for a farm from different sources. The Livestock Information Service could end up being one of these — it will certainly be a data hub. Whether it's a data hub that allows us to do the sort of analysis we're typically interested in — probably not to start with, but the door is open to that happening in the long term, potentially.
I think syndromic surveillance is a really interesting idea here, and it's certainly one the APHA are very interested in. Syndromic surveillance means using routinely collected data to do, in effect, disease surveillance. The classic non-veterinary example is a paper from an American group, quite a few years ago now, that used Google search data to predict influenza outbreaks in particular areas of the US.
They found that by monitoring people's search activity — people searching for things like "what's a normal temperature", "how do I visit my GP", or "what does flu look like" — they were able to really accurately predict influenza outbreaks, substantially ahead of primary healthcare-related indicators. So there is obviously quite big potential here if we're able to join up enough sources of data.
There are some nice examples from elsewhere on the continent where people have been pretty successful at doing that, because they have a national database that incorporates enough elements. There's some really nice work on detecting the emergence of bluetongue from the group in Nantes in France. So I think more of that will come, and to some extent we can use it even at practice level.
Some of that is just about detecting anomalies. You could definitely imagine getting to a point where your herds are monitored and you get an alert saying: this feature is becoming more common, you need to investigate it. And finally, big data obviously also has potential to add value for research.
So this is — you'll be relieved to hear — your final example of a data pipeline. This is our research dairy at Nottingham. We have, I think, 10 or 11 different sources of data that are all automatically imported into a single central database, amalgamated and merged together, with a certain amount of pre-processing, so that anyone in our team can come along, query that database and get out the information relevant to their projects.
Some of that is on-animal, real-time location-type data; some is completely off-animal, from internal environmental sensors; and some comes from other things entirely, such as cloud-based weather station data. That makes it relatively easy for members of the team to come along and try to come up with innovative ways to visualise things. These are just some visualisations of the real-time location data.
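The "many sources into one queryable database" idea can be sketched very simply. This is not the actual Nottingham setup — just a minimal illustration using SQLite with two invented tables standing in for two data sources:

```python
import sqlite3

# In-memory database with two made-up tables, one per "data source".
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE milk_yield (cow TEXT, day TEXT, litres REAL);
    CREATE TABLE location  (cow TEXT, day TEXT, hours_lying REAL);
""")
con.executemany("INSERT INTO milk_yield VALUES (?, ?, ?)",
                [("150", "2020-06-01", 32.5), ("207", "2020-06-01", 28.1)])
con.executemany("INSERT INTO location VALUES (?, ?, ?)",
                [("150", "2020-06-01", 9.5), ("207", "2020-06-01", 12.2)])

# Once everything lands in one place, a single query merges the sources.
rows = con.execute("""
    SELECT m.cow, m.litres, l.hours_lying
    FROM milk_yield m
    JOIN location l ON m.cow = l.cow AND m.day = l.day
    ORDER BY m.cow
""").fetchall()
print(rows)
```

The payoff is that every downstream project queries one schema instead of re-doing the merging and cleaning against ten separate raw feeds.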
OK, so just to finish off, I thought it would be interesting to consider what the role of the vet might look like here. What can we help with? What can we do in terms of making the most of big data, and where is the value to us and our businesses?
I've come up with 4 ideas here — obviously there are lots more you could think of. One of them is around helping clients get the best from the systems they've invested in.
As vets, we have a really unique combined understanding of biology, epidemiology and, to the extent we need it, probably the technology too. That puts us really well placed to help farmers make sure they get the best from these systems. To go back to the same example like a broken record: I'm sure you will all have had the experience of two herds that buy the same heat detection activity monitor system at roughly the same time and have wildly different outcomes.
We all know what the reasons for that are — very commonly things around the environment or foot health (had to get that in for Sara somewhere; I can tick that box now). They're commonly things we're used to dealing with, and we're very well placed to help farmers understand and deal with them. Then there are things like the concept of lowering the threshold for a heat alert in parallel with using some milk progesterone testing to screen out false positives.
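The logic of that threshold-plus-progesterone idea is easy to demonstrate with some back-of-envelope arithmetic. All the sensitivities and specificities below are invented for illustration — they are not figures for any real system:

```python
# Back-of-envelope illustration: 100 true heats among 1000 cow-days.
true_heats, non_heats = 100, 900

def alert_counts(se, sp):
    """Expected true-positive and false-positive alerts at an operating point."""
    tp = se * true_heats
    fp = (1 - sp) * non_heats
    return tp, fp

# Hypothetical operating points for an activity monitor.
for label, se, sp in [("high threshold", 0.60, 0.99),
                      ("low threshold", 0.90, 0.90)]:
    tp, fp = alert_counts(se, sp)
    print(f"{label}: sensitivity {se:.0%}, PPV {tp / (tp + fp):.0%}")

# Lowering the threshold finds more heats but floods the list with false
# positives. Screening those alerts with milk progesterone (hypothetically
# Se 0.95, Sp 0.95 among alerted cows) removes most of the false positives.
tp, fp = alert_counts(0.90, 0.90)
tp_screened = 0.95 * tp
fp_screened = (1 - 0.95) * fp
overall_se = tp_screened / true_heats
overall_ppv = tp_screened / (tp_screened + fp_screened)
print(f"after screening: sensitivity {overall_se:.1%}, PPV {overall_ppv:.0%}")
```

With these made-up figures, the low threshold lifts sensitivity from 60% to 90% but halves PPV; adding the confirmatory test restores a high PPV while keeping most of the sensitivity gain — which is the whole point of the serial-testing trick.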
Those are ideas we're really well placed to understand, come up with and communicate to farmers, and they can make a big difference to how much value farmers derive from some of this stuff. On a similar theme, we're also a really good source of impartial advice for clients who are considering investing in this sort of thing. Yet again, to stick with the activity monitor example — there are other examples of sensor tech, I promise, I just continuously come back to this one.
There are lots and lots of different systems out there, and it's actually very difficult to design a good study that gives you a useful amount of insight into how they perform relative to each other, for lots of reasons.
These things are going to vary quite a lot from environment to environment. The JDS paper I've put up there is actually a really nice study, but it only really shows you how a set of systems, at one point in time, compared to each other on a specific research dairy with a not particularly massive number of cows. It's very likely that different systems do better in different situations, for example, and for most of these systems the algorithms will get updated as things go along.
The software will get updated, and some systems apparently update based on the farm's own data as they go. So it's actually really difficult to come up with a good comparison between systems, and often they just get better as time goes by.
For me, one of the really big things we can help with here is that we understand the wider economic context of the farm. We know that if a client spends, say, £30,000 on a particular flashy sensor technology system, that's £30,000 they're not going to spend on building a new calf shed, putting up a new transition cow management facility, or any of those other things. And often, because sensor tech looks cool and sexy and generally has really good marketing teams behind it, I've commonly been surprised by clients coming to me and saying, oh yes, I'm really interested in buying this — or, quite commonly with clients I don't see very often, going out 12 months down the line and suddenly finding they've spent a lot of money on something.
Often you know that money might have been better spent somewhere else on the farm. Essentially, part of your potential role here is to act as the shiny marketing team for the new calf shed, or some other kind of simple improvement.
But equally, these systems can unlock massive value and can have massive cost benefits, so again, we should be really well placed to help clients out with that. Where else can it help us?
Effectively, conventional herd health skills are really a form of data-driven decision making. The more automated data hubs become, the easier it gets to access data, the more detailed the data we can get at, and the better the ways we have of analysing it — the more value we can add for our clients, and the more we can charge them for doing it.
So the better we get at doing this, the better we can monetise the value of that activity for us as vets, and the bigger the economic benefits we can unlock for our clients. And over time, you could definitely imagine some more automation of that coming along.
For anyone that's aware of the QuarterPRO project — that's a very enthusiastic-looking James in the corner there, promoting it on YouTube if you want to find out about it. Part of that project, which is kind of the AHDB's mastitis control plan lite, is an automatic pattern analysis tool that will take data from a herd and suggest the most likely diagnosis for where mastitis is mostly coming from in that herd. That's actually a hand-built algorithm, and we've done some work in our group looking at how accurately machine learning algorithms can predict compared to it.
And finally, there's often big value to be unlocked in practice data — things like highlighting farms that are likely to benefit from consultancy advice by, for example, benchmarking antimicrobial sales data, or benchmarking client data used in herd health plan reviews. That can be quite useful as a way of identifying clients you could target for selling some advice time, or working out where you as a practice should be focusing — what your next preventive advice campaign should be. And this can definitely include beef farms as well.
There are obviously some mature products already out there in the market that are specifically designed to help you do this, and as time goes by, those will get more and more useful. OK, that's all I've got time for.
I think I've not left an enormous amount of time for questions, but I don't think there are too many. A few thanks to the other people in my group that you can see up there. My own In Practice articles are the basis of this talk, really.
They cover a lot of the same ground. And I mentioned back at the start the Wolfert review of precision agriculture and big data in farming — a really nice, slightly more technical review. While I'm here, I'll also mention the Herd Health Toolkit, since we're on a data theme. This is just a website you can go to, which we're using as a front end for some of our research findings.
It allows you to look at things like calf milk replacer calculations, antimicrobial benchmarking and cow living space calculations — essentially somewhere you can do some simple analysis and get some visualisations and benchmarking back. It's worth keeping an eye on, because we will continue to add more to it as we generate more findings. OK, that's all I've got time for for the time being.
So back to you, Sara, I guess. Thanks very much, Chris — we're not going to let you off just yet, because we have got questions coming in.
But I will just say for now: thanks very much, that was a really interesting talk. We're having some great comments coming in already.
I think it's given everybody listening a great handle on the jargon, and also on what opportunities exist when it comes to data analysis. Please keep your questions coming in.
Just before we go to those, can I ask everybody watching to spare us 30 seconds to complete the feedback survey that should have popped up in a new tab in your browser. Depending on which type of device you're using to watch the webinar, the survey may not present itself.
So if you can't find it and want to give us some feedback, please email it in to office@thewebinarvet.com. Similarly, if you're listening to the recording of this webinar, you can add comments underneath or email The Webinar Vet office as well.
Let's go over to our first question now. It came in fairly early in the presentation, so it may relate to the Livestock Information Service you discussed later on. A question from Anne: would it be possible to have a system based on cloud storage, whereby every farm has an account, all the companies upload everything onto that cloud, and then as vets or advisors we can access the information in our clients' accounts?
That would obviously make everything much more useful. I think you did touch on some of the difficulties with that in terms of data permissions and things, but do you think that's where we should be moving to in the future?
Do you think there are greater possibilities with that now? I think that's quite a utopian idea, and it's definitely one I would share. I guess there are two potential routes we could take to get there.
One is LIP and LIS — the Livestock Information Programme and Service. Their aspiration is definitely to leave the door open for it to be more extensible, but I don't think that will be their focus initially; ultimately, though, it could be a platform where some of that could happen. The other route is commercial systems, some of which are out there already. That definitely has big potential, but it is also very complicated and difficult to do.
Ironically, some of the other dairying nations that were slightly slower to adopt data recording technology are actually doing better than us now, because they skipped the step of every farmer doing their own thing with a range of different software systems, and had a much more ambitious national database from much earlier on. So it is definitely doable; how it ends up happening, I'm not sure.
If I had to put a bet on whether, in 10 years' time, you'd be able to just go to one place and immediately access all the analytics you want, I'd probably have to guess no. But it's definitely a space that's there to be exploited. Maybe I'll try and exploit it.
Definitely something to aim towards, though, I think, because at the moment, with everybody hugging the data and not sharing it, some things are more difficult to analyse. Moving on then to the next question, from Giovanni.
Are there any algorithms or software that consider genetic and genomic parameters and phenotypic data at the same time — for example, milk production, mastitis, etc.?
This concept is quite widely used for analysing genetics data — lots of the changes and predictions you'll see in the way we manage genetic data are based on these types of techniques. But at the moment, I don't think there are any farm-based algorithms that are also using genetic data to try to predict whether cow 150 is in heat today, or has mastitis today, or whatever. So hopefully that answers your question.
I think there are some farms now capturing individual animal ID data with a view to it becoming useful for genetics? Yes — and it definitely works that way, and has done for a long time. The information getting back to the geneticists is becoming more and more common now; there's been some of that happening for a long time, but it's increasing.
I think I heard Giovanni's question the other way around, though: are people also using genetics as a feed into algorithms for predicting other outcomes? I'm not aware of that. OK — if anybody is aware of any, please let us know.
Another question here from Beth, who says: thanks very much for a lovely talk. On the "what is big data" slide, you had some figures for missed recording changes — did you have a reference for that?
The reference was just my PhD — I don't think I ever published that particular bit, because it covered lots of different aspects. My PhD is openly available to download from the Nottingham library, should you want to peruse it. Essentially, I identified a set of cattle vets with an interest in this sort of data and herd health work, and asked them to send in anonymised data sets that I could use for various bits of research. As a starting point, I developed some methods to screen the data quality in those data sets, and I was quite surprised by how many failed.
Brilliant. One final question — actually, we've got just two questions — and then we'll let you off the hook, Chris. I think I know what the answer to this one will be.
Do you think this sort of approach to data should be taught to students, even if from a very basic perspective? It's from Gemma.
Hi, Gemma — I was doing some video-conference teaching with Gemma earlier on. Yeah, absolutely.
This is super important for the future cattle vets, for exactly the reasons we were discussing in the session earlier, Gemma. So yeah, absolutely. In many ways, this overlaps strongly with things that are core techniques for us as cattle vets in doing herd health with our clients.
They're often things we've been doing for a long time, and hopefully, the further we get down the big data road, and the more we can develop better analytics and more automated, streamlined ways of aggregating data, the easier it gets and the more value we can unlock for our clients. Great — so you will definitely get taught that, Gemma. And that leads me on to the final question, which sort of follows on from that.
What advice would you have for those listening who would like to start doing more with data? You mentioned exporting data and analysing it yourself — so what advice would you have for those who want to start doing that? Where do they start?
Great question. We did a workshop at BCVA last time, led by Bobby, Martin, Pete and me, where we got people using some slightly more sophisticated analytical software called R, which is freely available and is the basis of most of our research analytics in the group — a fantastically powerful piece of software, but with a slightly steep learning curve.
So I guess it depends how much you want to invest in it. You can take either approach, really: you can rely on commercial software products that are out there already for analysing data, and hope that they continue to improve over time.
That has definitely happened so far — there are lots of mature products out there that allow you to do this pretty well, and I've mentioned some of them already.
Those will continue to get better with time, and for day-to-day analytics in dairy herd health, those are generally the sorts of things I would use. It's then a question of how much further you go in developing skills beyond that, and it does get more difficult.
There is a huge amount of teach-yourself information about R out there. One of the things that can make this quite hard is that it's often difficult to get information out of lots of the data sources.
As Sara alluded to, the software companies can be quite protective of the data that lives in their software, and generally they're not designed to dump out all the information you might potentially need — but there are definitely ways to do that. As a career development thing, this is something that we as a group think we have some good ideas about.
It's definitely something our residents pick up as they go through their residencies. So if you're at that sort of stage of your career and this is something you're really interested in, that's something to consider: doing a residency somewhere you think has expertise in this sort of field. Although, as I was saying earlier, she spent today labouring through some self-learning stuff, so she's probably the last person you should ask about whether that's a good idea or not.
Only under duress. Ps right there. Yeah, if I, if I can get through that and understand that as the person that hates, that's probably the most, then, I think anybody, anybody could get a grip on that.
One thing I would say is that the workshops we ran at BCVA Congress last year were really well attended and the feedback was excellent. So if this is something that you as members would like BCVA to do a little more of, please let us know: email us at the office
and give us some ideas, because we are currently revamping our CPD offering and planning it for next year. If you'd like some workshops on data analysis and what the potential is, then please do let us know.
I guess, similarly, Sara, if you have particular things you want to ask about or find out about, feel free to drop me an email, and I can provide some pointers or do some work to try and set something up that lets you accomplish whatever it is you're looking to do. So absolutely feel free to drop me a line about any of those things.
You might regret that: you might be inundated with data, although, knowing you, you'd do it for free. OK.
Well, I think that brings us to the close of all of our questions, and thanks very much, because we have run over time. So thanks, Chris, for hanging on and answering those so well.
I'll just put this last slide up here, which shows our future webinars. We are continuing with our fortnightly webinars until the end of July. Our next one will be on Tuesday the 9th of June, when we've got Owen Atkinson and Nick Bell discussing the Healthy Feet Lite programme,
which is a simpler, entry-level version of the Healthy Feet programme. Then we've got further webinars on transition health and infectious diseases, and we've just organised one on liver fluke as well. I really hope that you can join us for those.
I want to say thanks again, Chris, for that great presentation and discussion tonight. We've had some brilliant comments coming in, thanking you for such a great, engaging talk. And thank you to everybody who's been listening as well.
So I'll just sign off by saying goodnight and stay safe, everybody, and I hope to see you in two weeks.