OK. Hello, everybody. My name is Guilherme Rosa.
I'm a faculty member in the Department of Animal and Dairy Sciences at the University of Wisconsin-Madison. Today I'm going to talk about some projects related to genetic and computer vision tools for controlling gastrointestinal nematodes in sheep.
This was basically the PhD research of a student, Luara Freitas, who I'm going to mention a few more times here. Currently she's a postdoc in my lab, and she's still working on this kind of research. OK, so just to give you some information about the problem that we are trying to address here: gastrointestinal nematode infection, GIN, is a major issue in sheep production.
One of the most important parasites is Haemonchus contortus, a highly pathogenic parasite. What sheep producers do is control infections with anthelmintics, mostly chemical products. The problems with this are, of course, the cost of treating animals and environmental residues.
Another big problem is that we are getting parasites that are resistant to these drugs: as you treat, you are in effect selecting for resistance. So that's another major problem with GIN. Haemonchus contortus, as I mentioned, is a very problematic blood-feeding parasite, so it causes anaemia.
When the animal gets anaemic, the colour of the conjunctiva gets pale. So what people quite often do to assess the health of the flock is to look at the eye of the animals, more specifically at the conjunctiva.
And they use a scoring method, quite often the FAMACHA score. It's a visual score, so a subjective assessment of the colour of the conjunctiva. It's a scale from 1 to 5: 1 would be the normal animal, with a very bright red conjunctiva.
At the anaemic end, a score of 5 means the conjunctiva is very pale and the animal is very anaemic. A 2 would be an animal that you start paying attention to, and 3, 4 and 5 are those that in general get treated. So the good thing about this method is that you don't treat the whole flock at once.
You do what they call targeted treatment: you look at the conjunctiva, you assess anaemia, and you treat only those animals that are in need of treatment. For the work that I'm going to show you, we started with a breed of sheep called Santa Inês.
This is a breed that is very common in Brazil for meat production. This work was a collaboration with the Animal Science Institute in São Paulo, Brazil. The Santa Inês is a woolless hair sheep, as you can see in the pictures here.
And again, it's mostly for meat production. It's a very rustic animal, used especially in the north of Brazil, in very hot areas. So I'm going to show you basically three sub-objectives, or goals, of this larger project.
The first was: can we use machine learning methods to identify resistance, resilience, and susceptibility to Haemonchus contortus in sheep? This paper was published a couple of years ago, if you're interested in more details.
Just to give you a little bit of background: I mentioned that these nematodes are a significant health concern. You can think of animals as being classified into three groups: resistant, resilient, or susceptible. The resistant are basically animals that don't get the infection, OK?
That's how we classify for any kind of disease: a resistant animal is one that doesn't get the infection, or gets less of it than others. The resilient animals get infected, but they don't suffer the consequences of the infection too much, for example in growth or performance, milk yield or reproduction; they keep going despite the infection.
And then the susceptible are those that get infected and suffer from the disease, OK? So we are looking here at infection at two levels. One is the faecal egg count.
That's a direct assessment of the infection in each animal. And then we look at the red cells in the blood, the packed cell volume, as an indication of how anaemic the animal is, that is, how much the animal is suffering from the infection.
Of course, the challenge is that to monitor faecal egg count and packed cell volume you need blood and faecal samples, which are not simple to collect. It's time-consuming. You need to bring the animals to the barn where you can work with them, and then ship the samples to labs, so it's costly and time-consuming. So we are looking for cheaper, non-invasive and less labour-intensive methods to reduce costs and get a more timely assessment of infection. So this is the objective: to investigate the feasibility of using easy-to-measure phenotypic traits to predict whether the animal is resistant, resilient, or susceptible.
This could be used, for example, in breeding programmes, if you want to select animals that are genetically more resistant or resilient. And then we compare a few different statistical and machine learning methods for classification.
We have logistic regression, linear discriminant analysis, random forest and neural nets. And then we evaluate these methods across farms as well. I'm going to show you why.
To define the classes, we are using thresholds that people suggest in the literature. When an animal has low packed cell volume, the animal is susceptible: it's infected and suffering from the disease. So we had 110 records.
If you have individuals with PCV above 22%, that's OK. Among those, an animal with a high level of infection, more than 1,000 eggs per gramme, would be resilient, OK? They have a big load of infection, but the PCV is still OK.
And then the resistant animals are those that have a lower level of infection. In total, we had about 3,600 phenotypic records from 1,250 animals.
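As a rough illustration of the three-class rule just described, here is a minimal sketch. The function name is hypothetical, and how values exactly at a threshold are handled is an assumption; the talk only quotes "above 22%" for PCV and "more than 1,000 eggs per gramme" for FEC.

```python
# Thresholds as quoted in the talk; treating them as strict cut-offs is an assumption.
PCV_OK_ABOVE = 22.0      # packed cell volume, in percent
FEC_HIGH_ABOVE = 1000.0  # faecal egg count, in eggs per gramme


def classify_animal(pcv: float, fec: float) -> str:
    """Label one record as susceptible, resilient or resistant (illustrative only)."""
    if pcv <= PCV_OK_ABOVE:
        # Low PCV: the animal is anaemic, so it is suffering from the infection.
        return "susceptible"
    if fec > FEC_HIGH_ABOVE:
        # PCV is fine despite a heavy parasite load.
        return "resilient"
    # PCV is fine and the parasite load is low.
    return "resistant"
```

Checking PCV first reflects the order in which the classes were described: anaemia decides susceptibility, and the egg count only separates the two non-anaemic groups.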
OK. So which variables did we use? We had the body weight of the animal, the body condition score, the farm where the animal is located, sex, age, and the month of recording.
Then we compare these four models using a cross-validation approach, and we assess predictive ability by precision and sensitivity. This is the summary of the results: precision is the first three plots, the green ones, and then sensitivity. We are looking at resistance in the first column, resilience in the second, and susceptibility in the third, and in each plot we have the four models. OK, let me see if I can get my laser pointer here.
So, for example, here you have the multinomial logistic regression and the linear discriminant analysis giving a better sensitivity to detect susceptible animals. That's how you read this, OK.
In general we had higher sensitivity for susceptible animals and higher precision for resistant and resilient animals. And then of course you need to decide how much you can tolerate of false positives and false negatives. You can make that decision based on what is more important in each specific context or on each specific farm.
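To make the two metrics concrete, the following sketch computes precision and sensitivity for one class from observed and predicted labels. The function name and the toy labels in the usage note are illustrative assumptions, not data from the study.

```python
from collections import Counter


def precision_and_sensitivity(observed, predicted, target):
    """Return (precision, sensitivity) for the class `target`."""
    counts = Counter(zip(observed, predicted))
    true_pos = counts[(target, target)]
    # Predicted as target but actually another class.
    false_pos = sum(n for (o, p), n in counts.items() if p == target and o != target)
    # Actually target but predicted as another class.
    false_neg = sum(n for (o, p), n in counts.items() if o == target and p != target)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else float("nan")
    sensitivity = true_pos / (true_pos + false_neg) if true_pos + false_neg else float("nan")
    return precision, sensitivity
```

For example, with observed labels `["susceptible", "susceptible", "resistant", "resistant"]` and predictions `["susceptible", "resistant", "resistant", "susceptible"]`, the susceptible class has one true positive, one false positive and one false negative, so both metrics come out at 0.5. Low precision means treating animals that did not need it; low sensitivity means missing animals that did.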
And then we also looked at something that is very important. Here we have multiple farms, with multiple animals per farm and repeated measures. So when we do cross-validation, we need to be careful with how we break the data set down into training, validation and test sets.
If you have the same animal represented in both the training and the test sets, you are going to overestimate the performance of your model, right? Because in the future we are going to use the model to predict a new animal on a new farm, and so forth. So that's what we did here.
We are looking at how the model performs on a new farm: we train the model on all farms but one, and then we predict the animals on that left-out farm. And then you see the results here.
This is sensitivity for the susceptible class. There are, for example, three farms where we got high sensitivity to detect susceptible animals, but then some where it is low. All this depends on how different a farm is from the others in terms of the kind of animals, the age of the animals, environmental conditions and so forth. This is precision for the resistant class, and you see that for these four farms it was reasonable. But overall, precision for the resilient class was relatively low across all farms, OK?
Again, I just want to stress this need when we are doing cross-validation. This is quite often the situation with precision livestock tools like accelerometers, RFID and computer vision: you quite often have multiple farms, multiple animals and repeated measurements, and you need to take this data structure into account when you are fitting the models and also when evaluating their performance. What I showed was specific to the logistic regression, but of course you have this for all the models as well.
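The leave-one-farm-out idea can be sketched in a few lines. This is only an illustration of the splitting logic, with a hypothetical function name; it is not the code used in the study.

```python
def leave_one_farm_out(farm_of_record):
    """Yield (held_out_farm, train_indices, test_indices), one split per farm.

    `farm_of_record[i]` is the farm label of record i. All records from the
    held-out farm go to the test set, so no animal from that farm (and none of
    its repeated measurements) can leak into training.
    """
    for held_out in sorted(set(farm_of_record)):
        train = [i for i, farm in enumerate(farm_of_record) if farm != held_out]
        test = [i for i, farm in enumerate(farm_of_record) if farm == held_out]
        yield held_out, train, test
```

With farm labels `["A", "A", "B", "C", "B"]` this produces three splits, the first holding out records 0 and 1 (farm A) and training on records 2, 3 and 4. Splitting at random over records instead would put measurements of the same animal on both sides, which is the overestimation problem described above.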
So in conclusion, we found that the logistic regression and the linear discriminant analysis achieved the best performance overall, especially in classifying susceptible and resistant animals. You can apply this to readily available records, easy-to-measure variables such as body weight, body condition score, which is also a visual inspection, sex, age, and the month of recording.
And you can also use the FAMACHA score as an input variable, OK. So these tools can support farm management decisions, potentially reducing economic losses caused by these parasites. And then, of course, if you are able to detect animals that are resistant, you can consider selecting those animals in breeding programmes, so that you improve the flocks in terms of resistance to the disease. OK, so then, talking about genetic selection: we also looked at genetic markers in these sheep, to see if we can predict resistance to GIN using some different models.
Again, this is a paper already published, last year, if you want more of the results. The goal here is to explore the possibility of developing breeding programmes to select and improve flocks for resistance to the disease.
We have traits that are correlated with resistance, like the faecal egg count, the packed cell volume, and the FAMACHA score. And the selection of animals for economically important traits is traditionally based on phenotypic records and pedigree information.
If you don't have genomic information, that's basically what you do. You measure the traits and you select the best animals, not only on their own phenotype, but combining information from related animals through the pedigree. But nowadays we have genomic information as well.
You can have full DNA sequences, but quite often what we use in livestock is molecular markers, just some polymorphic sites on the genome, mostly SNP markers. Then we build models that associate the marker genotypes with the phenotypic traits and predict the genetic merit of animals based on those markers, OK? This is what we had in terms of observations on FEC, PCV and FAMACHA: roughly four to five thousand observations on about 1,500 animals.
When we measure a phenotypic trait on an animal, of course there is a genetic component, but also an environmental component, right? It could be that you measured in a dry and hot season, or a cold or wet season, and it depends on the diet that the animal is eating. So what we did first was to analyse the data using the traditional mixed model for genetic evaluation of animals and for genetic parameter estimation. We call this the animal model, where you have some fixed effects to correct for differences between farms, year, season and so forth. And then we have a random effect related to the genetic merit of the animal, that is, the breeding value. What we get from this is the estimated breeding value, an estimate of the genetic merit of the animal. And because we have repeated measurements, we have another component to model the correlation between measurements on the same animal. But what we are interested in is the prediction of a, the breeding values, OK?
The breeding values are assumed to be normally distributed and centred on zero, and here you have the genetic relationship matrix built using pedigree information. Once you have these EBVs, we try to find markers associated with these breeding values. OK.
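Written out, a repeatability animal model of the kind described here usually takes the following form. Only a (the breeding values) and A (the pedigree relationship matrix) are named in the talk; the remaining symbols are standard notation assumed for illustration.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{a} + \mathbf{W}\mathbf{p} + \mathbf{e},\\
\mathbf{a} &\sim N(\mathbf{0}, \mathbf{A}\sigma_a^2),\qquad
\mathbf{p} \sim N(\mathbf{0}, \mathbf{I}\sigma_p^2),\qquad
\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2)
\end{aligned}
```

Here y holds the phenotypic records, b the fixed effects (farm, year, season and so forth), a the breeding values, p the per-animal effect that captures the correlation between repeated measurements on the same animal, and e the residuals; X, Z and W are the incidence matrices linking records to those effects.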
So we have the SNPs, and we did the traditional kind of quality control: excluding markers with very low minor allele frequency or with call rate, in our case, lower than 0.9, and markers that deviate strongly from Hardy-Weinberg proportions, and also excluding animals with call rate lower than 0.9.
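A minimal sketch of the call-rate and minor-allele-frequency filters might look like this. The 0.9 call-rate threshold is from the talk; the 0.05 MAF threshold, the function names and the genotype coding (0, 1 or 2 copies of one allele, None for a missing call) are assumptions, and the Hardy-Weinberg check is left out for brevity.

```python
def call_rate(calls):
    """Fraction of non-missing genotype calls."""
    return sum(g is not None for g in calls) / len(calls)


def minor_allele_frequency(calls):
    """Frequency of the rarer allele among the non-missing calls."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    p = sum(observed) / (2 * len(observed))
    return min(p, 1 - p)


def quality_control(genotypes, min_call_rate=0.9, min_maf=0.05):
    """Return (animals_to_keep, markers_to_keep) as lists of indices.

    genotypes[i][j] is the call for animal i at marker j. Both filters are
    computed on the unfiltered matrix; real pipelines may apply them in sequence.
    """
    n_markers = len(genotypes[0])
    keep_markers = []
    for j in range(n_markers):
        column = [row[j] for row in genotypes]
        if call_rate(column) >= min_call_rate and minor_allele_frequency(column) >= min_maf:
            keep_markers.append(j)
    keep_animals = [i for i, row in enumerate(genotypes) if call_rate(row) >= min_call_rate]
    return keep_animals, keep_markers
```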
The models that we used were, first, GBLUP. GBLUP is a model very similar to the animal model that I just showed you, but instead of the matrix A that uses pedigree information, we use a matrix G that is built from the genomic information. So you estimate the genetic relationships between animals using the genomic information. This is a very common model in genomic prediction. It's called GBLUP.
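One widely used way to build a G matrix is VanRaden's first method, sketched below. The talk does not say which construction was used, so this choice, the function name and the assumption of complete 0/1/2 genotypes are illustrative only.

```python
def genomic_relationship_matrix(genotypes):
    """G = Z Z' / (2 * sum_j p_j (1 - p_j)), with Z the centred genotypes.

    genotypes[i][j] is the allele count (0, 1 or 2) for animal i at marker j,
    with no missing values. Assumes at least one marker is polymorphic.
    """
    n_animals = len(genotypes)
    n_markers = len(genotypes[0])
    # Allele frequency of the counted allele at each marker.
    freqs = [sum(row[j] for row in genotypes) / (2 * n_animals) for j in range(n_markers)]
    # Centre each genotype by its expected value, 2 * p_j.
    centred = [[row[j] - 2 * freqs[j] for j in range(n_markers)] for row in genotypes]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    return [
        [sum(a * b for a, b in zip(zi, zk)) / scale for zk in centred]
        for zi in centred
    ]
```

In the animal model, this G then takes the place of the pedigree matrix A as the covariance structure of the breeding values.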
BayesA and BayesB are Bayesian regression methods, and they were actually proposed in the first paper on genomic prediction, in 2001. Later, models like the LASSO and the Bayesian LASSO were proposed, so we are using the Bayesian LASSO as well, and comparing these methods with a neural net, in this case a Bayesian regularised neural net. Again, we use cross-validation, five-fold cross-validation, and we are looking at accuracy and mean squared error to compare the models.
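The evaluation loop can be illustrated with the pieces below. Taking "accuracy" to be the Pearson correlation between predicted and observed values is an assumption (it is the usual convention in genomic prediction, but the talk does not define it), and the fold assignment here is a simple deterministic interleave rather than a random shuffle.

```python
import math


def k_fold_indices(n, k=5):
    """Split indices 0..n-1 into k folds by interleaving (no shuffling)."""
    return [list(range(start, n, k)) for start in range(k)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_squared_error(observed, predicted):
    """Average squared difference between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
```

In each of the five rounds one fold is held out, the model is fitted on the other four, and the two metrics are computed on the held-out predictions.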
These are the results. We have the three phenotypic traits here, PCV, FEC and FAMACHA, all the models, and accuracy and mean squared error. Basically what we see is that there are no big differences between the models, but for PCV the regression methods were better than the neural net.
Same thing here for the faecal egg count, and same thing here for the FAMACHA. We also see that the prediction of faecal egg count was better than for the other two traits.
OK. Of course, we can improve on this by trying different methods and models and by increasing the sample size, but so far that's what we got. It's pretty good accuracy anyway, OK?
These are reasonable values to be used, for example, in selection programmes. So in conclusion, the results indicate that the parametric methods, GBLUP and the Bayesian regressions, were better, and they are suitable for genomic prediction of these resistance indicator traits in Santa Inês sheep.
GBLUP is actually very simple to implement. There is also the possibility of combining genomic with pedigree information, and it is very simple to extend this model to the joint analysis of multiple traits. So at this stage we would recommend GBLUP as the best model here: good results and easy to implement. And, like I mentioned, further studies should be performed, maybe with larger sample sizes and more flocks included, and maybe even trying other methods, OK. Very good.
To finalise, I want to mention a third piece of work that we developed; again, Luara Freitas is the first author here. Now we are using computer vision to try to help sheep producers. Instead of having each producer trained to score FAMACHA, we can develop a computer vision system that is more objective, doesn't need the training of technicians, is easy to apply, and so forth. So we used computer vision to predict not only the FAMACHA score but also the packed cell volume and the faecal egg count, to see how the results look. These are, again, papers published recently, if you're interested.
We have a paper where we estimate the FAMACHA score. It's a pretty good estimation, because it's basically the colour of the eye, but now instead of having an individual evaluating it, we have the computer looking at the intensity of red, green and blue in the pictures, right? And getting an objective assessment of the colour.
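As a toy illustration of what "looking at the intensity of red, green and blue" can mean, the sketch below averages the channels over a set of pixels. The function names and the red-share summary are hypothetical; the published models work on the images themselves, not on this hand-made feature.

```python
def mean_rgb(pixels):
    """Average (R, G, B) over an iterable of (r, g, b) pixel tuples."""
    pixels = list(pixels)
    if not pixels:
        raise ValueError("no pixels to average")
    n = len(pixels)
    return tuple(sum(p[channel] for p in pixels) / n for channel in range(3))


def red_share(pixels):
    """Share of total intensity in the red channel (illustrative feature only)."""
    r, g, b = mean_rgb(pixels)
    total = r + g + b
    return r / total if total else 0.0
```

A paler conjunctiva would be expected to show a smaller red share than a bright red one, which is the intuition behind replacing the human eye with an objective colour measurement.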
But what I'm going to show you here is the results for something a little bit more complex, which is using the colour of the eye to predict the faecal egg count and the packed cell volume. The idea is that, again, the FAMACHA method is an interesting tool that we can use for targeted treatment, but it's prone to error because it relies on subjective evaluation by the individuals performing the analysis. So we are trying to develop this image analysis, or computer vision, approach, OK? We are also using simpler methods, like logistic regression, but what I'm going to show you covers both things: we are looking at continuous variables, the packed cell volume and also the faecal egg count, and we are also trying to classify individuals as anaemic or normal, OK?
So we use both a regression and a random forest approach. We had data on almost 400 animals, with over 3,000 images, and records of packed cell volume, which is what I'm going to show you here.
These are the images of animals that were classified as anaemic or non-anaemic: about 3,400 pictures of 392 animals. When you acquire a picture, the first thing that we need in terms of image processing is to detect, or extract, the pixels that belong to the conjunctiva. So we need to train a model to do this. We had a first step of annotation.
We use bounding boxes, so basically we manually find where the conjunctiva is in each picture. Then we develop and compare different mask models, models that look pixel by pixel and classify each pixel as belonging to the conjunctiva or to the background.
Then we select the best one. We use a U-Net model to do this, which is what we call segmentation, right?
So you have a picture here, and then you get the mask. This is the region of interest. You superimpose this mask on the picture, and now you get the pixels that you are interested in, and you can look at the distribution of the colour of the pixels in this region, OK? To assess how good the segmentation is, we look at the intersection over union: how close the region that the computer selects is to the region that we manually gave to the computer. The intersection over union was 0.63, so not too good, and then we were able to improve it to almost 0.7.
Basically, what we did was this. We realised that sometimes the computer got confused.
For example, here you have this area, a reflection on the eye, captured as part of the conjunctiva. So when we get two or more areas like this, we look for the largest area and ignore the rest. This improved the model, and now we get an intersection over union of 0.68. OK.
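Both steps, scoring a mask by intersection over union and keeping only the largest predicted region, can be sketched on small binary masks as follows. The function names and the use of 4-connectivity to define a region are assumptions for illustration.

```python
from collections import deque


def intersection_over_union(predicted, truth):
    """IoU between two binary masks given as equal-sized lists of rows."""
    inter = union = 0
    for pred_row, true_row in zip(predicted, truth):
        for p, t in zip(pred_row, true_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return inter / union if union else 1.0


def keep_largest_region(mask):
    """Return a copy of `mask` with only its largest 4-connected region set to 1."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search to collect one connected region.
            region, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) > len(best):
                best = region
    cleaned = [[0] * cols for _ in range(rows)]
    for y, x in best:
        cleaned[y][x] = 1
    return cleaned
```

On a toy example where the prediction contains a three-pixel blob that overlaps a four-pixel true region plus one stray pixel elsewhere, the IoU is 3/5 before cleaning and 3/4 after, which mirrors the kind of gain described above when a reflection is discarded.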
So now we have this preprocessing of images that extracts the pixels of interest, those on the conjunctiva. And now we are using two approaches, regression and classification models. The regression is to predict the percentage of PCV, and the classification is based on this 27%, which is the threshold suggested in the literature. We classify animals as normal or anaemic, and we use a classification model to predict that class, OK? We had three models here: Xception, Inception V3 and VGG19.
We use cross-validation again. Here we need to split the data into three components: you have the training and validation sets to fine-tune the model, and then you test on the third subset of the data.
Basically we put 70% for training, 15% for validation and 15% for test. We know the PCV of each animal, so you can classify them as anaemic or normal; then you have the prediction from the model and you can build the confusion matrix to assess false positives, false negatives, precision, accuracy, recall, and so forth.
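The split and the labelling rule can be sketched as below. The 70/15/15 proportions and the 27% threshold are from the talk; the function names, the fixed random seed, and treating PCV strictly below 27% as anaemic are assumptions. Applying the split to animal identifiers rather than to individual images is also an assumption, made so that pictures of the same animal do not end up on both sides.

```python
import random


def split_70_15_15(items, seed=0):
    """Shuffle `items` and return (train, validation, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(0.70 * len(shuffled))
    n_val = round(0.15 * len(shuffled))
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test


def is_anaemic(pcv_percent, threshold=27.0):
    """Label an animal as anaemic when its PCV is below the threshold."""
    return pcv_percent < threshold
```

With 100 animal identifiers this gives 70, 15 and 15 animals in the three sets; the validation set is used while fine-tuning, and the test set is touched only once at the end.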
This is the Xception model, which provided the best performance for the regression: an R-squared of 0.24, OK, not too big, but it's still a starting point. Here we have the scatter plot of predicted against observed values. It's not an easy task to look at the colour of the eye and predict the packed cell volume.
With the classification it was a little bit better. Still, if you look at the Xception model, we have a large rate of false negatives, 43%, and false positives, 27%. OK.
But it's still a reasonable result and, of course, just a starting point. We have one data set with fewer than 400 animals, and now we are expanding: we actually submitted a grant proposal to collect data on multiple breeds, because so far we had one single breed, and we need pictures across the seasons. We were in Brazil, but in the US you have winter in Wisconsin, you know, and then you have all these different environments, desert and savanna and everything. So we need to collect data under different circumstances, with different breeds and environmental conditions as well. The VGG19 model was actually the best for classification.
Looking at accuracy, precision, recall and F1 score, the Xception and the VGG19 were very similar, but recall was much superior for the VGG19. And these are the results for the Xception. OK.
So, overall, we would still wish to decrease the number of false positives, but nonetheless the model successfully identified most anaemic animals, with only 17% of the anaemic animals going undetected. OK. And now, what we developed is a prototype. It is still web-based: you download it on your cell phone and, of course, you need to give it access to the camera. You take a picture and you get a result immediately, a classification as anaemic or non-anaemic.
This is, like I said, a prototype. It is web-based, so you need to be connected to the internet or the cell network: the image goes to the cloud, we do the analysis and send back the results. But again, I mentioned that we submitted a grant proposal, and in addition to collecting more data across multiple breeds and different environmental conditions, we have also proposed to improve this app and make it freely available to producers, with no need to be connected to the internet or to a cell signal. So you can take a picture, the image is processed on your cell phone, and you get a prediction right away.
And then, of course, when you get cell signal or internet, the image can be sent to the cloud as well, but you get the results right away. So in conclusion, ocular conjunctiva images and deep neural network algorithms, like the ones that I just showed you, can help support efficient farm management decisions, potentially reducing economic losses. And because this helps you with targeted treatment, you also contribute to the environment, with less chemical residue.
And you will save money as well, not only because you improve the performance of the animals and don't lose animals, since sometimes they die because of the anaemia, but also because you save on blood tests or faecal egg counts.
OK. So the advantage of this computer vision approach over the visual FAMACHA is that there is no need to train an individual to do it. You can perform it as frequently as you want.
And the algorithms can be improved with increased sample size. As we collect more data and make this app available, we get more images and we keep improving the model, and of course it's going to be much better than what any individual can do visually. OK.
So to finish up, I want to thank our collaborators. This is a picture of my lab, but it was taken last summer. Some of them have left already and we have new members, but most importantly there is Doctor Luara Freitas.
She's from Brazil and was a PhD student in Brazil who came to my lab as a visiting scholar to develop this computer vision component, and she did so well that I invited her to stay here as a postdoc, and now we are continuing with this research. Thank you very much. I hope you enjoyed the presentation, and I'll be happy to answer questions, which you can send to me directly or through the platform here. Thank you very much.