All right, so you might recognize this guy from earlier in the show.
He's a speaker as well as an organizer, and today he's going to be talking to you about TensorFlow Hub in addition to AutoML.
So his name's Vikram, if you didn't pick that up earlier. Give him your attention.
Awesome. Thank you, Chris. Let's get started. OK, so hi everybody. My name is Vikram. I'm cofounder and tech lead at Omni Labs Inc., and I'm also a Google Developer Expert for Cloud and Machine Learning. Today we'll be talking a bit about how these tools can help in your machine learning journey. When we talk about the machine learning journey, the expectation is generally that we will have perfectly clean data available for building our machine learning models.
And then we will have a great compute system where we just upload all the data, write all the code and train our model. And once we have all of that done, we get an amazing robot, our AI system that we can use everywhere. But the reality is, sadly, something else. In reality, we end up with really bad data: missing values, unknown values, no values, all sorts of things. And after we have spent so much time cleaning up that data, when we actually try to use it, we realize that most of the tools don't work together. The whole infrastructure around machine learning and machine learning engineering, the whole ecosystem, is evolving, but slowly, and there are still tools which don't work well with each other. So we have to perform scripting magic and join all these different things together to build something out of it. And in the end, what we get is sometimes useful, but most of the time it just ends up as zombie code that lives somewhere and never gets deployed. Most models never see the light of day, and that's the sad reality of it. But what if we had a genie that could build all these models for us and take away all of this complexity we face when we are building models, deploying them, and doing all these machine learning operations? Today we heard from Hannes beforehand about how to do these things when you are using TensorFlow Extended, TFX, or in a fully open source manner using Kubeflow. TFX is also fully open source, but they are different ecosystems.
And you can use those with different open source tools as well. But what if there was an easier way to do this? Cloud AutoML. Cloud AutoML is a tool from Google that you can use to build custom models from your own dataset. So how does it work? You bring your own dataset to Cloud AutoML, which will store that data, train a model on it for you, and deploy that model. You can also use it for serving, so in the end you can call it as a REST API or use it as a batch solution on Google Cloud, and you can also export the model and run it outside. So let's see how it actually works. We know about the Vision API: Google Cloud provides ready-made APIs around vision and other tools similar to that. But if I upload an X-ray image there, most of the information it can give me is things like "radiography" and similar labels, which is actually fairly good; it's not expected to give many more details than that. It can tell me there's a joint, that this is medical imaging, that it's a chest X-ray, and things like that. But if I want to use this image to figure out whether there is pneumonia in it or not, you can't really do that with the Vision API that is available as a service from Google Cloud. What you can do instead is use Cloud AutoML to train and build your own custom machine learning model.
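For comparison, calling the ready-made Vision API like in that X-ray example is only a few lines of code. Here is a rough Python sketch using the google-cloud-vision client; the file name is just a placeholder, and depending on the library version the Image type may live under a different module.

```python
# Rough sketch: asking the pre-trained Vision API for labels, text, and
# safe-search signals, as in the X-ray example. The file name is a placeholder.
from google.cloud import vision

client = vision.ImageAnnotatorClient()
with open("chest_xray.jpg", "rb") as f:
    image = vision.Image(content=f.read())

# Generic labels such as "radiography" or "medical imaging".
labels = client.label_detection(image=image).label_annotations
print([(label.description, round(label.score, 2)) for label in labels])

# OCR: pull out any characters visible in the image.
texts = client.text_detection(image=image).text_annotations
if texts:
    print(texts[0].description)

# Safe-search likelihoods, e.g. whether the image is medical or violent.
safe = client.safe_search_detection(image=image).safe_search_annotation
print(safe.medical, safe.violence)
```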
Once you have that custom-trained model, you can deploy it and just use it for your purposes. So, for example, what I have actually done is build a model which classifies X-ray images by whether they show pneumonia or not. What we will try to do is run this model on a local machine: we built this model on AutoML, exported it using TensorFlow.js, and now we'll try to use it. So with the model running locally in the browser, I just upload an image and we will see the result here. This loading time is there because this is the first time I'm loading the model into the page; you can reduce that time too, and I think Jason, whose talk is after mine, is going to talk more about that. So let's see. OK, here we see that the probability that this is a normal image is 0.61.
That's a 61% probability that it's not a pneumonia case. Let's try a pneumonia case over here. Notice how quickly the computation happened this time, because the model was already loaded, and the probability of pneumonia is 94 percent on this image. This is information that was not available from the ready-made API solution, but we were able to get it by training on our own dataset.
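In the demo the model is exported for TensorFlow.js and runs in the browser. If you instead export it in the TF Lite format, which is another of the export options, running it locally looks roughly like this sketch. The file name, image size, and any input scaling are assumptions; check the export bundle and the interpreter's input details for the real values.

```python
# Rough sketch: running an AutoML Vision model exported as TF Lite locally.
# "model.tflite" and "chest_xray.jpg" are placeholders.
import numpy as np
import tensorflow as tf
from PIL import Image

interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Resize the X-ray to the height/width the model expects.
height, width = input_details[0]["shape"][1:3]
image = Image.open("chest_xray.jpg").convert("RGB").resize((width, height))

# Match the expected input dtype; some exports also expect normalized floats.
input_data = np.expand_dims(np.array(image), axis=0).astype(input_details[0]["dtype"])

interpreter.set_tensor(input_details[0]["index"], input_data)
interpreter.invoke()
scores = interpreter.get_tensor(output_details[0]["index"])[0]
print(scores)  # e.g. one score per label, such as ["normal", "pneumonia"]
```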
So how do you actually go about doing this? For that example, I used a dataset from a challenge that was going on on Kaggle. But for the other example I'm doing today, we will try to do something from the beginning. We have seen a lot of examples about images, so I want to do something that is not image based, and for that we have this dataset about figuring out fraud in credit card transactions. We have some credit card transaction information in here, it's an interesting one, a bit of other data, the transactions, the amounts, and you can see how other people are trying to solve this problem. But let's take a look at how we would do it. The first thing we will do is go to our cloud dashboard, so we go to our Google Cloud Platform project, and here we are. We will go to AutoML Tables. You can see all of the different tools here; we won't go into the details of each, but you can also build custom models for natural language, for vision, for translation. I'm not sure if video intelligence is available yet, but you can do all of these things on your own data. So let's go to Tables; tabular data is actually one of the most common kinds of data, but people don't really use AutoML on it a lot. Over here we have the datasets that we can use to build models on, so let's take a look at the fraud detection one.
I have also given this talk before, and every time I end up using a new dataset.
OK, so let's try with the one that's already there. I've already uploaded the dataset, but let me just show you how exactly you can do that. Let's go to the import section. Generally, I tend to tell people to always upload your data onto Cloud Storage first and then just point to the correct location. So we go into a bucket, choose the one for fraud detection, and I can select the file from here. This will be much faster than uploading the files from your local machine, and it's even better if you are already using BigQuery: just point it at the BigQuery table and it will import the data for you automatically. Then, once you have imported your data, you can take a look at some information about the data that was imported. We saw in the TFX as well as Kubeflow talks that these kinds of tools are available for everyone to use when you are doing an end-to-end pipeline; here you can also take a look at that information. So we have information like all these different columns, some of numeric type, some of categorical type, how many values are missing, how many values are invalid, how many distinct values there are. And once you assign a target column, which for this dataset is whether the transaction is fraud or not, you can see the correlation with the target as well. Sometimes you will see that some column has a very high correlation.
You might actually want to take a look at that column and ensure that the data is not null and doesn't all have the same value or something like that. This is also a really huge table: we have three hundred and ninety four columns in here, and this is six hundred and fifty seven megabytes of data, which for tabular data is actually huge. OK, so we can take a look at this data and start doing some basic statistical analysis on it. On Kaggle, if you look at how people generally approach these problems, you will see that the first thing to do is understand your data, and this is a really great place to upload your data and very quickly see its basic details, so you can start getting a sense of whether it is actually usable or not. Then we go into creating the model. Training the model is as simple as giving the model a name, so that's what I do: I give it a name, and I'm going to train this model. The best part about this is that you can specify a number here for how much time you want to train this model for. For this model we are going to train it for three hours, but you can go all the way up to 72 hours.
There is also early stopping: if the model is more or less trained already and it can't improve on whatever optimization objective you chose, it can automatically stop itself early, or you can disable that and it will use the whole time. I generally end up enabling it, because if one of the earlier models already reaches the best accuracy for my objective, I don't want to waste 70 hours of budget just because. OK, at this point you just start the training; it's as simple as that, and you have your model getting trained. What I did was also start a training run for two hours yesterday, so while these models are getting trained, let's take a look at what actually ends up happening when you create a model. We see some basic details right here: the area under the curve is 0.956 and the accuracy is ninety seven percent. That's really good accuracy. Let's see more details on this, the full evaluation. All right, so now we can evaluate our model; we held out some of the data between the training and the evaluation phase of our model, and we see that it's actually really good: we have 99 percent accuracy here and the loss is low. You can also set a score threshold over here and see how the model performs at different thresholds, so we can look at 0.50, 0.49 and so on.
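To make the threshold idea concrete, here is a tiny made-up illustration, not from the AutoML console: the same model scores give you different precision and recall depending on where you cut, which is exactly what that slider is doing.

```python
# Toy example: how the score threshold trades precision against recall.
# The labels and scores below are invented for illustration.
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])                      # 1 = fraud
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.55, 0.65])   # model scores

for threshold in (0.9, 0.5, 0.1):
    y_pred = (scores >= threshold).astype(int)
    print(threshold,
          "precision:", precision_score(y_true, y_pred),
          "recall:", recall_score(y_true, y_pred))

# The confusion matrix at one particular threshold.
print(confusion_matrix(y_true, (scores >= 0.5).astype(int)))
```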
You can also see the confusion matrix over here, and how your model performed over time during the training process. You can also see the feature importances, which you can use later to say: next time I'm collecting data, I will try to ensure that these values are properly filled in and not missing. Once you have all of this, you can actually start using the model right there to test it and make predictions. For batch prediction, the best way would be to upload the data into BigQuery; you can also use Google Cloud Storage, and the results can go back to BigQuery as well. The thing that I like about it is that you can also do online prediction. We just built a model which can figure out whether a transaction is fraud or not; I wouldn't want to run that only as a batch job at midnight every day, I want to run it every time a transaction comes in, and to do that you can do an online prediction. Now, the form here won't look great, because we have three hundred and ninety three fields in here, but you can go to the JSON view and see the payload of the request that we are making, with the values in the same order that we defined in our data schema.
You can pass in multiple values as well, but there's only one row that we are passing in here, one whole input: the complete input that we got from the user for that transaction. And we can predict with this: we see whether it's fraud or not, and we see the confidence. At this point, mine says this is not fraud. So we have the complete model and it's available for us to use; we just made a prediction all the way through here in the console. But how would I actually make this call on the server side, from my own API? What you can do is go to the code samples. Right here we have a Node.js example, and you can do Java or Python, whichever one you are most familiar with. It's fairly simple: you set up your project ID, your model ID, your region, things like that, and just make the call. It's a simple call which takes the payload we just saw and returns the results, and you can take those results and send the response back. This code gives you pretty much the complete picture that you need for your use case.
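In Python, that call looks roughly like the sketch below. It's based on the google-cloud-automl Tables client; the project ID, region, model name, and column values are all placeholders, and the exact method names can vary between library versions.

```python
# Rough sketch of an online prediction against an AutoML Tables model.
# Project, region, model name, and feature values are placeholders.
from google.cloud import automl_v1beta1 as automl

client = automl.TablesClient(project="my-project-id", region="us-central1")

# One transaction, keyed by the column names defined in the dataset schema
# (these column names are hypothetical).
inputs = {
    "TransactionAmt": 149.0,
    "ProductCD": "W",
    "card4": "visa",
}

response = client.predict(model_display_name="fraud_detection", inputs=inputs)
for result in response.payload:
    # For classification, each entry is one class label with its confidence.
    print(result.tables.value, result.tables.score)
```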
So this is a really good model and we can start using it. But why would you actually want to use AutoML? Let's take a look at that. As we talked about, we have problems as users: the data is messy, and there are a lot of problems around compute, infrastructure, model jobs and things like that.
Why else would you want to use it? There's not a lot of talent available in machine learning. We see a lot of hype going on around it, but when you actually look for machine learning talent, it's really hard to find those people and to hire them. And let's say you actually hired a person and you are now using their skills to build a model: it still takes a lot of time, not only in the preprocessing and cleaning, but also setting up the pipelines, running the experiments, coming up with the best model architecture, trying out various hyperparameters and tuning them, and keeping track of all those experiments. That still takes time, and as soon as you add the human cost on top of the time cost, the total cost is actually high for something that you might not even end up using. So what ends up happening with AutoML is that you can start using your data right away and get a model for your data. One of the good things that I might have skipped over: let's go back to the training part.
It was right here, so let's go back to the model. The training will take several hours, and that also includes the time to set up the infrastructure and tear it down, but you are not getting charged for that time. When you consider how much effort normally goes into setting up infrastructure, tearing it down, and all the extra costs that go around building your product, that's a lot of cost, and you want to reduce it as much as possible. So I suggest people use this as much as possible. OK, but what if I built the model, I like the model, I'm actually using it, but I need even more flexibility, and I've tried multiple data iterations but I'm not getting what I need out of Cloud AutoML? Highly unlikely, but if you are in that situation, you can go to TensorFlow Hub. TensorFlow Hub is a repository where you can browse different machine learning models
for your own use cases. These are pretrained models which you can also retrain for your usage, and you can find them in different formats as well. So let's take a look at some types of models here. If I go into image classification, I can get image classification models in here, I can get landmark models; these are pretrained models, but there are also feature vector models available in here. Let's go take a look at this BigGAN one here from DeepMind. You can also filter for models by the platforms they work on, although I don't see the one I want in here.
Ok.
Let's go back, and then we will filter for TF.js, and you will see the ones that you can use with TensorFlow.js in your browser; I think Jason will talk more about these models. You can also get models in the TF Lite format, which will work on your Android and iOS devices, and these, I think, also work on Coral, so you can get those edge devices and run your models on them. You can get all of these different formats from here. The part I actually like best about this is, let's go to one of the examples that I really like.
Right in here there's a tutorial: not only can you use an existing model, you can also learn how to use that model, how the model was built, and how to fine-tune it. This is a Colab, totally free to use. We have all the code in here for importing the model: we import the whole pretrained model, then we get our dataset. We are using TensorFlow Datasets here, but you can bring in your own dataset. We tokenize the data, do all the processing we need to do, and then put the whole thing together to build and train the model. It's a simple Jupyter-style notebook, so you can run the whole thing and learn more about it, truly for free, and of course all the setup is in here as well.
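In spirit, the code in a notebook like that looks something like this rough sketch: load a pretrained text embedding from TF Hub as a Keras layer, pull a dataset from TensorFlow Datasets, and fine-tune the whole thing. The model handle and dataset here are assumptions; the actual Colab may use a different encoder with its own tokenizer.

```python
# Rough sketch: fine-tuning a pretrained TF Hub text embedding with Keras.
# The model handle and dataset are placeholders for whatever the notebook uses.
import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_hub as hub

train_data, test_data = tfds.load("imdb_reviews", split=["train", "test"],
                                  as_supervised=True)

# trainable=True lets the pretrained weights be fine-tuned on our own data.
embedding = hub.KerasLayer("https://tfhub.dev/google/nnlm-en-dim50/2",
                           input_shape=[], dtype=tf.string, trainable=True)

model = tf.keras.Sequential([
    embedding,
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),  # logit for a binary label
])
model.compile(optimizer="adam",
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=["accuracy"])
model.fit(train_data.shuffle(10_000).batch(512), epochs=5,
          validation_data=test_data.batch(512))
```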
Awesome. OK, so that's the setup for you as an individual. But what if you are actually working inside a team? There are things you can do on Colab: you can share notebooks with different people on your team, and everybody can collaborate on those. But as you grow into bigger and larger teams, you end up having trouble sharing not only the notebook files you are writing code in, but also the assets, the datasets and the models. For that, Google Cloud provides AI Hub, which is a central place where you can come in and share different types of assets. We talked about Kubeflow pipelines today; those live here too. So let's go to the demo of this.
So we have AI Hub, which has different types of assets that you can share within your team. Now, this is my personal account, so I don't have any assets shared here, but you can share assets and star them so you have a quick way to get back to them. Nothing in here on this page for me, which is not a great demo, but I just want to show you more about it. You have different categories of assets that you can share. There are, of course, datasets that you can share within your team, and there is some data available from Google Cloud itself; I think most of that data is also available on Dataset Search. You can also take a look at recent public datasets that show up over here as well; you can search for those datasets and look at different types of data. Then there are, of course, the Kubeflow pipelines that were mentioned today. You can take one of the existing Kubeflow pipelines: for example, right here there's a time series model, which works really great if you're doing any kind of advertisement-type work, and you can do time series analysis with it as a complete end-to-end installation over here.
You can do OCR and object detection over here; similarly, there are more Kubeflow pipelines available. These are end-to-end systems that are already built and that you can use directly in your projects. There are also notebooks and services, all of these things are available here, and trained models as well. I don't see much reason to go here for public models, because I think all of those are available on TensorFlow Hub, but if my team built a trained model, we can put it here as well. So the data and model assets from your team will all come in here. All right, the best part about this, which I really love, is the technical guides. Again, same as with TF Hub: it's great to use something that is available, but it's even better if you can learn from it. I really love this article, for example, about how to build an end-to-end machine learning solution on Google Cloud; they talk about using the Vision API and other intelligence APIs, and it's a complete blueprint with all the details around how you would use it. You can filter by different types here too. So this is a really great hub, a central place that you can come to for all of your machine learning needs when you're working with a team.
So how do I actually go about using all of this?
What would you do if you were starting a new machine learning project? The best advice I give people for getting started is to start by looking at the already available APIs from the cloud, like the Vision API. You might find your answer right there. I was not expecting it, for example, to actually tell me that this is an X-ray image, that it contains a chest and there's a neck in here, and you'd be surprised at what it can do. Sometimes people tell me they want to get the text out of an image; right here we see this image, and there aren't a lot of characters in it, but there are some, and you can get those from the API as well. Different properties, safe search, whether this is a medical image, whether there is violence in this image: you can do most of this with the Vision API. If that doesn't work, try going to Cloud AutoML. As we talked about, I actually wanted to figure out whether there is pneumonia in the image or not, and the ready-made API can't do that, so you can go to Cloud AutoML and build your own model over there. And if at that point you're still not satisfied, then you need more hands-on work, because both of these systems are largely out of your control. You can't really do a lot around hyperparameter tuning, and other than your data, you don't really have a lot of control over anything.
With the ready-made APIs I don't really have any control at all, and with AutoML you have most of the control over the data and how much time you want to spend, but you don't really have any other controls. If you really want a lot of control over how you build your models, how you deploy them, how you cut down your costs, things like that, you can always take a model from TensorFlow Hub and build from there. The way I generally approach it: most of my problems are not solvable by the Vision API, so I actually start with AutoML. It's a really great place to get a quick feel for not only your data, but also what kind of models you can build and what kind of model architectures would work, and if nothing else, it will at least give you a baseline for what kind of predictions you can expect from the data. It's a very easy way to filter out projects that you don't want to work on. Let's say a client gives you a data file and asks you to build a machine learning model on it, but the data is not really useful for doing anything; with AutoML you can very easily run a quick experiment on it and get peace of mind that you shouldn't spend days trying to build a pipeline and things like that when there was no actual value in doing any of it.
Once things pass through that decision criteria, and we know this data is actually useful and we can build something useful out of it, then I actually start looking into TensorFlow Hub, into anything that is available, any literature that is available around building these kinds of models, and the assets that are already out there. Either we end up using the state-of-the-art models that are available on TF Hub, or we find a really innovative approach on Kaggle from some of the competition winners, or we go to AI Hub and realize that some of our teammates might already have built something like this, and then we can just build on top of that. And finally, once the model is ready, I really like to make my models available as quickly as possible, so they generally end up deployed behind a simple API. But I really love it when I can export them as TensorFlow.js models and use them right in the browser; that's my ideal, those are the kinds of things I really enjoy. So that's actually it. If you have any questions or comments, I'm here.
Thank you, everybody.
Awesome, the speaker covered a lot of information in a short period of time. I think he did a great job, and that was really, really awesome. And, you know, Kaggle recently used AutoML to build their spam filter; there's an article about it that came out recently. You mentioned cost as one factor in why you might go with AutoML, and you went over a lot of different points. Are there some main reasons besides cost why you would go for AutoML versus building a model from the ground up?
Not cost alone, no. As I talked about, talent in machine learning is a fairly specialized skill, and even when you find a great machine learning engineer, they might be domain specific: engineers who only work on image data, engineers who only work on text data. Today, as a company, you might be working on one type of problem, but if you are a startup like us, that might change six months from now, and you might need to find and hire a new person again and bring them on board for a different type of problem. Maybe this comes from my own background and experience, but things change rapidly in a startup environment, and what you want is a quick way to try these things out. So it's not only time and cost; both of those are very valid concerns, but hiring itself is one too. No matter how good a person you have on your team, they can't be an expert in everything, and with these tools you get the flexibility of using what's available out there.
Definitely. I think transfer learning as a framework is something that's saving a lot of time, and sometimes I find that ego plays a part in it, in that people want to build it themselves; they don't want to use something someone else made. But when you're talking about an enterprise situation and you need to get something done for a client, you know, I'm in the same boat as you: I worked at a startup, and we just needed to get things done. So I think it's really awesome that this is available, and I appreciate you sharing your insights. Super cool. Awesome. And I think we're ready for our next talk. Thanks.
Awesome. Thank you.