Custom Vision 2 – Object Detection Made Easy – The Xamarin Show


>>On this week’s Xamarin Show, I have a good friend,
Jim, back on, talking about object detection
with Custom Vision AI, Xamarin, and
awesome toys. Tune in. Welcome back everyone
to The Xamarin Show, I’m your host James Montemagno, and today I have a good friend, Jim, back. How is it going, Jim?>>Good. Good. How are you?>>I’m doing absolutely lovely, it’s a beautiful
sunshiny day in Seattle, like every day, so you
know, I’m in a great mood.>>Excellent. Excellent.>>You’re all the way in from?>>I’m all the way from London.>>That’s right. I never know where you’re at
because you’re everywhere.>>I am everywhere.>>Yeah. Now, all
the way from the UK.>>Still a little bit jet-lagged, but, yeah, really
excited to be here.>>Yeah. So, you’re->>Doing great stuff.>>-Cloud developer
advocate? Correct?>>Yeah. Cloud
developer advocate, my specialism is Xamarin.>>Nice.>>I’m all about the Xamarin. So, my job is to build
cool Xamarin stuff, help developers get on
board with Xamarin, help developers
build cool things, create some samples,
create content for people, show them how easy it is
to build amazing apps, and take advantage of some of the services we have
with things like Azure.>>Yeah. You were on
a little bit ago, and you were here with toys, and you’re back with more toys.>>Back with more toys.>>Got that, toys?>>I have the toys. Yeah.>>Last time you
were showing us how you built a way of taking photos of your daughter’s toys, correct? Identifying them. So, this is BBA, or this is a monkey, right? Like hot dog or not hot dog, but even more advanced
because you’re identifying a bunch
of different things.>>Yes.>>But, you’re back
to tell us about an evolution of that service
we talked about today?>>Yes, today I want to
talk about the new hotness.>>New hotness.>>New hotness. We did
have a new hotness. So, the thing I showed
off about six months ago, I assume we’ll have a link in
the notes underneath this.>>Down there.>>Down there, down there. Yeah. So, the thing I showed off was image classification, where you have this idea of
transfer learning. We have these newer networks
built by very, very smart people,
not me, smart people.>>Not me.>>You give them
five, 10 pictures, and you say this is a hot dog, this is a not hot dog, this is banana, this is an apple, and it trains a model, and then you can
upload a picture, and it will give
you the percentage confidence that that picture is a picture of the hot dog,
or the not hot dog.>>Got it.>>Now, that’s really cool, but the next evolution in this, which we now have as part of
our custom vision service as a preview at the moment
is object detection.>>Okay, so object detection.>>Yes.>>What’s the difference?
It sounds similar.>>It is. The difference is, with
image classification, you’ve taken a picture of a thing and it says
it is a thing.>>Yes, thing A is thing A.>>Thing A is thing A,
with object detection, you train it up on lots of
pictures of things and then you take a picture that
contains multiple things, and it says this is thing A here, and thing B here.>>Got it.>>So, not only does it
say it is the thing, it tells you where the thing
is on the picture.>>That is very
nifty. You can use it in all sorts of different scenarios, I’m imagining, especially
if you’re taking in images, doing content filtering, or searching for
just certain things. Such as, I remember the story
of someone using these things on cucumbers that were coming down a line,
and they were detecting good cucumbers versus bad
cucumbers in different ratios, but now they can see where
it’s at inside that photo.>>Yes, so
you could take a picture of about 10 cucumbers
and see where each one is.>>Or a cucumber and a carrot.>>Cucumber and carrot.>>Then, it would tell
you both of them?>>Yes. But, actually, a really
good real-world use case for this is the Seeing AI
app that we built for vision-impaired people,
a phenomenal piece of work, and that does currency detection. So, if you imagine, you
could train this up on five pound notes,
10 pound notes, one dollar bills, $10 bills, you lay out some money
on the table, and you take a photograph of that, and it can say, right, I think there’s
a one dollar bill, another one dollar bill, and a $10 bill; you have got $12.>>Got it.>>So, yeah, it can add up
how much currency you’ve got. Seeing AI does that for about five or six different
currencies at the moment, but you could build
a similar thing with this.>>Cool.
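Just to illustrate the adding-up step, here is a tiny sketch, not from the show: it maps detected tag names to values and totals them, and the tag names and the CurrencyTally helper are purely hypothetical.

    using System.Collections.Generic;
    using System.Linq;

    public static class CurrencyTally
    {
        // Hypothetical tag names, one per note or bill the detector was trained on.
        static readonly Dictionary<string, decimal> ValueByTag =
            new Dictionary<string, decimal>
            {
                ["one dollar bill"] = 1m,
                ["ten dollar bill"] = 10m,
                ["five pound note"] = 5m,
                ["ten pound note"] = 10m,
            };

        // detectedTags holds one entry per object the detector found in the photo.
        public static decimal Total(IEnumerable<string> detectedTags) =>
            detectedTags.Sum(tag => ValueByTag.TryGetValue(tag, out var value) ? value : 0m);
    }

Two one dollar bills and a $10 bill come back as three detections, and the total works out to $12.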
>>Obviously, I’m dealing with cuddly toys because I think
they’re a bit more fun.>>I do love
cuddly toys. Where do we get started? What do we do?>>So, the place to get
started is customvision.ai, and we have customvision.ai here. It’s our portal for
training these models. It’s pretty much
free to get started; you can build
whole models for free, and we love free,
we all love free.>>Free is good.>>So, you come in here,
you create new projects, and what we saw last time was you could create
classification with a domain. This time, there’s now this object detection
preview option.>>Yes, this is new.>>You’ve got model.>>This is the new hotness. Now, I’m going to use
a project I’ve already created because these models
need a bit more data. With image classification, you need five or six pictures
as a minimum; with object detection, you
need about 15 pictures.>>Total or per?>>Per object.>>Got it.>>You can’t just
give it the pictures, you have to tell it
what’s in the picture. So, for example, I’ve
got one here of Portie, my daughter’s most favorite toy. She calls it Baby Meow Meow because she’s a bit of a programmer
when it comes to naming; she’s also great at it. In this picture, I uploaded
it, and notice you can see, I drew this bounding
box to say that this object is Baby Meow Meow, this part of the picture here.>>Got it.>>So, I focused in
deep on the object. That means if I had
a picture with say two, I could then tag both
in the same image.>>So, you drew those boxes or?>>Yes, I drew those boxes.
I’ll show you now. I can click “Add Image”, and then I can choose an image here.>>Cute.>>This will then add this image. I upload the image, I
don’t tag it inline. I upload images first, and then I can find
the untagged images, and I can go in here; it’s going to try and
detect something, so I’ll let it choose
that, and then I adjust the box myself.>>Wow, it’s cool.>>I’ll bring it in nice and tight for the object that I want, and then I say this
is baby polar bear.>>How important is
that accuracy right there? You said you’re going
to get nice and tight, is that really important
in this kind of detection?>>Yes, because otherwise
you could pick up noise from around the image.>>Got you.>>So, if I were to leave quite a big border around here, it may not be good at
detecting the polar bear; it might be very
good at detecting things sitting on top
of an orange chair.>>I see.>>So, I want to bring
it in quite tight. I should also really
consider rotation. This trains on the polar bear
being the right way up, but of course if I take
a photograph of the polar bear->>Upside down.>>-upside down, it
won’t detect it. So, ideally, I
should try and take lots of pictures at
different rotations, but for now, I’ll just
do it the right way up.>>Sure.>>That’s it. I’ll
just do that, and then because of my zoom, it’s fallen off the bottom
of the screen. Let me just zoom this. That’s not going to do it,
okay, it’s done it. Cool.>>Autosave.>>Autosave has saved
that one for us. Cool. Then, once I’ve done that, I can just click
the “Train” button, and that will then retrain
my model using that data. I won’t click “Train”
now because this one’s a little bit slower than
the image classification. That trains in like 20, 30 seconds. This sometimes takes
a couple of minutes, and you don’t want to see us sit here waiting
for it to train.>>Got it. Cool.>>So, that’s how I create it, and I’ve got a whole lot of
detected images in here. I’ve got 30 of baby duck, which is the duck here, 23 of the cat, 29 of monkey, and then
43 of the polar bear. So, again, it’s not that much
data that I have to put in. It’s fairly dull drawing
all these boxes, but once it’s done
once, it’s done.>>Yeah, we’re good.>>Yeah, I’m all good, all done. Now, if I want to test this, I can then, the same as we saw
with the image classifier, I can either give an image URL,
or I can upload an image.>>So, you take
any image that you have anywhere and then
boom, you get it there.>>So, this is one I took on the Channel 9 desk just now.>>Right here.>>There we go. So, we’ve got baby monkey
with 84.8 percent, and baby polar bear,
93.5 percent.>>You can see the probabilities
down here of its guesses.>>Yeah, we have
a probability threshold. So, you can adjust
the threshold on this quick test here
because, obviously, you may get some where
it thinks it’s X or thinks it’s Y. I created a model for currency detection, and I wasn’t very good
with my bounding boxes. So, it actually detected, with low probability,
a 10 pound note, and then a 10 pound note
slightly offset, and another 10 pound note slightly offset, because it picked up
some background data. So, you need to adjust these thresholds to see.
But, that’s the quick test. So, this has got it there, and it’s detected the objects, and you can see by
these red boxes, we have the actual location.>>Yeah. That’s really
cool and you can even see there’s stuff in the background,
but it’s ignoring it. It’s really focused.
That’s really cool.>>Yeah.>>So, how do I get it into
the app, I guess?>>So, this is what’s really
cool is you get this into the app exactly the same way you got the image classification
into the app. Literally, it is
the same API code.>>Got it.>>I mean literally. So,
there’s the NuGet package: the Microsoft.Azure.CognitiveServices.Vision.CustomVision.Prediction package.>>Short and sweet.>>Short and sweet,
straight to the point.>>We’re really
about declarative names, so you know exactly what
you’re getting and this could not be more verbose.
It’s very nice.>>But, it’s nice because this is the Azure Cognitive Services, it’s the vision part of it,
obviously, we have speech. Under vision, we’ve got face,
we’ve got custom vision.>>It does make sense.>>I’ve got prediction and training. So, the training
you do in the portal, you can also train
from a NuGet Package, well, if you wanted to, so you could take that training
on the road as it were.>>Got it.>>So, I could actually train from my phone if I wanted to. Now, I will just highlight
this is currently preview. This package is version
0.10.0 preview. So, if you use this, don’t forget to tick
the use preview packages option.>>Got it.>>So, you bring
this package in. I create a new thing called
a prediction endpoint, and I pass it
a prediction key. I’m not going to show my key, this is a key that
is owned by you, this is the key you
get from the portal.>>Okay. So, that came from the portal in here.
I just grabbed it.>>Yes. I can click on
“Settings”. I’ll see my key. There’s two keys you get. There’s a prediction
key and a training key. So, of course, I can put the prediction key
in an app to predict, and the training key
in a training app if I was building two
different apps there. There’s also a project ID because
I have multiple projects; each has the same
prediction key, but different project IDs, which allows you to mix and match multiple different models that you can access in the same app.>>Got it.>>It’s pretty good. So, I create my prediction endpoint
with my prediction key. I take my photo. I’m using
the Xam Media Plugin, which is really simple to use. It’s one line of
code to take a photo, and then I call PredictImage: I pass it my project ID, I pass it my stream, and that spits back predictions.
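For reference, here is a minimal sketch of that flow, not the exact demo code. It assumes the Microsoft.Azure.CognitiveServices.Vision.CustomVision.Prediction package (0.10.0-preview at the time of recording, so include prerelease packages) plus the Xam.Plugin.Media plugin, and the exact type and property names may differ in later SDK versions.

    using System;
    using System.Threading.Tasks;
    using Microsoft.Azure.CognitiveServices.Vision.CustomVision.Prediction;
    using Microsoft.Azure.CognitiveServices.Vision.CustomVision.Prediction.Models;
    using Plugin.Media;
    using Plugin.Media.Abstractions;

    public class ToyDetector
    {
        // Both values come from the Settings page of the customvision.ai portal.
        const string PredictionKey = "<your prediction key>";
        static readonly Guid ProjectId = Guid.Parse("<your project id>");

        public async Task<ImagePrediction> DetectAsync()
        {
            // One line to take a photo with the media plugin.
            var photo = await CrossMedia.Current.TakePhotoAsync(
                new StoreCameraMediaOptions { PhotoSize = PhotoSize.Medium });
            if (photo == null)
                return null;

            // Create the prediction endpoint with the prediction key,
            // then send the raw photo stream to the Custom Vision Service.
            var endpoint = new PredictionEndpoint { ApiKey = PredictionKey };
            return await endpoint.PredictImageAsync(ProjectId, photo.GetStream());
        }
    }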
>>Okay. So, what does it mean? What is a prediction?>>So, a prediction is a combination of the tag, so the name: I have got baby
polar bear, baby monkey.>>Okay.>>It’s the probability, how confident it is that it was seen. So, if I just flip back
here- let me try it, let’s bring that
quick test back up. Again, it’s the probability, which is the figure that pops up. In just one second. There you go. So, I get back this tag, I
get back the probability.>>Is it going to be returning all of your tags, or just some?>>Yes.>>Okay.>>So, sorry, for
the custom vision image classification, it returns all your tags; for
the object detection, it has the tags for
the ones that it finds.>>Oh, okay.>>So, if you think, we’re
seeing two tags here, whereas I’ve trained
this for four tags.>>Got it.>>The duck, the cat,
the monkey, and polar bear. I’m just getting the two,
because that’s all it’s detected, and then there’s the probabilities.>>Okay.>>As well, I’ll also
get the bounding box, which comes as a percentage
distance across the image. So, this is about half way. So, I’ll get a bounding box that’s like naught point
five, naught point two, and it’s got a width
of naught point four and a height of naught point eight, or something like that.>>So, if you wanted to end
up having some cropping image thing, you could then use that information to
crop out the image?>>Yes.>>Cool.
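To make those numbers concrete, here is a small helper sketch, again assuming the preview SDK's PredictionModel and BoundingBox shapes, that scales a prediction's normalised box up to pixel coordinates so you could draw it or crop to it.

    using Microsoft.Azure.CognitiveServices.Vision.CustomVision.Prediction.Models;

    public static class BoundingBoxExtensions
    {
        // Left, Top, Width and Height come back as fractions of the image,
        // so scale them by the real pixel dimensions of the photo.
        public static (int Left, int Top, int Width, int Height) ToPixels(
            this PredictionModel prediction, int imageWidth, int imageHeight)
        {
            var box = prediction.BoundingBox;
            return ((int)(box.Left * imageWidth),
                    (int)(box.Top * imageHeight),
                    (int)(box.Width * imageWidth),
                    (int)(box.Height * imageHeight));
        }
    }

For a 1000 x 800 photo, a box of naught point five, naught point two with width naught point four and height naught point eight works out to a rectangle at 500, 160 that is 400 pixels wide and 640 pixels tall.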
>>So, that’s pretty much
all I do: I call it up, and I then filter out my predictions based on a probability; in my case, I’m using 75 percent, because I don’t
want to pick up noise.>>Do you only care about whatever
the highest predictions are? And if there are none that
are above 75 percent, they should probably take
another photo, I guess.>>Yeah. I’m assuming
those are probably errors. I could obviously adjust
that to suit based on my criteria, but I’m
assuming those are errors.
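That filtering step is only a couple of lines; a sketch, assuming the same preview ImagePrediction and PredictionModel types, might look like this, with the 75 percent threshold adjustable to suit your own model.

    using System.Collections.Generic;
    using System.Linq;
    using Microsoft.Azure.CognitiveServices.Vision.CustomVision.Prediction.Models;

    public static class PredictionFilter
    {
        // Keep only detections at or above the threshold (75 percent in the demo),
        // highest probability first. If nothing survives, the app could prompt
        // the user to take another photo.
        public static IList<PredictionModel> FilterByProbability(
            ImagePrediction result, double threshold = 0.75)
            => result.Predictions
                     .Where(p => p.Probability >= threshold)
                     .OrderByDescending(p => p.Probability)
                     .ToList();
    }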
That’s literally all I do. Then I’ve put this into an app, and I’ve put in some drawing code to actually draw the bounding boxes on screen. So, should we try it out?>>Yes. Sure. Let’s
do it. Let’s do it. So, you’ve added
those NuGets into both the .NET Standard and your iOS
and Android app, right?>>Yes.>>You install it everywhere.>>Yep. Install
all the things everywhere.>>Install all the things everywhere. That’s great. Got it.>>Yep, and there’s
no initialization; all my code is running in
the .NET Standard project. I don’t have to do
a blah-blah-blah dot Init inside my iOS or Android;
it all happens in .NET Standard.>>This is literally
a prediction endpoint, predict image data. That’s literally all of the code.>>Literally it. That’s it. Yes.>>Very impressive.>>So, I look at my code,
look at my endpoint. Then, I would just do a standard Android launchy thing and in a minute, we’ll see my app.>>Oh cool. Threshold.>>This is my app. Yeah.>>So, let’s take a photo
and I’ll do my best. Let’s line up the monkey without a monkey in the background. There we go. Let’s
take this photo. Then it’s going to
send this photo. It’s going to send
up to the Cloud. Then it’s going to
make this prediction. This is what makes a prediction,
that’s my project ID, my photo as a raw stream,
does its thing. The very first time
you make the call, it’s a bit slow, and then repeated calls get quicker
because it kind of loads that model up into whatever infrastructure is
behind the scenes, until it has to restart it. Okay. So it made the call, it’s taken the photo, sent it to
the custom vision service. It’s run it through
its neural networks and it’s brought back
all the predictions.>>Cool.>>So, we can see
these results here. Let me just zoom in a bit. So, it’s picked up. There’s a one percent chance that Baby Duck is somewhere
in the image.>>I don’t see a duck here,
so that also makes sense.>>Yep. There’s no duck.
What else have we got? We’ve got a four percent chance that there’s the duck
somewhere else. So, you notice the duck
was there twice.>>Because there’s two of
them that it’s detected.>>Yeah. So, in the bounding box, there’s a bounding box here: point four, zero point three, point four. That’s where
it thinks baby duck was.>>Cool.>>If I then have a look
at the other one, the bounding box of
this one is point four, point four, point
three, point five. So maybe it thinks that
possibly could be the duck and maybe
the monkey could be the duck at
a very low percentage.>>Got it.>>So, we’ve got
the different ones: Baby Meow Meow, the cat, two percent. Five percent for the cat. Probably not the cat; let’s see what we’ve got
further down here. Here we go, 87 percent
for the polar bear. Then, let’s see if you’ve
got the monkey anywhere, 71 percent for the monkey.>>Pretty good.>>Yeah. Pretty good. So I got
a threshold of 75 percent, so it’s probably not going
to pick up the monkey
this time around. But I run this through and
where the polar bears is. Now, the monkeys said 71 percent. I set my threshold to 75. I could then obviously
adjust my threshold down, let’s put it down to 58 percent for example, I take a photo.>>The studio live in action.>>Yeah. See it happening
and then I run that through. It will make the call. It should hopefully be
slightly quicker the second time. Maybe a little bit quicker.>>There you go.>>But, again let’s just run this and see what it picks on screen. This time nope.>>Still didn’t get it.>>Still didn’t get the monkey. Slightly from bounding
box thing on the polar bear.>>Yeah, it’s pretty cool.
Yeah, the different detection, probably based on the angle
that you took it, it was there and that seems like kind
of makes some sense. There’s also some
background noise that probably wasn’t
introduced previously.>>Yeah, it could be if I kind
of swapped the two over I might get a different result because of the background noise. But these things, just like
with a custom vision service, these images will
then fade back up. So, here’s all the ones
that I used before.>>Got it.>>So, here we have
the studio shot. There we go.
Baby monkey 54 percent.>>So close if you
put it that way.>>Fifty-eight percent. You can make 54 percent.>>Very cool.>>But, of course, let me
just tweak the bounding box. The bounding box is
not quite right. That’s just the polar bear. Then I can do the
same on this one. I can then tweak the boundary box is actually pretty
good on this one.>>Yeah, it’s actually
really close.>>Pretty good.>>You can kind of
constantly improve the detection algorithm here based on the actual photos
that you’re taking.>>Yes. So I can keep
retraining my model. I can click “Train” again, wait a couple of minutes
for it to finish training, and then take the photograph, and
next time it will take into consideration
the color of the desk, the studio lighting, and probably give
me a better result. I can do that for
all the shots that I’ve taken.>>Now, one question for
you really quick. I know with the other image classification, there was like a Core ML model
and a TensorFlow model. Is that something in this, or
is it completely different?>>In theory, you can
create those models. You can create TensorFlow and CoreML object detection models. We don’t have that available from Custom Vision Service
at the moment.>>Got it.>>I have no idea whether
the plans are to do that or not. I can’t comment on that.>>The question is, does Core ML
even support that, I guess.>>Potentially, yes. Yes,
you can build those models.>>Got it.>>But we obviously don’t have that capability at the moment. I’m hoping one day we do.>>Got it.>>Maybe come back
and show us how to export Core ML and TensorFlow.>>It would have to be the
same as the image classifier: you’d click the
export button and get the model out in Core ML.>>Very cool. You
can just do this all today at Custom Vision AI.>>Custom Vision AI, you
can do it all today; you can do it as part
of the free trial.>>Cool.>>So this will not
cost you anything; you get so many thousand calls. I don’t know the exact numbers. But it’s so many thousand calls, so many hundreds of images you
can use to train it with. You can get going:
head to customvision.ai. You can just get going now.>>Cool. Can people
get the source? Could we put that up on the GitHub or what’s
the plan there?>>This source code
is up on the GitHub, we’ll stick a link in the show notes of how to get
the source code for this. Yeah, you can build the app; you can just plug
in your own IDs, put in your own prediction keys,
your own project ID, your own models, take a photograph, and it will show you where on the image your object actually is. All that will be
in the source code.>>Awesome, Jim, I love
it. I kind of like the next evolution of taking it a bit further, and I really like the drawing of the boxes
on it. Very cool.>>It’s very cool.>>Awesome.>>I can’t wait to see
what I’ll be doing here in six months’ time with
the next evolution after this.>>Exactly. Well, thanks
for coming back on and showing us all this new hotness.>>Thank you for having me.>>Absolutely. You can find all the links in
the show notes below. Don’t forget to rate, subscribe, ding that bell to become a member of the
notification squad, so you get the Xamarin Show
right in your inbox each and every week. Until next time,
I’m James Montemagno. This is the Xamarin Show.
Thanks for watching.
