An Open Source Computer vision model to identify the Australian Aboriginal Flag
This project was based on a Hacker News discussion, thank you for all your input!
I've been recently paying attention to the #freetheflag debate, in short;
The Aboriginal flag of Australia is widely used by indigenous Australians as a symbol of their heritage. Though, the flag is actually copyrighted by an indigenous individual who has exclusive control of the licensing rightfully. This has become a debate because a lot of Aboriginals believe they should have a right to print or copy the Aboriginal flag as they would like.
(Just a quick shout out to my mob, the Kuku Yalanji people of Far North Queensland)
Over the years I've been trying to learn machine learning but never got anywhere because I couldn't think of a use case. I recently read a cool post from Clothing The Gap which gives an overview of the current copyright debate. They had an image that contains the "Aboriginal flag" done by a European artist several years earlier and how this could maybe be used to invalidate copyright as the design was perhaps already in existence. This gave me the idea to think about if there were perhaps other artworks throughout history that may have contained the flag design.
My main idea was that if I could use machine learning to train a computer vision model to find Aboriginal flags. I could then run it over historical archives of images/paintings to see if I can find any other places the Aboriginal flag seemingly appeared throughout history. Such that in a court case one might overturn the copyright by simply saying they were printing an "Indonesian symbol from the 14th century" (just a potential example)
If you look at the top left of the image, you will see an Aboriginal flag in this painting. I considered my model training a success once it could find the flag in this sample
It does actually work and as you can see in the above image, the model is able to draw a bounding box around the "flag".
I've only scanned 100,000 historical images so far and yet to find any pre-existing artworks that contain the flag. I still have a couple of million images to get through and hope to add a couple million more.
But here is a gallery of false positives, images that the model thought were aboriginal flags but not quite... (if you look at the images for long enough you can see why maybe the model thought it was an aboriginal flag)
I've also saved some of the resulting data in a table for anyone who wants to take a closer look.
I will keep working on the project to improve the results, and all of the code is open-source and free to use.
The rest of this post is for people who would like to run the code themselves and learn how to train an object recognition model from scratch. It is less than 20 lines of code in total and I've made everything as simple as possible with all resources available in the repo.
If anyone would like to help me train a better model then please reach out!
I had no prior experience in computer vision, so I had no idea how I might train a model to do this. I managed to do it in a week, it is super easy for anyone with a bit of programming knowledge. The CV community is big and beautiful and I managed to get my idea working with PyTorch in a night. (I spent a few nights on Tensorflow and didn't get very far)
This tutorial is self-contained and can be found in the repo.
Again, the tutorial contains very little code thanks to a few open-source projects it depends on.
I also had a problem with the complexity of the language in the CV community so I'm going to purposely oversimplify things here.
This is super easy and you could likely have it working in under an hour. (Then add ML to your resume)
We are going to split the tutorial into three steps;
- Classification - We need to manually draw boxes around the objects we are looking for in some sample images. The machine learning will use this human-curated data to train itself.
- Training - Once we have a classified data-set of images, we can use PyTorch to train a reusable model.
- Identification - Now that we have a model, we want to see if it can correctly find the desired object in a given sample image
Let's do it!
# You will need python3 and pip3 installed git clone https://github.com/australia/aboriginal-flag-cv-model cd aboriginal-flag-cv-model pip3 install -r requirements.txt
For the purposes of this tutorial, we are just going to train a model to find Aboriginal flags. But after you've finished this, you should be able to train a model to detect any object you would like. (Simple things, not hard things like if a person is sad).
So the initial classification is a human step, but it's kinda fun to do and will help you understand what the model can detect.
We start with an
images folder which is in the repo.
/images 1.jpg 2.jpg
Essentially we have to use our monkey minds to draw bounding boxes around images that contain the desired object we are looking for.
And generate an associated XML file for each file that describes those bounding boxes.
After we are finished our directory should look like
/images 1.jpg 1.xml 2.jpg 2.xml
The easiest program to do this in (and a kind of nostalgic UI) is called
You will have to figure out how to install and run it yourself.
Once open, point it at the
images folder from the repo, once you figure out how to use the program, you will start drawing boxes and saving the XML to the
images directory. And by the end of it, it should look like the directory structure above.
The XML contains a label that you will be able to define when drawing bounding boxes. The model will require you later to use the same label in the training, for this example you should just use the label
The way you draw your boxes does change the outcome of the model, for the Aboriginal flag I tended to;
- Leave a bit of outer space around the shape of the flag
- Choose images at all angles and depths
- Didn't worry if a limb or object was in front of the flag
- Chose real flags, paintings of flags, full-scale images of the flag
- A mixture of single or multiple instances of the object
Once you have your images and associated XML files generated, you are ready to start training.
If you get too lazy to classify the 40 images in the repo, just copy the files in
images. I do recommend classifying them manually your self to see how small nuances might influence the learning model. Choosing images of different shapes, colors, angles, sizes, depth and so on will make your model more robust.
So next we want to generate a model, and PyTorch/Detecto makes this easy by letting us generate one file to store all of our learned data in e.g.
We point PyTorch/Detecto at our classified data set and it should spit out a
model.pth which we will use later to find our object (flag) in samples.
The entire code to go from the
dataset(folder of images and XML) to a
reusable object recognition model is below.
WARNING: You might need a decent computer to even run 3 epochs let alone 6 so if you just want to get a copy of a decently trained model, just download my pre-trained model.pth
# train.py # Import detecto libs, the lib is great and does all the work # https://github.com/alankbi/detecto from detecto import core from detecto.core import Model # Load all images and XML files from the Classification section # or point it at "images" if you classified your own dataset = core.Dataset('images_classified/') # We initalize the Model and map it to the label we used in labelImg classification model = Model(['aboriginal_flag']) # The model.fit() method is the bulk of this program # It starts training your model synchronously (the lib doesn't expose many logs) # It will take up quite a lot of resources, and if it crashes on your computer # you will probably have to rent a bigger box for a few hours to get this to run on. # Epochs essentially means iterations, the more the merrier (accuracy) (up to a limit) # It will take quite a while (15 mins?) for this process to end, grab a wine. # If you get process killed, this process is memory bound. Try increase your disk swap ram. model.fit(dataset, epochs=3, verbose=True) # TIP: The more images you classify and the more epochs you run, the better your results will be. # Once the model training has finished, we can save to a single file. # Pass this file around to anywhere you want to now use your newly trained model. model.save('model.pth') # If you have got this far, you've already trained your very own unique machine learning model # What are you going to do with this new found power?
To run it from within the repo;
python3 train.py // Should output a file called model.pth
The PTH file type is primarily associated with PyTorch. PTH is a data file for Machine Learning with PyTorch. PyTorch is an open source machine learning library based on the Torch library. It is primarily developed by Facebooks artificial intelligence research group.
(If the above code didn't run for you, please make an issue.
Now onto the fun part, let's see if our generated model can find what we are looking for!
So now we should have a
model.pth and a
samples/sample.jpg in the repo, let's run it to see if our model is smart enough to find the object.
Finding the objects coordinates in the picture is easy, but we also want to draw a box around the coordinates which requires just a bit more code.
To run it from the repo
The code for that file is below, I've commented in how it works.
# findFlag.py from detecto.core import Model import cv2 #Used for loading the image into memory # First, let's load our trained model from the Training section # We need to specify the label which we want to find (the same one from Classification and Training) model = Model.load('model.pth', ['aboriginal_flag']) # Now, let's load a sample image into memory # Change the file name below if you want to test other potential samples image = cv2.imread("samples/sample.jpg") # model.predict() is the method we call with our image as an argument # to try find our desired object in the sample image using our pre-trained model. # It will do a bit of processing and then spit back some numbers. # The numbers define what it thinks the bounding boxes are of potential matches. # And the probability that the bounding box is recognizing the object (flag). labels, boxes, scores = model.predict(image) # Below we are just printing the results, predict() will # give back a couple of arrays that represent the bounding box coordinates and # probability that the model believes that the box is a match # The coordinates are (xMin, yMin, xMax, yMax) # Using this data, you could just open the original image in an image editor # and draw a box around the printed coordinates print(labels, boxes, scores) # WARNING: You don't have to understand this part, I barely do. # All this code does is draw rectangles around the model predictions above # and outputs to the display for your viewing pleasure. for idx, s in enumerate(scores): if s > 0.3: # This line decides what probabilities we should outline rect = boxes[idx] start_point = (rect.int(), rect.int()) end_point = (rect.int(), rect.int()) cv2.rectangle(image, start_point, end_point, (0, 0, 255), 2) cv2.imshow("Image" + str(idx), image) # Press a key to close the output image cv2.waitKey(0)
If you are having a good day, an image should have appeared on your screen. And if you are having a lucky day, then the Python script should have also drawn a rectangle over the image.
That is all there is really, you obviously can just take the outputted prediction data (boxes and scores) and save it to where ever you would like e.g. a database.
If something didn't work feel free to complain in the tutorial repo issues.
I do hope it worked, those steps above worked for me. I drew an Aboriginal flag on paper and took selfies at many angles and the model picked it up. (I manually classified 150 images instead of 40 though) (and if I call recall correctly, around 10 epochs)
This tutorial is meant to be a complete noob guide (written by a noob), how I've described things, and the way they are in computer vision - are two different things.
This task has allowed me to introduce myself to the computer vision sector and I'm sure I will learn more over time.
The difficulty of trying to identify objects differs by magnitudes depending on what you are trying to achieve.
p.s. do not invent Skynet
Thanks to Andred Burchill for testing.