
Semi-Conductor Irrigation Informed by Self-Supervised Object Detection & Segmentation

This project is divided into two main sections, Software and Hardware. The Software section covers detection of plants and terrain, using object detection, segmentation, PCA, SVM, and related techniques. The Hardware section covers the machine that was built; the machine is not only a follow-up to the software work, it also demonstrates the results collected by the software.


Strategies for Desert Plant Conservation

Plant

Object Detection & Segmentation

We tackle the problem of unsupervised object detection and segmentation with a simple cut-and-learn pipeline that builds on insights from recent work. First, we use MaskCut, which generates multiple binary masks per image using self-supervised features from DINO. Second, we apply a dynamic loss-dropping strategy, called DropLoss, that learns a detector from MaskCut's initial masks while encouraging the model to explore objects missed by MaskCut. Third, we further improve performance through multiple rounds of self-training on our own dataset.


In this project, images of 25 desert plant species were collected into a dataset and used in the experiments. Before training the model on this dataset, we preprocess the data. The initial step resizes all images to a standardized resolution of 480×480 pixels; noise reduction techniques are then applied. Both steps improve data quality by refining the images and mitigating unwanted visual artifacts. We then apply seven distinct enhancement techniques, each tailored to reveal specific aspects of the photos: histogram equalization, Laplacian-based enhancement, logarithmic transformation, gamma transformation, Contrast Limited Adaptive Histogram Equalization (CLAHE), Single Scale Retinex (retinex-SSR), and Multi-Scale Retinex (retinex-MSR). Applied together, these methods enrich the dataset's visual content and offer a more nuanced view of the intricate details of desert plant features.
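As a rough illustration, the resizing, denoising, and a few of the enhancement techniques listed above could be implemented with OpenCV as sketched below; the file name, denoising settings, CLAHE parameters, and gamma value are illustrative assumptions rather than the project's exact configuration.

```python
# Minimal preprocessing sketch: resize to 480x480, denoise, then apply three of
# the enhancement techniques mentioned above (histogram equalization, CLAHE,
# gamma transformation). All parameters here are placeholders.
import cv2
import numpy as np

def preprocess(path):
    img = cv2.imread(path)                        # BGR image
    img = cv2.resize(img, (480, 480))             # standardized resolution
    return cv2.fastNlMeansDenoisingColored(img)   # noise reduction

def hist_equalize(img):
    # Equalize only the luminance channel to avoid color shifts.
    ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
    ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)

def clahe(img, clip=2.0, grid=(8, 8)):
    ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
    ycrcb[:, :, 0] = cv2.createCLAHE(clip, grid).apply(ycrcb[:, :, 0])
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)

def gamma_transform(img, gamma=0.8):
    table = ((np.arange(256) / 255.0) ** gamma * 255).astype("uint8")
    return cv2.LUT(img, table)

img = preprocess("desert_plant.jpg")              # hypothetical file name
variants = [hist_equalize(img), clahe(img), gamma_transform(img)]
```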

ViT


ViT is short for Vision Transformer. The image is first divided into small patches, and each patch enters the transformer as a token. The transformer itself has no notion of input order, whereas the patches of an image do have a spatial arrangement, so, by analogy with BERT, a positional embedding is added (summed) to each patch embedding. The output also borrows from BERT: a learnable class token is prepended to the patch sequence, and its corresponding output embedding is used as the final representation for classification.

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., ... & Houlsby, N. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.
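A minimal sketch of this patch-plus-class-token embedding in PyTorch is shown below; the image size, patch size, and embedding dimension are illustrative, and a real ViT adds a full transformer encoder on top of these tokens.

```python
# Patch embedding sketch: cut the image into patches, project each patch to a
# token, prepend a learnable class token, and add positional embeddings.
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    def __init__(self, img_size=480, patch_size=16, in_ch=3, dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A strided convolution is the usual way to split and project patches.
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches + 1, dim))

    def forward(self, x):                                   # x: (B, 3, H, W)
        tokens = self.proj(x).flatten(2).transpose(1, 2)    # (B, N, dim)
        cls = self.cls_token.expand(x.shape[0], -1, -1)     # (B, 1, dim)
        tokens = torch.cat([cls, tokens], dim=1)            # prepend class token
        return tokens + self.pos_embed                      # add positional embedding

embed = PatchEmbedding()
out = embed(torch.randn(2, 3, 480, 480))    # (2, 901, 768): 900 patches + class token
```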

MaskCut


Wang, X., Girdhar, R., Yu, S. X., & Misra, I. (2023). Cut and learn for unsupervised object detection and instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

MaskCut can discover multiple object masks in an image without supervision. Building on Normalized Cuts, we create a patch-wise similarity matrix for the image using features from a self-supervised DINO model. Applying Normalized Cuts to this matrix yields a single foreground object mask of the image. We then mask out the affinity-matrix values using the foreground mask and repeat the process, which allows MaskCut to discover multiple object masks in a single image.
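A heavily simplified sketch of this idea is shown below, assuming pre-extracted DINO patch features, a thresholded cosine affinity, and a plain eigen-solver; it is not the official MaskCut implementation.

```python
# Simplified MaskCut-style loop: build a patch affinity matrix, solve Normalized
# Cuts via the second eigenvector of the generalized eigenproblem, keep one
# foreground mask, suppress those patches, and repeat.
import numpy as np
from scipy.linalg import eigh

def ncut_foreground(W):
    D = np.diag(W.sum(axis=1))
    # Generalized eigenproblem (D - W) v = lambda * D v; the second eigenvector
    # gives the bipartition used by Normalized Cuts.
    _, vecs = eigh(D - W, D)
    v = vecs[:, 1]
    # Simplified bipartition; the real method uses extra criteria to pick
    # which side of the cut is the foreground object.
    return v > v.mean()

def maskcut(feats, num_masks=3, tau=0.15):
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    W = (feats @ feats.T > tau).astype(float) + 1e-6   # thresholded cosine affinity
    masks = []
    for _ in range(num_masks):
        fg = ncut_foreground(W)
        masks.append(fg)
        # Mask out affinities of already-discovered patches before repeating.
        W[fg] = 1e-6
        W[:, fg] = 1e-6
    return masks

# Usage with random stand-in features for a 30x30 patch grid (900 patches, 768-d):
masks = maskcut(np.random.randn(900, 768))
```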

Self-Training

1. Train a supervised model using the labeled data.
2. Make predictions on the unlabeled data using the model from the previous step.
3. Take the predictions satisfying a probability threshold or k_best criterion and add them to the pseudo-labeled set.
4. Combine the labeled and pseudo-labeled data and train the next version of the model.
5. Make predictions on the remaining unlabeled data using this model, and again move the predictions that satisfy the probability threshold or k_best criterion into the pseudo-labeled set.
6. Repeat the process until all data has been labeled, then evaluate the final model on test data.
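As a rough sketch, this loop could be written as follows; the classifier, threshold, and data arrays are placeholders rather than the project's actual detector and dataset.

```python
# Self-training loop sketch with a probability-threshold criterion.
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_rounds=10):
    X_pool, y_pool = X_lab.copy(), y_lab.copy()            # labeled + pseudo-labeled pool
    model = LogisticRegression(max_iter=1000).fit(X_pool, y_pool)   # step 1
    for _ in range(max_rounds):
        if len(X_unlab) == 0:
            break
        proba = model.predict_proba(X_unlab)               # step 2: predict unlabeled data
        confident = proba.max(axis=1) >= threshold         # step 3: confidence criterion
        if not confident.any():
            break
        pseudo = model.classes_[proba[confident].argmax(axis=1)]
        X_pool = np.vstack([X_pool, X_unlab[confident]])   # step 4: combine pools
        y_pool = np.concatenate([y_pool, pseudo])
        X_unlab = X_unlab[~confident]                      # step 5: remaining unlabeled data
        model = LogisticRegression(max_iter=1000).fit(X_pool, y_pool)  # retrain
    return model

# Usage with stand-in data:
X_lab, y_lab = np.random.randn(50, 4), np.random.randint(0, 2, 50)
final_model = self_train(X_lab, y_lab, np.random.randn(200, 4))
```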

Terrain Detection

For terrain detection, the two main methods we use to process the data are Principal Component Analysis and Support Vector Machines.

Principal Component Analysis

Principal component analysis (PCA) is a widely used and effective method for dimensionality reduction, primarily employed to analyze and visualize intricate high-dimensional data sets, as well as to compress and preprocess data. PCA combines correlated high-dimensional variables into linearly independent low-dimensional variables, called principal components, while aiming to retain the maximum amount of information from the original data. Collected terrain data can be high-dimensional, so PCA can be used both to reduce its dimensionality and to eliminate noise.

Support Vector Machine


He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969).

After processing the data with PCA, we then pass it to an SVM. Essentially, we first map the feature vector of each instance to a point in space, for example the solid and hollow points in the figure on the right, which belong to two different classes. The purpose of the SVM is to draw a line that "best" separates these two types of points, so that if new points arrive in the future, the same line still classifies them well. We want the separating line that creates the largest margin, since a greater margin leads to greater robustness. We chose SVM because we need to process three-dimensional data points.
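A minimal sketch of the PCA-then-SVM terrain classifier with scikit-learn is shown below; the feature matrix, labels, number of components, and kernel choice are placeholders.

```python
# PCA -> SVM pipeline sketch for terrain classification.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X = np.random.randn(300, 300)       # stand-in terrain features (n_samples, n_features)
y = np.random.randint(0, 3, 300)    # stand-in labels: 0 = sand, 1 = soil, 2 = rock

clf = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),            # dimensionality reduction / noise removal
    SVC(kernel="linear", C=1.0),    # maximum-margin separating hyperplane
)
clf.fit(X, y)
print(clf.predict(X[:5]))           # predictions for the first few samples
```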

Take a Closer Look at the Hardware...


Testing Results


For model parameter tuning, a learning rate (lr) of 1e-2 was used at the beginning, and the learning rate was then adjusted to tune the model.


Learning rate lr = 1e-2 (total_loss: 1.12, accuracy: 0.872)

The total_loss curve does not settle once it reaches its lowest point: the overall trend first declines, then rises, and then falls again to a relatively low value. The reason is that the learning rate is too large, so the model's gradients fluctuate up and down. Based on the significant downward trend of the total_loss graph, we modified the learning rate and the step schedule, extending the original 4k steps to 50k.
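Assuming a detectron2-style solver configuration (as commonly used to train Mask R-CNN), the adjustment described above might look roughly like the snippet below; the exact config keys and decay milestones used in this project are assumptions.

```python
# Hedged sketch of the learning-rate / step change; model and dataset settings omitted.
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.SOLVER.BASE_LR = 1e-3          # lowered from the initial 1e-2
cfg.SOLVER.MAX_ITER = 50000        # extended from the original 4k steps to 50k
cfg.SOLVER.STEPS = (30000, 40000)  # illustrative learning-rate decay milestones
```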


Learning rate lr = 1e-3 (total_loss: 0.4173, accuracy: 0.9289)


Learning rate lr = 1e-4 (total_loss: 0.5567, accuracy: 0.872)

Analyzing the graphs above, the mean value of total_loss is 0.487 and the mean value of accuracy is 0.90045. This indicates that our model converges around a loss of 0.487 and can reach the optimal solution, with a best accuracy of 0.90047.


Top left corner: Sand data distribution after dimensionality reduction by PCA
Top right corner: Soil data distribution after dimensionality reduction by PCA
Bottom left corner: Rock data distribution after dimensionality reduction by PCA
Bottom right corner: The distribution of all data after PCA reduction (Yellow: Soil, Purple: Stone, Blue: Sand)
(Graph is created with matplotlib)


The PCA graphs show the data distribution on different surfaces after dimensionality reduction. The original real-time data is a matrix of shape (n, 3); every 100 readings are then grouped to obtain an array of shape (n/100, 100, 3), equivalent to n/100 samples, where 100 is the number of channels and 3 is the number of features per channel. Since PCA operates on data of shape (n, f), the (n/100, 100, 3) array is reshaped into an (n/100, 100*3) matrix before applying PCA, and two dimensions are retained. The reason for keeping two dimensions is that the result can then be visualized.
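The chunking, flattening, and two-component reduction described above could be sketched as follows; the array contents are random placeholders.

```python
# Group the raw (n, 3) stream into chunks of 100 readings, flatten each chunk to
# 300 features, then keep two principal components for visualization.
import numpy as np
from sklearn.decomposition import PCA

raw = np.random.randn(12_000, 3)            # stand-in real-time stream: (n, 3)
n_chunks = raw.shape[0] // 100
chunks = raw[: n_chunks * 100].reshape(n_chunks, 100, 3)   # (n/100, 100, 3)
flat = chunks.reshape(n_chunks, 100 * 3)                   # (n/100, 300) for PCA
coords = PCA(n_components=2).fit_transform(flat)           # 2-D points to plot
print(coords.shape)                                        # (n/100, 2)
```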


After dimensionality reduction using PCA, most of the data in the whole dataset lies between ±500 and ±2000.

This project introduces a novel strategy for irrigation in arid regions via unsupervised plant detection using Mask R-CNN and real-time terrain detection with the assistance of PCA and SVM. Despite the model's limitations in specific situations, this work can still shed light on the problem. The formation of deserts in arid and semi-arid areas of China, as well as modern desertification, has gone through many stages and cannot be attributed to human influence alone; action in this area is therefore necessary. We hope this strategy can be widely used to address this phenomenon, or at least serve as a starting point for change. While many enterprises today are already focusing on planting in desert areas, more technology should be applied to this global crisis.
