'Text2LIVE' automatically edits images and videos from nothing but text instructions

When editing an object in an image or video, such as changing its color or adding a special effect, you would normally use image- or video-editing software to select the object by hand and then tune the effect to fit it, a tedious process. 'Text2LIVE', which uses machine learning to perform such edits just by specifying them in text, has been released on GitHub.

GitHub - omerbt/Text2LIVE: Official Pytorch Implementation for 'Text2LIVE: Text-Driven Layered Image and Video Editing' (ECCV 2022 Oral)


Text2LIVE: Text-Driven Layered Image and Video Editing

Text2LIVE uses 'zero-shot learning', a machine-learning approach that makes predictions about inputs the model has never seen, to identify the specified object in the input image and apply the specified effect. The key idea is to generate an editing layer (color and opacity) that is composited with the original input, rather than generating the edited output directly.
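The compositing step described above is standard alpha blending: the edit layer's opacity decides, per pixel, how much of the edit color replaces the original. The following is a minimal NumPy sketch of that idea only, not the actual Text2LIVE implementation (which generates the edit layer with a neural network); the function name and array layout here are illustrative assumptions.

```python
import numpy as np

def composite_edit_layer(image, edit_rgb, edit_alpha):
    """Alpha-composite an edit layer (color + opacity) over the original image.

    image:      (H, W, 3) floats in [0, 1], the original input
    edit_rgb:   (H, W, 3) floats in [0, 1], the edit layer's color
    edit_alpha: (H, W, 1) floats in [0, 1], the edit layer's opacity
    """
    # Where alpha is 0 the original pixel passes through unchanged;
    # where alpha is 1 the edit color fully replaces it.
    return edit_alpha * edit_rgb + (1.0 - edit_alpha) * image

# Example: tint the left half of a mid-gray image green at 50% opacity.
h, w = 4, 4
image = np.full((h, w, 3), 0.5)
edit_rgb = np.zeros((h, w, 3))
edit_rgb[..., 1] = 1.0                  # pure green edit color
edit_alpha = np.zeros((h, w, 1))
edit_alpha[:, : w // 2] = 0.5           # edit only the left half
out = composite_edit_layer(image, edit_rgb, edit_alpha)
```

Because the network only has to produce a color-and-opacity layer rather than a whole new image, untouched regions (alpha near zero) are guaranteed to keep the original pixels exactly.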

The author gives several examples of the edits that are actually possible. In the image below, after loading the leftmost photo of a cake, text instructions such as 'Oreo' (the chocolate sandwich cookie with white cream filling), the French sweet bread 'Brioche', 'ice', and the plant 'Spanish moss' change the cake's appearance, as if the cake were made of the material named in the text. At the same time, the cake itself is identified automatically, so the eggs and bags in the background are left untouched by the 'Oreo' and 'ice' edits.

Below, two small birds are edited into 'crochet', 'wooden', 'gold', 'stained glass', and other styles.

Also, in the image below of 'a dolphin jumping out of water surrounded by rocks', you can see that only the dolphin is edited, and accurately. Specifying 'wooden dolphin' or 'golden dolphin' identifies the dolphin in the original image and replaces its appearance, while specifying just 'killer whale' is automatically interpreted as 'make the dolphin look like a killer whale' and processed accordingly.

In the image below, entering 'ball' in the prompt changes the appearance of only the ball the child is holding. Text2LIVE thus not only separates the main object from the background but also identifies what kind of object it is.

The same text-based editing also works on videos. Click the image below to see a video of a swan swimming smoothly, restyled as jewelry in the manner of the brand Swarovski or as hand-knitted yarn.

In addition to editing a giraffe's whole body, prompts such as 'neck warmer' and 'colorful mane' can target only the neck...

And in a video of a moving car, prompts such as 'cyberpunk neon car' and 'countryside at night' demonstrate that the car and the background can be edited separately.

Text2LIVE also offers 'Semi-Transparent Effects', which overlay a semi-transparent layer instead of replacing the appearance of objects or backgrounds. Entering 'flame coming out of the bear's mouth' on a photo of a bear adds fire at the bear's mouth without changing the bear itself; entering 'heart latte art' on a photo of coffee adds latte art; and adding 'cigar smoke' to a smoke edit layer makes the smoke look like it is rising from a cigar.

According to the author, the amount of VRAM Text2LIVE requires depends on the size of the input images and videos, but 32GB is recommended. The editing accuracy still has some unstable aspects, which are expected to improve in the future.

in Software, Design, Art, Posted by log1e_dh