FoodSAM: Any Food Segmentation

University of Chinese Academy of Sciences
[Figure: FoodSAM visualization results]

FoodSAM is an all-encompassing solution capable of segmenting food items at multiple levels of granularity. The segmentation visualizations are shown from left to right: input image, semantic, instance, panoptic, and promptable.

Abstract

In this paper, we explore the zero-shot capability of the Segment Anything Model (SAM) for food image segmentation. To address the lack of class-specific information in SAM-generated masks, we propose a novel framework, called FoodSAM. This approach integrates the coarse semantic mask with SAM-generated masks to enhance semantic segmentation quality. In addition, we observe that the ingredients in food can be regarded as independent entities, which motivates us to perform instance segmentation on food images. Furthermore, FoodSAM extends its zero-shot capability to panoptic segmentation by incorporating an object detector, enabling FoodSAM to effectively capture non-food object information. Drawing inspiration from the recent success of promptable segmentation, we also extend FoodSAM to promptable segmentation, supporting various prompt variants. Consequently, FoodSAM emerges as an all-encompassing solution capable of segmenting food items at multiple levels of granularity. Remarkably, this pioneering framework stands as the first work to achieve instance, panoptic, and promptable segmentation on food images. Extensive experiments demonstrate the feasibility and impressive performance of FoodSAM, validating SAM's potential as a prominent and influential tool within the domain of food image segmentation.
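The core enhancement idea described above can be sketched as a simple per-mask majority vote: each class-agnostic SAM mask is assigned the class label that dominates the corresponding region of the coarse semantic mask, so SAM's sharp boundaries are combined with class information. This is only an illustrative sketch with NumPy; the function name `enhance_semantic_mask` and the toy arrays are assumptions, not the authors' implementation.

```python
import numpy as np

def enhance_semantic_mask(coarse_semantic, sam_masks):
    """Merge a coarse semantic mask with class-agnostic SAM masks.

    Illustrative sketch: each binary SAM mask is relabeled with the
    majority class label found in the coarse semantic mask over that
    region, yielding class-labeled masks with SAM's boundary quality.
    """
    enhanced = coarse_semantic.copy()
    for mask in sam_masks:  # each mask: boolean H x W array
        labels, counts = np.unique(coarse_semantic[mask], return_counts=True)
        if labels.size:
            # assign the most frequent coarse label to the whole SAM region
            enhanced[mask] = labels[np.argmax(counts)]
    return enhanced

# Toy example: a 4x4 coarse mask with two classes and one SAM mask
coarse = np.array([[1, 1, 2, 2],
                   [1, 1, 2, 2],
                   [1, 2, 2, 2],
                   [1, 1, 2, 2]])
sam_region = np.zeros((4, 4), dtype=bool)
sam_region[2:, :2] = True  # bottom-left 2x2 block; majority label there is 1
out = enhance_semantic_mask(coarse, [sam_region])
```

In the toy example, the noisy `2` inside the bottom-left SAM region is overwritten by the region's majority label `1`, smoothing the coarse prediction within the SAM-delineated boundary.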

[Figure: FoodSAM pipeline]

Visualization Comparison

BibTeX

@misc{lan2023foodsam,
  title={FoodSAM: Any Food Segmentation}, 
  author={Xing Lan and Jiayi Lyu and Hanyu Jiang and Kun Dong and Zehai Niu and Yi Zhang and Jian Xue},
  year={2023},
  eprint={2308.05938},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}