More gear tends to result in poorer photography.
It is far better to get one really solid stand on wheels, so it is easy to move, and one C-stand. Get a C-stand with the extra grip knuckle and 40" bar. You can never have too many stands, but you can do a lot with two.
A simple background is good: it is very easy to select and swap for something else in post-processing. I like a simple gray, but white can work well too and might be the better choice. Light it evenly and it is white. Grade a light across it with less exposure and you have a gray gradient. Put a gel on the light for color. Or let no light fall on it and it is black.
There are many background support options. Two C-stands with cross bars work well as a background stand and are versatile for other purposes, too.
If it were me, I would start with one good modifier. For me, that would be the very versatile Elinchrom Deep Octa, but you should get the one that most inspires you. By using only one, you will learn its subtle capabilities. Once you have done this, you will know which to get next, and you will know how to make each piece dance with your guidance. A rectangular softbox may be more useful for products.
If you have the space, there is much to be said for a camera stand over a tripod--very stable, almost impossible to knock down, goes all the way down and pretty high up.
A four- to six-step ladder that you can safely stand on to shoot is helpful, and it is useful for other purposes too.
Clamps are always useful. You don't need fancy, just functional. And gaffer tape.
Card stock in black and white is most useful. You can cut it to sizes you need.
A pair of v-flats, black on one side and white on the other, doesn't cost a lot to assemble but is an incredibly valuable tool.
Godox gear is fine. It is inexpensive and capable. If it fails, you can replace it, and at those prices you can do that a few times. I prefer Elinchrom. I like the modifiers. I like having a solidly built, battery-powered 1000+ Ws unit that I can use in the studio or in the field and that can deal with the elements.
I do use more at times, but one light with one modifier, a v-flat, perhaps some card stock and a background is all I use for most things. I believe that is true for virtually all the images in my IG:
There may be a fill light from camera position in some, but the images where that is true could have been done without it.