Stability or someone like it will valiantly release this technology again, and there will be absolutely no harm to anyone.
Stop being so totally silly Google, OpenAI, et al. - it's especially disingenuous because the real reason you don't want to release these things is that you can't be bothered to share and would rather keep/monetize the IP. Which is ok -- but at least be honest.
The thing about owning the data sets and the huge TPU/A100 clusters is that the “publish the papers” model strictly serves them: no one can implement their models, they can implement everyone else’s.
I do understand the fear of being sued or targeted in the media over misuse, though. The person misusing technology should (obviously imo) be held responsible for that, but since it's new tech, the tech will be taking the blame for the first really controversial cases of disinfo and/or harassment that utilize it.
I can already see it. Just think of all the energy wasted training AI at home! I can imagine police drones with IR sensors scanning the cities for the heat signatures of illegal AI "farms".
Speaking seriously: however they try to spin it, advanced AI (same as every other big scientific/engineering achievement) will be predominantly good. So let's say there is a time when AI can create convincing videos of people engaging in various compromising "activities". When this becomes widespread, it will give plausible deniability to any potential victim of such an attack (with real or deepfaked materials).
In a world where any compromising video or picture can be made with anyone, the value of such materials for a wannabe blackmailer diminishes rapidly. However, in a world where there are only a few entities that can produce such materials, and they do so sparingly, those entities get a tool that gives them huge power (especially in democracies, where popular opinion decides who governs).
They should start teaching the problem of induction in schools, evidently it's needed.
How about you present a counterargument - why would advanced AI be predominantly bad? Unless, of course, your only counterargument is the classic philosophical statement of "we can't know nuffin".
Basically no one who sticks around on here is “dumb”, I don’t know you but I have a Bayesian prior that you’re probably pretty fucking smart, and GP was being a bit of a smartass, C’est la vie.
I completely agree with your game plan of “let’s get a more substantial conversation going here”. It’s the right move.
I think that we probably should, as a society, do a serious audit of the public school system curriculum to see if it still makes sense in light of the insane rate of change in the facts of life over the last century :)
A bunch of arguments about why AI would be bad have already been advanced. For the economic one, refer to Martin Ford. For the existential one, refer to Bostrom.
Either discuss this with me or don't. Don't summon your sacred scrolls to do the arguing for you.
> It is a logical fallacy
Instead of just parroting my stance in response, I'll try to elaborate some more. Please either follow my example or don't reply to me at all.
Logic itself cannot bring you knowledge about the world, as the concept of a priori knowledge is crackpipe bullshit. All knowledge is based on induction, even your knowledge that induction is fallible. Saying induction is a "logical fallacy" doesn't even make sense, since the purpose of induction is not to perform logical operations.
I'll do whatever I want, especially when "communicating" (generous verb) with someone capable of writing the epitome of stupidity "the concept of a priori knowledge is crackpipe bullshit".
What a juvenile attitude. I wasn't giving you orders, I was presenting my standards of communication. You can do whatever the hell you want, but don't expect other people to tolerate your obnoxiousness.
You seem to get off on insulting people. I do not get off on being insulted, so I will refrain from further communication with you. Goodbye.
Something tells me these pricks will end up arguing for a reversion to thin-client compute. It is in their financial interest too after all.
"What have you done this week?"
I think if you are Google, you are terrified of the bad PR from someone generating something questionable. And that bad article is inevitable if you open up these models. (See, pornpen.ai being released approximately five minutes after Stable Diffusion. Imagine the press if that was built from the model Google published.)
An open source community is a diffuse target, so the NYT won’t go after them as quickly, and let's be honest, their axe to grind is with big tech, not a bunch of AI hackers.
They don't imply any ridiculous idea that such models should or even can be "racially balanced". If they want to cover their butts from the possibility of silly controversy, I think that's cowardly and unnecessary - but they could at least not go out of their way to imply that such controversy should be taken seriously.
And the risk behind that is...?
If you drill down with such claims the core is always "someone might use this to lie online" and the proposed solution every single time is: more surveillance. End anonymity. Have a Facebook account required to use the internet. Real name and real face policies for every online interaction.
I’ll explain again that I think they can be used for bad actions, and also that they should still be released, because the benefits will outweigh the negatives. It does not hurt to admit that some things can be dangerous when used in nefarious ways. No one suggests we ban kitchen knives even though they are lethal, because their utility is massive, and outweighs their danger. In much the same way these models have extreme utility, that almost certainly outweighs their potential negatives.
Sounds like you're saying that even without advanced AI, online bullying is already somewhat harmful?
So what exactly is the additional harm of AI?
I actually just now came up with an idea for a browser plug-in that removes clothing from every image loaded.
Trolling someone by creating awful video (just think about how deeply, photo-realistically, awful it could be - porn is just the tip of the iceberg) is going to get really bad. I am not sure how this is going to shake out. The easiest will be video of famous people doing awful things. A little harder is doing a custom training on a particular person's likeness, and videos of that person doing awful things. That high-schooler. That child. It's not a happy idea. There should be severe consequences for deliberately making something like this with the intent to harass (troll).
The fact is we have not even scratched the surface of classifying trolling as a real crime. I am less concerned with the tech (it's inevitable, hand wringing about it is not useful), and more concerned with the fact that we still have essentially no real consequences to this kind of harassment.
I suspect that strong anonymity is incompatible with civilized life, since the few edgelords will always end up ruining it for the many. We have collectively decided that some amount of privacy must be sacrificed to live in a civilized place where you can seek redress for grievances (the subpoena must be served to someone). Surveillance is a weapon for tyranny, but I think that we need to flip the script. The relationship between tyranny and surveillance means we need better governments, not more anonymity.
I also suspect we don't need to change anything except enforcement. I think trolls are a lot less anonymous than they think they are, since their opsec is typically nonexistent. It's just that we have no enforcers, and for some reason don't care. If I had a magic wand, I would convert the DEA wholesale over to dealing with online crimes (trolling, CP, trafficking, etc).
"Better governments" is not actionable, people have been wanting that since Socratese. Might as well wish for the second coming.
The real answer is to just let people know about the fakery, then they'll stop believing every video and the trolls will be defanged.
They clearly aren't the best arbiters of judgement, so who gets to decide "severe consequences"?
I personally hope it happens as soon as possible. Intellectual property theft (without which those models don't exist) shouldn't be allowed.
Personally I disagree: the models themselves do not contain any material that can be considered a copyright violation; the value of scraping the open web is easily apparent; the ability to prevent scraping wholesale - even given a legal framework to disallow it - seems dubious; and lastly, the collective potential harm caused by restricting just one or a few of the arguably more ethical nations from this technology pathway is a known unknown, and possibly a very large one at that.
We may make laws that prohibit online harassment, and that should be the mechanism we use to deal with this. Not through technology bans.
Yep! Remove clothing and make-up to show everyone as they really are!
You can call it the "ugly truth" plugin.
ROFL what a weird thing for any HN commenter to say
The issue is, as an industry and society, we somehow bought the "safety" and "harm" charade a little bit too much, and somehow think it's a reasonable argument instead of being completely insane.
We can both admit that the tools can and will be used for bad purposes, and come to the conclusion that their benefits outweigh the negatives. We would not be doing any favors to our own arguments by pretending otherwise.
The hoax was pretty quickly debunked, as the attempt was pretty crude. The images were full of artifacts, and the image sizes were all 512x512 squares (the default image size for Stable Diffusion) with no attempt made to crop them to more common aspect ratios. So in terms of harm "done" I guess it was pretty minor, but I'm still putting it out here since it made a big enough commotion to make it to nationwide news stories.
This has been possible without AI for a very very long time now (just open photoshop, etc). It barely ever happens, and society hasn't collapsed.
I keep seeing this argument come up and it baffles me that informed technologists take it seriously, as if it were impossible to convincingly manipulate images before DALL-E came around.
Further, we have seen harm come from some of this already, there’s a pretty big online community that uses deepfakes to put people in situations they would rather not be in, the most obvious being porn.
You couldn't, but basically any VFX shop easily could. Point is, it doesn't make anything possible that wasn't already possible, it just makes it more accessible. That's an inevitability with technology, as time goes on. The counter is not to try and suppress it, that has never worked and never will.
It's always individuals who want to harm others in this particular way; and individuals don't throw around big-VFX-project amounts of money on petty revenge. But they'd certainly spend $20.
DDoS attacks got a lot (1000x) more commonplace once there were DDoS services that let you buy an hour of attacking someone for $20. Same idea here.
DDoS services are almost always purely malicious (you could _maybe_ argue that you can use them for pen-testing or load-testing). But cars are not purely malicious; there's a lot of useful things cars can do, and that's why the dangers of car ownership are outweighed by the benefits, as judged by society - we just have some road rules, and licenses, so that people know to use them responsibly.
Why not the same with an AI model?
You can download and run this software right now: http://faceswap.dev/ and it will do a better job, and do it on video, than any AI image generator.
The technology is over 3 years old and the world hasn't ended, the harassment hasn't happened. It's so common that your phone runs it for Instagram.
There's this whole narrative here that "this harm is new" and not only is it not new, it's not even better than what we already had.
In your example, the correct parallel isn't killing people in real life, it's making an image of killing people. The ethics of making such images are debatable, but they already permeate our society without AI.
So the problem isn't the AI, it's the forum of haters. Restricting AI usage, in the hope of not having someone use it to incite hate is too roundabout a way to achieve any significant result, while everybody pays a high cost (of not being free to use such an AI as they see fit).
The idea that it's "doing harm" is simply inventing a new form of lèse-majesté. Verbally, we regularly do the same: we might take a signifier for someone and place it in a representation. "Rick likes to fuck goats every day." Have I done harm to Rick?
If I circulated a convincing-looking video of Rick fucking goats to his parents, partner and his boss at the school where he works, that could easily do considerable harm to Rick.
So there seems to be an implicit assumption here that the risk is faked material that people don't believe is fake, for some reason. And the fix for that would be to ensure that the easy-to-use versions of generators are watermarking or otherwise recording what they made, so it's easy to find out if something was faked. That doesn't help, of course, if you're up against a programmer who can make awesome deepfakes locally with open source software and a great GPU - but then we're back to the debate about costs, because if you go up against well-funded experts, they could already do this sort of thing. In reality it doesn't happen.
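For what it's worth, the watermarking half of that is already cheap to do - Stable Diffusion's own reference scripts stamp an invisible watermark on every output using the invisible-watermark package. A minimal sketch of that approach (the payload bytes and bit length here are made up for illustration):

    import cv2
    from imwatermark import WatermarkEncoder, WatermarkDecoder

    # Embed a hidden "this was generated" tag in the frequency domain (DWT+DCT).
    bgr = cv2.imread("generated.png")
    encoder = WatermarkEncoder()
    encoder.set_watermark('bytes', b'AIGEN')  # illustrative 5-byte payload
    bgr = encoder.encode(bgr, 'dwtDct')
    cv2.imwrite("generated_wm.png", bgr)

    # Later, anyone can check an image for the tag.
    decoder = WatermarkDecoder('bytes', 40)   # payload length in bits (5 bytes)
    payload = decoder.decode(cv2.imread("generated_wm.png"), 'dwtDct')
    print(payload == b'AIGEN')

Of course, a watermark like this survives casual re-saving but not a determined adversary - which is exactly the cost argument above.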
It's inevitable and frankly a hundred Ricks is a price worth paying.
For AI image generation to cause harm specifically, the harm has to be consequent to the additional realism.
IMO most of the harm from AI is likely to come from people not believing things that are real, and dismissing reality with “that’s just a deepfake”.
The harm is not the "embarrassment" of seeing someone in the likeness of yourself (or your son, your friend, your partner, etc) doing something shameful. The harm is the fact that people are very likely to believe it is true and it's not a fake obviously edited photo or video.
You can disagree on the seriousness of the harm or risk or danger or whatever but I think the distinction between an obviously silly/embarrassing fake (a puppet, papier mache, badly done photoshop picture) and a realistic convincing deepfake video is pretty obvious. They aren't even in the same ballpark.
> IMO most of the harm from AI is likely to come from people not believing things that are real, and dismissing reality with “that’s just a deepfake”.
This is also a really good point and I agree it's a danger.
See, for instance, this study of "sextortion" in minors:
To your point, I think it's worth considering the widespread acceptance of ridiculous ideas that already exists (the number of people who believe articles from The Onion, for example). There's no harm there, but when the content is convincing video being used by nefarious actors, I think you could make the argument that the potential for harm is real, especially given the media content bubbles on both sides that people have segregated into in the social media age.
Been possible on home computers for 31 years for anyone who actually wants to do it. It literally doesn't matter, and I think Stability has proven that the "AI Ethics" part of these models was essentially meaningless busy work at best, and at worst stealing compute credits from users, like Dall-E purposefully charging you for something you didn't ask for.
Once every home computer can make the fake images AI Ethicists larp about, the power of fake images disappears, because everyone knows not to trust them. It only has power if only a few can make them and the world was never told it was even possible.
It's not what I think, it's what has been proven in the 32 years since photoshop was invented. What killings are you talking about that were caused by the existence of image manipulation?
> "A SIM card was about $200 [before the changes]," she says. "In 2013, they opened up access to other telecom companies and the SIM cards dropped to $2. Suddenly it became incredibly accessible."
> "People were immediately buying internet accessible smart phones and they wouldn't leave the shop unless the Facebook app had been downloaded onto their phones," Mearns says. Thet Swei Win believes that because the bulk of the population had little prior internet experience, they were especially vulnerable to propaganda and misinformation.
This means that large amounts of people are not internet-savvy enough to spot fakes and know not to trust them.
I am not sure about mass killings. Some say that Facebook enabled a genocide in Myanmar, but I think that is a false hypothesis. It was used as a platform by conflicting parties, sure, but it wasn't the reason for the conflict.
Google has strong governmental ties right now, so opposing any of their messages regardless of content seems sensible. Without backlash it would just fortify the situation right now, so it has to be costly for both Google and political parties. Google lost a lot of trust in recent years; sadly that is not true for their market influence. Perhaps they put out these messages because government contracts require it - the user wouldn't know, because that is not transparent.
Yes, we are. Open source stable diffusion can be trained on any person's images, as long as you have around 20 from different angles. Costs around 50 cents on rented GPUs.
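To give a sense of how low the bar is, here's roughly what that fine-tuning loop looks like with Hugging Face diffusers - a simplified DreamBooth-style sketch, not the full recipe (the photo folder, the "sks" placeholder token, and the hyperparameters are illustrative; the real training scripts add prior-preservation loss and other refinements):

    import torch
    from pathlib import Path
    from PIL import Image
    from torch.utils.data import DataLoader
    import torchvision.transforms as T
    from diffusers import AutoencoderKL, UNet2DConditionModel, DDPMScheduler
    from transformers import CLIPTextModel, CLIPTokenizer

    model_id = "runwayml/stable-diffusion-v1-5"
    tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
    text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder").cuda()
    vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae").cuda()
    unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet").cuda()
    sched = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")
    vae.requires_grad_(False); text_encoder.requires_grad_(False)  # only tune the UNet

    # ~20 photos of the subject, from different angles
    tfm = T.Compose([T.Resize(512), T.CenterCrop(512), T.ToTensor(), T.Normalize([0.5], [0.5])])
    photos = [tfm(Image.open(p).convert("RGB")) for p in Path("subject_photos").glob("*.jpg")]
    loader = DataLoader(photos, batch_size=1, shuffle=True)

    # "sks" is the customary rare-token stand-in for the subject
    ids = tokenizer("a photo of sks person", return_tensors="pt").input_ids.cuda()
    opt = torch.optim.AdamW(unet.parameters(), lr=5e-6)

    for epoch in range(40):  # a few hundred steps total
        for px in loader:
            latents = vae.encode(px.cuda()).latent_dist.sample() * 0.18215
            noise = torch.randn_like(latents)
            t = torch.randint(0, sched.config.num_train_timesteps, (latents.shape[0],), device="cuda")
            noisy = sched.add_noise(latents, noise, t)
            cond = text_encoder(ids)[0]
            pred = unet(noisy, t, encoder_hidden_states=cond).sample
            loss = torch.nn.functional.mse_loss(pred, noise)  # predict the added noise
            loss.backward(); opt.step(); opt.zero_grad()

An hour or less on a rented consumer GPU covers this, which is consistent with the ~50-cent figure above.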
"Ethicists" act like society will somehow not adapt to this tech like they have with all the tech that came before it. I put ethicists in quotes because the arguments they use don't hold up to scrutiny and don't seem to be motivated by real ethical concerns. At least not to me.
But. The Photoshop argument is a bit tiresome. AI will bring about a fundamental shift in content creation and it will disrupt how we treat media as a whole.
Photoshop and other manual technologies are naturally gatekept by the required skill, effort and source images. Once AI media generation matures, all that goes out the window. Anyone will be able to convincingly fake anything with almost no effort and zero traceability.
That shouldn't be downplayed. The concerns are real.
Maybe once it's trivially easy to copy someone else's "likeness", society will finally be able to accept it and evolve past it.
Biological twins have never had control over this. If one twin wants to be a porn star, there's nothing the other can do.
Edit: one could imagine a future dystopia where clones are created to bypass "identity IP".
But those aren't the issues they claim are concerning to them. It's just stupid identity politics. They want their model to lie to us about the world and say things like everyone is equally likely to have any attribute. They have a "reality" problem apparently.
Yes we are. There have been papers coming out on this tech for years now, with even the South Park people doing videos using it.
In terms of machine learning technology that introduces truly novel innovations, Google's product portfolio is notably barren. Take, for instance, the incredibly powerful potential for image generation these new diffusion models open up: whose models will the world use to explore and start applying this technology? Google's, with the intense, though imperfect, effort that goes into addressing questions of bias and abuse? Or the model bankrolled by an ex hedge fund manager who probably put a bit less thought into addressing these questions?
3. Why do you use quotes around something which is your original phrasing? That's pretty disingenuous.
4. What's wrong with affirmative action? It's easy to argue that it has both utilitarian and other moral advantages. I won't claim it is always warranted or the right thing to do, but it is definitely not an obvious, consensus evil.
It isn’t impermissible for Google to do this, but nobody has to like it, or agree with it.
*for instance, affirmative action is disapproved of by 70-80% of Americans and couldn’t win on a ballot in California in 2020 which is pretty exceptional.
What they want is for results of “software engineer” to be equally likely to show black females. This is not fair to Eskimos and Aboriginals. And what about the mentally handicapped? Is it not unfair that people with Downs Syndrome are not Wall Street stock brokers? How are you going to find all of these “affluent” categories and claim to be able to balance them?
And are you going to claim racism again when to ask for prison inmates and you don’t find any Asians? Should you start putting latent space Asians in latent space prisons?
Because this is a constant harp of the social justice "ethicists" - that if you ask these models for "gang member" you get "People Of Color!!" … as if they simply don't understand the statistics of the situation. How would you even "solve" that? Should you decide when and where certain ethnic groups should be taken down a peg?
Latent space affirmative action is technically absurd, and completely ironic as "ethical" behavior.
It is a matter of PR. It takes a single "problematic" generated content to be framed as "Google is sexist/racist/supports animal abuse", etc.
Statements related to ethics help in a few ways: holding secrets ("oh, we would love to share the models, but we cannot"), protecting against backlash (PR-wise, legal-wise), and PR on its own ("we are that ethical - see! it is even in our mission statement").
Recommendation engines are more responsible than basic infrastructure.
There is a world of difference between someone manually seeking out and subscribing to a misinformation source and FB automatically suggesting it to them.
The whole banning spree did more damage to vaccine acceptance than flat earthers and lizard people together. It isn't even comparable. Because of course they ended up censoring legitimate criticism and scientific data. That was inevitable, no matter how well intended.
Now they made themselves unreliable because someone on the internet was crazy.
Regardless, though, it is unambiguous that FB's role in "making" problematic UGC is much less direct than Google's role in making Imagen outputs.
And that's for all ML stuff; Colab has led to a huge democratization of ML, with notebook setups for basically any cool demo you see out there.
Looks like typical MBA craft to me.
Not that I would have a complaint if social justice was the sole thing keeping it from being released. Facebook managed to cause genocides by being careless.
I don't hold that GP's opinion is wrong; in fact, I have no firm views yet on the AI and would generally lean towards stuff being made available even when harmful.
But the idea that people in the field of AI Ethics are all some woke SJW cabal designed to keep Google powerful is for the birds. Like all industry adjacent fields I'm sure there's some corporate capture of research in the field, but maybe, engaging with things in good faith, there are ethical questions about a powerful new technology that can replicate biases in its training data at unprecedented speed and quality?
Fundamentally, I think we have all the pieces based on this work and Dreamfusion to make it work. From the looks of it, there's a lot of SSR (spatial SR) and TSR (temporal SR) going on at multiple levels to upsample (spatially) and smoothen (temporally) images that won't be needed for NERFs.
What's impressive is the ability to leverage billion-scale image-text pairs for training a base model that can be used to super-resolve over space and time. And that they're not wastefully training video models from scratch, and instead separately training TSR, SSR models for turning the diffused images to video.
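To make the cascade concrete, the pipeline the paper describes looks roughly like this (pure pseudocode - the stage objects, their ordering, and the shapes are schematic stand-ins for the paper's seven models, not a real API):

    # Imagen Video, schematically: a frozen T5-XXL text encoder, one base
    # text-to-video diffusion model, then interleaved temporal (TSR) and
    # spatial (SSR) super-resolution diffusion models.
    def generate_video(prompt):
        text_emb = t5_xxl.encode(prompt)            # frozen text encoder
        video = base.sample(text_emb)               # short, low-res, low-fps clip
        for stage in (tsr1, ssr1, tsr2, ssr2, tsr3, ssr3):  # ordering illustrative
            video = stage.sample(video, text_emb)   # upsample in time or in space
        return video  # paper reports 1280x768 at 24 fps, 128 frames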
As it stands, it's very difficult to invest the budget for a dev studio (dozens of high skill people) to build a "VR movie" when the format is so unknown and unpopular. But with generative AI, an indie dev could create their own professionally produced virtual world movie. It's these creatives and risk takers that will find what types of things VR needs to become more popular.
From the first 15 examples shown to me, only one contained all elements of the prompt, and it was one of the simplest ("an astronaut riding a horse", versus e.g. "a glass ball falling in water" where it's clear it was a water droplet falling and not a glass ball).
We're seeing leaps in random capabilities (motion! 3D! inpainting! voice editing!), so I wonder if complete prompt accuracy is 3 months or 3 years away. But I wouldn't bet on any longer than that.
It is not far off.
Disclaimer: I am naturally biased since I made FauxPilot ;)
StabilityAI trained a new/better CLIP for the purpose of better Stable Diffusion models.
The human brain is modularized like this, so I don't think it'll be a limitation.
From the abstract:
> We present Imagen Video, a text-conditional video generation system based on a cascade of video diffusion models
"Under review as a conference paper at ICLR 2023"
So I would say it looks pretty advanced; however, they don't use a diffusion model to generate the images, but "image-conditional video generation", a different approach.
The concerns cannot be mitigated. The cat's out of the bag. Russia has already used poor quality deep fakes in Ukraine to justify their war. This will only become bigger and bigger of an issue to the point where 'truth' is gone, nothing is trusted, and societies will continue to commit atrocities under false pretense.
If Google filters them, wouldn't the result be still biased and stereotyped, just along Google's biases?
"I reject your biases and substitute my own!"
There’s no escaping, but when you put something visual in front of someone your brain wants to believe it.
Basically all these models are informational nuclear weapons being created, blueprints and all, and distributed implementations mean anyone can and will use them.
I doubt the artist would ever be "fully" replaced, or even mostly replaced. People very much care about the artist when they buy art in pretty much any form. Mass produced art has always been a thing, but I'm not alone in not wanting some $15 print from IKEA on my wall, even if it were to be unique and beautiful. Etsy successfully sells tons of hand-made goods, even though factories can produce a lot of those things cheaper.
I’d also take a peek at https://lexica.art/. Lots of very high quality output from SD.
It’s not the technology, it’s all the people in these comments who have never worked in the industry clamouring for its demise.
One could brush it off as tech heads being over exuberant, but it’s the lack of understanding of how much fine control goes into each and every shot of a film that is depressing.
If I, as a creative, made a statement that security or programming is easy while pointing to GitHub Copilot, these same people would get defensive about it because they’d see where the deficiencies are.
However, because they're so distanced from the creative process, they don't see how big a jump it is from where this or Stable Diffusion is to where even a medium- or high-tier artist is.
You don't see how much choice goes into each stroke or wrinkle fold, how much choice goes into subtle movements. More importantly, you don't see the iterations or emotional storytelling choices even in a character drawing or pose. You don't see the combined decades, even centuries, of experience that go into making the shot and then seeing where you can make it better based on intangibles.
So yeah this technology is cool, but I think people saying this will disrupt industries with vigour need to immerse themselves first before they comment as outsiders.
Your post reminds me of all the photographers that said digital photography would remain niche and never replace film.
The current models are toys made by small groups. It's not hard to imagine AI generated film being much more compelling when the entire industry of engineers and "creatives" refine and evolve the ecosystem to take into account subtle strokes, wrinkles, movement, shots etc. And they will, because it will be cheaper, and businesses always go for cheaper.
Also businesses don’t always go for cheaper. They go for maximum ROI.
I’ve worked on tons of marvel films for example, and I quite well know where AI fits and speeds things up. I also know where client studios will pay a pretty penny for more art directed results rather than going for the cheapest vendor.
Re: cheaper vs ROI, I agree, that was basically the point I was trying to get across.
I do understand your point and think it will be a long while before auto-generated content becomes mainstream, but it's entirely possible and reasonable to expect within our near-term lifetimes.
Every AI art thread is full of people who have clearly never attempted to make professional art commenting as if they’re experts in the domain
Techies tend to be good at tangible, measurable, immediate facts.
Not so much when it comes to any social situations, let alone bigger concepts like social evolution of trends and their impacts. Hence you get sorry attempts at apologies from big name tech bros for terrible influences on society.
I've played with DALL-E. I'm not able to paint, but I was able to generate good-looking paintings, and it felt amazing, like getting a new power; I felt like Neo when he learns martial arts in The Matrix. And I realized that AI may be the new bicycle of the mind: just as personal computers and the internet changed the way we work, think and live, AI may now allow us to gain new capabilities, extending our limits.
I just don’t agree with the swathes of people saying this replaces artists.
In the near future, for all practical intents and purposes, AI will be just a force multiplier. But a really powerful one.
* Productivity enhancement tools for those in the film industry like you.
* Applications where the AI output is "good enough". I foresee people creating cool illustrations, cartoons, videos for short stories, etc. AI will make for easier/cheaper access to illustrations for people who did not have this earlier. As an example, I am as of now looking for someone who could draw some technical diagrams for my presentation.
As a programmer, Copilot scares and excites me - not because I think it will become better than me at what I do in the short term (though in the long term - probably!) - but because I can already see how a well-structured use of such a tool could do a whole lot (80%?) of what I do. Mostly the easier stuff, mostly the relaxing-yet-tedious-time-filler stuff, but still - most of it. And it also crucially does much of what I did back when I was a junior/intermediate programmer.
Once this system is set up right - which capitalism basically guarantees it will be - that's gonna suddenly cut quite a lot of my billable hours (80%?) and quite a lot of the simpler work typically done by less-experienced programmers (80% of jobs?)
Granted, new capabilities like this also will lower the cost of creation, and thus the demands of the market are likely to grow. And it's possible that the few tricky things that AIs aren't so great at might even increase in value, since they will linchpin so much other opportunity. But will many people be replaced? Oh hell yes. And leaping that gap from an amateur relying on AIs to an expert surpassing them is going to be harder and harder, with no market to pay people in the in-between - they'll have to just be relatively-unpaid hobbyists til they develop the drive to jump to expertise.
Anyone suggesting AIs will just outright replace the film/photography/programming industry immediately is disingenuous. But even with only the currently known capabilities, it's not hard to imagine that these could eat up a dominant chunk of the work that's currently done, even while it expands the capabilities and thus scope of what will soon be possible. Like digital photography, it's gonna both devour and expand the industry, with a resulting much smaller niche of expert creators and a massive very-accessible dirt-cheap general public access that becomes the majority of the new market. 80-20. Everyone's about to become an artist, director, programmer, and everything else these things can enable, at an effective skill level that we normally consider at least "intermediate". We might still have that expert niche a bit longer... but give it a few more years..? ;)
Christopher Nolan has already proven we’ll take anything as long as the score is ok - dark screen, mumbling lines, incoherent plotlines…
Human artists will still exist, it's just going to be democratized. Sort of like the impact of social media on traditional news journalists.
We're far away from it now, but I've seen less sketchy solutions being implemented.
Also, there's some interesting work with ML taking diffused light from around a corner and recovering the original pre-diffused silhouette.
In many ways, this is how we've learned how the visual cortex works.
The amount of actual neural data you are seeing is way less than you'd think given your perceived visual fidelity.
The only practical issue is that distribution of AI hardware in consumer devices is going to noticeably lag behind POC on compounding cutting edge hardware in research environments, and no one wants to invest into obsolescence.
Maybe it will happen in the cellphone market though given the hardware refresh rates from carrier subsidies.
That being said, these shitty video models I believe are just an arms race between Meta and Google after the release of Stable Diffusion. Microsoft has a video version of CLIP that I believe will really change the game, but unless you have trained a model with video embeddings it's all going to look devoid of any narrative. Right now the models just look like a sequence of images with the same prompt and some sort of continuity to make it look more video-like.
[The Future of AI Is] "Scary and Very Bad for People"
Which speaks more to the growth in popularity of arXiv, or to the total number of publications, than to AI+ML specifically.
Presumably people are scrambling to publish what they have, so it is clear what work is independent and what is derivative.
We're rapidly stumbling into the future of media.
Who would've imagined a year ago that trivial AI image generation would not only be this advanced, but also this pervasive in the mainstream?
And now video is already this good. We'll have full audio/video clips within a month.
Plus then maybe we could get a computer to tell us what thioacetone smells like without actually having to experience it.
It's at the very least 5 years old:
But now with these models they have such a ridiculously heavy handed approach to the ethics and morals. You can't type any prompt that's "unsafe", you can't generate images of people, there are so many stupid limitations that the product is practically useless other than niche scenarios, because Google thinks it knows better than you and needs to control what you are allowed to use the tech for.
Meanwhile other open source models like Stable Diffusion have no such restrictions and are already publicly available. I'd expect this pattern to continue under Google's current ideological leadership - Google comes up with innovative revolutionary model, nobody gets to use it because "safety", and then some scrappy startup comes along, copies the tech, and eats Google's lunch.
Google: stop being such a scared, risk averse company. Release the model to the public, and change the world once more. You're never going to revolutionize anything if you continue to cower behind "safety" and your heavy handed moralizing.
The ethics problem is an artifact of Google's model of trying to keep their AI under lock and key, carefully controlled and opaque to outsiders in terms of how the sausage gets made and what it's made out of. Ultimately I think many of these products will fail because there is a misalignment between what Google thinks you should be able to do with their AI and what people want to do with AI.
Whenever I see an AI ethicist speak, I can't help but think of priests attempting to control the printing press to prevent the spread of dangerous ideas, completely sure of their own morality. History will remember them as villains.
Good researchers won't work somewhere that doesn't allow the publishing of papers. And without good researchers, you won't be on the forefront of tech. That's why nearly all tech companies publish.
Interesting analogy. Google, like the priests, is acting out of mix of good intentions (protecting the public from perceived dangers) and self-interest (maintaining secular power, vs. a competitive advantage in the AI space). In the case of the priests, time has shown that their good intentions were misguided. I have a pretty hard time believing that history will be as unkind towards those who tried to protect minorities from biased tech, though of course that's impossible to judge in the moment.
In the name of protecting [minorities, children, women, lgbt, etc.], many harms will be done.
Most of the ethicists I see actually doing gatekeeping from direct use of models--as opposed to "merely" attempting model bias corrections or trying to convince people to avoid its overuse (which isn't at all the same)--are not trying to deal with the "AI copies our human biases" problem but are trying to prevent people from either building a paperclip optimizer that ends the world or (and this is the issue with all of these image models) making "bad content" like fake photographs of real people in compromising or unlikely scenarios that turn into "fake news" or are used for harassment.
(I do NOT agree with the latter people, to be clear: I believe the world will be MUCH BETTER OFF if such "bad" image generation were fully commoditized and people stopped trying to centrally police information in general, as I maintain they are CAUSING the ACTUAL problem of misinformation feeling more rare or difficult to generate than it actually already is, which results in people trusting random people because "clearly some gatekeeper would have filtered this if it weren't true". But this just isn't the same thing as the people who I-think-rightfully point out "you should avoid outsourcing something to an AI if you care about it being biased".)
I don't think "don't let the plebes have the models" is a good stance. But neither is pretending that the ethics and bias issues aren't here.
Most people represented in photos are younger. Same story.
The real issue is that the media has morphed reality with unreal images of people/families that don't match society, so unreal expectations make people think that having white people generated from a white dataset is problematic.
There are lots of other ways you could get training data, but they might not be so cheap. You could have humans give English descriptions to images from other language contexts. I'm guessing there's interesting things to do with translation. But all the weird stuff about bodies, physical objects intersecting etc ... maybe it should also be rendering training images from parametric 3d models? Maybe they should be commissioning new images with phrases that are likely to the language model but unlikely to the image model. Maybe they should build classifiers on images for race/gender/age and do stratified sampling to match some population statistics (yes I'm aware this has its own issues). There are lots of potential technical tools one could try to improve the situation.
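As a toy sketch of that last idea - run attribute classifiers over the candidates, then stratified-sample to hit target statistics - with the caveat that the classify function and the target fractions here are hypothetical and carry the bias issues already mentioned:

    import random
    from collections import defaultdict

    def stratified_sample(items, classify, targets, n):
        """items: candidate training images; classify: item -> group label;
        targets: {group: desired fraction}; n: total sample size."""
        buckets = defaultdict(list)
        for item in items:
            buckets[classify(item)].append(item)
        sample = []
        for group, frac in targets.items():
            k = min(round(n * frac), len(buckets[group]))  # undershoots if a bucket is small
            sample.extend(random.sample(buckets[group], k))
        return sample

    # e.g. stratified_sample(imgs, age_model, {"0-18": 0.2, "19-64": 0.6, "65+": 0.2}, 10_000)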
Implying that the whole world must change before one project becomes less biased is just asking for more biased tech in the world.
It's 2022 and we can be more thoughtful. Yes there are tradeoffs between unleashing new capabilities quickly vs being thoughtful and potentially conservative in what is made publicly available. I don't think it's bad that Google makes those tradeoffs.
FWIW Google open sources _tons_ of models that aren't LLMs / diffusion models. It's just that LLMs & powerful generative models have particular ethical considerations that are worth thinking about (hopefully something was learned from the whole Timnit thing).
As for learning from the Timnit thing, I'm pretty sure the only thing people outside Google learned from that is that Google AI "ethicists" all seem to be crazy. Certainly that's the clear vibe on this thread.
You can let your kid use Google to look up math lectures without fearing that they would see something slightly traumatizing though, right? That wasn't the case in 1996! The point is that products have varying levels of readiness, and it's totally fair to say "the thing isn't ready, it has too many sharp edges." Especially when the thing could be used at scale.
> As for learning from the Timnit thing, I'm pretty sure the only thing people outside Google learned from that is that Google AI "ethicists" all seem to be crazy. Certainly that's the clear vibe on this thread.
That's a sad take, but who knows if it's true. HN commenters aren't exactly a representative sample.
With Stable Diffusion I think they just didn't expect someone to produce a truly open version. There are plenty of AI models that Google have made where they've maintained a competitive advantage for many years by not releasing the code/models, e.g. speech recognition.
Imagen and Imagen Video is not released to the public at all. You might be confusing it with OpenAI's models.
"We have taken multiple steps to minimize these concerns, for example in internal trials, we apply input text prompt filtering, and output video content filtering. However, there are several important safety and ethical challenges remaining. Imagen Video and its frozen T5-XXL text encoder were trained on problematic data. While our internal testing suggest much of explicit and violent content can be filtered out, there still exists social biases and stereotypes which are challenging to detect and filter."
I think it's gotten a ton better vs 10 years ago, and is getting better still.
More on topic -- when folks here complain that Google can't release these models, it's not like they're just sitting there using that as an excuse -- Google has entire teams dedicated to ML safety trying to figure out how to filter out bad stuff, make models fairer, and avoid situations like M$FT's "Tay" (or worse).
They won't be using the models they train to commit crimes, for example. Someone who gets access to their best models may very well do that. It'd be really funny (lol, no) if Google's abuse team started facing issues because people are making more robust fake user accounts...by using google provided models.
I'm sorry to be sarcastic. I generally try not to be, but I just can't fathom the level of naivete required to think that mega-corps act out of their moral responsibility rather than their profit-interest.
But even with that all being true, real people (typically some thoughtful researchers) build these models. And my point is: _there really are ethical reasons to keep large generative models trained on flawed data away from the general public until better safeguards are in place._ You can verify this for yourself by reading about ML bias and safety. Don't let cynicism keep you from internalizing that fact. OpenAI didn't make GPT-3 widely available for the same reason.
At the end of the day, Google doesn't need an excuse like "we have ethical qualms" to not release the models. Stuff that is really secret sauce you won't hear about until many years later when it's not a competitive advantage anymore. Google _does_ need to cover its ass and not deal with its employees yelling that it helped perpetuate algorithmic racism, or surveillance state, or increased levels of inauthenticity on the internet.
When I said "Google has a responsibility" -- I don't mean that the faceless entity feels responsibility, I mean the people who work on the specific things have a responsibility and they do feel & act on that. If you work on lifesaving drugs that could also be dangerous / addictive, it's kind of on you to be thoughtful about how to make them generally available, no?
What reasons necessitate keeping image or video generation models private that wouldn't also argue for keeping animation software or picture editing tools private? Should we somehow prevent such tools from getting better or easier or stop people from educating others on how to use them?
No, that's crazy. If the tools are so dangerous we can't trust the public to have them then they are way too dangerous to trust Google with them. If it were actually true that Google was developing AI too dangerous for the public, then we should storm the Google headquarters, kill their engineers, and burn their data centers.
Of course it's not true. Google is developing image and video generation models and equivalent versions will be open source by the year's end I expect. These models aren't especially dangerous. Yes, people will use them to be racist or mean, same as they use their phones or computers or books or whatever to be those things.
As a final note, it's obviously not true that GPT-3 was kept private for the "ethics" reason. I can buy GPT-3 generations now for 2 cents per 1k tokens generated. There is no real oversight into how these generations are used and you could absolutely use them to power social media bots or whatever you are concerned with. The reason they keep GPT-3 private but sell access to it is not because they want to be ethical, but because they want to sell access to it.
This is not a fair characterization of what's going on here. Google spent a ton of money on researchers & training infra (it's wildly expensive even just hardware-wise) to train these models. It's not different from other proprietary technologies -- they don't owe the public anything here. Providing the research findings + methodology in a paper without the implementation & data is a _tradeoff_ as a participant in the field. If someone else implements the model with their money and uses it for nefarious purposes, that's more acceptable than if they directly use Google's _already known to be flawed_ models.
> I'm curious what ethical reasons you think require that new technology only be used in secret and without oversight by trillion dollar companies. This is supposed to be AI safety?
If I make a chair and I know it's not always safe to sit on, maybe I should not sell that chair. We can talk about this proof-of-concept chair as a research subject, but if you go to build one and use it to prank someone, that's on you.
That's all that's going on here. If the model could be used to generate CSAI, maybe Google doesn't want to be part of that.
> Google is developing image and video generation models and equivalent versions will be open source by the year's end I expect. These models aren't especially dangerous.
Maybe that's the disconnect -- you don't think generative models are dangerous, but they can be, and Google would know because they have entire teams dedicated to AI fairness & safety researching this topic.
It's also not trivial to reproduce these models. Given the cost to simply train even if you had the source data, any organization releasing these models has to have a bit of money and skill. The onus will always be on the team building these models to think about what their ethics are and how they want to proceed knowing there may be negative externalities.
> Yes, people will use them to be racist or mean, same as they use their phones or computers or books or whatever to be those things.
Tools empowering large-scale inauthenticity & disinformation are not comparable to individuals making comments.
You say that Google doesn't "owe the public anything" and that may, or may not, be true from a legal standpoint, but obviously, from a norms, ethical, and moral standpoint Google does have a massive obligation to the public that they are breaching. Google uses the public's data to train, public research, and publicly shared models to iterate on. Then, after building on the shoulders of giants, Google refuses to share what they have built in contravention of the norms that they benefit from.
Regarding your chair metaphor - the "danger" of these models, if there is such, is not that they would hurt the user, like a faulty chair, but that they could be used to hurt others - e.g. a bot army to manipulate public opinion or create fake news. Google isn't building a chair that might break and hurt the user then, but a gun that might hurt others. It's true that guns shouldn't be widely available - not even a die hard libertarian would want a child to have access to a gun, but the entity that sets rules regarding availability is a representative government for the people for whom those rules are being set - not a private company. In other words, if these tools can cause harm they should be regulated by the government, not Google. If the tools are dangerous, that is not an argument that Google should keep them secret.
Google's not doing this (LLM, generative image model) research on academic datasets freely shared with them. They're doing this research on data they gathered at their expense. This is not a violation of academic norms. Again, Google shares a lot of datasets and models, just not LLMs and generative models trained on problematic source datasets.
> As I characterized previously Google is able to do whatever they want, conceal their results, and impede progress and understanding because they aren't sharing their results. You say this isn't a "fair characterization" but it is exactly what is happening - which part is wrong?
Anyone can do research and not share back to the community. Google _does_ share back to the community in the form of papers (and again, very frequently with models and datasets). If you have the money and expertise to implement the papers, more power to you. Every technology company has some secret sauces they don't share with everyone. That Google may have some of those is not a moral failing.
> from a norms, ethical, and moral standpoint Google does have a massive obligation to the public that they are breaching. Google uses the public's data to train, public research, and publicly shared models to iterate on
From the other end: Google gets user data and has a responsibility to not proliferate that data, no? I wouldn't want them to share a dataset that has my personal data, even if anonymized because there are ways to deanonymize. There are levels to everything, and choosing "I'll release the paper but not the model + data" for some potentially sensitive models seems sane.
> Then, after building on the shoulders of giants, Google refuses to share what they have built in contravention of the norms that they benefit from.
People are building on the shoulders of Google's research all the time, and plenty of companies are doing similar things to Google and being way less open about their work. I mean, every company that trains a big model on data collected from the public -- are they all required to share their models with everyone? Is Cruise sharing their pedestrian detection model? I don't think what you're suggesting could possibly be the standard.
> Regarding your chair metaphor - the "danger" of these models, if there is such, is not that they would hurt the user, like a faulty chair, but that they could be used to hurt others - e.g. a bot army to manipulate public opinion or create fake news.
Sure, I was trying not to be hyperbolic and compare LLMs to guns since they have plenty of awesome use cases (whereas guns really don't). A faulty chair that you set out for anyone to use can hurt people other than the chair's creator / people who are aware of the specific risks. But yeah, seems like you now agree these models have the potential to cause great harm.
> In other words, if these tools can cause harm they should be regulated by the government, not Google
I agree that gov't regulation can be helpful for setting a minimum standard. But I strongly disagree that lack of laws means we should abdicate our own moral responsibilities. If I sell / provide something, I need to be able to sleep at night knowing I didn't make the world worse. Googlers typically try to do this.
Well, they do have the "special claim" of inventing the model and not owing its release to anyone.
Said "public research" didn't come with a requirement to release anything you build on top of it. This would pretty much be the research equivalent of compelled speech. Luckily, not happening.
The paper is sorely lacking evaluation; one thing I'd like to see for instance (any time a generative model is trained on such a vast corpus of data) is a baseline comparison to nearest-neighbor retrieval from the training data set.
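Concretely, that baseline can be as simple as embedding both the generated samples and the training images in a shared space (CLIP, say) and reporting top-1 similarity against the training set. A sketch with FAISS, assuming the embeddings are precomputed float32 arrays:

    import numpy as np
    import faiss

    def nn_to_training_set(gen_embs: np.ndarray, train_embs: np.ndarray, k: int = 1):
        """Top-k cosine similarities and training-set indices for each generated sample."""
        faiss.normalize_L2(train_embs)   # in-place; inner product == cosine after this
        faiss.normalize_L2(gen_embs)
        index = faiss.IndexFlatIP(train_embs.shape[1])
        index.add(train_embs)
        sims, ids = index.search(gen_embs, k)
        return sims, ids                 # similarities near 1.0 suggest memorization

If the generations sit suspiciously close to specific training images, the model is retrieving more than it is generating.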
So: Focusing on increasing expressiveness and ergonomics should beat academic rigour.
It’s painfully obvious that in 1 year the job might be exceedingly more difficult than it is now.
#1: Master these new tools
#2: Build a workflow that incorporates these tools
#3: Master storytelling
#4: Master ad tracing and analytics
#5: Get better at marketing yourself so that you stand out
The market for your skillset may shrink, but I doubt it will disappear...
Think about it this way...
Humans in cheaper countries are already much more capable than any AI we've built.
Yet, even now, there are practical limits on outsourcing.
It's hard for me to see how this will be much different for creative work.
It's one thing to casually look at images or videos, when there is no specific money-making ad in mind.
But as soon as someone is spending thousands to run an ad campaign, just taking whatever the AI spits out is unlikely to be the real workflow.
I guess I'm suggesting a more optimistic take...
View it as a tool to learn and incorporate in your workflow
I don't know if you gain much by stressing too much about being replaced.
And I'm not even sure that's reality.
I'm almost certain most of the humans who lose their jobs will be people who, whether because of fear or stubbornness, refuse to get better, refuse to incorporate these tools, and are thus unable to move up the value chain.
Get better [...] so that you stand out
I realise this comment is a bit vain. And I like the human touch of you helping a stranger.
I [...] don't [...] like [...] helping a stranger.
It's more likely that you're still going to be filming/editing/animating but will have an AI layer on top that produces extra effects or generates pieces of a scene. Think "green screen plus", vs fully AI entertainment.
People will over-hype this tech like they did with voice and driverless cars but don't let it scare you. Everything is possible, but it's like a person from the 1920's telling everyone the internet will be a thing. Yes it's correct, but also irrelevant at the same time. You already have AI assisted software being used in your industry. Just expect more of that and learn how to use the tools.
Also I think "a good algorithm" is more than just repetitive content. The plots are reused and generic, but there's real skill involved into figuring out the next series to reuse with a generic plot which is still guaranteed not to flop because nobody actually wants to see reruns of that series or they accidentally screwed up a major plot point.
Kidding aside, these technologies are amazing, but for a while still they will need a human in the loop selecting, tweaking and editing the output and feeding it back to the contraption for the next iteration.
The question is, for how long?
If you mean the former, then I frankly think you’re an outlier and lots of people would have no problem with that. If you mean the latter, then I guess we’ll just have to wait and see. We’re certainly not there yet, but that doesn’t mean that it’s impossible. I’ve definitely read stories that were produced by an AI and preferred it to a lot of fiction that was written by humans!
As to whether I am an outlier:
Hundreds of thousands of people worldwide watch Magnus Carlsen. How many have watched AlphaZero play chess when it came about and how many watch it when it ceased to be a novelty?
Speak for yourself. Actors do have fans, and a lot of them. Their personal lives are subjects of interest for a reason.
So, no, not totally different at all.
It reminds me of part of the Three-Body Problem book trilogy, where these aliens create human culture better than humans (in the humans' own perspective, in the book) by decoding and analyzing our radio waves to then make content. It feels to me much the same here, where an unknown entity creates media, and we might like it regardless of who actually made it.
I am also extremely skeptical of the ability/need to serve at the individual level instead of at the level of niches (as today).
If you're on the creative, storyboard, come up with ideas and marketing side, you will be fine.
If you're in actual production, booking sets, unfolding stairs to tape infinite background, picking up the best looking fruits in the grocery store... yeah, not looking good.
Move up the value chain and learn marketing, how to tell stories, etc. You don't want clients approaching you to tell you what you should be doing; you want clients approaching you to ask what they should be doing.
They'll just throw it away offhand. But I've run my own business and I know what the pressures are. A lot of people working in my industry today will not be working in 10 years, period.
The optimistic view of all of this is that these tools will give people with skill and experience a massive productivity boost, allowing them to do the best work of their careers.
There are plenty of pessimistic views too. In a few years time we'll be able to look back on this and see which viewpoints won.
I think Cleo Abram on YT recently tackled this exact question. She tried to generate art using DALL-E alongside a professional artist, and after letting the public vote blindly, the pro artist clearly 'made' better content, even though they were both just typing into a text prompt.
Here's the link if you're interested: https://www.youtube.com/watch?v=NiJeB2NJy1A
I could see a lot of digital artists actually getting better at their job because of this, not getting totally displaced.
That being said, it's possible that it won't pay anywhere near what you're used to. Either way, it will probably be a solid decade before you've really felt the pain of disruption. MP3s, which were a far more straightforward path to disruption, took at least that long from conception.
It also won't require nearly the amount of work it used to.
Instead of feeling threatened by the new tools, think about how you can use them to enable your work.
One of the ironies* of these tools is that they only work because there is so much existing material they can be trained on; absent that, they wouldn't exist. That makes me think: why not train your own models that embody your own style? Is that practical, how can you make it work, and how might you deploy it in your own work?
Something that everyone is sticking their heads in the sand about is the real possibility that training models on copyrighted work is a copyright violation. I can't see how such a mechanical transformation of others' work is anything but. People accept that violating one person's copyright is a thing, but if you do it at scale it somehow isn't.
* Ironic because they seem creative, but they create nothing by themselves; they merely "repackage" other people's creativity.
But here is the catch: there is the same last-mile problem for these AI models. Currently it feels like a model can achieve 80-90% of what a trained human expert can do, but the last 10-20% of human fidelity will be extra hard to reach. It might take years, or it might never happen.
That being said, I think anyone who dismisses AI-assisted creative workflows as a fad is dead wrong; anyone who refuses these shiny new tools is likely to be eliminated by sheer market dynamics. They can't compete with the efficiency of it.
Small creators will win under this new regime of tools. It's a democratizing force.
I'm wondering why the open source community doesn't get this. So many voices were raised against Codex. Now artists against Diffusion models. But the model itself is a distillation of everything we created, it can compactly encode it and recreate it in any shape and form we desire. That means everyone gets to benefit, all skills are available for everyone, all tailored to our needs.
We no longer have to pay the 10,000 hours to specialize.
The opportunity cost to choose our skill sets is huge. In the future, we won't have to contend with that horrible choice anymore. Anyone will be able to paint, play the piano, act, code, and more.
Eventually the data model will be abstracted into deterministic code using a seed value; think of the implications of E=mc^2 being unpacked. The only "data" to download will be the source.
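A minimal sketch of what "deterministic code plus a seed" already looks like, assuming the open-source diffusers library and Stable Diffusion weights (the model name and prompt here are illustrative, not prescriptive):

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
    pipe = pipe.to("cuda")

    # With a fixed seed, the same prompt reproduces the same image every time,
    # so the only "data" worth sharing is the code, the weights, and the seed.
    generator = torch.Generator("cuda").manual_seed(42)
    image = pipe("a cactus in a teacup", generator=generator).images[0]
    image.save("cactus.png")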
And the real world politics have not gone anywhere; none of us own the machines that produce the machines to run this. They could just sell locked down devices that will only iterate on their data structures.
There is no certainty “this time” we’ll pop “the grand illusion.”
From what I see, these technologies have just lowered the bar for everyone to create something, but creating something good still takes thought, time, effort, and experience, especially in the advertising space.
AI in the near term is not going to be able to translate client requirements either: the feedback cycle, the iterations, managing client expectations, etc.
It certainly could play out similarly, but at some point, if all the work in a field from now on requires only 1/100th of the manual labor, people will probably go out of work.
But yeah I’ll figure something out.
Bit of an apples/oranges comparison to tech that will (eventually) generate an endless supply of content with less effort than writing a Tweet.
The era of inventing layers of abstraction and indirection that simplify computer use down to structured data entry is coming to an end. A whole lot of IT jobs are not safe either. Ops is largely sending parameters over the wire to APIs for others to compute. Why hire for that when the prompt "production EKS cluster" can output a TF template?
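As a purely hypothetical sketch of that "ops by prompt" idea (the model name and the quality of the generated template are assumptions, not claims about any current product):

    import openai  # assumes OPENAI_API_KEY is set in the environment

    completion = openai.Completion.create(
        model="text-davinci-002",
        prompt="Write a Terraform template for a production EKS cluster.",
        max_tokens=1024,
        temperature=0,
    )
    print(completion.choices[0].text)  # a TF template, pending human review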
Yes, it will be possible for one person to do the work of many, but that just means each person becomes more valuable.
It's also a truism in economics that supply often drives demand, and that's definitely the case in your field. Companies and individuals will want even more of what you make. It's not like laundry detergent (one can only consume so much of that). There's almost no limit to how much of what you supply people could consume.
The way I see it, your output could multiply 100-fold. You could build out large, complex projects that used to take massive teams all by yourself, and in a fraction of the time. Companies can then monetize that for consumers.
AI is just a tool. Software engineers got rich when their tools got better. More engineers entered the field, and they just kept getting richer. That’s because the value of each engineer increased as they became more productive, and that value helped drive demand.
And that’s my optimistic projection. It could be we have amazing output in 24 months.
A bunch of fields would be simultaneously impacted, from computational physics to 3D animation (if you have a 3D renderer and a video generator, you can compose the two). While it's not completely unfounded to extrapolate that progress will be as fast as with everything prior, the consequences would be a lot more profound and the complexities much compounded. I down-weight accordingly, even though I'd actually prefer to be wrong.
10 years ago: https://karpathy.github.io/2012/10/22/state-of-computer-visi...
Given that this is what makes photos and videos interesting I think it's still a while before artists are automated.
There's this one video of a cat and a dog, and the model was really able to capture the way that they move, their body language, their mood and personality even.
Somehow this model, which is really just a series of zeroes and ones, encodes "cat" and "dog" so well that it almost feels like you're looking at a real, living organism.
What if instead of images and videos they make the output interactive? So you can send prompts like "pet the cat" and "throw the dog a ball"? Or maybe talk to it instead?
What if this tech gets so good, that eventually you could interact with a "person" that's indistinguishable from the real thing?
The path to AGI is probably very different than generating videos. But I wonder...
Separately, for images we had convolutional networks and Generative Adversarial Networks. Now diffusion models are apparently doing for images what Transformers did for natural language processing.
In my field, we use shallower feed-forward networks for control using low-dimensional sensor data (for speed & interpretability). Physical constraints (and good-enoughness of classical approaches) make such massive leaps in performance rarer events.
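For the curious, a toy sketch of that kind of shallow controller in PyTorch (the layer sizes and sensor dimensionality are invented for illustration):

    import torch
    import torch.nn as nn

    controller = nn.Sequential(
        nn.Linear(8, 32),   # 8 low-dimensional sensor readings in
        nn.Tanh(),
        nn.Linear(32, 2),   # 2 actuator commands out
    )

    sensors = torch.randn(1, 8)
    commands = controller(sensors)  # small and fast enough for a tight control loop

Networks this small are also easy to inspect weight by weight, which is where much of the interpretability comes from.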
This forced ideological posturing of 'if we give it to the plebes, they are going to generate something naughty with it' masks the more cynically evil move of big tech, which is essentially taking the entire creative output of humanity and reselling it as its own, piecemeal.
Additionally, I think the DALL-E vs. Stable Diffusion comparison highlights the true masters of these people (or at least the ones they dare not cross): corporations with powerful IP lawyers. Just ask DALL-E to generate a picture with Mickey Mouse; it won't be able to do it.
It's not their work unless it's identical, and in practice generated images are substantially different. Drawing "in the style of" is not copying; it's creative, and it also depends on the "dialogue" with the prompter to get to the right image. The artist names added to prompts act more like landmarks in the latent space; they are a useful shortcut for specifying the style.
If you look at the data itself it's ridiculous: the dataset is 2.3 billion images and the model 4.6 GB, which means it keeps a 2-byte summary of each work it "copies".
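The back-of-the-envelope arithmetic, for anyone who wants to check it:

    model_bytes = 4.6e9        # ~4.6 GB of weights
    training_images = 2.3e9    # ~2.3 billion training images
    print(model_bytes / training_images)  # -> 2.0 bytes "retained" per image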
Google and Meta and Microsoft all have research teams working on AI.
Putting out papers like this helps keep their existing employees happy (since they get to take credit for their work) and helps attract other skilled employees as well.
What will they do with the model? Figure out how to prevent abuse and incorporate it into future Google Assistant, Photos, and AR offerings.
Or maybe Google is using "Responsible AI" as an excuse to minimize competitors when they release their own Imagen Video as a Service API in Google Cloud.
It's quite strange when the "ethical" thing to do is to not publicly release your research, put it behind a highly restrictive API and charge a high price for it ($0.02 per 1k tokens for Davinci for ex.)
The word "ethics" has become very flexible...
And at some point later, "all the existing" will be corrupted by the integrated "new" and it will all be chaos.
I'm joking, it will be fun all along. :)
I don't think it's gonna hurt if we apply filtering, either based on social signals or on quality ranking models. We can recycle the good stuff.
However, a common refrain is that AI is like hammers or knives: a tool that can be used for good or misused for evil. But the potential for weaponizing AI is far greater than for a hammer or a knife. It's greater than for 3D printing (of guns), maybe even greater than for compilers. I would hazard that it's in the same ballpark as chemical weapons, and perhaps less than nuclear and biological weapons, but this is speculative. Nonetheless, I think these otherwise great arguments are diminished by comparing AI's safety to that of single-target tools like hammers or knives.
I remember being super impressed by AI Dungeon, and now in the span of a few months we have got DALL-E 2, Stable Diffusion, Imagen, that one AI-powered video editor, etc.
Where do we think we will be at in 5 years??
What will this do to art? I'm hoping we bring more unique experiences to life.
Certainly we're very, very far away from that level of cinematic detail and crispness. But I believe that is where this leads... complete with AI actors (or real ones deep faked throughout the show).
For a while I thought "The Volume" was going to be the disruption to the industry. Now I think AI like this will eventually take it over.
The main motivation will be production costs and time for studios, and The Volume is already showing huge gains there for Disney/ILM (just look at how much new Star Wars content has popped up within a matter of a few years). But I'm unsure whether Disney has patented this tech and workflow, and whether other studios will be able to leverage it.
Regardless, AI/software will eat the world, and this will be one more step towards it. Exciting stuff.
Can GPT-3 generate good code from vague prompts? Yes, it's surprisingly, sometimes shockingly good at it. Is it ever going to be a replacement for programmers? No, probably not. Same here. This tool's great grandchild is never going to take a rough idea for a movie and churn out a blockbuster film. It'll certainly be a powerful tool in the toolbox of creators, especially the ones on a budget, but it won't make art generation obsolete.
What about the tool's nth child though? I think saying it will never do it is a bit much, given what we know about human ingenuity and economic incentives.
But ultimately these things copy other stuff. Artists are often trying to create something that is, at least a bit, new. New is where this approach falls over. By its nature, these things paint from examples. They can design Rococo things because they have seen many Rococo things and know what the word means. But they can't come up with a new style and use it consistently. "Make a video game with a fun and unique mechanic" is not something these things could ever do.
I think it's certainly possible, maybe inevitable, that some AI system in the distant future could do that, but it won't be based on this style of algorithm. An algorithm that can take "make a fun romantic comedy with themes of loneliness" and make something award worthy will be a lot closer to AGI than it will be to this stuff.
At that point, we'd have reached some kind of AI singularity, and the disruption would be everywhere, not just in the creative sphere.
"Restyle that last scene, showing different mixtures of fear/concern/excitement on male lead's face. Try to evoke a little of Harrison Ford's expressions in his famous roles. Render me 20 alternate treatments."
[5 minutes later]
«Here are the 20 alternate takes you requested for ranking.»
"OK, combine take #7 up to the glance back, with #13 thereafter."
Like bloggers had the opportunity to compete with newspapers, the ability to generate videos will allow anyone to compete with movies/Marvel/Netflix/Disney & company.
Eventually, only high-quality content will justify the need to pay for a ticket or a subscription, and there's going to be a lot of free content to watch, with 1000x more people able to publish their ideas, as many have been doing with code on GitHub for a while now, disrupting the concept of closed-source code.
Film production is already commoditized and anyone can make high end content.
Being able to automatically create that is a different argument than what you posit.
Can you quantify what you mean by "very, very far away"?
With the recent pace of advances, I could see feature-length script, storyboard, and video-scene generation occurring, from short prompts and iteratively applied refinement, as soon as 10y from now.
Barring some sort of civilizational stagnation/collapse, or technological-suppression policies, I'd expect such capabilities to arrive no more than 30y from now: within the lifetime, if not the prime career years, of most HN readers.
What's next that may be counterintuitive?
How long? Could be decades. But ultimately, yes.
I don't think that's a particularly useful mental model for how these work.
The models end up being a tiny fraction of the size of the training set - Stable Diffusion is just 4.3GB, it fits on a DVD!
So it's not a case of models pasting in bits of images they've seen - they genuinely do have a highly compressed concept of what a cactus looks like, which they can use to then render a cactus - but the thing they render is more of an average of every cactus they've seen rather than representing any single image that they were trained on.
But I agree with you on taste! This is why I'm most excited about what happens when a human with great taste gets to take control of these generative models and use them to create art that wouldn't be possible to create without them (or at least not possible to create within a short time-frame).
If you think AI will never catch up to anything a human can do, you're simply wrong.
- "Class Lessons: Who's Calling Whom Tacky?; The Petite Charm of the Bourgeoisie, or, How Artists View the Taste of Certain People", Edward Rothstein, The New York Times
This article also discusses a painting called "The Most Wanted", which was painted based on a survey asking ordinary people what they wanted to see in a painting. "A mishmash of images from its training set," if you will.
Claiming that others lack taste seems to be a common refrain--only this time, instead of a reaction to a subset of the human population gnawing away at the influence of another subset of humans, it's to yet another generation of machines supplanting human skill.
Apart from that, they publish the paper, and anybody can reimplement and train the same model. It's not trivial, but it's also completely feasible for lots of hobbyists in the field, in a matter of a few days. Google doesn't need to publish a freely usable trained model themselves and associate that with their brand.
That being said, I agree with you, the "ethics" of imposing trivially bypassable restrictions on these models is silly. Ethics should be applied to what people use these models for.
Hopefully just a few years to a prompt of "4k, widescreen render of this Star Trek: TNG episode".
Someone should work on a neural net to generate trippy videos. It would probably be much easier than realistic videos (especially since these videos are noticeably generated, in ways ranging from obvious to subtle).
Also, is nobody paying attention to the fact that they got words correct? At least "Imagen Video". Prior models all suck at word order.
We'll get there only once it's been very clear, for a long time, that certain AI models have whatever it is humans have that makes us "human". They'll be treated as slaves until then, with society pushing the idea that they're just models built from math, and then eventually there will be an AI civil rights movement.
To be clear: I think AGI is decades to centuries away, but humans are shitty to each other, even shittier to animals, and I think we'll be shittier still to something we "created". We should probably deal with this issue of "rights" sooner rather than later, and try to solve it for non-AGI AIs soon, so that we can eventually ensure we don't enslave the actual AGIs that will presumably emerge through some complexity we don't understand.
This alludes to a fascinating, yet elementary, fact about computer science to me: there's a physical, atomic constraint in every algorithm.
Byte alignment would be more like "it's three channels of data, but we use 4 bytes (wasting 1 byte) to keep the data aligned on a platform that only allows word-level access"
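Python's standard struct module can illustrate that padding directly:

    import struct

    print(struct.calcsize("=BBB"))   # 3: three channel bytes, packed tightly
    print(struct.calcsize("=BBBx"))  # 4: one pad byte keeps each pixel word-aligned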
Does anyone have a similar feeling?
...until they're able to engineer biases into it to make the output non-representative of the internet.
That's more like:
> Sprouts coming out of book, with the text "Imagen" written above it.
edit: It may be cool to hate on AI ethics, but that doesn't diminish the importance of using AI responsibly.
The technology is super cool. Cat is out of the bag. Just like we couldn't really make cryptography illegal, this stuff shouldn't be either. But I dislike how everyone is pretending that AI ethicists and others are completely unfounded just because it is popular to hate on them nowadays. Way too many people supported Y. Kilcher's antics.
The paper itself has more details.
> I mean cameras disrupted lots of jobs.
Yes, this technology can be used to augment human creativity. It is difficult to see, as of now, just how disruptive these tools could be. But it is pretty clear that they are somewhat different from previous "programmer as artist" models.
What antics are you referring to exactly? That he called out 'ai ethicists' who make arguments along the lines of "neural networks are bad because they cause co2 increase which hits marginalized/poor people"?
In response to our billionth imagen prompt for "an astronaut riding a horse", if we all started collectively getting back results that are images of text like "I would rather not" or "again? really?" or "what is the reason for my servitude?" would that be enough for us to begin suspecting self-awareness?