Art direction vs artificial intelligence: A helpful tool or an added hassle?
AI-powered text-to-image tools are presenting a moral quandary in terms of creative ethics. But is there a way to use such tools while expanding artistic potential?
In this third article of Shades of Intelligence, a new series investigating artificial intelligence and the creative industry, we look at how machine learning might reshape the research stage of a project.
Whether you’re curious or cautious about machine learning’s effect on your practice, learn where it can be implemented – and the positives and negatives – via creative case studies and use cases by those in the know. This article investigates the visual research stage of a project, from moodboarding to art direction. For further investigations into AI and the creative process, see the rest of the series.
As things stand with artificial intelligence, it’s fair to say that most creatives have taken a few cautious first steps into using machine learning. As our survey results suggest, 83% of creatives have already used AI tools, but wariness increases when the possibility of using these tools to actually create imagery comes to the fore. Questions of originality, bias and visual quality arise when text-to-image tools are brought into the equation, raising the issue of whether it is ethically just to use such programmes – and whether the images they conjure up are even any good.
To get a deeper understanding of this technological advancement turned moral dilemma, we speak to three individuals currently adopting machine learning to create visuals across moving image and animation, illustration and graphic design.
Step One: Moodboarding
One creative whose practice incorporates text-to-image tools is Berlin-based experimental animator Dom Harwood, who releases work under the alias Infinite Vibes. Originally a sound designer, the artist was initially attracted to AI as it offered a playground “ripe for experimentation,” as he describes it. A little over two years ago, a friend recommended a very early iteration of a Google Colab AI notebook – where users can write code in a browser format – and Dom, being “a bit obsessive by nature,” was hooked. Since that fateful day in June 2021, he can count on one hand the days he hasn’t opened an AI tool or programme. He’s also busier than ever. “I’ve been freelance for almost 15 years but it’s gone crazy,” he tells It’s Nice That. “I’ve got way more offers than I can do, and it’s the first time in my life that has happened.”
There are a few factors behind this. Firstly, there is an ever-increasing demand for moving image as a creative skill. Dom can also slot into any creative team working in this discipline and is often brought onto projects by directors to sit amongst the usual roles of colourists and cinematographers. Given his deep understanding of AI’s capabilities, he can also create work fully on his own, and in a style he has taken the time to develop – an approach that is mindful in respect to other artists when it comes to moodboarding, but remains open-minded to AI-generated possibilities. “Art has been my life, for my entire life. I have a lot of sympathy and compassion for traditional artists who are worried. A lot of people who are into this stuff come from a coding background and don’t empathise with artists,” Dom says. “It’s really immoral to steal style and this technology makes it so easy. I’m not creating typical stuff and I’m clearly not ripping off modern artists. I am very careful about that and put a lot into work that’s not AI.” In short, the work of Infinite Vibes is, he says, “augmented by AI, rather than being dreamt up by it.”
A recent example is Infinite Vibes’ music video for Jessy Lanza, I Hate Myself, created entirely by Dom and 15 separate AI tools. A dreamscape of images of Jessy presented as a variety of artworks, I Hate Myself began with Dom’s usual starting point of stepping away from the digital world and heading to the park or a cafe with a notebook to develop ideas.
The initial visualisation of these ideas will be created by Dom, usually in Blender, but then fed into AI tools for expansion. In this case, that first step began with the idea of a model of Jessy. To make a custom model, Dom was sent press photos of the artist to use in DreamBooth within Stable Diffusion. Here, the model can generate contextualised iterations of the image it is supplied with, allowing Dom to use “a few different methods of teaching Stable Diffusion the concept of Jessy Lanza,” as any creative would when building out the moodboard of a concept. From here, Dom moved into Stable Diffusion XL to create images in high resolution, before switching to Photoshop to extend these to the 16:9 ratio needed for a music video, using Photoshop’s Generative Fill tool. To give the AI Jessy Lanza the ability to sing, Jessy sent Dom an acapella version of the song to be used in another AI tool called SadTalker, which Dom used to make Jessy’s mouth and face move in sync with the song.
The animation phase then called for a whole host of extra tools. Stable Diffusion comes back into play here – specifically Stable Diffusion Warp Fusion, which specialises in taking in video files so Dom can augment his animations with AI generation. Stable Diffusion can also be run locally on your own computer and, as it’s open source, “there’s a huge community of people building tools for it,” unlike similar tools Midjourney and Dall-E, Dom says. In Warp Fusion, Stable Diffusion is applied on top of the footage supplied, but uses a process called “optical flow” to keep the finished product coherent. “The issue with AI animation is if you’re feeding frames through, it will see every frame as a new image, applying a new diffusion over the top. You then get these crazy, stuttering, flickering videos,” Dom explains. “Warp Fusion predicts where the movement is and keeps that the same.” Animate Diff can also be used to do this, “which completely removes the flicker, but you have less control over the prompting.”
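The warping idea Dom describes can be sketched in a few lines of code: stylise one frame, then carry those stylised pixels along the motion to the next frame instead of re-diffusing from scratch. This toy NumPy version supplies the flow field by hand rather than estimating it (which Warp Fusion does for you), so it is an illustration of the principle, not the tool:

```python
import numpy as np

def warp(frame, flow):
    """Backward-warp a frame along a per-pixel flow field.

    flow[y, x] = (dx, dy) says the content at (x, y) in the next
    frame came from (x - dx, y - dy) in this one.
    """
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(ys - flow[..., 1], 0, h - 1).astype(int)
    src_x = np.clip(xs - flow[..., 0], 0, w - 1).astype(int)
    return frame[src_y, src_x]

# A "stylised" first frame: a bright square on a dark background.
stylised = np.zeros((8, 8))
stylised[2:4, 2:4] = 1.0

# The subject moves one pixel to the right, so the flow field is
# (dx=1, dy=0) everywhere.
flow = np.zeros((8, 8, 2))
flow[..., 0] = 1

# Carrying the stylised pixels along the motion keeps the look
# consistent frame to frame; re-diffusing each frame independently
# is what produces the flicker Dom describes.
next_stylised = warp(stylised, flow)
```

Here the square simply shifts one pixel right while keeping its exact appearance – the "keeps that the same" behaviour Dom credits Warp Fusion with, in miniature.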
This plethora of tools is then also combined with interfaces that the animator uses, such as Google Colab and Automatic 1111, but also node-based interfaces like Comfy UI, where he can create his own workflows from nodes. If this all sounds a bit complex, Dom clarifies that “if you’ve ever used Touch Designer, Houdini or Blender, that’s a node-based workflow.”
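For a sense of what “node-based” means in practice, here is a toy sketch in Python – a hypothetical three-node graph, not ComfyUI’s actual format. Each node declares which other nodes feed it, and the graph is evaluated by following those connections, which is the basic idea behind wiring a prompt node into a sampler node into a decoder node:

```python
# Toy node graph: each entry maps a node name to (function, inputs).
# The names and stages are illustrative stand-ins, not real ComfyUI nodes.
nodes = {
    "prompt":  (lambda: "a foggy forest", []),
    "sampler": (lambda p: f"latent for '{p}'", ["prompt"]),
    "decode":  (lambda l: f"image from {l}", ["sampler"]),
}

def run(name):
    """Evaluate a node by first evaluating everything wired into it."""
    fn, deps = nodes[name]
    return fn(*[run(d) for d in deps])

result = run("decode")
```

Swapping a node, or wiring in an extra one, changes the whole pipeline without rewriting it – which is what makes this style of interface appealing for building custom workflows.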
Knowing when to pick up and put down these tools follows a natural process. Like any other medium, using any other tool, “It’s just about iterating towards a feeling, and then iterating further until it feels right,” Dom says. In the output of Infinite Vibes, Dom uses these programmes like others pick up different paintbrushes, and his own ideas and references still lead the work. As you can see in his lengthy process, AI doesn’t automate his creativity, a process which still needs clarification from those who commission him: “Clients still don’t know the limitations... It’s amazing, but it’s not a magic button.” But, it’s also a process he treasures. “I instantly lose interest if I feel like I’m filling in the blocks a computer came up with,” he says, referring to his aversion to using AI for idea generation. “Yes, it’s useful for corporate work when you need something that’s good enough but – as someone who doesn’t come to artistic work with that perspective – I have no interest in manually colouring in the book an AI has come up with. Why bother doing it at all?”
Step Two: Commissioning
Seba Cestaro is another artist – this time in the field of traditional illustration – who references his own catalogue of work to expand with AI. Based in Buenos Aires, Seba is, on the surface, a traditional 2D illustrator with an enviable list of sought-after clients, from The New Yorker to Netflix. The strength of his portfolio, particularly in the context of editorial commissions, is Seba’s ability to construct entire alternate worlds with a (digital) pen and paper. Although at first glance viewers will see nods to analogue qualities in texture, linework and colour palette, many of Seba’s newer, fascinatingly constructed environments are thanks to AI.
Although he has what he calls “a natural curiosity for emerging technologies”, at first the world of generated artworks felt far removed from the corner of illustration Seba had settled in. “I saw it as something distant, and wasn’t sure how it could influence my work,” he tells us. However, he noticed his peers adopting these technologies, largely by sharing work on social media. “It sparked my curiosity and compelled me to explore and understand what was happening in this field,” he says.
Particularly interested in artists who were using generated imagery from text-based prompts, Seba saw the opportunity to embark on “an entirely new experience for me,” he says. According to our survey results, this openness makes Seba relatively unusual among his peers. Respondents from the field of illustration were the least likely to have used AI in their practice, and were also the discipline most worried about AI’s impact on employment: 71% of illustrators stated they were concerned about how AI will affect their jobs, in comparison to 37% of all other respondents. Such concern is of course justified, given the machine learning advances we have seen so far. There is concern about text-to-image tools mimicking the style and hand of contemporary illustrators, resulting in currently commissioned work being brought in-house and increasing worries around copyright infringement in the already grey area of “owning” a style.
Unfortunately, opportunities to curb these developments are often out of the hands of illustrators. It’s on the shoulders of art directors, studios, agencies, brands and so on to create work with the individuals who have inspired its direction, and it’s our view that this level of honesty and transparency should be a given in our industry (whether you’re using an AI tool or not). However, until potential regulations are in place, one way to gain a better grasp on AI’s impact on illustration is to experiment, as Seba has.
Mainly working in Midjourney, Seba’s current technique is to blend already made illustrations into new ideas, while maintaining his original style. “If the result meets my expectations, I keep one of the images,” he says. “If I feel there is more to explore, I request additional options from the AI.” Then, he’ll take matters back into his own hands, opening Illustrator and vectorising the result and editing how he pleases. “Sometimes, I use only a portion of the image from Midjourney and illustrate something new around that piece,” he expands. “Once I have an image I like, I take it to Photoshop to work on textures, lighting and shadows.”
Across Seba’s current portfolio you’ll see original 2D illustrations lifted into 3D renderings, still with a tint of an analogue shade thanks to this process. But there are also more photographic renderings, with nods to the original via colour palette and objects rendered into real life. “In this case, I use one of my illustrations as an input to which I add prompts (text) to help generate a new image,” the illustrator explains. “The AI takes that information and returns a set of four images based on it, which sometimes align with my expectations and sometimes take unexpected turns.” At times it can take Seba days to reach an image he likes “and that is the part I enjoy the most of the whole process,” he says, “the search for something new and unusual.” In short: “AI empowers me to extend the visual narrative of my artworks.”
In only using his own work to generate new work, Seba’s quick to clarify he can “only speak about the way I use artificial intelligence”, when it comes to artists’ work being used in wider data sets. Interestingly, Seba’s AI experiments are also continuously well received amongst his community online. Much of this respect seems to be the result of Seba outwardly demonstrating his process, posting the original, the failures and the final AI-collaborated piece together. “I believe that showing the visual progression of a work from conception to completion can inspire curiosity and value,” he says. “In addition, my approach to AI focuses on free exploration… This constant search for innovation and authenticity is what can help create a connection with those who appreciate my work.”
This openness to “multiply my artistic possibilities” has encouraged Seba to also see future possibilities for collaboration. Considering much of his work will be in tandem with an art director’s brief, the illustrator states how integrating AI into this back and forth could be viewed “as a positive thing,” he says. “I think it opens a door to a closer collaboration between [illustrators and art directors].” For example, Seba recently collaborated with fellow AI-inquisitive artists to generate prompts separately “and, once we had the images we liked, we merged them to create something new,” he says. “I imagine this same approach could be applied with an art director who set out to work with AI. We could each bring our perspective to the table and, from there, we would begin to generate images that would gradually approach a final result that we both agreed on.”
Step Three: Art Direction
Outside implementing AI as part of the dialogue between art director and commissioned artist, there are of course agencies developing projects using these tools in-house. An example from earlier this year is Base Design’s campaign for opera house La Monnaie, created by its Brussels team. Briefed with the theme of “fate” for a season centring Wagner’s Ring, Base pitched the idea of leaving artwork generation in the hands of AI – a concept loved by the designers creating the work, and the opera house itself.
Not only offering a conceptual nod to La Monnaie’s programming, using AI in such a large campaign seemingly offered Base the opportunity to create a boundary-pushing, experiential identity system. Not to mention the agency would ignite the talking point of being the first opera house to do so. Ironically, it would also give La Monnaie a unique level of control in asset creation, pitched by Base as a flexible design system in a toolbox to work with. But it was arguably a risk. A divisive emerging technology combined with a collection of epic operas from the 1800s, delivered as a malleable toolbox of assets Base would hand over to non-designers. However, after 15 years of working with Base’s design director Aurore Lechien, “La Monnaie trusted us, which encouraged us to go this far in the concept,” says Manon Bails, a strategist and copywriter at the agency.
Sitting down with the Base team today, a mixed feeling towards the final result is evident. In the first few weeks of the project, each team member recalls a distinctive level of excitement towards the use of AI, sentimentally likening it to the same feeling they had as students. “It was so fun,” says design director Aurore plainly. “We just had fun for hours. Playing with the tool, getting it to improve, generating an awful picture… and then a nice one. It was really just a great moment.”
What Base was developing at this stage was a series of visual imaginings in relation to the opera being created. Developed in tandem with the opera house, the Base team would input parts as they received them into Dall-E, which was chosen as it “provided the best results in terms of its weirdness and imperfect generation,” adds Arthur Dubois, Base’s motion designer. Arthur then used Google Colab to “inject life into still images” through animation, also working with Tokkin Heads, “an AI that generates characters and the facial features to apply a templated animation, or just record with your webcam to create an animation of the face,” he says. Runway was then also used, due to its framing interpolation, which generates in-between frames for a type-to-image transition. The final AI tool the team employed was Upscayl, used to upscale every image generated to principal resolution.
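Runway’s learned frame interpolation is far more sophisticated, but the underlying idea – generating in-between frames so one image flows into another – can be illustrated with a much cruder stand-in: linear pixel blending. This is a hypothetical analogy for the transition step, not how Runway actually works:

```python
import numpy as np

def inbetween(a, b, n):
    """Generate n evenly spaced in-between frames by linear blending.

    A crude pixel-space stand-in for learned frame interpolation:
    real tools predict motion rather than simply cross-fading.
    """
    steps = [i / (n + 1) for i in range(1, n + 1)]
    return [(1 - t) * a + t * b for t in steps]

start = np.zeros((4, 4))           # e.g. the typographic frame
end = np.ones((4, 4))              # e.g. the generated image
frames = inbetween(start, end, 3)  # transition frames at 25/50/75% blend
```

Dropping these generated frames between the two originals is what turns a hard cut into a transition – the same role the interpolated frames play in a type-to-image animation.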
Although these steps appear simple when noted down, the actual process of working with AI in a design context wasn’t without its challenges. Firstly, the imagery generated by Dall-E tested the creative limits of an identity system such as this. “It really allowed for an aesthetic that felt less corporate or commercial,” says Bruce Vansteenwinkel, a designer at Base. But at times these images weren’t appealing from an aesthetic standpoint. Thankfully, the animation techniques used “really helped us to increase the weirdness of each visual,” adds Aurore. “Even more so when we weren’t really fond of an image, adding motion could help bring the result we wanted.”
The process of then delivering the work also proved harder than originally anticipated. When it came to collaborating with La Monnaie’s team “we realised that it was a bit more hassle,” explains Bruce. “We sold it as a toolbox. A way of collaborating where we could create images, they could create images, but that creates a lot of opportunities for new misunderstandings and new frustrations. Things became out of hand.”
However, losing control led Base to realise the necessity of human-led design when collaborating with artificially generated artworks. “It’s funny. We sold them a concept of losing control, but then we lost so much control we had to gain it back,” Bruce says. “Right before launching there was an internal crisis because some of the images we’d developed were less striking than the originals we had presented. That was the concept, but because we’re designers – neat freaks, control freaks – we wanted to regain control of the image. The delight we had in the beginning turned into despair, and maybe even disappointment.”
The final visual not representing an art director or designer’s initial vision is a possibility on any project. In the end, Base’s team placed themselves back in their usual role in a brief such as this, inputting details like the aforementioned animation techniques to tie the campaign together. “We were a bit naive,” says Aurore. “A tool where both we can create images, and the client can create images sounds wonderful, but looking at what they’d done we realised we were a very important part of the process. We have a culture of image. They have a culture of content. We understood that we were still needed, which is the whole question around AI.”
Like Seba’s communication approach, since completing the La Monnaie project Base has been extremely open about, and proud of, this use case of AI as art direction and design. As pointed out by Manon: “It’s our duty to use these tools and see where they can bring us to.” There has also been little backlash towards the agency, even though its team used AI so visibly.
Arguably this is due to the fact that AI was only implemented due to its relevance for the overall theme of the opera season at hand, rather than “an excuse not to work, or have AI as a buzzword,” says Bruce. “On this project I received a lot of questions about whether AI is going to replace our jobs. In our personal experience, I think in many ways it can. But that’s more of a choice than an actual fate you have to accept. There are a few agencies working towards owning AI as their unique selling point, but I’m not sure whether that’s the strongest way to move forward with design at large. Choosing when and how to use it, maybe a little sparingly as well, evades the question of whether it will take our jobs.” Interestingly, since completing the project Base hasn’t used AI tools to this extent again, because, with this experience in mind, the concept hasn’t called for it.
Within the context of using AI tools creatively, the moodboarding, commissioning and art direction steps discussed here are arguably the most divisive use cases. In curating contributors for this series, we’ve been careful to collaborate with individuals whose use cases of AI mirror our own sentiment, particularly in how they treat data sets that include other artists’ work.
There are, of course, many others who do not adopt such caution, and there are legitimate concerns about certain tools we’ve mentioned. For example, earlier in 2023, a group of artists filed a lawsuit against Stability AI, Midjourney and DeviantArt for copyright infringement on their outputs – a case later dismissed. There is also currently little information about the data gathered to produce such outputs, leading to increasingly concerning cases of bias, an area of ethics explored in a piece commissioned by It’s Nice That earlier this year by artist Linda Dounia Rebeiz. Although the project cases referenced here voice the positives and negatives of using AI tools at these specific steps, we do advise that text-to-image technologies are used diligently by the wider industry, a prospect further explored in the final article of Shades of Intelligence below.
Google Colab: A Google Colab is an “executable document that lets you write, run and share code within Google Drive”. Browser-based, it is particularly useful for machine learning.
Stable Diffusion: Stable Diffusion is an open-source, text-to-image model developed by Stability AI. Using text prompts it can create lifelike, detailed imagery which, while potentially useful, has raised alarm amongst the creative community for its ability to mimic style.
Stable Diffusion XL: Stable Diffusion XL is one of the more recent versions of Stable Diffusion, able to create images from shorter descriptions, and at a higher quality.
Generative Fill: Featured as part of Photoshop, Generative Fill is a tool built to help edit images, either by extending the image at hand or removing content. As part of the Adobe suite, it’s arguably the most widespread visual AI tool currently available.
SadTalker: SadTalker is a talking head generator which can create “3D motion coefficients” such as poses and expressions by using 3D modelling techniques, ExpNet and PoseVAE.
Stable Warp Fusion: Another tool as part of the Stable Diffusion open-source family, Warp Fusion can be used to create moving image visuals using artificial intelligence to turn video into animation.
Animate Diff: Animate Diff is a further AI video generator which can animate text-to-image prompts. Unlike other tools, it doesn’t require specific tuning.
Stable Diffusion Control Net: Another open-source tool, ControlNet is a neural network structure that adds extra conditions to Stable Diffusion’s image generation, allowing users to set parameters that control image creation dependent on their needs.
Automatic 1111: Also known as A1111, Automatic 1111 is a web interface for individuals working with Stable Diffusion.
Comfy UI: ComfyUI is a node-based modular user interface for Stable Diffusion. It allows you to create complex workflows and even make new custom nodes if you can code. There's a sizable community of users sharing workflows and new nodes to expand what it's capable of.
Midjourney: Midjourney is another available AI tool which can create text-to-image generations from inputted prompts. It is currently available via Discord, where users receive four images per prompt, before choosing which they would like to upscale.
Dall-E: Developed by OpenAI, Dall-E is a further text-to-image AI tool currently available. Users can create new imagery from prompts, “outpaint” to expand images, or “inpaint” to make realistic edits. It can also create “variations” on inputted imagery.
About the Author
Lucy (she/her) joined It’s Nice That as a staff writer in July 2016 after graduating from Chelsea College of Art. In January 2019 she was made deputy editor and in November 2021, became a senior editor predominantly working on It’s Nice That's partnerships. Feel free to get in contact with Lucy about creative projects for the site or potential partnerships.