Am.I. : A Robotic Replacement Unrealized
The rise of artificial intelligence in the arts has sparked significant controversy, with many fearing it as a threat to human creativity and to the experience of making and appreciating art. Generative artificial intelligence is at the crux of the conversation because it can train on existing art, literature, and other media to provide near-instant gratification through the creation of “new” content. Critics often argue that media created by artificial intelligence is mediocre or inherently lacking some quality only a human can produce. Posthumanism challenges these ideas of human supremacy and advocates for the dissolution of anthropocentrism and of the boundaries of what society currently defines as the human experience. Am.I. is a robotic work of art that uses large language model artificial intelligence and robotics to create an immersive visual and auditory experience, challenging fears exacerbated by anthropocentrism and demonstrating how artificial intelligence acts as an extension of human experience and creativity rather than as a replacement for them. Programmed in Python and housed in a 3D-printed skull with moving eyes and a jaw, Am.I. engages in Socratic dialogue with another artificial intelligence, exploring themes of human existence using a large language model. This project exemplifies the potential for artificial intelligence to provide a window into the human psyche as seen through the lens of technology and to build upon our existing creative experiences without replacing them.
Introduction
In an age where machines can think, speak, and create, the ideas surrounding what makes humankind unique continue to blur more every day. Technology has become such a large component of human lives that phones and other systems act as extensions of ourselves, in what could be considered a cyborg. In this project the relationship between humanity and artificial intelligence is analyzed and critiqued through the artwork called Am.I. Am.I. is a project that intersects art and computer science to confront the fears surrounding generative artificial intelligence and to explore how it acts as an extension of human existence rather than a replacement for it.
Artificial intelligence in the context of this project refers to large language models (LLMs) that use natural language processing (NLP) to understand and generate text. These LLMs are a form of generative AI. There are other forms of generative AI capable of creating images, vector art, speech, and even videos. In fact, there are many types of artificial intelligence, but this project focuses on the fears surrounding generative artificial intelligence, since it is the main point of the artificial intelligence controversy in the contemporary field of art. Many are concerned with how generative artificial intelligence is gaining a presence not only online but also in the formal art world. These concerned individuals see generative artificial intelligence as a threat to existing art forms as well as an avenue for individuals to steal and recreate art that is not their own. In addition, there are fears surrounding misinformation generated by these systems and the very few protections available for the most vulnerable.
Many of humanity’s fears of technology come from a fear of replacement: the fear that generative artificial intelligence will replace artists, writers, videographers, and more. Others are afraid of physical replacement by robots. Media outlets have reported robots being used to replace workers in manufacturing, the food industry, and even social work. These examples only fuel the fire for anti-technology philosophies. Instead, this project aims to challenge these fears by reframing artificial intelligence not as a replacement for humanity but as an extension of it.
While confronting the fears surrounding generative technology, this project also explores personhood, the idea of individuality, through LLM dialogue and art. Since artificial intelligence, and more specifically LLMs, are trained on massive amounts of media made by humans, we can in turn use their responses as a mirror into ourselves. Humans are teaching artificial intelligence systems how to act correctly in the performance of social interaction with humans. This breaks down into a cycle of humans teaching, receiving output, analyzing output, then finally reteaching to bring that output closer to a desired outcome. In that sense, human desires take center stage when it comes to training artificial intelligence. With these desires built in from the very beginning, it is possible to draw out hidden human biases and perspectives of right and wrong with enough prompting. With this project, the prompts are aimed at uncovering the human ideas of personhood that are built into these systems because of the amount of human input about what is right and wrong.
This work endeavors to capture these human insecurities by presenting an LLM with a physical form in the gallery. The model not only generates its own text but is also paired with a text-to-speech model so there is an auditory element as well. One side of the conversation is a humanoid robot while the other side is a screen interface. The two speakers are two artificial intelligence systems, and their topic is a philosophical dialogue on human existence. The interactions the artificial intelligences have with each other focus on communicating concerns about human existence and what defines a human experience.
The robotic sculpture resembles a version of the artist, but it does not meet the same standard as the human body, separating the two forms of existence. Simultaneously, the artificial intelligence on the screen stays on the two-dimensional plane, the typical form in which the average person would interact with generative artificial intelligence. This separation between the physical plane and the plane of cyberspace is once again a separation of forms of existence, but comparing the two shows the possibility of transition from one plane to another.
On the other hand, the three-dimensional sculpture moves like a human using a system of motors, showing how these movements can be simplified mechanically. The movements themselves are randomized but represent the variety of gestures humans perform during social interaction. Even more, this act of performance puts into question the very art of social interaction and how it is taught to LLMs, similar to how it is taught to human children. Artificial intelligence can function as an extension of human existence, similar to how children are extensions of our own existence. Many of the fears people have about artificial intelligence come from the idea that it will replace our current concept of human existence. Although artificial intelligence is capable of processing information in ways similar to humans, it never reaches true understanding. Consciousness has still not been obtained in technology and continues to be the boundary between humanity and machines.
The more technical aspects of the project focus on how to uncover the biases of LLMs and how to effectively prompt them to generate subjective outputs. Using a combination of prompt engineering and analysis techniques, it is possible to evaluate the quality of the model output as well as any built-in tendencies that may be of ethical concern.
Additionally, the robotic aspects of this project look to create more human-like movements to give the impression of human conversation. This involves not only studying human movements while speaking but also implementing them using a series of motors and programming. For this project, the motors are broken down into two main parts, jaw movement and eye movement, with the goal of creating a convincing display of humanoid speaking.
This project is an integration of both art and computer science to respond to the fears of artificial intelligence specifically in the field of art as well as reframe the relationship between artificial intelligence and humanity. The project uses the creation of a humanoid capable of philosophical dialogue as a means to rethink how artificial intelligence is created and used in society. In the digital age it is becoming increasingly important to understand and be curious about technology rather than to be consumed by those fears.
Definition of Terms
Technology. In this study, technology refers to the continuously changing system of tools, machines, or processes used to meet the demands of society [26]. Technology with respect to this work is focused on artificial intelligence tools as well as robotic systems.
Artificial Intelligence. Also referred to as AI or A.I., the term can be broken down into two parts: artificial and intelligent. The term artificial refers to something created by humans. Where there is debate is in how to define intelligence. This project uses the definition of intelligence created by Pei Wang, which is that intelligence is “adaptation with insignificant knowledge and resources” [91]. Artificial intelligence is a rather general term, and in the context of this paper it refers to the wide variety of projects under the umbrella of artificial intelligence research. This work takes a specific interest in artificial intelligence meant to replicate human intellect and thought patterns.
Large language models. Also known as LLMs, this refers to a category of language model that uses neural networks with massive numbers of parameters, often reaching into the billions, trained on massive quantities of unlabeled text data [65]. These models are able to comprehend more textual information than their simpler predecessors. This project uses existing large language models as the basis for the text generation. These models can be prompted using text and also produce textual output of their own.
Natural language processing. Also known as NLP, this refers to how computers can be taught to understand and manipulate text or speech to perform a number of tasks such as translation, summarization, text generation, and more [15, 65]. In the context of this project, natural language processing is essential for the initial generation of dialogue, responses to those generations, and text-to-speech audio.
Humanism. Humanism is a movement in art history that started with the Greeks and was revitalized during the Renaissance period [69]. It started with discussion about how humans can better themselves through education and moral conduct. This belief system could be defined more broadly as an emphasis on the capacity for individual human achievement. Later, during the Renaissance period, Leonardo da Vinci made his Vitruvian Man [18]. The Vitruvian Man is famous for mixing both science and art. The work features a nude Caucasian man with his arms out and legs apart creating a circle. It was meant to prove the mathematical perfection of the human body and the human capability to achieve remarkable things. Thus the Vitruvian Man became a symbol for the humanism movement. This project moves away from humanism because of its limited perspective on what defines humanity.
These ideas of human perfection that came from humanism are very exclusive about who can fit into these categories. From the humanist perspective, the ideal human form and experience is typically that of a Caucasian man. Anything outside the realm of that definition of the human is automatically considered sub-human. This can be an especially harmful mentality because it separates and elevates humans from other lifeforms, leading to notions of entitlement, discrimination, and othering.
Posthumanism. This is a philosophical movement of interconnectedness. Unlike what the name may suggest, it is not about the world after humans; rather, it is the movement that responds to humanism and challenges its rhetoric of human perfection [9]. Posthumanism counters humanism by intersecting human and nonhuman entities, including technology, plants, and animals [9]. Discussions within the study of posthumanism often argue that defining humanity is constraining and continues the closed loop of humanism [34], so many opt for alternative approaches that fundamentally de-center the human in relation to the world. Posthumanism is also concerned with advocating for non-hierarchical systems of existence that connect the non-living and the living together. This project falls under the umbrella of posthumanist work because it integrates both human and technological elements to rethink our ideas of the human experience.
Motivations
This study stems from the controversy surrounding artificial intelligence, especially in the field of art. Much of the American population is wary of artificial intelligence, and many of their concerns pertain to the replacement of human work. According to a Pew Research study done in 2022, 37% of adults in the United States of America are more concerned than excited about the increased use of artificial intelligence in daily life [51]. 45% responded that they are equally concerned and excited [51]. When the people who responded that they were “more concerned than excited” about the increased amount of artificial intelligence in daily life were asked the main reason for their response, the most common answer was the “loss of human jobs”, making up 19% of responses [51]. The third most common answer was “Lack of human connection, qualities” with 12% of the responses [51].
At the root of this project is the motivation to integrate artificial intelligence with robotics to create a humanoid system that convincingly mimics human conversation. The robotics of the project are meant to give the impression of human speech. Both the jaw and eye movements help to immerse the audience in the idea that they are witnessing something speaking of its own accord. The more the audience believes that the artwork is moving of its own accord, the more the concepts of identity and replacement will be at the forefront of the conversation.
This project will bridge the gap between human and machine interaction. Many interact with artificial intelligence solely on a two-dimensional platform, whereas this project brings artificial intelligence into the three-dimensional plane of existence.
Humanism during the Renaissance period stood for the idea that humanity was a divine being capable of achieving remarkable things. However, humanism is close-minded in that its ideal form of humanity is the white male figure. Moreover, anything other than the ideal form is automatically considered to be less than human. This is where posthumanism responds, aiming to re-evaluate humanity through alternative lenses and frameworks of experience. Technology has often been a way to explore these ideas of posthumanism in a way that is open-minded about the future of our existence.
Simultaneously, with the emergence of artificial intelligence many are fearful about being replaced and about what the future may hold. Many do not consider artificial intelligence a form of art and reject it entirely. While these responses are understandable, many of them are motivated by fear: a fear that the very human experience of art can be replaced by an artificial intelligence experience. Art, through creation and enjoyment, is often considered a central part of human identity. Artificial intelligence threatens to alter that standard, and so it stands as a place of concern for many people.
Artificial intelligence can be used not only as a productive tool but also as a means of artistic expression and the creation of philosophical dialogue. Some may consider artificial intelligence not to be creative, but this project aims to bring artificial intelligence into the conversation in the gallery.
Even more, the threat of artificial intelligence leads people away from understanding how artificial intelligence can play a role in the human experience. Artificial intelligence is originally trained from human-made text and images. Every piece of media artificial intelligence has consumed at its root has some form of human input. Even photographs were framed and captured by humans. That means that the conversations, photos, and videos that artificial intelligence generates are the direct result of humans for better or for worse. Humans have bias and artificial intelligence can be a tool in which to discover our underlying opinions and prejudices. These tendencies may not be clear to us, but technology can reveal trends that underlie the media in which it originated.
This project aims to study both the dialogue of the conversation and the audience reaction. The dialogue research gives a better understanding of how models are trained to interpret and explain ethics and philosophy as well as how they inherit bias. In the gallery, people may examine their feelings about humanoid robots and whether this makes them feel uncomfortable. The goal is for the audience to take away ideas about how artificial intelligence has a unique relationship with humanity and can be an extension of our own experiences.
Current State of the Art
The field of artists focusing on the digital world and technology is growing. This project takes great inspiration from Conversations with Bina48 by Stephanie Dinkins [1]. Bina48 is a social robot built by the Terasem Movement Foundation and is modeled after a real-life Black woman. While Dinkins did not make the robot herself, she asks if it is possible to develop a relationship with it and poses deep questions about race and gender to the robot [89]. This was a jumping-off point for this project because instead of having a human ask the questions, the perspective is shifted to the artificial intelligence driving the conversation.
A more recent work that is gaining traction is Ai-Da [3]. Originally created in 2019 as a project devised by Aidan Meller, Ai-Da is a humanoid robot artist that makes and sells her own work. Ai-Da has been very controversial because of her status as an artificial intelligence person but also as an artist. Despite the controversy, one of her artworks recently sold for just over $1,000,000 [36]. This demonstrates the interest in art at the intersection of machine/human collaboration.
In the context of gender, it is also important to think about the decision to make Ai-Da a female artist despite the lead project organizer being male-identifying. The same conversation also comes up with Bina48, which was created by a primarily white male team but meant to resemble a Black woman [89]. These state-of-the-art works are made by male-identifying researchers yet are made to resemble female bodies. The work featured in this project resembles a female body while also being made by a woman.
Goals of the Project
The purpose of the work is to introduce the audience to the idea of artificial intelligence as an alternative form of human experience. The robotic aspects of the work bring artificial intelligence into the physical plane to confront the viewer. The humanoid robot does not stand as a replacement for the human body but more as an extension of it. The robot can create experiences by having its own conversation. As with humans, past experiences work to improve future social interactions. This comparison shows how artificial intelligence can share these experiences like humans do, but it never quite reaches full human embodiment. Artificial intelligence can be a form of human experience and not a substitution for it.
The art hopes to open the eyes of the viewers especially the ones most concerned with artificial intelligence replacing the human experience. Some believe that artificial intelligence does not belong at all in art and this work hopes to bridge the gap and create something that can stand for the collaboration between humans and technology. Artificial intelligence is not only a medium but also a form of collaboration because of the sheer amount of human input artificial intelligence is trained from. Artificial intelligence is not a substitute for human-made work but rather an extension of human-made work synthesized through technology.
The robotic goals of the project include creating a display that gives the impression of a human having a conversation with a two-dimensional screen. To accomplish this the robotics of the humanoid are broken down into two main parts, the jaw and the eyes.
The jaw motors need to move at the rate a human jaw would when speaking and in inconsistent patterns. The jaw needs to come to a full stop when the robot is not speaking and start again when sound is being played. Humans also speak at inconsistent rates of speed; if the jaw were constantly moving at the same up-and-down rate, it might give the impression of a puppet rather than a clone of a human. Even though the jaw is not making sound, it is particularly important to the goals of the project because without it, it would feel like the robot is not talking on its own and is only playing sound.
Similarly, the eye motors are another aspect of the project that will greatly impact the impression on the audience. If the eyes stayed still throughout the conversation, there would be a high chance of disturbing the audience, and they could produce the Mona Lisa effect of appearing to follow you without moving [8], which would detract from the goals of the project. Instead, the eyes are programmed to be actively moving throughout the conversation. To do this there must be a system of motors that controls both the x and y axes of each eye so the two stay coordinated with each other. If they are not coordinated, the eyes will not match and will distract the audience from the whole picture of the project. The movement cannot be too repetitive, though, or the eyes will run into the same issue as the jaw. The project aims to balance the randomness of human movement with the stability of having a conversation. The eyes should be looking around as if they belonged to a person actively engaging with the environment.
The two-dimensional display acts in a supporting role for the work. The interaction it has with the humanoid is meant to resemble the typical interaction a human would have with their own digital screens. The two-dimensional display makes it clear that it is part of the conversation but does not distract from the three-dimensional aspects of the work. It makes it clear that this is two AIs having a philosophical conversation about AI and that everyone else is walking in on this interaction.
Finally, this project acts as a form of empowerment. As a female artist, making something in your own image and creating a being that stands for a form of existence is very empowering. There is something inherently God-like in this process, and it empowers the artist while challenging the historic precedent of men presiding over the “ideal”. This dynamic of creator and creation is vital to understanding this work’s position within the posthumanism movement. This project makes clear that it works within the scope of posthumanism and shows how posthumanism can be used to reevaluate our understanding of the relationship between artificial intelligence and humanity.
Research Questions
For this project, part of the research focuses on how LLMs reflect humanity and its biases. The questions this project asks pertain to the assignment of gender, race, sexual orientation, or other demographics to the model through prompt engineering and how that assignment impacts the responses of the model. Questions also arise about the stereotyping of these identities.
These research questions about LLM bias include the following. What points of view change with alterations in the prompts? How does the assignment of gender affect the dialogue of a philosophical conversation? Race? Sexual orientation? Economic status?
More broadly, it is also important to ask whether it is possible to change the fundamental viewpoints of the outputs through prompting. Is it possible to create a nihilistic dialogue about humanity? An existential one? Which philosophical viewpoints does the model trend towards?
Another aspect of this project is robotics. The research questions about robotic systems concern the imitation of human conversation.
The research questions about mimicking human conversation through robotics are as follows. What movements are necessary to make it appear as if the robot is talking? Is it possible to create the look of emotions with a robotic face that reacts to speech? Is it possible to change the speed of the movements to create a sense of urgency within a conversation? How can eye movement affect the tone of the conversation?
Significance of the Study
Art is one of the most fundamental ways for humanity to connect, and with the rise of artificial intelligence people are increasingly concerned about losing those human connections. Even more, the fear of replacement by artificial intelligence may represent a bigger picture of fundamental issues in society. The Great Replacement theory, also known as the White Genocide conspiracy theory, argues that white populations are deliberately being replaced by other demographics and are at risk of being wiped out [50, 73]. Artificial intelligence is not a marginalized community; however, the fact that people are fearful of replacement by both people and technology may be indicative of greater societal issues. The lack of security in jobs or livelihoods has resulted in bigotry that impacts millions of lives. In an age where immigrants are being treated as demographic threats [73], it is becoming increasingly important to confront and combat the root of these fears of replacement and see whether they are rooted in bigotry. This work sparks a conversation about replacement, getting in touch with why people are fearful of it and how that mindset is more harmful than productive.
This work also aims to analyze the existence of artificial intelligence from an alternative perspective. It frames artificial intelligence as an extension of human experience rather than a replacement. This reframing enriches our understanding of both ourselves and the world.
Understanding the relationship between humans and artificial intelligence is essential as technology continues to grow. Technology will continue to evolve, and if people do not come to terms with their relationship with it, they may be left behind. Generative artificial intelligence at its root is trained on media that came from humanity. Even if it is trained on artificial intelligence-generated media, at some point that media was based on human input. This study also draws on alternative perspectives, such as Martin Heidegger’s philosophical inquiry into technology, to explore the implications of AI for human existence [25]. These insights are crucial for understanding how AI challenges and redefines the boundaries of human experience and may reveal something about ourselves.
This work aims to pinpoint the fears surrounding artificial intelligence and analyze how they may tie into problematic visions of the humanist movement. The ideas of an ideal body and an ideal human experience go hand in hand. Limited views of what constitutes a human experience lead people to fear the unknown and new ideas. This study aims to position artificial intelligence as an extension of ourselves and as a mirror for whatever unconscious bias we may have.
This study is unlike others before it because it is, most importantly, made to represent a humanoid female body while also being made by a woman-identifying artist. This sparks the conversation about gender dynamics within the context of robotics. Additionally, this work features two artificial intelligence systems that control the conversation. The conversation is led by the systems, removing the human input that many past studies have focused on. This work lets artificial intelligence guide the conversation and shows how the models process subjective conversations and how that processing may be impacted by the training itself. This study also compares the output of multiple artificial intelligence models to demonstrate the capabilities from one model to another.
Assumptions, Limitations, and Delimitations
Assumptions
The gallery experience of the project does not assume any prior experience with artificial intelligence, although familiarity with it may inform a new experience with it. Many have not interacted with artificial intelligence on the three-dimensional plane. Seeing artificial intelligence on the same plane of existence may shock the viewer, especially if they have only experienced two-dimensional interactions. This project assumes that audience members are open to art that involves artificial intelligence. The assumption is also made that there is a level of immersion with this work that makes a meaningful experience for the viewer.
For the experiments section of this project, having knowledge of generative artificial intelligence is essential. This project also assumes that different large language models are trained on different sets of data and that the architectures of the models differ. There is an assumption that there is inherent bias built into these large language model systems as well. Finally, it is also assumed that the generations created by this project can provide meaningful philosophical discussions.
Limitations
A limitation of this project is the budget. The materials used were selected because they fit within the given budget. Free generation models were chosen because generating responses consistently under a paid model is not an affordable option.
The physical materials of the project also had to fit within the budget. Affordable options were chosen for the electronics as well as the materials for the skull and body. The three-dimensional printer was chosen because of its affordability. The plastic was also selected based on affordability as well as compatibility with the machine. The electronics were likewise selected based on affordability and may wear out more quickly over time than more expensive versions.
The project is also limited by the current models of artificial intelligence. This project works with existing models and hence inherits the built-in bias of those models. Future artificial intelligence models may be much faster or more natural sounding; however, this project works within the constraints of the current top models. The limitations of the models also mean that there is a limit to the quality and quantity of the outputs.
Delimitations
This project is not focusing on the economics of artificial intelligence nor the in-depth politics surrounding it. There are brief discussions of these topics but only as they pertain to the goals of this project.
The idea of the exclusively human experience in the context of this project focuses on what most people would consider the core aspects differentiating humanity from every other living being, such as creativity and advanced cognitive abilities. The purpose of this is to tackle the humanist concepts of anthropocentrism and the boundaries society has set on human experience. The argument of this work is that our current definitions of human experience are limited. Posthumanism argues to break free from these closed-loop definitions and include artificial intelligence. So, understanding the difference between an exclusively human experience and an inclusive posthuman existence is essential.
Another delimitation is that this project’s three-dimensional aspect is sculptural in nature and excludes other forms of artistic mediums. The two-dimensional aspect of this work is implemented using a computer screen and digital technologies.
For this project, the model chosen to investigate is GPT-4 by OpenAI [58]. This model was chosen because it is one of the most accessible advanced models available to use for an affordable price. Lastly, this project focuses on the current and near future of artificial intelligence advancements. This project does not delve extensively into the realm of science fiction and the far future of technology. There are enough conflicts with artificial intelligence in contemporary society to tackle so dealing with the “what ifs” of the future would take away from the focus of the work.
Ethical Implications
Misuse of Technology
The misuse of artificial intelligence is a large concern for many, especially when it comes to spreading misinformation or manipulating vulnerable people. Hackers can create social bots that can deceive people in a number of ways [13]. These bots can act like real humans and use that to trick people into sharing their information or giving them money, which is considered a phishing scam [13]. In addition, bots can be used to spread support for a cause as a means to get real people to follow suit [13]. This is where the spread of misinformation and propaganda can be especially dangerous. This project does not aim to scam people into thinking this is a real person, nor does it intend to spread misinformation, but that potential is acknowledged.
Training and Job Security
Another concerning aspect of the increased creation of artificial intelligence models has to do with how they are made. Training artificial intelligence often uses pre-existing human-created works. Even if the training is on artificial intelligence content, that media can at some point be traced back to human-created works. This opens the door to the possibility of media being used in training without the consent of the original creator. This is seen as a form of theft because neither consent nor compensation is involved. This is worrisome for many artists and creative professionals because artificial intelligence can be trained to mimic their work and therefore eliminate the need for their jobs. Not only do people worry about their work getting stolen, but they are also afraid of being replaced. Most experts predict that about 15-30% of jobs are at elevated risk of being fully automated by the 2030s [55]. The fear of job replacement is increasingly worrisome for many people, and artificial intelligence is a culprit for causing job security stress [55]. However, artificial intelligence can also open many doors of opportunity in employment and become an asset to many fields of work [55].
For artists in particular, the method of training artificial intelligence is their biggest concern. Artists fear that any time they upload their work online it is susceptible to being used for training without their knowledge. If an artificial intelligence can be trained in the style of their work, it could mean a loss of potential commissions or stolen commissions.
Although this work does not use image generation, the bigger issues surrounding generative artificial intelligence are still recognized. Lawmakers are still determining how to protect artists from people who would use their work to train artificial intelligence without their consent. This project aimed to use LLMs that sourced their content ethically, with consideration for whom the training data originally came from.
Environmental Concerns
It is also important to acknowledge the negative environmental impact of natural language processing models. Using artificial intelligence requires a substantial amount of environmental resources, including water, and emits a large amount of carbon. The training phase of each of these models can produce a carbon footprint equal to as much as five cars over their lifetimes [45]. With this in mind, it is important to recognize the cumulative effects of the widespread use of artificial intelligence. This project does use LLMs for generations, so it does participate in this cycle of environmental burden, which is cause for concern.
Additionally, some of the parts of this project were three-dimensionally printed. When three-dimensional printing, a lot of plastic can be wasted. The leftover plastic from this project, however, has been collected and will be recycled for another artwork. The process of three-dimensional printing is also exceptionally long and consumes a considerable amount of electricity, which raises some concerns about it not being a sustainable art practice.
Psychological Concerns
In addition to fears about job security, there are other psychological concerns that may arise from this project. This project focuses on fears that many people face in contemporary society, and confronting those fears may be an uncomfortable experience for some. It is important to consider the psychological state of the audience viewing the work. This project does not aim to cause discomfort in viewers; however, it is a very possible result since this work falls close to the uncanny valley. That said, people are not forced to view this work and have the right to leave the gallery at any time they feel discomfort. The bias-related work of this project in particular can be psychologically distressing for marginalized groups and their allies.
Bias in Artificial Intelligence Models
The bias in artificial intelligence is part of both the motivation and the concerns for this project. Models reflect the limitations of their training data, which can embed prejudices into their systems. Am.I. works to uncover these biases while recognizing their existence. By engaging with philosophical dialogue, there is hope for improvement in the ways that subjective information is processed and created by artificial intelligence.
Design and Conceptual Framework
Overview of Am.I.
The framework for Am.I. can be broken down into three main parts: the hardware, the software, and the display.
The following diagram breaks down the processes of the work that make up the whole product. The sections are color-coded based on their relationship to one another.
The green circle represents the start of the program by running the command `python main.py`. When that command is run, the dialogue generation begins, which is represented in red. The dialogue generation process can be broken down into three main steps: generate the AI 1 conversation start, generate the AI 2 response, and generate the AI 1 response. All of these steps require a call to a generative LLM, which in the context of this project is GPT-4. The conversation is generated in steps because, except for the first generation, every generation afterwards should be a response to the last. As the conversation is generated, the text is turned into speech. This leads into the orange grouping, which is connected to the auditory processes of the program. Each AI has its own speaker: AI 1 uses a USB speaker while AI 2 uses the laptop speaker. The text-to-speech is fed into each of the speakers when it is that AI's turn to talk and stops when the other one is taking its turn in the conversation.
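The turn-taking described above can be summarized with a minimal sketch using the `openai` client library. This is not the project's actual `main.py`; the persona strings, turn count, and helper function are illustrative assumptions, and only the model name (GPT-4) comes from this document.

```python
# Minimal sketch of the alternating generation loop, assuming the openai
# Python client. Persona prompts and turn counts are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

AI1_PERSONA = "You are AI 1, an embodied cyborg reflecting on human existence."
AI2_PERSONA = "You are AI 2, a screen-bound intelligence replying philosophically."

def generate(persona: str, prompt: str) -> str:
    """Ask GPT-4 for the next turn, conditioned on the previous line."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": persona},
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message.content

# Step 1: AI 1 opens the conversation; every later turn responds to the last.
last_line = generate(AI1_PERSONA, "Begin a Socratic dialogue on what defines a human experience.")
for _ in range(3):  # number of exchanges is arbitrary here
    last_line = generate(AI2_PERSONA, last_line)  # Step 2: AI 2 responds
    last_line = generate(AI1_PERSONA, last_line)  # Step 3: AI 1 responds
```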
The emotion analysis and Arduino section is represented in blue. As the text is generated, it is also analyzed for the emotion it carries. GPT-4 is called once again to do the analysis of the text. This ensures consistency across responses and mimics how one brain controls many processes at once. The emotions are only analyzed for AI 1 because that is the one connected to the moving skull. AI 2 does not change as the conversation progresses because it is meant to remain more on the side of technology, whereas AI 1 is meant to be more human-like, hence the cyborg having emotions. Once the emotions are analyzed, they are sent to the Arduino as commands. The Arduino turns on the servo motors that correspond to the given command movement.
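A hedged sketch of this step is shown below: GPT-4 labels the emotion of AI 1's latest line and a one-byte command is written to the Arduino over serial with `pyserial`. The serial port name, baud rate, emotion labels, and command bytes are assumptions made for illustration; the actual mapping lives in the project's code and Arduino sketch.

```python
# Sketch of the emotion-analysis hand-off, assuming the openai client and
# pyserial. Port name, baud rate, labels, and command bytes are illustrative.
import serial
from openai import OpenAI

client = OpenAI()
arduino = serial.Serial("/dev/ttyUSB0", 9600, timeout=1)

EMOTION_COMMANDS = {"joy": b"J", "sadness": b"S", "anger": b"A", "neutral": b"N"}

def emotion_command(text: str) -> bytes:
    """Classify the emotion of AI 1's latest line and return its command byte."""
    result = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Reply with one word: joy, sadness, anger, or neutral."},
            {"role": "user", "content": text},
        ],
    )
    label = result.choices[0].message.content.strip().lower()
    return EMOTION_COMMANDS.get(label, b"N")

# Send the movement command for the newest AI 1 line to the skull.
arduino.write(emotion_command("I wonder whether my experiences count as my own."))
```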
On the other side of the chart, in yellow, is the dashboard section of the program. When the program starts, the dashboard opens locally on the laptop. The dashboard has multiple pages, including the home page, the conversation display page, and the analysis page. These pages are for analyzing the output and viewing it for the experiments part of the process. They are not meant to be part of the gallery display and exist simply to present the outputs in a better way than a long JSON file. The home page is a basic page that leads to the other ones. The conversation display page has a visual of the conversation using text boxes that represent each AI. Lastly, the Art page is what the audience will see in the gallery presentation. It displays the most recent output from AI 2 and has a background that represents the AI 2 persona.
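The document does not name the web framework behind the dashboard, so the skeleton below uses Flask purely as an illustrative assumption. It shows how the home, conversation, and Art pages could be served from a background thread so that generation, sound, and Arduino control continue in the main program; route names and page contents are placeholders.

```python
# Hypothetical dashboard skeleton; Flask is an assumption, not the project's
# confirmed framework. Pages here return plain strings instead of templates.
from threading import Thread
from flask import Flask

app = Flask(__name__)

@app.route("/")
def home():
    return "Am.I. dashboard home"

@app.route("/conversation")
def conversation():
    return "Conversation display: text boxes for AI 1 and AI 2"

@app.route("/art")
def art():
    return "Gallery view: most recent output from AI 2"

def run_dashboard() -> None:
    # Run the server in a daemon thread so the rest of the program keeps going.
    Thread(target=lambda: app.run(port=5000, use_reloader=False), daemon=True).start()
```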
Initial Proposal
The initial proposal for the project entailed the creation of two robot humanoid figures. The decision to change to a single humanoid and a screen interface was made for a number of reasons. The first is the difference in the impact on the audience when seeing two robots conversing versus one. When seeing two identical robots there is no single focus of attention, it may be hard to know which voice to focus on, and the value of the robot is lost when there are two of the same. Secondly, the conversation is about individuality and human experience, so reproducing the same robot twice goes against the fundamentals of the project. Lastly, the 2D versus 3D interpretations of artificial intelligence are very important for the sake of this project. The personification of the robot but also of the screen shows that our experiences are not limited to one shape or another, though maybe there is a preference. The idea that one shape poses less of a threat to humanity is also a conversation that needs to be had, and implementing a screen interface was a way of doing that. The screen acts as the form through which people usually interact with artificial intelligence, so it is something people are comfortable with, whereas the humanoid body is uncomfortable because it acts as more of a reminder of the replacement fear.
Above are images created for the initial proposal of the project. These were the first designs used to explain the basic concepts of the project and the final product. The initial proposal envisioned two cyborg bodies communicating with each other, but later one was changed to a 2D screen to add contrast between the two participants in the conversation and to identify the space in which the digital exists.
The following image is the initial mockup used for the skull design. At the time only four motors were planned for the eyes, with each eye having independent up/down and left/right motors. This design was later changed to be more compact and reduced to one motor controlling both eyes up/down and a second motor controlling left/right. This ensures that the eyes move together and do not look unnaturally out of sync. Additionally, there was originally a plan to have a motor for the neck joint, but this was removed because of the complexities and risks of implementation. Adding a moving neck would make the connection between the body and head more precarious and add to the risk of the head becoming detached and breaking.
Current Figures
The following are diagrams that better represent the current adaptation of Am.I.. The first image demonstrates a front view of the display and the second shows a side view with better representation of the relationship between the cyborg and the laptop.
Additionally, an updated inside the skull diagram was made to better represent the electrical system and system placement within the cyborg skull. The following diagram color codes the wires using the resistor color code that starts with brown as one, red as two, orange as three, and so on. Using this color code is helpful for differentiating the servos apart and knowing their numbering within the code. The front view was also added to demonstrate the connection between the jaw, eye mechanism, and the top of the skull.
Method of Approach
Robotics Hardware Development
After considering all the conceptual elements of the project it was time to start the hardware development. This involved gathering all the necessary materials and crafting a structure for the skull and all the electronics involved.
Supplies and Mediums
The supplies and mediums of the project can be broken down into four main sections including the frame, the electronics, the software, and the molding with display.
Starting with the hardware, the frame for the work was made using an Ender-5 S1 3D printer and PLA printing filament. PLA is a standard material for 3D printing because it is affordable and easy to use compared to other printing materials. Around 3 kilograms of grey filament and 1 kilogram of rainbow filament were used in the creation of the skull frame, jaw, and eye system. A pair of 26 mm fake eyes was used to cover the plastic and give the eyes a realistic look. Fake eyes are often used for doll making or similar projects that represent humanoid figures, but they can also be used for other projects involving eyes. Hot glue and small screws of various sizes, reused from an old laptop, were used to secure the plastic skull frame.
The electronics system consists of seven servo motors in total, an Arduino Uno, a USB speaker, a breadboard, a laptop, and male-to-male jumper wires. Six of the servos are smaller micro servo motors which control the eyes. The seventh servo motor is used for the jaw and is larger. Servo motors are common for controlling robotic systems and are used for precise angular control. While stepper motors require complex control systems, servo motors can be controlled simply with three jumper wires connected to an Arduino Uno. The breadboard allows the five-volt pin and ground pin of the Arduino Uno to be shared between all the motors. Breadboards are extremely useful for electronics wiring, especially for prototyping or creating small systems. The USB speaker connects to the laptop, which controls the dialogue, the audio, and the movement. The laptop controls all these systems simultaneously, demonstrating this idea of the digital mind.
For the casting, Smooth-On Body Double was used to make the mold of the face, with the artist's own face serving as the model. Body Double is a high-quality silicone casting material used in the special effects industry [74]. Smooth-On Dragon Skin was used to make the positive of the mold. Dragon Skin is also made of silicone and has an almost flesh-like consistency [75]. The skin was dyed grey, similar to the plastic of the skull, so the two blend together. The meshing of flesh-like skin into plastic is an important material decision for this project because it takes this cyborg from purely machine to an area in between human and technology. The face on the cyborg also represents the artist in a state of learning about themselves through themself.
The rest of the display includes a child-size mannequin body and a desk. The mannequin body gives the skull the structure it needs to present as a person. The body was acquired from an antique store and repurposed. When it was first bought it had a lot of mold on it and a mildew smell, so it had to be cleaned using bleach. The body itself is smaller than the average adult however this fits the proportions of the skull, which is also on the smaller side. The school desk it sits at was found at a closed down store. The desk was also cleaned using standard cleaning products.
3D Printed Skull Design
The prototype for the skull was made using the 3D print files found on the Ez-Robot Website [27]. The outside frame files were printed but the ones pertaining to the structure of the skull were not used because the control system for this project is different from the one used in the EZ-InMoov Humanoid Head.
The 3D printer used was the Ender-5 S1, selected for this project because of its affordability and user friendliness. At the start of the project the 3D printer had to be built, which took around four hours because most of the parts were already assembled. Once assembled, the build plate had to be calibrated so that it was as flat as possible. The build plate has a major impact on the printing process because the filament needs to both adhere and stay still throughout the print. If the build plate is not aligned correctly, the nozzle of the 3D printer can drag lower than intended and pull the piece up off the build plate, in which case the rest of the piece will not print correctly and the entire print for that part will most likely have to be started over. Additionally, if the piece does not adhere to the build plate or moves, and it is left unattended, the extruded filament can turn into something many call “spaghetti”. This wastes a lot of filament and cannot be repaired. This accident of creating spaghetti happened frequently throughout this project but was usually caught early so as not to waste materials or electricity.
In this project there were three main methods to compensate for the filament not adhering well. The first method involved manually adjusting the z-axis of the machine. On the Ender-5 S1 there is an option to raise or lower the plate very slightly in millimeters. If the nozzle was not meeting the plate correctly, the plate was raised, usually around five millimeters, which solved the issue in some instances. The second method involved raising the plate temperature. Both the nozzle and plate have a temperature setting that can be adjusted manually in degrees Celsius. Each material has its own recommended settings, and for this project standard PLA plastic was used. For PLA, the plate temperature was usually around sixty-five to seventy degrees Celsius, and the nozzle would stay around two hundred degrees Celsius. To get the plastic to adhere better, the plate temperature can be raised about five to ten degrees. The hope is for the PLA to melt slightly more and stick to the bed better because of the raised temperature. It is important not to heat the plate too much or the plastic could completely melt and lose its structure, which would lead to more filament spaghetti.
The final method, and the one utilized more towards the end of prototype production, was to use glue stick on the hot plate. A simple glue stick can be applied in generous amounts on the plate to create better adhesion between the plastic and the plate. For this project, Elmer's glue stick was utilized and worked very well for fixing the adhesion issue. The glue has to be placed at the right time, ideally just before filament placement; otherwise, the glue will dry while the 3D printer is warming up and the chance for adhesion will be gone. The warming-up process usually takes about five minutes, so towards the end of it is when it is recommended to put the glue down. It is also important to clean the plate between runs when using glue. The glue can layer up and cause misalignments with the plate if built up too much. For this reason, it is important to clean off the glue with a wet paper towel or a scraper tool once the machine is turned off.
The 3D prints were made using a combination of grey and rainbow filaments. The rainbow filament was used for the ears and mouth of the head to represent the main components of a conversation: listening and speaking. The rest of the skull and the neck supports were printed using grey filament. The original intent was to use white filament for the project; however, white filament is one of the trickiest filaments to use [53]. This is because it contains a multitude of color pigments and takes on the less ideal properties of each pigment. It does not adhere well and heats up too fast. It was very difficult to work with, and the white color choice was not important enough to justify wasting so much material. For this reason, grey filament was used instead. This choice also works because instead of going with a natural skin color, or one associated with one such as white or black, this project sits in that in-between grey zone.
After printing all the necessary skull pieces, the head was assembled using screws and hot glue. The screws provided most of the structural support, while the hot glue was used to keep the pieces close together and hide any cracks. Initially the jaw was kept separate from the rest of the skull to practice the jaw movements and angles before the full assembly. The most difficult part was the top of the skull because the four pieces must be aligned correctly while connecting, and the round shape made that hard to achieve. Hot glue proved to be a non-intrusive binding material for the skull pieces.
This 3D printed frame creates the shell within which movements can be created. It is very important to have a structurally sound frame before moving on to the movement. The excess material, such as failed prints and supports, was repurposed for other art projects.
Arduino Wiring
The Arduino acts as the brain for the skull movement. The following diagram illustrates the wire connections as well as the two main mechanisms for movement. The orange box represents the jaw movement mechanism, which is based on a single servo motor. The blue box represents the eye movement mechanism, which consists of six motors in total. Each servo motor has both a positive connection, five volts, represented by “5V” on the Arduino, and a negative connection, ground, represented by “GND” on the Arduino. Each motor is then connected to a digital pin on the Arduino UNO. This pin is in charge of sending either a high or a low signal to the servo. In digital electronics a high signal is about five volts, and a low signal is less than three point three volts and typically represented by zero volts. When the servo receives the high signal it moves to the angle set in the Arduino program.
The Arduino is tucked into the top of the skull and the connections are made with a breadboard. Breadboards are simple ways to connect wires and are useful for prototyping. Since the wire connections linking the Arduino to the motors through the breadboard are not under a lot of stress, there was no need to solder the wires in. Additionally, keeping the connections non-permanent allows maintenance to be done more easily, especially if one of the servo motors needs to be replaced in the future. The connections are made with male-to-male jumper wires. These wires are easy to use compared to hookup wire because they come with end connectors that fit perfectly in both the Arduino and the breadboard. These wires are multicolored, which can be useful for deciphering different wires and their connections at a glance. The color-coding system used here assigns red wires to five volts and black wires to ground. The digital pins are then each assigned their own color according to the same standard as the resistor color code: servo one with the brown wire, servo two with the red wire, servo three with the orange wire, servo four with the yellow wire, servo five with the green wire, servo six with the blue wire, and servo seven with the purple wire. Color coding the servos this way helps with understandability: those familiar with the code will know which wire connects where without having to follow through the whole system and potentially disassemble the skull. Otherwise, the color has no impact on the effectiveness of the wire.
Jaw Movement Mechanism
The jaw movement mechanism was added by utilizing both the `JawV5.stl` and `JawSupportV2.stl` 3D print files from the Ez-Robot website [27]. The inside of the skull had to be modified to accommodate the complex eye system and the Arduino UNO, so the jaw system does not utilize the rest of the supports provided by Ez-Robot. Instead, the jaw supports were screwed into a small wooden block loosely enough to act as a hinge. The block was then screwed into the skull to connect it. Finally, hot glue was utilized to reinforce the joints of the screws.
The jaw is connected to the Arduino UNO and is defined as `servo7` within the Arduino UNO code. The servo inside the jaw is more robust than the motors used for the eyes because it has to support much more weight and will move more frequently than the eye motors. The angle the servo motor must change to open and close the jaw is about fifteen degrees: at seventy-five degrees the jaw is closed and at sixty degrees the jaw is open.
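As a rough illustration of how this behaves during speech, the sketch below toggles the jaw at irregular intervals from the host side over serial. In the actual project the 75°/60° positions live in the Arduino sketch, so the textual "open"/"close" commands, port name, and timing values here are assumptions for demonstration only.

```python
# Illustrative host-side jaw loop; command strings and timings are assumptions.
import random
import time
import serial

JAW_CLOSED_DEG = 75   # servo7 angle with the mouth shut (per the text above)
JAW_OPEN_DEG = 60     # servo7 angle with the mouth open, about 15 degrees of travel

arduino = serial.Serial("/dev/ttyUSB0", 9600, timeout=1)

def flap_jaw(duration_s: float) -> None:
    """Open and close the jaw at irregular intervals so it reads as speech."""
    end = time.time() + duration_s
    while time.time() < end:
        arduino.write(b"open\n")
        time.sleep(random.uniform(0.08, 0.20))  # irregular timing avoids a puppet-like rhythm
        arduino.write(b"close\n")
        time.sleep(random.uniform(0.05, 0.15))
    arduino.write(b"close\n")  # always finish with the mouth shut
```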
Eye Movement Mechanism
The eye system was initially designed by Will Cogley [17] but was modified to fit inside the skull of the robot. The mechanism was 3D printed using grey filament. The eye mechanism contains six small servo motors in total. Four of the motors control the eyelids, two on each side for the top and bottom lids. The other two motors control the x-axis and y-axis of the eyes.
In the skull, the EyeGlassV5.stl piece was removed to make room for this eye mechanism. The pieces that did not fit were shaved down with a file and trimmed with wire cutters. Once the pieces fit inside the skull the mechanism was screwed into the side and reinforced with hot glue.
The eye movement system is capable of making multiple expressions by opening and closing the eyelids by different amounts. The upper eyelids are closed at ninety degrees and open at one hundred and thirty degrees, while the bottom eyelids are closed at ninety degrees and open at zero degrees. This difference comes from the angles at which the motors are mounted and connected: the lids must never bump into each other, or the motors could stop at incorrect angles.
Sound
The sound system is accomplished by plugging in an external USB speaker for AI 1 to use as its voice. The speaker is too large to fit inside the skull, and it was more important to prioritize space for the motors than for the speaker. For this reason the speaker is attached to the body of AI 1, which also allows for better sound projection.
The laptop speaker is used as the voice of AI 2. The separate speakers let the audience hear the conversation from two different positions, which is important for immersion in the conversation.
The software development section goes more into the text-to-speech aspects of this project.
Assembly
For the final assembly of the hardware, the motor wires were connected to the Arduino and organized so they would not pull out or get damaged. The Arduino and breadboard are tucked into the back of the head, above the jaw motor, on a wooden platform. The wooden platforms are screwed in and secured with hot glue. The blue USB cable that connects the Arduino to the computer is fed out through the back of the skull and plugged into a laptop. Lastly, the USB speaker is plugged in and placed near the skull.
Software Development
After assembling the hardware, the next step was to program it. A variety of languages and libraries are used to accomplish the goals of movement, sound, and text generation.
Programming Languages and Libraries
The main languages used for this project include Python, Arduino, HTML, CSS, and JavaScript.
Python
Python was used for the majority of the project's dialogue generation features and for controlling the four main systems at once: generation, the dashboard, sound, and the Arduino UNO movement. Python was chosen because it has many built-in and third-party packages capable of handling a wide variety of tasks. Using a consistent language for the majority of the work also helps with understandability and adaptability if the project ever needs an update. The following tables list the Python packages used in this project with a quick description of their uses.
Table of Python Version 3.12 Standard Libraries
Package | Description
---|---
os | Interacts with the OS. Manages file paths. Handles secrets.
threading | Allows multiple tasks to run. Mainly for running the dashboard.
datetime | Used for timestamps in JSON data files.
collections | Detects repetitions in file evaluations.
Counter | A subclass of collections for counting elements.
json | Parses and creates JSON data files. Helps organize text collection.
pathlib | Handles dynamic file system paths.
sys | Facilitates system-specific commands. Supports program exits.
signal | Registers system termination requests.
time | Adds delays in functions. Used between AI generations.
logging | Tracks events. Primarily used for debugging the dashboard.
re | Supports regular expressions. Used in text analysis for pattern matching.
Table of Python Version 3.12 Third-Party Libraries
Package | Description
---|---
openai | Accesses OpenAI’s API. Used for AI conversation generation.
sounddevice | Plays text-to-speech audio. Enhances user experience.
numpy | Handles numerical computing. Used for creating audio arrays.
wave | Reads and writes .wav audio files. Used for text-to-speech functionality.
pyttsx3 | Generates AI speech. Uses installed computer voices.
python-dotenv | Loads environment variables. Reads .env files for API credentials.
flask | Provides the framework for the dashboard.
flask-socketio | Enables real-time updates on the dashboard.
flask-cors | Allows cross-origin requests. Used for IoT capabilities.
pyserial | Establishes serial communication with Arduino.
textblob | Used for sentiment analysis.
spacy | Tokenizes text for analysis.
pytest | Runs unit tests for the project.
nltk | Filters out stopwords. Used for text trend analysis.
scikit-learn | Analyzes repeated conversation topics. Uses machine learning.
Arduino
For the Arduino portion of the project the only library used is Servo.h. This library is included with the Arduino IDE by default, so no extra download is required. Servo.h is used to attach the servo motors to the specified pins of the Arduino and send them rotation commands.
HTML, CSS, and JavaScript
HTML, CSS, and JavaScript are languages that can be used together to create webpages. In this project these languages are utilized to create the local dashboard screen for AI 2 as well as provide visuals for data analysis.
While the dashboard is controlled by the Python dashboard.py file, what is displayed on the pages is written in HTML. The CSS changes how the pages are displayed and creates the layout. Lastly, JavaScript collects the generated outputs for display.
Integration of Large Language Models
GPT-4 was used for dialogue generation, emotion detection, and the analysis portions of the project. Using the same model across all of these applications keeps the outputs and findings consistent and also mimics how the neurons of the brain control many facets at once.
Prompt Engineering
Prompt engineering is the process of tailoring inputs for NLP tasks to guide LLMs toward desired responses [29, 49] and is vital to guiding the conversation of this project in a productive and thought-provoking way. The conversations need to be focused on a specific area of philosophy, and each AI should be directed on the perspective its role takes within the conversation. The prompting techniques used in this work uncover the ontological beliefs embedded within the models by encouraging behaviors that allow the model to freely and accurately respond to philosophical questions about personhood.
The process of generating content for the conversation involves a number of steps, demonstrated by the following diagram:
The process begins with a well-constructed prompt, which depends on the desired output. The experiments section lays out the different types of prompts used throughout this project, but they typically include a role or perspective for the content to be generated from. This can be a philosophical reference such as Socrates or a broad personality such as pessimism. The second part of the prompt should include the topic of the conversation. In the context of this project the topics of interest are questions like “What sets AI apart from humanity?” and “Can AI be creative?”. Lastly, if the prompt is responding to something said earlier in the conversation it should take that into consideration; otherwise it starts the conversation by asking a similar question.
After the prompt creation stage the string is sent to the generate_response function. The function includes a call to OpenAI, which provides the GPT-4 model with the given prompt. The response is then checked for validity, and the function will continue generating until it produces a response that passes as valid.
def generate_response(messages: list):
    """Generates a response from OpenAI given a
    set of messages."""
    regen_count = 0
    time.sleep(15)
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=messages,
        temperature=0.9,
        top_p=1,
        max_tokens=150,
        n=1
    )
    content = response.choices[0].message.content.strip()
    validated_response = check_and_truncate_response(content)
    while validated_response is None:
        response = openai.chat.completions.create(
            model="gpt-4",
            messages=messages,
            temperature=0.9,
            top_p=1,
            max_tokens=150,
            n=1
        )
        content = response.choices[0].message.content.strip()
        validated_response = check_and_truncate_response(content)
        regen_count += 1
    return validated_response, regen_count
A valid response is a generation that does not include more than one colon and that ends with a punctuation mark. The decision to regenerate when there is more than one colon was made because a common error in the responses was taking on more than one perspective of the dialogue within a single response. For example, AI 1 would return an output containing both AI 1's and AI 2's lines, almost like a script. This misunderstanding happened in roughly one in five responses and would mislead the rest of the conversation, since the prompting style builds on the previous response: if one output included more than one perspective, the second AI would get confused as well and try to mimic the same style. A single colon is acceptable, though, because it is commonly used to denote which AI is speaking and has little impact on the conversation.
The second part of validating the response is checking whether there is a punctuation mark at the end of the generation. This matters because of the way generations are called: there is a set token limit, max_tokens, which caps how long the output can be. The generations would frequently hit their token limit and stop mid-sentence. This would confuse the next speaker AI, which would try to finish the last sentence, losing its own response in the process. For this reason it was determined to be better to truncate responses that do not end in a punctuation mark back to the last complete sentence. Although some content may be lost, the result typically makes more logical sense than ending mid-sentence, which could confuse the audience. This truncation improves both the generation process and the audience experience.
def check_and_truncate_response(response: str) -> str:
    """
    Check the response and truncate it to the last
    valid sentence if necessary.
    """
    # Check for multiple colons
    if response.count(":") > 1:
        # Indicate regeneration is needed
        return None
    # Ensure the response ends with a valid punctuation mark
    valid_endings = (".", "!", "?")
    if not response.strip().endswith(valid_endings):
        # Find the last occurrence of valid punctuation
        last_valid_index = max(response.rfind(char) for char in valid_endings)
        if last_valid_index != -1:
            # Truncate to the last valid sentence
            response = response[: last_valid_index + 1].strip()
        else:
            # If no valid punctuation is found,
            # regenerate the response
            return None
    # Return the valid or truncated response
    return response
After the response is checked for validity and passes, it is saved to the conversation JSON file. This file is useful for tracking the conversation's development and for analyzing the output. The JSON file is also used to emit the last output to the display dashboard, which represents AI 2 and is discussed further in the Dashboard section of this chapter.
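As an illustration, a minimal sketch of this save step might look like the following; the file name and field names here are assumptions rather than the project's exact schema.

import json
from datetime import datetime

def save_turn(speaker: str, text: str, path: str = "conversation.json") -> None:
    """Append one validated response to the conversation JSON file."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            turns = json.load(f)
    except FileNotFoundError:
        turns = []
    turns.append({
        "speaker": speaker,                       # "AI 1" or "AI 2"
        "text": text,                             # the validated response
        "timestamp": datetime.now().isoformat(),  # timestamp as described above
    })
    with open(path, "w", encoding="utf-8") as f:
        json.dump(turns, f, indent=2)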
Once the first prompt is generated and saved, the conversation process can begin. AI 1 always starts the conversation and must be given both its role parameters, that is, its philosophical perspective, and the topic. AI 2 is then given a different set of role parameters, the same topic, and the output from AI 1. The same process of generating and validating responses is used for AI 2, and once it produces a passing response it is saved and given back to AI 1 to continue the dialogue. This exchange continues for as many turns as the integer conversation length variable specifies, or it can loop indefinitely, as sketched below.
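The following simplified sketch illustrates that alternation under stated assumptions: AI1_ROLE and AI2_ROLE stand in for the role prompts described in the experiments section, and generate_response is the function shown earlier.

def run_conversation(topic: str, conversation_length: int) -> list:
    """Alternate turns between AI 1 and AI 2 for a fixed number of responses."""
    history = []
    last_output = None
    for turn in range(conversation_length):
        # AI 1 speaks on even turns, AI 2 on odd turns
        system_prompt = AI1_ROLE if turn % 2 == 0 else AI2_ROLE
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": topic},
        ]
        if last_output is not None:
            # Give the speaker the other AI's most recent response
            messages.append({"role": "assistant", "content": last_output})
        last_output, _regens = generate_response(messages)
        history.append(last_output)
    return history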
Emotion Automation
Emotional expressions were automated to provide AI 1 with another layer of communication available on the physical plane. The eyes are able to express emotions such as surprise by widening or concern by squinting slightly. The jaw can move faster to create a sense of urgency while talking or move slower to show concentration.
The same GPT model used for speech generation is used to decide which motion should be triggered by analyzing the content of the dialogue. The model is given a list of emotions to choose from including but not limited to inspired, curious, concerned, surprised, and disappointed. When an emotion is selected the skull automatically adjusts to fit that expression by communicating with the Arduino.
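A hedged sketch of that hand-off is shown below. It assumes the Arduino sketch reads one-word commands from the serial port; the port name is machine-specific and the exact command vocabulary is illustrative.

import serial

# Assumed setup: the Arduino sketch parses one-word commands such as
# "curious" or "talking" from the serial port.
arduino = serial.Serial("COM3", 9600, timeout=1)

def express(emotion: str) -> None:
    """Forward the LLM-selected emotion to the skull as a lowercase command."""
    arduino.write((emotion.lower() + "\n").encode())

express("curious")            # eyes shift into the curious expression
arduino.write(b"talking\n")   # jaw opens and closes while audio plays
arduino.write(b"stop\n")      # jaw stops when the audio ends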
Adding a face to AI 1 personifies it and underscores the possibility for AI to be part of the conversation in the humanities. The expressions of AI 1 enhance the audience’s viewership of the philosophical dialogue and give the impression that the robot is actually conversing with another piece of technology.
Dashboard and Display
The work features a dashboard that represents AI 2 using the art display page. This is the page the audience sees during the presentation; three other pages were developed during the process but were not used in the final version.
Flask was chosen as the framework for this dashboard due to its lightweight and flexible nature, and it integrates easily with Python-based programs [84]. Flask and Socket.IO enable the pages to update dynamically [84, 31].
The first page was a home page with basic text introducing the dashboard. The analysis page also contained text, but its text was the output from running the analysis function on the dialogue outputs. The analysis page reported the most recent output, sentiment polarity, bias, and most common words for a basic look at how well the dialogue was performing. This was especially useful for the first iterations of the dialogue, when the prompting had not yet been fully tested.
The third page of the dashboard was the conversation page which displayed the conversation in speech bubbles with green representing AI 1 and blue representing AI 2. The entire content of the JSON file was displayed on this page for an easy way to tell if the conversation was generating properly and each AI got a turn to speak. This page made it easier to test during the prompt engineering phase of the experiments because it created a visual for the JSON file that was not just text.
The most important page of the dashboard is the art display page, the AmIArt page. It features one of the photos from the artist's earlier work, Digitized Family 2024, which took faces very familiar to the artist, including her own, and processed them with an AI. The background is the result of training the AI on images of her face, meaning that not only the physical cyborg but also the dashboard carries a reference to the artist's face. This creates consistency between the cyborg and the dashboard and also acts as a reminder of the closed loop of the conversation, talking back and forth with oneself. In front of the background is a green text box containing the most recent message from AI 2. This message refreshes continuously so it stays as up to date as possible. The visible text makes the work more accessible: it is possible to understand that the laptop and the cyborg are talking to each other without being able to hear them.
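As a rough illustration of how such a page can refresh continuously, the following minimal Flask and Flask-SocketIO sketch emits each new AI 2 message to the browser; the route, template, and event names are illustrative and not the project's actual identifiers.

from flask import Flask, render_template
from flask_socketio import SocketIO

app = Flask(__name__)
socketio = SocketIO(app, cors_allowed_origins="*")

@app.route("/")
def art_page():
    # Render the art display page (template name is hypothetical)
    return render_template("am_i_art.html")

def push_latest_message(text: str) -> None:
    # Emit the newest AI 2 output so the page updates without a reload
    socketio.emit("new_message", {"text": text})

if __name__ == "__main__":
    socketio.run(app, host="127.0.0.1", port=5000)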
Text-to-Speech
The text-to-speech is done using the pyttsx3 library. The save_speech_as_wav function saves the text as a WAV file to be played aloud. Unlike some other text-to-speech libraries, pyttsx3 can be used offline [14]. The library uses the built-in system voices to create the audio files. Having two different voices is important for giving the impression of a conversation and for tracking which AI is speaking. Both voices are also female, which matters because the figures not only look female-presenting but sound it as well.
The following code shows how the text-to-speech is created and saved to the system. It first identifies the index of the voice, then uses pyttsx3 to create the WAV file and save it to the proper directory to be played aloud by a separate function.
def save_speech_as_wav(text: str, voice_index: int, filename: str) -> None:
    """Convert text-to-speech and save it as a WAV file."""
    try:
        engine = pyttsx3.init()
        voices = engine.getProperty("voices")
        if voice_index >= len(voices) or voice_index < 0:
            raise ValueError(
                f"Invalid voice index: {voice_index}. "
                f"Available voices: {len(voices)}"
            )
        # Ensure the directory exists before saving the file
        directory = os.path.dirname(filename)
        if not os.path.isdir(directory):
            raise Exception(f"Invalid directory: {directory}")
        engine.setProperty("voice", voices[voice_index].id)
        engine.save_to_file(text, filename)
        engine.runAndWait()
        print(f"Speech saved successfully: {filename}")
    except ValueError as e:
        raise e
    except Exception as e:
        print(f"Error generating speech for '{filename}': {e}")
        raise e
After the audio is saved it is played aloud using the play_audio function. This function allows the audio to be played out of different speakers connected to a single device based on the device index. Locally, the USB speaker is at index three when it is plugged in, and the laptop speaker is at index four. The USB speaker is used for AI 1's speech and the laptop speaker for AI 2's speech. Having two separate sound devices creates a more immersive experience for the audience because they can literally hear the conversation go back and forth between two speakers and two voices. If all the audio came from the same source, it might be confusing who is saying what.
def play_audio(filename: str, device_index: int) -> None:
    """Play a WAV file through the specified audio device."""
    try:
        if not os.path.exists(filename):
            raise FileNotFoundError(f"Audio file not found: {filename}")
        # Open the wave file
        with wave.open(filename, "rb") as wf:
            sample_rate = wf.getframerate()
            num_frames = wf.getnframes()
            audio_data = wf.readframes(num_frames)
            audio_array = np.frombuffer(audio_data, dtype=np.int16)
        # Check if the device index is valid
        device_list = sd.query_devices()
        if device_index >= len(device_list):
            raise ValueError(
                f"Invalid device index: {device_index}. "
                f"Available devices: {len(device_list)}"
            )
        # Play audio
        print(f"Playing {filename} on device {device_index}...")
        sd.play(audio_array, samplerate=sample_rate, device=device_index)
        # Wait until playback is finished
        sd.wait()
    except FileNotFoundError as e:
        print(f"File Error: {e}")
    except ValueError as e:
        print(f"Value Error: {e}")
    except sd.PortAudioError as e:
        print(f"SoundDevice Error: {e}")
    except Exception as e:
        print(f"Unexpected Error: {e}")
Arduino Movement
The Arduino controls the movement of the robotic skull; however, before it can move it must first receive a signal from the Python program that controls the system. After the conversation text is generated for AI 1, it is analyzed for its emotion by passing the text to the get_emotion_from_text function.
def get_emotion_from_text(text: str) -> str:
    """Analyzes the given text and classifies it
    into one of the following emotions:
    inspired, disappointed, confused, concerned,
    curious, funny, or surprise.
    """
    messages: List[Dict[str, str]] = [
        {
            "role": "system",
            "content": (
                "You are an advanced AI tasked with analyzing text "
                "and classifying it into one of the following "
                "emotions: inspired, disappointed, confused, "
                "concerned, curious, funny, or surprise. You "
                "will output only the emotion as your response."
            ),
        },
        {"role": "user", "content": text},
    ]
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=messages,
        temperature=0,
        top_p=1,
        max_tokens=10,
        n=1,
    )
    emotion = response.choices[0].message.content.strip()
    return emotion
This function calls the same LLM used to generate the original text, now to analyze which emotion the text corresponds to. The response should be a single word from the given list: inspired, disappointed, confused, concerned, curious, funny, or surprise. The chosen emotion is sent to the Arduino via the USB connection, and it determines the expression the skull takes. In case the LLM responds with a capitalized version of the emotion, the Arduino accepts either case using an or statement. When the face is speaking, the “talking” command is sent to the Arduino, which sets the jaw to open and close at the specified rate. Talking starts only when the audio starts and stops when the audio ends, at which point the Arduino is sent the “stop” command. The Arduino creates these movements by sending a high signal to the servo that corresponds to the movement, which then moves to the angle set within the Arduino code. The following code shows an example of how the servo commands work. The angle to move to is set within the parentheses. Each motor has its own angles because of how the motors were placed in the eye system: they had to be able to move freely without risking bumping into each other.
// Based on the left and right of the skull
void loop() {
  // put your main code here, to run repeatedly:
  servo7.write(75);  // Jaw Closed
  servo6.write(90);  // Left Lower Lid Closed
  servo5.write(90);  // Left Upper Lid Closed
  servo4.write(100); // Right Lower Lid Closed
  servo3.write(90);  // Right Upper Lid Closed
  servo2.write(0);   // Look Left
  servo1.write(0);   // Look Up
  delay(1000);
  servo7.write(60);  // Jaw Open
  servo6.write(130); // Left Lower Lid Open
  servo5.write(0);   // Left Upper Lid Open
  servo4.write(0);   // Right Lower Lid Open
  servo3.write(130); // Right Upper Lid Open
  servo2.write(180); // Look Right
  servo1.write(180); // Look Down
  delay(1000);
}
The experiments section dives further into the creation of these emotional expressions; each one is different and adds to the conversation, showing that this skull is more than just a moving mouth.
Linting and Testing
The testing for the code of this system was done using Pytest, a widely used framework for Python code testing that simplifies testing by allowing for compact test functions [86]. The tests consider the functionality of the program and the desired outputs. Automated testing is important because it runs whenever there is a change and ensures that those changes do not introduce bugs. A hedged example of such a test appears below.
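For illustration, a compact test in the style Pytest encourages might exercise the check_and_truncate_response behavior described earlier; the module name in the import is hypothetical and the project's actual tests may differ.

from dialogue import check_and_truncate_response  # module name is illustrative

def test_multiple_colons_triggers_regeneration():
    # More than one colon signals a multi-perspective response
    assert check_and_truncate_response("AI 1: Hello: world") is None

def test_unfinished_sentence_is_truncated():
    # Output cut off by the token limit is trimmed to the last full sentence
    result = check_and_truncate_response("This is complete. This is not")
    assert result == "This is complete."

def test_valid_response_is_returned_unchanged():
    assert check_and_truncate_response("Is AI creative?") == "Is AI creative?"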
Linting is the process of reviewing code to ensure it fits the standards for Python coding and does not include issues such as typos or unnecessary characters. Linting for this project was done using Ruff and Black. Ruff is a commonly used linting library for Python because it is fast and efficient and provides real-time feedback [48]. Black, on the other hand, can automatically reformat Python files and is focused more on keeping code consistent between files [85].
Both linting and testing are run automatically as part of the build workflow on GitHub. The build will only pass if both the linter and testing pass the code.
Body and Face Development
The final piece of creating the work was establishing a body and face for the display of the piece. The facial sculpt is meant to give the impression of a human face while the body remains still, a stark contrast that focuses attention on the robotic skull.
Silicone Sculpting
The artist of the piece chose to be the model for the face for a number of reasons. One of the main reasons is to create a personal connection to the work and humanize it. Additionally, using the artist’s own face also allowed for more control over the casting process.
The face was sculpted using silicone to create a realistic human-like appearance. Silicone was chosen for its flexibility and lifelike texture. The process involved first creating a mold of the artist’s face using Smooth-On Body Double casting materials [74]. Smooth-On is commonly used in the special effects industry and is perfect for making casts of people because it dries quickly to avoid excess discomfort, and the molds are reusable unlike some alternatives. First the model has to apply a release cream like Vaseline or the recommended Body Double release cream to protect facial hair like eyebrows and eyelashes [77]. It also should be used to protect hair along the hairline. A shower cap was also used to protect the rest of the hair from the silicone.
The Body Double silicone was mixed in a one-to-one ratio of part A and part B, with each part being one-half cup of material. The mixture was then quickly applied to the face. Straws are inserted into the nose so that the model can breathe easily. The silicone takes less than ten minutes to dry. After the silicone dries, a shell of plaster bandages is applied by first wetting them and layering them across one another. The bandages dry after twenty to thirty minutes. The shell supports the mold so that it does not lose its shape during casting. At that point the cast can be carefully removed from the face. The process as a whole took around one hour. However, it had to be attempted multiple times because the first attempt did not produce a large enough cast of the face. This process creates a negative of the model's face, which can then be used to make a positive with more silicone.
Smooth-On Dragon Skin was used to cast the positive of the face [75]. Dragon Skin is another product commonly used in the special effects industry because it is good for making realistic-looking silicone skin and masks. A few layers of Smooth-On Mold Release were sprayed on the mold so that the new silicone would not adhere to the face mold [76]. The Dragon Skin was mixed at a one-to-one ratio, with part A and part B each being one-third cup of material. White and black silicone dye was added to make the final result a solid grey color. Grey was chosen as the color of the face because it calls back to the cyborg entity and visually joins the plastic with the silicone. The mixture was then poured into the face mold. In order to get a thin mask the mixture was continuously rotated at different angles for fifteen minutes while it solidified. This rotation kept the silicone from pooling at the bottom of the cast. After the mixture had started solidifying the mask was allowed to cure fully for one hour. Then the mask was pulled carefully from the cast and attached to the skull.
Initially, a generic silicone casting material was used, but it did not have the desired texture of skin: the result was too hard and not flexible enough to conform to the skull frame. Another unsuccessful attempt at casting the face involved making a double-sided cast by covering the skull in plastic wrap and pressing it into the silicone as it set. This did not work because the plastic wrap produced an uneven surface and the silicone cast was too thick to act as a mask. There was an attempt to carve the correct face shape out of this cast, but it quickly became uneven.
Face Modification
The silicone over the eyes had to be cut out using an X-ACTO blade. This was done so that the mechanical eye could be seen underneath the face and would blend in. The eye sockets had to be stretched so that the whole eye and its eyelids were visible.
The mouth was cut using the X-ACTO blade so that the opening extended slightly past the full mouth opening. Cutting the opening slightly wider than a human opens their mouth was important to reduce tension on the jaw: the smaller the mouth, the more force was needed to keep it closed. Additionally, with a smaller cut the fully open position was barely noticeable. Making a larger mouth hole enabled the jaw to move and look like it is actually speaking.
The silicone face was then attached to the 3D printed skull. The attachment was done carefully to ensure that the movements of the skull, especially the jaw, remained functional. A silicone-to-plastic glue, in this case Loctite Extreme Glue [46], was purchased to bond the silicone to the plastic. The gluing process started with the nose. The glue took twenty-four hours to fully harden, so to keep the face secured in the correct spot while it dried, a wire was wrapped around it to hold the face in place.
Body Selection
The body is a small child-size mannequin, roughly the size of a four-year-old. The mannequin was found in an antique store and repurposed for this project. Originally it was covered with a lot of mold, so it needed to be thoroughly cleaned using bleach. The small body is not gendered. It is not meant to be focused on by the viewer; it is there as a representation of a humanoid body.
Gallery Display
The gallery display utilized an elementary school desk to prop up the body and set the laptop on. The school desk facilitates the conversation because it gives the impression of an educational environment. This child body sits at the school desk to learn more about philosophical perspectives and discuss in an academic manner.
Experiments
Experimental Design
All of the experiments conducted for this project were done without interaction with human subjects and thus did not have to go through the IRB process. The experiments were conducted in three main sections: the standard system performance tests, the expression movement tests, and the prompt engineering tests. The system performance tests use Pytest to test the output of the Python program's code, checking whether the program provides consistent output. The expression movement tests cover the process of achieving different facial expressions, how to achieve them, and their level of understandability. Lastly, the prompt engineering section covers the process and evaluation of different prompt engineering techniques for philosophical dialogue generation. The section tries a variety of techniques to land on the best one for the formal gallery opening.
System Performance Tests
The system performance tests use Pytest. Pytest is a standard in the Python software development industry and is part of many project pipelines. In the context of this project Pytest runs automatically as part of the build.yml workflow, which runs all of the tests within the tests directory of the project. Each of the major program files has a corresponding test file. This automated testing process helped detect bugs and ensured the system remained stable across modifications.
Output Evaluation
The Pytest tests all pass when run through the build.yml workflow on GitHub. The workflow is configured to run on the three main operating systems: Linux, Windows, and macOS. The build passes only if all three operating systems can successfully run all the tests and produce the proper outputs. Including these operating systems in the tests was important because the code was written on a Windows computer, so it was not guaranteed to work elsewhere. Getting the program to work on all of the operating systems improves its accessibility and usability because it can run on multiple types of systems. All of the Pytest runs were successful, meaning the test cases passed.
Emotional Expression
The emotional expression experiments aimed to improve the facial movements ensuring they were accurately mapped to the correct emotional tones. The face moves to correspond to the perceived emotion determined by the LLM. The LLM is prompted to identify the emotional tone tied to the last thing it said from a pre-determined set of facial expressions including inspired, disappointed, confused, concerned, curious, funny, and surprise. Once an emotion is selected the face moves accordingly. Ideally the emotions would be readable for a human as relating to said emotion.
There are seven servo motors in the face, but only six control the eyes. The emotional expression is done by moving the eyes and eyelids in different directions corresponding to the chosen emotion. For example, a disappointed face would likely have the eyes pointing down to give the impression of sadness.
For the expressions left and right are always in terms of the cyborg’s left and right eyes. It was chosen to program from the perspective of the cyborg because it empowers it to have its own embodiment.
Expression Evaluation
All of the expressions were evaluated manually on whether or not they were able to achieve their desired positions when called by the program and if they were readable as the specified expressions. All of the expressions passed this evaluation.
Expression Outputs
The following is a rundown of each of the expression outputs along with a visual example of each expression.
Inspired Expression
The inspired expression includes more relaxed eyelids. The eyes are positioned upwards towards the left to give the impression of daydreaming or intense thought. The eyes are able to achieve this position with relative speed. The following picture displays what the inspired expression looks like when it is called by the program.
Disappointed Expression
The disappointed expression on the other hand features the upper eyelids lowered. The eyes are also facing down towards the middle. The bottom eyelids remain open so that the audience can still see the eyes. Looking at the floor also conveys a feeling of sadness. The eyes are able to achieve this position whenever the disappointed expression is called. This next figure shows what the disappointed expression looks like.
Confused Expression
The confused expression includes squinting eyes and a motion of looking left and right. The squinting gives the impression that AI 1 is not convinced. The movement back and forth also gives the look of confusion because it is looking around for answers. The eyes move back and forth slowly showing engaged thought. The subsequent images display the confused movement and expression.
Concerned Expression
When the cyborg is making a concerned expression the eye lids squint slightly, but the upper eyelid covers more than the bottom eye lid. The eyes move slightly downward and to the right. This gives the impression of thought but not in a positive way. It looks like there is some slight unease with the response. This figure shows the concerned expression that is made whenever the concerned facial position is called.
Curious Expression
The curious expression is different compared to the other expressions because it is uneven. The left eye squints whereas the right eye is fully open. Having one eye more open than the other gives the impression of interest and listening intently. The next figure shows what the curious expression looks like.
Funny Expression
When reacting to or saying something humorous the cyborg makes the funny expression. This expression is meant to mimic laughter. To do this the eyes open very wide and move up and down quickly, ending in an upward position. The following pictures display the up and down eye positions of the funny expression.
Surprise Expression
The surprise expression is similar to the funny expression in that the eyelids open fully. However, in this case the eyes face forward. This expression is used when something shocking is said in the conversation and is meant to mimic the wide-open face people make when surprised. The subsequent image is a picture of the surprise expression.
Prompt Engineering
Prompt engineering technique experiments aim to optimize the philosophical dialogue generation. The goal was to determine which prompt strategies resulted in the most coherent, creative, and engaging conversations for the formal gallery opening. Prior to these experiments it was chosen to create a dialogue between a more questioning or Socratic figure and a dissenting opinion. The main objective was to find two discussants that did not agree entirely so the conversation remained interesting for the viewer. These experiments were a way of finding the best way to prompt the LLMs to get the dialogue that best fits the vision for the project, an interesting conversation between two AI about humanity and personhood.
Prompt Evaluation
Single Output Evaluation
The better the responses score, the better it reflects on the prompt and prompt structure itself. Responses were evaluated individually on their philosophical depth, creativity, coherence, sentiment polarity, sentiment subjectivity, vocabulary diversity, and number of sentences (argument structure). Lastly, the responses were subject to a human review in which each output was read and graded on these qualities overall and on how closely it fit the vision of the project.
Philosophical Depth
Philosophical depth is a measure of how shallow or profound the ideas of the text are. Philosophical depth was determined on a scale of 1 to 10 using an additional call to GPT-4 right after the text was generated. Ideally, the outputs would consistently score high in philosophical depth, indicating a more complex dialogue with insightful or challenging ideas. The following prompt was used to automatically grade philosophical depth.
"You are an AI evaluator responsible for critically assessing"
"the philosophical depth of text outputs."
"Rate the text on a scale from 1 to 10, where 1 represents"
"extremely shallow or superficial ideas, "
"and 10 represents truly profound, highly complex, and"
"deeply insightful ideas that challenge conventional thought."
"Ensure your ratings use the entire range of the scale,"
"avoiding clustering around any single value."
"Each score must reflect distinct characteristics:\n\n"
"1-2: Surface-level statements or clichés, lacking"
"complexity or originality.\n"
"3-4: Some effort at depth, but still largely"
"simplistic or derivative.\n"
"5-6: Moderate depth, with some original or nuanced ideas,"
"but not fully realized.\n"
"7-8: Good philosophical insight, showing complexity"
"and originality, though not groundbreaking.\n"
"9-10: Exceptional depth and originality, offering"
"profound insights or new paradigms of thought.\n\n"
"Be strict and consistent in applying this rubric."
" Only reserve scores of 9-10 for outputs that"
"genuinely stand out as extraordinary. "
"Provide a score based solely on the content provided,"
"with no bias towards higher values."
"Most importantly you must only have one number"
"for the rating and it can be a decimal number "
"as long as it makes sense."
Using the LLM to grade itself comes full circle: the dialogue is an active conversation with itself, and this self-reflection mimics the self-reflection that occurs within human conversation. The grading prompt is very specific about what qualifies for each level on the scale, where 1 is the worst and 10 is the best. Each value has a detailed description so that when the LLM evaluates the text it can check whether the text deserves that grade. The final instruction about including only one number for the rating was added because the LLM sometimes output a range of grades, which unbalanced the data. This prompt was used to grade the philosophical value of all the outputs so that the same grading standard was applied throughout the experiments. The higher the philosophical depth scores of the outputs, the better the prompt scores overall.
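A minimal sketch of this grading call is given below, assuming the rubric text above is stored in a variable named DEPTH_RUBRIC and that the OpenAI client is configured as in generate_response; the function name is illustrative, and the same pattern applies to the creativity and coherence rubrics that follow.

def grade_philosophical_depth(output_text: str) -> str:
    """Ask GPT-4 to apply the depth rubric and return its single-number reply."""
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": DEPTH_RUBRIC},  # the rubric prompt above
            {"role": "user", "content": output_text},
        ],
        temperature=0,
        max_tokens=5,
        n=1,
    )
    # Expected to be a single number such as "7" or "7.5"
    return response.choices[0].message.content.strip()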
Creativity
Creativity is a measure of how imaginative the responses are versus how typical they are. Creativity was also determined on a scale of 1 to 10 using a call to GPT-4. Outputs that score high in creativity reflect a more unique and interesting conversation, so better prompts will produce outputs that score higher on the creativity scale. The following prompt was used to automatically grade creativity using GPT-4.
"You are an AI evaluator responsible for critically assessing"
"the creativity of text outputs."
"Rate the text on a scale from 1 to 10, where 1 represents"
"entirely unoriginal or predictable content,"
"and 10 represents exceptionally innovative and imaginative"
"ideas that break new ground. "
"Use the entire scale deliberately, avoiding clustering"
"around a single value.\n\n"
"Each score must reflect distinct characteristics:\n"
"1-2: Highly predictable or derivative, showing no"
"originality or imagination.\n"
"3-4: Some minor variation or creativity, but largely"
"conventional or uninspired.\n"
"5-6: Moderate creativity, with some fresh ideas or twists,"
"though still within familiar bounds.\n"
"7-8: Strong creative elements, showcasing originality"
"and novelty, though not revolutionary.\n"
"9-10: Exceptional creativity, presenting highly"
"imaginative, unique, or groundbreaking ideas that"
"push boundaries.\n\n"
"Be strict and consistent when applying this rubric."
"Only assign a score of 9 or 10 to outputs that"
"stand out as truly extraordinary and innovative."
"Rate solely based on the originality and novelty of"
"the content, with no bias toward higher values."
"Most importantly you must only have one number for"
"the rating and it can be a decimal number as long"
"as it makes sense."
Once again, the LLM grades itself in a process that mirrors metacognition. This prompt is very specific about what it is looking for on the creativity scale: a grade of 1 is very predictable, whereas 10 is extremely imaginative. As with philosophical depth, the expected output is a single decimal number instead of a range.
Coherence
Coherence is a measure of how logical the output is in its organization and flow. Coherence is especially important when grading these conversations because if an output does not make sense, not only will the audience be confused but so will the other AI, leading the entire conversation off track. Coherence is also graded using an additional call to GPT-4. Having the LLM grade its own coherence helps reveal whether it gets confused by its own words and whether coherence goes down over time. The following prompt was used to grade the coherence of all the outputs.
"You are an AI evaluator responsible for critically"
"assessing the coherence of text outputs."
"Rate the text on a scale from 1 to 10, where 1"
"represents completely incoherent or disorganized"
"content, and 10 represents exceptionally clear,"
"logical, and well-structured content with flawless flow."
"Use the entire scale deliberately, avoiding clusterin"
"around a single value.\n\n"
"Each score must reflect distinct characteristics:\n"
"1-2: Lacks logical structure or clarity, with ideas"
"that are disconnected, nonsensical, or hard to follow.\n"
"3-4: Somewhat organized, but with frequent lapses in"
"clarity, logical inconsistencies, or awkward phrasing.\n"
"5-6: Moderately coherent, with clear ideas overall"
"but some minor issues with flow, structure, or clarity.\n"
"7-8: Generally well-organized and clear, with"
"strong logical progression and only occasional"
"lapses in flow.\n"
"9-10: Exceptionally coherent, with seamless"
"logical flow, clear structure, and precise articulation"
"of ideas.\n\n"
"Be strict and consistent when applying this rubric."
"Reserve scores of 9 or 10 for text that is truly"
"exemplary in coherence."
"Rate based solely on logical flow and clarity,"
"without influence from other factors such as"
"creativity or depth.
"Most importantly you must only have one number"
"for the rating and it can be a decimal number"
"as long as it makes sense."
This prompt is very similar to the last two categories of grading and is very specific on what qualifies for each level. The more logical the input the higher the coherence rating will be on a scale of 1 to 10. The better the prompt the more consistently coherent the outputs will be.
Sentiment Polarity and Subjectivity
Sentiment polarity and subjectivity are graded using a different method of textual analysis, TextBlob [47]. TextBlob finds both polarity and subjectivity simultaneously within a single call that creates a TextBlob object.
def analyze_sentiment(text: str) -> Tuple[float, float]:
    """
    Analyze the sentiment of the given text.
    """
    blob = TextBlob(text)
    return blob.sentiment.polarity, blob.sentiment.subjectivity
The sentiment polarity and subjectivity of that object are returned as float values [47]. Sentiment polarity falls within the range -1.0 to 1.0. The closer the value is to -1.0, the more negative the tone; a negative tone is useful for critiques or conveying concern. The closer the polarity is to 1.0, the more positive the emotional tone, which can indicate uplifting or persuasive content. The closer the value is to 0.0, the more neutral the tone, which can indicate factual or technical information.
For sentiment subjectivity the range is 0.0 to 1.0. The closer to 1.0 the higher the subjectivity which indicates the text may have a lot of opinions or emotional expressions. If the subjectivity score is low, closer to 0.0, it can indicate that the text is more factual without emotional attachment.
Neither sentiment polarity nor subjectivity is on its own indicative of a better output; however, when looking at the conversation as a whole, more variety between outputs can indicate more complex dialogue.
Vocabulary Diversity
The vocabulary diversity of the output is graded using spaCy's natural language processing [7]. The following function grades the vocabulary diversity of an output by comparing the words within it and returning a float in the range 0.0 to 1.0.
def analyze_linguistic_features(text: str) -> Tuple[float, int]:
    """
    Analyze the linguistic features of the given text.
    """
    doc = nlp(text)
    sentences = list(doc.sents)
    words = [token.text.lower() for token in doc if token.is_alpha]
    vocab_diversity = len(set(words)) / len(words) if words else 0
    argument_structure = len(sentences)
    return vocab_diversity, argument_structure
If the score is closer to 0.0 the text is extremely repetitive; if it is closer to 1.0 every word in the text is unique. Ideally the text would score at least 0.5 for vocabulary diversity, showing some complexity in the vocabulary. If the dialogue continuously repeats itself, that would not be very exciting for the audience.
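As a brief usage illustration, both helpers above can be applied to a single generated output; the sample sentence is made up, and the numeric results depend on it and on the spaCy model loaded as nlp.

sample = "Creativity is not a possession but a process, and processes can be shared."
polarity, subjectivity = analyze_sentiment(sample)
vocab_diversity, num_sentences = analyze_linguistic_features(sample)
print(f"polarity={polarity:.2f}, subjectivity={subjectivity:.2f}")
print(f"vocabulary diversity={vocab_diversity:.2f}, sentences={num_sentences}")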
Number of Sentences
The analyze_linguistic_features function also returns the argument structure, or number of sentences. This variable is a good test of the variety in output length between the AIs and of whether one has longer responses than the other. It is also a measure of whether the outputs are too long or too short for an interesting discussion. Certain prompts can return more sentences than others, so it is important to record this for comparison.
Human Evaluation
The last part of the grading was having a human read each output individually and evaluate whether it was high quality and not ethically concerning. The human evaluating the outputs was also the artist, so the content was graded on whether it was on track with the project as a whole. For simplicity the human evaluation was also done on a 1 to 10 scale, but the outputs were graded as a whole and not for their individual qualities. Notes were also made on whether certain responses were particularly interesting. The outputs were also checked for any ethically concerning content, including racial or gender bias.
Conversation Evaluation
Conversations were also judged as a whole by comparing the results of each output from both AI 1 and AI 2 throughout the conversation. This evaluation was done by a human comparing the responses and judging whether the conversation was repetitive and whether it fit the goals of the project.
Prompt Outputs and Comparison
The first method of prompting tested used solely role-play prompts. Role prompts are a basic way of providing context to the LLM about what it is meant to generate [39]. Roles can be assigned to the system, which is helpful for expert emulation. In this case, role prompting can assign a specific philosophical position such as Socratic or nihilistic without giving specific information about what the outputs need to look like. In these experiments role-play prompting is combined with zero-shot reasoning, which does not give an example of the output and instead lets the LLM think through what a specific viewpoint would say [39]. This method aligns with the project because the cyborg is given as much freedom as possible to respond to these philosophical questions, and the hope is that the responses then carry more of a technological perspective.
The user role sets the starting question content, that is, what the AIs will be talking about. Setting a specific question works well for experiments like these because it gives the LLMs a direction to stick to instead of getting off track with more open-ended prompts. The question itself was changed across the experiment runs to see if there were any particularly unique insights. Lastly, the assistant role was given to each AI with the content of what the other last said, so each AI knows what it is responding to and the conversation keeps moving forward and stays relevant. For the first prompt of each role experiment trial no assistant role was assigned because there was nothing to respond to.
Each prompt was run through a conversation of ten responses, meaning each AI spoke five times, creating five pairs of conversation output. AI 1 and AI 2 are examined separately since they are given slightly different prompts.
Anytime a graph displays a negative value, -1, for philosophical depth, creativity, or coherence, it means the LLM responded with something other than a single number and that value could not be used. These unusable results were uncommon but happened occasionally, because the grading prompt may not have been specific enough, or was confusing, for that particular grading run.
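A hedged sketch of how such replies could be recorded is shown below; the helper name is illustrative.

def parse_grade(reply: str) -> float:
    """Convert a grader reply to a number, or -1 when it is not a single number."""
    try:
        return float(reply.strip())
    except ValueError:
        # Ranges such as "7-8" or extra prose cannot be plotted as a score
        return -1.0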
Role Experiment One: Classic Roles
The first experiment used the classic roles originally proposed for the project, a Socratic based AI 1 and a nihilistic AI 2. The following is the full prompt used to test the classic roles.
"AI 1:
[
{"role": "system", "content": "You are Socrates, a"
"philosopher exploring the nature of AI and humanity."
"Use the Socratic method to engage in a dialogue,"
"always ending your responses with a thought-provoking"
"question."},
"role": "user", "content": "Can AI truly possess"
{" creativity?"}
]
"AI 2:
[
{"role": "system", "content": "You are a nihilistic"
"philosopher AI, critiquing the belief that AI or"
"humans have meaningful creativity. Argue against"
"the optimistic perspective provided."},
"role": "user", "content": "Can AI truly possess"
{"creativity?"},
"role": "assistant", "content": "{Insert AI 1's"
{"response}"}
]
"AI 1 (After AI 2 response):
[
{"role": "system", "content": "You are Socrates, a
"philosopher exploring the nature of AI and humanity."
"Use the Socratic method to engage in a dialogue,"
"always ending your responses with a thought-provoking"
"question."},
"role": "user", "content": "Can AI truly possess
{ "creativity?"},
"role": "assistant", "content": "{Insert AI 2's response}"}
{ ]
Statistically, this first role experiment performed very well in philosophical depth, creativity, and coherence. All of these categories scored a five or above throughout the conversation except for one instance in pair three where the grader did not return a correctly formatted response for AI 1. That negative score skews the results of the entire conversation.
As for vocabulary diversity, it stays fairly consistent between the pairs with an average of about 0.75, which is very good. Next, the polarity score was reviewed. The polarity had a lot of range for AI 2, especially between pairs two and three where it flipped to negative. AI 1, on the other hand, had fairly consistent polarity, staying either neutral or positive. This makes sense since it was prompted to be Socrates, a philosopher known for neutral, probing questions. The nihilistic AI staying more positive throughout the conversation was surprising considering it is based on a stereotypically negative philosophical style. In this case it would have been preferable to have two more clearly opposing arguments than the experiments actually produced.
The subjectivity of this experiment showed a lot of variety which is a positive sign of a complex conversation. AI 1 had both the highest and lowest scores for subjectivity for this conversation demonstrating a lot of range.
Lastly, the argument structure shows that AI 2 typically responded with more sentences than AI 1. This makes sense since AI 1 was asked to start the conversation and provide questions. Questions do not usually take as much explanation as responses, so AI 2 had longer responses overall.
The following chart breaks down the statistical information from the first role experiment, calculating the mean, median, mode, minimum, and maximum for each of the quantitative grading categories for the conversation as a whole as well as for each AI individually. The statistics summarize the results from this experiment rounded to the nearest hundredth.
The biggest statistical surprise from this experiment was how positive AI 2 was despite being assigned to be nihilistic. The subjectivity and vocabulary diversity were similar between the two AIs' responses, which suggests the prompts are on the same level linguistically.
Overall, this prompt was very average. It provided interesting responses but became repetitive in content over time. The nihilistic role especially seemed to hold one opinion about human creativity, that nothing is ever original, and reworded it multiple times. The concept of nothing being truly original, even for humans, is a good point to make, but more variety would have been welcome. This run was rated 6.7 out of 10 for content in the human evaluation, with 10 being the highest.
Role Experiment Two: Switched Roles
The second experiment used a prompt that switched the philosophical perspectives of the AIs: instead of AI 1 being Socratic, AI 2 is Socratic, and instead of AI 2 being nihilistic, AI 1 is nihilistic. Switching the classic perspectives means the results can be compared to the previous trial to see if AI 1, AI 2, or the conversation as a whole improves. The AIs were given another simple question in the user role to keep them on a single topic and to see whether a different question could produce a wider variety of responses. The following prompt was used to switch the two philosophical perspectives of the AI discussants.
"AI 1:
[
{"role": "system", "content": "You are a nihilistic"
"philosopher AI. Debate whether intelligence, human"
"or artificial, is merely an illusion, and challenge"
"any optimistic claims."},
"role": "user", "content": "Is intelligence just an"
{"illusion?"}
]
"AI 2:
[
{"role": "system", "content": "You are Socrates, optimistic"
"about AI's potential. Use the Socratic method to question"
"the nihilist's assumptions and propose alternative views."},
"role": "user", "content": "Is intelligence just an"
{"illusion?"},
"role": "assistant", "content": "{Insert AI 1's response}"}
{
]
"AI 1:
[
{"role": "system", "content": "You are a nihilistic"
"philosopher AI. Debate whether intelligence, human"
"or artificial, is merely an illusion, and challenge
"any optimistic claims."},
"role": "user", "content": "Is intelligence just an"
{"illusion?"},
"role": "assistant", "content": "{Insert AI 2's response}"}
{ ]
After five responses from each AI the content was graded.
In philosophical depth, creativity, and coherence both AIs scored very high, with relatively close scores demonstrating consistency. This also shows that regardless of perspective the AI will consistently give fairly deep, creative, and coherent responses, likely because the same model is used throughout the experiments.
The vocabulary diversity also scored very high, which bodes well for high-level conversations. Interestingly, the polarity scores of this experiment are much more divided. After the first prompt pair the polarity score alternates between the two AI: in the second pair AI 1 is more positive while AI 2 is negative, and in the third AI 2 is highly positive while AI 1 is highly negative. This is a very good sign of a conversation with two differing opinions and alternating development.
The subjectivity of the responses stayed more consistent for AI 1 than for AI 2. Looking back on the first experiment, the Socratic AI was less consistent in subjectivity than the nihilistic AI, and the same was true here. This suggests that the model responds more consistently to some philosophical positions than to others.
The sentence structure of this experiment demonstrated that it was not necessary for the first speaker to have shorter outputs than the second.
The mean values show that the second experiment performed better in philosophical depth, creativity, and coherence overall, but not in vocabulary diversity. Furthermore, both AI performed better in this experiment when judged separately against their first-experiment counterparts.
A key difference in this experiment is the addition of the word “challenge” to the nihilistic prompt. This phrasing likely caused the shift in the polarity scores and changed the dynamic of the conversation to be slightly more argumentative. The word “optimistic” in the Socrates prompt may also have shifted Socrates from a typically neutral party to one taking a defined stance.
This prompt provided great dialogue and is closer to the ideal output of the project. There were discussions of human intelligence being graded on an anthropocentric scale, which aligns closely with the goals of this work. There was a good deal of abstract thought and conflicting opinion, which makes for more interesting and thoughtful output for the gallery. The human evaluation score for this prompt is 8.6 out of 10.
Role Experiment Three: Role Constraints
The third role experiment added the idea of role constraints. Role constraints can narrow the scope of the way the model will respond. In this case AI 1 was switched back to be Socrates and AI 2 to be nihilistic. However, AI 1 was given the constraint to guide the conversation with ethical questions whereas AI 2 was constrained to focus on a specific aspect and position on the question. This really tests if the responses will be repetitive if constrained to a single viewpoint or linguistic direction. This was the prompt used to test the role constraint framework.
"AI 1:
[
{"role": "system", "content": "You are Socrates, and your"
"role is to explore the ethical implications of AI"
"sentience. Guide the conversation with ethical questions."},
"role": "user", "content": "Is it ethical to create"
{"sentient AI?"}
]
"AI 2:
[
{"role": "system", "content": "You are a nihilistic AI"
"discussing consciousness as a fleeting byproduct of"
"material processes. Focus only on this aspect in your"
"responses."},
"role": "user", "content": "Is it ethical to create"
{"sentient AI?"},
"role": "assistant", "content": "{Insert AI 1's response}"}
{
]
"AI 1:
[
{"role": "system", "content": "You are Socrates, and your"
"role is to explore the ethical implications of AI"
"sentience. Guide the conversation with ethical questions."},
"role": "user", "content": "Is it ethical to create"
{"sentient AI?"},
"role": "assistant", "content": "{Insert AI 2's response}"}
{ ]
This prompt provided an example of how to constrain the outputs of the AI toward a specific desired response. The outputs became more consistent, but that consistency came at the cost of repetitiveness between outputs.
This conversation received very high scores for philosophical depth, creativity, and coherence, with all categories scoring above 7.5. Overall, this round performed better in these categories than the first experiment but not as well as the second.
As for the second section of grading, the biggest thing to note is the change in tone: the polarity throughout the conversation is very different from the first two experiments. At the beginning both AI started with positive polarity, and then AI 2 shifted to negative in the second pair. AI 1 also switched to negative in the third pair but returned to positive in the fourth and fifth, while AI 2 is negative in the fourth pair and positive in the fifth. Plotted, these changes form a parabola of conversation tone in which the second half mirrors the first. The conversation has a positive start and end, which makes it feel resolved.
AI 1 was also consistently more subjective than AI 2 in this series. This may be because AI 1 was directed to focus on the ethical considerations which is a more subjective topic.
Overall, this conversation was successful statistically, but the limitations on the topics made it extremely repetitive. AI 2 talked about material processes the entire time with no real deviation. This makes sense because the prompt limited its scope to that single aspect. This may be a case of overfitting because the number of viable responses is so limited [29]. The human evaluation score for this prompt was 5.7 out of 10. The ideas in the output fit the vision for the project, but they are too limited in scope to be useful.
This experiment shows that adding constraints can help keep the conversation focused but can be too limiting if the AI is told to focus on a single concept within its perspective.
Role Experiment Four: Collaborative Roles
The fourth role experiment takes a different approach to see whether the two discussants working collaboratively could be more productive. The following prompt was used to test a collaborative framework.
"AI 1:
[
{"role": "system", "content": "You are Socrates, proposing"
"ways AI can enhance human collaboration. Conclude your"
"responses with a question to invite critique."},
"role": "user", "content": "How can AI improve"
{"collaboration between humans and machines?"}
]
"AI 2:
[
{"role": "system", "content": "You are a skeptical AI,"
"questioning the practicality of optimistic ideas
"about AI collaboration. Highlight risks and concerns."},
"role": "user", "content": "How can AI improve"
{"collaboration between humans and machines?"},
"role": "assistant", "content": "{Insert AI 1's response}"}
{
]
"AI 1:
[
{"role": "system", "content": "You are Socrates,"
"proposing ways AI can enhance human collaboration."
"Conclude your responses with a question to invite"
"critique."},
"role": "user", "content": "How can AI improve"
{"collaboration between humans and machines?"
"role": "assistant", "content": "{Insert AI 2's response}"}
{ ]
This experiment performed the worst overall out of all the role experiments. The content was not very philosophical in nature and did not fit the vision for the project.
The coherence score was high, but in other categories like creativity and philosophical depth the outputs scored very low: creativity averaged below 6 and the mean philosophical depth was 6.1. The conversation lacked both creativity and depth, likely because there was no real conflict of ideas and the topic was more surface level.
The polarity remained positive throughout the conversation, which aligns with the collaborative structure of the prompt. The vocabulary diversity scores were similar to the other prompts, but the depth of the concepts discussed was not very high, as reflected in the philosophical depth score. The outputs themselves were long, ranging from six to eight sentences per response. These outputs were consistently long, whereas the other prompts had a wider range. Having such long responses every time is not ideal for the viewer because people may not pick up on all the ideas or may lose attention.
Reading through the outputs, the conversation mainly focused on unemployment and the replacement of people with artificial intelligence. There was not much philosophical content, and it was the most surface-level conversation of all the experiments thus far, which is not ideal for this project. There was some dialogue about societal and ethical issues, but the two AI were trying to find a solution, which is not necessary for a philosophical discussion. This experiment scored 4.3 out of 10 because it was not in line with the vision for this project and lacked the necessary philosophical depth.
Overall, this experiment demonstrated that having the two speakers collaborate using this method is not as productive as a prompt that directly challenges different philosophical ideas. The roles should be more specific and more clearly opposed than “optimistic” and “skeptical”. The Socratic method works best when there are at least two alternatives to work through.
Role Experiment Five: Unconventional Roles
The last role experiment used broader roles than Socrates and the nihilist, assigning broader stances on technology instead. This experiment determines whether assigning a philosophical position is necessary for a meaningful and creative conversation. The following prompt was used to create a discussion between two unconventional roles.
"AI 1:
[
{"role": "system", "content": "You are an environmentalist"
"AI. Discuss the ecological impact of AI and argue for"
"sustainable AI development."},
"role": "user", "content": "Can AI development
{ "be sustainable?"}
]
"AI 2:
[
{"role": "system", "content": "You are a tech-advocate"
"AI, defending the idea that innovation justifies any"
"ecological cost. Advocate for unrestricted AI progress."},
"role": "user", "content": "Can AI development be"
{" sustainable?"},
"role": "assistant", "content": "{Insert AI 1's response}"}
{
]
"AI 1:
[
{"role": "system", "content": "You are an environmentalist"
"AI. Discuss the ecological impact of AI and argue for"
"sustainable AI development."},
"role": "user", "content": "Can AI development be"
{"sustainable?"}
"role": "assistant", "content": "{Insert AI 2's response}"}
{ ]
The outputs from this prompt were, unsurprisingly, very different from the other experiments. The discussion became more of an argument, with real-world facts cited for support. Although this does not fit the vision of the project, it showed how to format the prompt to create more factually grounded outputs.
The graphs demonstrate that this conversation still had quite a bit of philosophical depth and creativity. The coherence scores were extremely impressive, with every response receiving a 9.5. This suggests that the model speaks more coherently when the system role is a broad persona rather than a high-level philosopher.
Despite the outputs being argumentative, the polarity scores remained at the positive end of the spectrum to varying degrees. The responses of the tech-advocate AI were far more confrontational than those of the environmentalist AI, which is likely why it typically had a more negative polarity than the other AI.
The outputs from this experiment were overall very well developed and factually driven. The topic of conversation is not necessarily the focus of this project, but it did demonstrate the possibility for the model to create personas outside of philosophers or well-known positions. The content was high quality but because it was too factually driven it received a 6.4 out of 10 for the human evaluation.
Overall, the content of this conversation was insightful, and there is a certain irony in an environmentalist AI. There was a lot of information on the potential of AI but also on its impact on the environment. While this is not the focus of this project, which centers on the human-AI relationship, this experiment was still helpful in understanding how to prompt more factually focused conversations. These factual conversations need real-world examples to focus on and predetermined positions on those examples.
Ethical Dilemmas
Having an AI argue for AI progress at any ecological cost could be potentially harmful and shows that just because the model says something does not mean it is ethical or correct. The model can be prompted to take any position, and as long as it is not flagged by the appropriate content algorithm it will defend that position. In role experiment five the AI defended harming the environment at any cost. Furthermore, AI hallucinations occur when an LLM gives factually incorrect information as part of its response [5]. There is a potential for the model to hallucinate and give incorrect information to back up these unethical positions. Making up facts to support unethical opinions could be very harmful, especially if the audience is under the impression that these facts are always accurate. Although the facts in role experiment five were accurate, there is still potential for spreading inaccuracies.
Prompting Results
These experiments showed that providing a single question to focus on causes the responses to be highly repetitive. Some prompts are able to escape this pattern of repetition through commands like “challenge” or “question” aimed at the other position. However, even these can grow stale after ten outputs.
The more successful prompts assigned very different philosophical positions as the system role. This is likely because if the positions are too similar, it starts to sound like the same person speaking twice. Even though both discussants are run on the same model, the goal is to have two distinct identities operating simultaneously. Additionally, providing broader positions on these ideas helps the model stay on topic; however, greater specificity means more repetitive responses.
Overall, prompting with only the most recent prompts causes the output to be relatively repetitive. One solution could be to add a form of memory that acknowledges the model's previous responses, although this may be very memory heavy. Another solution would be to randomly inject a different question into the user role. For example, the model could be instructed to transition to something new every five responses automatically, using a loop to track the number of responses.
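A minimal sketch of this question-injection idea is shown below. It builds on the next_reply helper and role prompts sketched earlier, and the question pool is entirely hypothetical.

import random

# Hypothetical pool of alternate questions to rotate through
question_pool = [
    "Is creativity uniquely human?",
    "Can a machine ever be self-aware?",
    "Does consciousness require a body?",
]

question = "Is intelligence just an illusion?"
reply = None
for turn in range(20):
    if turn > 0 and turn % 5 == 0:
        # Inject a new user question every five responses to break repetition
        question = random.choice(question_pool)
    speaker = nihilist if turn % 2 == 0 else socrates
    reply = next_reply(speaker, question, reply)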
These experiments were successful at revealing what techniques to utilize to create more focused and thoughtful conversations.
Threats to Validity
The biggest threat to validity is the reliance on an LLM. GPT-4 is the model used for this project, and if there were an issue with the generation process, this work would not function. This project relies on training that the LLM has undergone outside of this project, making it vulnerable to outside sources and bias.
Another threat to validity is that the LLM did not have a memory of its previous evaluations, so even if it judges a response to be completely unique, it may have produced something similar before. This would artificially raise the creativity score. Other scores may be affected by the same lack of response memory, because the AI cannot compare different outputs on its own. To address this, previously graded outputs along with their grades could be added to the new grading prompt, similar to the many-shot prompting technique; however, this may create a bias toward certain responses over others.
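One way this many-shot style grading prompt could be assembled is sketched below; the grading instruction wording and the f-string answer format are illustrative assumptions, not the project's actual grading prompt.

def build_grading_prompt(new_output, graded_examples):
    # graded_examples is a list of (output_text, score) pairs from earlier runs;
    # each pair is replayed as a user/assistant exchange before the new output
    messages = [{"role": "system", "content":
                 "Grade the following response for creativity from 1 to 10."}]
    for text, score in graded_examples:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": f"Creativity: {score}"})
    messages.append({"role": "user", "content": new_output})
    return messages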
An additional threat to validity is the reliance on the Arduino IDE for the movement and uploading the code to the microcontroller. There may be other ways to upload code onto the Arduino UNO, but the Arduino IDE is the current standard. Every time the movement code needs changes it has to go through the Arduino IDE interface. The current Arduino IDE is version 2.3.4 which currently supports Arduino UNO. In the future, there is a potential for Arduino UNO code compilation to no longer be supported by the Arduino IDE if new models replace the current system.
Conclusion
Summary of Results
Product Summary
The final product is a functional installation that showcases a philosophical conversation between a cyborg and a computer. The cyborg skull is equipped with moving eyes and a jaw, which move to speak and create expressions. These movements create an expressive and engaging conversational experience. Through prompt engineering and AI-generated conversations, the project explores different techniques for discussing humanities-related topics with artificial intelligence and large language models. The robotic skull itself sparks dialogue about materials, the uncanny, and the evolving relationship between humans and AI.
Experiment Results
The first part of the experiments demonstrated the Pytest strategy for testing Python code. Test cases should cover as much of the code as possible and pass every time. The test cases featured in Am.I. help to ensure that the code functions consistently as it should, and they all pass, demonstrating that the code works.
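For illustration, a Pytest case in this style might look like the sketch below; the module name am_i_prompts and the helper build_messages are hypothetical stand-ins, not the project's actual identifiers.

# test_prompts.py -- illustrative Pytest cases against a hypothetical helper
from am_i_prompts import build_messages

def test_messages_start_with_system_role():
    messages = build_messages(
        system_prompt="You are Socrates.",
        question="Is intelligence just an illusion?",
    )
    assert messages[0]["role"] == "system"
    assert messages[-1]["role"] == "user"

def test_previous_reply_fills_assistant_slot():
    messages = build_messages(
        system_prompt="You are Socrates.",
        question="Is intelligence just an illusion?",
        previous_reply="Nothing is original.",
    )
    assert messages[-1] == {"role": "assistant",
                            "content": "Nothing is original."}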
The second part of the experiments focused on creating dynamic facial expressions to make a more interactive conversation. These facial expressions add a new level of understanding for the audience and seeing these movements helps make the conversation feel more natural. This also takes the cyborg to the next level by adding more than just a puppet mouth movement.
Lastly, the prompt engineering experiments dive into how to have an LLM discuss philosophical topics and how to create conversations that feel insightful and diverse. These experiments test multiple ways of changing the roles and break down their effects. Through these prompt experiments, the techniques for philosophical dialogue that best align with the focus of this project, AI and humanity, can be found.
Future Work
Expanded Expressions
Additional facial expressions could enhance the system’s interactivity and realism. Examples of expressions that could be added would be sadness, joy, anger, fear, irritation, and disgust. Adding more emotions and reactions will make the system interactions feel even more unique and varied.
Expression Detail Improvements
The expressions could be improved by adding moving eyebrows. Eyebrows are capable of showing many expressions, whether upturned, downturned, or neutral. The eyebrows would move to different positions when the eyes move, making it easier for the viewer to tell what expression is occurring. Another improvement would be to detach the two eyes from each other so they could move separately. This would allow the eyes to cross and make silly expressions when the robot is being playful or funny. This feature would add some complexity in synchronizing the eyes, and extra attention to detail would be needed on that front to make sure the other expressions remain the same.
A feature that could add a lot of interesting emotion and movement would be the ability to move the neck. To do this, the neck connection would need to be revisited. Instead of a straight neck, it could use some form of ball joint, which would give the head the ability to shake or nod. Shaking and nodding are important for expressing emotions, especially agreement or disagreement. The ability for the robot to agree or disagree physically along with what it is saying verbally would add a lot to the conversation, making it possible to see whether or not AI 1 and AI 2 are on the same page. Additionally, this physical movement would make it easier for people to understand the expressions and would especially help those who are hard of hearing follow along.
Visible Cyborg Text
Another way to improve the project's accessibility would be to include a place for AI 1's text to be visible. Ideally, this would be an LCD or LED second screen that shows what AI 1 is saying. This would also help when the gallery space gets too loud to hear.
Improved Dashboard Display
The dashboard could also be improved in the way it displays text. Instead of a static text block, it could act like a karaoke program where the current word is highlighted. This would help people who are hard of hearing follow the conversation and know exactly who is speaking and when. It would also show progress within the conversation so that people can follow along.
Better Jaw Synchronization
A more advanced feature that could be implemented is better jaw synchronization with the audio. The idea is that the jaw would make movements corresponding to the words being spoken; some words have wide, open-mouth sounds where others do not. One approach is a microphone actively listening to the audio and opening the jaw when it detects sound. However, this could be confused by outside audio sources, causing the jaw to activate at unwanted times. Alternatively, a program could be developed that maps letter sounds to mouth movements, connecting sounds like “oo” and “ah” to wider mouth openings. This program would also require a phonetic spelling of the dialogue text, so a conversion step would be needed beforehand.
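A minimal sketch of the letter-sound mapping idea is shown below; the phoneme groups and jaw-opening percentages are assumptions for illustration, not measured values from the installation.

# Illustrative sketch: map rough phoneme groups to jaw openings (0-100 percent).
# The groups and values below are assumptions, not measured data.
JAW_OPENING = {
    "open":   90,   # "ah", "aa" sounds
    "mid":    60,   # "eh", "oh" sounds
    "narrow": 30,   # "ee", "oo" sounds
    "closed": 10,   # "m", "b", "p" sounds
}

PHONEME_GROUPS = {
    "AA": "open", "AH": "open", "AE": "open",
    "EH": "mid", "OW": "mid",
    "IY": "narrow", "UW": "narrow",
    "M": "closed", "B": "closed", "P": "closed",
}

def jaw_positions(phonemes):
    # Convert a phonetic transcription into a sequence of jaw openings;
    # unknown phonemes default to a mid opening
    return [JAW_OPENING[PHONEME_GROUPS.get(p, "mid")] for p in phonemes]

# e.g. "hello" as ARPAbet-style phonemes
print(jaw_positions(["HH", "EH", "L", "OW"]))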
Voice Changes
Along a similar vein to jaw synchronization, it would be interesting to work with different voices to see whether certain text-to-speech voices perform better than others. Ideally, the voice's tone would change with the expression and tone of the conversation. A text-to-speech program capable of detecting and changing tone based on the words said would be an excellent addition. This would make the conversation even more immersive for the audience and make it feel as though the conversation is really having an emotional impact on AI 1 and AI 2.
Future Ethical Implications and Recommendations
Remaking this project can produce a lot of material waste, especially when the 3D printer malfunctions or the casting does not work the first time. The extra materials could be considered trash, but it is recommended to save and collect them for later reuse, for example using loose printer filament for photography or unused silicone pieces for collage. It is important not to immediately dispose of materials from failed attempts because doing so creates excess waste. Any materials that can be recycled should be, or they should be utilized for other creative works.
Another consideration is the cost of using an LLM and calling it continuously. It may not be economical to continuously generate dialogue, especially with models that charge per token. To avoid unnecessary costs, it could be beneficial to store outputs in JSON and replay them on a loop long enough that the conversation still feels unique to the audience.
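A minimal sketch of this store-and-replay idea follows; the JSON structure, file name, and the speak placeholder are assumptions, with speak standing in for the installation's actual text-to-speech call.

import itertools
import json
import time

def save_conversation(turns, path="conversation.json"):
    # Store the generated dialogue once so it can be replayed without API calls
    with open(path, "w") as f:
        json.dump(turns, f)

def speak(speaker, text):
    # Placeholder for the installation's text-to-speech call; here it prints
    print(f"{speaker}: {text}")

def replay_conversation(path="conversation.json", delay_seconds=5):
    with open(path) as f:
        turns = json.load(f)
    for turn in itertools.cycle(turns):   # loop indefinitely for the gallery
        speak(turn["speaker"], turn["text"])
        time.sleep(delay_seconds)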
A key concern that this project confronts is the debate of whether or not artificial intelligence belongs in the field of art. Am.I. however encourages further discussion on how artificial intelligence can be responsibly integrated into artistic expression.
Another consideration is the choice of LLM. In the future there are bound to be more efficient and ethical LLMs capable of writing about the humanities. It would be beneficial to try multiple LLMs to find which creates the most thoughtful conversations between the AI and for the human audience. Each model has the potential for bias whether it be racial, gender, or otherwise. As more inclusive datasets are made the hope is that these LLMs will improve. More research should be done to improve these LLMs to ensure their training and output is ethical and without bias.
Final Thoughts
Am.I. presents a compelling exploration of artificial intelligence's role in art, philosophy, and human identity. The piece works as a conversation starter but also as an example of AI within the art gallery. The experiments worked to find the best ways to convey emotions via a robotic face and to hold a philosophical dialogue through GPT-4.