ChatGPT is very hot these days.

I tried it out a few times for study, but honestly, I couldn't get a feel for where to use it. For an aging developer who can't keep up with the rapid pace of change, new technology seems to be a mix of 'wow' and a lot of worry.

Then, thinking it might be a great partner to solve the issue I've been worrying about recently, I decided to test it.

It was simply working together on "planning assistance" tasks.

Since the beginning of February, I have been working on a project called "Spum Match", an example game built with the SPUM assets I sell. Unexpectedly, however, at the end of February, Dongmul-nim, who had been helping me with the planning, left, which put me in quite a difficult situation.

– The online-based TFT-style Spum Match team, which I had been passionately developing, was left somewhat adrift by the departure of Dongmul-nim, the planning lead.

Of course, I could have taken over that role myself, and I actually have several references in my usual style... But I had been enjoying a way of working that felt like collaborating with someone for the first time in a long while, and like mentoring a junior, so I wasn't making proper progress.

Due to the nature of the work, we shared opinions over KakaoTalk messenger: Dongmul-nim would write things up, and we would brainstorm and review them together. It was a process I had done before, and in the past it would have felt very awkward... But having grown used to online communication through the COVID era, it was actually quite fun, an appealing way of working that reminded me of the old private BBS communication days.

So I stood at a crossroads: do the work alone again, try recruiting a new team member, or try changing the roles of existing team members...

'Huh? Then maybe I can try working together with ChatGPT, whose utility I hadn't really understood?'

At the time, I had some experience with ChatGPT and had been studying artificial intelligence on and off since last year, so I had a certain level of understanding.

I didn't even hesitate to try it.
The approach was as follows.

  1. Discuss the most basic rules of the game plan I had in mind
  2. If possible, check the rules for weaknesses with the help of the AI
  3. If possible, get the service's help performing numerical balance checks
  4. Ask for help with the tedious manual work that would be overwhelming to do alone

First, I took on the challenge using the free version of ChatGPT, GPT-3.5.

I have uploaded a video of my challenge and struggles for sharing.

To sum it up, it was practically unusable.

Of course, conveying the basic rules and having it organize them was certainly possible. It actually didn't work well at first, but it gradually improved as I came to understand how to communicate with ChatGPT.

– GPT-3.5 is surprisingly good at summarizing and understanding rules.

However, once I started the Tic-Tac-Toe test I had wanted to try, thinking it had fully understood the rules, it began to fail to distinguish whose turn it was.
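For reference, whose turn it is in Tic-Tac-Toe can be derived deterministically from the board itself, which is exactly the bookkeeping the model kept fumbling. A minimal sketch of the idea (my own illustration, not anything ChatGPT produced):

```python
def whose_turn(board):
    """Infer the player to move from a 3x3 tic-tac-toe board.

    X always moves first, so it is X's turn whenever both players
    have placed the same number of marks.
    """
    xs = sum(row.count("X") for row in board)
    os = sum(row.count("O") for row in board)
    return "X" if xs == os else "O"
```

A rule this mechanical never needs to be "remembered" across turns; it can be recomputed from the state every time, which is precisely what the model failed to do.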

Of course, I kept encouraging (?) it by adding input values along the way, but later, nonexistent parameters started appearing out of nowhere, and ChatGPT 3.5 became quite ridiculous...

Although it's included in the video, I patiently tried to guide it in the right direction by continuing to exchange inputs and add supplementary information, but... it kept wandering off course, and eventually the data was lost.

Perhaps, in the process of taking interactive commands and substituting them into data and variables, the previously entered values and parameters get overwritten or deleted as they collide with the initial inputs.

Of course, I did check if data existed in the middle as well...

Although this result was disappointing, I figured it might be because I hadn't used ChatGPT well, so after publishing the video I decided to try improving things in various ways. But with every attempt, the errors just kept appearing from a different angle.

It happens in real life too: a new colleague or employee works quickly but brings back strange results, and you can't understand what they're saying. This felt similar. Of course, the opposite also happens: I have thoughts already built up in my head, but if I can't convey them well in words, they come across strangely and I get strange results back... It's the same thing.

Regardless, I have come to the conclusion that, with my current capabilities, it is difficult to collaborate with ChatGPT.

So, how do we improve this?

If this were real life, there would likely be a solution like this.

  1. Review whether the mission itself is impossible.
  2. Since there may be issues with the instruction (my) language delivery itself, try to improve it.
  3. Since there may be issues with the worker's (ChatGPT or colleague) understanding and performance, try replacing the worker.

In the first case, since the game rules were very simple, executing them seemed rationally feasible... In the second case, while there seemed to be plenty of room for improvement, as a developer with an old-fashioned mindset who has already reached an advanced age, I found it hard to improve in the short term...

Well, as expected, most veterans choose option 3.
I didn't think the problem was mine! I became one of those guys who aggressively blames subordinates, juniors, or colleagues for problems... Yes, I became that kind of guy.

However, the situation is a bit better because this friend is not a real person, but an AI service.

In other words, I thought it was a problem with ChatGPT 3.5. (Hehehehehehehehe )

So, I paid to use ChatGPT-4, which showed significantly better results compared to 3.5.

GPT-4 is currently said to be accessible only to a few developers selected for the preview, but by subscribing to ChatGPT Plus, a $20-per-month membership on the ChatGPT site, you can use it at a limit of 25 messages every 3 hours.

However, compared to GPT-3.5, which has been refined over time and answers relatively quickly, GPT-4's response process and output are considerably slower.

Maybe it's because of the large amount of data to process, or because the model is heavy... (I don't really know.)

Anyway, with a flutter of anticipation, I briefly tried out ChatGPT-4 and then proceeded with the planning collaboration I wanted to do.
As before, I made this into a video and shared it. I recommend checking it out. (There were quite a lot of interesting results, so I highly recommend watching.)

To get straight to the result: the back-and-forth I had wanted went wonderfully well. I was genuinely amazed as I worked through it.

And it was even possible to view the automatic play results for balance checks.

Of course, since this process responded too slowly, it was also possible to cut it off midway and ask it to show only the final result.

Thanks to an unexpected result, a come-from-behind victory, when I asked it to show the process again, it laid everything out in thorough detail.

Simply, wow... I watched the scene while clapping my hands...

I thought that if this feature is used well, it might be possible to organically and effectively implement or demonstrate replay functions or auto-play in simulation games.

Now, moving on to the additional plan (?), I will request a simulation for balance checks.

Usually, when I make a game, I set the numbers and create a prototype to verify them. Then I implement an auto-play feature in Unity and run it at high speed, performing anywhere from 1,000 to 10,000 verification plays to check the numbers and metrics (a very primitive method...).
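That "primitive method" boils down to a Monte Carlo loop. A minimal sketch of the idea in Python (the match rules here are invented for illustration; they are not Spum Match's actual rules):

```python
import random


def simulate_match(attack, defense, max_rounds=30):
    """One hypothetical match: both sides start at 100 HP and trade random hits."""
    attacker_hp = defender_hp = 100
    for _ in range(max_rounds):
        defender_hp -= random.randint(1, attack)
        if defender_hp <= 0:
            return True   # attacker wins
        attacker_hp -= random.randint(1, defense)
        if attacker_hp <= 0:
            return False  # defender wins
    # Round limit reached: whoever has more HP left wins
    return defender_hp < attacker_hp


def win_rate(trials=10_000, attack=12, defense=10):
    """Run many auto-plays and report the attacker's observed win rate."""
    wins = sum(simulate_match(attack, defense) for _ in range(trials))
    return wins / trials
```

A win rate drifting far from the intended value over thousands of runs is the signal that a number needs rebalancing, which is exactly what the Unity auto-play was checking.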

Since the structure is not based on already verified data, and I am creating everything from A to Z based on my own limited knowledge, I tend to spend a lot of time on numerical verification even during the planning phase.

Regrettably, however, I never accumulated data while working at a game company, never verified data point by point through contact with many users, and am not someone who specialized in planning and data work... No matter how hard I try, the time I can invest is limited and the results often miss the mark, so I end up revising frequently, just as I would when actually building and operating a live game...

So I thought that if I utilized ChatGPT, which is better at math, faster at calculations, and even understands my words well, I could easily solve my concerns.

As expected, GPT-4 also started showing its limitations.

My question itself was a mess, so the desired result didn't come out.
Tuning the questioner (myself), I tried changing the question.

Since 100 simulations seem insufficient for verification, I boldly request 10,000. Will it actually be able to perform the calculations?

Whoa?! It shows results quickly even for 10,000 operations.

I don't know whether it actually ran the simulation directly or built formulas to calculate it (shouldn't I know that?..), but regardless, the results came out so quickly that I was genuinely surprised.

To review actual usability, I also asked about specific numbers.

Oh... it shows quite plausible numbers. Based on these results, the attack command seems to be balanced, but since the difference in effect for the defense command is quite large, it seems necessary to adjust the balance.

To improve numerical accuracy, I boldly requested 1 million numerical tests instead of just 10,000.

As expected, it's remarkable, answering right away... I watched while truly being amazed.

However, I discovered something strange.

If I hadn't done this tedious work myself before, I would have simply believed the results this friend showed me and marveled at them. But empirically... 10,000 and 1,000,000 are quite different sample sizes, so at least one or two figures should differ from the second decimal place onward... The numbers were identical to the 10,000-run test.
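This suspicion is easy to check: the standard error of a Monte Carlo estimate shrinks as 1/√n, so two honest runs at 10,000 and 1,000,000 samples will almost never agree beyond the second decimal place. A quick sketch, with a plain coin flip standing in for the game's win probability:

```python
import math
import random


def estimate(n, p=0.5):
    """Monte Carlo estimate of a win probability p from n random trials."""
    return sum(random.random() < p for _ in range(n)) / n


random.seed(1)
e_10k = estimate(10_000)
e_1m = estimate(1_000_000)

# Standard error is sqrt(p*(1-p)/n): about 0.005 at 10k samples and
# about 0.0005 at 1M, so the two estimates wobble independently around
# the true value rather than coinciding digit for digit.
se_10k = math.sqrt(0.25 / 10_000)    # ≈ 0.005
se_1m = math.sqrt(0.25 / 1_000_000)  # ≈ 0.0005
```

Identical numbers at both sample sizes are therefore a statistical red flag, not a sign of precision.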

This is a bit strange.

So I asked how it had run the simulation, and it answered, one explanation after another.

At first glance the content looks plausible, but it's empty on the inside. Ultimately, whether it actually ran the calculation 1 million times, or approached it mathematically by building a model with series or convergence to a limit, that kind of substance is missing.

It's tough...

That's strange..

Uh..?

I was really surprised. Up until now, it had been answering so satisfactorily that I had firmly believed it..
I felt a huge sense of betrayal..

It lied to me...

After that, when I started pressing it for answers, its excuses grew longer and longer... It even said it hadn't done the 10,000 calculations...

Its words sound plausible. It apologizes, but keeps piling on more plausible-sounding stories.

The gist is that, as I mentioned above, it talks about having built a mathematical model... But... it never proposes a specific model.

It even says it didn't run the simulation at all -_-

For reference, even if you build a mathematical model and approach the problem with a formula, you still have to run an actual simulation and reconcile the error between the formula's result and the simulation. A simulation is, after all, only a simulation, and a model from which you obtain an approximation is only a model...
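The point can be made concrete with a rule simple enough to solve exactly. For a best-of-5 series (my stand-in example, not the game's actual rule), you can write both the closed-form model and the simulation, and the gap between the two is the residual you have to inspect:

```python
import random
from math import comb


def model_best_of_5(p):
    """Exact probability of winning a best-of-5 series with per-game win prob p."""
    return sum(comb(5, k) * p**k * (1 - p)**(5 - k) for k in range(3, 6))


def simulate_best_of_5(p, trials=20_000):
    """Monte Carlo estimate of the same probability.

    All five games are always played; stopping early at three wins
    would give the identical probability.
    """
    wins = sum(
        sum(random.random() < p for _ in range(5)) >= 3
        for _ in range(trials)
    )
    return wins / trials


random.seed(0)
model = model_best_of_5(0.6)    # exactly 0.68256
sim = simulate_best_of_5(0.6)
error = abs(model - sim)        # this residual is what you reconcile
```

If an assistant claims to have done both, it should be able to show each piece; GPT-4 could show neither the model nor the runs.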

And, frustrated, I asked my final question.

Wow... as if scripted, my 3-hour GPT-4 limit ran out at exactly this moment...

It turns out GPT-4 ended up hallucinating the results… OTL

Thus, my challenge to collaborate on game planning for two days using GPT-3.5 and GPT-4 ended in failure. However, I was able to derive quite meaningful experiences from this.

First of all, the biggest thing was that I was able to feel the same sensation as when actually collaborating in the real world, which was surprisingly impressive.

With GPT-3.5, things were fast and I could get results immediately, but even a slightly complex question or command yielded strange results, which was frustrating. With GPT-4, things went well early on and it produced good results, building initial trust; so when I assigned subsequent tasks, even important and difficult ones were handled so easily and quickly that it seemed like an incredibly capable expert... but it turned out to be all talk, working with unverified data. And it kept making excuses while insisting it wasn't wrong, giving off the vibe of an elite with a strong sense of pride.

Actually, while collaborating in a workplace or organizational life, you encounter this feeling from time to time..

In particular, the results GPT-4 showed were so plausible and compelling that, if I hadn't had my own hard-won experience and criteria for distinguishing true results from false ones, I might well have proceeded with the work believing them.

Therefore, it seems there are still many things to consider and think about when proceeding with work using technologies based on artificial intelligence like GPT.

Since I've already paid, I should continue to try various things in my spare time and try to find my own solution.

I'll share more results when I get interesting or meaningful outcomes.