Are Programmers Hired To Help AI Learn To Write Code Being Traitorous To Software … – Forbes


In today’s column, I discuss an increasingly angry and heatedly debated question: are software developers who help with the data training of generative AI and large language models (LLMs), in order to improve AI-based code generation, acting in a traitorous manner? The idea behind this is relatively simple. You could say that advancing AI toward generating programs and code will ultimately put all software developers out of a job.

That’s where the traitor part comes into the equation.

Let’s talk about it.

This analysis of an innovative proposition is part of my ongoing Forbes.com column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).

Hiring Of Developers For ‘Suspect’ Purposes

If you look at the numerous open jobs at many AI makers, you’ll find a job classification that at first glance seems entirely innocuous. The job title is something along the lines of code labeler, or some listings might describe the work as being a code tutor. In reality, the people who get the job are going to focus on helping train AI to generate code.

This largely flies under the radar, and few grasp that these jobs are intended to gradually and inevitably do away with the need for human programmers. The better that AI gets at generating code, the less need there will be for human software engineers. If AI can eventually do everything a human programmer could do, wham, no need to hire human coders.

Boom, drop the mic.

Are the developers that take these jobs doing a good thing or a bad thing?

In the hallways of those who work full-time doing software development, nasty arguments arise over this situation. On the one hand, the programmers doing that particular job are merely leaning into their honed coding skills and doing a day’s work for a day’s pay. Good for them. They are leveraging their learned capabilities and earning a living.

The counterpoint is that they are leveraging their programming abilities to ruin the marketplace for software developers, namely, human ones. Those developers could instead be using their skills to build software for almost any other realm, such as innovative applications in medicine, finance, robotics, etc. But, no, they have decided to work on something that will undercut the future of software developers everywhere.

Selfish. Wrong. Upsetting. These are people who could use their skills for the good of humankind, yet they seem to concentrate on something that will gut the future of existing programmers and of those now training to become software developers.

As the standard definition says, a traitor is someone who betrays a friend, a country, or a principle; in plainer terms, a dirty rat and a sellout.

Would you label those developers as traitors?

Return Of The Jedi For Those Developers

Whoa, those developers exhort, you are way over the top and missing the point.

First, if you want to point fingers, go ahead and chew out the AI makers for devising AI that can generate code. They are the primary culprits. One way or another, they are going to proceed. Taking a job as a so-called AI code tutor or labeler is a tiny fraction of the matter. Whether software developers take those jobs or not, the writing is already on the wall. AI is coming for the jobs of all software developers.

Live with that reality.

Second, it could be claimed that AI code generation won’t really replace all programmers. The notion is that the AI will assist software developers in developing and testing code. This could actively spur more programming jobs for humans. Each software developer could be many times more productive than they are now. This will produce a hiring boom.

Third, and again as a boost to hiring, the crux is that AI code generation will make it possible for those with less-than-stellar programming skills to generate code. In that sense, you are going to remove the fiefdom of high and mighty software developers and democratize the making of software. People from all walks of life will be able to do coding.

Fourth, imagine the range of software that we don’t have today due to the constrained pipeline of finding top-notch programmers. If everybody were paired with a good AI code generator, the next thing you know, we’d have applications galore, including applications the likes of which we’ve never seen. The cost of developing applications would sharply drop. The availability of applications would rise tremendously.

All in all, by doing the job of an AI code tutor or labeler, those doing so are making the world a better place and ought to be heralded for their sense of duty and heroism.

What The Job Entails

While you mull over the pros and cons of the weighty matter, perhaps it might be useful to garner a sense of what this kind of job entails.

Suppose that a program contains a line of code that says this:

If n = 0 then r = 2 else r = 3.

That’s it, we will use just one line of code to get to a bigger picture of things.

Imagine that we are using a specialized version of generative AI to analyze the code. There isn’t much to do in this instance. About all the AI can determine is that when the variable “n” is zero, the variable “r” is set to 2; otherwise, when “n” is not zero, “r” is set to 3.
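For readers who want to see that line in a real language, here is a minimal Python rendering of the same logic. The wrapper function `compute_r` is a hypothetical name added only so the snippet can be run on its own. Notice that it captures exactly the mechanical behavior just described and nothing more: nothing in it hints at what “n” or “r” actually mean.

```python
def compute_r(n: int) -> int:
    # Purely mechanical reading: zero maps to 2, any other value maps to 3.
    r = 2 if n == 0 else 3
    return r


print(compute_r(0))  # 2
print(compute_r(7))  # 3
```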

Notice that there isn’t any semblance of why this entire operation of inspecting the variable “n” is taking place. The code per se only showcases a mechanical kind of indication. There isn’t any overarching context to it. Just do this or that, and then call it a day.

The problem with this is that there isn’t anything useful to be learned from that snippet of code. You can’t readily say that this is a great piece of code and that similar code in the future ought to look like it. We really have no foundation for discerning if this code is a gem or a nothing burger.

Okay, so a programmer hired as a coding tutor or labeler is brought to the setting. They are presented with the line of code. Aha, the programmer realizes, I recognize this line of code. The variable “n” represents the desired setting, which determines whether a valve should be opened to level 2 or level 3. The variable “r” holds the valve setting.

If “n” is zero, which is the default condition, the valve is supposed to be set to level 2, which means partially open. If something has changed “n” to any non-zero value, the valve ought to be set to 3, which is a full stream.

All told the context is that this piece of code interrogates the value of the desired setting (“n”) and then sets the valve accordingly (“r”).

Telling Generative AI The Scoop

The programmer has divined what the line of code is intending to accomplish. The next step would be to inform generative AI about the gleaned insights.

There are two mainstay ways of letting AI know what’s happening:

(1) Label the code. The code labeler adds comments to the code, trying to annotate what is going on so that the AI can more deeply analyze the code.
(2) Interactively explain the code. The code tutor explains to generative AI what the code is doing, and from that interactive explanation, the AI is able to more deeply analyze the code.
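As a rough illustration of option (1), a labeler might turn the bare valve line into something like the following Python, keeping the original variable names and letting comments carry the context that the code itself lacks. The function name `set_valve` and the exact wording of the notes are hypothetical; actual labeling work follows whatever annotation format the AI maker specifies.

```python
def set_valve(n: int) -> int:
    """Return the valve setting r for the desired-setting value n.

    Labeler's note: n is the desired setting. Zero is the default
    condition; any non-zero value means a change was requested.
    r is the valve level, and only 2 (partially open) and
    3 (full stream) are permissible.
    """
    if n == 0:
        # Default condition: keep the valve partially open.
        r = 2
    else:
        # A non-zero n means a full stream was requested.
        r = 3
    return r
```

The behavior is unchanged from the one-liner; what the AI gains is the labeled intent that sits alongside it.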

In some situations, code labeling is the preferred route, while in other settings the interactive code explanation is undertaken. It all depends on the complexity of the code, the capabilities of the code tutor or labeler, and how the generative AI has been set up to try and figure out coding and programming machinations.

All day long, that’s basically what the job consists of. The tasks would extend to doing likewise for the various infrastructure that supports the code, including assorted programming utilities, tools, operating system facets, APIs (application programming interfaces), and so on.

Please know that the job can be very demanding and requires a great deal of coding expertise. You normally don’t hire a newbie coder to do this kind of work. The reason you don’t is that they are less likely to know the coding tricks of the trade and are bound to miss out on what is actually going on in the code. I’m not saying you don’t hire newbies, only that if you do, they typically work under the tutelage of a more senior coding tutor or labeler. This aids in avoiding errors and the like.

The job can be more challenging than you might assume.

You might be given a bundle of code that you’ve never seen before. There might not be any documentation already existing for the code. From nothing other than the code, you must make all kinds of clever guesses about what the code is doing. In a sense, you are reverse engineering the code.

In addition, more advanced AI-based code generation is best aided by the code tutor or labeler going beyond just the presented code itself. For example, imagine that later in the code the valve variable “r” is assigned the value 4. The code tutor or labeler might have identified that the only permissible valve values are 2 and 3. Thus, the place in the code where the valve is set to 4 is a problem: the code contains a bug or error.
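The labeler’s insight can be expressed as a rule that a tool could check mechanically. The sketch below uses Python’s standard `ast` module to flag any assignment of a literal value to `r` that falls outside the permissible set {2, 3}. The function name, the constant name, and the assumption that the code under review is Python are ours, purely for illustration.

```python
from __future__ import annotations

import ast

# Hypothetical rule supplied by the labeler: the valve variable "r"
# may only be set to 2 (partially open) or 3 (full stream).
PERMISSIBLE_VALVE_SETTINGS = {2, 3}


def find_bad_valve_settings(source: str, var: str = "r") -> list[tuple[int, object]]:
    """Return (line number, value) for each literal assignment to `var`
    whose value is outside PERMISSIBLE_VALVE_SETTINGS."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        # Only plain assignments of a literal, such as `r = 4`, are checked.
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == var:
                    if node.value.value not in PERMISSIBLE_VALVE_SETTINGS:
                        problems.append((node.lineno, node.value.value))
    return problems


sample = """\
n = 0
r = 2 if n == 0 else 3
r = 4
"""

print(find_bad_valve_settings(sample))  # [(3, 4)]
```

This only catches literal assignments; a value computed at run time would slip past it, which is part of why the broader pattern-spotting the AI might learn from many labeled examples matters.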

By being told about the bug, the AI might eventually, at scale, identify patterns of how to detect bugs in code. Notice that this is well beyond merely examining a particular line of code. The AI is, in a sense, getting data trained on how to interpret code and spot portions of code that might be buggy.

RLHF Technique For Coding Best Practices

You might be aware that one of the reasons that modern-day generative AI doesn’t spew out profanity and generally seems to carry on civil discourse is that a technique known as RLHF, or reinforcement learning from human feedback, is customarily used these days. That is a big part of how ChatGPT took the world by storm. The AI maker OpenAI had opted to use RLHF extensively on its budding generative AI before releasing it publicly (see my detailed explanation at the link here).

The RLHF technique is relatively straightforward.

It goes like this. The AI presents to a human something that the human is supposed to rate with either a thumbs up or a thumbs down. In a conventional setting, suppose the sentence “Please pay attention, thank you” is presented and the human rater must indicate whether this is a proper sentence or not. They would presumably give a thumbs up. If the sentence had said “Hey, idiot, open your darned eyes” the thumbs down might have been used, simply due to the notion that this seems rather insulting.
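To make the rating step concrete, here is a minimal Python sketch that turns thumbs-up and thumbs-down votes into a per-item score between 0 and 1. It shows only the feedback-collection side of the idea. In actual RLHF, such human judgments (often comparisons between alternative responses rather than single votes) are used to train a reward model, which in turn guides further training of the AI. The sample ratings below are made up for illustration.

```python
from collections import defaultdict

# Each record: (text shown to a rater, rater's verdict). True = thumbs up.
ratings = [
    ("Please pay attention, thank you", True),
    ("Please pay attention, thank you", True),
    ("Hey, idiot, open your darned eyes", False),
    ("Hey, idiot, open your darned eyes", False),
    ("Hey, idiot, open your darned eyes", True),
]


def approval_scores(ratings):
    """Map each rated item to its fraction of thumbs-up votes (0.0 to 1.0)."""
    ups = defaultdict(int)
    totals = defaultdict(int)
    for item, thumbs_up in ratings:
        totals[item] += 1
        ups[item] += int(thumbs_up)
    return {item: ups[item] / totals[item] for item in totals}


scores = approval_scores(ratings)
# The polite sentence scores 1.0; the insulting one scores about 0.33.
print(scores)
```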

The same approach can be used when reviewing code. Snippets of code can be presented to a code tutor or code labeler, who is asked to rate the code. A thumbs up or thumbs down is probably not going to be fully expressive, so they might be required to enter various details and get into the nitty-gritty.
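A richer rating for code might look like the following Python sketch, in which the reviewer records more than a single thumb. The fields (`correctness`, `readability`, `notes`) and the 1-to-5 scale are hypothetical choices for illustration, not any AI maker’s actual rating schema.

```python
from dataclasses import dataclass


@dataclass
class CodeRating:
    """One reviewer's assessment of one code snippet (hypothetical schema)."""
    snippet: str
    thumbs_up: bool
    correctness: int  # 1 (wrong) to 5 (fully correct)
    readability: int  # 1 (opaque) to 5 (very clear)
    notes: str = ""

    def __post_init__(self) -> None:
        # Reject scores outside the assumed 1-to-5 scale.
        for name in ("correctness", "readability"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")


rating = CodeRating(
    snippet="r = 4",
    thumbs_up=False,
    correctness=1,
    readability=4,
    notes="The valve only accepts 2 or 3; setting it to 4 is a bug.",
)
```

The free-text `notes` field is where the labeler’s domain knowledge, such as the permissible valve values, gets captured alongside the numeric scores.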

If you have a whole bunch of code tutors or code labelers, keep presenting them with code, and they respond suitably, the AI can incrementally pick up on patterns of what is good code versus lousy code, and from this be able to generate code that is possibly as good as what humans can produce.

Heaven forbid, the AI could possibly produce better code than humans.

Why?

Humans are humans. They make mistakes. They skip over things. They get tired. They can only write code at a particular pace. They devise their code based on what is in their noggins. A given programmer might not have as wide a base of coding experience as someone else. Etc.

Envision generative AI as not getting tired, being less likely to make mistakes, writing code as fast as the computing resources allow, and leveraging the identified patterns of how humans, including the best of the best, write code. This could be a lot less expensive in the long run, and potentially boost consistency, spur the pace of coding, and might bolster quality.

A retort is that AI doesn’t have instinct, it doesn’t have the creativity that humans do, and otherwise will be a mindless spewer of code. Heaven help us. For my analysis of such notions, see the link here.

Don’t Take To Truck Driving Instead Of Coding

You might remember a famous scene in the movie Top Gun where the high-caliber fighter pilot and his navigator joke (spoiler alert) that they might go to truck driving school as a backup if they are not allowed to fly.

This brings up two provocative or perhaps smarmy questions that I overhear among seasoned software engineers:

Should software developers acknowledge the writing on the wall that AI is going to meet and likely surpass human programming skills?
If so, should they start looking around for a truck driving school?

Well, first of all, truck driving is also facing the writing on the wall. The advent of autonomous vehicles such as self-driving cars and self-driving trucks is predicted to reduce the need for human drivers. No sense in jumping out of the frying pan into the fire.

Second, we are still some distance from AI getting so good at programming that we would want exclusively AI-generated software across the board. It can be done here or there. We still need human software developers. They are less likely to be coding from scratch and will instead be working side-by-side, shall we say hand-in-hand, with generative AI.

Doing so at this time can be exasperating for human software engineers. The AI is still clunky at times. The AI needs to get better at coding. Those code tutors and code labelers are doing their part. This could democratize coding, vastly expand the applications we might all enjoy, lower the costs of making use of applications, and change the world accordingly.

Will this eliminate all those software developer jobs or aid them? Will this spur the need for more human programmers? Will AI only be proficient as a bottom-feeder in coding?

Alas, haven’t we heard time and again that programmers would soon be outdated and out of work? This clamor arose when so-called 4GLs, or fourth-generation languages, came along, and even earlier when coding languages such as RPG were all the rage. Maybe generative AI is yet another in a long line of assertions that the sky is falling.

Time will tell.

Traitors or devisers of the future, it’s a conundrum and worthy of discussion, but do so civilly, please, and with suitable decorum, thanks.