originally published: 2023-05-12 23:13:41
Hessie Jones Elon Musk, who was one of the co-founders of OpenAI, which built DALL-E and ChatGPT, said this: “AI is so scary good because it means the end of homework and teaching children as we know it.” So this raises the question: have we finally arrived? Has technology finally given us the ease and the convenience that we’ve been striving for? We’ve seen ChatGPT. We’ve seen DALL-E 2 and Lensa AI coming up. Generative AI is all the rage these days. Or is it just hype? I’m Hessie Jones, and welcome to Tech Uncensored. So, we know that DALL-E 2 can create realistic images and art from text descriptions. It’s been able to develop artwork that’s sold through NFTs, or that you could actually sell through sites like Etsy as well.
Hessie Jones ChatGPT has been known to generate emails or replies from any given prompt. It can deliver web results. It can do text summaries for your video. It can actually write your tweets. But the technology has also been used for some darker purposes, like generating malware and phishing emails and stealing information. And eerily, it makes it difficult for even discerning humans to tell whether an output is human-written or computer-generated. Some educators have said that while the tool is able to provide quick and easy answers, it doesn’t actually build any kind of critical thinking or problem-solving skills, which they say are necessary for academic and lifelong success. Let’s look at this from the VC perspective. Most companies that were using generative AI last year were able to raise significant amounts of money. I’m not sure if you’ve heard of Jasper, which was making AI-powered marketing materials. They raised $125 million. Runway, which was the company behind Stable Diffusion, which powers Lensa AI, received $50 million in Series C. Reports indicate that OpenAI right now could be valued at close to $30 billion. I’m sure Microsoft is really happy about that, considering they invested about a billion dollars back in 2019.
Hessie Jones So these are exciting times. However, we have to address concerns about plagiarism, about forgery, about theft, about the devaluation of work that’s created by humans. So we’re here to see whether or not generative AI is ready for the mainstream. More importantly, can we trust its eventuality? So I’m going to welcome back Patricia, as well as Amir, who will tackle these questions about whether or not the world is ready and vice versa, and what it will mean for the future of society.
Hessie Jones So, welcome back to you both. Okay, so let’s start with this. I’m going to ask Amir the first question because people probably want to understand what’s behind the technology. So can we define it, and how does it differ from, let’s say, traditional machine learning?
Amir Feizpour Yes, definitely. So, we are talking about models that are trained on generating text or images in the training stage. And then when you’re using them in inference, that is, when you’re using them for actual applications, they can do many different tasks. That’s what GPT stands for: a Generative Pre-trained Transformer is a model that is trained on predicting text. But if the model is large enough, all of a sudden it has this emergent phenomenon that it can do a bunch of other text-related tasks. For example, if I want to categorize tweets from Twitter or other types of documents, I have to explicitly train a traditional model to understand how to do that task. But with these generative models, I can just train the model on predicting text and set it up to do those tasks with almost no new examples. We could show it new examples to do a slightly better job, but even without any new examples, it can perform almost as well. The significance of generative models and the applications that they’re providing becomes more interesting when we get into the multimodal scenarios, with something like DALL-E and what Stable Diffusion did, where you can translate from text to images.
Amir Feizpour So it’s almost getting to the point where machines can have numerical representations of thoughts and ideas. You can type “an apple sitting on a table in a sunset view” and the model can understand all of these different entities and how they are related to each other, and then translate it into an image that represents the same idea. So that’s very significant progress that we have made. Our models can understand entities and their relationships in a few different modalities like text, voice, image, et cetera. The point of warning, though, given the conversation that we’re going to have further, is that we always have to remember the capabilities and limitations of these models. Right? So, this is trained on a lot of data using models that are at best very good correlation miners. And if we forget that fact and try to use them as anything beyond a very good memorizer, a very good synthesizer, that’s going to be problematic. We’ve seen a lot of people trying to get a lot of different types of reasoning out of ChatGPT, and they say, oh, it fails. Well, of course it does.
Amir Feizpour So I’m excited to see where it goes, but it is definitely significant progress, at least on the front of making sure that machines understand natural language and other types of data and represent them as objects that are transferable to other domains.
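A minimal sketch of the zero-shot versus few-shot idea Amir describes above: the same text-prediction model is steered toward a classification task purely through the prompt, with no task-specific training. The function name, prompt wording, and labels here are illustrative assumptions, not anything used by the speakers or by a particular product, and the sketch only builds the prompt text; sending it to a model is left out.

```python
from typing import Sequence


def build_classification_prompt(
    text: str,
    labels: Sequence[str],
    examples: Sequence[tuple[str, str]] = (),
) -> str:
    """Build a prompt asking a text-generation model to pick one label.

    With no examples this is a zero-shot prompt; passing a few
    (text, label) pairs turns it into a few-shot prompt.
    The wording is a hypothetical template, not a standard one.
    """
    lines = [f"Classify the text into one of: {', '.join(labels)}.", ""]
    # Optional worked examples shown to the model before the real input.
    for example_text, example_label in examples:
        lines += [f"Text: {example_text}", f"Label: {example_label}", ""]
    # The model is expected to continue the text after the final "Label:".
    lines += [f"Text: {text}", "Label:"]
    return "\n".join(lines)


LABELS = ["complaint", "praise", "question"]

zero_shot = build_classification_prompt("The new update keeps crashing.", LABELS)
few_shot = build_classification_prompt(
    "The new update keeps crashing.",
    LABELS,
    examples=[("Love the redesign!", "praise")],
)
print(few_shot)
```

The only difference between the two prompts is the handful of worked examples, which matches the point that extra examples help slightly but are not required.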
Hessie Jones Awesome. Thank you so much, Amir. So both of you understand machine learning and model development better than most, better than me, for sure. You’ve seen how we’ve evolved AI in the last couple of years. What is your opinion on the efficacy of either DALL-E or ChatGPT or whatever generative AI you have used? Patricia, do you want me to start with you?
Patricia Thaine Sure. In terms of efficacy, I mean, it’s very task dependent, right? It depends on what you want to do with it. If you, as Amir was saying, place unrealistic expectations of what it can do and put it in a task that it wasn’t trained for, it’s not going to do well. And you see that often with regards to training a model for a specific data set and then giving it another data set and saying, “Wait a second, it doesn’t work in the legal space when it was trained in the insurance space.”
Patricia Thaine It’s about understanding what the model’s capabilities are within the parameters that were set initially for its use cases. With regards to ChatGPT, for example, I play around with it when it comes to answering factual questions. If you ask for a source, most of the time it seems to do fairly well and gives you a source. I’m sure it fails at some points. If you ask it to give you something that no one’s thought of before, it might be something you’ve never thought of before, but it’s based on what humans have done, what humans have written. So, it’s not necessarily going to be creating original ideas.
Patricia Thaine It can help in the creative process. It can help with giving you the context that you need for learning, for example; it’s a wonderful learning tool. And it can help when you’re writing a blog post, for example, by giving you specific paragraphs about a topic. But it’s not going to curate that information for you when you have that creative idea. It is your job as a human to go in there and say, this is relevant, this is not relevant. This is how this part links to this one for the context that I’m writing, et cetera. So it can’t do everything for us. It is an influencer.
Hessie Jones Okay, that’s good to know. Amir.
Amir Feizpour Yeah, so definitely agree with what Patricia said. As I said earlier, the superpower of these models now is their natural language understanding. In case of GPT type of models, and even DALL-E, the interface is natural language. So what they’re really getting good at is understanding the content and relationships inside a task that you’re giving them and instruction that you’re providing. Obviously, if your instruction is too convoluted or way out of the data set, the training data set distribution, as Patricia said, of course it’s going to have a bit of a challenge. So in our case, we’ve been looking at GPT-type models for a while because of their multitask and natural language understanding capabilities. And for us, it is replacing a lot of interfaces that the user had with our product that were cumbersome. Now that these models have these capabilities, those can be replaced by just user typing what they’re looking for and then a conversation design that guides the user through constructing something that they’re trying to do even if it is a very complex task. But just to talk about the problem that we sort of pointed out a little more formally, what we’re talking about is generalization.
Amir Feizpour These models have seen a bunch of data. They’re trying to generalize to things that they have not seen, mostly based on correlation. A lot of the time, people who post on social media about failure modes of ChatGPT are giving it a reasoning task that, to some extent, it is able to do based on common-sense reasoning, if that type of correlation existed in the data sets. But it cannot do causal reasoning. It cannot do counterfactual reasoning, which is very important for generalization. That’s the whole foundation of science, right?
Amir Feizpour We do a lot of counterfactual reasoning. That’s why we can generalize. And these models cannot do that yet. Therefore, they have that failure mode. So a lot of the time, when you ask them things and you expect factual results that are not just information retrieval from the Internet, that is, when they actually have to reason, they fail. That’s a very important failure mode.
Hessie Jones Okay, so if I were to translate that: I was listening to a friend of mine who actually asked ChatGPT, where did you get your reference source? And it couldn’t do that. So what you’re saying is that it can’t answer why right now, right? That’s, I guess, in layman’s terms, the causal reasoning that we’re talking about.
Amir Feizpour It can answer why if something explicit, or very close to explicit, has appeared in the training data set. Imagine an article that explains an idea, provides a bunch of resources, et cetera. If that existed in the data set and you ask it, it will regurgitate that or a rephrasing of that. But if you ask something that requires information from that article, along with a few other articles, combined and structured in a logical way to get to a conclusion, then it cannot do that.
Hessie Jones Okay. Got it. So, Patricia, you work in the privacy sector and you understand a lot of the negative impacts when it comes to data surveillance, data sharing, data breaches. Your company is actually trying to help minimize a lot of the impacts from these events. So what’s your take on the data sets that are being used to feed models like ChatGPT, like DALL-E, and what does it mean for your business and the client set that you serve?
Patricia Thaine Yeah, great question, Hessie. So when it comes to the original data set that ChatGPT was trained on, it’s web scrapes, in large part. There’s a big question about whether, in a lot of cases, you can tell whether a piece of information was produced by a European citizen, for instance. And if it was, and it contains their personal information, then there is liability: you have to comply with the GDPR if you’re using that information. Now, you can ask ChatGPT what they do with personally identifiable information, which means either things that directly identify you, like names, credit card numbers, Social Security numbers, and exact addresses, or things that indirectly identify you, for example, religion, approximate location, and so on. Supposedly, for personally identifiable information, they remove that from the training data and train the model without it. There’s also the big question of the IP of the data that’s being used for training; that’s one thing that we could get into. But also, that model is going to be fine-tuned with data that companies or individuals are sending them. And if you ask ChatGPT a bit more about what they do with PII, it’s really about every business, every person sending that information.
Patricia Thaine It’s up to them to take care of the personally identifiable information within their data. So if you are just using ChatGPT for your business, for customer service, for example, you could be breaching PCI compliance, breaching HIPAA compliance, breaching GDPR compliance. The safest thing to do when you are using it is to remove the personal information in the first place and replace it with fake personal information. And that’s where Private AI can come in and help.
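A rough sketch of the "remove the personal information before sending it" step Patricia describes. This is not Private AI's method: it assumes a few deliberately simple, hypothetical regular expressions and swaps matches for bracketed placeholders rather than realistic fake values. Real PII detection has to handle names, addresses, indirect identifiers, and many formats that patterns like these will miss.

```python
import re

# Hypothetical, intentionally simple patterns for illustration only.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    # 13 to 16 digits, optionally separated by single spaces or hyphens.
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
    # US Social Security number written as 123-45-6789.
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace each matched identifier with a placeholder such as [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


message = (
    "Refund card 4111 1111 1111 1111 for jane.doe@example.com, "
    "SSN 123-45-6789."
)
print(redact(message))
```

The redacted text, not the original, is what would then be passed to a third-party model, so the identifiers never leave the business in the first place.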
Hessie Jones Okay, thank you. When I think about this as well, you’re feeding in your data, which is probably, I guess, devoid of any PII, but it’s also being added to an existing data set that may not necessarily have had the same thing done to it. So you’re still at risk in a way.
Patricia Thaine Yes. Also, if you’re using these models, which may contain personally identifiable information from other companies, the question is, will these models start generating it back to you? And suddenly you’re liable for that PII (Personally Identifiable Information). That’s a big question mark in the legal space. I don’t know if we have the answer to that yet. But it seems like OpenAI is taking some precautions regarding personally identifiable information, hopefully with regards to fine-tuning the model as well. But there’s still the requirement of being able to see where your customers’ personally identifiable information is stored and what is happening to it, and of being able to handle access-to-information requests and the right to be forgotten. And when you’re dealing with a third party like OpenAI, who knows what will happen if you ask for your customers’ data back?
Hessie Jones Okay, perfect. Thank you. So, Amir, there’s also a large unknown about the data sets themselves and whether, despite the massive amounts of data points that they’re collecting, there are still biased tendencies in what they output. How do you respond to that?
Amir Feizpour So bias is not necessarily a bad thing, right? The thing that has made ChatGPT so good, and the back end of it, InstructGPT, is essentially biasing the model to produce results that are more acceptable to a human agent. That’s literally what they did, compared to one of the earlier versions that was just spitting out whatever was on the Internet. They essentially sat down a bunch of human annotators and said: ask it questions, rank the answers that are coming out, and make sure harmful things are not in it, and so on. There was a set of guidelines that they were following, like: if it’s spitting out people’s PIN numbers, mark it as bad, et cetera.
Amir Feizpour So essentially, we have biased the output of the model towards things that are acceptable by human standards and in our society’s standards. So, the bias exists in the data, and these models are just mining it and spitting it out. So the thing that I’m most worried about is how much do we understand the origins of the bias that exist in our text and what control parameters do we have around handling them?
Amir Feizpour For example, OpenAI has spent a lot of time and money; there was some news about them hiring African annotators at scale just to create these annotation data sets. They’ve done all of these things, and they have produced something that is much better than what it was. But I have friends who are very familiar with how these systems work. Importantly, they work based on what we call prompting. A prompt is essentially a text that you write that prompts the system to produce an answer. And if you know the underlying data sets and the inner workings of these models, you can construct your prompts to get around all of these safeguards, because ultimately all of these are probabilistic. Probabilistically, it is more likely for them to produce a non-harmful result. But if you’re malicious and you know what you’re doing, you can write prompts that increase the likelihood of getting out information like PII, et cetera. So there are a lot of interesting questions. The bias in the data is a given; it exists. So the questions are: can we understand the mechanics of that, and can we create better safeguards around it?
Hessie Jones Thank you. So let’s talk about the bad stuff, because I read this thing. Patricia, I’ll direct this to you. Researchers actually asked ChatGPT how it could be abused, and this is what it said: “AI technology can create convincing phishing emails, social media posts, and trick people into giving away personal information, or click on malicious links to create video and audio that can be used for misinformation.” So it’s actually already saying, hey, use me and abuse me. This is what I’m here for, right?
Hessie Jones So when we have technologies like these, how do we start to control the bad stuff emanating from them? Especially when we know that legislation doesn’t move as fast as technology?
Patricia Thaine That’s a really interesting question. As far as I know, one of the reasons why OpenAI collects the data, aside from getting more training data for their models, is to see how it’s being misused and then help the model get better at dealing with that misuse. When it comes to technology companies and the responsibility of dealing with misuse of their technology, and how that relates to regulation, that’s something that comes up over and over again when we’re dealing with technologies that are being used in ways that we don’t expect. Right?
Patricia Thaine If we think about Facebook or Twitter and the use of misinformation to change people’s minds when it comes to political leanings, that is something that regulators put a very heavy emphasis on tech companies to deal with. But tech companies themselves, even if they’re trying their best, don’t necessarily know how to deal with it. Regulators don’t necessarily know how to deal with it. So it’s very much a conversation between both to try to see what regulations need to be created, which is such a heavy, huge question. And there will always be people who are grumpy about the regulations because they’ll say that it prevents innovation, for example when it comes to privacy, when in fact it can do the opposite.
Patricia Thaine And I think that because we don’t necessarily understand all of the ways that we can prevent this, it’s really great to see how much research is going into it. It’s important to keep putting money into that kind of research in academia and in industry. And ultimately, we always have to adapt. The good thing is that people are willing to try things.
Hessie Jones Thank you. Okay, so let’s talk about the future and mainly the implications for humanity. There was this quote I read from an educator who had concerns about the threat to human creativity, and they said, “ChatGPT will be brutal in classrooms where writing is assigned rather than taught.” So we now have the ability to develop output without the need for things that we’ve used in the past, like references. I don’t know how many people go to libraries anymore. Do you? Maybe even without the need for search, or for professional subject matter experts who are really good at these things. The other thing I read is that ChatGPT will not replace the root motive for writing, our human capacity for questioning, because excellent writing starts with questions, and it’s our hope that pervasive AI moves us away from teacher-created prompts towards student inquiries. So now, we know humans are very different from machines. We’re more analog, right? We’re messy, we’re imaginative. We don’t necessarily have standards, because we’re always developing our personalities and our behaviors. So does accepting ChatGPT results mean that we now prioritize outcome over process?
Hessie Jones I’ll throw that to you, Amir.
Amir Feizpour I do have a bit of a love-hate relationship with questions around education and learning and knowledge because I have a lot of problems with how it is done today. I’ve been spending the past few years thinking about tools for thinking: essentially, how we can enhance our ability to think more effectively, in an academic context, as you say, in a writing context, in education, and in a lot of other contexts. But that’s definitely a very major question that is still evolving, and we are still trying to find an answer to it. So when you look at the problem from that point of view, ChatGPT is yet another tool in that toolbox of things that we have, and through them we think. Writing is one outcome of that process, but there are many other intellectual processes that you can imagine, like generating art, doing science, building products in industry, et cetera. So, when you think about those intellectual processes from a very fundamental point of view, they usually have three major components. Usually, we use techniques to find some information that we have, or to structure it in a particular way.
Amir Feizpour We use the knowledge that we have to put things in context, for example. And then we use creativity to synthesize it into new ideas and new thoughts. So I will give you an example. Large language models are much better at mapping out syntax than semantics, and better at semantics than common-sense reasoning. The reason is that syntax is a more repeatable pattern. They’re very good at picking that up as a technique that is used by people. There is research that shows that that’s the case: syntax is easier to pick up for these models than semantics, which is easier than common-sense reasoning, as we extensively talked about. So now if you think about the process of thinking and creation, it starts with discovery. We use tools and techniques that we have to find pieces of information. Then usually we re-contextualize that information to apply it to a particular new domain, like this podcast. We have collected information from different points of view. Now we are re-contextualizing it in the context of this conversation, and then eventually there is the synthesizing process of putting it all together to say these half-coherent sentences that I’m providing right now. So, what I’m hoping will happen, and what I think will happen in the short term, is that tools like ChatGPT will remove the need to do the manual technical pieces, like the techniques that we are using, for example, to write a piece of text that is technically correct.
Amir Feizpour It can just completely remove the necessity for that. And it can help us be better and more efficient at finding knowledge that is relevant for re-contextualization. But the creativity part, I think, is still going to largely remain with us. People are talking about how DALL-E is creating creative images, but all of that creativity is really in the prompt that is given to the model. People are imagining creative things, they’re typing them in, and it’s just literally repeating what you did. So, it is good at getting the technique right, like the lighting and all of those things in the image, but the creativity is still coming from the human. So, I think that process is going to continue. The techniques are going to be more efficient, and finding knowledge is going to be much more efficient as well. But creativity is still going to take a while to get to a point where it can replace human agents.
Hessie Jones Okay, thank you. This last question, which will go to both of you, actually expands on what you just said. There is this one student who was trying to help teachers identify whether students had used ChatGPT, to see whether they could surface the cheaters in the classroom. His tool analyzed the output, and his explanation was pretty intuitive. He said, we look at the text’s perplexity, which measures its randomness. And he said that human-written texts tend to be a lot more unpredictable than bot-produced work. So, humans are complex beings. I don’t think at any point in time, at least in the near future, we will become deterministic like machines are. People will start to understand human behavior and patterns. So, are we ready, from your perspective, to acquiesce to the machines at some point? Probably not today, but maybe sometime in the future. What do you both think? Patricia, I’ll start with you.
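To make the perplexity idea concrete, here is a toy sketch. Perplexity is the exponential of the average negative log-probability a language model assigns to each token, so text the model finds predictable scores low and surprising text scores high. The sketch assumes a tiny add-one-smoothed unigram model built from a made-up corpus; an actual detector would score text with a large language model's token probabilities, and the student's tool is not reproduced here.

```python
import math
from collections import Counter
from typing import Callable, Sequence


def train_unigram(corpus_tokens: Sequence[str]) -> Callable[[str], float]:
    """Return a function giving an add-one-smoothed unigram probability."""
    counts = Counter(corpus_tokens)
    total = len(corpus_tokens)
    vocab_size = len(counts) + 1  # one extra slot shared by unseen tokens

    def prob(token: str) -> float:
        return (counts[token] + 1) / (total + vocab_size)

    return prob


def perplexity(tokens: Sequence[str], prob: Callable[[str], float]) -> float:
    """exp of the average negative log-probability per token."""
    log_prob_sum = sum(math.log(prob(token)) for token in tokens)
    return math.exp(-log_prob_sum / len(tokens))


# Made-up training text for the toy model.
corpus = "the cat sat on the mat the cat sat on the rug".split()
model = train_unigram(corpus)

predictable = "the cat sat on the mat".split()
surprising = "quantum marmalade juggles the mat".split()

print(perplexity(predictable, model))  # lower: the model has seen these words
print(perplexity(surprising, model))   # higher: mostly unseen words
```

A detector built on this idea flags text whose perplexity under the scoring model is unusually low, which is why, as discussed later in the conversation, the signal weakens when the generating model changes or is tuned to a particular writing style.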
Patricia Thaine Sounds good. Humans still have to do a certain amount of data curation, as I mentioned before. When it comes to that model and determining perplexity, I wonder whether it would still work if someone fine-tuned the model to their own style of writing. I’m thinking it won’t. And I think we just have to have different standards at some point as to what to expect from students when it comes to having something to aid you in writing. If you take an entire quote from ChatGPT, for example, you should probably say it came from ChatGPT. But whether or not you should be penalized for that depends on the point of the exercise. If the point of the exercise is to write about a particular technology trend, for instance, and ChatGPT helps you out with that, I’m not convinced you have to be penalized for that, as long as it’s your creative ideas, and as long as you are reading and learning the flow. You still have to write the introduction. You still have to figure out what the logic is behind it. Where I think people might be surprised is that they might use ChatGPT expecting it to produce an entire essay, for example, without yet knowing what kind of quality an essay has to have, because they’re fairly young when they’re probably doing this. Not having gone through that feedback process, they might think it’s fine, then get a bad mark as a result, and eventually learn that you have to do your own curation.
Patricia Thaine You have to do the writing of certain aspects of it. But I’ve hired content writers that seem to do a good job. And then I look at some of the posts and it looks like it’s generated by ChatGPT, and that I have to hire them, if you will. And I think that there’s a lot out there that could benefit from enhancement, but also a lot out there that humans are already not very good at and this isn’t going to make them particularly better. The ones who are actually good at the tasks are still going to stand out exceptionally well.
Hessie Jones Perfect. Thank you. Okay, and Amir. Last word.
Amir Feizpour I take this as an opportunity for us to reflect on why we learn, how we evaluate learning, and where learning actually happens. I don’t need to know today how to go find a book in a physical library anymore, right? Because I can just pull up my phone and literally tell the assistant to give me some piece of information.
Amir Feizpour How does learning happen? Where does it happen, and how do we evaluate if it is actually useful? I think those are very valid questions that we have to ask. If collecting a bunch of things and writing an essay is the way we’re teaching our kids how to do things, that’s cumbersome and it’s already automated. Can we not think of a better way to do this? And on the topic of whether we can detect AI-generated information: we can use measures like perplexity, as you said, but that’s a moving target because the models are going to get better. Even OpenAI is working on a cryptographic signature to embed in the generated text so that machines can automatically detect that it was machine-generated. But if you’re malicious enough, you’re going to figure out how to get around it.
Amir Feizpour I think this is an opportunity to fundamentally think how we think and improve that.
Hessie Jones Thank you so much. I actually think that we’re at a point now where it looks like humans will be useful at least for the next five years, before the next generative algorithm comes out. So thank you both for coming today. That’s all we have time for.
Hessie Jones If you within our audience have suggestions or topics you want to cover, please contact us at [email protected]. By the way, we’re also accepting applications for both our winter programs in investor readiness as well as incubator. So please check out our website at altitudeaccelerator.ca. Join us next week when we actually tackle the topic on data privacy. So in the meantime, everyone have fun and stay safe.
Amir Feizpour Thanks for having us.
Patricia Thaine Thank you very much.