AI Systems Developer’s IP Checklist – Part 1: Leveraging External Data for Training and Building AI Solutions

Building an AI system involves more than just algorithms, code, and data; it also entails significant ethical and legal responsibilities. Ensuring that data is not stolen or used without permission is crucial, as improper use can lead to serious consequences, including legal repercussions and reputational damage.

### The Importance of Data

Data serves as the foundation for AI systems, acting as the training material for machine learning models. Regardless of the methodology—be it supervised, unsupervised, or deep learning—large amounts of training data are essential. This data can come from various sources, including open repositories, proprietary databases, web scraping, user-generated content, and licensed datasets. Understanding data ownership is vital since such data is often protected by copyright, which dictates how it can be used.

### Licensing and Terms of Use

Before using any dataset, check its licensing terms. They state whether the data may be used for research, for commercial purposes, or both. Datasets such as DrugBank's drug database illustrate the importance of choosing sources with clear licensing terms: adhering to those terms helps avoid complications and supports responsible AI development.
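One way to make this check routine is to record each dataset's license alongside its source and compare it against a list the team has cleared for the intended use. The sketch below is illustrative only: `DatasetRecord`, `CLEARED_LICENSES`, and `is_cleared` are hypothetical names, and the allow-lists are assumptions rather than legal advice. Whether a given license actually permits a use depends on its full text.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical allow-lists: license identifiers a team has cleared for each
# intended use. The contents are illustrative assumptions, not legal advice.
CLEARED_LICENSES = {
    "research": {"CC0-1.0", "CC-BY-4.0", "CC-BY-NC-4.0"},
    "commercial": {"CC0-1.0", "CC-BY-4.0"},
}


@dataclass(frozen=True)
class DatasetRecord:
    """Minimal provenance record for one dataset (hypothetical structure)."""
    name: str
    source: str                 # e.g. "open repository", "web scraping"
    license_id: Optional[str]   # None when no license could be identified


def is_cleared(record: DatasetRecord, intended_use: str) -> bool:
    """Return True only if the dataset's license is on the cleared list
    for the intended use. A missing license is never treated as cleared."""
    if intended_use not in CLEARED_LICENSES:
        raise ValueError(f"Unknown intended use: {intended_use!r}")
    if record.license_id is None:
        return False
    return record.license_id in CLEARED_LICENSES[intended_use]


if __name__ == "__main__":
    images = DatasetRecord("example-images", "licensed dataset", "CC-BY-NC-4.0")
    scraped = DatasetRecord("example-scrape", "web scraping", None)
    for record in (images, scraped):
        for use in ("research", "commercial"):
            print(record.name, use, is_cleared(record, use))
```

The design choice here is to fail closed: a dataset with no identifiable license, or with a license that is not on the list, is excluded until someone reviews its terms.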

### Legal Considerations

Just because data is available online does not mean it is free to use. Unauthorized reproduction of copyrighted material can lead to lawsuits, as illustrated by the copyright infringement suit Getty Images filed in 2023, which alleges that Stability AI used millions of Getty's images to train its model without permission.

### Fair Use vs. Fair Dealing

Copyright law includes exceptions such as fair use (in the U.S.) and fair dealing (in Canada), which allow limited use of copyrighted material without permission. These exceptions are primarily intended for purposes like commentary, criticism, or research, and using data to train AI systems may not qualify. In a significant 2025 court ruling, Ross Intelligence was found to have violated copyright law by using Thomson Reuters’ legal summaries to train its AI tool. The court rejected the fair use argument, finding that the use was commercial rather than transformative and that it harmed the market for the original work.

### Alternatives for Copyright Compliance

Developers might consider alternative methods that do not rely on copyrighted data to create competitive products. For instance, AI systems could focus on enhancing copyrighted works without directly copying them, ensuring both high-quality outputs and compliance with legal standards.

### Conclusion

As AI development evolves, scrutiny regarding data acquisition and usage intensifies. Ignoring copyright and licensing obligations can lead to both legal risks and ethical quandaries. Companies committed to responsible data usage not only mitigate the risk of litigation but also foster trust and sustainability in their AI systems.

### Looking Ahead

The next part of this series, "Creating and Storing Data," will delve into safeguarding generated data and ensuring it lays a solid legal and ethical foundation for AI systems.

Authored by Allessia Chiappetta, a second-year JD candidate at Osgoode Hall Law School, the piece reflects her expertise in intellectual property and technology law. Chiappetta also contributes to Communitech’s ElevateIP initiative, advising on innovation and commercialization aspects of IP. She regularly writes on IP developments, showcasing her proficiency in this area.



VentureLAB
https://www.venturelab.ca/
ventureLAB is a leading global founder community for hardware technology and enterprise software companies in Canada. Our organization is led by seasoned entrepreneurs and business leaders with decades of industry experience in building IP-rich start-ups, scale-ups, and global multinationals to help you scale your business. Located at the heart of Ontario’s innovation corridor in York Region, ventureLAB is part of one of the biggest and most diverse tech communities in Canada. We enable technology startups to accelerate the commercialization of transformational products on a global scale.
