Exploring the World of Data Science: A Primer for Librarians
by Larissa Pack
Posted On September 13, 2022
Data science is a concept that continues to gain popularity in mainstream media. It often comes up in discussions of AI, machine learning, data analytics, predictive analytics, or other related terms. Whether it is the recommended shows on your Netflix account, the creation of digital faces that are indistinguishable from those of real human beings, or even the candidacy of a data scientist in a recent U.S. election, data science is continually revolutionizing our world.

Data science is a combination of mathematics, programming, and the scientific process. Specialized blocks of code are developed to run large amounts of data through mathematical processes to find notable trends, answer complex questions, or develop solutions to a wide range of problems. Applications for data science may vary widely, but any business, governmental agency, or other institution can use data science to find quantitatively determined opportunities for growth and efficiency.

How Data Science Answers Tough Questions

Data science begins with a question. Regardless of whether the question is curious (e.g., “Can you tell the difference between a goldendoodle puppy and a piece of fried chicken?”) or complicated (e.g., “Can I use AI to determine if cancer exists in an image from a patient?”), the goal is to create a solution that is accurate, repeatable, and timely.

Once the question has been determined, a data scientist begins a multistep process to create the necessary solution. The first step in this process is to gather a large amount of data. For some questions, data has already been collected for others to use. However, other questions require data scientists to collect data through surveys or experiments or to “scrape” data from websites when allowed.

The collected data must be made usable before any solutions can be created. A significant portion of the world’s data is unstructured. Unstructured data, such as video and audio files, is data that is not stored in a traditional database format and requires much more manipulation to become usable. Even in structured data, duplicate and other erroneous information needs to be removed.

Cleaning data often requires specific scripts to remove unnecessary values. Common programming languages that are used in data science to write scripts include Python and R. These programming languages are usually run in a modular format through environments such as Jupyter Notebooks. This allows data scientists to work in an incremental process as well as quickly view data as cleaning occurs.
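
As a minimal sketch of what such a cleaning script might look like, the Python example below removes duplicate rows and rows with missing or non-numeric values from a small table of checkout records. The column names (`title`, `checkouts`), the function name `clean_rows`, and the sample data are all hypothetical, chosen only for illustration. The sketch uses just the Python standard library; real projects often rely on a dedicated data library instead.

```python
import csv
import io

# Hypothetical raw data: one exact duplicate, one missing value, one bad value.
RAW_CSV = """title,checkouts
Dune,12
Dune,12
Beloved,
Emma,seven
Ulysses,4
"""


def clean_rows(csv_text):
    """Return rows with duplicates and unusable checkout counts removed."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        title = (row["title"] or "").strip()
        count = (row["checkouts"] or "").strip()
        # Skip rows whose title is missing or whose count is not a whole number.
        if not title or not count.isdigit():
            continue
        key = (title, count)
        # Skip rows that exactly repeat an earlier row.
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"title": title, "checkouts": int(count)})
    return cleaned


cleaned = clean_rows(RAW_CSV)
for row in cleaned:
    print(row["title"], row["checkouts"])
```

Run cell by cell in a notebook, a script like this lets you inspect the table after each rule is applied, which is the incremental workflow described above.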

I Have Data—What’s Next?

After the data has been collected and cleaned, data scientists begin exploring it for any noticeable trends through visualization. Data visualizations such as graphs can be created directly within the data scientist’s programming environment. These visualizations give data scientists the initial leads on how to build a solution for the original question. For example, if a data scientist at an ice cream company were asked in which month the most ice cream was sold, a line chart of ice cream sales over the last few years might show that July had the highest sales volume. Data scientists may also build their data visualizations in dedicated software such as Tableau or Microsoft Power BI because these applications allow users to interact with data dynamically in a much more user-friendly way.
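
To make the ice cream example concrete, here is a rough Python sketch that totals sales by month and prints a simple text chart. The sales figures, the month labels, and the function names are invented for illustration, and the text chart is only a stand-in for the line chart a plotting library or a tool such as Tableau would draw.

```python
from collections import defaultdict

# Hypothetical (year, month, units sold) records for an ice cream company.
SALES = [
    (2020, "May", 310), (2020, "Jun", 420), (2020, "Jul", 510), (2020, "Aug", 470),
    (2021, "May", 330), (2021, "Jun", 450), (2021, "Jul", 540), (2021, "Aug", 480),
]


def totals_by_month(records):
    """Add up units sold for each month across all years."""
    totals = defaultdict(int)
    for _year, month, units in records:
        totals[month] += units
    return dict(totals)


def text_bar_chart(totals, units_per_mark=100):
    """Draw one row per month, with one '#' per `units_per_mark` units sold."""
    lines = []
    for month, total in totals.items():
        bar = "#" * (total // units_per_mark)
        lines.append(f"{month:<4}{bar} {total}")
    return "\n".join(lines)


totals = totals_by_month(SALES)
print(text_bar_chart(totals))

# The month with the largest total is the first lead for the original question.
best_month = max(totals, key=totals.get)
print("Highest sales volume:", best_month)
```

With these made-up numbers, the longest bar belongs to July, which mirrors the kind of quick visual answer described above.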

Depending on the question, the data scientist may discover the necessary solution once the data visualizations have been made. However, complex questions often require more thorough analyses. If the ice cream company had instead asked, “Why does mint chocolate chip sell more than vanilla?” there could be several factors involved in why this would occur. An even more complicated question, such as “Can we predict which flavor will sell the most next year?” is often the starting point for many data science projects.

To answer these questions, data scientists can use Python and R to derive new data, examine how different factors interact with each other, or apply specific mathematical procedures (or algorithms) to the data. By utilizing these algorithms, the data scientist can build scripts that allow the computer to “learn” how to use the data in a way that reveals useful insights (a process known as machine learning). Ultimately, data scientists could answer these complex questions by producing accurate forecasts, building AI systems, or arriving at other solutions built on the backbone of the machine learning process.
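
One of the simplest algorithms of this kind is a straight-line (least squares) fit, which can be used to sketch an answer to the flavor forecasting question. The Python example below is only an illustration under strong assumptions: the yearly sales figures are invented, the names `fit_line` and `forecast` are hypothetical, and a real project would weigh many more factors and would likely use a dedicated machine learning library.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical yearly sales (in thousands of units) for two flavors.
YEARS = [2019, 2020, 2021, 2022]
SALES_BY_FLAVOR = {
    "vanilla": [100, 110, 120, 130],
    "mint chocolate chip": [90, 110, 130, 150],
}


def forecast(flavor_sales, years, target_year):
    """Extend each flavor's fitted trend line out to `target_year`."""
    predictions = {}
    for flavor, sales in flavor_sales.items():
        slope, intercept = fit_line(years, sales)
        predictions[flavor] = slope * target_year + intercept
    return predictions


predictions = forecast(SALES_BY_FLAVOR, YEARS, 2023)
top_flavor = max(predictions, key=predictions.get)
for flavor, value in predictions.items():
    print(f"{flavor}: {value:.0f}")
print("Predicted top seller:", top_flavor)
```

The script "learns" only two numbers per flavor (a slope and an intercept) from past data, but the pattern of fitting a model to history and then asking it about the future is the same one that more elaborate machine learning methods follow.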

Data Science for Any Information Professional

While some information services have started offering more robust analytics, any information professional can freely harness the power of data science for their specific use case. Questions an information professional might ask include, “What resources are used across our institution the most?” or “Can I create an AI-driven chatbot to help users navigate our website?” The solutions that information professionals create are limited only by the imagination (and, of course, the data).
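
The first of those questions can be approached with very little code. The Python sketch below counts how often each resource appears in a usage log and reports the most used ones. The log entries and the function name `most_used` are hypothetical; an actual analysis would read from whatever usage reports your institution collects.

```python
from collections import Counter

# Hypothetical usage log: one entry per recorded use of a resource.
USAGE_LOG = [
    "journal database",
    "ebook platform",
    "journal database",
    "library catalog",
    "ebook platform",
    "journal database",
]


def most_used(log, n=2):
    """Return the n most frequently used resources with their counts."""
    return Counter(log).most_common(n)


top_resources = most_used(USAGE_LOG)
for name, uses in top_resources:
    print(f"{name}: {uses} uses")
```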

If you would like to begin your data science journey, consider utilizing both paid courses and free online media. Sites such as Mendeley Data and the Registry of Open Data on AWS contain free curated datasets for practicing data science concepts. Python, R, and Jupyter Notebooks are also open source, meaning these programming languages and environments are free to both use and modify in whatever way you need. Data scientists often share the code for their projects on GitHub or troubleshoot with other data scientists across a range of online forums.

Once you have completed a few projects, you can volunteer your newfound skills through social impact communities, such as Data Science for Social Good. You can even check out competitions on Kaggle, a site that allows data scientists to compete, sometimes even for cash prizes.

Other Resources

Here are some books, YouTube channels, and websites for those looking to learn more about data science.

Books

Automate the Boring Stuff With Python: Practical Programming for Total Beginners by Al Sweigart

An Introduction to Statistical Learning With Applications in R, Second Edition by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani

Storytelling With Data by Cole Nussbaumer Knaflic

The Signal and the Noise: Why So Many Predictions Fail—But Some Don’t by Nate Silver

Algorithms of Oppression: How Search Engines Reinforce Racism by Safiya Umoja Noble

Race After Technology: Abolitionist Tools for the New Jim Code by Ruha Benjamin

Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor by Virginia Eubanks

Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy by Cathy O’Neil

YouTube

3Blue1Brown (complex mathematics)

Guy in a Cube (Microsoft Power BI software)

Websites

Towards Data Science (a Medium publication featuring concepts, ideas, and code)

Data Science Central (a community for data science practitioners)


Larissa Pack is a freelance science and medical writer. She gained experience in both the biotechnology field and academia while working on her graduate degree in bioengineering. Her email address is lmpack01@outlook.com.
