Sarah Lin is currently the Senior Information & Content Architect at MongoDB, a NoSQL database software company. Before joining MongoDB, she managed the Enterprise Information Management team at Posit (formerly RStudio), a data science software company. She has also worked as a bindery clerk, serials librarian, indexer, technical services librarian, and content manager in academic, medical, legal, and corporate libraries. Sarah believes that data literacy is the key to the future and that librarians would do well to learn programming and data science skills to better serve their patrons and colleagues and to advance their careers.
MS in Library & Information Science, 2006
University of Illinois at Urbana-Champaign
BA in African/African-American Studies & Anthropology, 2003
University of Chicago
Hands-On Data Science for Librarians (forthcoming 2023) is a guide to doing data science in R written specifically for library and information professionals. Librarians understand the need to store, use, and analyze data about their collections, patrons, and institutions, and over the last 10 years there has been consistent interest within the profession in improving data management, analysis, and visualization skills. However, librarians find it difficult to move from out-of-the-box proprietary software applications to the coding skills needed to perform the full range of data science tasks. This book will teach R through relevant examples and the skills librarians need in their day-to-day work: visualization, but also web scraping, working with maps, creating interactive reports, machine learning, and more. While there's a place for theory, ethics, and statistical methods, librarians need a tool that helps them gain enough facility with R to use data science skills on the job, whatever type of library they work in (academic, public, or special). By explaining each skill and its application to library work before walking the reader through each line of code, this book will support librarians who want to apply data science in their daily work. Hands-On Data Science for Librarians is intended for librarians (and other information professionals) in any type of library, as well as graduate students in library and information science (LIS).
Data science brings opportunities to work with data more quickly and easily. It provides better reporting formats by incorporating outside data from various sources and can even turn text into data that can be displayed visually. Even though legal information isn't always associated with data, science, or data science, data science skills enable law librarians to do their jobs more efficiently. With data science skills, we can show new value to our teams and organizations, so the time invested is well worth it. The following 10 data science skills and techniques, along with descriptions of the amazing deliverables associated with them, are listed in a progressive skill-building sequence.
The distribution of scholarly content today happens in the context of an immense deluge of information on the internet. As a result, researchers face serious challenges when archiving and finding information related to their work. Library science principles provide a framework for navigating information ecosystems that can help researchers improve the findability of their professional output. Here, we describe the information ecosystem, which consists of users, context, and content, all three of which must be addressed to make information findable and usable. We provide a set of tips to help researchers evaluate who their users are, how to archive their research outputs to encourage findability, and how to leverage structural elements of software to make information easier to find within and beyond their publications. As scholars evaluate their research communication strategies, they can use these steps to improve how their research is discovered and reused.
Soon after starting a new position as a librarian at a data science software company, I saw that my employer was offering a workshop to learn how to do machine learning in the R programming language and I jumped at the chance to learn more about the subject. With support from my boss, I struggled through October and November refreshing my linear regression knowledge (knowledge I’d happily left behind in high school) and bringing my coding skills from near zero to “won’t be embarrassed in front of my colleagues.”