By: Loren Brent Cobb, Boeing, with Anthony Dulay, Boeing; Jarred White, VMware; and Souheil Moghnie, NortonLifeLock

This is the second post in a series on the Developer’s Role in Personal Data Privacy. Part One, available here, outlines the role of data privacy practices in a secure software development program.

As our data-driven digital culture has grown, so too have concerns over its implications for personal privacy. Efforts to protect data privacy encompass a complex mix of regulatory, cultural, and technology practices. These include the need to address privacy as early as possible in the development lifecycle, which arguably places software developers on the front lines of data privacy protection.

As SAFECode continues to explore the role of data privacy in a secure software development program, we’d like to start by thinking about how and why software collects data. We are so used to gathering and sharing data today that this may seem obvious or unnecessary. But stepping back to deliberately consider what data you need to collect and how you will use it is the first opportunity to act to protect user privacy. To offer some guidance, we suggest three simple data collection rules for developers to live by: be transparent, collect only what you need, and don’t forget the logs.

Be Transparent

Applications generally come in five “flavors,” and most of us interact with at least a few of them, if not all, on a daily basis. What do they all have in common? They all collect data about us.

  • Generalized social media, sharing, and proximity-person-aware applications
  • Subscription, purchased, and productivity or capability-enhancement applications
  • Institution or consumer-relationship/enablement applications
  • Commerce-enablement applications
  • Free games, utilities, or other single-user interactive applications

There is nothing inherently wrong with collecting user data. Your rideshare driver needs to know where to pick you up. Your GPS needs to provide location data to your rideshare application. You need to pay for your ride to the stadium, the concert, or wherever you want to go. Your banking application needs to trust your assertion that the rideshare charge is legitimate and intentional. From a design perspective, these applications are more capable when they can create, consume, copy, and connect data. It is even a fair and broadly accepted value proposition to monetize an application’s data telemetry in exchange for its use, provided the application is transparent about how that data is collected and used.

Transparency is critical. Developers must tell users which of their data is collected and how it is used, as well as how they can manage what they share or opt out altogether.

Rule #1: Be transparent and clear about how and why you are gathering data, what data has already been collected, and what options the user has to manage their own data. The individual ought to be able to see what type of data is collected, consent to having data gathered, and be able to erase their own information.

For example, before installing a “Free Game,” a user might see the following message: “To provide minimum user functionality, ‘Free Game’ needs access to your Wi-Fi settings, telephone and contacts, system logs, and GPS data. Press [OK] to continue with install…”

This information is typically communicated in the privacy policy, but getting it right takes more than policy writing. Developers must give these issues careful consideration to ensure that the privacy policy accurately and clearly reflects how the application actually works. That means developers themselves must clearly understand what data the application needs to collect and how it will use and protect that data.
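Rule #1 asks that users be able to see what has been collected, consent to its collection, and erase it. A minimal in-memory sketch of those three capabilities might look like the following. The class and method names (`UserDataStore`, `grant_consent`, `collect`, `export`, `erase`) are hypothetical, and the sketch assumes data is grouped by the purpose the user consented to.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set


@dataclass
class UserRecord:
    # Hypothetical structure: personal data grouped by the purpose it was collected for.
    data: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    consents: Set[str] = field(default_factory=set)


class UserDataStore:
    """In-memory sketch of Rule #1: users can see, consent to, and erase their data."""

    def __init__(self) -> None:
        self._records: Dict[str, UserRecord] = {}

    def grant_consent(self, user_id: str, purpose: str) -> None:
        self._records.setdefault(user_id, UserRecord()).consents.add(purpose)

    def revoke_consent(self, user_id: str, purpose: str) -> None:
        # Withdrawing consent also drops the data gathered for that purpose.
        record = self._records.get(user_id)
        if record is not None:
            record.consents.discard(purpose)
            record.data.pop(purpose, None)

    def collect(self, user_id: str, purpose: str, name: str, value: Any) -> None:
        # Refuse to store anything the user has not consented to.
        record = self._records.get(user_id)
        if record is None or purpose not in record.consents:
            raise PermissionError(f"no consent from {user_id!r} for {purpose!r}")
        record.data.setdefault(purpose, {})[name] = value

    def export(self, user_id: str) -> Dict[str, Dict[str, Any]]:
        # Show the user exactly what has been collected, grouped by purpose.
        record = self._records.get(user_id)
        if record is None:
            return {}
        return {purpose: dict(fields) for purpose, fields in record.data.items()}

    def erase(self, user_id: str) -> None:
        # Remove everything held about the user, including consent records.
        self._records.pop(user_id, None)


store = UserDataStore()
store.grant_consent("u1", "weather")
store.collect("u1", "weather", "zip_code", "98101")
print(store.export("u1"))  # {'weather': {'zip_code': '98101'}}
store.erase("u1")
print(store.export("u1"))  # {}
```

Tying every stored value to a consented purpose makes “what has been collected” and “delete it” straightforward to answer. In a real system, erasure would also need to reach backups, replicas, caches, and logs.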

Collect Only What You Need

There is no doubt that data has value. So, at first glance, it may seem to make sense to gather all the data you are capable of collecting. However, as users become more sophisticated in their understanding of data privacy and governments become more involved in regulating data handling, developers are likely to find that collecting more data than they need introduces unnecessary business and reputational risk and can be costly.

As such, developers should answer two questions: 1) What is the minimum data the application needs to gather and store to function as intended and/or achieve its business case? 2) Is there any benefit to gathering data beyond that minimum, and if so, does its value outweigh the potential risks and costs? Costs to consider range from user dissatisfaction and reputational damage to more tangible costs, such as the need to meet increased security and regulatory requirements.

Rule #2: When determining what data to collect, think of it like least privilege in cybersecurity. First, understand the minimum data that needs to be gathered and stored for the application to function as intended. Second, understand how long the data needs to exist, and have the functionality to completely eradicate it when requested or when the data reaches its end of life. Collecting and storing any data beyond these parameters should be carefully weighed against the potential risks and costs to ensure it delivers real business value.

For example, suppose we offer a weather application. Do we have to gather geolocation data? Yes, because to give our users weather updates for where they live, we need to know at least their zip code. But do we also need to know their exact address?
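The weather example and both halves of Rule #2 (minimum data, limited lifetime) can be sketched as a filter applied before anything is stored. This is a minimal sketch under stated assumptions: the allow-list `ALLOWED_FIELDS`, the one-year `RETENTION` period, and the function names `minimize` and `purge_expired` are hypothetical choices for illustration, not recommendations.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List

# Hypothetical policy for a weather feature: only the 5-digit ZIP code is needed.
ALLOWED_FIELDS = {"zip_code"}
RETENTION = timedelta(days=365)  # assumed retention period, for illustration only


def minimize(raw: Dict[str, Any], now: datetime) -> Dict[str, Any]:
    """Keep only allow-listed fields and stamp an end-of-life date on the record."""
    kept = {k: v for k, v in raw.items() if k in ALLOWED_FIELDS}
    if "zip_code" in kept:
        # Drop a ZIP+4 extension; five digits are enough for a local forecast.
        kept["zip_code"] = str(kept["zip_code"])[:5]
    kept["expires_at"] = now + RETENTION
    return kept


def purge_expired(records: List[Dict[str, Any]], now: datetime) -> List[Dict[str, Any]]:
    """Return only the records that have not yet reached their end of life."""
    return [r for r in records if r["expires_at"] > now]


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
signup = {"name": "Ada", "street": "1 Main St", "zip_code": "98101-1234",
          "lat": 47.61, "lon": -122.33}
record = minimize(signup, now)
print(record["zip_code"])  # 98101
print(sorted(record))      # ['expires_at', 'zip_code']
```

The name, street address, and precise coordinates never reach storage, so they cannot leak or need to be erased later. In a production database, `purge_expired` would typically be a scheduled delete rather than an in-memory filter.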

Don’t Forget the Logs

In addition to carefully considering what user data you collect and how you communicate that to users, developers need to be mindful that the information they log about the application’s usage may also contain user data. Logs collect detailed information about who is accessing your application and how they are using it. In fact, software logs are one of the most likely avenues for inadvertent data leakage: they easily cross the boundary between production and development environments and are often shared with third-party organizations. Ultimately, if the personal data is not needed or provides only marginal value to the business case, the safest approach is to avoid collecting it and not log it at all. But where logs are needed, it is best practice to ensure that any personal data that goes into them is anonymized to avoid inadvertent leaks.

Rule #3: Ensure all logged personal data is anonymized or pseudonymized and that there is a data classification document for any code change that handles log data. Logging code should be covered as part of code review.
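One way to apply Rule #3 is to pseudonymize identifiers before a log record reaches any handler. The sketch below uses Python’s standard `logging.Filter` and a keyed HMAC-SHA256 so the same user always maps to the same token without the raw value appearing in the log. Assumptions: the key `PSEUDONYM_KEY`, the e-mail pattern `EMAIL_RE`, and the `user-` token prefix are hypothetical, and only e-mail addresses are covered here.

```python
import hashlib
import hmac
import logging
import re

# Hypothetical key; in practice it would come from a secrets manager and be rotated.
PSEUDONYM_KEY = b"replace-with-a-secret-key"
# Simplified e-mail pattern; other identifiers would need their own patterns.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def pseudonymize(value: str) -> str:
    """Map a personal value to a stable keyed token (truncated HMAC-SHA256)."""
    digest = hmac.new(PSEUDONYM_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"user-{digest[:12]}"


class PseudonymizingFilter(logging.Filter):
    """Replace e-mail addresses in log messages with pseudonymous tokens."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merge msg and args into the final text
        record.msg = EMAIL_RE.sub(lambda m: pseudonymize(m.group(0)), message)
        record.args = ()  # args are already merged into msg
        return True  # keep the record, now with redacted text


logger = logging.getLogger("app")
handler = logging.StreamHandler()
handler.addFilter(PseudonymizingFilter())  # on the handler, so propagated records are covered too
logger.addHandler(handler)
logger.warning("login failed for %s", "alice@example.com")
# Emits "login failed for user-" followed by 12 hex characters; the address never reaches the log.
```

A keyed HMAC is used rather than a plain hash because an unkeyed hash of an e-mail address can be reversed by hashing likely candidates. Note that pseudonymized data can still be re-linked by whoever holds the key, so the key needs the same protection as the data itself.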

Though simple, these three rules can help developers avoid some of the privacy pitfalls that come with a “the more the better” or “Things you Trust/Things you can Control” approach to data collection, and they help frame decision-making in risk management terms. In this way, being thoughtful and deliberate about what data you collect is the first step in protecting user privacy.

We have now seen how to be transparent about the data you collect and how to control what data you are collecting. In our next post, we will discuss things you cannot trust and things you are unable to control. This will include unsolicited connections, phishing emails, and how data is collected without our knowledge.