Let’s talk about People Data and Analytics!
The foundation for sharp, actionable people analytics is clean data. Jump into this quick webinar to learn how to maintain clean and reliable HR data from an expert in workforce analytics, people operations, and data analysis.
Giuseppe Di Fazio from Silicon Valley Bank has worked 10 years in human resources across the technology, entertainment and financial services industries, with experience in data architecture and analysis, workforce analytics and people operations.
Learn his tips by watching the recorded webinar or dive straight into the main points and Giuseppe's presentation by clicking here!
Webinar transcript
Patrick Canning: Hello, everyone. Happy almost end of the week, and thank you all for joining us for our Shockwave Talks. My name is Patrick Canning. I am from Erudit, which is a new AI-powered people analytics solution that believes in better people data for better decisions. Here at Erudit, we love learning, and this webinar series is a reflection of that: we get to deep dive with experts and industry leaders from around the world, and luckily today we have one. This specific episode is about cleaner data for more reliable people analytics. But before I introduce our guest and before we get started, I would just like to do some basic housekeeping. First, we're trying to keep this event under an hour. You can feel free to ask questions in the LinkedIn chat at any point in time; I will be reading them and asking them. After each slide, we will dedicate a short period of time to a few questions, and at the end we will have a longer, more open-ended Q&A. Last thing: we are also recording this, so if you'd like to watch this again or if you'd like to share it, we'd be happy to send you the link after. Once again, thank you all for taking the time to join this conversation. Now I would like to formally introduce you all to Giuseppe Di Fazio.
Giuseppe di Fazio: Hello, Patrick. Thank you for having me.
Patrick Canning: Yeah. Hello, Giuseppe. How are you doing today?
Giuseppe di Fazio: Good, good. Thank you. Thank you. How are you?
Patrick Canning: I'm doing great myself. So for all of you who don't know, Giuseppe is the Director of People Analytics and Workforce Planning at Silicon Valley Bank. I believe you have a bit over a decade of experience in HR across various industries, and your specializations are data architecture, workforce analytics, and people ops, which is one of the reasons why we are very lucky to have you. So, first off, thank you for taking the time to have this conversation with us.
Giuseppe di Fazio: Absolutely. It's a pleasure.
Patrick Canning: Yes. Everyone, please give a warm welcome to Giuseppe.
Giuseppe di Fazio: Hey, Patrick. I think the main topic we want to touch on today is really clean data: why it matters, and what we can do to get it clean and keep it clean. And I thought the very first part should really be about why it matters. Why do we want data that is clean? The three main reasons are the ones on the slide here. With dirty data, we can end up with incorrect beliefs, assumptions, and insights about the status of our workforce. That then leads to decisions that are poorly informed and not actually in the best interests of the company. And in general, if we use data that's not accurate, it damages trust in the overall process, in the dashboards, in pretty much all of the data we have. I can give you a couple of examples. Say someone claims, "We usually have high turnover after we pay out the annual bonus, so we need to change the payout schedule to avoid that spike, because it causes disruption for our positions and therefore the budget." If you don't have clean data that ties the payout schedule to your turnover, you might just think that's the case when it isn't. Another example: "It's cheaper for us to hire in location X versus location Y." Again, it might be true, or you might not have looked at the levels you're hiring at and the composition of the workforce in the different locations. So, going back to assumptions that might not be exactly true. And then there's the damage to trust in the process. This is the sentence we never want to hear: it starts with "I think there's something wrong with the data," usually followed by "I don't know if this is accurate," and then it becomes "Oh no, I cannot trust this analytics." Everything you prepared just falls apart, because the perception of accuracy becomes paramount. So you really need to invest the time not just to have clean data, but to make sure that the leaders and people who consume the data really feel that the data can be trusted. And you can be very explicit about the process you follow to keep it clean and make it clean. It's not a secret sauce; a lot of our work goes on behind the scenes, but being open about it creates trust. You're not going to have 100% clean data all the time, but at least people understand that it's going to be really, really clean. So that's very important. Those are the three reasons why having clean data is really important.
Patrick Canning: Yeah, that does make a lot of sense. So taking a step back, Giuseppe, I was wondering if you could help us understand the bigger picture of your position. I'm just curious: what kinds of data sources do you work with, and what is the actual data in those sources? And with that data, what are you trying to figure out? What kinds of questions are you trying to answer?
Giuseppe di Fazio: I work directly with a lot of different datasets. Most of them are your regular HRIS data about the employees: the org structure, some of the personal data, though we usually don't use much of the personal side. The cool part is when you start integrating datasets from the performance review system, or maybe your employee sentiment system, and of course the compensation side; if you're a public company, there's also equity and everything that goes with shares vesting. Once you start building with that, you can see a fuller picture of your workforce. And there's a whole world in the org data itself: which departments are receiving people from which groups, and what the usual career path is, from junior engineer to senior engineer to director of engineering, for example. You can see the average time in each step to get there, which is also valuable from an employee's perspective: these are my potential paths, this is what happened in the past for the people who got there, and you can even chat with the people who got there and ask what their path was. It's not always the same for everybody, of course, but you get a sense: oh, this might be my option. So it's valuable for the employees, not just for the company. But again, the data has to be clean and standardized; we'll get to that. That's a big part of it.
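To make the career-path analysis Giuseppe describes concrete, here is a minimal pandas sketch, not from the webinar, assuming a hypothetical job-history table with one row per employee-role spell; it estimates the average time spent at each level before moving on:

```python
import pandas as pd

# Hypothetical job-history extract: one row per (employee, role) spell.
history = pd.DataFrame({
    "employee_id": [1, 1, 2, 2, 2],
    "level":       ["Junior Eng", "Senior Eng", "Junior Eng", "Senior Eng", "Director Eng"],
    "start_date":  pd.to_datetime(["2017-01-01", "2019-06-01", "2015-03-01", "2018-03-01", "2021-09-01"]),
})

history = history.sort_values(["employee_id", "start_date"])
# Years spent in each level before the next role started.
history["years_in_level"] = (
    history.groupby("employee_id")["start_date"].shift(-1) - history["start_date"]
).dt.days / 365.25

# Average time in each step of the path (open-ended spells drop out as NaN).
print(history.groupby("level")["years_in_level"].mean().round(1))
```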
Patrick Canning: Yeah, that does make a lot of sense. I'm just curious, what are some of the most common examples of dirty data that you tend to encounter?
Giuseppe di Fazio: Duplication is a big one. I know we'll look at it more later, but it's really the fact that a lot of times, especially with employee data, some basic fields, first name, last name, email address, that kind of stuff, get entered in multiple systems by multiple people and updated multiple times before reaching the source of truth, which here in the U.S. can be your HRIS, for example. The employee fills in their data when they do the application, then sometimes they do it again on their first day, then again in the paperwork for their first day, and then, if the systems are not communicating with each other, they do it again when they enroll in health insurance and sometimes life insurance. So there are multiple data systems and datasets covering the same thing, and you end up with duplication. Formatting is usually another very big one. Once you address those two, it cuts down the volume of the data you're working with, and that helps quite a bit.
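As an illustration of that cross-system duplication, here is a minimal sketch with hypothetical column names: normalizing the basic fields first lets records from two systems collapse into one:

```python
import pandas as pd

# Hypothetical extracts of the same people from two systems.
hris = pd.DataFrame({"email": ["Jane.Doe@corp.com ", "bob@corp.com"],
                     "last_name": ["Doe", "Smith"]})
ats  = pd.DataFrame({"email": ["jane.doe@corp.com", "BOB@CORP.COM"],
                     "last_name": ["DOE", "smith"]})

def normalize(df):
    """Strip whitespace and unify casing so equal values compare equal."""
    out = df.copy()
    out["email"] = out["email"].str.strip().str.lower()
    out["last_name"] = out["last_name"].str.strip().str.title()
    return out

merged = pd.concat([normalize(hris), normalize(ats)])
# After normalization the cross-system duplicates collapse to one row each.
print(merged.drop_duplicates(subset=["email"]))
```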
Patrick Canning: Yeah, I can only imagine. With a lot of redundancies across different data sources, I can definitely see that being a common issue. I know you talked a little about this already, but I was wondering if you had a few more examples, from your experience, of when leaders have made bad decisions based on bad data?
Giuseppe di Fazio: I can give you an example of a decision based on missing, or incomplete, data. I went through a merger a few years ago, and of course we announced the merger well before we could close the deal. One of the things that happened is that the target company started to lose people. There were concerns about retention of key employees, and in that situation you promise retention bonuses; you invest some money to address the fact that people might think, "Oh gosh, I'll lose my job." The fact that we started doing this without a clean picture of employees' payouts for the next 12 to 18 months meant that we might have promised cash payments to someone who had been granted roughly the same amount in stock a week before. We took a quick-and-dirty approach for everyone who was on the absolutely-keep list, but without a full picture of bonuses, special bonuses, and all of that, we couldn't get a good sense of it. It was the best we could do with the data we had. But once we had better data, we thought, okay, we need to build this a bit better, so next time around we'll have better data to use for our decision making.
Patrick Canning: Of course, yeah, that does make a lot of sense. Okay, so I feel we have established a lot of the issues with bad data and how it can lead to negative outcomes. So my question to you is: now that we have the bad data, how do we convert it into clean data that is usable and digestible?
Giuseppe di Fazio: I think there are really three major steps in looking at the data you have, and this is more the process of data cleaning; some people call it data scrubbing, or just keeping good data hygiene. I wanted to call out a more general definition of what data cleaning is: fixing or removing data that is incorrect, inaccurate, incomplete, inconsistent, duplicative, or just formatted the wrong way; we'll talk about what "the wrong way" means. Usually this happens because of poor communication across departments; we talked about entering the same address in different ways because different teams don't talk to each other much, there's little coordination, or sometimes there's just no process in between, and issues creep in. I think everything really starts from the data assessment. This is where we look at the data we have and ask: where is the data coming from? How is it used? What technologies do we use? What are the processes, and who are the people who enter, manage, and manipulate the data throughout the employee lifecycle, from when they apply for the job to well after they leave? This also helps with understanding the root cause of some data quality issues; we get a better understanding of the problems, and it becomes a reinforcing feedback loop: later, when you do the auditing and monitoring in the third category, you can go back and say, oh, maybe this issue keeps recurring because of a process we looked at during the data assessment step. So that's really the big chunk of it, and I have a slide later about how you decide what is good and what is not. The second category is data remediation, which is the core of what we're talking about today: actually fixing it. Assuming your data has some major issues, there are a few steps you can take to go through the process of fixing the data. And the third category, once the data is 99.99% clean, is how we monitor it and make sure it stays clean. As you can imagine, with new hires, people leaving, people joining, data about employees is never static; it changes every minute. So having good processes for monitoring and auditing the data is very important. Those are the three big things we can touch on.
Patrick Canning: Yeah. A question that I had: when you're getting your hands dirty with the data, I'm curious what kinds of tools or programming languages you use to actually execute that data cleaning process.
Giuseppe di Fazio: I'm an Excel fan where that's feasible, meaning that at 100,000 rows it becomes a latency issue no matter how powerful your machine is. If it's a reasonable amount, and I would say reasonable is probably below 5,000 to 10,000 rows, depending on how many columns you have, that's still okay. There are also some tools I really like where, once you set up the parameters, these are the acceptable values, the software can help you figure out later that this entry is formatted differently; that can automate some of the work. But I usually spend a very large amount of time just understanding each field: why it exists and how it's used downstream, because that matters too. Sometimes we clean up our data, it's great, and then we simplify two fields because we thought, okay, that's cleaner; but there was a system downstream from us that actually needed that complexity in that field to do some other process. Now we have to go back and negotiate with the person who uses that system: okay, are we going to go a different way, or how can we simplify everybody's life by keeping good formatting and consistency across systems?
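As a rough illustration of the "acceptable values" checks Giuseppe mentions, here is a minimal sketch, with hypothetical field names, that flags entries outside an allowed set:

```python
import pandas as pd

# Hypothetical allowed labels, as they would appear in a shared standard.
ALLOWED_DEPARTMENTS = {"Engineering", "Finance", "People"}

employees = pd.DataFrame({
    "employee_id": [1, 2, 3, 4],
    "department": ["Engineering", "engineering", "Fin.", "People"],
})

# Anything not in the allowed set is either a typo or formatting drift.
bad = employees[~employees["department"].isin(ALLOWED_DEPARTMENTS)]
print(bad)  # the lowercase and abbreviated entries need review
```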
Giuseppe di Fazio: But yeah, tools like that are useful. For a manageable dataset, though, I prefer getting my fingers on the keyboard and looking at every single field.
Patrick Canning: Yeah, yeah, of course. That's very understandable. So, Giuseppe, earlier on the last slide you mentioned that you're ingesting data from various different sources, and of course there's an element of human data entry, especially with text data, which can really be a pain, as I know all too well. How do you ensure that all of the data is standardized and that you're using the same definitions for things?
Giuseppe di Fazio: A lot of politics goes into it; well, that's part of it. But really, it's about building what I call here a data dictionary, a people glossary. It's a document, sometimes more like a book, where all the different stakeholders come together and decide on specific definitions of what is what. The example here really has two different sections: one for what applies to all elements, and one for each individual element.
Giuseppe di Fazio: It's really a consistency guide to make sure some things apply to all the concepts, metrics, dimensions, and attributes you might have, and then something specific for each. In the first section, the one that applies to all elements, there are two pieces that I feel are really key. One is the naming convention. For example, do we call this thing "annual rate" or "annualized rate"? That matters for the future, too: if you start adding metrics and other concepts, your team is not the only one doing that, so all the teams need to use the nomenclature the right way. Plus, when you sort, when you filter, when you search, when you group things, it's a lot easier if the different pieces of the name of an element are very clear.
Giuseppe di Fazio: The second part that should really be standard across everything, though you may encounter system issues where some systems cannot exactly match what you want, in which case the API or the file transfer should apply a transformation from my system to yours so everything lands in the shared format, is really about formatting. What's the capitalization policy? How many decimals do we keep, in the fields that have decimals? What's the date format? What's the first day of the week: Sunday, Monday, or something else? It sometimes seems mundane, but it changes quite a bit; if you look at anything by week, knowing what the first day of the week is matters, and it should be the same across systems. Then the second category is really a little guide for every single element we have: a detailed definition of what that element is, a little explanation of how it is calculated, and ideally an example for each one. One example I've been working on over the last few weeks, which has been very interesting, is how we count turnover, in the sense of how many people have left the company this month. It should be a simple count, but it isn't, because some systems define a person's termination effective date as the last day they worked, while other systems define it as the first day they have not worked. That's all fine if you look at monthly data, except for the last day of the month, where some systems will put those terminations in the following month and some will not.
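A small worked sketch of that end-of-month edge case, with hypothetical termination records: the two effective-date conventions push last-day-of-month terminations into different monthly buckets.

```python
import pandas as pd

# Hypothetical terminations, recorded as the last day actually worked.
last_day_worked = pd.to_datetime(pd.Series(
    ["2023-03-15", "2023-03-31", "2023-03-31"]))

# Convention A: effective date = last day worked.
conv_a = last_day_worked
# Convention B: effective date = first day not worked.
conv_b = last_day_worked + pd.Timedelta(days=1)

print(conv_a.dt.to_period("M").value_counts().sort_index())
# 2023-03: 3 terminations
print(conv_b.dt.to_period("M").value_counts().sort_index())
# 2023-03: 1, 2023-04: 2 -- same people, different end-of-month headcount
```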
Giuseppe di Fazio: So having consistency in the definition and expectations, what the rules are for the effective date of terminations, which impacts your headcount as of end of month, a metric you can find everywhere, changes things quite a bit. And then the piece that's really important for validation and keeping the data clean is defining the valid output for each element. For categorical or text data, like you mentioned, what are the acceptable labels? And for numerical variables, what are the acceptable ranges for each element? Of course, this dictionary is not a static, one-and-done document, because your company's policies might change, you might add new dimensions, a new dataset might bring new categories into your data lake, and compliance with laws and regulations changes all the time. Here in the U.S., the EEOC has reporting rules about ethnicity, and there are a lot of different rules around that; and GDPR is a big part of it too. There should also be an attachment to this, which is your data retention policy for each of these elements: how long are we going to keep this? What happens if we receive a deletion request? Which fields can we delete, and which fields can we, or should we, mask without deleting? Otherwise we're going to break a bunch of historical data, or some algorithm that depends on it, if we start taking whole logical blocks away. So that's also part of the data dictionary. It should be your guide to how we define things; that's why it's called a dictionary, like for words, right? Although those change all the time too, and so do the definitions.
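As an illustration, and not Giuseppe's actual document, a dictionary like this can be encoded in a machine-readable form so the "valid output" rules are enforceable; a minimal sketch with hypothetical elements:

```python
import pandas as pd

# Hypothetical, machine-readable slice of a people-data dictionary.
DICTIONARY = {
    "employment_type":   {"kind": "label", "allowed": {"Full-time", "Part-time"}},
    "scheduled_hours":   {"kind": "range", "min": 0, "max": 60},
    "performance_score": {"kind": "range", "min": 1, "max": 5},
}

def validate(df):
    """Return one row per dictionary violation found in df."""
    issues = []
    for field, rule in DICTIONARY.items():
        if rule["kind"] == "label":
            bad = ~df[field].isin(rule["allowed"])
        else:
            bad = ~df[field].between(rule["min"], rule["max"])
        for idx, value in df.loc[bad, field].items():
            issues.append({"row": idx, "field": field, "value": value})
    return pd.DataFrame(issues)

employees = pd.DataFrame({
    "employment_type":   ["Full-time", "full time"],
    "scheduled_hours":   [40, 72],
    "performance_score": [3, 7],
})
print(validate(employees))  # flags the label drift and both out-of-range values
```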
Patrick Canning: It's really funny, because when you get to this level, you honestly sometimes feel like a lawyer arguing before a judge: "What is the first day of the week?" Or even dealing with datetimes: of course there's the American format, which in my opinion is the correct one, and then there's the European one, right? I know a lot of my coworkers are probably going to hate me for that.
Giuseppe di Fazio: It all depends. Like, what's the time zone of the company? That's a funny one, right? So for a timestamp: when did something actually happen?
Patrick Canning: Yeah. If only we all just used Unix time, but I don't think that's happening.
Giuseppe di Fazio: Maybe eventually we'll get there. Do you remember Swatch in the nineties? They tried an internet time where the day was divided into 1,000 beats. But I think that stayed in the nineties.
Patrick Canning: Yeah, it's very funny. So a question that I have for you here: as someone in workforce planning, I would assume you have to deal with a lot of different stakeholders in a lot of different departments. What are some departments that you have to communicate with and standardize this data dictionary with?
Giuseppe di Fazio: I would say finance is one of the major ones, mostly because of reporting, especially if you work for a public company; there's a lot of reporting that goes out to the street and investors. And going back to the discussion of what the headcount is as of the end of the year, that can change things, especially for contingent workers, where usually the contract ends on the last day of the month regardless of the day of the week. So finance tends to be one. Another would be our friends in IT; there it's more about the technical parts, whatever system you use and its limitations, and how we can transform some of the data from one system to another to make sure the output and the format are correct. But that's more on the technical side. Beyond the finance team, in the U.S. it depends on how the company is structured: sometimes the DEI team is under HR and sometimes it's not, and of course there's a lot of importance placed on keeping that data private and secure, with only a few people able to access it, even in aggregate. So it becomes more of a security-type conversation, although some of the definitions go back to the dictionary, right? In the U.S., how do you define ethnicity and race, and what are we tracking? There are quite a few areas where company policy impacts how you gather the data.
Giuseppe di Fazio: A few years back, for example, there was this visual-survey approach: if someone did not check a box for ethnicity, the company was supposed to look at the person and pick one or more, and that was the way to go. Then that changed, and some companies decided that if someone doesn't respond, we just leave it blank. There are also some regulations that can conflict with one another, so sometimes we get legal involved, just to make sure that the data dictionary, and the way we gather and update the data, is also compliant. So I would say those are the teams.
Patrick Canning: Yeah, that definitely does make a lot of sense. So my final question on this topic: I was hoping you could walk us through the process. We've now communicated with all of the departments, especially finance, and we've standardized a data dictionary so everyone can be on the same page. What are the next steps?
Giuseppe di Fazio: Yeah, after that, well, now we need to fix the thing. I framed it as four buckets of actions, and ideally these are steps you take sequentially, although the first one, which is very important, probably needs a little revisiting at the end too, and I'll explain why. You usually start with removing duplicates. Again, we talked about this: multiple sources tracking similar data. Sometimes the same change shows up in two places, because the transaction was approved by the boss in one tool but also entered in the payroll system, since that's where the change actually takes place, so you might have two transactions for the same thing. The other reason I like to do this first is that the dataset can shrink significantly before you move to the next steps. But sometimes some duplicates don't become evident until after you've done some of the formatting or fixed some of the structural errors; that's why I say it might be step number one and step number five. Usually, though, starting with it is a good way to go. One word of caution on duplicates: sometimes duplicates have a reason to exist, especially if you're only using a handful of fields. An example: an employee is hired in January, resigns in April, and we hire them again in October. If we're looking at the list of employees we've hired year to date, this person is going to show up twice, and we might think, oh, this is a duplicate, if we don't have the field showing the actual hire date for each event but only the year.
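A minimal sketch of that rehire pitfall, with hypothetical fields: deduplicating on the employee alone collapses a legitimate rehire, while including the event date keeps it.

```python
import pandas as pd

# Hypothetical year-to-date hire events, including one rehire.
hires = pd.DataFrame({
    "employee_id": [101, 102, 101],
    "name": ["A. Rossi", "B. Chen", "A. Rossi"],
    "hire_date": pd.to_datetime(["2023-01-09", "2023-02-01", "2023-10-02"]),
})

# Too aggressive: the October rehire silently disappears.
print(hires.drop_duplicates(subset=["employee_id"]))

# Safer: a rehire is a distinct event, so key on employee AND hire date.
print(hires.drop_duplicates(subset=["employee_id", "hire_date"]))
```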
Giuseppe di Fazio: So sometimes adding new fields while we're doing the duplicate-removal process is useful for understanding that some of the entries, some of the records, might not be duplicates; they have a reason to exist. After that, I usually go to the structural errors. This is where you look at typos and formatting conventions, going back to the dictionary: are we spelling things the same way everywhere? If you have a foreign address with characters outside the English alphabet, how are we treating those? For missing or inapplicable entries, do we use "not applicable" or do we use "unknown"? That may just be a company preference. I was at a company where the word "unknown" made people think the data existed but we just couldn't get it, versus "not applicable," which felt more neutral: okay, this just doesn't apply to this person, let's move on. So bucket number two is really going through that Excel spreadsheet, if you can, or your pivot tables, looking for typos, formatting, those kinds of things. Then it's really about the values, and this is about outliers, which is the third bucket. This is where the acceptable ranges for each of the metrics and elements come in. There can be numerical values that are much larger or smaller than other entries, or there can be a conflict with the logic of some of the fields. Sometimes there are fields with ranges that are each satisfied individually but conflict with each other.
Giuseppe di Fazio: The textbook example is someone who's marked as part-time on the full-time-versus-part-time field, but on the scheduled-hours-per-week field has 40 hours, which, at least in the U.S., would make them full-time, not part-time. So those combined acceptable ranges, those combinations, also need to be checked for outliers. Another one that's very evident is someone residing in one country but being paid in the currency of a different country. That's usually a mistake, unless we're talking about an expat: say someone from the U.S. on a six-month assignment in Europe who kept everything the same and is still being paid in U.S. dollars to their U.S. bank account. That's fine, so there will be exceptions, but it will still show up as an outlier, and I just want to review it. One of the things I really like to do is create a dashboard that shows the impossible outliers, using filters that should give you a population of zero.
Giuseppe di Fazio: One example would be someone whose last promotion date is earlier than their hire date, which should be impossible; someone with a negative value for tenure; or, on a performance score that runs from one to five, someone with a seven: a great performer, probably, but an outlier. So you have this dashboard, and you keep adding to it, and when everything's good, the dashboard is clean: there's no data there, because no entry satisfies those criteria. That's just a good way for me to do a quick audit and see that there are no glaring outliers.
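A minimal sketch of that "population of zero" audit, with hypothetical fields: each filter encodes an impossible or contradictory combination, so any row it returns is a data error.

```python
import pandas as pd

employees = pd.DataFrame({
    "hire_date": pd.to_datetime(["2020-01-06", "2021-05-03"]),
    "last_promotion_date": pd.to_datetime(["2019-12-01", "2022-02-01"]),
    "tenure_years": [3.9, -0.4],
    "performance_score": [7, 3],
    "employment_type": ["Part-time", "Full-time"],
    "scheduled_hours": [40, 40],
})

# Each audit should return an empty frame on clean data.
audits = {
    "promoted before hired": employees.last_promotion_date < employees.hire_date,
    "negative tenure": employees.tenure_years < 0,
    "score outside 1-5": ~employees.performance_score.between(1, 5),
    "part-time at 40h or more": (employees.employment_type == "Part-time")
                                 & (employees.scheduled_hours >= 40),
}
for name, mask in audits.items():
    if mask.any():
        print(f"AUDIT FAILED: {name}")
        print(employees[mask])
```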
Giuseppe di Fazio: And the fourth bucket is missing data. This becomes more of a philosophical question, and it depends on how the company wants to handle it. Usually I've found that contacting the person in charge of the data source is the ideal course of action: the data may be missing because it was never gathered, or because there's a specific reason, and the person will tell you; then together you can work out how to deal with it. Sometimes you might just want to complete the data by imputing missing values. An example might be someone getting a pay increase, but you're missing the effective date of that pay increase. The same person got a promotion that started around the same time, and you have an effective date for the promotion; probably the two events were connected, so in that case I would suggest you can supplement that date with the one from the promotion. Or sometimes just leaving the field blank, or using "not applicable" or "unknown," like we said before, might be worthwhile. In that case, I think a good step is to check for any dependent system, algorithm, or automated process downstream that might be impacted; it might be an API that doesn't recognize an "N/A" or a blank, those kinds of things. But those are usually the approaches I've seen work best for dealing with missing data.
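A minimal sketch of that kind of targeted imputation, under the stated assumption that the promotion and the pay increase share an effective date; the field names are hypothetical:

```python
import pandas as pd

changes = pd.DataFrame({
    "employee_id": [201, 202],
    "pay_increase_date": [pd.Timestamp("2023-04-01"), pd.NaT],
    "promotion_date": [pd.Timestamp("2023-04-01"), pd.Timestamp("2023-07-01")],
})

# Impute the missing pay-increase date from the linked promotion,
# and keep a flag so downstream users know the value was inferred.
imputed = changes.pay_increase_date.isna() & changes.promotion_date.notna()
changes.loc[imputed, "pay_increase_date"] = changes.loc[imputed, "promotion_date"]
changes["pay_increase_date_imputed"] = imputed
print(changes)
```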
Patrick Canning: Yeah, yeah. Thank you very much for that in-depth explanation. I do have a question from the audience. Ignacio asks: Giuseppe, what is your favorite people analytics tool that generates clean data? Hello to Ignacio, and thanks for the question.
Giuseppe di Fazio: I don't think I have a favorite vendor, and I really don't want to publicize one, but I think there are quite a few that are very good at displaying data in a way that shows how clean it is and helps you clean it. I don't know of many that have that part fully integrated, that are good enough to flag "this might be wrong, this might be wrong." I've seen a couple of solutions where the user can flag a data point to the admin, saying, "Oh, this seems wrong," which goes back to that sentence from the beginning, "I think this is wrong," but at least it's through a system; it's a more systemized way of crowdsourcing the "this doesn't sound right." So there are tools that work at that level and flag things to help you keep the data clean, but I haven't seen one that makes me very happy higher up the stack, when you're doing more specialized work. On the automation side, yes, there are a few that I think are worth exploring, but I wouldn't call them people analytics tools, because they don't really have the visualization part. That's where I stand.
Patrick Canning: Right, yeah, that does make a lot of sense. So in terms of next steps: let's say we now have all of this data remediated. Once it's done, what comes next? How do you ensure that it was done properly?
Giuseppe di Fazio: There are a few things on the monitoring and auditing side, which is the last slide I put up. There are really four concepts, which you can also apply before you start the whole process, but I like them in the monitoring stage, because whatever process you decide on, dashboards, your tables, or some cadence where you look at specific fields where you know data tends to go wrong because of the issues from before, these four concepts help drive that process.
Giuseppe di Fazio: One is validity: is the data conforming to our rules and constraints? Are we measuring what we need to measure, and how did we measure it? Were there biases in how we gathered it? Sometimes we have very clean data, but it's incomplete, and we don't even ask why it's incomplete, or whether there's some collection bias. So it's less about the data you have being clean and more a philosophical question: do we have what we need to take action, or just some good data but not all of it? Are we consistent when we gather the data? Think of an employee survey: if you give one team three days to respond and another team 12 days, you're not going to get consistent data. It looks fine, but behind the scenes, how you gathered the data has an impact. Accuracy is: how close is it to the true values? The example I keep giving is regrettable versus non-regrettable terminations. I've seen huge swings in the percentage of terminations, even resignations, marked regrettable or not, based on feedback from above or based on incentives: if that percentage is part of your performance review as a manager, that number tends to go down, because you have an incentive to decide it that way. So is it accurate, in the sense that it reflects the truth? Completeness, going back to what we talked about before, is: do we have all of the required data? An example: in the U.S., we need to know which U.S. state you reside in to have your full address, but in some other countries that field might not be needed; if you have the city and the postal code, that's enough for the address. So the completeness requirement will vary based on the dataset, the country, and what you're looking at, and different pockets of the company may have different definitions of completeness, which goes back to the dictionary again.
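As one small illustration of putting a number on completeness, here is a minimal sketch, with hypothetical fields, that reports the share of populated values per column and applies a country-dependent rule like the state/region example:

```python
import pandas as pd

people = pd.DataFrame({
    "country": ["US", "US", "FR"],
    "city": ["Austin", "Boston", "Lyon"],
    "state": ["TX", None, None],
    "postal_code": ["78701", "02108", "69001"],
})

# Simple per-column completeness, as a percentage.
print((people.notna().mean() * 100).round(1))

# Completeness rules can be conditional: state is required only in the US.
us_missing_state = people[(people.country == "US") & people.state.isna()]
print(f"US rows missing state: {len(us_missing_state)}")
```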
Giuseppe di Fazio: And then consistency: is the data consistent within the same dataset, and across different datasets? Also, if it's yearly data you gather year after year, sometimes you change the way you measure things, so your employee survey from this year might use a different scale from last year's; performance review scales change all the time. It's about getting a sense of whether we are consistent in the way the data is gathered, maintained, and presented. Those four concepts are really the ones that drive the monitoring and auditing part. And the last step is the one I mentioned at the very beginning: creating a positive feedback loop. In the end, as you clean up the data, you compile a list of sources of breakpoints, you fix them, and you try to figure out what caused each issue in the first place. That helps improve the process at the top, looking at data acquisition, maintenance, and the automation you can put in place, which is usually great news. Ideally you keep a log of the errors and a ledger of how it went. I think those are good tips for keeping the data clean, and in general, keeping fingers off keyboards and using automation and APIs is a good way to go.
Patrick Canning: Yeah, always a very good point. Well, Giuseppe, thank you very much for your amazing presentation; we all do appreciate it. Just checking on time: we have about 20 minutes exactly for an open-ended Q&A. So for people in the audience, if you have any questions, feel free to write them down now and I'll be asking them. First off, my first question, which I'm actually going to build on, is from Alejandro. His question to you is: how do you prevent bias in internal surveys? And my addition to that, which is kind of a side tangent: in your position, what role do surveys play in understanding your workforce?
Giuseppe di Fazio: Bias in internal surveys, the first question. On the gathering side, you want a big n, right? You want a huge sample size; really, you want everyone, though no one is going to get responses from everyone. Also, depending on how the data is gathered, whether it's in person or not, you can have biases in the responses. So ideally you want something anonymous, although true anonymity kills a lot of the fields you'd want to report on; there's a bit of a trade-off, especially with sensitive data. The other side of bias is the statistical one: some groups are going to be small, so your confidence interval is going to be fairly wide. One of the things I struggle with in delivering survey data is this idea that, oh, this group scored 3.7 versus 3.4, so it's higher. Well, you have a confidence interval of 0.5, so they're probably not meaningfully different. That's a bias of the readers, not of the employees providing the data. So you usually try to get as many responses as possible; that's a good way to go. And then, if you can, and if it's not fully anonymous, otherwise you can't do this, check that at least for the big cuts, locations, teams, or whatever attribute matters for that specific survey, you have good representation. Say you have two locations, one in Spain and one in France, and the response rate is 50% in one and 73% in the other: you cannot just aggregate the data. You shouldn't say, "Oh, company-wide, this is the number," because it's skewed. You should probably try to figure out why response rates were so much higher on one end, but I wouldn't report a company-wide number, because there isn't one. That would be my answer on bias. And can you repeat the second part of the question?
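A small worked sketch of the 3.7-versus-3.4 comparison Giuseppe mentions, using a standard normal-approximation confidence interval for each group mean; the numbers are illustrative, not from the webinar:

```python
import math

def mean_ci(mean, std_dev, n, z=1.96):
    """95% normal-approximation confidence interval for a group mean."""
    half_width = z * std_dev / math.sqrt(n)
    return mean - half_width, mean + half_width

# Two teams' average survey scores on a 1-5 scale, small samples.
team_a = mean_ci(mean=3.7, std_dev=1.0, n=15)  # small n -> wide interval
team_b = mean_ci(mean=3.4, std_dev=1.0, n=15)

print(f"Team A: {team_a[0]:.2f} to {team_a[1]:.2f}")  # ~3.19 to 4.21
print(f"Team B: {team_b[0]:.2f} to {team_b[1]:.2f}")  # ~2.89 to 3.91
# The intervals overlap heavily, so 3.7 vs 3.4 is not a reliable difference.
```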
Patrick Canning: Yeah, sorry for bombarding you with a two-part question. My addition was: in your position, what role does survey data play?
Giuseppe di Fazio: It has depended on the company and also on the topic. For the more regular company sentiment surveys, at one of the biggest companies I worked at they played a major role, first in reshaping company culture: coming up with values, and using those values to create what we called a culture committee of 25 to 30 people that met for a few months to define them and then infuse those values into performance reviews, hiring, and expected behaviors. So that grew out of surveys. Through COVID, for example, I had a different role, and the surveys were more about going to the office, working from home, what issues you might have, your personal situation. It was a sensitive time for everyone, and it was taken very seriously; it had to be, because we needed to know who needed help or an adjustment in hours and all that. So it really varies based on the type of survey and the circumstances around it.
Patrick Canning: Yeah, that does make sense. Piggybacking off this conversation, I see there's a comment from Taylor, who asks: what is a good response rate for internal surveys, in your opinion?
Giuseppe di Fazio: Yeah, it really varies, depending on your history of acting on the results of those surveys. I was at a company where we ran one once a year and nothing changed, and, you know, people started taking it less seriously. So I would say probably between 30 and 50% is a good number. At some companies I've seen 75%. It depends on the topic: during COVID, the response rate was like 97%. So it really depends. I would say anything below 30% is really disappointing, and between 40 and 60% is fairly healthy. But again, it depends on the topic, the frequency, how much time you give people, and the way you gather the data, whether it's a one-question survey or a 15-question survey.
Giuseppe di Fazio: So yeah, 40 to 60% seems healthy to me.
Patrick Canning: Oh, that's very good to know. I have a question from Janine. This is more of a personal question, not necessarily about work.
Giuseppe di Fazio: Okay.
Patrick Canning: She asks: working with data and with so much detail all day, do you ever feel overwhelmed or burnt out? And what do you do to rest, refresh, and stay sharp all week?
Giuseppe di Fazio: Wow. One thing that helps me is to always take a step back and look at the big picture. You have this big spreadsheet, but going back to why: how is this data going to be used, and what is its purpose? It's not just data being cleaned for the sake of being clean, though I really do like a clean spreadsheet; it's more about the business rationale. How is this going to be used? How is this going to help the company and the employees? That's the purpose of it. I take breaks. I have my computer monitors, and once in a while I just try to look outside for a few minutes; even if I'm still thinking about work, at least I'm not staring at the screen, which is also good for the eyes. If I can take a walk, I try to take a walk for a few minutes; that's usually helpful. And if you can take a break, maybe finish a little bit of work after dinner instead, and spend time with family, friends, or your loved ones, that really helps balance things.
Patrick Canning: Yeah. At least for me personally, I am a very big walker. It helps to clear the mind and clear the soul, and it's very nice. I guess my next question: with COVID having happened, or I guess we're still kind of in it, how has your position changed, and how have organizations changed in how they use data?
Giuseppe di Fazio: I think, more and more over the last few years, as we all know, there's been an explosion of data in all realms, including the workplace and employee data, and with COVID we have even more data than before. I'll give you the example of organizational network analysis. In the past, if you wanted to build a map of the organization, you could look at Outlook and see who's meeting whom, and if your phone system was tied to Outlook or another system, you could roughly figure out who's calling whom and get a good sense there. But some people have real-life meetings or walk-bys, right? All the informal connections that happen at the workplace. Those were gone during COVID, especially with the move to a virtual setting, so a lot more data was created: Zoom meetings, pings on Slack, all those electronic interactions, there's just more and more. I don't think many companies have taken advantage of that, in the sense of using that data to build a better view of the network: who is central, who is the connection between two teams. Something that I really, really liked, and that Microsoft has been working on for quite some time, is looking at your own specific use of your meetings and your downtime, and suggesting: you should walk in these couple of hours, you should take some breaks, little nudges like that. And Microsoft has started working more and more with this kind of employee data, modeling it and all of that. So that's something I really like; it's good for employees and workers in general. Sometimes I get these Google reports: "Hey, you kept checking email after 11 p.m. for three days. What's going on?" kind of thing. So I think those are good uses of people analytics, data points that can make everybody's life a little better.
Patrick Canning: I'm afraid, Giuseppe, we're coming up on time. Are there any last words you would like to add before we end?
Giuseppe di Fazio: I just wanted to thank you personally; this chat was really fun. I hope we can do it again in the future. Again, thank you for having me.
Patrick Canning: I hope so, too. So to wrap things up, I'd like to thank you, Giuseppe, for taking the time and effort to have this conversation with all of us. I'd also like to thank the audience for all of your questions and for being here. If you enjoyed this Shockwave Talk, please join us for the next one, on October 13th, with Gabby Hoyos from NOTION; that episode is going to be about building a culture of trust inside organizations. Once again, my name is Patrick, and on behalf of Erudit, I would like to thank you all again for coming to this conversation. I hope you all have a great day. Thank you.