But it Worked on My Machine? (and 4999 Others)
Reflections
I’ve been working in an endpoint engineer/admin position for half a year now. I had a great mentor who helped me understand how to think like an endpoint engineer, but I’ve also spent a lot of time reading through endpoint subreddits like r/intune and r/SCCM and building custom, company-specific workflows to get a deeper understanding. Here are my thoughts and experiences so far, and some of the main concepts I think are important.
What is Endpoint Engineering?
In my head it’s thinking along the lines of “Do xyz thing on this workstation so it works in our company infrastructure. Now make that happen 5000ish+ times using specialized cloud deployment tools with a high success rate.”
I like to think it’s almost like DevOps, but for workstations. You’re not managing CI/CD pipelines, but you are rolling out device configs, enforcing security policies, deploying software, managing patching, and keeping everything in compliance at scale.
The end goal is to make these processes repeatable and reliable across all machines. Because of this, you have to have a solid understanding of the OS you’re working with. The main goal is to make sure users in the company have a smooth experience with as little friction as possible. You want things to just work: Wi-Fi auto-connecting, the VPN connecting without prompting, SSO functioning cleanly, certs preloaded, and every app the user needs based on their role or group already installed. Users should be able to log in and more or less hit the ground running.
I think the mindset you develop doing endpoint engineering sets you up well to move into DevOps work in the future. More and more, I find myself thinking things like: how do I make this repeatable, scalable, and reliable across different scenarios and edge cases? You start to think in terms of abstraction and automation, and you have to consider variables that aren’t always obvious, like network connectivity and speed, regional infrastructure, or user timezone (when dealing with scheduled deployments or region-specific configs).
Even things like pushing a policy or deploying software force you to think about how and when it hits the machine, how it behaves in low bandwidth scenarios, and what happens if the user isn’t online at the time.
Recently, I started experimenting with Kubernetes in my home lab, setting up a simple Proxmox instance with some master and worker nodes. A lot of it just made sense and felt kind of intuitive. Concepts around provisioning, VM deployment, and node configuration felt familiar because I’ve already been doing similar things in the endpoint space. The context is different, but the logic and principles carry over.
Background
Endpoint engineering has its roots in tools like SCCM, which required line of sight to Active Directory to function properly. Back then, pushing software or applying policies meant relying on an on-prem server, usually through Group Policy or local network deployment methods.
In Azure environments, Intune is the preferred tool for endpoint management. For Apple devices, JAMF is usually what’s used. Microsoft support for Apple devices is growing and developing, but from what I’ve heard from colleagues and secondhand accounts on Reddit, it’s still not at the same level as JAMF.
The Intune and SCCM subreddits are still tightly linked, and I often find myself browsing both or getting inspiration from packages meant for SCCM that I end up adapting to work in Intune. There are also RMMs that are best used in conjunction with these, and tools like PDQ that are similar to SCCM.
Autopilot (pretty sick)
One of the neatest features Intune offers (if you have the licensing) is Windows Autopilot, which allows a user to sign in to a device and have it automatically provisioned with company policies, apps, and settings without needing IT to touch it.
You might not appreciate this if you’re coming from a non-infra role: the old-school way involved PXE booting into a core server, pulling down a company image, and manually prepping the machine (or worse, imaging from a USB), usually requiring a whole help desk or technicians whose core job was to do exactly this. I used to do this myself as an IT intern in a different org, and in many orgs it was a full-time job for one or a few technicians.
Autopilot eliminates much of that. I do think setting up Autopilot and the surrounding device prep policies involves a lot of moving parts and documentation to read, but once it’s set up, it significantly reduces overhead for IT and scales smoothly across large device fleets.
There’s also Autopilot V2, or Device Preparation Policies, which behave differently. They no longer rely on hardware hashes, making them easier to manage. Usually, vendors like Dell or Lenovo would register device hardware hashes into your tenant, enabling Autopilot provisioning. With Autopilot V2, you can now assign deployment policies based on user or group memberships, as long as the device has a corporate device identifier registered in Intune.
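For context, if you ever need to grab a hardware hash yourself for classic (V1) Autopilot, say for a one-off device a vendor didn’t register, the community Get-WindowsAutoPilotInfo script from the PowerShell Gallery is the usual route. A rough sketch:

```powershell
# Rough sketch: collect a device's hardware hash for classic Autopilot (V1) registration.
# Assumes an elevated PowerShell prompt with access to the PowerShell Gallery.
Install-Script -Name Get-WindowsAutoPilotInfo -Force

# Export the hash to a CSV you can upload or hand to whoever manages the tenant
Get-WindowsAutoPilotInfo -OutputFile (Join-Path $env:TEMP "AutopilotHWID.csv")
```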
In short, while setting up Autopilot and device preparation policies (V2) takes some upfront effort, it ultimately saves time, reduces support tickets, and frees up your helpdesk and tech teams for more impactful work.
Graph API
One thing I quickly found super useful and fell in love with was the Graph API. There’s so much you can do with it: basically anything you’d want to automate or pull data for in Entra ID. You can programmatically add, delete, or modify users, devices, and groups. You can disable accounts, assign roles, or grab more specific info that’s hard to get from the portal, like the last logged-in user on a device or a full list of discovered apps across your tenant.
Let’s say you want to see how many endpoints in your environment have Steam or Skyrim installed. You can query that through the discovered apps endpoint, export it for reporting, and create a group from those endpoints to target a specific remediation like an uninstall. (Not saying users would have these installed, given no local admin, but you never know when a helpdesk dude wants to play Doom or Counter-Strike :p )
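As a rough illustration, here’s a minimal sketch of that kind of lookup using the Microsoft Graph PowerShell SDK. The app name is just a placeholder, and it assumes you have the DeviceManagementManagedDevices.Read.All permission:

```powershell
# Minimal sketch: find managed devices that report a given app under "discovered apps".
# Assumes the Microsoft.Graph PowerShell SDK; the app name is a placeholder.
Connect-MgGraph -Scopes "DeviceManagementManagedDevices.Read.All"

$appName = "Steam"
$uri  = "https://graph.microsoft.com/v1.0/deviceManagement/detectedApps?`$top=500"
$apps = @()
do {
    $page  = Invoke-MgGraphRequest -Method GET -Uri $uri
    $apps += $page.value
    $uri   = $page.'@odata.nextLink'   # follow paging until there is no next link
} while ($uri)

$hits = $apps | Where-Object { $_.displayName -like "*$appName*" }
foreach ($app in $hits) {
    # Each detected app exposes the managed devices it was seen on
    $devices = Invoke-MgGraphRequest -Method GET `
        -Uri "https://graph.microsoft.com/v1.0/deviceManagement/detectedApps/$($app.id)/managedDevices"
    $devices.value | ForEach-Object {
        [pscustomobject]@{ App = $app.displayName; Device = $_.deviceName }
    }
}
```

Sticking with Invoke-MgGraphRequest keeps the sketch close to the raw REST endpoints, which makes it easy to adapt to whatever other Graph endpoint you find in the docs.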
Beyond one-off queries, you can set up custom reports for whatever criteria the business asks for: devices with no primary user, users who have multiple devices, devices still running Windows 10, or specific Windows 11 builds.
I think chaining requests is super powerful. For example: get a user group → loop through each user → query their assigned devices. By taking info from one API call and piping it into another, you get a link between user groups and their devices, something Intune doesn’t expose natively.
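A minimal sketch of that chaining pattern, assuming an existing Connect-MgGraph session with Group.Read.All and DeviceManagementManagedDevices.Read.All; the group ID is a placeholder:

```powershell
# Sketch: user group -> members -> each member's Intune-managed devices.
# $groupId is a placeholder; assumes the group contains user objects.
$groupId = "00000000-0000-0000-0000-000000000000"

$members = (Invoke-MgGraphRequest -Method GET `
    -Uri "https://graph.microsoft.com/v1.0/groups/$groupId/members?`$select=id,userPrincipalName").value

foreach ($user in $members) {
    $devices = (Invoke-MgGraphRequest -Method GET `
        -Uri "https://graph.microsoft.com/v1.0/users/$($user.id)/managedDevices?`$select=deviceName,operatingSystem").value

    foreach ($d in $devices) {
        [pscustomobject]@{
            User   = $user.userPrincipalName
            Device = $d.deviceName
            OS     = $d.operatingSystem
        }
    }
}
```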
Graph also lets you dig deep into hardware details like pulling TPM info, device models, free space, or whatever else lives under that device object.
I honestly really like using Graph because if you can think of what you want and find the endpoint where that data lives, you can do way more than what the Microsoft admin portals allow. I’ve been experimenting with it a lot and have templatized a bunch of scripts at https://github.com/anguzz/powershell/tree/main/Graph
EXE and MSI Deployment
Deploying EXEs can be a pain. Vendors love doing things their own way (which sometimes is a really sucky way): they set non-standard install flags, or don’t support /silent or /quiet at all. When that happens, you basically can’t deploy the software silently, which throws it back to the help desk or support team. You might get lucky and be able to extract the MSI from the EXE, but if not, you’re often stuck reverse engineering the installer behavior or digging through vague documentation.
MSIs, on the other hand, are usually more predictable. They follow a standard structure, support consistent silent install flags like /quiet or /qn, and tend to work better with detection rules in tools like Intune or SCCM.
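For reference, a packaged silent MSI install usually boils down to something like this sketch; ExampleApp.msi, the log path, and the LICENSEKEY property are all placeholders, and the exact property names vary per vendor:

```powershell
# Hedged sketch of a silent MSI install with verbose logging.
# ExampleApp.msi and LICENSEKEY are placeholders; check the vendor docs for real property names.
$msi = Join-Path $PSScriptRoot "ExampleApp.msi"
$log = Join-Path $env:TEMP "ExampleApp_install.log"

Start-Process -FilePath "msiexec.exe" -Wait -ArgumentList @(
    "/i", "`"$msi`"",
    "/qn", "/norestart",
    "/l*v", "`"$log`"",
    'LICENSEKEY="XXXX-XXXX-XXXX"'
)
```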
There are also tools like Orca and Master Packager that let you go even deeper. MSIs are basically structured like a database full of tables that control everything from install behavior to file paths. You can repackage or modify an MSI by editing those tables directly. That might mean hardcoding install flags, setting deployment IDs, or embedding configuration details like licensing keys or portal-synced settings. In some cases, you’re effectively turning a vendor installer into your company’s custom installer.
Whenever possible, I prefer MSIs. But in the real world, especially in enterprise environments, you’re going to run into a lot of stubborn EXEs, and learning how to deal with them is just part of the job.
Sometimes, depending on how you write or configure detection rules for an app, you’re unknowingly building tech debt. For example, if you use MSI-based detection but the app is known to auto-update, your reporting might say the install is failing when in reality the app is working fine; it just updated itself beyond your detection logic. You then have to go back and fix the detection logic.
That said, in a pinch MSI detection for MSI installers works, especially if you’re under a heavy workload. It’s more about considering the trade-offs.
If you have the bandwidth, it’s better to target things that don’t change often, like specific registry values or stable folder paths. I think the best method is to create a PowerShell detection script that includes multiple checks and balances. That way, even if one check breaks, the script can still figure out if the app is actually installed and working as expected.
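Here’s a hedged sketch of that idea; the app name, paths, registry value, and version are all hypothetical, but the layered checks and the output-plus-exit-0 convention (which is how Intune decides a Win32 app is detected) are the point:

```powershell
# Hedged sketch of an Intune custom detection script with layered checks.
# ExampleApp paths, registry key, and minimum version are hypothetical placeholders.
$minVersion = [version]"4.2.0"
$exePath    = "C:\Program Files\ExampleApp\ExampleApp.exe"
$regPath    = "HKLM:\SOFTWARE\ExampleVendor\ExampleApp"

$passed = 0

# Check 1: the binary exists and meets the minimum version
if (Test-Path $exePath) {
    try {
        $fileVersion = [version](Get-Item $exePath).VersionInfo.ProductVersion
        if ($fileVersion -ge $minVersion) { $passed++ }
    }
    catch { }   # version string may not parse cleanly; treat as a failed check
}

# Check 2: a stable registry marker (hypothetical "Installed" value)
if ((Get-ItemProperty -Path $regPath -ErrorAction SilentlyContinue).Installed -eq 1) {
    $passed++
}

if ($passed -ge 1) {
    Write-Output "ExampleApp detected"   # STDOUT output + exit 0 = detected
    exit 0
}
exit 1
```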
System vs User Deployments
When building custom workflows, it’s critical to understand the difference between system context and user context. It affects where things get installed, how scripts behave, and whether your logic even runs in the correct space.
- Apps install to different registry hives depending on the context
- Scripts deployed as SYSTEM can’t use environment variables like $env:USERNAME the way you expect
- Paths like %APPDATA% or %USERPROFILE% point to entirely different locations in each context
To get around this, I’ve built scripts that dynamically identify the currently signed-in user by grabbing the owner of the explorer.exe process. It’s a reliable way to determine who’s actively logged in, and it’s especially useful when you need the UPN of the current user to authenticate to Microsoft Graph, generate per-user logs, or pass user-specific data into a webhook or API call.
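A minimal sketch of that lookup; it just asks WMI/CIM who owns the explorer.exe process:

```powershell
# Minimal sketch: find the interactive user by asking who owns explorer.exe.
# Useful when a script runs as SYSTEM but needs the signed-in user's name.
$explorer = Get-CimInstance -ClassName Win32_Process -Filter "Name = 'explorer.exe'" |
    Select-Object -First 1

if ($explorer) {
    $owner = Invoke-CimMethod -InputObject $explorer -MethodName GetOwner
    Write-Output "Signed-in user: $($owner.Domain)\$($owner.User)"
}
else {
    Write-Output "No interactive user session found"
}
```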
nicolonsky/IntuneDriveMapping is a great project that does this. The companion site intunedrivemapping.azurewebsites.net generates a PowerShell script that’s pretty clever. The generated script will:
- Detect the currently signed in user by checking the explorer.exe process owner
- Capture the user’s UPN so it can perform group membership checks via LDAP
- Map drives based on group filters, which is often a challenge when you don’t have direct user interaction
- Persist mappings across logins using a scheduled task that’s created dynamically if the script is running as SYSTEM
This pattern is valuable far beyond drive mapping; it’s a clean way to bridge the system/user gap in all sorts of scenarios.
Patching
Patching can be one of the more frustrating parts of endpoint management. Some apps, once updated, suddenly fall outside your detection or patching logic. Tools like TeamViewer, Automox, and others can break automation if their behavior changes with a new version, like altering install paths, registry keys, or detection criteria. Some apps will self-update and then stop reporting to the management portal altogether, essentially falling off the radar. You might think everything is up to date, only to realize the app hasn’t checked in for weeks.
One of the more annoying situations is when you’re forced to patch something manually, maybe because automation failed, and then your security tool flags a brand-new vulnerability in that same app a few days later. It can feel like you’re behind even though you just took action a week or two before.
A lot of patching can be automated, and tools like Intune or Automox make that easier. But in practice, I haven’t seen a setup where everything is 100% hands off. There’s always a handful of apps that are exceptions, need extra detection logic, or require manual intervention from the support team.
Custom Workflows
Eventually, you start building more advanced workflows that involve operations like renaming files, modifying contents, closing ongoing processes, or removing problematic apps. These scripts are powerful but risky.
You have to consider edge cases:
- Different Windows builds
- Local user profiles with inconsistent naming conventions
- Remnants/junk before standardization
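To make that concrete, here’s a hedged sketch of the defensive structure I try to use; the process name and file path are made up, but the explicit checks, try/catch, and exit codes are what keep a missed edge case from silently breaking reporting:

```powershell
# Hedged sketch of a defensive cleanup step; process and path names are hypothetical.
$procName = "ExampleAgent"
$oldPath  = "C:\ProgramData\ExampleVendor\legacy_config.json"

try {
    # Only act on what actually exists on this particular build/profile
    Get-Process -Name $procName -ErrorAction SilentlyContinue | Stop-Process -Force -ErrorAction Stop

    if (Test-Path $oldPath) {
        Rename-Item -Path $oldPath -NewName "legacy_config.json.bak" -ErrorAction Stop
    }

    Write-Output "Cleanup completed"
    exit 0
}
catch {
    # Surface the failure so reporting shows which edge case was missed
    Write-Output "Cleanup failed: $($_.Exception.Message)"
    exit 1
}
```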
One missed case can break a script. Ideally, I aim for a success rate above 95%, so the helpdesk only has to manually handle a few edge cases instead of dealing with thousands of endpoint issues. What’s frustrating is that there’s usually no visibility on the thousands of endpoints that worked, only on the 10 or 20 that failed. Those are the ones that get flagged, escalated, and kicked back to you for troubleshooting, which can eat up a lot of time even if the root cause is minor.
That said, some of those failures have taught me more than the successes ever could. They’ve pushed me to learn more about our environment, the infrastructure behind it, how Intune actually behaves in the real world, and even some obscure parts of the Windows operating system I never would’ve explored otherwise. It’s a bit bittersweet though because depending on your current workload, those lessons can either feel like growth or just another fire to put out.
Final Thoughts
Endpoint engineering has been pretty fun so far. It’s pushed me to think more in terms of automation, scalability, and how to build things that hold up across large environments. I hope everything I’ve shared above helps someone who’s looking to get into endpoint engineering. I feel like I’ve learned a lot, and I’m excited to keep upskilling, especially by learning from the amazing cloud, network, and security engineers I get to work alongside every day.