Can we automatically classify and prevent Word documents from being distributed if they contain a string (like date of birth) without using Azure Information Protection or any clients on-premises?

Yes, this is possible when the document contains a specific data match pattern, but not for generic patterns such as a date of birth or an address. The pattern can be a particular format or a combination of formats, as in the example in the screenshot below. Once the pattern is defined, the system can be trained to recognise it and identify matching content.
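
To make the idea concrete, here is a rough conceptual sketch in Python of what an exact data match does: a format pattern finds candidate strings, and only candidates that also appear in a known list of sensitive values count as a match. This is an illustration only and not how Office 365 implements the feature. The "EMP-" identifier format and every function name below are hypothetical.

```python
import hashlib
import re

# Hypothetical format for illustration: an employee ID such as "EMP-123456".
CANDIDATE_PATTERN = re.compile(r"\bEMP-\d{6}\b")


def _digest(value: str) -> str:
    """Normalise a value and hash it so the index never stores plain text."""
    return hashlib.sha256(value.strip().upper().encode("utf-8")).hexdigest()


def build_index(sensitive_values):
    """Build a set of hashes from the known sensitive values."""
    return {_digest(value) for value in sensitive_values}


def find_exact_matches(text, index):
    """Return candidates that fit the format AND exist in the index."""
    candidates = CANDIDATE_PATTERN.findall(text)
    return [c for c in candidates if _digest(c) in index]


def should_block(text, index):
    """A DLP-style decision: block when at least one exact match is found."""
    return bool(find_exact_matches(text, index))
```

With an index built from "EMP-123456", a document mentioning "EMP-123456" would be flagged, while one mentioning "EMP-000000" would not, even though both fit the format. A generic value such as a date of birth has no such list of known values to check against, which is why it cannot be matched this way.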

Once the data match pattern is identified, a DLP policy can act on it when the content leaves the environment. For this to work, the data must reside in Office 365 services rather than on-premises, because this is a feature of the Office 365 Security & Compliance Center, not of AIP. Please also note that this is a Preview feature that has not yet been rolled out to all tenancies, but it should be available soon.

Exact data match classification