Windows 10: PowerShell 5.1 Parser Bug: Failure to Parse UTF-8 No BOM Script Containing Unicode Characters

Discussion in 'Windows 10 Gaming' started by IS Department1, Apr 24, 2025.

  1. PowerShell 5.1 Parser Bug: Failure to Parse UTF-8 No BOM Script Containing Unicode Characters


    I don't have access to the Feedback Hub, so I thought I would post this bug information here in hopes that someone else can report it:

    Steps to reproduce:
    1. Save the following script as UTF-8 without BOM (this issue will only occur when run as a script that was not saved as ANSI and does not have a BOM).
    2. Run the script in PowerShell 5.1.

    ```
    Write-Host "Test with ℹ character"
    ```

    Expected behavior:
    The text in quotes should be output.

    Actual behavior:
    An error is returned.

    Error details:
    At P:\PowerShellBugReproUnicodeParse.ps1:1 char:36
    + Write-Host "Test with ℹ character"
    +
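A possible explanation for this error, offered as a sketch rather than a confirmed diagnosis: Windows PowerShell 5.1 reads a script file that has no BOM using the system's ANSI code page, not UTF-8. Assuming that code page is Windows-1252 (the reporter's is not stated), the three UTF-8 bytes of ℹ are read as three separate characters, and the middle one is „ (U+201E), which the PowerShell tokenizer accepts as a double quote. That would end the string early and leave the final quote unterminated, which is consistent with the reported position char:36 (the intact line is only 34 characters long). The snippet below is ASCII-only on purpose, so it is itself immune to the problem.

```powershell
# Build the character from its code point so this file stays pure ASCII.
$info = [string][char]0x2139

# Encode as UTF-8 (what is stored on disk), then decode as Windows-1252
# (an ASSUMED ANSI code page; other systems will use a different one).
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($info)
$misread   = [System.Text.Encoding]::GetEncoding(1252).GetString($utf8Bytes)

# List the code points of the misread text.
$codePoints = $misread.ToCharArray() | ForEach-Object { 'U+{0:X4}' -f [int]$_ }
$codePoints -join ' '    # U+00E2 U+201E U+00B9

# Feed the misread line to the PowerShell parser and collect its errors.
$line        = 'Write-Host "Test with ' + $misread + ' character"'
$tokens      = $null
$parseErrors = $null
[void][System.Management.Automation.Language.Parser]::ParseInput($line, [ref]$tokens, [ref]$parseErrors)
$parseErrors | ForEach-Object { $_.Message }
```

On a system with a different ANSI code page the misread characters differ, so the error may look different or not occur at all.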

     
    IS Department1, Apr 24, 2025
    #1
  2. LMiller7 Win User

    Microsoft Rebrand Unicode to UTF-16 LE in NotePad Windows 10 2004

    Microsoft did the right thing. It should have been done a long time ago.

    The problem with using "Unicode" to specify a file encoding is that it doesn't specify a file encoding. The purpose of Unicode was to assign values or codepoints to characters. It says nothing about how those values will be encoded in a file or on storage media. Over the years there have been quite a few of these encodings. UCS-2, UTF-8, UTF-16, and UTF-32 are all Unicode. All except UTF-8 come in both LE and BE varieties. And files may or may not include a BOM or Byte Order Mark that specifies this. So when you see "Unicode" specified as an encoding the question arises, Which one is it?

    The encodings shown in the Save as dialog do away with the vague "Unicode" specification and make it explicit.
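To make the distinction between Unicode and its encodings concrete, here is a small sketch that encodes one code point, U+2139, with several .NET encoding classes and lists each encoding's BOM and byte sequence. The choice of character and the table layout are only illustrative.

```powershell
# One code point, built from its number so the script stays pure ASCII.
$text = [string][char]0x2139

# Each .NET encoding object below is created with its BOM (preamble) enabled.
$encodings = [ordered]@{
    'UTF-8'     = [System.Text.UTF8Encoding]::new($true)
    'UTF-16 LE' = [System.Text.UnicodeEncoding]::new($false, $true)
    'UTF-16 BE' = [System.Text.UnicodeEncoding]::new($true, $true)
    'UTF-32 LE' = [System.Text.UTF32Encoding]::new($false, $true)
    'UTF-32 BE' = [System.Text.UTF32Encoding]::new($true, $true)
}

# Same code point, different bytes: collect BOM and payload as hex strings.
$rows = foreach ($name in $encodings.Keys) {
    $enc = $encodings[$name]
    [pscustomobject]@{
        Encoding = $name
        BOM      = ($enc.GetPreamble()   | ForEach-Object { '{0:X2}' -f $_ }) -join ' '
        Bytes    = ($enc.GetBytes($text) | ForEach-Object { '{0:X2}' -f $_ }) -join ' '
    }
}
$rows | Format-Table -AutoSize
```

The BOM column is what lets a reader tell these files apart; a file without one has to be guessed at, which is where the problems in this thread start.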
     
    LMiller7, Apr 24, 2025
    #2
  3. drs Chris Win User
    wordpad has trouble recognizing utf-8 without BOM

    Wordpad has trouble recognizing UTF-8 without BOM
     
    drs Chris, Apr 24, 2025
    #3
  4. ddelo Win User

    Unreadable non-ANSI characters in Notepad

    The problem:
    People who live in countries whose languages include non-ANSI characters, and who want a full English Windows environment.
    If the user sets the System locale (Language for non-Unicode programs) to the country they live in, many apps check this setting and, without giving the user any option, install with a localized interface, i.e. a GUI based on the System locale, which might not be desirable.

    The apparent resolution is to change the System locale to English (US), which solves the app interface issue, but because we’re talking about Microsoft Windows there is (as always…) an exception. In this case it is Notepad…
    Notepad saves text files as ANSI (= ASCII & Extended ASCII) by default. If the text file contains non-ANSI characters, it gives a warning… and if you accidentally bypass it and save the file with the ANSI encoding, all non-ANSI characters become unreadable.

    Being such a user, I have an English (US) installation and to avoid the localized app interface, I have set the System locale to English (United States).
    For some reason that I haven’t found yet, before version 1803 I could save text documents with Greek (non-ANSI) characters. Since I rarely got the encoding warning when saving, a lot of files with Greek characters were saved as ANSI without any problem.

    This encoding issue has become stricter in 1803. My guess is that the “Beta: Use Unicode UTF-8 for worldwide language support” setting, which was added to the dialog for changing the system locale, has something to do with it. Either way this is, as stated, still in Beta, so it doesn’t work as it is supposed to yet!

    So how to read all these text files with ANSI encoding, which contain non-ANSI characters, that are now unreadable?
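The effect described above can be reproduced without Notepad. A minimal sketch, assuming the files were written with the Greek ANSI code page (Windows-1253) and are later opened on a system whose ANSI code page is Windows-1252: the bytes on disk never change, only their interpretation does. The sample text is arbitrary.

```powershell
# Three Greek letters (alpha, beta, gamma), built from code points to keep this file ASCII.
$greek = -join ([char[]](0x03B1, 0x03B2, 0x03B3))

$cp1253 = [System.Text.Encoding]::GetEncoding(1253)   # Greek ANSI code page (assumed source)
$cp1252 = [System.Text.Encoding]::GetEncoding(1252)   # Western European ANSI code page

# What an "ANSI" save stores on a Greek-locale system: one byte per letter.
$bytes = $cp1253.GetBytes($greek)
($bytes | ForEach-Object { '{0:X2}' -f $_ }) -join ' '    # E1 E2 E3

# The same bytes decoded under each code page.
$wrong = $cp1252.GetString($bytes)   # accented Latin letters instead of Greek
$right = $cp1253.GetString($bytes)   # the original Greek text

$wrong.ToCharArray() | ForEach-Object { 'U+{0:X4}' -f [int]$_ }   # U+00E1 U+00E2 U+00E3
$right -ceq $greek                                               # True
```

This is why switching the System locale back to the matching language makes the old files readable again: nothing in the file says which code page it was written with.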

    The solution:
    Step 1
    Go to: Settings > Time & Language > Region & Language > Related Settings > Administrative Language Settings (opens Control Panel) > System locale (Language for non-Unicode programs)

    Alternatively, for short, type in Windows search/Cortana:
    control.exe /NAME Microsoft.RegionalAndLanguageOptions /PAGE /p:"Administrative"

    and change the “System locale (Language for non-Unicode programs)” to the locale of the country you live in (Greece in my case).
    The system will need to reboot. Click Restart.

    Step 2
    Download the UnicodeConverter.zip, save and extract it on your Desktop. The zip file contains three scripts:
    CheckFileEncoding.ps1
    ConvertFilesToUnicode.ps1
    ConvertFilesToUnicode_NoBOM.ps1 (for advanced users)

    Step 3
    Open an elevated PowerShell and type the command:
    Code:
    Then type the following command (provided that you have saved the script in your Desktop):
    Code:
    The script will give you a list of all the ANSI text files, in all your user folders, as System.Text.ASCIIEncoding.

    You can open some of the files that contain non-ANSI characters and verify that they are readable. (They should be, since the system locale now matches the language of the files.)
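The contents of CheckFileEncoding.ps1 are not shown in this post, so the following is only a rough sketch of how such a check could work, not the script itself. The function name Get-BomEncodingName is hypothetical. It inspects only the byte order mark, so it cannot tell an ANSI file from a UTF-8 file saved without a BOM.

```powershell
# Hypothetical helper: classify a file by its byte order mark (BOM) only.
function Get-BomEncodingName {
    param([Parameter(Mandatory)][string]$Path)

    # .NET resolves relative paths against the process directory, so use a full path.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $bytes    = [System.IO.File]::ReadAllBytes($fullPath)
    $hex      = ($bytes | Select-Object -First 4 | ForEach-Object { '{0:X2}' -f $_ }) -join ''

    # Longer signatures first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
    if     ($hex -like 'FFFE0000*') { 'UTF-32 LE' }
    elseif ($hex -like '0000FEFF*') { 'UTF-32 BE' }
    elseif ($hex -like 'EFBBBF*')   { 'UTF-8 with BOM' }
    elseif ($hex -like 'FFFE*')     { 'UTF-16 LE' }
    elseif ($hex -like 'FEFF*')     { 'UTF-16 BE' }
    else                            { 'No BOM (ANSI or UTF-8 without BOM)' }
}

# Example use (the folder is an assumption; adjust as needed):
# Get-ChildItem -Path $HOME -Filter *.txt -Recurse -File |
#     ForEach-Object { '{0}: {1}' -f $_.FullName, (Get-BomEncodingName $_.FullName) }
```

Reading the whole file just to look at four bytes is wasteful for large files; it is done here only to keep the sketch short.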

    Step 4
    Now you can run the command:
    Code:
    The script will:
    1. Create a backup folder in C:\Backup\ASCIItxtBackup and will save a backup of all ANSI files you have in your user folders
    2. Convert all ANSI files you have in your user folders to Unicode.

    After that, you can repeat Step 3 to verify that there are no ANSI files left in your user folders.
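ConvertFilesToUnicode.ps1 is likewise not reproduced in this post. As a sketch only, the backup-then-convert idea for a single file might look like the following. The function name Convert-AnsiFileToUtf16, its parameters, and the default code page 1253 (Greek) are all assumptions, and it uses .NET file APIs with an explicit source code page rather than whatever the real script does.

```powershell
# Hypothetical helper: back up one ANSI text file, then rewrite it as UTF-16 LE with BOM.
function Convert-AnsiFileToUtf16 {
    param(
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][string]$BackupDirectory,
        [int]$CodePage = 1253    # ASSUMED source code page (Greek); change to match your files
    )

    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath

    # 1. Keep an untouched copy before changing anything.
    $null = New-Item -ItemType Directory -Path $BackupDirectory -Force
    Copy-Item -LiteralPath $fullPath -Destination $BackupDirectory

    # 2. Decode with the explicit code page, then write back as UTF-16 LE.
    #    [System.Text.Encoding]::Unicode emits a BOM when used with WriteAllText.
    $source = [System.Text.Encoding]::GetEncoding($CodePage)
    $text   = [System.IO.File]::ReadAllText($fullPath, $source)
    [System.IO.File]::WriteAllText($fullPath, $text, [System.Text.Encoding]::Unicode)
}

# Example use (paths are placeholders):
# Convert-AnsiFileToUtf16 -Path 'C:\Personal\My Files\notes.txt' -BackupDirectory 'C:\Backup\ASCIItxtBackup'
```

Passing the code page explicitly means the conversion does not depend on the current System locale. Note that this sketch copies every backup into one flat folder, so files with the same name from different folders would overwrite each other.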

    Step 5
    Go to: Settings > Time & Language > Region & Language > Related Settings > Administrative Language Settings (opens Control Panel) > System locale (Language for non-Unicode programs)

    Alternatively, for short, type in Windows search/Cortana:
    control.exe /NAME Microsoft.RegionalAndLanguageOptions /PAGE /p:"Administrative"

    and change the “System locale (Language for non-Unicode programs)” to the English locale of your preference.
    The system will need to reboot. Click Restart.

    That was it. After your computer restarts and since all the text files are now saved in Unicode, they can be read with any System locale.

    Important Note:
    If you want to change either the backup location or the folders where the ANSI text files reside (e.g. search all C:\), open the script “ConvertFilesToUnicode.ps1” and as shown in the red box, in the image below, go to the section where we define the locations and change them according to your needs (e.g. $SourceDirectory = ‘C:\Personal\My Files’). Don’t forget to enclose the folder in quotes (e.g. ‘C:\Backup\My ASCII files’).


    [Image: the locations section of ConvertFilesToUnicode.ps1, marked with a red box]


    For Advanced Users
    Microsoft Notepad saves all Unicode files with a BOM (Byte Order Mark). If you don’t want a BOM in your Unicode text files, use “ConvertFilesToUnicode_NoBOM.ps1”. It does exactly what “ConvertFilesToUnicode.ps1” does, except that it saves the text file in the chosen Unicode encoding without the BOM.
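For illustration, the difference between the two variants comes down to a few bytes at the start of the file. This sketch is not taken from either script; it writes the same text as UTF-8 with and without a BOM using .NET and compares the results. The file name is a placeholder.

```powershell
# Placeholder file in the temp folder; the text is built without non-ASCII literals.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'bom-demo.txt'
$text = 'Test with ' + [char]0x2139 + ' character'

# UTF8Encoding($true) emits a BOM, UTF8Encoding($false) does not.
[System.IO.File]::WriteAllText($path, $text, [System.Text.UTF8Encoding]::new($true))
$withBom = [System.IO.File]::ReadAllBytes($path)

[System.IO.File]::WriteAllText($path, $text, [System.Text.UTF8Encoding]::new($false))
$withoutBom = [System.IO.File]::ReadAllBytes($path)

# The BOM is the three bytes EF BB BF; everything after it is identical.
($withBom | Select-Object -First 3 | ForEach-Object { '{0:X2}' -f $_ }) -join ' '   # EF BB BF
$withBom.Length - $withoutBom.Length                                                # 3

Remove-Item -LiteralPath $path
```

Windows PowerShell 5.1 uses the BOM to recognize a script file as UTF-8, so for .ps1 files that contain non-ASCII characters the variant with the BOM is the safer choice on that version.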

    Additionally, to use a different encoding, go to the convert section of the script and change Unicode in the “set-content $_.FullName -Encoding Unicode” part to any of the other available values:

    ‘ASCII’: Uses the encoding for the ASCII (7-bit) character set.
    ‘BigEndianUnicode’: Encodes in UTF-16 format using the big-endian byte order.
    ‘BigEndianUTF32’: Encodes in UTF-32 format using the big-endian byte order.
    ‘Default’: Encodes using the system's active ANSI code page (in Windows PowerShell 5.1).
    ‘Byte’: Encodes a set of characters into a sequence of bytes.
    ‘String’: Uses the encoding type for a string.
    ‘Unicode’: Encodes in UTF-16 format using the little-endian byte order.
    ‘UTF7’: Encodes in UTF-7 format.
    ‘UTF8’: Encodes in UTF-8 format.


    Credits:
    The function Get-FileEncoding, 03-Feb-2015, by VertigoRay - Adjusted to use .NET's [System.Text.Encoding Class] (Encoding Class (System.Text))
     
    ddelo, Apr 24, 2025
    #4