
The motivation for this project is simple: I want to familiarize myself with the System.Collections.Concurrent namespace. The project is a console application that searches files for a specified text. The user can specify a number of parameters:

  1. text to be searched
  2. extension of the files to be searched (default is any extension. For simplicity, only accept one extension, e.g. .cs)
  3. starting location (default is current directory) inclusive of all sub-directories under it
  4. how many threads to use for searching (default is 8 threads)

The program starts two tasks: a producer and a consumer. The producer traverses the directory tree, picks the files that match the extension and puts their paths into a queue. The consumer spawns worker threads, up to the configured maximum. Each worker takes one file path from the queue and searches the file's content. For every match, it adds the file name and the match position to a result collection.

Since many threads will access the queue, we must use a thread-safe queue type. Hence, we use ConcurrentQueue.
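
As a minimal sketch of why the thread-safe type matters, the snippet below lets several tasks enqueue into and dequeue from one ConcurrentQueue<int> without any lock. The names QueueDemo and FillAndDrain are made up for illustration and are not part of the project.

```csharp
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class QueueDemo
{
    // Enqueues producers * itemsPerProducer numbers from several tasks, then
    // drains the queue from several tasks and returns how many items came out.
    public static int FillAndDrain(int producers, int itemsPerProducer, int consumers)
    {
        var queue = new ConcurrentQueue<int>();

        // Several producers enqueue concurrently; no lock is needed.
        var producing = Enumerable.Range(0, producers)
            .Select(p => Task.Run(() =>
            {
                for (int i = 0; i < itemsPerProducer; i++)
                    queue.Enqueue(p * itemsPerProducer + i);
            }))
            .ToArray();
        Task.WaitAll(producing);

        // Several consumers dequeue concurrently.
        // TryDequeue returns false once the queue is empty.
        int dequeued = 0;
        var consuming = Enumerable.Range(0, consumers)
            .Select(c => Task.Run(() =>
            {
                while (queue.TryDequeue(out int item))
                    Interlocked.Increment(ref dequeued);
            }))
            .ToArray();
        Task.WaitAll(consuming);

        return dequeued;
    }
}
```

Calling QueueDemo.FillAndDrain(4, 1000, 3) should return 4000: every item is dequeued exactly once, no matter how the tasks interleave.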

To easily parse the command line parameters, CommandLineParser is my go-to package.

Install-Package CommandLineParser -Version 2.8.0

We then specify how the command line parameters are structured:

        public class Options
        {
            private int _maxThread = 8;

            [Option('e', "extension", Required = false, HelpText = "File extension, default: .*")]
            public string Extensions { get; set; }

            [Option('l', "location", Required = false, HelpText = "Starting location, default: current directory")]
            public string Location { get; set; }

            [Option('s', "search", Required = true, HelpText = "String to be searched")]
            public string Search { get; set; }

            [Option('m', "maxthread", Required = false, HelpText = "Max number of process thread, default: 8")]
            public int MaxThreads
            {
                get { return _maxThread; }
                set { _maxThread = value; }
            }
        }

If all parameters are valid, the program enters the main functionality. Otherwise, CommandLineParser automatically prints the help text.

static void Main(string[] args)
{
	Parser.Default.ParseArguments<Options>(args)
		.WithParsed<Options>(o =>
		{      
			Console.WriteLine($"Searching for: {o.Search}");
			var startingLocation = Environment.CurrentDirectory;
			if (!string.IsNullOrEmpty(o.Location))
				startingLocation = o.Location;
			Console.WriteLine($"Starting Location: {startingLocation}");
			if (!string.IsNullOrEmpty(o.Extensions))
			{
				if (o.Extensions.StartsWith(".") && o.Extensions.Length > 1)
					Console.WriteLine($"File extension: {o.Extensions}");
				else
				{
					Console.WriteLine("Invalid extension parameter. Must be: .??? E.g: .cs");
					return;
				}
			}
			else
				Console.WriteLine("File extension: .*");

			DoSearch(o.Search, startingLocation, o.Extensions, o.MaxThreads);
		});
}

Run the producer and consumer threads:

static void DoSearch(string search, string location, string extension, int maxThreads)
{
	var queue = new ConcurrentQueue<string>();
	var result = new ConcurrentBag<string>();

	var producer = Task.Run(() =>
	{
		var loc = new DirectoryInfo(location);
		if (loc.Exists)
		{
			ProcessDirectory(loc.FullName, extension, queue);
		}
	});

	var consumer = Task.Run(() =>
	{
		var taskList = new List<Task>();

		while (true)
		{
			//wait until there are items in the queue or the producer has finished
			//(IsCompleted also covers a faulted producer, so we never wait forever)
			while (queue.IsEmpty && !producer.IsCompleted)
			{
				Thread.Sleep(1000);
			}

			//wait until the number of unfinished workers is below maxThreads
			while (taskList.Count(x => !x.IsCompleted) >= maxThreads)
			{
				Thread.Sleep(1000);
			}

			//launch worker thread
			taskList.Add(Task.Run(() =>
			{
				ProcessQueueu(queue, result, search);
			}));

			//no more item to process and producer is no longer running
			if (queue.IsEmpty && producer.IsCompleted)
				break;
		}

		//wait for the remaining workers, otherwise the result may be printed too early
		Task.WaitAll(taskList.ToArray());
	});

	//wait until both producer and consumer are completed
	Task.WaitAll(producer, consumer);

	//Print out the result
	foreach (var res in result)
		Console.WriteLine(res);
}
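
The polling loops with Thread.Sleep can be avoided with BlockingCollection<T>, which also lives in System.Collections.Concurrent. The following is only a sketch of that alternative, not the approach used in this post. PipelineSketch, Run, items and process are hypothetical names: items stands in for the paths found by the producer, and process stands in for the per-file search.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class PipelineSketch
{
    public static List<string> Run(IEnumerable<string> items, Func<string, string> process, int workerCount)
    {
        var results = new ConcurrentBag<string>();

        // BlockingCollection wraps a ConcurrentQueue by default.
        using var queue = new BlockingCollection<string>();

        var producer = Task.Run(() =>
        {
            try
            {
                foreach (var item in items)
                    queue.Add(item);
            }
            finally
            {
                // Tells the consumers that no more items will arrive.
                queue.CompleteAdding();
            }
        });

        // A fixed number of workers. Each one blocks until an item is available
        // and leaves its loop once the queue is empty and marked as complete.
        var workers = Enumerable.Range(0, workerCount)
            .Select(w => Task.Run(() =>
            {
                foreach (var item in queue.GetConsumingEnumerable())
                    results.Add(process(item));
            }))
            .ToArray();

        Task.WaitAll(workers);
        producer.Wait();
        return results.ToList();
    }
}
```

With this shape there is no need to inspect task statuses: CompleteAdding() is the only signal the workers need, and the order of the returned results is not guaranteed.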

Producer’s main function (EnumerateDirectory can be found HERE):

private static void ProcessDirectory(string loc, string extension, ConcurrentQueue<string> queue)
{   
	//Get all files under this location and its sub-directories
	var enumerator = new EnumerateDirectory.EnumerateDirectory(loc, true);
	var allPaths = enumerator.ToArray();
	enumerator.Dispose();

	var ctr = 0;
	foreach (string path in allPaths)
	{
		++ctr;
		if (ctr % 1000 == 0) Thread.Sleep(1000); //Take a rest every 1000 paths

		//process files, exclude directories
		if (File.Exists(path))
		{
			//find search candidate
			if (string.IsNullOrEmpty(extension) || path.EndsWith(extension.Trim(), StringComparison.OrdinalIgnoreCase))
			{
				Console.WriteLine($"{ctr}/{allPaths.Length} Enqueue {path}");
				queue.Enqueue(path);
			}
		}
	}

}
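
The extension check in the producer can also be isolated into a small helper, which makes it easy to test on its own. This is a sketch only; ExtensionFilter and Matches are hypothetical names, and it compares against Path.GetExtension instead of using EndsWith.

```csharp
using System;
using System.IO;

public static class ExtensionFilter
{
    // Returns true when no extension was requested, or when the file's
    // extension equals it, ignoring case (".CS" matches ".cs").
    public static bool Matches(string path, string extension)
    {
        if (string.IsNullOrWhiteSpace(extension))
            return true;

        return string.Equals(
            Path.GetExtension(path),
            extension.Trim(),
            StringComparison.OrdinalIgnoreCase);
    }
}
```

One difference from the EndsWith version: Path.GetExtension only returns the part after the last dot, so a compound extension such as .tar.gz would never match here.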

Consumer’s main function:

private static void ProcessQueueu(ConcurrentQueue<string> queue, ConcurrentBag<string> res, string search)
{
	//safely dequeue item
	if (queue.TryDequeue(out string str))
	{                  
		try
		{
			FileInfo f = new FileInfo(str);
			using var strm = f.OpenText();
			var reader = strm.ReadToEnd();
			int startPos = 0;
			int foundPos = reader.IndexOf(search);

			while (foundPos >= 0)
			{
				res.Add($"{f.FullName}:{foundPos}");
				startPos = foundPos + 1;
				foundPos = reader.IndexOf(search, startPos);
			}
		}
		catch (Exception ex)
		{
			Console.WriteLine($"processing {str} failed: {ex.GetBaseException().Message}.");
			return;
		}                 
	}
}
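
The match loop above can be sketched as a standalone function that returns every position of the search text. TextSearch and FindAll are hypothetical names. Unlike the code above, this sketch passes StringComparison.Ordinal, because IndexOf(string) without a comparison argument is culture-sensitive.

```csharp
using System;
using System.Collections.Generic;

public static class TextSearch
{
    // Returns the zero-based position of every occurrence of 'search' in
    // 'content', including overlapping ones, using an ordinal comparison.
    public static List<int> FindAll(string content, string search)
    {
        var positions = new List<int>();
        if (string.IsNullOrEmpty(content) || string.IsNullOrEmpty(search))
            return positions;

        int pos = content.IndexOf(search, StringComparison.Ordinal);
        while (pos >= 0)
        {
            positions.Add(pos);

            // Stop before asking IndexOf to start past the end of the string.
            if (pos + 1 >= content.Length)
                break;

            pos = content.IndexOf(search, pos + 1, StringComparison.Ordinal);
        }
        return positions;
    }
}
```

For example, TextSearch.FindAll("abcabcab", "ab") yields the positions 0, 3 and 6.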

[Screenshot: example of running this program]

That’s all folks. I hope it helps, cheers!

About Hardono

Howdy! I'm Hardono. I am working as a Software Developer. I am working mostly in Windows, dealing with .NET, conversing in C#. But I know a bit of Linux, mainly because I need to keep this blog operational. I've been working in Logistics/Transport industry for more than 11 years.


After a very long pause, I decided to resume reading this book:

I was reading about System.Collections.Concurrent when I stopped last time. To help my brain get familiar with this namespace, I decided to write a simple project. One part of the project is to enumerate files in a directory. For that, I use DirectoryInfo.GetFiles(). This is when I encountered this exception:

Unhandled exception. System.AggregateException: One or more errors occurred. (Access to the path 'D:\Config.Msi' is denied.)
 ---> System.UnauthorizedAccessException: Access to the path 'D:\Config.Msi' is denied.
   at System.IO.Enumeration.FileSystemEnumerator`1.CreateDirectoryHandle(String path, Boolean ignoreNotFound)
   at System.IO.Enumeration.FileSystemEnumerator`1.Init()
   at System.IO.Enumeration.FileSystemEnumerable`1..ctor(String directory, FindTransform transform, EnumerationOptions options, Boolean isNormalized)
   at System.IO.Enumeration.FileSystemEnumerableFactory.DirectoryInfos(String directory, String expression, EnumerationOptions options, Boolean isNormalized)
   at System.IO.DirectoryInfo.InternalEnumerateInfos(String path, String searchPattern, SearchTarget searchTarget, EnumerationOptions options)
   at System.IO.DirectoryInfo.GetDirectories(String searchPattern, EnumerationOptions enumerationOptions)

I tried to use Directory.GetFiles(), but the same exception stopped me. The first result of googling this exception is this Stack Overflow question. Somehow the accepted answer is actually wrong. But if we scroll down further, someone did give the correct answer:

var options = new EnumerationOptions()
{
    IgnoreInaccessible = true
};

var files = Directory.GetFiles("c:\\", "*.*", options);

foreach (var file in files)
{
    // File related activities
}
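
The snippet above only looks at the top level of the drive. A possible variation, assuming a runtime where EnumerationOptions is available, also sets RecurseSubdirectories and uses Directory.EnumerateFiles so that paths are produced lazily instead of being collected into one array. SafeEnumeration and Files are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;

public static class SafeEnumeration
{
    // Lazily yields every file under 'root' that matches 'pattern',
    // skipping directories the current user is not allowed to open.
    public static IEnumerable<string> Files(string root, string pattern = "*")
    {
        var options = new EnumerationOptions
        {
            IgnoreInaccessible = true,    // do not throw UnauthorizedAccessException
            RecurseSubdirectories = true  // descend into sub-directories
        };
        return Directory.EnumerateFiles(root, pattern, options);
    }
}
```

Keep in mind that a default EnumerationOptions instance also skips hidden and system entries (AttributesToSkip); set that property explicitly if those files should be included.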

Unfortunately, as the documentation states, this solution is not available in all .NET versions:

Product         Versions
.NET            5.0, 6.0 Preview 3
.NET Core       2.1, 2.2, 3.0, 3.1
.NET Standard   2.1

If you’re still using an older version of .NET, you’re in luck, because a Win32 API ninja provided a solution on MSDN Forums:

using System;
using System.Collections.Generic;
using System.IO;

namespace EnumerateDirectory
{
    public class EnumerateDirectory : IEnumerable<String>, IEnumerator<string>
    {
        [Serializable,
        System.Runtime.InteropServices.StructLayout
        (System.Runtime.InteropServices.LayoutKind.Sequential,
        CharSet = System.Runtime.InteropServices.CharSet.Auto
        ),
        System.Runtime.InteropServices.BestFitMapping(false)]
        private struct WIN32_FIND_DATA
        {
            public int dwFileAttributes;
            public int ftCreationTime_dwLowDateTime;
            public int ftCreationTime_dwHighDateTime;
            public int ftLastAccessTime_dwLowDateTime;
            public int ftLastAccessTime_dwHighDateTime;
            public int ftLastWriteTime_dwLowDateTime;
            public int ftLastWriteTime_dwHighDateTime;
            public int nFileSizeHigh;
            public int nFileSizeLow;
            public int dwReserved0;
            public int dwReserved1;

            [System.Runtime.InteropServices.MarshalAs
            (System.Runtime.InteropServices.UnmanagedType.ByValTStr,
            SizeConst = 260)]
            public string cFileName;

            [System.Runtime.InteropServices.MarshalAs
            (System.Runtime.InteropServices.UnmanagedType.ByValTStr,
            SizeConst = 14)]
            public string cAlternateFileName;
        }

        [System.Runtime.InteropServices.DllImport(
            "kernel32.dll",
            CharSet = System.Runtime.InteropServices.CharSet.Auto,
            SetLastError = true)]
        private static extern IntPtr FindFirstFile(string pFileName, ref WIN32_FIND_DATA pFindFileData);

        [System.Runtime.InteropServices.DllImport(
            "kernel32.dll",
            CharSet = System.Runtime.InteropServices.CharSet.Auto,
            SetLastError = true
            )]
        private static extern bool FindNextFile(IntPtr hndFindFile, ref WIN32_FIND_DATA lpFindFileData);

        [System.Runtime.InteropServices.DllImport("kernel32.dll", SetLastError = true)]
        private static extern bool FindClose(IntPtr hndFindFile);

        private static readonly IntPtr INVALID_HANDLE_VALUE = new IntPtr(-1);

        public class FILETIME
        {
            public long dwLowDateTime;
            public long dwHighDateTime;
        }

        public class SYSTEMTIME
        {
            public int wYear;
            public int wMonth;
            public int wDayOfWeek;
            public int wDay;
            public int wHour;
            public int wMinute;
            public int wSecond;
            public int wMilliseconds;
        }

        Stack<SearchHandle> handles = new Stack<SearchHandle>();
        String dir;
        bool sub;

        public EnumerateDirectory(String directory)
        {
            init(directory, false);
        }

        public EnumerateDirectory(String directory, bool SearchSubdirectories)
        {
            init(directory, SearchSubdirectories);
        }

        protected void init(String directory, bool searchsubdirectories)
        {
            if (!Directory.Exists(directory))
                throw new ArgumentException("Directory does not exist.");
            dir = directory;
            sub = searchsubdirectories;

        }

        public IEnumerator<string> GetEnumerator()
        {
            Reset();
            return this;
        }

        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        {
            Reset();
            return this;
        }

        public string Current
        {
            get;
            protected set;
        }

        public void Dispose()
        {
            foreach (var handle in handles)
                FindClose(handle.handle);
            handles.Clear();
        }

        protected void Pop()
        {
            var handle = handles.Pop();
            FindClose(handle.handle);
        }

        object System.Collections.IEnumerator.Current
        {
            get { return Current; }
        }

        public bool MoveNext()
        {
            Current = GetNextFile();
            return Current != null;
        }

        protected String GetNextFile()
        {
            WIN32_FIND_DATA data = new WIN32_FIND_DATA();
            String res = null;
            if (handles.Count == 0)
                return null;
            FindNextFile(handles.Peek().handle, ref data);
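            // 16 is FILE_ATTRIBUTE_DIRECTORY. The exact comparison means only plain directories
            // are descended into; a directory with extra flags (hidden, read-only, reparse point)
            // falls through to the else branch and is returned as a path.
            if (data.dwFileAttributes == 16)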
            {
                if (sub)
                {
                    var newHandle = getNewHandle(Path.Combine(handles.Peek().dir, data.cFileName));
                    handles.Push(newHandle);
                    return GetNextFile();
                }
                else
                {
                    return GetNextFile();
                }
            }
            else
            {
                if (String.IsNullOrEmpty(data.cFileName))
                {
                    if (handles.Count > 0)
                    {
                        Pop();
                        res = GetNextFile();
                    }
                    else
                        res = null;
                }
                else
                    res = Path.Combine(handles.Peek().dir, data.cFileName);
            }
            return res;
        }

        public void Reset()
        {
            Dispose();
            handles.Push(getNewHandle(dir));
        }

        private SearchHandle getNewHandle(String directory)
        {
            var data = new WIN32_FIND_DATA();
            var handle = FindFirstFile(Path.Combine(directory, "*"), ref data);
            FindNextFile(handle, ref data);
            return new SearchHandle(handle, directory);
        }

    }

    class SearchHandle
    {
        public IntPtr handle;
        public String dir;

        public SearchHandle(IntPtr handle, String dir)
        {
            this.handle = handle;
            this.dir = dir;
        }
    }
}

Using it is quite simple:

//ToArray() is a LINQ extension method, so "using System.Linq;" is required
var enumerator = new EnumerateDirectory.EnumerateDirectory("D:\\", true);
var allPaths = enumerator.ToArray(); //very fast
enumerator.Dispose(); //release the Win32 search handles

foreach (string path in allPaths)
{ 
	//allPaths will consist of all files and sub-directories of D:\
	if (File.Exists(path))
	{
		/*
			Process a file
		*/
	}
	else 
	{
		/*
			Process a directory
		*/
	}
}
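
If P/Invoke is not an option, a purely managed alternative for older frameworks is to walk the tree directory by directory and catch the exception for each directory separately. This is a sketch of that different technique, not the forum solution above. ManagedWalk and Files are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ManagedWalk
{
    // Iteratively walks 'root' and yields every file path, skipping any
    // directory that cannot be read instead of aborting the whole walk.
    public static IEnumerable<string> Files(string root)
    {
        var pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            string dir = pending.Pop();
            string[] files;
            string[] subDirs;
            try
            {
                files = Directory.GetFiles(dir);
                subDirs = Directory.GetDirectories(dir);
            }
            catch (UnauthorizedAccessException) { continue; } // access denied
            catch (DirectoryNotFoundException) { continue; }  // removed while walking

            foreach (var file in files)
                yield return file;
            foreach (var sub in subDirs)
                pending.Push(sub);
        }
    }
}
```

Because the exception is caught per directory, a protected folder such as D:\Config.Msi only costs that one folder, and the rest of the drive is still enumerated.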

That’s it folks. I hope it helps!

Hey friends!

My mobile plan is expiring next month. I’ve been itching to switch to a SIM-only mobile plan to cut costs. My current plan is a corporate plan with SingTel. It costs $84.13 monthly, which is a bit excessive.

Since I had previously researched this, I decided to just update the list. I checked every provider and updated the JSON data. Check the updated table below.

PS: If you find that the data is inaccurate, feel free to fork the data and submit a pull request. Thanks.

Update 18 Apr 2021: I’ve applied for SingTel’s Gomo $20/month for 20 GB data.
